[NET]: Close NETIF_F_LLTX race conditions.
When drivers other than loopback were using the LLTX
feature, a race window was present. While sending
queued packets, the packet scheduler layer drops the
queue lock, then calls directly into the driver's xmit
handler. The driver then grabs its private TX lock
and goes to work.
However, as soon as we've dropped the queue lock,
another thread doing TX processing for that card can
call netif_stop_queue() because the TX queue filled up.
This race window causes problems because a properly
coded driver should never end up in its
->hard_start_xmit() handler once the queue on the
device has been stopped; indeed, we BUG() trap for
this condition in all of the device drivers. That is
how Roland and the InfiniBand folks discovered this
race window.
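To make the window concrete, here is a minimal sketch
of the driver-side pattern described above, assuming
the usual kernel includes; the names my_priv,
my_hard_start_xmit, and tx_ring_full() are
illustrative, not taken from any particular driver.

static int my_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct my_priv *priv = dev->priv;
	unsigned long flags;

	spin_lock_irqsave(&priv->tx_lock, flags);

	/*
	 * The scheduler dropped dev->queue_lock before calling
	 * us, so another CPU already in the section below may
	 * have filled the ring and stopped the queue in the
	 * meantime.  This consistency check is what fires when
	 * the race hits.
	 */
	BUG_ON(netif_queue_stopped(dev));

	/* ... queue skb onto the hardware TX ring ... */

	/*
	 * This netif_stop_queue() call, made here or from the
	 * TX completion interrupt, is what races with another
	 * CPU that the scheduler has already launched past the
	 * queue lock toward this handler.
	 */
	if (tx_ring_full(priv))
		netif_stop_queue(dev);

	spin_unlock_irqrestore(&priv->tx_lock, flags);
	return 0;
}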
Various suggestions were made to close this race. One
of them involved holding the queue lock all the way
into the ->hard_start_xmit() routine, with the driver
dropping that lock only after taking its private TX
lock. This solution was deemed grotty because it is
not wise to put queueing discipline internals into the
device drivers.
The solution taken here, which is based upon ideas from
Stephen Hemminger, is twofold:
1) Leave LLTX around for purely software devices that
need no locking at all for TX processing. The existing
example is loopback, although all tunnel devices could
be converted in this way too. (A sketch of this case
follows the list below.)
2) Stop trying to use LLTX for the other devices.
Instead, achieve the same goal using a different
mechanism.
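As an illustration of case #1 (a sketch only, not
verbatim driver code; softdev_setup is an illustrative
name), such a device simply sets the feature flag at
setup time and takes no TX lock anywhere:

static void softdev_setup(struct net_device *dev)
{
	/*
	 * A purely software device has no TX ring or hardware
	 * state to protect, so lockless TX stays safe for it;
	 * loopback is the in-tree example.
	 */
	dev->features |= NETIF_F_LLTX;
}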
For #2, the thing we were trying to achieve with LLTX
was to eliminate excess locking. We accomplish that
now by letting the device driver use dev->xmit_lock
directly instead of a separate priv->tx_lock of some
sort.
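For example (a hedged sketch; my_tx_interrupt and
tx_ring_has_room() are illustrative names), a converted
driver's TX completion handler now takes dev->xmit_lock,
the same lock the core holds around ->hard_start_xmit():

static irqreturn_t my_tx_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
	struct net_device *dev = dev_id;

	/* Serializes against the core's TX path. */
	spin_lock(&dev->xmit_lock);

	/* ... reclaim completed TX descriptors ... */

	if (netif_queue_stopped(dev) && tx_ring_has_room(dev))
		netif_wake_queue(dev);

	spin_unlock(&dev->xmit_lock);
	return IRQ_HANDLED;
}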
In order to allow that, we had to turn dev->xmit_lock
into a hardware-IRQ-disabling lock instead of a
BH-disabling one.
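Concretely (a simplified sketch of the idea, not the
exact code in this patch; hard_xmit_one is an
illustrative name), the core's transmit step becomes:

static int hard_xmit_one(struct sk_buff *skb, struct net_device *dev)
{
	unsigned long flags;
	int ret = 1;	/* nonzero asks the scheduler to requeue */

	/*
	 * spin_lock_bh() is no longer enough here, because the
	 * driver's TX completion interrupt can take this same
	 * lock.
	 */
	spin_lock_irqsave(&dev->xmit_lock, flags);
	if (!netif_queue_stopped(dev))
		ret = dev->hard_start_xmit(skb, dev);
	spin_unlock_irqrestore(&dev->xmit_lock, flags);

	return ret;
}

Since the queue-stopped check and the call into the
driver now sit in the same critical section as the
netif_stop_queue() callers, the race window described
above is closed.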
Signed-off-by: David S. Miller <davem@davemloft.net>