Commits · ead81cc5fc6d996db6afb20f211241612610a07a · nexedi / linux

18 Jul, 2008 24 commits

netdevice: Move qdisc_list back into net_device proper. · ead81cc5
David S. Miller authored Jul 17, 2008
```
And give it its own lock.
Signed-off-by: David S. Miller <davem@davemloft.net>
```

pkt_sched: Kill qdisc_lock_tree usage in cls_route.c · 15b458fa

David S. Miller authored Jul 16, 2008

It just wants the qdisc tree to be synchronized, so grabbing
qdisc_root_lock() is sufficient.
Signed-off-by: David S. Miller <davem@davemloft.net>


pkt_sched: Remove qdisc_lock_tree usage in cls_api.c · 55dbc640

David S. Miller authored Jul 16, 2008

It just wants the qdisc tree for the filter to be synchronized.
So just BH lock qdisc_root_lock(q) instead.
Signed-off-by: David S. Miller <davem@davemloft.net>


pkt_sched: Use per-queue locking in shutdown_scheduler_queue. · 17715e62
David S. Miller authored Jul 16, 2008
```
This eliminates another qdisc_lock_tree user.
Signed-off-by: David S. Miller <davem@davemloft.net>
```

pkt_sched: Perform bulk of qdisc destruction in RCU. · 8a34c5dc

David S. Miller authored Jul 17, 2008

This allows less strict control of access to the qdisc attached to a
netdev_queue.  It is even allowed to enqueue into a qdisc which is
in the process of being destroyed.  The RCU handler will toss out
those packets.

We will need this to handle sharing of a qdisc amongst multiple
TX queues.  In such a setup the lock has to be shared, so will
be inside of the qdisc itself.  At which point the netdev_queue
lock cannot be used to hard synchronize access to the ->qdisc
pointer.

One operation we have to keep inside of qdisc_destroy() is the list
deletion.  It is the only piece of state visible after the RCU quiesce
period, so we have to undo it early and under the appropriate locking.

The operations in the RCU handler do not need any locking because the
qdisc tree is no longer visible to anything at that point.
Signed-off-by: David S. Miller <davem@davemloft.net>
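
The ordering described above (unlink from the list immediately, defer the rest of the teardown) can be illustrated with a small single-threaded sketch. This is not the kernel code: every `rq_*` name is hypothetical, there is no real RCU or locking here, and `rq_run_deferred()` merely stands in for "the grace period has elapsed".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a qdisc; not a kernel structure. */
struct rq_qdisc {
	struct rq_qdisc *next;        /* linkage on the visible qdisc list */
	struct rq_qdisc *defer_next;  /* linkage on the deferred-teardown list */
	int backlog;                  /* number of queued packets */
	int destroyed;                /* set once the deferred handler has run */
};

static struct rq_qdisc *rq_list;      /* qdiscs still visible to lookups */
static struct rq_qdisc *rq_deferred;  /* qdiscs waiting for the grace period */

static void rq_list_add(struct rq_qdisc *q)
{
	q->next = rq_list;
	rq_list = q;
}

static int rq_on_list(const struct rq_qdisc *q)
{
	for (const struct rq_qdisc *p = rq_list; p; p = p->next)
		if (p == q)
			return 1;
	return 0;
}

/* Early part of destruction: only the list deletion happens here. */
static void rq_destroy(struct rq_qdisc *q)
{
	struct rq_qdisc **pp;

	assert(!q->destroyed);
	for (pp = &rq_list; *pp; pp = &(*pp)->next) {
		if (*pp == q) {
			*pp = q->next;
			break;
		}
	}
	q->next = NULL;
	q->defer_next = rq_deferred;
	rq_deferred = q;
}

/* Enqueue is still tolerated while the qdisc is being torn down. */
static void rq_enqueue(struct rq_qdisc *q)
{
	q->backlog++;
}

/* Stand-in for the RCU callback: toss out whatever was queued. */
static void rq_run_deferred(void)
{
	while (rq_deferred) {
		struct rq_qdisc *q = rq_deferred;

		rq_deferred = q->defer_next;
		q->backlog = 0;
		q->destroyed = 1;
	}
}
```

In this toy model a packet enqueued between `rq_destroy()` and `rq_run_deferred()` is simply discarded by the deferred handler, which mirrors the behaviour the message describes.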


pkt_sched: dev_init_scheduler() does not need to lock qdisc tree. · 16361127

David S. Miller authored Jul 16, 2008

We are registering the device, there is no way anyone can get
at this object's qdiscs yet in any meaningful way.
Signed-off-by: David S. Miller <davem@davemloft.net>


pkt_sched: Schedule qdiscs instead of netdev_queue. · 37437bb2

David S. Miller authored Jul 16, 2008

When we have shared qdiscs, packets come out of the qdiscs
for multiple transmit queues.

Therefore it doesn't make any sense to schedule the transmit
queue when logically we cannot know ahead of time the TX
queue of the SKB that the qdisc->dequeue() will give us.

Just for sanity I added a BUG check to make sure we never
get into a state where the noop_qdisc is scheduled.
Signed-off-by: David S. Miller <davem@davemloft.net>


pkt_sched: Add and use qdisc_root() and qdisc_root_lock(). · 7698b4fc

David S. Miller authored Jul 16, 2008

When code wants to lock the qdisc tree state, the logical
operation it's doing is locking the top-level qdisc that
sits at the root of the netdev_queue.

Add qdisc_root_lock() to represent this and convert the
easiest cases.

In order for this to work out in all cases, we have to
hook up the noop_qdisc to a dummy netdev_queue.
Signed-off-by: David S. Miller <davem@davemloft.net>
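
A minimal sketch of the idea, assuming each qdisc keeps a back-pointer to its queue and the queue points at its top-level qdisc. The `rt_*` types and the plain `int` lock are hypothetical simplifications, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

struct rt_queue;

/* Hypothetical qdisc: only the back-pointer to its queue is modelled. */
struct rt_qdisc {
	struct rt_queue *dev_queue;
};

/* Hypothetical TX queue: points at the root qdisc and owns the lock. */
struct rt_queue {
	struct rt_qdisc *qdisc;  /* top-level qdisc attached to this queue */
	int lock;                /* placeholder for a spinlock */
};

/* Any qdisc in the tree reaches the root through its queue. */
static struct rt_qdisc *rt_root(struct rt_qdisc *q)
{
	assert(q->dev_queue != NULL);
	return q->dev_queue->qdisc;
}

/* "Locking the tree" means taking the lock that belongs to the root. */
static int *rt_root_lock(struct rt_qdisc *q)
{
	return &rt_root(q)->dev_queue->lock;
}
```

The assertion in `rt_root()` shows why a qdisc with no queue would be a problem for this scheme, which is the reason the message gives for attaching noop_qdisc to a dummy netdev_queue.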


pkt_sched: Make QDISC_RUNNING a qdisc state. · e2627c8c

David S. Miller authored Jul 16, 2008

Currently it is associated with a netdev_queue, but when we have
qdisc sharing that no longer makes any sense.
Signed-off-by: David S. Miller <davem@davemloft.net>
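
To illustrate why the flag belongs to the qdisc, here is a small sketch in which the "running" bit lives in the qdisc state, so that two queues sharing one qdisc see the same flag. The `tq_*` names are hypothetical and the bit operations are not atomic, unlike what a kernel implementation would need.

```c
#include <assert.h>
#include <stdbool.h>

#define TQ_STATE_RUNNING (1UL << 0)

/* Hypothetical qdisc carrying its own state word. */
struct tq_qdisc {
	unsigned long state;
};

/* Returns true if the caller became the runner, false if already running. */
static bool tq_run_begin(struct tq_qdisc *q)
{
	if (q->state & TQ_STATE_RUNNING)
		return false;
	q->state |= TQ_STATE_RUNNING;
	return true;
}

static void tq_run_end(struct tq_qdisc *q)
{
	assert(q->state & TQ_STATE_RUNNING);
	q->state &= ~TQ_STATE_RUNNING;
}
```

With the flag on the qdisc, a second queue that tries to run the same shared qdisc backs off until `tq_run_end()` is called.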


pkt_sched: Move gso_skb into Qdisc. · d3b753db

David S. Miller authored Jul 15, 2008

We liberate any dangling gso_skb during qdisc destruction.

It really only matters for the root qdisc.  But when qdiscs
can be shared by multiple netdev_queue objects, we can't
have the gso_skb in the netdev_queue any more.
Signed-off-by: David S. Miller <davem@davemloft.net>


niu: Add TX multiqueue support. · b4c21639
David S. Miller authored Jul 15, 2008
```
Signed-off-by: David S. Miller <davem@davemloft.net>
```
netdev: Kill plain netif_schedule() · 92831bc3
David S. Miller authored Jul 15, 2008
```
No more users.
Signed-off-by: David S. Miller <davem@davemloft.net>
```

netdev: Convert all drivers away from netif_schedule(). · 263ba320

David S. Miller authored Jul 15, 2008

They logically all want to trigger a schedule for all device
TX queues.
Signed-off-by: David S. Miller <davem@davemloft.net>


net: Implement simple sw TX hashing. · 8f0f2223

David S. Miller authored Jul 15, 2008

It just XOR-hashes the IPv4/IPv6 addresses and the transport-layer ports.

The only assumption it makes is that skb_network_header() is set
correctly.

With bug fixes from Eric Dumazet.
Signed-off-by: David S. Miller <davem@davemloft.net>
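
As a rough illustration of this kind of hashing, the sketch below XORs an IPv4 address pair with the port pair and reduces the result to a queue index. The `toy_flow` structure, the folding step, and the modulo reduction are all assumptions made for the example; the actual patch may mix and scale the value differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key; the real code reads these from the packet headers. */
struct toy_flow {
	uint32_t saddr;  /* IPv4 source address */
	uint32_t daddr;  /* IPv4 destination address */
	uint16_t sport;  /* transport source port */
	uint16_t dport;  /* transport destination port */
};

/* XOR addresses and ports together, then map onto [0, num_tx_queues). */
static uint16_t toy_tx_hash(const struct toy_flow *f, uint16_t num_tx_queues)
{
	uint32_t ports = ((uint32_t)f->sport << 16) | f->dport;
	uint32_t hash = f->saddr ^ f->daddr ^ ports;

	assert(num_tx_queues > 0);
	hash ^= hash >> 16;  /* fold the high half into the low half */
	return (uint16_t)(hash % num_tx_queues);
}
```

Because the result depends only on the flow fields, every packet of one flow lands on the same TX queue, which keeps per-flow ordering intact.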


mac80211: Reimplement WME using ->select_queue(). · 51cb6db0

David S. Miller authored Jul 15, 2008

The only behavior change is that we do not drop packets under any
circumstances.  If that is absolutely needed, we could easily add it
back.

With cleanups and help from Johannes Berg.
Signed-off-by: David S. Miller <davem@davemloft.net>


netdev: Add netdev->select_queue() method. · eae792b7

David S. Miller authored Jul 15, 2008

Devices or device layers can set this to control the queue selection
performed by dev_pick_tx().

This function runs under RCU protection, which allows overriding
functions to have some way of synchronizing with things like dynamic
->real_num_tx_queues adjustments.

This makes the spinlock prefetch in dev_queue_xmit() a little bit
less effective, but that's the price right now for correctness.
Signed-off-by: David S. Miller <davem@davemloft.net>
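
The hook can be pictured as an optional function pointer consulted by the queue-picking routine. The sketch below uses hypothetical `sq_*` types, falls back to queue zero when no hook is set (an assumption for this example), and leaves out the RCU protection mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sq_dev;

/* Hypothetical packet: just a priority and the chosen queue index. */
struct sq_pkt {
	uint8_t priority;
	uint16_t queue_mapping;
};

/* Hypothetical device with an optional queue-selection override. */
struct sq_dev {
	uint16_t real_num_tx_queues;
	uint16_t (*select_queue)(struct sq_dev *dev, struct sq_pkt *pkt);
};

/* Pick a TX queue and record the choice in the packet. */
static uint16_t sq_pick_tx(struct sq_dev *dev, struct sq_pkt *pkt)
{
	uint16_t index = 0;

	assert(dev->real_num_tx_queues > 0);
	if (dev->select_queue)
		index = dev->select_queue(dev, pkt);
	pkt->queue_mapping = index;
	return index;
}

/* Example override: spread packets across queues by priority. */
static uint16_t sq_select_by_priority(struct sq_dev *dev, struct sq_pkt *pkt)
{
	return (uint16_t)(pkt->priority % dev->real_num_tx_queues);
}
```

A driver or device layer would install something like `sq_select_by_priority` when it needs its own policy, and leave the pointer unset otherwise.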


netdev: netdev_priv() can now be sane again. · e3c50d5d

David S. Miller authored Jul 15, 2008

The private area of a netdev is now at a fixed offset once more.

Unfortunately, some assumptions that netdev_priv() == netdev->priv
crept back into the tree.  In particular this happened in the
loopback driver.  Make it use netdev->ml_priv.
Signed-off-by: David S. Miller <davem@davemloft.net>


netdev: Kill struct net_device_subqueue and netdev->egress_subqueue* · 6b0fb126
David S. Miller authored Jul 15, 2008
```
No longer used.
Signed-off-by: David S. Miller <davem@davemloft.net>
```

net: Use queue aware tests throughout. · fd2ea0a7

David S. Miller authored Jul 17, 2008

This effectively "flips the switch" by making the core networking
and multiqueue-aware drivers use the new TX multiqueue structures.

Non-multiqueue drivers need no changes.  The interfaces they use such
as netif_stop_queue() degenerate into an operation on TX queue zero.
So everything "just works" for them.

Code that really wants to do "X" to all TX queues now invokes a
routine that does so, such as netif_tx_wake_all_queues(),
netif_tx_stop_all_queues(), etc.

pktgen and netpoll required a little bit more surgery than the others.

In particular the pktgen changes, whilst functional, could be largely
improved.  The initial check in pktgen_xmit() will sometimes check the
wrong queue, which is mostly harmless.  The thing to do is probably to
invoke fill_packet() earlier.

The bulk of the netpoll changes is to make the code operate solely on
the TX queue indicated by the SKB queue mapping.

Setting of the SKB queue mapping is entirely confined inside of
net/core/dev.c:dev_pick_tx().  If we end up needing any kind of
special semantics (drops, for example) it will be implemented here.

Finally, we now have a "real_num_tx_queues" which is where the driver
indicates how many TX queues are actually active.

With IGB changes from Jeff Kirsher.
Signed-off-by: David S. Miller <davem@davemloft.net>
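
The way the single-queue interface degenerates into an operation on TX queue zero can be sketched as follows. The `lq_*` names and the fixed-size queue array are hypothetical; they only model the relationship between the legacy call, the per-queue call, and the all-queues helper.

```c
#include <assert.h>
#include <stdbool.h>

#define LQ_MAX_TXQ 8

/* Hypothetical per-queue state. */
struct lq_txq {
	bool stopped;
};

/* Hypothetical device with a small, fixed array of TX queues. */
struct lq_dev {
	unsigned int num_tx_queues;
	struct lq_txq tx[LQ_MAX_TXQ];
};

/* Queue-aware primitive: stop one specific TX queue. */
static void lq_tx_stop_queue(struct lq_dev *dev, unsigned int index)
{
	assert(index < dev->num_tx_queues);
	dev->tx[index].stopped = true;
}

/* Legacy single-queue entry point: acts on TX queue zero only. */
static void lq_stop_queue(struct lq_dev *dev)
{
	lq_tx_stop_queue(dev, 0);
}

/* For code that really wants to stop every TX queue. */
static void lq_tx_stop_all_queues(struct lq_dev *dev)
{
	for (unsigned int i = 0; i < dev->num_tx_queues; i++)
		lq_tx_stop_queue(dev, i);
}
```

A non-multiqueue driver has exactly one queue, so the legacy call and the all-queues call are equivalent for it; only multiqueue-aware code needs to choose between them.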


mac80211: Temporarily mark QoS support BROKEN. · 24344d26
David S. Miller authored Jul 15, 2008
```
We will undo this after a few changesets.
Signed-off-by: David S. Miller <davem@davemloft.net>
```

pkt_sched: Remove RR scheduler. · 1d8ae3fd

David S. Miller authored Jul 15, 2008

This actually fixes a bug added by the RR scheduler changes.  The
->bands and ->prio2band parameters were being set outside of the
sch_tree_lock() and thus could result in strange behavior and
inconsistencies.

It might be possible, in the new design (where there will be one qdisc
per device TX queue) to allow similar functionality via a TX hash
algorithm for RR but I really see no reason to export this aspect of
how these multiqueue cards actually implement the scheduling of the
individual DMA TX rings and the single physical MAC/PHY port.
Signed-off-by: David S. Miller <davem@davemloft.net>


netdev: Kill NETIF_F_MULTI_QUEUE. · 09e83b5d

David S. Miller authored Jul 17, 2008

There is no need for a feature bit for something that
can be tested by simply checking the TX queue count.
Signed-off-by: David S. Miller <davem@davemloft.net>
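
In other words, the test reduces to a comparison on the queue count. A minimal sketch, with a hypothetical `fq_dev` structure standing in for the net device:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device: only the TX queue count is modelled. */
struct fq_dev {
	unsigned int num_tx_queues;
};

/* A device is multiqueue exactly when it has more than one TX queue. */
static bool fq_is_multiqueue(const struct fq_dev *dev)
{
	assert(dev->num_tx_queues >= 1);
	return dev->num_tx_queues > 1;
}
```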


netdev: Allocate multiple queues for TX. · e8a0464c

David S. Miller authored Jul 17, 2008

alloc_netdev_mq() now allocates an array of netdev_queue
structures for TX, based upon the queue_count argument.

Furthermore, all accesses to the TX queues are now vectored
through the netdev_get_tx_queue() and netdev_for_each_tx_queue()
interfaces.  This makes it easy to grep the tree for all
things that want to get to a TX queue of a net device.

Problem spots which are not really multiqueue aware yet, and
only work with one queue, can easily be spotted by grepping
for all netdev_get_tx_queue() calls that pass in a zero index.
Signed-off-by: David S. Miller <davem@davemloft.net>
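
The shape of this interface (one array allocated up front, with all access going through an indexed getter or an iterator) can be sketched in userspace C. The `mq_*` names are hypothetical stand-ins and do not reproduce the kernel signatures.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical TX queue; it only remembers its own index. */
struct mq_txq {
	unsigned int index;
};

/* Hypothetical device owning an array of TX queues. */
struct mq_dev {
	unsigned int num_tx_queues;
	struct mq_txq *tx;
};

/* Allocate the device together with queue_count TX queues. */
static struct mq_dev *mq_alloc_dev(unsigned int queue_count)
{
	struct mq_dev *dev;

	assert(queue_count > 0);
	dev = calloc(1, sizeof(*dev));
	if (!dev)
		return NULL;
	dev->tx = calloc(queue_count, sizeof(*dev->tx));
	if (!dev->tx) {
		free(dev);
		return NULL;
	}
	dev->num_tx_queues = queue_count;
	for (unsigned int i = 0; i < queue_count; i++)
		dev->tx[i].index = i;
	return dev;
}

/* The single indexed accessor: every direct queue lookup goes through here. */
static struct mq_txq *mq_get_tx_queue(const struct mq_dev *dev,
				      unsigned int index)
{
	assert(index < dev->num_tx_queues);
	return &dev->tx[index];
}

/* Apply fn to every TX queue of the device. */
static void mq_for_each_tx_queue(struct mq_dev *dev,
				 void (*fn)(struct mq_txq *txq, void *arg),
				 void *arg)
{
	for (unsigned int i = 0; i < dev->num_tx_queues; i++)
		fn(&dev->tx[i], arg);
}

/* Example callback: count the queues visited. */
static void mq_count_queue(struct mq_txq *txq, void *arg)
{
	(void)txq;
	(*(unsigned int *)arg)++;
}

static void mq_free_dev(struct mq_dev *dev)
{
	free(dev->tx);
	free(dev);
}
```

Funnelling access through one getter is what makes the grep described above practical: a call such as `mq_get_tx_queue(dev, 0)` marks code that still assumes a single queue.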


igb: Kill CONFIG_NETDEVICES_MULTIQUEUE references, no longer exists. · 070825b3
David S. Miller authored Jul 17, 2008
```
Signed-off-by: David S. Miller <davem@davemloft.net>
```

17 Jul, 2008 16 commits

garp: retry sending JoinIn messages after allocation failures · 51ce7ec9

Patrick McHardy authored Jul 16, 2008

Increase reliability by retrying to send JoinIn messages after memory
allocation failures on each TRANSMIT_PDU event until it succeeds.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


core: add stat to track unresolved discards in neighbor cache · 9a6d276e

Neil Horman authored Jul 16, 2008

In __neigh_event_send, if we have a neighbour entry which is in
NUD_INCOMPLETE state, we enqueue any outbound frames to that neighbour
to the neighbour's arp_queue, which is default capped to a length of 3
skbs.  If that queue exceeds its set length, it will drop an skb on
the queue to enqueue the newly arrived skb.  This results in a drop
for which we have no statistics incremented.  This patch adds an
unresolved_discards stat to /proc/net/stat/ndisc_cache to track these
lost frames.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


mib: add net to NET_ADD_STATS_USER · ed88098e

Pavel Emelyanov authored Jul 16, 2008

Done with NET_XXX_STATS macros :)

To be continued...
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


mib: add net to NET_ADD_STATS_BH · f2bf415c

Pavel Emelyanov authored Jul 16, 2008

This one is tricky. 

The thing is that this macro is only used when killing tw buckets,
but since this killer is promiscuous wrt which net each particular
tw belongs to, I have to use it only when NET_NS is off. When the net
namespaces are on, I use the INET_INC_STATS_BH for each bucket.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


mib: add net to NET_INC_STATS_USER · 6f67c817

Pavel Emelyanov authored Jul 16, 2008

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


mib: add net to NET_INC_STATS_BH · de0744af

Pavel Emelyanov authored Jul 16, 2008

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


mib: add net to NET_INC_STATS · 4e673444

Pavel Emelyanov authored Jul 16, 2008

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


tcp: replace tcp_sock argument with sock in some places · 1ed83465

Pavel Emelyanov authored Jul 16, 2008

These places have a tcp_sock, but we'd prefer the sock itself to
get net from it. Fortunately, tcp_sk macro is just a type cast, so
this replacement is really cheap.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
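
Why the conversion is cheap can be seen in a sketch of the embedding pattern: when the generic socket is the first member of the TCP socket, converting between the two is only a pointer cast. The `ts_*` types are hypothetical and flatten the real nesting into a single level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical network namespace. */
struct ts_net {
	int id;
};

/* Hypothetical generic socket: it knows which net it belongs to. */
struct ts_sock {
	struct ts_net *net;
};

/* Hypothetical TCP socket: the generic socket must be the first member. */
struct ts_tcp_sock {
	struct ts_sock sk;
	unsigned int snd_cwnd;
};

/* Converting a sock to its TCP socket is a plain cast, no lookup needed. */
static struct ts_tcp_sock *ts_tcp_sk(struct ts_sock *sk)
{
	assert(sk != NULL);
	return (struct ts_tcp_sock *)sk;
}

/* The net is reachable from the generic socket alone. */
static struct ts_net *ts_sock_net(const struct ts_sock *sk)
{
	return sk->net;
}
```

A function that takes the generic socket can therefore reach both the net, via `ts_sock_net()`, and the TCP state, via `ts_tcp_sk()`, at no extra cost.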


inet: prepare net on the stack for NET accounting macros · ca12a1a4
Pavel Emelyanov authored Jul 16, 2008
```
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
```

sock: add net to prot->enter_memory_pressure callback · 5c52ba17

Pavel Emelyanov authored Jul 16, 2008

The tcp_enter_memory_pressure calls NET_INC_STATS, but has
nowhere to get the net from.

I decided to add a sk argument, not the net itself, only to factor
all the required sock_net(sk) calls inside the enter_memory_pressure 
callback itself.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


mib: add net to TCP_ADD_STATS_USER · cf1100a7

Pavel Emelyanov authored Jul 16, 2008

Now we're done with the TCP_XXX_STATS macros.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


mib: add net to TCP_DEC_STATS · 74688e48

Pavel Emelyanov authored Jul 16, 2008

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


mib: add net to TCP_INC_STATS_BH · 63231bdd

Pavel Emelyanov authored Jul 16, 2008

Same as before - the sock is always there to get the net from,
but there are also some places with the net already saved on 
the stack.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


mib: add net to TCP_INC_STATS · 81cc8a75

Pavel Emelyanov authored Jul 16, 2008

Fortunately (almost) all the TCP code has a sock to get the net from :)
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


tcp: add net to tcp_mib_init · a9c19329

Pavel Emelyanov authored Jul 16, 2008

This one sets TCP MIBs after zeroing them, and thus requires
the net.

The existing single caller can use init_net (temporarily).
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


mib: drop unused TCP_XXX_STATS macros · f10f8431

Pavel Emelyanov authored Jul 16, 2008

TCP_INC_STATS_USER and TCP_ADD_STATS_BH are currently unused.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
