Commits · 4f948db1915ff05e4ce0fd98e6323db6a3ec0fc0 · nexedi / linux

18 Mar, 2010 8 commits

netfilter: xtables: remove almost-unused xt_match_param.data member · 4f948db1

Jan Engelhardt authored Mar 18, 2010

This member is taking up a "long" per match, yet is only used by one
module out of the roughly 90 modules, ip6t_hbh. ip6t_hbh can be
restructured a little to accommodate the lack of the .data member.
This variant checks the par->match address, which should avoid
having to add two extra functions, including calls, i.e.

(hbh_mt6: call hbhdst_mt6(skb, par, NEXTHDR_OPT),
dst_mt6: call hbhdst_mt6(skb, par, NEXTHDR_DEST))
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

4f948db1

netfilter: update documentation fields of x_tables.h · 16599786
Jan Engelhardt authored Mar 18, 2010
```
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
```
16599786

netfilter: xtables: make use of caller family rather than match family · aa5fa318

Jan Engelhardt authored Mar 18, 2010

The matches can have .family = NFPROTO_UNSPEC, and though that is not
the case for the touched modules, it seems better to just use the
nfproto from the caller.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

aa5fa318

netfilter: xtables: resort osf kconfig text · 115bc8f2

Jan Engelhardt authored Mar 16, 2010

Restore alphabetical ordering of the list and put the xt_osf option
into its 'right' place again.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

115bc8f2

netfilter: xtables: limit xt_mac to ethernet devices · e5042a29

Jan Engelhardt authored Mar 16, 2010

I do not see a point of allowing the MAC module to work with devices
that don't possibly have one, e.g. various tunnel interfaces such as
tun and sit.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

e5042a29

netfilter: xtables: clean up xt_mac match routine · 1d1c397d
Jan Engelhardt authored Mar 16, 2010
```
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
```
1d1c397d

netfilter: xtables: do without explicit XT_ALIGN · 7d5f7ed8

Jan Engelhardt authored Mar 09, 2010

XT_ALIGN is already applied on matchsize/targetsize in x_tables.c,
so it is not strictly needed in the extensions.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

7d5f7ed8
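The reasoning above (alignment is applied once centrally, so applying it again in each extension adds nothing) can be illustrated with a small sketch. The helper names and the alignment of 8 are assumptions made for illustration only; they are not the kernel's XT_ALIGN definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an alignment macro: round size up to a
 * multiple of align, where align must be a power of two. */
static size_t ex_align(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Hypothetical "core" step that aligns whatever size an extension
 * declares. The alignment of 8 is an assumed value. */
static size_t ex_core_matchsize(size_t declared)
{
    return ex_align(declared, 8);
}
```

Because rounding up is idempotent, `ex_core_matchsize(13)` and `ex_core_matchsize(ex_align(13, 8))` give the same result, which is why the explicit call in an extension is redundant.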

Merge branch 'master' of ../nf-2.6 · e8a96f69
Patrick McHardy authored Mar 18, 2010

e8a96f69

17 Mar, 2010 32 commits

netfilter: remove unused headers in net/netfilter/nfnetlink.c · c01ae818

Zhitong Wang authored Mar 17, 2010

Remove unused headers in net/netfilter/nfnetlink.c
Signed-off-by: Zhitong Wang <zhitong.wangzt@alibaba-inc.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

c01ae818

netfilter: xt_recent: check for unsupported user space flags · 606a9a02
Tim Gardner authored Mar 17, 2010
```
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
```
606a9a02

netfilter: xt_recent: add an entry reaper · 0079c5ae

Tim Gardner authored Mar 16, 2010

One of the problems with the way xt_recent is implemented is that
there is no efficient way to remove expired entries. Of course,
one can write a rule '-m recent --remove', but you have to know
beforehand which entry to delete. This commit adds reaper
logic which checks the head of the LRU list when a rule
is invoked that has a '--seconds' value and XT_RECENT_REAP set. If an
entry ceases to accumulate time stamps, then it will eventually bubble
to the top of the LRU list where it is then reaped.
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

0079c5ae
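A minimal sketch of the reaping idea described above, assuming a singly linked list ordered from least to most recently used and a single timestamp per entry. All names here are hypothetical and the real xt_recent data structures differ; the sketch only shows why checking the head of the list is enough.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical entry: only the most recent timestamp is kept. */
struct ex_entry {
    unsigned long last_seen;
    struct ex_entry *next;
};

/* Hypothetical table: head is the least recently used entry. */
struct ex_table {
    struct ex_entry *head;
    struct ex_entry *tail;
};

/* Append a freshly seen entry at the most-recently-used end. */
static void ex_add(struct ex_table *t, unsigned long now)
{
    struct ex_entry *e = malloc(sizeof(*e));
    assert(e != NULL);
    e->last_seen = now;
    e->next = NULL;
    if (t->tail)
        t->tail->next = e;
    else
        t->head = e;
    t->tail = e;
}

/* Look only at the head of the list; remove it if it is older than
 * 'seconds'. Returns 1 if an entry was reaped, 0 otherwise. */
static int ex_reap_head(struct ex_table *t, unsigned long now,
                        unsigned long seconds)
{
    struct ex_entry *e = t->head;

    if (e == NULL || now - e->last_seen <= seconds)
        return 0;
    t->head = e->next;
    if (t->head == NULL)
        t->tail = NULL;
    free(e);
    return 1;
}
```

Each call does a constant amount of work: an entry that stops being refreshed drifts to the head and is removed on a later call, without scanning the whole table.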

netfilter: xt_recent: remove old proc directory · 5be4a4f5

Jan Engelhardt authored Mar 01, 2010

The compat option was introduced in October 2008.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

5be4a4f5

netfilter: xt_recent: update description · 06bf514e

Jan Engelhardt authored Feb 28, 2010

It had IPv6 for quite a while already :-)
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

06bf514e

netfilter: ebt_ip6: add principal maintainer in a MODULE_AUTHOR tag · 8244f4ba
Jan Engelhardt authored Feb 28, 2010
```
Cc: Kuo-Lang Tseng <kuo-lang.tseng@intel.com>
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
```
8244f4ba
netfilter: update my email address · 408ffaa4
Jan Engelhardt authored Feb 28, 2010
```
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
```
408ffaa4
netfilter: xtables: schedule xt_NOTRACK for removal · 0cb47ea2
Jan Engelhardt authored Mar 16, 2010
```
It is being superseded by xt_CT (-j CT --notrack).
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
```
0cb47ea2
netfilter: xtables: merge xt_CONNMARK into xt_connmark · b8f00ba2
Jan Engelhardt authored Feb 26, 2010
```
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
```
b8f00ba2

netfilter: xtables: merge xt_MARK into xt_mark · 28b94988

Jan Engelhardt authored Feb 28, 2009

Two arguments for combining the two:
- xt_mark is pretty useless without xt_MARK
- the actual code is so small anyway that the kmod metadata and the module
  in its loaded state totally outweighs the combined actual code size.

i586-before:
-rw-r--r-- 1 jengelh users 3821 Feb 10 01:01 xt_MARK.ko
-rw-r--r-- 1 jengelh users 2592 Feb 10 00:04 xt_MARK.o
-rw-r--r-- 1 jengelh users 3274 Feb 10 01:01 xt_mark.ko
-rw-r--r-- 1 jengelh users 2108 Feb 10 00:05 xt_mark.o
   text    data     bss     dec     hex filename
    354     264       0     618     26a xt_MARK.o
    223     176       0     399     18f xt_mark.o
And the runtime size is like 14 KB.

i586-after:
-rw-r--r-- 1 jengelh users 3264 Feb 18 17:28 xt_mark.o
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

28b94988

netfilter: xtables: add comment markers to Xtables Kconfig · 44c58731
Jan Engelhardt authored Feb 26, 2010
```
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
```
44c58731
netfilter: xt_NFQUEUE: consolidate v4/v6 targets into one · f76a47c8
Jan Engelhardt authored Jun 05, 2009
```
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
```
f76a47c8
netfilter: xt_CT: par->family is an nfproto · 076f7839
Jan Engelhardt authored Mar 11, 2010
```
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
```
076f7839
e1000e: Fix build with CONFIG_PM disabled. · e50208a0
David S. Miller authored Mar 16, 2010
```
Signed-off-by: David S. Miller <davem@davemloft.net>
```
e50208a0

drivers/net/e100.c: Use pr_<level> and netif_<level> · fa05e1ad

Joe Perches authored Mar 16, 2010

Convert DPRINTK, commonly used for debugging, to netif_<level>
Remove #define PFX
Use #define pr_fmt
Consistently use no periods for non-sentence logging messages
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fa05e1ad

NET: Support clause 45 MDIO commands at the MDIO bus level · abf35df2

Jason Gunthorpe authored Mar 09, 2010

IEEE 802.3ae clause 45 specifies a somewhat modified MDIO protocol
for use by 10GIGE phys. The main change is a 21 bit address split into
a 5 bit device ID and a 16 bit register offset. The definition is designed
so that normal and extended devices can run on the same MDIO bus.

Extend mdio-bitbang to do the new protocol. At the MDIO bus level the
protocol is requested by or'ing MII_ADDR_C45 into the register offset.

Make phy_read/phy_write/etc pass a full 32 bit register offset.

This does not attempt to make the phy layer support C45 style PHYs, just
to provide the MDIO bus support.

Tested against a Broadcom 10GE phy with ID 0x206034, and several
Broadcom 10/100/1000 Phys in normal mode.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

abf35df2
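The addressing scheme described above can be sketched as follows. The flag bit position (bit 30) and the placement of the device ID directly above the 16-bit register offset are assumptions made for this illustration, and the `EX_`/`ex_` names are hypothetical; consult the kernel headers for the actual MII_ADDR_C45 definition.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define EX_MII_ADDR_C45    (1u << 30)  /* marks a clause 45 access */
#define EX_C45_DEVAD_SHIFT 16          /* device ID sits above the offset */

/* Pack a 5-bit device ID and a 16-bit register offset into one
 * 32-bit register number and flag it as clause 45. */
static uint32_t ex_c45_regnum(unsigned devad, unsigned reg)
{
    assert(devad < 32 && reg < 65536);
    return EX_MII_ADDR_C45 | ((uint32_t)devad << EX_C45_DEVAD_SHIFT) | reg;
}

/* A bus driver can test the flag to pick the protocol variant. */
static int ex_is_c45(uint32_t regnum)
{
    return (regnum & EX_MII_ADDR_C45) != 0;
}

/* Recover the two address parts on the bus side. */
static unsigned ex_c45_devad(uint32_t regnum)
{
    return (regnum >> EX_C45_DEVAD_SHIFT) & 0x1f;
}

static unsigned ex_c45_reg(uint32_t regnum)
{
    return regnum & 0xffff;
}
```

A plain register number without the flag falls through to the normal protocol, which is what lets normal and extended devices share one bus.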

e1000e / PCI / PM: Add basic runtime PM support (rev. 4) · 23606cf5

Rafael J. Wysocki authored Mar 14, 2010

Use the PCI runtime power management framework to add basic PCI
runtime PM support to the e1000e driver.  Namely, make the driver
suspend the device when the link is off and set it up for generating
a wakeup event after the link has been detected again.  [This
feature is disabled until the user space enables it with the help of
the /sys/devices/.../power/control device attribute.]

Based on a patch from Matthew Garrett.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

23606cf5

r8169 / PCI / PM: Add simplified runtime PM support (rev. 3) · e1759441

Rafael J. Wysocki authored Mar 14, 2010

Use the PCI runtime power management framework to add basic PCI
runtime PM support to the r8169 driver.  Namely, make the driver
suspend the device when the link is not present and set it up for
generating a wakeup event after the link has been detected again.
[This feature is disabled until the user space enables it with the
help of the /sys/devices/.../power/control device attribute.]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1759441

net: convert multiple drivers to use netdev_for_each_mc_addr, part7 · ff6e2163

Jiri Pirko authored Mar 01, 2010

In mlx4, use char * to store the mc address in the private structure instead.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ff6e2163

drivers/net/ks*: Use netdev_<level>, netif_<level> and pr_<level> · 0dc7d2b3

Joe Perches authored Feb 27, 2010

I'm not sure this is correct.

It changes logging macros from:
	dev_<level>(&ks->spidev->dev,
to
	netdev_<level>(ks->netdev,

Comments?

Use netdev_<level>
Use netif_<level>
Use pr_<level>
Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
Add missing line to message in ks8851_remove
Change kmalloc/memset(,0) to kzalloc
Remove ks_<level> macros
Consolidate code into set_media_state
Signed-off-by: David S. Miller <davem@davemloft.net>

0dc7d2b3

tipc: Allow retransmission of cloned buffers · ca509101

Neil Horman authored Mar 15, 2010

Forward port commit
fc477e160af086f6e30c3d4fdf5f5c000d29beb5
from git://tipc.cslab.ericsson.net/pub/git/people/allan/tipc.git

Original commit message:

Allow retransmission of cloned buffers

This patch fixes an issue with TIPC's message retransmission logic
that prevented retransmission of cloned sk_buffs.  Originally intended
as a means of avoiding wasted work in retransmitting messages that
were still on the driver's outbound queue, it also prevented TIPC
from retransmitting messages through other means -- such as the
secondary bearer of the broadcast link, or another interface in a
set of bonded interfaces.  This fix removes existing checks for
cloned sk_buffs that prevented such retransmission.
Originally-Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ca509101

tipc: Increase frequency of load distribution over broadcast link · 1a624832

Neil Horman authored Mar 15, 2010

Forward port commit 29eb572941501c40ac6e62dbc5043bf9ee76ee56
from git://tipc.cslab.ericsson.net/pub/git/people/allan/tipc.git

Original commit message:
Increase frequency of load distribution over broadcast link

This patch enhances the behavior of TIPC's broadcast link so that it
alternates between redundant bearers (if available) after every
message sent, rather than after every 10 messages.  This change helps
to speed up delivery of retransmitted messages by ensuring that
they are not sent repeatedly over a bearer that is no longer working,
but not yet recognized as failed.

Tested by myself in the latest net-2.6 tree using the tipc sanity test suite
Originally-signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

bcast.c |   35 ++++++++++++++---------------------
1 file changed, 14 insertions(+), 21 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>

1a624832

net: core: add IFLA_STATS64 support · 10708f37

Jan Engelhardt authored Mar 11, 2010

`ip -s link` shows interface counters truncated to 32 bit. This is
because interface statistics are transported only in 32-bit quantity
to userspace. This commit adds a new IFLA_STATS64 attribute that
exports them in full 64 bit.

References: http://lkml.indiana.edu/hypermail/linux/kernel/0307.3/0215.html
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

10708f37
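To make the truncation problem concrete, here is a small arithmetic sketch. The function names are hypothetical and nothing here reflects the actual netlink attribute layout; it only shows what happens when a 64-bit counter is carried in a 32-bit field.

```c
#include <assert.h>
#include <stdint.h>

/* Value a 32-bit statistics field would carry for a 64-bit counter:
 * the upper 32 bits are silently dropped. */
static uint32_t ex_truncate32(uint64_t counter)
{
    return (uint32_t)counter;
}

/* Whole seconds until a 32-bit byte counter wraps at a steady rate. */
static uint64_t ex_seconds_until_wrap32(uint64_t bytes_per_sec)
{
    assert(bytes_per_sec > 0);
    return (UINT64_C(1) << 32) / bytes_per_sec;
}
```

At 10 Gbit/s, which is 1 250 000 000 bytes per second, a 32-bit byte counter wraps after roughly three seconds, so a userspace tool reading it cannot tell how many wraps occurred between two samples.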

net: tcp: make veno selectable as default congestion module · 6ce1a6df
Jan Engelhardt authored Mar 11, 2010
```
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
```
6ce1a6df
net: tcp: make hybla selectable as default congestion module · dd2acaa7
Jan Engelhardt authored Mar 11, 2010
```
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
```
dd2acaa7

net: remove rcu locking from fib_rules_event() · 2fb3573d

Eric Dumazet authored Mar 09, 2010

We hold RTNL at this point and don't use RCU variants of list traversals,
so we don't need rcu_read_lock()/rcu_read_unlock().
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2fb3573d

bridge: per-cpu packet statistics (v3) · 14bb4789

stephen hemminger authored Mar 02, 2010

The shared packet statistics are a potential source of slow down
on bridged traffic. Convert to per-cpu array, but only keep those
statistics which change per-packet.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

14bb4789
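The per-cpu approach described above can be sketched as below, assuming a fixed number of CPUs and plain arrays. The names and the CPU count are hypothetical; the kernel uses its own per-cpu allocation primitives rather than a static array.

```c
#include <assert.h>
#include <stdint.h>

#define EX_NR_CPUS 4   /* assumed CPU count for the sketch */

/* Only counters that change on every packet are kept per CPU. */
struct ex_cpu_stats {
    uint64_t rx_packets;
    uint64_t rx_bytes;
};

static struct ex_cpu_stats ex_stats[EX_NR_CPUS];

/* Fast path: each CPU touches only its own slot, so no cache line
 * is shared between CPUs for the per-packet update. */
static void ex_count_rx(unsigned cpu, unsigned len)
{
    assert(cpu < EX_NR_CPUS);
    ex_stats[cpu].rx_packets++;
    ex_stats[cpu].rx_bytes += len;
}

/* Slow path: sum all slots when statistics are read. */
static struct ex_cpu_stats ex_sum_stats(void)
{
    struct ex_cpu_stats sum = { 0, 0 };

    for (unsigned cpu = 0; cpu < EX_NR_CPUS; cpu++) {
        sum.rx_packets += ex_stats[cpu].rx_packets;
        sum.rx_bytes += ex_stats[cpu].rx_bytes;
    }
    return sum;
}
```

The cost moves from the per-packet path, which is hot, to the read path, which runs only when someone asks for the totals.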

rps: Receive Packet Steering · 0a9627f2

Tom Herbert authored Mar 16, 2010

This patch implements software receive side packet steering (RPS).  RPS
distributes the load of received packet processing across multiple CPUs.

Problem statement: Protocol processing done in the NAPI context for received
packets is serialized per device queue and becomes a bottleneck under high
packet load.  This substantially limits pps that can be achieved on a single
queue NIC and provides no scaling with multiple cores.

This solution queues packets early on in the receive path on the backlog queues
of other CPUs.   This allows protocol processing (e.g. IP and TCP) to be
performed on packets in parallel.   For each device (or each receive queue in
a multi-queue device) a mask of CPUs is set to indicate the CPUs that can
process packets. A CPU is selected on a per packet basis by hashing contents
of the packet header (e.g. the TCP or UDP 4-tuple) and using the result to index
into the CPU mask.  The IPI mechanism is used to raise networking receive
softirqs between CPUs.  This effectively emulates in software what a multi-queue
NIC can provide, but is generic, requiring no device support.

Many devices now provide a hash over the 4-tuple on a per packet basis
(e.g. the Toeplitz hash).  This patch allows drivers to set the HW reported hash
in an skb field, and that value in turn is used to index into the RPS maps.
Using the HW generated hash can avoid cache misses on the packet when
steering it to a remote CPU.

The CPU mask is set on a per device and per queue basis in the sysfs variable
/sys/class/net/<device>/queues/rx-<n>/rps_cpus.  This is a set of canonical
bit maps for receive queues in the device (numbered by <n>).  If a device
does not support multi-queue, a single variable is used for the device (rx-0).

Generally, we have found this technique increases pps capabilities of a single
queue device with good CPU utilization.  Optimal settings for the CPU mask
seem to depend on architectures and cache hierarchy.  Below are some results
running 500 instances of netperf TCP_RR test with 1 byte req. and resp.
Results show cumulative transaction rate and system CPU utilization.

e1000e on 8 core Intel
   Without RPS: 108K tps at 33% CPU
   With RPS:    311K tps at 64% CPU

forcedeth on 16 core AMD
   Without RPS: 156K tps at 15% CPU
   With RPS:    404K tps at 49% CPU

bnx2x on 16 core AMD
   Without RPS: 567K tps at 61% CPU (4 HW RX queues)
   Without RPS: 738K tps at 96% CPU (8 HW RX queues)
   With RPS:    854K tps at 76% CPU (4 HW RX queues)

Caveats:
- The benefits of this patch are dependent on architecture and cache hierarchy.
Tuning the masks to get best performance is probably necessary.
- This patch adds overhead in the path for processing a single packet.  In
a lightly loaded server this overhead may eliminate the advantages of
increased parallelism, and possibly cause some relative performance degradation.
We have found that masks that are cache aware (share same caches with
the interrupting CPU) mitigate much of this.
- The RPS masks can be changed dynamically, however whenever the mask is changed
this introduces the possibility of generating out of order packets.  It's
probably best not to change the masks too frequently.
Signed-off-by: Tom Herbert <therbert@google.com>

 include/linux/netdevice.h |   32 ++++-
 include/linux/skbuff.h    |    3 +
 net/core/dev.c            |  335 +++++++++++++++++++++++++++++++++++++--------
 net/core/net-sysfs.c      |  225 ++++++++++++++++++++++++++++++-
 net/core/skbuff.c         |    2 +
 5 files changed, 538 insertions(+), 59 deletions(-)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0a9627f2
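The CPU selection step described above (hash the flow tuple, then use the result to index into the set of CPUs enabled in the mask) can be sketched as follows. This is a sketch under stated assumptions: the hash is FNV-1a used as a stand-in rather than whatever hash the patch uses, the mask is limited to 32 CPUs, and all `ex_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow key: the TCP or UDP 4-tuple. */
struct ex_flow {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* FNV-1a over the tuple bytes; a stand-in hash for illustration. */
static uint32_t ex_flow_hash(const struct ex_flow *f)
{
    uint32_t words[3] = {
        f->saddr, f->daddr, ((uint32_t)f->sport << 16) | f->dport
    };
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < 3; i++)
        for (int b = 0; b < 4; b++) {
            h ^= (words[i] >> (8 * b)) & 0xff;
            h *= 16777619u;
        }
    return h;
}

/* Expand a CPU bit mask into a list of CPU ids; returns the count. */
static size_t ex_mask_to_map(uint32_t mask, unsigned cpus[32])
{
    size_t n = 0;

    for (unsigned cpu = 0; cpu < 32; cpu++)
        if (mask & (1u << cpu))
            cpus[n++] = cpu;
    return n;
}

/* Pick the target CPU for a flow: scale the 32-bit hash into [0, n)
 * and use it as an index into the enabled CPUs. */
static unsigned ex_select_cpu(const struct ex_flow *f, uint32_t mask)
{
    unsigned cpus[32];
    size_t n = ex_mask_to_map(mask, cpus);

    assert(n > 0);
    return cpus[((uint64_t)ex_flow_hash(f) * n) >> 32];
}
```

Because the choice depends only on the tuple and the mask, all packets of one flow land on the same CPU as long as the mask is unchanged, which is also why changing the mask can reorder packets in flight.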

RDS: Enable per-cpu workqueue threads · 768bbedf

Tina Yang authored Mar 11, 2010

Create per-cpu workqueue threads instead of a single
krdsd thread. This is a step towards better scalability.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

768bbedf

RDS: Do not call set_page_dirty() with irqs off · 561c7df6

Andy Grover authored Mar 11, 2010

set_page_dirty() unconditionally re-enables interrupts, so
if we call it with irqs off, they will be on after the call,
and that's bad. This patch moves the call after we've re-enabled
interrupts in send_drop_to(), so it's safe.

Also, add BUG_ONs to let us know if we ever do call set_page_dirty
with interrupts off.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

561c7df6

RDS: Properly unmap when getting a remote access error · 450d06c0

Sherman Pun authored Mar 11, 2010

If the RDMA op has aborted with a remote access error,
in addition to what we already do (tell userspace it has
completed with an error) also unmap it and put() the rm.

Otherwise, hangs may occur on arches that track maps and
will not exit without proper cleanup.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

450d06c0

RDS: only put sockets that have seen congestion on the poll_waitq · b98ba52f

Andy Grover authored Mar 11, 2010

rds_poll_waitq's listeners will be awoken if we receive a congestion
notification. Bad performance may result because *all* polled sockets
contend for this single lock. However, it should not be necessary to
wake pollers when a congestion update arrives if they have never
experienced congestion, and not putting these on the waitq will
hopefully greatly reduce contention.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b98ba52f