Commits · d70308f78bb8192a76a7dc38f5f9de6c2695532b · Kirill Smelkov / linux

23 Dec, 2011 5 commits

netfilter: nat: remove module reference counting from NAT protocols · d70308f7

Patrick McHardy authored Dec 23, 2011

The only remaining user of NAT protocol module reference counting is NAT
ctnetlink support. Since this is a fairly short sequence of code, convert
over to use RCU and remove module reference counting.

Module unregistration is already protected by RCU using synchronize_rcu(),
so no further changes are necessary.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

d70308f7

netfilter: nf_nat: add missing nla_policy entry for CTA_NAT_PROTO attribute · 329fb58a
Patrick McHardy authored Dec 23, 2011
```
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
```
329fb58a

netfilter: nf_nat: use hash random for bysource hash · 4d4e61c6

Patrick McHardy authored Dec 23, 2011

Use nf_conntrack_hash_rnd in NAT bysource hash to avoid hash chain attacks.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

4d4e61c6

netfilter: nf_nat: export NAT definitions to userspace · cbc9f2f4

Patrick McHardy authored Dec 23, 2011

Export the NAT definitions to userspace. So far userspace (specifically,
iptables) has been copying the headers files from include/net. Also
rename some structures and definitions in preparation for IPv6 NAT.
Since these have never been officially exported, this doesn't affect
existing userspace code.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

cbc9f2f4

netfilter: rework user-space expectation helper support · 3d058d7b

Pablo Neira Ayuso authored Dec 18, 2011

This partially reworks bc01befd
which added userspace expectation support.

This patch removes the nf_ct_userspace_expect_list since now we
force to use the new iptables CT target feature to add the helper
extension for conntracks that have attached expectations from
userspace.

A new version of the proof-of-concept code to implement userspace
helpers from userspace is available at:

http://people.netfilter.org/pablo/userspace-conntrack-helpers/nf-ftp-helper-POC.tar.bz2

This patch also modifies the CT target to allow to set the
conntrack's userspace helper status flags. This flag is used
to tell the conntrack system to explicitly allocate the helper
extension.

This helper extension is useful to link the userspace expectations
with the master conntrack that is being tracked from one userspace
helper.

This feature fixes a problem in the current approach of the
userspace helper support. Basically, if the master conntrack that
has got a userspace expectation vanishes, the expectations point to
one invalid memory address. Thus, triggering an oops in the
expectation deletion event path.

I decided not to add a new revision of the CT target because
I only needed to add a new flag for it. I'll document in this
issue in the iptables manpage. I have also changed the return
value from EINVAL to EOPNOTSUPP if one flag not supported is
specified. Thus, in the future adding new features that only
require a new flag can be added without a new revision.

There is no official code using this in userspace (apart from
the proof-of-concept) that uses this infrastructure but there
will be some by beginning 2012.
Reported-by: Sam Roberts <vieuxtech@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

3d058d7b

18 Dec, 2011 3 commits

netfilter: ctnetlink: support individual atomic-get-and-reset of counters · c4042a33

Pablo Neira Ayuso authored Dec 14, 2011

This allows to use the get operation to atomically get-and-reset
counters.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

c4042a33

netfilter: ctnetlink: use expect instead of master tuple in get operation · 35dba1d7

Pablo Neira Ayuso authored Dec 14, 2011

Use the expect tuple (if possible) instead of the master tuple for
the get operation. If two or more expectations come from the same
master, the returned expectation may not be the one that user-space
is requesting.

This is how it works for the expect deletion operation.

Although I think that nobody has been seriously using this. We
accept both possibilities, using the expect tuple if possible.
I decided to do it like this to avoid breaking backward
compatibility.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

35dba1d7

netfilter: nf_conntrack: use atomic64 for accounting counters · b3e0bfa7

Eric Dumazet authored Dec 14, 2011

We can use atomic64_t infrastructure to avoid taking a spinlock in fast
path, and remove inaccuracies while reading values in
ctnetlink_dump_counters() and connbytes_mt() on 32bit arches.

Suggested by Pablo.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

b3e0bfa7

13 Dec, 2011 2 commits

IPVS: Modify the SH scheduler to use weights · 76ad94fc

Michael Maxim authored Dec 08, 2011

Modify the algorithm to build the source hashing hash table to add
extra slots for destinations with higher weight. This has the effect
of allowing an IPVS SH user to give more connections to hosts that
have been configured to have a higher weight.

The reason for the Kconfig change is because the size of the hash table
becomes more relevant/important if you decide to use the weights in the
manner this patch lets you. It would be conceivable that someone might
need to increase the size of that table to accommodate their
configuration, so it will be handy to be able to do that through the
regular configuration system instead of editing the source.
Signed-off-by: Michael Maxim <mike@okcupid.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

76ad94fc

netfilter: add ipv6 reverse path filter match · e26f9a48

Florian Westphal authored Aug 19, 2011

This is not merged with the ipv4 match into xt_rpfilter.c
to avoid ipv6 module dependency issues.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

e26f9a48

04 Dec, 2011 11 commits

ipv6: add ip6_route_lookup · ea6e574e

Florian Westphal authored Sep 05, 2011

like rt6_lookup, but allows caller to pass in flowi6 structure.
Will be used by the upcoming ipv6 netfilter reverse path filter
match.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ea6e574e

netfilter: add ipv4 reverse path filter match · 8f97339d

Florian Westphal authored Jul 04, 2011

This tries to do the same thing as fib_validate_source(), but differs
in several aspects.

The most important difference is that the reverse path filter built into
fib_validate_source uses the oif as iif when performing the reverse
lookup.  We do not do this, as the oif is not yet known by the time the
PREROUTING hook is invoked.

We can't wait until FORWARD chain because by the time FORWARD is invoked
ipv4 forward path may have already sent icmp messages is response
to to-be-discarded-via-rpfilter packets.

To avoid the such an additional lookup in PREROUTING, Patrick McHardy
suggested to attach the path information directly in the match
(i.e., just do what the standard ipv4 path does a bit earlier in PREROUTING).

This works, but it also has a few caveats. Most importantly, when using
marks in PREROUTING to re-route traffic based on the nfmark, -m rpfilter
would have to be used after the nfmark has been set; otherwise the nfmark
would have no effect (because the route is already attached).

Another problem would be interaction with -j TPROXY, as this target sets an
nfmark and uses ACCEPT instead of continue, i.e. such a version of
-m rpfilter cannot be used for the initial to-be-intercepted packets.

In case in turns out that the oif is required, we can add Patricks
suggestion with a new match option (e.g. --rpf-use-oif) to keep ruleset
compatibility.

Another difference to current builtin ipv4 rpfilter is that packets subject to ipsec
transformation are not automatically excluded. If you want this, simply
combine -m rpfilter with the policy match.

Packets arriving on loopback interfaces always match.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

8f97339d

net: ipv4: export fib_lookup and fib_table_lookup · 6fc01438

Florian Westphal authored Aug 25, 2011

The reverse path filter module will use fib_lookup.

If CONFIG_IP_MULTIPLE_TABLES is not set, fib_lookup is
only a static inline helper that calls fib_table_lookup,
so export that too.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

6fc01438

tcp: tcp_sendmsg() page recycling · 761965ea

Eric Dumazet authored Dec 04, 2011

If our TCP_PAGE(sk) is not shared (page_count() == 1), we can set page
offset to 0.

This permits better filling of the pages on small to medium tcp writes.

"tbench 16" results on my dev server (2x4x2 machine) :

Before : 3072 MB/s
After  : 3146 MB/s  (2.4 % gain)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

761965ea

tcp: take care of misalignments · 117632e6

Eric Dumazet authored Dec 03, 2011

We discovered that TCP stack could retransmit misaligned skbs if a
malicious peer acknowledged sub MSS frame. This currently can happen
only if output interface is non SG enabled : If SG is enabled, tcp
builds headless skbs (all payload is included in fragments), so the tcp
trimming process only removes parts of skb fragments, header stay
aligned.

Some arches cant handle misalignments, so force a head reallocation and
shrink headroom to MAX_TCP_HEADER.

Dont care about misaligments on x86 and PPC (or other arches setting
NET_IP_ALIGN to 0)

This patch introduces __pskb_copy() which can specify the headroom of
new head, and pskb_copy() becomes a wrapper on top of __pskb_copy()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

117632e6

sfc: Use kcalloc instead of kzalloc to allocate array · c2e4e25a

Thomas Meyer authored Dec 02, 2011

The advantage of kcalloc is, that will prevent integer overflows which could
result from the multiplication of number of elements and size and it is also
a bit nicer to read.

The semantic patch that makes this change is available
in https://lkml.org/lkml/2011/11/25/107Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

c2e4e25a

ll_temac: Use kcalloc instead of kzalloc to allocate array · ddf98e6d

Thomas Meyer authored Dec 02, 2011

The advantage of kcalloc is, that will prevent integer overflows which could
result from the multiplication of number of elements and size and it is also
a bit nicer to read.

The semantic patch that makes this change is available
in https://lkml.org/lkml/2011/11/25/107Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

ddf98e6d

enic: Use kcalloc instead of kzalloc to allocate array · a1de2219

Thomas Meyer authored Nov 29, 2011

The advantage of kcalloc is, that will prevent integer overflows which could
result from the multiplication of number of elements and size and it is also
a bit nicer to read.

The semantic patch that makes this change is available
in https://lkml.org/lkml/2011/11/25/107Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

a1de2219

bnx2x: Use kcalloc instead of kzalloc to allocate array · 01e23742

Thomas Meyer authored Nov 29, 2011

The advantage of kcalloc is, that will prevent integer overflows which could
result from the multiplication of number of elements and size and it is also
a bit nicer to read.

The semantic patch that makes this change is available
in https://lkml.org/lkml/2011/11/25/107Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

01e23742

tcp: drop SYN+FIN messages · fdf5af0d

Eric Dumazet authored Dec 02, 2011

Denys Fedoryshchenko reported that SYN+FIN attacks were bringing his
linux machines to their limits.

Dont call conn_request() if the TCP flags includes SYN flag
Reported-by: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fdf5af0d

Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch · 78a8a36f
David S. Miller authored Dec 03, 2011

78a8a36f

03 Dec, 2011 9 commits

ipv6: Kill ndisc_get_neigh() inline helper. · 04a6f441

David S. Miller authored Dec 03, 2011

It's only used in net/ipv6/route.c and the NULL device check is
superfluous for all of the existing call sites.

Just expand the __ndisc_lookup_errno() call at each location.
Signed-off-by: David S. Miller <davem@davemloft.net>

04a6f441

ipv6: Various cleanups in route.c · 38308473

David S. Miller authored Dec 03, 2011

1) x == NULL --> !x
2) x != NULL --> x
3) (x&BIT) --> (x & BIT)
4) (BIT1|BIT2) --> (BIT1 | BIT2)
5) proper argument and struct member alignment
Signed-off-by: David S. Miller <davem@davemloft.net>

38308473

ipv6: Various cleanups in ip6_route.c · 507c9b1e

David S. Miller authored Dec 03, 2011

1) x == NULL --> !x
2) x != NULL --> x
3) if() --> if ()
4) while() --> while ()
5) (x & BIT) == 0 --> !(x & BIT)
6) (x&BIT) --> (x & BIT)
7) x=y --> x = y
8) (BIT1|BIT2) --> (BIT1 | BIT2)
9) if ((x & BIT)) --> if (x & BIT)
10) proper argument and struct member alignment
Signed-off-by: David S. Miller <davem@davemloft.net>

507c9b1e

net: Add Open vSwitch kernel components. · ccb1352e

Jesse Gross authored Oct 25, 2011

Open vSwitch is a multilayer Ethernet switch targeted at virtualized
environments.  In addition to supporting a variety of features
expected in a traditional hardware switch, it enables fine-grained
programmatic extension and flow-based control of the network.
This control is useful in a wide variety of applications but is
particularly important in multi-server virtualization deployments,
which are often characterized by highly dynamic endpoints and the need
to maintain logical abstractions for multiple tenants.

The Open vSwitch datapath provides an in-kernel fast path for packet
forwarding.  It is complemented by a userspace daemon, ovs-vswitchd,
which is able to accept configuration from a variety of sources and
translate it into packet processing rules.

See http://openvswitch.org for more information and userspace
utilities.
Signed-off-by: Jesse Gross <jesse@nicira.com>

ccb1352e

ipv6: Add fragment reporting to ipv6_skip_exthdr(). · 75f2811c

Jesse Gross authored Nov 30, 2011

While parsing through IPv6 extension headers, fragment headers are
skipped making them invisible to the caller.  This reports the
fragment offset of the last header in order to make it possible to
determine whether the packet is fragmented and, if so whether it is
a first or last fragment.
Signed-off-by: Jesse Gross <jesse@nicira.com>

75f2811c

vlan: Move vlan_set_encap_proto() to vlan header file · 396cf943

Pravin B Shelar authored Nov 18, 2011

Open vSwitch needs this function for vlan handling.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

396cf943

genetlink: Add rcu_dereference_genl and genl_dereference. · b4e16611

Jesse Gross authored Nov 19, 2011

This adds rcu_dereference_genl and genl_dereference, which are genl
variants of the RTNL functions to enforce proper locking with lockdep
and sparse.
Signed-off-by: Jesse Gross <jesse@nicira.com>

b4e16611

genetlink: Add lockdep_genl_is_held(). · 86b1309c

Pravin B Shelar authored Nov 10, 2011

Open vSwitch uses genl_mutex locking to protect datapath
data-structures like flow-table, flow-actions. Following patch adds
lockdep_genl_is_held() which is used for rcu annotation to prove
locking.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

86b1309c

genetlink: Add genl_notify() · 263ba61d

Pravin B Shelar authored Nov 10, 2011

Open vSwitch uses Generic Netlink interface for communication
between userspace and kernel module. genl_notify() is used
for sending notification back to userspace.

genl_notify() is analogous to rtnl_notify() but uses genl_sock
instead of rtnl.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

263ba61d

02 Dec, 2011 10 commits

atm: clip: Remove code commented out since eternity. · 340e8dc1
David S. Miller authored Dec 02, 2011
```
Signed-off-by: David S. Miller <davem@davemloft.net>
```
340e8dc1
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · b3613118
David S. Miller authored Dec 02, 2011

b3613118

forcedeath: Fix bql support for forcedeath · 7505afe2

Igor Maravic authored Dec 01, 2011

Moved netdev_completed_queue() out of while loop in function nv_tx_done_optimized().
Because this function was in while loop,
BUG_ON(count > dql->num_queued - dql->num_completed)
was hit in dql_completed().
Signed-off-by: Igor Maravic <igorm@etf.rs>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7505afe2

niu: Fix typo in comment. · 1d214fa3

David S. Miller authored Dec 02, 2011

Reported-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1d214fa3

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 5983fe2b

Linus Torvalds authored Dec 01, 2011

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (73 commits)
  netfilter: Remove ADVANCED dependency from NF_CONNTRACK_NETBIOS_NS
  ipv4: flush route cache after change accept_local
  sch_red: fix red_change
  Revert "udp: remove redundant variable"
  bridge: master device stuck in no-carrier state forever when in user-stp mode
  ipv4: Perform peer validation on cached route lookup.
  net/core: fix rollback handler in register_netdevice_notifier
  sch_red: fix red_calc_qavg_from_idle_time
  bonding: only use primary address for ARP
  ipv4: fix lockdep splat in rt_cache_seq_show
  sch_teql: fix lockdep splat
  net: fec: Select the FEC driver by default for i.MX SoCs
  isdn: avoid copying too long drvid
  isdn: make sure strings are null terminated
  netlabel: Fix build problems when IPv6 is not enabled
  sctp: better integer overflow check in sctp_auth_create_key()
  sctp: integer overflow in sctp_auth_create_key()
  ipv6: Set mcast_hops to IPV6_DEFAULT_MCASTHOPS when -1 was given.
  net: Fix corruption in /proc/*/net/dev_mcast
  mac80211: fix race between the AGG SM and the Tx data path
  ...

5983fe2b

netfilter: Remove ADVANCED dependency from NF_CONNTRACK_NETBIOS_NS · 3ced1be5
David S. Miller authored Dec 01, 2011
```
firewalld in Fedora 16 needs this.
Signed-off-by: David S. Miller <davem@davemloft.net>
```
3ced1be5
niu: Add support for byte queue limits. · efa230f2
David S. Miller authored Dec 01, 2011
```
Signed-off-by: David S. Miller <davem@davemloft.net>
```
efa230f2

niu: Remove redundant PHY ID test. · 63ec3c83

David S. Miller authored Dec 01, 2011

Reported-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

63ec3c83

ipv4: flush route cache after change accept_local · d01ff0a0

Peter Pan(潘卫平) authored Dec 01, 2011

After reset ipv4_devconf->data[IPV4_DEVCONF_ACCEPT_LOCAL] to 0,
we should flush route cache, or it will continue receive packets with local
source address, which should be dropped.
Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d01ff0a0

sch_red: fix red_change · 1ee5fa1e

Eric Dumazet authored Dec 01, 2011

Le mercredi 30 novembre 2011 à 14:36 -0800, Stephen Hemminger a écrit :

> (Almost) nobody uses RED because they can't figure it out.
> According to Wikipedia, VJ says that:
>  "there are not one, but two bugs in classic RED."

RED is useful for high throughput routers, I doubt many linux machines
act as such devices.

I was considering adding Adaptative RED (Sally Floyd, Ramakrishna
Gummadi, Scott Shender), August 2001

In this version, maxp is dynamic (from 1% to 50%), and user only have to
setup min_th (target average queue size)
(max_th and wq (burst in linux RED) are automatically setup)

By the way it seems we have a small bug in red_change()

if (skb_queue_empty(&sch->q))
	red_end_of_idle_period(&q->parms);

First, if queue is empty, we should call
red_start_of_idle_period(&q->parms);

Second, since we dont use anymore sch->q, but q->qdisc, the test is
meaningless.

Oh well...

[PATCH] sch_red: fix red_change()

Now RED is classful, we must check q->qdisc->q.qlen, and if queue is empty,
we start an idle period, not end it.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1ee5fa1e