Commits · 55a3d4e15c7c953ecc55b96b83d2679abf8a7899 · Kirill Smelkov / linux

15 Mar, 2011 7 commits

ipvs: properly zero stats and rates · 55a3d4e1

Julian Anastasov authored Mar 14, 2011

 	Currently, the new percpu counters are not zeroed and
the zero commands do not work as expected, we still show the old
sum of percpu values. OTOH, we can not reset the percpu counters
from user context without causing the incrementing to use old
and bogus values.

 	So, as Eric Dumazet suggested fix that by moving all overhead
to stats reading in user context. Do not introduce overhead in
timer context (estimator) and incrementing (packet handling in
softirqs).

 	The new ustats0 field holds the zero point for all
counter values, the rates always use 0 as base value as before.
When showing the values to user space just give the difference
between counters and the base values. The only drawback is that
percpu stats are not zeroed, they are accessible only from /proc
and are new interface, so it should not be a compatibility problem
as long as the sum stats are correct after zeroing.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

55a3d4e1

ipvs: reorganize tot_stats · 2a0751af

Julian Anastasov authored Mar 04, 2011

 	The global tot_stats contains cpustats field just like the
stats for dest and svc, so better use it to simplify the usage
in estimation_timer. As tot_stats is registered as estimator
we can remove the special ip_vs_read_cpu_stats call for
tot_stats. Fix ip_vs_read_cpu_stats to be called under
stats lock because it is still used as synchronization between
estimation timer and user context (the stats readers).

 	Also, make sure ip_vs_stats_percpu_show reads properly
the u64 stats from user context.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

2a0751af

ipvs: move struct netns_ipvs · 2553d064

Julian Anastasov authored Mar 04, 2011

 	Remove include/net/netns/ip_vs.h because it depends on
structures from include/net/ip_vs.h. As ipvs is pointer in
struct net it is better to move struct netns_ipvs into
include/net/ip_vs.h, so that we can easily use other structures
in struct netns_ipvs.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

2553d064

IPVS: Fix variable assignment in ip_vs_notrack · 06b69390

Jesper Juhl authored Mar 09, 2011

There's no sense to 'ct = ct = ' in ip_vs_notrack(). Just assign
nf_ct_get()'s return value directly to the pointer variable 'ct' once.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Simon Horman <horms@verge.net.au>

06b69390

netfilter:ipvs: use kmemdup · 6060c74a

Shan Wei authored Mar 07, 2011

The semantic patch that makes this output is available
in scripts/coccinelle/api/memdup.cocci.

More information about semantic patching is available at
http://coccinelle.lip6.fr/Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

6060c74a

ipvs: remove _bh from percpu stats reading · 4a569c0c

Julian Anastasov authored Mar 04, 2011

 	ip_vs_read_cpu_stats is called only from timer, so
no need for _bh locks.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

4a569c0c

ipvs: avoid lookup for fwmark 0 · 097fc76a

Julian Anastasov authored Mar 04, 2011

 	Restore the previous behaviour to lookup for fwmark
service only when fwmark is non-null. This saves only CPU.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

097fc76a

14 Mar, 2011 2 commits

netfilter: nf_conntrack: fix sysctl memory leak · fe8f661f

Stephen Hemminger authored Mar 14, 2011

Message in log because sysctl table was not empty at netns exit
 WARNING: at net/sysctl_net.c:84 sysctl_net_exit+0x2a/0x2c()

Instrumenting showed that the nf_conntrack_timestamp was the entry
that was being created but not cleared.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

fe8f661f

netfilter: x_tables: return -ENOENT for non-existant matches/targets · 42046e2e

Patrick McHardy authored Mar 14, 2011

As Stephen correctly points out, we need to return -ENOENT in
xt_find_match()/xt_find_target() after the patch "netfilter: x_tables:
misuse of try_then_request_module" in order to properly indicate
a non-existant module to the caller.
Signed-off-by: Patrick McHardy <kaber@trash.net>

42046e2e

09 Mar, 2011 1 commit

netfilter: x_tables: misuse of try_then_request_module · adb00ae2

Stephen Hemminger authored Mar 09, 2011

Since xt_find_match() returns ERR_PTR(xx) on error not NULL,
the macro try_then_request_module won't work correctly here.
The macro expects its first argument will be zero if condition
fails. But ERR_PTR(-ENOENT) is not zero.

The correct solution is to propagate the error value
back.

Found by inspection, and compile tested only.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

adb00ae2

08 Mar, 2011 1 commit

netfilter: ipset: fix the compile warning in ip_set_create · 9846ada1

Shan Wei authored Mar 08, 2011

net/netfilter/ipset/ip_set_core.c:615: warning: ‘clash’ may be used uninitialized in this function
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>

9846ada1

28 Feb, 2011 1 commit

netfilter: nf_ct_tcp: fix out of sync scenario while in SYN_RECV · 8a80c79a

Pablo Neira Ayuso authored Feb 28, 2011

This patch fixes the out of sync scenarios while in SYN_RECV state.

Quoting Jozsef, what it happens if we are out of sync if the
following:

> > b. conntrack entry is outdated, new SYN received
> >    - (b1) we ignore it but save the initialization data from it
> >    - (b2) when the reply SYN/ACK receives and it matches the saved data,
> >      we pick up the new connection
This is what it should happen if we are in SYN_RECV state. Initially,
the SYN packet hits b1, thus we save data from it. But the SYN/ACK
packet is considered a retransmission given that we're in SYN_RECV
state. Therefore, we never hit b2 and we don't get in sync. To fix
this, we ignore SYN/ACK if we are in SYN_RECV. If the previous packet
was a SYN, then we enter the ignore case that get us in sync.

This patch helps a lot to conntrackd in stress scenarios (assumming a
client that generates lots of small TCP connections). During the failover,
consider that the new primary has injected one outdated flow in SYN_RECV
state (this is likely to happen if the conntrack event rate is high
because the backup will be a bit delayed from the primary). With the
current code, if the client starts a new fresh connection that matches
the tuple, the SYN packet will be ignored without updating the state
tracking, and the SYN+ACK in reply will blocked as it will not pass
checkings III or IV (since all state tracking in the original direction
is not initialized because of the SYN packet was ignored and the ignore
case that get us in sync is not applied).

I posted a couple of patches before this one. Changli Gao spotted
a simpler way to fix this problem. This patch implements his idea.

Cc: Changli Gao <xiaosuo@gmail.com>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>

8a80c79a

25 Feb, 2011 1 commit

ipvs: unify the formula to estimate the overhead of processing connections · b552f7e3

Changli Gao authored Feb 19, 2011

lc and wlc use the same formula, but lblc and lblcr use another one. There
is no reason for using two different formulas for the lc variants.

The formula used by lc is used by all the lc variants in this patch.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Wensong Zhang <wensong@linux-vs.org>
Signed-off-by: Simon Horman <horms@verge.net.au>

b552f7e3

24 Feb, 2011 1 commit

ipvs: use enum to instead of magic numbers · 17a8f8e3

Changli Gao authored Feb 24, 2011

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

17a8f8e3

22 Feb, 2011 1 commit

ipvs: use hlist instead of list · 731109e7

Changli Gao authored Feb 19, 2011

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

731109e7

16 Feb, 2011 1 commit

ipvs: make "no destination available" message more informative · 41ac51ee

Patrick Schaaf authored Feb 11, 2011

When IP_VS schedulers do not find a destination, they output a terse
"WLC: no destination available" message through kernel syslog, which I
can not only make sense of because syslog puts them in a logfile
together with keepalived checker results.

This patch makes the output a bit more informative, by telling you which
virtual service failed to find a destination.

Example output:

kernel: [1539214.552233] IPVS: wlc: TCP 192.168.8.30:22 - no destination available
kernel: [1539299.674418] IPVS: wlc: FWM 22 0x00000016 - no destination available

I have tested the code for IPv4 and FWM services, as you can see from
the example; I do not have an IPv6 setup to test the third code path
with.

To avoid code duplication, I put a new function ip_vs_scheduler_err()
into ip_vs_sched.c, and use that from the schedulers instead of calling
IP_VS_ERR_RL directly.
Signed-off-by: Patrick Schaaf <netdev@bof.de>
Signed-off-by: Simon Horman <horms@verge.net.au>

41ac51ee

15 Feb, 2011 3 commits

ipvs: remove extra lookups for ICMP packets · 6cb90db5

Julian Anastasov authored Feb 09, 2011

 	Remove code that should not be called anymore.
Now when ip_vs_out handles replies for local clients at
LOCAL_IN hook we do not need to call conn_out_get and
handle_response_icmp from ip_vs_in_icmp* because such
lookups were already performed for the ICMP packet and no
connection was found.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

6cb90db5

ipvs: fix timer in get_curr_sync_buff · 16a7fd32

Tinggong Wang authored Feb 09, 2011

 	Fix get_curr_sync_buff to keep buffer for 2 seconds
as intended, not just for the current jiffie. By this way
we will sync more connection structures with single packet.
Signed-off-by: Tinggong Wang <wangtinggong@gmail.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

16a7fd32

netfilter: nfnetlink_log: remove unused parameter · 8248779b
Florian Westphal authored Feb 15, 2011
```
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
```
8248779b

14 Feb, 2011 3 commits

netfilter: xt_conntrack: warn about use in raw table · a2361c87

Jan Engelhardt authored Feb 14, 2011

nfct happens to run after the raw table only.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>

a2361c87

Revert "netfilter: xt_connlimit: connlimit-above early loop termination" · 20b7975e

Stefan Berger authored Feb 14, 2011

This reverts commit 44bd4de9.

I have to revert the early loop termination in connlimit since it generates
problems when an iptables statement does not use -m state --state NEW before
the connlimit match extension.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

20b7975e

bridge: netfilter: fix information leak · d846f711

Vasiliy Kulikov authored Feb 14, 2011

Struct tmp is copied from userspace.  It is not checked whether the "name"
field is NULL terminated.  This may lead to buffer overflow and passing
contents of kernel stack as a module name to try_then_request_module() and,
consequently, to modprobe commandline.  It would be seen by all userspace
processes.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

d846f711

11 Feb, 2011 1 commit

netfilter: xt_connlimit: connlimit-above early loop termination · 44bd4de9

Stefan Berger authored Feb 11, 2011

The patch below introduces an early termination of the loop that is
counting matches. It terminates once the counter has exceeded the
threshold provided by the user. There's no point in continuing the loop
afterwards and looking at other entries.

It plays together with the following code further below:

return (connections > info->limit) ^ info->inverse;

where connections is the result of the counted connection, which in turn
is the matches variable in the loop. So once

        -> matches = info->limit + 1
alias   -> matches > info->limit
alias   -> matches > threshold

we can terminate the loop.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

44bd4de9

10 Feb, 2011 1 commit

netfilter: ipset: add dependency on CONFIG_NETFILTER_NETLINK · c16e19c1

Patrick McHardy authored Feb 10, 2011

When SYSCTL and PROC_FS and NETFILTER_NETLINK are not enabled:

net/built-in.o: In function `try_to_load_type':
ip_set_core.c:(.text+0x3ab49): undefined reference to `nfnl_unlock'
ip_set_core.c:(.text+0x3ab4e): undefined reference to `nfnl_lock'
...
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

c16e19c1

07 Feb, 2011 1 commit

IPVS: precedence bug in ip_vs_sync_switch_mode() · 7c9989a7

Dan Carpenter authored Feb 07, 2011

'!' has higher precedence than '&'.  IP_VS_STATE_MASTER is 0x1 so
the original code is equivelent to if (!ipvs->sync_state) ...
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

7c9989a7

03 Feb, 2011 1 commit

IPVS: Use correct lock in SCTP module · 8525d6f8

Simon Horman authored Feb 03, 2011

Use sctp_app_lock instead of tcp_app_lock in the SCTP protocol module.

This appears to be a typo introduced by the netns changes.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>

8525d6f8

02 Feb, 2011 4 commits

netfilter: xtables: add device group match · 9291747f

Patrick McHardy authored Feb 03, 2011

Add a new 'devgroup' match to match on the device group of the
incoming and outgoing network device of a packet.
Signed-off-by: Patrick McHardy <kaber@trash.net>

9291747f

netfilter: ipset: send error message manually · 5f52bc3c

Jozsef Kadlecsik authored Feb 02, 2011

When a message carries multiple commands and one of them triggers
an error, we have to report to the userspace which one was that.
The line number of the command plays this role and there's an attribute
reserved in the header part of the message to be filled out with the error
line number. In order not to modify the original message received from
the userspace, we construct a new, complete netlink error message and
modifies the attribute there, then send it.
Netlink is notified not to send its ACK/error message.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu
Signed-off-by: Patrick McHardy <kaber@trash.net>

5f52bc3c

netfilter: ipset: fix linking with CONFIG_IPV6=n · 724bab47

Patrick McHardy authored Feb 02, 2011

Add a dummy ip_set_get_ip6_port function that unconditionally
returns false for CONFIG_IPV6=n and convert the real function
to ipv6_skip_exthdr() to avoid pulling in the ip6_tables module
when loading ipset.
Signed-off-by: Patrick McHardy <kaber@trash.net>

724bab47

netfilter: ipset: add missing break statemtns in ip_set_get_ip_port() · 316ed388

Patrick McHardy authored Feb 02, 2011

Don't fall through in the switch statement, otherwise IPv4 headers
are incorrectly parsed again as IPv6 and the return value will always
be 'false'.
Signed-off-by: Patrick McHardy <kaber@trash.net>

316ed388

01 Feb, 2011 10 commits

netfilter: ipset: install ipset related header files · e3e241b2
Patrick McHardy authored Feb 01, 2011
```
Signed-off-by: Patrick McHardy <kaber@trash.net>
```
e3e241b2

IPVS: Remove ip_vs_sync_cleanup from section __exit · ed3d1e7b

Simon Horman authored Feb 01, 2011

ip_vs_sync_cleanup() may be called from ip_vs_init() on error
and thus needs to be accesible from section __init
Reporte-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Tested-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

ed3d1e7b

IPVS: Allow compilation with CONFIG_SYSCTL disabled · 0443929f

Simon Horman authored Feb 01, 2011

This is a rather naieve approach to allowing PVS to compile with
CONFIG_SYSCTL disabled.  I am working on a more comprehensive patch which
will remove compilation of all sysctl-related IPVS code when CONFIG_SYSCTL
is disabled.
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Tested-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

0443929f

IPVS: Remove unused variables · a1367647

Simon Horman authored Feb 01, 2011

These variables are unused as a result of the recent netns work.
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Tested-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

a1367647

IPVS: remove duplicate initialisation or rs_table · 258e958b

Simon Horman authored Feb 01, 2011

Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Tested-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

258e958b

IPVS: use z modifier for sizeof() argument · a870c8c5

Simon Horman authored Feb 01, 2011

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Tested-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

a870c8c5

netfilter: ctnetlink: fix ctnetlink_parse_tuple() warning · a00f1f36

Patrick McHardy authored Feb 01, 2011

net/netfilter/nf_conntrack_netlink.c: In function 'ctnetlink_parse_tuple':
net/netfilter/nf_conntrack_netlink.c:832:11: warning: comparison between 'enum ctattr_tuple' and 'enum ctattr_type'

Use ctattr_type for the 'type' parameter since that's the type of all attributes
passed to this function.
Signed-off-by: Patrick McHardy <kaber@trash.net>

a00f1f36

netfilter: ipset: remove unnecessary includes · 582e1fc8

Patrick McHardy authored Feb 01, 2011

None of the set types need uaccess.h since this is handled centrally
in ip_set_core. Most set types additionally don't need bitops.h and
spinlock.h since they use neither. tcp.h is only needed by those
using before(), udp.h is not needed at all.
Signed-off-by: Patrick McHardy <kaber@trash.net>

582e1fc8

netfilter: ipset: use nla_parse_nested() · 8da560ce

Patrick McHardy authored Feb 01, 2011

Replace calls of the form:

nla_parse(tb, ATTR_MAX, nla_data(attr), nla_len(attr), policy)

by:

nla_parse_nested(tb, ATTR_MAX, attr, policy)
Signed-off-by: Patrick McHardy <kaber@trash.net>

8da560ce

netfilter: xtables: "set" match and "SET" target support · d956798d

Jozsef Kadlecsik authored Feb 01, 2011

The patch adds the combined module of the "SET" target and "set" match
to netfilter. Both the previous and the current revisions are supported.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>

d956798d