Commits · 42eab94fff18cb1091d3501cd284d6bd6cc9c143 · nexedi / linux

15 Mar, 2011 29 commits

netfilter: arp_tables: fix infoleak to userspace · 42eab94f

Vasiliy Kulikov authored Mar 15, 2011

Structures ipt_replace, compat_ipt_replace, and xt_get_revision are
copied from userspace.  Fields of these structs that are supposed to be
zero-terminated strings are not checked for termination.  When they are
used as an argument to a format string containing "%s" in
request_module(), some sensitive kernel memory may be leaked to
userspace via the arguments of the spawned modprobe process.

The first bug was introduced before the git epoch;  the second was
introduced by 6b7d31fc (v2.6.15-rc1);  the third was introduced by
6b7d31fc (v2.6.15-rc1).  Triggering the bug requires
CAP_NET_ADMIN.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

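The commit message does not show the fix itself. As a minimal userspace sketch of the usual remedy for this class of bug, the code below forces a NUL terminator into a fixed-size name copied from userspace before the name reaches a "%s" format. The struct `user_revision`, its 32-byte `name` field and the `"arpt_%s"` prefix are illustrative assumptions, not the kernel's actual definitions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a structure copied from userspace. */
struct user_revision {
	char name[32];           /* meant to be NUL-terminated, but not guaranteed */
	unsigned char revision;
};

/* Force termination so that "%s" can never read past the array. */
static void terminate_name(struct user_revision *rev)
{
	rev->name[sizeof(rev->name) - 1] = '\0';
}

/* Build a module name the way a request_module("arpt_%s", ...) call might. */
static int build_module_name(char *buf, size_t len, struct user_revision *rev)
{
	terminate_name(rev);
	return snprintf(buf, len, "arpt_%s", rev->name);
}
```

Without the forced terminator, snprintf() would keep reading past `name` until it happened to hit a zero byte, which is the kind of over-read described above.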

netfilter: xt_connlimit: remove connlimit_rnd_inited · 4656c4d6

Changli Gao authored Mar 15, 2011

A potential race condition when generating connlimit_rnd is also fixed.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

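The log does not show how the race was closed. One common pattern for race-free lazy initialization of a shared random seed is sketched below with C11 atomics: any caller may draw a candidate, but a compare-and-exchange from 0 lets only the first one publish it, and 0 is reserved to mean "not generated yet". The names are hypothetical, and `rand()` merely stands in for a real entropy source such as the kernel's get_random_bytes().

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Shared hash seed; 0 means "not generated yet". */
static _Atomic uint32_t connlimit_seed;

/* Stand-in entropy source (NOT cryptographically secure). */
static uint32_t random_u32(void)
{
	return (uint32_t)rand();
}

/*
 * Lazily generate a non-zero seed. Concurrent callers may each draw a
 * candidate, but the compare-and-exchange from 0 lets only the first one
 * publish its value; every caller then reads back the same seed.
 */
static uint32_t get_connlimit_seed(void)
{
	if (atomic_load(&connlimit_seed) == 0) {
		uint32_t expected = 0;
		uint32_t candidate;

		do {
			candidate = random_u32();
		} while (candidate == 0);   /* 0 is reserved for "unset" */

		atomic_compare_exchange_strong(&connlimit_seed, &expected, candidate);
	}
	return atomic_load(&connlimit_seed);
}
```

Calling get_connlimit_seed() repeatedly returns the same non-zero value, regardless of which caller won the race.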

netfilter: xt_connlimit: use hlist instead · 3e0d5149

Changli Gao authored Mar 15, 2011

The head of an hlist is smaller than that of a list (one pointer instead of two).
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

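To make the size argument concrete, the sketch below copies the layouts of the kernel's `struct list_head`, `struct hlist_node` and `struct hlist_head` into userspace and compares what an array of bucket heads costs. The helper names and the table size used in the example are arbitrary.

```c
#include <stddef.h>

/* Userspace copies of the two kernel list layouts. */
struct list_head {
	struct list_head *next, *prev;   /* doubly linked, circular */
};

struct hlist_node {
	struct hlist_node *next, **pprev;
};

struct hlist_head {
	struct hlist_node *first;        /* a single pointer per bucket */
};

/* Memory spent on bucket heads alone for a table of n buckets. */
static size_t list_table_bytes(size_t n)
{
	return n * sizeof(struct list_head);
}

static size_t hlist_table_bytes(size_t n)
{
	return n * sizeof(struct hlist_head);
}
```

For a 256-bucket table, list_table_bytes(256) is twice hlist_table_bytes(256): on a 64-bit build, 4096 versus 2048 bytes.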

netfilter: xt_connlimit: use kmalloc() instead of kzalloc() · 0e23ca14

Changli Gao authored Mar 15, 2011

All the members are initialized explicitly after allocation, so the zeroing done by kzalloc() is unnecessary.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: xt_connlimit: fix daddr connlimit in SNAT scenario · 8183e3a8

Changli Gao authored Mar 15, 2011

We use the reply tuples when limiting connections by destination
address. However, in the SNAT scenario the final reply tuples are not
ready until SNAT is done in the POSTROUTING or INPUT chain, so the
following nf_conntrack_find_get() in count_them() finds nothing and
connlimit does not work as expected.

In this patch, the original tuples are always used, and an additional
member, addr, is appended to save the address of either end.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

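A simplified userspace sketch of the idea: each tracked entry keeps the original-direction tuple and separately records the address being limited, taken from that same original tuple, so the count no longer depends on NAT having finalized the reply tuple. The types, the IPv4-only tuple and the helper names are hypothetical; the real xt_connlimit code is not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified IPv4 tuple as seen in one direction. */
struct tuple {
	uint32_t src, dst;
	uint16_t sport, dport;
};

/* Hypothetical tracked entry: always the ORIGINAL tuple, plus the
 * address being limited (source or destination of that tuple). */
struct conn_entry {
	struct tuple orig;
	uint32_t addr;
};

/* Choose the limited end from the original tuple only, so the result
 * does not depend on NAT having rewritten the reply direction. */
static struct conn_entry make_entry(const struct tuple *orig, bool limit_by_daddr)
{
	struct conn_entry e = { .orig = *orig };

	e.addr = limit_by_daddr ? orig->dst : orig->src;
	return e;
}

/* Count entries whose saved address falls in the same masked prefix. */
static size_t count_matching(const struct conn_entry *v, size_t n,
			     uint32_t addr, uint32_t mask)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++)
		if ((v[i].addr & mask) == (addr & mask))
			hits++;
	return hits;
}
```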

IPVS: Conditionally include sysctl members of struct netns_ipvs · f2247fbd

Simon Horman authored Feb 04, 2011

There is now no need to include sysctl members of struct netns_ipvs
unless CONFIG_SYSCTL is defined.
Signed-off-by: Simon Horman <horms@verge.net.au>

IPVS: Add __ip_vs_control_{init,cleanup}_sysctl() · 14e40546

Simon Horman authored Feb 04, 2011

Break out the portions of __ip_vs_control_init() and
__ip_vs_control_cleanup() which aren't necessary when
CONFIG_SYSCTL is undefined.
Signed-off-by: Simon Horman <horms@verge.net.au>

IPVS: Conditionally define and use ip_vs_lblc{r}_table · fb1de432

Simon Horman authored Feb 04, 2011

ip_vs_lblc_table and ip_vs_lblcr_table, and code that uses them
are unnecessary when CONFIG_SYSCTL is undefined.
Signed-off-by: Simon Horman <horms@verge.net.au>

IPVS: Minimise ip_vs_leave when CONFIG_SYSCTL is undefined · a7a86b86

Simon Horman authored Feb 04, 2011

Much of ip_vs_leave() is unnecessary if CONFIG_SYSCTL is undefined.

I tried an approach of breaking the now #ifdef'ed portions out
into a separate function. However this appeared to grow the
compiled code on x86_64 by about 200 bytes in the case where
CONFIG_SYSCTL is defined. So I have gone with the simpler though
less elegant #ifdef'ed solution for now.
Signed-off-by: Simon Horman <horms@verge.net.au>

IPVS: Conditional ip_vs_conntrack_enabled() · a4e2f5a7

Simon Horman authored Feb 04, 2011

ip_vs_conntrack_enabled() becomes a noop when CONFIG_SYSCTL is undefined.

In preparation for not including sysctl_conntrack in
struct netns_ipvs when CONFIG_SYSCTL is not defined.
Signed-off-by: Simon Horman <horms@verge.net.au>

IPVS: ip_vs_todrop() becomes a noop when CONFIG_SYSCTL is undefined · 3a1bbf18

Simon Horman authored Feb 04, 2011

Signed-off-by: Simon Horman <horms@verge.net.au>

IPVS: Conditionally use sysctl_lblc{r}_expiration · b27d777e

Simon Horman authored Feb 04, 2011

In preparation for not including sysctl_lblc{r}_expiration in
struct netns_ipvs when CONFIG_SYSCTL is not defined.
Signed-off-by: Simon Horman <horms@verge.net.au>

IPVS: Add expire_quiescent_template() · 8e1b0b1b

Simon Horman authored Feb 04, 2011

In preparation for not including sysctl_expire_quiescent_template in
struct netns_ipvs when CONFIG_SYSCTL is not defined.
Signed-off-by: Simon Horman <horms@verge.net.au>

IPVS: Add sysctl_expire_nodest_conn() · 71a8ab6c

Simon Horman authored Feb 04, 2011

In preparation for not including sysctl_expire_nodest_conn in
struct netns_ipvs when CONFIG_SYSCTL is not defined.
Signed-off-by: Simon Horman <horms@verge.net.au>

IPVS: Add sysctl_sync_ver() · 7532e8d4

Simon Horman authored Feb 04, 2011

In preparation for not including sysctl_sync_ver in
struct netns_ipvs when CONFIG_SYSCTL is not defined.
Signed-off-by: Simon Horman <horms@verge.net.au>

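The accessor commits in this series (sysctl_sync_ver(), sysctl_nat_icmp_send() and the others) share one pattern, which the hypothetical userspace sketch below illustrates: callers always go through an inline accessor, and when CONFIG_SYSCTL is not defined the accessor returns a compile-time default instead of reading a field that no longer exists in struct netns_ipvs. The default value of 1 and the stripped-down struct (including `other_state`) are assumptions for illustration.

```c
/* Hypothetical default used when the sysctl knob is compiled out. */
#define DEFAULT_SYNC_VER 1

struct netns_ipvs {
#ifdef CONFIG_SYSCTL
	int sysctl_sync_ver;    /* present only when sysctl support is built */
#endif
	int other_state;        /* placeholder for the remaining members */
};

#ifdef CONFIG_SYSCTL
static inline int sysctl_sync_ver(const struct netns_ipvs *ipvs)
{
	return ipvs->sysctl_sync_ver;
}
#else
static inline int sysctl_sync_ver(const struct netns_ipvs *ipvs)
{
	(void)ipvs;             /* the field does not exist in this build */
	return DEFAULT_SYNC_VER;
}
#endif
```

Compiled without -DCONFIG_SYSCTL, sysctl_sync_ver() always returns DEFAULT_SYNC_VER; compiled with it, it returns the per-namespace field. Either way the callers need no #ifdef of their own.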

IPVS: Add {sysctl_sync_threshold,period}() · 59e0350e

Simon Horman authored Feb 04, 2011

In preparation for not including sysctl_sync_threshold in
struct netns_ipvs when CONFIG_SYSCTL is not defined.
Signed-off-by: Simon Horman <horms@verge.net.au>

IPVS: Add sysctl_nat_icmp_send() · 0cfa558e

Simon Horman authored Feb 04, 2011

In preparation for not including sysctl_nat_icmp_send in
struct netns_ipvs when CONFIG_SYSCTL is not defined.
Signed-off-by: Simon Horman <horms@verge.net.au>

IPVS: Add sysctl_snat_reroute() · 84b3cee3

Simon Horman authored Feb 04, 2011

In preparation for not including sysctl_snat_reroute in
struct netns_ipvs when CONFIG_SYSCTL is not defined.
Signed-off-by: Simon Horman <horms@verge.net.au>

IPVS: Add ip_vs_route_me_harder() · ba4fd7e9

Simon Horman authored Feb 04, 2011

Add ip_vs_route_me_harder() to avoid repeating the same code twice.
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: rename estimator functions · 6ef757f9

Julian Anastasov authored Mar 14, 2011

Rename ip_vs_new_estimator to ip_vs_start_estimator
and ip_vs_kill_estimator to ip_vs_stop_estimator to better
match their logic.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: optimize rates reading · ea9f22cc

Julian Anastasov authored Mar 14, 2011

Move the estimator reading from estimation_timer to user
context. ip_vs_read_estimator() will be used to decode the rate
values. As the decoded rates are no longer set by the estimation
timer, there is no need to reset them in ip_vs_zero_stats.

There is no need for ip_vs_new_estimator() to encode stats
to rates: if the destination is in the trash, both the stats and
the rates are inactive.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

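As an illustration of decoding rates at read time in user context, the sketch below assumes the estimator stores rates as fixed-point values (shifted left by 10 bits for connection and packet rates, by 5 bits for byte rates) and decodes them with rounding when they are read. The shift amounts and the helper name are assumptions of this sketch, not taken from the commit.

```c
#include <stdint.h>

/* Assumed fixed-point scales: a rate is stored as value << shift. */
#define PKT_RATE_SHIFT  10
#define BYTE_RATE_SHIFT 5

/*
 * Decode a scaled rate. Adding (half - 1) before shifting rounds
 * fractions strictly above one half upward. shift must be at least 1.
 */
static uint32_t decode_rate(uint32_t scaled, unsigned int shift)
{
	uint32_t half_minus_one = (1u << (shift - 1)) - 1;

	return (scaled + half_minus_one) >> shift;
}
```

Doing this decoding only when a reader asks for the values keeps the per-tick work in the timer limited to updating the scaled estimates.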

ipvs: remove unused seqcount stats · 87d68a15

Julian Anastasov authored Mar 14, 2011

Remove ustats_seq, IPVS_STAT_INC and IPVS_STAT_ADD
because they are not used. They were replaced with u64_stats.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: properly zero stats and rates · 55a3d4e1

Julian Anastasov authored Mar 14, 2011

Currently, the new percpu counters are not zeroed and
the zero commands do not work as expected: we still show the old
sum of percpu values. OTOH, we cannot reset the percpu counters
from user context without causing the incrementing code to use old
and bogus values.

So, as Eric Dumazet suggested, fix this by moving all the overhead
to stats reading in user context, without introducing overhead in
timer context (the estimator) or in incrementing (packet handling
in softirqs).

The new ustats0 field holds the zero point for all
counter values; the rates always use 0 as the base value, as before.
When showing the values to user space, just give the difference
between the counters and the base values. The only drawback is that
the percpu stats are not zeroed, but they are accessible only from
/proc and are a new interface, so this should not be a compatibility
problem as long as the summed stats are correct after zeroing.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

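The zero-point scheme can be sketched in userspace as follows: the fast path only ever increments, a "zero" command copies the current totals into a snapshot (playing the role of ustats0), and readers report the difference. Per-CPU summation and locking are omitted, and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Simplified counter set; the real structure has more fields. */
struct counters {
	uint64_t conns;
	uint64_t inpkts;
};

struct stats {
	struct counters live;   /* only ever incremented by the fast path */
	struct counters zero;   /* snapshot taken at the last "zero" command */
};

/* Fast path: increment, never reset. */
static void stats_inc(struct stats *s, uint64_t pkts)
{
	s->live.conns++;
	s->live.inpkts += pkts;
}

/* "Zero" command: remember the current values instead of clearing them. */
static void stats_zero(struct stats *s)
{
	s->zero = s->live;
}

/* Reader: report the difference from the saved zero point. */
static struct counters stats_read(const struct stats *s)
{
	struct counters out = {
		.conns  = s->live.conns  - s->zero.conns,
		.inpkts = s->live.inpkts - s->zero.inpkts,
	};
	return out;
}
```

After two increments (10 and 5 packets), a zero, and one more increment (7 packets), the reader reports 1 connection and 7 packets, while the live counters still hold the full totals of 3 connections and 22 packets.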

ipvs: reorganize tot_stats · 2a0751af

Julian Anastasov authored Mar 04, 2011

The global tot_stats contains a cpustats field just like the
stats for dest and svc, so it is better to use it to simplify the
usage in estimation_timer. As tot_stats is registered as an
estimator, we can remove the special ip_vs_read_cpu_stats call for
tot_stats. Fix ip_vs_read_cpu_stats to be called under the
stats lock, because it is still used as synchronization between
the estimation timer and user context (the stats readers).

Also, make sure ip_vs_stats_percpu_show reads the u64 stats
properly from user context.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: move struct netns_ipvs · 2553d064

Julian Anastasov authored Mar 04, 2011

Remove include/net/netns/ip_vs.h because it depends on
structures from include/net/ip_vs.h. As ipvs is a pointer in
struct net, it is better to move struct netns_ipvs into
include/net/ip_vs.h, so that we can easily use other structures
in struct netns_ipvs.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

IPVS: Fix variable assignment in ip_vs_notrack · 06b69390

Jesper Juhl authored Mar 09, 2011

There's no sense in 'ct = ct = ' in ip_vs_notrack(). Just assign
nf_ct_get()'s return value directly to the pointer variable 'ct' once.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Simon Horman <horms@verge.net.au>

netfilter:ipvs: use kmemdup · 6060c74a

Shan Wei authored Mar 07, 2011

The semantic patch that makes this output is available
in scripts/coccinelle/api/memdup.cocci.

More information about semantic patching is available at
http://coccinelle.lip6.fr/
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: remove _bh from percpu stats reading · 4a569c0c

Julian Anastasov authored Mar 04, 2011

ip_vs_read_cpu_stats is called only from the timer, so
there is no need for _bh locks.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: avoid lookup for fwmark 0 · 097fc76a

Julian Anastasov authored Mar 04, 2011

Restore the previous behaviour of looking up a fwmark
service only when fwmark is non-zero. This only saves CPU cycles.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

14 Mar, 2011 2 commits

netfilter: nf_conntrack: fix sysctl memory leak · fe8f661f

Stephen Hemminger authored Mar 14, 2011

A warning is logged because the sysctl table is not empty at netns exit:
 WARNING: at net/sysctl_net.c:84 sysctl_net_exit+0x2a/0x2c()

Instrumenting showed that the nf_conntrack_timestamp was the entry
that was being created but not cleared.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: x_tables: return -ENOENT for non-existent matches/targets · 42046e2e

Patrick McHardy authored Mar 14, 2011

As Stephen correctly points out, we need to return -ENOENT in
xt_find_match()/xt_find_target() after the patch "netfilter: x_tables:
misuse of try_then_request_module" in order to properly indicate
a non-existent module to the caller.
Signed-off-by: Patrick McHardy <kaber@trash.net>

09 Mar, 2011 1 commit

netfilter: x_tables: misuse of try_then_request_module · adb00ae2

Stephen Hemminger authored Mar 09, 2011

Since xt_find_match() returns ERR_PTR(xx) on error, not NULL,
the macro try_then_request_module won't work correctly here.
The macro expects its first argument to be zero if the condition
fails, but ERR_PTR(-ENOENT) is not zero.

The correct solution is to propagate the error value
back.

Found by inspection, and compile tested only.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

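The pitfall is that an error pointer is not NULL, so a "use it if non-NULL, else load the module and retry" macro never retries. The userspace sketch below re-implements the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() helpers (with MAX_ERRNO of 4095) and checks the error explicitly; the match registry, the module-loading stub and the choice to retry only on -ENOENT are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Userspace re-implementations of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct match { const char *name; };

/* Hypothetical registry: "tcp" only becomes available after "loading". */
static struct match tcp_match = { "tcp" };
static bool tcp_module_loaded;

static struct match *find_match(const char *name)
{
	if (tcp_module_loaded && strcmp(name, "tcp") == 0)
		return &tcp_match;
	return ERR_PTR(-ENOENT);        /* never NULL on failure */
}

static void request_module_stub(const char *name)
{
	if (strcmp(name, "tcp") == 0)
		tcp_module_loaded = true;
}

/*
 * A macro of the form "x ? x : (load, retry)" would never retry here,
 * because ERR_PTR(-ENOENT) is a non-NULL pointer. Test the error instead.
 */
static struct match *find_match_or_load(const char *name)
{
	struct match *m = find_match(name);

	if (IS_ERR(m) && PTR_ERR(m) == -ENOENT) {
		request_module_stub(name);
		m = find_match(name);
	}
	return m;
}
```

A lookup for an unknown name still ends in ERR_PTR(-ENOENT), which is what lets the caller report a non-existent module.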

08 Mar, 2011 1 commit

netfilter: ipset: fix the compile warning in ip_set_create · 9846ada1

Shan Wei authored Mar 08, 2011

net/netfilter/ipset/ip_set_core.c:615: warning: ‘clash’ may be used uninitialized in this function
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>

28 Feb, 2011 1 commit

netfilter: nf_ct_tcp: fix out of sync scenario while in SYN_RECV · 8a80c79a

Pablo Neira Ayuso authored Feb 28, 2011

This patch fixes the out of sync scenarios while in SYN_RECV state.

Quoting Jozsef, what happens if we are out of sync is the
following:

> > b. conntrack entry is outdated, new SYN received
> >    - (b1) we ignore it but save the initialization data from it
> >    - (b2) when the reply SYN/ACK receives and it matches the saved data,
> >      we pick up the new connection
This is what should happen if we are in SYN_RECV state. Initially,
the SYN packet hits b1, thus we save data from it. But the SYN/ACK
packet is considered a retransmission given that we're in SYN_RECV
state. Therefore, we never hit b2 and we don't get in sync. To fix
this, we ignore SYN/ACK if we are in SYN_RECV. If the previous packet
was a SYN, then we enter the ignore case that gets us in sync.

This patch helps conntrackd a lot in stress scenarios (assuming a
client that generates lots of small TCP connections). During the failover,
consider that the new primary has injected one outdated flow in SYN_RECV
state (this is likely to happen if the conntrack event rate is high
because the backup will be a bit delayed from the primary). With the
current code, if the client starts a new fresh connection that matches
the tuple, the SYN packet will be ignored without updating the state
tracking, and the SYN+ACK in reply will be blocked as it will not pass
checks III or IV (since all state tracking in the original direction
is not initialized because the SYN packet was ignored and the ignore
case that gets us in sync is not applied).

I posted a couple of patches before this one. Changli Gao spotted
a simpler way to fix this problem. This patch implements his idea.

Cc: Changli Gao <xiaosuo@gmail.com>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>

25 Feb, 2011 1 commit

ipvs: unify the formula to estimate the overhead of processing connections · b552f7e3

Changli Gao authored Feb 19, 2011

lc and wlc use the same formula, but lblc and lblcr use another one. There
is no reason for using two different formulas for the lc variants.

This patch makes all the lc variants use the formula used by lc.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Wensong Zhang <wensong@linux-vs.org>
Signed-off-by: Simon Horman <horms@verge.net.au>

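A sketch of a single overhead formula shared by the lc family, assuming the lc formula weights active connections 256 times as heavily as inactive ones (activeconns shifted left by 8, plus inactconns), together with a wlc-style weighted selection that compares overhead/weight ratios by cross-multiplication to avoid division. The struct and function names are hypothetical.

```c
#include <stddef.h>

/* Minimal stand-in for a real server's counters and weight. */
struct dest {
	int activeconns;
	int inactconns;
	int weight;
};

/* lc-style overhead: active connections dominate, inactive ones add a
 * little. The shift by 8 (x256) is an assumption of this sketch. */
static int conn_overhead(const struct dest *d)
{
	return (d->activeconns << 8) + d->inactconns;
}

/* Weighted pick: minimise overhead/weight without division by comparing
 * loh * dweight > doh * lweight. Destinations with weight <= 0 are skipped. */
static const struct dest *pick_wlc(const struct dest *v, size_t n)
{
	const struct dest *least = NULL;
	long loh = 0;

	for (size_t i = 0; i < n; i++) {
		const struct dest *d = &v[i];
		long doh;

		if (d->weight <= 0)
			continue;
		doh = conn_overhead(d);
		if (!least || loh * d->weight > doh * least->weight) {
			least = d;
			loh = doh;
		}
	}
	return least;
}
```

With overheads 512, 266 and 768 and weights 1, 1 and 4, the third server wins, since 768/4 = 192 is the smallest ratio.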

24 Feb, 2011 1 commit

ipvs: use enum instead of magic numbers · 17a8f8e3

Changli Gao authored Feb 24, 2011

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

22 Feb, 2011 1 commit

ipvs: use hlist instead of list · 731109e7

Changli Gao authored Feb 19, 2011

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

16 Feb, 2011 1 commit

ipvs: make "no destination available" message more informative · 41ac51ee

Patrick Schaaf authored Feb 11, 2011

When IP_VS schedulers do not find a destination, they output a terse
"WLC: no destination available" message through kernel syslog, which I
can only make sense of because syslog puts them in a logfile
together with keepalived checker results.

This patch makes the output a bit more informative, by telling you which
virtual service failed to find a destination.

Example output:

kernel: [1539214.552233] IPVS: wlc: TCP 192.168.8.30:22 - no destination available
kernel: [1539299.674418] IPVS: wlc: FWM 22 0x00000016 - no destination available

I have tested the code for IPv4 and FWM services, as you can see from
the example; I do not have an IPv6 setup to test the third code path
with.

To avoid code duplication, I put a new function ip_vs_scheduler_err()
into ip_vs_sched.c, and use that from the schedulers instead of calling
IP_VS_ERR_RL directly.
Signed-off-by: Patrick Schaaf <netdev@bof.de>
Signed-off-by: Simon Horman <horms@verge.net.au>

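The sketch below formats a per-service message in the style of the example output above, without the "IPVS: " prefix. It is not ip_vs_scheduler_err() itself: the struct, the function name and the format strings (including the upper-case hex for the fwmark) are assumptions chosen to reproduce the two example lines, and the IPv6 path is left out.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of a virtual service. */
struct vservice {
	int is_fwmark;
	const char *proto;   /* e.g. "TCP"; unused for fwmark services */
	uint32_t addr;       /* IPv4 address in host byte order */
	uint16_t port;
	uint32_t fwmark;
};

/* Format "<sched>: <service> - <msg>" for an IPv4 or fwmark service. */
static int format_sched_err(char *buf, size_t len, const char *sched,
			    const struct vservice *s, const char *msg)
{
	if (s->is_fwmark)
		return snprintf(buf, len, "%s: FWM %u 0x%08X - %s", sched,
				(unsigned)s->fwmark, (unsigned)s->fwmark, msg);

	return snprintf(buf, len, "%s: %s %u.%u.%u.%u:%u - %s", sched, s->proto,
			(unsigned)((s->addr >> 24) & 0xffu),
			(unsigned)((s->addr >> 16) & 0xffu),
			(unsigned)((s->addr >> 8) & 0xffu),
			(unsigned)(s->addr & 0xffu),
			(unsigned)s->port, msg);
}
```

Centralising the formatting in one helper is what lets every scheduler print the same service description instead of repeating the format logic.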

15 Feb, 2011 2 commits

ipvs: remove extra lookups for ICMP packets · 6cb90db5

Julian Anastasov authored Feb 09, 2011

Remove code that should not be called anymore.
Now that ip_vs_out handles replies for local clients at the
LOCAL_IN hook, we do not need to call conn_out_get and
handle_response_icmp from ip_vs_in_icmp*, because such
lookups were already performed for the ICMP packet and no
connection was found.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: fix timer in get_curr_sync_buff · 16a7fd32

Tinggong Wang authored Feb 09, 2011

Fix get_curr_sync_buff to keep the buffer for 2 seconds
as intended, not just for the current jiffy. This way
we sync more connection structures with a single packet.
Signed-off-by: Tinggong Wang <wangtinggong@gmail.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

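The commit describes the intended behaviour rather than the code. A minimal sketch of a wraparound-safe age check, modeled on the kernel's time_after_eq() idiom: a partially filled buffer is considered expired only once it is at least max_age ticks old, so it is kept for the whole 2-second window. The helper names are hypothetical, and the example assumes HZ = 100, so 2 seconds is 200 ticks.

```c
#include <stdbool.h>

/* Wraparound-safe "a is at or after b" for a free-running tick counter,
 * in the spirit of the kernel's time_after_eq(). */
static bool ticks_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/* A buffer first used at 'firstuse' expires once it is max_age ticks old. */
static bool sync_buff_expired(unsigned long now, unsigned long firstuse,
			      unsigned long max_age)
{
	return ticks_after_eq(now, firstuse + max_age);
}
```

Because the comparison is done on the signed difference, it stays correct when the tick counter wraps around between `firstuse` and `now`.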