Commits · 8b4ed8634f8b3f9aacfc42b4a872d30c36b9e255 · nexedi / linux

25 Mar, 2015 17 commits

tipc: eliminate race condition at dual link establishment · 8b4ed863

Jon Paul Maloy authored Mar 25, 2015

Despite recent improvements, the establishment of dual parallel
links still has a small glitch where messages can bypass each
other. When the second link in a dual-link configuration is
established, part of the first link's traffic will be steered over
to the new link. Although we do have a mechanism to ensure that
packets sent before and after the establishment of the new link
arrive in sequence to the destination node, this is not enough.
The arriving messages will still be delivered upwards in different
threads, something entailing a risk of message disordering during
the transition phase.

To fix this, we introduce a synchronization mechanism between the
two parallel links, so that traffic arriving on the new link cannot
be added to its input queue until we are guaranteed that all
pre-establishment messages have been delivered on the old, parallel
link.

This problem seems to always have been around, but its occurrence is
so rare that it has not been noticed until recent intensive testing.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8b4ed863

tipc: clean up handling of link congestion · 3127a020

Jon Paul Maloy authored Mar 25, 2015

After the recent changes in message importance handling it becomes
possible to simplify handling of messages and sockets when we
encounter link congestion.

We merge the function tipc_link_cong() into link_schedule_user(),
and simplify the code of the latter. The code should now be
easier to follow, especially regarding return codes and handling
of the message that caused the situation.

In case the scheduling function is unable to pre-allocate a wakeup
message buffer, it now returns -ENOBUFS, which is a more correct
code than the previously used -EHOSTUNREACH.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3127a020

tipc: introduce starvation free send algorithm · 1f66d161

Jon Paul Maloy authored Mar 25, 2015

Currently, we only use a single counter; the length of the backlog
queue, to determine whether a message should be accepted to the queue
or not. Each time a message is being sent, the queue length is compared
to a threshold value for the message's importance priority. If the queue
length is beyond this threshold, the message is rejected. This algorithm
implies a risk of starvation of low importance senders during very high
load, because it may take a long time before the backlog queue has
decreased enough to accept a lower level message.

We now eliminate this risk by introducing a counter for each importance
priority. When a message is sent, we check only the queue level for that
particular message's priority. If that is ok, the message can be added
to the backlog, irrespective of the queue level for other priorities.
This way, each level is guaranteed a certain portion of the total
bandwidth, and any risk of starvation is eliminated.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1f66d161

net: dsa: Handle non-bridge master change · b06b107a

Guenter Roeck authored Mar 25, 2015

Master change notifications may occur other than when joining or
leaving a bridge, for example when being added to or removed from
a bond or Open vSwitch. In that case, do nothing instead of asking
the switch driver to remove a port from a bridge that it didn't join.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b06b107a

crypto: algif - fix warn: unsigned 'used' is never less than zero · ac110f49

tadeusz.struk@intel.com authored Mar 25, 2015

Change type from unsigned long to int to fix an issue reported by kbuild robot:
crypto/algif_skcipher.c:596 skcipher_recvmsg_async() warn: unsigned 'used' is
never less than zero.
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ac110f49

tipc: fix a link reset issue due to retransmission failures · bc14b8d6

Ying Xue authored Mar 25, 2015

When a node joins a cluster while we are transmitting a fragment
stream over the broadcast link, it's missing the preceding fragments
needed to build a meaningful message. As a result, the node has to
drop it. However, as the fragment message is not acknowledged to
its sender before it's dropped, it accidentally causes link reset
of retransmission failure on the node.
Reported-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bc14b8d6

s390: fix /proc/interrupts output · 358e048d

Sebastian Ott authored Mar 25, 2015

The irqclass_sub_desc array and enum interruption_class are out of sync
thus /proc/interrupts is broken. Remove IRQIO_CLW.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

358e048d

sctp: avoid to repeatedly declare external variables · 7e3ea6d5

Ying Xue authored Mar 25, 2015

Move the declaration for external variables to sctp.h file avoiding
to repeatedly declare them with extern keyword.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7e3ea6d5

tcp: fix ipv4 mapped request socks · 0144a81c

Eric Dumazet authored Mar 24, 2015

ss should display ipv4 mapped request sockets like this :

tcp    SYN-RECV   0      0  ::ffff:192.168.0.1:8080   ::ffff:192.0.2.1:35261

and not like this :

tcp    SYN-RECV   0      0  192.168.0.1:8080   192.0.2.1:35261

We should init ireq->ireq_family based on listener sk_family,
not the actual protocol carried by SYN packet.

This means we can set ireq_family in inet_reqsk_alloc()

Fixes: 3f66b083 ("inet: introduce ireq_family")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0144a81c

virtio: change comment in transmit · d631b94e

stephen hemminger authored Mar 24, 2015

The original comment was not really informative or funny
as well as sexist. Replace it with a better explanation of
why the driver does stop and what the impacts are.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

d631b94e

tools: bpf_asm: cleanup vlan extension related token · 835c3d9b

Daniel Borkmann authored Mar 24, 2015

We now have K_VLANT, K_VLANP and K_VLANTPID. Clean them up into more
descriptive token, namely K_VLAN_TCI, K_VLAN_AVAIL and K_VLAN_TPID.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

835c3d9b

Merge branch 'listener_refactor_16' · b1275eb3

David S. Miller authored Mar 24, 2015

Eric Dumazet says:

====================
tcp: listener refactor part 16

A CONFIG_PROVE_RCU=y build revealed an RCU splat I had to fix.

I added const qualifiers to various md5 methods, as I expect
to call them on behalf of request sock traffic even if
the listener socket is not locked. This seems ok, but adding
const makes the contract clearer. Note a good reduction
of code size thanks to request/establish sockets convergence.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

b1275eb3

tcp: md5: get rid of tcp_v[46]_reqsk_md5_lookup() · fd3a154a

Eric Dumazet authored Mar 24, 2015

With request socks convergence, we no longer need
different lookup methods. A request socket can
use generic lookup function.

Add const qualifier to 2nd tcp_v[46]_md5_lookup() parameter.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fd3a154a

tcp: md5: remove request sock argument of calc_md5_hash() · 39f8e58e

Eric Dumazet authored Mar 24, 2015

Since request and established sockets now have same base,
there is no need to pass two pointers to tcp_v4_md5_hash_skb()
or tcp_v6_md5_hash_skb()

Also add a const qualifier to their struct tcp_md5sig_key argument.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

39f8e58e

tcp: md5: input path is run under rcu protected sections · ff74e23f

Eric Dumazet authored Mar 24, 2015

It is guaranteed that both tcp_v4_rcv() and tcp_v6_rcv()
run from rcu read locked sections :

ip_local_deliver_finish() and ip6_input_finish() both
use rcu_read_lock()

Also align tcp_v6_inbound_md5_hash() on tcp_v4_inbound_md5_hash()
by returning a boolean.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ff74e23f

tcp: use C99 initializers in new_state[] · 0980c1e3

Eric Dumazet authored Mar 24, 2015

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0980c1e3

tcp: md5: fix rcu lockdep splat · 80f03e27

Eric Dumazet authored Mar 24, 2015

While timer handler effectively runs a rcu read locked section,
there is no explicit rcu_read_lock()/rcu_read_unlock() annotations
and lockdep can be confused here :

net/ipv4/tcp_ipv4.c-906- /* caller either holds rcu_read_lock() or socket lock */
net/ipv4/tcp_ipv4.c:907: md5sig = rcu_dereference_check(tp->md5sig_info,
net/ipv4/tcp_ipv4.c-908- sock_owned_by_user(sk) ||
net/ipv4/tcp_ipv4.c-909- lockdep_is_held(&sk->sk_lock.slock));

Let's explicitely acquire rcu_read_lock() in tcp_make_synack()

Before commit fa76ce73 ("inet: get rid of central tcp/dccp listener
timer"), we were holding listener lock so lockdep was happy.

Fixes: fa76ce73 ("inet: get rid of central tcp/dccp listener timer")
Signed-off-by: Eric DUmazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

80f03e27

24 Mar, 2015 23 commits

Merge branch 'rhashtable-next' · 9ead3527

David S. Miller authored Mar 24, 2015

Thomas Graf says:

====================
rhashtable updates on top of Herbert's work

Patch 1 is a bugfix for an RCU splash I encountered while testing.
Patch 2 & 3 are pure cleanups. Patch 4 disables automatic shrinking
by default as discussed in previous thread. Patch 5 removes some
rhashtable internal knowledge from nft_hash and fixes another RCU
splash.

I've pushed various rhashtable tests (Netlink, nft) together with a
Makefile to a git tree [0] for easier stress testing.

[0] https://github.com/tgraf/rhashtable
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

9ead3527

rhashtable: Add rhashtable_free_and_destroy() · 6b6f302c

Thomas Graf authored Mar 24, 2015

rhashtable_destroy() variant which stops rehashes, iterates over
the table and calls a callback to release resources.

Avoids need for nft_hash to embed rhashtable internals and allows to
get rid of the being_destroyed flag. It also saves a 2nd mutex
lock upon destruction.

Also fixes an RCU lockdep splash on nft set destruction due to
calling rht_for_each_entry_safe() without holding bucket locks.
Open code this loop as we need know that no mutations may occur in
parallel.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

6b6f302c

rhashtable: Disable automatic shrinking by default · b5e2c150

Thomas Graf authored Mar 24, 2015

Introduce a new bool automatic_shrinking to require the
user to explicitly opt-in to automatic shrinking of tables.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

b5e2c150

rhashtable: Mark internal/private inline functions as such · ac833bdd
Thomas Graf authored Mar 24, 2015
```
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
```
ac833bdd

rhashtable: Use 'unsigned int' consistently · 299e5c32

Thomas Graf authored Mar 24, 2015

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

299e5c32

rhashtable: Extend RCU read lock into rhashtable_insert_rehash() · 58be8a58

Thomas Graf authored Mar 24, 2015

rhashtable_insert_rehash() requires RCU locks to be held in order
to access ht->tbl and traverse to the last table.

Fixes: ccd57b1b ("rhashtable: Add immediate rehash during insertion")
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

58be8a58

filter: introduce SKF_AD_VLAN_TPID BPF extension · 27cd5452

Michal Sekletar authored Mar 24, 2015

If vlan offloading takes place then vlan header is removed from frame
and its contents, both vlan_tci and vlan_proto, is available to user
space via TPACKET interface. However, only vlan_tci can be used in BPF
filters.

This commit introduces a new BPF extension. It makes possible to load
the value of vlan_proto (vlan TPID) to register A. Support for classic
BPF and eBPF is being added, analogous to skb->protocol.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Michal Sekletar <msekleta@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

27cd5452

Merge branch 'cxgb4-fcoe' · f6bb76cd

David S. Miller authored Mar 24, 2015

Varun Prakash says:

====================
FCoE support in cxgb4 driver

This patch series enables FCoE support in cxgb4 driver, it enables
FCOE_CRC and FCOE_MTU net device features.

This series is created against net-next tree.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

f6bb76cd

cxgb4: update Kconfig and Makefile for FCoE support · 241e9247

Varun Prakash authored Mar 24, 2015

Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

241e9247

cxgb4: add cxgb4_fcoe.c for FCoE · 84a200b3

Varun Prakash authored Mar 24, 2015

This patch adds cxgb4_fcoe.c and enables FCOE_CRC, FCOE_MTU
net device features.
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

84a200b3

cxgb4: add cxgb4_fcoe.h and macro definitions for FCoE · 76fed8a9

Varun Prakash authored Mar 24, 2015

This patch adds new header file cxgb4_fcoe.h and defines new
macros for FCoE support in cxgb4 driver.
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

76fed8a9

ipv6: fix sparse warnings in privacy stable addresses generation · ff40217e

Hannes Frederic Sowa authored Mar 24, 2015

Those warnings reported by sparse endianness check (via kbuild test robot)
are harmless, nevertheless fix them up and make the code a little bit
easier to read.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 622c81d5 ("ipv6: generation of stable privacy addresses for link-local and autoconf")
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ff40217e

tipc: fix compile error when IPV6=m and TIPC=y · ed3e852a

Ying Xue authored Mar 24, 2015

When IPV6=m and TIPC=y, below error will appear during building kernel
image:

net/tipc/udp_media.c:196:
undefined reference to `ip6_dst_lookup'
make: *** [vmlinux] Error 1

As ip6_dst_lookup() is implemented in IPV6 and IPV6 is compiled as
module, ip6_dst_lookup() is not built-in core kernel image. As a
result, compiler cannot find 'ip6_dst_lookup' reference while
compiling TIPC code into core kernel image.

But with the method introduced by commit 5f81bd2e ("ipv6: export a
stub for IPv6 symbols used by vxlan"), we can avoid the compile error
through "ipv6_stub" pointer to access ip6_dst_lookup().

Fixes: d0f91938 ("tipc: add ip/udp media type")
Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ed3e852a

net: allow to delete a whole device group · 66400d54

WANG Cong authored Mar 24, 2015

With dev group, we can change a batch of net devices,
so we should allow to delete them together too.

Group 0 is not allowed to be deleted since it is
the default group.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

66400d54

rhashtable: Add comment on choice of elasticity value · 27ed44a5

Herbert Xu authored Mar 24, 2015

This patch adds a comment on the choice of the value 16 as the
maximum chain length before we force a rehash.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

27ed44a5

cx82310_eth: fix semicolon.cocci warnings · 8263d57e

Wu Fengguang authored Mar 24, 2015

drivers/net/usb/cx82310_eth.c:175:2-3: Unneeded semicolon

 Removes unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

CC: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8263d57e

tipc: validate length of sockaddr in connect() for dgram/rdm · 610600c8

Sasha Levin authored Mar 23, 2015

Commit f2f8036e ("tipc: add support for connect() on dgram/rdm sockets")
hasn't validated user input length for the sockaddr structure which allows
a user to overwrite kernel memory with arbitrary input.

Fixes: f2f8036e ("tipc: add support for connect() on dgram/rdm sockets")
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

610600c8

net: remove never used forwarding_accel_ops pointer from net_device · 0117ec19

Hannes Frederic Sowa authored Mar 23, 2015

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

0117ec19

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · d5c1d8c5

David S. Miller authored Mar 23, 2015

Conflicts:
	net/netfilter/nf_tables_core.c

The nf_tables_core.c conflict was resolved using a conflict resolution
from Stephen Rothwell as a guide.
Signed-off-by: David S. Miller <davem@davemloft.net>

d5c1d8c5

rhashtable: Fix sleeping inside RCU critical section in walk_stop · ba7c95ea

Herbert Xu authored Mar 24, 2015

The commit 963ecbd4 ("rhashtable:
Fix use-after-free in rhashtable_walk_stop") fixed a real bug
but created another one because we may end up sleeping inside an
RCU critical section.

This patch fixes it properly by replacing the mutex with a spin
lock that specifically protects the walker lists.
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

ba7c95ea

Merge branch 'ipv6_stable_privacy_address' · ce046c56

David S. Miller authored Mar 23, 2015

Hannes Frederic Sowa says:

====================
ipv6: RFC7217 stable privacy addresses implementation

this is an implementation of basic support for RFC7217 stable privacy
addresses. Please review and consider for net-next.

v2:
* Correct references to RFC 7212 -> RFC 7217 in documentation patch (thanks, Eric!)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ce046c56

ipv6: add documentation for stable_secret, idgen_delay and idgen_retries knobs · 9f0761c1

Hannes Frederic Sowa authored Mar 23, 2015

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

9f0761c1

ipv6: introduce idgen_delay and idgen_retries knobs · 1855b7c3

Hannes Frederic Sowa authored Mar 23, 2015

This is specified by RFC 7217.

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

1855b7c3