Commits · 454594f3b93a49ef568cd190c5af31376b105a7b · Kirill Smelkov / linux

22 Oct, 2013 1 commit

Revert "bridge: only expire the mdb entry when query is received" · 454594f3

Linus Lüssing authored Oct 20, 2013

While this commit was a good attempt to fix issues occuring when no
multicast querier is present, this commit still has two more issues:

1) There are cases where mdb entries do not expire even if there is a
querier present. The bridge will unnecessarily continue flooding
multicast packets on the according ports.

2) Never removing an mdb entry could be exploited for a Denial of
Service by an attacker on the local link, slowly, but steadily eating up
all memory.

Actually, this commit became obsolete with
"bridge: disable snooping if there is no querier" (b00589af)
which included fixes for a few more cases.

Therefore reverting the following commits (the commit stated in the
commit message plus three of its follow up fixes):

====================
Revert "bridge: update mdb expiration timer upon reports."
This reverts commit f144febd.
Revert "bridge: do not call setup_timer() multiple times"
This reverts commit 1faabf2a.
Revert "bridge: fix some kernel warning in multicast timer"
This reverts commit c7e8e8a8.
Revert "bridge: only expire the mdb entry when query is received"
This reverts commit 9f00b2e7.
====================

CC: Cong Wang <amwang@redhat.com>
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Reviewed-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

454594f3

21 Oct, 2013 19 commits

tcp: initialize passive-side sk_pacing_rate after 3WHS · 02cf4ebd

Neal Cardwell authored Oct 21, 2013

For passive TCP connections, upon receiving the ACK that completes the
3WHS, make sure we set our pacing rate after we get our first RTT
sample.

On passive TCP connections, when we receive the ACK completing the
3WHS we do not take an RTT sample in tcp_ack(), but rather in
tcp_synack_rtt_meas(). So upon receiving the ACK that completes the
3WHS, tcp_ack() leaves sk_pacing_rate at its initial value.

Originally the initial sk_pacing_rate value was 0, so passive-side
connections defaulted to sysctl_tcp_min_tso_segs (2 segs) in skbuffs
made in the first RTT. With a default initial cwnd of 10 packets, this
happened to be correct for RTTs 5ms or bigger, so it was hard to
see problems in WAN or emulated WAN testing.

Since 7eec4174 ("pkt_sched: fq: fix non TCP flows pacing"), the
initial sk_pacing_rate is 0xffffffff. So after that change, passive
TCP connections were keeping this value (and using large numbers of
segments per skbuff) until receiving an ACK for data.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

02cf4ebd

davinci_emac.c: Fix IFF_ALLMULTI setup · d69e0f7e

Mariusz Ceier authored Oct 21, 2013

When IFF_ALLMULTI flag is set on interface and IFF_PROMISC isn't,
emac_dev_mcast_set should only enable RX of multicasts and reset
MACHASH registers.

It does this, but afterwards it either sets up multicast MACs
filtering or disables RX of multicasts and resets MACHASH registers
again, rendering IFF_ALLMULTI flag useless.

This patch fixes emac_dev_mcast_set, so that multicast MACs filtering and
disabling of RX of multicasts are skipped when IFF_ALLMULTI flag is set.

Tested with kernel 2.6.37.
Signed-off-by: Mariusz Ceier <mceier+kernel@gmail.com>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d69e0f7e

mac802154: correct a typo in ieee802154_alloc_device() prototype · 7e4d8a19

Alexandre Belloni authored Oct 21, 2013

This has no other impact than a cosmetic one.
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7e4d8a19

ipv6: probe routes asynchronous in rt6_probe · c2f17e82

Hannes Frederic Sowa authored Oct 21, 2013

Routes need to be probed asynchronous otherwise the call stack gets
exhausted when the kernel attemps to deliver another skb inline, like
e.g. xt_TEE does, and we probe at the same time.

We update neigh->updated still at once, otherwise we would send to
many probes.

Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

c2f17e82

Merge branch 'rt6i_gateway' · 3a70417c

David S. Miller authored Oct 21, 2013

Julian Anastasov says:

====================
ipv6: use rt6i_gateway as nexthop

	The following patchset makes sure that rt6i_gateway
contains valid nexthop information in all cases, so that
we can use different nexthop for sending.

	The first patch is a simple fix that makes IPVS, TEE,
RAW(hdrincl) and RTF_DYNAMIC(without RTF_GATEWAY) work as
before 3.9. There is a single corner case not solved by
this patch: RAW(hdrincl) or TEE using local address for
nexthop, a silly feature, I guess. In this case we
see zeroes in rt6i_gateway because we get route that is not
cloned. This is solved only with patch 2.

	The second patch is an optimization that makes sure
all resulting routes have rt6i_gateway filled, so that we
can avoid the complex ipv6_addr_any() call added to rt6_nexthop()
by patch 1. And it sets rt6i_gateway for local routes, a case
not handled by patch 1.

	The third patch uses the new rt6_nexthop() function to fix
the matching of gateways in the same way as commit bbb5823c
("netfilter: nf_conntrack: fix rt_gateway checks for H.323 helper")
fixes nf_conntrack_h323_main.c for IPv4. Currently, it depends on
the new definition of rt6_nexthop() in patch 2. Actually, if
patch 2 is applied, patch 3 becomes a cosmetic change.

	I see the following two alternatives for applying these
patches:

1. Linger patch 2 in net-next to avoid surprises in the upcoming
release. In this case patch 3 can be reworked not to depend on
the new rt6_nexthop() definition in patch 2. I guess this is a
better option, so that patch 2 can be reviewed and tested for
longer time.

2. Include all 3 patches in net tree - more risky because this
is my first attempt to change IPv6.

	Here is the situation as handled by patch 2:

	In IPv6 the resolved routes are always host routes (/128
with DST_HOST), mostly cloned ones. We allow routes in FIB
to contain rt6i_gateway with zeroes (eg. for local subnets) but
on cloning we can fill the rt6i_gateway field in result.
This works even without this patchset.

	There is a single special case where dst is provided as
skb_dst directly without a routing call: icmp6_dst_alloc(). It is a
private dst allocated just for the particular ICMP packet. Patch 2
fills rt6i_gateway in this case, needed for the new rt6_nexthop()
simplification.

	The last case is addrconf_dst_alloc(), it can put in
FIB local/anycast routes when addresses are added. Patch 2
needs to fill rt6i_gateway in this case because such routes
are returned without cloning.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

3a70417c

netfilter: nf_conntrack: fix rt6i_gateway checks for H.323 helper · 56e42441

Julian Anastasov authored Oct 20, 2013

Now when rt6_nexthop() can return nexthop address we can use it
for proper nexthop comparison of directly connected destinations.
For more information refer to commit bbb5823c
("netfilter: nf_conntrack: fix rt_gateway checks for H.323 helper").
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

56e42441

ipv6: fill rt6i_gateway with nexthop address · 550bab42

Julian Anastasov authored Oct 20, 2013

Make sure rt6i_gateway contains nexthop information in
all routes returned from lookup or when routes are directly
attached to skb for generated ICMP packets.

The effect of this patch should be a faster version of
rt6_nexthop() and the consideration of local addresses as
nexthop.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

550bab42

ipv6: always prefer rt6i_gateway if present · 96dc8095

Julian Anastasov authored Oct 20, 2013

In v3.9 6fd6ce20 ("ipv6: Do not depend on rt->n in
ip6_finish_output2()." changed the behaviour of ip6_finish_output2()
such that the recently introduced rt6_nexthop() is used
instead of an assigned neighbor.

As rt6_nexthop() prefers rt6i_gateway only for gatewayed
routes this causes a problem for users like IPVS, xt_TEE and
RAW(hdrincl) if they want to use different address for routing
compared to the destination address.

Another case is when redirect can create RTF_DYNAMIC
route without RTF_GATEWAY flag, we ignore the rt6i_gateway
in rt6_nexthop().

Fix the above problems by considering the rt6i_gateway if
present, so that traffic routed to address on local subnet is
not wrongly diverted to the destination address.

Thanks to Simon Horman and Phil Oester for spotting the
problematic commit.

Thanks to Hannes Frederic Sowa for his review and help in testing.
Reported-by: Phil Oester <kernel@linuxace.com>
Reported-by: Mark Brooks <mark@loadbalancer.org>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

96dc8095

Merge branch 'bnx2x' · 4440c6f7

David S. Miller authored Oct 21, 2013

Yuval Mintz says:

====================
bnx2x: Bug fixes patch series

This patch series contains fixes for various flows - several SR-IOV issues
are fixed, ethtool callbacks (coalescing and register dump) are corrected,
null pointer dereference on error flows is prevented, etc.

Changes from V1
---------------
 - Patch 2  "bnx2x: Prevent an illegal pointer dereference during panic"
   is revised, with improved handling of edge cases.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

4440c6f7

bnx2x: Set NETIF_F_HIGHDMA unconditionally · edd31476

Merav Sicron authored Oct 20, 2013

Current driver implementation incorrectly sets the flag only if 64-bit
DMA mask succeeded.
Signed-off-by: Merav Sicron <meravs@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

edd31476

bnx2x: Don't pretend during register dump · 4293b9f5

Dmitry Kravkov authored Oct 20, 2013

As part of a register dump, the interface pretends to have the identity
of other interfaces of the same physical device in order to perform
HW configuration for them - specifically, it needs to prevent attentions
from generating on those functions as the register dump accesses registers
in common blocks which whose reading might generate an attention.

However, such pretension is unsafe - unlike other flows in which the driver
uses pretend, during register dump there is no guarantee no other HW access
will take place (by other flows). If such access will take place, the HW will
be accessed by the wrong interface, and leave both functions in an incorrect
state.

This patch removes all pretensions from the register dump flow. Instead, it
changes initial configuration of attentions such that no fatal attention will
be generated for other functions as a result of the register dump
(notice however, a debug print claiming an attention from other functions IS
possible during the register dump)
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4293b9f5

bnx2x: Lock DMAE when used by statistic flow · 32316a46

Ariel Elior authored Oct 20, 2013

bnx2x has several clients to its DMAE machines - all of them with the exception
of the statistics flow used the same locking mechanisms to synchronize the DMAE
machines' usage.

Since statistics (which are periodically entered) use DMAE without taking the
locks, they may erase the commands which were previously set -
e.g., it may cause a VF to timeout while waiting for a PF answer on the VF-PF
channel as that command header would have been overwritten by the statistics'
header.

This patch makes certain that all flows utilizing DMAE will use the same
API, assuring that the locking scheme will be kept by all said flows.
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

32316a46

bnx2x: Prevent null pointer dereference on error flow · 6b991c37

Yuval Mintz authored Oct 20, 2013

If debug message is open and bnx2x_vfop_qdtor_cmd() were to fail,
the resulting print would have caused a null pointer dereference.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6b991c37

bnx2x: Fix config when SR-IOV and iSCSI are enabled · 0907f34c

Ariel Elior authored Oct 20, 2013

Starting with commit b9871bcf "bnx2x: VF RSS support - PF side", if a PF will
have SR-IOV supported in its PCI configuration space, storage drivers will not
work for that interface.

This patch fixes the resource calculation to allow such a configuration to
properly work.
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0907f34c

bnx2x: Fix Coalescing configuration · 6802516e

Dmitry Kravkov authored Oct 20, 2013

bnx2x drivers configure coalescing incorrectly (e.g., as a result of a call
to 'ethtool -c'). Although this is almost invisible to the user (due to NAPI)
designated tests will show the configuration is incorrect.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6802516e

bnx2x: Unlock VF-PF channel on MAC/VLAN config error · 31329afd

Ariel Elior authored Oct 20, 2013

Current code returns upon failure, leaving the VF-PF in an unusable state;
This patch adds the missing release so further commands could pass between
PF and VF.
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31329afd

bnx2x: Prevent an illegal pointer dereference during panic · 1a6974b2

Yuval Mintz authored Oct 20, 2013

During a panic, the driver tries to print the Management FW buffer of recent
commands. To do so, the driver reads the address of that buffer from a known
address. If the buffer is unavailable (e.g., PCI reads don't work, MCP is
failing, etc.), the driver will try to access the address it has read, possibly
causing a kernel panic.

This check 'sanitizes' the access, validating the read value is indeed a valid
address inside the management FW's buffers.
The patch also removes a read outside the scope of the buffer, which resulted
in some unrelated chraracters appearing in the log.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1a6974b2

bnx2x: Fix Maximum CoS estimation for VFs · b1239723

Yuval Mintz authored Oct 20, 2013

bnx2x VFs do not support Multi-CoS; Current implementation
erroneously sets the VFs maximal number of CoS to be > 1.

This will cause the driver to call alloc_etherdev_mqs() with
a number of queues it cannot possibly support and reflects
in 'odd' driver prints.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b1239723

drivers: net: cpsw: fix kernel warn during iperf test with interrupt pacing · 49595b7b

Mugunthan V N authored Oct 20, 2013

When interrupt pacing is enabled, receive/transmit statistics are not
updated properly by hardware which leads to ISR return with IRQ_NONE
and inturn kernel disables the interrupt. This patch removed the checking
of receive/transmit statistics from ISR.

This patch is verified with AM335x Beagle Bone Black and below is the
kernel warn when interrupt pacing is enabled.

[  104.298254] irq 58: nobody cared (try booting with the "irqpoll" option)
[  104.305356] CPU: 0 PID: 1073 Comm: iperf Not tainted 3.12.0-rc3-00342-g77d4015b #3
[  104.313284] [<c001bb84>] (unwind_backtrace+0x0/0xf0) from [<c0017db0>] (show_stack+0x10/0x14)
[  104.322282] [<c0017db0>] (show_stack+0x10/0x14) from [<c0507920>] (dump_stack+0x78/0x94)
[  104.330816] [<c0507920>] (dump_stack+0x78/0x94) from [<c0088c1c>] (__report_bad_irq+0x20/0xc0)
[  104.339889] [<c0088c1c>] (__report_bad_irq+0x20/0xc0) from [<c008912c>] (note_interrupt+0x1dc/0x23c)
[  104.349505] [<c008912c>] (note_interrupt+0x1dc/0x23c) from [<c0086d74>] (handle_irq_event_percpu+0xc4/0x238)
[  104.359851] [<c0086d74>] (handle_irq_event_percpu+0xc4/0x238) from [<c0086f24>] (handle_irq_event+0x3c/0x5c)
[  104.370198] [<c0086f24>] (handle_irq_event+0x3c/0x5c) from [<c008991c>] (handle_level_irq+0xac/0x10c)
[  104.379907] [<c008991c>] (handle_level_irq+0xac/0x10c) from [<c00866d8>] (generic_handle_irq+0x20/0x30)
[  104.389812] [<c00866d8>] (generic_handle_irq+0x20/0x30) from [<c0014ce8>] (handle_IRQ+0x4c/0xb0)
[  104.399066] [<c0014ce8>] (handle_IRQ+0x4c/0xb0) from [<c000856c>] (omap3_intc_handle_irq+0x60/0x74)
[  104.408598] [<c000856c>] (omap3_intc_handle_irq+0x60/0x74) from [<c050d8e4>] (__irq_svc+0x44/0x5c)
[  104.418021] Exception stack(0xde4f7c00 to 0xde4f7c48)
[  104.423345] 7c00: 00000001 00000000 00000000 dd002140 60000013 de006e54 00000002 00000000
[  104.431952] 7c20: de345748 00000040 c11c8588 00018ee0 00000000 de4f7c48 c009dfc8 c050d300
[  104.440553] 7c40: 60000013 ffffffff
[  104.444237] [<c050d8e4>] (__irq_svc+0x44/0x5c) from [<c050d300>] (_raw_spin_unlock_irqrestore+0x34/0x44)
[  104.454220] [<c050d300>] (_raw_spin_unlock_irqrestore+0x34/0x44) from [<c00868c0>] (__irq_put_desc_unlock+0x14/0x38)
[  104.465295] [<c00868c0>] (__irq_put_desc_unlock+0x14/0x38) from [<c0088068>] (enable_irq+0x4c/0x74)
[  104.474829] [<c0088068>] (enable_irq+0x4c/0x74) from [<c03abd24>] (cpsw_poll+0xb8/0xdc)
[  104.483276] [<c03abd24>] (cpsw_poll+0xb8/0xdc) from [<c044ef68>] (net_rx_action+0xc0/0x1e8)
[  104.492085] [<c044ef68>] (net_rx_action+0xc0/0x1e8) from [<c0048a90>] (__do_softirq+0x100/0x27c)
[  104.501338] [<c0048a90>] (__do_softirq+0x100/0x27c) from [<c0048cd0>] (do_softirq+0x68/0x70)
[  104.510224] [<c0048cd0>] (do_softirq+0x68/0x70) from [<c0048e8c>] (local_bh_enable+0xd0/0xe4)
[  104.519211] [<c0048e8c>] (local_bh_enable+0xd0/0xe4) from [<c048c774>] (tcp_rcv_established+0x450/0x648)
[  104.529201] [<c048c774>] (tcp_rcv_established+0x450/0x648) from [<c0494904>] (tcp_v4_do_rcv+0x154/0x474)
[  104.539195] [<c0494904>] (tcp_v4_do_rcv+0x154/0x474) from [<c043d750>] (release_sock+0xac/0x1ac)
[  104.548448] [<c043d750>] (release_sock+0xac/0x1ac) from [<c04844e8>] (tcp_recvmsg+0x4d0/0xa8c)
[  104.557528] [<c04844e8>] (tcp_recvmsg+0x4d0/0xa8c) from [<c04a8720>] (inet_recvmsg+0xcc/0xf0)
[  104.566507] [<c04a8720>] (inet_recvmsg+0xcc/0xf0) from [<c0439744>] (sock_recvmsg+0x90/0xb0)
[  104.575394] [<c0439744>] (sock_recvmsg+0x90/0xb0) from [<c043b778>] (SyS_recvfrom+0x88/0xd8)
[  104.584280] [<c043b778>] (SyS_recvfrom+0x88/0xd8) from [<c043b7e0>] (sys_recv+0x18/0x20)
[  104.592805] [<c043b7e0>] (sys_recv+0x18/0x20) from [<c0013da0>] (ret_fast_syscall+0x0/0x48)
[  104.601587] handlers:
[  104.603992] [<c03acd94>] cpsw_interrupt
[  104.608040] Disabling IRQ #58

Cc: Sebastian Siewior <bigeasy@linutronix.de>
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

49595b7b

19 Oct, 2013 6 commits

Merge branch 'ufo_fixes' · 77d4015b

David S. Miller authored Oct 19, 2013

Jiri Pirko says:

====================
UFO fixes

Couple of patches fixing UFO functionality in different situations.

v1->v2:
- minor if{}else{} coding style adjustment suggested by Sergei Shtylyov
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

77d4015b

ip_output: do skb ufo init for peeked non ufo skb as well · e93b7d74

Jiri Pirko authored Oct 19, 2013

Now, if user application does:
sendto len<mtu flag MSG_MORE
sendto len>mtu flag 0
The skb is not treated as fragmented one because it is not initialized
that way. So move the initialization to fix this.

introduced by:
commit e89e9cf5 "[IPv4/IPv6]: UFO Scatter-gather approach"
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

e93b7d74

ip6_output: do skb ufo init for peeked non ufo skb as well · c547dbf5

Jiri Pirko authored Oct 19, 2013

Now, if user application does:
sendto len<mtu flag MSG_MORE
sendto len>mtu flag 0
The skb is not treated as fragmented one because it is not initialized
that way. So move the initialization to fix this.

introduced by:
commit e89e9cf5 "[IPv4/IPv6]: UFO Scatter-gather approach"
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

c547dbf5

udp6: respect IPV6_DONTFRAG sockopt in case there are pending frames · e36d3ff9

Jiri Pirko authored Oct 19, 2013

if up->pending != 0 dontfrag is left with default value -1. That
causes that application that do:
sendto len>mtu flag MSG_MORE
sendto len>mtu flag 0
will receive EMSGSIZE errno as the result of the second sendto.

This patch fixes it by respecting IPV6_DONTFRAG socket option.

introduced by:
commit 4b340ae2 "IPv6: Complete IPV6_DONTFRAG support"
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

e36d3ff9

net: fix cipso packet validation when !NETLABEL · f2e5ddcc

Seif Mazareeb authored Oct 17, 2013

When CONFIG_NETLABEL is disabled, the cipso_v4_validate() function could loop
forever in the main loop if opt[opt_iter +1] == 0, this will causing a kernel
crash in an SMP system, since the CPU executing this function will
stall /not respond to IPIs.

This problem can be reproduced by running the IP Stack Integrity Checker
(http://isic.sourceforge.net) using the following command on a Linux machine
connected to DUT:

"icmpsic -s rand -d <DUT IP address> -r 123456"
wait (1-2 min)
Signed-off-by: Seif Mazareeb <seif@marvell.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f2e5ddcc

net: unix: inherit SOCK_PASS{CRED, SEC} flags from socket to fix race · 90c6bd34

Daniel Borkmann authored Oct 17, 2013

In the case of credentials passing in unix stream sockets (dgram
sockets seem not affected), we get a rather sparse race after
commit 16e57262 ("af_unix: dont send SCM_CREDENTIALS by default").

We have a stream server on receiver side that requests credential
passing from senders (e.g. nc -U). Since we need to set SO_PASSCRED
on each spawned/accepted socket on server side to 1 first (as it's
not inherited), it can happen that in the time between accept() and
setsockopt() we get interrupted, the sender is being scheduled and
continues with passing data to our receiver. At that time SO_PASSCRED
is neither set on sender nor receiver side, hence in cmsg's
SCM_CREDENTIALS we get eventually pid:0, uid:65534, gid:65534
(== overflow{u,g}id) instead of what we actually would like to see.

On the sender side, here nc -U, the tests in maybe_add_creds()
invoked through unix_stream_sendmsg() would fail, as at that exact
time, as mentioned, the sender has neither SO_PASSCRED on his side
nor sees it on the server side, and we have a valid 'other' socket
in place. Thus, sender believes it would just look like a normal
connection, not needing/requesting SO_PASSCRED at that time.

As reverting 16e57262 would not be an option due to the significant
performance regression reported when having creds always passed,
one way/trade-off to prevent that would be to set SO_PASSCRED on
the listener socket and allow inheriting these flags to the spawned
socket on server side in accept(). It seems also logical to do so
if we'd tell the listener socket to pass those flags onwards, and
would fix the race.

Before, strace:

recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}],
        msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
        cmsg_type=SCM_CREDENTIALS{pid=0, uid=65534, gid=65534}},
        msg_flags=0}, 0) = 5

After, strace:

recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}],
        msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
        cmsg_type=SCM_CREDENTIALS{pid=11580, uid=1000, gid=1000}},
        msg_flags=0}, 0) = 5
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

90c6bd34

18 Oct, 2013 8 commits

qlcnic: Validate Tx queue only for 82xx adapters. · 66c562ef

Himanshu Madhani authored Oct 17, 2013

o validate Tx queue only in case of adapters which supports
  multi Tx queue.

  This patch is to fix regression introduced in commit
  aa4a1f7d
  "qlcnic: Enable Tx queue changes using ethtool for 82xx Series adapter"
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

66c562ef

be2net: pass if_id for v1 and V2 versions of TX_CREATE cmd · 0fb88d61

Vasundhara Volam authored Oct 17, 2013

It is a required field for all TX_CREATE cmd versions > 0.
This fixes a driver initialization failure, caused by recent SH-R Firmwares
(versions > 10.0.639.0) failing the TX_CREATE cmd when if_id field is
not passed.
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0fb88d61

wanxl: fix info leak in ioctl · 2b13d06c

Salva Peiró authored Oct 16, 2013

The wanxl_ioctl() code fails to initialize the two padding bytes of
struct sync_serial_settings after the ->loopback member. Add an explicit
memset(0) before filling the structure to avoid the info leak.
Signed-off-by: Salva Peiró <speiro@ai2.upv.es>
Signed-off-by: David S. Miller <davem@davemloft.net>

2b13d06c

Merge branch 'bridge_pvid' · b8bde1c4

David S. Miller authored Oct 18, 2013

Toshiaki Makita says:

====================
bridge: Fix problems around the PVID

There seem to be some undesirable behaviors related with PVID.
1. It has no effect assigning PVID to a port. PVID cannot be applied
to any frame regardless of whether we set it or not.
2. FDB entries learned via frames applied PVID are registered with
VID 0 rather than VID value of PVID.
3. We can set 0 or 4095 as a PVID that are not allowed in IEEE 802.1Q.
This leads interoperational problems such as sending frames with VID
4095, which is not allowed in IEEE 802.1Q, and treating frames with VID
0 as they belong to VLAN 0, which is expected to be handled as they have
no VID according to IEEE 802.1Q.

Note: 2nd and 3rd problems are potential and not exposed unless 1st problem
is fixed, because we cannot activate PVID due to it.

This is my analysis for each behavior.
1. We are using VLAN_TAG_PRESENT bit when getting PVID, and not when
adding/deleting PVID.
It can be fixed in either way using or not using VLAN_TAG_PRESENT,
but I think the latter is slightly more efficient.

2. We are setting skb->vlan_tci with the value of PVID but the variable
vid, which is used in FDB later, is set to 0 at br_allowed_ingress()
when untagged frames arrive at a port with PVID valid. I'm afraid that
vid should be updated to the value of PVID if PVID is valid.

3. According to IEEE 802.1Q-2011 (6.9.1 and Table 9-2), we cannot use
VID 0 or 4095 as a PVID.
It looks like that there are more stuff to consider.

- VID 0:
VID 0 shall not be configured in any FDB entry and used in a tag header
to indicate it is a 802.1p priority-tagged frame.
Priority-tagged frames should be applied PVID (from IEEE 802.1Q 6.9.1).
In my opinion, since we can filter incomming priority-tagged frames by
deleting PVID, we don't need to filter them by vlan_bitmap.
In other words, priority-tagged frames don't have VID 0 but have no VID,
which is the same as untagged frames, and should be filtered by unsetting
PVID.
So, not only we cannot set PVID as 0, but also we don't need to add 0 to
vlan_bitmap, which enables us to simply forbid to add vlan 0.

- VID 4095:
VID 4095 shall not be transmitted in a tag header. This VID value may be
used to indicate a wildcard match for the VID in management operations or
FDB entries (from IEEE 802.1Q Table 9-2).
In current implementation, we can create a static FDB entry with all
existing VIDs by not specifying any VID when creating it.
I don't think this way to add wildcard-like entries needs to change,
and VID 4095 looks no use and can be unacceptable to add.

Consequently, I believe what we should do for 3rd problem is below:
- Not allowing VID 0 and 4095 to be added.
- Applying PVID to priority-tagged (VID 0) frames.

Note: It has been descovered that another problem related to priority-tags
remains. If we use vlan 0 interface such as eth0.0, we cannot communicate
with another end station via a linux bridge.
This problem exists regardless of whether this patch set is applied or not
because we might receive untagged frames from another end station even if we
are sending priority-tagged frames.
This issue will be addressed by another patch set introducing an additional
egress policy, on which Vlad Yasevich is working.
See http://marc.info/?t=137880893800001&r=1&w=2 for detailed discussion.

Patch set follows this mail.
The order of patches is not the same as described above, because the way
to fix 1st problem is based on the assumption that we don't use VID 0 as
a PVID, which is realized by fixing 3rd problem.
(1/4)(2/4): Fix 3rd problem.
(3/4): Fix 1st problem.
(4/4): Fix 2nd probelm.

v2:
- Add descriptions about the problem related to priority-tags in cover letter.
- Revise patch comments to reference the newest spec.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

b8bde1c4

bridge: Fix updating FDB entries when the PVID is applied · dfb5fa32

Toshiaki Makita authored Oct 16, 2013

We currently set the value that variable vid is pointing, which will be
used in FDB later, to 0 at br_allowed_ingress() when we receive untagged
or priority-tagged frames, even though the PVID is valid.
This leads to FDB updates in such a wrong way that they are learned with
VID 0.
Update the value to that of PVID if the PVID is applied.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Reviewed-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dfb5fa32

bridge: Fix the way the PVID is referenced · d1c6c708

Toshiaki Makita authored Oct 16, 2013

We are using the VLAN_TAG_PRESENT bit to detect whether the PVID is
set or not at br_get_pvid(), while we don't care about the bit in
adding/deleting the PVID, which makes it impossible to forward any
incomming untagged frame with vlan_filtering enabled.

Since vid 0 cannot be used for the PVID, we can use vid 0 to indicate
that the PVID is not set, which is slightly more efficient than using
the VLAN_TAG_PRESENT.

Fix the problem by getting rid of using the VLAN_TAG_PRESENT.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Reviewed-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d1c6c708

bridge: Apply the PVID to priority-tagged frames · b90356ce

Toshiaki Makita authored Oct 16, 2013

IEEE 802.1Q says that when we receive priority-tagged (VID 0) frames
use the PVID for the port as its VID.
(See IEEE 802.1Q-2011 6.9.1 and Table 9-2)

Apply the PVID to not only untagged frames but also priority-tagged frames.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Reviewed-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b90356ce

bridge: Don't use VID 0 and 4095 in vlan filtering · 8adff41c

Toshiaki Makita authored Oct 16, 2013

IEEE 802.1Q says that:
- VID 0 shall not be configured as a PVID, or configured in any Filtering
Database entry.
- VID 4095 shall not be configured as a PVID, or transmitted in a tag
header. This VID value may be used to indicate a wildcard match for the VID
in management operations or Filtering Database entries.
(See IEEE 802.1Q-2011 6.9.1 and Table 9-2)

Don't accept adding these VIDs in the vlan_filtering implementation.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Reviewed-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

8adff41c

17 Oct, 2013 6 commits

bridge: Correctly clamp MAX forward_delay when enabling STP · 4b6c7879

Vlad Yasevich authored Oct 15, 2013

Commit be4f154d
	bridge: Clamp forward_delay when enabling STP
had a typo when attempting to clamp maximum forward delay.

It is possible to set bridge_forward_delay to be higher then
permitted maximum when STP is off.  When turning STP on, the
higher then allowed delay has to be clamed down to max value.

CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Reviewed-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

4b6c7879

tcp: remove the sk_can_gso() check from tcp_set_skb_tso_segs() · 8f26fb1c

Eric Dumazet authored Oct 15, 2013

sk_can_gso() should only be used as a hint in tcp_sendmsg() to build GSO
packets in the first place. (As a performance hint)

Once we have GSO packets in write queue, we can not decide they are no
longer GSO only because flow now uses a route which doesn't handle
TSO/GSO.

Core networking stack handles the case very well for us, all we need
is keeping track of packet counts in MSS terms, regardless of
segmentation done later (in GSO or hardware)

Right now, if  tcp_fragment() splits a GSO packet in two parts,
@left and @right, and route changed through a non GSO device,
both @left and @right have pcount set to 1, which is wrong,
and leads to incorrect packet_count tracking.

This problem was added in commit d5ac99a6 ("[TCP]: skb pcount with MTU
discovery")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8f26fb1c

tcp: must unclone packets before mangling them · c52e2421

Eric Dumazet authored Oct 15, 2013

TCP stack should make sure it owns skbs before mangling them.

We had various crashes using bnx2x, and it turned out gso_size
was cleared right before bnx2x driver was populating TC descriptor
of the _previous_ packet send. TCP stack can sometime retransmit
packets that are still in Qdisc.

Of course we could make bnx2x driver more robust (using
ACCESS_ONCE(shinfo->gso_size) for example), but the bug is TCP stack.

We have identified two points where skb_unclone() was needed.

This patch adds a WARN_ON_ONCE() to warn us if we missed another
fix of this kind.

Kudos to Neal for finding the root cause of this bug. Its visible
using small MSS.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c52e2421

Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless · df709a18

David S. Miller authored Oct 17, 2013

John W. Linville says:

====================
Please pull this batch of fixes intended for the 3.12 stream!

For the mac80211 bits, Johannes says:

"Jouni fixes a remain-on-channel vs. scan bug, and Felix fixes client TX
probing on VLANs."

And also:

"This time I have two fixes from Emmanuel for RF-kill issues, and fixed
two issues reported by Evan Huus and Thomas Lindroth respectively."

On top of those...

Avinash Patil adds a couple of mwifiex fixes to properly inform cfg80211
about some different types of disconnects, avoiding WARNINGs.

Mark Cave-Ayland corrects a pointer arithmetic problem in rtlwifi,
avoiding incorrect automatic gain calculations.

Solomon Peachy sends a cw1200 fix for locking around calls to
cw1200_irq_handler, addressing "lost interrupt" problems.

Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

df709a18

net: qmi_wwan: Olivetti Olicard 200 support · ce97fef4

Enrico Mioso authored Oct 15, 2013

This is a QMI device, manufactured by TCT Mobile Phones.
A companion patch blacklisting this device's QMI interface in the option.c
driver has been sent.
Signed-off-by: Enrico Mioso <mrkiko.rs@gmail.com>
Signed-off-by: Antonella Pellizzari <anto.pellizzari83@gmail.com>
Tested-by: Dan Williams <dcbw@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ce97fef4

virtio-net: refill only when device is up during setting queues · 35ed159b

Jason Wang authored Oct 15, 2013

We used to schedule the refill work unconditionally after changing the
number of queues. This may lead an issue if the device is not
up. Since we only try to cancel the work in ndo_stop(), this may cause
the refill work still work after removing the device. Fix this by only
schedule the work when device is up.

The bug were introduce by commit 9b9cd802.
(virtio-net: fix the race between channels setting and refill)

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

35ed159b