Commits · 8a4d4c8dde7a4119bce3fd8287dca193ff6356da · nexedi / linux

30 Jan, 2016 3 commits

bnxt_en: Exclude rx_drop_pkts hw counter from the stack's rx_dropped counter. · 8a4d4c8d

Michael Chan authored Jan 28, 2016

This hardware counter is misleading as it counts dropped packets that
don't match the hardware filters for unicast/broadcast/multicast. We
will still report this counter in ethtool -S for diagnostics purposes.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8a4d4c8d

bnxt_en: Ring free response from close path should use completion ring · 74608fc9

Prashant Sreedharan authored Jan 28, 2016

Use completion ring for ring free response from firmware. The response
will be the last entry in the ring and we can free the ring after getting
the response. This will guarantee no spurious DMA to freed memory.
Signed-off-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

74608fc9

net_sched: drr: check for NULL pointer in drr_dequeue · df3eb6cd

Bernie Harris authored Jan 28, 2016

There are cases where qdisc_dequeue_peeked can return NULL, and the result
is dereferenced later on in the function.

Similarly to the other qdisc dequeue functions, check whether the skb
pointer is NULL and if it is, goto out.
Signed-off-by: Bernie Harris <bernie.harris@alliedtelesis.co.nz>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

df3eb6cd

29 Jan, 2016 22 commits

ptp: ixp46x: use helpers for converting ns to timespec · b83ef507

Kefeng Wang authored Jan 28, 2016

Convert the driver to use ns_to_timespec64() and timespec64_to_ns()
instead of open coding the same logic.
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b83ef507

tcp: Change reference to experimental CWND RFC. · 21603fc4

Jörg Thalheim authored Jan 27, 2016

Signed-off-by: Jörg Thalheim <joerg@higgsboson.tk>
Signed-off-by: David S. Miller <davem@davemloft.net>

21603fc4

macvlan: make operstate and carrier more accurate · de7d244d

Nikolay Aleksandrov authored Jan 27, 2016

Currently when a macvlan is being initialized and the lower device is
netif_carrier_ok(), the macvlan device doesn't run through
rfc2863_policy() and is left with UNKNOWN operstate. Fix it by adding an
unconditional linkwatch event for the new macvlan device. Similar fix is
already used by the 8021q device (see register_vlan_dev()). Also fix the
inconsistent state when the lower device has been down and its carrier
was changed (when a device is down NETDEV_CHANGE doesn't get generated).
The second issue can be seen f.e. when we have a macvlan on top of a 8021q
device which has been down and its real device has been changing carrier
states, after setting the 8021q device up, the macvlan device will have
the same carrier state as it was before even though the 8021q can now
have a different state.
Example for case 1:
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 1000

$ ip l add l eth2 macvl0 type macvlan
$ ip l set macvl0 up
$ ip l sh macvl0
72: macvl0@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
noqueue state UNKNOWN mode DEFAULT group default
    link/ether f6:0b:54:0a:9d:a3 brd ff:ff:ff:ff:ff:ff

Example for case 2 (order is important):
Prestate: eth2 UP/CARRIER, vlan1 down, vlan1-macvlan down
$ ip l set vlan1-macvlan up
$ ip l sh vlan1-macvlan
71: vlan1-macvlan@vlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/ether 4a:b8:44:56:b9:b9 brd ff:ff:ff:ff:ff:ff

[ eth2 loses CARRIER before vlan1 has been UP-ed ]

$ ip l sh eth2
4: eth2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast
state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:bf:57:16 brd ff:ff:ff:ff:ff:ff
$ ip l sh vlan1-macvlan
71: vlan1-macvlan@vlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/ether 4a:b8:44:56:b9:b9 brd ff:ff:ff:ff:ff:ff
$ ip l set vlan1 up
$ ip l sh vlan1
70: vlan1@eth2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc
noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:bf:57:16 brd ff:ff:ff:ff:ff:ff
$ ip l sh vlan1-macvlan
71: vlan1-macvlan@vlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/ether 4a:b8:44:56:b9:b9 brd ff:ff:ff:ff:ff:ff

vlan1-macvlan is still UP, still has carrier and is still in the same
operstate as before. After the patch in case 1 macvl0 has state UP as it
should and in case 2 vlan1-macvlan has state LOWERLAYERDOWN again as it
should. Note that while the lower macvlan device is down their carrier
and thus operstate can go out of sync but that will be fixed once the
lower device goes up again.
This behaviour seems to have been present since beginning of git history.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

de7d244d

tipc: fix connection abort during subscription cancel · 4d5cfcba

Parthasarathy Bhuvaragan authored Jan 27, 2016

In 'commit 7fe8097c ("tipc: fix nullpointer bug when subscribing
to events")', we terminate the connection if the subscription
creation fails.
In the same commit, the subscription creation result was based on
the value of the subscription pointer (set in the function) instead
of the return code.

Unfortunately, the same function tipc_subscrp_create() handles
subscription cancel request. For a subscription cancellation request,
the subscription pointer cannot be set. Thus if a subscriber has
several subscriptions and cancels any of them, the connection is
terminated.

In this commit, we terminate the connection based on the return value
of tipc_subscrp_create().
Fixes: commit 7fe8097c ("tipc: fix nullpointer bug when subscribing to events")
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4d5cfcba

net: cavium: liquidio: use helpers ns_to_timespec64() · 286af315

Kefeng Wang authored Jan 27, 2016

Convert the driver to use ns_to_timespec64() to keep consistency
with timespec64_to_ns() instead of open coding the same logic.
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

286af315

ipv4: early demux should be aware of fragments · 63e51b6a

Eric Dumazet authored Jan 26, 2016

We should not assume a valid protocol header is present,
as this is not the case for IPv4 fragments.

Lets avoid extra cache line misses and potential bugs
if we actually find a socket and incorrectly uses its dst.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

63e51b6a

Merge branch 'phylib-regressions-part-2' · b6443885

David S. Miller authored Jan 28, 2016

Andrew Lunn says:

====================
Part 2 of v4.5-rc1 phylib regression

White list PHY compatible values which indicate PHYs.  Issue a warning
when one is encountered.

Update the documentation to make it clear what is expected in the
compatible string.

v2:
Fix Grammar, reword changelog, add Tested-by and Acked-by.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

b6443885

DT: phy.txt: Clarify expected compatible values · e4bf797a

Andrew Lunn authored Jan 28, 2016

PHY devices may only list compatibility with clause 22, 45, and if
they need to be more specific, their PHY identifier values. No other
compatible strings are allowed.  Make this clear in the documentation,
and remove examples where make/model compatible strings are listed.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e4bf797a

of: of_mdio: Add a whitelist of PHY compatibilities. · ae461131

Andrew Lunn authored Jan 28, 2016

Some phy nodes list a compatible value indicating the PHY make/model.
This is never used to match the device to the driver. However it does
confuse the code to separate a PHY from a generic MDIO device like a
switch. Generic MDIO devices must have a compatible value, PHYs can
list clause 22 or 45, but nothing else.

Issue a warning if we find a compatible value known on the whitelist,
and say it is a PHY.

Fixes: a9049e0c ("mdio: Add support for mdio drivers.")
Reported-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Reported-by: Olof Johansson <olof@lixom.net>
Tested-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

ae461131

Merge branch 'lan78xx-fixes' · a77ce1bc

David S. Miller authored Jan 28, 2016

Woojung Huh says:

====================
lan78xx: update and fixes

  lan78xx: change to use updated phy-ignore-interrupts
  lan78xx: Add to handle mux control per chip id
  lan78xx: throttle TX path at slower than SuperSpeed USB
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

a77ce1bc

lan78xx: throttle TX path at slower than SuperSpeed USB · 4b2a4a96

Woojung.Huh@microchip.com authored Jan 27, 2016

Throttle TX path only at slower than SuperSpeed USB.
SuperSpeed USB has enough bandwidth to maintain GigE.
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4b2a4a96

lan78xx: Add to handle mux control per chip id · a0db7d10

Woojung.Huh@microchip.com authored Jan 27, 2016

Depends on chip, some EEPROM pins are muxed with LED function.
Disable & restore LED function to access EEPROM.
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a0db7d10

lan78xx: change to use updated phy-ignore-interrupts · e4953910

Woojung.Huh@microchip.com authored Jan 27, 2016

Update lan78xx to use patch of commit 4f2aaf7d
("Merge branch 'fix-phy-ignore-interrupts'").
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e4953910

tcp: beware of alignments in tcp_get_info() · ff5d7497

Eric Dumazet authored Jan 27, 2016

With some combinations of user provided flags in netlink command,
it is possible to call tcp_get_info() with a buffer that is not 8-bytes
aligned.

It does matter on some arches, so we need to use put_unaligned() to
store the u64 fields.

Current iproute2 package does not trigger this particular issue.

Fixes: 0df48c26 ("tcp: add tcpi_bytes_acked to tcp_info")
Fixes: 977cb0ec ("tcp: add pacing_rate information into tcp_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ff5d7497

switchdev: Require RTNL mutex to be held when sending FDB notifications · 4f2c6ae5

Ido Schimmel authored Jan 27, 2016

When switchdev drivers process FDB notifications from the underlying
device they resolve the netdev to which the entry points to and notify
the bridge using the switchdev notifier.

However, since the RTNL mutex is not held there is nothing preventing
the netdev from disappearing in the middle, which will cause
br_switchdev_event() to dereference a non-existing netdev.

Make switchdev drivers hold the lock at the beginning of the
notification processing session and release it once it ends, after
notifying the bridge.

Also, remove switchdev_mutex and fdb_lock, as they are no longer needed
when RTNL mutex is held.

Fixes: 03bf0c28 ("switchdev: introduce switchdev notifier")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4f2c6ae5

xen-netfront: request Tx response events more often · 7d0105b5

Malcolm Crossley authored Jan 26, 2016

Trying to batch Tx response events results in poor performance because
this delays freeing the transmitted skbs.

Instead use the standard RING_FINAL_CHECK_FOR_RESPONSES() macro to be
notified once the next Tx response is placed on the ring.
Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7d0105b5

net: mv643xx_eth: fix packet corruption with TSO and tiny unaligned packets. · 3b89624a

Nicolas Schichan authored Jan 26, 2016

The code in txq_put_data() would use txq->tx_curr_desc to index the
tso_hdrs/tso_hdrs_dma buffers, for less than 8 bytes unaligned
fragments, which is already moved to the next descriptor at the
beginning of the function.

If that fragment was the last of the the skb, the next skb would use
that same space to place the ip headers, overwritting that small
fragment data.

Fixes: 91986fd3 (net: mv643xx_eth: Ensure proper data alignment in TSO TX path)
Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Reviewed-by: Philipp Kirchhofer <philipp@familie-kirchhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

3b89624a

of: of_mdio: Ensure mdio device is a PHY · 6ed74236

Andrew Lunn authored Jan 26, 2016

of_phy_find_device() is used to find the phy device associated with a
device node. It is expected the node is for a PHY device, but in fact
it could of been probed as a generic MDIO device. Ensure the device is
a PHY before returning it.

Fixes: a9049e0c ("mdio: Add support for mdio drivers.")
Reported-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Reported-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6ed74236

Merge tag 'mac80211-for-davem-2016-01-26' of... · bd7c5e31

David S. Miller authored Jan 28, 2016

Merge tag 'mac80211-for-davem-2016-01-26' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Here's a first set of fixes for the 4.5-rc cycle:
 * make regulatory messages much less verbose by default
 * various remain-on-channel fixes
 * scheduled scanning fixes with hardware restart
 * a PS-Poll handling fix; was broken just recently
 * bugfix to avoid buffering non-bufferable MMPDUs
 * world regulatory domain data fix
 * a fix for scanning causing other work to get stuck
 * hwsim: revert an older problematic patch that caused some
   userspace tools to have issues - not that big a deal as
   it's a debug only driver though
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

bd7c5e31

net: Fix dependencies for !HAS_IOMEM archs · c731f0e3

Richard Weinberger authored Jan 25, 2016

Not every arch has io memory.
So, unbreak the build by fixing the dependencies.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: David S. Miller <davem@davemloft.net>

c731f0e3

tcp: fix tcp_mark_head_lost to check skb len before fragmenting · d88270ee

Neal Cardwell authored Jan 25, 2016

This commit fixes a corner case in tcp_mark_head_lost() which was
causing the WARN_ON(len > skb->len) in tcp_fragment() to fire.

tcp_mark_head_lost() was assuming that if a packet has
tcp_skb_pcount(skb) of N, then it's safe to fragment off a prefix of
M*mss bytes, for any M < N. But with the tricky way TCP pcounts are
maintained, this is not always true.

For example, suppose the sender sends 4 1-byte packets and have the
last 3 packet sacked. It will merge the last 3 packets in the write
queue into an skb with pcount = 3 and len = 3 bytes. If another
recovery happens after a sack reneging event, tcp_mark_head_lost()
may attempt to split the skb assuming it has more than 2*MSS bytes.

This sounds very counterintuitive, but as the commit description for
the related commit c0638c24 ("tcp: don't fragment SACKed skbs in
tcp_mark_head_lost()") notes, this is because tcp_shifted_skb()
coalesces adjacent regions of SACKed skbs, and when doing this it
preserves the sum of their packet counts in order to reflect the
real-world dynamics on the wire. The c0638c24 commit tried to
avoid problems by not fragmenting SACKed skbs, since SACKed skbs are
where the non-proportionality between pcount and skb->len/mss is known
to be possible. However, that commit did not handle the case where
during a reneging event one of these weird SACKed skbs becomes an
un-SACKed skb, which tcp_mark_head_lost() can then try to fragment.

The fix is to simply mark the entire skb lost when this happens.
This makes the recovery slightly more aggressive in such corner
cases before we detect reordering. But once we detect reordering
this code path is by-passed because FACK is disabled.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d88270ee

inet: frag: Always orphan skbs inside ip_defrag() · 8282f274

Joe Stringer authored Jan 22, 2016

Later parts of the stack (including fragmentation) expect that there is
never a socket attached to frag in a frag_list, however this invariant
was not enforced on all defrag paths. This could lead to the
BUG_ON(skb->sk) during ip_do_fragment(), as per the call stack at the
end of this commit message.

While the call could be added to openvswitch to fix this particular
error, the head and tail of the frags list are already orphaned
indirectly inside ip_defrag(), so it seems like the remaining fragments
should all be orphaned in all circumstances.

kernel BUG at net/ipv4/ip_output.c:586!
[...]
Call Trace:
 <IRQ>
 [<ffffffffa0205270>] ? do_output.isra.29+0x1b0/0x1b0 [openvswitch]
 [<ffffffffa02167a7>] ovs_fragment+0xcc/0x214 [openvswitch]
 [<ffffffff81667830>] ? dst_discard_out+0x20/0x20
 [<ffffffff81667810>] ? dst_ifdown+0x80/0x80
 [<ffffffffa0212072>] ? find_bucket.isra.2+0x62/0x70 [openvswitch]
 [<ffffffff810e0ba5>] ? mod_timer_pending+0x65/0x210
 [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
 [<ffffffffa03205a2>] ? nf_conntrack_in+0x252/0x500 [nf_conntrack]
 [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
 [<ffffffffa02051a3>] do_output.isra.29+0xe3/0x1b0 [openvswitch]
 [<ffffffffa0206411>] do_execute_actions+0xe11/0x11f0 [openvswitch]
 [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
 [<ffffffffa0206822>] ovs_execute_actions+0x32/0xd0 [openvswitch]
 [<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch]
 [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
 [<ffffffffa02068a2>] ovs_execute_actions+0xb2/0xd0 [openvswitch]
 [<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch]
 [<ffffffffa0215019>] ? ovs_ct_get_labels+0x49/0x80 [openvswitch]
 [<ffffffffa0213a1d>] ovs_vport_receive+0x5d/0xa0 [openvswitch]
 [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
 [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
 [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
 [<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch]
 [<ffffffffa02148fc>] internal_dev_xmit+0x6c/0x140 [openvswitch]
 [<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch]
 [<ffffffff81660299>] dev_hard_start_xmit+0x2b9/0x5e0
 [<ffffffff8165fc21>] ? netif_skb_features+0xd1/0x1f0
 [<ffffffff81660f20>] __dev_queue_xmit+0x800/0x930
 [<ffffffff81660770>] ? __dev_queue_xmit+0x50/0x930
 [<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90
 [<ffffffff81669876>] ? neigh_resolve_output+0x106/0x220
 [<ffffffff81661060>] dev_queue_xmit+0x10/0x20
 [<ffffffff816698e8>] neigh_resolve_output+0x178/0x220
 [<ffffffff816a8e6f>] ? ip_finish_output2+0x1ff/0x590
 [<ffffffff816a8e6f>] ip_finish_output2+0x1ff/0x590
 [<ffffffff816a8cee>] ? ip_finish_output2+0x7e/0x590
 [<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0
 [<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0
 [<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80
 [<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340
 [<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190
 [<ffffffff816ab4c0>] ip_output+0x70/0x110
 [<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80
 [<ffffffff816aa9f9>] ip_local_out+0x39/0x70
 [<ffffffff816abf89>] ip_send_skb+0x19/0x40
 [<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40
 [<ffffffff816df21a>] icmp_push_reply+0xea/0x120
 [<ffffffff816df93d>] icmp_reply.constprop.23+0x1ed/0x230
 [<ffffffff816df9ce>] icmp_echo.part.21+0x4e/0x50
 [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
 [<ffffffff810d5f9e>] ? rcu_read_lock_held+0x5e/0x70
 [<ffffffff816dfa06>] icmp_echo+0x36/0x70
 [<ffffffff816e0d11>] icmp_rcv+0x271/0x450
 [<ffffffff816a4ca7>] ip_local_deliver_finish+0x127/0x3a0
 [<ffffffff816a4bc1>] ? ip_local_deliver_finish+0x41/0x3a0
 [<ffffffff816a5160>] ip_local_deliver+0x60/0xd0
 [<ffffffff816a4b80>] ? ip_rcv_finish+0x560/0x560
 [<ffffffff816a46fd>] ip_rcv_finish+0xdd/0x560
 [<ffffffff816a5453>] ip_rcv+0x283/0x3e0
 [<ffffffff810b6302>] ? match_held_lock+0x192/0x200
 [<ffffffff816a4620>] ? inet_del_offload+0x40/0x40
 [<ffffffff8165d062>] __netif_receive_skb_core+0x392/0xae0
 [<ffffffff8165e68e>] ? process_backlog+0x8e/0x230
 [<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90
 [<ffffffff8165d7c8>] __netif_receive_skb+0x18/0x60
 [<ffffffff8165e678>] process_backlog+0x78/0x230
 [<ffffffff8165e6dd>] ? process_backlog+0xdd/0x230
 [<ffffffff8165e355>] net_rx_action+0x155/0x400
 [<ffffffff8106b48c>] __do_softirq+0xcc/0x420
 [<ffffffff816a8e87>] ? ip_finish_output2+0x217/0x590
 [<ffffffff8178e78c>] do_softirq_own_stack+0x1c/0x30
 <EOI>
 [<ffffffff8106b88e>] do_softirq+0x4e/0x60
 [<ffffffff8106b948>] __local_bh_enable_ip+0xa8/0xb0
 [<ffffffff816a8eb0>] ip_finish_output2+0x240/0x590
 [<ffffffff816a9a31>] ? ip_do_fragment+0x831/0x8a0
 [<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0
 [<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0
 [<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80
 [<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340
 [<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190
 [<ffffffff816ab4c0>] ip_output+0x70/0x110
 [<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80
 [<ffffffff816aa9f9>] ip_local_out+0x39/0x70
 [<ffffffff816abf89>] ip_send_skb+0x19/0x40
 [<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40
 [<ffffffff816d55d3>] raw_sendmsg+0x7d3/0xc30
 [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
 [<ffffffff816e7557>] ? inet_sendmsg+0xc7/0x1d0
 [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
 [<ffffffff816e759a>] inet_sendmsg+0x10a/0x1d0
 [<ffffffff816e7495>] ? inet_sendmsg+0x5/0x1d0
 [<ffffffff8163e398>] sock_sendmsg+0x38/0x50
 [<ffffffff8163ec5f>] ___sys_sendmsg+0x25f/0x270
 [<ffffffff811aadad>] ? handle_mm_fault+0x8dd/0x1320
 [<ffffffff8178c147>] ? _raw_spin_unlock+0x27/0x40
 [<ffffffff810529b2>] ? __do_page_fault+0x1e2/0x460
 [<ffffffff81204886>] ? __fget_light+0x66/0x90
 [<ffffffff8163f8e2>] __sys_sendmsg+0x42/0x80
 [<ffffffff8163f932>] SyS_sendmsg+0x12/0x20
 [<ffffffff8178cb17>] entry_SYSCALL_64_fastpath+0x12/0x6f
Code: 00 00 44 89 e0 e9 7c fb ff ff 4c 89 ff e8 e7 e7 ff ff 41 8b 9d 80 00 00 00 2b 5d d4 89 d8 c1 f8 03 0f b7 c0 e9 33 ff ff f
 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48
RIP  [<ffffffff816a9a92>] ip_do_fragment+0x892/0x8a0
 RSP <ffff88006d603170>

Fixes: 7f8a436e ("openvswitch: Add conntrack action")
Signed-off-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

8282f274

28 Jan, 2016 15 commits

Merge branch 'sctp-transport-races' · 2cc5e4ca

David S. Miller authored Jan 28, 2016

Xin Long says:

====================
fix the transport dead race check by using atomic_add_unless on refcnt

  sctp: fix the transport dead race check by using atomic_add_unless on
    refcnt
  sctp: hold transport before we access t->asoc in sctp proc
  sctp: remove the dead field of sctp_transport
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

2cc5e4ca

sctp: remove the dead field of sctp_transport · 47faa1e4

Xin Long authored Jan 22, 2016

After we use refcnt to check if transport is alive, the dead can be
removed from sctp_transport.

The traversal of transport_addr_list in procfs dump is using
list_for_each_entry_rcu, no need to check if it has been freed.

sctp_generate_t3_rtx_event and sctp_generate_heartbeat_event is
protected by sock lock, it's not necessary to check dead, either.
also, the timers are cancelled when sctp_transport_free() is
called, that it doesn't wait for refcnt to reach 0 to cancel them.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

47faa1e4

sctp: hold transport before we access t->asoc in sctp proc · fba4c330

Xin Long authored Jan 22, 2016

Previously, before rhashtable, /proc assoc listing was done by
read-locking the entire hash entry and dumping all assocs at once, so we
were sure that the assoc wasn't freed because it wouldn't be possible to
remove it from the hash meanwhile.

Now we use rhashtable to list transports, and dump entries one by one.
That is, now we have to check if the assoc is still a good one, as the
transport we got may be being freed.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fba4c330

sctp: fix the transport dead race check by using atomic_add_unless on refcnt · 1eed6779

Xin Long authored Jan 22, 2016

Now when __sctp_lookup_association is running in BH, it will try to
check if t->dead is set, but meanwhile other CPUs may be freeing this
transport and this assoc and if it happens that
__sctp_lookup_association checked t->dead a bit too early, it may think
that the association is still good while it was already freed.

So we fix this race by using atomic_add_unless in sctp_transport_hold.
After we get one transport from hashtable, we will hold it only when
this transport's refcnt is not 0, so that we can make sure t->asoc
cannot be freed before we hold the asoc again.

Note that sctp association is not freed using RCU so we can't use
atomic_add_unless() with it as it may just be too late for that either.

Fixes: 4f008781 ("sctp: apply rhashtable api to send/recv path")
Reported-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1eed6779

Merge branch 'mlxsw-fixes' · 2baaa2d1

David S. Miller authored Jan 28, 2016

Jiri Pirko says:

====================
mlxsw: driver fixes

Couple of various mlxsw driver fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

2baaa2d1

mlxsw: reg: Use correct offset in field definiton · bbeeda27

Ido Schimmel authored Jan 27, 2016

The rx_lane, tx_lane and module fields in the PMLP register don't have
an additional offset besides the base one (0x04), so set it to 0x00.

Fixes: 4ec14b76 ("mlxsw: Add interface to access registers and process events")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bbeeda27

mlxsw: spectrum: Compare local ports instead of pointers · 3f47f867

Ido Schimmel authored Jan 27, 2016

When dumping the FDB we can't compare the actual pointers of the ports
structs, as it's possible the struct represents a vPort instead of the
underlying physical port.

Solve this by comparing the local port number instead, as it's shared
between the physical ports and all the vPorts on top of him.

Fixes: 54a73201 ("mlxsw: spectrum: Adjust switchdev ops for VLAN devices")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3f47f867

mlxsw: spectrum: Dump LAG FDB records only once · 304f5158

Ido Schimmel authored Jan 27, 2016

LAG FDB records can only point to LAG devices or VLAN devices configured
on top of them. Therefore, when dumping the FDB we shouldn't associate
these records with the underlying physical ports.

Fixes: 8a1ab5d7 ("mlxsw: spectrum: Implement FDB add/remove/dump for LAG")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

304f5158

mlxsw: spectrum: Use correct netdev when notifying bridge · e43aca22

Ido Schimmel authored Jan 27, 2016

LAG FDB entries pointing to VLAN devices should be reported to the
bridge with the matching VLAN device and not the underlying LAG device.

Fixes: aac78a44 ("mlxsw: spectrum: Adjust FDB notifications for VLAN devices")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e43aca22

mlxsw: spectrum: Don't report VLAN for 802.1D FDB entries · 004f85ea

Ido Schimmel authored Jan 27, 2016

When dumping the hardware FDB we should report entries pointing to VLAN
devices with VLAN 0, as packets coming into the bridge are untagged.
Likewise, pass FDB_{ADD,DEL} notifications with VLAN 0 for these
devices.

Fixes: 54a73201 ("mlxsw: spectrum: Adjust switchdev ops for VLAN devices")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

004f85ea

mlxsw: spectrum: Notify bridge's FDB only based on learning_sync · 45827d78

Ido Schimmel authored Jan 27, 2016

When we disable learning on bridge port we should still update the
software bridge's FDB when entry pointing to this bridge port is
aged-out. We can otherwise have an inconsistency between software and
hardware tables.

Fixes: 8a1ab5d7 ("mlxsw: spectrum: Implement FDB add/remove/dump for LAG")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

45827d78

mlxsw: spectrum: Disable learning according to STP state · 45491133

Ido Schimmel authored Jan 27, 2016

When port is put into LISTENING state it shouldn't populate the FDB, so
set the port's STP state in hardware to DISCARDING instead of LEARNING.
It will therefore keep listening to BPDU packets, but discard other
non-control packets and won't perform any learning.

Fixes: 56ade8fe ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

45491133

mlxsw: spectrum: Don't forward packets when STP state is DISABLED · 9cb026eb

Ido Schimmel authored Jan 27, 2016

When STP state is set to DISABLED the port is assumed to be inactive, but
currently we forward packets ingressing through it.

Instead, set the port's STP state in hardware to DISCARDING, which means
it doesn't forward packets or perform any learning, but it does trap
control packets. However, these packets will be dropped by bridge code,
which results in the expected behavior.

Fixes: 56ade8fe ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9cb026eb

mlxsw: spectrum: Flush FDB when leaving bridge · 039c49a6

Ido Schimmel authored Jan 27, 2016

As explained in previous commit, we should always take care of flushing
the FDB in the driver and not rely on bridge code.

We need to distinguish between two cases with regards to LAG:

1) Port is leaving LAG while LAG is bridged (or VLAN devices on top of
it). In this case don't flush the FDB entries pointing to the LAG ID, as
this will affect other ports still member in the LAG. Only flush the FDB
when the last port in the LAG is leaving the bridge.

2) LAG device is leaving the bridge. In this case the CHANGEUPPER event
is simply propagated to each member port, so make each port flush the
FDB in its turn.

Note that emptying a bridged LAG from ports creates an inconsistency
between hardware and software. A user who later (< ageing_time)
re-populates the LAG won't have any FDB entries pointing to the LAG ID
in hardware, but they will be present in the software bridge's FDB.
Currently there is no good solution to this problem, but this will be
addressed by us in the future.

In order to optimize the flushing process, flush by port or LAG ID if
there are no VLAN interfaces on top of the port. Otherwise, flush using
(Port / LAG ID, FID=VID} for each of the lower 4K FIDs. In the case of
VLAN device simply flush using {Port / LAG ID, vFID} with the vFID to
which the VLAN device is mapped to.

Fixes: 56ade8fe ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

039c49a6

mlxsw: reg: Add the Switch Filtering DB Flush register · 41933271

Ido Schimmel authored Jan 27, 2016

When removing a net device from a bridge we should flush the FDB entries
associated with this net device. Up until now, we relied upon bridge
code to do that for us, but it is possible for user to prevent hardware
from syncing with the software bridge (learning_sync=0), so we need to
flush overselves.

Add the Switch Filtering DB Flush (SFDF) register that is used to flush
FDB entries according to different parameters (per-port, per-FID etc).

Fixes: 56ade8fe ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

41933271