1. 10 Sep, 2016 1 commit
    • sctp: identify chunks that need to be fragmented at IP level · 7303a147
      Marcelo Ricardo Leitner authored
      Previously, without GSO, it was easy to identify it: if the chunk didn't
      fit and there was no data chunk in the packet yet, we could fragment at
      IP level. So if there was an auth chunk and we were bundling a big data
      chunk, it would fragment regardless of the size of the auth chunk. This
      also works for the context of PMTU reductions.
      
      But with GSO, we cannot distinguish such PMTU events anymore, as the
      packet is allowed to exceed PMTU.
      
      So we need another check: ensure that the chunk we are adding actually
      fits the current PMTU. If it doesn't, trigger a flush and let it be
      fragmented at IP level in the next round.
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
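The extra check described in the sctp commit above can be sketched in user-space C. This is a simplified illustration, not the kernel code: `struct packet_sketch`, `will_fit_sketch()` and the verdict names are hypothetical, and "no data chunk in the packet yet" is approximated as "the packet holds only its headers".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified packet state; not the kernel's struct sctp_packet. */
struct packet_sketch {
    size_t size;     /* bytes queued so far, headers included */
    size_t overhead; /* header bytes present even in an empty packet */
    size_t pathmtu;  /* current path MTU */
    bool gso;        /* true if the packet as a whole may exceed pathmtu */
};

enum append_verdict { APPEND_OK, APPEND_FLUSH_FIRST };

static enum append_verdict will_fit_sketch(const struct packet_sketch *p,
                                           size_t chunk_len)
{
    bool empty = (p->size == p->overhead);

    assert(p->size >= p->overhead);

    /* An empty packet always accepts the chunk; if the chunk is too big,
     * it is left to IP-level fragmentation. */
    if (empty)
        return APPEND_OK;

    /* The extra check: the chunk by itself must fit the PMTU, even when
     * GSO lets the packet as a whole grow past it. Otherwise flush first,
     * so the chunk is retried on an empty packet in the next round. */
    if (p->overhead + chunk_len > p->pathmtu)
        return APPEND_FLUSH_FIRST;

    /* Without GSO the packet as a whole must also fit. */
    if (!p->gso && p->size + chunk_len > p->pathmtu)
        return APPEND_FLUSH_FIRST;

    return APPEND_OK;
}
```

With GSO enabled, a 2000-byte chunk offered to a partly filled packet on a 1500-byte path yields `APPEND_FLUSH_FIRST`; offered again to the now empty packet, it is accepted and left to IP fragmentation.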
  2. 09 Sep, 2016 8 commits
  3. 08 Sep, 2016 9 commits
  4. 06 Sep, 2016 6 commits
    • ipv6: addrconf: fix dev refcont leak when DAD failed · 751eb6b6
      Wei Yongjun authored
      In general, when DAD detects a duplicate IPv6 address, ifp->state
      is set to INET6_IFADDR_STATE_ERRDAD and DAD is stopped by a
      delayed work. The call tree looks like this:
      
      ndisc_recv_ns
        -> addrconf_dad_failure        <- missing ifp put
           -> addrconf_mod_dad_work
             -> schedule addrconf_dad_work()
               -> addrconf_dad_stop()  <- missing ifp hold before call it
      
      addrconf_dad_failure() is called with a reference on ifp held, but
      never puts it. addrconf_dad_work() calls addrconf_dad_stop() without
      taking an extra reference. Normally this does not cause any issue.

      But a race between addrconf_dad_failure() and addrconf_dad_work()
      may leak an ifp reference, so the netdevice cannot be unregistered.
      dmesg shows the following messages:
      
      IPv6: eth0: IPv6 duplicate address fe80::XX:XXXX:XXXX:XX detected!
      ...
      unregister_netdevice: waiting for eth0 to become free. Usage count = 1
      
      Cc: stable@vger.kernel.org
      Fixes: c15b1cca ("ipv6: move DAD and addrconf_verify processing
      to workqueue")
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Fix TX push operation on ARM64. · 9d13744b
      Michael Chan authored
      There is a code path where we are calling __iowrite64_copy() on
      an address that is not 64-bit aligned.  This causes an exception on
      some architectures such as arm64.  Fix that code path by using
      __iowrite32_copy().
      Reported-by: JD Zheng <jiandong.zheng@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
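The alignment constraint behind the bnxt_en fix above can be illustrated with a user-space analogue. The commit itself simply switches the affected path to __iowrite32_copy(); the helper below, `copy_words_sketch()`, is a hypothetical stand-in that picks the store width from the destination alignment, with `memcpy` standing in for an MMIO store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies len bytes and returns the store width
 * (in bytes) that it used. */
static unsigned copy_words_sketch(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    /* 64-bit stores are only safe when the destination is 8-byte aligned
     * and the length is a whole number of 64-bit words. */
    unsigned width = (((uintptr_t)dst % 8 == 0) && (len % 8 == 0)) ? 8 : 4;

    assert(len % 4 == 0);
    for (size_t off = 0; off < len; off += width)
        memcpy(d + off, s + off, width); /* stands in for one MMIO store */
    return width;
}
```

A destination that is only 4-byte aligned falls back to 32-bit stores, which is the width the commit uses unconditionally on that code path.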
    • net: Don't delete routes in different VRFs · 5a56a0b3
      Mark Tomlinson authored
      When deleting an IP address from an interface, there is a clean-up of
      routes which refer to this local address. However, there was no check to
      see that the VRF matched. This meant that deletion wasn't confined to
      the VRF it should have been.
      
      To solve this, a new field has been added to fib_info to hold a table
      id. When removing fib entries corresponding to a local ip address, this
      table id is also used in the comparison.
      
      The table id is populated when the fib_info is created. This was already
      done in some places, but not in ip_rt_ioctl(). This has now been fixed.
      
      Fixes: 021dd3b8 ("net: Add routes to the table associated with the device")
      Acked-by: David Ahern <dsa@cumulusnetworks.com>
      Tested-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
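The table id comparison described in the VRF commit above can be sketched as follows. This is a minimal illustration under assumptions: `struct fib_info_sketch`, its field names and `sync_down_addr_sketch()` are hypothetical and only mirror the idea of storing a table id in each entry and comparing it during clean-up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced route-info entry. */
struct fib_info_sketch {
    uint32_t prefsrc; /* preferred source (local) address */
    uint32_t tb_id;   /* table id, filled in when the entry is created */
    bool dead;        /* set once the entry is marked for removal */
};

/* Marks entries that refer to the deleted local address, but only in the
 * table (VRF) that the address belonged to. Returns the number marked. */
static int sync_down_addr_sketch(struct fib_info_sketch *fi, size_t n,
                                 uint32_t tb_id, uint32_t local)
{
    int marked = 0;

    assert(fi != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (fi[i].tb_id != tb_id)
            continue; /* same address in another VRF: leave it alone */
        if (fi[i].prefsrc == local && !fi[i].dead) {
            fi[i].dead = true;
            marked++;
        }
    }
    return marked;
}
```

Two entries with the same local address but different table ids are now treated independently; without the `tb_id` comparison both would be marked.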
    • net: smsc: remove build warning of duplicate definition · daa7ee8d
      Sudip Mukherjee authored
      The m32r build was giving these warnings:
      
      In file included from drivers/net/ethernet/smsc/smc91x.c:92:0:
      drivers/net/ethernet/smsc/smc91x.h:448:0: warning: "SMC_inb" redefined
       #define SMC_inb(ioaddr, reg)  ({ BUG(); 0; })
      
      drivers/net/ethernet/smsc/smc91x.h:106:0:
      	note: this is the location of the previous definition
       #define SMC_inb(a, r)  inb(((u32)a) + (r))
      
      drivers/net/ethernet/smsc/smc91x.h:449:0: warning: "SMC_outb" redefined
       #define SMC_outb(x, ioaddr, reg) BUG()
      
      drivers/net/ethernet/smsc/smc91x.h:108:0:
      	note: this is the location of the previous definition
       #define SMC_outb(v, a, r) outb(v, ((u32)a) + (r))
      Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: macb: initialize checksum when using checksum offloading · 007e4ba3
      Helmut Buchsbaum authored
      I'm still struggling to get this fix right..
      
      Changes since v2:
       - do not blindly modify SKB contents according to Dave's legitimate
         objection
      
      Changes since v1:
       - dropped disabling HW checksum offload for Zynq
       - initialize checksum similar to net/ethernet/freescale/fec_main.c
      
      -- >8 --
      MACB/GEM needs the checksum field initialized to 0 to get correct
      results on transmit in all cases; e.g. on Zynq, UDP packets with
      payload <= 2 otherwise contain a wrong checksum.
      Signed-off-by: Helmut Buchsbaum <helmut.buchsbaum@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: release dst in ping_v6_sendmsg · 03c2778a
      Dave Jones authored
      Neither the failure nor the success path of ping_v6_sendmsg releases
      the dst it acquires.  This leads to a flood of warnings from
      "net/core/dst.c:288 dst_release" on older kernels that
      don't have 8bf4ada2 backported.
      
      That patch optimistically hoped this had been fixed post 3.10, but
      it seems at least one case wasn't, where I've seen this triggered
      a lot from machines doing unprivileged icmp sockets.
      
      Cc: Martin Lau <kafai@fb.com>
      Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
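The pattern behind the ping_v6_sendmsg fix above, releasing an acquired reference on every exit path, can be sketched with a single clean-up label. Everything here is hypothetical (`struct dst_sketch`, `dst_lookup_sketch()`, `dst_release_sketch()`, `sendmsg_sketch()`); it only models the reference count, not the kernel's dst handling.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical reference-counted route entry. */
struct dst_sketch {
    int refcnt;
};

/* The lookup hands back the entry with one extra reference taken. */
static struct dst_sketch *dst_lookup_sketch(struct dst_sketch *d)
{
    d->refcnt++;
    return d;
}

static void dst_release_sketch(struct dst_sketch *d)
{
    assert(d->refcnt > 0);
    d->refcnt--;
}

/* Both the failure and the success path leave through "out", so the
 * reference taken by the lookup is always dropped exactly once. */
static int sendmsg_sketch(struct dst_sketch *route, bool fail)
{
    struct dst_sketch *dst = dst_lookup_sketch(route);
    int err = 0;

    if (fail) {
        err = -EINVAL;
        goto out;
    }
    /* ... build and transmit the packet using dst ... */
out:
    dst_release_sketch(dst);
    return err;
}
```

After either outcome the reference count is back at its starting value, which is the property the commit restores.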
  5. 04 Sep, 2016 6 commits
    • af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock' · 6e1ce3c3
      Linus Torvalds authored
      Right now we use the 'readlock' both for protecting some of the af_unix
      IO path and for making the bind be single-threaded.
      
      The two are independent, but using the same lock makes for a nasty
      deadlock due to ordering with regard to filesystem locking.  The bind
      locking would want to nest outside the VFS pathname locking, but the IO
      locking wants to nest inside some of those same locks.
      
      We tried to fix this earlier with commit c845acb3 ("af_unix: Fix
      splice-bind deadlock") which moved the readlock inside the vfs locks,
      but that caused problems with overlayfs that will then call back into
      filesystem routines that take the lock in the wrong order anyway.
      
      Splitting the locks means that we can go back to having the bind lock be
      the outermost lock, and we don't have any deadlocks with lock ordering.
      Acked-by: Rainer Weikusat <rweikusat@cyberadapt.com>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "af_unix: Fix splice-bind deadlock" · 38f7bd94
      Linus Torvalds authored
      This reverts commit c845acb3.
      
      It turns out that it just replaces one deadlock with another one: we can
      still get the wrong lock ordering with the readlock due to overlayfs
      calling back into the filesystem layer and still taking the vfs locks
      after the readlock.
      
      The proper solution ends up being to just split the readlock into two
      pieces: the bind lock (taken *outside* the vfs locks) and the IO lock
      (taken *inside* the filesystem locks).  The two locks are independent
      anyway.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'vxlan-fixes' · 2f83a53a
      David S. Miller authored
      Jiri Benc says:
      
      ====================
      vxlan: fix error reporting
      
      This patchset improves checking for invalid configuration in VXLAN and
      fixes problems with duplicated and inappropriate error messages.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: fix duplicated and wrong error messages · 3555621d
      Jiri Benc authored
      vxlan_dev_configure outputs error messages before returning, so there is
      no need to print the same messages again in vxlan_newlink. Also,
      vxlan_dev_configure may return a particular error code for a different
      reason than vxlan_newlink thinks.
      
      Move the remaining error messages into vxlan_dev_configure and let
      vxlan_newlink just pass on the error code.
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: reject multicast destination without an interface · 9b4cdd51
      Jiri Benc authored
      Currently, the kernel accepts configurations such as:
      
        ip l a type vxlan dstport 4789 id 1 group 239.192.0.1
        ip l a type vxlan dstport 4789 id 1 group ff0e::110
      
      However, neither of those really works. In the IPv4 case, the interface
      cannot be brought up ("RTNETLINK answers: No such device"). This is because
      multicast join will be rejected without the interface being specified.
      
      In the IPv6 case, multicast will be joined on the first interface found. This
      is not what the user wants as it depends on random factors (order of
      interfaces).
      
      Note that it's possible to add a local address but it doesn't solve
      anything. For IPv4, it's not considered in the multicast join (thus the same
      error as above is returned on ifup). This could be added but it wouldn't
      help for IPv6 anyway. For IPv6, we do need the interface.
      
      Just reject a configuration that sets multicast address and does not provide
      an interface. Nobody can depend on the previous behavior as it never worked.
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
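The validation rule from the vxlan commit above (a multicast destination requires an interface) can be sketched as a pair of small checks. The function names and the choice of -EINVAL are assumptions made for illustration; the commit text does not state which error code is returned. IPv4 addresses are taken in host byte order.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* IPv4 multicast is 224.0.0.0/4 (address given in host byte order). */
static bool is_multicast_v4(uint32_t addr)
{
    return (addr & 0xf0000000u) == 0xe0000000u;
}

/* IPv6 multicast is ff00::/8. */
static bool is_multicast_v6(const uint8_t addr[16])
{
    return addr[0] == 0xff;
}

/* Hypothetical checks: ifindex 0 means "no interface given". */
static int check_remote_v4_sketch(uint32_t remote, int ifindex)
{
    assert(ifindex >= 0);
    if (is_multicast_v4(remote) && ifindex == 0)
        return -EINVAL;
    return 0;
}

static int check_remote_v6_sketch(const uint8_t remote[16], int ifindex)
{
    assert(ifindex >= 0);
    if (is_multicast_v6(remote) && ifindex == 0)
        return -EINVAL;
    return 0;
}
```

Under this sketch, both example configurations from the commit message (group 239.192.0.1 and group ff0e::110, each without a device) are rejected, while the same groups with an interface, or a unicast remote without one, pass.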
    • bonding: Fix bonding crash · 24b27fc4
      Mahesh Bandewar authored
      The following steps will crash the kernel:
      
        (a) Create bonding master
            > modprobe bonding miimon=50
        (b) Create macvlan bridge on eth2
            > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \
      	   type macvlan
        (c) Now try adding eth2 into the bond
            > echo +eth2 > /sys/class/net/bond0/bonding/slaves
            <crash>
      
      Bonding does lots of things before checking if the device enslaved is
      busy or not.
      
      In this case, when the notifier call-chain sends notifications,
      bond_netdev_event() assumes that the rx_handler/rx_handler_data is
      registered, while bond_enslave() hasn't progressed far enough to
      register the rx_handler for the new slave.
      
      This patch adds a rx_handler check that can be performed right at the
      beginning of the enslave code to avoid getting into this situation.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
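The early rx_handler check described in the bonding commit above can be sketched like this. `struct netdev_sketch`, `enslave_sketch()` and the -EBUSY return value are hypothetical; the point is only that the busy check runs before any other enslave step.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced net device: one rx_handler slot plus a counter
 * that records how much enslave work has been done on it. */
struct netdev_sketch {
    void *rx_handler;
    int setup_steps_run;
};

static int enslave_sketch(struct netdev_sketch *slave, void *bond_handler)
{
    assert(bond_handler != NULL);

    /* Check first: a device that already has an rx_handler (for example
     * one with a macvlan on top) is busy, so bail out before doing any
     * work that could trigger notifications. */
    if (slave->rx_handler != NULL)
        return -EBUSY;

    slave->setup_steps_run++;         /* stands in for the enslave steps */
    slave->rx_handler = bond_handler; /* registered last, as before */
    return 0;
}
```

A device whose handler slot is already taken is rejected with no side effects, so no notification can observe a half-enslaved device.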
  6. 03 Sep, 2016 5 commits
  7. 02 Sep, 2016 4 commits
    • l2tp: fix use-after-free during module unload · 2f86953e
      Sabrina Dubroca authored
      Tunnel deletion is delayed by both a workqueue (l2tp_tunnel_delete -> wq
       -> l2tp_tunnel_del_work) and RCU (sk_destruct -> RCU ->
      l2tp_tunnel_destruct).
      
      By the time l2tp_tunnel_destruct() runs to destroy the tunnel and finish
      destroying the socket, the private data reserved via the net_generic
      mechanism has already been freed, but l2tp_tunnel_destruct() actually
      uses this data.
      
      Make sure tunnel deletion for the netns has completed before returning
      from l2tp_exit_net() by first flushing the tunnel removal workqueue, and
      then waiting for RCU callbacks to complete.
      
      Fixes: 167eb17e ("l2tp: create tunnel sockets in the right namespace")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Don't unset flowi6_proto in ipxip6_tnl_xmit() · ab343801
      Eli Cooper authored
      Commit 8eb30be0 ("ipv6: Create ip6_tnl_xmit") unsets
      flowi6_proto in ip4ip6_tnl_xmit() and ip6ip6_tnl_xmit().
      Since xfrm_selector_match() relies on this info, IPv6 packets
      sent by an ip6tunnel cannot be properly selected by their
      protocols after removing it. This patch puts flowi6_proto back.
      
      Cc: stable@vger.kernel.org
      Fixes: 8eb30be0 ("ipv6: Create ip6_tnl_xmit")
      Signed-off-by: Eli Cooper <elicooper@gmx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnx2x: don't reset chip on cleanup if PCI function is offline · b44e108b
      Guilherme G. Piccoli authored
      When a PCI error is detected, some architectures (like PowerPC) perform a
      slot reset. The driver's error handlers are in charge of disabling the
      device before the reset and re-enabling it after a successful slot reset.

      There are two cases, though, in which the code takes another path: if the
      slot reset is not successful, or if too many errors have already happened
      on the specific adapter (meaning that the device is possibly experiencing
      a HW failure that a slot reset cannot solve). The core PCI error mechanism
      (called EEH on PowerPC) will then remove the adapter from the system, since
      it considers this a permanent failure of the device. In this case, a path
      is taken that leads to bnx2x_chip_cleanup() calling bnx2x_reset_hw(), which
      then tries to perform a HW reset on the chip. This reset won't succeed since
      the HW is in a fault state, which can be seen from multiple messages in the
      kernel log like the ones below:
      
      	bnx2x: [bnx2x_issue_dmae_with_comp:552(eth1)]DMAE timeout!
      	bnx2x: [bnx2x_write_dmae:600(eth1)]DMAE returned failure -1
      
      After some time, the PCI error mechanism gives up waiting for the driver's
      correct removal procedure and forcibly removes the adapter from the system.
      A soft lockup can be seen while the core PCI error mechanism is waiting for
      the driver to complete the removal process.
      
      This patch adds a check to avoid a chip reset whenever the function
      is in PCI error state. Since this case is only reached when a device is
      being removed because of a permanent failure, the HW chip reset is
      neither expected to work nor necessary.
      
      Also, as a minor improvement in error path, we avoid the MCP information dump
      in case of non-recoverable PCI error (when adapter is about to be removed),
      since it will certainly fail.
      Reported-by: Harsha Thyagaraja <hathyaga@in.ibm.com>
      Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
      Acked-By: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rps: flow_dissector: Fix uninitialized flow_keys used in __skb_get_hash possibly · 635c223c
      Gao Feng authored
      The original code depends on the function arguments being evaluated from
      left to right, but the C standard leaves the order of evaluation of
      function arguments unspecified.

      When flow_keys_have_l4(&keys) is invoked before ___skb_get_hash(skb, &keys,
      hashrnd) with some compilers or environments, the keys passed to
      flow_keys_have_l4 are not yet initialized.
      
      Fixes: 6db61d79 ("flow_dissector: Ignore flow dissector return value from ___skb_get_hash")
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Gao Feng <fgao@ikuai8.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
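The evaluation-order problem from the flow_dissector commit above can be reproduced in miniature. All names below are hypothetical stand-ins for the kernel functions; the sketch shows the general remedy of sequencing the two calls with separate statements, which may differ in detail from the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct flow_keys. */
struct flow_keys_sketch {
    bool has_l4;
};

/* Fills in *keys as a side effect and returns the hash. */
static uint32_t dissect_and_hash_sketch(struct flow_keys_sketch *keys)
{
    keys->has_l4 = true;
    return 0x1234u;
}

static bool keys_have_l4_sketch(const struct flow_keys_sketch *keys)
{
    return keys->has_l4;
}

struct hash_result {
    uint32_t hash;
    bool l4;
};

static struct hash_result set_hash_sketch(uint32_t hash, bool l4)
{
    struct hash_result r = { hash, l4 };
    return r;
}

static struct hash_result get_hash_sketch(void)
{
    struct flow_keys_sketch keys;

    /* Fragile form, shown only as a comment:
     *   set_hash_sketch(dissect_and_hash_sketch(&keys),
     *                   keys_have_l4_sketch(&keys));
     * The two arguments may be evaluated in either order, so keys could
     * be read before dissect_and_hash_sketch() has written it. */

    /* Robust form: separate statements fix the order. */
    uint32_t hash = dissect_and_hash_sketch(&keys);
    bool l4 = keys_have_l4_sketch(&keys);

    assert(hash != 0);
    return set_hash_sketch(hash, l4);
}
```

Because each statement ends in a sequence point, `keys` is always initialized before it is read, regardless of compiler or target.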
  8. 01 Sep, 2016 1 commit