1. 06 Sep, 2016 1 commit
    • netfilter: nft_chain_route: re-route before skb is queued to userspace · d1a6cba5
      Liping Zhang authored
      Imagine the following situation: the user adds these nft rules and queues
      the packets to userspace for further checking:
        # ip rule add fwmark 0x0/0x1 lookup eth0
        # ip rule add fwmark 0x1/0x1 lookup eth1
        # nft add table filter
        # nft add chain filter output {type route hook output priority 0 \;}
        # nft add rule filter output mark set 0x1
        # nft add rule filter output queue num 0
      
      But after we reinject the skbuff, the packet is sent via the
      wrong route: in this case, it is routed via the eth0 table,
      not the eth1 table. This happens because we skip the re-route when
      the verdict is NF_QUEUE, even if the mark was changed.
      
      Actually, we should not touch the sk_buff if the verdict is NF_DROP or
      NF_STOLEN, and when the re-route fails, return NF_DROP with an error code.
      This is consistent with the mangle table in iptables.
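
The verdict handling described above can be sketched as a small userspace model. This is not the kernel code: `needs_reroute`, `finish_hook`, and the `V_*` enum are hypothetical names for illustration, only the mark is tracked, and the error code that the kernel attaches to the drop verdict is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Verdict names mirror the kernel's; the enum values are local to this sketch. */
enum verdict { V_DROP, V_ACCEPT, V_STOLEN, V_QUEUE };

/*
 * Decide whether the route hook should re-route after the chain ran.
 * Only the mark is tracked here; that is a simplification of this sketch.
 */
bool needs_reroute(enum verdict v, uint32_t old_mark, uint32_t new_mark)
{
    /* Dropped or stolen packets must not be touched any more. */
    if (v == V_DROP || v == V_STOLEN)
        return false;
    /* Every other verdict, queueing included, re-routes if the mark changed. */
    return old_mark != new_mark;
}

/* reroute_err < 0 models a failed re-route; the packet is then dropped. */
enum verdict finish_hook(enum verdict v, uint32_t old_mark,
                         uint32_t new_mark, int reroute_err)
{
    if (needs_reroute(v, old_mark, new_mark) && reroute_err < 0)
        return V_DROP;
    return v;
}
```

With the rules from the example, the chain sets the mark from 0x0 to 0x1 and returns the queue verdict, so `needs_reroute(V_QUEUE, 0x0, 0x1)` is true and the packet is re-routed before it reaches userspace.
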
      Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  2. 05 Sep, 2016 1 commit
  3. 04 Sep, 2016 6 commits
    • af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock' · 6e1ce3c3
      Linus Torvalds authored
      Right now we use the 'readlock' both for protecting some of the af_unix
      IO path and for making the bind be single-threaded.
      
      The two are independent, but using the same lock makes for a nasty
      deadlock due to ordering with regard to filesystem locking.  The bind
      locking would want to nest outside the VFS pathname locking, but the IO
      locking wants to nest inside some of those same locks.
      
      We tried to fix this earlier with commit c845acb3 ("af_unix: Fix
      splice-bind deadlock") which moved the readlock inside the vfs locks,
      but that caused problems with overlayfs that will then call back into
      filesystem routines that take the lock in the wrong order anyway.
      
      Splitting the locks means that we can go back to having the bind lock be
      the outermost lock, and we don't have any deadlocks with lock ordering.
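
The ordering argument can be checked with a tiny model. This is only a sketch: the rank values and the `order_is_valid` helper are hypothetical and stand in for the rule "always acquire locks in one global order"; no real locking is performed.

```c
#include <stdbool.h>
#include <stddef.h>

/* One global order that satisfies both paths once the lock is split. */
enum lock_rank { BINDLOCK = 0, VFS_LOCK = 1, IOLOCK = 2 };

/* True if every lock in the sequence has a higher rank than the one before. */
bool order_is_valid(const enum lock_rank *seq, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (seq[i] <= seq[i - 1])
            return false;
    return true;
}
```

The bind path acquires `{ BINDLOCK, VFS_LOCK }` and the IO path acquires `{ VFS_LOCK, IOLOCK }`; both are valid under the same ranking. With a single readlock no ranking works, because the bind path needs it before the VFS locks and the IO path needs it after them.
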
      Acked-by: Rainer Weikusat <rweikusat@cyberadapt.com>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "af_unix: Fix splice-bind deadlock" · 38f7bd94
      Linus Torvalds authored
      This reverts commit c845acb3.
      
      It turns out that it just replaces one deadlock with another one: we can
      still get the wrong lock ordering with the readlock due to overlayfs
      calling back into the filesystem layer and still taking the vfs locks
      after the readlock.
      
      The proper solution ends up being to just split the readlock into two
      pieces: the bind lock (taken *outside* the vfs locks) and the IO lock
      (taken *inside* the filesystem locks).  The two locks are independent
      anyway.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'vxlan-fixes' · 2f83a53a
      David S. Miller authored
      Jiri Benc says:
      
      ====================
      vxlan: fix error reporting
      
      This patchset improves checking for invalid configuration in VXLAN and
      fixes problems with duplicated and inappropriate error messages.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: fix duplicated and wrong error messages · 3555621d
      Jiri Benc authored
      vxlan_dev_configure outputs error messages before returning, so there is no
      need to print the same messages again in vxlan_newlink. Also, vxlan_dev_configure
      may return a particular error code for a different reason than vxlan_newlink
      thinks.
      
      Move the remaining error messages into vxlan_dev_configure and let
      vxlan_newlink just pass on the error code.
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: reject multicast destination without an interface · 9b4cdd51
      Jiri Benc authored
      Currently, the kernel accepts configurations such as:
      
        ip l a type vxlan dstport 4789 id 1 group 239.192.0.1
        ip l a type vxlan dstport 4789 id 1 group ff0e::110
      
      However, neither of those really works. In the IPv4 case, the interface
      cannot be brought up ("RTNETLINK answers: No such device"). This is because
      multicast join will be rejected without the interface being specified.
      
      In the IPv6 case, multicast will be joined on the first interface found. This
      is not what the user wants as it depends on random factors (order of
      interfaces).
      
      Note that it's possible to add a local address but it doesn't solve
      anything. For IPv4, it's not considered in the multicast join (thus the same
      error as above is returned on ifup). This could be added but it wouldn't
      help for IPv6 anyway. For IPv6, we do need the interface.
      
      Just reject a configuration that sets a multicast address and does not provide
      an interface. Nobody can depend on the previous behavior as it never worked.
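
A minimal sketch of such a check, written as standalone C rather than as the driver code. The helper names, the host-byte-order IPv4 argument, and the choice of `-EINVAL` as the error are assumptions of this example.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* IPv4 multicast is 224.0.0.0/4; the address is in host byte order here. */
bool ipv4_is_mcast(uint32_t addr)
{
    return (addr >> 28) == 0xE;
}

/* IPv6 multicast is ff00::/8, so only the first byte matters. */
bool ipv6_is_mcast(const uint8_t addr[16])
{
    return addr[0] == 0xff;
}

/* ifindex == 0 models "no interface given"; the error code is an assumption. */
int check_remote(bool remote_is_mcast, int ifindex)
{
    if (remote_is_mcast && ifindex == 0)
        return -EINVAL;
    return 0;
}
```

For the two commands quoted earlier, 239.192.0.1 (0xEFC00001) and ff0e::110 are both multicast and no interface is given, so `check_remote` rejects them. The same addresses with a non-zero `ifindex` pass.
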
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: Fix bonding crash · 24b27fc4
      Mahesh Bandewar authored
      The following few steps will crash the kernel:
      
        (a) Create bonding master
            > modprobe bonding miimon=50
        (b) Create macvlan bridge on eth2
            > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \
      	   type macvlan
        (c) Now try adding eth2 into the bond
            > echo +eth2 > /sys/class/net/bond0/bonding/slaves
            <crash>
      
      Bonding does lots of things before checking whether the device being
      enslaved is busy.
      
      In this case, when the notifier call-chain sends notifications,
      bond_netdev_event() assumes that the rx_handler/rx_handler_data is
      registered, while bond_enslave() hasn't progressed far enough to
      register the rx_handler for the new slave.
      
      This patch adds an rx_handler check that can be performed right at the
      beginning of the enslave code to avoid getting into this situation.
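
The idea of checking first and doing the work afterwards can be sketched as follows. This is a userspace model, not the bonding driver: `struct dev_model`, `enslave`, the two handler functions, and the `-EBUSY` return value are all hypothetical.

```c
#include <errno.h>
#include <stddef.h>

typedef int (*rx_handler_fn)(void *pkt);

/* Models a device that can have at most one receive handler. */
struct dev_model {
    rx_handler_fn rx_handler;
};

/* Stand-ins for the handlers a macvlan or a bond would register. */
int macvlan_rx(void *pkt) { (void)pkt; return 0; }
int bond_rx(void *pkt) { (void)pkt; return 0; }

int enslave(struct dev_model *slave)
{
    /* Check up front, before any other setup can trigger notifications. */
    if (slave->rx_handler != NULL)
        return -EBUSY;

    /* ... the remaining enslave steps would run here ... */
    slave->rx_handler = bond_rx;
    return 0;
}
```

A device that already carries `macvlan_rx` is rejected immediately and left unchanged, which corresponds to step (c) in the reproduction above failing cleanly instead of crashing.
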
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 03 Sep, 2016 5 commits
  5. 02 Sep, 2016 4 commits
    • l2tp: fix use-after-free during module unload · 2f86953e
      Sabrina Dubroca authored
      Tunnel deletion is delayed by both a workqueue (l2tp_tunnel_delete -> wq
       -> l2tp_tunnel_del_work) and RCU (sk_destruct -> RCU ->
      l2tp_tunnel_destruct).
      
      By the time l2tp_tunnel_destruct() runs to destroy the tunnel and finish
      destroying the socket, the private data reserved via the net_generic
      mechanism has already been freed, but l2tp_tunnel_destruct() actually
      uses this data.
      
      Make sure tunnel deletion for the netns has completed before returning
      from l2tp_exit_net() by first flushing the tunnel removal workqueue, and
      then waiting for RCU callbacks to complete.
      
      Fixes: 167eb17e ("l2tp: create tunnel sockets in the right namespace")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Don't unset flowi6_proto in ipxip6_tnl_xmit() · ab343801
      Eli Cooper authored
      Commit 8eb30be0 ("ipv6: Create ip6_tnl_xmit") unsets
      flowi6_proto in ip4ip6_tnl_xmit() and ip6ip6_tnl_xmit().
      Since xfrm_selector_match() relies on this info, IPv6 packets
      sent by an ip6tunnel cannot be properly selected by their
      protocols after removing it. This patch puts flowi6_proto back.
      
      Cc: stable@vger.kernel.org
      Fixes: 8eb30be0 ("ipv6: Create ip6_tnl_xmit")
      Signed-off-by: Eli Cooper <elicooper@gmx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnx2x: don't reset chip on cleanup if PCI function is offline · b44e108b
      Guilherme G. Piccoli authored
      When a PCI error is detected, some architectures (like PowerPC) perform a
      slot reset: the driver's error handlers are in charge of disabling the
      device before the reset and re-enabling it after a successful slot reset.
      
      There are two cases, though, in which the code takes another path: if the
      slot reset is not successful, or if too many errors have already happened on
      the specific adapter (meaning that the device is possibly experiencing a HW
      failure that a slot reset cannot solve), the core PCI error mechanism
      (called EEH on PowerPC) will remove the adapter from the system, since it
      considers this a permanent failure of the device. In this case, a path
      is taken that leads to bnx2x_chip_cleanup() calling bnx2x_reset_hw(), which
      then tries to perform a HW reset on the chip. This reset won't succeed since
      the HW is in a fault state, which can be seen from multiple messages in the
      kernel log like the ones below:
      
      	bnx2x: [bnx2x_issue_dmae_with_comp:552(eth1)]DMAE timeout!
      	bnx2x: [bnx2x_write_dmae:600(eth1)]DMAE returned failure -1
      
      After some time, the PCI error mechanism gives up waiting for the driver's
      correct removal procedure and forcibly removes the adapter from the system.
      A soft lockup can be seen while the core PCI error mechanism is waiting for
      the driver to complete the proper removal process.
      
      This patch adds a check to avoid a chip reset whenever the function
      is in PCI error state. Since this case is only reached when a
      device is being removed because of a permanent failure, the HW chip reset is
      neither expected to work nor necessary.
      
      Also, as a minor improvement in error path, we avoid the MCP information dump
      in case of non-recoverable PCI error (when adapter is about to be removed),
      since it will certainly fail.
      Reported-by: Harsha Thyagaraja <hathyaga@in.ibm.com>
      Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
      Acked-By: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rps: flow_dissector: Fix uninitialized flow_keys used in __skb_get_hash possibly · 635c223c
      Gao Feng authored
      The original code depends on function arguments being evaluated from
      left to right, but the C standard does not define the order in which
      arguments are evaluated.
      
      When flow_keys_have_l4(&keys) is invoked before ___skb_get_hash(skb, &keys,
      hashrnd), as may happen with some compilers or environments, the keys passed
      to flow_keys_have_l4 are not initialized.
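
The pattern can be illustrated outside the kernel. In this sketch all names (`keys_model`, `dissect_and_hash`, `keys_have_l4`, `make_result`, `compute_fixed`) are hypothetical stand-ins; the point is only that the call with the side effect is moved into its own statement so it is sequenced before the read.

```c
#include <stdbool.h>
#include <stdint.h>

struct keys_model { bool has_l4; };
struct result { uint32_t hash; bool l4; };

/* Fills in *keys as a side effect and returns a hash. */
uint32_t dissect_and_hash(struct keys_model *keys)
{
    keys->has_l4 = true;
    return 0x1234u;
}

bool keys_have_l4(const struct keys_model *keys)
{
    return keys->has_l4;
}

struct result make_result(uint32_t hash, bool l4)
{
    struct result r = { hash, l4 };
    return r;
}

struct result compute_fixed(void)
{
    struct keys_model keys = { false };

    /*
     * Fragile form: make_result(dissect_and_hash(&keys), keys_have_l4(&keys));
     * the two arguments may be evaluated in either order.
     */
    uint32_t hash = dissect_and_hash(&keys); /* sequenced before the read below */
    return make_result(hash, keys_have_l4(&keys));
}
```

Splitting the call costs one local variable and removes the dependence on the compiler's argument evaluation order.
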
      
      Fixes: 6db61d79 ("flow_dissector: Ignore flow dissector return value from ___skb_get_hash")
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Gao Feng <fgao@ikuai8.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 01 Sep, 2016 23 commits