1. 15 Dec, 2018 13 commits
  2. 14 Dec, 2018 27 commits
    • Merge branch 'net-prefer-listeners-bound-to-an-address' · b9948e11
      David S. Miller authored
      Peter Oskolkov says:
      
      ====================
      net: prefer listeners bound to an address
      
      A relatively common use case is to have several IPs configured
      on a host, and have different listeners for each of them. We would
      like to add a "catch all" listener on addr_any, to match incoming
      connections not served by any of the listeners bound to a specific
      address.
      
      However, port-only lookups can match addr_any sockets when sockets
      listening on specific addresses are present if so_reuseport flag
      is set. This patchset eliminates lookups into port-only hashtable,
      as lookups by (addr,port) tuple are easily available.
      
      In a future patchset I plan to explore whether it is possible
      to remove port-only hashtables completely: additional refactoring
      will be required, as some non-lookup code uses the hashtables.
      ====================
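The preference described in this cover letter can be modeled in a few lines of userspace Python. This is only a sketch of the lookup order, not the kernel code: `Listener`, `lookup_listener` and `best_in_bucket` are hypothetical names, and the whole listener list stands in for a single hash bucket, as if every entry had collided.

```python
from dataclasses import dataclass

ADDR_ANY = "0.0.0.0"  # stand-in for the wildcard address


@dataclass(frozen=True)
class Listener:
    name: str
    addr: str
    port: int


def compute_score(listener, addr, port):
    """Return -1 on mismatch, a positive score on an exact match."""
    if listener.port != port:
        return -1
    # Strict comparison: a wildcard listener never scores against a
    # specific address, even when it sits in the same bucket.
    if listener.addr != addr:
        return -1
    return 1


def best_in_bucket(bucket, addr, port):
    best, best_score = None, 0
    for listener in bucket:
        score = compute_score(listener, addr, port)
        if score > best_score:
            best, best_score = listener, score
    return best


def lookup_listener(bucket, daddr, dport):
    # Pass 1: look for a listener bound to the exact (addr, port).
    # Pass 2: fall back to the catch-all bound to (ADDR_ANY, port).
    return (best_in_bucket(bucket, daddr, dport)
            or best_in_bucket(bucket, ADDR_ANY, dport))
```

With a catch-all and one listener on 10.0.0.1, traffic to 10.0.0.1 reaches the specific listener, and traffic to any other local address falls through to the catch-all.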
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: net: test that listening sockets match on address properly · 6254e5c6
      Peter Oskolkov authored
      This patch adds a selftest that verifies that a socket listening
      on a specific address is chosen in preference over sockets
      that listen on any address. The test covers UDP/UDP6/TCP/TCP6.
      
      It is based on, and similar to, reuseport_dualstack.c selftest.
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: tcp6: prefer listeners bound to an address · 0ee58dad
      Peter Oskolkov authored
      A relatively common use case is to have several IPs configured
      on a host, and have different listeners for each of them. We would
      like to add a "catch all" listener on addr_any, to match incoming
      connections not served by any of the listeners bound to a specific
      address.
      
      However, port-only lookups can match addr_any sockets when sockets
      listening on specific addresses are present if so_reuseport flag
      is set. This patch eliminates lookups into port-only hashtable,
      as lookups by (addr,port) tuple are easily available.
      
      In addition, compute_score() is tweaked to _not_ match
      addr_any sockets to specific addresses, as hash collisions
      could result in the unwanted behavior described above.
      
      Tested: the patch compiles; full test in the last patch in this
      patchset. Existing reuseport_* selftests also pass.
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: tcp: prefer listeners bound to an address · d9fbc7f6
      Peter Oskolkov authored
      A relatively common use case is to have several IPs configured
      on a host, and have different listeners for each of them. We would
      like to add a "catch all" listener on addr_any, to match incoming
      connections not served by any of the listeners bound to a specific
      address.
      
      However, port-only lookups can match addr_any sockets when sockets
      listening on specific addresses are present if so_reuseport flag
      is set. This patch eliminates lookups into port-only hashtable,
      as lookups by (addr,port) tuple are easily available.
      
      In addition, compute_score() is tweaked to _not_ match
      addr_any sockets to specific addresses, as hash collisions
      could result in the unwanted behavior described above.
      
      Tested: the patch compiles; full test in the last patch in this
      patchset. Existing reuseport_* selftests also pass.
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: udp6: prefer listeners bound to an address · 23b0269e
      Peter Oskolkov authored
      A relatively common use case is to have several IPs configured
      on a host, and have different listeners for each of them. We would
      like to add a "catch all" listener on addr_any, to match incoming
      connections not served by any of the listeners bound to a specific
      address.
      
      However, port-only lookups can match addr_any sockets when sockets
      listening on specific addresses are present if so_reuseport flag
      is set. This patch eliminates lookups into port-only hashtable,
      as lookups by (addr,port) tuple are easily available.
      
      In addition, compute_score() is tweaked to _not_ match
      addr_any sockets to specific addresses, as hash collisions
      could result in the unwanted behavior described above.
      
      Tested: the patch compiles; full test in the last patch in this
      patchset. Existing reuseport_* selftests also pass.
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: udp: prefer listeners bound to an address · 4cdeeee9
      Peter Oskolkov authored
      A relatively common use case is to have several IPs configured
      on a host, and have different listeners for each of them. We would
      like to add a "catch all" listener on addr_any, to match incoming
      connections not served by any of the listeners bound to a specific
      address.
      
      However, port-only lookups can match addr_any sockets when sockets
      listening on specific addresses are present if so_reuseport flag
      is set. This patch eliminates lookups into port-only hashtable,
      as lookups by (addr,port) tuple are easily available.
      
      In addition, compute_score() is tweaked to _not_ match
      addr_any sockets to specific addresses, as hash collisions
      could result in the unwanted behavior described above.
      
      Tested: the patch compiles; full test in the last patch in this
      patchset. Existing reuseport_* selftests also pass.
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • add snmp counters document · 8e2ea53a
      yupeng authored
      Add explanations for some general IP counters, and for SACK and DSACK
      related counters.
      Signed-off-by: yupeng <yupeng0921@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'neighbor-More-gc_list-changes' · 384aee46
      David S. Miller authored
      David Ahern says:
      
      ====================
      neighbor: More gc_list changes
      
      More gc_list changes and cleanups.
      
      The first 2 patches are bug fixes from the first gc_list change.
      Specifically, fix the locking order to be consistent - table lock
      followed by neighbor lock, and then entries in the FAILED state
      should always be candidates for forced_gc without waiting for any
      time span (return to the eviction logic prior to the separate gc_list).
      
      Patch 3 removes 2 now unnecessary arguments to neigh_del.
      
      Patch 4 moves a helper from a header file to core code in preparation
      for Patch 5 which removes NTF_EXT_LEARNED entries from the gc_list.
      These entries are already exempt from forced_gc; patch 5 removes them
      from consideration and makes them on par with PERMANENT entries given
      that they are also managed by userspace.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • neighbor: Remove externally learned entries from gc_list · e997f8a2
      David Ahern authored
      Externally learned entries are similar to PERMANENT entries in the
      sense they are managed by userspace and can not be garbage collected.
      As such remove them from the gc_list, remove the flags check from
      neigh_forced_gc and skip threshold checks in neigh_alloc. As with
      PERMANENT entries, this allows unlimited number of NTF_EXT_LEARNED
      entries.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • neighbor: Move neigh_update_ext_learned to core file · 526f1b58
      David Ahern authored
      neigh_update_ext_learned has one caller in neighbour.c so does not need
      to be defined in the header. Move it and in the process remove the
      initialization of ndm_flags and just set it based on the flags check.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • neighbor: Remove state and flags arguments to neigh_del · 7e6f182b
      David Ahern authored
      neigh_del now only has 1 caller, and the state and flags arguments
      are both 0. Remove them and simplify neigh_del.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • neighbor: Fix state check in neigh_forced_gc · 758a7f0b
      David Ahern authored
      PERMANENT entries are not on the gc_list so the state check is now
      redundant. Also, the move to not purge entries until after 5 seconds
      should not apply to FAILED entries; those can be removed immediately
      to make way for newer ones. This restores the previous logic prior to
      the gc_list.
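The eviction rule described here can be sketched as a small Python model. It is an illustration under stated assumptions, not the kernel's neigh_forced_gc: `Neigh`, `forced_gc` and `GC_MIN_AGE` are hypothetical names, the 5-second figure comes from the message above, and other conditions the kernel may apply (such as reference counts) are left out.

```python
from dataclasses import dataclass

GC_MIN_AGE = 5.0  # seconds an entry is protected after its last update


@dataclass
class Neigh:
    key: str
    state: str      # e.g. "REACHABLE", "STALE", "FAILED"
    updated: float  # timestamp of the last update, in seconds


def forced_gc(gc_list, now, max_evict):
    """Evict up to max_evict entries from gc_list and return their keys.

    PERMANENT entries are assumed never to be on gc_list, so no state
    check for them is needed here.
    """
    evicted = []
    for n in list(gc_list):
        if len(evicted) >= max_evict:
            break
        # FAILED entries go immediately; others only once they are old enough.
        if n.state == "FAILED" or now - n.updated > GC_MIN_AGE:
            gc_list.remove(n)
            evicted.append(n.key)
    return evicted
```

In this model a FAILED entry updated half a second ago is still evicted, while a REACHABLE entry of the same age survives the pass.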
      
      Fixes: 58956317 ("neighbor: Improve garbage collection")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • neighbor: Fix locking order for gc_list changes · 9c29a2f5
      David Ahern authored
      Lock checker noted an inverted lock order between neigh_change_state
      (neighbor lock then table lock) and neigh_periodic_work (table lock and
      then neighbor lock) resulting in:
      
      [  121.057652] ======================================================
      [  121.058740] WARNING: possible circular locking dependency detected
      [  121.059861] 4.20.0-rc6+ #43 Not tainted
      [  121.060546] ------------------------------------------------------
      [  121.061630] kworker/0:2/65 is trying to acquire lock:
      [  121.062519] (____ptrval____) (&n->lock){++--}, at: neigh_periodic_work+0x237/0x324
      [  121.063894]
      [  121.063894] but task is already holding lock:
      [  121.064920] (____ptrval____) (&tbl->lock){+.-.}, at: neigh_periodic_work+0x194/0x324
      [  121.066274]
      [  121.066274] which lock already depends on the new lock.
      [  121.066274]
      [  121.067693]
      [  121.067693] the existing dependency chain (in reverse order) is:
      ...
      
      Fix by renaming neigh_change_state to neigh_update_gc_list, changing
      it to only manage whether an entry should be on the gc_list and taking
      locks in the same order as neigh_periodic_work. Invoke at the end of
      neigh_update only if diff between old or new states has the PERMANENT
      flag set.
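The fix amounts to taking the two locks in one consistent order everywhere. A minimal userspace sketch of that discipline, using Python threading locks in place of the kernel's table and neighbor locks; `Table`, `Neigh`, `update_gc_list` and `periodic_work` are hypothetical stand-ins for illustration only:

```python
import threading


class Table:
    def __init__(self):
        self.lock = threading.Lock()
        self.gc_list = []


class Neigh:
    def __init__(self, key, permanent=False):
        self.lock = threading.Lock()
        self.key = key
        self.permanent = permanent


def update_gc_list(tbl, n):
    """Only decide whether n belongs on the gc_list.

    Lock order: table lock first, then neighbor lock.
    """
    with tbl.lock:
        with n.lock:
            on_list = n in tbl.gc_list
            if n.permanent and on_list:
                tbl.gc_list.remove(n)
            elif not n.permanent and not on_list:
                tbl.gc_list.append(n)


def periodic_work(tbl):
    """Walk the list using the same order: table lock, then neighbor lock."""
    seen = []
    with tbl.lock:
        for n in tbl.gc_list:
            with n.lock:
                seen.append(n.key)
    return seen
```

Because both paths acquire the table lock before any neighbor lock, two threads running them concurrently cannot each hold the lock the other is waiting for.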
      
      Fixes: 8cc196d6 ("neighbor: gc_list changes should be protected by table lock")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: fold tcf_block_cb_call() into tc_setup_cb_call() · aeb3fecd
      Cong Wang authored
      After commit 69bd4840 ("net/sched: Remove egdev mechanism"),
      tc_setup_cb_call() is nearly identical to tcf_block_cb_call(),
      so we can just fold tcf_block_cb_call() into tc_setup_cb_call()
      and remove its unused parameter 'exts'.
      
      Fixes: 69bd4840 ("net/sched: Remove egdev mechanism")
      Cc: Oz Shlomo <ozsh@mellanox.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Oz Shlomo <ozsh@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ibmvnic: Remove tests of member address · 390de194
      Wen Yang authored
      The driver was checking for non-NULL address.
      - adapter->napi[i]
      
      This is pointless as these will always be non-NULL, since
      'adapter->napi' is allocated in init_napi().
      It is safe to get rid of useless checks for addresses to fix the
      coccinelle warning:
      >>drivers/net/ethernet/ibm/ibmvnic.c: test of a variable/field address
      Since such statements always return true, they are redundant.
      Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
      CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: Paul Mackerras <paulus@samba.org>
      CC: Michael Ellerman <mpe@ellerman.id.au>
      CC: Thomas Falcon <tlfalcon@linux.ibm.com>
      CC: John Allen <jallen@linux.ibm.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: linuxppc-dev@lists.ozlabs.org
      CC: netdev@vger.kernel.org
      CC: linux-kernel@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tun: replace get_cpu_ptr with this_cpu_ptr when bh disabled · 6342ca64
      Prashant Bhole authored
      tun_xdp_one() runs with local bh disabled. So there is no need to
      disable preemption by calling get_cpu_ptr while updating stats. This
      patch replaces the use of get_cpu_ptr() with this_cpu_ptr() as a
      micro-optimization. Also removes related put_cpu_ptr call.
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hamradio, ppp: change semaphore to completion · c2c79a32
      Arnd Bergmann authored
      ppp and hamradio have copies of the same code that uses a semaphore
      in place of a completion for historic reasons. Make it use the
      proper interface instead in all copies.
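As a loose userspace analogy only (the kernel code is C and is not shown here), the "wait until another context signals it is done" pattern that a completion expresses looks like this with Python's threading.Event. The function name `run_worker_and_wait` is hypothetical; `set()` and `wait()` play roughly the roles of the kernel's complete() and wait_for_completion().

```python
import threading


def run_worker_and_wait():
    """Start a worker and block until it signals that it has finished."""
    done = threading.Event()  # one-shot "it happened" signal
    result = []

    def worker():
        result.append("finished")
        done.set()  # signal the waiter

    t = threading.Thread(target=worker)
    t.start()
    ok = done.wait(timeout=5.0)  # returns True once the event is set
    t.join()
    return ok, result
```

A counting semaphore initialized to zero can be made to behave the same way, but a dedicated one-shot primitive states the intent directly, which is the motivation given in the message above.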
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hns3: prevent building without CONFIG_INET · 2aa55dcc
      Arnd Bergmann authored
      We now get a link failure when CONFIG_INET is disabled, since
      tcp_gro_complete is unavailable:
      
      drivers/net/ethernet/hisilicon/hns3/hns3_enet.o: In function `hns3_set_gro_param':
      hns3_enet.c:(.text+0x230c): undefined reference to `tcp_gro_complete'
      
      Add an explicit CONFIG_INET dependency here to avoid the broken
      configuration.
      
      Fixes: a6d53b97 ("net: hns3: Adds GRO params to SKB for the stack")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'Introduce-NETDEV_PRE_CHANGEADDR' · 522185d5
      David S. Miller authored
      Petr Machata says:
      
      ====================
      Introduce NETDEV_PRE_CHANGEADDR
      
      Spectrum devices have a limitation that all router interfaces need to
      have the same address prefix. In Spectrum-1, the requirement is for the
      initial 38 bits of all RIFs to be the same, in Spectrum-2 the limit is
      36 bits. Currently violations of this requirement are not diagnosed. At
      the same time, if the condition is not upheld, the mismatched MAC
      address ends up overwriting the common prefix, and all RIF MAC addresses
      silently change to the new prefix.
      
      It is therefore desirable to be able at least to diagnose the issue, and
      better to reject attempts to change MAC addresses in ways that are
      incompatible with the device.
      
      Currently MAC address changes are notified through emission of
      NETDEV_CHANGEADDR, which is done after the change. Extending this
      message to allow vetoing is certainly possible, but several other
      notification types have instead adopted a simple two-stage approach:
      first a "pre" notification is sent to make sure all interested parties
      are OK with the change that's about to be done. Then the change is done,
      and afterwards a "post" notification is sent.
      
      This dual approach is easier to use: when the change is vetoed, nothing
      has changed yet, and it's therefore unnecessary to roll anything back.
      Therefore this patchset introduces it for NETDEV_CHANGEADDR as well.
      
      One prominent path to emitting NETDEV_CHANGEADDR is through
      dev_set_mac_address(). Therefore in patch #1, give this function an
      extack argument, so that a textual reason for rejection (or a warning)
      can be communicated back to the user.
      
      In patch #2, add the new notification type. In patch #3, have dev.c emit
      the notification for instances of dev_addr change, or addition of an
      address to dev_addrs list.
      
      In patches #4 and #5, extend the bridge driver to handle and emit the
      new notifier.
      
      In patch #6, change IPVLAN to emit the new notifier.
      
      Likewise for bonding driver in patches #7 and #8. Note that the team
      driver doesn't need this treatment, as it goes through
      dev_set_mac_address().
      
      In patches #9, #10 and #11 adapt mlxsw to veto MAC addresses on router
      interfaces, if they violate the requirement that all RIF MAC addresses
      have the same prefix.
      
      Finally in patches #12 and #13, add a test for vetoing of a direct
      change of a port device MAC, and indirect change of a bridge MAC.
      ====================
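The two-stage approach from this cover letter can be sketched as a toy notifier in Python. It is a model of the idea, not the kernel notifier API: `Device`, `VetoError`, `set_mac_address` and `prefix_guard` are hypothetical names, and the "02:00:" prefix rule is an arbitrary stand-in for a real device constraint.

```python
class VetoError(Exception):
    """Raised when a listener rejects a pending address change."""


class Device:
    def __init__(self, name, mac):
        self.name = name
        self.mac = mac
        self.listeners = []  # callables: (event, dev, new_mac) -> reason or None
        self.events = []     # log of emitted notifications


def set_mac_address(dev, new_mac):
    # Stage 1: "pre" notification. Nothing has changed yet, so a veto
    # needs no rollback.
    dev.events.append(("PRE_CHANGEADDR", new_mac))
    for listener in dev.listeners:
        reason = listener("PRE_CHANGEADDR", dev, new_mac)
        if reason is not None:
            raise VetoError(reason)
    # Stage 2: apply the change, then send the "post" notification.
    dev.mac = new_mac
    dev.events.append(("CHANGEADDR", new_mac))
    for listener in dev.listeners:
        listener("CHANGEADDR", dev, new_mac)


def prefix_guard(event, dev, new_mac):
    """Example listener: veto addresses outside a required prefix."""
    if event == "PRE_CHANGEADDR" and not new_mac.startswith("02:00:"):
        return "MAC prefix not supported"
    return None
```

A vetoed change leaves `dev.mac` untouched and never produces a CHANGEADDR event, and the returned reason string plays the part that extack plays in the patchset.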
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mlxsw: Test FID RIF MAC vetoing · 9651ee10
      Petr Machata authored
      When a FID RIF is created for a bridge with IP address, its MAC address
      must obey the same requirements as other RIFs. Test that attempts to
      change the address incompatibly by attaching a device are vetoed with
      extack.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mlxsw: Test RIF MAC vetoing · 555afaae
      Petr Machata authored
      Test that attempts to change address in a way that violates Spectrum
      requirements are vetoed with extack.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Veto unsupported RIF MAC addresses · 74bc9939
      Petr Machata authored
      On NETDEV_PRE_CHANGEADDR, if the change is related to a RIF interface,
      verify that it satisfies the criterion that all RIF interfaces have the
      same MAC address prefix, as indicated by mlxsw_sp.mac_mask.
      
      Additionally, besides explicit address changes, check that the address
      of an interface for which a RIF is about to be added matches the
      required pattern as well.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Add mlxsw_sp.mac_mask · 9329b816
      Petr Machata authored
      The Spectrum hardware demands that all router interfaces in the system
      have the same first 38 resp. 36 bits of MAC address: the former limit
      holds on Spectrum, the latter on Spectrum-2. Add a field that refers to
      the required prefix mask and initialize in mlxsw_sp1_init() and
      mlxsw_sp2_init().
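The prefix requirement can be illustrated with plain integer arithmetic. This is a sketch: the helper names are hypothetical, and how the driver actually encodes mac_mask is not shown in this log. Only the 38-bit and 36-bit prefix lengths come from the message above.

```python
def mac_to_int(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to a 48-bit integer."""
    return int(mac.replace(":", ""), 16)


def prefix_mask(prefix_bits, width=48):
    """Mask with the top prefix_bits of a width-bit value set."""
    return ((1 << prefix_bits) - 1) << (width - prefix_bits)


SPECTRUM1_MASK = prefix_mask(38)  # first 38 bits must match
SPECTRUM2_MASK = prefix_mask(36)  # first 36 bits must match


def same_prefix(mac_a, mac_b, mask):
    """True if the two addresses agree on every bit covered by mask."""
    return ((mac_to_int(mac_a) ^ mac_to_int(mac_b)) & mask) == 0
```

For example, 00:11:22:33:44:00 and 00:11:22:33:48:00 differ within the first 38 bits but agree on the first 36, so in this model they would be compatible on Spectrum-2 and not on Spectrum-1.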
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Generalize mlxsw_sp_netdevice_router_port_event() · 9735f2d2
      Petr Machata authored
      Prepare mlxsw_sp_netdevice_router_port_event() for handling of
      NETDEV_PRE_CHANGEADDR. Split out the part that deals with the actual
      changes and call it for the two events currently handled.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bonding: Issue NETDEV_PRE_CHANGEADDR · 1caf40de
      Petr Machata authored
      Give interested parties an opportunity to veto an impending HW address
      change.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bonding: Give bond_set_dev_addr() a return value · b9245914
      Petr Machata authored
      Before NETDEV_CHANGEADDR, bond driver should emit NETDEV_PRE_CHANGEADDR,
      and allow consumers to veto the address change. To propagate further the
      return code from NETDEV_PRE_CHANGEADDR, give the function that
      implements address change a return value.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipvlan: Issue NETDEV_PRE_CHANGEADDR · 61345fab
      Petr Machata authored
      A NETDEV_CHANGEADDR event implies a change of address of each of the
      IPVLANs of this IPVLAN device. Therefore propagate NETDEV_PRE_CHANGEADDR
      to all the IPVLANs.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>