1. 08 Dec, 2018 2 commits
    • net: call sk_dst_reset when set SO_DONTROUTE · 0fbe82e6
      yupeng authored
      After SO_DONTROUTE is set to 1, the IP layer should not route packets if
      the destination IP address is not in link scope. But if the socket has
      cached the dst_entry, such packets are still routed until the
      sk_dst_cache expires. So we should clear the sk_dst_cache when a user
      sets the SO_DONTROUTE option. Below are server/client Python scripts
      that can reproduce this issue:
      
      server side code:
      
      ==========================================================================
      import socket
      import struct
      import time
      
      s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      s.bind(('0.0.0.0', 9000))
      s.listen(1)
      sock, addr = s.accept()
      sock.setsockopt(socket.SOL_SOCKET, socket.SO_DONTROUTE, struct.pack('i', 1))
      while True:
          sock.send(b'foo')
          time.sleep(1)
      ==========================================================================
      
      client side code:
      ==========================================================================
      import socket
      import time
      
      s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      s.connect(('server_address', 9000))
      while True:
          data = s.recv(1024)
          print(data)
      ==========================================================================
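      The caching behaviour described above can be illustrated with a toy
      model. This is only a sketch in plain Python, not kernel code:
      ToySocket, Route, the on_link set and the reset_cache_on_dontroute
      switch are all hypothetical names invented for illustration, and
      raising ENETUNREACH for an off-link destination is an assumption of
      the model.

```python
import errno
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Route:
    dest: str
    via_gateway: bool  # True when dest is outside link scope


class ToySocket:
    """Toy model of a socket with a cached route (hypothetical, not kernel code)."""

    def __init__(self, reset_cache_on_dontroute: bool):
        self.dontroute = False
        self.dst_cache: Optional[Route] = None
        self.reset_cache_on_dontroute = reset_cache_on_dontroute

    def set_dontroute(self, value: int) -> None:
        self.dontroute = bool(value)
        if self.reset_cache_on_dontroute:
            # Plays the role of sk_dst_reset(): forget the cached route.
            self.dst_cache = None

    def _lookup(self, dest: str, on_link: Set[str]) -> Route:
        if dest in on_link:
            return Route(dest, via_gateway=False)
        if self.dontroute:
            # Assumption of this model: off-link destinations are refused.
            raise OSError(errno.ENETUNREACH, "destination not in link scope")
        return Route(dest, via_gateway=True)

    def send(self, dest: str, on_link: Set[str]) -> Route:
        # A fresh lookup happens only when nothing is cached.
        if self.dst_cache is None:
            self.dst_cache = self._lookup(dest, on_link)
        return self.dst_cache
```

      With reset_cache_on_dontroute=False the second send() keeps returning
      the cached gateway route even though the flag is set; with True the
      next send() performs a new lookup and is refused.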
      Signed-off-by: yupeng <yupeng0921@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • neighbor: Improve garbage collection · 58956317
      David Ahern authored
      The existing garbage collection algorithm has a number of problems:
      
      1. The gc algorithm will not evict PERMANENT entries as those entries
         are managed by userspace, yet the existing algorithm walks the entire
         hash table which means it always considers PERMANENT entries when
         looking for entries to evict. In some use cases (e.g., EVPN) there
         can be tens of thousands of PERMANENT entries leading to wasted
         CPU cycles when gc kicks in. As an example, with 32k permanent
         entries, neigh_alloc has been observed taking more than 4 msec per
         invocation.
      
      2. Currently, when the number of neighbor entries hits gc_thresh2 and
         the last flush for the table was more than 5 seconds ago, gc kicks in
         and walks the entire hash table, evicting *all* entries not in
         PERMANENT or REACHABLE state and not marked as externally learned.
         There is no discriminator on when the neigh entry was created or if
         it just moved from REACHABLE to another NUD_VALID state (e.g.,
         NUD_STALE).
      
         It is possible for entries to be created or for established neighbor
         entries to be moved to STALE (e.g., an external node sends an ARP
         request) right before the 5 second window lapses:
      
              -----|---------x|----------|-----
                  t-5         t         t+5
      
         If that happens those entries are evicted during gc causing unnecessary
         thrashing on neighbor entries and userspace caches trying to track them.
      
         Further, this contradicts the description of gc_thresh2 which says
         "Entries older than 5 seconds will be cleared".
      
         One workaround is to make gc_thresh2 == gc_thresh3 but that negates the
         whole point of having separate thresholds.
      
      3. Clearing *all* neigh entries that are not PERMANENT, REACHABLE, or
         externally learned when gc_thresh2 is exceeded is overkill and
         contributes to thrashing, especially during startup.
      
      This patch addresses these problems as follows:
      
      1. Use of a separate list_head to track entries that can be garbage
         collected along with a separate counter. PERMANENT entries are not
         added to this list.
      
         The gc_thresh parameters are only compared to the new counter, not the
         total entries in the table. The forced_gc function is updated to only
         walk this new gc_list looking for entries to evict.
      
      2. Entries are added to the list head at the tail and removed from the
         front.
      
      3. Entries are only evicted if they were last updated more than 5 seconds
         ago, adhering to the original intent of gc_thresh2.
      
      4. Forced gc is stopped once the number of gc_entries drops below
         gc_thresh2.
      
      5. Since gc checks do not apply to PERMANENT entries, gc levels are skipped
         when allocating a new neighbor for a PERMANENT entry. By extension this
         means there are no explicit limits on the number of PERMANENT entries
         that can be created, but this is no different than FIB entries or FDB
         entries.
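
      The five points above can be sketched as a small Python model. It is
      an illustration under stated assumptions, not the kernel
      implementation: NeighTable, Neigh, GC_MIN_AGE and the explicit `now`
      timestamps are hypothetical, and details such as reference counts and
      NUD states other than PERMANENT are left out.

```python
from collections import OrderedDict
from dataclasses import dataclass

GC_MIN_AGE = 5.0  # seconds; entries updated more recently are never evicted


@dataclass
class Neigh:
    key: str
    updated: float
    permanent: bool = False


class NeighTable:
    """Toy neighbor table with a separate gc list (hypothetical model)."""

    def __init__(self, gc_thresh2: int):
        self.gc_thresh2 = gc_thresh2
        self.entries = {}             # every entry, PERMANENT included
        self.gc_list = OrderedDict()  # only collectable entries, oldest first

    def add(self, key: str, now: float, permanent: bool = False) -> Neigh:
        n = Neigh(key, now, permanent)
        self.entries[key] = n
        if not permanent:
            # New collectable entries go to the tail of the gc list.
            self.gc_list[key] = n
        return n

    def forced_gc(self, now: float) -> list:
        evicted = []
        # Walk from the front; PERMANENT entries are never visited.
        for key, n in list(self.gc_list.items()):
            if len(self.gc_list) < self.gc_thresh2:
                break  # enough room again, stop early
            if now - n.updated > GC_MIN_AGE:
                del self.gc_list[key]
                del self.entries[key]
                evicted.append(key)
        return evicted
```

      Because thresholds are compared against len(gc_list) only, adding any
      number of PERMANENT entries changes neither when gc triggers nor how
      long the walk takes in this model.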
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 07 Dec, 2018 33 commits
  3. 06 Dec, 2018 5 commits
    • Merge branch 'Pass-extack-to-NETDEV_PRE_UP' · ef2df7fc
      David S. Miller authored
      Petr Machata says:
      
      ====================
      Pass extack to NETDEV_PRE_UP
      
      Drivers may need to validate configuration of a device that's about to
      be upped. An example is mlxsw, which needs to check the configuration of
      a VXLAN device attached to an offloaded bridge. Should the validation
      fail, there's currently no way to communicate details of the failure to
      the user, beyond an error number.
      
      Therefore this patch set extends the NETDEV_PRE_UP event to include
      extack, if available.
      
      There are three vectors through which NETDEV_PRE_UP invocation can be
      reached. The two major ones are dev_open() and dev_change_flags(), the
      last is then __dev_change_flags().
      
      In patch #1, the first access vector, dev_open() is addressed. An extack
      parameter is added and all users converted to use it.
      
      Before addressing the second vector, two preparatory patches propagate
      extack argument to the proximity of the dev_change_flags() call in VRF
      and IPVLAN drivers. That happens in patches #2 and #3. Then in patch #4,
      dev_change_flags() is treated similarly to dev_open().
      
      Likewise in patch #5, __dev_change_flags() is extended.
      
      Then in patches #6 and #7, the extack is finally propagated all the way
      to the point where the notification is emitted.
      
      This change allows mlxsw in particular (which already has code to
      leverage extack if available) to communicate to the user error messages
      regarding VXLAN configuration. Patch #8 adds a test case that
      exercises this code and checks that an error message is propagated.
      
      For example:
      
      	local 192.0.2.17 remote 192.0.2.18 \
      	dstport 4789 nolearning noudpcsum tos inherit ttl 100
      	local 192.0.2.17 remote 192.0.2.18 \
      	dstport 4789 nolearning noudpcsum tos inherit ttl 100
      Error: mlxsw_spectrum: Conflicting NVE tunnels configuration.
      
      v2:
      - Add David Ahern's tags.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mlxsw: Add a new test extack.sh · 1ba1daed
      Petr Machata authored
      Add a testsuite dedicated to testing extack propagation and related
      functionality.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: dev: Attach extack to NETDEV_PRE_UP · 40c900aa
      Petr Machata authored
      Drivers may need to validate configuration of a device that's about to
      be upped. Should the validation fail, there's currently no way to
      communicate details of the failure to the user, beyond an error number.
      
      To mend that, change __dev_open() to take an extack argument and pass it
      from __dev_change_flags() and dev_open(), where it was propagated in the
      previous patches.
      
      Change __dev_open() to call call_netdevice_notifiers_extack() so that
      the passed-in extack is attached to the NETDEV_PRE_UP notifier.
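
      The effect on a caller can be sketched as follows. This is a
      simplified Python analogy, not the kernel code: Extack,
      dev_open_sketch, driver_notifier and the device names "vx1" and "vx2"
      are hypothetical, and the message text is borrowed from the mlxsw
      example in the cover letter purely for illustration.

```python
import errno
from dataclasses import dataclass
from typing import Callable, List, Optional

NETDEV_PRE_UP = "NETDEV_PRE_UP"


@dataclass
class Extack:
    msg: Optional[str] = None  # human-readable detail, filled in on failure


Notifier = Callable[[str, str, Optional[Extack]], int]


def dev_open_sketch(dev: str, notifiers: List[Notifier],
                    extack: Optional[Extack]) -> int:
    """Run the PRE_UP notifiers; the first nonzero return vetoes the open."""
    for notify in notifiers:
        err = notify(NETDEV_PRE_UP, dev, extack)
        if err:
            return err
    return 0


def driver_notifier(event: str, dev: str, extack: Optional[Extack]) -> int:
    # Hypothetical driver check: pretend that "vx2" conflicts with "vx1".
    if event == NETDEV_PRE_UP and dev == "vx2":
        if extack is not None:  # callers without extack still get the errno
            extack.msg = "Conflicting NVE tunnels configuration."
        return -errno.EINVAL
    return 0
```

      A caller that passes an Extack can print its msg next to the error
      number; a caller that passes None behaves as before and sees only the
      number.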
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: dev: Add call_netdevice_notifiers_extack() · 26372605
      Petr Machata authored
      In order to propagate extack through NETDEV_PRE_UP, add a new function
      call_netdevice_notifiers_extack() that primes the extack field of the
      notifier info. Convert call_netdevice_notifiers() to a simple wrapper
      around the new function that passes NULL for extack.
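
      The wrapper pattern described here can be sketched in Python. The
      class and method names below (NotifierChain, NotifierInfo,
      call_extack, call) are hypothetical stand-ins chosen for illustration
      and do not mirror the kernel's C signatures.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class NotifierInfo:
    dev: str
    extack: Optional[object]  # None when the caller has no extack to offer


Callback = Callable[[str, NotifierInfo], int]


class NotifierChain:
    def __init__(self) -> None:
        self._callbacks: List[Callback] = []

    def register(self, cb: Callback) -> None:
        self._callbacks.append(cb)

    def call_extack(self, event: str, dev: str,
                    extack: Optional[object]) -> int:
        # The new entry point: primes the extack field of the info struct.
        info = NotifierInfo(dev=dev, extack=extack)
        for cb in self._callbacks:
            ret = cb(event, info)
            if ret:
                return ret
        return 0

    def call(self, event: str, dev: str) -> int:
        # The old entry point survives as a thin wrapper passing no extack.
        return self.call_extack(event, dev, None)
```

      Keeping the old name as a wrapper means existing call sites need no
      change, while new ones can opt in to passing an extack.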
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: dev: Add extack argument to __dev_change_flags() · 6d040321
      Petr Machata authored
      In order to pass extack together with NETDEV_PRE_UP notifications, it's
      necessary to route the extack to __dev_open() from diverse (possibly
      indirect) callers. The last missing API is __dev_change_flags().
      
      Therefore extend __dev_change_flags() with an extra extack argument and
      update the two existing users.
      
      Since the function declaration line is changed anyway, name the struct
      net_device argument to placate checkpatch.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>