1. 29 Mar, 2018 23 commits
    • ipv6: export ip6 fragments sysctl to unprivileged users · 18dcbe12
      Eric Dumazet authored
      IPv4 was changed in commit 52a773d6 ("net: Export ip fragment
      sysctl to unprivileged users").
      
      The only sysctl that is not per-netns, ip6frag_secret_interval,
      is not used.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Nikolay Borisov <kernel@kyup.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • liquidio: Prioritize control messages · 697fefc7
      Intiyaz Basha authored
      During heavy TX traffic, control messages (sent by the liquidio driver to
      the NIC firmware) sometimes do not get processed in a timely manner. The
      reason is that the low-level metadata of control messages and that of
      egress network packets indicate the same priority.
      
      Fix it by setting a higher priority for control messages through the new
      ctrl_qpg field in the oct_txpciq struct.  It is the NIC firmware that does
      the actual setting of priority by writing to the new ctrl_qpg field; the
      host driver treats that value as opaque and just assigns it to pki_ih3->qpg.
      Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
      Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
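      As a rough illustration of the "opaque value" point above, here is a minimal C sketch. The names txpciq_cfg, ih3 and fill_ih3 are hypothetical simplifications, not the liquidio definitions, and the assumption that ordinary packets take a separate qpg value is ours; the only point shown is that the host copies a firmware-provided number into the header without interpreting it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified per-queue settings. The real oct_txpciq
 * layout is not reproduced here; per the commit, ctrl_qpg is written
 * by the NIC firmware and is opaque to the host driver. */
struct txpciq_cfg {
	uint16_t qpg;      /* assumed: group used for ordinary network packets */
	uint16_t ctrl_qpg; /* higher-priority group for control messages */
};

/* Hypothetical stand-in for the instruction header's qpg field. */
struct ih3 {
	uint16_t qpg;
};

/* The host never interprets either value; it only chooses which one to
 * copy into the header, depending on the kind of message being sent. */
static void fill_ih3(struct ih3 *ih, const struct txpciq_cfg *cfg, bool is_ctrl)
{
	assert(ih != NULL && cfg != NULL);
	ih->qpg = is_ctrl ? cfg->ctrl_qpg : cfg->qpg;
}
```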
    • Merge branch 'net-Allow-FIB-notifiers-to-fail-add-and-replace' · b349e0b5
      David S. Miller authored
      David Ahern says:
      
      ====================
      net: Allow FIB notifiers to fail add and replace
      
      I wanted to revisit how resource overload is handled for hardware offload
      of FIB entries and rules. At the moment, the in-kernel FIB notifier can
      tell a driver about a route or rule add, replace, and delete, but the
      notifier cannot affect the action. Specifically, in the case of mlxsw,
      if a route or rule add is going to overflow the ASIC resources, the only
      recourse is to abort hardware offload. Aborting offload is akin to taking
      down the switch, as the path from the data plane to the control plane
      simply cannot support the traffic bandwidth of the front panel ports.
      Further, the current state of FIB notifiers is inconsistent with other
      resources where a driver can affect a user request, e.g., enslavement of
      a port into a bridge or a VRF.
      
      As a result of the work done over the past 3+ years, I believe we are
      at a point where we can bring consistency to the stack and offloads,
      and reliably allow the FIB notifiers to fail a request, pushing an error
      along with a suitable error message back to the user. Rather than
      aborting offload when the switch is out of resources, userspace is simply
      prevented from adding more routes and has a clear indication of why.
      
      This set does not resolve the corner case where rules or routes not
      supported by the device are installed prior to the driver getting loaded
      and registering for FIB notifications. In that case, hardware offload has
      not been established, and the driver can refuse to offload anything,
      sending errors back to userspace via extack. Since conceptually the
      driver owns the netdevices associated with its ASIC, this corner case
      mainly applies to unsupported rules and any races during the bringup
      phase.
      
      Patch 1 fixes call_fib_notifiers to extract the errno from the encoded
      response from handlers.
      
      Patches 2-5 allow the call to call_fib_notifiers to fail the add or
      replace of a route or rule.
      
      Patch 6 adds a simple resource controller to netdevsim to illustrate
      how a FIB resource controller can limit the number of route entries.
      
      Changes since RFC
      - correct return code for call_fib_notifier
      - dropped patch 6 exporting devlink symbols
      - limited example resource controller to init_net only
      - updated Kconfig for netdevsim to use MAY_USE_DEVLINK
      - updated cover letter regarding startup case noted by Ido
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdevsim: Add simple FIB resource controller via devlink · 37923ed6
      David Ahern authored
      Add devlink support to netdevsim and use it to implement a simple,
      profile-based resource controller. Only one controller is needed
      per namespace, so the first netdevsim netdevice in a namespace
      registers with devlink. If that device is deleted, the resource
      settings are deleted.
      
      The resource controller allows a user to limit the number of IPv4 and
      IPv6 FIB entries and FIB rules. The resource paths are:
          /IPv4
          /IPv4/fib
          /IPv4/fib-rules
          /IPv6
          /IPv6/fib
          /IPv6/fib-rules
      
      The IPv4 and IPv6 top-level resources are unlimited in size and cannot
      be changed. Below them, the number of FIB entries and FIB rule entries
      is unlimited by default. A user can specify a limit for the fib and
      fib-rules resources:
      
          $ devlink resource set netdevsim/netdevsim0 path /IPv4/fib size 96
          $ devlink resource set netdevsim/netdevsim0 path /IPv4/fib-rules size 16
          $ devlink resource set netdevsim/netdevsim0 path /IPv6/fib size 64
          $ devlink resource set netdevsim/netdevsim0 path /IPv6/fib-rules size 16
          $ devlink dev reload netdevsim/netdevsim0
      
      such that the number of rules or routes is limited (96 IPv4 routes in the
      example above):
          $ for n in $(seq 1 32); do ip ro add 10.99.$n.0/24 dev eth1; done
          Error: netdevsim: Exceeded number of supported fib entries.
      
          $ devlink resource show netdevsim/netdevsim0
          netdevsim/netdevsim0:
            name IPv4 size unlimited unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables non
              resources:
                name fib size 96 occ 96 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables
          ...
      
      This template for resource management is fairly trivial to extend,
      and it shows one way to implement a simple counter-based resource
      controller typical of network profiles.
      
      Currently, devlink only supports the initial namespace. Code is in place
      to adapt netdevsim to a per-namespace controller once the network
      namespace issues are resolved.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
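      A counter-based controller of the kind described above can be sketched in plain C. This is not netdevsim's code: fib_res, fib_res_account and RES_UNLIMITED are hypothetical, the message string stands in for the extack error shown in the example, and returning -ENOSPC on overflow is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define RES_UNLIMITED UINT64_MAX /* hypothetical marker for "no limit" */

/* Hypothetical per-resource accounting, e.g. for /IPv4/fib. */
struct fib_res {
	uint64_t max; /* limit, as set by "devlink resource set ... size N" */
	uint64_t occ; /* current occupancy */
};

/* Account one add (add != 0) or one delete (add == 0).
 * Returns 0 on success, or -ENOSPC if an add would exceed the limit;
 * in that case *msg (if msg is non-NULL) receives a user-facing reason. */
static int fib_res_account(struct fib_res *res, int add, const char **msg)
{
	assert(res != NULL);

	if (!add) {
		if (res->occ > 0)
			res->occ--;
		return 0;
	}
	if (res->max != RES_UNLIMITED && res->occ >= res->max) {
		if (msg)
			*msg = "Exceeded number of supported fib entries";
		return -ENOSPC; /* request refused; occupancy unchanged */
	}
	res->occ++;
	return 0;
}
```

      A route add would call fib_res_account(&v4_fib, 1, &msg) before committing and fail the request on a non-zero return; a delete calls it with add set to 0.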
    • net/ipv6: Move call_fib6_entry_notifiers up for route adds · 2233000c
      David Ahern authored
      Move the call to call_fib6_entry_notifiers for new IPv6 routes to just
      before the insertion into the FIB. At this point, notifier handlers can
      decide the fate of the new route, with a clean path to delete the
      potential new entry if the notifier returns non-zero.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv4: Allow notifier to fail route replace · c1d7ee67
      David Ahern authored
      Check the return value of the call to call_fib_entry_notifiers for IPv4
      route replace. This allows a notifier handler to fail the replace.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv4: Move call_fib_entry_notifiers up for new routes · 6635f311
      David Ahern authored
      Move the call to call_fib_entry_notifiers for new IPv4 routes to just
      before the call to fib_insert_alias. At this point, the only remaining
      failure path is memory allocation in fib_insert_node. Handle that
      very unlikely failure with a call to call_fib_entry_notifiers to
      tell drivers about it.
      
      At this point, notifier handlers can decide the fate of the new route,
      with a clean path to delete the potential new entry if the notifier
      returns non-zero.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
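      The ordering described in the IPv4 and IPv6 patches above (notify first, then insert, and on a late allocation failure tell the driver the entry is gone again) can be modeled in a small userspace C sketch. All names here (struct driver, notify, insert_node, route_add) are hypothetical stand-ins, not the fib_trie or ip6_fib code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical event types delivered to a driver. */
enum fib_event { FIB_EVENT_ADD, FIB_EVENT_DEL };

/* Hypothetical driver state: how many routes it has offloaded. */
struct driver {
	int offloaded;
	int capacity;
};

/* Hypothetical notifier: refuses adds once the device is full. */
static int notify(struct driver *drv, enum fib_event ev)
{
	if (ev == FIB_EVENT_ADD) {
		if (drv->offloaded >= drv->capacity)
			return -ENOSPC;
		drv->offloaded++;
	} else {
		assert(drv->offloaded > 0);
		drv->offloaded--;
	}
	return 0;
}

/* Hypothetical stand-in for the node allocation that can still fail. */
static int insert_node(bool simulate_oom)
{
	return simulate_oom ? -ENOMEM : 0;
}

/* Notify right before insertion so the driver can veto the route.
 * If the unlikely allocation then fails, tell the driver to undo. */
static int route_add(struct driver *drv, int *table_size, bool simulate_oom)
{
	int err = notify(drv, FIB_EVENT_ADD);

	if (err)
		return err; /* nothing inserted yet: clean failure */

	err = insert_node(simulate_oom);
	if (err) {
		notify(drv, FIB_EVENT_DEL); /* driver forgets the route */
		return err;
	}
	(*table_size)++;
	return 0;
}
```

      The design point is that once the notifier has accepted the route, the only remaining failure is rare and is reported back to the driver, so the table and the hardware never disagree.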
    • net: Move call_fib_rule_notifiers up in fib_nl_newrule · 9776d325
      David Ahern authored
      Move call_fib_rule_notifiers up in fib_nl_newrule to the point right
      before the rule is inserted into the list. At this point there are no
      more failure paths within the core rule code, so if the notifier
      does not fail then the rule will be inserted into the list.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Fix fib notifer to return errno · c30d9356
      David Ahern authored
      Notifier handlers use notifier_from_errno to convert any potential error
      to an encoded format. As a consequence, the other side, call_fib_notifier{s}
      in this case, needs to use notifier_to_errno to return the error from
      the handler back to its caller.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
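      The encoding that notifier_to_errno undoes can be reproduced in userspace. The constants and the two helpers below are written to mirror include/linux/notifier.h as we understand it; full_asic_handler and call_notifiers_fixed are hypothetical and only show why the caller must decode. Without the decode, a handler failing with -ENOSPC would surface as a positive value (0x801d on Linux, where ENOSPC is 28) instead of a negative errno.

```c
#include <assert.h>
#include <errno.h>

/* Constants mirroring include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000

/* Handler side: encode a negative errno into a notifier return value. */
static inline int notifier_from_errno(int err)
{
	if (err)
		return NOTIFY_STOP_MASK | (NOTIFY_OK - err);
	return NOTIFY_OK;
}

/* Caller side: recover the negative errno (or 0) from a return value. */
static inline int notifier_to_errno(int ret)
{
	ret &= ~NOTIFY_STOP_MASK;
	return ret > NOTIFY_OK ? NOTIFY_OK - ret : 0;
}

/* Hypothetical handler that rejects every request for lack of space. */
static int full_asic_handler(void)
{
	return notifier_from_errno(-ENOSPC);
}

/* Hypothetical caller showing the fix: decode before returning. */
static int call_notifiers_fixed(void)
{
	return notifier_to_errno(full_asic_handler());
}
```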
    • Merge tag 'mlx5-updates-2018-03-27' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 6e2135ce
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2018-03-27 (Misc updates & SQ recovery)
      
      This series contains miscellaneous updates and cleanups for the mlx5e RX
      path and an SQ recovery feature for the TX path.
      
      From Tariq: (RX updates)
          - Disable Striding RQ on slow PCI devices. Striding RQ limits the use
            of the CQE compression feature, which is critical for the
            performance of slow PCI devices. With this change, CQE compression
            is preferred over Striding RQ, but only on specific "slow" PCIe links.
          - RX path cleanups
          - Private flag to enable/disable striding RQ
      
      From Eran: (TX fast recovery)
          - TX timeout logic improvements, fast SQ recovery and TX error reporting.
            Currently, if a HW error occurs while transmitting on a specific SQ,
            the driver ignores the error, waits for a TX timeout to occur, and
            then resets all the rings. This series instead improves resiliency
            to such HW errors by detecting TX completions with errors, reporting
            them, and performing a fast recovery of the specific faulty SQ even
            before a TX timeout is detected.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'Introduce-net_rwsem-to-protect-net_namespace_list' · 038d49ba
      David S. Miller authored
      Kirill Tkhai says:
      
      ====================
      Introduce net_rwsem to protect net_namespace_list
      
      The series introduces a fine-grained rw_semaphore, which will be used
      instead of rtnl_lock() to protect net_namespace_list.
      
      This improves scalability and allows non-exclusive, sleepable
      iteration with for_each_net(), which is enough for most cases.
      
      scripts/get_maintainer.pl gives an enormous list of people, and I have
      added all of them to CC.
      
      Note that this patch is independent of "Close race between
      {un, }register_netdevice_notifier and pernet_operations":
      https://patchwork.ozlabs.org/project/netdev/list/?series=36495
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Remove rtnl_lock() in nf_ct_iterate_destroy() · 152f2531
      Kirill Tkhai authored
      rtnl_lock() doesn't protect net::ct::count,
      and it's not needed for __nf_ct_unconfirmed_destroy()
      or for nf_queue_nf_hook_drop().
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ovs: Remove rtnl_lock() from ovs_exit_net() · ec9c7809
      Kirill Tkhai authored
      Here we iterate with for_each_net() and remove vports
      in alive nets that belong to the exiting net.
      
      ovs_net::dps is protected by ovs_mutex(),
      and the other functions that change it (ovs_dp_cmd_new(),
      __dp_destroy()) also take it.
      The same applies to the datapath::ports list.
      
      So we can remove rtnl_lock() here.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • security: Remove rtnl_lock() in selinux_xfrm_notify_policyload() · 350311aa
      Kirill Tkhai authored
      rt_genid_bump_all() consists of an IPv4 part and an IPv6 part.
      The IPv4 part increments net::ipv4::rt_genid, and there are
      many places where it is read without rtnl_lock().
      
      The IPv6 part calls __fib6_clean_all(), which is also
      called without rtnl_lock() in other places.
      
      So rtnl_lock() here was used only to iterate net_namespace_list,
      and we can remove it.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Don't take rtnl_lock() in wireless_nlevent_flush() · 10256deb
      Kirill Tkhai authored
      This function iterates over net_namespace_list and flushes
      the queue of each namespace. What does this rtnl_lock()
      protect? Since skbs may be added to net::wext_nlevents
      without rtnl_lock(), it does not protect us against queuers.
      
      It does guarantee that two threads can't flush the queue in
      parallel, which could change the order; but since skbs can be
      queued in any order, it doesn't matter how many threads do
      this in parallel. With several threads, this will even be
      faster.
      
      So we can remove rtnl_lock() here, as it was used only for
      iteration over net_namespace_list.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Introduce net_rwsem to protect net_namespace_list · f0b07bb1
      Kirill Tkhai authored
      rtnl_lock() is used everywhere, and contention is very high.
      When someone wants to iterate over alive net namespaces,
      there is no way to do that without an exclusive lock.
      But the exclusive rtnl_lock() in such places is overkill,
      and it just increases the contention. Yes, there is already
      for_each_net_rcu() in the kernel, but it requires rcu_read_lock(),
      and code under it cannot sleep. Also, sometimes it is really
      necessary to prevent net_namespace_list from growing, and
      for_each_net_rcu() does not fit there.
      
      This patch introduces a new rw_semaphore, which will be used
      instead of rtnl_mutex to protect net_namespace_list. It is
      sleepable and allows non-exclusive iteration over the net
      namespaces list. It allows us to stop using rtnl_lock()
      in several places (done in the following patches) and reduces
      the time we hold rtnl_mutex. Here we just add the new lock;
      the explanations of why rtnl_lock() can be removed in each
      place are in the following patches.
      
      Fine-grained locks are generally better than one big lock,
      so let's do that with net_namespace_list while the situation
      allows it.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-bgmac-Couple-of-small-bgmac-changes' · 906edee9
      David S. Miller authored
      Florian Fainelli says:
      
      ====================
      net: bgmac: Couple of small bgmac changes
      
      This patch series addresses two minor issues with the bgmac driver:
      
      - provides the interface name through /proc/interrupts rather than "bgmac"
      - makes sure the interrupts are masked during probe, in case the block was
        not properly reset
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bgmac: Mask interrupts during probe · 34322615
      Florian Fainelli authored
      Interrupts can be left enabled by, e.g., the bootloader if it used
      the network device for network boot. Make sure they are disabled as
      early as possible to avoid spurious interrupts.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bgmac: Use interface name to request interrupt · d72e7c21
      Florian Fainelli authored
      When the system contains several BGMAC adapters, it is nice to be able
      to tell which one is which by looking at /proc/interrupts. Use the
      network device name as the name passed to request_irq().
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'rxrpc-next-20180327' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 36fc2d72
      David S. Miller authored
      David Howells says:
      
      ====================
      rxrpc: Tracing updates
      
      Here are some patches that update tracing in AF_RXRPC and AFS:
      
       (1) Add a tracepoint for tracking resend events.
      
       (2) Use debug_ids in traces rather than pointers (as pointers are now hashed)
           and allow use of the same debug_id in AFS calls as in the corresponding
           AF_RXRPC calls.  This makes filtering the trace output much easier.
      
       (3) Add a tracepoint for tracking call completion.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: nixge: Add support for National Instruments XGE netdev · 492caffa
      Moritz Fischer authored
      Add support for the National Instruments XGE 1/10G network device.
      
      It uses the EEPROM on the board via NVMEM.
      Signed-off-by: Moritz Fischer <mdf@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dt-bindings: net: Add bindings for National Instruments XGE netdev · 75530a78
      Moritz Fischer authored
      This adds bindings for the NI XGE 1G/10G network device.
      Reviewed-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Moritz Fischer <mdf@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next · 56455e09
      David S. Miller authored
      Steffen Klassert says:
      
      ====================
      pull request (net-next): ipsec-next 2018-03-29
      
      1) Remove a redundant pointer initialization in esp_input_set_header().
         From Colin Ian King.
      
      2) Mark the xfrm kmem_caches as __ro_after_init.
         From Alexey Dobriyan.
      
      3) Do the checksum for an IPsec offload packet in software
         if the device does not advertise NETIF_F_HW_ESP_TX_CSUM.
         From Shannon Nelson.
      
      4) Use booleans for true and false instead of integers
         in xfrm_policy_cache_flush().
         From Gustavo A. R. Silva.
      
      Please pull or let me know if there are problems.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 28 Mar, 2018 15 commits
  3. 27 Mar, 2018 2 commits
    • rxrpc: Trace call completion · 1bae5d22
      David Howells authored
      Add a tracepoint to track rxrpc calls moving into the completed state and
      to log the completion type and the recorded error value and abort code.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc, afs: Use debug_ids rather than pointers in traces · a25e21f0
      David Howells authored
      In rxrpc and afs, use the debug_ids that are monotonically allocated to
      various objects as they're allocated, rather than pointers, as kernel
      pointers are now hashed, making them less useful.  Further, the debug
      IDs aren't reused anywhere near as quickly.
      
      In addition, allow kernel services that use rxrpc, such as afs, to take
      numbers from the rxrpc counter, assign them to their own call struct and
      pass them in to rxrpc for both client and service calls so that the trace
      lines for each will have the same ID tag.
      Signed-off-by: David Howells <dhowells@redhat.com>
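      The shared-ID scheme described above can be sketched with a C11 atomic counter. The counter and struct names are hypothetical (the actual rxrpc counter and call structs are not reproduced); the sketch only shows a kernel service taking an ID from the rxrpc-side counter and handing the same value down, so that both trace lines can carry one tag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared counter owned by the rxrpc side. */
static atomic_uint rxrpc_debug_id_counter;

/* Hand out the next monotonically increasing debug ID (first is 1). */
static unsigned int alloc_debug_id(void)
{
	return atomic_fetch_add(&rxrpc_debug_id_counter, 1) + 1;
}

/* Hypothetical call structs on each side of the API. */
struct rxrpc_call_stub { unsigned int debug_id; };
struct afs_call_stub   { unsigned int debug_id; };

/* The service (e.g. afs) takes an ID from the rxrpc counter, stores it
 * in its own call, and passes it down so both calls share the tag. */
static void begin_call(struct afs_call_stub *ac, struct rxrpc_call_stub *rc)
{
	assert(ac != NULL && rc != NULL);
	ac->debug_id = alloc_debug_id();
	rc->debug_id = ac->debug_id;
}
```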