1. 23 May, 2018 15 commits
    • Jakub Kicinski's avatar
      nfp: move rtsym helpers to pf code · 8f6196f6
      Jakub Kicinski authored
      nfp_net_pf_rtsym_read_optional() and nfp_net_pf_map_rtsym() are not
      really related to networking code.  Move them to the PF code and
      remove the net from their names.  They will soon be needed by code
      outside of nfp_net_main.c anyway.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8f6196f6
    • David S. Miller's avatar
      Merge branch 'bpfilter' · e95a5f54
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      bpfilter
      
      v2->v3:
      - followed Luis's suggestion and significantly simplified first patch
        with shmem_kernel_file_setup+kernel_write. Added kdoc for new helper
      - fixed typos and race to access pipes with mutex
      - tested with bpfilter being 'builtin'. CONFIG_BPFILTER_UMH=y|m both work.
        Interesting to see a usermode executable being embedded inside vmlinux.
      - it doesn't hurt to enable bpfilter in .config.
        ip_setsockopt commands are sent to usermode via pipes and -ENOPROTOOPT is
        returned from userspace, so kernel falls back to original iptables code
      
      v1->v2:
      this patch set is almost a full rewrite of the earlier umh modules approach
      The v1 of patches and follow up discussion was covered by LWN:
      https://lwn.net/Articles/749108/
      
      I believe the v2 addresses all issues brought up by Andy and others.
      Mainly there are zero changes to kernel/module.c
      Instead of teaching module loading logic to recognize special
      umh module, let normal kernel modules execute part of its own
      .init.rodata as a new user space process (Andy's idea)
      Patch 1 introduces this new helper:
      int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
      Input:
        data + len == executable file
      Output:
        struct umh_info {
             struct file *pipe_to_umh;
             struct file *pipe_from_umh;
             pid_t pid;
        };
      
      Advantages vs v1:
      - the embedded user mode executable is stored as .init.rodata inside
        normal kernel module. These pages are freed when .ko finishes loading
      - the elf file is copied into tmpfs file. The user mode process is swappable.
      - the communication between user mode process and 'parent' kernel module
        is done via two unix pipes, hence protocol is not exposed to
        user space
      - impossible to launch umh on its own (that was the main issue of v1)
        and impossible to be man-in-the-middle due to pipes
      - bpfilter.ko consists of a tiny kernel part that passes the data
        between kernel and umh via pipes and a much bigger umh part that
        does all the work
      - 'lsmod' shows bpfilter.ko as usual.
        'rmmod bpfilter' removes kernel module and kills corresponding umh
      - signed bpfilter.ko covers the whole image including umh code
      
      Few issues:
      - the user can still attach to the process and debug it with
        'gdb /proc/pid/exe pid', but 'gdb -p pid' doesn't work.
        (a bit worse compared to v1)
      - tinyconfig will notice a small increase in .text
        +766 | TEXT | 7c8b94806bec umh: introduce fork_usermode_blob() helper
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e95a5f54
    • Alexei Starovoitov's avatar
      net: add skeleton of bpfilter kernel module · d2ba09c1
      Alexei Starovoitov authored
      bpfilter.ko consists of bpfilter_kern.c (normal kernel module code)
      and user mode helper code that is embedded into bpfilter.ko
      
      The steps to build bpfilter.ko are the following:
      - main.c is compiled by HOSTCC into the bpfilter_umh elf executable file
      - with quite a bit of objcopy and Makefile magic the bpfilter_umh elf file
        is converted into bpfilter_umh.o object file
        with _binary_net_bpfilter_bpfilter_umh_start and _end symbols
        Example:
        $ nm ./bld_x64/net/bpfilter/bpfilter_umh.o
        0000000000004cf8 T _binary_net_bpfilter_bpfilter_umh_end
        0000000000004cf8 A _binary_net_bpfilter_bpfilter_umh_size
        0000000000000000 T _binary_net_bpfilter_bpfilter_umh_start
      - bpfilter_umh.o and bpfilter_kern.o are linked together into bpfilter.ko
      
      bpfilter_kern.c is a normal kernel module code that calls
      the fork_usermode_blob() helper to execute part of its own data
      as a user mode process.
      
      Notice that _binary_net_bpfilter_bpfilter_umh_start - end
      is placed into .init.rodata section, so it's freed as soon as __init
      function of bpfilter.ko is finished.
      As part of __init, bpfilter.ko does a first request/reply action
      via the two unix pipes provided by the fork_usermode_blob() helper to
      make sure that the umh is healthy. If not, it will kill it via pid.
      
      Later bpfilter_process_sockopt() will be called from bpfilter hooks
      in get/setsockopt() to pass iptable commands into umh via bpfilter.ko
      
      If the admin does 'rmmod bpfilter', the __exit code of bpfilter.ko will
      kill the umh as well.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d2ba09c1
    • Alexei Starovoitov's avatar
      umh: introduce fork_usermode_blob() helper · 449325b5
      Alexei Starovoitov authored
      Introduce helper:
      int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
      struct umh_info {
             struct file *pipe_to_umh;
             struct file *pipe_from_umh;
             pid_t pid;
      };
      
      that GPLed kernel modules (signed or unsigned) can use to execute part
      of their own data as a swappable user mode process.
      
      The kernel will do:
      - allocate a unique file in tmpfs
      - populate that file with [data, data + len] bytes
      - user-mode-helper code will do_execve that file and, before the process
        starts, the kernel will create two unix pipes for bidirectional
        communication between kernel module and umh
      - close tmpfs file, effectively deleting it
      - the fork_usermode_blob will return zero on success and populate
        'struct umh_info' with two unix pipes and the pid of the user process
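
The pipe and pid part of the steps above can be approximated in plain user space. The following is a minimal sketch, not the kernel implementation: it skips the tmpfs step and simply execs an existing program. The names `struct helper_info`, `spawn_helper()` and `stop_helper()` are hypothetical, and wiring the pipes to the child's stdin/stdout is an assumption of this sketch.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* User space stand-in for struct umh_info: fds instead of struct file *. */
struct helper_info {
    int to_helper;    /* parent writes requests here */
    int from_helper;  /* parent reads replies here */
    pid_t pid;        /* child pid, needed to kill and reap it later */
};

/* Start `path` with two pipes attached. Returns 0 on success, -1 on error. */
int spawn_helper(const char *path, struct helper_info *info)
{
    int to_child[2], from_child[2];
    pid_t pid;

    assert(path != NULL && info != NULL);
    if (pipe(to_child) < 0)
        return -1;
    if (pipe(from_child) < 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(to_child[0]);
        close(to_child[1]);
        close(from_child[0]);
        close(from_child[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: the pipes become stdin/stdout, then exec the helper. */
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        close(to_child[0]);
        close(to_child[1]);
        close(from_child[0]);
        close(from_child[1]);
        execl(path, path, (char *)NULL);
        _exit(127); /* only reached if exec failed */
    }

    /* Parent: keep only its own ends of the two pipes. */
    close(to_child[0]);
    close(from_child[1]);
    info->to_helper = to_child[1];
    info->from_helper = from_child[0];
    info->pid = pid;
    return 0;
}

/* Counterpart of "the exit code should kill the umh": kill and reap. */
void stop_helper(struct helper_info *info)
{
    assert(info != NULL);
    close(info->to_helper);
    close(info->from_helper);
    kill(info->pid, SIGKILL);
    waitpid(info->pid, NULL, 0);
}
```

A caller would typically write one request to `to_helper`, read the reply from `from_helper` as a health check, and call `stop_helper()` on teardown, much like a module pairing kmalloc() with kfree().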
      
      As the first step in the development of the bpfilter project
      the fork_usermode_blob() helper is introduced to allow user mode code
      to be invoked from a kernel module. The idea is that user mode code plus
      normal kernel module code are built as part of the kernel build
      and installed as traditional kernel module into distro specified location,
      such that from a distribution point of view, there is
      no difference between regular kernel modules and kernel modules + umh code.
      Such modules can be signed, modprobed, rmmod, etc. The use of this new helper
      by a kernel module doesn't make it special in any way from the kernel and user
      space tooling point of view.
      
      Such an approach enables the kernel to delegate functionality traditionally done
      by the kernel modules into the user space processes (either root or !root) and
      reduces security attack surface of the new code. The buggy umh code would crash
      the user process, but not the kernel. Another advantage is that umh code
      of the kernel module can be debugged and tested out of user space
      (e.g. opening the possibility to run clang sanitizers, fuzzers or
      user space test suites on the umh code).
      In case of the bpfilter project such architecture allows complex control plane
      to be done in the user space while bpf based data plane stays in the kernel.
      
      Since the umh can crash, be oom-killed by the kernel, or be killed by the
      admin, the kernel module that uses it (like bpfilter) needs to manage the
      lifetime of the umh on its own via the two unix pipes and the pid of the umh.
      
      The exit code of such kernel module should kill the umh it started,
      so that rmmod of the kernel module will cleanup the corresponding umh.
      Just like if the kernel module does kmalloc() it should kfree() it
      in the exit code.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      449325b5
    • David S. Miller's avatar
      Merge branch 'qed-firmware-TLV' · 1fe8c06c
      David S. Miller authored
      Sudarsana Reddy Kalluru says:
      
      ====================
      qed*: Add support for management firmware TLV request.
      
      Management firmware (MFW) requires config and state information from
      the driver. It queries this via TLV (type-length-value) request wherein
      mfw specifies the list of required TLVs. Driver fills the TLV data
      and responds back to MFW.
      This patch series adds qed/qede/qedf/qedi driver implementation for
      supporting the TLV queries from MFW.
      
      Changes from previous versions:
      -------------------------------
      v2: Split patch (2) into multiple simpler patches.
      v2: Update qed_tlv_parsed_buf->p_val datatype to void pointer to avoid
          bunch of unnecessary typecasts.
      
      Please consider applying this series to "net-next".
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1fe8c06c
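
To make the type-length-value exchange described in this series concrete, here is a small user space sketch of filling and walking a TLV buffer. The wire layout (2-byte type, 2-byte length, host byte order), the type numbers used in the usage note and the function names `tlv_put()` and `tlv_find()` are all assumptions made for illustration; the actual MFW format is defined by the qed patches, not by this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout for this sketch only: type, length, then `length` bytes. */
struct tlv_hdr {
    uint16_t type;
    uint16_t length;
};

/* Append one TLV at offset `off`. Returns the new offset, or 0 if it
 * would not fit into the `cap`-byte buffer. */
size_t tlv_put(uint8_t *buf, size_t cap, size_t off,
               uint16_t type, const void *val, uint16_t len)
{
    struct tlv_hdr hdr = { .type = type, .length = len };

    assert(buf != NULL && (val != NULL || len == 0));
    if (off > cap || cap - off < sizeof(hdr) + len)
        return 0;
    memcpy(buf + off, &hdr, sizeof(hdr));
    if (len)
        memcpy(buf + off + sizeof(hdr), val, len);
    return off + sizeof(hdr) + len;
}

/* Find the first TLV of `type` within the first `used` bytes. Returns a
 * pointer to its value and stores its length in *len, or NULL if absent. */
const uint8_t *tlv_find(const uint8_t *buf, size_t used,
                        uint16_t type, uint16_t *len)
{
    size_t off = 0;

    assert(buf != NULL && len != NULL);
    while (used - off >= sizeof(struct tlv_hdr)) {
        struct tlv_hdr hdr;

        memcpy(&hdr, buf + off, sizeof(hdr));
        off += sizeof(hdr);
        if (hdr.length > used - off)
            return NULL; /* truncated entry, stop parsing */
        if (hdr.type == type) {
            *len = hdr.length;
            return buf + off;
        }
        off += hdr.length;
    }
    return NULL;
}
```

In this model the "driver" side would call `tlv_put()` once per requested type and the "firmware" side would call `tlv_find()` on the filled buffer; bounds are checked on both paths so a short buffer fails cleanly instead of overrunning.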
    • Manish Rangankar's avatar
      qedi: Add support for populating ethernet TLVs. · 534bbdf8
      Manish Rangankar authored
      This patch adds callbacks for providing the ethernet protocol driver TLVs.
      Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      534bbdf8
    • Chad Dupuis's avatar
      8673daf4
    • Chad Dupuis's avatar
      qedf: Add support for populating ethernet TLVs. · 642a0b37
      Chad Dupuis authored
      This patch adds callbacks for providing the ethernet protocol driver TLVs.
      Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      642a0b37
    • Sudarsana Reddy Kalluru's avatar
      qede: Add support for populating ethernet TLVs. · d25b859c
      Sudarsana Reddy Kalluru authored
      This patch adds callbacks for providing the ethernet protocol driver TLVs.
      Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
      Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d25b859c
    • Sudarsana Reddy Kalluru's avatar
      qed: Add driver infrastructure for handling mfw requests. · 59ccf86f
      Sudarsana Reddy Kalluru authored
      MFW requests the TLVs in interrupt context. Extracting of the required
      data from upper layers and populating of the TLVs require process context.
      The patch adds work-queues for processing the tlv requests. It also adds
      the implementation for requesting the tlv values from appropriate protocol
      driver.
      Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
      Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      59ccf86f
    • Sudarsana Reddy Kalluru's avatar
      qed: Add support for tlv request processing. · 2528c389
      Sudarsana Reddy Kalluru authored
      The patch adds driver support for processing TLV requests/responses
      from the mfw and upper driver layers respectively. The implementation
      reads the requested TLVs from the shared memory, requests the values
      from upper layer drivers, populates this info (TLVs) into shared memory and
      notifies MFW about the TLV values.
      Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
      Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2528c389
    • Sudarsana Reddy Kalluru's avatar
      qed: Add MFW interfaces for TLV request support. · dd006921
      Sudarsana Reddy Kalluru authored
      The patch adds required management firmware (MFW) interfaces such as
      mailbox commands, TLV types etc.
      Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
      Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dd006921
  2. 22 May, 2018 22 commits
    • David S. Miller's avatar
      Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 9c803cfd
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2018-05-22
      
      This series contains updates to i40e only.
      
      Jake provides all the changes in this series starting with making it
      consistent in how we approach the bit lock.  Fixed the reporting of the
      VEB statistics and the queue statistics to always return every queue
      even if it is not currently in use.  Use WARN_ONCE() so that the first
      time we end up with an incorrect size we will dump a stack trace and a
      message to help highlight the issue early in testing.  Folded the fixed
      string prefix into the stat string definition.  Instead of using a
      separate char *p pointer when copying strings, use the data pointer
      directly.  Added code comments for several of the statistic functions to
      better explain the number and ordering of statistics.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9c803cfd
    • David S. Miller's avatar
      Merge branch 'tcp-ECN-quickack' · 119768c9
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      tcp: reduce quickack pressure for ECN
      
      Small patch series changing TCP behavior vs quickack and ECN
      
      First patch is a refactoring, adding parameter to tcp_incr_quickack()
      and tcp_enter_quickack_mode() helpers.
      
      Second patch implements the change, lowering number of ACK packets
      sent after an ECN event.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      119768c9
    • Eric Dumazet's avatar
      tcp: do not aggressively quick ack after ECN events · 522040ea
      Eric Dumazet authored
      ECN signals currently force TCP to enter quickack mode for
      up to 16 (TCP_MAX_QUICKACKS) following incoming packets.
      
      We believe this is not needed, and only sending one immediate ack
      for the current packet should be enough.
      
      This should reduce the extra load noticed in DCTCP environments,
      after congestion events.
      
      This is part 2 of our effort to reduce pure ACK packets.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      522040ea
    • Eric Dumazet's avatar
      tcp: add max_quickacks param to tcp_incr_quickack and tcp_enter_quickack_mode · 9a9c9b51
      Eric Dumazet authored
      We want to add finer control of the number of ACK packets sent after
      ECN events.
      
      This patch does not change current behavior; it only enables the following
      change.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9a9c9b51
    • Vlad Buslov's avatar
      net: sched: don't disable bh when accessing action idr · 290aa0ad
      Vlad Buslov authored
      Initial net_device implementation used ingress_lock spinlock to synchronize
      ingress path of device. This lock was used in both process and bh context.
      In some code paths action map lock was obtained while holding ingress_lock.
      Commit e1e992e5 ("[NET_SCHED] protect action config/dump from irqs")
      modified actions to always disable bh, while using action map lock, in
      order to prevent deadlock on ingress_lock in softirq. This lock was removed
      from net_device, so disabling bh, while accessing action map, is no longer
      necessary.
      
      Replace all action idr spinlock usage with regular calls that do not
      disable bh.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      290aa0ad
    • David S. Miller's avatar
      Merge branch 'net-ipv6-Fix-route-append-and-replace-use-cases' · 73bf1fc5
      David S. Miller authored
      David Ahern says:
      
      ====================
      net/ipv6: Fix route append and replace use cases
      
      This patch set fixes a few append and replace use cases for IPv6 and
      adds test cases that codify the expectations of how append and replace
      are expected to work. In particular it allows a multipath route to have
      a dev-only nexthop, something Thomas tried to accomplish with commit
      edd7ceb7 ("ipv6: Allow non-gateway ECMP for IPv6") which had to be
      reverted because of breakage, and to replace an existing FIB entry
      with a reject route.
      
      There are a number of inconsistent and surprising aspects to the Linux
      API for adding, deleting, replacing and changing FIB entries. For example,
      with IPv4 NLM_F_APPEND means insert the route after any existing entries
      with the same key (prefix + priority + TOS for IPv4) and NLM_F_CREATE
      without the append flag inserts the new route before any existing entries.
      
      IPv6 on the other hand attempts to guess whether a new route should be
      appended to an existing one, possibly creating a multipath route, or to
      add a new entry after any existing ones. This applies to both the 'append'
      (NLM_F_CREATE + NLM_F_APPEND) and 'prepend' (NLM_F_CREATE only) cases
      meaning for IPv6 the NLM_F_APPEND is basically ignored. This guessing
      whether the route should be added to a multipath route (gateway routes)
      or inserted after existing entries (non-gateway based routes) means a
      multipath route can not have a dev only nexthop (potentially required in
      some cases - tunnels or VRF route leaking for example) and route 'replace'
      is a bit ad hoc treating gateway based routes and dev-only / reject routes
      differently.
      
      This has led to frustration with developers working on routing suites
      such as FRR where workarounds such as delete and add are used instead of
      replace.
      
      After this patch set there are 2 differences between IPv4 and IPv6:
      1. 'ip ro prepend' = NLM_F_CREATE only
          IPv4 adds the new route before any existing ones
          IPv6 adds new route after any existing ones
      
      2. 'ip ro append' = NLM_F_CREATE|NLM_F_APPEND
         IPv4 adds the new route after any existing ones
         IPv6 adds the nexthop to existing routes converting to multipath
      
      For the former, there are cases where we want same prefix routes added
      after existing ones (e.g., multicast, prefix routes for macvlan when used
      for virtual router redundancy). Requiring the APPEND flag to add a new
      route to an existing one helps here but is a slight change in behavior
      since prepend with gateway routes now creates a separate entry.
      
      For the latter IPv6 behavior is preferred - appending a route for the same
      prefix and metric to make a multipath route, so really IPv4 not allowing an
      existing route to be updated is the limiter. This will be fixed when
      nexthops become separate objects - a future patch set.
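
The two remaining IPv4/IPv6 differences can be written down as a small decision table. The sketch below only encodes the 'prepend' and 'append' cases spelled out in this cover letter; the enum and function names are hypothetical, and the flag constants mirror the NLM_F_* values from <linux/netlink.h> but are redefined locally so the example stays self-contained.

```c
#include <assert.h>
#include <string.h>

/* Local copies of the netlink flag values (see <linux/netlink.h>). */
enum {
    F_REPLACE = 0x100,
    F_EXCL    = 0x200,
    F_CREATE  = 0x400,
    F_APPEND  = 0x800,
};

enum family { FAM_IPV4, FAM_IPV6 };

enum placement {
    PLACE_BEFORE_EXISTING, /* new entry ahead of same-prefix routes */
    PLACE_AFTER_EXISTING,  /* new entry behind same-prefix routes */
    PLACE_JOIN_MULTIPATH,  /* nexthop merged into the existing route */
    PLACE_OTHER,           /* flag combination not modelled here */
};

/* Flags used by the two 'ip ro' verbs discussed in the cover letter. */
unsigned int verb_flags(const char *verb)
{
    assert(verb != NULL);
    if (strcmp(verb, "prepend") == 0)
        return F_CREATE;
    if (strcmp(verb, "append") == 0)
        return F_CREATE | F_APPEND;
    return 0;
}

/* Behavior after this patch set when a route with the same key exists. */
enum placement place_route(enum family fam, unsigned int flags)
{
    if (!(flags & F_CREATE) || (flags & (F_REPLACE | F_EXCL)))
        return PLACE_OTHER;
    if (flags & F_APPEND)
        return fam == FAM_IPV4 ? PLACE_AFTER_EXISTING : PLACE_JOIN_MULTIPATH;
    return fam == FAM_IPV4 ? PLACE_BEFORE_EXISTING : PLACE_AFTER_EXISTING;
}
```

Keeping the verb-to-flags mapping separate from the placement rule makes it easy to see that only the APPEND bit distinguishes the two verbs, and that the address family decides what that bit means.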
      
      Thank you to Thomas and Ido for testing earlier versions of this set, and
      to Ido for providing an update to the mlxsw driver.
      
      Changes since RFC
      - cleanup wording in test script; add comments about expected failures
        and why
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      73bf1fc5
    • David Ahern's avatar
      selftests: fib_tests: Add ipv4 route add append replace tests · abb1860a
      David Ahern authored
      Add IPv4 route tests covering add, append and replace permutations.
      Assumes the ability to add a basic single path route works; this is
      required for example when adding an address to an interface.
      
      $ fib_tests.sh -t ipv4_rt
      
      IPv4 route add / append tests
          TEST: Attempt to add duplicate route - gw                           [ OK ]
          TEST: Attempt to add duplicate route - dev only                     [ OK ]
          TEST: Attempt to add duplicate route - reject route                 [ OK ]
          TEST: Add new nexthop for existing prefix                           [ OK ]
          TEST: Append nexthop to existing route - gw                         [ OK ]
          TEST: Append nexthop to existing route - dev only                   [ OK ]
          TEST: Append nexthop to existing route - reject route               [ OK ]
          TEST: Append nexthop to existing reject route - gw                  [ OK ]
          TEST: Append nexthop to existing reject route - dev only            [ OK ]
          TEST: add multipath route                                           [ OK ]
          TEST: Attempt to add duplicate multipath route                      [ OK ]
          TEST: Route add with different metrics                              [ OK ]
          TEST: Route delete with metric                                      [ OK ]
      
      IPv4 route replace tests
          TEST: Single path with single path                                  [ OK ]
          TEST: Single path with multipath                                    [ OK ]
          TEST: Single path with reject route                                 [ OK ]
          TEST: Single path with single path via multipath attribute          [ OK ]
          TEST: Invalid nexthop                                               [ OK ]
          TEST: Single path - replace of non-existent route                   [ OK ]
          TEST: Multipath with multipath                                      [ OK ]
          TEST: Multipath with single path                                    [ OK ]
          TEST: Multipath with single path via multipath attribute            [ OK ]
          TEST: Multipath with reject route                                   [ OK ]
          TEST: Multipath - invalid first nexthop                             [ OK ]
          TEST: Multipath - invalid second nexthop                            [ OK ]
          TEST: Multipath - replace of non-existent route                     [ OK ]
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      abb1860a
    • David Ahern's avatar
      selftests: fib_tests: Add ipv6 route add append replace tests · f9a5a9d8
      David Ahern authored
      Add IPv6 route tests covering add, append and replace permutations.
      Assumes the ability to add a basic single path route works; this is
      required for example when adding an address to an interface.
      
      $ fib_tests.sh -t ipv6_rt
      
      IPv6 route add / append tests
          TEST: Attempt to add duplicate route - gw                           [ OK ]
          TEST: Attempt to add duplicate route - dev only                     [ OK ]
          TEST: Attempt to add duplicate route - reject route                 [ OK ]
          TEST: Add new route for existing prefix (w/o NLM_F_EXCL)            [ OK ]
          TEST: Append nexthop to existing route - gw                         [ OK ]
          TEST: Append nexthop to existing route - dev only                   [ OK ]
          TEST: Append nexthop to existing route - reject route               [ OK ]
          TEST: Append nexthop to existing reject route - gw                  [ OK ]
          TEST: Append nexthop to existing reject route - dev only            [ OK ]
          TEST: Add multipath route                                           [ OK ]
          TEST: Attempt to add duplicate multipath route                      [ OK ]
          TEST: Route add with different metrics                              [ OK ]
          TEST: Route delete with metric                                      [ OK ]
      
      IPv6 route replace tests
          TEST: Single path with single path                                  [ OK ]
          TEST: Single path with multipath                                    [ OK ]
          TEST: Single path with reject route                                 [ OK ]
          TEST: Single path with single path via multipath attribute          [ OK ]
          TEST: Invalid nexthop                                               [ OK ]
          TEST: Single path - replace of non-existent route                   [ OK ]
          TEST: Multipath with multipath                                      [ OK ]
          TEST: Multipath with single path                                    [ OK ]
          TEST: Multipath with single path via multipath attribute            [ OK ]
          TEST: Multipath with reject route                                   [ OK ]
          TEST: Multipath - invalid first nexthop                             [ OK ]
          TEST: Multipath - invalid second nexthop                            [ OK ]
          TEST: Multipath - replace of non-existent route                     [ OK ]
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f9a5a9d8
    • David Ahern's avatar
      selftests: fib_tests: Add option to pause after each test · 7df15e6c
      David Ahern authored
      Add option to pause after each test before cleanup is done. Allows
      user to do manual inspection or more ad-hoc testing after each test
      with the setup intact.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7df15e6c
    • David Ahern's avatar
      selftests: fib_tests: Add command line options · 1c7447b4
      David Ahern authored
      Add command line options for controlling pause on fail, controlling
      specific tests to run and verbose mode rather than relying on environment
      variables.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1c7447b4
    • David Ahern's avatar
      selftests: fib_tests: Add success-fail counts · 37ce42c1
      David Ahern authored
      As more tests are added, it is convenient to have a tally at the end.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      37ce42c1
    • David Ahern's avatar
      net/ipv6: Simplify route replace and appending into multipath route · f34436a4
      David Ahern authored
      Bring consistency to ipv6 route replace and append semantics.
      
      Remove rt6_qualify_for_ecmp which is just guess work. It fails in 2 cases:
      1. can not replace a route with a reject route. Existing code appends
         a new route instead of replacing the existing one.
      
      2. can not have a multipath route where a leg uses a dev only nexthop
      
      Existing use cases affected by this change:
      1. adding a route with existing prefix and metric using NLM_F_CREATE
         without NLM_F_APPEND or NLM_F_EXCL (i.e., what iproute2 calls
         'prepend'). Existing code auto-determines that the new nexthop can
         be appended to an existing route to create a multipath route. This
         change breaks that by requiring the APPEND flag for the new route
         to be added to an existing one. Instead the prepend just adds another
         route entry.
      
      2. route replace. Existing code replaces first matching multipath route
         if new route is multipath capable and falls back to first matching
         non-ECMP route (reject or dev only route) in case one isn't available.
         New behavior replaces first matching route. (Thanks to Ido for spotting
         this one)
      
      Note: Newer iproute2 is needed to display multipath routes with a dev-only
            nexthop. This is due to a bug in iproute2 and parsing nexthops.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f34436a4
    • David Ahern's avatar
      mlxsw: spectrum_router: Add support for route append · 5a15a1b0
      David Ahern authored
      Handle append for gateway based routes. Dev-only multipath routes will
      be handled by a follow on patch.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5a15a1b0
    • Jacob Keller's avatar
      i40e: use the more traditional 'i' loop variable · 3f76bdb4
      Jacob Keller authored
      Since we no longer use i as an array index for the data variable,
      replace the use of 'j' with 'i' so that we match the general loop
      variable name.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      3f76bdb4
    • Jacob Keller's avatar
      i40e: add function doc headers for ethtool stats functions · ec29bbf8
      Jacob Keller authored
      Add documentation for the i40e_get_stats_count, i40e_get_stat_strings
      and i40e_get_ethtool_stats explaining that the number and ordering of
      statistics must remain constant for a given netdevice.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      ec29bbf8
    • Jacob Keller's avatar
      i40e: update data pointer directly when copying to the buffer · e08696dc
      Jacob Keller authored
      A future patch is going to add a helper function i40e_add_ethtool_stats
      that will help lower the amount of boiler plate code in the
      i40e_get_ethtool_stats function.
      
      This conversion will take place over many patches, and the helper
      function will work by directly updating a reference to the data pointer.
      
      Since this would not work combined with the current method of accessing
      data like an array, update all the code that copies stats into the data
      buffer to use direct updates to the pointer instead of array accesses.
      
      This will prevent incorrect stat updates for patches in between the
      conversion.
      
      Similarly, when copying strings, we used a separate char *p pointer.
      Instead, use the data pointer directly as it's already a (u8 *) type
      which is the same size.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      e08696dc
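The pointer-advancing approach described in e08696dc can be illustrated with a small userspace sketch. Everything here is hypothetical (the `stat_desc` and `port_stats` types, the `add_stats` helper, and the counter names); it is not the driver's i40e_add_ethtool_stats, only a guess at the general shape: the helper receives a pointer to the caller's cursor and moves it forward, so successive calls append without any shared array index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor: where a 64-bit counter lives inside a struct. */
struct stat_desc {
    const char *name;
    size_t offset;
};

/* Hypothetical stats structure used only for this illustration. */
struct port_stats {
    uint64_t rx_packets;
    uint64_t tx_packets;
};

static const struct stat_desc port_desc[] = {
    { "rx_packets", offsetof(struct port_stats, rx_packets) },
    { "tx_packets", offsetof(struct port_stats, tx_packets) },
};

#define PORT_DESC_LEN (sizeof(port_desc) / sizeof(port_desc[0]))

/*
 * Copy `len` counters from `src` into the output buffer and advance the
 * caller's cursor past them. Because the cursor itself moves, callers
 * must not also index the buffer like an array.
 */
static void add_stats(uint64_t **data, const void *src,
                      const struct stat_desc *desc, size_t len)
{
    assert(data && *data && src && desc);

    for (size_t i = 0; i < len; i++) {
        const uint64_t *p =
            (const uint64_t *)((const char *)src + desc[i].offset);

        *(*data)++ = *p;
    }
}
```

Mixing this style with `data[i++] = ...` in the same function would write to the wrong slots, which is presumably why the commit converts every copy site before the helper is introduced.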
    • i40e: fold prefix strings directly into stat names · bf1c39e6
      Jacob Keller authored
      We always prefix these stats with a fixed string, so just fold this
      prefix into the stat string definition. This preparatory work will make
      it easier to implement a helper function to copy stats and strings into
      the supplied buffers in a future patch.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      bf1c39e6
    • i40e: use WARN_ONCE to replace the commented BUG_ON size check · 9b10df59
      Jacob Keller authored
      We don't really want to use BUG_ON here since that would completely
      crash the kernel, thus the reason we commented it out. We *can't* use
      BUILD_BUG_ON because at least now (a) the sizes aren't constant (we are
      fixing this) and (b) not all compilers are smart enough to understand
      that "p - data" is a constant.
      
      Instead, just use a WARN_ONCE so that the first time we end up with an
      incorrect size we will dump a stack trace and a message, hopefully
      highlighting the issues early in testing.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      9b10df59
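The warn-once behaviour that 9b10df59 relies on can be approximated in userspace as follows. This is only a sketch of the idea, not the kernel's WARN_ONCE macro: `check_strings_size` and `warnings_printed` are hypothetical names, and a plain message on stderr stands in for the kernel's stack trace. The check reports the mismatch on every call through its return value, but prints only the first time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Counts how many warnings were actually printed (illustration only). */
static int warnings_printed;

/*
 * Return true whenever the written size differs from the expected size,
 * but print the diagnostic only on the first mismatch. Execution always
 * continues, unlike a BUG_ON-style check that would halt the system.
 */
static bool check_strings_size(const char *what, size_t written,
                               size_t expected)
{
    static bool warned;
    bool bad = written != expected;

    assert(what != NULL);

    if (bad && !warned) {
        warned = true;
        warnings_printed++;
        fprintf(stderr, "%s: wrote %zu entries, expected %zu\n",
                what, written, expected);
    }
    return bad;
}
```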
    • i40e: split i40e_get_strings() into smaller functions · 019b9cd4
      Jacob Keller authored
      Split the statistic strings and private flags strings into their own
      separate functions to aid code readability.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      019b9cd4
    • i40e: always return all queue stat strings · b8312365
      Jacob Keller authored
      The ethtool API for obtaining device statistics is not intended to allow
      runtime changes in the number of statistics reported. It may *appear*
      this way, as there is an ability to request the number of stats using
      ethtool_get_set_count(). However, it is expected that this must always
      return the same value for invocations of the same device.
      
      If we don't satisfy this contract, and allow the number of stats to
      change during run time, we could cause invalid memory accesses or report
      the stat strings incorrectly. This is because the API for obtaining
      stats is to (1) get the size, (2) get the strings and finally (3) get
      the stats. Since these are each separate ethtool op commands, it is not
      possible to maintain consistency by holding the RTNL lock over the whole
      operation. This results in the potential for a race condition to occur
      where the size changed between any of the 3 calls.
      
      Avoid this issue by requiring that we always return the same value for
      a given device. We can check any values which remain constant for the
      life of the device, but must not report different sizes depending on
      runtime attributes.
      
      This patch specifically fixes the queue statistics to always return
      every queue even if it's not currently in use.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      b8312365
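The contract described in b8312365 (the count must not depend on runtime state) might look like this in a simplified userspace model. The types, `MAX_QUEUES`, and both functions are hypothetical, and reporting zeros for queues that are not in use is an assumption of this sketch rather than something the commit message states. The point is that the size step and the stats step agree no matter how many queues are active between the two calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_QUEUES      4   /* fixed for the life of the device */
#define STATS_PER_QUEUE 2   /* packets and bytes */

struct queue {
    uint64_t packets;
    uint64_t bytes;
};

struct dev {
    struct queue *queues[MAX_QUEUES];   /* NULL when a queue is unused */
};

/* Step 1: the count depends only on constants, never on active queues. */
static size_t get_stats_count(const struct dev *d)
{
    (void)d;
    return (size_t)MAX_QUEUES * STATS_PER_QUEUE;
}

/*
 * Step 3: always emit a slot for every possible queue. Unused queues
 * report zeros (an assumption of this sketch) so the layout is stable.
 */
static void get_stats(const struct dev *d, uint64_t *data)
{
    assert(d && data);

    for (size_t i = 0; i < MAX_QUEUES; i++) {
        const struct queue *q = d->queues[i];

        *data++ = q ? q->packets : 0;
        *data++ = q ? q->bytes : 0;
    }
}
```

A caller that sized its buffer from `get_stats_count()` before a queue was torn down still receives exactly that many values afterwards, so there is no window in which the buffer can be overrun.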
    • i40e: always return VEB stat strings · 9955d494
      Jacob Keller authored
      The ethtool API for obtaining device statistics is not intended to allow
      runtime changes in the number of statistics reported. It may *appear*
      this way, as there is an ability to request the number of stats using
      ethtool_get_set_count(). However, it is expected that this must always
      return the same value for invocations of the same device.
      
      If we don't satisfy this contract, and allow the number of stats to
      change during run time, we could cause invalid memory accesses or report
      the stat strings incorrectly. This is because the API for obtaining
      stats is to (1) get the size, (2) get the strings and finally (3) get
      the stats. Since these are each separate ethtool op commands, it is not
      possible to maintain consistency by holding the RTNL lock over the whole
      operation. This results in the potential for a race condition to occur
      where the size changed between any of the 3 calls.
      
      Avoid this issue by requiring that we always return the same value for
      a given device. We can check any values which remain constant for the
      life of the device, but must not report different sizes depending on
      runtime attributes.
      
      This patch specifically fixes the VEB statistics strings to always be
      reported. Other issues will be fixed in future patches.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      9955d494
    • i40e: free skb after clearing lock in ptp_stop · bdf27523
      Jacob Keller authored
      Use the same logic to free the skb after clearing the Tx timestamp bit
      lock in i40e_ptp_stop as we use in the other locations. It is not as
      important here since we are not racing against a future Tx timestamp
      request (as we are disabling PTP at this point). However it is good to
      be consistent in how we approach the bit lock so that future callers
      don't copy the old anti-pattern.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      bdf27523
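The ordering that bdf27523 standardises on (detach the buffer, release the bit lock, then free) can be sketched with C11 atomics. This is a userspace analogy under stated assumptions: `ptp_state`, `ptp_stop_sketch`, and the use of `atomic_flag` and `free()` are stand-ins chosen for illustration, not the driver's actual fields or kernel primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's PTP Tx timestamp state. */
struct ptp_state {
    atomic_flag tx_in_progress;   /* plays the role of the bit lock */
    void *tx_skb;                 /* plays the role of the pending skb */
};

/*
 * Take a local copy of the buffer pointer, clear the shared pointer,
 * release the lock, and only then free the buffer. The potentially slow
 * free happens outside the locked region, and nobody who later acquires
 * the lock can observe a pointer that is about to be freed.
 */
static void ptp_stop_sketch(struct ptp_state *s)
{
    void *skb;

    assert(s != NULL);

    skb = s->tx_skb;
    if (!skb)
        return;

    s->tx_skb = NULL;
    atomic_flag_clear_explicit(&s->tx_in_progress, memory_order_release);
    free(skb);
}
```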
  3. 21 May, 2018 3 commits