1. 08 Jan, 2018 28 commits
  2. 06 Jan, 2018 3 commits
    • Merge branch 'bpf-stacktrace-map-next-key-support' · 9be99bad
      Daniel Borkmann authored
      Yonghong Song says:
      
      ====================
      The patch set implements the bpf syscall command BPF_MAP_GET_NEXT_KEY
      for the stacktrace map. Patch #1 is the core implementation
      and Patch #2 adds a bpf test in the tools/testing/selftests/bpf
      directory. Please see the individual patch comments for details.
      
      Changelog:
        v1 -> v2:
         - For an invalid key (key pointer is non-NULL), set next_key to the first valid key.
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • tools/bpf: add a bpf selftest for stacktrace · 3ced9b60
      Yonghong Song authored
      Add a bpf selftest for stacktrace to test_progs in the tools directory.
      The test populates a hashtable map and a stacktrace map
      at the same time with the same key, stackid.
      User space then compares both maps, using the BPF_MAP_LOOKUP_ELEM
      and BPF_MAP_GET_NEXT_KEY commands, to ensure that both have
      the same set of keys.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: implement syscall command BPF_MAP_GET_NEXT_KEY for stacktrace map · 16f07c55
      Yonghong Song authored
      Currently, the bpf syscall command BPF_MAP_GET_NEXT_KEY is not
      supported for the stacktrace map. However, there are use cases where
      user space wants to enumerate all stacktrace map entries, and for
      these the BPF_MAP_GET_NEXT_KEY command is really helpful.
      In addition, if user space wants to delete all map entries
      in order to save memory and does not want to close the
      map file descriptor, BPF_MAP_GET_NEXT_KEY may help improve
      performance if map entries are sparsely populated.
      
      The implementation behaves similarly to the
      BPF_MAP_GET_NEXT_KEY implementation in hashtab. If the user provides
      a NULL key pointer or an invalid key, the first key is returned.
      Otherwise, the first valid key after the input parameter "key"
      is returned, or -ENOENT if no valid key can be found.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
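The key-walking rule described in the commit above can be modelled in plain user-space C. What follows is only a sketch under stated assumptions, not the kernel code: the names stack_map_model, model_get_next_key, model_collect_keys and the size MODEL_N_BUCKETS are hypothetical, and a boolean array stands in for the map's buckets. It shows the three cases from the commit message: a NULL or invalid key restarts at the first valid key, a valid key yields the next valid key, and -ENOENT signals the end.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_N_BUCKETS 8 /* hypothetical map size, for illustration only */

/* Toy stand-in for a stacktrace map: bucket i is valid when used[i] is set. */
struct stack_map_model {
    bool used[MODEL_N_BUCKETS];
};

/*
 * Model of the described semantics: a NULL or invalid key restarts at the
 * first valid key; otherwise the first valid key after *key is returned.
 * Returns 0 on success, -ENOENT when no further valid key exists.
 */
static int model_get_next_key(const struct stack_map_model *map,
                              const uint32_t *key, uint32_t *next_key)
{
    uint32_t id;

    assert(map != NULL && next_key != NULL);

    if (key == NULL || *key >= MODEL_N_BUCKETS || !map->used[*key])
        id = 0;        /* NULL or invalid key: start from the beginning */
    else
        id = *key + 1; /* valid key: continue after it */

    while (id < MODEL_N_BUCKETS && !map->used[id])
        id++;          /* skip empty buckets */

    if (id >= MODEL_N_BUCKETS)
        return -ENOENT;

    *next_key = id;
    return 0;
}

/* Walk the whole map the way user space would; returns the number of keys. */
static size_t model_collect_keys(const struct stack_map_model *map,
                                 uint32_t *out, size_t cap)
{
    uint32_t cur = 0, next = 0;
    const uint32_t *key = NULL; /* NULL asks for the first key */
    size_t n = 0;

    while (n < cap && model_get_next_key(map, key, &next) == 0) {
        out[n++] = next;
        cur = next;
        key = &cur;
    }
    return n;
}
```

With buckets 1, 4 and 6 populated, model_collect_keys() visits exactly those three ids in order, and only the populated buckets are ever reported to the caller, which is the property the commit message relies on for sparsely populated maps.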
  3. 05 Jan, 2018 9 commits
    • Merge branch 'xdp_rxq_info' · 11d16edb
      Alexei Starovoitov authored
      Jesper Dangaard Brouer says:
      
      ====================
      V4:
      * Added reviewers/acks to patches
      * Fix patch desc in i40e that got out-of-sync with code
      * Add SPDX license headers for the two new files added in patch 14
      
      V3:
      * Fixed bug in virtio_net driver
      * Removed export of xdp_rxq_info_init()
      
      V2:
      * Changed API exposed to drivers
        - Removed invocation of "init" in drivers, and only call "reg"
          (Suggested by Saeed)
        - Allow "reg" to fail and handle this in drivers
          (Suggested by David Ahern)
      * Removed the SINKQ qtype, instead allow to register as "unused"
      * Also fixed some drivers during testing on actual HW (noted in patches)
      
      There is a need for XDP to know more about the RX-queue a given XDP
      frame has arrived on, for both the XDP bpf-prog and the kernel side.
      
      Instead of extending struct xdp_buff each time new info is needed,
      this patchset takes a different approach.  Struct xdp_buff is only
      extended with a pointer to a struct xdp_rxq_info (allowing for easier
      extension later).  This xdp_rxq_info contains information related
      to how the driver has set up the individual RX-queues.  This is
      read-mostly information, and all xdp_buff frames (in the driver's
      napi_poll) point to the same xdp_rxq_info (per RX-queue).
      
      We stress that this data/cache-line is for read-mostly info.  This is NOT
      for dynamic per-packet info; use data_meta for such use-cases.
      
      This patchset starts out small, and only exposes ingress_ifindex and the
      RX-queue index to the XDP/BPF program. Access to tangible info like
      the ingress ifindex and RX queue index is fairly easy to comprehend.
      Other future use-cases could allow XDP frames to be recycled back
      to the originating device driver, by providing info on RX device and
      queue number.
      
      As XDP doesn't have driver feature flags, and eBPF code, due to
      bpf-tail-calls, cannot determine which XDP driver invokes it, this
      patchset has to update every driver that supports XDP.
      
      For driver developers (review individual driver patches!):
      
      The xdp_rxq_info is tied to the driver's RX-ring(s). Whenever an RX-ring
      modification requires (temporarily) stopping RX frames, the
      xdp_rxq_info should (likely) also be unregistered and re-registered,
      especially if reallocating the pages in the ring. Make sure ethtool
      set_channels does the right thing. When replacing the XDP prog, if and
      only if the RX-ring needs to be changed, then also re-register the
      xdp_rxq_info.
      
      I'm Cc'ing the individual driver patches to the registered maintainers.
      
      Testing:
      
      I've only tested the NIC drivers I have hardware for.  The general
      test procedure is to (DUT = Device Under Test):
       (1) run pktgen script pktgen_sample04_many_flows.sh       (against DUT)
       (2) run samples/bpf program xdp_rxq_info --dev $DEV       (on DUT)
       (3) runtime modify number of NIC queues via ethtool -L    (on DUT)
       (4) runtime modify number of NIC ring-size via ethtool -G (on DUT)
      
      Patch based on git tree bpf-next (at commit fb982666):
       https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
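The layout idea from the cover letter above (one extra pointer in the per-packet struct, pointing at per-queue, read-mostly data) can be illustrated with a small user-space model. This is a sketch under assumptions, not the kernel definitions: the struct names net_device_model, rxq_info_model and xdp_buff_model, and the helper model_xdp_buff_init, are hypothetical and carry only the fields mentioned in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-ins for the kernel structures. */
struct net_device_model {
    int ifindex;
};

/* Per RX-queue, read-mostly info; set up once by the driver. */
struct rxq_info_model {
    const struct net_device_model *dev;
    uint32_t queue_index;
};

/* Per-packet descriptor: grows by one pointer only. */
struct xdp_buff_model {
    void *data;
    void *data_end;
    const struct rxq_info_model *rxq; /* shared by all frames of one RX-queue */
};

/* What a driver's poll loop would do per frame in this model. */
static void model_xdp_buff_init(struct xdp_buff_model *xdp, void *pkt,
                                size_t len, const struct rxq_info_model *rxq)
{
    assert(xdp != NULL && pkt != NULL && rxq != NULL);
    xdp->data = pkt;
    xdp->data_end = (uint8_t *)pkt + len;
    xdp->rxq = rxq; /* no per-packet copy of the queue info */
}
```

In this model, adding a new per-queue attribute later means adding a field to rxq_info_model only; the per-packet struct and its initialisation cost stay unchanged.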
    • samples/bpf: program demonstrating access to xdp_rxq_info · 0fca931a
      Jesper Dangaard Brouer authored
      This sample program can be used for monitoring and reporting how many
      packets per sec (pps) are received per NIC RX queue index and which
      CPU processed the packet. In itself it is a useful tool for quickly
      identifying RSS imbalance issues, see below.
      
      The default XDP action is XDP_PASS in order to provide a monitor
      mode. For benchmarking purposes it is possible to specify other XDP
      actions with the --action command-line option.
      
      The output below shows an imbalanced RSS case where most RXQs deliver to
      CPU-0 while CPU-2 only gets packets from a single RXQ.  Looking at
      things from a CPU level, the two CPUs are processing approximately the
      same amount, BUT looking at the rx_queue_index level it is clear that
      RXQ-2 receives much better service than the other RXQs, which all share CPU-0.
      
      Running XDP on dev:i40e1 (ifindex:3) action:XDP_PASS
      XDP stats       CPU     pps         issue-pps
      XDP-RX CPU      0       900,473     0
      XDP-RX CPU      2       906,921     0
      XDP-RX CPU      total   1,807,395
      
      RXQ stats       RXQ:CPU pps         issue-pps
      rx_queue_index    0:0   180,098     0
      rx_queue_index    0:sum 180,098
      rx_queue_index    1:0   180,098     0
      rx_queue_index    1:sum 180,098
      rx_queue_index    2:2   906,921     0
      rx_queue_index    2:sum 906,921
      rx_queue_index    3:0   180,098     0
      rx_queue_index    3:sum 180,098
      rx_queue_index    4:0   180,082     0
      rx_queue_index    4:sum 180,082
      rx_queue_index    5:0   180,093     0
      rx_queue_index    5:sum 180,093
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
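The two views in the sample output above (per CPU and per RX queue) come from the same underlying (RXQ, CPU) rates, just summed along different axes. The following is a minimal sketch of that aggregation, not the sample program's code: the names rxq_cpu_rate, model_sum_rates, MODEL_MAX_CPUS and MODEL_MAX_RXQS are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_CPUS 4 /* hypothetical limits, for illustration only */
#define MODEL_MAX_RXQS 8

/* One "RXQ:CPU pps" row of the report. */
struct rxq_cpu_rate {
    uint32_t rxq;
    uint32_t cpu;
    uint64_t pps;
};

/* Sum the same rows once per CPU and once per RX queue. */
static void model_sum_rates(const struct rxq_cpu_rate *rows, size_t n,
                            uint64_t per_cpu[MODEL_MAX_CPUS],
                            uint64_t per_rxq[MODEL_MAX_RXQS])
{
    memset(per_cpu, 0, MODEL_MAX_CPUS * sizeof(per_cpu[0]));
    memset(per_rxq, 0, MODEL_MAX_RXQS * sizeof(per_rxq[0]));

    for (size_t i = 0; i < n; i++) {
        assert(rows[i].cpu < MODEL_MAX_CPUS && rows[i].rxq < MODEL_MAX_RXQS);
        per_cpu[rows[i].cpu] += rows[i].pps;
        per_rxq[rows[i].rxq] += rows[i].pps;
    }
}
```

Feeding in the six RXQ rows from the output above gives 900,469 for CPU-0 and 906,921 for CPU-2. The CPU-0 figure is close to, but not identical to, the 900,473 printed in the CPU table, presumably because each printed rate is measured and rounded separately. The per-CPU totals look balanced, while the per-RXQ view shows RXQ-2 receiving roughly five times the rate of each queue that shares CPU-0.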
    • bpf: finally expose xdp_rxq_info to XDP bpf-programs · 02dd3291
      Jesper Dangaard Brouer authored
      Now all XDP drivers have been updated to set up xdp_rxq_info and assign
      this to xdp_buff->rxq.  Thus, it is now safe to enable access to some
      of the xdp_rxq_info struct members.
      
      This patch extends xdp_md and exposes UAPI to userspace for
      ingress_ifindex and rx_queue_index.  Access happens via bpf
      instruction rewrite, which loads data directly from struct xdp_rxq_info.
      
      * ingress_ifindex maps to xdp_rxq_info->dev->ifindex
      * rx_queue_index  maps to xdp_rxq_info->queue_index
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
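The field mapping listed in the commit above can be pictured as a lookup that turns a read of a context field into pointer dereferences on the per-queue struct. The sketch below is a user-space model only and does not reproduce the verifier's instruction rewrite; the names ctx_field_model, model_ctx_load and the reduced structs are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-ins for the kernel structures. */
struct net_device_model {
    int ifindex;
};

struct rxq_info_model {
    const struct net_device_model *dev;
    uint32_t queue_index;
};

struct xdp_buff_model {
    const struct rxq_info_model *rxq;
};

/* The two context fields named in the commit message. */
enum ctx_field_model {
    CTX_INGRESS_IFINDEX,
    CTX_RX_QUEUE_INDEX,
};

/* Resolve a context-field read to loads from the per-queue info. */
static uint32_t model_ctx_load(const struct xdp_buff_model *xdp,
                               enum ctx_field_model field)
{
    assert(xdp != NULL && xdp->rxq != NULL);

    switch (field) {
    case CTX_INGRESS_IFINDEX:
        assert(xdp->rxq->dev != NULL);
        return (uint32_t)xdp->rxq->dev->ifindex; /* rxq, then dev, then ifindex */
    case CTX_RX_QUEUE_INDEX:
        return xdp->rxq->queue_index;            /* rxq, then queue_index */
    }
    return 0; /* not reached for the two fields modelled here */
}
```

The point of the model is that neither value is stored in the per-packet context: both are read through the rxq pointer at access time.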
    • xdp: generic XDP handling of xdp_rxq_info · e817f856
      Jesper Dangaard Brouer authored
      Hook points for xdp_rxq_info:
       * reg  : netif_alloc_rx_queues
       * unreg: netif_free_rx_queues
      
      The net_device has some members (num_rx_queues + real_num_rx_queues)
      and a data-area (dev->_rx with struct netdev_rx_queue's) that were
      primarily used for exporting information about RPS (CONFIG_RPS) queues
      to sysfs (CONFIG_SYSFS).
      
      For generic XDP, extend struct netdev_rx_queue with the xdp_rxq_info,
      and remove some of the CONFIG_SYSFS ifdefs.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • virtio_net: setup xdp_rxq_info · 754b8a21
      Jesper Dangaard Brouer authored
      The virtio_net driver doesn't dynamically change the RX-ring queue
      layout and backing pages, but instead rejects XDP setup if not all the
      conditions for XDP are met.  Thus, the xdp_rxq_info also remains
      fairly static.  This allows us to simply add the reg/unreg to the
      net_device open/close functions.
      
      Driver hook points for xdp_rxq_info:
       * reg  : virtnet_open
       * unreg: virtnet_close
      
      V3:
       - bugfix: also set up xdp.rxq in receive_mergeable()
       - Tested bpf-sample prog inside guest on a virtio_net device
      
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: virtualization@lists.linux-foundation.org
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Reviewed-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • tun: setup xdp_rxq_info · 8bf5c4ee
      Jesper Dangaard Brouer authored
      Driver hook points for xdp_rxq_info:
       * reg  : tun_attach
       * unreg: __tun_detach
      
      I've done some manual testing of this tun driver, but I would
      appreciate a good review and someone else running their use-case tests,
      as I'm not 100% sure I understand the tfile->detached semantics.
      
      V2: Removed the skb_array_cleanup() call from V1 by request from Jason Wang.
      
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Reviewed-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • thunderx: setup xdp_rxq_info · 27e95e36
      Jesper Dangaard Brouer authored
      This driver uses a bool scheme for "enable"/"disable" when setting up
      different resources.  Thus, the hook points for xdp_rxq_info are both
      handled in the same function, nicvf_rcv_queue_config().  This is activated
      through enable/disable via nicvf_config_data_transfer(), which is tied
      into nicvf_stop()/nicvf_open().
      
      Extend the driver packet handler call-path nicvf_rcv_pkt_handler() with
      a pointer to the given struct rcv_queue, in order to access the
      xdp_rxq_info data area (in nicvf_xdp_rx()).
      
      V2: The driver has no proper error path for a failed XDP RX-queue info reg,
      as nicvf_rcv_queue_config is a void function.
      
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Sunil Goutham <sgoutham@cavium.com>
      Cc: Robert Richter <rric@kernel.org>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • nfp: setup xdp_rxq_info · 7f1c684a
      Jesper Dangaard Brouer authored
      Driver hook points for xdp_rxq_info:
       * reg  : nfp_net_rx_ring_alloc
       * unreg: nfp_net_rx_ring_free
      
      In struct nfp_net_rx_ring, moved member @size into a hole on 64-bit.
      Thus, the size remains the same after adding member @xdp_rxq.
      
      Cc: oss-drivers@netronome.com
      Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
      Cc: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
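The "hole" technique mentioned in the commit above can be shown with two toy structs. This is a sketch, not the real nfp_net_rx_ring layout: the struct and member names are hypothetical, uint64_t members stand in for 64-bit pointers, and the offsets in the comments assume a 64-bit ABI where uint64_t is 8-byte aligned (for example x86-64 or arm64).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Before: a 4-byte member sits between 8-byte members and leaves a hole. */
struct ring_before {
    uint64_t r_vec; /* offset 0 */
    int32_t  qcidx; /* offset 8, followed by a 4-byte hole */
    uint64_t bufs;  /* offset 16 */
    uint32_t size;  /* offset 24, followed by 4 bytes of tail padding */
};

/* After: size moves into the hole, which frees room for a new member. */
struct ring_after {
    uint64_t r_vec; /* offset 0 */
    int32_t  qcidx; /* offset 8 */
    uint32_t size;  /* offset 12, the former hole */
    uint64_t bufs;  /* offset 16 */
    uint64_t extra; /* offset 24, new 8-byte member */
};

/* Number of padding bytes between qcidx and bufs in the old layout. */
static size_t model_hole_bytes(void)
{
    size_t end_of_qcidx = offsetof(struct ring_before, qcidx) + sizeof(int32_t);

    assert(offsetof(struct ring_before, bufs) >= end_of_qcidx);
    return offsetof(struct ring_before, bufs) - end_of_qcidx;
}
```

Under the stated ABI assumption both structs occupy 32 bytes, so the new member is added without growing the struct. On Linux, the pahole tool can report such holes for real kernel structs.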
    • bnxt_en: setup xdp_rxq_info · 96a8604f
      Jesper Dangaard Brouer authored
      Driver hook points for xdp_rxq_info:
       * reg  : bnxt_alloc_rx_rings
       * unreg: bnxt_free_rx_rings
      
      This driver should be updated to re-register when changing the
      allocation mode of the RX rings.
      
      Tested on actual hardware.
      
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Cc: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
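The hook-point pattern used by the driver commits above (register in the RX-ring alloc path, unregister in the free path, and re-register when the ring is rebuilt) can be sketched as a user-space model. All names here (rxq_info_model, rx_ring_model and the model_* functions) are hypothetical, and returning -EBUSY on double registration is a choice made for this model only; the kernel helpers may behave differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-queue info with an explicit registration flag. */
struct rxq_info_model {
    int ifindex;
    uint32_t queue_index;
    bool registered;
};

/* Hypothetical RX ring that owns its per-queue info. */
struct rx_ring_model {
    uint32_t *descs;
    size_t n_descs;
    struct rxq_info_model rxq;
};

/* "reg" may fail, so callers must check the result. */
static int model_rxq_reg(struct rxq_info_model *rxq, int ifindex,
                         uint32_t queue_index)
{
    assert(rxq != NULL);
    if (rxq->registered)
        return -EBUSY; /* model choice: refuse double registration */
    rxq->ifindex = ifindex;
    rxq->queue_index = queue_index;
    rxq->registered = true;
    return 0;
}

static void model_rxq_unreg(struct rxq_info_model *rxq)
{
    assert(rxq != NULL);
    rxq->registered = false;
}

/* Alloc hook point: register first, undo the registration on failure. */
static int model_ring_alloc(struct rx_ring_model *ring, int ifindex,
                            uint32_t queue_index, size_t n_descs)
{
    int err = model_rxq_reg(&ring->rxq, ifindex, queue_index);

    if (err)
        return err;

    ring->descs = calloc(n_descs, sizeof(*ring->descs));
    if (ring->descs == NULL) {
        model_rxq_unreg(&ring->rxq);
        return -ENOMEM;
    }
    ring->n_descs = n_descs;
    return 0;
}

/* Free hook point: release the ring memory and unregister. */
static void model_ring_free(struct rx_ring_model *ring)
{
    free(ring->descs);
    ring->descs = NULL;
    ring->n_descs = 0;
    model_rxq_unreg(&ring->rxq);
}

/* Rebuilding the ring goes through free + alloc, so it re-registers. */
static int model_ring_resize(struct rx_ring_model *ring, size_t n_descs)
{
    int ifindex = ring->rxq.ifindex;
    uint32_t queue_index = ring->rxq.queue_index;

    model_ring_free(ring);
    return model_ring_alloc(ring, ifindex, queue_index, n_descs);
}
```

Routing a resize through the same free and alloc functions keeps the registration state tied to the lifetime of the ring memory, which is the property the driver guidance in the series asks for.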