1. 17 Jan, 2020 5 commits
    • Merge branch 'xdp_redirect-bulking' · ba926603
      Alexei Starovoitov authored
      Toke Høiland-Jørgensen says:
      
      ====================
      Since commit 96360004 ("xdp: Make devmap flush_list common for all map
      instances"), devmap flushing is a global operation instead of tied to a
      particular map. This means that with a bit of refactoring, we can finally fix
      the performance delta between the bpf_redirect_map() and bpf_redirect() helper
      functions, by introducing bulking for the latter as well.
      
      This series makes this change by moving the data structure used for the bulking
      into struct net_device itself, so we can access it even when there is no
      devmap. Once this is done, moving the bpf_redirect() helper to use the bulking
      mechanism becomes quite trivial, and brings bpf_redirect() up to the same
      performance as bpf_redirect_map():
      
                             Before:   After:
      1 CPU:
      bpf_redirect_map:      8.4 Mpps  8.4 Mpps  (no change)
      bpf_redirect:          5.0 Mpps  8.4 Mpps  (+68%)
      2 CPUs:
      bpf_redirect_map:     15.9 Mpps  16.1 Mpps  (+1% or ~no change)
      bpf_redirect:          9.5 Mpps  15.9 Mpps  (+67%)
      
      After this patch series, the only semantic difference between the two variants
      of the bpf_redirect() helper (apart from the absence of a map argument,
      obviously) is that the _map() variant will return an error if passed an invalid
      map index, whereas the bpf_redirect() helper will succeed, but drop packets in
      xdp_do_redirect(). This is because the helper has no reference to the calling
      netdev, so unfortunately we can't do the ifindex lookup directly in the helper.
      
      Changelog:
      
      v3:
        - Switch two more fields to avoid a list_head spanning two cache lines
        - Include Jesper's tracepoint patch
        - Also rename xdp_do_flush_map()
        - Fix a few nits from Maciej
      
      v2:
        - Consolidate code paths and tracepoints for map and non-map redirect variants
          (Björn)
        - Add performance data for 2-CPU test (Jesper)
        - Move fields to avoid shifting cache lines in struct net_device (Eric)
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • devmap: Adjust tracepoint for map-less queue flush · 58aa94f9
      Jesper Dangaard Brouer authored
      Now that we don't have a reference to a devmap when flushing the device
      bulk queue, let's change the devmap_xmit tracepoint to remove the
      map_id and map_index fields entirely. Rearrange the fields so 'drops' and
      'sent' stay in the same position in the tracepoint struct, to make it
      possible for the xdp_monitor utility to read both the old and the new
      format.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/157918768613.1458396.9165902403373826572.stgit@toke.dk
    • xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths · 1d233886
      Toke Høiland-Jørgensen authored
      Since the bulk queue used by XDP_REDIRECT now lives in struct net_device,
      we can re-use the bulking for the non-map version of the bpf_redirect()
      helper. This is a simple matter of having xdp_do_redirect_slow() queue the
      frame on the bulk queue instead of sending it out with __bpf_tx_xdp().
      
      Unfortunately we can't make the bpf_redirect() helper return an error if
      the ifindex doesn't exist (as bpf_redirect_map() does), because we don't
      have a reference to the network namespace of the ingress device at the time
      the helper is called. So we have to leave it as-is and keep the device
      lookup in xdp_do_redirect_slow().
      
      This leaves little reason to keep the non-map redirect code in a
      separate function, so we get rid of the xdp_do_redirect_slow() function
      entirely. This does lose us the tracepoint disambiguation, but fortunately
      the xdp_redirect and xdp_redirect_map tracepoints use the same tracepoint
      entry structures. This means both can contain a map index, so we can just
      amend the tracepoint definitions so we always emit the xdp_redirect(_err)
      tracepoints, but with the map ID only populated if a map is present. This
      means we retire the xdp_redirect_map(_err) tracepoints entirely, but keep
      the definitions around in case someone is still listening for them.
      
      With this change, the performance of the xdp_redirect sample program goes
      from 5Mpps to 8.4Mpps (a 68% increase).
      
      Since the flush functions are no longer map-specific, rename the flush()
      functions to drop _map from their names. One of the renamed functions is
      the xdp_do_flush_map() callback used in all the xdp-enabled drivers. To
      keep from having to update all drivers, use a #define to keep the old name
      working, and only update the virtual drivers in this patch.
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/157918768505.1458396.17518057312953572912.stgit@toke.dk
    • xdp: Move devmap bulk queue into struct net_device · 75ccae62
      Toke Høiland-Jørgensen authored
      Commit 96360004 ("xdp: Make devmap flush_list common for all map
      instances") changed devmap flushing to be a global operation instead of a
      per-map operation. However, the queue structure used for bulking was still
      allocated as part of the containing map.
      
      This patch moves the devmap bulk queue into struct net_device. The
      motivation for this is reusing it for the non-map variant of XDP_REDIRECT,
      which will be changed in a subsequent commit.  To avoid other fields of
      struct net_device moving to different cache lines, we also move a couple of
      other members around.
      
      We defer the actual allocation of the bulk queue structure until the
      NETDEV_REGISTER notification in devmap.c. This makes it possible to check for
      ndo_xdp_xmit support before allocating the structure, which is not possible
      at the time struct net_device is allocated. However, we keep the freeing in
      free_netdev() to avoid adding another RCU callback on NETDEV_UNREGISTER.
      
      Because of this change, we lose the reference back to the map that
      originated the redirect, so change the tracepoint to always return 0 as the
      map ID and index. Otherwise no functional change is intended with this
      patch.
      
      After this patch, the relevant part of struct net_device looks like this,
      according to pahole:
      
      	/* --- cacheline 14 boundary (896 bytes) --- */
      	struct netdev_queue *      _tx __attribute__((__aligned__(64))); /*   896     8 */
      	unsigned int               num_tx_queues;        /*   904     4 */
      	unsigned int               real_num_tx_queues;   /*   908     4 */
      	struct Qdisc *             qdisc;                /*   912     8 */
      	unsigned int               tx_queue_len;         /*   920     4 */
      	spinlock_t                 tx_global_lock;       /*   924     4 */
      	struct xdp_dev_bulk_queue * xdp_bulkq;           /*   928     8 */
      	struct xps_dev_maps *      xps_cpus_map;         /*   936     8 */
      	struct xps_dev_maps *      xps_rxqs_map;         /*   944     8 */
      	struct mini_Qdisc *        miniq_egress;         /*   952     8 */
      	/* --- cacheline 15 boundary (960 bytes) --- */
      	struct hlist_head  qdisc_hash[16];               /*   960   128 */
      	/* --- cacheline 17 boundary (1088 bytes) --- */
      	struct timer_list  watchdog_timer;               /*  1088    40 */
      
      	/* XXX last struct has 4 bytes of padding */
      
      	int                        watchdog_timeo;       /*  1128     4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	struct list_head   todo_list;                    /*  1136    16 */
      	/* --- cacheline 18 boundary (1152 bytes) --- */
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Björn Töpel <bjorn.topel@intel.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/157918768397.1458396.12673224324627072349.stgit@toke.dk
    • libbpf: Revert bpf_helper_defs.h inclusion regression · 20f21d98
      Andrii Nakryiko authored
      Revert bpf_helpers.h's change to include auto-generated bpf_helper_defs.h
      through <> instead of "", which causes it to be searched in include path. This
      can break existing applications that don't have their include path pointing
      directly to where libbpf installs its headers.
      
      There is ongoing work to make all (not just bpf_helper_defs.h) includes more
      consistent across libbpf and its consumers, but this unbreaks user code as-is
      right now without any regressions. Selftests still behave sub-optimally
      (taking bpf_helper_defs.h from libbpf's source directory, if it's present
      there), which will be fixed in subsequent patches.
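      Concretely, the revert restores the quoted include form in bpf_helpers.h, so
      the auto-generated header is again resolved relative to bpf_helpers.h itself
      rather than through the compiler's -I search path (fragment for illustration;
      see the linked commit for the actual hunk):

```c
/* before (6910d7d3): found only via the -I include path */
#include <bpf_helper_defs.h>

/* after the revert: found next to bpf_helpers.h itself */
#include "bpf_helper_defs.h"
```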
      
      Fixes: 6910d7d3 ("selftests/bpf: Ensure bpf_helper_defs.h are taken from selftests dir")
      Reported-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200117004103.148068-1-andriin@fb.com
  2. 16 Jan, 2020 3 commits
  3. 15 Jan, 2020 22 commits
  4. 14 Jan, 2020 9 commits
  5. 10 Jan, 2020 1 commit
    • selftests/bpf: Add BPF_PROG, BPF_KPROBE, and BPF_KRETPROBE macros · ac065870
      Andrii Nakryiko authored
      Streamline the BPF_TRACE_x macro by moving the return type and section
      attribute definition out of the macro itself. That makes those functions look
      similar to other BPF programs in source code. Additionally, simplify its usage
      by determining the number of arguments automatically (so there is just a
      single BPF_TRACE instead of a family of BPF_TRACE_1, BPF_TRACE_2, etc). Also,
      allow more natural function argument syntax without commas between argument
      type and name.
      
      Given this helper is useful not only for tracing tp_btf/fentry/fexit programs,
      but could be used for LSM programs and others following the same pattern,
      rename BPF_TRACE macro into more generic BPF_PROG. Existing BPF_TRACE_x
      usages in selftests are converted to new BPF_PROG macro.
      
      Following the same pattern, define BPF_KPROBE and BPF_KRETPROBE macros for
      nicer usage of kprobe/kretprobe arguments, respectively. BPF_KRETPROBE adopts
      the same convention used by fexit programs: the last defined argument is the
      probed function's return result.
      
      v4->v5:
      - fix test_overhead test (__set_task_comm is void) (Alexei);
      
      v3->v4:
        - rebased and fixed one more BPF_TRACE_x occurrence (Alexei);
      
      v2->v3:
      - rename to shorter and as generic BPF_PROG (Alexei);
      
      v1->v2:
      - verified GCC handles pragmas as expected;
      - added descriptions to macros;
      - converted new STRUCT_OPS selftest to BPF_HANDLER (worked as expected);
      - added original context as 'ctx' parameter, for cases where it has to be
        passed into BPF helpers. This might cause an accidental naming collision,
        unfortunately, but at least it's easy to work around. Fortunately, this
        situation produces quite legible compilation error:
      
      progs/bpf_dctcp.c:46:6: error: redefinition of 'ctx' with a different type: 'int' vs 'unsigned long long *'
              int ctx = 123;
                  ^
      progs/bpf_dctcp.c:42:6: note: previous definition is here
      void BPF_HANDLER(dctcp_init, struct sock *sk)
           ^
      ./bpf_trace_helpers.h:58:32: note: expanded from macro 'BPF_HANDLER'
      ____##name(unsigned long long *ctx, ##args)
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20200110211634.1614739-1-andriin@fb.com