1. 16 Dec, 2019 10 commits
    • libbpf: Fix build by renaming variables · a79ac2d1
      Prashant Bhole authored
      In btf__align_of() variable name 't' is shadowed by inner block
      declaration of another variable with same name. Patch renames
      variables in order to fix it.
      
        CC       sharedobjs/btf.o
      btf.c: In function ‘btf__align_of’:
      btf.c:303:21: error: declaration of ‘t’ shadows a previous local [-Werror=shadow]
        303 |   int i, align = 1, t;
            |                     ^
      btf.c:283:25: note: shadowed declaration is here
        283 |  const struct btf_type *t = btf__type_by_id(btf, id);
            |
      
      Fixes: 3d208f4c ("libbpf: Expose btf__align_of() API")
      Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/bpf/20191216082738.28421-1-prashantbhole.linux@gmail.com
      a79ac2d1
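The shadowing error in the log above can be reproduced outside libbpf. The following is a minimal sketch of the renaming approach, not the actual patch: `struct item` and `item_align()` are hypothetical names invented for illustration. The point is only that the inner-loop value gets its own name instead of reusing `t`.

```c
#include <stddef.h>

/* Hypothetical type, for illustration only (not a libbpf structure). */
struct item {
	int nr_members;
	const int *member_aligns;
};

/*
 * With -Werror=shadow, declaring a second variable named 't' inside the
 * function body while 't' is already in scope fails to compile. Using
 * distinct names ('t' for the item, 'align' for the per-member value)
 * avoids the clash.
 */
int item_align(const struct item *items, size_t idx)
{
	const struct item *t = &items[idx];
	int i, max_align = 1, align;

	for (i = 0; i < t->nr_members; i++) {
		align = t->member_aligns[i];
		if (align > max_align)
			max_align = align;
	}
	return max_align;
}
```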
    • Merge branch 'support-flex-arrays' · 0849e102
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      Add support for flexible array accesses in a relocatable manner in BPF CO-RE.
      It's a typical pattern in C, and kernel in particular, to provide
      a fixed-length struct with zero-sized or dimensionless array at the end. In
      such cases variable-sized array contents follows immediately after the end of
      a struct. This patch set adds support for such access pattern by allowing
      accesses to such arrays.
      
      Patch #1 adds libbpf support. Patch #2 adds a few test cases for validation.
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      0849e102
    • selftests/bpf: Add flexible array relocation tests · 5f2eecef
      Andrii Nakryiko authored
      Add a few tests validating CO-RE relocation handling of flexible array accesses.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191215070844.1014385-3-andriin@fb.com
      5f2eecef
    • libbpf: Support flexible arrays in CO-RE · 1b484b30
      Andrii Nakryiko authored
      Some data structures in kernel are defined with either zero-sized array or
      flexible (dimensionless) array at the end of a struct. Actual data of such
      array follows in memory immediately after the end of that struct, forming its
      variable-sized "body" of elements. Support such access pattern in CO-RE
      relocation handling.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191215070844.1014385-2-andriin@fb.com
      1b484b30
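The layout described in this commit (a fixed-size struct whose array "body" follows it in memory) looks like this in plain userspace C. This is only an illustrative sketch: `struct packet` and `packet_new()` are hypothetical names, and nothing here exercises CO-RE relocations themselves.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct: fixed header followed by a flexible array member. */
struct packet {
	unsigned int len;
	unsigned char data[];	/* contents follow the struct in memory */
};

/* Allocate the header plus 'len' payload bytes in one contiguous block. */
struct packet *packet_new(const unsigned char *src, unsigned int len)
{
	struct packet *p = malloc(sizeof(*p) + len);

	if (!p)
		return NULL;
	p->len = len;
	memcpy(p->data, src, len);
	return p;
}
```

Accesses such as `p->data[i]` reach past `sizeof(struct packet)`, which is the access pattern the relocation support has to accommodate.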
    • Merge branch 'extern-var-support' · 01c6f7aa
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      It's often important for BPF program to know kernel version or some specific
      config values (e.g., CONFIG_HZ to convert jiffies to seconds) and change or
      adjust program logic based on their values. As of today, any such need has to
      be resolved by recompiling BPF program for specific kernel and kernel
      configuration. In practice this is usually achieved by using BCC and its
      embedded LLVM/Clang. With such a setup, #ifdef CONFIG_XXX and similar
      compile-time constructs make it possible to deal with kernel varieties.
      
      With CO-RE (Compile Once – Run Everywhere) approach, this is not an option,
      unfortunately. All such logic variations have to be done as normal
      C language constructs (i.e., if/else, variables, etc), not preprocessor
      directives. This patch series adds support for such advanced scenarios through
      C extern variables. These extern variables will be recognized by libbpf and
      supplied through extra .extern internal map, similarly to global data. This
      .extern map is read-only, which allows BPF verifier to track its content
      precisely as constants. That gives an opportunity to have pre-compiled BPF
      program, which can potentially use BPF functionality (e.g., BPF helpers) or
      kernel features (types, fields, etc), that are available only on a subset of
      targeted kernels, while effectively eliminating (through verifier's dead code
      detection) such unsupported functionality for other kernels (typically, older
      versions). Patch #4 explicitly tests a scenario of using unsupported BPF
      helper, to validate the approach.
      
      This patch set heavily relies on BTF type information emitted by compiler for
      each extern variable declaration. Based on specific types, libbpf does strict
      checks of config data values correctness. See patch #2 for details.
      
      Outline of the patch set:
      - patch #1 does a small clean up of internal map names constants;
      - patch #2 adds all of the libbpf internal machinery for externs support,
        including setting up BTF information for .extern data section;
      - patch #3 adds support for .extern into BPF skeleton;
      - patch #4 adds externs selftests, as well as enhances test_skeleton.c test to
        validate mmap()-ed .extern datasection functionality.
      
      v3->v4:
      - clean up copyrights and rebase onto latest skeleton patches (Alexei);
      
      v2->v3:
      - truncate too long strings (Alexei);
      - clean ups, adding comments (Alexei);
      
      v1->v2:
      - use BTF type information for externs (Alexei);
      - add strings support;
      - add BPF skeleton support for .extern.
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      01c6f7aa
    • selftests/bpf: Add tests for libbpf-provided externs · 330a73a7
      Andrii Nakryiko authored
      Add a set of tests validating libbpf-provided extern variables. One crucial
      feature that's tested is dead code elimination together with using invalid BPF
      helper. CONFIG_MISSING is not supposed to exist and should always be specified
      by libbpf as zero, which allows BPF verifier to correctly do branch pruning
      and not fail validation, when invalid BPF helper is called from dead if branch.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191214014710.3449601-5-andriin@fb.com
      330a73a7
    • bpftool: Generate externs datasec in BPF skeleton · 2ad97d47
      Andrii Nakryiko authored
      Add support for generation of mmap()-ed read-only view of libbpf-provided
      extern variables. As externs are not supposed to be provided by user code
      (that's what .data, .bss, and .rodata are for), don't mmap() it initially. Only
      after skeleton load is performed, map .extern contents as read-only memory.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191214014710.3449601-4-andriin@fb.com
      2ad97d47
    • libbpf: Support libbpf-provided extern variables · 166750bc
      Andrii Nakryiko authored
      Add support for extern variables, provided to BPF program by libbpf. Currently
      the following extern variables are supported:
        - LINUX_KERNEL_VERSION; version of a kernel in which BPF program is
          executing, follows KERNEL_VERSION() macro convention, can be 4- and 8-byte
          long;
        - CONFIG_xxx values; a set of values of actual kernel config. Tristate,
          boolean, strings, and integer values are supported.
      
      Set of possible values is determined by declared type of extern variable.
      Supported types of variables are:
      - Tristate values. Are represented as `enum libbpf_tristate`. Accepted values
        are **strictly** 'y', 'n', or 'm', which are represented as TRI_YES, TRI_NO,
        or TRI_MODULE, respectively.
      - Boolean values. Are represented as bool (_Bool) types. Accepted values are
        'y' and 'n' only, turning into true/false values, respectively.
      - Single-character values. Can be used both as a substitute for
        bool/tristate, or as a small-range integer:
        - 'y'/'n'/'m' are represented as is, as characters 'y', 'n', or 'm';
        - integers in a range [-128, 127] or [0, 255] (depending on signedness of
          char in target architecture) are recognized and represented with
          respective values of char type.
      - Strings. String values are declared as fixed-length char arrays. String of
        up to that length will be accepted and put in first N bytes of char array,
        with the rest of bytes zeroed out. If config string value is longer than
        space allotted, it will be truncated and warning message emitted. Char array
        is always zero terminated. String literals in config have to be enclosed in
        double quotes, just like C-style string literals.
      - Integers. 8-, 16-, 32-, and 64-bit integers are supported, both signed and
        unsigned variants. Libbpf enforces parsed config value to be in the
        supported range of corresponding integer type. Integer values in config can
        be:
        - decimal integers, with optional + and - signs;
        - hexadecimal integers, prefixed with 0x or 0X;
        - octal integers, starting with 0.
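The integer rules listed above (decimal, hexadecimal, and octal input, range-checked against the declared type) can be sketched in userspace C as follows. This is a hedged illustration, not libbpf's implementation: `parse_config_int()` is a hypothetical helper, and because it relies on `strtoll()` it rejects unsigned 64-bit values above `LLONG_MAX`, a limitation the real code may not share.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Parse a config integer (decimal with optional sign, 0x/0X hex, or
 * leading-0 octal) and check that it fits an integer of 'size' bytes
 * (1, 2, 4 or 8) with the given signedness.
 * Hypothetical helper for illustration; not libbpf's actual code.
 */
bool parse_config_int(const char *s, int size, bool is_signed, long long *out)
{
	char *end;
	long long v;

	errno = 0;
	v = strtoll(s, &end, 0);	/* base 0 auto-detects 0x and 0 prefixes */
	if (errno || end == s || *end != '\0')
		return false;

	if (size < 8) {
		long long max = is_signed ? (1LL << (size * 8 - 1)) - 1
					  : (1LL << (size * 8)) - 1;
		long long min = is_signed ? -max - 1 : 0;

		if (v < min || v > max)
			return false;
	} else if (!is_signed && v < 0) {
		return false;
	}

	*out = v;
	return true;
}
```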
      
      Config file itself is searched in /boot/config-$(uname -r) location with
      fallback to /proc/config.gz, unless config path is specified explicitly
      through bpf_object_open_opts' kernel_config_path option. Both gzipped and
      plain text formats are supported. Libbpf adds explicit dependency on zlib
      because of this, but this shouldn't be a problem, given libelf already depends
      on zlib.
      
      All detected extern variables are put into a separate .extern internal map.
      It, similarly to .rodata map, is marked as read-only from BPF program side, as
      well as is frozen on load. This allows BPF verifier to track extern values as
      constants and perform enhanced branch prediction and dead code elimination.
      This can be relied upon for doing kernel version/feature detection and using
      potentially unsupported field relocations or BPF helpers in a CO-RE-based BPF
      program, while still having a single version of BPF program running on old and
      new kernels. Selftests validate this explicitly for a nonexistent BPF
      helper.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191214014710.3449601-3-andriin@fb.com
      166750bc
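To make the version/feature detection pattern from this commit concrete, here is a userspace sketch of the kind of branch a BPF program might take. Several things are assumptions: in a real BPF program the two inputs would be `extern` variables supplied by libbpf rather than function parameters, the `enum tristate` values are assumed to mirror `enum libbpf_tristate`, and `pick_code_path()` and the 5.5.0 threshold are hypothetical.

```c
/* Same encoding as the kernel's KERNEL_VERSION() macro convention. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Assumed to mirror libbpf's enum libbpf_tristate; values are an assumption. */
enum tristate { TRI_NO = 0, TRI_YES = 1, TRI_MODULE = 2 };

/*
 * Plain parameters stand in for the libbpf-provided externs so that the
 * sketch runs in userspace. Because the real values are read-only
 * constants, the verifier can prune whichever branch is not taken.
 */
int pick_code_path(unsigned int linux_kernel_version, enum tristate feature)
{
	if (linux_kernel_version >= KERNEL_VERSION(5, 5, 0) && feature != TRI_NO)
		return 1;	/* path that may use newer functionality */
	return 0;		/* fallback path for older kernels */
}
```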
    • libbpf: Extract internal map names into constants · ac9d1389
      Andrii Nakryiko authored
      Instead of duplicating string literals, keep them in one place and consistent.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191214014710.3449601-2-andriin@fb.com
      ac9d1389
    • Merge branch 'bpf-obj-skel' · f7c0bbf2
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      This patch set introduces an alternative and complementary to existing libbpf
      API interface for working with BPF objects, maps, programs, and global data
      from userspace side. This approach is relying on code generation. bpftool
      produces a struct (a.k.a. skeleton) tailored and specific to provided BPF
      object file. It includes hard-coded fields and data structures for every map,
      program, link, and global data present.
      
      Altogether this approach significantly reduces amount of userspace boilerplate
      code required to open, load, attach, and work with BPF objects. It improves
      attach/detach story, by providing pre-allocated space for bpf_links, and
      ensuring they are properly detached on shutdown. It removes the need for
      by-name/title lookups of maps and programs, because libbpf's skeleton API, in
      conjunction with generated code from bpftool, is filling in hard-coded fields
      with actual pointers to corresponding struct bpf_map/bpf_program/bpf_link.
      
      Also, thanks to BPF array mmap() support, working with global data (variables)
      from userspace is now as natural as it is from BPF side: each variable is just
      a struct field inside skeleton struct. Furthermore, this gives userspace
      a natural way to pre-initialize global data (including
      previously impossible to initialize .rodata) by just assigning values to the
      same per-variable fields. Libbpf will carefully take into account this
      initialization image, will use it to pre-populate BPF maps at creation time,
      and will re-mmap() BPF map's contents at exactly the same userspace memory
      address such that it can continue working with all the same pointers without
      any interruptions. If kernel doesn't support mmap(), global data will still be
      successfully initialized, but after map creation global data structures inside
      skeleton will be NULL-ed out. This allows userspace application to gracefully
      handle lack of mmap() support, if necessary.
      
      A bunch of selftests are also converted to using skeletons, demonstrating
      significant simplification of userspace part of test and reduction in amount
      of code necessary.
      
      v3->v4:
      - add OPTS_VALID check to btf_dump__emit_type_decl (Alexei);
      - expose skeleton as LIBBPF_API functions (Alexei);
      - copyright clean up, update internal map init refactor (Alexei);
      
      v2->v3:
      - make skeleton part of public API;
      - expose btf_dump__emit_type_decl and btf__align_of APIs;
      - move LIBBPF_API and DECLARE_LIBBPF_OPTS into libbpf_common.h for reuse;
      
      v1->v2:
      - checkpatch.pl and reverse Christmas tree styling (Jakub);
      - sanitize variable names to accommodate in-function static vars;
      
      rfc->v1:
      - runqslower moved out into separate patch set waiting for vmlinux.h
        improvements;
      - skeleton generation code deals with unknown internal maps more gracefully.
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      f7c0bbf2
  2. 15 Dec, 2019 22 commits
  3. 13 Dec, 2019 8 commits
    • Stanislav Fomichev's avatar
      a06bf42f
    • bpf: Expose __sk_buff wire_len/gso_segs to BPF_PROG_TEST_RUN · 850a88cc
      Stanislav Fomichev authored
      wire_len should not be less than real len and is capped by GSO_MAX_SIZE.
      gso_segs is capped by GSO_MAX_SEGS.
      
      v2:
      * set wire_len to skb->len when passed wire_len is 0 (Alexei Starovoitov)
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191213223028.161282-1-sdf@google.com
      850a88cc
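The bounds described in this commit can be written as a small validation routine. This is a sketch under stated assumptions, not the kernel code: `check_test_run_skb()` is a hypothetical name, and the two limit values are what these kernel constants are believed to have been at the time, so treat them as illustrative.

```c
#include <stdint.h>

/* Assumed values of the kernel limits at the time; illustrative only. */
#define GSO_MAX_SIZE 65536u
#define GSO_MAX_SEGS 65535u

/*
 * Returns 0 on success (normalizing *wire_len if it was 0),
 * or -1 if a value is out of range.
 */
int check_test_run_skb(uint32_t len, uint32_t *wire_len, uint32_t gso_segs)
{
	if (*wire_len == 0)
		*wire_len = len;	/* v2 behaviour: default to the real length */
	else if (*wire_len < len || *wire_len > GSO_MAX_SIZE)
		return -1;

	if (gso_segs > GSO_MAX_SEGS)
		return -1;

	return 0;
}
```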
    • Merge branch 'bpf-dispatcher' · 02620d9e
      Alexei Starovoitov authored
      Björn Töpel says:
      
      ====================
      Overview
      ========
      
      This is the 6th iteration of the series that introduces the BPF
      dispatcher, which is a mechanism to avoid indirect calls.
      
      The BPF dispatcher is a multi-way branch code generator, targeted for
      BPF programs. E.g. when an XDP program is executed via the
      bpf_prog_run_xdp(), it is invoked via an indirect call. With
      retpolines enabled, the indirect call has a substantial performance
      impact. The dispatcher is a mechanism that transforms indirect calls to
      direct calls, and therefore avoids the retpoline. The dispatcher is
      generated using the BPF JIT, and relies on text poking provided by
      bpf_arch_text_poke().
      
      The dispatcher hijacks a trampoline function via the __fentry__ nop
      of the trampoline. One dispatcher instance currently supports up to 48
      dispatch points. This can be extended in the future.
      
      In this series, only one dispatcher instance is supported, and the
      only user is XDP. The dispatcher is updated when an XDP program is
      attached/detached to/from a netdev. An alternative to this could have
      been to update the dispatcher at program load point, but as there are
      usually more XDP programs loaded than attached, updating at attach time
      was picked.
      
      The XDP dispatcher is always enabled, if available, because it helps
      even when retpolines are disabled. Please refer to the "Performance"
      section below.
      
      The first patch refactors the image allocation from the BPF trampoline
      code. Patch two introduces the dispatcher, and patch three adds a
      dispatcher for XDP, and wires up the XDP control-/ fast-path. Patch
      four adds the dispatcher to BPF_TEST_RUN. Patch five adds a simple
      selftest, and the last adds alignment to jump targets.
      
      I have rebased the series on commit 679152d3 ("libbpf: Fix printf
      compilation warnings on ppc64le arch").
      
      Generated code, x86-64
      ======================
      
      The dispatcher currently has a maximum of 48 entries, where one entry
      is a unique BPF program. Multiple users of a dispatcher instance using
      the same BPF program will share that entry.
      
      The program/slot lookup is performed by a binary search, O(log
      n). Let's have a look at the generated code.
      
      The trampoline function has the following signature:
      
        unsigned int tramp(const void *ctx,
                           const struct bpf_insn *insnsi,
                           unsigned int (*bpf_func)(const void *,
                                                    const struct bpf_insn *))
      
      On Intel x86-64 this means that rdx will contain the bpf_func. To
      make it easier to read, I've let the BPF programs have the following
      range: 0xffffffffffffffff (-1) to 0xfffffffffffffff0
      (-16). 0xffffffff81c00f10 is the retpoline thunk, in this case
      __x86_indirect_thunk_rdx. If retpolines are disabled the thunk will be
      a regular indirect call.
      
      The minimal dispatcher will then look like this:
      
      ffffffffc0002000: cmp    rdx,0xffffffffffffffff
      ffffffffc0002007: je     0xffffffffffffffff ; -1
      ffffffffc000200d: jmp    0xffffffff81c00f10
      
      A 16 entry dispatcher looks like this:
      
      ffffffffc0020000: cmp    rdx,0xfffffffffffffff7 ; -9
      ffffffffc0020007: jg     0xffffffffc0020130
      ffffffffc002000d: cmp    rdx,0xfffffffffffffff3 ; -13
      ffffffffc0020014: jg     0xffffffffc00200a0
      ffffffffc002001a: cmp    rdx,0xfffffffffffffff1 ; -15
      ffffffffc0020021: jg     0xffffffffc0020060
      ffffffffc0020023: cmp    rdx,0xfffffffffffffff0 ; -16
      ffffffffc002002a: jg     0xffffffffc0020040
      ffffffffc002002c: cmp    rdx,0xfffffffffffffff0 ; -16
      ffffffffc0020033: je     0xfffffffffffffff0 ; -16
      ffffffffc0020039: jmp    0xffffffff81c00f10
      ffffffffc002003e: xchg   ax,ax
      ffffffffc0020040: cmp    rdx,0xfffffffffffffff1 ; -15
      ffffffffc0020047: je     0xfffffffffffffff1 ; -15
      ffffffffc002004d: jmp    0xffffffff81c00f10
      ffffffffc0020052: nop    DWORD PTR [rax+rax*1+0x0]
      ffffffffc002005a: nop    WORD PTR [rax+rax*1+0x0]
      ffffffffc0020060: cmp    rdx,0xfffffffffffffff2 ; -14
      ffffffffc0020067: jg     0xffffffffc0020080
      ffffffffc0020069: cmp    rdx,0xfffffffffffffff2 ; -14
      ffffffffc0020070: je     0xfffffffffffffff2 ; -14
      ffffffffc0020076: jmp    0xffffffff81c00f10
      ffffffffc002007b: nop    DWORD PTR [rax+rax*1+0x0]
      ffffffffc0020080: cmp    rdx,0xfffffffffffffff3 ; -13
      ffffffffc0020087: je     0xfffffffffffffff3 ; -13
      ffffffffc002008d: jmp    0xffffffff81c00f10
      ffffffffc0020092: nop    DWORD PTR [rax+rax*1+0x0]
      ffffffffc002009a: nop    WORD PTR [rax+rax*1+0x0]
      ffffffffc00200a0: cmp    rdx,0xfffffffffffffff5 ; -11
      ffffffffc00200a7: jg     0xffffffffc00200f0
      ffffffffc00200a9: cmp    rdx,0xfffffffffffffff4 ; -12
      ffffffffc00200b0: jg     0xffffffffc00200d0
      ffffffffc00200b2: cmp    rdx,0xfffffffffffffff4 ; -12
      ffffffffc00200b9: je     0xfffffffffffffff4 ; -12
      ffffffffc00200bf: jmp    0xffffffff81c00f10
      ffffffffc00200c4: nop    DWORD PTR [rax+rax*1+0x0]
      ffffffffc00200cc: nop    DWORD PTR [rax+0x0]
      ffffffffc00200d0: cmp    rdx,0xfffffffffffffff5 ; -11
      ffffffffc00200d7: je     0xfffffffffffffff5 ; -11
      ffffffffc00200dd: jmp    0xffffffff81c00f10
      ffffffffc00200e2: nop    DWORD PTR [rax+rax*1+0x0]
      ffffffffc00200ea: nop    WORD PTR [rax+rax*1+0x0]
      ffffffffc00200f0: cmp    rdx,0xfffffffffffffff6 ; -10
      ffffffffc00200f7: jg     0xffffffffc0020110
      ffffffffc00200f9: cmp    rdx,0xfffffffffffffff6 ; -10
      ffffffffc0020100: je     0xfffffffffffffff6 ; -10
      ffffffffc0020106: jmp    0xffffffff81c00f10
      ffffffffc002010b: nop    DWORD PTR [rax+rax*1+0x0]
      ffffffffc0020110: cmp    rdx,0xfffffffffffffff7 ; -9
      ffffffffc0020117: je     0xfffffffffffffff7 ; -9
      ffffffffc002011d: jmp    0xffffffff81c00f10
      ffffffffc0020122: nop    DWORD PTR [rax+rax*1+0x0]
      ffffffffc002012a: nop    WORD PTR [rax+rax*1+0x0]
      ffffffffc0020130: cmp    rdx,0xfffffffffffffffb ; -5
      ffffffffc0020137: jg     0xffffffffc00201d0
      ffffffffc002013d: cmp    rdx,0xfffffffffffffff9 ; -7
      ffffffffc0020144: jg     0xffffffffc0020190
      ffffffffc0020146: cmp    rdx,0xfffffffffffffff8 ; -8
      ffffffffc002014d: jg     0xffffffffc0020170
      ffffffffc002014f: cmp    rdx,0xfffffffffffffff8 ; -8
      ffffffffc0020156: je     0xfffffffffffffff8 ; -8
      ffffffffc002015c: jmp    0xffffffff81c00f10
      ffffffffc0020161: nop    DWORD PTR [rax+rax*1+0x0]
      ffffffffc0020169: nop    DWORD PTR [rax+0x0]
      ffffffffc0020170: cmp    rdx,0xfffffffffffffff9 ; -7
      ffffffffc0020177: je     0xfffffffffffffff9 ; -7
      ffffffffc002017d: jmp    0xffffffff81c00f10
      ffffffffc0020182: nop    DWORD PTR [rax+rax*1+0x0]
      ffffffffc002018a: nop    WORD PTR [rax+rax*1+0x0]
      ffffffffc0020190: cmp    rdx,0xfffffffffffffffa ; -6
      ffffffffc0020197: jg     0xffffffffc00201b0
      ffffffffc0020199: cmp    rdx,0xfffffffffffffffa ; -6
      ffffffffc00201a0: je     0xfffffffffffffffa ; -6
      ffffffffc00201a6: jmp    0xffffffff81c00f10
      ffffffffc00201ab: nop    DWORD PTR [rax+rax*1+0x0]
      ffffffffc00201b0: cmp    rdx,0xfffffffffffffffb ; -5
      ffffffffc00201b7: je     0xfffffffffffffffb ; -5
      ffffffffc00201bd: jmp    0xffffffff81c00f10
      ffffffffc00201c2: nop    DWORD PTR [rax+rax*1+0x0]
      ffffffffc00201ca: nop    WORD PTR [rax+rax*1+0x0]
      ffffffffc00201d0: cmp    rdx,0xfffffffffffffffd ; -3
      ffffffffc00201d7: jg     0xffffffffc0020220
      ffffffffc00201d9: cmp    rdx,0xfffffffffffffffc ; -4
      ffffffffc00201e0: jg     0xffffffffc0020200
      ffffffffc00201e2: cmp    rdx,0xfffffffffffffffc ; -4
      ffffffffc00201e9: je     0xfffffffffffffffc ; -4
      ffffffffc00201ef: jmp    0xffffffff81c00f10
      ffffffffc00201f4: nop    DWORD PTR [rax+rax*1+0x0]
      ffffffffc00201fc: nop    DWORD PTR [rax+0x0]
      ffffffffc0020200: cmp    rdx,0xfffffffffffffffd ; -3
      ffffffffc0020207: je     0xfffffffffffffffd ; -3
      ffffffffc002020d: jmp    0xffffffff81c00f10
      ffffffffc0020212: nop    DWORD PTR [rax+rax*1+0x0]
      ffffffffc002021a: nop    WORD PTR [rax+rax*1+0x0]
      ffffffffc0020220: cmp    rdx,0xfffffffffffffffe ; -2
      ffffffffc0020227: jg     0xffffffffc0020240
      ffffffffc0020229: cmp    rdx,0xfffffffffffffffe ; -2
      ffffffffc0020230: je     0xfffffffffffffffe ; -2
      ffffffffc0020236: jmp    0xffffffff81c00f10
      ffffffffc002023b: nop    DWORD PTR [rax+rax*1+0x0]
      ffffffffc0020240: cmp    rdx,0xffffffffffffffff ; -1
      ffffffffc0020247: je     0xffffffffffffffff ; -1
      ffffffffc002024d: jmp    0xffffffff81c00f10
      
      The nops are there to align jump targets to 16 B.
      
      Performance
      ===========
      
      The tests were performed using the xdp_rxq_info sample program with
      the following command-line:
      
      1. XDP_DRV:
        # xdp_rxq_info --dev eth0 --action XDP_DROP
      2. XDP_SKB:
        # xdp_rxq_info --dev eth0 -S --action XDP_DROP
      3. xdp-perf, from selftests/bpf:
        # test_progs -v -t xdp_perf
      
      Run with mitigations=auto
      -------------------------
      
      Baseline:
      1. 21.7 Mpps (21736190)
      2. 3.8 Mpps   (3837582)
      3. 15 ns
      
      Dispatcher:
      1. 30.2 Mpps (30176320)
      2. 4.0 Mpps   (4015579)
      3. 5 ns
      
      Dispatcher (full; walk all entries, and fallback):
      1. 22.0 Mpps (21986704)
      2. 3.8 Mpps   (3831298)
      3. 17 ns
      
      Run with mitigations=off
      ------------------------
      
      Baseline:
      1. 29.9 Mpps (29875135)
      2. 4.1 Mpps   (4100179)
      3. 4 ns
      
      Dispatcher:
      1. 30.4 Mpps (30439241)
      2. 4.1 Mpps   (4109350)
      3. 4 ns
      
      Dispatcher (full; walk all entries, and fallback):
      1. 28.9 Mpps (28903269)
      2. 4.1 Mpps   (4080078)
      3. 5 ns
      
      xdp-perf runs, aligned vs non-aligned jump targets
      --------------------------------------------------
      
      In this test dispatchers of different sizes, with and without jump
      target alignment, were exercised. As outlined above the function
      lookup is performed via binary search. This means that depending on
      the pointer value of the function, it can reside in the upper or lower
      part of the search table. The performed tests were:
      
      1. aligned, mitigations=auto, function entry < other entries
      2. aligned, mitigations=auto, function entry > other entries
      3. non-aligned, mitigations=auto, function entry < other entries
      4. non-aligned, mitigations=auto, function entry > other entries
      5. aligned, mitigations=off, function entry < other entries
      6. aligned, mitigations=off, function entry > other entries
      7. non-aligned, mitigations=off, function entry < other entries
      8. non-aligned, mitigations=off, function entry > other entries
      
      The micro benchmarks showed that alignment of jump target has some
      positive impact.
      
      A reply to this cover letter will contain complete data for all runs.
      
      Multiple xdp-perf baseline with mitigations=auto
      ------------------------------------------------
      
       Performance counter stats for './test_progs -v -t xdp_perf' (1024 runs):
      
                   16.69 msec task-clock                #    0.984 CPUs utilized            ( +-  0.08% )
                       2      context-switches          #    0.123 K/sec                    ( +-  1.11% )
                       0      cpu-migrations            #    0.000 K/sec                    ( +- 70.68% )
                      97      page-faults               #    0.006 M/sec                    ( +-  0.05% )
              49,254,635      cycles                    #    2.951 GHz                      ( +-  0.09% )  (12.28%)
              42,138,558      instructions              #    0.86  insn per cycle           ( +-  0.02% )  (36.15%)
               7,315,291      branches                  #  438.300 M/sec                    ( +-  0.01% )  (59.43%)
               1,011,201      branch-misses             #   13.82% of all branches          ( +-  0.01% )  (83.31%)
              15,440,788      L1-dcache-loads           #  925.143 M/sec                    ( +-  0.00% )  (99.40%)
                  39,067      L1-dcache-load-misses     #    0.25% of all L1-dcache hits    ( +-  0.04% )
                   6,531      LLC-loads                 #    0.391 M/sec                    ( +-  0.05% )
                     442      LLC-load-misses           #    6.76% of all LL-cache hits     ( +-  0.77% )
         <not supported>      L1-icache-loads
                  57,964      L1-icache-load-misses                                         ( +-  0.06% )
              15,442,496      dTLB-loads                #  925.246 M/sec                    ( +-  0.00% )
                     514      dTLB-load-misses          #    0.00% of all dTLB cache hits   ( +-  0.73% )  (40.57%)
                     130      iTLB-loads                #    0.008 M/sec                    ( +-  2.75% )  (16.69%)
           <not counted>      iTLB-load-misses                                              ( +-  8.71% )  (0.60%)
         <not supported>      L1-dcache-prefetches
         <not supported>      L1-dcache-prefetch-misses
      
               0.0169558 +- 0.0000127 seconds time elapsed  ( +-  0.07% )
      
      Multiple xdp-perf dispatcher with mitigations=auto
      --------------------------------------------------
      
      Note that this includes generating the dispatcher.
      
       Performance counter stats for './test_progs -v -t xdp_perf' (1024 runs):
      
                    4.80 msec task-clock                #    0.953 CPUs utilized            ( +-  0.06% )
                       1      context-switches          #    0.258 K/sec                    ( +-  1.57% )
                       0      cpu-migrations            #    0.000 K/sec
                      97      page-faults               #    0.020 M/sec                    ( +-  0.05% )
              14,185,861      cycles                    #    2.955 GHz                      ( +-  0.17% )  (50.49%)
              45,691,935      instructions              #    3.22  insn per cycle           ( +-  0.01% )  (99.19%)
               8,346,008      branches                  # 1738.709 M/sec                    ( +-  0.00% )
                  13,046      branch-misses             #    0.16% of all branches          ( +-  0.10% )
              15,443,735      L1-dcache-loads           # 3217.365 M/sec                    ( +-  0.00% )
                  39,585      L1-dcache-load-misses     #    0.26% of all L1-dcache hits    ( +-  0.05% )
                   7,138      LLC-loads                 #    1.487 M/sec                    ( +-  0.06% )
                     671      LLC-load-misses           #    9.40% of all LL-cache hits     ( +-  0.73% )
         <not supported>      L1-icache-loads
                  56,213      L1-icache-load-misses                                         ( +-  0.08% )
              15,443,735      dTLB-loads                # 3217.365 M/sec                    ( +-  0.00% )
           <not counted>      dTLB-load-misses                                              (0.00%)
           <not counted>      iTLB-loads                                                    (0.00%)
           <not counted>      iTLB-load-misses                                              (0.00%)
         <not supported>      L1-dcache-prefetches
         <not supported>      L1-dcache-prefetch-misses
      
              0.00503705 +- 0.00000546 seconds time elapsed  ( +-  0.11% )
      
      Revisions
      =========
      
      v4->v5: [1]
        * Fixed s/xdp_ctx/ctx/ typo (Toke)
        * Marked dispatcher trampoline with noinline attribute (Alexei)
      
      v3->v4: [2]
        * Moved away from doing dispatcher lookup based on the trampoline
          function, to a model where the dispatcher instance is explicitly
          passed to the bpf_dispatcher_change_prog() (Alexei)
      
      v2->v3: [3]
        * Removed xdp_call, and instead make the dispatcher available to all
          XDP users via bpf_prog_run_xdp() and dev_xdp_install(). (Toke)
        * Always enable the dispatcher, if available (Alexei)
        * Reuse BPF trampoline image allocator (Alexei)
        * Make sure the dispatcher is exercised in selftests (Alexei)
        * Only allow one dispatcher, and wire it to XDP
      
      v1->v2: [4]
        * Fixed i386 build warning (kbuild robot)
        * Made bpf_dispatcher_lookup() static (kbuild robot)
        * Make sure xdp_call.h is only enabled for builtins
        * Add xdp_call() to ixgbe, mlx4, and mlx5
      
      RFC->v1: [5]
        * Improved error handling (Edward and Andrii)
        * Explicit cleanup (Andrii)
        * Use 32B with sext cmp (Alexei)
        * Align jump targets to 16B (Alexei)
        * 4 to 16 entries (Toke)
        * Added stats to xdp_call_run()
      
      [1] https://lore.kernel.org/bpf/20191211123017.13212-1-bjorn.topel@gmail.com/
      [2] https://lore.kernel.org/bpf/20191209135522.16576-1-bjorn.topel@gmail.com/
      [3] https://lore.kernel.org/bpf/20191123071226.6501-1-bjorn.topel@gmail.com/
      [4] https://lore.kernel.org/bpf/20191119160757.27714-1-bjorn.topel@gmail.com/
      [5] https://lore.kernel.org/bpf/20191113204737.31623-1-bjorn.topel@gmail.com/
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      02620d9e
    • Björn Töpel's avatar
      bpf, x86: Align dispatcher branch targets to 16B · 116eb788
      Björn Töpel authored
      From the Intel 64 and IA-32 Architectures Optimization Reference Manual,
      3.4.1.4 Code Alignment, Assembly/Compiler Coding Rule 11: All branch
      targets should be 16-byte aligned.
      
      This commit aligns branch targets according to the Intel manual.
      
      The nops used to align branch targets make the dispatcher larger, and
      therefore the number of supported dispatch points/programs is
      decreased from 64 to 48.
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191213175112.30208-7-bjorn.topel@gmail.com
      116eb788
    • Björn Töpel's avatar
      selftests: bpf: Add xdp_perf test · e754f5a6
      Björn Töpel authored
      The xdp_perf test is a dummy XDP test, only used to measure the cost of
      jumping into a naive XDP program one million times.
      
      To build and run the program:
        $ cd tools/testing/selftests/bpf
        $ make
        $ ./test_progs -v -t xdp_perf
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191213175112.30208-6-bjorn.topel@gmail.com
      e754f5a6
    • Björn Töpel's avatar
      bpf: Start using the BPF dispatcher in BPF_TEST_RUN · f23c4b39
      Björn Töpel authored
      In order to properly exercise the BPF dispatcher, this commit adds BPF
      dispatcher usage to BPF_TEST_RUN when executing XDP programs.
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191213175112.30208-5-bjorn.topel@gmail.com
      f23c4b39
    • Björn Töpel's avatar
      bpf, xdp: Start using the BPF dispatcher for XDP · 7e6897f9
      Björn Töpel authored
      This commit adds a BPF dispatcher for XDP. The dispatcher is updated
      from the XDP control-path, dev_xdp_install(), and used when an XDP
      program is run via bpf_prog_run_xdp().
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191213175112.30208-4-bjorn.topel@gmail.com
      7e6897f9
    • Björn Töpel's avatar
      bpf: Introduce BPF dispatcher · 75ccbef6
      Björn Töpel authored
      The BPF dispatcher is a multi-way branch code generator, mainly
      targeted for XDP programs. When an XDP program is executed via
      bpf_prog_run_xdp(), it is invoked via an indirect call. The indirect
      call has a substantial performance impact when retpolines are
      enabled. The dispatcher transforms indirect calls into direct calls, and
      therefore avoids the retpoline. The dispatcher is generated using the
      BPF JIT, and relies on text poking provided by bpf_arch_text_poke().
      
      The dispatcher hijacks a trampoline function via the __fentry__ nop
      of the trampoline. One dispatcher instance currently supports up to 64
      dispatch points. A user creates a dispatcher with its corresponding
      trampoline with the DEFINE_BPF_DISPATCHER macro.
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191213175112.30208-3-bjorn.topel@gmail.com
      75ccbef6