1. 10 Mar, 2023 10 commits
    • Martin KaFai Lau's avatar
      bpf: Move a few bpf_local_storage functions to static scope · 4cbd23cc
      Martin KaFai Lau authored
      This patch moves the bpf_local_storage_free_rcu() and
      bpf_selem_unlink_map() to static because they are
      not used outside of bpf_local_storage.c.
      Signed-off-by: default avatarMartin KaFai Lau <martin.lau@kernel.org>
      Link: https://lore.kernel.org/r/20230308065936.1550103-2-martin.lau@linux.devSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      4cbd23cc
    • David Vernet's avatar
      bpf/selftests: Fix send_signal tracepoint tests · 4a54de65
      David Vernet authored
      The send_signal tracepoint tests are non-deterministically failing in
      CI. The test works as follows:
      
      1. Two pairs of file descriptors are created using the pipe() function.
         One pair is used to communicate between a parent process -> child
         process, and the other for the reverse direction.
      
      2. A child is fork()'ed. The child process registers a signal handler,
         notifies its parent that the signal handler is registered, and then
         and waits for its parent to have enabled a BPF program that sends a
         signal.
      
      3. The parent opens and loads a BPF skeleton with programs that send
         signals to the child process. The different programs are triggered by
         different perf events (either NMI or normal perf), or by regular
         tracepoints. The signal is delivered to the child whenever the child
         triggers the program.
      
      4. The child's signal handler is invoked, which sets a flag saying that
         the signal handler was reached. The child then signals to the parent
         that it received the signal, and the test ends.
      
      The perf testcases (send_signal_perf{_thread} and
      send_signal_nmi{_thread}) work 100% of the time, but the tracepoint
      testcases fail non-deterministically because the tracepoint is not
      always being fired for the child.
      
      There are two tracepoint programs registered in the test:
      'tracepoint/sched/sched_switch', and
      'tracepoint/syscalls/sys_enter_nanosleep'. The child never intentionally
      blocks, nor sleeps, so neither tracepoint is guaranteed to be triggered.
      To fix this, we can have the child trigger the nanosleep program with a
      usleep().
      
      Before this patch, the test would fail locally every 2-3 runs. Now, it
      doesn't fail after more than 1000 runs.
      Signed-off-by: default avatarDavid Vernet <void@manifault.com>
      Link: https://lore.kernel.org/r/20230310061909.1420887-1-void@manifault.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      4a54de65
    • Andrii Nakryiko's avatar
      bpf: take into account liveness when propagating precision · 52c2b005
      Andrii Nakryiko authored
      When doing state comparison, if old state has register that is not
      marked as REG_LIVE_READ, then we just skip comparison, regardless what's
      the state of corresponing register in current state. This is because not
      REG_LIVE_READ register is irrelevant for further program execution and
      correctness. All good here.
      
      But when we get to precision propagation, after two states were declared
      equivalent, we don't take into account old register's liveness, and thus
      attempt to propagate precision for register in current state even if
      that register in old state was not REG_LIVE_READ anymore. This is bad,
      because register in current state could be anything at all and this
      could cause -EFAULT due to internal logic bugs.
      
      Fix by taking into account REG_LIVE_READ liveness mark to keep the logic
      in state comparison in sync with precision propagation.
      
      Fixes: a3ce685d ("bpf: fix precision tracking")
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230309224131.57449-1-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      52c2b005
    • Andrii Nakryiko's avatar
      bpf: ensure state checkpointing at iter_next() call sites · 4b5ce570
      Andrii Nakryiko authored
      State equivalence check and checkpointing performed in is_state_visited()
      employs certain heuristics to try to save memory by avoiding state checkpoints
      if not enough jumps and instructions happened since last checkpoint. This leads
      to unpredictability of whether a particular instruction will be checkpointed
      and how regularly. While normally this is not causing much problems (except
      inconveniences for predictable verifier tests, which we overcome with
      BPF_F_TEST_STATE_FREQ flag), turns out it's not the case for open-coded
      iterators.
      
      Checking and saving state checkpoints at iter_next() call is crucial for fast
      convergence of open-coded iterator loop logic, so we need to force it. If we
      don't do that, is_state_visited() might skip saving a checkpoint, causing
      unnecessarily long sequence of not checkpointed instructions and jumps, leading
      to exhaustion of jump history buffer, and potentially other undesired outcomes.
      It is expected that with correct open-coded iterators convergence will happen
      quickly, so we don't run a risk of exhausting memory.
      
      This patch adds, in addition to prune and jump instruction marks, also a
      "forced checkpoint" mark, and makes sure that any iter_next() call instruction
      is marked as such.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230310060149.625887-1-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      4b5ce570
    • Alexei Starovoitov's avatar
      Merge branch 'selftests/bpf: make BPF_CFLAGS stricter with -Wall' · 1456ddcc
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      
      Make BPF-side compiler flags stricter by adding -Wall. Fix tons of small
      issues pointed out by compiler immediately after that. That includes newly
      added bpf_for(), bpf_for_each(), and bpf_repeat() macros.
      ====================
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      1456ddcc
    • Andrii Nakryiko's avatar
      selftests/bpf: make BPF compiler flags stricter · 3d5a55dd
      Andrii Nakryiko authored
      We recently added -Wuninitialized, but it's not enough to catch various
      silly mistakes or omissions. Let's go all the way to -Wall, just like we
      do for user-space code.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230309054015.4068562-5-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      3d5a55dd
    • Andrii Nakryiko's avatar
      selftests/bpf: fix lots of silly mistakes pointed out by compiler · c8ed6685
      Andrii Nakryiko authored
      Once we enable -Wall for BPF sources, compiler will complain about lots
      of unused variables, variables that are set but never read, etc.
      
      Fix all these issues first before enabling -Wall in Makefile.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230309054015.4068562-4-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      c8ed6685
    • Andrii Nakryiko's avatar
      selftests/bpf: add __sink() macro to fake variable consumption · 713461b8
      Andrii Nakryiko authored
      Add __sink(expr) macro that forces compiler to believe that passed in
      expression is both read and written. It used a simple embedded asm for
      this. This is useful in a lot of tests where we assign value to some variable
      to trigger some action, but later don't read variable, causing compiler
      to complain (if corresponding compiler warnings are turned on, which
      we'll do in the next patch).
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230309054015.4068562-3-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      713461b8
    • Andrii Nakryiko's avatar
      selftests/bpf: prevent unused variable warning in bpf_for() · 2498e623
      Andrii Nakryiko authored
      Add __attribute__((unused)) to inner __p variable inside bpf_for(),
      bpf_for_each(), and bpf_repeat() macros to avoid compiler warnings about
      unused variable.
      Reported-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230309054015.4068562-2-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      2498e623
    • Yonghong Song's avatar
      selftests/bpf: Workaround verification failure for fexit_bpf2bpf/func_replace_return_code · 63d78b7e
      Yonghong Song authored
      With latest llvm17, selftest fexit_bpf2bpf/func_replace_return_code
      has the following verification failure:
      
        0: R1=ctx(off=0,imm=0) R10=fp0
        ; int connect_v4_prog(struct bpf_sock_addr *ctx)
        0: (bf) r7 = r1                       ; R1=ctx(off=0,imm=0) R7_w=ctx(off=0,imm=0)
        1: (b4) w6 = 0                        ; R6_w=0
        ; memset(&tuple.ipv4.saddr, 0, sizeof(tuple.ipv4.saddr));
        ...
        ; return do_bind(ctx) ? 1 : 0;
        179: (bf) r1 = r7                     ; R1=ctx(off=0,imm=0) R7=ctx(off=0,imm=0)
        180: (85) call pc+147
        Func#3 is global and valid. Skipping.
        181: R0_w=scalar()
        181: (bc) w6 = w0                     ; R0_w=scalar() R6_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
        182: (05) goto pc-129
        ; }
        54: (bc) w0 = w6                      ; R0_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R6_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
        55: (95) exit
        At program exit the register R0 has value (0x0; 0xffffffff) should have been in (0x0; 0x1)
        processed 281 insns (limit 1000000) max_states_per_insn 1 total_states 26 peak_states 26 mark_read 13
        -- END PROG LOAD LOG --
        libbpf: prog 'connect_v4_prog': failed to load: -22
      
      The corresponding source code:
      
        __attribute__ ((noinline))
        int do_bind(struct bpf_sock_addr *ctx)
        {
              struct sockaddr_in sa = {};
      
              sa.sin_family = AF_INET;
              sa.sin_port = bpf_htons(0);
              sa.sin_addr.s_addr = bpf_htonl(SRC_REWRITE_IP4);
      
              if (bpf_bind(ctx, (struct sockaddr *)&sa, sizeof(sa)) != 0)
                      return 0;
      
              return 1;
        }
        ...
        SEC("cgroup/connect4")
        int connect_v4_prog(struct bpf_sock_addr *ctx)
        {
        ...
              return do_bind(ctx) ? 1 : 0;
        }
      
      Insn 180 is a call to 'do_bind'. The call's return value is also the return value
      for the program. Since do_bind() returns 0/1, so it is legitimate for compiler to
      optimize 'return do_bind(ctx) ? 1 : 0' to 'return do_bind(ctx)'. However, such
      optimization breaks verifier as the return value of 'do_bind()' is marked as any
      scalar which violates the requirement of prog return value 0/1.
      
      There are two ways to fix this problem, (1) changing 'return 1' in do_bind() to
      e.g. 'return 10' so the compiler has to do 'do_bind(ctx) ? 1 :0', or (2)
      suggested by Andrii, marking do_bind() with __weak attribute so the compiler
      cannot make any assumption on do_bind() return value.
      
      This patch adopted adding __weak approach which is simpler and more resistant
      to potential compiler optimizations.
      Suggested-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20230310012410.2920570-1-yhs@fb.com
      63d78b7e
  2. 09 Mar, 2023 13 commits
    • Lorenzo Bianconi's avatar
    • Lorenzo Bianconi's avatar
      selftests/bpf: Use ifname instead of ifindex in XDP compliance test tool · 27a36bc3
      Lorenzo Bianconi authored
      Rely on interface name instead of interface index in error messages or
      logs from XDP compliance test tool.
      Signed-off-by: default avatarLorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/7dc5a8ff56c252b1a7ae29b059d0b2b1543c8b5d.1678382940.git.lorenzo@kernel.org
      27a36bc3
    • Michael Weiß's avatar
      bpf: Fix a typo for BPF_F_ANY_ALIGNMENT in bpf.h · 5a70f4a6
      Michael Weiß authored
      Fix s/BPF_PROF_LOAD/BPF_PROG_LOAD/ typo in the documentation comment
      for BPF_F_ANY_ALIGNMENT in bpf.h.
      Signed-off-by: default avatarMichael Weiß <michael.weiss@aisec.fraunhofer.de>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20230309133823.944097-1-michael.weiss@aisec.fraunhofer.de
      5a70f4a6
    • Martin KaFai Lau's avatar
      selftests/bpf: Fix flaky fib_lookup test · a6865576
      Martin KaFai Lau authored
      There is a report that fib_lookup test is flaky when running in parallel.
      A symptom of slowness or delay. An example:
      
      Testing IPv6 stale neigh
      set_lookup_params:PASS:inet_pton(IPV6_IFACE_ADDR) 0 nsec
      test_fib_lookup:PASS:bpf_prog_test_run_opts 0 nsec
      test_fib_lookup:FAIL:fib_lookup_ret unexpected fib_lookup_ret: actual 0 != expected 7
      test_fib_lookup:FAIL:dmac not match unexpected dmac not match: actual 1 != expected 0
      dmac expected 11:11:11:11:11:11 actual 00:00:00:00:00:00
      
      [ Note that the "fib_lookup_ret unexpected fib_lookup_ret actual 0 ..."
        is reversed in terms of expected and actual value. Fixing in this
        patch also. ]
      
      One possibility is the testing stale neigh entry was marked dead by the
      gc (in neigh_periodic_work). The default gc_stale_time sysctl is 60s.
      This patch increases it to 15 mins.
      
      It also:
      
      - fixes the reversed arg (actual vs expected) in one of the
        ASSERT_EQ test
      - removes the nodad command arg when adding v4 neigh entry which
        currently has a warning.
      
      Fixes: 168de023 ("selftests/bpf: Add bpf_fib_lookup test")
      Reported-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarMartin KaFai Lau <martin.lau@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20230309060244.3242491-1-martin.lau@linux.dev
      a6865576
    • Alexei Starovoitov's avatar
      Merge branch 'BPF open-coded iterators' · 23e403b3
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      
      Add support for open-coded (aka inline) iterators in BPF world. This is a next
      evolution of gradually allowing more powerful and less restrictive looping and
      iteration capabilities to BPF programs.
      
      We set up a framework for implementing all kinds of iterators (e.g., cgroup,
      task, file, etc, iterators), but this patch set only implements numbers
      iterator, which is used to implement ergonomic bpf_for() for-like construct
      (see patches #4-#5). We also add bpf_for_each(), which is a generic
      foreach-like construct that will work with any kind of open-coded iterator
      implementation, as long as we stick with bpf_iter_<type>_{new,next,destroy}()
      naming pattern (which we now enforce on the kernel side).
      
      Patch #1 is preparatory refactoring for easier way to check for special kfunc
      calls. Patch #2 is adding iterator kfunc registration and validation logic,
      which is mostly independent from the rest of open-coded iterator logic, so is
      separated out for easier reviewing.
      
      The meat of verifier-side logic is in patch #3. Patch #4 implements numbers
      iterator. I kept them separate to have clean reference for how to integrate
      new iterator types (now even simpler to do than in v1 of this patch set).
      Patch #5 adds bpf_for(), bpf_for_each(), and bpf_repeat() macros to
      bpf_misc.h, and also adds yet another pyperf test variant, now with bpf_for()
      loop. Patch #6 is verification tests, based on numbers iterator (as the only
      available right now). Patch #7 actually tests runtime behavior of numbers
      iterator.
      
      Finally, with changes in v2, it's possible and trivial to implement custom
      iterators completely in kernel modules, which we showcase and test by adding
      a simple iterator returning same number a given number of times to
      bpf_testmod. Patch #8 is where all this happens and is tested.
      
      Most of the relevant details are in corresponding commit messages or code
      comments.
      
      v4->v5:
        - fixing missed inner for() in is_iter_reg_valid_uninit, and fixed return
          false (kernel test robot);
        - typo fixes and comment/commit description improvements throughout the
          patch set;
      v3->v4:
        - remove unused variable from is_iter_reg_valid_init (kernel test robot);
      v2->v3:
        - remove special kfunc leftovers for bpf_iter_num_{new,next,destroy};
        - add iters/testmod_seq* to DENYLIST.s390x, it doesn't support kfuncs in
          modules yet (CI);
      v1->v2:
        - rebased on latest, dropping previously landed preparatory patches;
        - each iterator type now have its own `struct bpf_iter_<type>` which allows
          each iterator implementation to use exactly as much stack space as
          necessary, allowing to avoid runtime allocations (Alexei);
        - reworked how iterator kfuncs are defined, no verifier changes are required
          when adding new iterator type;
        - added bpf_testmod-based iterator implementation;
        - address the rest of feedback, comments, commit message adjustment, etc.
      
      Cc: Tejun Heo <tj@kernel.org>
      ====================
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      23e403b3
    • Andrii Nakryiko's avatar
      selftests/bpf: implement and test custom testmod_seq iterator · 7e86a8c4
      Andrii Nakryiko authored
      Implement a trivial iterator returning same specified integer value
      N times as part of bpf_testmod kernel module. Add selftests to validate
      everything works end to end.
      
      We also reuse these tests as "verification-only" tests to validate that
      kernel prints the state of custom kernel module-defined iterator correctly:
      
        fp-16=iter_testmod_seq(ref_id=1,state=drained,depth=0)
      
      "testmod_seq" part is an iterator type, and is coming from module's BTF
      data dynamically at runtime.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230308184121.1165081-9-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      7e86a8c4
    • Andrii Nakryiko's avatar
      selftests/bpf: add number iterator tests · f59b1460
      Andrii Nakryiko authored
      Add number iterator (bpf_iter_num_{new,next,destroy}()) tests,
      validating the correct handling of various corner and common cases
      *at runtime*.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230308184121.1165081-8-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      f59b1460
    • Andrii Nakryiko's avatar
      selftests/bpf: add iterators tests · 57400dcc
      Andrii Nakryiko authored
      Add various tests for open-coded iterators. Some of them excercise
      various possible coding patterns in C, some go down to low-level
      assembly for more control over various conditions, especially invalid
      ones.
      
      We also make use of bpf_for(), bpf_for_each(), bpf_repeat() macros in
      some of these tests.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230308184121.1165081-7-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      57400dcc
    • Andrii Nakryiko's avatar
      selftests/bpf: add bpf_for_each(), bpf_for(), and bpf_repeat() macros · 8c2b5e90
      Andrii Nakryiko authored
      Add bpf_for_each(), bpf_for(), and bpf_repeat() macros that make writing
      open-coded iterator-based loops much more convenient and natural. These
      macros utilize cleanup attribute to ensure proper destruction of the
      iterator and thanks to that manage to provide the ergonomics that is
      very close to C language's for() construct. Typical loop would look like:
      
        int i;
        int arr[N];
      
        bpf_for(i, 0, N) {
            /* verifier will know that i >= 0 && i < N, so could be used to
             * directly access array elements with no extra checks
             */
             arr[i] = i;
        }
      
      bpf_repeat() is very similar, but it doesn't expose iteration number and
      is meant as a simple "repeat action N times" loop:
      
        bpf_repeat(N) { /* whatever, N times */ }
      
      Note that `break` and `continue` statements inside the {} block work as
      expected.
      
      bpf_for_each() is a generalization over any kind of BPF open-coded
      iterator allowing to use for-each-like approach instead of calling
      low-level bpf_iter_<type>_{new,next,destroy}() APIs explicitly. E.g.:
      
        struct cgroup *cg;
      
        bpf_for_each(cgroup, cg, some, input, args) {
            /* do something with each cg */
        }
      
      would call (not-yet-implemented) bpf_iter_cgroup_{new,next,destroy}()
      functions to form a loop over cgroups, where `some, input, args` are
      passed verbatim into constructor as
      
        bpf_iter_cgroup_new(&it, some, input, args).
      
      As a first demonstration, add pyperf variant based on the bpf_for() loop.
      
      Also clean up a few tests that either included bpf_misc.h header
      unnecessarily from the user-space, which is unsupported, or included it
      before any common types are defined (and thus leading to unnecessary
      compilation warnings, potentially).
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230308184121.1165081-6-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      8c2b5e90
    • Andrii Nakryiko's avatar
      bpf: implement numbers iterator · 6018e1f4
      Andrii Nakryiko authored
      Implement the first open-coded iterator type over a range of integers.
      
      It's public API consists of:
        - bpf_iter_num_new() constructor, which accepts [start, end) range
          (that is, start is inclusive, end is exclusive).
        - bpf_iter_num_next() which will keep returning read-only pointer to int
          until the range is exhausted, at which point NULL will be returned.
          If bpf_iter_num_next() is kept calling after this, NULL will be
          persistently returned.
        - bpf_iter_num_destroy() destructor, which needs to be called at some
          point to clean up iterator state. BPF verifier enforces that iterator
          destructor is called at some point before BPF program exits.
      
      Note that `start = end = X` is a valid combination to setup an empty
      iterator. bpf_iter_num_new() will return 0 (success) for any such
      combination.
      
      If bpf_iter_num_new() detects invalid combination of input arguments, it
      returns error, resets iterator state to, effectively, empty iterator, so
      any subsequent call to bpf_iter_num_next() will keep returning NULL.
      
      BPF verifier has no knowledge that returned integers are in the
      [start, end) value range, as both `start` and `end` are not statically
      known and enforced: they are runtime values.
      
      While the implementation is pretty trivial, some care needs to be taken
      to avoid overflows and underflows. Subsequent selftests will validate
      correctness of [start, end) semantics, especially around extremes
      (INT_MIN and INT_MAX).
      
      Similarly to bpf_loop(), we enforce that no more than BPF_MAX_LOOPS can
      be specified.
      
      bpf_iter_num_{new,next,destroy}() is a logical evolution from bounded
      BPF loops and bpf_loop() helper and is the basis for implementing
      ergonomic BPF loops with no statically known or verified bounds.
      Subsequent patches implement bpf_for() macro, demonstrating how this can
      be wrapped into something that works and feels like a normal for() loop
      in C language.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230308184121.1165081-5-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      6018e1f4
    • Andrii Nakryiko's avatar
      bpf: add support for open-coded iterator loops · 06accc87
      Andrii Nakryiko authored
      Teach verifier about the concept of the open-coded (or inline) iterators.
      
      This patch adds generic iterator loop verification logic, new STACK_ITER
      stack slot type to contain iterator state, and necessary kfunc plumbing
      for iterator's constructor, destructor and next methods. Next patch
      implements first specific iterator (numbers iterator for implementing
      for() loop logic). Such split allows to have more focused commits for
      verifier logic and separate commit that we could point later to
      demonstrating  what does it take to add a new kind of iterator.
      
      Each kind of iterator has its own associated struct bpf_iter_<type>,
      where <type> denotes a specific type of iterator. struct bpf_iter_<type>
      state is supposed to live on BPF program stack, so there will be no way
      to change its size later on without breaking backwards compatibility, so
      choose wisely! But given this struct is specific to a given <type> of
      iterator, this allows a lot of flexibility: simple iterators could be
      fine with just one stack slot (8 bytes), like numbers iterator in the
      next patch, while some other more complicated iterators might need way
      more to keep their iterator state. Either way, such design allows to
      avoid runtime memory allocations, which otherwise would be necessary if
      we fixed on-the-stack size and it turned out to be too small for a given
      iterator implementation.
      
      The way BPF verifier logic is implemented, there are no artificial
      restrictions on a number of active iterators, it should work correctly
      using multiple active iterators at the same time. This also means you
      can have multiple nested iteration loops. struct bpf_iter_<type>
      reference can be safely passed to subprograms as well.
      
      General flow is easiest to demonstrate with a simple example using
      number iterator implemented in next patch. Here's the simplest possible
      loop:
      
        struct bpf_iter_num it;
        int *v;
      
        bpf_iter_num_new(&it, 2, 5);
        while ((v = bpf_iter_num_next(&it))) {
            bpf_printk("X = %d", *v);
        }
        bpf_iter_num_destroy(&it);
      
      Above snippet should output "X = 2", "X = 3", "X = 4". Note that 5 is
      exclusive and is not returned. This matches similar APIs (e.g., slices
      in Go or Rust) that implement a range of elements, where end index is
      non-inclusive.
      
      In the above example, we see a trio of function:
        - constructor, bpf_iter_num_new(), which initializes iterator state
        (struct bpf_iter_num it) on the stack. If any of the input arguments
        are invalid, constructor should make sure to still initialize it such
        that subsequent bpf_iter_num_next() calls will return NULL. I.e., on
        error, return error and construct empty iterator.
        - next method, bpf_iter_num_next(), which accepts pointer to iterator
        state and produces an element. Next method should always return
        a pointer. The contract between BPF verifier is that next method will
        always eventually return NULL when elements are exhausted. Once NULL is
        returned, subsequent next calls should keep returning NULL. In the
        case of numbers iterator, bpf_iter_num_next() returns a pointer to an int
        (storage for this integer is inside the iterator state itself),
        which can be dereferenced after corresponding NULL check.
        - once done with the iterator, it's mandated that user cleans up its
        state with the call to destructor, bpf_iter_num_destroy() in this
        case. Destructor frees up any resources and marks stack space used by
        struct bpf_iter_num as usable for something else.
      
      Any other iterator implementation will have to implement at least these
      three methods. It is enforced that for any given type of iterator only
      applicable constructor/destructor/next are callable. I.e., verifier
      ensures you can't pass number iterator state into, say, cgroup
      iterator's next method.
      
      It is important to keep the naming pattern consistent to be able to
      create generic macros to help with BPF iter usability. E.g., one
      of the follow up patches adds generic bpf_for_each() macro to bpf_misc.h
      in selftests, which allows to utilize iterator "trio" nicely without
      having to code the above somewhat tedious loop explicitly every time.
      This is enforced at kfunc registration point by one of the previous
      patches in this series.
      
      At the implementation level, iterator state tracking for verification
      purposes is very similar to dynptr. We add STACK_ITER stack slot type,
      reserve necessary number of slots, depending on
      sizeof(struct bpf_iter_<type>), and keep track of necessary extra state
      in the "main" slot, which is marked with non-zero ref_obj_id. Other
      slots are also marked as STACK_ITER, but have zero ref_obj_id. This is
      simpler than having a separate "is_first_slot" flag.
      
      Another big distinction is that STACK_ITER is *always refcounted*, which
      simplifies implementation without sacrificing usability. So no need for
      extra "iter_id", no need to anticipate reuse of STACK_ITER slots for new
      constructors, etc. Keeping it simple here.
      
      As far as the verification logic goes, there are two extensive comments:
      in process_iter_next_call() and iter_active_depths_differ() explaining
      some important and sometimes subtle aspects. Please refer to them for
      details.
      
      But from 10,000-foot point of view, next methods are the points of
      forking a verification state, which are conceptually similar to what
      verifier is doing when validating conditional jump. We branch out at
      a `call bpf_iter_<type>_next` instruction and simulate two outcomes:
      NULL (iteration is done) and non-NULL (new element is returned). NULL is
      simulated first and is supposed to reach exit without looping. After
      that non-NULL case is validated and it either reaches exit (for trivial
      examples with no real loop), or reaches another `call bpf_iter_<type>_next`
      instruction with the state equivalent to already (partially) validated
      one. State equivalency at that point means we technically are going to
      be looping forever without "breaking out" out of established "state
      envelope" (i.e., subsequent iterations don't add any new knowledge or
      constraints to the verifier state, so running 1, 2, 10, or a million of
      them doesn't matter). But taking into account the contract stating that
      iterator next method *has to* return NULL eventually, we can conclude
      that loop body is safe and will eventually terminate. Given we validated
      logic outside of the loop (NULL case), and concluded that loop body is
      safe (though potentially looping many times), verifier can claim safety
      of the overall program logic.
      
      The rest of the patch is necessary plumbing for state tracking, marking,
      validation, and necessary further kfunc plumbing to allow implementing
      iterator constructor, destructor, and next methods.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230308184121.1165081-4-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      06accc87
    • Andrii Nakryiko's avatar
      bpf: add iterator kfuncs registration and validation logic · 215bf496
      Andrii Nakryiko authored
      Add ability to register kfuncs that implement BPF open-coded iterator
      contract and enforce naming and function proto convention. Enforcement
      happens at the time of kfunc registration and significantly simplifies
      the rest of iterators logic in the verifier.
      
      More details follow in subsequent patches, but we enforce the following
      conditions.
      
      All kfuncs (constructor, next, destructor) have to be named consistenly
      as bpf_iter_<type>_{new,next,destroy}(), respectively. <type> represents
      iterator type, and iterator state should be represented as a matching
      `struct bpf_iter_<type>` state type. Also, all iter kfuncs should have
      a pointer to this `struct bpf_iter_<type>` as the very first argument.
      
      Additionally:
        - Constructor, i.e., bpf_iter_<type>_new(), can have arbitrary extra
        number of arguments. Return type is not enforced either.
        - Next method, i.e., bpf_iter_<type>_next(), has to return a pointer
        type and should have exactly one argument: `struct bpf_iter_<type> *`
        (const/volatile/restrict and typedefs are ignored).
        - Destructor, i.e., bpf_iter_<type>_destroy(), should return void and
        should have exactly one argument, similar to the next method.
        - struct bpf_iter_<type> size is enforced to be positive and
        a multiple of 8 bytes (to fit stack slots correctly).
      
      Such strictness and consistency allows to build generic helpers
      abstracting important, but boilerplate, details to be able to use
      open-coded iterators effectively and ergonomically (see bpf_for_each()
      in subsequent patches). It also simplifies the verifier logic in some
      places. At the same time, this doesn't hurt generality of possible
      iterator implementations. Win-win.
      
      Constructor kfunc is marked with a new KF_ITER_NEW flags, next method is
      marked with KF_ITER_NEXT (and should also have KF_RET_NULL, of course),
      while destructor kfunc is marked as KF_ITER_DESTROY.
      
      Additionally, we add a trivial kfunc name validation: it should be
      a valid non-NULL and non-empty string.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230308184121.1165081-3-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      215bf496
    • Andrii Nakryiko's avatar
      bpf: factor out fetching basic kfunc metadata · 07236eab
      Andrii Nakryiko authored
      Factor out logic to fetch basic kfunc metadata based on struct bpf_insn.
      This is not exactly short or trivial code to just copy/paste and this
      information is sometimes necessary in other parts of the verifier logic.
      Subsequent patches will rely on this to determine if an instruction is
      a kfunc call to iterator next method.
      
      No functional changes intended, including that verbose() warning
      behavior when kfunc is not allowed for a particular program type.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20230308184121.1165081-2-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      07236eab
  3. 08 Mar, 2023 17 commits