1. 29 Jul, 2024 16 commits
    • Merge branch 'bpf-fix-tailcall-hierarchy' · 81a0b954
      Alexei Starovoitov authored
      Leon Hwang says:
      
      ====================
      bpf: Fix tailcall hierarchy
      
      This patchset fixes a tailcall hierarchy issue.
      
      The issue is confirmed in the discussions of
      "bpf, x64: Fix tailcall infinite loop" [0].
      
      The issue has been resolved on both x86_64 and arm64 [1].
      
      I provide a long commit message in the "bpf, x64: Fix tailcall hierarchy"
      patch to describe how the issue happens and how this patchset resolves it
      in detail.
      
      How does this patchset resolve the issue?
      
      In short, it stores tail_call_cnt on the stack of the main prog and
      propagates tail_call_cnt_ptr to its subprogs.
      
      First, at the prologue of the main prog, it initializes tail_call_cnt and
      prepares tail_call_cnt_ptr. At the prologue of a subprog, it reuses the
      tail_call_cnt_ptr from its caller.
      
      Then, when a tailcall happens, it increments tail_call_cnt through that
      pointer.
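      
      As a rough illustration only (an editor's user-space model of the scheme
      just described, not the JIT-emitted code; names such as do_tail_call()
      and subprog() are made up), the counter is owned by the entry prog and
      every level updates it through one shared pointer:
      
      /* Editor's sketch, not the actual JIT output: a user-space model of
       * the scheme where tail_call_cnt lives in the entry prog's frame and
       * all levels increment it through the same pointer.
       */
      #include <stdio.h>
      
      #define MAX_TAIL_CALL_CNT 33
      
      static int do_tail_call(unsigned long *tcc_ptr)
      {
              if (*tcc_ptr >= MAX_TAIL_CALL_CNT)
                      return -1;      /* limit reached for the whole hierarchy */
              (*tcc_ptr)++;           /* increment via the shared pointer */
              return 0;
      }
      
      static void subprog(unsigned long *tcc_ptr)
      {
              /* a subprog reuses the caller's tcc_ptr; it never sets up a
               * counter of its own
               */
              do_tail_call(tcc_ptr);
      }
      
      int main(void)
      {
              unsigned long tail_call_cnt = 0;        /* "main prog" prologue */
              unsigned long *tcc_ptr = &tail_call_cnt;
      
              subprog(tcc_ptr);
              subprog(tcc_ptr);
              printf("tail calls taken: %lu\n", tail_call_cnt);
              return 0;
      }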
      
      v5 -> v6:
        * Address comments from Eduard:
          * Add a JITed dump along with annotating comments.
          * Rewrite two selftests with the RUN_TESTS macro.
      
      v4 -> v5:
        * Solution changes from tailcall run ctx to tail_call_cnt and its pointer,
          because the v4 solution is unable to handle the case where the subprog
          contains no tailcall but an EXT prog attached to the subprog does.
      
      v3 -> v4:
        * Solution changes from per-task tail_call_cnt to tailcall run ctx,
          because there is a case the per-cpu/per-task solution is unable to handle [2].
      
      v2 -> v3:
        * Solution changes from percpu tail_call_cnt to tail_call_cnt at task_struct.
      
      v1 -> v2:
        * Solution changes from extra run-time call insn to percpu tail_call_cnt.
        * Address comments from Alexei:
          * Use percpu tail_call_cnt.
          * Use asm to make sure no callee saved registers are touched.
      
      RFC v2 -> v1:
        * Solution changes from propagating tail_call_cnt with its pointer to extra
          run-time call insn.
        * Address comments from Maciej:
          * Replace all memcpy(prog, x86_nops[5], X86_PATCH_SIZE) with
              emit_nops(&prog, X86_PATCH_SIZE)
      
      RFC v1 -> RFC v2:
        * Address comments from Stanislav:
          * Separate moving emit_nops() as first patch.
      
      Links:
      [0] https://lore.kernel.org/bpf/6203dd01-789d-f02c-5293-def4c1b18aef@gmail.com/
      [1] https://github.com/kernel-patches/bpf/pull/7350/checks
      [2] https://lore.kernel.org/bpf/CAADnVQK1qF+uBjwom2s2W-yEmgd_3rGi5Nr+KiV3cW0T+UPPfA@mail.gmail.com/
      ====================
      
      Link: https://lore.kernel.org/r/20240714123902.32305-1-hffilwlqm@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    • selftests/bpf: Add testcases for tailcall hierarchy fixing · b83b936f
      Leon Hwang authored
      Add some test cases to confirm the tailcall hierarchy issue has been fixed.
      
      On x64, the selftests result is:
      
      cd tools/testing/selftests/bpf && ./test_progs -t tailcalls
      327/18  tailcalls/tailcall_bpf2bpf_hierarchy_1:OK
      327/19  tailcalls/tailcall_bpf2bpf_hierarchy_fentry:OK
      327/20  tailcalls/tailcall_bpf2bpf_hierarchy_fexit:OK
      327/21  tailcalls/tailcall_bpf2bpf_hierarchy_fentry_fexit:OK
      327/22  tailcalls/tailcall_bpf2bpf_hierarchy_fentry_entry:OK
      327/23  tailcalls/tailcall_bpf2bpf_hierarchy_2:OK
      327/24  tailcalls/tailcall_bpf2bpf_hierarchy_3:OK
      327     tailcalls:OK
      Summary: 1/24 PASSED, 0 SKIPPED, 0 FAILED
      
      On arm64, the selftests result is:
      
      cd tools/testing/selftests/bpf && ./test_progs -t tailcalls
      327/18  tailcalls/tailcall_bpf2bpf_hierarchy_1:OK
      327/19  tailcalls/tailcall_bpf2bpf_hierarchy_fentry:OK
      327/20  tailcalls/tailcall_bpf2bpf_hierarchy_fexit:OK
      327/21  tailcalls/tailcall_bpf2bpf_hierarchy_fentry_fexit:OK
      327/22  tailcalls/tailcall_bpf2bpf_hierarchy_fentry_entry:OK
      327/23  tailcalls/tailcall_bpf2bpf_hierarchy_2:OK
      327/24  tailcalls/tailcall_bpf2bpf_hierarchy_3:OK
      327     tailcalls:OK
      Summary: 1/24 PASSED, 0 SKIPPED, 0 FAILED
      Acked-by: Eduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
      Link: https://lore.kernel.org/r/20240714123902.32305-4-hffilwlqm@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    • bpf, arm64: Fix tailcall hierarchy · 66ff4d61
      Leon Hwang authored
      This patch fixes a tailcall issue caused by abusing the tailcall in
      bpf2bpf feature on arm64, in the same way as the "bpf, x64: Fix tailcall
      hierarchy" patch does for x86_64.
      
      On arm64, when a tail call happens, it uses tail_call_cnt_ptr to
      increment tail_call_cnt, too.
      
      At the prologue of the main prog, it has to initialize tail_call_cnt and
      prepare tail_call_cnt_ptr.
      
      At the prologue of a subprog, it pushes the x26 register twice and does
      not initialize tail_call_cnt.
      
      At the epilogue, it pops x26 twice, no matter whether it is the main prog
      or a subprog.
      
      Fixes: d4609a5d ("bpf, arm64: Keep tail call count across bpf2bpf calls")
      Acked-by: Puranjay Mohan <puranjay@kernel.org>
      Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
      Link: https://lore.kernel.org/r/20240714123902.32305-3-hffilwlqm@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    • bpf, x64: Fix tailcall hierarchy · 116e04ba
      Leon Hwang authored
      This patch fixes a tailcall issue caused by abusing the tailcall in
      bpf2bpf feature.
      
      As we know, tail_call_cnt is propagated in rax from caller to callee
      when calling a subprog in a tailcall context. But, as in the following
      example, MAX_TAIL_CALL_CNT won't work because tail_call_cnt is not
      propagated back from callee to caller.
      
      #include <linux/bpf.h>
      #include <bpf/bpf_helpers.h>
      #include "bpf_legacy.h"
      
      struct {
      	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
      	__uint(max_entries, 1);
      	__uint(key_size, sizeof(__u32));
      	__uint(value_size, sizeof(__u32));
      } jmp_table SEC(".maps");
      
      int count = 0;
      
      static __noinline
      int subprog_tail1(struct __sk_buff *skb)
      {
      	bpf_tail_call_static(skb, &jmp_table, 0);
      	return 0;
      }
      
      static __noinline
      int subprog_tail2(struct __sk_buff *skb)
      {
      	bpf_tail_call_static(skb, &jmp_table, 0);
      	return 0;
      }
      
      SEC("tc")
      int entry(struct __sk_buff *skb)
      {
      	volatile int ret = 1;
      
      	count++;
      	subprog_tail1(skb);
      	subprog_tail2(skb);
      
      	return ret;
      }
      
      char __license[] SEC("license") = "GPL";
      
      At run time, the tail_call_cnt in entry() will be propagated to
      subprog_tail1() and subprog_tail2(). But, when the tail_call_cnt in
      subprog_tail1() is updated by bpf_tail_call_static(), the tail_call_cnt
      in entry() won't be updated at the same time. As a result, in entry(),
      when the tail_call_cnt in entry() is less than MAX_TAIL_CALL_CNT and
      subprog_tail1() returns because of the MAX_TAIL_CALL_CNT limit,
      bpf_tail_call_static() in subprog_tail2() is still able to run, because
      the tail_call_cnt in subprog_tail2(), propagated from entry(), is less
      than MAX_TAIL_CALL_CNT.
      
      So, how many tailcalls are there for this case if no error happens?
      
      From a top-down view, the calls form a hierarchy, layer by layer.
      
      With this view, there will be 2+4+8+...+2^33 = 2^34 - 2 = 17,179,869,182
      tailcalls for this case.
      
      What if there are N subprog_tail() calls in entry()? There will be almost
      N^34 tailcalls.
      
      This patch resolves this case on x86_64.
      
      Instead of propagating tail_call_cnt from caller to callee, it propagates
      a pointer to it, tail_call_cnt_ptr, tcc_ptr for short.
      
      However, where does it store tail_call_cnt?
      
      It stores tail_call_cnt on the stack of the main prog. When a tail call
      happens in a subprog, it increments tail_call_cnt through tcc_ptr.
      
      Meanwhile, it stores tail_call_cnt_ptr on the stack of the main prog, too.
      
      And, before jumping to the tail callee, it has to pop tail_call_cnt and
      tail_call_cnt_ptr.
      
      Then, at the prologue of a subprog, it must not set rax up as
      tail_call_cnt_ptr again; it has to reuse the tail_call_cnt_ptr from the
      caller.
      
      As a result, at run time, the prologue has to recognize whether rax holds
      tail_call_cnt or tail_call_cnt_ptr by:
      
      1. rax is tail_call_cnt if rax is <= MAX_TAIL_CALL_CNT.
      2. rax is tail_call_cnt_ptr if rax is > MAX_TAIL_CALL_CNT, because a
         pointer won't be <= MAX_TAIL_CALL_CNT.
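      
      Here is a hedged C rendering of that prologue decision (editor's sketch
      only; resolve_tcc_ptr() is a made-up name, and the real check is the
      cmp/ja pair emitted by the JIT, visible in the dump below):
      
      /* Editor's sketch of the prologue test, not kernel code: a value
       * <= MAX_TAIL_CALL_CNT can only be the counter itself, anything
       * larger must be a pointer into the main prog's stack.
       */
      #include <stdint.h>
      
      #define MAX_TAIL_CALL_CNT 33
      
      static uint64_t *resolve_tcc_ptr(uint64_t rax, uint64_t *stack_slot)
      {
              if (rax <= MAX_TAIL_CALL_CNT) {
                      /* main prog: rax carries tail_call_cnt, store it on the
                       * stack and hand out a pointer to that slot instead
                       */
                      *stack_slot = rax;
                      return stack_slot;
              }
              /* subprog: rax already carries tcc_ptr from the caller */
              return (uint64_t *)(uintptr_t)rax;
      }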
      
      Here's an example with a dump of the JITed code.
      
      struct {
      	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
      	__uint(max_entries, 1);
      	__uint(key_size, sizeof(__u32));
      	__uint(value_size, sizeof(__u32));
      } jmp_table SEC(".maps");
      
      int count = 0;
      
      static __noinline
      int subprog_tail(struct __sk_buff *skb)
      {
      	bpf_tail_call_static(skb, &jmp_table, 0);
      	return 0;
      }
      
      SEC("tc")
      int entry(struct __sk_buff *skb)
      {
      	int ret = 1;
      
      	count++;
      	subprog_tail(skb);
      	subprog_tail(skb);
      
      	return ret;
      }
      
      When bpftool p d j id 42:
      
      int entry(struct __sk_buff * skb):
      bpf_prog_0c0f4c2413ef19b1_entry:
      ; int entry(struct __sk_buff *skb)
         0:	endbr64
         4:	nopl	(%rax,%rax)
         9:	xorq	%rax, %rax		;; rax = 0 (tail_call_cnt)
         c:	pushq	%rbp
         d:	movq	%rsp, %rbp
        10:	endbr64
        14:	cmpq	$33, %rax		;; if rax > 33, rax = tcc_ptr
        18:	ja	0x20			;; if rax > 33 goto 0x20 ---+
        1a:	pushq	%rax			;; [rbp - 8] = rax = 0      |
        1b:	movq	%rsp, %rax		;; rax = rbp - 8            |
        1e:	jmp	0x21			;; ---------+               |
        20:	pushq	%rax			;; <--------|---------------+
        21:	pushq	%rax			;; <--------+ [rbp - 16] = rax
        22:	pushq	%rbx			;; callee saved
        23:	movq	%rdi, %rbx		;; rbx = skb (callee saved)
      ; count++;
        26:	movabsq	$-82417199407104, %rdi
        30:	movl	(%rdi), %esi
        33:	addl	$1, %esi
        36:	movl	%esi, (%rdi)
      ; subprog_tail(skb);
        39:	movq	%rbx, %rdi		;; rdi = skb
        3c:	movq	-16(%rbp), %rax		;; rax = tcc_ptr
        43:	callq	0x80			;; call subprog_tail()
      ; subprog_tail(skb);
        48:	movq	%rbx, %rdi		;; rdi = skb
        4b:	movq	-16(%rbp), %rax		;; rax = tcc_ptr
        52:	callq	0x80			;; call subprog_tail()
      ; return ret;
        57:	movl	$1, %eax
        5c:	popq	%rbx
        5d:	leave
        5e:	retq
      
      int subprog_tail(struct __sk_buff * skb):
      bpf_prog_3a140cef239a4b4f_subprog_tail:
      ; int subprog_tail(struct __sk_buff *skb)
         0:	endbr64
         4:	nopl	(%rax,%rax)
         9:	nopl	(%rax)			;; do not touch tail_call_cnt
         c:	pushq	%rbp
         d:	movq	%rsp, %rbp
        10:	endbr64
        14:	pushq	%rax			;; [rbp - 8]  = rax (tcc_ptr)
        15:	pushq	%rax			;; [rbp - 16] = rax (tcc_ptr)
        16:	pushq	%rbx			;; callee saved
        17:	pushq	%r13			;; callee saved
        19:	movq	%rdi, %rbx		;; rbx = skb
      ; asm volatile("r1 = %[ctx]\n\t"
        1c:	movabsq	$-105487587488768, %r13	;; r13 = jmp_table
        26:	movq	%rbx, %rdi		;; 1st arg, skb
        29:	movq	%r13, %rsi		;; 2nd arg, jmp_table
        2c:	xorl	%edx, %edx		;; 3rd arg, index = 0
        2e:	movq	-16(%rbp), %rax		;; rax = [rbp - 16] (tcc_ptr)
        35:	cmpq	$33, (%rax)
        39:	jae	0x4e			;; if *tcc_ptr >= 33 goto 0x4e --------+
        3b:	jmp	0x4e			;; jmp bypass, toggled by poking       |
        40:	addq	$1, (%rax)		;; (*tcc_ptr)++                        |
        44:	popq	%r13			;; callee saved                        |
        46:	popq	%rbx			;; callee saved                        |
        47:	popq	%rax			;; undo rbp-16 push                    |
        48:	popq	%rax			;; undo rbp-8  push                    |
        49:	nopl	(%rax,%rax)		;; tail call target, toggled by poking |
      ; return 0;				;;                                     |
        4e:	popq	%r13			;; restore callee saved <--------------+
        50:	popq	%rbx			;; restore callee saved
        51:	leave
        52:	retq
      
      Furthermore, when the trampoline is the caller of a bpf prog that is
      tail_call_reachable, it is required to propagate rax through the
      trampoline.
      
      Fixes: ebf7d1f5 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
      Fixes: e411901c ("bpf: allow for tailcalls in BPF subprograms for x64 JIT")
      Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
      Link: https://lore.kernel.org/r/20240714123902.32305-2-hffilwlqm@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    • Merge branch 'bpf-track-find_equal_scalars-history-on-per-instruction-level' · bde0c5a7
      Andrii Nakryiko authored
      Eduard Zingerman says:
      
      ====================
      bpf: track find_equal_scalars history on per-instruction level
      
      This is a fix for the precision tracking bug reported in [0].
      It supersedes my previous attempt to fix a similar issue in commit [1].
      Here is a minimized test case from [0]:
      
          0:  call bpf_get_prandom_u32;
          1:  r7 = r0;
          2:  r8 = r0;
          3:  call bpf_get_prandom_u32;
          4:  if r0 > 1 goto +0;
          /* --- checkpoint #1: r7.id=1, r8.id=1 --- */
          5:  if r8 >= r0 goto 9f;
          6:  r8 += r8;
          /* --- checkpoint #2: r7.id=1, r8.id=0 --- */
          7:  if r7 == 0 goto 9f;
          8:  r0 /= 0;
          /* --- checkpoint #3 --- */
          9:  r0 = 42;
          10: exit;
      
      Without this fix, the verifier incorrectly assumes that the instruction
      at label (8) is unreachable. The issue is caused by a failure to infer a
      precision mark for r0 at checkpoint #1:
      - first verification path is:
        - (0-4): r0 range [0,1];
        - (5): r8 range [0,0], propagated to r7;
        - (6): r8.id is reset;
        - (7): jump is predicted to happen;
        - (9-10): safe exit.
      - when the jump at (7) is predicted, mark_chain_precision() for r7 is
        called and backtrack_insn() proceeds as follows:
        - at (7) r7 is marked as precise;
        - at (5) r8 is not currently tracked and thus r0 is not marked;
        - at the (4-5) boundary the logic from [1] is triggered and r7,r8 are
          marked as precise;
        - => the r0 precision mark is missed.
      - when the second branch of (4) is considered, the verifier prunes the
        state because r0 is not marked as precise in the visited state.
      
      Basically, backtracking logic fails to notice that at (5)
      range information is gained for both r7 and r8, and thus both
      r8 and r0 have to be marked as precise.
      This happens because [1] can only account for such range
      transfers at parent/child state boundaries.
      
      The solution suggested by Andrii Nakryiko in [0] is to use jump
      history to remember which registers gained range as a result of
      find_equal_scalars() [renamed to sync_linked_regs()] and use
      this information in backtrack_insn().
      Which is what this patch-set does.
      
      The patch-set uses a u64 value as a vector of 10-bit values that
      identify registers gaining range in find_equal_scalars().
      This allows for a maximum of 6 such values.
      To check whether such capacity is sufficient, I've instrumented the
      kernel to track a histogram of the maximal number of registers that gain
      range in find_equal_scalars() per program verification [2].
      Measurements done for verifier selftests and Cilium bpf object files
      from [3] show that the number of such registers is *always* <= 4 and
      in 98% of cases it is <= 2.
      
      When tested on a subset of selftests identified by
      selftests/bpf/veristat.cfg and Cilium bpf object files from [3]
      this patch-set has minimal verification performance impact:
      
      File                      Program                   Insns   (DIFF)  States (DIFF)
      ------------------------  ------------------------  --------------  -------------
      bpf_host.o                tail_handle_nat_fwd_ipv4    -75 (-0.61%)    -3 (-0.39%)
      pyperf600_nounroll.bpf.o  on_event                  +1673 (+0.33%)    +3 (+0.01%)
      
      [0] https://lore.kernel.org/bpf/CAEf4BzZ0xidVCqB47XnkXcNhkPWF6_nTV7yt+_Lf0kcFEut2Mg@mail.gmail.com/
      [1] commit 904e6ddf ("bpf: Use scalar ids in mark_chain_precision()")
      [2] https://github.com/eddyz87/bpf/tree/find-equal-scalars-in-jump-history-with-stats
      [3] https://github.com/anakryiko/cilium
      
      Changes:
      - v2 -> v3:
        A number of stylistic changes suggested by Andrii:
        - renamings:
          - struct reg_or_spill   -> linked_reg;
          - find_equal_scalars()  -> collect_linked_regs;
          - copy_known_reg()      -> sync_linked_regs;
        - collect_linked_regs() now returns linked regs set of
          size 2 or larger;
        - dropped usage of bit fields in struct linked_reg;
        - added a patch changing references to find_equal_scalars() in
          selftests comments.
      - v1 -> v2:
        - patch "bpf: replace env->cur_hist_ent with a getter function" is
          dropped (Andrii);
        - added structure linked_regs and helper functions to [de]serialize
          u64 value as such structure (Andrii);
        - bt_set_equal_scalars() renamed to bt_sync_linked_regs(), moved to
          start and end of backtrack_insn() in order to untie linked
          register logic from conditional jumps backtracking.
          Andrii requested a more radical change of moving linked registers
          processing to bt_set_xxx() functions. I did an experiment in this
          direction:
          https://github.com/eddyz87/bpf/tree/find-equal-scalars-in-jump-history--linked-regs-in-bt-set-reg
          but the end result of the experiment seems much uglier than the
          version presented in v2.
      
      Revisions:
      - v1: https://lore.kernel.org/bpf/20240222005005.31784-1-eddyz87@gmail.com/
      - v2: https://lore.kernel.org/bpf/20240705205851.2635794-1-eddyz87@gmail.com/
      ====================
      
      Link: https://lore.kernel.org/r/20240718202357.1746514-1-eddyz87@gmail.com
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    • selftests/bpf: Update comments find_equal_scalars->sync_linked_regs · cfbf2548
      Eduard Zingerman authored
      find_equal_scalars() was renamed to sync_linked_regs(); this commit
      updates the existing references in the selftests comments.
      Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20240718202357.1746514-5-eddyz87@gmail.com
    • selftests/bpf: Tests for per-insn sync_linked_regs() precision tracking · bebc17b1
      Eduard Zingerman authored
      Add a few test cases to verify precision tracking for scalars gaining
      range because of sync_linked_regs():
      - check what happens when more than 6 registers might gain range in
        sync_linked_regs();
      - check if precision is propagated correctly when the operand of a
        conditional jump gained range in sync_linked_regs() and one of the
        linked registers is marked precise;
      - check if precision is propagated correctly when the operand of a
        conditional jump gained range in sync_linked_regs() and another linked
        operand of the conditional jump is marked precise;
      - add a minimized reproducer for the precision tracking bug reported in [0];
      - check that mark_chain_precision() for one of the conditional jump
        operands does not trigger equal scalars precision propagation.
      
      [0] https://lore.kernel.org/bpf/CAEf4BzZ0xidVCqB47XnkXcNhkPWF6_nTV7yt+_Lf0kcFEut2Mg@mail.gmail.com/
      Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20240718202357.1746514-4-eddyz87@gmail.com
    • bpf: Remove mark_precise_scalar_ids() · 842edb55
      Eduard Zingerman authored
      Function mark_precise_scalar_ids() is superseded by
      bt_sync_linked_regs() and equal scalars tracking in jump history.
      mark_precise_scalar_ids() propagates precision over registers sharing
      the same ID on parent/child state boundaries, while jump history records
      allow bt_sync_linked_regs() to propagate the same information with
      instruction-level granularity, which is strictly more precise.
      
      This commit removes mark_precise_scalar_ids() and updates test cases
      in progs/verifier_scalar_ids to reflect new verifier behavior.
      
      The tests are updated in the following manner:
      - mark_precise_scalar_ids() propagated precision regardless of
        presence of conditional jumps, while new jump history based logic
        only kicks in when conditional jumps are present.
        Hence test cases are augmented with conditional jumps to still
        trigger precision propagation.
      - As equal scalars tracking no longer relies on parent/child state
        boundaries, some test cases are no longer interesting and are removed,
        namely:
        - precision_same_state and precision_cross_state are superseded by
          linked_regs_bpf_k;
        - precision_same_state_broken_link and equal_scalars_broken_link
          are superseded by linked_regs_broken_link.
      Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20240718202357.1746514-3-eddyz87@gmail.com
    • bpf: Track equal scalars history on per-instruction level · 4bf79f9b
      Eduard Zingerman authored
      Use bpf_verifier_state->jmp_history to track which registers were
      updated by find_equal_scalars() (renamed to collect_linked_regs())
      when conditional jump was verified. Use recorded information in
      backtrack_insn() to propagate precision.
      
      E.g. for the following program:
      
                  while verifying instructions
        1: r1 = r0              |
        2: if r1 < 8  goto ...  | push r0,r1 as linked registers in jmp_history
        3: if r0 > 16 goto ...  | push r0,r1 as linked registers in jmp_history
        4: r2 = r10             |
        5: r2 += r0             v mark_chain_precision(r0)
      
                  while doing mark_chain_precision(r0)
        5: r2 += r0             | mark r0 precise
        4: r2 = r10             |
        3: if r0 > 16 goto ...  | mark r0,r1 as precise
        2: if r1 < 8  goto ...  | mark r0,r1 as precise
        1: r1 = r0              v
      
      Technically, do this as follows (see the sketch after this list):
      - Use 10 bits to identify each register that gains range because of
        sync_linked_regs():
        - 3 bits for frame number;
        - 6 bits for register or stack slot number;
        - 1 bit to indicate if register is spilled.
      - Use u64 as a vector of 6 such records + 4 bits for vector length.
      - Augment struct bpf_jmp_history_entry with a field 'linked_regs'
        representing such vector.
      - When doing check_cond_jmp_op() remember up to 6 registers that
        gain range because of sync_linked_regs() in such a vector.
      - Don't propagate range information and reset IDs for registers that
        don't fit in 6-value vector.
      - Push a pair {instruction index, linked registers vector}
        to bpf_verifier_state->jmp_history.
      - When doing backtrack_insn() check if any of recorded linked
        registers is currently marked precise, if so mark all linked
        registers as precise.
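      
      The packing described above could look roughly like the sketch below
      (editor's illustration; the helper name linked_regs_pack(), the struct
      layout and the exact bit order are assumptions, not the kernel's actual
      code):
      
      /* Editor's sketch of one possible packing: bits 0-3 hold the record
       * count, then up to 6 records of 10 bits each
       * (3-bit frame number | 6-bit reg/slot number | 1-bit spill flag).
       */
      #include <stdint.h>
      
      struct linked_reg {
              uint8_t frameno;        /* 3 bits */
              uint8_t regno;          /* 6 bits: register or stack slot */
              uint8_t is_spill;       /* 1 bit */
      };
      
      #define LR_CNT_BITS     4
      #define LR_REC_BITS     10
      #define LR_MAX          6
      
      static uint64_t linked_regs_pack(const struct linked_reg *regs, int cnt)
      {
              uint64_t v;
              int i;
      
              if (cnt > LR_MAX)
                      cnt = LR_MAX;
              v = (uint64_t)cnt;      /* 4-bit vector length */
              for (i = 0; i < cnt; i++) {
                      uint64_t rec = (regs[i].frameno & 0x7) |
                                     ((uint64_t)(regs[i].regno & 0x3f) << 3) |
                                     ((uint64_t)(regs[i].is_spill & 0x1) << 9);
                      v |= rec << (LR_CNT_BITS + i * LR_REC_BITS);
              }
              return v;
      }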
      
      This also requires fixes for two test_verifier tests:
      - precise: test 1
      - precise: test 2
      
      Both tests contain the following instruction sequence:
      
      19: (bf) r2 = r9                      ; R2=scalar(id=3) R9=scalar(id=3)
      20: (a5) if r2 < 0x8 goto pc+1        ; R2=scalar(id=3,umin=8)
      21: (95) exit
      22: (07) r2 += 1                      ; R2_w=scalar(id=3+1,...)
      23: (bf) r1 = r10                     ; R1_w=fp0 R10=fp0
      24: (07) r1 += -8                     ; R1_w=fp-8
      25: (b7) r3 = 0                       ; R3_w=0
      26: (85) call bpf_probe_read_kernel#113
      
      The call to bpf_probe_read_kernel() at (26) forces r2 to be precise.
      Previously, this forced all registers with the same id to become precise
      immediately when mark_chain_precision() was called.
      After this change, the precision is propagated to registers sharing the
      same id only when the 'if' instruction is backtracked.
      Hence the verification log for both tests changes:
      regs=r2,r9 -> regs=r2 for instructions 25..20.
      
      Fixes: 904e6ddf ("bpf: Use scalar ids in mark_chain_precision()")
      Reported-by: Hao Sun <sunhao.th@gmail.com>
      Suggested-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20240718202357.1746514-2-eddyz87@gmail.com
      
      Closes: https://lore.kernel.org/bpf/CAEf4BzZ0xidVCqB47XnkXcNhkPWF6_nTV7yt+_Lf0kcFEut2Mg@mail.gmail.com/
    • selftests/bpf: Use auto-dependencies for test objects · 844f7315
      Ihor Solodrai authored
      Make use of -M compiler options when building .test.o objects to
      generate .d files and avoid re-building all tests every time.
      
      Previously, if a single test bpf program under selftests/bpf/progs/*.c
      has changed, make would rebuild all the *.bpf.o, *.skel.h and *.test.o
      objects, which is a lot of unnecessary work.
      
      A typical dependency chain is:
      progs/x.c -> x.bpf.o -> x.skel.h -> x.test.o -> trunner_binary
      
      However, for many tests it's not a 1:1 mapping by name, and so far
      %.test.o files have simply depended on all %.skel.h files, and
      %.skel.h files on all %.bpf.o objects.
      
      Avoid full rebuilds by instructing the compiler (via -MMD) to
      produce *.d files with real dependencies, and appropriately including
      them. Exploit the make feature that rebuilds included makefiles if they
      were changed, by setting %.test.d as a prerequisite for %.test.o files.
      
      A couple of examples of compilation time speedup (after the first
      clean build):
      
      $ touch progs/verifier_and.c && time make -j8
      Before: real	0m16.651s
      After:  real	0m2.245s
      $ touch progs/read_vsyscall.c && time make -j8
      Before: real	0m15.743s
      After:  real	0m1.575s
      
      A drawback of this change is that now there is an overhead due to make
      processing lots of .d files, which potentially may slow down unrelated
      targets. However, the time to make all from scratch hasn't changed
      significantly:
      
      $ make clean && time make -j8
      Before: real	1m31.148s
      After:  real	1m30.309s
      Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/VJihUTnvtwEgv_mOnpfy7EgD9D2MPNoHO-MlANeLIzLJPGhDeyOuGKIYyKgk0O6KPjfM-MuhtvPwZcngN8WFqbTnTRyCSMc2aMZ1ODm1T_g=@pm.me
    • bpf: Simplify character output in seq_print_delegate_opts() · f157f9cb
      Markus Elfring authored
      A single character can be put into a sequence with the corresponding
      function “seq_putc”. Thus use it for two selected calls.
      
      This change was made with the help of the Coccinelle software.
      Suggested-by: Christophe Jaillet <christophe.jaillet@wanadoo.fr>
      Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/abde0992-3d71-44d2-ab27-75b382933a22@web.de
    • bpf: Replace 8 seq_puts() calls by seq_putc() calls · df862de4
      Markus Elfring authored
      A single line break can be put into a sequence with the corresponding
      function “seq_putc”. Thus replace eight seq_puts() calls with seq_putc()
      calls.
      
      This change was made with the help of the Coccinelle software.
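      
      For illustration, this is the kind of change applied (an editor's made-up
      seq_file callback, not the actual call sites):
      
      /* Editor's illustration only, not the actual call sites. */
      #include <linux/seq_file.h>
      
      static void example_show(struct seq_file *m)
      {
              seq_puts(m, "value");
              seq_putc(m, '\n');      /* was: seq_puts(m, "\n"); */
      }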
      Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/e26b7df9-cd63-491f-85e8-8cabe60a85e5@web.de
    • Merge branch 'use network helpers, part 9' · 22912472
      Martin KaFai Lau authored
      Geliang Tang says:
      
      ====================
      v3:
       - patch 2:
         - clear errno before connect_to_fd_opts.
         - print err logs in run_test.
         - set err to -1 when fd >= 0.
       - patch 3:
         - drop "int err".
      
      v2:
       - update patch 2 as Martin suggested.
      
      This is the 9th part of the series "use network helpers" across all BPF
      selftests.
      
      Patches 1-2 update network helpers interfaces as suggested by Martin.
      Patch 3 adds a new helper connect_to_addr_str(), as Martin suggested,
      instead of adding connect_fd_to_addr_str().
      Patch 4 uses this newly added helper in make_client().
      Patch 5 uses make_client() in sk_lookup and drops make_socket().
      ====================
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    • selftests/bpf: Add connect_to_addr_str helper · c70b2d90
      Geliang Tang authored
      Similar to the connect_to_addr() helper, which connects to a server at a
      given sockaddr_storage type address, this patch adds a new helper named
      connect_to_addr_str() for connecting to a server at a given string type
      address "addr_str", with "family" and "port" as its other parameters.
      
      In connect_to_addr_str(), the parameters "family", "addr_str" and "port"
      are used to create a sockaddr_storage type address "addr" by invoking
      make_sockaddr(). Then this "addr" is passed, together with "addrlen",
      "type" and "opts", to connect_to_addr().
      Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
      Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
      Link: https://lore.kernel.org/r/647e82170831558dbde132a7a3d86df660dba2c4.1721282219.git.tanggeliang@kylinos.cn
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    • selftests/bpf: Drop must_fail from network_helper_opts · e1ee5a48
      Geliang Tang authored
      The struct member "must_fail" of network_helper_opts() is only used in
      cgroup_v1v2 tests, it makes sense to drop it from network_helper_opts.
      
      Return value (fd) of connect_to_fd_opts() and the expect errno (EPERM)
      can be checked in cgroup_v1v2.c directly, no need to check them in
      connect_fd_to_addr() in network_helpers.c.
      
      This also makes connect_fd_to_addr() function useless. It can be replaced
      by connect().
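      
      In the cgroup_v1v2 test, the expected failure can then be checked at the
      call site, roughly like this (editor's sketch based on the cover letter
      notes; run_test(), the opts setup and the assertion are assumptions, not
      the exact test code):
      
      /* Editor's sketch: the test checks the fd and errno itself instead of
       * relying on a must_fail flag inside the helper.
       */
      static void run_test(int server_fd)
      {
              struct network_helper_opts opts = {};
              int err = 0, fd;
      
              errno = 0;
              fd = connect_to_fd_opts(server_fd, &opts);
              if (fd >= 0) {
                      log_err("connect_to_fd_opts unexpectedly succeeded");
                      err = -1;
                      close(fd);
              } else if (errno != EPERM) {
                      log_err("unexpected errno %d (expected EPERM)", errno);
                      err = -1;
              }
              ASSERT_OK(err, "connect is blocked by the cgroup prog");
      }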
      Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
      Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
      Link: https://lore.kernel.org/r/3faf336019a9a48e2e8951f4cdebf19e3ac6e441.1721282219.git.tanggeliang@kylinos.cn
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    • selftests/bpf: Drop type of connect_to_fd_opts · a63507f3
      Geliang Tang authored
      The "type" parameter of connect_to_fd_opts() is redundant of "server_fd".
      Since the "type" can be obtained inside by invoking getsockopt(SO_TYPE),
      without passing it in as a parameter.
      
      This patch drops the "type" parameter of connect_to_fd_opts() and updates
      its callers.
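      
      Deriving the type from the fd is a small getsockopt() call, along these
      lines (editor's sketch; socket_type() is a made-up helper name):
      
      /* Editor's sketch: recover the socket type from server_fd instead of
       * taking it as a parameter.
       */
      #include <sys/socket.h>
      
      static int socket_type(int server_fd)
      {
              int type;
              socklen_t optlen = sizeof(type);
      
              if (getsockopt(server_fd, SOL_SOCKET, SO_TYPE, &type, &optlen))
                      return -1;
      
              return type;    /* e.g. SOCK_STREAM or SOCK_DGRAM */
      }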
      Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
      Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
      Link: https://lore.kernel.org/r/50d8ce7ab7ab0c0f4d211fc7cc4ebe3d3f63424c.1721282219.git.tanggeliang@kylinos.cn
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
  2. 25 Jul, 2024 24 commits