1. 05 Jun, 2023 7 commits
    • David Vernet's avatar
      selftests/bpf: Add test for non-NULLable PTR_TO_BTF_IDs · f904c678
      David Vernet authored
      In a recent patch, we taught the verifier that trusted PTR_TO_BTF_ID can
      never be NULL. This prevents the verifier from incorrectly failing to
      load certain programs where it gets confused and thinks a reference
      isn't dropped because it incorrectly assumes that a branch exists in
      which a NULL PTR_TO_BTF_ID pointer is never released.
      
      This patch adds a testcase that verifies this cannot happen.
      Signed-off-by: default avatarDavid Vernet <void@manifault.com>
      Acked-by: default avatarStanislav Fomichev <sdf@google.com>
      Link: https://lore.kernel.org/r/20230602150112.1494194-2-void@manifault.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      f904c678
    • David Vernet's avatar
      bpf: Teach verifier that trusted PTR_TO_BTF_ID pointers are non-NULL · 51302c95
      David Vernet authored
      In reg_type_not_null(), we currently assume that a pointer may be NULL
      if it has the PTR_MAYBE_NULL modifier, or if it doesn't belong to one of
      several base type of pointers that are never NULL-able. For example,
      PTR_TO_CTX, PTR_TO_MAP_VALUE, etc.
      
      It turns out that in some cases, PTR_TO_BTF_ID can never be NULL as
      well, though we currently don't specify it. For example, if you had the
      following program:
      
      SEC("tc")
      long example_refcnt_fail(void *ctx)
      {
      	struct bpf_cpumask *mask1, *mask2;
      
      	mask1 = bpf_cpumask_create();
      	mask2 = bpf_cpumask_create();
      
              if (!mask1 || !mask2)
      		goto error_release;
      
      	bpf_cpumask_test_cpu(0, (const struct cpumask *)mask1);
      	bpf_cpumask_test_cpu(0, (const struct cpumask *)mask2);
      
      error_release:
      	if (mask1)
      		bpf_cpumask_release(mask1);
      	if (mask2)
      		bpf_cpumask_release(mask2);
      	return ret;
      }
      
      The verifier will incorrectly fail to load the program, thinking
      (unintuitively) that we have a possibly-unreleased reference if the mask
      is NULL, because we (correctly) don't issue a bpf_cpumask_release() on
      the NULL path.
      
      The reason the verifier gets confused is due to the fact that we don't
      explicitly tell the verifier that trusted PTR_TO_BTF_ID pointers can
      never be NULL. Basically, if we successfully get past the if check
      (meaning both pointers go from ptr_or_null_bpf_cpumask to
      ptr_bpf_cpumask), the verifier will correctly assume that the references
      need to be dropped on any possible branch that leads to program exit.
      However, it will _incorrectly_ think that the ptr == NULL branch is
      possible, and will erroneously detect it as a branch on which we failed
      to drop the reference.
      
      The solution is of course to teach the verifier that trusted
      PTR_TO_BTF_ID pointers can never be NULL, so that it doesn't incorrectly
      think it's possible for the reference to be present on the ptr == NULL
      branch.
      
      A follow-on patch will add a selftest that verifies this behavior.
      Signed-off-by: default avatarDavid Vernet <void@manifault.com>
      Link: https://lore.kernel.org/r/20230602150112.1494194-1-void@manifault.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      51302c95
    • Daniel T. Lee's avatar
      bpf: Replace open code with for allocated object check · 503e4def
      Daniel T. Lee authored
      >From commit 282de143 ("bpf: Introduce allocated objects support"),
      With this allocated object with BPF program, (PTR_TO_BTF_ID | MEM_ALLOC)
      has been a way of indicating to check the type is the allocated object.
      
      commit d8939cb0 ("bpf: Loosen alloc obj test in verifier's
      reg_btf_record")
      >From the commit, there has been helper function for checking this, named
      type_is_ptr_alloc_obj(). But still, some of the code use open code to
      retrieve this info. This commit replaces the open code with the
      type_is_alloc(), and the type_is_ptr_alloc_obj() function.
      Signed-off-by: default avatarDaniel T. Lee <danieltimlee@gmail.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20230527122706.59315-1-danieltimlee@gmail.com
      503e4def
    • Jesper Dangaard Brouer's avatar
      bpf/xdp: optimize bpf_xdp_pointer to avoid reading sinfo · 41148662
      Jesper Dangaard Brouer authored
      Currently we observed a significant performance degradation in
      samples/bpf xdp1 and xdp2, due XDP multibuffer "xdp.frags" handling,
      added in commit 77225174 ("samples/bpf: fixup some tools to be able
      to support xdp multibuffer").
      
      This patch reduce the overhead by avoiding to read/load shared_info
      (sinfo) memory area, when XDP packet don't have any frags. This improves
      performance because sinfo is located in another cacheline.
      
      Function bpf_xdp_pointer() is used by BPF helpers bpf_xdp_load_bytes()
      and bpf_xdp_store_bytes(). As a help to reviewers, xdp_get_buff_len() can
      potentially access sinfo, but it uses xdp_buff_has_frags() flags bit check
      to avoid accessing sinfo in no-frags case.
      
      The likely/unlikely instrumentation lays out asm code such that sinfo
      access isn't interleaved with no-frags case (checked on GCC 12.2.1-4).
      The generated asm code is more compact towards the no-frags case.
      
      The BPF kfunc bpf_dynptr_slice() also use bpf_xdp_pointer(). Thus, it
      should also take effect for that.
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: default avatarLorenzo Bianconi <lorenzo@kernel.org>
      Acked-by: default avatarToke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/r/168563651438.3436004.17735707525651776648.stgit@firesoulSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      41148662
    • Dave Marchevsky's avatar
      bpf: Make bpf_refcount_acquire fallible for non-owning refs · 7793fc3b
      Dave Marchevsky authored
      This patch fixes an incorrect assumption made in the original
      bpf_refcount series [0], specifically that the BPF program calling
      bpf_refcount_acquire on some node can always guarantee that the node is
      alive. In that series, the patch adding failure behavior to rbtree_add
      and list_push_{front, back} breaks this assumption for non-owning
      references.
      
      Consider the following program:
      
        n = bpf_kptr_xchg(&mapval, NULL);
        /* skip error checking */
      
        bpf_spin_lock(&l);
        if(bpf_rbtree_add(&t, &n->rb, less)) {
          bpf_refcount_acquire(n);
          /* Failed to add, do something else with the node */
        }
        bpf_spin_unlock(&l);
      
      It's incorrect to assume that bpf_refcount_acquire will always succeed in this
      scenario. bpf_refcount_acquire is being called in a critical section
      here, but the lock being held is associated with rbtree t, which isn't
      necessarily the lock associated with the tree that the node is already
      in. So after bpf_rbtree_add fails to add the node and calls bpf_obj_drop
      in it, the program has no ownership of the node's lifetime. Therefore
      the node's refcount can be decr'd to 0 at any time after the failing
      rbtree_add. If this happens before the refcount_acquire above, the node
      might be free'd, and regardless refcount_acquire will be incrementing a
      0 refcount.
      
      Later patches in the series exercise this scenario, resulting in the
      expected complaint from the kernel (without this patch's changes):
      
        refcount_t: addition on 0; use-after-free.
        WARNING: CPU: 1 PID: 207 at lib/refcount.c:25 refcount_warn_saturate+0xbc/0x110
        Modules linked in: bpf_testmod(O)
        CPU: 1 PID: 207 Comm: test_progs Tainted: G           O       6.3.0-rc7-02231-g723de1a718a2-dirty #371
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
        RIP: 0010:refcount_warn_saturate+0xbc/0x110
        Code: 6f 64 f6 02 01 e8 84 a3 5c ff 0f 0b eb 9d 80 3d 5e 64 f6 02 00 75 94 48 c7 c7 e0 13 d2 82 c6 05 4e 64 f6 02 01 e8 64 a3 5c ff <0f> 0b e9 7a ff ff ff 80 3d 38 64 f6 02 00 0f 85 6d ff ff ff 48 c7
        RSP: 0018:ffff88810b9179b0 EFLAGS: 00010082
        RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000
        RDX: 0000000000000202 RSI: 0000000000000008 RDI: ffffffff857c3680
        RBP: ffff88810027d3c0 R08: ffffffff8125f2a4 R09: ffff88810b9176e7
        R10: ffffed1021722edc R11: 746e756f63666572 R12: ffff88810027d388
        R13: ffff88810027d3c0 R14: ffffc900005fe030 R15: ffffc900005fe048
        FS:  00007fee0584a700(0000) GS:ffff88811b280000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00005634a96f6c58 CR3: 0000000108ce9002 CR4: 0000000000770ee0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        PKRU: 55555554
        Call Trace:
         <TASK>
         bpf_refcount_acquire_impl+0xb5/0xc0
      
        (rest of output snipped)
      
      The patch addresses this by changing bpf_refcount_acquire_impl to use
      refcount_inc_not_zero instead of refcount_inc and marking
      bpf_refcount_acquire KF_RET_NULL.
      
      For owning references, though, we know the above scenario is not possible
      and thus that bpf_refcount_acquire will always succeed. Some verifier
      bookkeeping is added to track "is input owning ref?" for bpf_refcount_acquire
      calls and return false from is_kfunc_ret_null for bpf_refcount_acquire on
      owning refs despite it being marked KF_RET_NULL.
      
      Existing selftests using bpf_refcount_acquire are modified where
      necessary to NULL-check its return value.
      
        [0]: https://lore.kernel.org/bpf/20230415201811.343116-1-davemarchevsky@fb.com/
      
      Fixes: d2dcc67d ("bpf: Migrate bpf_rbtree_add and bpf_list_push_{front,back} to possibly fail")
      Reported-by: default avatarKumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: default avatarDave Marchevsky <davemarchevsky@fb.com>
      Link: https://lore.kernel.org/r/20230602022647.1571784-5-davemarchevsky@fb.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      7793fc3b
    • Dave Marchevsky's avatar
      bpf: Fix __bpf_{list,rbtree}_add's beginning-of-node calculation · cc0d76ca
      Dave Marchevsky authored
      Given the pointer to struct bpf_{rb,list}_node within a local kptr and
      the byte offset of that field within the kptr struct, the calculation changed
      by this patch is meant to find the beginning of the kptr so that it can
      be passed to bpf_obj_drop.
      
      Unfortunately instead of doing
      
        ptr_to_kptr = ptr_to_node_field - offset_bytes
      
      the calculation is erroneously doing
      
        ptr_to_ktpr = ptr_to_node_field - (offset_bytes * sizeof(struct bpf_rb_node))
      
      or the bpf_list_node equivalent.
      
      This patch fixes the calculation.
      
      Fixes: d2dcc67d ("bpf: Migrate bpf_rbtree_add and bpf_list_push_{front,back} to possibly fail")
      Signed-off-by: default avatarDave Marchevsky <davemarchevsky@fb.com>
      Link: https://lore.kernel.org/r/20230602022647.1571784-4-davemarchevsky@fb.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      cc0d76ca
    • Dave Marchevsky's avatar
      bpf: Set kptr_struct_meta for node param to list and rbtree insert funcs · 2140a6e3
      Dave Marchevsky authored
      In verifier.c, fixup_kfunc_call uses struct bpf_insn_aux_data's
      kptr_struct_meta field to pass information about local kptr types to
      various helpers and kfuncs at runtime. The recent bpf_refcount series
      added a few functions to the set that need this information:
      
        * bpf_refcount_acquire
          * Needs to know where the refcount field is in order to increment
        * Graph collection insert kfuncs: bpf_rbtree_add, bpf_list_push_{front,back}
          * Were migrated to possibly fail by the bpf_refcount series. If
            insert fails, the input node is bpf_obj_drop'd. bpf_obj_drop needs
            the kptr_struct_meta in order to decr refcount and properly free
            special fields.
      
      Unfortunately the verifier handling of collection insert kfuncs was not
      modified to actually populate kptr_struct_meta. Accordingly, when the
      node input to those kfuncs is passed to bpf_obj_drop, it is done so
      without the information necessary to decr refcount.
      
      This patch fixes the issue by populating kptr_struct_meta for those
      kfuncs.
      
      Fixes: d2dcc67d ("bpf: Migrate bpf_rbtree_add and bpf_list_push_{front,back} to possibly fail")
      Signed-off-by: default avatarDave Marchevsky <davemarchevsky@fb.com>
      Link: https://lore.kernel.org/r/20230602022647.1571784-3-davemarchevsky@fb.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      2140a6e3
  2. 01 Jun, 2023 2 commits
    • Louis DeLosSantos's avatar
      selftests/bpf: Test table ID fib lookup BPF helper · d4ae3e58
      Louis DeLosSantos authored
      Add additional test cases to `fib_lookup.c` prog_test.
      
      These test cases add a new /24 network to the previously unused veth2
      device, removes the directly connected route from the main routing table
      and moves it to table 100.
      
      The first test case then confirms a fib lookup for a remote address in
      this directly connected network, using the main routing table fails.
      
      The second test case ensures the same fib lookup using table 100 succeeds.
      
      An additional pair of tests which function in the same manner are added
      for IPv6.
      Signed-off-by: default avatarLouis DeLosSantos <louis.delos.devel@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20230505-bpf-add-tbid-fib-lookup-v2-2-0a31c22c748c@gmail.com
      d4ae3e58
    • Louis DeLosSantos's avatar
      bpf: Add table ID to bpf_fib_lookup BPF helper · 8ad77e72
      Louis DeLosSantos authored
      Add ability to specify routing table ID to the `bpf_fib_lookup` BPF
      helper.
      
      A new field `tbid` is added to `struct bpf_fib_lookup` used as
      parameters to the `bpf_fib_lookup` BPF helper.
      
      When the helper is called with the `BPF_FIB_LOOKUP_DIRECT` and
      `BPF_FIB_LOOKUP_TBID` flags the `tbid` field in `struct bpf_fib_lookup`
      will be used as the table ID for the fib lookup.
      
      If the `tbid` does not exist the fib lookup will fail with
      `BPF_FIB_LKUP_RET_NOT_FWDED`.
      
      The `tbid` field becomes a union over the vlan related output fields
      in `struct bpf_fib_lookup` and will be zeroed immediately after usage.
      
      This functionality is useful in containerized environments.
      
      For instance, if a CNI wants to dictate the next-hop for traffic leaving
      a container it can create a container-specific routing table and perform
      a fib lookup against this table in a "host-net-namespace-side" TC program.
      
      This functionality also allows `ip rule` like functionality at the TC
      layer, allowing an eBPF program to pick a routing table based on some
      aspect of the sk_buff.
      
      As a concrete use case, this feature will be used in Cilium's SRv6 L3VPN
      datapath.
      
      When egress traffic leaves a Pod an eBPF program attached by Cilium will
      determine which VRF the egress traffic should target, and then perform a
      FIB lookup in a specific table representing this VRF's FIB.
      Signed-off-by: default avatarLouis DeLosSantos <louis.delos.devel@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20230505-bpf-add-tbid-fib-lookup-v2-1-0a31c22c748c@gmail.com
      8ad77e72
  3. 31 May, 2023 4 commits
  4. 30 May, 2023 27 commits