  4. 25 Aug, 2023 8 commits
    • Merge branch 'bpf-refcount-followups-3-bpf_mem_free_rcu-refcounted-nodes' · ec0ded2e
      Alexei Starovoitov authored
      Dave Marchevsky says:
      
      ====================
      BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes
      
      This series is the third of three (or more) followups to address issues
      in the bpf_refcount shared ownership implementation discovered by Kumar.
      This series addresses the use-after-free scenario described in [0]. The
      first followup series ([1]) also attempted to address the same
      use-after-free, but only got rid of the splat without addressing the
underlying issue. After this series the underlying issue is fixed and
      bpf_refcount_acquire can be re-enabled.
      
      The main fix here is migration of bpf_obj_drop to use
      bpf_mem_free_rcu. To understand why this fixes the issue, let us consider
      the example interleaving provided by Kumar in [0]:
      
      CPU 0                                   CPU 1
      n = bpf_obj_new
      lock(lock1)
      bpf_rbtree_add(rbtree1, n)
m = bpf_refcount_acquire(n)
      unlock(lock1)
      
      kptr_xchg(map, m) // move to map
      // at this point, refcount = 2
      					m = kptr_xchg(map, NULL)
      					lock(lock2)
      lock(lock1)				bpf_rbtree_add(rbtree2, m)
      p = bpf_rbtree_first(rbtree1)			if (!RB_EMPTY_NODE) bpf_obj_drop_impl(m) // A
      bpf_rbtree_remove(rbtree1, p)
      unlock(lock1)
      bpf_obj_drop(p) // B
      					bpf_refcount_acquire(m) // use-after-free
      					...
      
      Before this series, bpf_obj_drop returns memory to the allocator using
      bpf_mem_free. At this point (B in the example) there might be some
      non-owning references to that memory which the verifier believes are valid,
      but where the underlying memory was reused for some other allocation.
Commit 7793fc3b ("bpf: Make bpf_refcount_acquire fallible for
non-owning refs") attempted to fix this by using refcount_inc_not_zero
in bpf_refcount_acquire instead of refcount_inc, under the assumption
that preventing an erroneous incr-on-0 would be sufficient. This isn't
true, though: refcount_inc_not_zero must *check* whether the refcount
is zero, and the memory it's checking could have been reused, so the
check may read and increment random reused bytes.
      
      If we wait to reuse this memory until all non-owning refs that could
      point to it are gone, there is no possibility of this scenario
      happening. Migrating bpf_obj_drop to use bpf_mem_free_rcu for refcounted
      nodes accomplishes this.
      
For such nodes, the validity of their underlying memory is now tied to
an RCU critical section. This matches MEM_RCU trustedness expectations,
so the series takes the opportunity to mark this trustedness state more
explicitly.
      
The functional effects of the trustedness changes here are rather
small. This is largely because local kptrs get separate verifier
handling - with implicit trustedness assumptions - from arbitrary
kptrs. Regardless, let's take the opportunity to move towards a world
where trustedness is handled more explicitly.
      
      Changelog:
      
      v1 -> v2: https://lore.kernel.org/bpf/20230801203630.3581291-1-davemarchevsky@fb.com/
      
      Patch 1 ("bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire")
        * Spent some time experimenting with a better approach as per convo w/
          Yonghong on v1's patch. It started getting too complex, so left unchanged
          for now. Yonghong was fine with this approach being shipped.
      
      Patch 2 ("bpf: Consider non-owning refs trusted")
        * Add Yonghong ack
      Patch 3 ("bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes")
        * Add Yonghong ack
      Patch 4 ("bpf: Reenable bpf_refcount_acquire")
        * Add Yonghong ack
      
      Patch 5 ("bpf: Consider non-owning refs to refcounted nodes RCU protected")
        * Undo a nonfunctional whitespace change that shouldn't have been included
          (Yonghong)
        * Better logging message when complaining about rcu_read_{lock,unlock} in
          rbtree cb (Alexei)
        * Don't invalidate_non_owning_refs when processing bpf_rcu_read_unlock
          (Yonghong, Alexei)
      
      Patch 6 ("[RFC] bpf: Allow bpf_spin_{lock,unlock} in sleepable prog's RCU CS")
        * preempt_{disable,enable} in __bpf_spin_{lock,unlock} (Alexei)
          * Due to this we can consider spin_lock CS an RCU-sched read-side CS (per
            RCU/Design/Requirements/Requirements.rst). Modify in_rcu_cs accordingly.
        * no need to check for !in_rcu_cs before allowing bpf_spin_{lock,unlock}
          (Alexei)
        * RFC tag removed and renamed to "bpf: Allow bpf_spin_{lock,unlock} in
          sleepable progs"
      
      Patch 7 ("selftests/bpf: Add tests for rbtree API interaction in sleepable progs")
        * Remove "no explicit bpf_rcu_read_lock" failure test, add similar success
          test (Alexei)
      
      Summary of patch contents, with sub-bullets being leading questions and
      comments I think are worth reviewer attention:
      
  * Patches 1 and 2 are mostly documentation - and, in patch 1's case,
    enforcement - of existing semantics / expectations
      
        * Patch 3 changes bpf_obj_drop behavior for refcounted nodes such that
          their underlying memory is not reused until RCU grace period elapses
          * Perhaps it makes sense to move to mem_free_rcu for _all_
            non-owning refs in the future, not just refcounted. This might
            allow custom non-owning ref lifetime + invalidation logic to be
            entirely subsumed by MEM_RCU handling. IMO this needs a bit more
            thought and should be tackled outside of a fix series, so it's not
            attempted here.
      
        * Patch 4 re-enables bpf_refcount_acquire as changes in patch 3 fix
          the remaining use-after-free
          * One might expect this patch to be last in the series, or last
            before selftest changes. Patches 5 and 6 don't change
            verification or runtime behavior for existing BPF progs, though.
      
        * Patch 5 brings the verifier's understanding of refcounted node
          trustedness in line with Patch 4's changes
      
        * Patch 6 allows some bpf_spin_{lock, unlock} calls in sleepable
          progs. Marked RFC for a few reasons:
          * bpf_spin_{lock,unlock} haven't been usable in sleepable progs
            since before the introduction of bpf linked list and rbtree. As
            such this feels more like a new feature that may not belong in
            this fixes series.
      
        * Patch 7 adds tests
      
        [0]: https://lore.kernel.org/bpf/atfviesiidev4hu53hzravmtlau3wdodm2vqs7rd7tnwft34e3@xktodqeqevir/
        [1]: https://lore.kernel.org/bpf/20230602022647.1571784-1-davemarchevsky@fb.com/
      ====================
      
Link: https://lore.kernel.org/r/20230821193311.3290257-1-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Add tests for rbtree API interaction in sleepable progs · 312aa5bd
      Dave Marchevsky authored
      Confirm that the following sleepable prog states fail verification:
        * bpf_rcu_read_unlock before bpf_spin_unlock
           * RCU CS will last at least as long as spin_lock CS
      
      Also confirm that correct usage passes verification, specifically:
        * Explicit use of bpf_rcu_read_{lock, unlock} in sleepable test prog
        * Implied RCU CS due to spin_lock CS
      
      None of the selftest progs actually attach to bpf_testmod's
      bpf_testmod_test_read.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230821193311.3290257-8-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Allow bpf_spin_{lock,unlock} in sleepable progs · 5861d1e8
      Dave Marchevsky authored
      Commit 9e7a4d98 ("bpf: Allow LSM programs to use bpf spin locks")
      disabled bpf_spin_lock usage in sleepable progs, stating:
      
 Sleepable LSM programs can be preempted which means that allowing spin
       locks will need more work (disabling preemption and the verifier
       ensuring that no sleepable helpers are called when a spin lock is
       held).
      
      This patch disables preemption before grabbing bpf_spin_lock. The second
      requirement above "no sleepable helpers are called when a spin lock is
      held" is implicitly enforced by current verifier logic due to helper
      calls in spin_lock CS being disabled except for a few exceptions, none
      of which sleep.
      
Due to the above preemption changes, a bpf_spin_lock CS can also be
considered an RCU CS, so the verifier's in_rcu_cs check is modified to
account for this.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230821193311.3290257-7-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Consider non-owning refs to refcounted nodes RCU protected · 0816b8c6
      Dave Marchevsky authored
      An earlier patch in the series ensures that the underlying memory of
      nodes with bpf_refcount - which can have multiple owners - is not reused
      until RCU grace period has elapsed. This prevents
      use-after-free with non-owning references that may point to
      recently-freed memory. While RCU read lock is held, it's safe to
      dereference such a non-owning ref, as by definition RCU GP couldn't have
      elapsed and therefore underlying memory couldn't have been reused.
      
      From the perspective of verifier "trustedness" non-owning refs to
      refcounted nodes are now trusted only in RCU CS and therefore should no
      longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them
      MEM_RCU in order to reflect this new state.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230821193311.3290257-6-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Reenable bpf_refcount_acquire · ba2464c8
      Dave Marchevsky authored
      Now that all reported issues are fixed, bpf_refcount_acquire can be
      turned back on. Also reenable all bpf_refcount-related tests which were
      disabled.
      
This is a revert of:
       * commit f3514a5d ("selftests/bpf: Disable newly-added 'owner' field test until refcount re-enabled")
       * commit 7deca5ea ("bpf: Disable bpf_refcount_acquire kfunc calls until race conditions are fixed")
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230821193311.3290257-5-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes · 7e26cd12
      Dave Marchevsky authored
      This is the final fix for the use-after-free scenario described in
      commit 7793fc3b ("bpf: Make bpf_refcount_acquire fallible for
      non-owning refs"). That commit, by virtue of changing
      bpf_refcount_acquire's refcount_inc to a refcount_inc_not_zero, fixed
      the "refcount incr on 0" splat. The not_zero check in
      refcount_inc_not_zero, though, still occurs on memory that could have
      been free'd and reused, so the commit didn't properly fix the root
      cause.
      
      This patch actually fixes the issue by free'ing using the recently-added
      bpf_mem_free_rcu, which ensures that the memory is not reused until
      RCU grace period has elapsed. If that has happened then
      there are no non-owning references alive that point to the
      recently-free'd memory, so it can be safely reused.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230821193311.3290257-4-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Consider non-owning refs trusted · 2a6d50b5
      Dave Marchevsky authored
      Recent discussions around default kptr "trustedness" led to changes such
      as commit 6fcd486b ("bpf: Refactor RCU enforcement in the
      verifier."). One of the conclusions of those discussions, as expressed
      in code and comments in that patch, is that we'd like to move away from
      'raw' PTR_TO_BTF_ID without some type flag or other register state
      indicating trustedness. Although PTR_TRUSTED and PTR_UNTRUSTED flags mark
      this state explicitly, the verifier currently considers trustedness
      implied by other register state. For example, owning refs to graph
      collection nodes must have a nonzero ref_obj_id, so they pass the
      is_trusted_reg check despite having no explicit PTR_{UN}TRUSTED flag.
      This patch makes trustedness of non-owning refs to graph collection
      nodes explicit as well.
      
      By definition, non-owning refs are currently trusted. Although the ref
      has no control over pointee lifetime, due to non-owning ref clobbering
      rules (see invalidate_non_owning_refs) dereferencing a non-owning ref is
      safe in the critical section controlled by bpf_spin_lock associated with
      its owning collection.
      
      Note that the previous statement does not hold true for nodes with shared
      ownership due to the use-after-free issue that this series is
      addressing. True shared ownership was disabled by commit 7deca5ea
      ("bpf: Disable bpf_refcount_acquire kfunc calls until race conditions are fixed"),
      though, so the statement holds for now. Further patches in the series will change
      the trustedness state of non-owning refs before re-enabling
      bpf_refcount_acquire.
      
      Let's add NON_OWN_REF type flag to BPF_REG_TRUSTED_MODIFIERS such that a
      non-owning ref reg state would pass is_trusted_reg check. Somewhat
      surprisingly, this doesn't result in any change to user-visible
      functionality elsewhere in the verifier: graph collection nodes are all
      marked MEM_ALLOC, which tends to be handled in separate codepaths from
      "raw" PTR_TO_BTF_ID. Regardless, let's be explicit here and document the
      current state of things before changing it elsewhere in the series.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230821193311.3290257-3-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire · f0d991a0
      Dave Marchevsky authored
      It's straightforward to prove that kptr_struct_meta must be non-NULL for
      any valid call to these kfuncs:
      
        * btf_parse_struct_metas in btf.c creates a btf_struct_meta for any
          struct in user BTF with a special field (e.g. bpf_refcount,
          {rb,list}_node). These are stored in that BTF's struct_meta_tab.
      
        * __process_kf_arg_ptr_to_graph_node in verifier.c ensures that nodes
          have {rb,list}_node field and that it's at the correct offset.
          Similarly, check_kfunc_args ensures bpf_refcount field existence for
          node param to bpf_refcount_acquire.
      
        * So a btf_struct_meta must have been created for the struct type of
          node param to these kfuncs
      
        * That BTF and its struct_meta_tab are guaranteed to still be around.
          Any arbitrary {rb,list} node the BPF program interacts with either:
          came from bpf_obj_new or a collection removal kfunc in the same
          program, in which case the BTF is associated with the program and
          still around; or came from bpf_kptr_xchg, in which case the BTF was
          associated with the map and is still around
      
      Instead of silently continuing with NULL struct_meta, which caused
      confusing bugs such as those addressed by commit 2140a6e3 ("bpf: Set
      kptr_struct_meta for node param to list and rbtree insert funcs"), let's
      error out. Then, at runtime, we can confidently say that the
      implementations of these kfuncs were given a non-NULL kptr_struct_meta,
      meaning that special-field-specific functionality like
      bpf_obj_free_fields and the bpf_obj_drop change introduced later in this
      series are guaranteed to execute.
      
      This patch doesn't change functionality, just makes it easier to reason
      about existing functionality.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230821193311.3290257-2-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>