1. 11 Sep, 2023 1 commit
  2. 07 Sep, 2023 15 commits
  3. 04 Sep, 2023 5 commits
  4. 26 Aug, 2023 18 commits
  5. 25 Aug, 2023 1 commit
    • Alexei Starovoitov's avatar
      Merge branch 'bpf-refcount-followups-3-bpf_mem_free_rcu-refcounted-nodes' · ec0ded2e
      Alexei Starovoitov authored
      Dave Marchevsky says:
      
      ====================
      BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes
      
      This series is the third of three (or more) followups to address issues
      in the bpf_refcount shared ownership implementation discovered by Kumar.
      This series addresses the use-after-free scenario described in [0]. The
      first followup series ([1]) also attempted to address the same
      use-after-free, but only got rid of the splat without addressing the
      underlying issue. After this series the underyling issue is fixed and
      bpf_refcount_acquire can be re-enabled.
      
      The main fix here is migration of bpf_obj_drop to use
      bpf_mem_free_rcu. To understand why this fixes the issue, let us consider
      the example interleaving provided by Kumar in [0]:
      
      CPU 0                                   CPU 1
      n = bpf_obj_new
      lock(lock1)
      bpf_rbtree_add(rbtree1, n)
      m = bpf_rbtree_acquire(n)
      unlock(lock1)
      
      kptr_xchg(map, m) // move to map
      // at this point, refcount = 2
      					m = kptr_xchg(map, NULL)
      					lock(lock2)
      lock(lock1)				bpf_rbtree_add(rbtree2, m)
      p = bpf_rbtree_first(rbtree1)			if (!RB_EMPTY_NODE) bpf_obj_drop_impl(m) // A
      bpf_rbtree_remove(rbtree1, p)
      unlock(lock1)
      bpf_obj_drop(p) // B
      					bpf_refcount_acquire(m) // use-after-free
      					...
      
      Before this series, bpf_obj_drop returns memory to the allocator using
      bpf_mem_free. At this point (B in the example) there might be some
      non-owning references to that memory which the verifier believes are valid,
      but where the underlying memory was reused for some other allocation.
      Commit 7793fc3b ("bpf: Make bpf_refcount_acquire fallible for
      non-owning refs") attempted to fix this by doing refcount_inc_non_zero
      on refcount_acquire in instead of refcount_inc under the assumption that
      preventing erroneous incr-on-0 would be sufficient. This isn't true,
      though: refcount_inc_non_zero must *check* if the refcount is zero, and
      the memory it's checking could have been reused, so the check may look
      at and incr random reused bytes.
      
      If we wait to reuse this memory until all non-owning refs that could
      point to it are gone, there is no possibility of this scenario
      happening. Migrating bpf_obj_drop to use bpf_mem_free_rcu for refcounted
      nodes accomplishes this.
      
      For such nodes, the validity of their underlying memory is now tied to
      RCU critical section. This matches MEM_RCU trustedness
      expectations, so the series takes the opportunity to more explicitly
      mark this trustedness state.
      
      The functional effects of trustedness changes here are rather small.
      This is largely due to local kptrs having separate verifier handling -
      with implicit trustedness assumptions - than arbitrary kptrs.
      Regardless, let's take the opportunity to move towards a world where
      trustedness is more explicitly handled.
      
      Changelog:
      
      v1 -> v2: https://lore.kernel.org/bpf/20230801203630.3581291-1-davemarchevsky@fb.com/
      
      Patch 1 ("bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire")
        * Spent some time experimenting with a better approach as per convo w/
          Yonghong on v1's patch. It started getting too complex, so left unchanged
          for now. Yonghong was fine with this approach being shipped.
      
      Patch 2 ("bpf: Consider non-owning refs trusted")
        * Add Yonghong ack
      Patch 3 ("bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes")
        * Add Yonghong ack
      Patch 4 ("bpf: Reenable bpf_refcount_acquire")
        * Add Yonghong ack
      
      Patch 5 ("bpf: Consider non-owning refs to refcounted nodes RCU protected")
        * Undo a nonfunctional whitespace change that shouldn't have been included
          (Yonghong)
        * Better logging message when complaining about rcu_read_{lock,unlock} in
          rbtree cb (Alexei)
        * Don't invalidate_non_owning_refs when processing bpf_rcu_read_unlock
          (Yonghong, Alexei)
      
      Patch 6 ("[RFC] bpf: Allow bpf_spin_{lock,unlock} in sleepable prog's RCU CS")
        * preempt_{disable,enable} in __bpf_spin_{lock,unlock} (Alexei)
          * Due to this we can consider spin_lock CS an RCU-sched read-side CS (per
            RCU/Design/Requirements/Requirements.rst). Modify in_rcu_cs accordingly.
        * no need to check for !in_rcu_cs before allowing bpf_spin_{lock,unlock}
          (Alexei)
        * RFC tag removed and renamed to "bpf: Allow bpf_spin_{lock,unlock} in
          sleepable progs"
      
      Patch 7 ("selftests/bpf: Add tests for rbtree API interaction in sleepable progs")
        * Remove "no explicit bpf_rcu_read_lock" failure test, add similar success
          test (Alexei)
      
      Summary of patch contents, with sub-bullets being leading questions and
      comments I think are worth reviewer attention:
      
        * Patches 1 and 2 are moreso documententation - and
          enforcement, in patch 1's case - of existing semantics / expectations
      
        * Patch 3 changes bpf_obj_drop behavior for refcounted nodes such that
          their underlying memory is not reused until RCU grace period elapses
          * Perhaps it makes sense to move to mem_free_rcu for _all_
            non-owning refs in the future, not just refcounted. This might
            allow custom non-owning ref lifetime + invalidation logic to be
            entirely subsumed by MEM_RCU handling. IMO this needs a bit more
            thought and should be tackled outside of a fix series, so it's not
            attempted here.
      
        * Patch 4 re-enables bpf_refcount_acquire as changes in patch 3 fix
          the remaining use-after-free
          * One might expect this patch to be last in the series, or last
            before selftest changes. Patches 5 and 6 don't change
            verification or runtime behavior for existing BPF progs, though.
      
        * Patch 5 brings the verifier's understanding of refcounted node
          trustedness in line with Patch 4's changes
      
        * Patch 6 allows some bpf_spin_{lock, unlock} calls in sleepable
          progs. Marked RFC for a few reasons:
          * bpf_spin_{lock,unlock} haven't been usable in sleepable progs
            since before the introduction of bpf linked list and rbtree. As
            such this feels more like a new feature that may not belong in
            this fixes series.
      
        * Patch 7 adds tests
      
        [0]: https://lore.kernel.org/bpf/atfviesiidev4hu53hzravmtlau3wdodm2vqs7rd7tnwft34e3@xktodqeqevir/
        [1]: https://lore.kernel.org/bpf/20230602022647.1571784-1-davemarchevsky@fb.com/
      ====================
      
      Link: https://lore.kernel.org/r/20230821193311.3290257-1-davemarchevsky@fb.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      ec0ded2e