1. 18 Nov, 2022 9 commits
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Allow locking bpf_spin_lock global variables · d0d78c1d
      Kumar Kartikeya Dwivedi authored
      Global variables reside in maps accessible using direct_value_addr
      callbacks, so giving each load instruction's rewrite a unique reg->id
      disallows us from holding locks which are global.
      
      The reason for preserving reg->id as a unique value for registers that
      may point to spin lock is that two separate lookups are treated as two
      separate memory regions, and any possible aliasing is ignored for the
      purposes of spin lock correctness.
      
      This is not great especially for the global variable case, which are
      served from maps that have max_entries == 1, i.e. they always lead to
      map values pointing into the same map value.
      
      So refactor the active_spin_lock into a 'active_lock' structure which
      represents the lock identity, and instead of the reg->id, remember two
      fields, a pointer and the reg->id. The pointer will store reg->map_ptr
      or reg->btf. It's only necessary to distinguish for the id == 0 case of
      global variables, but always setting the pointer to a non-NULL value and
      using the pointer to check whether the lock is held simplifies code in
      the verifier.
      
      This is generic enough to allow it for global variables, map lookups,
      and allocated objects at the same time.
      
      Note that while whether a lock is held can be answered by just comparing
      active_lock.ptr to NULL, to determine whether the register is pointing
      to the same held lock requires comparing _both_ ptr and id.
      
      Finally, as a result of this refactoring, pseudo load instructions are
      not given a unique reg->id, as they are doing lookup for the same map
      value (max_entries is never greater than 1).
      
      Essentially, we consider that the tuple of (ptr, id) will always be
      unique for any kind of argument to bpf_spin_{lock,unlock}.
      
      Note that this can be extended in the future to also remember offset
      used for locking, so that we can introduce multiple bpf_spin_lock fields
      in the same allocation.
      Signed-off-by: default avatarKumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20221118015614.2013203-10-memxor@gmail.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      d0d78c1d
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Allow locking bpf_spin_lock in allocated objects · 4e814da0
      Kumar Kartikeya Dwivedi authored
      Allow locking a bpf_spin_lock in an allocated object, in addition to
      already supported map value pointers. The handling is similar to that of
      map values, by just preserving the reg->id of PTR_TO_BTF_ID | MEM_ALLOC
      as well, and adjusting process_spin_lock to work with them and remember
      the id in verifier state.
      
      Refactor the existing process_spin_lock to work with PTR_TO_BTF_ID |
      MEM_ALLOC in addition to PTR_TO_MAP_VALUE. We need to update the
      reg_may_point_to_spin_lock which is used in mark_ptr_or_null_reg to
      preserve reg->id, that will be used in env->cur_state->active_spin_lock
      to remember the currently held spin lock.
      
      Also update the comment describing bpf_spin_lock implementation details
      to also talk about PTR_TO_BTF_ID | MEM_ALLOC type.
      Signed-off-by: default avatarKumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20221118015614.2013203-9-memxor@gmail.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      4e814da0
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Verify ownership relationships for user BTF types · 865ce09a
      Kumar Kartikeya Dwivedi authored
      Ensure that there can be no ownership cycles among different types by
      way of having owning objects that can hold some other type as their
      element. For instance, a map value can only hold allocated objects, but
      these are allowed to have another bpf_list_head. To prevent unbounded
      recursion while freeing resources, elements of bpf_list_head in local
      kptrs can never have a bpf_list_head which are part of list in a map
      value. Later patches will verify this by having dedicated BTF selftests.
      
      Also, to make runtime destruction easier, once btf_struct_metas is fully
      populated, we can stash the metadata of the value type directly in the
      metadata of the list_head fields, as that allows easier access to the
      value type's layout to destruct it at runtime from the btf_field entry
      of the list head itself.
      Signed-off-by: default avatarKumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20221118015614.2013203-8-memxor@gmail.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      865ce09a
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Recognize lock and list fields in allocated objects · 8ffa5cc1
      Kumar Kartikeya Dwivedi authored
      Allow specifying bpf_spin_lock, bpf_list_head, bpf_list_node fields in a
      allocated object.
      
      Also update btf_struct_access to reject direct access to these special
      fields.
      
      A bpf_list_head allows implementing map-in-map style use cases, where an
      allocated object with bpf_list_head is linked into a list in a map
      value. This would require embedding a bpf_list_node, support for which
      is also included. The bpf_spin_lock is used to protect the bpf_list_head
      and other data.
      
      While we strictly don't require to hold a bpf_spin_lock while touching
      the bpf_list_head in such objects, as when have access to it, we have
      complete ownership of the object, the locking constraint is still kept
      and may be conditionally lifted in the future.
      
      Note that the specification of such types can be done just like map
      values, e.g.:
      
      struct bar {
      	struct bpf_list_node node;
      };
      
      struct foo {
      	struct bpf_spin_lock lock;
      	struct bpf_list_head head __contains(bar, node);
      	struct bpf_list_node node;
      };
      
      struct map_value {
      	struct bpf_spin_lock lock;
      	struct bpf_list_head head __contains(foo, node);
      };
      
      To recognize such types in user BTF, we build a btf_struct_metas array
      of metadata items corresponding to each BTF ID. This is done once during
      the btf_parse stage to avoid having to do it each time during the
      verification process's requirement to inspect the metadata.
      
      Moreover, the computed metadata needs to be passed to some helpers in
      future patches which requires allocating them and storing them in the
      BTF that is pinned by the program itself, so that valid access can be
      assumed to such data during program runtime.
      
      A key thing to note is that once a btf_struct_meta is available for a
      type, both the btf_record and btf_field_offs should be available. It is
      critical that btf_field_offs is available in case special fields are
      present, as we extensively rely on special fields being zeroed out in
      map values and allocated objects in later patches. The code ensures that
      by bailing out in case of errors and ensuring both are available
      together. If the record is not available, the special fields won't be
      recognized, so not having both is also fine (in terms of being a
      verification error and not a runtime bug).
      Signed-off-by: default avatarKumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20221118015614.2013203-7-memxor@gmail.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      8ffa5cc1
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Introduce allocated objects support · 282de143
      Kumar Kartikeya Dwivedi authored
      Introduce support for representing pointers to objects allocated by the
      BPF program, i.e. PTR_TO_BTF_ID that point to a type in program BTF.
      This is indicated by the presence of MEM_ALLOC type flag in reg->type to
      avoid having to check btf_is_kernel when trying to match argument types
      in helpers.
      
      Whenever walking such types, any pointers being walked will always yield
      a SCALAR instead of pointer. In the future we might permit kptr inside
      such allocated objects (either kernel or program allocated), and it will
      then form a PTR_TO_BTF_ID of the respective type.
      
      For now, such allocated objects will always be referenced in verifier
      context, hence ref_obj_id == 0 for them is a bug. It is allowed to write
      to such objects, as long fields that are special are not touched
      (support for which will be added in subsequent patches). Note that once
      such a pointer is marked PTR_UNTRUSTED, it is no longer allowed to write
      to it.
      
      No PROBE_MEM handling is therefore done for loads into this type unless
      PTR_UNTRUSTED is part of the register type, since they can never be in
      an undefined state, and their lifetime will always be valid.
      Signed-off-by: default avatarKumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20221118015614.2013203-6-memxor@gmail.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      282de143
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Populate field_offs for inner_map_meta · f73e601a
      Kumar Kartikeya Dwivedi authored
      Far too much code simply assumes that both btf_record and btf_field_offs
      are set to valid pointers together, or both are unset. They go together
      hand in hand as btf_record describes the special fields and
      btf_field_offs is compact representation for runtime copying/zeroing.
      
      It is very difficult to make this clear in the code when the only
      exception to this universal invariant is inner_map_meta which is used
      as reg->map_ptr in the verifier. This is simply a bug waiting to happen,
      as in verifier context we cannot easily distinguish if PTR_TO_MAP_VALUE
      is coming from an inner map, and if we ever end up using field_offs for
      any reason in the future, we will silently ignore the special fields for
      inner map case (as NULL is not an error but unset field_offs).
      
      Hence, simply copy field_offs from inner map together with btf_record.
      
      While at it, refactor code to unwind properly on errors with gotos.
      Signed-off-by: default avatarKumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20221118015614.2013203-5-memxor@gmail.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      f73e601a
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Free inner_map_meta when btf_record_dup fails · d4899572
      Kumar Kartikeya Dwivedi authored
      Whenever btf_record_dup fails, we must free inner_map_meta that was
      allocated before.
      
      This fixes a memory leak (in case of errors) during inner map creation.
      
      Fixes: aa3496ac ("bpf: Refactor kptr_off_tab into btf_record")
      Signed-off-by: default avatarKumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20221118015614.2013203-4-memxor@gmail.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      d4899572
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Do btf_record_free outside map_free callback · d7f5ef65
      Kumar Kartikeya Dwivedi authored
      Since the commit being fixed, we now miss freeing btf_record for local
      storage maps which will have a btf_record populated in case they have
      bpf_spin_lock element.
      
      This was missed because I made the choice of offloading the job to free
      kptr_off_tab (now btf_record) to the map_free callback when adding
      support for kptrs.
      
      Revisiting the reason for this decision, there is the possibility that
      the btf_record gets used inside map_free callback (e.g. in case of maps
      embedding kptrs) to iterate over them and free them, hence doing it
      before the map_free callback would be leaking special field memory, and
      do invalid memory access. The btf_record keeps module references which
      is critical to ensure the dtor call made for referenced kptr is safe to
      do.
      
      If doing it after map_free callback, the map area is already freed, so
      we cannot access bpf_map structure anymore.
      
      To fix this and prevent such lapses in future, move bpf_map_free_record
      out of the map_free callback, and do it after map_free by remembering
      the btf_record pointer. There is no need to access bpf_map structure in
      that case, and we can avoid missing this case when support for new map
      types is added for other special fields.
      
      Since a btf_record and its btf_field_offs are used together, for
      consistency delay freeing of field_offs as well. While not a problem
      right now, a lot of code assumes that either both record and field_offs
      are set or none at once.
      
      Note that in case of map of maps (outer maps), inner_map_meta->record is
      only used during verification, not to free fields in map value, hence we
      simply keep the bpf_map_free_record call as is in bpf_map_meta_free and
      never touch map->inner_map_meta in bpf_map_free_deferred.
      
      Add a comment making note of these details.
      
      Fixes: db559117 ("bpf: Consolidate spin_lock, timer management into btf_record")
      Signed-off-by: default avatarKumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20221118015614.2013203-3-memxor@gmail.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      d7f5ef65
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Fix early return in map_check_btf · c237bfa5
      Kumar Kartikeya Dwivedi authored
      Instead of returning directly with -EOPNOTSUPP for the timer case, we
      need to free the btf_record before returning to userspace.
      
      Fixes: db559117 ("bpf: Consolidate spin_lock, timer management into btf_record")
      Reported-by: default avatarDan Carpenter <error27@gmail.com>
      Signed-off-by: default avatarKumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20221118015614.2013203-2-memxor@gmail.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      c237bfa5
  2. 17 Nov, 2022 5 commits
  3. 16 Nov, 2022 6 commits
  4. 15 Nov, 2022 10 commits
  5. 14 Nov, 2022 10 commits