Commit 367590b7 authored by Alexei Starovoitov

Merge branch 'Introduce typed pointer support in BPF maps'

Kumar Kartikeya Dwivedi says:

====================

This set enables storing pointers of a certain type in a BPF map, and extends
the verifier to enforce type safety and lifetime correctness properties.

The infrastructure being added is generic enough to allow storing any kind of
pointer whose type is available in BTF (user or kernel) in the future (e.g.
strongly typed memory allocation in a BPF program). Such pointers are
internally tracked in the verifier as PTR_TO_BTF_ID, but for now the series
limits them to two kinds of pointers obtained from the kernel.

Obviously, use of this feature depends on map BTF.

1. Unreferenced kernel pointer

In this case, there are very few restrictions. The pointer type being stored
must match the type declared in the map value. However, such a pointer, when
loaded from the map, can only be dereferenced, not passed to any in-kernel
helpers or kernel functions available to the program. This is because while
the verifier's exception handling mechanism converts BPF_LDX to PROBE_MEM
loads, which are then handled specially by the JIT implementation, the same
liberty is not available to accesses inside the kernel. By the time the
pointer is passed into a helper, there are no lifetime guarantees about the
object it points to, and it may well reference invalid memory.
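
For illustration, a minimal sketch of declaring and using an unreferenced
kptr from a BPF program (using the __kptr type tag macro added to
bpf_helpers.h later in this series; task_struct is just an example type):

	struct map_value {
		struct task_struct __kptr *ptr;
	};

	struct {
		__uint(type, BPF_MAP_TYPE_ARRAY);
		__type(key, int);
		__type(value, struct map_value);
		__uint(max_entries, 1);
	} array_map SEC(".maps");

	SEC("tc")
	int prog(struct __sk_buff *ctx)
	{
		struct map_value *v;
		int key = 0;

		v = bpf_map_lookup_elem(&array_map, &key);
		if (!v)
			return 0;
		/* The loaded pointer is untrusted: it may be dereferenced
		 * (the load is converted to PROBE_MEM), but it cannot be
		 * passed to helpers or kfuncs.
		 */
		if (v->ptr)
			bpf_printk("pid: %d", v->ptr->pid);
		return 0;
	}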

2. Referenced kernel pointer

This case imposes many restrictions on the programmer, to ensure safety. To
transfer ownership of a reference from the BPF program to the map, the user
must use the bpf_kptr_xchg helper, which returns the old pointer contained in
the map as an acquired reference, and releases the verifier state for the
referenced pointer being exchanged, as it moves into the map.

This is a normal PTR_TO_BTF_ID that can be used with in-kernel helpers and
kernel functions callable by the program.
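
For example, moving an acquired reference into the map inside a program body
might look like the following sketch, using the prog_test_ref_kfunc test
kfuncs added by this series (v is assumed to be a looked-up map value whose
ref_ptr field is declared with the __kptr_ref type tag):

	struct prog_test_ref_kfunc *p, *old;

	p = bpf_kfunc_call_test_acquire(&(unsigned long){0});
	if (!p)
		return 0;
	/* The reference for p moves into the map; the previous value comes
	 * back as an acquired reference (or NULL), which we now own.
	 */
	old = bpf_kptr_xchg(&v->ref_ptr, p);
	if (old)
		bpf_kfunc_call_test_release(old);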

However, if BPF_LDX is used to load a referenced pointer from the map, it is
still not permitted to pass it to in-kernel helpers or kernel functions. To
obtain a reference usable with helpers, the user must invoke a kfunc helper
which returns a usable reference (which must also eventually be released
before BPF_EXIT, or moved into a map).

Since the load of the pointer (preserving data dependency ordering) must happen
inside the RCU read section, the kfunc helper will take a pointer to the map
value, which must point to the actual pointer of the object whose reference is
to be raised. The type will be verified from the BTF information of the kfunc,
as the prototype must be:

	T *func(T **, ... /* other arguments */);

The verifier then checks whether the pointer at the given offset in the map
value points to type T, and permits the call.

This convention is followed so that such helpers may also be called from
sleepable BPF programs, where the RCU read lock is not necessarily held in
the BPF program context; hence the need to pass in a pointer to the actual
pointer, so the load can be performed inside the RCU read section.
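
The selftests in this series exercise this pattern with
bpf_kfunc_call_test_kptr_get; the calling side looks roughly like this
(again assuming v is a looked-up map value with a __kptr_ref field):

	struct prog_test_ref_kfunc *p;

	/* Takes a pointer to the map value slot, loads the kptr inside the
	 * RCU read section, and raises a reference on the object.
	 */
	p = bpf_kfunc_call_test_kptr_get(&v->ref_ptr, 0, 0);
	if (!p)
		return 0;
	/* p is now a normal acquired PTR_TO_BTF_ID, and must be released
	 * or moved into a map before BPF_EXIT.
	 */
	bpf_kfunc_call_test_release(p);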

Notes
-----

 * C selftests require https://reviews.llvm.org/D119799 to pass.
 * Unlike BPF timers, kptrs are not reset or freed on map_release_uref.
 * Referenced kptr storage is always treated as unsigned long * on kernel side,
   as BPF side cannot mutate it. The storage (8 bytes) is sufficient for both
   32-bit and 64-bit platforms.
 * Use of WRITE_ONCE to reset unreferenced kptr on 32-bit systems is fine, as
   the actual pointer is always word sized, so the store tearing into two 32-bit
   stores won't be a problem as the other half is always zeroed out.

Changelog:
----------
v5 -> v6
v5: https://lore.kernel.org/bpf/20220415160354.1050687-1-memxor@gmail.com

 * Address comments from Alexei
   * Drop 'Revisit stack usage' comment
   * Rename off_btf to kernel_btf
   * Add comment about searching using type from map BTF
   * Do kmemdup + btf_get instead of get + kmemdup + put
   * Add comment for btf_struct_ids_match
   * Add comment for assigning non-zero id for mark_ptr_or_null_reg
   * Rename PTR_RELEASE to OBJ_RELEASE
   * Rename BPF_MAP_OFF_DESC_TYPE_XXX_KPTR to BPF_KPTR_XXX
   * Remove unneeded likely/unlikely in cold functions
   * Fix other misc nits
 * Keep release_regno instead of replacing with bool + regno
 * Add a patch to prevent type match for first member when off == 0 for
   release functions (kfunc + BPF helpers)
 * Guard kptr/kptr_ref definition in libbpf header with __has_attribute
   to prevent selftests compilation errors with old clang versions that
   do not support type tags

v4 -> v5
v4: https://lore.kernel.org/bpf/20220409093303.499196-1-memxor@gmail.com

 * Address comments from Joanne
   * Move __btf_member_bit_offset before strcmp
   * Move strcmp conditional on name to unref kptr patch
   * Directly return from btf_find_struct in patch 1
   * Use enum btf_field_type vs int field_type
   * Put btf and btf_id in off_desc in named struct 'kptr'
   * Switch order for BTF_FIELD_IGNORE check
   * Drop dead tab->nr_off = 0 store
   * Use i instead of tab->nr_off to btf_put on failure
   * Replace kzalloc + memcpy with kmemdup (kernel test robot)
   * Reject both BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG
   * Add logging statement for rejecting BPF_MODE(insn->code) != BPF_MEM
   * Rename off_desc -> kptr_off_desc in check_mem_access
   * Drop check for err, fallthrough to end of function
   * Remove is_release_function, use meta.release_regno to detect release
     function, release reference state, and remove check_release_regno
   * Drop off_desc->flags, use off_desc->type
   * Update comment for ARG_PTR_TO_KPTR
 * Distinguish between direct/indirect access to kptr
 * Drop check_helper_mem_access from process_kptr_func, check_mem_reg in kptr_get
 * Add verifier test for helper accessing kptr indirectly
 * Fix other misc nits, add Acked-by for patch 2

v3 -> v4
v3: https://lore.kernel.org/bpf/20220320155510.671497-1-memxor@gmail.com

 * Use btf_parse_kptrs, plural kptrs naming (Joanne, Andrii)
 * Remove unused parameters in check_map_kptr_access (Joanne)
 * Handle idx < info_cnt kludge using tmp variable (Andrii)
 * Validate tags always precede modifiers in BTF (Andrii)
   * Split out into https://lore.kernel.org/bpf/20220406004121.282699-1-memxor@gmail.com
 * Store u32 type_id in btf_field_info (Andrii)
 * Use base_type in map_kptr_match_type (Andrii)
 * Free kptr_off_tab when not bpf_capable (Martin)
 * Use PTR_RELEASE flag instead of bools in bpf_func_proto (Joanne)
 * Drop extra reg->off and reg->ref_obj_id checks in map_kptr_match_type (Martin)
 * Use separate u32 and u8 arrays for offs and sizes in off_arr (Andrii)
 * Simplify and remove map->value_size sentinel in copy_map_value (Andrii)
 * Use sort_r to keep both arrays in sync while sorting (Andrii)
 * Rename check_and_free_timers_and_kptr to check_and_free_fields (Andrii)
 * Move dtor prototype checks to registration phase (Alexei)
 * Use ret variable for checking ASSERT_XXX, use shorter strings (Andrii)
 * Fix missing checks for other maps (Jiri)
 * Fix various other nits, and bugs noticed during self review

v2 -> v3
v2: https://lore.kernel.org/bpf/20220317115957.3193097-1-memxor@gmail.com

 * Address comments from Alexei
   * Set name, sz, align in btf_find_field
   * Do idx >= info_cnt check in caller of btf_find_field_*
     * Use extra element in the info_arr to make this safe
   * Remove while loop, reject extra tags
   * Remove cases of defensive programming
   * Move bpf_capable() check to map_check_btf
   * Put check_ptr_off_reg reordering hunk into separate patch
   * Warn for ref_ptr once
   * Make the meta.ref_obj_id == 0 case simpler to read
   * Remove kptr_percpu and kptr_user support, remove their tests
   * Store size of field at offset in off_arr
 * Fix BPF_F_NO_PREALLOC set wrongly for hash map in C selftest
 * Add missing check_mem_reg call for kptr_get kfunc arg#0 check

v1 -> v2
v1: https://lore.kernel.org/bpf/20220220134813.3411982-1-memxor@gmail.com

 * Address comments from Alexei
   * Rename bpf_btf_find_by_name_kind_all to bpf_find_btf_id
   * Reduce indentation level in that function
   * Always take reference regardless of module or vmlinux BTF
   * Also made it the same for btf_get_module_btf
   * Use kptr, kptr_ref, kptr_percpu, kptr_user type tags
   * Don't reserve tag namespace
   * Refactor btf_find_field to be side effect free, allocate and populate
     kptr_off_tab in caller
   * Move module reference to dtor patch
   * Remove support for BPF_XCHG, BPF_CMPXCHG insn
   * Introduce bpf_kptr_xchg helper
   * Embed offset array in struct bpf_map, populate and sort it once
   * Adjust copy_map_value to memcpy directly using this offset array
   * Removed size member from offset array to save space
 * Fix some problems pointed out by kernel test robot
 * Tidy selftests
 * Lots of other minor fixes
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
parents d9d31cf8 792c0a34
......@@ -23,6 +23,7 @@
#include <linux/slab.h>
#include <linux/percpu-refcount.h>
#include <linux/bpfptr.h>
#include <linux/btf.h>
struct bpf_verifier_env;
struct bpf_verifier_log;
......@@ -155,6 +156,41 @@ struct bpf_map_ops {
const struct bpf_iter_seq_info *iter_seq_info;
};
enum {
/* Support at most 8 pointers in a BPF map value */
BPF_MAP_VALUE_OFF_MAX = 8,
BPF_MAP_OFF_ARR_MAX = BPF_MAP_VALUE_OFF_MAX +
1 + /* for bpf_spin_lock */
1, /* for bpf_timer */
};
enum bpf_kptr_type {
BPF_KPTR_UNREF,
BPF_KPTR_REF,
};
struct bpf_map_value_off_desc {
u32 offset;
enum bpf_kptr_type type;
struct {
struct btf *btf;
struct module *module;
btf_dtor_kfunc_t dtor;
u32 btf_id;
} kptr;
};
struct bpf_map_value_off {
u32 nr_off;
struct bpf_map_value_off_desc off[];
};
struct bpf_map_off_arr {
u32 cnt;
u32 field_off[BPF_MAP_OFF_ARR_MAX];
u8 field_sz[BPF_MAP_OFF_ARR_MAX];
};
struct bpf_map {
/* The first two cachelines with read-mostly members of which some
* are also accessed in fast-path (e.g. ops, max_entries).
......@@ -171,6 +207,7 @@ struct bpf_map {
u64 map_extra; /* any per-map-type extra fields */
u32 map_flags;
int spin_lock_off; /* >=0 valid offset, <0 error */
struct bpf_map_value_off *kptr_off_tab;
int timer_off; /* >=0 valid offset, <0 error */
u32 id;
int numa_node;
......@@ -182,10 +219,7 @@ struct bpf_map {
struct mem_cgroup *memcg;
#endif
char name[BPF_OBJ_NAME_LEN];
bool bypass_spec_v1;
bool frozen; /* write-once; write-protected by freeze_mutex */
/* 14 bytes hole */
struct bpf_map_off_arr *off_arr;
/* The 3rd and 4th cacheline with misc members to avoid false sharing
* particularly with refcounting.
*/
......@@ -205,6 +239,8 @@ struct bpf_map {
bool jited;
bool xdp_has_frags;
} owner;
bool bypass_spec_v1;
bool frozen; /* write-once; write-protected by freeze_mutex */
};
static inline bool map_value_has_spin_lock(const struct bpf_map *map)
......@@ -217,43 +253,44 @@ static inline bool map_value_has_timer(const struct bpf_map *map)
return map->timer_off >= 0;
}
static inline bool map_value_has_kptrs(const struct bpf_map *map)
{
return !IS_ERR_OR_NULL(map->kptr_off_tab);
}
static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
{
if (unlikely(map_value_has_spin_lock(map)))
memset(dst + map->spin_lock_off, 0, sizeof(struct bpf_spin_lock));
if (unlikely(map_value_has_timer(map)))
memset(dst + map->timer_off, 0, sizeof(struct bpf_timer));
if (unlikely(map_value_has_kptrs(map))) {
struct bpf_map_value_off *tab = map->kptr_off_tab;
int i;
for (i = 0; i < tab->nr_off; i++)
*(u64 *)(dst + tab->off[i].offset) = 0;
}
}
/* copy everything but bpf_spin_lock and bpf_timer. There could be one of each. */
static inline void copy_map_value(struct bpf_map *map, void *dst, void *src)
{
u32 s_off = 0, s_sz = 0, t_off = 0, t_sz = 0;
u32 curr_off = 0;
int i;
if (unlikely(map_value_has_spin_lock(map))) {
s_off = map->spin_lock_off;
s_sz = sizeof(struct bpf_spin_lock);
}
if (unlikely(map_value_has_timer(map))) {
t_off = map->timer_off;
t_sz = sizeof(struct bpf_timer);
if (likely(!map->off_arr)) {
memcpy(dst, src, map->value_size);
return;
}
if (unlikely(s_sz || t_sz)) {
if (s_off < t_off || !s_sz) {
swap(s_off, t_off);
swap(s_sz, t_sz);
}
memcpy(dst, src, t_off);
memcpy(dst + t_off + t_sz,
src + t_off + t_sz,
s_off - t_off - t_sz);
memcpy(dst + s_off + s_sz,
src + s_off + s_sz,
map->value_size - s_off - s_sz);
} else {
memcpy(dst, src, map->value_size);
for (i = 0; i < map->off_arr->cnt; i++) {
u32 next_off = map->off_arr->field_off[i];
memcpy(dst + curr_off, src + curr_off, next_off - curr_off);
curr_off += map->off_arr->field_sz[i];
}
memcpy(dst + curr_off, src + curr_off, map->value_size - curr_off);
}
void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
bool lock_src);
......@@ -342,7 +379,18 @@ enum bpf_type_flag {
*/
MEM_PERCPU = BIT(4 + BPF_BASE_TYPE_BITS),
__BPF_TYPE_LAST_FLAG = MEM_PERCPU,
/* Indicates that the argument will be released. */
OBJ_RELEASE = BIT(5 + BPF_BASE_TYPE_BITS),
/* PTR is not trusted. This is only used with PTR_TO_BTF_ID, to mark
* unreferenced and referenced kptr loaded from map value using a load
* instruction, so that they can only be dereferenced but not escape the
* BPF program into the kernel (i.e. cannot be passed as arguments to
* kfunc or bpf helpers).
*/
PTR_UNTRUSTED = BIT(6 + BPF_BASE_TYPE_BITS),
__BPF_TYPE_LAST_FLAG = PTR_UNTRUSTED,
};
/* Max number of base types. */
......@@ -391,6 +439,7 @@ enum bpf_arg_type {
ARG_PTR_TO_STACK, /* pointer to stack */
ARG_PTR_TO_CONST_STR, /* pointer to a null terminated read-only string */
ARG_PTR_TO_TIMER, /* pointer to bpf_timer */
ARG_PTR_TO_KPTR, /* pointer to referenced kptr */
__BPF_ARG_TYPE_MAX,
/* Extended arg_types. */
......@@ -400,6 +449,7 @@ enum bpf_arg_type {
ARG_PTR_TO_SOCKET_OR_NULL = PTR_MAYBE_NULL | ARG_PTR_TO_SOCKET,
ARG_PTR_TO_ALLOC_MEM_OR_NULL = PTR_MAYBE_NULL | ARG_PTR_TO_ALLOC_MEM,
ARG_PTR_TO_STACK_OR_NULL = PTR_MAYBE_NULL | ARG_PTR_TO_STACK,
ARG_PTR_TO_BTF_ID_OR_NULL = PTR_MAYBE_NULL | ARG_PTR_TO_BTF_ID,
/* This must be the last entry. Its purpose is to ensure the enum is
* wide enough to hold the higher bits reserved for bpf_type_flag.
......@@ -1396,6 +1446,12 @@ void bpf_prog_put(struct bpf_prog *prog);
void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock);
void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock);
struct bpf_map_value_off_desc *bpf_map_kptr_off_contains(struct bpf_map *map, u32 offset);
void bpf_map_free_kptr_off_tab(struct bpf_map *map);
struct bpf_map_value_off *bpf_map_copy_kptr_off_tab(const struct bpf_map *map);
bool bpf_map_equal_kptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b);
void bpf_map_free_kptrs(struct bpf_map *map, void *map_value);
struct bpf_map *bpf_map_get(u32 ufd);
struct bpf_map *bpf_map_get_with_uref(u32 ufd);
struct bpf_map *__bpf_map_get(struct fd f);
......@@ -1692,7 +1748,8 @@ int btf_struct_access(struct bpf_verifier_log *log, const struct btf *btf,
u32 *next_btf_id, enum bpf_type_flag *flag);
bool btf_struct_ids_match(struct bpf_verifier_log *log,
const struct btf *btf, u32 id, int off,
const struct btf *need_btf, u32 need_type_id);
const struct btf *need_btf, u32 need_type_id,
bool strict);
int btf_distill_func_proto(struct bpf_verifier_log *log,
struct btf *btf,
......
......@@ -523,8 +523,7 @@ int check_ptr_off_reg(struct bpf_verifier_env *env,
const struct bpf_reg_state *reg, int regno);
int check_func_arg_reg_off(struct bpf_verifier_env *env,
const struct bpf_reg_state *reg, int regno,
enum bpf_arg_type arg_type,
bool is_release_func);
enum bpf_arg_type arg_type);
int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
u32 regno);
int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
......
......@@ -17,6 +17,7 @@ enum btf_kfunc_type {
BTF_KFUNC_TYPE_ACQUIRE,
BTF_KFUNC_TYPE_RELEASE,
BTF_KFUNC_TYPE_RET_NULL,
BTF_KFUNC_TYPE_KPTR_ACQUIRE,
BTF_KFUNC_TYPE_MAX,
};
......@@ -35,11 +36,19 @@ struct btf_kfunc_id_set {
struct btf_id_set *acquire_set;
struct btf_id_set *release_set;
struct btf_id_set *ret_null_set;
struct btf_id_set *kptr_acquire_set;
};
struct btf_id_set *sets[BTF_KFUNC_TYPE_MAX];
};
};
struct btf_id_dtor_kfunc {
u32 btf_id;
u32 kfunc_btf_id;
};
typedef void (*btf_dtor_kfunc_t)(void *);
extern const struct file_operations btf_fops;
void btf_get(struct btf *btf);
......@@ -123,6 +132,8 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
u32 expected_offset, u32 expected_size);
int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t);
int btf_find_timer(const struct btf *btf, const struct btf_type *t);
struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf,
const struct btf_type *t);
bool btf_type_is_void(const struct btf_type *t);
s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,
......@@ -344,6 +355,9 @@ bool btf_kfunc_id_set_contains(const struct btf *btf,
enum btf_kfunc_type type, u32 kfunc_btf_id);
int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
const struct btf_kfunc_id_set *s);
s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id);
int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_cnt,
struct module *owner);
#else
static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
u32 type_id)
......@@ -367,6 +381,15 @@ static inline int register_btf_kfunc_id_set(enum bpf_prog_type prog_type,
{
return 0;
}
static inline s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id)
{
return -ENOENT;
}
static inline int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors,
u32 add_cnt, struct module *owner)
{
return 0;
}
#endif
#endif
......@@ -5143,6 +5143,17 @@ union bpf_attr {
* The **hash_algo** is returned on success,
* **-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if
* invalid arguments are passed.
*
* void *bpf_kptr_xchg(void *map_value, void *ptr)
* Description
* Exchange kptr at pointer *map_value* with *ptr*, and return the
* old value. *ptr* can be NULL, otherwise it must be a referenced
* pointer which will be released when this helper is called.
* Return
* The old value of kptr (which can be NULL). The returned pointer
* if not NULL, is a reference which must be released using its
* corresponding release function, or moved into a BPF map before
* program exit.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
......@@ -5339,6 +5350,7 @@ union bpf_attr {
FN(copy_from_user_task), \
FN(skb_set_tstamp), \
FN(ima_file_hash), \
FN(kptr_xchg), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
......
......@@ -287,10 +287,12 @@ static int array_map_get_next_key(struct bpf_map *map, void *key, void *next_key
return 0;
}
static void check_and_free_timer_in_array(struct bpf_array *arr, void *val)
static void check_and_free_fields(struct bpf_array *arr, void *val)
{
if (unlikely(map_value_has_timer(&arr->map)))
if (map_value_has_timer(&arr->map))
bpf_timer_cancel_and_free(val + arr->map.timer_off);
if (map_value_has_kptrs(&arr->map))
bpf_map_free_kptrs(&arr->map, val);
}
/* Called from syscall or from eBPF program */
......@@ -327,7 +329,7 @@ static int array_map_update_elem(struct bpf_map *map, void *key, void *value,
copy_map_value_locked(map, val, value, false);
else
copy_map_value(map, val, value);
check_and_free_timer_in_array(array, val);
check_and_free_fields(array, val);
}
return 0;
}
......@@ -386,7 +388,8 @@ static void array_map_free_timers(struct bpf_map *map)
struct bpf_array *array = container_of(map, struct bpf_array, map);
int i;
if (likely(!map_value_has_timer(map)))
/* We don't reset or free kptr on uref dropping to zero. */
if (!map_value_has_timer(map))
return;
for (i = 0; i < array->map.max_entries; i++)
......@@ -398,6 +401,13 @@ static void array_map_free_timers(struct bpf_map *map)
static void array_map_free(struct bpf_map *map)
{
struct bpf_array *array = container_of(map, struct bpf_array, map);
int i;
if (map_value_has_kptrs(map)) {
for (i = 0; i < array->map.max_entries; i++)
bpf_map_free_kptrs(map, array->value + array->elem_size * i);
bpf_map_free_kptr_off_tab(map);
}
if (array->map.map_type == BPF_MAP_TYPE_PERCPU_ARRAY)
bpf_array_free_percpu(array);
......
......@@ -238,7 +238,7 @@ static void htab_free_prealloced_timers(struct bpf_htab *htab)
u32 num_entries = htab->map.max_entries;
int i;
if (likely(!map_value_has_timer(&htab->map)))
if (!map_value_has_timer(&htab->map))
return;
if (htab_has_extra_elems(htab))
num_entries += num_possible_cpus();
......@@ -254,6 +254,25 @@ static void htab_free_prealloced_timers(struct bpf_htab *htab)
}
}
static void htab_free_prealloced_kptrs(struct bpf_htab *htab)
{
u32 num_entries = htab->map.max_entries;
int i;
if (!map_value_has_kptrs(&htab->map))
return;
if (htab_has_extra_elems(htab))
num_entries += num_possible_cpus();
for (i = 0; i < num_entries; i++) {
struct htab_elem *elem;
elem = get_htab_elem(htab, i);
bpf_map_free_kptrs(&htab->map, elem->key + round_up(htab->map.key_size, 8));
cond_resched();
}
}
static void htab_free_elems(struct bpf_htab *htab)
{
int i;
......@@ -725,12 +744,15 @@ static int htab_lru_map_gen_lookup(struct bpf_map *map,
return insn - insn_buf;
}
static void check_and_free_timer(struct bpf_htab *htab, struct htab_elem *elem)
static void check_and_free_fields(struct bpf_htab *htab,
struct htab_elem *elem)
{
if (unlikely(map_value_has_timer(&htab->map)))
bpf_timer_cancel_and_free(elem->key +
round_up(htab->map.key_size, 8) +
htab->map.timer_off);
void *map_value = elem->key + round_up(htab->map.key_size, 8);
if (map_value_has_timer(&htab->map))
bpf_timer_cancel_and_free(map_value + htab->map.timer_off);
if (map_value_has_kptrs(&htab->map))
bpf_map_free_kptrs(&htab->map, map_value);
}
/* It is called from the bpf_lru_list when the LRU needs to delete
......@@ -757,7 +779,7 @@ static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node)
hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
if (l == tgt_l) {
hlist_nulls_del_rcu(&l->hash_node);
check_and_free_timer(htab, l);
check_and_free_fields(htab, l);
break;
}
......@@ -829,7 +851,7 @@ static void htab_elem_free(struct bpf_htab *htab, struct htab_elem *l)
{
if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH)
free_percpu(htab_elem_get_ptr(l, htab->map.key_size));
check_and_free_timer(htab, l);
check_and_free_fields(htab, l);
kfree(l);
}
......@@ -857,7 +879,7 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
htab_put_fd_value(htab, l);
if (htab_is_prealloc(htab)) {
check_and_free_timer(htab, l);
check_and_free_fields(htab, l);
__pcpu_freelist_push(&htab->freelist, &l->fnode);
} else {
atomic_dec(&htab->count);
......@@ -1104,7 +1126,7 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
if (!htab_is_prealloc(htab))
free_htab_elem(htab, l_old);
else
check_and_free_timer(htab, l_old);
check_and_free_fields(htab, l_old);
}
ret = 0;
err:
......@@ -1114,7 +1136,7 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
static void htab_lru_push_free(struct bpf_htab *htab, struct htab_elem *elem)
{
check_and_free_timer(htab, elem);
check_and_free_fields(htab, elem);
bpf_lru_push_free(&htab->lru, &elem->lru_node);
}
......@@ -1419,8 +1441,14 @@ static void htab_free_malloced_timers(struct bpf_htab *htab)
struct hlist_nulls_node *n;
struct htab_elem *l;
hlist_nulls_for_each_entry(l, n, head, hash_node)
check_and_free_timer(htab, l);
hlist_nulls_for_each_entry(l, n, head, hash_node) {
/* We don't reset or free kptr on uref dropping to zero,
* hence just free timer.
*/
bpf_timer_cancel_and_free(l->key +
round_up(htab->map.key_size, 8) +
htab->map.timer_off);
}
cond_resched_rcu();
}
rcu_read_unlock();
......@@ -1430,7 +1458,8 @@ static void htab_map_free_timers(struct bpf_map *map)
{
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
if (likely(!map_value_has_timer(&htab->map)))
/* We don't reset or free kptr on uref dropping to zero. */
if (!map_value_has_timer(&htab->map))
return;
if (!htab_is_prealloc(htab))
htab_free_malloced_timers(htab);
......@@ -1453,11 +1482,14 @@ static void htab_map_free(struct bpf_map *map)
* not have executed. Wait for them.
*/
rcu_barrier();
if (!htab_is_prealloc(htab))
if (!htab_is_prealloc(htab)) {
delete_all_elements(htab);
else
} else {
htab_free_prealloced_kptrs(htab);
prealloc_destroy(htab);
}
bpf_map_free_kptr_off_tab(map);
free_percpu(htab->extra_elems);
bpf_map_area_free(htab->buckets);
for (i = 0; i < HASHTAB_MAP_LOCK_COUNT; i++)
......
......@@ -1374,6 +1374,28 @@ void bpf_timer_cancel_and_free(void *val)
kfree(t);
}
BPF_CALL_2(bpf_kptr_xchg, void *, map_value, void *, ptr)
{
unsigned long *kptr = map_value;
return xchg(kptr, (unsigned long)ptr);
}
/* Unlike other PTR_TO_BTF_ID helpers the btf_id in bpf_kptr_xchg()
* helper is determined dynamically by the verifier.
*/
#define BPF_PTR_POISON ((void *)((0xeB9FUL << 2) + POISON_POINTER_DELTA))
const struct bpf_func_proto bpf_kptr_xchg_proto = {
.func = bpf_kptr_xchg,
.gpl_only = false,
.ret_type = RET_PTR_TO_BTF_ID_OR_NULL,
.ret_btf_id = BPF_PTR_POISON,
.arg1_type = ARG_PTR_TO_KPTR,
.arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL | OBJ_RELEASE,
.arg2_btf_id = BPF_PTR_POISON,
};
const struct bpf_func_proto bpf_get_current_task_proto __weak;
const struct bpf_func_proto bpf_get_current_task_btf_proto __weak;
const struct bpf_func_proto bpf_probe_read_user_proto __weak;
......@@ -1452,6 +1474,8 @@ bpf_base_func_proto(enum bpf_func_id func_id)
return &bpf_timer_start_proto;
case BPF_FUNC_timer_cancel:
return &bpf_timer_cancel_proto;
case BPF_FUNC_kptr_xchg:
return &bpf_kptr_xchg_proto;
default:
break;
}
......
......@@ -52,6 +52,7 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
inner_map_meta->max_entries = inner_map->max_entries;
inner_map_meta->spin_lock_off = inner_map->spin_lock_off;
inner_map_meta->timer_off = inner_map->timer_off;
inner_map_meta->kptr_off_tab = bpf_map_copy_kptr_off_tab(inner_map);
if (inner_map->btf) {
btf_get(inner_map->btf);
inner_map_meta->btf = inner_map->btf;
......@@ -71,6 +72,7 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
void bpf_map_meta_free(struct bpf_map *map_meta)
{
bpf_map_free_kptr_off_tab(map_meta);
btf_put(map_meta->btf);
kfree(map_meta);
}
......@@ -83,7 +85,8 @@ bool bpf_map_meta_equal(const struct bpf_map *meta0,
meta0->key_size == meta1->key_size &&
meta0->value_size == meta1->value_size &&
meta0->timer_off == meta1->timer_off &&
meta0->map_flags == meta1->map_flags;
meta0->map_flags == meta1->map_flags &&
bpf_map_equal_kptr_off_tab(meta0, meta1);
}
void *bpf_map_fd_get_ptr(struct bpf_map *map,
......
......@@ -404,7 +404,7 @@ BPF_CALL_2(bpf_ringbuf_submit, void *, sample, u64, flags)
const struct bpf_func_proto bpf_ringbuf_submit_proto = {
.func = bpf_ringbuf_submit,
.ret_type = RET_VOID,
.arg1_type = ARG_PTR_TO_ALLOC_MEM,
.arg1_type = ARG_PTR_TO_ALLOC_MEM | OBJ_RELEASE,
.arg2_type = ARG_ANYTHING,
};
......@@ -417,7 +417,7 @@ BPF_CALL_2(bpf_ringbuf_discard, void *, sample, u64, flags)
const struct bpf_func_proto bpf_ringbuf_discard_proto = {
.func = bpf_ringbuf_discard,
.ret_type = RET_VOID,
.arg1_type = ARG_PTR_TO_ALLOC_MEM,
.arg1_type = ARG_PTR_TO_ALLOC_MEM | OBJ_RELEASE,
.arg2_type = ARG_ANYTHING,
};
......
......@@ -6,6 +6,7 @@
#include <linux/bpf_trace.h>
#include <linux/bpf_lirc.h>
#include <linux/bpf_verifier.h>
#include <linux/bsearch.h>
#include <linux/btf.h>
#include <linux/syscalls.h>
#include <linux/slab.h>
......@@ -29,6 +30,7 @@
#include <linux/pgtable.h>
#include <linux/bpf_lsm.h>
#include <linux/poll.h>
#include <linux/sort.h>
#include <linux/bpf-netns.h>
#include <linux/rcupdate_trace.h>
#include <linux/memcontrol.h>
......@@ -473,14 +475,128 @@ static void bpf_map_release_memcg(struct bpf_map *map)
}
#endif
static int bpf_map_kptr_off_cmp(const void *a, const void *b)
{
const struct bpf_map_value_off_desc *off_desc1 = a, *off_desc2 = b;
if (off_desc1->offset < off_desc2->offset)
return -1;
else if (off_desc1->offset > off_desc2->offset)
return 1;
return 0;
}
struct bpf_map_value_off_desc *bpf_map_kptr_off_contains(struct bpf_map *map, u32 offset)
{
/* Since members are iterated in btf_find_field in increasing order,
* offsets appended to kptr_off_tab are in increasing order, so we can
* do bsearch to find exact match.
*/
struct bpf_map_value_off *tab;
if (!map_value_has_kptrs(map))
return NULL;
tab = map->kptr_off_tab;
return bsearch(&offset, tab->off, tab->nr_off, sizeof(tab->off[0]), bpf_map_kptr_off_cmp);
}
void bpf_map_free_kptr_off_tab(struct bpf_map *map)
{
struct bpf_map_value_off *tab = map->kptr_off_tab;
int i;
if (!map_value_has_kptrs(map))
return;
for (i = 0; i < tab->nr_off; i++) {
if (tab->off[i].kptr.module)
module_put(tab->off[i].kptr.module);
btf_put(tab->off[i].kptr.btf);
}
kfree(tab);
map->kptr_off_tab = NULL;
}
struct bpf_map_value_off *bpf_map_copy_kptr_off_tab(const struct bpf_map *map)
{
struct bpf_map_value_off *tab = map->kptr_off_tab, *new_tab;
int size, i;
if (!map_value_has_kptrs(map))
return ERR_PTR(-ENOENT);
size = offsetof(struct bpf_map_value_off, off[tab->nr_off]);
new_tab = kmemdup(tab, size, GFP_KERNEL | __GFP_NOWARN);
if (!new_tab)
return ERR_PTR(-ENOMEM);
/* Do a deep copy of the kptr_off_tab */
for (i = 0; i < tab->nr_off; i++) {
btf_get(tab->off[i].kptr.btf);
if (tab->off[i].kptr.module && !try_module_get(tab->off[i].kptr.module)) {
while (i--) {
if (tab->off[i].kptr.module)
module_put(tab->off[i].kptr.module);
btf_put(tab->off[i].kptr.btf);
}
kfree(new_tab);
return ERR_PTR(-ENXIO);
}
}
return new_tab;
}
bool bpf_map_equal_kptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b)
{
struct bpf_map_value_off *tab_a = map_a->kptr_off_tab, *tab_b = map_b->kptr_off_tab;
bool a_has_kptr = map_value_has_kptrs(map_a), b_has_kptr = map_value_has_kptrs(map_b);
int size;
if (!a_has_kptr && !b_has_kptr)
return true;
if (a_has_kptr != b_has_kptr)
return false;
if (tab_a->nr_off != tab_b->nr_off)
return false;
size = offsetof(struct bpf_map_value_off, off[tab_a->nr_off]);
return !memcmp(tab_a, tab_b, size);
}
/* Caller must ensure map_value_has_kptrs is true. Note that this function can
* be called on a map value while the map_value is visible to BPF programs, as
* it ensures the correct synchronization, and we already enforce the same using
* the bpf_kptr_xchg helper on the BPF program side for referenced kptrs.
*/
void bpf_map_free_kptrs(struct bpf_map *map, void *map_value)
{
struct bpf_map_value_off *tab = map->kptr_off_tab;
unsigned long *btf_id_ptr;
int i;
for (i = 0; i < tab->nr_off; i++) {
struct bpf_map_value_off_desc *off_desc = &tab->off[i];
unsigned long old_ptr;
btf_id_ptr = map_value + off_desc->offset;
if (off_desc->type == BPF_KPTR_UNREF) {
u64 *p = (u64 *)btf_id_ptr;
WRITE_ONCE(*p, 0);
continue;
}
old_ptr = xchg(btf_id_ptr, 0);
off_desc->kptr.dtor((void *)old_ptr);
}
}
/* called from workqueue */
static void bpf_map_free_deferred(struct work_struct *work)
{
struct bpf_map *map = container_of(work, struct bpf_map, work);
security_bpf_map_free(map);
kfree(map->off_arr);
bpf_map_release_memcg(map);
/* implementation dependent freeing */
/* implementation dependent freeing, map_free callback also does
* bpf_map_free_kptr_off_tab, if needed.
*/
map->ops->map_free(map);
}
......@@ -640,7 +756,7 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
int err;
if (!map->ops->map_mmap || map_value_has_spin_lock(map) ||
map_value_has_timer(map))
map_value_has_timer(map) || map_value_has_kptrs(map))
return -ENOTSUPP;
if (!(vma->vm_flags & VM_SHARED))
......@@ -767,6 +883,84 @@ int map_check_no_btf(const struct bpf_map *map,
return -ENOTSUPP;
}
static int map_off_arr_cmp(const void *_a, const void *_b, const void *priv)
{
const u32 a = *(const u32 *)_a;
const u32 b = *(const u32 *)_b;
if (a < b)
return -1;
else if (a > b)
return 1;
return 0;
}
static void map_off_arr_swap(void *_a, void *_b, int size, const void *priv)
{
struct bpf_map *map = (struct bpf_map *)priv;
u32 *off_base = map->off_arr->field_off;
u32 *a = _a, *b = _b;
u8 *sz_a, *sz_b;
sz_a = map->off_arr->field_sz + (a - off_base);
sz_b = map->off_arr->field_sz + (b - off_base);
swap(*a, *b);
swap(*sz_a, *sz_b);
}
static int bpf_map_alloc_off_arr(struct bpf_map *map)
{
bool has_spin_lock = map_value_has_spin_lock(map);
bool has_timer = map_value_has_timer(map);
bool has_kptrs = map_value_has_kptrs(map);
struct bpf_map_off_arr *off_arr;
u32 i;
if (!has_spin_lock && !has_timer && !has_kptrs) {
map->off_arr = NULL;
return 0;
}
off_arr = kmalloc(sizeof(*map->off_arr), GFP_KERNEL | __GFP_NOWARN);
if (!off_arr)
return -ENOMEM;
map->off_arr = off_arr;
off_arr->cnt = 0;
if (has_spin_lock) {
i = off_arr->cnt;
off_arr->field_off[i] = map->spin_lock_off;
off_arr->field_sz[i] = sizeof(struct bpf_spin_lock);
off_arr->cnt++;
}
if (has_timer) {
i = off_arr->cnt;
off_arr->field_off[i] = map->timer_off;
off_arr->field_sz[i] = sizeof(struct bpf_timer);
off_arr->cnt++;
}
if (has_kptrs) {
struct bpf_map_value_off *tab = map->kptr_off_tab;
u32 *off = &off_arr->field_off[off_arr->cnt];
u8 *sz = &off_arr->field_sz[off_arr->cnt];
for (i = 0; i < tab->nr_off; i++) {
*off++ = tab->off[i].offset;
*sz++ = sizeof(u64);
}
off_arr->cnt += tab->nr_off;
}
if (off_arr->cnt == 1)
return 0;
sort_r(off_arr->field_off, off_arr->cnt, sizeof(off_arr->field_off[0]),
map_off_arr_cmp, map_off_arr_swap, map);
return 0;
}
static int map_check_btf(struct bpf_map *map, const struct btf *btf,
u32 btf_key_id, u32 btf_value_id)
{
......@@ -820,9 +1014,33 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
return -EOPNOTSUPP;
}
if (map->ops->map_check_btf)
map->kptr_off_tab = btf_parse_kptrs(btf, value_type);
if (map_value_has_kptrs(map)) {
if (!bpf_capable()) {
ret = -EPERM;
goto free_map_tab;
}
if (map->map_flags & (BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG)) {
ret = -EACCES;
goto free_map_tab;
}
if (map->map_type != BPF_MAP_TYPE_HASH &&
map->map_type != BPF_MAP_TYPE_LRU_HASH &&
map->map_type != BPF_MAP_TYPE_ARRAY) {
ret = -EOPNOTSUPP;
goto free_map_tab;
}
}
if (map->ops->map_check_btf) {
ret = map->ops->map_check_btf(map, btf, key_type, value_type);
if (ret < 0)
goto free_map_tab;
}
return ret;
free_map_tab:
bpf_map_free_kptr_off_tab(map);
return ret;
}
......@@ -912,10 +1130,14 @@ static int map_create(union bpf_attr *attr)
attr->btf_vmlinux_value_type_id;
}
err = security_bpf_map_alloc(map);
err = bpf_map_alloc_off_arr(map);
if (err)
goto free_map;
err = security_bpf_map_alloc(map);
if (err)
goto free_map_off_arr;
err = bpf_map_alloc_id(map);
if (err)
goto free_map_sec;
......@@ -938,6 +1160,8 @@ static int map_create(union bpf_attr *attr)
free_map_sec:
security_bpf_map_free(map);
free_map_off_arr:
kfree(map->off_arr);
free_map:
btf_put(map->btf);
map->ops->map_free(map);
......@@ -1639,7 +1863,7 @@ static int map_freeze(const union bpf_attr *attr)
return PTR_ERR(map);
if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS ||
map_value_has_timer(map)) {
map_value_has_timer(map) || map_value_has_kptrs(map)) {
fdput(f);
return -ENOTSUPP;
}
......
......@@ -550,8 +550,13 @@ struct sock * noinline bpf_kfunc_call_test3(struct sock *sk)
return sk;
}
struct prog_test_member1 {
int a;
};
struct prog_test_member {
u64 c;
struct prog_test_member1 m;
int c;
};
struct prog_test_ref_kfunc {
......@@ -576,6 +581,12 @@ bpf_kfunc_call_test_acquire(unsigned long *scalar_ptr)
return &prog_test_struct;
}
noinline struct prog_test_member *
bpf_kfunc_call_memb_acquire(void)
{
return &prog_test_struct.memb;
}
noinline void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p)
{
}
......@@ -584,6 +595,16 @@ noinline void bpf_kfunc_call_memb_release(struct prog_test_member *p)
{
}
noinline void bpf_kfunc_call_memb1_release(struct prog_test_member1 *p)
{
}
noinline struct prog_test_ref_kfunc *
bpf_kfunc_call_test_kptr_get(struct prog_test_ref_kfunc **p, int a, int b)
{
return &prog_test_struct;
}
struct prog_test_pass1 {
int x0;
struct {
......@@ -667,8 +688,11 @@ BTF_ID(func, bpf_kfunc_call_test1)
BTF_ID(func, bpf_kfunc_call_test2)
BTF_ID(func, bpf_kfunc_call_test3)
BTF_ID(func, bpf_kfunc_call_test_acquire)
BTF_ID(func, bpf_kfunc_call_memb_acquire)
BTF_ID(func, bpf_kfunc_call_test_release)
BTF_ID(func, bpf_kfunc_call_memb_release)
BTF_ID(func, bpf_kfunc_call_memb1_release)
BTF_ID(func, bpf_kfunc_call_test_kptr_get)
BTF_ID(func, bpf_kfunc_call_test_pass_ctx)
BTF_ID(func, bpf_kfunc_call_test_pass1)
BTF_ID(func, bpf_kfunc_call_test_pass2)
......@@ -682,17 +706,26 @@ BTF_SET_END(test_sk_check_kfunc_ids)
BTF_SET_START(test_sk_acquire_kfunc_ids)
BTF_ID(func, bpf_kfunc_call_test_acquire)
BTF_ID(func, bpf_kfunc_call_memb_acquire)
BTF_ID(func, bpf_kfunc_call_test_kptr_get)
BTF_SET_END(test_sk_acquire_kfunc_ids)
BTF_SET_START(test_sk_release_kfunc_ids)
BTF_ID(func, bpf_kfunc_call_test_release)
BTF_ID(func, bpf_kfunc_call_memb_release)
BTF_ID(func, bpf_kfunc_call_memb1_release)
BTF_SET_END(test_sk_release_kfunc_ids)
BTF_SET_START(test_sk_ret_null_kfunc_ids)
BTF_ID(func, bpf_kfunc_call_test_acquire)
BTF_ID(func, bpf_kfunc_call_memb_acquire)
BTF_ID(func, bpf_kfunc_call_test_kptr_get)
BTF_SET_END(test_sk_ret_null_kfunc_ids)
BTF_SET_START(test_sk_kptr_acquire_kfunc_ids)
BTF_ID(func, bpf_kfunc_call_test_kptr_get)
BTF_SET_END(test_sk_kptr_acquire_kfunc_ids)
static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size,
u32 size, u32 headroom, u32 tailroom)
{
......@@ -1579,14 +1612,36 @@ int bpf_prog_test_run_syscall(struct bpf_prog *prog,
static const struct btf_kfunc_id_set bpf_prog_test_kfunc_set = {
.owner = THIS_MODULE,
.check_set = &test_sk_check_kfunc_ids,
.acquire_set = &test_sk_acquire_kfunc_ids,
.release_set = &test_sk_release_kfunc_ids,
.ret_null_set = &test_sk_ret_null_kfunc_ids,
.check_set = &test_sk_check_kfunc_ids,
.acquire_set = &test_sk_acquire_kfunc_ids,
.release_set = &test_sk_release_kfunc_ids,
.ret_null_set = &test_sk_ret_null_kfunc_ids,
.kptr_acquire_set = &test_sk_kptr_acquire_kfunc_ids
};
BTF_ID_LIST(bpf_prog_test_dtor_kfunc_ids)
BTF_ID(struct, prog_test_ref_kfunc)
BTF_ID(func, bpf_kfunc_call_test_release)
BTF_ID(struct, prog_test_member)
BTF_ID(func, bpf_kfunc_call_memb_release)
static int __init bpf_prog_test_run_init(void)
{
return register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_prog_test_kfunc_set);
const struct btf_id_dtor_kfunc bpf_prog_test_dtor_kfunc[] = {
{
.btf_id = bpf_prog_test_dtor_kfunc_ids[0],
.kfunc_btf_id = bpf_prog_test_dtor_kfunc_ids[1]
},
{
.btf_id = bpf_prog_test_dtor_kfunc_ids[2],
.kfunc_btf_id = bpf_prog_test_dtor_kfunc_ids[3],
},
};
int ret;
ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_prog_test_kfunc_set);
return ret ?: register_btf_id_dtor_kfuncs(bpf_prog_test_dtor_kfunc,
ARRAY_SIZE(bpf_prog_test_dtor_kfunc),
THIS_MODULE);
}
late_initcall(bpf_prog_test_run_init);
......@@ -6621,7 +6621,7 @@ static const struct bpf_func_proto bpf_sk_release_proto = {
.func = bpf_sk_release,
.gpl_only = false,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_BTF_ID_SOCK_COMMON,
.arg1_type = ARG_PTR_TO_BTF_ID_SOCK_COMMON | OBJ_RELEASE,
};
BPF_CALL_5(bpf_xdp_sk_lookup_udp, struct xdp_buff *, ctx,
......
......@@ -5143,6 +5143,17 @@ union bpf_attr {
* The **hash_algo** is returned on success,
* **-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if
* invalid arguments are passed.
*
* void *bpf_kptr_xchg(void *map_value, void *ptr)
* Description
* Exchange kptr at pointer *map_value* with *ptr*, and return the
* old value. *ptr* can be NULL, otherwise it must be a referenced
* pointer which will be released when this helper is called.
* Return
* The old value of kptr (which can be NULL). The returned pointer
* if not NULL, is a reference which must be released using its
* corresponding release function, or moved into a BPF map before
* program exit.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
......@@ -5339,6 +5350,7 @@ union bpf_attr {
FN(copy_from_user_task), \
FN(skb_set_tstamp), \
FN(ima_file_hash), \
FN(kptr_xchg), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
......
......@@ -149,6 +149,13 @@ enum libbpf_tristate {
#define __kconfig __attribute__((section(".kconfig")))
#define __ksym __attribute__((section(".ksyms")))
#if __has_attribute(btf_type_tag)
#define __kptr __attribute__((btf_type_tag("kptr")))
#define __kptr_ref __attribute__((btf_type_tag("kptr_ref")))
#else
#define __kptr
#define __kptr_ref
#endif
#ifndef ___bpf_concat
#define ___bpf_concat(a, b) a ## b
......
// SPDX-License-Identifier: GPL-2.0
#include <test_progs.h>
#include "map_kptr.skel.h"
void test_map_kptr(void)
{
struct map_kptr *skel;
int key = 0, ret;
char buf[24];
skel = map_kptr__open_and_load();
if (!ASSERT_OK_PTR(skel, "map_kptr__open_and_load"))
return;
ret = bpf_map_update_elem(bpf_map__fd(skel->maps.array_map), &key, buf, 0);
ASSERT_OK(ret, "array_map update");
ret = bpf_map_update_elem(bpf_map__fd(skel->maps.array_map), &key, buf, 0);
ASSERT_OK(ret, "array_map update2");
ret = bpf_map_update_elem(bpf_map__fd(skel->maps.hash_map), &key, buf, 0);
ASSERT_OK(ret, "hash_map update");
ret = bpf_map_delete_elem(bpf_map__fd(skel->maps.hash_map), &key);
ASSERT_OK(ret, "hash_map delete");
ret = bpf_map_update_elem(bpf_map__fd(skel->maps.hash_malloc_map), &key, buf, 0);
ASSERT_OK(ret, "hash_malloc_map update");
ret = bpf_map_delete_elem(bpf_map__fd(skel->maps.hash_malloc_map), &key);
ASSERT_OK(ret, "hash_malloc_map delete");
ret = bpf_map_update_elem(bpf_map__fd(skel->maps.lru_hash_map), &key, buf, 0);
ASSERT_OK(ret, "lru_hash_map update");
ret = bpf_map_delete_elem(bpf_map__fd(skel->maps.lru_hash_map), &key);
ASSERT_OK(ret, "lru_hash_map delete");
map_kptr__destroy(skel);
}
// SPDX-License-Identifier: GPL-2.0
#include <vmlinux.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_helpers.h>
struct map_value {
struct prog_test_ref_kfunc __kptr *unref_ptr;
struct prog_test_ref_kfunc __kptr_ref *ref_ptr;
};
struct array_map {
__uint(type, BPF_MAP_TYPE_ARRAY);
__type(key, int);
__type(value, struct map_value);
__uint(max_entries, 1);
} array_map SEC(".maps");
struct hash_map {
__uint(type, BPF_MAP_TYPE_HASH);
__type(key, int);
__type(value, struct map_value);
__uint(max_entries, 1);
} hash_map SEC(".maps");
struct hash_malloc_map {
__uint(type, BPF_MAP_TYPE_HASH);
__type(key, int);
__type(value, struct map_value);
__uint(max_entries, 1);
__uint(map_flags, BPF_F_NO_PREALLOC);
} hash_malloc_map SEC(".maps");
struct lru_hash_map {
__uint(type, BPF_MAP_TYPE_LRU_HASH);
__type(key, int);
__type(value, struct map_value);
__uint(max_entries, 1);
} lru_hash_map SEC(".maps");
#define DEFINE_MAP_OF_MAP(map_type, inner_map_type, name) \
struct { \
__uint(type, map_type); \
__uint(max_entries, 1); \
__uint(key_size, sizeof(int)); \
__uint(value_size, sizeof(int)); \
__array(values, struct inner_map_type); \
} name SEC(".maps") = { \
.values = { [0] = &inner_map_type }, \
}
DEFINE_MAP_OF_MAP(BPF_MAP_TYPE_ARRAY_OF_MAPS, array_map, array_of_array_maps);
DEFINE_MAP_OF_MAP(BPF_MAP_TYPE_ARRAY_OF_MAPS, hash_map, array_of_hash_maps);
DEFINE_MAP_OF_MAP(BPF_MAP_TYPE_ARRAY_OF_MAPS, hash_malloc_map, array_of_hash_malloc_maps);
DEFINE_MAP_OF_MAP(BPF_MAP_TYPE_ARRAY_OF_MAPS, lru_hash_map, array_of_lru_hash_maps);
DEFINE_MAP_OF_MAP(BPF_MAP_TYPE_HASH_OF_MAPS, array_map, hash_of_array_maps);
DEFINE_MAP_OF_MAP(BPF_MAP_TYPE_HASH_OF_MAPS, hash_map, hash_of_hash_maps);
DEFINE_MAP_OF_MAP(BPF_MAP_TYPE_HASH_OF_MAPS, hash_malloc_map, hash_of_hash_malloc_maps);
DEFINE_MAP_OF_MAP(BPF_MAP_TYPE_HASH_OF_MAPS, lru_hash_map, hash_of_lru_hash_maps);
extern struct prog_test_ref_kfunc *bpf_kfunc_call_test_acquire(unsigned long *sp) __ksym;
extern struct prog_test_ref_kfunc *
bpf_kfunc_call_test_kptr_get(struct prog_test_ref_kfunc **p, int a, int b) __ksym;
extern void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) __ksym;
static void test_kptr_unref(struct map_value *v)
{
struct prog_test_ref_kfunc *p;
p = v->unref_ptr;
/* store untrusted_ptr_or_null_ */
v->unref_ptr = p;
if (!p)
return;
if (p->a + p->b > 100)
return;
/* store untrusted_ptr_ */
v->unref_ptr = p;
/* store NULL */
v->unref_ptr = NULL;
}
static void test_kptr_ref(struct map_value *v)
{
struct prog_test_ref_kfunc *p;
p = v->ref_ptr;
/* store ptr_or_null_ */
v->unref_ptr = p;
if (!p)
return;
if (p->a + p->b > 100)
return;
/* store NULL */
p = bpf_kptr_xchg(&v->ref_ptr, NULL);
if (!p)
return;
if (p->a + p->b > 100) {
bpf_kfunc_call_test_release(p);
return;
}
/* store ptr_ */
v->unref_ptr = p;
bpf_kfunc_call_test_release(p);
p = bpf_kfunc_call_test_acquire(&(unsigned long){0});
if (!p)
return;
/* store ptr_ */
p = bpf_kptr_xchg(&v->ref_ptr, p);
if (!p)
return;
if (p->a + p->b > 100) {
bpf_kfunc_call_test_release(p);
return;
}
bpf_kfunc_call_test_release(p);
}
static void test_kptr_get(struct map_value *v)
{
struct prog_test_ref_kfunc *p;
p = bpf_kfunc_call_test_kptr_get(&v->ref_ptr, 0, 0);
if (!p)
return;
if (p->a + p->b > 100) {
bpf_kfunc_call_test_release(p);
return;
}
bpf_kfunc_call_test_release(p);
}
static void test_kptr(struct map_value *v)
{
test_kptr_unref(v);
test_kptr_ref(v);
test_kptr_get(v);
}
SEC("tc")
int test_map_kptr(struct __sk_buff *ctx)
{
struct map_value *v;
int i, key = 0;
#define TEST(map) \
v = bpf_map_lookup_elem(&map, &key); \
if (!v) \
return 0; \
test_kptr(v)
TEST(array_map);
TEST(hash_map);
TEST(hash_malloc_map);
TEST(lru_hash_map);
#undef TEST
return 0;
}
SEC("tc")
int test_map_in_map_kptr(struct __sk_buff *ctx)
{
struct map_value *v;
int i, key = 0;
void *map;
#define TEST(map_in_map) \
map = bpf_map_lookup_elem(&map_in_map, &key); \
if (!map) \
return 0; \
v = bpf_map_lookup_elem(map, &key); \
if (!v) \
return 0; \
test_kptr(v)
TEST(array_of_array_maps);
TEST(array_of_hash_maps);
TEST(array_of_hash_malloc_maps);
TEST(array_of_lru_hash_maps);
TEST(hash_of_array_maps);
TEST(hash_of_hash_maps);
TEST(hash_of_hash_malloc_maps);
TEST(hash_of_lru_hash_maps);
#undef TEST
return 0;
}
char _license[] SEC("license") = "GPL";
......@@ -53,7 +53,7 @@
#define MAX_INSNS BPF_MAXINSNS
#define MAX_TEST_INSNS 1000000
#define MAX_FIXUPS 8
#define MAX_NR_MAPS 22
#define MAX_NR_MAPS 23
#define MAX_TEST_RUNS 8
#define POINTER_VALUE 0xcafe4all
#define TEST_DATA_LEN 64
......@@ -101,6 +101,7 @@ struct bpf_test {
int fixup_map_reuseport_array[MAX_FIXUPS];
int fixup_map_ringbuf[MAX_FIXUPS];
int fixup_map_timer[MAX_FIXUPS];
int fixup_map_kptr[MAX_FIXUPS];
struct kfunc_btf_id_pair fixup_kfunc_btf_id[MAX_FIXUPS];
/* Expected verifier log output for result REJECT or VERBOSE_ACCEPT.
* Can be a tab-separated sequence of expected strings. An empty string
......@@ -621,8 +622,15 @@ static int create_cgroup_storage(bool percpu)
* struct timer {
* struct bpf_timer t;
* };
* struct btf_ptr {
* struct prog_test_ref_kfunc __kptr *ptr;
* struct prog_test_ref_kfunc __kptr_ref *ptr;
* struct prog_test_member __kptr_ref *ptr;
* }
*/
static const char btf_str_sec[] = "\0bpf_spin_lock\0val\0cnt\0l\0bpf_timer\0timer\0t";
static const char btf_str_sec[] = "\0bpf_spin_lock\0val\0cnt\0l\0bpf_timer\0timer\0t"
"\0btf_ptr\0prog_test_ref_kfunc\0ptr\0kptr\0kptr_ref"
"\0prog_test_member";
static __u32 btf_raw_types[] = {
/* int */
BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
......@@ -638,6 +646,22 @@ static __u32 btf_raw_types[] = {
/* struct timer */ /* [5] */
BTF_TYPE_ENC(35, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 1), 16),
BTF_MEMBER_ENC(41, 4, 0), /* struct bpf_timer t; */
/* struct prog_test_ref_kfunc */ /* [6] */
BTF_STRUCT_ENC(51, 0, 0),
BTF_STRUCT_ENC(89, 0, 0), /* [7] */
/* type tag "kptr" */
BTF_TYPE_TAG_ENC(75, 6), /* [8] */
/* type tag "kptr_ref" */
BTF_TYPE_TAG_ENC(80, 6), /* [9] */
BTF_TYPE_TAG_ENC(80, 7), /* [10] */
BTF_PTR_ENC(8), /* [11] */
BTF_PTR_ENC(9), /* [12] */
BTF_PTR_ENC(10), /* [13] */
/* struct btf_ptr */ /* [14] */
BTF_STRUCT_ENC(43, 3, 24),
BTF_MEMBER_ENC(71, 11, 0), /* struct prog_test_ref_kfunc __kptr *ptr; */
BTF_MEMBER_ENC(71, 12, 64), /* struct prog_test_ref_kfunc __kptr_ref *ptr; */
BTF_MEMBER_ENC(71, 13, 128), /* struct prog_test_member __kptr_ref *ptr; */
};
static int load_btf(void)
......@@ -727,6 +751,25 @@ static int create_map_timer(void)
return fd;
}
static int create_map_kptr(void)
{
LIBBPF_OPTS(bpf_map_create_opts, opts,
.btf_key_type_id = 1,
.btf_value_type_id = 14,
);
int fd, btf_fd;
btf_fd = load_btf();
if (btf_fd < 0)
return -1;
opts.btf_fd = btf_fd;
fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, "test_map", 4, 24, 1, &opts);
if (fd < 0)
printf("Failed to create map with btf_id pointer\n");
return fd;
}
static char bpf_vlog[UINT_MAX >> 8];
static void do_test_fixup(struct bpf_test *test, enum bpf_prog_type prog_type,
......@@ -754,6 +797,7 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_prog_type prog_type,
int *fixup_map_reuseport_array = test->fixup_map_reuseport_array;
int *fixup_map_ringbuf = test->fixup_map_ringbuf;
int *fixup_map_timer = test->fixup_map_timer;
int *fixup_map_kptr = test->fixup_map_kptr;
struct kfunc_btf_id_pair *fixup_kfunc_btf_id = test->fixup_kfunc_btf_id;
if (test->fill_helper) {
......@@ -947,6 +991,13 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_prog_type prog_type,
fixup_map_timer++;
} while (*fixup_map_timer);
}
if (*fixup_map_kptr) {
map_fds[22] = create_map_kptr();
do {
prog[*fixup_map_kptr].imm = map_fds[22];
fixup_map_kptr++;
} while (*fixup_map_kptr);
}
/* Patch in kfunc BTF IDs */
if (fixup_kfunc_btf_id->kfunc) {
......
......@@ -138,6 +138,26 @@
{ "bpf_kfunc_call_memb_release", 8 },
},
},
{
"calls: invalid kfunc call: don't match first member type when passed to release kfunc",
.insns = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
BPF_EXIT_INSN(),
BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, BPF_PSEUDO_KFUNC_CALL, 0, 0),
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
.result = REJECT,
.errstr = "kernel function bpf_kfunc_call_memb1_release args#0 expected pointer",
.fixup_kfunc_btf_id = {
{ "bpf_kfunc_call_memb_acquire", 1 },
{ "bpf_kfunc_call_memb1_release", 5 },
},
},
{
"calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset",
.insns = {
......
......@@ -796,7 +796,7 @@
},
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
.result = REJECT,
.errstr = "reference has not been acquired before",
.errstr = "R1 must be referenced when passed to release function",
},
{
/* !bpf_sk_fullsock(sk) is checked but !bpf_tcp_sock(sk) is not checked */
......
......@@ -417,7 +417,7 @@
},
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
.result = REJECT,
.errstr = "reference has not been acquired before",
.errstr = "R1 must be referenced when passed to release function",
},
{
"bpf_sk_release(bpf_sk_fullsock(skb->sk))",
......@@ -436,7 +436,7 @@
},
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
.result = REJECT,
.errstr = "reference has not been acquired before",
.errstr = "R1 must be referenced when passed to release function",
},
{
"bpf_sk_release(bpf_tcp_sock(skb->sk))",
......@@ -455,7 +455,7 @@
},
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
.result = REJECT,
.errstr = "reference has not been acquired before",
.errstr = "R1 must be referenced when passed to release function",
},
{
"sk_storage_get(map, skb->sk, NULL, 0): value == NULL",
......