    bpf: Allow storing unreferenced kptr in map · 61df10c7
    Kumar Kartikeya Dwivedi authored
    This commit introduces a new pointer type 'kptr' which can be embedded
    in a map value to hold a PTR_TO_BTF_ID stored by a BPF program during
    its invocation. When storing such a kptr, BPF program's PTR_TO_BTF_ID
    register must have the same type as in the map value's BTF, and loading
    a kptr marks the destination register as PTR_TO_BTF_ID with the correct
    kernel BTF and BTF ID.
    
    Such kptr are unreferenced, i.e. by the time another invocation of the
    BPF program loads this pointer, the object which the pointer points to
    may no longer exist. Since PTR_TO_BTF_ID loads (using BPF_LDX) are
    patched to PROBE_MEM loads by the verifier, it would be safe to allow
    the user to still access such an invalid pointer, but passing such
    pointers into BPF helpers and kfuncs should not be permitted. A future
    patch in this series will close this gap.
    
    The flexibility offered by allowing programs to dereference such invalid
    pointers while being safe at runtime frees the verifier from doing
    complex lifetime tracking. As long as the user can ensure that the
    object remains valid, the data it reads from the kernel object is
    also valid.
    
    The user indicates that a certain pointer must be treated as a kptr
    capable of accepting stores of PTR_TO_BTF_ID of a certain type by
    using a BTF type tag 'kptr' on the pointed-to type of the pointer.
    This information is then recorded in the object BTF, which is passed
    into the kernel by way of the map's BTF information. The name and
    kind from the map value BTF are used to look up the in-kernel type,
    and the actual BTF and BTF ID are recorded in the map struct in a new
    kptr_off_tab member. For now, only storing pointers to structs is
    permitted.
    
    An example of this specification is shown below:
    
    	#define __kptr __attribute__((btf_type_tag("kptr")))
    
    	struct map_value {
    		...
    		struct task_struct __kptr *task;
    		...
    	};
    
    Then, in a BPF program, the user may store a PTR_TO_BTF_ID with the
    type task_struct into the map, and then load it later.
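
    A host-side sketch of that store/load pattern, for illustration only.
    It is plain C rather than a loadable BPF program: struct task_struct
    is a one-field stand-in (not the kernel type), and the counter field
    and the helpers stash_task(), clear_task() and stashed_pid() are
    hypothetical names. The __kptr tag is applied only where the compiler
    knows the attribute.

```c
#include <assert.h>
#include <stddef.h>

/* Use the commit's tag where the compiler supports it; expand to
 * nothing elsewhere so this host-side sketch still builds. */
#if defined(__has_attribute)
#if __has_attribute(btf_type_tag)
#define __kptr __attribute__((btf_type_tag("kptr")))
#endif
#endif
#ifndef __kptr
#define __kptr
#endif

/* Stand-in for illustration, NOT the kernel's struct task_struct. */
struct task_struct {
	int pid;
};

struct map_value {
	int counter;				/* hypothetical ordinary field */
	struct task_struct __kptr *task;	/* unreferenced kptr slot */
};

/* Store: in a BPF program this is an 8-byte BPF_STX whose source
 * register is a PTR_TO_BTF_ID of type task_struct. */
static void stash_task(struct map_value *v, struct task_struct *t)
{
	assert(v != NULL);
	v->task = t;
}

/* Clear: in a BPF program this is a BPF_ST with imm 0. */
static void clear_task(struct map_value *v)
{
	assert(v != NULL);
	v->task = NULL;
}

/* Load: the loaded pointer may be NULL, so it is checked before use.
 * Returns the pid, or -1 when the slot is empty. */
static int stashed_pid(const struct map_value *v)
{
	struct task_struct *t;

	assert(v != NULL);
	t = v->task;
	if (!t)
		return -1;
	return t->pid;
}
```

    In a real program the final t->pid read would be a PROBE_MEM load, so
    it stays safe even if the task has already been freed. The host-side
    sketch has no such protection.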
    
    Note that the destination register is marked PTR_TO_BTF_ID_OR_NULL:
    the verifier cannot know statically whether the value is NULL, so it
    must treat every load at that map value offset as loading a possibly
    NULL pointer.
    
    Only BPF_LDX, BPF_STX, and BPF_ST (with insn->imm = 0 to denote NULL)
    instructions are allowed to access such a pointer. On BPF_LDX, the
    destination register is updated to be a possibly NULL PTR_TO_BTF_ID,
    and on BPF_STX, the source register is checked to be a PTR_TO_BTF_ID
    with the same BTF type as specified in the map BTF. The access size
    must always be BPF_DW.
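
    The rules above can be modelled as a small predicate. This is a
    simplified user-space sketch, not the verifier code from this patch:
    enum insn_kind, struct kptr_access and kptr_access_ok() are
    hypothetical names, and the real check handles more cases than the
    three listed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the instruction classes named above. */
enum insn_kind { INSN_LDX, INSN_STX, INSN_ST, INSN_OTHER };

struct kptr_access {
	enum insn_kind kind;
	int size;		/* access width in bytes */
	int imm;		/* immediate operand, used by INSN_ST */
	bool src_is_btf_ptr;	/* INSN_STX: source is a PTR_TO_BTF_ID */
	int src_btf_id;		/* INSN_STX: BTF ID of the source pointer */
};

/* Returns true if the access to a kptr slot holding pointers of type
 * slot_btf_id follows the rules described in the commit message. */
static bool kptr_access_ok(const struct kptr_access *a, int slot_btf_id)
{
	assert(a != NULL);

	if (a->size != 8)	/* BPF_DW: the full 8-byte pointer only */
		return false;

	switch (a->kind) {
	case INSN_LDX:		/* load: always allowed at full width */
		return true;
	case INSN_STX:		/* store: pointer type must match the slot */
		return a->src_is_btf_ptr && a->src_btf_id == slot_btf_id;
	case INSN_ST:		/* immediate store: only 0 (NULL) */
		return a->imm == 0;
	default:
		return false;
	}
}
```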
    
    For map-in-map support, the kptr_off_tab for the outer map is copied
    from the inner map's kptr_off_tab. A deep copy was chosen over adding
    a refcount to kptr_off_tab, because the copy only needs to be done
    when parameterizing using inner_map_fd in the map-in-map case, and a
    refcount would be unnecessary for all other users.
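
    A user-space sketch of the deep-copy choice, assuming a flat table
    with a flexible array member. The struct layout and the name
    kptr_off_tab_copy() are hypothetical and only model the idea; the
    kernel structures differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified layout for this sketch only. */
struct kptr_off_desc {
	uint32_t offset;	/* byte offset of the kptr in the map value */
	uint32_t btf_id;	/* BTF ID of the pointed-to kernel struct */
};

struct kptr_off_tab {
	uint32_t nr_off;
	struct kptr_off_desc off[];	/* nr_off entries follow */
};

/* Returns a private copy of src that the caller releases with free(),
 * or NULL if the allocation fails. No state is shared with src, so no
 * reference count is needed. */
static struct kptr_off_tab *kptr_off_tab_copy(const struct kptr_off_tab *src)
{
	struct kptr_off_tab *dst;
	size_t size;

	assert(src != NULL);
	size = sizeof(*src) + (size_t)src->nr_off * sizeof(src->off[0]);
	dst = malloc(size);
	if (dst)
		memcpy(dst, src, size);
	return dst;
}
```

    Since the copy owns its memory, the source table can be modified or
    freed without affecting it, at the cost of one extra allocation in
    the map-in-map path.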
    
    It is not permitted to use the MAP_FREEZE command or mmap on a BPF
    map that has kptrs, similar to the bpf_timer case. A kptr also
    requires that the BPF program has both read and write access to the
    map (hence both BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG are
    disallowed).
    
    Note that check_map_access must be called from both
    check_helper_mem_access and for the BPF instructions, hence the kptr
    check must distinguish between ACCESS_DIRECT and ACCESS_HELPER, and
    reject ACCESS_HELPER cases. We rename stack_access_src to bpf_access_src
    and reuse it for this purpose.
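
    A simplified model of that distinction, restricted to constant
    offsets. The enum names follow the commit text, with values assumed
    for this sketch; KPTR_SIZE and map_access_ok() are hypothetical and
    this is not the check_map_access code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Names as in the commit message; the values are an assumption. */
enum bpf_access_src {
	ACCESS_DIRECT = 1,	/* access performed by a BPF instruction */
	ACCESS_HELPER = 2,	/* access performed by a helper */
};

#define KPTR_SIZE 8u		/* a kptr slot holds one 8-byte pointer */

/* kptr_off lists the byte offsets of the kptr slots in the map value.
 * Returns true if the access [off, off + size) is acceptable. */
static bool map_access_ok(const uint32_t *kptr_off, uint32_t nr_kptr,
			  uint32_t off, uint32_t size,
			  enum bpf_access_src src)
{
	uint32_t i;

	assert(kptr_off != NULL || nr_kptr == 0);

	for (i = 0; i < nr_kptr; i++) {
		uint32_t p = kptr_off[i];

		/* Skip slots that the access range does not touch. */
		if (off >= p + KPTR_SIZE || p >= off + size)
			continue;
		/* Helpers may never read or write a kptr slot. */
		if (src != ACCESS_DIRECT)
			return false;
		/* A direct access must cover exactly the whole slot. */
		if (off != p || size != KPTR_SIZE)
			return false;
	}
	return true;
}
```

    With a single kptr at offset 8, a direct 8-byte access at offset 8
    passes, while any helper access overlapping bytes 8..15 is rejected.
    Helper accesses that stay clear of the slot are unaffected.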
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20220424214901.2743946-2-memxor@gmail.com