• Alexei Starovoitov's avatar
    bpf: introduce bpf_spin_lock · d83525ca
    Alexei Starovoitov authored
    Introduce 'struct bpf_spin_lock' and bpf_spin_lock/unlock() helpers to let
    bpf program serialize access to other variables.
    
    Example:
    struct hash_elem {
        int cnt;
        struct bpf_spin_lock lock;
    };
    struct hash_elem * val = bpf_map_lookup_elem(&hash_map, &key);
    if (val) {
        bpf_spin_lock(&val->lock);
        val->cnt++;
        bpf_spin_unlock(&val->lock);
    }
    
    Restrictions and safety checks:
    - bpf_spin_lock is only allowed inside HASH and ARRAY maps.
    - BTF description of the map is mandatory for safety analysis.
    - bpf program can take one bpf_spin_lock at a time, since two or more can
      cause dead locks.
    - only one 'struct bpf_spin_lock' is allowed per map element.
      It drastically simplifies implementation yet allows bpf program to use
      any number of bpf_spin_locks.
    - when bpf_spin_lock is taken the calls (either bpf2bpf or helpers) are not allowed.
    - bpf program must bpf_spin_unlock() before return.
    - bpf program can access 'struct bpf_spin_lock' only via
      bpf_spin_lock()/bpf_spin_unlock() helpers.
    - load/store into 'struct bpf_spin_lock lock;' field is not allowed.
    - to use bpf_spin_lock() helper the BTF description of map value must be
      a struct and have 'struct bpf_spin_lock anyname;' field at the top level.
      Nested lock inside another struct is not allowed.
    - syscall map_lookup doesn't copy bpf_spin_lock field to user space.
    - syscall map_update and program map_update do not update bpf_spin_lock field.
    - bpf_spin_lock cannot be on the stack or inside networking packet.
      bpf_spin_lock can only be inside HASH or ARRAY map value.
    - bpf_spin_lock is available to root only and to all program types.
    - bpf_spin_lock is not allowed in inner maps of map-in-map.
    - ld_abs is not allowed inside spin_lock-ed region.
    - tracing progs and socket filter progs cannot use bpf_spin_lock due to
      insufficient preemption checks
    
    Implementation details:
    - cgroup-bpf class of programs can nest with xdp/tc programs.
      Hence bpf_spin_lock is equivalent to spin_lock_irqsave.
      Other solutions to avoid nested bpf_spin_lock are possible.
      Like making sure that all networking progs run with softirq disabled.
      spin_lock_irqsave is the simplest and doesn't add overhead to the
      programs that don't use it.
    - arch_spinlock_t is used when its implemented as queued_spin_lock
    - archs can force their own arch_spinlock_t
    - on architectures where queued_spin_lock is not available and
      sizeof(arch_spinlock_t) != sizeof(__u32) trivial lock is used.
    - presence of bpf_spin_lock inside map value could have been indicated via
      extra flag during map_create, but specifying it via BTF is cleaner.
      It provides introspection for map key/value and reduces user mistakes.
    
    Next steps:
    - allow bpf_spin_lock in other map types (like cgroup local storage)
    - introduce BPF_F_LOCK flag for bpf_map_update() syscall and helper
      to request kernel to grab bpf_spin_lock before rewriting the value.
      That will serialize access to map elements.
    Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
    Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
    d83525ca
hashtab.c 37 KB