    bpf: add syscall side map freeze support · 87df15de
    Daniel Borkmann authored
    This patch adds a new BPF_MAP_FREEZE command which allows "freezing"
    a map globally as read-only / immutable from the syscall side.
    
    Map permission handling has been refactored into map_get_sys_perms(),
    which drops FMODE_CAN_WRITE when the map is frozen. The main use case
    is setting up .rodata sections from the BPF ELF that are loaded into
    the kernel: the BPF loader first allocates the map, sets up the map
    value by copying the .rodata section into it and, once complete,
    calls BPF_MAP_FREEZE on the map fd to prevent further modifications.
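
    The loader sequence above can be sketched from user space roughly as
    follows. This is a minimal sketch, not the loader's actual code: the
    helper names sys_bpf(), map_freeze() and load_rodata_map() are
    hypothetical, and the choice of a one-element BPF_MAP_TYPE_ARRAY for
    the .rodata image is an assumption. It needs UAPI headers that
    already define BPF_MAP_FREEZE.

```c
#include <errno.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around the raw bpf(2) syscall (hypothetical helper name). */
static int sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
	return (int)syscall(__NR_bpf, cmd, attr, size);
}

/* Freeze a map: only map_fd is set, all other bpf_attr members stay zero. */
static int map_freeze(int map_fd)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_fd = (uint32_t)map_fd;
	return sys_bpf(BPF_MAP_FREEZE, &attr, sizeof(attr));
}

/*
 * Hypothetical loader helper. Assumption: the .rodata image is stored as
 * the single value of a one-element array map. Returns the map fd on
 * success, or -1 with errno set.
 */
static int load_rodata_map(const void *rodata, uint32_t size)
{
	union bpf_attr attr;
	uint32_t key = 0;
	int fd;

	/* Step 1: allocate the map. */
	memset(&attr, 0, sizeof(attr));
	attr.map_type = BPF_MAP_TYPE_ARRAY;
	attr.key_size = sizeof(key);
	attr.value_size = size;
	attr.max_entries = 1;
	fd = sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
	if (fd < 0)
		return -1;

	/* Step 2: copy the .rodata image into the map value. */
	memset(&attr, 0, sizeof(attr));
	attr.map_fd = (uint32_t)fd;
	attr.key = (uint64_t)(uintptr_t)&key;
	attr.value = (uint64_t)(uintptr_t)rodata;
	attr.flags = BPF_ANY;

	/* Step 3: freeze, so later syscall-side updates are rejected. */
	if (sys_bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr)) < 0 ||
	    map_freeze(fd) < 0) {
		int saved = errno;

		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```

    Without sufficient privileges, or on a kernel lacking the command,
    load_rodata_map() is expected to return -1 with errno set.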
    
    Right now BPF_MAP_FREEZE only takes the map fd as argument, while the
    remaining bpf_attr members are required to be zero. I didn't add
    write-only locking here as a counterpart since I don't have a concrete
    use case for it on my side, and I think it probably makes more sense
    to wait until there actually is one. In that case bpf_attr can be
    extended as usual with a flag field and/or others, where flag 0 means
    that we lock the map read-only; hence this doesn't prevent adding
    further extensions to BPF_MAP_FREEZE upon need.
    
    A map creation flag like BPF_F_WRONCE was not considered for a couple
    of reasons: i) in case of a generic implementation, a map can consist
    of more than just one element, thus multiple map updates could be
    needed to bring the map into a state where it can then be made
    immutable, ii) WRONCE implies exactly one write before the map is
    set immutable. A generic implementation would set a bit atomically
    on map update entry (if unset), indicating that every subsequent
    update from then onwards needs to bail out there. However, map
    updates can fail, so upon failure that flag would need to be unset
    again and the update attempt repeated for the map to eventually be
    made immutable. While this can be made race-free, the approach feels
    less clean and, in combination with reason i), is not generic enough.
    A dedicated BPF_MAP_FREEZE command directly sets the flag, and the
    caller has the guarantee that the map is immutable from the syscall
    side upon successful return for any future syscall invocations that
    would alter the map state, which is also more intuitive from an API
    point of view. A command name such as BPF_MAP_LOCK has been avoided
    as it's too close to BPF map spin locks (which already have the
    BPF_F_LOCK flag). BPF_MAP_FREEZE is so far only enabled for
    privileged users.
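
    The freeze semantics described above can be illustrated with a small
    user-space model. This is only a sketch under stated assumptions, not
    the kernel code: struct model_map, the MODEL_CAN_* bits and the
    model_*() functions are hypothetical stand-ins, and the -EPERM and
    -EBUSY return values are choices made for the model. It shows several
    updates succeeding before the freeze (reason i) and every update
    being rejected afterwards.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative stand-ins for FMODE_CAN_READ / FMODE_CAN_WRITE. */
#define MODEL_CAN_READ  0x1u
#define MODEL_CAN_WRITE 0x2u

struct model_map {
	atomic_bool frozen;	/* global, syscall-facing read-only flag */
	int value;		/* stands in for the map contents */
};

/* Models map_get_sys_perms(): a frozen map loses write permission. */
static unsigned int model_get_sys_perms(struct model_map *map,
					unsigned int fmode)
{
	if (atomic_load(&map->frozen))
		fmode &= ~MODEL_CAN_WRITE;
	return fmode;
}

/* Models a syscall-side update: rejected once write permission is gone. */
static int model_update(struct model_map *map, unsigned int fmode, int value)
{
	if (!(model_get_sys_perms(map, fmode) & MODEL_CAN_WRITE))
		return -EPERM;	/* error code is an assumption of this model */
	map->value = value;
	return 0;
}

/* Models BPF_MAP_FREEZE: a one-way transition that directly sets the flag. */
static int model_freeze(struct model_map *map)
{
	if (atomic_exchange(&map->frozen, true))
		return -EBUSY;	/* already frozen; code is an assumption */
	return 0;
}
```

    Unlike a WRONCE-style bit set on update entry, the flag here is never
    touched by the update path, so a failed update needs no rollback.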
    
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
syscall.c 65.7 KB