1. 08 Jul, 2022 6 commits
  2. 07 Jul, 2022 2 commits
  3. 06 Jul, 2022 17 commits
  4. 05 Jul, 2022 3 commits
  5. 01 Jul, 2022 1 commit
  6. 30 Jun, 2022 8 commits
  7. 29 Jun, 2022 3 commits
    • bpftool: Probe for memcg-based accounting before bumping rlimit · f0cf642c
      Quentin Monnet authored
      Bpftool used to bump the memlock rlimit to make sure it could load
      BPF objects. After the kernel switched to memcg-based memory
      accounting [0] in 5.11, bpftool relied on libbpf to probe the system
      for memcg-based accounting support and to raise the rlimit if
      necessary [1]. But this was later reverted, because the probe would
      sometimes fail, resulting in bpftool not being able to load all required
      objects [2].
      
      Here we add a more efficient probe, in bpftool itself. We first lower
      the rlimit to 0, then we attempt to load a BPF object (and finally reset
      the rlimit): if the load succeeds, then memcg-based memory accounting is
      supported.
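
The probe sequence described above (lower the soft limit to 0, attempt a load, restore the limit) might look roughly like the following C sketch. It is not the code from this commit: the helper names `try_load_trivial_prog` and `probe_memcg_accounting` are made up for illustration, and the two-instruction socket-filter program is only an assumption about what a minimal load could be. A failed load can also be caused by missing privileges, so a result of 0 only means the probe could not confirm memcg-based accounting.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: load a trivial program ("r0 = 0; exit") through the
 * raw bpf() syscall. Returns a program fd on success, -1 on failure. */
static int try_load_trivial_prog(void)
{
	struct bpf_insn insns[2];
	union bpf_attr attr;

	memset(insns, 0, sizeof(insns));
	insns[0].code = BPF_ALU64 | BPF_MOV | BPF_K;	/* r0 = 0 */
	insns[0].dst_reg = BPF_REG_0;
	insns[1].code = BPF_JMP | BPF_EXIT;		/* exit */

	memset(&attr, 0, sizeof(attr));
	attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
	attr.insns = (uint64_t)(uintptr_t)insns;
	attr.insn_cnt = 2;
	attr.license = (uint64_t)(uintptr_t)"GPL";

	return (int)syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
}

/* Returns 1 if a load succeeds while the memlock soft limit is 0,
 * 0 if the load fails, and -1 if the rlimit could not be read or changed. */
static int probe_memcg_accounting(void)
{
	struct rlimit saved, zero;
	int fd, loaded;

	if (getrlimit(RLIMIT_MEMLOCK, &saved))
		return -1;

	/* Only drop the soft limit; keeping the hard limit lets an
	 * unprivileged process restore the original value afterwards. */
	zero.rlim_cur = 0;
	zero.rlim_max = saved.rlim_max;
	if (setrlimit(RLIMIT_MEMLOCK, &zero))
		return -1;

	fd = try_load_trivial_prog();
	loaded = fd >= 0;
	if (loaded)
		close(fd);

	/* Always restore the original limit before reporting the result. */
	if (setrlimit(RLIMIT_MEMLOCK, &saved))
		return -1;

	return loaded;
}
```

A caller would bump RLIMIT_MEMLOCK only when `probe_memcg_accounting()` does not return 1. As the commit message notes, this is only reasonable in a single-threaded process, because the limit is shared by every thread while it sits at 0.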
      
      This approach was earlier proposed for the probe in libbpf itself [3],
      but given that the library may be used in multithreaded applications,
      the probe could have undesirable consequences if one thread attempts to
      lock kernel memory while memlock rlimit is at 0. Since bpftool is
      single-threaded and the rlimit is process-based, this is fine to do in
      bpftool itself.
      
      This probe was inspired by the similar one from the cilium/ebpf Go
      library [4].
      
        [0] commit 97306be4 ("Merge branch 'switch to memcg-based memory accounting'")
        [1] commit a777e18f ("bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK")
        [2] commit 6b4384ff ("Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"")
        [3] https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/t/#u
        [4] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39

      Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Quentin Monnet <quentin@isovalent.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Stanislav Fomichev <sdf@google.com>
      Acked-by: Yafang Shao <laoar.shao@gmail.com>
      Link: https://lore.kernel.org/bpf/20220629111351.47699-1-quentin@isovalent.com
    • Merge branch 'bpf: cgroup_sock lsm flavor' · d17b557e
      Alexei Starovoitov authored
      Stanislav Fomichev says:
      
      ====================
      
      This series implements a new lsm flavor for attaching per-cgroup programs to
      existing lsm hooks. The cgroup is taken out of 'current', unless
      the first argument of the hook is 'struct socket'. In this case,
      the cgroup association is taken out of the socket. The attachment
      looks like a regular per-cgroup attachment: we add a new BPF_LSM_CGROUP
      attach type which, together with attach_btf_id, signals per-cgroup lsm.
      Behind the scenes, we allocate a trampoline shim program and
      attach it to lsm. This program looks up the cgroup from current/socket
      and runs the cgroup's effective prog array. The rest of the per-cgroup BPF
      stays the same: hierarchy, local storage, retval conventions
      (return 1 == success).
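
The dispatch described above can be pictured with a small userspace toy model in C. Nothing here is kernel code: the `struct cgroup`, `struct socket`, and `struct task` types, the `shim_socket`/`shim_current` functions, and the sample programs are all hypothetical stand-ins. The model only illustrates the idea that the shim picks the cgroup from the socket or from the current task and then runs that cgroup's effective program array, with a return value of 1 meaning success.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_PROGS 4

/* A "program" returns 1 to allow the operation, anything else to deny it. */
typedef int (*prog_fn)(void *ctx);

struct cgroup {
	prog_fn effective[MAX_PROGS];	/* NULL-terminated unless full */
};

struct socket { struct cgroup *cgrp; };
struct task   { struct cgroup *cgrp; };

/* Run every effective program; any result other than 1 turns the overall
 * outcome into -EPERM (a simplification of the retval conventions). */
static int run_effective(const struct cgroup *cg, void *ctx)
{
	int ret = 0;

	for (size_t i = 0; i < MAX_PROGS && cg->effective[i]; i++)
		if (cg->effective[i](ctx) != 1)
			ret = -EPERM;
	return ret;
}

/* Shim flavor for hooks whose first argument is a socket:
 * the cgroup comes from the socket. */
static int shim_socket(struct socket *sock)
{
	return run_effective(sock->cgrp, sock);
}

/* Shim flavor for all other hooks: the cgroup comes from the current task. */
static int shim_current(const struct task *current, void *ctx)
{
	return run_effective(current->cgrp, ctx);
}

/* Sample programs used to exercise the model. */
static int prog_allow(void *ctx) { (void)ctx; return 1; }
static int prog_deny(void *ctx)  { (void)ctx; return 0; }
```

With a cgroup whose array holds only `prog_allow`, `shim_current` returns 0; once `prog_deny` is added to the array, the shim returns `-EPERM`.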
      
      Current limitations:
      * haven't considered sleepable bpf; can be extended later on
      * not sure the verifier does the right thing with null checks;
        see latest selftest for details
      * total of 10 (global) per-cgroup LSM attach points
      
      v11:
      - Martin: address selftest memory & fd leaks
      - Martin: address moving into root (instead have another temp leaf cgroup)
      - Martin: move tools/include/uapi/linux/bpf.h change from libbpf patch
        into 'sync tools' patch
      
      v10:
      - Martin: reword commit message, drop outdated items
      - Martin: remove rcu_read_lock from __cgroup_bpf_run_lsm_current
      - Martin: remove CONFIG_BPF_LSM from cgroup_bpf_release
      - Martin: fix leaking shim reference in bpf_cgroup_link_release
      - Martin: WARN_ON_ONCE for bpf_trampoline_lookup in bpf_trampoline_unlink_cgroup_shim
      - Martin: sync tools/include/linux/btf_ids.h
      - Martin: move progs/flags closer to the places where they are used in __cgroup_bpf_query
      - Martin: remove sk_clone_security & sctp_bind_connect from bpf_lsm_locked_sockopt_hooks
      - Martin: try to determine vmlinux btf_id in bpftool
      - Martin: update tools header in a separate commit
      - Quentin: do libbpf_find_kernel_btf from the ops that need it
      - lkp@intel.com: another build failure
      
      v9:
      The major change since the last version is the switch to bpf_setsockopt to
      change the socket state instead of letting the progs poke the socket directly.
      This, in turn, highlights the challenge that we need to care about whether
      the socket is locked or not when we call bpf_setsockopt (with my original
      example selftest, the hooks ran early enough in the init phase for this
      not to matter).
      
      For now, I've added two btf id lists:
      * hooks where we know the socket is locked and it's safe to call bpf_setsockopt
      * hooks where we know the socket is _not_ locked, but the hook works on
        the socket that's not yet exposed to userspace so it should be safe
        (for this mode, a special new set of bpf_{s,g}etsockopt helpers
         is added; they don't have the sock_owned_by_me check)
      
      Going forward, for the rest of the hooks, this might be a good motivation
      to expand lsm cgroup to support sleeping bpf and allow the callers to
      lock/unlock sockets or have a new bpf_setsockopt variant that does the
      locking.
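
The two-list scheme above amounts to a lookup from hook to helper variant. The C sketch below models it in userspace with string names instead of BTF ids. The enum, the function names, and the list contents are assumptions chosen for illustration; in particular, the hook names in each list may not match the lists in the actual patches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sockopt_variant {
	SOCKOPT_NONE,		/* hook is in neither list */
	SOCKOPT_LOCKED,		/* socket known to be locked: regular helper */
	SOCKOPT_UNLOCKED,	/* socket not locked, not yet visible to userspace:
				 * variant without the sock_owned_by_me check */
};

/* Illustrative entries only; consult the kernel source for the real lists. */
static const char *const locked_hooks[] = {
	"inet_conn_established",
	"sock_graft",
};

static const char *const unlocked_hooks[] = {
	"socket_post_create",
	"socket_socketpair",
};

static int in_list(const char *const *list, size_t n, const char *hook)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(list[i], hook) == 0)
			return 1;
	return 0;
}

/* Decide which sockopt helper variant a program attached to 'hook' may use. */
static enum sockopt_variant pick_sockopt_variant(const char *hook)
{
	if (in_list(locked_hooks, sizeof(locked_hooks) / sizeof(locked_hooks[0]), hook))
		return SOCKOPT_LOCKED;
	if (in_list(unlocked_hooks, sizeof(unlocked_hooks) / sizeof(unlocked_hooks[0]), hook))
		return SOCKOPT_UNLOCKED;
	return SOCKOPT_NONE;
}
```

Hooks that fall through to `SOCKOPT_NONE` correspond to "the rest of the hooks" mentioned above, for which a sleepable or self-locking variant would be needed.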
      
      - ifdef around cleanup in cgroup_bpf_release
      - Andrii: a few nits in libbpf patches
      - Martin: remove unused btf_id_set_index
      - Martin: bring back refcnt for cgroup_atype
      - Martin: make __cgroup_bpf_query a bit more readable
      - Martin: expose dst_prog->aux->attach_btf as attach_btf_obj_id as well
      - Martin: reorg check_return_code path for BPF_LSM_CGROUP
      - Martin: return directly from check_helper_call (instead of goto err)
      - Martin: add note to new warning in check_return_code, print only for void hooks
      - Martin: remove confusing shim reuse
      - Martin: use bpf_{s,g}etsockopt instead of poking into socket data
      - Martin: use CONFIG_CGROUP_BPF in bpf_prog_alloc_no_stats/bpf_prog_free_deferred
      
      v8:
      - CI: fix compile issue
      - CI: fix broken bpf_cookie
      - Yonghong: remove __bpf_trampoline_unlink_prog comment
      - Yonghong: move cgroup_atype around to fill the gap
      - Yonghong: make bpf_lsm_find_cgroup_shim void
      - Yonghong: rename regs to args
      - Yonghong: remove if(current) check
      - Martin: move refcnt into bpf_link
      - Martin: move shim management to bpf_link ops
      - Martin: use cgroup_atype for shim only
      - Martin: go back to arrays for managing cgroup_atype(s)
      - Martin: export bpf_obj_id(aux->attach_btf)
      - Andrii: reorder SEC_DEF("lsm_cgroup+")
      - Andrii: OPTS_SET instead of OPTS_HAS
      - Andrii: rename attach_btf_func_id
      - Andrii: move into 1.0 map
      
      v7:
      - there were a lot of comments last time, hope I didn't forget anything,
        some of the bigger ones:
        - Martin: use/extend BTF_SOCK_TYPE_SOCKET
        - Martin: expose bpf_set_retval
        - Martin: reject 'return 0' at the verifier for 'void' hooks
        - Martin: prog_query returns all BPF_LSM_CGROUP, prog_info
          returns attach_btf_func_id
        - Andrii: split libbpf changes
        - Andrii: add field access test to test_progs, not test_verifier (still
          using asm though)
      - things that I haven't addressed, stating them here explicitly, let
        me know if some of these are still problematic:
        1. Andrii: exposing only link-based api: seems like the changes
           to support non-link-based ones are minimal, a couple of lines,
           so it seems worth having?
        2. Alexei: applying cgroup_atype for all cgroup hooks, not only
           cgroup lsm: looks a bit harder to apply everywhere than I
           originally thought; with lsm cgroup, we have a shim_prog pointer where
           we store cgroup_atype; for non-lsm programs, we don't have a
           trace program in which to store it, so we still need some kind
           of global table to map from "static" hook to "dynamic" slot.
           So I'm dropping this "can be easily extended" clause from the
           description for now. I have converted this whole machinery
           to an RCU-managed list to remove synchronize_rcu().
      - also note that I had to introduce new bpf_shim_tramp_link and
        moved refcnt there; we need something to manage new bpf_tramp_link
      
      v6:
      - remove active count & stats for shim program (Martin KaFai Lau)
      - remove NULL/error check for btf_vmlinux (Martin)
      - don't check cgroup_atype in bpf_cgroup_lsm_shim_release (Martin)
      - use old_prog (instead of passed one) in __cgroup_bpf_detach (Martin)
      - make sure attach_btf_id is the same in __cgroup_bpf_replace (Martin)
      - enable cgroup local storage and test it (Martin)
      - properly implement prog query and add bpftool & tests (Martin)
      - prohibit non-shared cgroup storage mode for BPF_LSM_CGROUP (Martin)
      
      v5:
      - __cgroup_bpf_run_lsm_socket remove NULL sock/sk checks (Martin KaFai Lau)
      - __cgroup_bpf_run_lsm_{socket,current} s/prog/shim_prog/ (Martin)
      - make sure bpf_lsm_find_cgroup_shim works for hooks without args (Martin)
      - __cgroup_bpf_attach make sure attach_btf_id is the same when replacing (Martin)
      - call bpf_cgroup_lsm_shim_release only for LSM_CGROUP (Martin)
      - drop BPF_LSM_CGROUP from bpf_attach_type_to_tramp (Martin)
      - drop jited check from cgroup_shim_find (Martin)
      - new patch to convert cgroup_bpf to hlist_node (Jakub Sitnicki)
      - new shim flavor for 'struct sock' + list of exceptions (Martin)
      
      v4:
      - fix build when jit is on but syscall is off
      
      v3:
      - add BPF_LSM_CGROUP to bpftool
      - use simple int instead of refcnt_t (to avoid use-after-free
        false positive)
      
      v2:
      - addressed build bot failures
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: lsm_cgroup functional test · dca85aac
      Stanislav Fomichev authored
      Functional test that exercises the following:
      
      1. apply default sk_priority policy
      2. permit TX-only AF_PACKET socket
      3. cgroup attach/detach/replace
      4. reusing trampoline shim

      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/r/20220628174314.1216643-12-sdf@google.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>