1. 06 Mar, 2023 5 commits
  2. 04 Mar, 2023 16 commits
  3. 03 Mar, 2023 14 commits
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Use separate RCU callbacks for freeing selem · e768e3c5
      Kumar Kartikeya Dwivedi authored
      Martin suggested that instead of using a byte in the hole (which he has
      a use for in his future patch) in bpf_local_storage_elem, we can
      dispatch a different call_rcu callback based on whether we need to free
      special fields in bpf_local_storage_elem data. The free path, described
      in commit 9db44fdd ("bpf: Support kptrs in local storage maps"),
      only waits for call_rcu callbacks when there are special (kptrs, etc.)
      fields in the map value, hence it is necessary that we only access
      smap in this case.
      
      Therefore, dispatch different RCU callbacks based on whether the BPF map
      has a valid btf_record, so that smap's btf_record is dereferenced and used
      only when it is valid.
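The dispatch described above can be modelled in plain userspace C. This is only a sketch and not the kernel code: `struct record_model`, `struct smap_model`, `struct selem_model`, both callbacks and `pick_free_cb()` are hypothetical names, and a direct function-pointer call stands in for call_rcu().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct record_model { int nr_special_fields; };
struct smap_model { const struct record_model *record; /* NULL: no special fields */ };
struct selem_model {
	const struct smap_model *smap;
	bool special_fields_freed;
	bool freed;
};

typedef void (*free_cb_t)(struct selem_model *);

/* Plain path: never touches smap, which may already be gone by now. */
static void free_selem_plain(struct selem_model *e)
{
	e->freed = true;
}

/* Special-fields path: smap is guaranteed valid here, so it may be used. */
static void free_selem_with_fields(struct selem_model *e)
{
	assert(e->smap && e->smap->record);
	e->special_fields_freed = true;
	e->freed = true;
}

/* Choose the callback up front, while smap is still known to be valid. */
free_cb_t pick_free_cb(const struct smap_model *smap)
{
	return smap->record ? free_selem_with_fields : free_selem_plain;
}
```

Selecting the callback at dispatch time means no per-element flag is needed to remember which path applies, which is the point of the suggestion quoted in the commit message.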
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20230303141542.300068-1-memxor@gmail.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      e768e3c5
    • Daniel Borkmann's avatar
      Merge branch 'bpf-kptr-rcu' · db55174d
      Daniel Borkmann authored
      Alexei Starovoitov says:
      
      ====================
      v4->v5:
      fix typos, add acks.
      
      v3->v4:
      - patch 3 got much cleaner after BPF_KPTR_RCU was removed as suggested by David.
      
      - make KF_RCU stronger and require that bpf program checks for NULL
      before passing such pointers into kfunc. The prog has to do that anyway
      to access fields and it aligns with BTF_TYPE_SAFE_RCU allowlist.
      
      - New patch 6: refactor RCU enforcement in the verifier.
      Patches 2, 3 and 6 are part of one feature.
      Patches 2 and 3 alone are incomplete, since RCU pointers are barely useful
      without bpf_rcu_read_lock/unlock in GCC compiled kernels.
      Even if GCC lands support for btf_type_tag today, it will take time
      to mandate that version for kernel builds. Hence go with the allowlist
      approach. See patch 6 for details.
      This makes it possible to start strict enforcement of TRUSTED | UNTRUSTED
      in one part of PTR_TO_BTF_ID accesses.
      One step closer to KF_TRUSTED_ARGS by default.
      
      v2->v3:
      - Instead of requiring bpf progs to tag fields with __kptr_rcu
      teach the verifier to infer RCU properties based on the type.
      BPF_KPTR_RCU becomes kernel internal type of struct btf_field.
      - Add patch 2 to tag cgroups and dfl_cgrp as trusted.
      That bug was spotted by BPF CI on clang compiler kernels,
      since patch 3 is doing:
      static bool in_rcu_cs(struct bpf_verifier_env *env)
      {
              return env->cur_state->active_rcu_lock || !env->prog->aux->sleepable;
      }
      which makes all non-sleepable programs behave like they have implicit
      rcu_read_lock around them. Which is the case in practice.
      It was fine on gcc compiled kernels where task->cgroup dereference was producing
      PTR_TO_BTF_ID, but on clang compiled kernels task->cgroup dereference was
      producing PTR_TO_BTF_ID | MEM_RCU | MAYBE_NULL, which is more correct,
      but selftests were failing. Patch 2 fixes this discrepancy.
      With a few more patches like patch 2 we can make KF_TRUSTED_ARGS default
      for kfuncs and helpers.
      - Add comment in selftest patch 5 that it's verifier only check.
      
      v1->v2:
      Instead of aggressively allowing dereferenced kptr_rcu pointers into KF_TRUSTED_ARGS
      kfuncs, only allow them into KF_RCU funcs.
      The KF_RCU flag is a weaker version of KF_TRUSTED_ARGS. The kfuncs marked with
      KF_RCU expect either PTR_TRUSTED or MEM_RCU arguments. The verifier guarantees
      that the objects are valid and there is no use-after-free, but the pointers
      may be NULL and the pointee object's reference count could have reached zero, hence
      kfuncs must do a != NULL check and consider the refcnt==0 case when accessing such
      arguments.
      No changes in patch 1.
      Patches 2,3,4 adjusted with above behavior.
      
      v1:
      The __kptr_ref turned out to be too limited, since any "trusted" pointer access
      requires bpf_kptr_xchg() which is impractical when the same pointer needs
      to be dereferenced by multiple cpus.
      The __kptr "untrusted" only access isn't very useful in practice.
      Rename __kptr to __kptr_untrusted with eventual goal to deprecate it,
      and rename __kptr_ref to __kptr, since that looks to be more common use of kptrs.
      Introduce __kptr_rcu that can be directly dereferenced and used similar
      to native kernel C code.
      Once bpf_cpumask and task_struct kfuncs are converted to observe RCU GP
      when refcnt goes to zero, both __kptr and __kptr_untrusted can be deprecated
      and __kptr_rcu can become the only __kptr tag.
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      db55174d
    • Alexei Starovoitov's avatar
      bpf: Refactor RCU enforcement in the verifier. · 6fcd486b
      Alexei Starovoitov authored
      bpf_rcu_read_lock/unlock() are only available in clang compiled kernels. The lack
      of such a key mechanism makes it impossible for sleepable bpf programs to use RCU
      pointers.
      
      Allow bpf_rcu_read_lock/unlock() in GCC compiled kernels (though GCC doesn't
      support btf_type_tag yet) and allowlist certain field dereferences in important
      data structures like task_struct, cgroup, socket that are used by sleepable
      programs either as RCU pointer or full trusted pointer (which is valid outside
      of RCU CS). Use BTF_TYPE_SAFE_RCU and BTF_TYPE_SAFE_TRUSTED macros for such
      tagging. They will be removed once GCC supports btf_type_tag.
      
      With that, refactor check_ptr_to_btf_access(). Make it strict in enforcing
      PTR_TRUSTED and PTR_UNTRUSTED while deprecating old PTR_TO_BTF_ID without
      modifier flags. There is a chance that this strict enforcement might break
      existing programs (especially on GCC compiled kernels), but this cleanup has to
      start sooner rather than later. Note PTR_TO_CTX access still yields old deprecated
      PTR_TO_BTF_ID. Once it's converted to strict PTR_TRUSTED or PTR_UNTRUSTED the
      kfuncs and helpers will be able to default to KF_TRUSTED_ARGS. KF_RCU will
      remain as a weaker version of KF_TRUSTED_ARGS where obj refcnt could be 0.
      
      Adjust rcu_read_lock selftest to run on gcc and clang compiled kernels.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: David Vernet <void@manifault.com>
      Link: https://lore.kernel.org/bpf/20230303041446.3630-7-alexei.starovoitov@gmail.com
      6fcd486b
    • Alexei Starovoitov's avatar
      selftests/bpf: Tweak cgroup kfunc test. · 0047d834
      Alexei Starovoitov authored
      Adjust cgroup kfunc test to dereference RCU protected cgroup pointer
      as PTR_TRUSTED and pass into KF_TRUSTED_ARGS kfunc.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: David Vernet <void@manifault.com>
      Link: https://lore.kernel.org/bpf/20230303041446.3630-6-alexei.starovoitov@gmail.com
      0047d834
    • Alexei Starovoitov's avatar
      bpf: Introduce kptr_rcu. · 20c09d92
      Alexei Starovoitov authored
      The lifetime of certain kernel structures like 'struct cgroup' is protected by RCU.
      Hence it's safe to dereference them directly from __kptr tagged pointers in bpf maps.
      The resulting pointer is MEM_RCU and can be passed to kfuncs that expect KF_RCU.
      Dereference of other kptr-s returns PTR_UNTRUSTED.
      
      For example:
      struct map_value {
         struct cgroup __kptr *cgrp;
      };
      
      SEC("tp_btf/cgroup_mkdir")
      int BPF_PROG(test_cgrp_get_ancestors, struct cgroup *cgrp_arg, const char *path)
      {
        struct cgroup *cg, *cg2;
        // v (a looked-up struct map_value *) and level are assumed to be
        // defined elsewhere; their setup is elided in this example.
      
        cg = bpf_cgroup_acquire(cgrp_arg); // cg is PTR_TRUSTED and ref_obj_id > 0
        bpf_kptr_xchg(&v->cgrp, cg);
      
        cg2 = v->cgrp; // This is the new feature introduced by this patch.
        // cg2 is PTR_MAYBE_NULL | MEM_RCU.
        // When cg2 != NULL, it's a valid cgroup, but its percpu_ref could be zero
      
        if (cg2)
          bpf_cgroup_ancestor(cg2, level); // safe to do.
        return 0;
      }
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: David Vernet <void@manifault.com>
      Link: https://lore.kernel.org/bpf/20230303041446.3630-4-alexei.starovoitov@gmail.com
      20c09d92
    • Alexei Starovoitov's avatar
      bpf: Mark cgroups and dfl_cgrp fields as trusted. · 8d093b4e
      Alexei Starovoitov authored
      bpf programs sometimes do:
      bpf_cgrp_storage_get(&map, task->cgroups->dfl_cgrp, ...);
      It is safe to do, because the cgroups->dfl_cgrp pointer is set during init and
      never changes. The task->cgroups is also never NULL. It is also set during init
      and will change when the task switches cgroups. For any trusted task pointer,
      dereferencing cgroups and dfl_cgrp should yield trusted pointers. The verifier
      wasn't aware of this. Hence in gcc compiled kernels task->cgroups dereference
      was producing PTR_TO_BTF_ID without modifiers while in clang compiled kernels
      the verifier recognizes the __rcu tag in the cgroups field and produces
      PTR_TO_BTF_ID | MEM_RCU | MAYBE_NULL.
      Tag cgroups and dfl_cgrp as trusted to equalize clang and gcc behavior.
      When GCC supports btf_type_tag such tagging will be done directly in the type.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: David Vernet <void@manifault.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Link: https://lore.kernel.org/bpf/20230303041446.3630-3-alexei.starovoitov@gmail.com
      8d093b4e
    • Alexei Starovoitov's avatar
      bpf: Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted. · 03b77e17
      Alexei Starovoitov authored
      __kptr was meant to store PTR_UNTRUSTED kernel pointers inside bpf maps.
      The concept felt useful, but didn't get much traction,
      since bpf_rdonly_cast() was added soon after and bpf programs received
      a simpler way to access PTR_UNTRUSTED kernel pointers
      without going through restrictive __kptr usage.
      
      Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted to indicate
      its intended usage.
      The main goal of __kptr_untrusted was to read/write such pointers
      directly while bpf_kptr_xchg was a mechanism to access refcnted
      kernel pointers. The next patch will allow RCU protected __kptr access
      with direct read. At that point __kptr_untrusted will be deprecated.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: David Vernet <void@manifault.com>
      Link: https://lore.kernel.org/bpf/20230303041446.3630-2-alexei.starovoitov@gmail.com
      03b77e17
    • Tero Kristo's avatar
      selftests/bpf: Add absolute timer test · 944459e8
      Tero Kristo authored
      Add test for the absolute BPF timer under the existing timer tests. This
      will run the timer two times with 1us expiration time, and then re-arm
      the timer at ~35s in the future. At the end, it is verified that the
      absolute timer expired exactly two times.
      Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
      Link: https://lore.kernel.org/r/20230302114614.2985072-3-tero.kristo@linux.intel.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      944459e8
    • Tero Kristo's avatar
      bpf: Add support for absolute value BPF timers · f71f8530
      Tero Kristo authored
      Add a new flag BPF_F_TIMER_ABS that can be passed to bpf_timer_start()
      to start an absolute value timer instead of the default relative value.
      This makes the timer expire at an exact point in time, instead of a time
      with latencies induced by both the BPF and timer subsystems.
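The difference between the two modes can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: `TIMER_ABS_MODEL` is a hypothetical local stand-in for the new flag, and `timer_expiry_ns()` is an illustrative helper that computes the clock value at which a timer would fire.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local flag modelling BPF_F_TIMER_ABS; not the uapi definition. */
#define TIMER_ABS_MODEL (1ULL << 0)

/*
 * Return the clock value at which the timer fires.
 * Relative (default): nsecs is a delay added to the time of the call, so any
 * latency before the call shifts the expiry.
 * Absolute: nsecs already is the expiry, independent of when the call happens.
 */
uint64_t timer_expiry_ns(uint64_t now_ns, uint64_t nsecs, uint64_t flags)
{
	assert((flags & ~TIMER_ABS_MODEL) == 0); /* reject unknown flags */
	return (flags & TIMER_ABS_MODEL) ? nsecs : now_ns + nsecs;
}
```

In this model, starting the timer 200 ns late moves a relative expiry by 200 ns but leaves an absolute expiry unchanged.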
      Suggested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
      Link: https://lore.kernel.org/r/20230302114614.2985072-2-tero.kristo@linux.intel.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      f71f8530
    • Dave Marchevsky's avatar
      selftests/bpf: Add -Wuninitialized flag to bpf prog flags · ec97a76f
      Dave Marchevsky authored
      Per C99 standard [0], Section 6.7.8, Paragraph 10:
      
        If an object that has automatic storage duration is not initialized
        explicitly, its value is indeterminate.
      
      And in the same document, in appendix "J.2 Undefined behavior":
      
        The behavior is undefined in the following circumstances:
        [...]
        The value of an object with automatic storage duration is used while
        it is indeterminate (6.2.4, 6.7.8, 6.8).
      
      This means that use of an uninitialized stack variable is undefined
      behavior, and therefore that clang can choose to do a variety of scary
      things, such as not generating bytecode for "bunch of useful code" in
      the below example:
      
        void some_func()
        {
          int i;
          if (!i)
            return;
          // bunch of useful code
        }
      
      To add insult to injury, if some_func above is a helper function for
      some BPF program, clang can choose to not generate an "exit" insn,
      causing verifier to fail with "last insn is not an exit or jmp". Going
      from that verification failure to the root cause of uninitialized use
      is certain to be frustrating.
      
      This patch adds -Wuninitialized to the cflags for selftest BPF progs and
      fixes up existing instances of uninitialized use.
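A minimal sketch of the kind of fix-up this implies, written as ordinary C rather than as one of the actual selftest changes: `count_positive()` is a hypothetical helper in which every automatic variable is either given an initializer or assigned before its first read, so `-Wuninitialized` has nothing to report.

```c
#include <assert.h>

/* Illustrative only: count_positive() is a hypothetical helper. */
int count_positive(const int *vals, int n)
{
	int count = 0;	/* explicit initializer: reading it is well defined */
	int i;

	assert(n == 0 || vals);
	for (i = 0; i < n; i++) {	/* i is assigned before its first use */
		if (vals[i] > 0)
			count++;
	}
	return count;
}
```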
      
        [0]: https://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Cc: David Vernet <void@manifault.com>
      Cc: Tejun Heo <tj@kernel.org>
      Acked-by: David Vernet <void@manifault.com>
      Link: https://lore.kernel.org/r/20230303005500.1614874-1-davemarchevsky@fb.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      ec97a76f
    • Tejun Heo's avatar
      bpf: Make bpf_get_current_[ancestor_]cgroup_id() available for all program types · c501bf55
      Tejun Heo authored
      These helpers are safe to call from any context and there's no reason to
      restrict access to them. Remove them from bpf_trace and filter lists and add
      to bpf_base_func_proto() under perfmon_capable().
      
      v2: After consulting with Andrii, relocated in bpf_base_func_proto() so that
          they require bpf_capable() but not perfmon_capable(), as they don't read
          from or affect others on the system.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/ZAD8QyoszMZiTzBY@slm.duckdns.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      c501bf55
    • David Vernet's avatar
      bpf, docs: Fix final bpf docs build failure · cacad346
      David Vernet authored
      maps.rst in the BPF documentation links to the
      /userspace-api/ebpf/syscall document
      (Documentation/userspace-api/ebpf/syscall.rst). For some reason, if you
      try to reference the document with :doc:, the docs build emits the
      following warning:
      
      ./Documentation/bpf/maps.rst:13: WARNING: \
          unknown document: '/userspace-api/ebpf/syscall'
      
      It appears that other places in the docs tree also don't support using
      :doc:. Elsewhere in the BPF documentation, we just reference the kernel
      docs page directly. Let's do that here to clean up the last remaining
      noise in the docs build.
      Signed-off-by: David Vernet <void@manifault.com>
      Link: https://lore.kernel.org/r/20230302183918.54190-2-void@manifault.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      cacad346
    • David Vernet's avatar
      bpf, docs: Fix link to netdev-FAQ target · d56b0c46
      David Vernet authored
      The BPF devel Q&A documentation page makes frequent reference to the
      netdev-QA page via the netdev-FAQ rst link. This link is currently
      broken, as is evidenced by the build output when making BPF docs:
      
      ./Documentation/bpf/bpf_devel_QA.rst:150: WARNING: undefined label: 'netdev-faq'
      ./Documentation/bpf/bpf_devel_QA.rst:206: WARNING: undefined label: 'netdev-faq'
      ./Documentation/bpf/bpf_devel_QA.rst:231: WARNING: undefined label: 'netdev-faq'
      ./Documentation/bpf/bpf_devel_QA.rst:396: WARNING: undefined label: 'netdev-faq'
      ./Documentation/bpf/bpf_devel_QA.rst:412: WARNING: undefined label: 'netdev-faq'
      
      Fix the links to point to the actual netdev-faq page.
      Signed-off-by: David Vernet <void@manifault.com>
      Link: https://lore.kernel.org/r/20230302183918.54190-1-void@manifault.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      d56b0c46
  4. 02 Mar, 2023 5 commits
    • Joanne Koong's avatar
      bpf: Fix bpf_dynptr_slice{_rdwr} to return NULL instead of 0 · c45eac53
      Joanne Koong authored
      Change bpf_dynptr_slice and bpf_dynptr_slice_rdwr to return NULL instead
      of 0, in accordance with the codebase guidelines.
      
      Fixes: 66e3a13e ("bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20230302053014.1726219-1-joannelkoong@gmail.com
      c45eac53
    • Andrii Nakryiko's avatar
      Merge branch 'Make uprobe attachment APK aware' · b1d462bc
      Andrii Nakryiko authored
      Daniel Müller says:
      
      ====================
      
      On Android, APKs (android packages; zip packages with somewhat
      prescriptive contents) are first class citizens in the system: the
      shared objects contained in them don't exist in unpacked form on the
      file system. Rather, they are mmaped directly from within the archive
      and the archive is also what the kernel is aware of.
      
      For users that complicates the process of attaching a uprobe to a
      function contained in a shared object in one such APK: they'd have to
      find the byte offset of said function from the beginning of the archive.
      That is cumbersome to do manually and can be fragile, because various
      changes could invalidate said offset.
      
      That is why for uprobes inside ELF files (not inside an APK), commit
      d112c9ce249b ("libbpf: Support function name-based attach uprobes") added
      support for attaching to symbols by name. On Android, that mechanism
      currently does not work, because this logic is not APK aware.
      
      This patch set introduces first class support for attaching uprobes to
      functions inside ELF objects contained in APKs via function names. We
      add support for recognizing the following syntax for a binary path:
        <archive>!/<binary-in-archive>
      
        (e.g., /system/app/test-app.apk!/lib/arm64-v8a/libc++.so)
      
      This syntax is common in the Android ecosystem and used by tools such
      as simpleperf. It is also what is being proposed for bcc [0].
      
      If the user provides such a binary path, we find <binary-in-archive>
      (lib/arm64-v8a/libc++.so in the example) inside of <archive>
      (/system/app/test-app.apk). We perform the regular ELF offset search
      inside the binary and add that to the offset within the archive itself,
      to retrieve the offset at which to attach the uprobe.
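The lookup described above can be sketched in userspace C. The two helpers below are hypothetical illustrations and not libbpf functions: `split_apk_path()` separates the two halves of the path at the "!/" marker, and `apk_attach_offset()` shows the final addition, assuming the caller has already located where the entry's data starts in the archive and the function's offset inside the ELF object.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split "<archive>!/<binary-in-archive>" at the first "!/" separator.
 * On success, copy the archive path into archive_buf and return a pointer to
 * the in-archive path inside the original string. Return NULL if there is no
 * separator, either part is empty, or the buffer is too small.
 */
const char *split_apk_path(const char *path, char *archive_buf, size_t buf_sz)
{
	const char *sep;
	size_t archive_len;

	assert(path && archive_buf);
	sep = strstr(path, "!/");
	if (!sep)
		return NULL;
	archive_len = (size_t)(sep - path);
	if (archive_len == 0 || archive_len >= buf_sz || sep[2] == '\0')
		return NULL;
	memcpy(archive_buf, path, archive_len);
	archive_buf[archive_len] = '\0';
	return sep + 2; /* skip "!/" */
}

/* Attach offset = start of the entry's data in the archive + offset in the ELF. */
long apk_attach_offset(long entry_data_offset, long func_offset_in_elf)
{
	return entry_data_offset + func_offset_in_elf;
}
```

A path without the "!/" marker yields NULL here, which a caller could treat as "regular ELF file on disk" and handle through the existing name-based lookup.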
      
      [0] https://github.com/iovisor/bcc/pull/4440
      
      Changelog
      ---------
      v3->v4:
      - use ERR_PTR instead of libbpf_err_ptr() in zip_archive_open()
      - eliminated err variable from elf_find_func_offset_from_archive()
      
      v2->v3:
      - adjusted zip_archive_open() to report errno
      - fixed provided libbpf_strlcpy() buffer size argument
      - adjusted find_cd() to handle errors better
      - use fewer local variables in get_entry_at_offset()
      
      v1->v2:
      - removed unaligned_* types
      - switched to using __u32 and __u16
      - switched to using errno constants instead of hard-coded negative values
      - added another pr_debug() message
      - shortened central_directory_* to cd_*
      - inlined cd_file_header_at_offset() function
      - bunch of syntactical changes
      ====================
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      b1d462bc
    • Daniel Müller's avatar
      libbpf: Add support for attaching uprobes to shared objects in APKs · c44fd845
      Daniel Müller authored
      This change adds support for attaching uprobes to shared objects located
      in APKs, which is relevant for Android systems where various libraries
      may reside in APKs. To make that happen, we extend the syntax of the
      "binary path" argument to also accept the form supported by various
      Android tools:
        <archive>!/<binary-in-archive>
      
      For example:
        /system/app/test-app/test-app.apk!/lib/arm64-v8a/libc++_shared.so
      
      APKs need to be specified via full path, i.e., we do not attempt to
      resolve mere file names by searching system directories.
      
      We cannot currently test this functionality end-to-end in an automated
      fashion, because it relies on an Android system being present, but there
      is no support for that in CI. I have tested the functionality manually,
      by creating a libbpf program containing a uretprobe, attaching it to a
      function inside a shared object inside an APK, and verifying the sanity
      of the returned values.
      Signed-off-by: Daniel Müller <deso@posteo.net>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20230301212308.1839139-4-deso@posteo.net
      c44fd845
    • Daniel Müller's avatar
      libbpf: Introduce elf_find_func_offset_from_file() function · 434fdcea
      Daniel Müller authored
      This change splits the elf_find_func_offset() function in two:
      elf_find_func_offset(), which now accepts an already opened Elf object
      instead of a path to a file that is to be opened, as well as
      elf_find_func_offset_from_file(), which opens a binary based on a
      path and then invokes elf_find_func_offset() on the Elf object. Having
      this split in responsibilities will allow us to call
      elf_find_func_offset() from other code paths on Elf objects that did not
      necessarily come from a file on disk.
      Signed-off-by: Daniel Müller <deso@posteo.net>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20230301212308.1839139-3-deso@posteo.net
      434fdcea
    • Daniel Müller's avatar
      libbpf: Implement basic zip archive parsing support · 1eebcb60
      Daniel Müller authored
      This change implements support for reading zip archives, including
      opening an archive, finding an entry based on its path and name in it,
      and closing it.
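As background for the "finding an entry" step: a zip reader typically starts by locating the end-of-central-directory record at the tail of the archive. The sketch below shows that first step on an in-memory buffer. It is not the libbpf implementation; `find_eocd()` and the two read helpers are hypothetical names, and the comment-length check is one possible validation strategy chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EOCD_SIGNATURE 0x06054b50u	/* "PK\5\6" read as a little-endian u32 */
#define EOCD_MIN_SIZE  22		/* fixed part, without the trailing comment */

static uint32_t read_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t read_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Scan backwards for the end-of-central-directory record.
 * Returns its offset in buf, or -1 if none is found.
 */
long find_eocd(const uint8_t *buf, size_t len)
{
	size_t off;

	assert(buf || len == 0);
	if (len < EOCD_MIN_SIZE)
		return -1;
	for (off = len - EOCD_MIN_SIZE + 1; off-- > 0; ) {
		if (read_le32(buf + off) != EOCD_SIGNATURE)
			continue;
		/* The comment length field must account for all remaining bytes. */
		if (off + EOCD_MIN_SIZE + read_le16(buf + off + 20) == len)
			return (long)off;
	}
	return -1;
}
```

Scanning from the end is needed because the record may be followed by a variable-length comment, so its position cannot be computed from the file size alone.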
      The code was copied from https://github.com/iovisor/bcc/pull/4440, which
      implements similar functionality for bcc. The author confirmed that he
      is fine with this usage and the corresponding relicensing. I adjusted it
      to adhere to libbpf coding standards.
      Signed-off-by: Daniel Müller <deso@posteo.net>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Michał Gregorczyk <michalgr@meta.com>
      Link: https://lore.kernel.org/bpf/20230301212308.1839139-2-deso@posteo.net
      1eebcb60