- 25 Mar, 2024 4 commits
-
-
Puranjay Mohan authored
Implement a helper function to check if an instruction is addr_space_cast from as(0) to as(1). Use this helper in the x86 JIT. Other JITs can use this helper when they add support for this instruction. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Link: https://lore.kernel.org/r/20240324183226.29674-1-puranjay12@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
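A minimal sketch of what such a helper could look like, assuming the cast is encoded as BPF_ALU64 | BPF_MOV | BPF_X with off == BPF_ADDR_SPACE_CAST and the address spaces packed into imm; the helper name and the exact imm check are illustrative, not necessarily the committed code:

  /* Sketch: true if insn is addr_space_cast from as(0) to as(1),
   * i.e. (assumption) imm carries dst_as = 1 in its upper 16 bits
   * and src_as = 0 in its lower 16 bits. */
  static inline bool insn_is_cast_user(const struct bpf_insn *insn)
  {
          return insn->code == (BPF_ALU64 | BPF_MOV | BPF_X) &&
                 insn->off == BPF_ADDR_SPACE_CAST &&
                 insn->imm == 1U << 16;
  }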
-
Andrii Nakryiko authored
get_kernel_nofault() (or, rather, the underlying copy_from_kernel_nofault()) is not free and it does pop up in performance profiles when kprobes are heavily utilized with CONFIG_X86_KERNEL_IBT=y config. Let's avoid using it if we know that fentry_ip - 4 can't cross a page boundary. We do that by masking the lowest 12 bits and checking that the resulting page offset is at least 4: in that case fentry_ip - 4 stays within the same page and can be read directly. Another benefit (and actually what caused a closer look at this part of code) is that now the LBR record is (typically) not wasted on the copy_from_kernel_nofault() call and code, which helps tools like retsnoop that grab LBR records from inside BPF code in kretprobes. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://lore.kernel.org/bpf/20240319212013.1046779-1-andrii@kernel.org
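A sketch of the fast path this describes, assuming the x86 IBT helpers is_endbr() and ENDBR_INSN_SIZE (== 4) from asm/ibt.h; the exact committed code may differ:

  static unsigned long get_entry_ip(unsigned long fentry_ip)
  {
          u32 instr;

          /* If the page offset of fentry_ip is < 4, fentry_ip - 4 lands on
           * the previous (possibly unmapped) page: use the safe slow copy.
           * Otherwise the read stays within one mapped page and a plain
           * dereference is fine. */
          if ((fentry_ip & ~PAGE_MASK) < ENDBR_INSN_SIZE) {
                  if (copy_from_kernel_nofault(&instr,
                          (u32 *)(fentry_ip - ENDBR_INSN_SIZE), sizeof(instr)))
                          return fentry_ip;
          } else {
                  instr = *(u32 *)(fentry_ip - ENDBR_INSN_SIZE);
          }
          if (is_endbr(instr))
                  fentry_ip -= ENDBR_INSN_SIZE;
          return fentry_ip;
  }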
-
Geliang Tang authored
To simplify the code, use BPF selftests helper start_server() in bpf_tcp_ca.c instead of open-coding it. This helper is defined in network_helpers.c, and exported in network_helpers.h, which is already included in bpf_tcp_ca.c. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/9926a79118db27dd6d91c4854db011c599cabd0e.1711331517.git.tanggeliang@kylinos.cn
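For illustration, the open-coded socket()/bind()/listen() sequence collapses into a single call; a sketch assuming the start_server() signature from network_helpers.h (the address family and port here are illustrative):

  #include "network_helpers.h"

  /* Start a listening TCP server on an ephemeral port; returns the fd. */
  int lfd = start_server(AF_INET6, SOCK_STREAM, NULL, 0, 0);
  if (!ASSERT_GE(lfd, 0, "start_server"))
          return;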
-
Yonghong Song authored
There is a difference between kernel uapi bpf.h and tools uapi bpf.h. There is no functionality difference, but let us sync them properly to make later bpf.h updates easy. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240325033842.1693553-1-yonghong.song@linux.dev
-
- 22 Mar, 2024 3 commits
-
-
Yonghong Song authored
The new sec_def specifies sk_skb program type with BPF_SK_SKB_VERDICT attachment type. This way, libbpf will set expected_attach_type properly for the program. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240319175412.2941149-1-yonghong.song@linux.dev
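With the new sec_def, a program can simply declare the section and let libbpf set the expected attach type; a minimal sketch:

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  /* libbpf infers prog type sk_skb and expected_attach_type
   * BPF_SK_SKB_VERDICT from the section name. */
  SEC("sk_skb/verdict")
  int prog_skb_verdict(struct __sk_buff *skb)
  {
          return SK_PASS;
  }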
-
Jiri Olsa authored
Some distros seem to enable -fcf-protection=branch by default, which places an endbr64 instruction at the first instruction of our uprobe trigger functions and breaks our setup. Mark them with the nocf_check attribute to skip that. Also ignore the unknown-attribute warning in gcc for bench objects, because nocf_check can be used only when -fcf-protection=branch is enabled; otherwise we get a warning and break compilation. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240322134936.1075395-1-jolsa@kernel.org
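A sketch of the annotation (the function name is illustrative):

  /* nocf_check suppresses the endbr64 at function entry, so the uprobe
   * can still be attached at the very first instruction. */
  __attribute__((nocf_check)) __weak void uprobe_target_nop(void)
  {
          asm volatile ("");
  }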
-
Alan Maguire authored
With glibc 2.28, selftests compilation fails for benchs/bench_trigger.c:

  benchs/bench_trigger.c: In function ‘inc_counter’:
  benchs/bench_trigger.c:25:23: error: implicit declaration of function ‘gettid’; did you mean ‘getgid’? [-Werror=implicit-function-declaration]
     25 |         tid = gettid();
        |               ^~~~~~
        |               getgid
  cc1: all warnings being treated as errors

It appears support for the gettid() wrapper is variable across glibc versions, so it may be safer to use syscall(SYS_gettid) instead. Fixes: 520fad2e ("selftests/bpf: scale benchmark counting by using per-CPU counters") Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240322095728.95671-1-alan.maguire@oracle.com
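A sketch of the portable replacement (wrapper name illustrative; glibc only gained a gettid() wrapper in 2.30, while the raw syscall works everywhere):

  #include <unistd.h>
  #include <sys/syscall.h>

  static inline pid_t sys_gettid(void)
  {
          return (pid_t)syscall(SYS_gettid);
  }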
-
- 21 Mar, 2024 2 commits
-
-
Harishankar Vishwanathan authored
In case of GE/GT/SGE/SGT instructions, regs_refine_cond_op() reuses the logic that does analysis of LE/LT/SLE/SLT instructions. This commit avoids the use of a goto to perform the reuse. Signed-off-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240321002955.808604-1-harishankar.vishwanathan@gmail.com
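One way to express this reuse without a goto is to swap the operands and flip the opcode so the GE/GT cases fall through into the LE/LT analysis; a sketch of the switch body (not necessarily the exact shape of the committed code; flip_opcode() and swap() exist in the verifier):

  switch (opcode) {
  case BPF_JGE:
  case BPF_JGT:
  case BPF_JSGE:
  case BPF_JSGT:
          /* Swap the register states and flip the comparison so the
           * LE/LT/SLE/SLT logic below handles this case too. */
          swap(reg1, reg2);
          opcode = flip_opcode(opcode);
          fallthrough;
  case BPF_JLE:
  case BPF_JLT:
  case BPF_JSLE:
  case BPF_JSLT:
          /* ... range-refinement logic ... */
          break;
  }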
-
Quentin Monnet authored
Bpftool's Makefile uses $(HOST_CFLAGS) to build the bootstrap version of bpftool, in order to pick the flags for the host (where we run the bootstrap version) and not for the target system (where we plan to run the full bpftool binary). But we pass too much information through this variable. In particular, we set HOST_CFLAGS by copying most of the $(CFLAGS); but we do this after the feature detection for bpftool, which means that $(CFLAGS), hence $(HOST_CFLAGS), contain all macro definitions for using the different optional features. For example, -DHAVE_LLVM_SUPPORT may be passed to the $(HOST_CFLAGS), even though the LLVM disassembler is not used in the bootstrap version, and the related library may even be missing for the host architecture. A similar thing happens with the $(LDFLAGS), which we use unchanged for linking the bootstrap version even though they may contain flags to link against additional libraries. To address the $(HOST_CFLAGS) issue, we move the definition of $(HOST_CFLAGS) earlier in the Makefile, before the $(CFLAGS) update resulting from the feature probing - none of which is relevant to the bootstrap version. To clean up the $(LDFLAGS) for the bootstrap version, we introduce a dedicated $(HOST_LDFLAGS) variable that we base on $(LDFLAGS), before the feature probing as well. On my setup, the following macros and libraries are removed from the compiler invocation to build bpftool after this patch:

  -DUSE_LIBCAP
  -DHAVE_LLVM_SUPPORT
  -I/usr/lib/llvm-17/include -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS
  -lLLVM-17
  -L/usr/lib/llvm-17/lib

Another advantage of cleaning up these flags is that displaying available features with "bpftool version" becomes more accurate for the bootstrap bpftool, and no longer reflects the features detected (and available only) for the final binary. Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Quentin Monnet <qmo@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Message-ID: <20240320014103.45641-1-qmo@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 20 Mar, 2024 9 commits
-
-
Andrii Nakryiko authored
When benchmarking with multiple threads (-pN, where N>1), we start contending on a single atomic counter that both BPF trigger benchmarks are using, as well as the "baseline" tests in user space (trig-base and trig-uprobe-base benchmarks). As such, we start bottlenecking on something completely irrelevant to the benchmark at hand. Scale counting up by using per-CPU counters on the BPF side. On the user space side we do the next best thing: hash the thread ID to approximate per-CPU behavior. It seems to work quite well in practice (see the sketch after the results below). To demonstrate the difference, I ran three benchmarks with 1, 2, 4, 8, 16, and 32 threads:

  - trig-uprobe-base (no syscalls, pure tight counting loop in user-space);
  - trig-base (get_pgid() syscall, atomic counter in user-space);
  - trig-fentry (syscall to trigger fentry program, atomic uncontended per-CPU counter on BPF side).

Command used:

  for b in uprobe-base base fentry; do \
    for p in 1 2 4 8 16 32; do \
      printf "%-11s %2d: %s\n" $b $p \
        "$(sudo ./bench -w2 -d5 -a -p$p trig-$b | tail -n1 | cut -d'(' -f1 | cut -d' ' -f3-)"; \
    done; \
  done

Before these changes, aggregate throughput across all threads doesn't scale well with the number of threads; it actually even falls sharply for uprobe-base due to very high contention:

  uprobe-base  1:  138.998 ± 0.650M/s
  uprobe-base  2:   70.526 ± 1.147M/s
  uprobe-base  4:   63.114 ± 0.302M/s
  uprobe-base  8:   54.177 ± 0.138M/s
  uprobe-base 16:   45.439 ± 0.057M/s
  uprobe-base 32:   37.163 ± 0.242M/s
  base         1:   16.940 ± 0.182M/s
  base         2:   19.231 ± 0.105M/s
  base         4:   21.479 ± 0.038M/s
  base         8:   23.030 ± 0.037M/s
  base        16:   22.034 ± 0.004M/s
  base        32:   18.152 ± 0.013M/s
  fentry       1:   14.794 ± 0.054M/s
  fentry       2:   17.341 ± 0.055M/s
  fentry       4:   23.792 ± 0.024M/s
  fentry       8:   21.557 ± 0.047M/s
  fentry      16:   21.121 ± 0.004M/s
  fentry      32:   17.067 ± 0.023M/s

After these changes, we see almost perfect linear scaling, as expected. The sub-linear scaling when going from 8 to 16 threads is interesting and consistent on my test machine, but I haven't investigated what is causing this peculiar slowdown (it shows across all benchmarks; could be due to hyperthreading effects, not sure).

  uprobe-base  1:  139.980 ± 0.648M/s
  uprobe-base  2:  270.244 ± 0.379M/s
  uprobe-base  4:  532.044 ± 1.519M/s
  uprobe-base  8: 1004.571 ± 3.174M/s
  uprobe-base 16: 1720.098 ± 0.744M/s
  uprobe-base 32: 3506.659 ± 8.549M/s
  base         1:   16.869 ± 0.071M/s
  base         2:   33.007 ± 0.092M/s
  base         4:   64.670 ± 0.203M/s
  base         8:  121.969 ± 0.210M/s
  base        16:  207.832 ± 0.112M/s
  base        32:  424.227 ± 1.477M/s
  fentry       1:   14.777 ± 0.087M/s
  fentry       2:   28.575 ± 0.146M/s
  fentry       4:   56.234 ± 0.176M/s
  fentry       8:  106.095 ± 0.385M/s
  fentry      16:  181.440 ± 0.032M/s
  fentry      32:  369.131 ± 0.693M/s

Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Message-ID: <20240315213329.1161589-1-andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
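A sketch of the user-space side of this (the per-CPU approximation via thread-ID hashing; names and sizes are illustrative, not the exact bench code):

  #include <unistd.h>
  #include <sys/syscall.h>

  /* One counter per slot, padded to its own cache line so threads that
   * hash to different slots never false-share. */
  static struct counter {
          long value;
  } __attribute__((aligned(128))) counters[256];

  static inline void inc_counter(void)
  {
          int slot = (int)(syscall(SYS_gettid) % 256);

          __atomic_add_fetch(&counters[slot].value, 1, __ATOMIC_RELAXED);
  }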
-
Quentin Monnet authored
Commit d510296d ("bpftool: Use syscall/loader program in "prog load" and "gen skeleton" command.") added new files to the list of objects to compile in order to build the bootstrap version of bpftool. As far as I can tell, these objects are unnecessary and were added by mistake; maybe a draft version intended to add support for loading loader programs from the bootstrap version. Anyway, we can remove these object files from the list to make the bootstrap bpftool binary a tad smaller and faster to build. Fixes: d510296d ("bpftool: Use syscall/loader program in "prog load" and "gen skeleton" command.") Signed-off-by: Quentin Monnet <qmo@kernel.org> Message-ID: <20240320013457.44808-1-qmo@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Quentin Monnet authored
When trying to load the pid_iter BPF program used to iterate over the PIDs of the processes holding file descriptors to BPF links, we would unconditionally silence libbpf in order to keep the output clean if the kernel does not support iterators and loading fails. Although this is the desirable behaviour in most cases, this may hide bugs in the pid_iter program that prevent it from loading, and it makes it hard to debug such load failures, even in "debug" mode. Instead, it makes more sense to print libbpf's logs when we pass the -d|--debug flag to bpftool, so that users get the logs to investigate failures without having to edit bpftool's source code. Signed-off-by: Quentin Monnet <qmo@kernel.org> Message-ID: <20240320012241.42991-1-qmo@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
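A sketch of the behaviour described, reusing the identifiers visible in pids.c (the debug-flag name is illustrative):

  libbpf_print_fn_t default_print;

  /* Keep output clean unless the user asked for -d/--debug, in which
   * case let libbpf's logs through to help diagnose load failures. */
  if (!debug_mode)
          default_print = libbpf_set_print(libbpf_print_none);
  err = pid_iter_bpf__load(skel);
  if (!debug_mode)
          libbpf_set_print(default_print);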
-
Alexei Starovoitov authored
Andrii Nakryiko says: ==================== BPF raw tracepoint support for BPF cookie

Add the ability to specify and retrieve a BPF cookie for raw tracepoint programs. Both BTF-aware (SEC("tp_btf")) and non-BTF-aware (SEC("raw_tp")) programs are supported, as they are exactly the same at runtime. This issue recently came up in production use cases, where customers tried to switch from slower classic tracepoints to raw tracepoints and ran into this limitation. Luckily, it's not that hard to support this for raw_tp programs.

v2->v3:
  - s/bpf_raw_tp_open/bpf_raw_tracepoint_open_opts/ (Alexei, Eduard);

v1->v2:
  - fixed type definition for stubs of bpf_probe_{register,unregister};
  - added __u32 :32 and aligned raw_tp fields (Jiri);
  - added Stanislav's ack.
====================

Link: https://lore.kernel.org/r/20240319233852.1977493-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Add a test validating that a BPF cookie can be passed during raw_tp/tp_btf attachment and can be retrieved at runtime with the bpf_get_attach_cookie() helper. Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Message-ID: <20240319233852.1977493-6-andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
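On the BPF side, reading the cookie back looks roughly like this (program and section names are illustrative; bpf_get_attach_cookie() is the existing helper):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  SEC("raw_tp/sys_enter")
  int handle_raw_tp(void *ctx)
  {
          __u64 cookie = bpf_get_attach_cookie(ctx);

          /* e.g. compare against the value set at attach time */
          bpf_printk("raw_tp cookie: %llu", cookie);
          return 0;
  }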
-
Andrii Nakryiko authored
Wire up BPF cookie passing for raw_tp and tp_btf programs, both in low-level and high-level APIs. Acked-by: Stanislav Fomichev <sdf@google.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Message-ID: <20240319233852.1977493-5-andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
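A sketch of low-level usage, assuming the opts-based entry point named in the cover letter's changelog (bpf_raw_tracepoint_open_opts); the exact opts struct name and field layout here are assumptions:

  /* Assumed opts layout: tracepoint name plus cookie. */
  LIBBPF_OPTS(bpf_raw_tp_opts, opts,
          .tp_name = "sys_enter",
          .cookie = 0x1234ULL,
  );
  int link_fd = bpf_raw_tracepoint_open_opts(bpf_program__fd(prog), &opts);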
-
Andrii Nakryiko authored
Wire up BPF cookie for raw tracepoint programs (both BTF and non-BTF aware variants). This brings them up to par w.r.t. BPF cookie usage with classic tracepoint and fentry/fexit programs. Acked-by: Stanislav Fomichev <sdf@google.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Message-ID: <20240319233852.1977493-4-andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Instead of passing prog as an argument to bpf_trace_runX() helpers, that are called from tracepoint triggering calls, store the BPF link itself (struct bpf_raw_tp_link for raw tracepoints). This will allow passing extra information like the BPF cookie into raw tracepoint registration. Instead of replacing `struct bpf_prog *prog = __data;` with a corresponding `struct bpf_raw_tp_link *link = __data;` assignment in `__bpf_trace_##call`, I just passed `__data` through into the underlying bpf_trace_runX() call. This works well because we implicitly cast `void *`, and it also avoids naming clashes with arguments coming from the tracepoint's "proto" list. We could have run into the same problem with "prog", we just happened to not have a tracepoint that has a "prog" input argument. We are less lucky with "link", as there are tracepoints using the "link" argument name already. So instead of trying to avoid naming conflicts, let's just remove the intermediate local variable. It doesn't hurt readability; it's either way a bit of a maze of calls and macros that requires careful reading. Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Message-ID: <20240319233852.1977493-3-andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
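The resulting macro shape, a sketch reconstructed from include/trace/bpf_probe.h (treat the details as an approximation):

  /* Before, a named local was introduced:
   *     struct bpf_prog *prog = __data;
   * After, __data (now a struct bpf_raw_tp_link *) goes straight
   * through, avoiding clashes with names from the "proto" list. */
  #define __BPF_DECLARE_TRACE(call, proto, args)                          \
  static notrace void                                                     \
  __bpf_trace_##call(void *__data, proto)                                 \
  {                                                                       \
          CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(__data, CAST_TO_U64(args)); \
  }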
-
Andrii Nakryiko authored
bpf_probe_register() and __bpf_probe_register() have identical signatures, and bpf_probe_register() just redirects to __bpf_probe_register(). So get rid of this extra function call step to simplify following the source code. It makes no difference at runtime due to inlining, of course. Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Message-ID: <20240319233852.1977493-2-andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 19 Mar, 2024 8 commits
-
-
Alessandro Carminati (Red Hat) authored
On some systems, the netcat server can incur a delay before it starts listening. When this happens, the test can randomly fail at various points. This is an example error message:

  # ip gre none gso
  # encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
  # test basic connectivity
  # Ncat: Connection refused.

The issue stems from a race condition between the netcat client and server. The test author had addressed this problem with a sleep, which this patch removes. Instead, this patch introduces a function that sleeps for up to two seconds, but terminates the waiting period early if the port is reported to be listening. Signed-off-by: Alessandro Carminati (Red Hat) <alessandro.carminati@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240314105911.213411-1-alessandro.carminati@gmail.com
-
Andrii Nakryiko authored
Yonghong Song says: ==================== current_pid_tgid() for all prog types

Currently bpf_get_current_pid_tgid() is allowed in tracing, cgroup and sk_msg progs, while bpf_get_ns_current_pid_tgid() is only allowed in tracing progs. We have an internal use case where, for an application running in a container (with pid namespace), the user wants to get the pid associated with the pid namespace in a cgroup bpf program. Besides cgroup, the only prog type supporting bpf_get_current_pid_tgid() but not bpf_get_ns_current_pid_tgid() is sk_msg. But actually both bpf_get_current_pid_tgid() and bpf_get_ns_current_pid_tgid() helpers do not reveal kernel internal data, and there is no reason that they cannot be used in other program types. This patch set does just that and enables these two helpers for all program types. Patch 1 added the kernel support and patches 2-5 added the tests for cgroup and sk_msg.

Change logs:
  v1 -> v2:
    - allow bpf_get_[ns_]current_pid_tgid() for all prog types.
    - for network related selftests, use netns.
====================

Link: https://lore.kernel.org/r/20240315184849.2974556-1-yonghong.song@linux.dev Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
-
Yonghong Song authored
Add a sk_msg bpf program test where the program is running in a pid namespace. The test is successful: #165/4 ns_current_pid_tgid/new_ns_sk_msg:OK Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240315184915.2976718-1-yonghong.song@linux.dev
-
Yonghong Song authored
Add a cgroup bpf program test where the bpf program is running in a pid namespace. The test is successful: #165/3 ns_current_pid_tgid/new_ns_cgrp:OK Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240315184910.2976522-1-yonghong.song@linux.dev
-
Yonghong Song authored
Refactor some functions in both the user space code and the bpf program, as these functions are used by later cgroup/sk_msg tests. Another change is to make loading of the tp program optional, since later patches will use optional loading as well, as they have quite different attachment and testing logic. There is no functionality change. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240315184904.2976123-1-yonghong.song@linux.dev
-
Yonghong Song authored
Replace CHECK in selftest ns_current_pid_tgid with the recommended ASSERT_* style. I also shortened the subtest names, as the prefix of the subtest name is covered by the test name already. This patch does fix a testing issue: currently, even if bss->user_{pid,tgid} is not correct, the test still passes since the clone func returns 0. I fixed it to return a non-zero value if bss->user_{pid,tgid} is incorrect. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20240315184859.2975543-1-yonghong.song@linux.dev
-
Yonghong Song authored
Currently bpf_get_current_pid_tgid() is allowed in tracing, cgroup and sk_msg progs, while bpf_get_ns_current_pid_tgid() is only allowed in tracing progs. We have an internal use case where, for an application running in a container (with pid namespace), the user wants to get the pid associated with the pid namespace in a cgroup bpf program. Currently, cgroup bpf progs already allow bpf_get_current_pid_tgid(). Let us allow bpf_get_ns_current_pid_tgid() as well. Auditing the code shows bpf_get_current_pid_tgid() is also used by the sk_msg prog. But there are no side effects to exposing these two helpers to all prog types, since they do not reveal any kernel-specific data. The detailed discussion is in [1]. So with this patch, both bpf_get_current_pid_tgid() and bpf_get_ns_current_pid_tgid() are put in bpf_base_func_proto(), making them available to all program types. [1] https://lore.kernel.org/bpf/20240307232659.1115872-1-yonghong.song@linux.dev/ Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20240315184854.2975190-1-yonghong.song@linux.dev
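The change effectively amounts to adding the two helpers' cases to bpf_base_func_proto(); a sketch (the proto symbols exist in the kernel, exact placement is illustrative):

  /* In bpf_base_func_proto(): */
  case BPF_FUNC_get_current_pid_tgid:
          return &bpf_get_current_pid_tgid_proto;
  case BPF_FUNC_get_ns_current_pid_tgid:
          return &bpf_get_ns_current_pid_tgid_proto;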
-
Jesper Dangaard Brouer authored
The BPF map type LPM (Longest Prefix Match) is used heavily in production by multiple products that have BPF components. Perf data shows trie_lookup_elem() and longest_prefix_match() being part of the kernel's perf top. For every level in the LPM tree, trie_lookup_elem() calls out to longest_prefix_match(). The compiler is free to inline this call, but chooses not to, because other slowpath callers (that can be invoked via syscall) exist, like trie_update_elem(), trie_delete_elem() or trie_get_next_key().

  bcc/tools/funccount -Ti 1 'trie_lookup_elem|longest_prefix_match.isra.0'
  FUNC                                  COUNT
  trie_lookup_elem                     664945
  longest_prefix_match.isra.0         8101507

Observation on a single random machine shows a factor of 12 between the two functions, matching an average of 12 levels in the trie being searched. This patch force-inlines longest_prefix_match(), but only for the lookup fastpath, to balance object instruction size. In production with AMD CPUs, measuring the function latency of 'trie_lookup_elem' (bcc/tools/funclatency), we are seeing a function latency reduction of 7-8% with this patch applied (to production kernels 6.6 and 6.1). Analyzing perf data, we can explain this rather large improvement by the reduced overhead of the AMD side-channel mitigation SRSO (Speculative Return Stack Overflow). Fixes: fb3bd914 ("x86/srso: Add a Speculative RAS Overflow mitigation") Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/171076828575.2141737.18370644069389889027.stgit@firesoul
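A sketch of the split described: keep an __always_inline copy for the lookup fastpath and an outlined wrapper for the syscall-path callers (names and signatures are illustrative, body elided):

  /* Inlined into trie_lookup_elem(): no call/return on the fastpath,
   * which also means no SRSO-mitigated return. */
  static __always_inline size_t __longest_prefix_match(const struct lpm_trie *trie,
                                                       const struct lpm_trie_node *node,
                                                       const struct bpf_lpm_trie_key_u8 *key)
  {
          /* ... compare prefix bits ... */
  }

  /* Outlined wrapper for trie_update_elem() and friends, so their
   * object code doesn't grow. */
  static size_t longest_prefix_match(const struct lpm_trie *trie,
                                     const struct lpm_trie_node *node,
                                     const struct bpf_lpm_trie_key_u8 *key)
  {
          return __longest_prefix_match(trie, node, key);
  }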
-
- 18 Mar, 2024 4 commits
-
-
Christophe Leroy authored
arch_protect_bpf_trampoline() and alloc_new_pack() call set_memory_rox(), which can fail, leaving memory unprotected. Take the return value of set_memory_rox() into account and add the __must_check flag to arch_protect_bpf_trampoline(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/fe1c163c83767fde5cab31d209a4a6be3ddb3a73.1710574353.git.christophe.leroy@csgroup.eu Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
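A sketch under the assumption that the prototype now returns int and the generic implementation stays a weak default:

  /* Declaration: callers must now check the result. */
  int __must_check arch_protect_bpf_trampoline(void *image, unsigned int size);

  /* Weak default: propagate a set_memory_rox() failure to the caller
   * instead of silently running from unprotected memory. */
  int __weak arch_protect_bpf_trampoline(void *image, unsigned int size)
  {
          return set_memory_rox((long)image, 1);
  }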
-
Christophe Leroy authored
Last user of arch_unprotect_bpf_trampoline() was removed by commit 187e2af0 ("bpf: struct_ops supports more than one page for trampolines."). Remove arch_unprotect_bpf_trampoline(). Reported-by: Daniel Borkmann <daniel@iogearbox.net> Fixes: 187e2af0 ("bpf: struct_ops supports more than one page for trampolines.") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Link: https://lore.kernel.org/r/42c635bb54d3af91db0f9b85d724c7c290069f67.1710574353.git.christophe.leroy@csgroup.eu Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Mykyta Yatsenko authored
libbpf creates bpf_program/bpf_map structs for each program/map that the user defines, but it allows disabling the creation/loading of those objects in the kernel, in which case they won't have an associated file descriptor (fd < 0). Such functionality is used for backward compatibility with some older kernels. Nothing prevents users from passing these maps or programs with no kernel counterpart to libbpf APIs. This change introduces explicit checks for kernel object existence, aiming to improve the visibility of those edge cases and provide meaningful warnings to users. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240318131808.95959-1-yatsenko@meta.com
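A sketch of the kind of check described, as it could look inside libbpf.c (helper name and message are illustrative, not necessarily the committed ones):

  static bool map_is_created(const struct bpf_map *map)
  {
          return map->fd >= 0;
  }

  /* e.g. at the top of an API that needs the kernel object: */
  if (!map_is_created(map)) {
          pr_warn("map '%s': can't use BPF map without FD (was it created?)\n",
                  map->name);
          return libbpf_err(-EINVAL);
  }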
-
Martin KaFai Lau authored
There is a "if (err)" check earlier, so the "if (err < 0)" check that this patch removing is unnecessary. It was my overlook when making adjustments to the bpf_struct_ops_prepare_trampoline() such that the caller does not have to worry about the new page when the function returns error. Fixes: 187e2af0 ("bpf: struct_ops supports more than one page for trampolines.") Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20240315192112.2825039-1-martin.lau@linux.dev
-
- 15 Mar, 2024 4 commits
-
-
Colin Ian King authored
There are statements with two semicolons. Remove the second one; it is redundant. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240315092654.2431062-1-colin.i.king@gmail.com
-
Christophe Leroy authored
set_memory_rox() can fail, leaving memory unprotected. Check the return value and bail out when bpf_jit_binary_lock_ro() returns an error. Link: https://github.com/KSPP/linux/issues/7 Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: linux-hardening@vger.kernel.org <linux-hardening@vger.kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Puranjay Mohan <puranjay12@gmail.com> Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> # s390x Acked-by: Tiezhu Yang <yangtiezhu@loongson.cn> # LoongArch Reviewed-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> # MIPS part Message-ID: <036b6393f23a2032ce75a1c92220b2afcb798d5d.1709850515.git.christophe.leroy@csgroup.eu> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
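A sketch of the call-site pattern this implies in a JIT, assuming bpf_jit_binary_lock_ro() now returns an error code (label and variable names are illustrative):

  if (bpf_jit_binary_lock_ro(header)) {
          /* Memory could not be protected: don't run from it. */
          bpf_jit_binary_free(header);
          prog = orig_prog;
          goto out;
  }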
-
Christophe Leroy authored
set_memory_ro() can fail, leaving memory unprotected. Check its return value and treat a failure as an error. Link: https://github.com/KSPP/linux/issues/7 Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: linux-hardening@vger.kernel.org <linux-hardening@vger.kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Message-ID: <286def78955e04382b227cb3e4b6ba272a7442e3.1709850515.git.christophe.leroy@csgroup.eu> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Copy over main program's sleepable bit into subprog's info. This might be important for, e.g., freplace cases. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Stanislav Fomichev <sdf@google.com> Message-ID: <20240314000127.3881569-1-andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 14 Mar, 2024 6 commits
-
-
Andrii Nakryiko authored
Kui-Feng Lee says: ==================== Ignore additional fields in the struct_ops maps in an updated version.

According to an offline discussion, it would be beneficial to implement a backward-compatible method for struct_ops types with additional fields that are not present in older kernels. This patchset accepts additional fields of a struct_ops map with all zero values even if these fields are not in the corresponding type in the kernel. This provides a way to be backward compatible. User space programs can use the same map on a machine running an old kernel by clearing fields that do not exist in the kernel. For example, in a test case, it adds an additional field "zeroed" that doesn't exist in struct bpf_testmod_ops of the kernel.

  struct bpf_testmod_ops___zeroed {
          int (*test_1)(void);
          void (*test_2)(int a, int b);
          int (*test_maybe_null)(int dummy, struct task_struct *task);
          int zeroed;
  };

  SEC(".struct_ops.link")
  struct bpf_testmod_ops___zeroed testmod_zeroed = {
          .test_1 = (void *)test_1,
          .test_2 = (void *)test_2_v2,
  };

Here, it doesn't assign a value to "zeroed" of testmod_zeroed, and by default the value of this field will be zero. So, the map will be accepted by libbpf, but libbpf will skip the "zeroed" field. However, if the "zeroed" field is assigned any value other than "0", libbpf will refuse to load this map.

---

Changes from v1:
  - Fix the issue about function pointer fields.
  - Change a warning message, and add an info message for skipping fields.
  - Add a small demo of additional arguments that are not in the function pointer prototype in the kernel.

v1: https://lore.kernel.org/all/20240312183245.341141-1-thinker.li@gmail.com/

Kui-Feng Lee (3):
  libbpf: Skip zeroed or null fields if not found in the kernel type.
  selftests/bpf: Ensure libbpf skip all-zeros fields of struct_ops maps.
  selftests/bpf: Accept extra arguments if they are not used.

 tools/lib/bpf/libbpf.c                              |  24 +++-
 .../bpf/prog_tests/test_struct_ops_module.c         | 103 ++++++++++++++++++
 .../bpf/progs/struct_ops_extra_arg.c                |  49 +++++++
 .../selftests/bpf/progs/struct_ops_module.c         |  16 ++-
 4 files changed, 186 insertions(+), 6 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_extra_arg.c
====================

Link: https://lore.kernel.org/r/20240313214139.685112-1-thinker.li@gmail.com Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
-
Kui-Feng Lee authored
A new version of a type may have additional fields that do not exist in older versions. Previously, libbpf would reject struct_ops maps with a new version containing extra fields when running on a machine with an old kernel. However, we have updated libbpf to ignore these fields if their values are all zeros or null in order to provide backward compatibility. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240313214139.685112-3-thinker.li@gmail.com
-
Kui-Feng Lee authored
Accept additional fields of a struct_ops type with all zero values even if these fields are not in the corresponding type in the kernel. This provides a way to be backward compatible. User space programs can use the same map on a machine running an old kernel by clearing fields that do not exist in the kernel. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240313214139.685112-2-thinker.li@gmail.com
-
Quentin Monnet authored
In bpf_object_load_prog(), there's no guarantee that obj->btf is non-NULL when passing it to btf__fd(), and this function does not perform any check before dereferencing its argument (as bpf_object__btf_fd() used to do). As a consequence, we get segmentation faults in bpftool (for example) when trying to load programs that come without BTF information. v2: Keep btf__fd() in the fix instead of reverting to bpf_object__btf_fd(). Fixes: df7c3f7d ("libbpf: make uniform use of btf__fd() accessor inside libbpf") Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Quentin Monnet <qmo@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240314150438.232462-1-qmo@kernel.org
-
Yonghong Song authored
Current 'bpftool link' command does not show pids, e.g.,

  $ tools/build/bpftool/bpftool link
  ...
  4: tracing  prog 23
          prog_type lsm  attach_type lsm_mac
          target_obj_id 1  target_btf_id 31320

Hack the following change to enable normal libbpf debug output:

  --- a/tools/bpf/bpftool/pids.c
  +++ b/tools/bpf/bpftool/pids.c
  @@ -121,9 +121,9 @@ int build_obj_refs_table(struct hashmap **map, enum bpf_obj_type type)
          /* we don't want output polluted with libbpf errors if bpf_iter is not
           * supported
           */
  -       default_print = libbpf_set_print(libbpf_print_none);
  +       /* default_print = libbpf_set_print(libbpf_print_none); */
          err = pid_iter_bpf__load(skel);
  -       libbpf_set_print(default_print);
  +       /* libbpf_set_print(default_print); */

Rerun the above bpftool command:

  $ tools/build/bpftool/bpftool link
  libbpf: prog 'iter': BPF program load failed: Permission denied
  libbpf: prog 'iter': -- BEGIN PROG LOAD LOG --
  0: R1=ctx() R10=fp0
  ; struct task_struct *task = ctx->task; @ pid_iter.bpf.c:69
  0: (79) r6 = *(u64 *)(r1 +8)          ; R1=ctx() R6_w=ptr_or_null_task_struct(id=1)
  ; struct file *file = ctx->file; @ pid_iter.bpf.c:68
  ...
  ; struct bpf_link *link = (struct bpf_link *) file->private_data; @ pid_iter.bpf.c:103
  80: (79) r3 = *(u64 *)(r8 +432)       ; R3_w=scalar() R8=ptr_file()
  ; if (link->type == bpf_core_enum_value(enum bpf_link_type___local, @ pid_iter.bpf.c:105
  81: (61) r1 = *(u32 *)(r3 +12)
  R3 invalid mem access 'scalar'
  processed 39 insns (limit 1000000) max_states_per_insn 0 total_states 3 peak_states 3 mark_read 2
  -- END PROG LOAD LOG --
  libbpf: prog 'iter': failed to load: -13
  ...

'file->private_data' has 'void *' type, and this caused the subsequent 'link->type' access (insn #81) to fail verification. To fix the issue, restore the previous BPF_CORE_READ so old kernels can also work. With this patch, 'bpftool link' runs successfully with 'pids':

  $ tools/build/bpftool/bpftool link
  ...
  4: tracing  prog 23
          prog_type lsm  attach_type lsm_mac
          target_obj_id 1  target_btf_id 31320
          pids systemd(1)

Fixes: 44ba7b30 ("bpftool: Use a local copy of BPF_LINK_TYPE_PERF_EVENT in pid_iter.bpf.c") Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Quentin Monnet <quentin@isovalent.com> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20240312023249.3776718-1-yonghong.song@linux.dev
-
Kui-Feng Lee authored
According to a report, skeletons fail to assign shadow pointers when being compiled with C++ programs. Unlike C, which implicitly converts void pointers, C++ requires an explicit cast. To support C++, do an explicit cast for each shadow pointer. Also add struct_ops_module.skel.h to test_cpp to validate C++ compilation as part of BPF selftests. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20240312013726.1780720-1-thinker.li@gmail.com
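For illustration, the generated shadow-pointer assignment changes shape roughly like this (a sketch, not the exact skeleton output; bpf_map__initial_value() is the real libbpf accessor, struct/field names are illustrative):

  size_t size;

  /* Before (valid C, rejected by a C++ compiler): implicit conversion
   * from the void* returned by bpf_map__initial_value(). */
  obj->struct_ops.testmod_1 = bpf_map__initial_value(obj->maps.testmod_1, &size);

  /* After: an explicit cast keeps the skeleton usable from C++ too. */
  obj->struct_ops.testmod_1 = (struct bpf_testmod_ops *)
          bpf_map__initial_value(obj->maps.testmod_1, &size);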
-