      Merge branch 'add-internal-only-bpf-per-cpu-instruction' · 519e1de9
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      Add internal-only BPF per-CPU instruction
      
      Add a new BPF instruction for resolving per-CPU memory addresses.
      
      The new instruction is a special form of BPF_ALU64 | BPF_MOV | BPF_X,
      with insn->off set to BPF_ADDR_PERCPU (== -1). It resolves the
      provided per-CPU offset to an absolute address where the per-CPU data
      resides for the current ("this") CPU.
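
      For illustration, here is a minimal sketch of how this encoding can
      be spelled with the usual struct bpf_insn initializer style (the
      macro name below is illustrative, not necessarily the exact one used
      in the patches):

        /* insn->off == BPF_ADDR_PERCPU marks the special MOV form */
        #define BPF_ADDR_PERCPU (-1)

        /* dst_reg = absolute per-CPU address resolved from src_reg */
        #define BPF_MOV64_PERCPU_REG(DST, SRC)                  \
                ((struct bpf_insn) {                            \
                        .code    = BPF_ALU64 | BPF_MOV | BPF_X, \
                        .dst_reg = DST,                         \
                        .src_reg = SRC,                         \
                        .off     = BPF_ADDR_PERCPU,             \
                        .imm     = 0 })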
      
      This patch set implements support for it in the x86-64 BPF JIT only.
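
      Conceptually, the JIT can lower this to a register move plus a single
      %gs-relative add of this_cpu_off (x86-64's per-CPU base offset). As a
      C-level sketch of the value being computed (not the literal emitted
      machine code):

        /* The emitted code is roughly:
         *     mov  dst, src
         *     add  dst, %gs:[this_cpu_off]
         * which computes:
         */
        static inline unsigned long percpu_resolve(unsigned long percpu_off)
        {
                return percpu_off + __this_cpu_read(this_cpu_off);
        }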
      
      Using the new instruction, we also implement inlining for three cases:
        - bpf_get_smp_processor_id(), which avoids an unnecessary trivial
          function call, saving a bit of performance and also not polluting
          LBR records with unnecessary function call/return records (see
          the instruction sketch after this list);
        - PERCPU_ARRAY's bpf_map_lookup_elem() is completely inlined,
          bringing its performance on par with per-CPU data structures
          implemented using global variables in BPF (which is an awesome
          improvement, see benchmarks below);
        - PERCPU_HASH's bpf_map_lookup_elem() is partially inlined, just
          like for the non-PERCPU HASH map; this still saves a bit of
          overhead.
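
      As a sketch of the first case (illustrative, assuming x86-64's
      per-CPU cpu_number field; not a verbatim quote of the patch), the
      verifier can rewrite the bpf_get_smp_processor_id() call into three
      instructions:

        /* r0 = per-CPU offset of the CPU-number field */
        insn_buf[0] = BPF_MOV32_IMM(BPF_REG_0,
                                    (u32)(unsigned long)&pcpu_hot.cpu_number);
        /* r0 = absolute address of that field for "this" CPU */
        insn_buf[1] = BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF_REG_0);
        /* r0 = *(u32 *)r0, i.e. the current CPU number */
        insn_buf[2] = BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 0);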
      
      To validate the performance benefits, I hacked together a tiny
      benchmark doing only bpf_map_lookup_elem() and incrementing the value
      by 1, for PERCPU_ARRAY (arr-inc benchmark below) and PERCPU_HASH
      (hash-inc benchmark below) maps. To establish a baseline, I also
      implemented similar logic based on a global variable array, using
      bpf_get_smp_processor_id() to index the array for the current CPU
      (glob-arr-inc benchmark below).
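
      A minimal sketch of the per-invocation logic of the arr-inc benchmark
      (map, section, and program names here are hypothetical; the actual
      benchmark is a separate tiny tool):

        #include <linux/bpf.h>
        #include <bpf/bpf_helpers.h>

        struct {
                __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
                __uint(max_entries, 1);
                __type(key, __u32);
                __type(value, __u64);
        } percpu_arr SEC(".maps");

        SEC("raw_tp/sys_enter")
        int arr_inc(void *ctx)
        {
                __u32 key = 0;
                __u64 *val = bpf_map_lookup_elem(&percpu_arr, &key);

                if (val)
                        (*val)++;       /* per-CPU slot, no atomics needed */
                return 0;
        }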
      
      BEFORE
      ======
      glob-arr-inc   :  163.685 ± 0.092M/s
      arr-inc        :  138.096 ± 0.160M/s
      hash-inc       :   66.855 ± 0.123M/s
      
      AFTER
      =====
      glob-arr-inc   :  173.921 ± 0.039M/s (+6%)
      arr-inc        :  170.729 ± 0.210M/s (+23.7%)
      hash-inc       :   68.673 ± 0.070M/s (+2.7%)
      
      As can be seen, PERCPU_HASH gets a modest +2.7% improvement, while the
      global array-based variant gets a nice +6% due to the inlining of
      bpf_get_smp_processor_id().

      But what's really important is that the arr-inc benchmark basically
      catches up with glob-arr-inc, resulting in a +23.7% improvement. This
      means that in practice it won't be necessary to avoid PERCPU_ARRAY
      anymore if performance is critical (e.g., high-frequency stats
      collection, which is often a practical use for PERCPU_ARRAY today).
      
      v1->v2:
        - use BPF_ALU64 | BPF_MOV instruction instead of LDX (Alexei);
        - dropped the direct per-CPU memory read instruction; it can always
          be added back, if necessary;
        - guarded bpf_get_smp_processor_id() behind x86-64 check (Alexei);
        - switched all per-cpu addr casts to (unsigned long) to avoid sparse
          warnings.
      ====================
      
      Link: https://lore.kernel.org/r/20240402021307.1012571-1-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>