Commits · 830154cdc57971b06f81d4ffc39b868e3d7693de · Kirill Smelkov / linux

22 Mar, 2023 7 commits

bpf/selftests: coverage for bpf_map_ops errors · 830154cd

JP Kobryn authored Mar 22, 2023

These tests expose the issue of being unable to properly check for errors
returned from inlined bpf map helpers that make calls to the bpf_map_ops
functions. At best, a check for zero or non-zero can be done but these
tests show it is not possible to check for a negative value or for a
specific error value.
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230322194754.185781-2-inwardvessel@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

830154cd

Merge branch 'bpf: Support ksym detection in light skeleton.' · d9d93f3b

Andrii Nakryiko authored Mar 22, 2023

Alexei Starovoitov says:

====================

From: Alexei Starovoitov <ast@kernel.org>

v1->v2: update denylist on s390

Patch 1: Cleanup internal libbpf names.
Patch 2: Teach the verifier that rdonly_mem != NULL.
Patch 3: Fix gen_loader to support ksym detection.
Patch 4: Selftest and update denylist.
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

d9d93f3b

selftests/bpf: Add light skeleton test for kfunc detection. · 3b2ec214

Alexei Starovoitov authored Mar 21, 2023

Add light skeleton test for kfunc detection and denylist it for s390.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230321203854.3035-5-alexei.starovoitov@gmail.com

3b2ec214

libbpf: Support kfunc detection in light skeleton. · 708cdc57

Alexei Starovoitov authored Mar 21, 2023

Teach gen_loader to find {btf_id, btf_obj_fd} of kernel variables and kfuncs
and populate corresponding ld_imm64 and bpf_call insns.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230321203854.3035-4-alexei.starovoitov@gmail.com

708cdc57

bpf: Teach the verifier to recognize rdonly_mem as not null. · 1057d299

Alexei Starovoitov authored Mar 21, 2023

Teach the verifier to recognize PTR_TO_MEM | MEM_RDONLY as not NULL
otherwise if (!bpf_ksym_exists(known_kfunc)) doesn't go through
dead code elimination.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230321203854.3035-3-alexei.starovoitov@gmail.com

1057d299

libbpf: Rename RELO_EXTERN_VAR/FUNC. · a18f7214

Alexei Starovoitov authored Mar 21, 2023

RELO_EXTERN_VAR/FUNC names are not correct anymore. RELO_EXTERN_VAR represent
ksym symbol in ld_imm64 insn. It can point to kernel variable or kfunc.
Rename RELO_EXTERN_VAR->RELO_EXTERN_LD64 and RELO_EXTERN_FUNC->RELO_EXTERN_CALL
to match what they actually represent.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230321203854.3035-2-alexei.starovoitov@gmail.com

a18f7214

selftests/xsk: add xdp populate metadata test · 9a321fd3

Tushar Vyavahare authored Mar 20, 2023

Add a new test in copy-mode for testing the copying of metadata from the
buffer in kernel-space to user-space. This is accomplished by adding a
new XDP program and using the bss map to store a counter that is written
to the metadata field. This counter is incremented for every packet so
that the number becomes unique and should be the same as the payload. It
is store in the bss so the value can be reset between runs.

The XDP program populates the metadata and the userspace program checks
the value stored in the metadata field against the payload using the new
is_metadata_correct() function. To turn this verification on or off, add
a new parameter (use_metadata) to the ifobject structure.
Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230320102705.306187-1-tushar.vyavahare@intel.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

9a321fd3

21 Mar, 2023 4 commits

Merge branch 'net: skbuff: skb bitfield compaction - bpf' · 6a9f5cdb

Martin KaFai Lau authored Mar 20, 2023

Jakub Kicinski says:

====================

I'm trying to make more of the sk_buff bits optional.
Move the BPF-accessed bits a little - because they must
be at coding-time-constant offsets they must precede any
optional bit. While at it clean up the naming a bit.

v1: https://lore.kernel.org/all/20230308003159.441580-1-kuba@kernel.org/
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

6a9f5cdb

net: skbuff: move the fields BPF cares about directly next to the offset marker · c0ba8611

Jakub Kicinski authored Mar 20, 2023

To avoid more possible BPF dependencies with moving bitfields
around keep the fields BPF cares about right next to the offset
marker.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230321014115.997841-4-kuba@kernel.orgSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>

c0ba8611

net: skbuff: reorder bytes 2 and 3 of the bitfield · b94e032b

Jakub Kicinski authored Mar 20, 2023

BPF needs to know the offsets of fields it tries to access.
Zero-length fields are added to make offsetof() work.
This unfortunately partitions the bitfield (fields across
the zero-length members can't be coalesced).

Reorder bytes 2 and 3, BPF needs to know the offset of fields
previously in byte 3 and some fields in byte 2 should really
be optional.

The two bytes are always in the same cacheline so it should
not matter.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230321014115.997841-3-kuba@kernel.orgSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>

b94e032b

net: skbuff: rename __pkt_vlan_present_offset to __mono_tc_offset · 04aae213

Jakub Kicinski authored Mar 20, 2023

vlan_present is gone since
commit 354259fa ("net: remove skb->vlan_present")
rename the offset field to what BPF is currently looking
for in this byte - mono_delivery_time and tc_at_ingress.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230321014115.997841-2-kuba@kernel.orgSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>

04aae213

20 Mar, 2023 3 commits

libbpf: Explicitly call write to append content to file · 01dc26c9

Liu Pan authored Mar 20, 2023

Write data to fd by calling "vdprintf", in most implementations
of the standard library, the data is finally written by the writev syscall.
But "uprobe_events/kprobe_events" does not allow segmented writes,
so switch the "append_to_file" function to explicit write() call.
Signed-off-by: Liu Pan <patteliu@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230320030720.650-1-patteliu@gmail.com

01dc26c9

selftest/bpf: Add a test case for ld_imm64 copy logic. · bb4a6a92

Alexei Starovoitov authored Mar 19, 2023

Add a test case to exercise {btf_id, btf_obj_fd} copy logic between ld_imm64 insns.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230319203014.55866-2-alexei.starovoitov@gmail.com

bb4a6a92

libbpf: Fix ld_imm64 copy logic for ksym in light skeleton. · a506d6ce

Alexei Starovoitov authored Mar 19, 2023

Unlike normal libbpf the light skeleton 'loader' program is doing
btf_find_by_name_kind() call at run-time to find ksym in the kernel and
populate its {btf_id, btf_obj_fd} pair in ld_imm64 insn. To avoid doing the
search multiple times for the same ksym it remembers the first patched ld_imm64
insn and copies {btf_id, btf_obj_fd} from it into subsequent ld_imm64 insn.
Fix a bug in copying logic, since it may incorrectly clear BPF_PSEUDO_BTF_ID flag.

Also replace always true if (btf_obj_fd >= 0) check with unconditional JMP_JA
to clarify the code.

Fixes: d995816b ("libbpf: Avoid reload of imm for weak, unresolved, repeating ksym")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230319203014.55866-1-alexei.starovoitov@gmail.com

a506d6ce

18 Mar, 2023 1 commit

bpf, docs: Libbpf overview documentation · 08ff1c9f

Sreevani Sreejith authored Mar 15, 2023

This patch documents overview of libbpf, including its features for
developing BPF programs.
Signed-off-by: Sreevani Sreejith <ssreevani@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230315195405.2051559-1-ssreevani@meta.com

08ff1c9f

17 Mar, 2023 11 commits

selftests/bpf: Add --json-summary option to test_progs · 2be7aa76

Manu Bretelle authored Mar 17, 2023

Currently, test_progs outputs all stdout/stderr as it runs, and when it
is done, prints a summary.

It is non-trivial for tooling to parse that output and extract meaningful
information from it.

This change adds a new option, `--json-summary`/`-J` that let the caller
specify a file where `test_progs{,-no_alu32}` can write a summary of the
run in a json format that can later be parsed by tooling.

Currently, it creates a summary section with successes/skipped/failures
followed by a list of failed tests and subtests.

A test contains the following fields:
- name: the name of the test
- number: the number of the test
- message: the log message that was printed by the test.
- failed: A boolean indicating whether the test failed or not. Currently
we only output failed tests, but in the future, successful tests could
be added.
- subtests: A list of subtests associated with this test.

A subtest contains the following fields:
- name: same as above
- number: sanme as above
- message: the log message that was printed by the subtest.
- failed: same as above but for the subtest

An example run and json content below:
```
$ sudo ./test_progs -a $(grep -v '^#' ./DENYLIST.aarch64 | awk '{print
$1","}' | tr -d '\n') -j -J /tmp/test_progs.json
$ jq < /tmp/test_progs.json | head -n 30
{
  "success": 29,
  "success_subtest": 23,
  "skipped": 3,
  "failed": 28,
  "results": [
    {
      "name": "bpf_cookie",
      "number": 10,
      "message": "test_bpf_cookie:PASS:skel_open 0 nsec\n",
      "failed": true,
      "subtests": [
        {
          "name": "multi_kprobe_link_api",
          "number": 2,
          "message": "kprobe_multi_link_api_subtest:PASS:load_kallsyms 0 nsec\nlibbpf: extern 'bpf_testmod_fentry_test1' (strong): not resolved\nlibbpf: failed to load object 'kprobe_multi'\nlibbpf: failed to load BPF skeleton 'kprobe_multi': -3\nkprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3\n",
          "failed": true
        },
        {
          "name": "multi_kprobe_attach_api",
          "number": 3,
          "message": "libbpf: extern 'bpf_testmod_fentry_test1' (strong): not resolved\nlibbpf: failed to load object 'kprobe_multi'\nlibbpf: failed to load BPF skeleton 'kprobe_multi': -3\nkprobe_multi_attach_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3\n",
          "failed": true
        },
        {
          "name": "lsm",
          "number": 8,
          "message": "lsm_subtest:PASS:lsm.link_create 0 nsec\nlsm_subtest:FAIL:stack_mprotect unexpected stack_mprotect: actual 0 != expected -1\n",
          "failed": true
        }
```

The file can then be used to print a summary of the test run and list of
failing tests/subtests:

```
$ jq -r < /tmp/test_progs.json '"Success: \(.success)/\(.success_subtest), Skipped: \(.skipped), Failed: \(.failed)"'

Success: 29/23, Skipped: 3, Failed: 28
$ jq -r < /tmp/test_progs.json '.results | map([
    if .failed then "#\(.number) \(.name)" else empty end,
    (
        . as {name: $tname, number: $tnum} | .subtests | map(
            if .failed then "#\($tnum)/\(.number) \($tname)/\(.name)" else empty end
        )
    )
]) | flatten | .[]' | head -n 20
 #10 bpf_cookie
 #10/2 bpf_cookie/multi_kprobe_link_api
 #10/3 bpf_cookie/multi_kprobe_attach_api
 #10/8 bpf_cookie/lsm
 #15 bpf_mod_race
 #15/1 bpf_mod_race/ksym (used_btfs UAF)
 #15/2 bpf_mod_race/kfunc (kfunc_btf_tab UAF)
 #36 cgroup_hierarchical_stats
 #61 deny_namespace
 #61/1 deny_namespace/unpriv_userns_create_no_bpf
 #73 fexit_stress
 #83 get_func_ip_test
 #99 kfunc_dynptr_param
 #99/1 kfunc_dynptr_param/dynptr_data_null
 #99/4 kfunc_dynptr_param/dynptr_data_null
 #100 kprobe_multi_bench_attach
 #100/1 kprobe_multi_bench_attach/kernel
 #100/2 kprobe_multi_bench_attach/modules
 #101 kprobe_multi_test
 #101/1 kprobe_multi_test/skel_api
```
Signed-off-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230317163256.3809328-1-chantr4@gmail.com

2be7aa76

Merge branch 'bpf: Add detection of kfuncs.' · 6cae5a71

Andrii Nakryiko authored Mar 17, 2023

Alexei Starovoitov says:

====================

From: Alexei Starovoitov <ast@kernel.org>

Allow BPF programs detect at load time whether particular kfunc exists.

Patch 1: Allow ld_imm64 to point to kfunc in the kernel.
Patch 2: Fix relocation of kfunc in ld_imm64 insn when kfunc is in kernel module.
Patch 3: Introduce bpf_ksym_exists() macro.
Patch 4: selftest.

NOTE: detection of kfuncs from light skeleton is not supported yet.
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

6cae5a71

selftests/bpf: Add test for bpf_ksym_exists(). · 95fdf6e3

Alexei Starovoitov authored Mar 17, 2023

Add load and run time test for bpf_ksym_exists() and check that the verifier
performs dead code elimination for non-existing kfunc.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20230317201920.62030-5-alexei.starovoitov@gmail.com

95fdf6e3

libbpf: Introduce bpf_ksym_exists() macro. · 5cbd3fe3

Alexei Starovoitov authored Mar 17, 2023

Introduce bpf_ksym_exists() macro that can be used by BPF programs
to detect at load time whether particular ksym (either variable or kfunc)
is present in the kernel.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230317201920.62030-4-alexei.starovoitov@gmail.com

5cbd3fe3

libbpf: Fix relocation of kfunc ksym in ld_imm64 insn. · 5fc13ad5

Alexei Starovoitov authored Mar 17, 2023

void *p = kfunc; -> generates ld_imm64 insn.
kfunc() -> generates bpf_call insn.

libbpf patches bpf_call insn correctly while only btf_id part of ld_imm64 is
set in the former case. Which means that pointers to kfuncs in modules are not
patched correctly and the verifier rejects load of such programs due to btf_id
being out of range. Fix libbpf to patch ld_imm64 for kfunc.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230317201920.62030-3-alexei.starovoitov@gmail.com

5fc13ad5

bpf: Allow ld_imm64 instruction to point to kfunc. · 58aa2afb

Alexei Starovoitov authored Mar 17, 2023

Allow ld_imm64 insn with BPF_PSEUDO_BTF_ID to hold the address of kfunc. The
ld_imm64 pointing to a valid kfunc will be seen as non-null PTR_TO_MEM by
is_branch_taken() logic of the verifier, while libbpf will resolve address to
unknown kfunc as ld_imm64 reg, 0 which will also be recognized by
is_branch_taken() and the verifier will proceed dead code elimination. BPF
programs can use this logic to detect at load time whether kfunc is present in
the kernel with bpf_ksym_exists() macro that is introduced in the next patches.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20230317201920.62030-2-alexei.starovoitov@gmail.com

58aa2afb

bpf, docs: Use internal linking for link to netdev subsystem doc · 0f10f647

Bagas Sanjaya authored Mar 14, 2023

Commit d56b0c46 ("bpf, docs: Fix link to netdev-FAQ target")
attempts to fix linking problem to undefined "netdev-FAQ" label
introduced in 287f4fa9 ("docs: Update references to netdev-FAQ")
by changing internal cross reference to netdev subsystem documentation
(Documentation/process/maintainer-netdev.rst) to external one at
docs.kernel.org. However, the linking problem is still not
resolved, as the generated link points to non-existent netdev-FAQ
section of the external doc, which when clicked, will instead going
to the top of the doc.

Revert back to internal linking by simply mention the doc path while
massaging the leading text to the link, since the netdev subsystem
doc contains no FAQs but rather general information about the subsystem.

Fixes: d56b0c46 ("bpf, docs: Fix link to netdev-FAQ target")
Fixes: 287f4fa9 ("docs: Update references to netdev-FAQ")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230314074449.23620-1-bagasdotme@gmail.com

0f10f647

kallsyms, bpf: Move find_kallsyms_symbol_value out of internal header · bd5314f8

Viktor Malik authored Mar 17, 2023

Moving find_kallsyms_symbol_value from kernel/module/internal.h to
include/linux/module.h. The reason is that internal.h is not prepared to
be included when CONFIG_MODULES=n. find_kallsyms_symbol_value is used by
kernel/bpf/verifier.c and including internal.h from it (without modules)
leads into a compilation error:

  In file included from ../include/linux/container_of.h:5,
                   from ../include/linux/list.h:5,
                   from ../include/linux/timer.h:5,
                   from ../include/linux/workqueue.h:9,
                   from ../include/linux/bpf.h:10,
                   from ../include/linux/bpf-cgroup.h:5,
                   from ../kernel/bpf/verifier.c:7:
  ../kernel/bpf/../module/internal.h: In function 'mod_find':
  ../include/linux/container_of.h:20:54: error: invalid use of undefined type 'struct module'
     20 |         static_assert(__same_type(*(ptr), ((type *)0)->member) ||       \
        |                                                      ^~
  [...]

This patch fixes the above error.

Fixes: 31bf1dbc ("bpf: Fix attaching fentry/fexit/fmod_ret/lsm to modules")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/oe-kbuild-all/202303161404.OrmfCy09-lkp@intel.com/
Link: https://lore.kernel.org/bpf/20230317095601.386738-1-vmalik@redhat.com

bd5314f8

Merge branch 'double-fix bpf_test_run + XDP_PASS recycling' · 94bbbdfb

Alexei Starovoitov authored Mar 16, 2023

Alexander Lobakin says:

====================

Enabling skb PP recycling revealed a couple issues in the bpf_test_run
code. Recycling broke the assumption that the headroom won't ever be
touched during the test_run execution: xdp_scrub_frame() invalidates the
XDP frame at the headroom start, while neigh xmit code overwrites 2 bytes
to the left of the Ethernet header. The first makes the kernel panic in
certain cases, while the second breaks xdp_do_redirect selftest on BE.
test_run is a limited-scope entity, so let's hope no more corner cases
will happen here or at least they will be as easy and pleasant to fix
as those two.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

94bbbdfb

selftests/bpf: fix "metadata marker" getting overwritten by the netstack · 5640b6d8

Alexander Lobakin authored Mar 16, 2023

Alexei noticed xdp_do_redirect test on BPF CI started failing on
BE systems after skb PP recycling was enabled:

test_xdp_do_redirect:PASS:prog_run 0 nsec
test_xdp_do_redirect:PASS:pkt_count_xdp 0 nsec
test_xdp_do_redirect:PASS:pkt_count_zero 0 nsec
test_xdp_do_redirect:FAIL:pkt_count_tc unexpected pkt_count_tc: actual
220 != expected 9998
test_max_pkt_size:PASS:prog_run_max_size 0 nsec
test_max_pkt_size:PASS:prog_run_too_big 0 nsec
close_netns:PASS:setns 0 nsec
 #289 xdp_do_redirect:FAIL
Summary: 270/1674 PASSED, 30 SKIPPED, 1 FAILED

and it doesn't happen on LE systems.
Ilya then hunted it down to:

 #0  0x0000000000aaeee6 in neigh_hh_output (hh=0x83258df0,
skb=0x88142200) at linux/include/net/neighbour.h:503
 #1  0x0000000000ab2cda in neigh_output (skip_cache=false,
skb=0x88142200, n=<optimized out>) at linux/include/net/neighbour.h:544
 #2  ip6_finish_output2 (net=net@entry=0x88edba00, sk=sk@entry=0x0,
skb=skb@entry=0x88142200) at linux/net/ipv6/ip6_output.c:134
 #3  0x0000000000ab4cbc in __ip6_finish_output (skb=0x88142200, sk=0x0,
net=0x88edba00) at linux/net/ipv6/ip6_output.c:195
 #4  ip6_finish_output (net=0x88edba00, sk=0x0, skb=0x88142200) at
linux/net/ipv6/ip6_output.c:206

xdp_do_redirect test places a u32 marker (0x42) right before the Ethernet
header to check it then in the XDP program and return %XDP_ABORTED if it's
not there. Neigh xmit code likes to round up hard header length to speed
up copying the header, so it overwrites two bytes in front of the Eth
header. On LE systems, 0x42 is one byte at `data - 4`, while on BE it's
`data - 1`, what explains why it happens only there.
It didn't happen previously due to that %XDP_PASS meant the page will be
discarded and replaced by a new one, but now it can be recycled as well,
while bpf_test_run code doesn't reinitialize the content of recycled
pages. This mark is limited to this particular test and its setup though,
so there's no need to predict 1000 different possible cases. Just move
it 4 bytes to the left, still keeping it 32 bit to match on more bytes.

Fixes: 9c94bbf9 ("xdp: recycle Page Pool backed skbs built from XDP frames")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/CAADnVQ+B_JOU+EpP=DKhbY9yXdN6GiRPnpTTXfEZ9sNkUeb-yQ@mail.gmail.com
Reported-by: Ilya Leoshkevich <iii@linux.ibm.com> # + debugging
Link: https://lore.kernel.org/bpf/8341c1d9f935f410438e79d3bd8a9cc50aefe105.camel@linux.ibm.comSigned-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20230316175051.922550-3-aleksander.lobakin@intel.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

5640b6d8

bpf, test_run: fix crashes due to XDP frame overwriting/corruption · e5995bc7

Alexander Lobakin authored Mar 16, 2023

syzbot and Ilya faced the splats when %XDP_PASS happens for bpf_test_run
after skb PP recycling was enabled for {__,}xdp_build_skb_from_frame():

BUG: kernel NULL pointer dereference, address: 0000000000000d28
RIP: 0010:memset_erms+0xd/0x20 arch/x86/lib/memset_64.S:66
[...]
Call Trace:
 <TASK>
 __finalize_skb_around net/core/skbuff.c:321 [inline]
 __build_skb_around+0x232/0x3a0 net/core/skbuff.c:379
 build_skb_around+0x32/0x290 net/core/skbuff.c:444
 __xdp_build_skb_from_frame+0x121/0x760 net/core/xdp.c:622
 xdp_recv_frames net/bpf/test_run.c:248 [inline]
 xdp_test_run_batch net/bpf/test_run.c:334 [inline]
 bpf_test_run_xdp_live+0x1289/0x1930 net/bpf/test_run.c:362
 bpf_prog_test_run_xdp+0xa05/0x14e0 net/bpf/test_run.c:1418
[...]

This happens due to that it calls xdp_scrub_frame(), which nullifies
xdpf->data. bpf_test_run code doesn't reinit the frame when the XDP
program doesn't adjust head or tail. Previously, %XDP_PASS meant the
page will be released from the pool and returned to the MM layer, but
now it does return to the Pool with the nullified xdpf->data, which
doesn't get reinitialized then.
So, in addition to checking whether the head and/or tail have been
adjusted, check also for a potential XDP frame corruption. xdpf->data
is 100% affected and also xdpf->flags is the field closest to the
metadata / frame start. Checking for these two should be enough for
non-extreme cases.

Fixes: 9c94bbf9 ("xdp: recycle Page Pool backed skbs built from XDP frames")
Reported-by: syzbot+e1d1b65f7c32f2a86a9f@syzkaller.appspotmail.com
Link: https://lore.kernel.org/bpf/000000000000f1985705f6ef2243@google.comReported-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/bpf/e07dd94022ad5731705891b9487cc9ed66328b94.camel@linux.ibm.comSigned-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20230316175051.922550-2-aleksander.lobakin@intel.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

e5995bc7

16 Mar, 2023 13 commits

bpf: Remove misleading spec_v1 check on var-offset stack read · 082cdc69

Luis Gerhorst authored Mar 15, 2023

For every BPF_ADD/SUB involving a pointer, adjust_ptr_min_max_vals()
ensures that the resulting pointer has a constant offset if
bypass_spec_v1 is false. This is ensured by calling sanitize_check_bounds()
which in turn calls check_stack_access_for_ptr_arithmetic(). There,
-EACCESS is returned if the register's offset is not constant, thereby
rejecting the program.

In summary, an unprivileged user must never be able to create stack
pointers with a variable offset. That is also the case, because a
respective check in check_stack_write() is missing. If they were able
to create a variable-offset pointer, users could still use it in a
stack-write operation to trigger unsafe speculative behavior [1].

Because unprivileged users must already be prevented from creating
variable-offset stack pointers, viable options are to either remove
this check (replacing it with a clarifying comment), or to turn it
into a "verifier BUG"-message, also adding a similar check in
check_stack_write() (for consistency, as a second-level defense).
This patch implements the first option to reduce verifier bloat.

This check was introduced by commit 01f810ac ("bpf: Allow
variable-offset stack access") which correctly notes that
"variable-offset reads and writes are disallowed (they were already
disallowed for the indirect access case) because the speculative
execution checking code doesn't support them". However, it does not
further discuss why the check in check_stack_read() is necessary.
The code which made this check obsolete was also introduced in this
commit.

I have compiled ~650 programs from the Linux selftests, Linux samples,
Cilium, and libbpf/examples projects and confirmed that none of these
trigger the check in check_stack_read() [2]. Instead, all of these
programs are, as expected, already rejected when constructing the
variable-offset pointers. Note that the check in
check_stack_access_for_ptr_arithmetic() also prints "off=%d" while the
code removed by this patch does not (the error removed does not appear
in the "verification_error" values). For reproducibility, the
repository linked includes the raw data and scripts used to create
the plot.

  [1] https://arxiv.org/pdf/1807.03757.pdf
  [2] https://gitlab.cs.fau.de/un65esoq/bpf-spectre/-/raw/53dc19fcf459c186613b1156a81504b39c8d49db/data/plots/23-02-26_23-56_bpftool/bpftool/0004-errors.pdf?inline=false

Fixes: 01f810ac ("bpf: Allow variable-offset stack access")
Signed-off-by: Luis Gerhorst <gerhorst@cs.fau.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230315165358.23701-1-gerhorst@cs.fau.de

082cdc69

Merge branch 'Make struct bpf_cpumask RCU safe' · deb9fd64

Alexei Starovoitov authored Mar 16, 2023

David Vernet says:

====================

The struct bpf_cpumask type is currently not RCU safe. It uses the
bpf_mem_cache_{alloc,free}() APIs to allocate and release cpumasks, and
those allocations may be reused before an RCU grace period has elapsed.
We want to be able to enable using this pattern in BPF programs:

private(MASK) static struct bpf_cpumask __kptr *global;

int BPF_PROG(prog, ...)
{
	struct bpf_cpumask *cpumask;

	bpf_rcu_read_lock();
	cpumask = global;
	if (!cpumask) {
		bpf_rcu_read_unlock();
		return -1;
	}
	bpf_cpumask_setall(cpumask);
	...
	bpf_rcu_read_unlock();
}

In other words, to be able to pass a kptr to KF_RCU bpf_cpumask kfuncs
without requiring the acquisition and release of refcounts using
bpf_cpumask_kptr_get(). This patchset enables this by making the struct
bpf_cpumask type RCU safe, and removing the bpf_cpumask_kptr_get()
function.
---
v1: https://lore.kernel.org/all/20230316014122.678082-2-void@manifault.com/

Changelog:
----------
v1 -> v2:
- Add doxygen comment for new @rcu field in struct bpf_cpumask.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

deb9fd64

bpf,docs: Remove bpf_cpumask_kptr_get() from documentation · fec2c6d1

David Vernet authored Mar 16, 2023

Now that the kfunc no longer exists, we can remove it and instead
describe how RCU can be used to get a struct bpf_cpumask from a map
value. This patch updates the BPF documentation accordingly.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230316054028.88924-6-void@manifault.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

fec2c6d1

bpf: Remove bpf_cpumask_kptr_get() kfunc · 1b403ce7

David Vernet authored Mar 16, 2023

Now that struct bpf_cpumask is RCU safe, there's no need for this kfunc.
Rather than doing the following:

private(MASK) static struct bpf_cpumask __kptr *global;

int BPF_PROG(prog, s32 cpu, ...)
{
	struct bpf_cpumask *cpumask;

	bpf_rcu_read_lock();
	cpumask = bpf_cpumask_kptr_get(&global);
	if (!cpumask) {
		bpf_rcu_read_unlock();
		return -1;
	}
	bpf_cpumask_setall(cpumask);
	...
	bpf_cpumask_release(cpumask);
	bpf_rcu_read_unlock();
}

Programs can instead simply do (assume same global cpumask):

int BPF_PROG(prog, ...)
{
	struct bpf_cpumask *cpumask;

	bpf_rcu_read_lock();
	cpumask = global;
	if (!cpumask) {
		bpf_rcu_read_unlock();
		return -1;
	}
	bpf_cpumask_setall(cpumask);
	...
	bpf_rcu_read_unlock();
}

In other words, no extra atomic acquire / release, and less boilerplate
code.

This patch removes both the kfunc, as well as its selftests and
documentation.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230316054028.88924-5-void@manifault.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

1b403ce7

bpf/selftests: Test using global cpumask kptr with RCU · a5a197df

David Vernet authored Mar 16, 2023

Now that struct bpf_cpumask * is considered an RCU-safe type according
to the verifier, we should add tests that validate its common usages.
This patch adds those tests to the cpumask test suite. A subsequent
changes will remove bpf_cpumask_kptr_get(), and will adjust the selftest
and BPF documentation accordingly.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230316054028.88924-4-void@manifault.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

a5a197df

bpf: Mark struct bpf_cpumask as rcu protected · 63d2d83d

David Vernet authored Mar 16, 2023

struct bpf_cpumask is a BPF-wrapper around the struct cpumask type which
can be instantiated by a BPF program, and then queried as a cpumask in
similar fashion to normal kernel code. The previous patch in this series
makes the type fully RCU safe, so the type can be included in the
rcu_protected_type BTF ID list.

A subsequent patch will remove bpf_cpumask_kptr_get(), as it's no longer
useful now that we can just treat the type as RCU safe by default and do
our own if check.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230316054028.88924-3-void@manifault.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

63d2d83d

bpf: Free struct bpf_cpumask in call_rcu handler · 77473d1a

David Vernet authored Mar 16, 2023

The struct bpf_cpumask type uses the bpf_mem_cache_{alloc,free}() APIs
to allocate and free its cpumasks. The bpf_mem allocator may currently
immediately reuse some memory when its freed, without waiting for an RCU
read cycle to elapse. We want to be able to treat struct bpf_cpumask
objects as completely RCU safe.

This is necessary for two reasons:

1. bpf_cpumask_kptr_get() currently does an RCU-protected
refcnt_inc_not_zero(). This of course assumes that the underlying
memory is not reused, and is therefore unsafe in its current form.

2. We want to be able to get rid of bpf_cpumask_kptr_get() entirely, and
intead use the superior kptr RCU semantics now afforded by the
verifier.

This patch fixes (1), and enables (2), by making struct bpf_cpumask RCU
safe. A subsequent patch will update the verifier to allow struct
bpf_cpumask * pointers to be passed to KF_RCU kfuncs, and then a latter
patch will remove bpf_cpumask_kptr_get().

Fixes: 516f4d33 ("bpf: Enable cpumasks to be queried and used as kptrs")
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230316054028.88924-2-void@manifault.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

77473d1a

libbpf: Ignore warnings about "inefficient alignment" · 6cb9430b

Daniel Müller authored Mar 15, 2023

Some consumers of libbpf compile the code base with different warnings
enabled. In a report for perf, for example, -Wpacked was set which
caused warnings about "inefficient alignment" to be emitted on a subset
of supported architectures.

With this change we silence specifically those warnings, as we intentionally
worked with packed structs.

This is a similar resolution as in b2f10cd4 ("perf cpumap: Fix alignment
for masks in event encoding").

Fixes: 1eebcb60 ("libbpf: Implement basic zip archive parsing support")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/bpf/CA+G9fYtBnwxAWXi2+GyNByApxnf_DtP1-6+_zOKAdJKnJBexjg@mail.gmail.com/
Link: https://lore.kernel.org/bpf/20230315171550.1551603-1-deso@posteo.net

6cb9430b

selftests/bpf: Fix a fd leak in an error path in network_helpers.c · 226efec2

Martin KaFai Lau authored Mar 15, 2023

In __start_server, it leaks a fd when setsockopt(SO_REUSEPORT) fails.
This patch fixes it.

Fixes: eed92afd ("bpf: selftest: Test batching and bpf_(get|set)sockopt in bpf tcp iter")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20230316000726.1016773-2-martin.lau@linux.dev

226efec2

selftests/bpf: Use ASSERT_EQ instead ASSERT_OK for testing memcmp result · ed01385c

Martin KaFai Lau authored Mar 15, 2023

In tcp_hdr_options test, it ensures the received tcp hdr option
and the sk local storage have the expected values. It uses memcmp
to check that. Testing the memcmp result with ASSERT_OK is confusing
because ASSERT_OK will print out the errno which is not set.
This patch uses ASSERT_EQ to check for 0 instead.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20230316000726.1016773-1-martin.lau@linux.dev

ed01385c

Merge branch 'Fix attaching fentry/fexit/fmod_ret/lsm to modules' · 72fe61d7

Alexei Starovoitov authored Mar 15, 2023

Viktor Malik says:

====================

I noticed that the verifier behaves incorrectly when attaching to fentry
of multiple functions of the same name located in different modules (or
in vmlinux). The reason for this is that if the target program is not
specified, the verifier will search kallsyms for the trampoline address
to attach to. The entire kallsyms is always searched, not respecting the
module in which the function to attach to is located.

As Yonghong correctly pointed out, there is yet another issue - the
trampoline acquires the module reference in register_fentry which means
that if the module is unloaded between the place where the address is
found in the verifier and register_fentry, it is possible that another
module is loaded to the same address in the meantime, which may lead to
errors.

This patch fixes the above issues by extracting the module name from the
BTF of the attachment target (which must be specified) and by doing the
search in kallsyms of the correct module. At the same time, the module
reference is acquired right after the address is found and only released
right before the program itself is unloaded.
---
Changes in v10:
- added the new test to DENYLIST.aarch64 (suggested by Andrii)
- renamed the test source file to match the test name

Changes in v9:
- two small changes suggested by Jiri Olsa and Jiri's ack

Changes in v8:
- added module_put to error paths in bpf_check_attach_target after the
  module reference is acquired

Changes in v7:
- refactored the module reference manipulation (comments by Jiri Olsa)
- cleaned up the test (comments by Andrii Nakryiko)

Changes in v6:
- storing the module reference inside bpf_prog_aux instead of
  bpf_trampoline and releasing it when the program is unloaded
  (suggested by Jiri Olsa)

Changes in v5:
- fixed acquiring and releasing of module references by trampolines to
  prevent modules being unloaded between address lookup and trampoline
  allocation

Changes in v4:
- reworked module kallsyms lookup approach using existing functions,
  verifier now calls btf_try_get_module to retrieve the module and
  find_kallsyms_symbol_value to get the symbol address (suggested by
  Alexei)
- included Jiri Olsa's comments
- improved description of the new test and added it as a comment into
  the test source

Changes in v3:
- added trivial implementation for kallsyms_lookup_name_in_module() for
  !CONFIG_MODULES (noticed by test robot, fix suggested by Hao Luo)

Changes in v2:
- introduced and used more space-efficient kallsyms lookup function,
  suggested by Jiri Olsa
- included Hao Luo's comments
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

72fe61d7

bpf/selftests: Test fentry attachment to shadowed functions · aa3d65de

Viktor Malik authored Mar 10, 2023

Adds a new test that tries to attach a program to fentry of two
functions of the same name, one located in vmlinux and the other in
bpf_testmod.

To avoid conflicts with existing tests, a new function
"bpf_fentry_shadow_test" was created both in vmlinux and in bpf_testmod.

The previous commit fixed a bug which caused this test to fail. The
verifier would always use the vmlinux function's address as the target
trampoline address, hence trying to create two trampolines for a single
address, which is forbidden.

The test (similarly to other fentry/fexit tests) is not working on arm64
at the moment.
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/5fe2f364190b6f79b085066ed7c5989c5bc475fa.1678432753.git.vmalik@redhat.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

aa3d65de

bpf: Fix attaching fentry/fexit/fmod_ret/lsm to modules · 31bf1dbc

Viktor Malik authored Mar 10, 2023

This resolves two problems with attachment of fentry/fexit/fmod_ret/lsm
to functions located in modules:

1. The verifier tries to find the address to attach to in kallsyms. This
   is always done by searching the entire kallsyms, not respecting the
   module in which the function is located. Such approach causes an
   incorrect attachment address to be computed if the function to attach
   to is shadowed by a function of the same name located earlier in
   kallsyms.

2. If the address to attach to is located in a module, the module
   reference is only acquired in register_fentry. If the module is
   unloaded between the place where the address is found
   (bpf_check_attach_target in the verifier) and register_fentry, it is
   possible that another module is loaded to the same address which may
   lead to potential errors.

Since the attachment must contain the BTF of the program to attach to,
we extract the module from it and search for the function address in the
correct module (resolving problem no. 1). Then, the module reference is
taken directly in bpf_check_attach_target and stored in the bpf program
(in bpf_prog_aux). The reference is only released when the program is
unloaded (resolving problem no. 2).
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/3f6a9d8ae850532b5ef864ef16327b0f7a669063.1678432753.git.vmalik@redhat.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

31bf1dbc

14 Mar, 2023 1 commit

cgroup: Make current_cgns_cgroup_dfl() safe to call after exit_task_namespace() · b8a2e3f9

Tejun Heo authored Mar 14, 2023

The commit 332ea1f6 ("bpf: Add bpf_cgroup_from_id() kfunc") added
bpf_cgroup_from_id() which calls current_cgns_cgroup_dfl() through
cgroup_get_from_id(). However, BPF programs may be attached to a point where
current->nsproxy has already been cleared to NULL by exit_task_namespace()
and calling bpf_cgroup_from_id() would cause an oops.

Just return the system-wide root if nsproxy has been cleared. This allows
all cgroups to be looked up after the task passed through
exit_task_namespace(), which semantically makes sense. Given that the only
way to get this behavior is through BPF programs, it seems safe but let's
see what others think.

Fixes: 332ea1f6 ("bpf: Add bpf_cgroup_from_id() kfunc")
Signed-off-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/ZBDuVWiFj2jiz3i8@slm.duckdns.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

b8a2e3f9