Commits · 16b7c970cc8192e929dbd5192ccc1867e19d7bda · Kirill Smelkov / linux

03 Apr, 2023 1 commit

bpf, docs: Add docs on extended 64-bit immediate instructions · 16b7c970

Dave Thaler authored Mar 26, 2023

Add docs on extended 64-bit immediate instructions, including six instructions
previously undocumented. Include a brief description of maps and variables,
as used by those instructions.

V1 -> V2: rebased on top of latest master

V2 -> V3: addressed comments from Alexei

V3 -> V4: addressed comments from David Vernet
Signed-off-by: Dave Thaler <dthaler@microsoft.com>
Link: https://lore.kernel.org/r/20230326054946.2331-1-dthaler1968@googlemail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

16b7c970

02 Apr, 2023 1 commit

bpf: compute hashes in bloom filter similar to hashmap · 92b2e810

Anton Protopopov authored Apr 02, 2023

If the value size in a bloom filter is a multiple of 4, then the jhash2()
function is used to compute hashes. The length parameter of this function
equals to the number of 32-bit words in input. Compute it in the hot path
instead of pre-computing it, as this is translated to one extra shift to
divide the length by four vs. one extra memory load of a pre-computed length.
Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Link: https://lore.kernel.org/r/20230402114340.3441-1-aspsk@isovalent.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

92b2e810

01 Apr, 2023 10 commits

bpf: optimize hashmap lookups when key_size is divisible by 4 · 5b85575a

Anton Protopopov authored Apr 01, 2023

The BPF hashmap uses the jhash() hash function. There is an optimized version
of this hash function which may be used if hash size is a multiple of 4. Apply
this optimization to the hashmap in a similar way as it is done in the bloom
filter map.

On practice the optimization is only noticeable for smaller key sizes, which,
however, is sufficient for many applications. An example is listed in the
following table of measurements (a hashmap of 65536 elements was used):

    --------------------------------------------------------------------
    | key_size | fullness | lookups /sec | lookups (opt) /sec |   gain |
    --------------------------------------------------------------------
    |        4 |      25% |      42.990M |            46.000M |   7.0% |
    |        4 |      50% |      37.910M |            39.094M |   3.1% |
    |        4 |      75% |      34.486M |            36.124M |   4.7% |
    |        4 |     100% |      31.760M |            32.719M |   3.0% |
    --------------------------------------------------------------------
    |        8 |      25% |      43.855M |            49.626M |  13.2% |
    |        8 |      50% |      38.328M |            42.152M |  10.0% |
    |        8 |      75% |      34.483M |            38.088M |  10.5% |
    |        8 |     100% |      31.306M |            34.686M |  10.8% |
    --------------------------------------------------------------------
    |       12 |      25% |      38.398M |            43.770M |  14.0% |
    |       12 |      50% |      33.336M |            37.712M |  13.1% |
    |       12 |      75% |      29.917M |            34.440M |  15.1% |
    |       12 |     100% |      27.322M |            30.480M |  11.6% |
    --------------------------------------------------------------------
    |       16 |      25% |      41.491M |            41.921M |   1.0% |
    |       16 |      50% |      36.206M |            36.474M |   0.7% |
    |       16 |      75% |      32.529M |            33.027M |   1.5% |
    |       16 |     100% |      29.581M |            30.325M |   2.5% |
    --------------------------------------------------------------------
    |       20 |      25% |      34.240M |            36.787M |   7.4% |
    |       20 |      50% |      30.328M |            32.663M |   7.7% |
    |       20 |      75% |      27.536M |            29.354M |   6.6% |
    |       20 |     100% |      24.847M |            26.505M |   6.7% |
    --------------------------------------------------------------------
    |       24 |      25% |      36.329M |            40.608M |  11.8% |
    |       24 |      50% |      31.444M |            35.059M |  11.5% |
    |       24 |      75% |      28.426M |            31.452M |  10.6% |
    |       24 |     100% |      26.278M |            28.741M |   9.4% |
    --------------------------------------------------------------------
    |       28 |      25% |      31.540M |            31.944M |   1.3% |
    |       28 |      50% |      27.739M |            28.063M |   1.2% |
    |       28 |      75% |      24.993M |            25.814M |   3.3% |
    |       28 |     100% |      23.513M |            23.500M |  -0.1% |
    --------------------------------------------------------------------
    |       32 |      25% |      32.116M |            33.953M |   5.7% |
    |       32 |      50% |      28.879M |            29.859M |   3.4% |
    |       32 |      75% |      26.227M |            26.948M |   2.7% |
    |       32 |     100% |      23.829M |            24.613M |   3.3% |
    --------------------------------------------------------------------
    |       64 |      25% |      22.535M |            22.554M |   0.1% |
    |       64 |      50% |      20.471M |            20.675M |   1.0% |
    |       64 |      75% |      19.077M |            19.146M |   0.4% |
    |       64 |     100% |      17.710M |            18.131M |   2.4% |
    --------------------------------------------------------------------

The following script was used to gather the results (SMT & frequency off):

    cd tools/testing/selftests/bpf
    for key_size in 4 8 12 16 20 24 28 32 64; do
            for nr_entries in `seq 16384 16384 65536`; do
                    fullness=$(printf '%3s' $((nr_entries*100/65536)))
                    echo -n "key_size=$key_size: $fullness% full: "
                    sudo ./bench -d2 -a bpf-hashmap-lookup --key_size=$key_size --nr_entries=$nr_entries --max_entries=65536 --nr_loops=2000000 --map_flags=0x40 | grep cpu
            done
            echo
    done
Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Link: https://lore.kernel.org/r/20230401200602.3275-1-aspsk@isovalent.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

5b85575a

Merge branch 'Enable RCU semantics for task kptrs' · a033907e

Alexei Starovoitov authored Apr 01, 2023

David Vernet says:

====================

In commit 22df776a ("tasks: Extract rcu_users out of union"), the
'refcount_t rcu_users' field was extracted out of a union with the
'struct rcu_head rcu' field. This allows us to use the field for
refcounting struct task_struct with RCU protection, as the RCU callback
no longer flips rcu_users to be nonzero after the callback is scheduled.

This patch set leverages this to do a few things:

1. Marks struct task_struct as RCU safe in the verifier, allowing
   referenced kptr tasks stored in maps to be accessed in an RCU
   read region without acquiring a reference (with just a NULL check).
2. Makes bpf_task_acquire() a KF_ACQUIRE | KF_RCU | KF_RET_NULL kfunc.
3. Removes bpf_task_kptr_get() and bpf_task_acquire_not_zero(), as
   they're now redundant with the above two changes.
4. Updates selftests and documentation accordingly.
---
Changelog:
v1: https://lore.kernel.org/all/20230331005733.406202-1-void@manifault.com/
v1 -> v2:
- Remove testcases validating nested trust inheritance. The first
  version used 'struct task_struct __rcu *parent', but because that
  field has the __rcu tag it functions differently on gcc and llvm and
  causes gcc selftests to fail. Alexei is reworking nested trust,
  anyways so let's leave it off for now (Alexei).
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

a033907e

bpf,docs: Update documentation to reflect new task kfuncs · db9d479a

David Vernet authored Mar 31, 2023

Now that struct task_struct objects are RCU safe, and bpf_task_acquire()
can return NULL, we should update the BPF task kfunc documentation to
reflect the current state of the API.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230331195733.699708-4-void@manifault.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

db9d479a

bpf: Remove now-defunct task kfuncs · f85671c6

David Vernet authored Mar 31, 2023

In commit 22df776a ("tasks: Extract rcu_users out of union"), the
'refcount_t rcu_users' field was extracted out of a union with the
'struct rcu_head rcu' field. This allows us to safely perform a
refcount_inc_not_zero() on task->rcu_users when acquiring a reference on
a task struct. A prior patch leveraged this by making struct task_struct
an RCU-protected object in the verifier, and by bpf_task_acquire() to
use the task->rcu_users field for synchronization.

Now that we can use RCU to protect tasks, we no longer need
bpf_task_kptr_get(), or bpf_task_acquire_not_zero(). bpf_task_kptr_get()
is truly completely unnecessary, as we can just use RCU to get the
object. bpf_task_acquire_not_zero() is now equivalent to
bpf_task_acquire().

In addition to these changes, this patch also updates the associated
selftests to no longer use these kfuncs.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230331195733.699708-3-void@manifault.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

f85671c6

bpf: Make struct task_struct an RCU-safe type · d02c48fa

David Vernet authored Mar 31, 2023

struct task_struct objects are a bit interesting in terms of how their
lifetime is protected by refcounts. task structs have two refcount
fields:

1. refcount_t usage: Protects the memory backing the task struct. When
this refcount drops to 0, the task is immediately freed, without
waiting for an RCU grace period to elapse. This is the field that
most callers in the kernel currently use to ensure that a task
remains valid while it's being referenced, and is what's currently
tracked with bpf_task_acquire() and bpf_task_release().

2. refcount_t rcu_users: A refcount field which, when it drops to 0,
schedules an RCU callback that drops a reference held on the 'usage'
field above (which is acquired when the task is first created). This
field therefore provides a form of RCU protection on the task by
ensuring that at least one 'usage' refcount will be held until an RCU
grace period has elapsed. The qualifier "a form of" is important
here, as a task can remain valid after task->rcu_users has dropped to
0 and the subsequent RCU gp has elapsed.

In terms of BPF, we want to use task->rcu_users to protect tasks that
function as referenced kptrs, and to allow tasks stored as referenced
kptrs in maps to be accessed with RCU protection.

Let's first determine whether we can safely use task->rcu_users to
protect tasks stored in maps. All of the bpf_task* kfuncs can only be
called from tracepoint, struct_ops, or BPF_PROG_TYPE_SCHED_CLS, program
types. For tracepoint and struct_ops programs, the struct task_struct
passed to a program handler will always be trusted, so it will always be
safe to call bpf_task_acquire() with any task passed to a program.
Note, however, that we must update bpf_task_acquire() to be KF_RET_NULL,
as it is possible that the task has exited by the time the program is
invoked, even if the pointer is still currently valid because the main
kernel holds a task->usage refcount. For BPF_PROG_TYPE_SCHED_CLS, tasks
should never be passed as an argument to the any program handlers, so it
should not be relevant.

The second question is whether it's safe to use RCU to access a task
that was acquired with bpf_task_acquire(), and stored in a map. Because
bpf_task_acquire() now uses task->rcu_users, it follows that if the task
is present in the map, that it must have had at least one
task->rcu_users refcount by the time the current RCU cs was started.
Therefore, it's safe to access that task until the end of the current
RCU cs.

With all that said, this patch makes struct task_struct is an
RCU-protected object. In doing so, we also change bpf_task_acquire() to
be KF_ACQUIRE | KF_RCU | KF_RET_NULL, and adjust any selftests as
necessary. A subsequent patch will remove bpf_task_kptr_get(), and
bpf_task_acquire_not_zero() respectively.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230331195733.699708-2-void@manifault.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

d02c48fa

Merge branch 'Prepare veristat for packaging' · 85850058

Alexei Starovoitov authored Apr 01, 2023

Andrii Nakryiko says:

====================

This patch set relicenses veristat.c to dual GPL-2.0/BSD-2 license and
prepares it to be mirrored to Github at libbpf/veristat repo.

Few small issues in the source code are fixed, found during Github sync
preparetion.

v2->v3:
  - fix few warnings about uninitialized variable uses;
v1->v2:
  - drop linux/compiler.h and define own ARRAY_SIZE macro;
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

85850058

veristat: small fixed found in -O2 mode · ebf390c9

Andrii Nakryiko authored Mar 31, 2023

Fix few potentially unitialized variables uses, found while building
veristat.c in release (-O2) mode.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230331222405.3468634-5-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

ebf390c9

veristat: avoid using kernel-internal headers · e3b65c0c

Andrii Nakryiko authored Mar 31, 2023

Drop linux/compiler.h include, which seems to be needed for ARRAY_SIZE
macro only. Redefine own version of ARRAY_SIZE instead.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230331222405.3468634-4-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

e3b65c0c

veristat: improve version reporting · 71c8c39f

Andrii Nakryiko authored Mar 31, 2023

For packaging version of the tool is important, so add a simple way to
specify veristat version for upstream mirror at Github.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230331222405.3468634-3-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

71c8c39f

veristat: relicense veristat.c as dual GPL-2.0-only or BSD-2-Clause licensed · 3ed85ae8

Andrii Nakryiko authored Mar 31, 2023

Dual-license veristat.c to dual GPL-2.0-only or BSD-2-Clause license.
This is needed to mirror it to Github to make it convenient for distro
packagers to package veristat as a separate package.

Veristat grew into a useful tool by itself, and there are already
a bunch of users relying on veristat as generic BPF loading and
verification helper tool. So making it easy to packagers by providing
Github mirror just like we do for bpftool and libbpf is the next step to
get veristat into the hands of users.

Apart from few typo fixes, I'm the sole contributor to veristat.c so
far, so no extra Acks should be needed for relicensing.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230331222405.3468634-2-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

3ed85ae8

31 Mar, 2023 5 commits

selftests/bpf: Fix conflicts with built-in functions in bench_local_storage_create · 9af0f555

James Hilliard authored Mar 31, 2023

The fork function in gcc is considered a built in function due to
being used by libgcov when building with gnu extensions.

Rename fork to sched_process_fork to prevent this conflict.

See details:
https://github.com/gcc-mirror/gcc/commit/d1c38823924506d389ca58d02926ace21bdf82fa
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82457

Fixes the following error:

In file included from progs/bench_local_storage_create.c:6:
progs/bench_local_storage_create.c:43:14: error: conflicting types for
built-in function 'fork'; expected 'int(void)'
[-Werror=builtin-declaration-mismatch]
   43 | int BPF_PROG(fork, struct task_struct *parent, struct
task_struct *child)
      |              ^~~~

Fixes: cbe9d93d ("selftests/bpf: Add bench for task storage creation")
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230331075848.1642814-1-james.hilliard1@gmail.com

9af0f555

Merge branch 'selftests/bpf: Add read_build_id function' · e941933c

Alexei Starovoitov authored Mar 31, 2023

Jiri Olsa says:

====================

hi,
this selftests cleanup was previously posted as part of file build id changes [1],
which might take more time, so I'm sending the selftests changes separately so it
won't get stuck.

v4 changes:
  - added size argument to read_build_id [Andrii]
  - condition changes in parse_build_id_buf [Andrii]
  - use ELF_C_READ_MMAP in elf_begin [Andrii]
  - return -ENOENT in read_build_id if build id is not found [Andrii]
  - dropped elf class check [Andrii]

thanks,
jirka

[1] https://lore.kernel.org/bpf/20230316170149.4106586-1-jolsa@kernel.org/
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

e941933c

selftests/bpf: Replace extract_build_id with read_build_id · dcc46f51

Jiri Olsa authored Mar 31, 2023

Replacing extract_build_id with read_build_id that parses out
build id directly from elf without using readelf tool.
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230331093157.1749137-4-jolsa@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

dcc46f51

selftests/bpf: Add read_build_id function · 88dc8b36

Jiri Olsa authored Mar 31, 2023

Adding read_build_id function that parses out build id from
specified binary.

It will replace extract_build_id and also be used in following
changes.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230331093157.1749137-3-jolsa@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

88dc8b36

selftests/bpf: Add err.h header · 328bafc9

Jiri Olsa authored Mar 31, 2023

Moving error macros from profiler.inc.h to new err.h header.
It will be used in following changes.

Also adding PTR_ERR macro that will be used in following changes.
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230331093157.1749137-2-jolsa@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

328bafc9

30 Mar, 2023 8 commits

selftests/bpf: Add testcases for ptr_*_or_null_ in bpf_kptr_xchg · 67efbd57

David Vernet authored Mar 30, 2023

The second argument of the bpf_kptr_xchg() helper function is
ARG_PTR_TO_BTF_ID_OR_NULL. A recent patch fixed a bug whereby the
verifier would fail with an internal error message if a program invoked
the helper with a PTR_TO_BTF_ID | PTR_MAYBE_NULL register. This testcase
adds some testcases to ensure that it fails gracefully moving forward.

Before the fix, these testcases would have failed an error resembling
the following:

; p = bpf_kfunc_call_test_acquire(&(unsigned long){0});
99: (7b) *(u64 *)(r10 -16) = r7       ; frame1: ...
100: (bf) r1 = r10                    ; frame1: ...
101: (07) r1 += -16                   ; frame1: ...
; p = bpf_kfunc_call_test_acquire(&(unsigned long){0});
102: (85) call bpf_kfunc_call_test_acquire#13908
; frame1: R0_w=ptr_or_null_prog_test_ref_kfunc...
; p = bpf_kptr_xchg(&v->ref_ptr, p);
103: (bf) r1 = r6                     ; frame1: ...
104: (bf) r2 = r0
; frame1: R0_w=ptr_or_null_prog_test_ref_kfunc...
105: (85) call bpf_kptr_xchg#194
verifier internal error: invalid PTR_TO_BTF_ID register for type match
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230330145203.80506-2-void@manifault.com

67efbd57

bpf: Handle PTR_MAYBE_NULL case in PTR_TO_BTF_ID helper call arg · e4c2acab

David Vernet authored Mar 30, 2023

When validating a helper function argument, we use check_reg_type() to
ensure that the register containing the argument is of the correct type.
When the register's base type is PTR_TO_BTF_ID, there is some
supplemental logic where we do extra checks for various combinations of
PTR_TO_BTF_ID type modifiers. For example, for PTR_TO_BTF_ID,
PTR_TO_BTF_ID | PTR_TRUSTED, and PTR_TO_BTF_ID | MEM_RCU, we call
map_kptr_match_type() for bpf_kptr_xchg() calls, and
btf_struct_ids_match() for other helper calls.

When an unhandled PTR_TO_BTF_ID type modifier combination is passed to
check_reg_type(), the verifier fails with an internal verifier error
message. This can currently be triggered by passing a PTR_MAYBE_NULL
pointer to helper functions (currently just bpf_kptr_xchg()) with an
ARG_PTR_TO_BTF_ID_OR_NULL arg type. For example, by callin
bpf_kptr_xchg(&v->kptr, bpf_cpumask_create()).

Whether or not passing a PTR_MAYBE_NULL arg to an
ARG_PTR_TO_BTF_ID_OR_NULL argument is valid is an interesting question.
In a vacuum, it seems fine. A helper function with an
ARG_PTR_TO_BTF_ID_OR_NULL arg would seem to be implying that it can
handle either a NULL or non-NULL arg, and has logic in place to detect
and gracefully handle each. This is the case for bpf_kptr_xchg(), which
of course simply does an xchg(). On the other hand, bpf_kptr_xchg() also
specifies OBJ_RELEASE, and refcounting semantics for a PTR_MAYBE_NULL
pointer is different than handling it for a NULL _OR_ non-NULL pointer.
For example, with a non-NULL arg, we should always fail if there was not
a nonzero refcount for the value in the register being passed to the
helper. For PTR_MAYBE_NULL on the other hand, it's unclear. If the
pointer is NULL it would be fine, but if it's not NULL, it would be
incorrect to load the program.

The current solution to this is to just fail if PTR_MAYBE_NULL is
passed, and to instead require programs to have a NULL check to
explicitly handle the NULL and non-NULL cases. This seems reasonable.
Not only would it possibly be quite complicated to correctly handle
PTR_MAYBE_NULL refcounting in the verifier, but it's also an arguably
odd programming pattern in general to not explicitly handle the NULL
case anyways. For example, it seems odd to not care about whether a
pointer you're passing to bpf_kptr_xchg() was successfully allocated in
a program such as the following:

private(MASK) static struct bpf_cpumask __kptr * global_mask;

SEC("tp_btf/task_newtask")
int BPF_PROG(example, struct task_struct *task, u64 clone_flags)
{
        struct bpf_cpumask *prev;

	/* bpf_cpumask_create() returns PTR_MAYBE_NULL */
	prev = bpf_kptr_xchg(&global_mask, bpf_cpumask_create());
	if (prev)
		bpf_cpumask_release(prev);

	return 0;
}

This patch therefore updates the verifier to explicitly check for
PTR_MAYBE_NULL in check_reg_type(), and fail gracefully if it's
observed. This isn't really "fixing" anything unsafe or incorrect. We're
just updating the verifier to fail gracefully, and explicitly handle
this pattern rather than unintentionally falling back to an internal
verifier error path. A subsequent patch will update selftests.
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230330145203.80506-1-void@manifault.com

e4c2acab

veristat: change guess for __sk_buff from CGROUP_SKB to SCHED_CLS · d8161295

Andrii Nakryiko authored Mar 30, 2023

SCHED_CLS seems to be a better option as a default guess for freplace
programs that have __sk_buff as a context type.
Reported-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230330190115.3942962-1-andrii@kernel.orgSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>

d8161295

selftests/bpf: Rewrite two infinite loops in bound check cases · 4ca13d10

Xu Kuohai authored Mar 28, 2023

The two infinite loops in bound check cases added by commit
1a3148fc ("selftests/bpf: Check when bounds are not in the 32-bit range")
increased the execution time of test_verifier from about 6 seconds to
about 9 seconds. Rewrite these two infinite loops to finite loops to get
rid of this extra time cost.
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/20230329011048.1721937-1-xukuohai@huaweicloud.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

4ca13d10

Merge branch 'veristat: add better support of freplace programs' · 8a9abe02

Alexei Starovoitov authored Mar 29, 2023

Andrii Nakryiko says:

====================

Teach veristat how to deal with freplace BPF programs. As they can't be
directly loaded by veristat without custom user-space part that sets correct
target program FD, veristat always fails freplace programs. This patch set
teaches veristat to guess target program type that will be inherited by
freplace program itself, and subtitute it for BPF_PROG_TYPE_EXT (freplace) one
for the purposes of BPF verification.

Patch #1 fixes bug in libbpf preventing overriding freplace with specific
program type.

Patch #2 adds convenient -d flag to request veristat to emit libbpf debug
logs. It help debugging why a specific BPF program fails to load, if the
problem is not due to BPF verification itself.

v3->v4:
  - fix optional kern_name check when guessing prog type (Alexei);
v2->v3:
  - fix bpf_obj_id selftest that uses legacy bpf_prog_test_load() helper,
    which always sets program type programmatically; teach the helper to do it
    only if actually necessary (Stanislav);
v1->v2:
  - fix compilation error reported by old GCC (my GCC v11 doesn't produce even
    a warning) and Clang (see CI failure at [0]):

GCC version:

  veristat.c: In function ‘fixup_obj’:
  veristat.c:908:1: error: label at end of compound statement
    908 | skip_freplace_fixup:
        | ^~~~~~~~~~~~~~~~~~~

Clang version:

  veristat.c:909:1: error: label at end of compound statement is a C2x extension [-Werror,-Wc2x-extensions]
  }
  ^
  1 error generated.

  [0] https://github.com/kernel-patches/bpf/actions/runs/4515972059/jobs/7953845335
====================
Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

8a9abe02

veristat: guess and substitue underlying program type for freplace (EXT) progs · fa7cc906

Andrii Nakryiko authored Mar 27, 2023

SEC("freplace") (i.e., BPF_PROG_TYPE_EXT) programs are not loadable as
is through veristat, as kernel expects actual program's FD during
BPF_PROG_LOAD time, which veristat has no way of knowing.

Unfortunately, freplace programs are a pretty important class of
programs, especially when dealing with XDP chaining solutions, which
rely on EXT programs.

So let's do our best and teach veristat to try to guess the original
program type, based on program's context argument type. And if guessing
process succeeds, we manually override freplace/EXT with guessed program
type using bpf_program__set_type() setter to increase chances of proper
BPF verification.

We rely on BTF and maintain a simple lookup table. This process is
obviously not 100% bulletproof, as valid program might not use context
and thus wouldn't have to specify correct type. Also, __sk_buff is very
ambiguous and is the context type across many different program types.
We pick BPF_PROG_TYPE_CGROUP_SKB for now, which seems to work fine in
practice so far. Similarly, some program types require specifying attach
type, and so we pick one out of possible few variants.

Best effort at its best. But this makes veristat even more widely
applicable.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230327185202.1929145-4-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

fa7cc906

veristat: add -d debug mode option to see debug libbpf log · b3c63d7a

Andrii Nakryiko authored Mar 27, 2023

Add -d option to allow requesting libbpf debug logs from veristat.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230327185202.1929145-3-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

b3c63d7a

libbpf: disassociate section handler on explicit bpf_program__set_type() call · d6e6286a

Andrii Nakryiko authored Mar 27, 2023

If user explicitly overrides programs's type with
bpf_program__set_type() API call, we need to disassociate whatever
SEC_DEF handler libbpf determined initially based on program's SEC()
definition, as it's not goind to be valid anymore and could lead to
crashes and/or confusing failures.

Also, fix up bpf_prog_test_load() helper in selftests/bpf, which is
force-setting program type (even if that's completely unnecessary; this
is quite a legacy piece of code), and thus should expect auto-attach to
not work, yet one of the tests explicitly relies on auto-attach for
testing.

Instead, force-set program type only if it differs from the desired one.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230327185202.1929145-2-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

d6e6286a

29 Mar, 2023 4 commits

Merge branch 'Allow BPF TCP CCs to write app_limited' · 8b52cc2a

Martin KaFai Lau authored Mar 29, 2023

Yixin Shen says:

====================

This series allow BPF TCP CCs to write app_limited of struct
tcp_sock. A built-in CC or one from a kernel module is already
able to write to app_limited of struct tcp_sock. Until now,
a BPF CC doesn't have write access to this member of struct
tcp_sock.

v2:
 - Merge the test of writing app_limited into the test of
   writing sk_pacing. (Martin KaFai Lau)
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

8b52cc2a

selftests/bpf: test a BPF CC writing app_limited · 4239561b

Yixin Shen authored Mar 29, 2023

Test whether a TCP CC implemented in BPF is allowed to write
app_limited in struct tcp_sock. This is already allowed for
the built-in TCP CC.
Signed-off-by: Yixin Shen <bobankhshen@gmail.com>
Link: https://lore.kernel.org/r/20230329073558.8136-3-bobankhshen@gmail.comSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>

4239561b

bpf: allow a TCP CC to write app_limited · 562dc56a

Yixin Shen authored Mar 29, 2023

A CC that implements tcp_congestion_ops.cong_control() should be able to
write app_limited. A built-in CC or one from a kernel module is already
able to write to this member of struct tcp_sock.
For a BPF program, write access has not been allowed, yet.
Signed-off-by: Yixin Shen <bobankhshen@gmail.com>
Link: https://lore.kernel.org/r/20230329073558.8136-2-bobankhshen@gmail.comSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>

562dc56a

tools: bpftool: json: Fix backslash escape typo in jsonw_puts · d8d8b008

Manu Bretelle authored Mar 29, 2023

This is essentially a backport of iproute2's
commit ed54f76484b5 ("json: fix backslash escape typo in jsonw_puts")

Also added the stdio.h include in json_writer.h to be able to compile
and run the json_writer test as used below).

Before this fix:

$ gcc -D notused -D TEST -I../../include -o json_writer  json_writer.c
json_writer.h
$ ./json_writer
{
    "Vyatta": {
        "url": "http://vyatta.com",
        "downloads": 2000000,
        "stock": 8.16,
        "ARGV": [],
        "empty": [],
        "NIL": {},
        "my_null": null,
        "special chars": [
            "slash": "/",
            "newline": "\n",
            "tab": "\t",
            "ff": "\f",
            "quote": "\"",
            "tick": "'",
            "backslash": "\n"
        ]
    }
}

After:

$ gcc -D notused -D TEST -I../../include -o json_writer  json_writer.c
json_writer.h
$ ./json_writer
{
    "Vyatta": {
        "url": "http://vyatta.com",
        "downloads": 2000000,
        "stock": 8.16,
        "ARGV": [],
        "empty": [],
        "NIL": {},
        "my_null": null,
        "special chars": [
            "slash": "/",
            "newline": "\n",
            "tab": "\t",
            "ff": "\f",
            "quote": "\"",
            "tick": "'",
            "backslash": "\\"
        ]
    }
}
Signed-off-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20230329073002.2026563-1-chantr4@gmail.com

d8d8b008

28 Mar, 2023 4 commits

Merge branch 'verifier/xdp_direct_packet_access.c converted to inline assembly' · 07561769

Andrii Nakryiko authored Mar 28, 2023

Eduard Zingerman says:

====================

verifier/xdp_direct_packet_access.c automatically converted to inline
assembly using [1].

This is a leftover from [2], the last patch in a batch was blocked by
mail server for being too long. This patch-set splits it in two:
- one to add migrated test to progs/
- one to remove old test from verifier/

[1] Migration tool
    https://github.com/eddyz87/verifier-tests-migrator
[2] First batch of migrated verifier/*.c tests
    https://lore.kernel.org/bpf/167979433109.17761.17302808621381963629.git-patchwork-notify@kernel.org/
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

07561769

selftests/bpf: Remove verifier/xdp_direct_packet_access.c, converted to... · c63a7d8b

Eduard Zingerman authored Mar 28, 2023

selftests/bpf: Remove verifier/xdp_direct_packet_access.c, converted to progs/verifier_xdp_direct_packet_access.c

Removing verifier/xdp_direct_packet_access.c.c as it was automatically converted to use
inline assembly in the previous commit. It is available in
progs/verifier_xdp_direct_packet_access.c.c.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230328020813.392560-3-eddyz87@gmail.com

c63a7d8b

selftests/bpf: Verifier/xdp_direct_packet_access.c converted to inline assembly · 6e9e141a

Eduard Zingerman authored Mar 28, 2023

Test verifier/xdp_direct_packet_access.c automatically converted to use inline assembly.
Original test would be removed in the next patch.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230328020813.392560-2-eddyz87@gmail.com

6e9e141a

libbpf: Fix double-free when linker processes empty sections · d08ab82f

Eduard Zingerman authored Mar 28, 2023

Double-free error in bpf_linker__free() was reported by James Hilliard.
The error is caused by miss-use of realloc() in extend_sec().
The error occurs when two files with empty sections of the same name
are linked:
- when first file is processed:
  - extend_sec() calls realloc(dst->raw_data, dst_align_sz)
    with dst->raw_data == NULL and dst_align_sz == 0;
  - dst->raw_data is set to a special pointer to a memory block of
    size zero;
- when second file is processed:
  - extend_sec() calls realloc(dst->raw_data, dst_align_sz)
    with dst->raw_data == <special pointer> and dst_align_sz == 0;
  - realloc() "frees" dst->raw_data special pointer and returns NULL;
  - extend_sec() exits with -ENOMEM, and the old dst->raw_data value
    is preserved (it is now invalid);
  - eventually, bpf_linker__free() attempts to free dst->raw_data again.

This patch fixes the bug by avoiding -ENOMEM exit for dst_align_sz == 0.
The fix was suggested by Andrii Nakryiko <andrii.nakryiko@gmail.com>.
Reported-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: James Hilliard <james.hilliard1@gmail.com>
Link: https://lore.kernel.org/bpf/CADvTj4o7ZWUikKwNTwFq0O_AaX+46t_+Ca9gvWMYdWdRtTGeHQ@mail.gmail.com/
Link: https://lore.kernel.org/bpf/20230328004738.381898-3-eddyz87@gmail.com

d08ab82f

27 Mar, 2023 2 commits

selftests/bpf: Don't assume page size is 4096 · 7283137a

Hengqi Chen authored Mar 26, 2023

The verifier test creates BPF ringbuf maps using hard-coded
4096 as max_entries. Some tests will fail if the page size
of the running kernel is not 4096. Use getpagesize() instead.
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230326095341.816023-1-hengqi.chen@gmail.com

7283137a

libbpf: Ensure print callback usage is thread-safe · f1cb927c

JP Kobryn authored Mar 24, 2023

This patch prevents races on the print function pointer, allowing the
libbpf_set_print() function to become thread-safe.
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230325010845.46000-1-inwardvessel@gmail.com

f1cb927c

26 Mar, 2023 5 commits

xsk: allow remap of fill and/or completion rings · 5f5a7d8d

Nuno Gonçalves authored Mar 24, 2023

The remap of fill and completion rings was frowned upon as they
control the usage of UMEM which does not support concurrent use.
At the same time this would disallow the remap of these rings
into another process.

A possible use case is that the user wants to transfer the socket/
UMEM ownership to another process (via SYS_pidfd_getfd) and so
would need to also remap these rings.

This will have no impact on current usages and just relaxes the
remap limitation.
Signed-off-by: Nuno Gonçalves <nunog@fr24.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230324100222.13434-1-nunog@fr24.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

5f5a7d8d

bpf, docs: Add extended call instructions · 8cfee110

Dave Thaler authored Mar 26, 2023

Add extended call instructions.  Uses the term "program-local" for
call by offset.  And there are instructions for calling helper functions
by "address" (the old way of using integer values), and for calling
helper functions by BTF ID (for kfuncs).

V1 -> V2: addressed comments from David Vernet

V2 -> V3: make descriptions in table consistent with updated names

V3 -> V4: addressed comments from Alexei

V4 -> V5: fixed alignment
Signed-off-by: Dave Thaler <dthaler@microsoft.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230326033117.1075-1-dthaler1968@googlemail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

8cfee110

Merge branch 'bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage' · 8d275960

Alexei Starovoitov authored Mar 25, 2023

Martin KaFai Lau says:

====================

From: Martin KaFai Lau <martin.lau@kernel.org>

This set is a continuation of the effort in using
bpf_mem_cache_alloc/free in bpf_local_storage [1]

Major change is only using bpf_mem_alloc for task and cgrp storage
while sk and inode stay with kzalloc/kfree. The details is
in patch 2.

[1]: https://lore.kernel.org/bpf/20230308065936.1550103-1-martin.lau@linux.dev/

v3:
- Only use bpf_mem_alloc for task and cgrp storage.
- sk and inode storage stay with kzalloc/kfree.
- Check NULL and add comments in bpf_mem_cache_raw_free() in patch 1.
- Added test and benchmark for task storage.

v2:
- Added bpf_mem_cache_alloc_flags() and bpf_mem_cache_raw_free()
  to hide the internal data structure of the bpf allocator.
- Fixed a typo bug in bpf_selem_free()
- Simplified the test_local_storage test by directly using
  err returned from libbpf
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

8d275960

selftests/bpf: Add bench for task storage creation · cbe9d93d

Martin KaFai Lau authored Mar 22, 2023

This patch adds a task storage benchmark to the existing
local-storage-create benchmark.

For task storage,
./bench --storage-type task --batch-size 32:
   bpf_ma: Summary: creates   30.456 ± 0.507k/s ( 30.456k/prod), 6.08 kmallocs/create
no bpf_ma: Summary: creates   31.962 ± 0.486k/s ( 31.962k/prod), 6.13 kmallocs/create

./bench --storage-type task --batch-size 64:
   bpf_ma: Summary: creates   30.197 ± 1.476k/s ( 30.197k/prod), 6.08 kmallocs/create
no bpf_ma: Summary: creates   31.103 ± 0.297k/s ( 31.103k/prod), 6.13 kmallocs/create
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230322215246.1675516-6-martin.lau@linux.devSigned-off-by: Alexei Starovoitov <ast@kernel.org>

cbe9d93d

selftests/bpf: Test task storage when local_storage->smap is NULL · d8db84d7

Martin KaFai Lau authored Mar 22, 2023

The current sk storage test ensures the memory free works when
the local_storage->smap is NULL.

This patch adds a task storage test to ensure the memory free
code path works when local_storage->smap is NULL.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230322215246.1675516-5-martin.lau@linux.devSigned-off-by: Alexei Starovoitov <ast@kernel.org>

d8db84d7