- 07 Dec, 2022 11 commits
-
-
Alexei Starovoitov authored
Andrii Nakryiko says: ==================== Disentangle prune and jump points in BPF verifier code. They are conceptually independent but currently coupled together. This small patch set refactors related code and make it possible to have some instruction marked as pruning or jump point independently. Besides just conceptual cleanliness, this allows to remove unnecessary jump points (saving a tiny bit of performance and memory usage, potentially), and even more importantly it allows for clean extension of special pruning points, similarly to how it's done for BPF_FUNC_timer_set_callback. This will be used by future patches implementing open-coded BPF iterators. v1->v2: - clarified path #3 commit message and a comment in the code (John); - added back mark_jmp_point() to right after subprog call to record non-linear implicit jump from BPF_EXIT to right after CALL <subprog>. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Don't mark some instructions as jump points when there are actually no jumps and instructions are just processed sequentially. Such case is handled naturally by precision backtracking logic without the need to update jump history. See get_prev_insn_idx(). It goes back linearly by one instruction, unless current top of jmp_history is pointing to current instruction. In such case we use `st->jmp_history[cnt - 1].prev_idx` to find instruction from which we jumped to the current instruction non-linearly. Also remove both jump and prune point marking for instruction right after unconditional jumps, as program flow can get to the instruction right after unconditional jump instruction only if there is a jump to that instruction from somewhere else in the program. In such case we'll mark such instruction as prune/jump point because it's a destination of a jump. This change has no changes in terms of number of instructions or states processes across Cilium and selftests programs. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20221206233345.438540-4-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Jump history updating and state equivalence checks are conceptually independent, so move push_jmp_history() out of is_state_visited(). Also make a decision whether to perform state equivalence checks or not one layer higher in do_check(), keeping is_state_visited() unconditionally performing state checks. push_jmp_history() should be performed after state checks. There is just one small non-uniformity. When is_state_visited() finds already validated equivalent state, it propagates precision marks to current state's parent chain. For this to work correctly, jump history has to be updated, so is_state_visited() is doing that internally. But if no equivalent verified state is found, jump history has to be updated in a newly cloned child state, so is_jmp_point() + push_jmp_history() is performed after is_state_visited() exited with zero result, which means "proceed with validation". This change has no functional changes. It's not strictly necessary, but feels right to decouple these two processes. Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20221206233345.438540-3-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
BPF verifier marks some instructions as prune points. Currently these prune points serve two purposes. It's a point where verifier tries to find previously verified state and check current state's equivalence to short circuit verification for current code path. But also currently it's a point where jump history, used for precision backtracking, is updated. This is done so that non-linear flow of execution could be properly backtracked. Such coupling is coincidental and unnecessary. Some prune points are not part of some non-linear jump path, so don't need update of jump history. On the other hand, not all instructions which have to be recorded in jump history necessarily are good prune points. This patch splits prune and jump points into independent flags. Currently all prune points are marked as jump points to minimize amount of changes in this patch, but next patch will perform some optimization of prune vs jmp point placement. No functional changes are intended. Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20221206233345.438540-2-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Dave Marchevsky authored
btf->struct_meta_tab is populated by btf_parse_struct_metas in btf.c. There, a BTF record is created for any type containing a spin_lock or any next-gen datastructure node/head. Currently, for non-MAP_VALUE types, reg_btf_record will only search for a record using struct_meta_tab if the reg->type exactly matches (PTR_TO_BTF_ID | MEM_ALLOC). This exact match is too strict: an "allocated obj" type - returned from bpf_obj_new - might pick up other flags while working its way through the program. Loosen the check to be exact for base_type and just use MEM_ALLOC mask for type_flag. This patch is marked Fixes as the original intent of reg_btf_record was unlikely to have been to fail finding btf_record for valid alloc obj types with additional flags, some of which (e.g. PTR_UNTRUSTED) are valid register type states for alloc obj independent of this series. However, I didn't find a specific broken repro case outside of this series' added functionality, so it's possible that nothing was triggering this logic error before. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> cc: Kumar Kartikeya Dwivedi <memxor@gmail.com> Fixes: 4e814da0 ("bpf: Allow locking bpf_spin_lock in allocated objects") Link: https://lore.kernel.org/r/20221206231000.3180914-2-davemarchevsky@fb.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
David Vernet authored
A series of prior patches added some kfuncs that allow struct task_struct * objects to be used as kptrs. These kfuncs leveraged the 'refcount_t rcu_users' field of the task for performing refcounting. This field was used instead of 'refcount_t usage', as we wanted to leverage the safety provided by RCU for ensuring a task's lifetime. A struct task_struct is refcounted by two different refcount_t fields: 1. p->usage: The "true" refcount field which task lifetime. The task is freed as soon as this refcount drops to 0. 2. p->rcu_users: An "RCU users" refcount field which is statically initialized to 2, and is co-located in a union with a struct rcu_head field (p->rcu). p->rcu_users essentially encapsulates a single p->usage refcount, and when p->rcu_users goes to 0, an RCU callback is scheduled on the struct rcu_head which decrements the p->usage refcount. Our logic was that by using p->rcu_users, we would be able to use RCU to safely issue refcount_inc_not_zero() a task's rcu_users field to determine if a task could still be acquired, or was exiting. Unfortunately, this does not work due to p->rcu_users and p->rcu sharing a union. When p->rcu_users goes to 0, an RCU callback is scheduled to drop a single p->usage refcount, and because the fields share a union, the refcount immediately becomes nonzero again after the callback is scheduled. If we were to split the fields out of the union, this wouldn't be a problem. Doing so should also be rather non-controversial, as there are a number of places in struct task_struct that have padding which we could use to avoid growing the structure by splitting up the fields. For now, so as to fix the kfuncs to be correct, this patch instead updates bpf_task_acquire() and bpf_task_release() to use the p->usage field for refcounting via the get_task_struct() and put_task_struct() functions. Because we can no longer rely on RCU, the change also guts the bpf_task_acquire_not_zero() and bpf_task_kptr_get() functions pending a resolution on the above problem. In addition, the task fixes the kfunc and rcu_read_lock selftests to expect this new behavior. Fixes: 90660309 ("bpf: Add kfuncs for storing struct task_struct * as a kptr") Fixes: fca1aa75 ("bpf: Handle MEM_RCU type properly") Reported-by: Matus Jokay <matus.jokay@stuba.sk> Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20221206210538.597606-1-void@manifault.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Daan De Meyer says: ==================== This patch series fixes a few issues I've found while integrating the bpf selftests into systemd's mkosi development environment. ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
-
Daan De Meyer authored
CONFIG_TEST_BPF can only be a module, so let's indicate it as such in the selftests config. Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20221205131618.1524337-4-daan.j.demeyer@gmail.com
-
Daan De Meyer authored
"=n" is not valid kconfig syntax. Use "is not set" instead to indicate the option should be disabled. Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20221205131618.1524337-3-daan.j.demeyer@gmail.com
-
Daan De Meyer authored
When installing the selftests using "make -C tools/testing/selftests install", we need to make sure all the required files to run the selftests are installed. Let's make sure this is the case. Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20221205131618.1524337-2-daan.j.demeyer@gmail.com
-
Timo Hunziker authored
Parse USDT arguments like "8@(%rsp)" on x86. These are emmited by SystemTap. The argument syntax is similar to the existing "memory dereference case" but the offset left out as it's zero (i.e. read the value from the address in the register). We treat it the same as the the "memory dereference case", but set the offset to 0. I've tested that this fixes the "unrecognized arg #N spec: 8@(%rsp).." error I've run into when attaching to a probe with such an argument. Attaching and reading the correct argument values works. Something similar might be needed for the other supported architectures. [0] Closes: https://github.com/libbpf/libbpf/issues/559Signed-off-by: Timo Hunziker <timo.hunziker@gmx.ch> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20221203123746.2160-1-timo.hunziker@eclipso.ch
-
- 06 Dec, 2022 8 commits
-
-
Martin KaFai Lau authored
It is useful to use vmlinux.h in the xfrm_info test like other kfunc tests do. In particular, it is common for kfunc bpf prog that requires to use other core kernel structures in vmlinux.h Although vmlinux.h is preferred, it needs a ___local flavor of struct bpf_xfrm_info in order to build the bpf selftests when CONFIG_XFRM_INTERFACE=[m|n]. Cc: Eyal Birger <eyal.birger@gmail.com> Fixes: 90a3a05e ("selftests/bpf: add xfrm_info tests") Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20221206193554.1059757-1-martin.lau@linux.devSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Miaoqian Lin authored
strdup() allocates memory for path. We need to release the memory in the following error path. Add free() to avoid memory leak. Fixes: 8f184732 ("bpftool: Switch to libbpf's hashmap for pinned paths of BPF objects") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20221206071906.806384-1-linmq006@gmail.com
-
Pu Lehui authored
For BPF_PSEUDO_FUNC instruction, verifier will refill imm with correct addresses of bpf_calls and then run last pass of JIT. Since the emit_imm of RV64 is variable-length, which will emit appropriate length instructions accorroding to the imm, it may broke ctx->offset, and lead to unpredictable problem, such as inaccurate jump. So let's fix it with fixed-length instructions. Fixes: 69c087ba ("bpf: Add bpf_for_each_map_elem() helper") Suggested-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Björn Töpel <bjorn@kernel.org> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20221206091410.1584784-1-pulehui@huaweicloud.com
-
Martin KaFai Lau authored
Eyal Birger says: ==================== This patch series adds xfrm metadata helpers using the unstable kfunc call interface for the TC-BPF hooks. This allows steering traffic towards different IPsec connections based on logic implemented in bpf programs. The helpers are integrated into the xfrm_interface module. For this purpose the main functionality of this module is moved to xfrm_interface_core.c. --- changes in v6: fix sparse warning in patch 2 changes in v5: - avoid cleanup of percpu dsts as detailed in patch 2 changes in v3: - tag bpf-next tree instead of ipsec-next - add IFLA_XFRM_COLLECT_METADATA sync patch ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Eyal Birger authored
Test the xfrm_info kfunc helpers. The test setup creates three name spaces - NS0, NS1, NS2. XFRM tunnels are setup between NS0 and the two other NSs. The kfunc helpers are used to steer traffic from NS0 to the other NSs based on a userspace populated bpf global variable and validate that the return traffic had arrived from the desired NS. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Link: https://lore.kernel.org/r/20221203084659.1837829-5-eyal.birger@gmail.comSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Eyal Birger authored
Needed for XFRM metadata tests. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Link: https://lore.kernel.org/r/20221203084659.1837829-4-eyal.birger@gmail.comSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Eyal Birger authored
This change adds xfrm metadata helpers using the unstable kfunc call interface for the TC-BPF hooks. This allows steering traffic towards different IPsec connections based on logic implemented in bpf programs. This object is built based on the availability of BTF debug info. When setting the xfrm metadata, percpu metadata dsts are used in order to avoid allocating a metadata dst per packet. In order to guarantee safe module unload, the percpu dsts are allocated on first use and never freed. The percpu pointer is stored in net/core/filter.c so that it can be reused on module reload. The metadata percpu dsts take ownership of the original skb dsts so that they may be used as part of the xfrm transmission logic - e.g. for MTU calculations. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Link: https://lore.kernel.org/r/20221203084659.1837829-3-eyal.birger@gmail.comSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Eyal Birger authored
This change allows adding additional files to the xfrm_interface module. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Link: https://lore.kernel.org/r/20221203084659.1837829-2-eyal.birger@gmail.comSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
- 05 Dec, 2022 6 commits
-
-
James Hilliard authored
Both tolower and toupper are built in c functions, we should not redefine them as this can result in a build error. Fixes the following errors: progs/bpf_iter_ksym.c:10:20: error: conflicting types for built-in function 'tolower'; expected 'int(int)' [-Werror=builtin-declaration-mismatch] 10 | static inline char tolower(char c) | ^~~~~~~ progs/bpf_iter_ksym.c:5:1: note: 'tolower' is declared in header '<ctype.h>' 4 | #include <bpf/bpf_helpers.h> +++ |+#include <ctype.h> 5 | progs/bpf_iter_ksym.c:17:20: error: conflicting types for built-in function 'toupper'; expected 'int(int)' [-Werror=builtin-declaration-mismatch] 17 | static inline char toupper(char c) | ^~~~~~~ progs/bpf_iter_ksym.c:17:20: note: 'toupper' is declared in header '<ctype.h>' See background on this sort of issue: https://stackoverflow.com/a/20582607 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=12213 (C99, 7.1.3p1) "All identifiers with external linkage in any of the following subclauses (including the future library directions) are always reserved for use as identifiers with external linkage." This is documented behavior in GCC: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-std-2Signed-off-by: James Hilliard <james.hilliard1@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20221203010847.2191265-1-james.hilliard1@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Eric Dumazet authored
sock_map_free() calls release_sock(sk) without owning a reference on the socket. This can cause use-after-free as syzbot found [1] Jakub Sitnicki already took care of a similar issue in sock_hash_free() in commit 75e68e5b ("bpf, sockhash: Synchronize delete from bucket list on map free") [1] refcount_t: decrement hit 0; leaking memory. WARNING: CPU: 0 PID: 3785 at lib/refcount.c:31 refcount_warn_saturate+0x17c/0x1a0 lib/refcount.c:31 Modules linked in: CPU: 0 PID: 3785 Comm: kworker/u4:6 Not tainted 6.1.0-rc7-syzkaller-00103-gef4d3ea4 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 Workqueue: events_unbound bpf_map_free_deferred RIP: 0010:refcount_warn_saturate+0x17c/0x1a0 lib/refcount.c:31 Code: 68 8b 31 c0 e8 75 71 15 fd 0f 0b e9 64 ff ff ff e8 d9 6e 4e fd c6 05 62 9c 3d 0a 01 48 c7 c7 80 bb 68 8b 31 c0 e8 54 71 15 fd <0f> 0b e9 43 ff ff ff 89 d9 80 e1 07 80 c1 03 38 c1 0f 8c a2 fe ff RSP: 0018:ffffc9000456fb60 EFLAGS: 00010246 RAX: eae59bab72dcd700 RBX: 0000000000000004 RCX: ffff8880207057c0 RDX: 0000000000000000 RSI: 0000000000000201 RDI: 0000000000000000 RBP: 0000000000000004 R08: ffffffff816fdabd R09: fffff520008adee5 R10: fffff520008adee5 R11: 1ffff920008adee4 R12: 0000000000000004 R13: dffffc0000000000 R14: ffff88807b1c6c00 R15: 1ffff1100f638dcf FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b30c30000 CR3: 000000000d08e000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __refcount_dec include/linux/refcount.h:344 [inline] refcount_dec include/linux/refcount.h:359 [inline] __sock_put include/net/sock.h:779 [inline] tcp_release_cb+0x2d0/0x360 net/ipv4/tcp_output.c:1092 release_sock+0xaf/0x1c0 net/core/sock.c:3468 sock_map_free+0x219/0x2c0 net/core/sock_map.c:356 process_one_work+0x81c/0xd10 kernel/workqueue.c:2289 worker_thread+0xb14/0x1330 kernel/workqueue.c:2436 kthread+0x266/0x300 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 </TASK> Fixes: 7e81a353 ("bpf: Sockmap, ensure sock lock held during tear down") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Jakub Sitnicki <jakub@cloudflare.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Song Liu <songliubraving@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20221202111640.2745533-1-edumazet@google.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Toke Høiland-Jørgensen authored
The bpf_ct_set_nat_info() kfunc is defined in the nf_nat.ko module, and takes as a parameter the nf_conn___init struct, which is allocated through the bpf_xdp_ct_alloc() helper defined in the nf_conntrack.ko module. However, because kernel modules can't deduplicate BTF types between each other, and the nf_conn___init struct is not referenced anywhere in vmlinux BTF, this leads to two distinct BTF IDs for the same type (one in each module). This confuses the verifier, as described here: https://lore.kernel.org/all/87leoh372s.fsf@toke.dk/ As a workaround, add an explicit BTF_TYPE_EMIT for the type in net/filter.c, so the type definition gets included in vmlinux BTF. This way, both modules can refer to the same type ID (as they both build on top of vmlinux BTF), and the verifier is no longer confused. v2: - Use BTF_TYPE_EMIT (which is a statement so it has to be inside a function definition; use xdp_func_proto() for this, since this is mostly xdp-related). Fixes: 820dc052 ("net: netfilter: move bpf_ct_set_nat_info kfunc in nf_nat_bpf.c") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20221201123939.696558-1-toke@redhat.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Yonghong Song authored
Add three tests for cgrp local storage support for sleepable progs. Two tests can load and run properly, one for cgroup_iter, another for passing current->cgroups->dfl_cgrp to bpf_cgrp_storage_get() helper. One test has bpf_rcu_read_lock() and failed to load. Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221201050449.2785613-1-yhs@fb.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Yonghong Song authored
Similar to sk/inode/task local storage, enable sleepable support for cgrp local storage. Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221201050444.2785007-1-yhs@fb.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Sreevani Sreejith authored
Document that describes how BPF iterators work, how to use iterators, and how to pass parameters in BPF iterators. Acked-by: David Vernet <void@manifault.com> Signed-off-by: Sreevani Sreejith <psreep@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221202221710.320810-2-ssreevani@meta.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 04 Dec, 2022 5 commits
-
-
Yonghong Song authored
Martin mentioned that the verifier cannot assume arguments from LSM hook sk_alloc_security being trusted since after the hook is called, the sk ref_count is set to 1. This will overwrite the ref_count changed by the bpf program and may cause ref_count underflow later on. I then further checked some other hooks. For example, for bpf_lsm_file_alloc() hook in fs/file_table.c, f->f_cred = get_cred(cred); error = security_file_alloc(f); if (unlikely(error)) { file_free_rcu(&f->f_rcuhead); return ERR_PTR(error); } atomic_long_set(&f->f_count, 1); The input parameter 'f' to security_file_alloc() cannot be trusted as well. Specifically, I investiaged bpf_map/bpf_prog/file/sk/task alloc/free lsm hooks. Except bpf_map_alloc and task_alloc, arguments for all other hooks should not be considered as trusted. This may not be a complete list, but it covers common usage for sk and task. Fixes: 3f00c523 ("bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs") Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221203204954.2043348-1-yhs@fb.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Alexei Starovoitov authored
Yonghong Song says: ==================== Patch set [1] added rcu support for bpf programs. In [1], a rcu pointer is considered to be trusted and not null. This is actually not true in some cases. The rcu pointer could be null, and for non-null rcu pointer, it may have reference count of 0. This small patch set fixed this problem. Patch 1 is the kernel fix. Patch 2 adjusted selftests properly. Patch 3 added documentation for newly-introduced KF_RCU flag. [1] https://lore.kernel.org/all/20221124053201.2372298-1-yhs@fb.com/ Changelogs: v1 -> v2: - rcu ptr could be NULL. - non_null_rcu_ptr->rcu_field can be marked as MEM_RCU as well. - Adjust the code to avoid existing error message change. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Yonghong Song authored
Add proper KF_RCU documentation in kfuncs.rst. Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221203184613.478967-1-yhs@fb.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Yonghong Song authored
Add MEM_RCU pointer null checking for related tests. Also modified task_acquire test so it takes a rcu ptr 'ptr' where 'ptr = rcu_ptr->rcu_field'. Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221203184607.478314-1-yhs@fb.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Yonghong Song authored
Commit 9bb00b28 ("bpf: Add kfunc bpf_rcu_read_lock/unlock()") introduced MEM_RCU and bpf_rcu_read_lock/unlock() support. In that commit, a rcu pointer is tagged with both MEM_RCU and PTR_TRUSTED so that it can be passed into kfuncs or helpers as an argument. Martin raised a good question in [1] such that the rcu pointer, although being able to accessing the object, might have reference count of 0. This might cause a problem if the rcu pointer is passed to a kfunc which expects trusted arguments where ref count should be greater than 0. This patch makes the following changes related to MEM_RCU pointer: - MEM_RCU pointer might be NULL (PTR_MAYBE_NULL). - Introduce KF_RCU so MEM_RCU ptr can be acquired with a KF_RCU tagged kfunc which assumes ref count of rcu ptr could be zero. - For mem access 'b = ptr->a', say 'ptr' is a MEM_RCU ptr, and 'a' is tagged with __rcu as well. Let us mark 'b' as MEM_RCU | PTR_MAYBE_NULL. [1] https://lore.kernel.org/bpf/ac70f574-4023-664e-b711-e0d3b18117fd@linux.dev/ Fixes: 9bb00b28 ("bpf: Add kfunc bpf_rcu_read_lock/unlock()") Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221203184602.477272-1-yhs@fb.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 03 Dec, 2022 2 commits
-
-
Xin Liu authored
Current libbpf Makefile does not contain the help command, which is inconvenient to use. Similar to the Makefile help command of the perf, a help command is provided to list the commands supported by libbpf make and the functions of the commands. Signed-off-by: Xin Liu <liuxin350@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20221202081738.128513-1-liuxin350@huawei.com
-
James Hilliard authored
The bpf_legacy.h header uses llvm specific load functions, add GCC compatible variants as well to fix tests using these functions under GCC. Signed-off-by: James Hilliard <james.hilliard1@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20221201190939.3230513-1-james.hilliard1@gmail.com
-
- 02 Dec, 2022 3 commits
-
-
Zheng Yejian authored
Refer to description of BPF_XOR, dst_reg should be used but not src_reg in the examples. Fixes: be3193cd ("bpf, docs: Add subsections for ALU and JMP instructions") Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20221129134558.2757043-1-zhengyejian1@huawei.com
-
Dave Marchevsky authored
Modify list_push_pop_multiple to alloc and insert nodes 2-at-a-time. Without the previous patch's fix, this block of code: bpf_spin_lock(lock); bpf_list_push_front(head, &f[i]->node); bpf_list_push_front(head, &f[i + 1]->node); bpf_spin_unlock(lock); would fail check_reference_leak check as release_on_unlock logic would miss a ref that should've been released. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> cc: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221201183406.1203621-2-davemarchevsky@fb.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
Dave Marchevsky authored
Consider a verifier state with three acquired references, all with release_on_unlock = true: idx 0 1 2 state->refs = [2 4 6] (with 2, 4, and 6 being the ref ids). When bpf_spin_unlock is called, process_spin_lock will loop through all acquired_refs and, for each ref, if it's release_on_unlock, calls release_reference on it. That function in turn calls release_reference_state, which removes the reference from state->refs by swapping the reference state with the last reference state in refs array and decrements acquired_refs count. process_spin_lock's loop logic, which is essentially: for (i = 0; i < state->acquired_refs; i++) { if (!state->refs[i].release_on_unlock) continue; release_reference(state->refs[i].id); } will fail to release release_on_unlock references which are swapped from the end. Running this logic on our example demonstrates: state->refs = [2 4 6] (start of idx=0 iter) release state->refs[0] by swapping w/ state->refs[2] state->refs = [6 4] (start of idx=1) release state->refs[1], no need to swap as it's the last idx state->refs = [6] (start of idx=2, loop terminates) ref_id 6 should have been removed but was skipped. Fix this by looping from back-to-front, which results in refs that are candidates for removal being swapped with refs which have already been examined and kept. If we modify our initial example such that ref 6 is replaced with ref 7, which is _not_ release_on_unlock, and loop from the back, we'd see: state->refs = [2 4 7] (start of idx=2) state->refs = [2 4 7] (start of idx=1) state->refs = [2 7] (start of idx=0, refs 7 and 4 swapped) state->refs = [7] (after idx=0, 7 and 2 swapped, loop terminates) Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Acked-by: Yonghong Song <yhs@fb.com> cc: Kumar Kartikeya Dwivedi <memxor@gmail.com> Fixes: 534e86bc ("bpf: Add 'release on unlock' logic for bpf_list_push_{front,back}") Link: https://lore.kernel.org/r/20221201183406.1203621-1-davemarchevsky@fb.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 01 Dec, 2022 5 commits
-
-
Yonghong Song authored
When building the kernel with clang lto (CONFIG_LTO_CLANG_FULL=y), the following compilation error will appear: $ make LLVM=1 LLVM_IAS=1 -j ... ld.lld: error: ld-temp.o <inline asm>:26889:1: symbol 'cgroup_storage_map_btf_ids' is already defined cgroup_storage_map_btf_ids:; ^ make[1]: *** [/.../bpf-next/scripts/Makefile.vmlinux_o:61: vmlinux.o] Error 1 In local_storage.c, we have BTF_ID_LIST_SINGLE(cgroup_storage_map_btf_ids, struct, bpf_local_storage_map) Commit c4bcfb38 ("bpf: Implement cgroup storage available to non-cgroup-attached bpf progs") added the above identical BTF_ID_LIST_SINGLE definition in bpf_cgrp_storage.c. With duplicated definitions, llvm linker complains with lto build. Also, extracting btf_id of 'struct bpf_local_storage_map' is defined four times for sk, inode, task and cgrp local storages. Let us define a single global one with a different name than cgroup_storage_map_btf_ids, which also fixed the lto compilation error. Fixes: c4bcfb38 ("bpf: Implement cgroup storage available to non-cgroup-attached bpf progs") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20221130052147.1591625-1-yhs@fb.com
-
Pengcheng Yang authored
Currently, the ingress redirect is not covered in "txmsg test apply". Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/1669718441-2654-5-git-send-email-yangpc@wangsu.com
-
Pengcheng Yang authored
Use apply_bytes on ingress redirect, when apply_bytes is less than the length of msg data, some data may be skipped and lost in bpf_tcp_ingress(). If there is still data in the scatterlist that has not been consumed, we cannot move the msg iter. Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/1669718441-2654-4-git-send-email-yangpc@wangsu.com
-
Pengcheng Yang authored
When redirecting, we use sk_msg_to_ingress() to get the BPF_F_INGRESS flag from the msg->flags. If apply_bytes is used and it is larger than the current data being processed, sk_psock_msg_verdict() will not be called when sendmsg() is called again. At this time, the msg->flags is 0, and we lost the BPF_F_INGRESS flag. So we need to save the BPF_F_INGRESS flag in sk_psock and use it when redirection. Fixes: 8934ce2f ("bpf: sockmap redirect ingress support") Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/1669718441-2654-3-git-send-email-yangpc@wangsu.com
-
Pengcheng Yang authored
In tcp_bpf_send_verdict() redirection, the eval variable is assigned to __SK_REDIRECT after the apply_bytes data is sent, if msg has more_data, sock_put() will be called multiple times. We should reset the eval variable to __SK_NONE every time more_data starts. This causes: IPv4: Attempt to release TCP socket in state 1 00000000b4c925d7 ------------[ cut here ]------------ refcount_t: addition on 0; use-after-free. WARNING: CPU: 5 PID: 4482 at lib/refcount.c:25 refcount_warn_saturate+0x7d/0x110 Modules linked in: CPU: 5 PID: 4482 Comm: sockhash_bypass Kdump: loaded Not tainted 6.0.0 #1 Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014 Call Trace: <TASK> __tcp_transmit_skb+0xa1b/0xb90 ? __alloc_skb+0x8c/0x1a0 ? __kmalloc_node_track_caller+0x184/0x320 tcp_write_xmit+0x22a/0x1110 __tcp_push_pending_frames+0x32/0xf0 do_tcp_sendpages+0x62d/0x640 tcp_bpf_push+0xae/0x2c0 tcp_bpf_sendmsg_redir+0x260/0x410 ? preempt_count_add+0x70/0xa0 tcp_bpf_send_verdict+0x386/0x4b0 tcp_bpf_sendmsg+0x21b/0x3b0 sock_sendmsg+0x58/0x70 __sys_sendto+0xfa/0x170 ? xfd_validate_state+0x1d/0x80 ? switch_fpu_return+0x59/0xe0 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x37/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: cd9733f5 ("tcp_bpf: Fix one concurrency problem in the tcp_bpf_send_verdict function") Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/1669718441-2654-2-git-send-email-yangpc@wangsu.com
-