- 19 Aug, 2021 7 commits
-
-
Prankur Gupta authored
Add selftests for the newly added functionality to call bpf_setsockopt() and bpf_getsockopt() from setsockopt BPF programs.

Test details:

1. BPF program
   - Checks for changes to IPV6_TCLASS (SOL_IPV6) via setsockopt.
   - If the congestion control algorithm (cca) for the socket is not cubic, does nothing.
   - If the newly set value for IPV6_TCLASS is 45 (0x2d) (as per our use case), changes the cc from cubic to reno.

2. User space program
   - Creates an AF_INET6 socket and sets its cca to "cubic".
   - Attaches the program and sets IPV6_TCLASS to 0x2d using setsockopt.
   - Verifies that the cca for the socket changed to reno.

Signed-off-by: Prankur Gupta <prankgup@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210817224221.3257826-3-prankgup@fb.com
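A minimal user space sketch of the flow in point 2, assuming only standard socket APIs (error handling omitted):

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <string.h>
    #include <sys/socket.h>

    int run_flow(void)
    {
        char cc[16] = "cubic";
        socklen_t len = sizeof(cc);
        int tclass = 0x2d; /* 45, the value the BPF program reacts to */
        int fd = socket(AF_INET6, SOCK_STREAM, 0);

        /* start from cubic so the BPF program's cca check passes */
        setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, cc, strlen(cc));
        /* the attached cgroup/setsockopt program should flip cc to reno */
        setsockopt(fd, IPPROTO_IPV6, IPV6_TCLASS, &tclass, sizeof(tclass));
        getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, cc, &len);
        return strcmp(cc, "reno"); /* 0 on success */
    }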
-
Prankur Gupta authored
Add logic to call bpf_setsockopt() and bpf_getsockopt() from setsockopt BPF programs. An example use case: when the user sets the IPV6_TCLASS socket option, we would also like to change the tcp-cc for that socket. We don't have any use case for calling bpf_setsockopt() from the supposedly read-only sys_getsockopt(), so it is made available to BPF_CGROUP_SETSOCKOPT only at this point. Signed-off-by: Prankur Gupta <prankgup@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210817224221.3257826-2-prankgup@fb.com
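For illustration, a hedged sketch of such a setsockopt program (constants inlined; the program name and the crude cubic check are simplifying assumptions, not the selftest source):

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    #define SOL_IPV6        41
    #define IPV6_TCLASS     67
    #define SOL_TCP         6
    #define TCP_CONGESTION  13

    SEC("cgroup/setsockopt")
    int qos_to_cc(struct bpf_sockopt *ctx)
    {
        int *optval = ctx->optval;
        char cc[16];
        char reno[16] = "reno";

        if (ctx->level != SOL_IPV6 || ctx->optname != IPV6_TCLASS)
            return 1; /* not ours: let the kernel handle the option */

        if ((void *)(optval + 1) > ctx->optval_end)
            return 0; /* verifier-mandated bounds check */

        if (bpf_getsockopt(ctx->sk, SOL_TCP, TCP_CONGESTION, cc, sizeof(cc)))
            return 0;

        /* crude "current cc is cubic" check, then react to tclass 0x2d */
        if (cc[0] == 'c' && cc[1] == 'u' && *optval == 0x2d)
            bpf_setsockopt(ctx->sk, SOL_TCP, TCP_CONGESTION,
                           reno, sizeof(reno));
        return 1;
    }

    char _license[] SEC("license") = "GPL";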
-
Stanislav Fomichev authored
Same as previous patch but for the keys. memdup_bpfptr is renamed to kvmemdup_bpfptr (and converted to kvmalloc). Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210818235216.1159202-2-sdf@google.com
-
Stanislav Fomichev authored
Use kvmalloc/kvfree for temporary value when manipulating a map via syscall. kmalloc might not be sufficient for percpu maps where the value is big (and further multiplied by hundreds of CPUs). Can be reproduced with netcnt test on qemu with "-smp 255". Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210818235216.1159202-1-sdf@google.com
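Roughly, the pattern being converted (a sketch of the syscall-path idiom, not the verbatim diff):

    /* a per-cpu value's temporary buffer scales with the CPU count,
     * e.g. a 64KB per-cpu value on 255 CPUs needs ~16MB */
    value_size = round_up(map->value_size, 8) * num_possible_cpus();
    value = kvmalloc(value_size, GFP_USER | __GFP_NOWARN);
    if (!value)
        return -ENOMEM;
    /* ... copy the value in/out ... */
    kvfree(value);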
-
Yucong Sun authored
This patch adds a 1ms delay to reduce flakiness of the test. Signed-off-by: Yucong Sun <fallentree@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210819163609.2583758-1-fallentree@fb.com
-
Yonghong Song authored
Andrii reported that libbpf CI hit the following oops when running the selftest send_signal:

  [ 1243.160719] BUG: kernel NULL pointer dereference, address: 0000000000000030
  [ 1243.161066] #PF: supervisor read access in kernel mode
  [ 1243.161066] #PF: error_code(0x0000) - not-present page
  [ 1243.161066] PGD 0 P4D 0
  [ 1243.161066] Oops: 0000 [#1] PREEMPT SMP NOPTI
  [ 1243.161066] CPU: 1 PID: 882 Comm: new_name Tainted: G O 5.14.0-rc5 #1
  [ 1243.161066] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
  [ 1243.161066] RIP: 0010:bpf_overflow_handler+0x9a/0x1e0
  [ 1243.161066] Code: 5a 84 c0 0f 84 06 01 00 00 be 66 02 00 00 48 c7 c7 6d 96 07 82 48 8b ab 18 05 00 00 e8 df 55 eb ff 66 90 48 8d 75 48 48 89 e7 <ff> 55 30 41 89 c4 e8 fb c1 f0 ff 84 c0 0f 84 94 00 00 00 e8 6e 0f
  [ 1243.161066] RSP: 0018:ffffc900000c0d80 EFLAGS: 00000046
  [ 1243.161066] RAX: 0000000000000002 RBX: ffff8881002e0dd0 RCX: 00000000b4b47cf8
  [ 1243.161066] RDX: ffffffff811dcb06 RSI: 0000000000000048 RDI: ffffc900000c0d80
  [ 1243.161066] RBP: 0000000000000000 R08: 0000000000000000 R09: 1a9d56bb00000000
  [ 1243.161066] R10: 0000000000000001 R11: 0000000000080000 R12: 0000000000000000
  [ 1243.161066] R13: ffffc900000c0e00 R14: ffffc900001c3c68 R15: 0000000000000082
  [ 1243.161066] FS: 00007fc0be2d3380(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
  [ 1243.161066] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 1243.161066] CR2: 0000000000000030 CR3: 0000000104f8e000 CR4: 00000000000006e0
  [ 1243.161066] Call Trace:
  [ 1243.161066] <IRQ>
  [ 1243.161066] __perf_event_overflow+0x4f/0xf0
  [ 1243.161066] perf_swevent_hrtimer+0x116/0x130
  [ 1243.161066] ? __lock_acquire+0x378/0x2730
  [ 1243.161066] ? __lock_acquire+0x372/0x2730
  [ 1243.161066] ? lock_is_held_type+0xd5/0x130
  [ 1243.161066] ? find_held_lock+0x2b/0x80
  [ 1243.161066] ? lock_is_held_type+0xd5/0x130
  [ 1243.161066] ? perf_event_groups_first+0x80/0x80
  [ 1243.161066] ? perf_event_groups_first+0x80/0x80
  [ 1243.161066] __hrtimer_run_queues+0x1a3/0x460
  [ 1243.161066] hrtimer_interrupt+0x110/0x220
  [ 1243.161066] __sysvec_apic_timer_interrupt+0x8a/0x260
  [ 1243.161066] sysvec_apic_timer_interrupt+0x89/0xc0
  [ 1243.161066] </IRQ>
  [ 1243.161066] asm_sysvec_apic_timer_interrupt+0x12/0x20
  [ 1243.161066] RIP: 0010:finish_task_switch+0xaf/0x250
  [ 1243.161066] Code: 31 f6 68 90 2a 09 81 49 8d 7c 24 18 e8 aa d6 03 00 4c 89 e7 e8 12 ff ff ff 4c 89 e7 e8 ca 9c 80 00 e8 35 af 0d 00 fb 4d 85 f6 <58> 74 1d 65 48 8b 04 25 c0 6d 01 00 4c 3b b0 a0 04 00 00 74 37 f0
  [ 1243.161066] RSP: 0018:ffffc900001c3d18 EFLAGS: 00000282
  [ 1243.161066] RAX: 000000000000031f RBX: ffff888104cf4980 RCX: 0000000000000000
  [ 1243.161066] RDX: 0000000000000000 RSI: ffffffff82095460 RDI: ffffffff820adc4e
  [ 1243.161066] RBP: ffffc900001c3d58 R08: 0000000000000001 R09: 0000000000000001
  [ 1243.161066] R10: 0000000000000001 R11: 0000000000080000 R12: ffff88813bd2bc80
  [ 1243.161066] R13: ffff8881002e8000 R14: ffff88810022ad80 R15: 0000000000000000
  [ 1243.161066] ? finish_task_switch+0xab/0x250
  [ 1243.161066] ? finish_task_switch+0x70/0x250
  [ 1243.161066] __schedule+0x36b/0xbb0
  [ 1243.161066] ? _raw_spin_unlock_irqrestore+0x2d/0x50
  [ 1243.161066] ? lockdep_hardirqs_on+0x79/0x100
  [ 1243.161066] schedule+0x43/0xe0
  [ 1243.161066] pipe_read+0x30b/0x450
  [ 1243.161066] ? wait_woken+0x80/0x80
  [ 1243.161066] new_sync_read+0x164/0x170
  [ 1243.161066] vfs_read+0x122/0x1b0
  [ 1243.161066] ksys_read+0x93/0xd0
  [ 1243.161066] do_syscall_64+0x35/0x80
  [ 1243.161066] entry_SYSCALL_64_after_hwframe+0x44/0xae

The oops can also be reproduced with the following steps:

  ./vmtest.sh -s
  # at the qemu shell:
  cd /root/bpf && while true; do ./test_progs -t send_signal; done

Further analysis showed that the failure is introduced with commit b89fbfbb ("bpf: Implement minimal BPF perf link"). With the above commit, the following scenario becomes possible:

  cpu1                                    cpu2
  hrtimer_interrupt
    -> bpf_overflow_handler
                                          (due to closing link_fd)
                                          bpf_perf_link_release
                                          -> perf_event_free_bpf_prog
                                          -> perf_event_free_bpf_handler
                                          -> WRITE_ONCE(event->overflow_handler,
                                                        event->orig_overflow_handler)
                                             event->prog = NULL
    bpf_prog_run(event->prog, &ctx)

In the above case, event->prog is NULL when bpf_prog_run() is reached, hence the oops. To fix the issue, check whether event->prog is NULL or not; if it is, do not call bpf_prog_run(). This appears to work: with the fix, the above reproducer ran for more than one hour without any failures.

Fixes: b89fbfbb ("bpf: Implement minimal BPF perf link") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210819155209.1927994-1-yhs@fb.com
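Schematically, the fix (a sketch of the idea, not the verbatim diff):

    /* in bpf_overflow_handler(): read event->prog once and skip the
     * run if the program has already been detached on another CPU */
    prog = READ_ONCE(event->prog);
    if (prog)
        ret = bpf_prog_run(prog, &ctx);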
-
Daniel Borkmann authored
The BPF interpreter as well as the x86-64 BPF JIT were both in line, allowing up to 33 tail calls (however odd that number may be!). Recently, this was changed for the interpreter to reduce it down to 32, with the assumption that this should have been the actual limit, "which is in line with the behavior of the x86 JITs", according to b61a28cf ("bpf: Fix off-by-one in tail call count limiting"). Paul recently reported:

  I'm a bit surprised by this because I had previously tested the tail call limit of several JIT compilers and found it to be 33 (i.e., allowing chains of up to 34 programs). I've just extended a test program I had to validate this again on the x86-64 JIT, and found a limit of 33 tail calls again [1].

  Also note we had previously changed the RISC-V and MIPS JITs to allow up to 33 tail calls [2, 3], for consistency with other JITs and with the interpreter. We had decided to increase these two to 33 rather than decrease the other JITs to 32 for backward compatibility, though that probably doesn't matter much as I'd expect few people to actually use 33 tail calls.

  [1] https://github.com/pchaigno/tail-call-bench/commit/ae7887482985b4b1745c9b2ef7ff9ae506c82886
  [2] 96bc4432 ("bpf, riscv: Limit to 33 tail calls")
  [3] e49e6f6d ("bpf, mips: Limit to 33 tail calls")

Therefore, revert b61a28cf to re-align the interpreter to a maximum of 33 tail calls. While the vast majority of programs are unlikely to hit the limit, programs in the wild could one way or another depend on this, so let's rather be a bit more conservative and align the small remainder of JITs to 33. If needed in future, this limit could be slightly increased, but not decreased. Fixes: b61a28cf ("bpf: Fix off-by-one in tail call count limiting") Reported-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/CAO5pjwTWrC0_dzTbTHFPSqDwA56aVH+4KFGVqdq8=ASs0MqZGQ@mail.gmail.com
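Schematically, the interpreter guard after the revert (a sketch; MAX_TAIL_CALL_CNT was 32 at the time):

    /* kernel/bpf/core.c, JMP_TAIL_CALL handling (sketch) */
    if (unlikely(tail_call_cnt > MAX_TAIL_CALL_CNT)) /* '>' again, not '>=' */
        goto out;
    tail_call_cnt++; /* passes for cnt 0..32, i.e. 33 tail calls in a chain of 34 programs */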
-
- 18 Aug, 2021 3 commits
-
-
Xu Liu authored
Add test to use get_netns_cookie() from BPF_PROG_TYPE_SOCK_OPS. Signed-off-by: Xu Liu <liuxu623@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210818105820.91894-3-liuxu623@gmail.com
-
Xu Liu authored
We'd like to be able to identify the netns from sockops hooks to accelerate local communication between processes in different netns. Signed-off-by: Xu Liu <liuxu623@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210818105820.91894-2-liuxu623@gmail.com
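A sketch of how a sockops program can use the new helper (the policy part is an illustrative assumption):

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    SEC("sockops")
    int sockops_netns(struct bpf_sock_ops *skops)
    {
        __u64 cookie = bpf_get_netns_cookie(skops);

        /* e.g. key per-netns state or acceleration policy by cookie */
        bpf_printk("netns cookie %llu, op %d", cookie, skops->op);
        return 1;
    }

    char _license[] SEC("license") = "GPL";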
-
Grant Seltzer authored
This patch renames the documentation file libbpf.rst to index.rst so that readthedocs.org picks this file up and properly builds the documentation site. It also changes the title type of the ABI subsection in the naming convention doc so that readthedocs.org doesn't treat this section as a separate document. Signed-off-by: Grant Seltzer <grantseltzer@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210818151313.49992-1-grantseltzer@gmail.com
-
- 17 Aug, 2021 19 commits
-
-
Colin Ian King authored
The variable allow is being initialized with a value that is never read; it is updated later on. The assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210817170842.495440-1-colin.king@canonical.com
-
Andrii Nakryiko authored
Yonghong Song says: ==================== The bpf selftest send_signal() is flaky in its subtests trying to send signals in softirq/nmi context. To reduce flakiness, the signal-targeted process priority is boosted, which should minimize preemption of that process and improve the possibility that the underlying task in softirq/nmi context is the task that bpf_send_signal() wants to signal. Patch #1 did a refactoring to use ASSERT_* instead of the old CHECK macros. Patch #2 did the actual change of boosting priority. Changelog: v1 -> v2: remove the skip logic where the underlying task in interrupt context is not the intended one. ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
-
Yonghong Song authored
libbpf CI has reported that the send_signal test is flaky, although I am not able to reproduce it in my local environment. But I am able to reproduce it with on-demand libbpf CI ([1]). Through code analysis, the following is a possible reason: the failing subtest runs a bpf program in softirq environment, and bpf_send_signal() only sends to a fork of the "test_progs" process. If the underlying current task is not "test_progs", bpf_send_signal() will not be triggered and the subtest will fail. To reduce the chances that the underlying process is not the intended one, this patch boosts scheduling priority to -20 (the highest allowed by the setpriority() call). I did 10 runs with on-demand libbpf CI with this patch and didn't observe any failures. [1] https://github.com/libbpf/libbpf/actions/workflows/ondemand.yml Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210817190923.3186725-1-yhs@fb.com
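The boost itself is a one-liner in the test setup; roughly (a sketch):

    #include <sys/resource.h>

    /* make the signal-targeted process the task most likely running
     * when the softirq/NMI fires */
    if (setpriority(PRIO_PROCESS, 0 /* self */, -20))
        perror("setpriority"); /* needs CAP_SYS_NICE; selftests run as root */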
-
Yonghong Song authored
Replace CHECK in send_signal.c with ASSERT_* macros, as ASSERT_* macros are generally preferred. There is no functionality change. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210817190918.3186400-1-yhs@fb.com
-
Andrii Nakryiko authored
Yucong Sun says: ==================== This short series adds two new switches to test_progs, "-a" and "-d", adding support for exact string matching as well as '*' wildcards. It also cleans up the output to make it possible to generate an allowlist/denylist using common CLI tools. ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
-
Yucong Sun authored
This patch adds '-a' and '-d' arguments supporting exact string match as well as '*' wildcards in test/subtest selection. '-a' and '-t' can co-exist, same as '-d' and '-b', in which case they just add to the list of allowed or denied test selectors. Caveat: same as the current substring matching mechanism, test and subtest selectors apply independently: 'a*/b*' will execute all tests matching "a*" with subtest names matching "b*", but tests matching "a*" that have no subtests will also be executed. Signed-off-by: Yucong Sun <fallentree@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210817044732.3263066-5-fallentree@fb.com
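A simple recursive matcher along these lines covers exact names plus '*' wildcards (a sketch, not the test_progs source):

    #include <stdbool.h>

    /* '*' matches any (possibly empty) substring; everything else is exact */
    static bool glob_match(const char *name, const char *pat)
    {
        while (*pat && *pat != '*') {
            if (*pat != *name)
                return false;
            pat++;
            name++;
        }
        if (!*pat)
            return !*name;
        /* '*': try matching the remainder of pat at every tail of name */
        for (;; name++) {
            if (glob_match(name, pat + 1))
                return true;
            if (!*name)
                return false;
        }
    }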
-
Yucong Sun authored
This patch adds the test name to the subtest status message line, making it possible to grep ':OK' in the output to generate a list of passed test+subtest names, which can be processed into the argument list used with "-a"/"-d" exact string matching. Example:

  #1/1 align/mov:OK
  ..
  #1/12 align/pointer variable subtraction:OK
  #1 align:OK

Signed-off-by: Yucong Sun <fallentree@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210817044732.3263066-4-fallentree@fb.com
-
Yucong Sun authored
In skip_account(), test->skip_cnt is set to 0 at the end, which makes the next print statement never display SKIP status for the subtest. This patch moves the accounting logic after the print statement, fixing the issue. It also adds SKIP status display for normal tests. Signed-off-by: Yucong Sun <fallentree@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210817044732.3263066-3-fallentree@fb.com
-
Yucong Sun authored
When using "-l", test_progs often is executed as non-root user, load_bpf_testmod() will fail and output errors. This patch skips loading bpf testmod when "-l" is specified, making output cleaner. Signed-off-by: Yucong Sun <fallentree@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210817044732.3263066-2-fallentree@fb.com
-
Yucong Sun authored
Using a fixed delay of 1 microsecond has proven flaky in slow CPU environments, e.g. the GitHub Actions CI system. This patch adds exponential backoff with a cap of 50ms to reduce the flakiness of the test. The initial delay is chosen at random in the range [0ms, 5ms). Signed-off-by: Yucong Sun <fallentree@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210817045713.3307985-1-fallentree@fb.com
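The retry loop then looks roughly like this (a sketch of the pattern with the commit's constants; ready() stands in for the test's actual completion check):

    #include <stdlib.h>
    #include <unistd.h>

    useconds_t delay = rand() % 5000; /* initial delay in [0ms, 5ms) */

    while (!ready()) {
        usleep(delay);
        delay = delay * 2 + 1; /* exponential growth; +1 so a 0 start still grows */
        if (delay > 50000)
            delay = 50000;     /* 50ms cap */
    }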
-
Yucong Sun authored
Using a fixed delay of 1 microsecond has proven flaky in slow CPU environments, e.g. the GitHub Actions CI system. This patch adds exponential backoff with a cap of 50ms to reduce the flakiness of the test. The initial delay is chosen at random in the range [0ms, 5ms). Signed-off-by: Yucong Sun <fallentree@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210816175250.296110-1-fallentree@fb.com
-
Andrii Nakryiko authored
Jiang Wang says: ==================== This patch series adds support for the unix stream type in sockmap. Sockmap already supports TCP, UDP, and unix dgram types. The unix stream support is similar to unix dgram. Also add selftests for the unix stream type in the sockmap tests. ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
-
Jiang Wang authored
Add two new test cases in sockmap tests, where unix stream is redirected to tcp and vice versa. Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210816190327.2739291-6-jiang.wang@bytedance.com
-
Jiang Wang authored
This is to prepare for adding new unix stream tests. Mostly renames; also pass the socket type as an argument. Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210816190327.2739291-5-jiang.wang@bytedance.com
-
Jiang Wang authored
Add two tests for unix stream to unix stream redirection in sockmap tests. Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210816190327.2739291-4-jiang.wang@bytedance.com
-
Jiang Wang authored
Previously, sockmap for the AF_UNIX protocol only supported the dgram type. This patch adds unix stream type support, which is similar to unix_dgram_proto. To support sockmap, dgram and stream cannot share the same unix_proto anymore, because they have different implementations, such as unhash for the stream type (which will remove closed or disconnected sockets from the map), so rename unix_proto to unix_dgram_proto and add a new unix_stream_proto. Also implement stream-related sockmap functions, and add the dgram keyword to the dgram-specific functions. Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210816190327.2739291-3-jiang.wang@bytedance.com
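Schematically, the split looks like this (the exact field sets in af_unix.c are richer; treat these initializers as illustrative assumptions):

    /* was: a single "unix_proto" shared by all AF_UNIX socket types */
    struct proto unix_dgram_proto = {
        .name     = "UNIX",
        .owner    = THIS_MODULE,
        .obj_size = sizeof(struct unix_sock),
    #ifdef CONFIG_BPF_SYSCALL
        .psock_update_sk_prot = unix_dgram_bpf_update_proto,
    #endif
    };

    struct proto unix_stream_proto = {
        .name     = "UNIX-STREAM",
        .owner    = THIS_MODULE,
        .obj_size = sizeof(struct unix_sock),
        /* stream sockets need unhash so closed/disconnected sockets
         * get removed from the sockmap */
        .unhash   = unix_unhash,
    #ifdef CONFIG_BPF_SYSCALL
        .psock_update_sk_prot = unix_stream_bpf_update_proto,
    #endif
    };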
-
Jiang Wang authored
To support sockmap for af_unix stream type, implement read_sock, which is similar to the read_sock for unix dgram sockets. Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210816190327.2739291-2-jiang.wang@bytedance.com
-
Hengqi Chen authored
Add a test for the btf__load_vmlinux_btf/btf__load_module_btf APIs. The test loads the bpf_testmod module BTF and checks the existence of a symbol which is known to exist. Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210815081035.205879-1-hengqi.chen@gmail.com
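Usage of the two APIs under test, roughly (the module symbol below is a placeholder for whatever the test actually checks):

    #include <stdio.h>
    #include <bpf/btf.h>

    struct btf *vmlinux_btf = btf__load_vmlinux_btf();
    struct btf *module_btf = btf__load_module_btf("bpf_testmod", vmlinux_btf);

    if (btf__find_by_name(module_btf, "bpf_testmod_test_read") < 0) /* placeholder symbol */
        fprintf(stderr, "symbol not found in module BTF\n");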
-
grantseltzer authored
This removes the libbpf_api.rst file from the kernel documentation. The intention for this file was to pull documentation from comments above API functions in libbpf. However, due to limitations of the kernel documentation system, this API documentation could not be versioned, which is counterintuitive to how users expect to use it. There are also currently no doc comments, making this a blank page. Once the kernel comment documentation is actually contributed, it will still exist in the kernel repository, just in the code itself. A separate site is being spun up to generate documentation from those comments in a way in which it can be versioned properly. This also reconfigures the bpf documentation index page to make it easier to sync to the previously mentioned documentation site. Signed-off-by: Grant Seltzer <grantseltzer@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210810020508.280639-1-grantseltzer@gmail.com
-
- 16 Aug, 2021 11 commits
-
-
Daniel Borkmann authored
Andrii Nakryiko says:
====================
This patch set implements an ability for users to specify a custom black box u64 value for each BPF program attachment, bpf_cookie, which is available to the BPF program at runtime. This is a feature that's critically missing for cases when some sort of generic processing needs to be done by the common BPF program logic (or even exactly the same BPF program) across multiple BPF hooks (e.g., many uniformly handled kprobes) and it's important to be able to distinguish between each BPF hook at runtime (e.g., for additional configuration lookup).

The choice of restricting this to a fixed-size 8-byte u64 value is an explicit design decision. Making this configurable by users adds unnecessary complexity (extra memory allocations, extra complications on the verifier side to validate accesses to variable-sized data areas) while not really opening up new possibilities. If a user's use case requires storing more data per attachment, it's possible to use either a global array or ARRAY/HASHMAP BPF maps, where bpf_cookie would be used as an index into the respective storage, populated by user-space code before creating the BPF link. This gives the user all the flexibility and control while keeping the BPF verifier and BPF helper API simple.

Currently, similar functionality can only be achieved through:

  - code generation and BPF program cloning, which is very complicated and unmaintainable;
  - on-the-fly C code generation and further runtime compilation, which is what BCC uses and allows to do pretty simply. The big downside is a very heavy-weight Clang/LLVM dependency and inefficient memory usage (due to many BPF program clones and the compilation process itself);
  - in some cases (kprobes and sometimes uprobes) it's possible to do function IP lookup to get function-specific configuration. This doesn't work for all the cases (e.g., when attaching uprobes to shared libraries) and has higher runtime overhead and additional programming complexity due to BPF_MAP_TYPE_HASHMAP lookups. Up until recently, before the bpf_get_func_ip() BPF helper was added, it was also very complicated and unstable (API-wise) to get the traced function's IP from fentry/fexit and kretprobe.

With libbpf and BPF CO-RE, runtime compilation is not an option, so to be able to build generic tracing tooling simply and efficiently, the ability to provide an additional bpf_cookie value for each *attachment* (as opposed to each BPF program) is extremely important. Two immediate users of this functionality are going to be a libbpf-based USDT library (currently in development) and retsnoop ([0]), but I'm sure more applications will come once users get this feature in their kernels.

To achieve the above, all perf_event-based BPF hooks are made available through a new BPF_LINK_TYPE_PERF_EVENT BPF link, which allows the use of the common LINK_CREATE command for program attachments and generally brings perf_event-based attachments into the common BPF link infrastructure. With that, LINK_CREATE gets the ability to pass through a bpf_cookie value during link creation (BPF program attachment) time. The bpf_get_attach_cookie() BPF helper is added to allow fetching this value at runtime from the BPF program side. The BPF cookie is stored either on struct perf_event itself and fetched from the BPF program context, or is passed through the ambient BPF run context, added in c7603cfa ("bpf: Add ambient BPF runtime context stored in current").

On the libbpf side of things, BPF perf link is utilized whenever it is supported by the kernel, instead of using the PERF_EVENT_IOC_SET_BPF ioctl on the perf_event FD. All the tracing attach APIs are extended with OPTS and bpf_cookie is passed through the corresponding opts structs. The last part of the patch set adds a few selftests utilizing the new APIs. There are also a few refactorings along the way to make things cleaner and easier to work with, both in the kernel (BPF_PROG_RUN and BPF_PROG_RUN_ARRAY) and throughout libbpf and selftests. Follow-up patches will extend bpf_cookie to fentry/fexit programs.

While adding uprobe_opts, also extend it with ref_ctr_offset for specifying a USDT semaphore (reference counter) offset. Update attach_probe selftests to validate its functionality. This is another feature (along with bpf_cookie) required for implementing a libbpf-based USDT solution.

[0] https://github.com/anakryiko/retsnoop

v4->v5:
  - rebase on latest bpf-next to resolve merge conflict;
  - add ref_ctr_offset to uprobe_opts and corresponding selftest;
v3->v4:
  - get rid of BPF_PROG_RUN macro in favor of bpf_prog_run() (Daniel);
  - move #ifdef CONFIG_BPF_SYSCALL check into bpf_set_run_ctx (Daniel);
v2->v3:
  - user_ctx -> bpf_cookie, bpf_get_user_ctx -> bpf_get_attach_cookie (Peter);
  - BPF_LINK_TYPE_PERF_EVENT value fix (Jiri);
  - use bpf_prog_run() from bpf_prog_run_pin_on_cpu() (Yonghong);
v1->v2:
  - fix build failures on non-x86 arches by gating on CONFIG_PERF_EVENTS.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
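On the BPF side, the cookie is then just an opaque u64, e.g. usable as an index into per-attachment configuration; a sketch (map layout, config struct, and target function are illustrative assumptions):

    #include <linux/bpf.h>
    #include <linux/ptrace.h>
    #include <bpf/bpf_helpers.h>

    struct attach_cfg { __u64 flags; }; /* hypothetical per-attachment config */

    struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, 128);
        __type(key, __u32);
        __type(value, struct attach_cfg);
    } cfg_map SEC(".maps");

    SEC("kprobe/some_func") /* placeholder target */
    int handler(struct pt_regs *ctx)
    {
        /* the loader picks small cookies so they double as array indices */
        __u32 idx = bpf_get_attach_cookie(ctx);
        struct attach_cfg *cfg = bpf_map_lookup_elem(&cfg_map, &idx);

        if (!cfg)
            return 0;
        /* ... shared program logic, parameterized per attachment ... */
        return 0;
    }

    char _license[] SEC("license") = "GPL";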
-
Andrii Nakryiko authored
Extend attach_probe selftests to specify ref_ctr_offset for uprobe/uretprobe and validate that its value is incremented from zero. It turns out that once a uprobe is attached with ref_ctr_offset, a uretprobe for the same location/function *has* to use ref_ctr_offset as well, otherwise perf_event_open() fails with -EINVAL. So this test uses ref_ctr_offset for both uprobe and uretprobe, even though for the purpose of the test the uprobe alone would be enough. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210815070609.987780-17-andrii@kernel.org
-
Andrii Nakryiko authored
When attaching to uprobes through perf subsystem, it's possible to specify offset of a so-called USDT semaphore, which is just a reference counted u16, used by kernel to keep track of how many tracers are attached to a given location. Support for this feature was added in [0], so just wire this through uprobe_opts. This is important to enable implementing USDT attachment and tracing through libbpf's bpf_program__attach_uprobe_opts() API. [0] a6ca88b2 ("trace_uprobe: support reference counter in fd-based uprobe") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210815070609.987780-16-andrii@kernel.org
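With the new opts, this looks roughly like the following (binary path and offsets are placeholders):

    /* sketch of a USDT-style attach via the extended uprobe_opts */
    DECLARE_LIBBPF_OPTS(bpf_uprobe_opts, uprobe_opts,
        .ref_ctr_offset = 0x1234, /* USDT semaphore offset (placeholder) */
    );
    struct bpf_link *link;

    link = bpf_program__attach_uprobe_opts(prog, -1 /* any pid */,
                                           "/path/to/binary",
                                           0x5678 /* func offset, placeholder */,
                                           &uprobe_opts);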
-
Andrii Nakryiko authored
Add a selftest with a few subtests testing proper bpf_cookie usage. The kprobe and uprobe subtests are pretty straightforward and just validate that the same BPF program attached with different bpf_cookie values will be triggered with those different bpf_cookie values. The tracepoint subtest is a bit more interesting, as it is the only perf_event-based BPF hook that shares bpf_prog_array between multiple perf_events internally. This means that the same BPF program can't be attached to the same tracepoint multiple times, so we have 3 identical copies. This arrangement allows testing bpf_prog_array_copy()'s handling of bpf_prog_array list manipulation logic when programs are attached and detached. The test validates that bpf_cookie isn't mixed up and isn't lost during such list manipulations. The perf_event subtest validates that two BPF links can be created against the same perf_event (but not at the same time, as only one BPF program can be attached to the perf_event itself), and that for each we can specify a different bpf_cookie value. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210815070609.987780-15-andrii@kernel.org
-
Andrii Nakryiko authored
Extract two helpers used for working with uprobes into trace_helpers.{c,h} to be re-used between multiple uprobe-using selftests. Also rename get_offset() into more appropriate get_uprobe_offset(). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210815070609.987780-14-andrii@kernel.org
-
Andrii Nakryiko authored
Add tests utilizing low-level bpf_link_create() API to create perf BPF link. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210815070609.987780-13-andrii@kernel.org
-
Andrii Nakryiko authored
Wire through bpf_cookie for all attach APIs that use perf_event_open under the hood:

  - for kprobes, extend the existing bpf_kprobe_opts with a bpf_cookie field;
  - for perf_event, uprobe, and tracepoint APIs, add their _opts variants and pass bpf_cookie through opts.

For kernels that don't support BPF_LINK_CREATE for perf_events, and thus don't support bpf_cookie either, return an error and log a warning for the user. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210815070609.987780-12-andrii@kernel.org
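For example, attaching the same program at two sites with distinct cookies might look like this (function names are placeholders):

    DECLARE_LIBBPF_OPTS(bpf_kprobe_opts, kprobe_opts, .bpf_cookie = 1);
    struct bpf_link *l1, *l2;

    l1 = bpf_program__attach_kprobe_opts(prog, "func_one", &kprobe_opts);
    kprobe_opts.bpf_cookie = 2;
    l2 = bpf_program__attach_kprobe_opts(prog, "func_two", &kprobe_opts);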
-
Andrii Nakryiko authored
Add the ability to specify a bpf_cookie value when creating a BPF perf link with the bpf_link_create() low-level API. Given the BPF_LINK_CREATE command is growing and keeps getting new fields that are specific to the type of BPF link, extend the libbpf side of the bpf_link_create() API and the corresponding OPTS struct to accommodate such changes. Add extra checks to prevent using incompatible/unexpected combinations of fields. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210815070609.987780-11-andrii@kernel.org
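At the syscall-wrapper level this becomes (a sketch; the cookie value and perf_event_fd are placeholders):

    DECLARE_LIBBPF_OPTS(bpf_link_create_opts, link_opts,
        .perf_event.bpf_cookie = 0x1234);
    int link_fd;

    link_fd = bpf_link_create(bpf_program__fd(prog), perf_event_fd,
                              BPF_PERF_EVENT, &link_opts);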
-
Andrii Nakryiko authored
Detect kernel support for BPF perf link and prefer it when attaching to perf_event, tracepoint, kprobe/uprobe. The underlying perf_event FD will be kept open until the BPF link is destroyed, at which point both the perf_event FD and the BPF link FD will be closed. This preserves the current behavior in which the perf_event FD is open for the duration of bpf_link's lifetime and the user is able to "disconnect" bpf_link from the underlying FD (with bpf_link__disconnect()), so that bpf_link__destroy() doesn't close the underlying perf_event FD. When BPF perf link is used, disconnect will keep both perf_event and bpf_link FDs open, so it will be up to the (advanced) user to close them. This approach is demonstrated in the bpf_cookie.c selftests, added in this patch set. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210815070609.987780-10-andrii@kernel.org
-
Andrii Nakryiko authored
bpf_link->destroy() isn't used by any code, so remove it. Instead, add ability to override deallocation procedure, with default doing plain free(link). This is necessary for cases when we want to "subclass" struct bpf_link to keep extra information, as is the case in the next patch adding struct bpf_link_perf. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210815070609.987780-9-andrii@kernel.org
-
Andrii Nakryiko authored
Ensure libbpf.so is re-built whenever libbpf.map is modified. Without this, changes to libbpf.map are not detected and versioned symbols mismatch error will be reported until `make clean && make` is used, which is a suboptimal developer experience. Fixes: 306b267c ("libbpf: Verify versioned symbols") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210815070609.987780-8-andrii@kernel.org
-