Commits · faef26fa444dc44eeff70c9a63ebe7fef00b6c37 · Kirill Smelkov / linux

30 Sep, 2020 9 commits

bpf, selftests: Use bpf_tail_call_static where appropriate · faef26fa

Daniel Borkmann authored Sep 30, 2020

For those locations where we use an immediate tail call map index use the
newly added bpf_tail_call_static() helper.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/3cfb2b799a62d22c6e7ae5897c23940bdcc24cbc.1601477936.git.daniel@iogearbox.net

faef26fa

bpf, libbpf: Add bpf_tail_call_static helper for bpf programs · 0e9f6841

Daniel Borkmann authored Sep 30, 2020

Port of tail_call_static() helper function from Cilium's BPF code base [0]
to libbpf, so others can easily consume it as well. We've been using this
in production code for some time now. The main idea is that we guarantee
that the kernel's BPF infrastructure and JIT (here: x86_64) can patch the
JITed BPF insns with direct jumps instead of having to fall back to using
expensive retpolines. By using inline asm, we guarantee that the compiler
won't merge the call from different paths with potentially different
content of r2/r3.

We're also using Cilium's __throw_build_bug() macro (here as: __bpf_unreachable())
in different places as a neat trick to trigger compilation errors when
compiler does not remove code at compilation time. This works for the BPF
back end as it does not implement the __builtin_trap().

  [0] https://github.com/cilium/cilium/commit/f5537c26020d5297b70936c6b7d03a1e412a1035Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/1656a082e077552eb46642d513b4a6bde9a7dd01.1601477936.git.daniel@iogearbox.net

0e9f6841

bpf: Add redirect_neigh helper as redirect drop-in · b4ab3141

Daniel Borkmann authored Sep 30, 2020

Add a redirect_neigh() helper as redirect() drop-in replacement
for the xmit side. Main idea for the helper is to be very similar
in semantics to the latter just that the skb gets injected into
the neighboring subsystem in order to let the stack do the work
it knows best anyway to populate the L2 addresses of the packet
and then hand over to dev_queue_xmit() as redirect() does.

This solves two bigger items: i) skbs don't need to go up to the
stack on the host facing veth ingress side for traffic egressing
the container to achieve the same for populating L2 which also
has the huge advantage that ii) the skb->sk won't get orphaned in
ip_rcv_core() when entering the IP routing layer on the host stack.

Given that skb->sk neither gets orphaned when crossing the netns
as per 9c4c3252 ("skbuff: preserve sock reference when scrubbing
the skb.") the helper can then push the skbs directly to the phys
device where FQ scheduler can do its work and TCP stack gets proper
backpressure given we hold on to skb->sk as long as skb is still
residing in queues.

With the helper used in BPF data path to then push the skb to the
phys device, I observed a stable/consistent TCP_STREAM improvement
on veth devices for traffic going container -> host -> host ->
container from ~10Gbps to ~15Gbps for a single stream in my test
environment.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/bpf/f207de81629e1724899b73b8112e0013be782d35.1601477936.git.daniel@iogearbox.net

b4ab3141

bpf, net: Rework cookie generator as per-cpu one · 92acdc58

Daniel Borkmann authored Sep 30, 2020

With its use in BPF, the cookie generator can be called very frequently
in particular when used out of cgroup v2 hooks (e.g. connect / sendmsg)
and attached to the root cgroup, for example, when used in v1/v2 mixed
environments. In particular, when there's a high churn on sockets in the
system there can be many parallel requests to the bpf_get_socket_cookie()
and bpf_get_netns_cookie() helpers which then cause contention on the
atomic counter.

As similarly done in f991bd2e ("fs: introduce a per-cpu last_ino
allocator"), add a small helper library that both can use for the 64 bit
counters. Given this can be called from different contexts, we also need
to deal with potential nested calls even though in practice they are
considered extremely rare. One idea as suggested by Eric Dumazet was
to use a reverse counter for this situation since we don't expect 64 bit
overflows anyways; that way, we can avoid bigger gaps in the 64 bit
counter space compared to just batch-wise increase. Even on machines
with small number of cores (e.g. 4) the cookie generation shrinks from
min/max/med/avg (ns) of 22/50/40/38.9 down to 10/35/14/17.3 when run
in parallel from multiple CPUs.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Link: https://lore.kernel.org/bpf/8a80b8d27d3c49f9a14e1d5213c19d8be87d1dc8.1601477936.git.daniel@iogearbox.net

92acdc58

bpf: Add classid helper only based on skb->sk · b426ce83

Daniel Borkmann authored Sep 30, 2020

Similarly to 5a52ae4e ("bpf: Allow to retrieve cgroup v1 classid
from v2 hooks"), add a helper to retrieve cgroup v1 classid solely
based on the skb->sk, so it can be used as key as part of BPF map
lookups out of tc from host ns, in particular given the skb->sk is
retained these days when crossing net ns thanks to 9c4c3252
("skbuff: preserve sock reference when scrubbing the skb."). This
is similar to bpf_skb_cgroup_id() which implements the same for v2.
Kubernetes ecosystem is still operating on v1 however, hence net_cls
needs to be used there until this can be dropped in with the v2
helper of bpf_skb_cgroup_id().
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/ed633cf27a1c620e901c5aa99ebdefb028dce600.1601477936.git.daniel@iogearbox.net

b426ce83

bpf: fix raw_tp test run in preempt kernel · 963ec27a

Song Liu authored Sep 29, 2020

In preempt kernel, BPF_PROG_TEST_RUN on raw_tp triggers:

[   35.874974] BUG: using smp_processor_id() in preemptible [00000000]
code: new_name/87
[   35.893983] caller is bpf_prog_test_run_raw_tp+0xd4/0x1b0
[   35.900124] CPU: 1 PID: 87 Comm: new_name Not tainted 5.9.0-rc6-g615bd02bf #1
[   35.907358] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.10.2-1ubuntu1 04/01/2014
[   35.916941] Call Trace:
[   35.919660]  dump_stack+0x77/0x9b
[   35.923273]  check_preemption_disabled+0xb4/0xc0
[   35.928376]  bpf_prog_test_run_raw_tp+0xd4/0x1b0
[   35.933872]  ? selinux_bpf+0xd/0x70
[   35.937532]  __do_sys_bpf+0x6bb/0x21e0
[   35.941570]  ? find_held_lock+0x2d/0x90
[   35.945687]  ? vfs_write+0x150/0x220
[   35.949586]  do_syscall_64+0x2d/0x40
[   35.953443]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix this by calling migrate_disable() before smp_processor_id().

Fixes: 1b4d60ec ("bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

963ec27a

libbpf: Compile in PIC mode only for shared library case · b0efc216

Andrii Nakryiko authored Sep 29, 2020

Libbpf compiles .o's for static and shared library modes separately, so no
need to specify -fPIC for both. Keep it only for shared library mode.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200929220604.833631-3-andriin@fb.com

b0efc216

libbpf: Compile libbpf under -O2 level by default and catch extra warnings · 0a62291d

Andrii Nakryiko authored Sep 29, 2020

For some reason compiler doesn't complain about uninitialized variable, fixed
in previous patch, if libbpf is compiled without -O2 optimization level. So do
compile it with -O2 and never let similar issue slip by again. -Wall is added
unconditionally, so no need to specify it again.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200929220604.833631-2-andriin@fb.com

0a62291d

libbpf: Fix uninitialized variable in btf_parse_type_sec · 33433913

Andrii Nakryiko authored Sep 29, 2020

Fix obvious unitialized variable use that wasn't reported by compiler. libbpf
Makefile changes to catch such errors are added separately.

Fixes: 3289959b ("libbpf: Support BTF loading and raw data output in both endianness")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200929220604.833631-1-andriin@fb.com

33433913

29 Sep, 2020 31 commits

Merge branch 'bpf, x64: optimize JIT's pro/epilogue' · 67e4ca74

Alexei Starovoitov authored Sep 29, 2020

Maciej Fijalkowski says:

====================
Hi!

This small set can be considered as a followup after recent addition of
support for tailcalls in bpf subprograms and is focused on optimizing
x64 JIT prologue and epilogue sections.

Turns out the popping tail call counter is not needed anymore and %rsp
handling when stack depth is 0 can be skipped.

For longer explanations, please see commit messages.

Thank you,
Maciej
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

67e4ca74

bpf: x64: Do not emit sub/add 0, %rsp when !stack_depth · 4d0b8c0b

Maciej Fijalkowski authored Sep 29, 2020

There is no particular reason for keeping the "sub 0, %rsp" insn within
the BPF's x64 JIT prologue.

When tail call code was skipping the whole prologue section these 7
bytes that represent the rsp subtraction could not be simply discarded
as the jump target address would be broken. An option to address that
would be to substitute it with nop7.

Right now tail call is skipping only first 11 bytes of target program's
prologue and "sub X, %rsp" is the first insn that is processed, so if
stack depth is zero then this insn could be omitted without the need for
nop7 swap.

Therefore, do not emit the "sub 0, %rsp" in prologue when program is not
making use of R10 register. Also, make the emission of "add X, %rsp"
conditional in tail call code logic and take into account the presence
of mentioned insn when calculating the jump offsets.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200929204653.4325-3-maciej.fijalkowski@intel.com

4d0b8c0b

bpf, x64: Drop "pop %rcx" instruction on BPF JIT epilogue · d207929d

Maciej Fijalkowski authored Sep 29, 2020

Back when all of the callee-saved registers where always pushed to stack
in x64 JIT prologue, tail call counter was placed at the bottom of the
BPF program's stack frame that had a following layout:

+-------------+
|  ret addr   |
+-------------+
|     rbp     | <- rbp
+-------------+
|             |
| free space  |
| from:       |
| sub $x,%rsp |
|             |
+-------------+
|     rbx     |
+-------------+
|     r13     |
+-------------+
|     r14     |
+-------------+
|     r15     |
+-------------+
|  tail call  | <- rsp
|   counter   |
+-------------+

In order to restore the callee saved registers, epilogue needed to
explicitly toss away the tail call counter via "pop %rbx" insn, so that
%rsp would be back at the place where %r15 was stored.

Currently, the tail call counter is placed on stack *before* the callee
saved registers (brackets on rbx through r15 mean that they are now
pushed to stack only if they are used):

+-------------+
|  ret addr   |
+-------------+
|     rbp     | <- rbp
+-------------+
|             |
| free space  |
| from:       |
| sub $x,%rsp |
|             |
+-------------+
|  tail call  |
|   counter   |
+-------------+
(     rbx     )
+-------------+
(     r13     )
+-------------+
(     r14     )
+-------------+
(     r15     ) <- rsp
+-------------+

For the record, the epilogue insns consist of (assuming all of the
callee saved registers are used by program):
pop    %r15
pop    %r14
pop    %r13
pop    %rbx
pop    %rcx
leaveq
retq

"pop %rbx" for getting rid of tail call counter was not an option
anymore as it would overwrite the restored value of %rbx register, so it
was changed to use the %rcx register.

Since epilogue can start popping the callee saved registers right away
without any additional work, the "pop %rcx" could be dropped altogether
as "leave" insn will simply move the %rbp to %rsp. IOW, tail call
counter does not need the explicit handling.

Having in mind the explanation above and the actual reason for that,
let's piggy back on "leave" insn for discarding the tail call counter
from stack and remove the "pop %rcx" from epilogue.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200929204653.4325-2-maciej.fijalkowski@intel.com

d207929d

selftests/bpf: Fix endianness issues in sk_lookup/ctx_narrow_access · 6458bde3

Ilya Leoshkevich authored Sep 29, 2020

This test makes a lot of narrow load checks while assuming little
endian architecture, and therefore fails on s390.

Fix by introducing LSB and LSW macros and using them to perform narrow
loads.

Fixes: 0ab5539f ("selftests/bpf: Tests for BPF_SK_LOOKUP attach point")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200929201814.44360-1-iii@linux.ibm.com

6458bde3

bpf, selftests: Fix warning in snprintf_btf where system() call unchecked · c810b31e

John Fastabend authored Sep 29, 2020

On my systems system() calls are marked with warn_unused_result
apparently. So without error checking we get this warning,

./prog_tests/snprintf_btf.c:30:9: warning: ignoring return value
   of ‘system’, declared with attribute warn_unused_result[-Wunused-result]

Also it seems like a good idea to check the return value anyways
to ensure ping exists even if its seems unlikely.

Fixes: 076a95f5 ("selftests/bpf: Add bpf_snprintf_btf helper tests")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/160141006897.25201.12095049414156293265.stgit@john-Precision-5820-Tower

c810b31e

Merge branch 'bpf: Support multi-attach for freplace' · 93b8713d

Alexei Starovoitov authored Sep 29, 2020

Toke Høiland-Jørgensen says:

====================
This series adds support attaching freplace BPF programs to multiple targets.
This is needed to support incremental attachment of multiple XDP programs using
the libxdp dispatcher model.

Patch 1 moves prog_aux->linked_prog and the trampoline to be embedded in
bpf_tracing_link on attach, and freed by the link release logic, and introduces
a mutex to protect the writing of the pointers in prog->aux.

Based on this refactoring (and previously applied patches), it becomes pretty
straight-forward to support multiple-attach for freplace programs (patch 2).
This is simply a matter of creating a second bpf_tracing_link if a target is
supplied. However, for API consistency with other types of link attach, this
option is added to the BPF_LINK_CREATE API instead of extending
bpf_raw_tracepoint_open().

Patch 3 is a port of Jiri Olsa's patch to support fentry/fexit on freplace
programs. His approach of getting the target type from the target program
reference no longer works after we've gotten rid of linked_prog (because the
bpf_tracing_link reference disappears on attach). Instead, we used the saved
reference to the target prog type that is also used to verify compatibility on
secondary freplace attachment.

Patch 4 is the accompanying libbpf update, and patches 5-7 are selftests: patch
5 tests for the multi-freplace functionality itself; patch 6 is Jiri's previous
selftest for the fentry-to-freplace fix; patch 7 is a test for the change
introduced in the previously-applied patches, blocking MODIFY_RETURN functions
from attaching to other BPF programs.

With this series, libxdp and xdp-tools can successfully attach multiple programs
one at a time. To play with this, use the 'freplace-multi-attach' branch of
xdp-tools:

$ git clone --recurse-submodules --branch freplace-multi-attach https://github.com/xdp-project/xdp-tools
$ cd xdp-tools/xdp-loader
$ make
$ sudo ./xdp-loader load veth0 ../lib/testing/xdp_drop.o
$ sudo ./xdp-loader load veth0 ../lib/testing/xdp_pass.o
$ sudo ./xdp-loader status

The series is also available here:
https://git.kernel.org/pub/scm/linux/kernel/git/toke/linux.git/log/?h=bpf-freplace-multi-attach-alt-10

Changelog:

v10:
  - Dial back the s/tgt_/dst_/ replacement a bit
  - Fix smatch warning (from ktest robot)
  - Rebase to bpf-next, drop already-applied patches

v9:
  - Clarify commit message of patch 3
  - Add new struct bpf_attach_target_info for returning from
    bpf_check_attach_target() and passing to bpf_trampoline_get()
  - Move trampoline key computation into a helper
  - Make sure we don't break bpffs debug umh
  - Add some comment blocks explaining the logic flow in
    bpf_tracing_prog_attach()
  - s/tgt_/dst_/ in prog->aux, and for local variables using those members
  - Always drop dst_trampoline and dst_prog from prog->aux on first attach
  - Don't remove syscall fmod_ret test from selftest benchmarks
  - Add saved_ prefix to dst_{prog,attach}_type members in prog_aux
  - Drop prog argument from check_attach_modify_return()
  - Add comment about possible NULL of tr_link->tgt_prog on link_release()

v8:
  - Add a separate error message when trying to attach FMOD_REPLACE to tgt_prog
  - Better error messages in bpf_program__attach_freplace()
  - Don't lock mutex when setting tgt_* pointers in prog create and verifier
  - Remove fmod_ret programs from benchmarks in selftests (new patch 11)
  - Fix a few other nits in selftests

v7:
  - Add back missing ptype == prog->type check in link_create()
  - Use tracing_bpf_link_attach() instead of separate freplace_bpf_link_attach()
  - Don't break attachment of bpf_iters in libbpf (by clobbering link_create.iter_info)

v6:
  - Rebase to latest bpf-next
  - Simplify logic in bpf_tracing_prog_attach()
  - Don't create a new attach_type for link_create(), disambiguate on prog->type
    instead
  - Use raw_tracepoint_open() in libbpf bpf_program__attach_ftrace() if called
    with NULL target
  - Switch bpf_program__attach_ftrace() to take function name as parameter
    instead of btf_id
  - Add a patch disallowing MODIFY_RETURN programs from attaching to other BPF
    programs, and an accompanying selftest (patches 1 and 10)

v5:
  - Fix typo in inline function definition of bpf_trampoline_get()
  - Don't put bpf_tracing_link in prog->aux, use a mutex to protect tgt_prog and
    trampoline instead, and move them to the link on attach.
  - Restore Jiri as author of the last selftest patch

v4:
  - Cleanup the refactored check_attach_btf_id() to make the logic easier to follow
  - Fix cleanup paths for bpf_tracing_link
  - Use xchg() for removing the bpf_tracing_link from prog->aux and restore on (some) failures
  - Use BPF_LINK_CREATE operation to create link with target instead of extending raw_tracepoint_open
  - Fold update of tools/ UAPI header into main patch
  - Update arg dereference patch to use skeletons and set_attach_target()

v3:
  - Get rid of prog_aux->linked_prog entirely in favour of a bpf_tracing_link
  - Incorporate Jiri's fix for attaching fentry to freplace programs

v2:
  - Drop the log arguments from bpf_raw_tracepoint_open
  - Fix kbot errors
  - Rebase to latest bpf-next
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

93b8713d

selftests: Add selftest for disallowing modify_return attachment to freplace · bee4b7e6

Toke Høiland-Jørgensen authored Sep 29, 2020

This adds a selftest that ensures that modify_return tracing programs
cannot be attached to freplace programs. The security_ prefix is added to
the freplace program because that would otherwise let it pass the check for
modify_return.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/160138355713.48470.3811074984255709369.stgit@toke.dk

bee4b7e6

selftests/bpf: Adding test for arg dereference in extension trace · 17d3f386

Jiri Olsa authored Sep 29, 2020

Adding test that setup following program:

  SEC("classifier/test_pkt_md_access")
  int test_pkt_md_access(struct __sk_buff *skb)

with its extension:

  SEC("freplace/test_pkt_md_access")
  int test_pkt_md_access_new(struct __sk_buff *skb)

and tracing that extension with:

  SEC("fentry/test_pkt_md_access_new")
  int BPF_PROG(fentry, struct sk_buff *skb)

The test verifies that the tracing program can
dereference skb argument properly.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/160138355603.48470.9072073357530773228.stgit@toke.dk

17d3f386

selftests: Add test for multiple attachments of freplace program · f6429476

Toke Høiland-Jørgensen authored Sep 29, 2020

This adds a selftest for attaching an freplace program to multiple targets
simultaneously.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/160138355497.48470.17568077161540217107.stgit@toke.dk

f6429476

libbpf: Add support for freplace attachment in bpf_link_create · a5359091

Toke Høiland-Jørgensen authored Sep 29, 2020

This adds support for supplying a target btf ID for the bpf_link_create()
operation, and adds a new bpf_program__attach_freplace() high-level API for
attaching freplace functions with a target.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/160138355387.48470.18026176785351166890.stgit@toke.dk

a5359091

bpf: Fix context type resolving for extension programs · 43bc2874

Toke Høiland-Jørgensen authored Sep 29, 2020

Eelco reported we can't properly access arguments if the tracing
program is attached to extension program.

Having following program:

  SEC("classifier/test_pkt_md_access")
  int test_pkt_md_access(struct __sk_buff *skb)

with its extension:

  SEC("freplace/test_pkt_md_access")
  int test_pkt_md_access_new(struct __sk_buff *skb)

and tracing that extension with:

  SEC("fentry/test_pkt_md_access_new")
  int BPF_PROG(fentry, struct sk_buff *skb)

It's not possible to access skb argument in the fentry program,
with following error from verifier:

  ; int BPF_PROG(fentry, struct sk_buff *skb)
  0: (79) r1 = *(u64 *)(r1 +0)
  invalid bpf_context access off=0 size=8

The problem is that btf_ctx_access gets the context type for the
traced program, which is in this case the extension.

But when we trace extension program, we want to get the context
type of the program that the extension is attached to, so we can
access the argument properly in the trace program.

This version of the patch is tweaked slightly from Jiri's original one,
since the refactoring in the previous patches means we have to get the
target prog type from the new variable in prog->aux instead of directly
from the target prog.
Reported-by: Eelco Chaudron <echaudro@redhat.com>
Suggested-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/160138355278.48470.17057040257274725638.stgit@toke.dk

43bc2874

bpf: Support attaching freplace programs to multiple attach points · 4a1e7c0c

Toke Høiland-Jørgensen authored Sep 29, 2020

This enables support for attaching freplace programs to multiple attach
points. It does this by amending the UAPI for bpf_link_Create with a target
btf ID that can be used to supply the new attachment point along with the
target program fd. The target must be compatible with the target that was
supplied at program load time.

The implementation reuses the checks that were factored out of
check_attach_btf_id() to ensure compatibility between the BTF types of the
old and new attachment. If these match, a new bpf_tracing_link will be
created for the new attach target, allowing multiple attachments to
co-exist simultaneously.

The code could theoretically support multiple-attach of other types of
tracing programs as well, but since I don't have a use case for any of
those, there is no API support for doing so.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/160138355169.48470.17165680973640685368.stgit@toke.dk

4a1e7c0c

bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach · 3aac1ead

Toke Høiland-Jørgensen authored Sep 29, 2020

In preparation for allowing multiple attachments of freplace programs, move
the references to the target program and trampoline into the
bpf_tracing_link structure when that is created. To do this atomically,
introduce a new mutex in prog->aux to protect writing to the two pointers
to target prog and trampoline, and rename the members to make it clear that
they are related.

With this change, it is no longer possible to attach the same tracing
program multiple times (detaching in-between), since the reference from the
tracing program to the target disappears on the first attach. However,
since the next patch will let the caller supply an attach target, that will
also make it possible to attach to the same place multiple times.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/160138355059.48470.2503076992210324984.stgit@toke.dk

3aac1ead

Merge branch 'libbpf: support loading/storing any BTF' · 85e3f318

Alexei Starovoitov authored Sep 29, 2020

Andrii Nakryiko says:

====================
Add support for loading and storing BTF in either little- or big-endian
integer encodings, regardless of host endianness. This allows users of libbpf
to not care about endianness when they don't want to and transparently
open/load BTF of any endianness. libbpf will preserve original endianness and
will convert output raw data as necessary back to original endianness, if
necessary. This allows tools like pahole to be ignorant to such issues during
cross-compilation.

While working with BTF data in memory, the endianness is always native to the
host. Convetion can happen only during btf__get_raw_data() call, and only in
a raw data copy.

Additionally, it's possible to force output BTF endianness through new
btf__set_endianness() API. This which allows to create flexible tools doing
arbitrary conversions of BTF endianness, just by relying on libbpf.

Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>
Cc: Tony Ambardar <tony.ambardar@gmail.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Luka Perkov <luka.perkov@sartura.hr>
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

85e3f318

selftests/bpf: Test BTF's handling of endianness · ed9cf248

Andrii Nakryiko authored Sep 28, 2020

Add selftests juggling endianness back and forth to validate BTF's handling of
endianness convertions internally.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200929043046.1324350-4-andriin@fb.com

ed9cf248

libbpf: Support BTF loading and raw data output in both endianness · 3289959b

Andrii Nakryiko authored Sep 28, 2020

Teach BTF to recognized wrong endianness and transparently convert it
internally to host endianness. Original endianness of BTF will be preserved
and used during btf__get_raw_data() to convert resulting raw data to the same
endianness and a source raw_data. This means that little-endian host can parse
big-endian BTF with no issues, all the type data will be presented to the
client application in native endianness, but when it's time for emitting BTF
to persist it in a file (e.g., after BTF deduplication), original non-native
endianness will be preserved and stored.

It's possible to query original endianness of BTF data with new
btf__endianness() API. It's also possible to override desired output
endianness with btf__set_endianness(), so that if application needs to load,
say, big-endian BTF and store it as little-endian BTF, it's possible to
manually override this. If btf__set_endianness() was used to change
endianness, btf__endianness() will reflect overridden endianness.

Given there are no known use cases for supporting cross-endianness for
.BTF.ext, loading .BTF.ext in non-native endianness is not supported.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200929043046.1324350-3-andriin@fb.com

3289959b

selftests/bpf: Move and extend ASSERT_xxx() testing macros · 22ba3635

Andrii Nakryiko authored Sep 28, 2020

Move existing ASSERT_xxx() macros out of btf_write selftest into test_progs.h
to use across all selftests. Also expand a set of macros for typical cases.

Now there are the following macros:
  - ASSERT_EQ() -- check for equality of two integers;
  - ASSERT_STREQ() -- check for equality of two C strings;
  - ASSERT_OK() -- check for successful (zero) return result;
  - ASSERT_ERR() -- check for unsuccessful (non-zero) return result;
  - ASSERT_NULL() -- check for NULL pointer;
  - ASSERT_OK_PTR() -- check for a valid pointer;
  - ASSERT_ERR_PTR() -- check for NULL or negative error encoded in a pointer.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200929043046.1324350-2-andriin@fb.com

22ba3635

selftests: Make sure all 'skel' variables are declared static · f970cbcd

Toke Høiland-Jørgensen authored Sep 29, 2020

If programs in prog_tests using skeletons declare the 'skel' variable as
global but not static, that will lead to linker errors on the final link of
the prog_tests binary due to duplicate symbols. Fix a few instances of this.

Fixes: b18c1f0a ("bpf: selftest: Adapt sock_fields test to use skel and global variables")
Fixes: 9a856cae ("bpf: selftest: Add test_btf_skc_cls_ingress")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200929123026.46751-1-toke@redhat.com

f970cbcd

xsk: Fix a documentation mistake in xsk_queue.h · f1fc8ece

Ciara Loftus authored Sep 28, 2020

After 'peeking' the ring, the consumer, not the producer, reads the data.
Fix this mistake in the comments.

Fixes: 15d8c916 ("xsk: Add function naming comments and reorder functions")
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20200928082344.17110-1-ciara.loftus@intel.com

f1fc8ece

selftests/bpf_iter: Don't fail test due to missing __builtin_btf_type_id · d2197c7f

Toke Høiland-Jørgensen authored Sep 29, 2020

The new test for task iteration in bpf_iter checks (in do_btf_read()) if it
should be skipped due to missing __builtin_btf_type_id. However, this
'skip' verdict is not propagated to the caller, so the parent test will
still fail. Fix this by also skipping the rest of the parent test if the
skip condition was reached.

Fixes: b72091bd ("selftests/bpf: Add test for bpf_seq_printf_btf helper")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20200929123004.46694-1-toke@redhat.com

d2197c7f

bpf/preload: Make sure Makefile cleans up after itself, and add .gitignore · 9d9aae53

Toke Høiland-Jørgensen authored Sep 27, 2020

The Makefile in bpf/preload builds a local copy of libbpf, but does not
properly clean up after itself. This can lead to subsequent compilation
failures, since the feature detection cache is kept around which can lead
subsequent detection to fail.

Fix this by properly setting clean-files, and while we're at it, also add a
.gitignore for the directory to ignore the build artifacts.

Fixes: d71fa5c9 ("bpf: Add kernel module with user mode driver that populates bpffs.")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200927193005.8459-1-toke@redhat.com

9d9aae53

Merge branch 'selftests/bpf: BTF-based kernel data display' · 3aae4a38

Alexei Starovoitov authored Sep 29, 2020

Alan Maguire says:

====================
Resolve issues in bpf selftests introduced with BTF-based kernel data
display selftests; these are

- a warning introduced in snprintf_btf.c; and
- compilation failures with old kernels vmlinux.h
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

3aae4a38

selftests/bpf: Ensure snprintf_btf/bpf_iter tests compatibility with old vmlinux.h · cfe77683

Alan Maguire authored Sep 29, 2020

Andrii reports that bpf selftests relying on "struct btf_ptr" and BTF_F_*
values will not build as vmlinux.h for older kernels will not include
"struct btf_ptr" or the BTF_F_* enum values.  Undefine and redefine
them to work around this.

Fixes: b72091bd ("selftests/bpf: Add test for bpf_seq_printf_btf helper")
Fixes: 076a95f5 ("selftests/bpf: Add bpf_snprintf_btf helper tests")
Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/1601379151-21449-3-git-send-email-alan.maguire@oracle.com

cfe77683

selftests/bpf: Fix unused-result warning in snprintf_btf.c · 96c48058

Alan Maguire authored Sep 29, 2020

Daniel reports:

+    system("ping -c 1 127.0.0.1 > /dev/null");

This generates the following new warning when compiling BPF selftests:

  [...]
  EXT-OBJ  [test_progs] cgroup_helpers.o
  EXT-OBJ  [test_progs] trace_helpers.o
  EXT-OBJ  [test_progs] network_helpers.o
  EXT-OBJ  [test_progs] testing_helpers.o
  TEST-OBJ [test_progs] snprintf_btf.test.o
/root/bpf-next/tools/testing/selftests/bpf/prog_tests/snprintf_btf.c: In function ‘test_snprintf_btf’:
/root/bpf-next/tools/testing/selftests/bpf/prog_tests/snprintf_btf.c:30:2: warning: ignoring return value of ‘system’, declared with attribute warn_unused_result [-Wunused-result]
  system("ping -c 1 127.0.0.1 > /dev/null");
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  [...]

Fixes: 076a95f5 ("selftests/bpf: Add bpf_snprintf_btf helper tests")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1601379151-21449-2-git-send-email-alan.maguire@oracle.com

96c48058

bpf, selftests: Fix cast to smaller integer type 'int' warning in raw_tp · 00e8c44a

John Fastabend authored Sep 28, 2020

Fix warning in bpf selftests,

progs/test_raw_tp_test_run.c:18:10: warning: cast to smaller integer type 'int' from 'struct task_struct *' [-Wpointer-to-int-cast]

Change int type cast to long to fix. Discovered with gcc-9 and llvm-11+
where llvm was recent main branch.

Fixes: 09d8ad16 ("selftests/bpf: Add raw_tp_test_run")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/160134424745.11199.13841922833336698133.stgit@john-Precision-5820-Tower

00e8c44a

Merge branch 'libbpf: BTF writer APIs' · bc600908

Alexei Starovoitov authored Sep 28, 2020

Andrii Nakryiko says:

====================
This patch set introduces a new set of BTF APIs to libbpf that allow to
conveniently produce BTF types and strings. These APIs will allow libbpf to do
more intrusive modifications of program's BTF (by rewriting it, at least as of
right now), which is necessary for the upcoming libbpf static linking. But
they are complete and generic, so can be adopted by anyone who has a need to
produce BTF type information.

One such example outside of libbpf is pahole, which was actually converted to
these APIs (locally, pending landing of these changes in libbpf) completely
and shows reduction in amount of custom pahole code necessary and brings nice
savings in memory usage (about 370MB reduction at peak for my kernel
configuration) and even BTF deduplication times (one second reduction,
23.7s -> 22.7s). Memory savings are due to avoiding pahole's own copy of
"uncompressed" raw BTF data. Time reduction comes from faster string
search and deduplication by relying on hashmap instead of BST used by pahole's
own code. Consequently, these APIs are already tested on real-world
complicated kernel BTF, but there is also pretty extensive selftest doing
extra validations.

Selftests in patch #3 add a set of generic ASSERT_{EQ,STREQ,ERR,OK} macros
that are useful for writing shorter and less repretitive selftests. I decided
to keep them local to that selftest for now, but if they prove to be useful in
more contexts we should move them to test_progs.h. And few more (e.g.,
inequality tests) macros are probably necessary to have a more complete set.

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>

v2->v3:
  - resending original patches #7-9 as patches #1-3 due to merge conflict;

v1->v2:
  - fixed comments (John);
  - renamed btf__append_xxx() into btf__add_xxx() (Alexei);
  - added btf__find_str() in addition to btf__add_str();
  - btf__new_empty() now sets kernel FD to -1 initially.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bc600908

selftests/bpf: Test BTF writing APIs · 9141f75a

Andrii Nakryiko authored Sep 28, 2020

Add selftests for BTF writer APIs.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200929020533.711288-4-andriin@fb.com

9141f75a

libbpf: Add btf__str_by_offset() as a more generic variant of name_by_offset · f86ed050

Andrii Nakryiko authored Sep 28, 2020

BTF strings are used not just for names, they can be arbitrary strings used
for CO-RE relocations, line/func infos, etc. Thus "name_by_offset" terminology
is too specific and might be misleading. Instead, introduce
btf__str_by_offset() API which uses generic string terminology.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200929020533.711288-3-andriin@fb.com

f86ed050

libbpf: Add BTF writing APIs · 4a3b33f8

Andrii Nakryiko authored Sep 28, 2020

Add APIs for appending new BTF types at the end of BTF object.

Each BTF kind has either one API of the form btf__add_<kind>(). For types
that have variable amount of additional items (struct/union, enum, func_proto,
datasec), additional API is provided to emit each such item. E.g., for
emitting a struct, one would use the following sequence of API calls:

btf__add_struct(...);
btf__add_field(...);
...
btf__add_field(...);

Each btf__add_field() will ensure that the last BTF type is of STRUCT or
UNION kind and will automatically increment that type's vlen field.

All the strings are provided as C strings (const char *), not a string offset.
This significantly improves usability of BTF writer APIs. All such strings
will be automatically appended to string section or existing string will be
re-used, if such string was already added previously.

Each API attempts to do all the reasonable validations, like enforcing
non-empty names for entities with required names, proper value bounds, various
bit offset restrictions, etc.

Type ID validation is minimal because it's possible to emit a type that refers
to type that will be emitted later, so libbpf has no way to enforce such
cases. User must be careful to properly emit all the necessary types and
specify type IDs that will be valid in the finally generated BTF.

Each of btf__add_<kind>() APIs return new type ID on success or negative
value on error. APIs like btf__add_field() that emit additional items
return zero on success and negative value on error.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200929020533.711288-2-andriin@fb.com

4a3b33f8

Merge branch 'bpf: add helpers to support BTF-based kernel' · 98b972d2

Alexei Starovoitov authored Sep 28, 2020

Alan Maguire says:

====================
This series attempts to provide a simple way for BPF programs (and in
future other consumers) to utilize BPF Type Format (BTF) information
to display kernel data structures in-kernel.  The use case this
functionality is applied to here is to support a snprintf()-like
helper to copy a BTF representation of kernel data to a string,
and a BPF seq file helper to display BTF data for an iterator.

There is already support in kernel/bpf/btf.c for "show" functionality;
the changes here generalize that support from seq-file specific
verifier display to the more generic case and add another specific
use case; rather than seq_printf()ing the show data, it is copied
to a supplied string using a snprintf()-like function.  Other future
consumers of the show functionality could include a bpf_printk_btf()
function which printk()ed the data instead.  Oops messaging in
particular would be an interesting application for such functionality.

The above potential use case hints at a potential reply to
a reasonable objection that such typed display should be
solved by tracing programs, where the in-kernel tracing records
data and the userspace program prints it out.  While this
is certainly the recommended approach for most cases, I
believe having an in-kernel mechanism would be valuable
also.  Critically in BPF programs it greatly simplifies
debugging and tracing of such data to invoking a simple
helper.

One challenge raised in an earlier iteration of this work -
where the BTF printing was implemented as a printk() format
specifier - was that the amount of data printed per
printk() was large, and other format specifiers were far
simpler.  Here we sidestep that concern by printing
components of the BTF representation as we go for the
seq file case, and in the string case the snprintf()-like
operation is intended to be a basis for perf event or
ringbuf output.  The reasons for avoiding bpf_trace_printk
are that

1. bpf_trace_printk() strings are restricted in size and
cannot display anything beyond trivial data structures; and
2. bpf_trace_printk() is for debugging purposes only.

As Alexei suggested, a bpf_trace_puts() helper could solve
this in the future but it still would be limited by the
1000 byte limit for traced strings.

Default output for an sk_buff looks like this (zeroed fields
are omitted):

(struct sk_buff){
 .transport_header = (__u16)65535,
 .mac_header = (__u16)65535,
 .end = (sk_buff_data_t)192,
 .head = (unsigned char *)0x000000007524fd8b,
 .data = (unsigned char *)0x000000007524fd8b,
 .truesize = (unsigned int)768,
 .users = (refcount_t){
  .refs = (atomic_t){
   .counter = (int)1,
  },
 },
}

Flags can modify aspects of output format; see patch 3
for more details.

Changes since v6:

- Updated safe data size to 32, object name size to 80.
  This increases the number of safe copies done, but performance is
  not a key goal here. WRT name size the largest type name length
  in bpf-next according to "pahole -s" is 64 bytes, so that still gives
  room for additional type qualifiers, parens etc within the name limit
  (Alexei, patch 2)
- Remove inlines and converted as many #defines to functions as was
  possible.  In a few cases - btf_show_type_value[s]() specifically -
  I left these as macros as btf_show_type_value[s]() prepends and
  appends format strings to the format specifier (in order to include
  indentation, delimiters etc so a macro makes that simpler (Alexei,
  patch 2)
- Handle btf_resolve_size() error in btf_show_obj_safe() (Alexei, patch 2)
- Removed clang loop unroll in BTF snprintf test (Alexei)
- switched to using bpf_core_type_id_kernel(type) as suggested by Andrii,
  and Alexei noted that __builtin_btf_type_id(,1) should be used (patch 4)
- Added skip logic if __builtin_btf_type_id is not available (patches 4,8)
- Bumped limits on bpf iters to support printing larger structures (Alexei,
  patch 5)
- Updated overflow bpf_iter tests to reflect new iter max size (patch 6)
- Updated seq helper to use type id only (Alexei, patch 7)
- Updated BTF task iter test to use task struct instead of struct fs_struct
  since new limits allow a task_struct to be displayed (patch 8)
- Fixed E2BIG handling in iter task (Alexei, patch 8)

Changes since v5:

- Moved btf print prepare into patch 3, type show seq
  with flags into patch 2 (Alexei, patches 2,3)
- Fixed build bot warnings around static declarations
  and printf attributes
- Renamed functions to snprintf_btf/seq_printf_btf
  (Alexei, patches 3-6)

Changes since v4:

- Changed approach from a BPF trace event-centric design to one
  utilizing a snprintf()-like helper and an iter helper (Alexei,
  patches 3,5)
- Added tests to verify BTF output (patch 4)
- Added support to tests for verifying BTF type_id-based display
  as well as type name via __builtin_btf_type_id (Andrii, patch 4).
- Augmented task iter tests to cover the BTF-based seq helper.
  Because a task_struct's BTF-based representation would overflow
  the PAGE_SIZE limit on iterator data, the "struct fs_struct"
  (task->fs) is displayed for each task instead (Alexei, patch 6).

Changes since v3:

- Moved to RFC since the approach is different (and bpf-next is
  closed)
- Rather than using a printk() format specifier as the means
  of invoking BTF-enabled display, a dedicated BPF helper is
  used.  This solves the issue of printk() having to output
  large amounts of data using a complex mechanism such as
  BTF traversal, but still provides a way for the display of
  such data to be achieved via BPF programs.  Future work could
  include a bpf_printk_btf() function to invoke display via
  printk() where the elements of a data structure are printk()ed
 one at a time.  Thanks to Petr Mladek, Andy Shevchenko and
  Rasmus Villemoes who took time to look at the earlier printk()
  format-specifier-focused version of this and provided feedback
  clarifying the problems with that approach.
- Added trace id to the bpf_trace_printk events as a means of
  separating output from standard bpf_trace_printk() events,
  ensuring it can be easily parsed by the reader.
- Added bpf_trace_btf() helper tests which do simple verification
  of the various display options.

Changes since v2:

- Alexei and Yonghong suggested it would be good to use
  probe_kernel_read() on to-be-shown data to ensure safety
  during operation.  Safe copy via probe_kernel_read() to a
  buffer object in "struct btf_show" is used to support
  this.  A few different approaches were explored
  including dynamic allocation and per-cpu buffers. The
  downside of dynamic allocation is that it would be done
  during BPF program execution for bpf_trace_printk()s using
  %pT format specifiers. The problem with per-cpu buffers
  is we'd have to manage preemption and since the display
  of an object occurs over an extended period and in printk
  context where we'd rather not change preemption status,
  it seemed tricky to manage buffer safety while considering
  preemption.  The approach of utilizing stack buffer space
  via the "struct btf_show" seemed like the simplest approach.
  The stack size of the associated functions which have a
  "struct btf_show" on their stack to support show operation
  (btf_type_snprintf_show() and btf_type_seq_show()) stays
  under 500 bytes. The compromise here is the safe buffer we
  use is small - 256 bytes - and as a result multiple
  probe_kernel_read()s are needed for larger objects. Most
  objects of interest are smaller than this (e.g.
  "struct sk_buff" is 224 bytes), and while task_struct is a
  notable exception at ~8K, performance is not the priority for
  BTF-based display. (Alexei and Yonghong, patch 2).
- safe buffer use is the default behaviour (and is mandatory
  for BPF) but unsafe display - meaning no safe copy is done
  and we operate on the object itself - is supported via a
  'u' option.
- pointers are prefixed with 0x for clarity (Alexei, patch 2)
- added additional comments and explanations around BTF show
  code, especially around determining whether objects such
  zeroed. Also tried to comment safe object scheme used. (Yonghong,
  patch 2)
- added late_initcall() to initialize vmlinux BTF so that it would
  not have to be initialized during printk operation (Alexei,
  patch 5)
- removed CONFIG_BTF_PRINTF config option as it is not needed;
  CONFIG_DEBUG_INFO_BTF can be used to gate test behaviour and
  determining behaviour of type-based printk can be done via
  retrieval of BTF data; if it's not there BTF was unavailable
  or broken (Alexei, patches 4,6)
- fix bpf_trace_printk test to use vmlinux.h and globals via
  skeleton infrastructure, removing need for perf events
  (Andrii, patch 8)

Changes since v1:

- changed format to be more drgn-like, rendering indented type info
  along with type names by default (Alexei)
- zeroed values are omitted (Arnaldo) by default unless the '0'
  modifier is specified (Alexei)
- added an option to print pointer values without obfuscation.
  The reason to do this is the sysctls controlling pointer display
  are likely to be irrelevant in many if not most tracing contexts.
  Some questions on this in the outstanding questions section below...
- reworked printk format specifer so that we no longer rely on format
  %pT<type> but instead use a struct * which contains type information
  (Rasmus). This simplifies the printk parsing, makes use more dynamic
  and also allows specification by BTF id as well as name.
- removed incorrect patch which tried to fix dereferencing of resolved
  BTF info for vmlinux; instead we skip modifiers for the relevant
  case (array element type determination) (Alexei).
- fixed issues with negative snprintf format length (Rasmus)
- added test cases for various data structure formats; base types,
  typedefs, structs, etc.
- tests now iterate through all typedef, enum, struct and unions
  defined for vmlinux BTF and render a version of the target dummy
  value which is either all zeros or all 0xff values; the idea is this
  exercises the "skip if zero" and "print everything" cases.
- added support in BPF for using the %pT format specifier in
  bpf_trace_printk()
- added BPF tests which ensure %pT format specifier use works (Alexei).
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

98b972d2

selftests/bpf: Add test for bpf_seq_printf_btf helper · b72091bd

Alan Maguire authored Sep 28, 2020

Add a test verifying iterating over tasks and displaying BTF
representation of task_struct succeeds.
Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1601292670-1616-9-git-send-email-alan.maguire@oracle.com

b72091bd