Commits · 26d82640d5ba2c3b32d79597be2dcf820ed78b16 · Kirill Smelkov / linux

17 Aug, 2021 11 commits

selftests/bpf: Skip loading bpf_testmod when using -l to list tests. · 26d82640

Yucong Sun authored Aug 16, 2021

When using "-l", test_progs often is executed as non-root user,
load_bpf_testmod() will fail and output errors. This patch skips loading bpf
testmod when "-l" is specified, making output cleaner.
Signed-off-by: Yucong Sun <fallentree@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210817044732.3263066-2-fallentree@fb.com

26d82640

selftests/bpf: Add exponential backoff to map_delete_retriable in test_maps · 857f75ea

Yucong Sun authored Aug 16, 2021

Using a fixed delay of 1 microsecond has proven flaky in slow CPU environment,
e.g. Github Actions CI system. This patch adds exponential backoff with a cap
of 50ms to reduce the flakiness of the test. Initial delay is chosen at random
in the range [0ms, 5ms).
Signed-off-by: Yucong Sun <fallentree@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210817045713.3307985-1-fallentree@fb.com

857f75ea

selftests/bpf: Add exponential backoff to map_update_retriable in test_maps · 3c3bd542

Yucong Sun authored Aug 16, 2021

Using a fixed delay of 1 microsecond has proven flaky in slow CPU environment,
e.g. Github Actions CI system. This patch adds exponential backoff with a cap
of 50ms to reduce the flakiness of the test. Initial delay is chosen at random
in the range [0ms, 5ms).
Signed-off-by: Yucong Sun <fallentree@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210816175250.296110-1-fallentree@fb.com

3c3bd542

Merge branch 'sockmap: add sockmap support for unix stream socket' · 1e1e49df

Andrii Nakryiko authored Aug 16, 2021

Jiang Wang says:

====================

This patch series add support for unix stream type
for sockmap. Sockmap already supports TCP, UDP,
unix dgram types. The unix stream support is similar
to unix dgram.

Also add selftests for unix stream type in sockmap tests.
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

1e1e49df

selftest/bpf: Add new tests in sockmap for unix stream to tcp. · 31c50aee

Jiang Wang authored Aug 16, 2021

Add two new test cases in sockmap tests, where unix stream is
redirected to tcp and vice versa.
Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20210816190327.2739291-6-jiang.wang@bytedance.com

31c50aee

selftest/bpf: Change udp to inet in some function names · 75e0e27d

Jiang Wang authored Aug 16, 2021

This is to prepare for adding new unix stream tests.
Mostly renames, also pass the socket types as an argument.
Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20210816190327.2739291-5-jiang.wang@bytedance.com

75e0e27d

selftest/bpf: Add tests for sockmap with unix stream type. · 9b03152b

Jiang Wang authored Aug 16, 2021

Add two tests for unix stream to unix stream redirection
in sockmap tests.
Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210816190327.2739291-4-jiang.wang@bytedance.com

9b03152b

af_unix: Add unix_stream_proto for sockmap · 94531cfc

Jiang Wang authored Aug 16, 2021

Previously, sockmap for AF_UNIX protocol only supports
dgram type. This patch add unix stream type support, which
is similar to unix_dgram_proto. To support sockmap, dgram
and stream cannot share the same unix_proto anymore, because
they have different implementations, such as unhash for stream
type (which will remove closed or disconnected sockets from the map),
so rename unix_proto to unix_dgram_proto and add a new
unix_stream_proto.

Also implement stream related sockmap functions.
And add dgram key words to those dgram specific functions.
Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210816190327.2739291-3-jiang.wang@bytedance.com

94531cfc

af_unix: Add read_sock for stream socket types · 77462de1

Jiang Wang authored Aug 16, 2021

To support sockmap for af_unix stream type, implement
read_sock, which is similar to the read_sock for unix
dgram sockets.
Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210816190327.2739291-2-jiang.wang@bytedance.com

77462de1

selftests/bpf: Test btf__load_vmlinux_btf/btf__load_module_btf APIs · edce1a24

Hengqi Chen authored Aug 15, 2021

Add test for btf__load_vmlinux_btf/btf__load_module_btf APIs. The test
loads bpf_testmod module BTF and check existence of a symbol which is
known to exist.
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210815081035.205879-1-hengqi.chen@gmail.com

edce1a24

bpf: Reconfigure libbpf docs to remove unversioned API · bb571649

grantseltzer authored Aug 09, 2021

This removes the libbpf_api.rst file from the kernel documentation.
The intention for this file was to pull documentation from comments
above API functions in libbpf. However, due to limitations of the
kernel documentation system, this API documentation could not be
versioned, which is counterintuative to how users expect to use it.
There is also currently no doc comments, making this a blank page.

Once the kernel comment documentation is actually contributed, it
will still exist in the kernel repository, just in the code itself.

A seperate site is being spun up to generate documentaiton from those
comments in a way in which it can be versioned properly.

This also reconfigures the bpf documentation index page to make it
easier to sync to the previously mentioned documentaiton site.
Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210810020508.280639-1-grantseltzer@gmail.com

bb571649

16 Aug, 2021 18 commits

Merge branch 'bpf-perf-link' · 3a4ce01b

Daniel Borkmann authored Aug 17, 2021

Andrii Nakryiko says:

====================
This patch set implements an ability for users to specify custom black box u64
value for each BPF program attachment, bpf_cookie, which is available to BPF
program at runtime. This is a feature that's critically missing for cases when
some sort of generic processing needs to be done by the common BPF program
logic (or even exactly the same BPF program) across multiple BPF hooks (e.g.,
many uniformly handled kprobes) and it's important to be able to distinguish
between each BPF hook at runtime (e.g., for additional configuration lookup).

The choice of restricting this to a fixed-size 8-byte u64 value is an explicit
design decision. Making this configurable by users adds unnecessary complexity
(extra memory allocations, extra complications on the verifier side to validate
accesses to variable-sized data area) while not really opening up new
possibilities. If user's use case requires storing more data per attachment,
it's possible to use either global array, or ARRAY/HASHMAP BPF maps, where
bpf_cookie would be used as an index into respective storage, populated by
user-space code before creating BPF link. This gives user all the flexibility
and control while keeping BPF verifier and BPF helper API simple.

Currently, similar functionality can only be achieved through:

  - code-generation and BPF program cloning, which is very complicated and
    unmaintainable;
  - on-the-fly C code generation and further runtime compilation, which is
    what BCC uses and allows to do pretty simply. The big downside is a very
    heavy-weight Clang/LLVM dependency and inefficient memory usage (due to
    many BPF program clones and the compilation process itself);
  - in some cases (kprobes and sometimes uprobes) it's possible to do function
    IP lookup to get function-specific configuration. This doesn't work for
    all the cases (e.g., when attaching uprobes to shared libraries) and has
    higher runtime overhead and additional programming complexity due to
    BPF_MAP_TYPE_HASHMAP lookups. Up until recently, before bpf_get_func_ip()
    BPF helper was added, it was also very complicated and unstable (API-wise)
    to get traced function's IP from fentry/fexit and kretprobe.

With libbpf and BPF CO-RE, runtime compilation is not an option, so to be able
to build generic tracing tooling simply and efficiently, ability to provide
additional bpf_cookie value for each *attachment* (as opposed to each BPF
program) is extremely important. Two immediate users of this functionality are
going to be libbpf-based USDT library (currently in development) and retsnoop
([0]), but I'm sure more applications will come once users get this feature in
their kernels.

To achieve above described, all perf_event-based BPF hooks are made available
through a new BPF_LINK_TYPE_PERF_EVENT BPF link, which allows to use common
LINK_CREATE command for program attachments and generally brings
perf_event-based attachments into a common BPF link infrastructure.

With that, LINK_CREATE gets ability to pass throught bpf_cookie value during
link creation (BPF program attachment) time. bpf_get_attach_cookie() BPF
helper is added to allow fetching this value at runtime from BPF program side.
BPF cookie is stored either on struct perf_event itself and fetched from the
BPF program context, or is passed through ambient BPF run context, added in
c7603cfa ("bpf: Add ambient BPF runtime context stored in current").

On the libbpf side of things, BPF perf link is utilized whenever is supported
by the kernel instead of using PERF_EVENT_IOC_SET_BPF ioctl on perf_event FD.
All the tracing attach APIs are extended with OPTS and bpf_cookie is passed
through corresponding opts structs.

Last part of the patch set adds few self-tests utilizing new APIs.

There are also a few refactorings along the way to make things cleaner and
easier to work with, both in kernel (BPF_PROG_RUN and BPF_PROG_RUN_ARRAY), and
throughout libbpf and selftests.

Follow-up patches will extend bpf_cookie to fentry/fexit programs.

While adding uprobe_opts, also extend it with ref_ctr_offset for specifying
USDT semaphore (reference counter) offset. Update attach_probe selftests to
validate its functionality. This is another feature (along with bpf_cookie)
required for implementing libbpf-based USDT solution.

  [0] https://github.com/anakryiko/retsnoop

v4->v5:
  - rebase on latest bpf-next to resolve merge conflict;
  - add ref_ctr_offset to uprobe_opts and corresponding selftest;
v3->v4:
  - get rid of BPF_PROG_RUN macro in favor of bpf_prog_run() (Daniel);
  - move #ifdef CONFIG_BPF_SYSCALL check into bpf_set_run_ctx (Daniel);
v2->v3:
  - user_ctx -> bpf_cookie, bpf_get_user_ctx -> bpf_get_attach_cookie (Peter);
  - fix BPF_LINK_TYPE_PERF_EVENT value fix (Jiri);
  - use bpf_prog_run() from bpf_prog_run_pin_on_cpu() (Yonghong);
v1->v2:
  - fix build failures on non-x86 arches by gating on CONFIG_PERF_EVENTS.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

3a4ce01b

selftests/bpf: Add ref_ctr_offset selftests · 4bd11e08

Andrii Nakryiko authored Aug 15, 2021

Extend attach_probe selftests to specify ref_ctr_offset for uprobe/uretprobe
and validate that its value is incremented from zero.

Turns out that once uprobe is attached with ref_ctr_offset, uretprobe for the
same location/function *has* to use ref_ctr_offset as well, otherwise
perf_event_open() fails with -EINVAL. So this test uses ref_ctr_offset for
both uprobe and uretprobe, even though for the purpose of test uprobe would be
enough.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210815070609.987780-17-andrii@kernel.org

4bd11e08

libbpf: Add uprobe ref counter offset support for USDT semaphores · 5e3b8356

Andrii Nakryiko authored Aug 15, 2021

When attaching to uprobes through perf subsystem, it's possible to specify
offset of a so-called USDT semaphore, which is just a reference counted u16,
used by kernel to keep track of how many tracers are attached to a given
location. Support for this feature was added in [0], so just wire this through
uprobe_opts. This is important to enable implementing USDT attachment and
tracing through libbpf's bpf_program__attach_uprobe_opts() API.

  [0] a6ca88b2 ("trace_uprobe: support reference counter in fd-based uprobe")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210815070609.987780-16-andrii@kernel.org

5e3b8356

selftests/bpf: Add bpf_cookie selftests for high-level APIs · 0a80cf67

Andrii Nakryiko authored Aug 15, 2021

Add selftest with few subtests testing proper bpf_cookie usage.

Kprobe and uprobe subtests are pretty straightforward and just validate that
the same BPF program attached with different bpf_cookie will be triggered with
those different bpf_cookie values.

Tracepoint subtest is a bit more interesting, as it is the only
perf_event-based BPF hook that shares bpf_prog_array between multiple
perf_events internally. This means that the same BPF program can't be attached
to the same tracepoint multiple times. So we have 3 identical copies. This
arrangement allows to test bpf_prog_array_copy()'s handling of bpf_prog_array
list manipulation logic when programs are attached and detached. The test
validates that bpf_cookie isn't mixed up and isn't lost during such list
manipulations.

Perf_event subtest validates that two BPF links can be created against the
same perf_event (but not at the same time, only one BPF program can be
attached to perf_event itself), and that for each we can specify different
bpf_cookie value.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210815070609.987780-15-andrii@kernel.org

0a80cf67

selftests/bpf: Extract uprobe-related helpers into trace_helpers.{c,h} · a549aaa6

Andrii Nakryiko authored Aug 15, 2021

Extract two helpers used for working with uprobes into trace_helpers.{c,h} to
be re-used between multiple uprobe-using selftests. Also rename get_offset()
into more appropriate get_uprobe_offset().
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210815070609.987780-14-andrii@kernel.org

a549aaa6

selftests/bpf: Test low-level perf BPF link API · f36d3557

Andrii Nakryiko authored Aug 15, 2021

Add tests utilizing low-level bpf_link_create() API to create perf BPF link.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210815070609.987780-13-andrii@kernel.org

f36d3557

libbpf: Add bpf_cookie to perf_event, kprobe, uprobe, and tp attach APIs · 47faff37

Andrii Nakryiko authored Aug 15, 2021

Wire through bpf_cookie for all attach APIs that use perf_event_open under the
hood:
  - for kprobes, extend existing bpf_kprobe_opts with bpf_cookie field;
  - for perf_event, uprobe, and tracepoint APIs, add their _opts variants and
    pass bpf_cookie through opts.

For kernel that don't support BPF_LINK_CREATE for perf_events, and thus
bpf_cookie is not supported either, return error and log warning for user.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210815070609.987780-12-andrii@kernel.org

47faff37

libbpf: Add bpf_cookie support to bpf_link_create() API · 3ec84f4b

Andrii Nakryiko authored Aug 15, 2021

Add ability to specify bpf_cookie value when creating BPF perf link with
bpf_link_create() low-level API.

Given BPF_LINK_CREATE command is growing and keeps getting new fields that are
specific to the type of BPF_LINK, extend libbpf side of bpf_link_create() API
and corresponding OPTS struct to accomodate such changes. Add extra checks to
prevent using incompatible/unexpected combinations of fields.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210815070609.987780-11-andrii@kernel.org

3ec84f4b

libbpf: Use BPF perf link when supported by kernel · 668ace0e

Andrii Nakryiko authored Aug 15, 2021

Detect kernel support for BPF perf link and prefer it when attaching to
perf_event, tracepoint, kprobe/uprobe. Underlying perf_event FD will be kept
open until BPF link is destroyed, at which point both perf_event FD and BPF
link FD will be closed.

This preserves current behavior in which perf_event FD is open for the
duration of bpf_link's lifetime and user is able to "disconnect" bpf_link from
underlying FD (with bpf_link__disconnect()), so that bpf_link__destroy()
doesn't close underlying perf_event FD.When BPF perf link is used, disconnect
will keep both perf_event and bpf_link FDs open, so it will be up to
(advanced) user to close them. This approach is demonstrated in bpf_cookie.c
selftests, added in this patch set.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210815070609.987780-10-andrii@kernel.org

668ace0e

libbpf: Remove unused bpf_link's destroy operation, but add dealloc · d88b71d4

Andrii Nakryiko authored Aug 15, 2021

bpf_link->destroy() isn't used by any code, so remove it. Instead, add ability
to override deallocation procedure, with default doing plain free(link). This
is necessary for cases when we want to "subclass" struct bpf_link to keep
extra information, as is the case in the next patch adding struct
bpf_link_perf.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210815070609.987780-9-andrii@kernel.org

d88b71d4

libbpf: Re-build libbpf.so when libbpf.map changes · 61c7aa50

Andrii Nakryiko authored Aug 15, 2021

Ensure libbpf.so is re-built whenever libbpf.map is modified.  Without this,
changes to libbpf.map are not detected and versioned symbols mismatch error
will be reported until `make clean && make` is used, which is a suboptimal
developer experience.

Fixes: 306b267c ("libbpf: Verify versioned symbols")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210815070609.987780-8-andrii@kernel.org

61c7aa50

bpf: Add bpf_get_attach_cookie() BPF helper to access bpf_cookie value · 7adfc6c9

Andrii Nakryiko authored Aug 15, 2021

Add new BPF helper, bpf_get_attach_cookie(), which can be used by BPF programs
to get access to a user-provided bpf_cookie value, specified during BPF
program attachment (BPF link creation) time.

Naming is hard, though. With the concept being named "BPF cookie", I've
considered calling the helper:
  - bpf_get_cookie() -- seems too unspecific and easily mistaken with socket
    cookie;
  - bpf_get_bpf_cookie() -- too much tautology;
  - bpf_get_link_cookie() -- would be ok, but while we create a BPF link to
    attach BPF program to BPF hook, it's still an "attachment" and the
    bpf_cookie is associated with BPF program attachment to a hook, not a BPF
    link itself. Technically, we could support bpf_cookie with old-style
    cgroup programs.So I ultimately rejected it in favor of
    bpf_get_attach_cookie().

Currently all perf_event-backed BPF program types support
bpf_get_attach_cookie() helper. Follow-up patches will add support for
fentry/fexit programs as well.

While at it, mark bpf_tracing_func_proto() as static to make it obvious that
it's only used from within the kernel/trace/bpf_trace.c.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210815070609.987780-7-andrii@kernel.org

7adfc6c9

bpf: Allow to specify user-provided bpf_cookie for BPF perf links · 82e6b1ee

Andrii Nakryiko authored Aug 15, 2021

Add ability for users to specify custom u64 value (bpf_cookie) when creating
BPF link for perf_event-backed BPF programs (kprobe/uprobe, perf_event,
tracepoints).

This is useful for cases when the same BPF program is used for attaching and
processing invocation of different tracepoints/kprobes/uprobes in a generic
fashion, but such that each invocation is distinguished from each other (e.g.,
BPF program can look up additional information associated with a specific
kernel function without having to rely on function IP lookups). This enables
new use cases to be implemented simply and efficiently that previously were
possible only through code generation (and thus multiple instances of almost
identical BPF program) or compilation at runtime (BCC-style) on target hosts
(even more expensive resource-wise). For uprobes it is not even possible in
some cases to know function IP before hand (e.g., when attaching to shared
library without PID filtering, in which case base load address is not known
for a library).

This is done by storing u64 bpf_cookie in struct bpf_prog_array_item,
corresponding to each attached and run BPF program. Given cgroup BPF programs
already use two 8-byte pointers for their needs and cgroup BPF programs don't
have (yet?) support for bpf_cookie, reuse that space through union of
cgroup_storage and new bpf_cookie field.

Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx.
This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF
program execution code, which luckily is now also split from
BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper
giving access to this user-provided cookie value from inside a BPF program.
Generic perf_event BPF programs will access this value from perf_event itself
through passed in BPF program context.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/bpf/20210815070609.987780-6-andrii@kernel.org

82e6b1ee

bpf: Implement minimal BPF perf link · b89fbfbb

Andrii Nakryiko authored Aug 15, 2021

Introduce a new type of BPF link - BPF perf link. This brings perf_event-based
BPF program attachments (perf_event, tracepoints, kprobes, and uprobes) into
the common BPF link infrastructure, allowing to list all active perf_event
based attachments, auto-detaching BPF program from perf_event when link's FD
is closed, get generic BPF link fdinfo/get_info functionality.

BPF_LINK_CREATE command expects perf_event's FD as target_fd. No extra flags
are currently supported.

Force-detaching and atomic BPF program updates are not yet implemented, but
with perf_event-based BPF links we now have common framework for this without
the need to extend ioctl()-based perf_event interface.

One interesting consideration is a new value for bpf_attach_type, which
BPF_LINK_CREATE command expects. Generally, it's either 1-to-1 mapping from
bpf_attach_type to bpf_prog_type, or many-to-1 mapping from a subset of
bpf_attach_types to one bpf_prog_type (e.g., see BPF_PROG_TYPE_SK_SKB or
BPF_PROG_TYPE_CGROUP_SOCK). In this case, though, we have three different
program types (KPROBE, TRACEPOINT, PERF_EVENT) using the same perf_event-based
mechanism, so it's many bpf_prog_types to one bpf_attach_type. I chose to
define a single BPF_PERF_EVENT attach type for all of them and adjust
link_create()'s logic for checking correspondence between attach type and
program type.

The alternative would be to define three new attach types (e.g., BPF_KPROBE,
BPF_TRACEPOINT, and BPF_PERF_EVENT), but that seemed like unnecessary overkill
and BPF_KPROBE will cause naming conflicts with BPF_KPROBE() macro, defined by
libbpf. I chose to not do this to avoid unnecessary proliferation of
bpf_attach_type enum values and not have to deal with naming conflicts.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/bpf/20210815070609.987780-5-andrii@kernel.org

b89fbfbb

bpf: Refactor perf_event_set_bpf_prog() to use struct bpf_prog input · 652c1b17

Andrii Nakryiko authored Aug 15, 2021

Make internal perf_event_set_bpf_prog() use struct bpf_prog pointer as an
input argument, which makes it easier to re-use for other internal uses
(coming up for BPF link in the next patch). BPF program FD is not as
convenient and in some cases it's not available. So switch to struct bpf_prog,
move out refcounting outside and let caller do bpf_prog_put() in case of an
error. This follows the approach of most of the other BPF internal functions.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210815070609.987780-4-andrii@kernel.org

652c1b17

bpf: Refactor BPF_PROG_RUN_ARRAY family of macros into functions · 7d08c2c9

Andrii Nakryiko authored Aug 15, 2021

Similar to BPF_PROG_RUN, turn BPF_PROG_RUN_ARRAY macros into proper functions
with all the same readability and maintainability benefits. Making them into
functions required shuffling around bpf_set_run_ctx/bpf_reset_run_ctx
functions. Also, explicitly specifying the type of the BPF prog run callback
required adjusting __bpf_prog_run_save_cb() to accept const void *, casted
internally to const struct sk_buff.

Further, split out a cgroup-specific BPF_PROG_RUN_ARRAY_CG and
BPF_PROG_RUN_ARRAY_CG_FLAGS from the more generic BPF_PROG_RUN_ARRAY due to
the differences in bpf_run_ctx used for those two different use cases.

I think BPF_PROG_RUN_ARRAY_CG would benefit from further refactoring to accept
struct cgroup and enum bpf_attach_type instead of bpf_prog_array, fetching
cgrp->bpf.effective[type] and RCU-dereferencing it internally. But that
required including include/linux/cgroup-defs.h, which I wasn't sure is ok with
everyone.

The remaining generic BPF_PROG_RUN_ARRAY function will be extended to
pass-through user-provided context value in the next patch.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210815070609.987780-3-andrii@kernel.org

7d08c2c9

bpf: Refactor BPF_PROG_RUN into a function · fb7dd8bc

Andrii Nakryiko authored Aug 15, 2021

Turn BPF_PROG_RUN into a proper always inlined function. No functional and
performance changes are intended, but it makes it much easier to understand
what's going on with how BPF programs are actually get executed. It's more
obvious what types and callbacks are expected. Also extra () around input
parameters can be dropped, as well as `__` variable prefixes intended to avoid
naming collisions, which makes the code simpler to read and write.

This refactoring also highlighted one extra issue. BPF_PROG_RUN is both
a macro and an enum value (BPF_PROG_RUN == BPF_PROG_TEST_RUN). Turning
BPF_PROG_RUN into a function causes naming conflict compilation error. So
rename BPF_PROG_RUN into lower-case bpf_prog_run(), similar to
bpf_prog_run_xdp(), bpf_prog_run_pin_on_cpu(), etc. All existing callers of
BPF_PROG_RUN, the macro, are switched to bpf_prog_run() explicitly.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210815070609.987780-2-andrii@kernel.org

fb7dd8bc

bpf, tests: Fix spelling mistake "shoft" -> "shift" · 1bda52f8

Colin Ian King authored Aug 15, 2021

There is a spelling mistake in a literal string. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210815213950.47751-1-colin.king@canonical.com

1bda52f8

15 Aug, 2021 6 commits

Merge branch 'BPF iterator for UNIX domain socket.' · fa183a86

Andrii Nakryiko authored Aug 15, 2021

Kuniyuki Iwashima says:

====================

This patch set adds BPF iterator support for UNIX domain socket.  The first
patch implements it, and the second adds "%c" support for BPF_SEQ_PRINTF().

Thanks to Yonghong Song for the fix [0] for the LLVM code gen.  The fix
prevents the LLVM compiler from transforming the loop exit condition '<' to
'!=', where the upper bound is not a constant.  The transformation leads
the verifier to interpret it as an infinite loop.

And thanks to Andrii Nakryiko for its workaround [1].

[0] https://reviews.llvm.org/D107483
[1] https://lore.kernel.org/netdev/CAEf4BzZ3sVx1m1mOCcPcuVPiY6cWEAO=6VGHDiXEs9ZVD-RoLg@mail.gmail.com/

Changelog:
  v6:
  - Align the header "Inde" column
  - Change int vars to __u64 not to break test_progs-no_alu32
  - Move the if statement into the for loop not to depend on the fix [0]
  - Drop the README change
  - Modify "%c" positive test patterns

  v5:
  https://lore.kernel.org/netdev/20210812164557.79046-1-kuniyu@amazon.co.jp/
  - Align header line of bpf_iter_unix.c
  - Add test for "%c"

  v4:
  https://lore.kernel.org/netdev/20210810092807.13190-1-kuniyu@amazon.co.jp/
  - Check IS_BUILTIN(CONFIG_UNIX)
  - Support "%c" in BPF_SEQ_PRINTF()
  - Uncomment the code to print the name of the abstract socket
  - Mention the LLVM fix in README.rst
  - Remove the 'aligned' attribute in bpf_iter.h
  - Keep the format string on a single line

  v3:
  https://lore.kernel.org/netdev/20210804070851.97834-1-kuniyu@amazon.co.jp/
  - Export some functions for CONFIG_UNIX=m

  v2:
  https://lore.kernel.org/netdev/20210803011110.21205-1-kuniyu@amazon.co.jp/
  - Implement bpf_iter specific seq_ops->stop()
  - Add bpf_iter__unix in bpf_iter.h
  - Move common definitions in selftest to bpf_tracing_net.h
  - Include the code for abstract UNIX domain socket as comment in selftest
  - Use ASSERT_OK_PTR() instead of CHECK()
  - Make ternary operators on single line

  v1:
  https://lore.kernel.org/netdev/20210729233645.4869-1-kuniyu@amazon.co.jp/
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

fa183a86

selftest/bpf: Extend the bpf_snprintf() test for "%c". · ce547335

Kuniyuki Iwashima authored Aug 14, 2021

This patch adds various "positive" patterns for "%c" and two "negative"
patterns for wide character.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210814015718.42704-5-kuniyu@amazon.co.jp

ce547335

selftest/bpf: Implement sample UNIX domain socket iterator program. · 04e92818

Kuniyuki Iwashima authored Aug 14, 2021

The iterator can output almost the same result compared to /proc/net/unix.
The header line is aligned, and the Inode column uses "%8lu" because "%5lu"
can be easily overflown.

  # cat /sys/fs/bpf/unix
  Num               RefCount Protocol Flags    Type St    Inode Path
  ffff963c06689800: 00000002 00000000 00010000 0001 01    18697 private/defer
  ffff963c7c979c00: 00000002 00000000 00000000 0001 01   598245 @Hello@World@

  # cat /proc/net/unix
  Num       RefCount Protocol Flags    Type St Inode Path
  ffff963c06689800: 00000002 00000000 00010000 0001 01 18697 private/defer
  ffff963c7c979c00: 00000002 00000000 00000000 0001 01 598245 @Hello@World@
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210814015718.42704-4-kuniyu@amazon.co.jp

04e92818

bpf: Support "%c" in bpf_bprintf_prepare(). · 3478cfcf

Kuniyuki Iwashima authored Aug 14, 2021

/proc/net/unix uses "%c" to print a single-byte character to escape '\0' in
the name of the abstract UNIX domain socket.  The following selftest uses
it, so this patch adds support for "%c".  Note that it does not support
wide character ("%lc" and "%llc") for simplicity.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210814015718.42704-3-kuniyu@amazon.co.jp

3478cfcf

bpf: af_unix: Implement BPF iterator for UNIX domain socket. · 2c860a43

Kuniyuki Iwashima authored Aug 14, 2021

This patch implements the BPF iterator for the UNIX domain socket.

Currently, the batch optimisation introduced for the TCP iterator in the
commit 04c7820b ("bpf: tcp: Bpf iter batching and lock_sock") is not
used for the UNIX domain socket.  It will require replacing the big lock
for the hash table with small locks for each hash list not to block other
processes.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210814015718.42704-2-kuniyu@amazon.co.jp

2c860a43

samples/bpf: Define MAX_ENTRIES instead of a magic number in offwaketime · d1bf7c4d

Muhammad Falak R Wani authored Aug 15, 2021

Define MAX_ENTRIES instead of using 10000 as a magic number in various
places.
Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210815065013.15411-1-falakreyaz@gmail.com

d1bf7c4d

14 Aug, 2021 3 commits

Merge branch 'bpf: Allow bpf_get_netns_cookie in BPF_PROG_TYPE_CGROUP_SOCKOPT' · faff1cca

Andrii Nakryiko authored Aug 13, 2021

Stanislav Fomichev says:

====================

We'd like to be able to identify netns from setsockopt hooks
to be able to do the enforcement of some options only in the
"initial" netns (to give users the ability to create clear/isolated
sandboxes if needed without any enforcement by doing unshare(net)).

v3:
- remove extra 'ctx->skb == NULL' check (Martin KaFai Lau)
- rework test to make sure the helper is really called, not just
  verified

v2:
- add missing CONFIG_NET
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

faff1cca

selftests/bpf: Verify bpf_get_netns_cookie in BPF_PROG_TYPE_CGROUP_SOCKOPT · 6a3a3dcc

Stanislav Fomichev authored Aug 13, 2021

Add extra calls to sockopt_sk.c.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210813230530.333779-3-sdf@google.com

6a3a3dcc

bpf: Allow bpf_get_netns_cookie in BPF_PROG_TYPE_CGROUP_SOCKOPT · f1248dee

Stanislav Fomichev authored Aug 13, 2021

This is similar to existing BPF_PROG_TYPE_CGROUP_SOCK
and BPF_PROG_TYPE_CGROUP_SOCK_ADDR.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210813230530.333779-2-sdf@google.com

f1248dee

13 Aug, 2021 2 commits

selftests/bpf: Fix test_core_autosize on big-endian machines · d164dd9a

Ilya Leoshkevich authored Aug 13, 2021

The "probed" part of test_core_autosize copies an integer using
bpf_core_read() into an integer of a potentially different size.
On big-endian machines a destination offset is required for this to
produce a sensible result.

Fixes: 888d83b9 ("selftests/bpf: Validate libbpf's auto-sizing of LD/ST/STX instructions")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210812224814.187460-1-iii@linux.ibm.com

d164dd9a

libbpf: Support weak typed ksyms. · 2211c825

Hao Luo authored Aug 11, 2021

Currently weak typeless ksyms have default value zero, when they don't
exist in the kernel. However, weak typed ksyms are rejected by libbpf
if they can not be resolved. This means that if a bpf object contains
the declaration of a nonexistent weak typed ksym, it will be rejected
even if there is no program that references the symbol.

Nonexistent weak typed ksyms can also default to zero just like
typeless ones. This allows programs that access weak typed ksyms to be
accepted by verifier, if the accesses are guarded. For example,

extern const int bpf_link_fops3 __ksym __weak;

/* then in BPF program */

if (&bpf_link_fops3) {
   /* use bpf_link_fops3 */
}

If actual use of nonexistent typed ksym is not guarded properly,
verifier would see that register is not PTR_TO_BTF_ID and wouldn't
allow to use it for direct memory reads or passing it to BPF helpers.
Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210812003819.2439037-1-haoluo@google.com

2211c825