Commits · d094c9851ae970d4c95a269f9410464ec064b8cd · Kirill Smelkov / linux

21 Jan, 2022 16 commits

net: mvneta: simplify mvneta_swbm_add_rx_fragment management · d094c985

Lorenzo Bianconi authored Jan 21, 2022

Relying on xdp frags bit, remove skb_shared_info structure
allocated on the stack in mvneta_rx_swbm routine and simplify
mvneta_swbm_add_rx_fragment accessing skb_shared_info in the
xdp_buff structure directly. There is no performance penalty in
this approach since mvneta_swbm_add_rx_fragment is run just
for xdp frags use-case.
Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/45f050c094ccffce49d6bc5112939ed35250ba90.1642758637.git.lorenzo@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

d094c985

net: mvneta: update frags bit before passing the xdp buffer to eBPF layer · 76a67694

Lorenzo Bianconi authored Jan 21, 2022

Update frags bit (XDP_FLAGS_HAS_FRAGS) in xdp_buff to notify
XDP/eBPF layer and XDP remote drivers if this is a "non-linear"
XDP buffer. Access skb_shared_info only if XDP_FLAGS_HAS_FRAGS flag
is set in order to avoid possible cache-misses.
Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/c00a73097f8a35860d50dae4a36e6cc9ef7e172f.1642758637.git.lorenzo@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

76a67694

xdp: introduce flags field in xdp_buff/xdp_frame · 2e88d4ff

Lorenzo Bianconi authored Jan 21, 2022

Introduce flags field in xdp_frame and xdp_buffer data structures
to define additional buffer features. At the moment the only
supported buffer feature is frags bit (XDP_FLAGS_HAS_FRAGS).
frags bit is used to specify if this is a linear buffer
(XDP_FLAGS_HAS_FRAGS not set) or a frags frame (XDP_FLAGS_HAS_FRAGS
set). In the latter case the driver is expected to initialize the
skb_shared_info structure at the end of the first buffer to link together
subsequent buffers belonging to the same frame.
Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/e389f14f3a162c0a5bc6a2e1aa8dd01a90be117d.1642758637.git.lorenzo@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

2e88d4ff

net: skbuff: add size metadata to skb_shared_info for xdp · d16697cb

Lorenzo Bianconi authored Jan 21, 2022

Introduce xdp_frags_size field in skb_shared_info data structure
to store xdp_buff/xdp_frame frame paged size (xdp_frags_size will
be used in xdp frags support). In order to not increase
skb_shared_info size we will use a hole due to skb_shared_info
alignment.
Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/8a849819a3e0a143d540f78a3a5add76e17e980d.1642758637.git.lorenzo@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

d16697cb

selftests: bpf: test BPF_PROG_QUERY for progs attached to sockmap · 820e6e22

Di Zhu authored Jan 19, 2022

Add test for querying progs attached to sockmap. we use an existing
libbpf query interface to query prog cnt before and after progs
attaching to sockmap and check whether the queried prog id is right.
Signed-off-by: Di Zhu <zhudi2@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/20220119014005.1209-2-zhudi2@huawei.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

820e6e22

bpf: support BPF_PROG_QUERY for progs attached to sockmap · 748cd572

Di Zhu authored Jan 19, 2022

Right now there is no way to query whether BPF programs are
attached to a sockmap or not.

we can use the standard interface in libbpf to query, such as:
bpf_prog_query(mapFd, BPF_SK_SKB_STREAM_PARSER, 0, NULL, ...);
the mapFd is the fd of sockmap.
Signed-off-by: Di Zhu <zhudi2@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/20220119014005.1209-1-zhudi2@huawei.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

748cd572

Merge branch 'libbpf: streamline netlink-based XDP APIs' · 3f712d46

Alexei Starovoitov authored Jan 20, 2022

Andrii Nakryiko says:

====================

Revamp existing low-level XDP APIs provided by libbpf to follow more
consistent naming (new APIs follow bpf_tc_xxx() approach where it makes
sense) and be extensible without ABI breakages (OPTS-based). See patch #1 for
details, remaining patches switch bpftool, selftests/bpf and samples/bpf to
new APIs.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

3f712d46

samples/bpf: adapt samples/bpf to bpf_xdp_xxx() APIs · d4e34bfc

Andrii Nakryiko authored Jan 19, 2022

Use new bpf_xdp_*() APIs across all XDP-related BPF samples.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220120061422.2710637-5-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

d4e34bfc

selftests/bpf: switch to new libbpf XDP APIs · 54435652

Andrii Nakryiko authored Jan 19, 2022

Switch to using new bpf_xdp_*() APIs across all selftests. Take
advantage of a more straightforward and user-friendly semantics of
old_prog_fd (0 means "don't care") in few places.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220120061422.2710637-4-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

54435652

bpftool: use new API for attaching XDP program · c86575ec

Andrii Nakryiko authored Jan 19, 2022

Switch to new bpf_xdp_attach() API to avoid deprecation warnings.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220120061422.2710637-3-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

c86575ec

libbpf: streamline low-level XDP APIs · c359821a

Andrii Nakryiko authored Jan 19, 2022

Introduce 4 new netlink-based XDP APIs for attaching, detaching, and
querying XDP programs:
  - bpf_xdp_attach;
  - bpf_xdp_detach;
  - bpf_xdp_query;
  - bpf_xdp_query_id.

These APIs replace bpf_set_link_xdp_fd, bpf_set_link_xdp_fd_opts,
bpf_get_link_xdp_id, and bpf_get_link_xdp_info APIs ([0]). The latter
don't follow a consistent naming pattern and some of them use
non-extensible approaches (e.g., struct xdp_link_info which can't be
modified without breaking libbpf ABI).

The approach I took with these low-level XDP APIs is similar to what we
did with low-level TC APIs. There is a nice duality of bpf_tc_attach vs
bpf_xdp_attach, and so on. I left bpf_xdp_attach() to support detaching
when -1 is specified for prog_fd for generality and convenience, but
bpf_xdp_detach() is preferred due to clearer naming and associated
semantics. Both bpf_xdp_attach() and bpf_xdp_detach() accept the same
opts struct allowing to specify expected old_prog_fd.

While doing the refactoring, I noticed that old APIs require users to
specify opts with old_fd == -1 to declare "don't care about already
attached XDP prog fd" condition. Otherwise, FD 0 is assumed, which is
essentially never an intended behavior. So I made this behavior
consistent with other kernel and libbpf APIs, in which zero FD means "no
FD". This seems to be more in line with the latest thinking in BPF land
and should cause less user confusion, hopefully.

For querying, I left two APIs, both more generic bpf_xdp_query()
allowing to query multiple IDs and attach mode, but also
a specialization of it, bpf_xdp_query_id(), which returns only requested
prog_id. Uses of prog_id returning bpf_get_link_xdp_id() were so
prevalent across selftests and samples, that it seemed a very common use
case and using bpf_xdp_query() for doing it felt very cumbersome with
a highly branches if/else chain based on flags and attach mode.

Old APIs are scheduled for deprecation in libbpf 0.8 release.

  [0] Closes: https://github.com/libbpf/libbpf/issues/309Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20220120061422.2710637-2-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

c359821a

Merge branch 'libbpf: deprecate legacy BPF map definitions' · 1713e33b

Alexei Starovoitov authored Jan 20, 2022

Andrii Nakryiko says:

====================

Officially deprecate legacy BPF map definitions in libbpf. They've been slated
for deprecation for a while in favor of more powerful BTF-defined map
definitions and this patch set adds warnings and a way to enforce this in
libbpf through LIBBPF_STRICT_MAP_DEFINITIONS strict mode flag.

Selftests are fixed up and updated, BPF documentation is updated, bpftool's
strict mode usage is adjusted to avoid breaking users unnecessarily.

v1->v2:
  - replace missed bpf_map_def case in Documentation/bpf/btf.rst (Alexei).
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

1713e33b

docs/bpf: update BPF map definition example · 96c85308

Andrii Nakryiko authored Jan 19, 2022

Use BTF-defined map definition in the documentation example.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220120060529.1890907-5-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

96c85308

libbpf: deprecate legacy BPF map definitions · 93b8952d

Andrii Nakryiko authored Jan 19, 2022

Enact deprecation of legacy BPF map definition in SEC("maps") ([0]). For
the definitions themselves introduce LIBBPF_STRICT_MAP_DEFINITIONS flag
for libbpf strict mode. If it is set, error out on any struct
bpf_map_def-based map definition. If not set, libbpf will print out
a warning for each legacy BPF map to raise awareness that it goes away.

For any use of BPF_ANNOTATE_KV_PAIR() macro providing a legacy way to
associate BTF key/value type information with legacy BPF map definition,
warn through libbpf's pr_warn() error message (but don't fail BPF object
open).

BPF-side struct bpf_map_def is marked as deprecated. User-space struct
bpf_map_def has to be used internally in libbpf, so it is left
untouched. It should be enough for bpf_map__def() to be marked
deprecated to raise awareness that it goes away.

bpftool is an interesting case that utilizes libbpf to open BPF ELF
object to generate skeleton. As such, even though bpftool itself uses
full on strict libbpf mode (LIBBPF_STRICT_ALL), it has to relax it a bit
for BPF map definition handling to minimize unnecessary disruptions. So
opt-out of LIBBPF_STRICT_MAP_DEFINITIONS for bpftool. User's code that
will later use generated skeleton will make its own decision whether to
enforce LIBBPF_STRICT_MAP_DEFINITIONS or not.

There are few tests in selftests/bpf that are consciously using legacy
BPF map definitions to test libbpf functionality. For those, temporary
opt out of LIBBPF_STRICT_MAP_DEFINITIONS mode for the duration of those
tests.

[0] Closes: https://github.com/libbpf/libbpf/issues/272Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220120060529.1890907-4-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

93b8952d

selftests/bpf: convert remaining legacy map definitions · ccc3f569

Andrii Nakryiko authored Jan 19, 2022

Converted few remaining legacy BPF map definition to BTF-defined ones.
For the remaining two bpf_map_def-based legacy definitions that we want
to keep for testing purposes until libbpf 1.0 release, guard them in
pragma to suppres deprecation warnings which will be added in libbpf in
the next commit.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220120060529.1890907-3-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

ccc3f569

selftests/bpf: fail build on compilation warning · 32b34294

Andrii Nakryiko authored Jan 19, 2022

It's very easy to miss compilation warnings without -Werror, which is
not set for selftests. libbpf and bpftool are already strict about this,
so make selftests/bpf also treat compilation warnings as errors to catch
such regressions early.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20220120060529.1890907-2-andrii@kernel.orgSigned-off-by: Alexei Starovoitov <ast@kernel.org>

32b34294

20 Jan, 2022 5 commits

selftests/bpf: Do not fail build if CONFIG_NF_CONNTRACK=m/n · 1058b6a7

Kumar Kartikeya Dwivedi authored Jan 20, 2022

Some users have complained that selftests fail to build when
CONFIG_NF_CONNTRACK=m. It would be useful to allow building as long as
it is set to module or built-in, even though in case of building as
module, user would need to load it before running the selftest. Note
that this also allows building selftest when CONFIG_NF_CONNTRACK is
disabled.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220120164932.2798544-1-memxor@gmail.com

1058b6a7

selftests: bpf: Fix bind on used port · 8c0be063

Felix Maurer authored Jan 18, 2022

The bind_perm BPF selftest failed when port 111/tcp was already in use
during the test. To fix this, the test now runs in its own network name
space.

To use unshare, it is necessary to reorder the includes. The style of
the includes is adapted to be consistent with the other prog_tests.

v2: Replace deprecated CHECK macro with ASSERT_OK

Fixes: 8259fdeb ("selftests/bpf: Verify that rebinding to port < 1024 from BPF works")
Signed-off-by: Felix Maurer <fmaurer@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/551ee65533bb987a43f93d88eaf2368b416ccd32.1642518457.git.fmaurer@redhat.com

8c0be063

Merge branch 'rely on ASSERT marcos in xdp_bpf2bpf.c/xdp_adjust_tail.c' · 38f033a1

Andrii Nakryiko authored Jan 20, 2022

Lorenzo Bianconi says:

====================

Rely on ASSERT* macros and get rid of deprecated CHECK ones in xdp_bpf2bpf and
xdp_adjust_tail bpf selftests.
This is a preliminary series for XDP multi-frags support.

Changes since v1:
- run each ASSERT test separately
- drop unnecessary return statements
- drop unnecessary if condition in test_xdp_bpf2bpf()
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

38f033a1

bpf: selftests: Get rid of CHECK macro in xdp_bpf2bpf.c · fa6fde35

Lorenzo Bianconi authored Jan 20, 2022

Rely on ASSERT* macros and get rid of deprecated CHECK ones in
xdp_bpf2bpf bpf selftest.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/df7e5098465016e27d91f2c69a376a35d63a7621.1642679130.git.lorenzo@kernel.org

fa6fde35

bpf: selftests: Get rid of CHECK macro in xdp_adjust_tail.c · 791cad02

Lorenzo Bianconi authored Jan 20, 2022

Rely on ASSERT* macros and get rid of deprecated CHECK ones in
xdp_adjust_tail bpf selftest.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/c0ab002ffa647a20ec9e584214bf0d4373142b54.1642679130.git.lorenzo@kernel.org

791cad02

19 Jan, 2022 12 commits

Merge branch 'bpf: allow cgroup progs to export custom retval to userspace' · 4e950747

Alexei Starovoitov authored Jan 19, 2022

YiFei Zhu says:

====================

Right now, most cgroup hooks are best used for permission checks. They
can only reject a syscall with -EPERM, so a cause of a rejection, if
the rejected by eBPF cgroup hooks, is ambiguous to userspace.
Additionally, if the syscalls are implemented in eBPF, all permission
checks and the implementation has to happen within the same filter,
as programs executed later in the series of progs are unaware of the
return values return by the previous progs.

This patch series adds two helpers, bpf_get_retval and bpf_set_retval,
that allows hooks to get/set the return value of syscall to userspace.
This also allows later progs to retrieve retval set by previous progs.

For legacy programs that rejects a syscall without setting the retval,
for backwards compatibility, if a prog rejects without itself or a
prior prog setting retval to an -err, the retval is set by the kernel
to -EPERM.

For getsockopt hooks that has ctx->retval, this variable mirrors that
that accessed by the helpers.

Additionally, the following user-visible behavior for getsockopt
hooks has changed:
  - If a prior filter rejected the syscall, it will be visible
    in ctx->retval.
  - Attempting to change the retval arbitrarily is now allowed and
    will not cause an -EFAULT.
  - If kernel rejects a getsockopt syscall before running the hooks,
    the error will be visible in ctx->retval. Returning 0 from the
    prog will not overwrite the error to -EPERM unless there is an
    explicit call of bpf_set_retval(-EPERM)

Tests have been added in this series to test the behavior of the helper
with cgroup setsockopt getsockopt hooks.

Patch 1 changes the API of macros to prepare for the next patch and
  should be a no-op.
Patch 2 moves ctx->retval to a struct pointed to by current
  task_struct.
Patch 3 implements the helpers.
Patch 4 tests the behaviors of the helpers.
Patch 5 updates a test after the test broke due to the visible changes.

v1 -> v2:
  - errno -> retval
  - split one helper to get & set helpers
  - allow retval to be set arbitrarily in the general case
  - made the helper retval and context retval mirror each other
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

4e950747

selftests/bpf: Update sockopt_sk test to the use bpf_set_retval · 1080ef5c

YiFei Zhu authored Dec 16, 2021

The tests would break without this patch, because at one point it calls
  getsockopt(fd, SOL_TCP, TCP_ZEROCOPY_RECEIVE, &buf, &optlen)
This getsockopt receives the kernel-set -EINVAL. Prior to this patch
series, the eBPF getsockopt hook's -EPERM would override kernel's
-EINVAL, however, after this patch series, return 0's automatic
-EPERM will not; the eBPF prog has to explicitly bpf_set_retval(-EPERM)
if that is wanted.

I also removed the explicit mentions of EPERM in the comments in the
prog.
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/4f20b77cb46812dbc2bdcd7e3fa87c7573bde55e.1639619851.git.zhuyifei@google.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

1080ef5c

selftests/bpf: Test bpf_{get,set}_retval behavior with cgroup/sockopt · b8bff6f8

YiFei Zhu authored Dec 16, 2021

The tests checks how different ways of interacting with the helpers
(getting retval, setting EUNATCH, EISCONN, and legacy reject
returning 0 without setting retval), produce different results in
both the setsockopt syscall and the retval returned by the helper.
A few more tests verify the interaction between the retval of the
helper and the retval in getsockopt context.
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/43ec60d679ae3f4f6fd2460559c28b63cb93cd12.1639619851.git.zhuyifei@google.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

b8bff6f8

bpf: Add cgroup helpers bpf_{get,set}_retval to get/set syscall return value · b44123b4

YiFei Zhu authored Dec 16, 2021

The helpers continue to use int for retval because all the hooks
are int-returning rather than long-returning. The return value of
bpf_set_retval is int for future-proofing, in case in the future
there may be errors trying to set the retval.

After the previous patch, if a program rejects a syscall by
returning 0, an -EPERM will be generated no matter if the retval
is already set to -err. This patch change it being forced only if
retval is not -err. This is because we want to support, for
example, invoking bpf_set_retval(-EINVAL) and return 0, and have
the syscall return value be -EINVAL not -EPERM.

For BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY, the prior behavior is
that, if the return value is NET_XMIT_DROP, the packet is silently
dropped. We preserve this behavior for backward compatibility
reasons, so even if an errno is set, the errno does not return to
caller. However, setting a non-err to retval cannot propagate so
this is not allowed and we return a -EFAULT in that case.
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/b4013fd5d16bed0b01977c1fafdeae12e1de61fb.1639619851.git.zhuyifei@google.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

b44123b4

bpf: Move getsockopt retval to struct bpf_cg_run_ctx · c4dcfdd4

YiFei Zhu authored Dec 16, 2021

The retval value is moved to struct bpf_cg_run_ctx for ease of access
in different prog types with different context structs layouts. The
helper implementation (to be added in a later patch in the series) can
simply perform a container_of from current->bpf_ctx to retrieve
bpf_cg_run_ctx.

Unfortunately, there is no easy way to access the current task_struct
via the verifier BPF bytecode rewrite, aside from possibly calling a
helper, so a pointer to current task is added to struct bpf_sockopt_kern
so that the rewritten BPF bytecode can access struct bpf_cg_run_ctx with
an indirection.

For backward compatibility, if a getsockopt program rejects a syscall
by returning 0, an -EPERM will be generated, by having the
BPF_PROG_RUN_ARRAY_CG family macros automatically set the retval to
-EPERM. Unlike prior to this patch, this -EPERM will be visible to
ctx->retval for any other hooks down the line in the prog array.

Additionally, the restriction that getsockopt filters can only set
the retval to 0 is removed, considering that certain getsockopt
implementations may return optlen. Filters are now able to set the
value arbitrarily.
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/73b0325f5c29912ccea7ea57ec1ed4d388fc1d37.1639619851.git.zhuyifei@google.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

c4dcfdd4

bpf: Make BPF_PROG_RUN_ARRAY return -err instead of allow boolean · f10d0596

YiFei Zhu authored Dec 16, 2021

Right now BPF_PROG_RUN_ARRAY and related macros return 1 or 0
for whether the prog array allows or rejects whatever is being
hooked. The caller of these macros then return -EPERM or continue
processing based on thw macro's return value. Unforunately this is
inflexible, since -EPERM is the only err that can be returned.

This patch should be a no-op; it prepares for the next patch. The
returning of the -EPERM is moved to inside the macros, so the outer
functions are directly returning what the macros returned if they
are non-zero.
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/788abcdca55886d1f43274c918eaa9f792a9f33b.1639619851.git.zhuyifei@google.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

f10d0596

libbpf: Improve btf__add_btf() with an additional hashmap for strings. · d81283d2

Kui-Feng Lee authored Jan 19, 2022

Add a hashmap to map the string offsets from a source btf to the
string offsets from a target btf to reduce overheads.

btf__add_btf() calls btf__add_str() to add strings from a source to a
target btf. It causes many string comparisons, and it is a major
hotspot when adding a big btf. btf__add_str() uses strcmp() to check
if a hash entry is the right one. The extra hashmap here compares
offsets of strings, that are much cheaper. It remembers the results
of btf__add_str() for later uses to reduce the cost.

We are parallelizing BTF encoding for pahole by creating separated btf
instances for worker threads. These per-thread btf instances will be
added to the btf instance of the main thread by calling btf__add_str()
to deduplicate and write out. With this patch and -j4, the running
time of pahole drops to about 6.0s from 6.6s.

The following lines are the summary of 'perf stat' w/o the change.

6.668126396 seconds time elapsed

13.451054000 seconds user
0.715520000 seconds sys

The following lines are the summary w/ the change.

5.986973919 seconds time elapsed

12.939903000 seconds user
0.724152000 seconds sys

V4 fixes a bug of error checking against the pointer returned by
hashmap__new().

[v3] https://lore.kernel.org/bpf/20220118232053.2113139-1-kuifeng@fb.com/
[v2] https://lore.kernel.org/bpf/20220114193713.461349-1-kuifeng@fb.com/Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220119180214.255634-1-kuifeng@fb.com

d81283d2

bpf/scripts: Raise an exception if the correct number of sycalls are not generated · 0ba3929e

Usama Arif authored Jan 19, 2022

Currently the syscalls rst and subsequently man page are auto-generated
using function documentation present in bpf.h. If the documentation for the
syscall is missing or doesn't follow a specific format, then that syscall
is not dumped in the auto-generated rst.

This patch checks the number of syscalls documented within the header file
with those present as part of the enum bpf_cmd and raises an Exception if
they don't match. It is not needed with the currently documented upstream
syscalls, but can help in debugging when developing new syscalls when
there might be missing or misformatted documentation.

The function helper_number_check is moved to the Printer parent
class and renamed to elem_number_check as all the most derived children
classes are using this function now.
Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220119114442.1452088-3-usama.arif@bytedance.com

0ba3929e

bpf/scripts: Make description and returns section for helpers/syscalls mandatory · f1f3f67f

Usama Arif authored Jan 19, 2022

This enforce a minimal formatting consistency for the documentation. The
description and returns missing for a few helpers have also been added.
Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220119114442.1452088-2-usama.arif@bytedance.com

f1f3f67f

uapi/bpf: Add missing description and returns for helper documentation · e40fbbf0

Usama Arif authored Jan 19, 2022

Both description and returns section will become mandatory
for helpers and syscalls in a later commit to generate man pages.

This commit also adds in the documentation that BPF_PROG_RUN is
an alias for BPF_PROG_TEST_RUN for anyone searching for the
syscall in the generated man pages.
Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220119114442.1452088-1-usama.arif@bytedance.com

e40fbbf0

bpftool: Adding support for BTF program names · b662000a

Raman Shukhau authored Jan 19, 2022

`bpftool prog list` and other bpftool subcommands that show
BPF program names currently get them from bpf_prog_info.name.
That field is limited to 16 (BPF_OBJ_NAME_LEN) chars which leads
to truncated names since many progs have much longer names.

The idea of this change is to improve all bpftool commands that
output prog name so that bpftool uses info from BTF to print
program names if available.

It tries bpf_prog_info.name first and fall back to btf only if
the name is suspected to be truncated (has 15 chars length).

Right now `bpftool p show id <id>` returns capped prog name

<id>: kprobe  name example_cap_cap  tag 712e...
...

With this change it would return

<id>: kprobe  name example_cap_capable  tag 712e...
...

Note, other commands that print prog names (e.g. "bpftool
cgroup tree") are also addressed in this change.
Signed-off-by: Raman Shukhau <ramasha@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220119100255.1068997-1-ramasha@fb.com

b662000a

libbpf: Define BTF_KIND_* constants in btf.h to avoid compilation errors · eaa266d8

Toke Høiland-Jørgensen authored Jan 18, 2022

The btf.h header included with libbpf contains inline helper functions to
check for various BTF kinds. These helpers directly reference the
BTF_KIND_* constants defined in the kernel header, and because the header
file is included in user applications, this happens in the user application
compile units.

This presents a problem if a user application is compiled on a system with
older kernel headers because the constants are not available. To avoid
this, add #defines of the constants directly in btf.h before using them.

Since the kernel header moved to an enum for BTF_KIND_*, the #defines can
shadow the enum values without any errors, so we only need #ifndef guards
for the constants that predates the conversion to enum. We group these so
there's only one guard for groups of values that were added together.

  [0] Closes: https://github.com/libbpf/libbpf/issues/436

Fixes: 223f903e ("bpf: Rename BTF_KIND_TAG to BTF_KIND_DECL_TAG")
Fixes: 5b84bd10 ("libbpf: Add support for BTF_KIND_TAG")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/bpf/20220118141327.34231-1-toke@redhat.com

eaa266d8

18 Jan, 2022 7 commits

Merge branch 'bpf: Batching iter for AF_UNIX sockets.' · 712d4793

Alexei Starovoitov authored Jan 18, 2022

Kuniyuki Iwashima says:

====================

Last year the commit afd20b92 ("af_unix: Replace the big lock with
small locks.") landed on bpf-next.  Now we can use a batching algorithm
for AF_UNIX bpf iter as TCP bpf iter.

Changelog:
- Add the 1st patch.
- Call unix_get_first() in .start()/.next() to always acquire a lock in
  each iteration in the 2nd patch.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

712d4793

selftest/bpf: Fix a stale comment. · a796966b

Kuniyuki Iwashima authored Jan 13, 2022

The commit b8a58aa6 ("af_unix: Cut unix_validate_addr() out of
unix_mkname().") moved the bound test part into unix_validate_addr().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Link: https://lore.kernel.org/r/20220113002849.4384-6-kuniyu@amazon.co.jpSigned-off-by: Alexei Starovoitov <ast@kernel.org>

a796966b

selftest/bpf: Test batching and bpf_(get|set)sockopt in bpf unix iter. · 7ff8985c

Kuniyuki Iwashima authored Jan 13, 2022

This patch adds a test for the batching and bpf_(get|set)sockopt in bpf
unix iter.

It does the following.

  1. Creates an abstract UNIX domain socket
  2. Call bpf_setsockopt()
  3. Call bpf_getsockopt() and save the value
  4. Call setsockopt()
  5. Call getsockopt() and save the value
  6. Compare the saved values
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Link: https://lore.kernel.org/r/20220113002849.4384-5-kuniyu@amazon.co.jpSigned-off-by: Alexei Starovoitov <ast@kernel.org>

7ff8985c

bpf: Support bpf_(get|set)sockopt() in bpf unix iter. · eb7d8f1d

Kuniyuki Iwashima authored Jan 13, 2022

This patch makes bpf_(get|set)sockopt() available when iterating AF_UNIX
sockets.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Link: https://lore.kernel.org/r/20220113002849.4384-4-kuniyu@amazon.co.jpSigned-off-by: Alexei Starovoitov <ast@kernel.org>

eb7d8f1d

bpf: af_unix: Use batching algorithm in bpf unix iter. · 855d8e77

Kuniyuki Iwashima authored Jan 13, 2022

The commit 04c7820b ("bpf: tcp: Bpf iter batching and lock_sock")
introduces the batching algorithm to iterate TCP sockets with more
consistency.

This patch uses the same algorithm to iterate AF_UNIX sockets.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Link: https://lore.kernel.org/r/20220113002849.4384-3-kuniyu@amazon.co.jpSigned-off-by: Alexei Starovoitov <ast@kernel.org>

855d8e77

af_unix: Refactor unix_next_socket(). · 4408d55a

Kuniyuki Iwashima authored Jan 13, 2022

Currently, unix_next_socket() is overloaded depending on the 2nd argument.
If it is NULL, unix_next_socket() returns the first socket in the hash. If
not NULL, it returns the next socket in the same hash list or the first
socket in the next non-empty hash list.

This patch refactors unix_next_socket() into two functions unix_get_first()
and unix_get_next(). unix_get_first() newly acquires a lock and returns
the first socket in the list. unix_get_next() returns the next socket in a
list or releases a lock and falls back to unix_get_first().

In the following patch, bpf iter holds entire sockets in a list and always
releases the lock before .show(). It always calls unix_get_first() to
acquire a lock in each iteration. So, this patch makes the change easier
to follow.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Link: https://lore.kernel.org/r/20220113002849.4384-2-kuniyu@amazon.co.jpSigned-off-by: Alexei Starovoitov <ast@kernel.org>

4408d55a

Merge branch 'Introduce unstable CT lookup helpers' · 2a1aff60

Alexei Starovoitov authored Jan 18, 2022

Kumar Kartikeya says:

====================

This series adds unstable conntrack lookup helpers using BPF kfunc support.  The
patch adding the lookup helper is based off of Maxim's recent patch to aid in
rebasing their series on top of this, all adjusted to work with module kfuncs [0].

  [0]: https://lore.kernel.org/bpf/20211019144655.3483197-8-maximmi@nvidia.com

To enable returning a reference to struct nf_conn, the verifier is extended to
support reference tracking for PTR_TO_BTF_ID, and kfunc is extended with support
for working as acquire/release functions, similar to existing BPF helpers. kfunc
returning pointer (limited to PTR_TO_BTF_ID in the kernel) can also return a
PTR_TO_BTF_ID_OR_NULL now, typically needed when acquiring a resource can fail.
kfunc can also receive PTR_TO_CTX and PTR_TO_MEM (with some limitations) as
arguments now. There is also support for passing a mem, len pair as argument
to kfunc now. In such cases, passing pointer to unsized type (void) is also
permitted.

Please see individual commits for details.

Changelog:
----------
v7 -> v8:
v7: https://lore.kernel.org/bpf/20220111180428.931466-1-memxor@gmail.com

 * Move enum btf_kfunc_hook to btf.c (Alexei)
 * Drop verbose log for unlikely failure case in __find_kfunc_desc_btf (Alexei)
 * Remove unnecessary barrier in register_btf_kfunc_id_set (Alexei)
 * Switch macro in bpf_nf test to __always_inline function (Alexei)

v6 -> v7:
v6: https://lore.kernel.org/bpf/20220102162115.1506833-1-memxor@gmail.com

 * Drop try_module_get_live patch, use flag in btf_module struct (Alexei)
 * Add comments and expand commit message detailing why we have to concatenate
   and sort vmlinux kfunc BTF ID sets (Alexei)
 * Use bpf_testmod for testing btf_try_get_module race (Alexei)
 * Use bpf_prog_type for both btf_kfunc_id_set_contains and
   register_btf_kfunc_id_set calls (Alexei)
 * In case of module set registration, directly assign set (Alexei)
 * Add CONFIG_USERFAULTFD=y to selftest config
 * Fix other nits

v5 -> v6:
v5: https://lore.kernel.org/bpf/20211230023705.3860970-1-memxor@gmail.com

 * Fix for a bug in btf_try_get_module leading to use-after-free
 * Drop *kallsyms_on_each_symbol loop, reinstate register_btf_kfunc_id_set (Alexei)
 * btf_free_kfunc_set_tab now takes struct btf, and handles resetting tab to NULL
 * Check return value btf_name_by_offset for param_name
 * Instead of using tmp_set, use btf->kfunc_set_tab directly, and simplify cleanup

v4 -> v5:
v4: https://lore.kernel.org/bpf/20211217015031.1278167-1-memxor@gmail.com

 * Move nf_conntrack helpers code to its own separate file (Toke, Pablo)
 * Remove verifier callbacks, put btf_id_sets in struct btf (Alexei)
  * Convert the in-kernel users away from the old API
 * Change len__ prefix convention to __sz suffix (Alexei)
 * Drop parent_ref_obj_id patch (Alexei)

v3 -> v4:
v3: https://lore.kernel.org/bpf/20211210130230.4128676-1-memxor@gmail.com

 * Guard unstable CT helpers with CONFIG_DEBUG_INFO_BTF_MODULES
 * Move addition of prog_test test kfuncs to selftest commit
 * Move negative kfunc tests to test_verifier suite
 * Limit struct nesting depth to 4, which should be enough for now

v2 -> v3:
v2: https://lore.kernel.org/bpf/20211209170929.3485242-1-memxor@gmail.com

 * Fix build error for !CONFIG_BPF_SYSCALL (Patchwork)

RFC v1 -> v2:
v1: https://lore.kernel.org/bpf/20211030144609.263572-1-memxor@gmail.com

 * Limit PTR_TO_MEM support to pointer to scalar, or struct with scalars (Alexei)
 * Use btf_id_set for checking acquire, release, ret type null (Alexei)
 * Introduce opts struct for CT helpers, move int err parameter to it
 * Add l4proto as parameter to CT helper's opts, remove separate tcp/udp helpers
 * Add support for mem, len argument pair to kfunc
 * Allow void * as pointer type for mem, len argument pair
 * Extend selftests to cover new additions to kfuncs
 * Copy ref_obj_id to PTR_TO_BTF_ID dst_reg on btf_struct_access, test it
 * Fix other misc nits, bugs, and expand commit messages
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

2a1aff60