Commits · 291471dd1559528a4c2ad5026eff94ed1030562b · Kirill Smelkov / linux

08 Mar, 2021 3 commits

libbpf, xsk: Add libbpf_smp_store_release libbpf_smp_load_acquire · 291471dd

Björn Töpel authored Mar 05, 2021

Now that the AF_XDP rings have load-acquire/store-release semantics,
move libbpf to that as well.

The library-internal libbpf_smp_{load_acquire,store_release} are only
valid for 32-bit words on ARM64.

Also, remove the barriers that are no longer in use.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210305094113.413544-3-bjorn.topel@gmail.com

291471dd

xsk: Update rings for load-acquire/store-release barriers · a23b3f56

Björn Töpel authored Mar 05, 2021

Currently, the AF_XDP rings uses general smp_{r,w,}mb() barriers on
the kernel-side. On most modern architectures
load-acquire/store-release barriers perform better, and results in
simpler code for circular ring buffers.

This change updates the XDP socket rings to use
load-acquire/store-release barriers.

It is important to note that changing from the old smp_{r,w,}mb()
barriers, to load-acquire/store-release barriers does not break
compatibility. The old semantics work with the new one, and vice
versa.

As pointed out by "Documentation/memory-barriers.txt" in the "SMP
BARRIER PAIRING" section:

  "General barriers pair with each other, though they also pair with
  most other types of barriers, albeit without multicopy atomicity.
  An acquire barrier pairs with a release barrier, but both may also
  pair with other barriers, including of course general barriers."

How different barriers behaves and pairs is outlined in
"tools/memory-model/Documentation/cheatsheet.txt".

In order to make sure that compatibility is not broken, LKMM herd7
based litmus tests can be constructed and verified.

We generalize the XDP socket ring to a one entry ring, and create two
scenarios; One where the ring is full, where only the consumer can
proceed, followed by the producer. One where the ring is empty, where
only the producer can proceed, followed by the consumer. Each scenario
is then expanded to four different tests: general producer/general
consumer, general producer/acqrel consumer, acqrel producer/general
consumer, acqrel producer/acqrel consumer. In total eight tests.

The empty ring test:
  C spsc-rb+empty

  // Simple one entry ring:
  // prod cons     allowed action       prod cons
  //    0    0 =>       prod          =>   1    0
  //    0    1 =>       cons          =>   0    0
  //    1    0 =>       cons          =>   1    1
  //    1    1 =>       prod          =>   0    1

  {}

  // We start at prod==0, cons==0, data==0, i.e. nothing has been
  // written to the ring. From here only the producer can start, and
  // should write 1. Afterwards, consumer can continue and read 1 to
  // data. Can we enter state prod==1, cons==1, but consumer observed
  // the incorrect value of 0?

  P0(int *prod, int *cons, int *data)
  {
     ... producer
  }

  P1(int *prod, int *cons, int *data)
  {
     ... consumer
  }

  exists( 1:d=0 /\ prod=1 /\ cons=1 );

The full ring test:
  C spsc-rb+full

  // Simple one entry ring:
  // prod cons     allowed action       prod cons
  //    0    0 =>       prod          =>   1    0
  //    0    1 =>       cons          =>   0    0
  //    1    0 =>       cons          =>   1    1
  //    1    1 =>       prod          =>   0    1

  { prod = 1; }

  // We start at prod==1, cons==0, data==1, i.e. producer has
  // written 0, so from here only the consumer can start, and should
  // consume 0. Afterwards, producer can continue and write 1 to
  // data. Can we enter state prod==0, cons==1, but consumer observed
  // the write of 1?

  P0(int *prod, int *cons, int *data)
  {
    ... producer
  }

  P1(int *prod, int *cons, int *data)
  {
    ... consumer
  }

  exists( 1:d=1 /\ prod=0 /\ cons=1 );

where P0 and P1 are:

  P0(int *prod, int *cons, int *data)
  {
  	int p;

  	p = READ_ONCE(*prod);
  	if (READ_ONCE(*cons) == p) {
  		WRITE_ONCE(*data, 1);
  		smp_wmb();
  		WRITE_ONCE(*prod, p ^ 1);
  	}
  }

  P0(int *prod, int *cons, int *data)
  {
  	int p;

  	p = READ_ONCE(*prod);
  	if (READ_ONCE(*cons) == p) {
  		WRITE_ONCE(*data, 1);
  		smp_store_release(prod, p ^ 1);
  	}
  }

  P1(int *prod, int *cons, int *data)
  {
  	int c;
  	int d = -1;

  	c = READ_ONCE(*cons);
  	if (READ_ONCE(*prod) != c) {
  		smp_rmb();
  		d = READ_ONCE(*data);
  		smp_mb();
  		WRITE_ONCE(*cons, c ^ 1);
  	}
  }

  P1(int *prod, int *cons, int *data)
  {
  	int c;
  	int d = -1;

  	c = READ_ONCE(*cons);
  	if (smp_load_acquire(prod) != c) {
  		d = READ_ONCE(*data);
  		smp_store_release(cons, c ^ 1);
  	}
  }

The full LKMM litmus tests are found at [1].

On x86-64 systems the l2fwd AF_XDP xdpsock sample performance
increases by 1%. This is mostly due to that the smp_mb() is removed,
which is a relatively expensive operation on these
platforms. Weakly-ordered platforms, such as ARM64 might benefit even
more.

[1] https://github.com/bjoto/litmus-xskSigned-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210305094113.413544-2-bjorn.topel@gmail.com

a23b3f56

selftests/bpf: Fix test_attach_probe for powerpc uprobes · 299194a9

Jiri Olsa authored Mar 05, 2021

When testing uprobes we the test gets GEP (Global Entry Point)
address from kallsyms, but then the function is called locally
so the uprobe is not triggered.

Fixing this by adjusting the address to LEP (Local Entry Point)
for powerpc arch plus instruction check stolen from ppc_function_entry
function pointed out and explained by Michael and Naveen.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Link: https://lore.kernel.org/bpf/20210305134050.139840-1-jolsa@kernel.org

299194a9

05 Mar, 2021 36 commits

selftests, bpf: Extend test_tc_tunnel test with vxlan · 256becd4

Xuesen Huang authored Mar 05, 2021

Add BPF_F_ADJ_ROOM_ENCAP_L2_ETH flag to the existing tests which
encapsulates the ethernet as the inner l2 header.

Update a vxlan encapsulation test case.
Signed-off-by: Xuesen Huang <huangxuesen@kuaishou.com>
Signed-off-by: Li Wang <wangli09@kuaishou.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/bpf/20210305123347.15311-1-hxseverything@gmail.com

256becd4

bpf: Add bpf_skb_adjust_room flag BPF_F_ADJ_ROOM_ENCAP_L2_ETH · d01b59c9

Xuesen Huang authored Mar 04, 2021

bpf_skb_adjust_room sets the inner_protocol as skb->protocol for packets
encapsulation. But that is not appropriate when pushing Ethernet header.

Add an option to further specify encap L2 type and set the inner_protocol
as ETH_P_TEB.
Suggested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Xuesen Huang <huangxuesen@kuaishou.com>
Signed-off-by: Zhiyong Cheng <chengzhiyong@kuaishou.com>
Signed-off-by: Li Wang <wangli09@kuaishou.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/bpf/20210304064046.6232-1-hxseverything@gmail.com

d01b59c9

selftests/bpf: Simplify the calculation of variables · bce86231

Jiapeng Chong authored Mar 03, 2021

Fix the following coccicheck warnings:

./tools/testing/selftests/bpf/test_sockmap.c:735:35-37: WARNING !A || A
&& B is equivalent to !A || B.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/1614757930-17197-1-git-send-email-jiapeng.chong@linux.alibaba.com

bce86231

bpf: Simplify the calculation of variables · 46ac034f

Jiapeng Chong authored Mar 03, 2021

Fix the following coccicheck warnings:

./tools/bpf/bpf_dbg.c:1201:55-57: WARNING !A || A && B is equivalent to
!A || B.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/1614756035-111280-1-git-send-email-jiapeng.chong@linux.alibaba.com

46ac034f

Merge branch 'PROG_TEST_RUN support for sk_lookup programs' · b0d3df48

Alexei Starovoitov authored Mar 04, 2021

Lorenz Bauer says:

====================

We don't have PROG_TEST_RUN support for sk_lookup programs at the
moment. So far this hasn't been a problem, since we can run our
tests in a separate network namespace. For benchmarking it's nice
to have PROG_TEST_RUN, so I've gone and implemented it.

Based on discussion on the v1 I've dropped support for testing multiple
programs at once.

Changes since v3:
- Use bpf_test_timer prefix (Andrii)

Changes since v2:
- Fix test_verifier failure (Alexei)

Changes since v1:
- Add sparse annotations to the t_* functions
- Add appropriate type casts in bpf_prog_test_run_sk_lookup
- Drop running multiple programs
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

b0d3df48

selftests: bpf: Don't run sk_lookup in verifier tests · b4f89463

Lorenz Bauer authored Mar 03, 2021

sk_lookup doesn't allow setting data_in for bpf_prog_run. This doesn't
play well with the verifier tests, since they always set a 64 byte
input buffer. Allow not running verifier tests by setting
bpf_test.runs to a negative value and don't run the ctx access case
for sk_lookup. We have dedicated ctx access tests so skipping here
doesn't reduce coverage.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210303101816.36774-6-lmb@cloudflare.com

b4f89463

selftests: bpf: Check that PROG_TEST_RUN repeats as requested · abab306f

Lorenz Bauer authored Mar 03, 2021

Extend a simple prog_run test to check that PROG_TEST_RUN adheres
to the requested repetitions. Convert it to use BPF skeleton.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210303101816.36774-5-lmb@cloudflare.com

abab306f

selftests: bpf: Convert sk_lookup ctx access tests to PROG_TEST_RUN · 509b2937

Lorenz Bauer authored Mar 03, 2021

Convert the selftests for sk_lookup narrow context access to use
PROG_TEST_RUN instead of creating actual sockets. This ensures that
ctx is populated correctly when using PROG_TEST_RUN.

Assert concrete values since we now control remote_ip and remote_port.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210303101816.36774-4-lmb@cloudflare.com

509b2937

bpf: Add PROG_TEST_RUN support for sk_lookup programs · 7c32e8f8

Lorenz Bauer authored Mar 03, 2021

Allow to pass sk_lookup programs to PROG_TEST_RUN. User space
provides the full bpf_sk_lookup struct as context. Since the
context includes a socket pointer that can't be exposed
to user space we define that PROG_TEST_RUN returns the cookie
of the selected socket or zero in place of the socket pointer.

We don't support testing programs that select a reuseport socket,
since this would mean running another (unrelated) BPF program
from the sk_lookup test handler.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210303101816.36774-3-lmb@cloudflare.com

7c32e8f8

bpf: Consolidate shared test timing code · 607b9cc9

Lorenz Bauer authored Mar 03, 2021

Share the timing / signal interruption logic between different
implementations of PROG_TEST_RUN. There is a change in behaviour
as well. We check the loop exit condition before checking for
pending signals. This resolves an edge case where a signal
arrives during the last iteration. Instead of aborting with
EINTR we return the successful result to user space.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210303101816.36774-2-lmb@cloudflare.com

607b9cc9

Merge branch 'Improve BPF syscall command documentation' · 2374e0f1

Alexei Starovoitov authored Mar 04, 2021

Joe Stringer says:

====================

The state of bpf(2) manual pages today is not exactly ideal. For the
most part, it was written several years ago and has not kept up with the
pace of development in the kernel tree. For instance, out of a total of
~35 commands to the BPF syscall available today, when I pull the
kernel-man-pages tree today I find just 6 documented commands: The very
basics of map interaction and program load.

In contrast, looking at bpf-helpers(7), I am able today to run one
command[0] to fetch API documentation of the very latest eBPF helpers
that have been added to the kernel. This documentation is up to date
because kernel maintainers enforce documenting the APIs as part of
the feature submission process. As far as I can tell, we rely on manual
synchronization from the kernel tree to the kernel-man-pages tree to
distribute these more widely, so all locations may not be completely up
to date. That said, the documentation does in fact exist in the first
place which is a major initial hurdle to overcome.

Given the relative success of the process around bpf-helpers(7) to
encourage developers to document their user-facing changes, in this
patch series I explore applying this technique to bpf(2) as well.
Unfortunately, even with bpf(2) being so out-of-date, there is still a
lot of content to convert over. In particular, the following aspects of
the bpf syscall could also be individually be generated from separate
documentation in the header:
* BPF syscall commands
* BPF map types
* BPF program types
* BPF program subtypes (aka expected_attach_type)
* BPF attachment points

Rather than tackle everything at once, I have focused in this series on
the syscall commands, "enum bpf_cmd". This series is structured to first
import what useful descriptions there are from the kernel-man-pages
tree, then piece-by-piece document a few of the syscalls in more detail
in cases where I could find useful documentation from the git tree or
from a casual read of the code. Not all documentation is comprehensive
at this point, but a basis is provided with examples that can be further
enhanced with subsequent follow-up patches. Note, the series in its
current state only includes documentation around the syscall commands
themselves, so in the short term it doesn't allow us to automate bpf(2)
generation; Only one section of the man page could be replaced. Though
if there is appetite for this approach, this should be trivial to
improve on, even if just by importing the remaining static text from the
kernel-man-pages tree.

Following that, the series enhances the python scripting around parsing
the descriptions from the header files and generating dedicated
ReStructured Text and troff output. Finally, to expose the new text and
reduce the likelihood of having it get out of date or break the docs
parser, it is added to the selftests and exposed through the kernel
documentation web pages.

The eventual goal of this effort would be to extend the kernel UAPI
headers such that each of the categories I had listed above (commands,
maps, progs, hooks) have dedicated documentation in the kernel tree, and
that developers must update the comments in the headers to document the
APIs prior to patch acceptance, and that we could auto-generate the
latest version of the bpf(2) manual pages based on a few static
description sections combined with the dynamically-generated output from
the header.

This patch series can also be found at the following location on GitHub:
https://github.com/joestringer/linux/tree/submit/bpf-command-docs_v2

Thanks also to Quentin Monnet for initial review.

[0]: make -C tools/testing/selftests/bpf docs

v2:
* Remove build infrastructure in favor of kernel-doc directives
* Shift userspace-api docs under Documentation/userspace-api/ebpf
* Fix scripts/bpf_doc.py syscall --header (throw unsupported error)
* Improve .gitignore handling of newly autogenerated files

====================
Acked-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

2374e0f1

tools: Sync uapi bpf.h header with latest changes · 242029f4

Joe Stringer authored Mar 02, 2021

Synchronize the header after all of the recent changes.
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210302171947.2268128-16-joe@cilium.io

242029f4

docs/bpf: Add bpf() syscall command reference · 6197e5b7

Joe Stringer authored Mar 02, 2021

Generate the syscall command reference from the UAPI header file and
include it in the main bpf docs page.
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210302171947.2268128-15-joe@cilium.io

6197e5b7

selftests/bpf: Test syscall command parsing · accbd33a

Joe Stringer authored Mar 02, 2021

Add building of the bpf(2) syscall commands documentation as part of the
docs building step in the build. This allows us to pick up on potential
parse errors from the docs generator script as part of selftests.

The generated manual pages here are not intended for distribution, they
are just a fragment that can be integrated into the other static text of
bpf(2) to form the full manual page.
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210302171947.2268128-14-joe@cilium.io

accbd33a

selftests/bpf: Templatize man page generation · 62b379a2

Joe Stringer authored Mar 02, 2021

Previously, the Makefile here was only targeting a single manual page so
it just hardcoded a bunch of individual rules to specifically handle
build, clean, install, uninstall for that particular page.

Upcoming commits will generate manual pages for an additional section,
so this commit prepares the makefile first by converting the existing
targets into an evaluated set of targets based on the manual page name
and section.
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210302171947.2268128-13-joe@cilium.io

62b379a2

tools/bpf: Remove bpf-helpers from bpftool docs · a01d935b

Joe Stringer authored Mar 02, 2021

This logic is used for validating the manual pages from selftests, so
move the infra under tools/testing/selftests/bpf/ and rely on selftests
for validation rather than tying it into the bpftool build.
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210302171947.2268128-12-joe@cilium.io

a01d935b

scripts/bpf: Add syscall commands printer · a67882a2

Joe Stringer authored Mar 02, 2021

Add a new target to bpf_doc.py to support generating the list of syscall
commands directly from the UAPI headers. Assuming that developer
submissions keep the main header up to date, this should allow the man
pages to be automatically generated based on the latest API changes
rather than requiring someone to separately go back through the API and
describe each command.
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210302171947.2268128-11-joe@cilium.io

a67882a2

scripts/bpf: Abstract eBPF API target parameter · 923a932c

Joe Stringer authored Mar 02, 2021

Abstract out the target parameter so that upcoming commits, more than
just the existing "helpers" target can be called to generate specific
portions of docs from the eBPF UAPI headers.
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210302171947.2268128-10-joe@cilium.io

923a932c

bpf: Document BPF_MAP_*_BATCH syscall commands · 0cb80454

Joe Stringer authored Mar 02, 2021

Based roughly on the following commits:
* Commit cb4d03ab ("bpf: Add generic support for lookup batch op")
* Commit 05799638 ("bpf: Add batch ops to all htab bpf map")
* Commit aa2e93b8 ("bpf: Add generic support for update and delete
  batch ops")
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Brian Vazquez <brianvv@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210302171947.2268128-9-joe@cilium.io

0cb80454

bpf: Document BPF_PROG_QUERY syscall command · 5d999994

Joe Stringer authored Mar 02, 2021

Commit 468e2f64 ("bpf: introduce BPF_PROG_QUERY command") originally
introduced this, but there have been several additions since then.
Unlike BPF_PROG_ATTACH, it appears that the sockmap progs are not able
to be queried so far.
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210302171947.2268128-8-joe@cilium.io

5d999994

bpf: Document BPF_PROG_TEST_RUN syscall command · 2a3fdca4

Joe Stringer authored Mar 02, 2021

Based on a brief read of the corresponding source code.
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210302171947.2268128-7-joe@cilium.io

2a3fdca4

bpf: Document BPF_PROG_ATTACH syscall command · 32e76b18

Joe Stringer authored Mar 02, 2021

Document the prog attach command in more detail, based on git commits:
* commit f4324551 ("bpf: add BPF_PROG_ATTACH and BPF_PROG_DETACH
  commands")
* commit 4f738adb ("bpf: create tcp_bpf_ulp allowing BPF to monitor
  socket TX/RX data")
* commit f4364dcf ("media: rc: introduce BPF_PROG_LIRC_MODE2")
* commit d58e468b ("flow_dissector: implements flow dissector BPF
  hook")
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210302171947.2268128-6-joe@cilium.io

32e76b18

bpf: Document BPF_PROG_PIN syscall command · 8aacb3c8

Joe Stringer authored Mar 02, 2021

Commit b2197755 ("bpf: add support for persistent maps/progs")
contains the original implementation and git logs, used as reference for
this documentation.

Also pull in the filename restriction as documented in commit 6d8cb045
("bpf: comment why dots in filenames under BPF virtual FS are not allowed")
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210302171947.2268128-5-joe@cilium.io

8aacb3c8

bpf: Document BPF_F_LOCK in syscall commands · 6690523b

Joe Stringer authored Mar 02, 2021

Document the meaning of the BPF_F_LOCK flag for the map lookup/update
descriptions. Based on commit 96049f3a ("bpf: introduce BPF_F_LOCK
flag").
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210302171947.2268128-4-joe@cilium.io

6690523b

bpf: Add minimal bpf() command documentation · f67c9cbf

Joe Stringer authored Mar 02, 2021

Introduce high-level descriptions of the intent and return codes of the
bpf() syscall commands. Subsequent patches may further flesh out the
content to provide a more useful programming reference.
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210302171947.2268128-3-joe@cilium.io

f67c9cbf

bpf: Import syscall arg documentation · 7799e4d9

Joe Stringer authored Mar 02, 2021

These descriptions are present in the man-pages project from the
original submissions around 2015-2016. Import them so that they can be
kept up to date as developers extend the bpf syscall commands.

These descriptions follow the pattern used by scripts/bpf_helpers_doc.py
so that we can take advantage of the parser to generate more up-to-date
man page writing based upon these headers.

Some minor wording adjustments were made to make the descriptions
more consistent for the description / return format.
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210302171947.2268128-2-joe@cilium.ioCo-authored-by: Alexei Starovoitov <ast@kernel.org>
Co-authored-by: Michael Kerrisk <mtk.manpages@gmail.com>

7799e4d9

Merge branch 'Add BTF_KIND_FLOAT support' · 13ec0216

Alexei Starovoitov authored Mar 04, 2021

Ilya Leoshkevich says:

====================

Some BPF programs compiled on s390 fail to load, because s390
arch-specific linux headers contain float and double types.

Introduce support for such types by representing them using the new
BTF_KIND_FLOAT. This series deals with libbpf, bpftool, in-kernel BTF
parser as well as selftests and documentation.

There are also pahole and LLVM parts:

* https://github.com/iii-i/dwarves/commit/btf-kind-float-v2
* https://reviews.llvm.org/D83289

but they should go in after the libbpf part is integrated.
---

v0: https://lore.kernel.org/bpf/20210210030317.78820-1-iii@linux.ibm.com/
v0 -> v1: Per Andrii's suggestion, remove the unnecessary trailing u32.

v1: https://lore.kernel.org/bpf/20210216011216.3168-1-iii@linux.ibm.com/
v1 -> v2: John noticed that sanitization corrupts BTF, because new and
          old sizes don't match. Per Yonghong's suggestion, use a
          modifier type (which has the same size as the float type) as
          a replacement.
          Per Yonghong's suggestion, add size and alignment checks to
          the kernel BTF parser.

v2: https://lore.kernel.org/bpf/20210219022543.20893-1-iii@linux.ibm.com/
v2 -> v3: Based on Yonghong's suggestions: Use BTF_KIND_CONST instead of
          BTF_KIND_TYPEDEF and make sure that the C code generated from
          the sanitized BTF is well-formed; fix size calculation in
          tests and use NAME_TBD everywhere; limit allowed sizes to 2,
          4, 8, 12 and 16 (this should also fix m68k and nds32le
          builds).

v3: https://lore.kernel.org/bpf/20210220034959.27006-1-iii@linux.ibm.com/
v3 -> v4: More fixes for the Yonghong's findings: fix the outdated
          comment in bpf_object__sanitize_btf() and add the error
          handling there (I've decided to check uint_id and uchar_id
          too in order to simplify debugging); add bpftool output
          example; use div64_u64_rem() instead of % in order to fix the
          linker error.
          Also fix the "invalid BTF_INFO" test (new commit, #4).

v4: https://lore.kernel.org/bpf/20210222214917.83629-1-iii@linux.ibm.com/
v4 -> v5: Fixes for the Andrii's findings: Use BTF_KIND_STRUCT instead
          of BTF_KIND_TYPEDEF for sanitization; check byte_sz in
          libbpf; move btf__add_float; remove relo support; add a dedup
          test (new commit, #7).

v5: https://lore.kernel.org/bpf/20210223231459.99664-1-iii@linux.ibm.com/
v5 -> v6: Fixes for further findings by Andrii: split whitespace issue
          fix into a separate patch; add 12-byte float to "float test #1,
          well-formed".

v6: https://lore.kernel.org/bpf/20210224234535.106970-1-iii@linux.ibm.com/
v6 -> v7: John suggested to add a comment explaining why sanitization
          does not preserve the type name, as well as what effect it has
          on running the code on the older kernels.
          Yonghong has asked to add a comment explaining why we are not
          checking the alignment very precisely in the kernel.
          John suggested to add a bpf_core_field_size test (commit #9).

Based on Alexei's feedback [1] I'm proceeding with the BTF_KIND_FLOAT
approach.

[1] https://lore.kernel.org/bpf/CAADnVQKWPODWZ2RSJ5FJhfYpxkuV0cvSAL1O+FSr9oP1ercoBg@mail.gmail.com/
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

13ec0216

bpf: Document BTF_KIND_FLOAT in btf.rst · 6be6a0ba

Ilya Leoshkevich authored Feb 26, 2021

Also document the expansion of the kind bitfield.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210226202256.116518-11-iii@linux.ibm.com

6be6a0ba

selftests/bpf: Add BTF_KIND_FLOAT to the existing deduplication tests · 7999cf7d

Ilya Leoshkevich authored Feb 26, 2021

Check that floats don't interfere with struct deduplication, that they
are not merged with another kinds and that floats of different sizes are
not merged with each other.
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226202256.116518-9-iii@linux.ibm.com

7999cf7d

selftest/bpf: Add BTF_KIND_FLOAT tests · 7e72aad3

Ilya Leoshkevich authored Feb 26, 2021

Test the good variants as well as the potential malformed ones.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226202256.116518-8-iii@linux.ibm.com

7e72aad3

bpf: Add BTF_KIND_FLOAT support · b1828f0b

Ilya Leoshkevich authored Feb 26, 2021

On the kernel side, introduce a new btf_kind_operations. It is
similar to that of BTF_KIND_INT, however, it does not need to
handle encodings and bit offsets. Do not implement printing, since
the kernel does not know how to format floating-point values.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210226202256.116518-7-iii@linux.ibm.com

b1828f0b

selftests/bpf: Use the 25th bit in the "invalid BTF_INFO" test · eea154a8

Ilya Leoshkevich authored Feb 26, 2021

The bit being checked by this test is no longer reserved after
introducing BTF_KIND_FLOAT, so use the next one instead.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210226202256.116518-6-iii@linux.ibm.com

eea154a8

tools/bpftool: Add BTF_KIND_FLOAT support · 737e0f91

Ilya Leoshkevich authored Feb 26, 2021

Only dumping support needs to be adjusted, the code structure follows
that of BTF_KIND_INT. Example plain and JSON outputs:

    [4] FLOAT 'float' size=4
    {"id":4,"kind":"FLOAT","name":"float","size":4}
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210226202256.116518-5-iii@linux.ibm.com

737e0f91

libbpf: Add BTF_KIND_FLOAT support · 22541a9e

Ilya Leoshkevich authored Feb 26, 2021

The logic follows that of BTF_KIND_INT most of the time. Sanitization
replaces BTF_KIND_FLOATs with equally-sized empty BTF_KIND_STRUCTs on
older kernels, for example, the following:

    [4] FLOAT 'float' size=4

becomes the following:

    [4] STRUCT '(anon)' size=4 vlen=0

With dwarves patch [1] and this patch, the older kernels, which were
failing with the floating-point-related errors, will now start working
correctly.

[1] https://github.com/iii-i/dwarves/commit/btf-kind-float-v2Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210226202256.116518-4-iii@linux.ibm.com

22541a9e

libbpf: Fix whitespace in btf_add_composite() comment · 1b1ce92b

Ilya Leoshkevich authored Feb 26, 2021

Remove trailing space.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210226202256.116518-3-iii@linux.ibm.com

1b1ce92b

bpf: Add BTF_KIND_FLOAT to uapi · 8fd88691

Ilya Leoshkevich authored Feb 26, 2021

Add a new kind value and expand the kind bitfield.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210226202256.116518-2-iii@linux.ibm.com

8fd88691

04 Mar, 2021 1 commit

selftests/bpf: Add a verifier scale test with unknown bounded loop · 86a35af6

Yonghong Song authored Feb 26, 2021

The original bcc pull request https://github.com/iovisor/bcc/pull/3270 exposed
a verifier failure with Clang 12/13 while Clang 4 works fine.

Further investigation exposed two issues:

  Issue 1: LLVM may generate code which uses less refined value. The issue is
           fixed in LLVM patch: https://reviews.llvm.org/D97479

  Issue 2: Spills with initial value 0 are marked as precise which makes later
           state pruning less effective. This is my rough initial analysis and
           further investigation is needed to find how to improve verifier
           pruning in such cases.

With the above LLVM patch, for the new loop6.c test, which has smaller loop
bound compared to original test, I got:

  $ test_progs -s -n 10/16
  ...
  stack depth 64
  processed 390735 insns (limit 1000000) max_states_per_insn 87
      total_states 8658 peak_states 964 mark_read 6
  #10/16 loop6.o:OK

Use the original loop bound, i.e., commenting out "#define WORKAROUND", I got:

  $ test_progs -s -n 10/16
  ...
  BPF program is too large. Processed 1000001 insn
  stack depth 64
  processed 1000001 insns (limit 1000000) max_states_per_insn 91
      total_states 23176 peak_states 5069 mark_read 6
  ...
  #10/16 loop6.o:FAIL

The purpose of this patch is to provide a regression test for the above LLVM fix
and also provide a test case for further analyzing the verifier pruning issue.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Zhenwei Pi <pizhenwei@bytedance.com>
Link: https://lore.kernel.org/bpf/20210226223810.236472-1-yhs@fb.com

86a35af6