Commits · 9ab5305dbe3ffcd146852e28aa76a917e45c7541 · Kirill Smelkov / linux

01 Mar, 2019 9 commits

docs/btf: reflow text to fill up to 78 characters · 9ab5305d

Andrii Nakryiko authored Feb 28, 2019

Reflow paragraphs to more fully and evenly fill 78 character lines.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

9ab5305d

docs/btf: fix typos, improve wording · 5efc529f

Andrii Nakryiko authored Feb 28, 2019

Fix various typos, some of the formatting and wording for
Documentation/btf.rst.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

5efc529f

bpf: fix u64_stats_init() usage in bpf_prog_alloc() · 4b911304

Eric Dumazet authored Mar 01, 2019

We need to iterate through all possible cpus.

Fixes: 492ecee8 ("bpf: enable program stats")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

4b911304

Merge branch 'bpf-dedup-fixes' · 3860d38f

Daniel Borkmann authored Mar 01, 2019

Andrii Nakryiko says:

====================
This patchset fixes a bug in btf_dedup() algorithm, which under specific
hash collision causes infinite loop. It also exposes ability to tune BTF
deduplication table size, with double purpose of allowing applications to
adjust size according to the size of BTF data, as well as allowing a simple
way to force hash collisions by setting table size to 1.

- Patch #1 fixes bug in btf_dedup testing code that's checking strings
- Patch #2 fixes pointer arg formatting in btf.h
- Patch #3 adds option to specify custom dedup table size
- Patch #4 fixes aforementioned bug in btf_dedup
- Patch #5 adds test that validates the fix

v1->v2:
- remove "Fixes" from formatting change patch
- extract roundup_pow2_max func for dedup table size
- btf_equal_struct -> btf_shallow_equal_struct
- explain in comment why we can't rely on just btf_dedup_is_equiv
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

3860d38f

selftests/bpf: add btf_dedup test of FWD/STRUCT resolution · 7c7a4890

Andrii Nakryiko authored Feb 28, 2019

This patch adds a btf_dedup test exercising logic of STRUCT<->FWD
resolution and validating that STRUCT is not resolved to a FWD. It also
forces hash collisions, forcing both FWD and STRUCT to be candidates for
each other. Previously this condition caused infinite loop due to FWD
pointing to STRUCT and STRUCT pointing to its FWD.
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

7c7a4890

btf: fix bug with resolving STRUCT/UNION into corresponding FWD · 91097fbe

Andrii Nakryiko authored Feb 28, 2019

When checking available canonical candidates for struct/union algorithm
utilizes btf_dedup_is_equiv to determine if candidate is suitable. This
check is not enough when candidate is corresponding FWD for that
struct/union, because according to equivalence logic they are
equivalent. When it so happens that FWD and STRUCT/UNION end in hashing
to the same bucket, it's possible to create remapping loop from FWD to
STRUCT and STRUCT to same FWD, which will cause btf_dedup() to loop
forever.

This patch fixes the issue by additionally checking that type and
canonical candidate are strictly equal (utilizing btf_equal_struct).

Fixes: d5caef5b ("btf: add BTF types deduplication algorithm")
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

91097fbe

btf: allow to customize dedup hash table size · 51edf5f6

Andrii Nakryiko authored Feb 28, 2019

Default size of dedup table (16k) is good enough for most binaries, even
typical vmlinux images. But there are cases of binaries with huge amount
of BTF types (e.g., allyesconfig variants of kernel), which benefit from
having bigger dedup table size to lower amount of unnecessary hash
collisions. Tools like pahole, thus, can tune this parameter to reach
optimal performance.

This change also serves double purpose of allowing tests to force hash
collisions to test some corner cases, used in follow up patch.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

51edf5f6

libbpf: fix formatting for btf_ext__get_raw_data · 1baabdc1

Andrii Nakryiko authored Feb 28, 2019

Fix invalid formatting of pointer arg.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

1baabdc1

selftests/bpf: fix btf_dedup testing code · 8054d51f

Andrii Nakryiko authored Feb 28, 2019

btf_dedup testing code doesn't account for length of struct btf_header
when calculating the start of a string section. This patch fixes this
problem.

Fixes: 49b57e0d ("tools/bpf: remove btf__get_strings() superseded by raw data API")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

8054d51f

28 Feb, 2019 13 commits

tools/libbpf: signedness bug in btf_dedup_ref_type() · 3d8669e6

Dan Carpenter authored Feb 28, 2019

The "ref_type_id" variable needs to be signed for the error handling
to work.

Fixes: d5caef5b ("btf: add BTF types deduplication algorithm")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

3d8669e6

Merge branch 'bpf-samples-improvements' · 74b38819

Daniel Borkmann authored Mar 01, 2019

Jakub Kicinski says:

====================
This set is next part of a quest to get rid of the bpf_load
ELF loader.  It fixes some minor issues with the samples and
starts the conversion.

First patch fixes ping invocations, ping localhost defaults
to IPv6 on modern setups. Next load_sock_ops sample is removed
and users are directed towards using bpftool directly.

Patch 4 removes the use of bpf_load from samples which don't
need the auto-attachment functionality at all.

Patch 5 improves symbol counting in libbpf, it's not currently
an issue but it will be when anyone adds a symbol with a long
name. Let's make sure that person doesn't have to spend time
scratching their head and wondering why .a and .so symbol
counts don't match.

v2: - specify prog_type where possible (Andrii).
====================
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

74b38819

tools: libbpf: make sure readelf shows full names in build checks · 771744f9

Jakub Kicinski authored Feb 27, 2019

readelf truncates its output by default to attempt to make it more
readable.  This can lead to function names getting aliased if they
differ late in the string.  Use --wide parameter to avoid
truncation.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

771744f9

samples: bpf: use libbpf where easy · 1a9b268c

Jakub Kicinski authored Feb 27, 2019

Some samples don't really need the magic of bpf_load,
switch them to libbpf.

v2: - specify program types.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

1a9b268c

tools: libbpf: add a correctly named define for map iteration · f74a53d9

Jakub Kicinski authored Feb 27, 2019

For historical reasons the helper to loop over maps in an object
is called bpf_map__for_each while it really should be called
bpf_object__for_each_map.  Rename and add a correctly named
define for backward compatibility.

Switch all in-tree users to the correct name (Quentin).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

f74a53d9

samples: bpf: remove load_sock_ops in favour of bpftool · ea9b6362

Jakub Kicinski authored Feb 27, 2019

bpftool can do all the things load_sock_ops used to do, and more.
Point users to bpftool instead of maintaining this sample utility.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

ea9b6362

samples: bpf: force IPv4 in ping · 5c3cf87d

Jakub Kicinski authored Feb 27, 2019

ping localhost may default of IPv6 on modern systems, but
samples are trying to only parse IPv4.  Force IPv4.

samples/bpf/tracex1_user.c doesn't interpret the packet so
we don't care which IP version will be used there.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

5c3cf87d

selftests/bpf: use __bpf_constant_htons in test_prog.c for flow dissector · ebace0e9

Stanislav Fomichev authored Feb 27, 2019

Older GCC (<4.8) isn't smart enough to optimize !__builtin_constant_p()
branch in bpf_htons.

I recently fixed it for pkt_v4 and pkt_v6 in commit a0517a0f
("selftests/bpf: use __bpf_constant_htons in test_prog.c"), but
later added another bunch of bpf_htons in commit bf0f0fd9
("selftests/bpf: add simple BPF_PROG_TEST_RUN examples for flow
dissector").

Fixes: bf0f0fd9 ("selftests/bpf: add simple BPF_PROG_TEST_RUN examples for flow dissector")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

ebace0e9

bpf: add missing entries to bpf_helpers.h · f2bb5388

Willem de Bruijn authored Feb 27, 2019

This header defines the BPF functions enumerated in uapi/linux.bpf.h
in a callable format. Expand to include all registered functions.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

f2bb5388

bpf: fix build without bpf_syscall · 3fcc5530

Alexei Starovoitov authored Feb 27, 2019

wrap bpf_stats_enabled sysctl with #ifdef
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 492ecee8 ("bpf: enable program stats")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

3fcc5530

Merge branch 'inner_map_spin_lock-fix' · 3bcd6044

Alexei Starovoitov authored Feb 27, 2019

Yonghong Song says:

====================
The inner_map_meta->spin_lock_off is not set correctly during
map creation for BPF_MAP_TYPE_ARRAY_OF_MAPS and BPF_MAP_TYPE_HASH_OF_MAPS.
This may lead verifier error due to misinformation.
This patch set fixed the issue with Patch #1 for the kernel change
and Patch #2 for enhanced selftest test_maps.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

3bcd6044

tools/bpf: selftests: add map lookup to test_map_in_map bpf prog · 9eca5083

Yonghong Song authored Feb 27, 2019

The bpf_map_lookup_elem is added in the bpf program.
Without previous patch, the test change will trigger the
following error:
  $ ./test_maps
  ...
  ; value_p = bpf_map_lookup_elem(map, &key);
  20: (bf) r1 = r7
  21: (bf) r2 = r8
  22: (85) call bpf_map_lookup_elem#1
  ; if (!value_p || *value_p != 123)
  23: (15) if r0 == 0x0 goto pc+16
   R0=map_value(id=2,off=0,ks=4,vs=4,imm=0) R6=inv1 R7=map_ptr(id=0,off=0,ks=4,vs=4,imm=0)
   R8=fp-8,call_-1 R10=fp0,call_-1 fp-8=mmmmmmmm
  ; if (!value_p || *value_p != 123)
  24: (61) r1 = *(u32 *)(r0 +0)
   R0=map_value(id=2,off=0,ks=4,vs=4,imm=0) R6=inv1 R7=map_ptr(id=0,off=0,ks=4,vs=4,imm=0)
   R8=fp-8,call_-1 R10=fp0,call_-1 fp-8=mmmmmmmm
  bpf_spin_lock cannot be accessed directly by load/store

With the kernel fix in the previous commit, the error goes away.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

9eca5083

bpf: set inner_map_meta->spin_lock_off correctly · a115d0ed

Yonghong Song authored Feb 27, 2019

Commit d83525ca ("bpf: introduce bpf_spin_lock")
introduced bpf_spin_lock and the field spin_lock_off
in kernel internal structure bpf_map has the following
meaning:
  >=0 valid offset, <0 error

For every map created, the kernel will ensure
spin_lock_off has correct value.

Currently, bpf_map->spin_lock_off is not copied
from the inner map to the map_in_map inner_map_meta
during a map_in_map type map creation, so
inner_map_meta->spin_lock_off = 0.
This will give verifier wrong information that
inner_map has bpf_spin_lock and the bpf_spin_lock
is defined at offset 0. An access to offset 0
of a value pointer will trigger the following error:
   bpf_spin_lock cannot be accessed directly by load/store

This patch fixed the issue by copy inner map's spin_lock_off
value to inner_map_meta->spin_lock_off.

Fixes: d83525ca ("bpf: introduce bpf_spin_lock")
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

a115d0ed

27 Feb, 2019 6 commits

samples: bpf: fix: broken sample regarding removed function · d2e614cb

Daniel T. Lee authored Feb 27, 2019

Currently, running sample "task_fd_query" and "tracex3" occurs the
following error. On kernel v5.0-rc* this sample will be unavailable
due to the removal of function 'blk_start_request' at commit "a1ce35fa".
(function removed, as "Single Queue IO scheduler" no longer exists)

$ sudo ./task_fd_query
failed to create kprobe 'blk_start_request' error 'No such file or
directory'

This commit will change the function 'blk_start_request' to
'blk_mq_start_request' to fix the broken sample.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

d2e614cb

Merge branch 'bpf-prog-stats' · da4e023e

Daniel Borkmann authored Feb 27, 2019

Alexei Starovoitov says:

====================
Introduce per program stats to monitor the usage BPF.

v2->v3:
- rename to run_time_ns/run_cnt everywhere

v1->v2:
- fixed u64 stats on 32-bit archs. Thanks Eric
- use more verbose run_time_ns in json output as suggested by Andrii
- refactored prog_alloc and clarified behavior of stats in subprogs
====================
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

da4e023e

tools/bpftool: recognize bpf_prog_info run_time_ns and run_cnt · 88ad472b

Alexei Starovoitov authored Feb 25, 2019

$ bpftool p s
1: kprobe  tag a56587d488d216c9  gpl run_time_ns 79786 run_cnt 8
	loaded_at 2019-02-22T12:22:51-0800  uid 0
	xlated 352B  not jited  memlock 4096B

$ bpftool --json --pretty p s
[{
        "id": 1,
        "type": "kprobe",
        "tag": "a56587d488d216c9",
        "gpl_compatible": true,
        "run_time_ns": 79786,
        "run_cnt": 8,
        "loaded_at": 1550866971,
        "uid": 0,
        "bytes_xlated": 352,
        "jited": false,
        "bytes_memlock": 4096
    }
]
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

88ad472b

tools/bpf: sync bpf.h into tools · b1eca86d

Alexei Starovoitov authored Feb 25, 2019

sync bpf.h into tools directory
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

b1eca86d

bpf: expose program stats via bpf_prog_info · 5f8f8b93

Alexei Starovoitov authored Feb 25, 2019

Return bpf program run_time_ns and run_cnt via bpf_prog_info
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

5f8f8b93

bpf: enable program stats · 492ecee8

Alexei Starovoitov authored Feb 25, 2019

JITed BPF programs are indistinguishable from kernel functions, but unlike
kernel code BPF code can be changed often.
Typical approach of "perf record" + "perf report" profiling and tuning of
kernel code works just as well for BPF programs, but kernel code doesn't
need to be monitored whereas BPF programs do.
Users load and run large amount of BPF programs.
These BPF stats allow tools monitor the usage of BPF on the server.
The monitoring tools will turn sysctl kernel.bpf_stats_enabled
on and off for few seconds to sample average cost of the programs.
Aggregated data over hours and days will provide an insight into cost of BPF
and alarms can trigger in case given program suddenly gets more expensive.

The cost of two sched_clock() per program invocation adds ~20 nsec.
Fast BPF progs (like selftests/bpf/progs/test_pkt_access.c) will slow down
from ~10 nsec to ~30 nsec.
static_key minimizes the cost of the stats collection.
There is no measurable difference before/after this patch
with kernel.bpf_stats_enabled=0
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

492ecee8

25 Feb, 2019 7 commits

Merge branch 'bpf-libbpf-af-xdp' · 143bdc2e

Daniel Borkmann authored Feb 25, 2019

Magnus Karlsson says:

====================
This patch proposes to add AF_XDP support to libbpf. The main reason
for this is to facilitate writing applications that use AF_XDP by
offering higher-level APIs that hide many of the details of the AF_XDP
uapi. This is in the same vein as libbpf facilitates XDP adoption by
offering easy-to-use higher level interfaces of XDP
functionality. Hopefully this will facilitate adoption of AF_XDP, make
applications using it simpler and smaller, and finally also make it
possible for applications to benefit from optimizations in the AF_XDP
user space access code. Previously, people just copied and pasted the
code from the sample application into their application, which is not
desirable.

The proposed interface is composed of two parts:

* Low-level access interface to the four rings and the packet
* High-level control plane interface for creating and setting up umems
  and AF_XDP sockets. This interface also loads a simple XDP program
  that routes all traffic on a queue up to the AF_XDP socket.

The sample program has been updated to use this new interface and in
that process it lost roughly 300 lines of code. I cannot detect any
performance degradations due to the use of this library instead of the
previous functions that were inlined in the sample application. But I
did measure this on a slower machine and not the Broadwell that we
normally use.

The rings are now called xsk_ring and when a producer operates on
it. It is xsk_ring_prod and for a consumer it is xsk_ring_cons. This
way we can get some compile time error checking that the rings are
used correctly.

Comments and contenplations:

* The current behaviour is that the library loads an XDP program (if
  requested to do so) but the clean up of this program is left to the
  application. It would be possible to implement this cleanup in the
  library, but it would require state to be kept on netdev level,
  which there is none at the moment, and the synchronization of this
  between processes. All this adding complexity. But when we get an
  XDP program per queue id, then it becomes trivial to also remove the
  XDP program when the application exits. This proposal from Jesper,
  Björn and others will also improve the performance of libbpf, since
  most of the XDP program code can be removed when that feature is
  supported.

* In a future release, I am planning on adding a higher level data
  plane interface too. This will be based around recvmsg and sendmsg
  with the use of struct iovec for batching, without the user having
  to know anything about the underlying four rings of an AF_XDP
  socket. There will be one semantic difference though from the
  standard recvmsg and that is that the kernel will fill in the iovecs
  instead of the application. But the rest should be the same as the
  libc versions so that application writers feel at home.

Patch 1: adds AF_XDP support in libbpf
Patch 2: updates the xdpsock sample application to use the libbpf functions
Patch 3: Documentation update to help first time users

Changes v5 to v6:
  * Fixed prog_fd bug found by Xiaolong Ye. Thanks!
Changes v4 to v5:
  * Added a FAQ to the documentation
  * Removed xsk_umem__get_data and renamed xsk_umem__get_dat_raw to
    xsk_umem__get_data
  * Replaced the netlink code with bpf_get_link_xdp_id()
  * Dynamic allocation of the map sizes. They are now sized after
    the max number of queueus on the netdev in question.
Changes v3 to v4:
  * Dropped the pr_*() patch in favor of Yonghong Song's patch set
  * Addressed the review comments of Daniel Borkmann, mainly leaking
    of file descriptors at clean up and making the data plane APIs
    all static inline (with the exception of xsk_umem__get_data that
    uses an internal structure I do not want to expose).
  * Fixed the netlink callback as suggested by Maciej Fijalkowski.
  * Removed an unecessary include in the sample program as spotted by
    Ilia Fillipov.
Changes v2 to v3:
  * Added automatic loading of a simple XDP program that routes all
    traffic on a queue up to the AF_XDP socket. This program loading
    can be disabled.
  * Updated function names to be consistent with the libbpf naming
    convention
  * Moved all code to xsk.[ch]
  * Removed all the XDP program loading code from the sample since
    this is now done by libbpf
  * The initialization functions now return a handle as suggested by
    Alexei
  * const statements added in the API where applicable.
Changes v1 to v2:
  * Fixed cleanup of library state on error.
  * Moved API to initial version
  * Prefixed all public functions by xsk__ instead of xsk_
  * Added comment about changed default ring sizes, batch size and umem
    size in the sample application commit message
  * The library now only creates an Rx or Tx ring if the respective
    parameter is != NULL
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

143bdc2e

xsk: add FAQ to facilitate for first time users · 0f4a9b7d

Magnus Karlsson authored Feb 21, 2019

Added an FAQ section in Documentation/networking/af_xdp.rst to help
first time users with common problems. As problems are getting
identified, entries will be added to the FAQ.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

0f4a9b7d

samples/bpf: convert xdpsock to use libbpf for AF_XDP access · 248c7f9c

Magnus Karlsson authored Feb 21, 2019

This commit converts the xdpsock sample application to use the AF_XDP
functions present in libbpf. This cuts down the size of it by nearly
300 lines of code.

The default ring sizes plus the batch size has been increased and the
size of the umem area has decreased. This so that the sample application
will provide higher throughput. Note also that the shared umem code
has been removed from the sample as this is not supported by libbpf
at this point in time.
Tested-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

248c7f9c

libbpf: add support for using AF_XDP sockets · 1cad0788

Magnus Karlsson authored Feb 21, 2019

This commit adds AF_XDP support to libbpf. The main reason for this is
to facilitate writing applications that use AF_XDP by offering
higher-level APIs that hide many of the details of the AF_XDP
uapi. This is in the same vein as libbpf facilitates XDP adoption by
offering easy-to-use higher level interfaces of XDP
functionality. Hopefully this will facilitate adoption of AF_XDP, make
applications using it simpler and smaller, and finally also make it
possible for applications to benefit from optimizations in the AF_XDP
user space access code. Previously, people just copied and pasted the
code from the sample application into their application, which is not
desirable.

The interface is composed of two parts:

* Low-level access interface to the four rings and the packet
* High-level control plane interface for creating and setting
  up umems and af_xdp sockets as well as a simple XDP program.
Tested-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

1cad0788

selftests/bpf: make sure signal interrupts BPF_PROG_TEST_RUN · 740f8a65

Stanislav Fomichev authored Feb 21, 2019

Simple test that I used to reproduce the issue in the previous commit:
Do BPF_PROG_TEST_RUN with max iterations, each program is 4096 simple
move instructions. File alarm in 0.1 second and check that
bpf_prog_test_run is interrupted (i.e. test doesn't hang).

Note: reposting this for bpf-next to avoid linux-next conflict. In this
version I test both BPF_PROG_TYPE_SOCKET_FILTER (which uses generic
bpf_test_run implementation) and BPF_PROG_TYPE_FLOW_DISSECTOR (which has
it own loop with preempt handling in bpf_prog_test_run_flow_dissector).
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

740f8a65

bpf/test_run: fix unkillable BPF_PROG_TEST_RUN for flow dissector · a439184d

Stanislav Fomichev authored Feb 19, 2019

Syzbot found out that running BPF_PROG_TEST_RUN with repeat=0xffffffff
makes process unkillable. The problem is that when CONFIG_PREEMPT is
enabled, we never see need_resched() return true. This is due to the
fact that preempt_enable() (which we do in bpf_test_run_one on each
iteration) now handles resched if it's needed.

Let's disable preemption for the whole run, not per test. In this case
we can properly see whether resched is needed.
Let's also properly return -EINTR to the userspace in case of a signal
interrupt.

This is a follow up for a recently fixed issue in bpf_test_run, see
commit df1a2cb7 ("bpf/test_run: fix unkillable
BPF_PROG_TEST_RUN").
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

a439184d

bpf: test_bpf: turn off preemption in function __run_once · fd92d664

Anders Roxell authored Feb 22, 2019

When running BPF test suite the following splat occurs:

[  415.930950] test_bpf: #0 TAX jited:0
[  415.931067] BUG: assuming atomic context at lib/test_bpf.c:6674
[  415.946169] in_atomic(): 0, irqs_disabled(): 0, pid: 11556, name: modprobe
[  415.953176] INFO: lockdep is turned off.
[  415.957207] CPU: 1 PID: 11556 Comm: modprobe Tainted: G        W         5.0.0-rc7-next-20190220 #1
[  415.966328] Hardware name: HiKey Development Board (DT)
[  415.971592] Call trace:
[  415.974069]  dump_backtrace+0x0/0x160
[  415.977761]  show_stack+0x24/0x30
[  415.981104]  dump_stack+0xc8/0x114
[  415.984534]  __cant_sleep+0xf0/0x108
[  415.988145]  test_bpf_init+0x5e0/0x1000 [test_bpf]
[  415.992971]  do_one_initcall+0x90/0x428
[  415.996837]  do_init_module+0x60/0x1e4
[  416.000614]  load_module+0x1de0/0x1f50
[  416.004391]  __se_sys_finit_module+0xc8/0xe0
[  416.008691]  __arm64_sys_finit_module+0x24/0x30
[  416.013255]  el0_svc_common+0x78/0x130
[  416.017031]  el0_svc_handler+0x38/0x78
[  416.020806]  el0_svc+0x8/0xc

Rework so that preemption is disabled when we loop over function
'BPF_PROG_RUN(...)'.

Fixes: 568f1967 ("bpf: check that BPF programs run with preemption disabled")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

fd92d664

22 Feb, 2019 1 commit

samples/bpf: Fix dummy program unloading for xdp_redirect samples · 915654fd

Toke Høiland-Jørgensen authored Feb 21, 2019

The xdp_redirect and xdp_redirect_map sample programs both load a dummy
program onto the egress interfaces. However, the unload code checks these
programs against the wrong id number, and thus refuses to unload them. Fix
the comparison to avoid this.

Fixes: 3b7a8ec2 ("samples/bpf: Check the prog id before exiting")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

915654fd

21 Feb, 2019 1 commit

seccomp, bpf: disable preemption before calling into bpf prog · e80d02dd

Alexei Starovoitov authored Feb 21, 2019

All BPF programs must be called with preemption disabled.

Fixes: 568f1967 ("bpf: check that BPF programs run with preemption disabled")
Reported-by: syzbot+8bf19ee2aa580de7a2a7@syzkaller.appspotmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

e80d02dd

19 Feb, 2019 3 commits

bpf: add skb->queue_mapping write access from tc clsact · 74e31ca8

Jesper Dangaard Brouer authored Feb 19, 2019

The skb->queue_mapping already have read access, via __sk_buff->queue_mapping.

This patch allow BPF tc qdisc clsact write access to the queue_mapping via
tc_cls_act_is_valid_access.  Also handle that the value NO_QUEUE_MAPPING
is not allowed.

It is already possible to change this via TC filter action skbedit
tc-skbedit(8).  Due to the lack of TC examples, lets show one:

  # tc qdisc  add  dev ixgbe1 clsact
  # tc filter add  dev ixgbe1 ingress matchall action skbedit queue_mapping 5
  # tc filter list dev ixgbe1 ingress

The most common mistake is that XPS (Transmit Packet Steering) takes
precedence over setting skb->queue_mapping. XPS is configured per DEVICE
via /sys/class/net/DEVICE/queues/tx-*/xps_cpus via a CPU hex mask. To
disable set mask=00.

The purpose of changing skb->queue_mapping is to influence the selection of
the net_device "txq" (struct netdev_queue), which influence selection of
the qdisc "root_lock" (via txq->qdisc->q.lock) and txq->_xmit_lock. When
using the MQ qdisc the txq->qdisc points to different qdiscs and associated
locks, and HARD_TX_LOCK (txq->_xmit_lock), allowing for CPU scalability.

Due to lack of TC examples, lets show howto attach clsact BPF programs:

 # tc qdisc  add  dev ixgbe2 clsact
 # tc filter add  dev ixgbe2 egress bpf da obj XXX_kern.o sec tc_qmap2cpu
 # tc filter list dev ixgbe2 egress
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

74e31ca8

bpf: check that BPF programs run with preemption disabled · 568f1967

Peter Zijlstra authored Jan 28, 2019

Introduce cant_sleep() macro for annotation of functions that
cannot sleep.

Use it in BPF_PROG_RUN to catch execution of BPF programs in
preemptable context.
Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

568f1967

bpf: bpftool, fix documentation for attach types · a5d9265e

Alban Crequy authored Feb 19, 2019

bpftool has support for attach types "stream_verdict" and
"stream_parser" but the documentation was referring to them as
"skb_verdict" and "skb_parse". The inconsistency comes from commit
b7d3826c ("bpf: bpftool, add support for attaching programs to
maps").

This patch changes the documentation to match the implementation:
- "bpftool prog help"
- man pages
- bash completion
Signed-off-by: Alban Crequy <alban@kinvolk.io>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

a5d9265e