Commits · 608114e441ad3a4fa1fced4d6d00653a34765eee · nexedi / linux

19 Nov, 2018 4 commits

Lorenz Bauer authored Nov 16, 2018

Synchronize changes to linux/bpf.h from
* "bpf: allow zero-initializing hash map seed"
* "bpf: move BPF_F_QUERY_EFFECTIVE after map flags"
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

608114e4

bpf: move BPF_F_QUERY_EFFECTIVE after map flags · 2f183360

Lorenz Bauer authored Nov 16, 2018

BPF_F_QUERY_EFFECTIVE is in the middle of the flags valid
for BPF_MAP_CREATE. Move it to its own section to reduce confusion.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

2f183360

bpf: allow zero-initializing hash map seed · 96b3b6c9

Lorenz Bauer authored Nov 16, 2018

Add a new flag BPF_F_ZERO_SEED, which forces a hash map
to initialize the seed to zero. This is useful when doing
performance analysis both on individual BPF programs and
on the kernel's hash table implementation.
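
Why a zero seed helps with analysis can be shown with a small userspace
model. Everything below is an assumption made for illustration:
DEMO_F_ZERO_SEED is a hypothetical flag bit, pick_seed() stands in for the
seed selection at map creation, and toy_hash() is an FNV-1a style hash, not
the kernel's hash function.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag bit for this sketch only; the real BPF_F_ZERO_SEED
 * is defined in linux/bpf.h. */
#define DEMO_F_ZERO_SEED (1U << 0)

/* Pick the hash seed for a new map: zero when the flag is set,
 * otherwise a random value (rand() stands in for a kernel RNG). */
static uint32_t pick_seed(uint32_t map_flags)
{
	if (map_flags & DEMO_F_ZERO_SEED)
		return 0;
	return (uint32_t)rand();
}

/* Toy seeded hash (FNV-1a style), used only to show that the same
 * seed gives the same hash for the same key. */
static uint32_t toy_hash(const void *key, size_t len, uint32_t seed)
{
	const unsigned char *p = key;
	uint32_t h = 2166136261u ^ seed;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}
```

With the flag set, two maps created in different runs hash a given key
identically, so bucket placement is reproducible between measurements.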
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

96b3b6c9

bpf: libbpf: retry map creation without the name · 23499442

Stanislav Fomichev authored Nov 19, 2018

Since commit 88cda1c9 ("bpf: libbpf: Provide basic API support
to specify BPF obj name"), libbpf unconditionally sets bpf_attr->name
for maps. Pre v4.14 kernels don't know about map names and return an
error about unexpected non-zero data. Retry sys_bpf without a map
name to cover older kernels.
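
The retry can be pictured with a small userspace model. This is a sketch
only, not the libbpf code: struct map_attr and fake_map_create() are
hypothetical stand-ins for bpf_attr and sys_bpf(BPF_MAP_CREATE), and the
fake kernel rejects any non-empty name with EINVAL, as a pre-v4.14 kernel
is described to do above.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for the map-creation part of union bpf_attr. */
struct map_attr {
	char name[16];
};

/* Hypothetical old kernel: a non-empty name is unexpected non-zero data. */
static int fake_map_create(const struct map_attr *attr)
{
	if (attr->name[0] != '\0') {
		errno = EINVAL;
		return -1;
	}
	return 3; /* pretend file descriptor */
}

/* Try with the name first; on EINVAL, clear the name and try once more. */
static int create_map_with_retry(struct map_attr *attr)
{
	int fd = fake_map_create(attr);

	if (fd < 0 && errno == EINVAL && attr->name[0] != '\0') {
		memset(attr->name, 0, sizeof(attr->name));
		fd = fake_map_create(attr);
	}
	return fd;
}
```

The retry is limited to EINVAL so that unrelated failures are still
reported to the caller on the first attempt.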

v2 changes:
* check for errno == EINVAL as suggested by Daniel Borkmann
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

23499442

17 Nov, 2018 9 commits

bpf: fix null pointer dereference on pointer offload · 592ee43f

Colin Ian King authored Nov 13, 2018

The pointer offload is null checked; however, the following statement
dereferences the potentially null pointer when assigning
offload->dev_state.  Fix this by only assigning it if offload is not
null.
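
The shape of the fix can be sketched as follows. struct offload and
set_dev_state() are hypothetical, trimmed-down names used for illustration
and do not reproduce the kernel function being patched.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for the per-program offload state. */
struct offload {
	int dev_state;
};

/* Returns 0 on success, -1 when there is no offload object. */
static int set_dev_state(struct offload *offload, int state)
{
	if (!offload)
		return -1;
	/* Only reached when offload is non-NULL. */
	offload->dev_state = state;
	return 0;
}
```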

Detected by CoverityScan, CID#1475437 ("Dereference after null check")

Fixes: 00db12c3 ("bpf: call verifier_prep from its callback in struct bpf_offload_dev")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

592ee43f

bpftool: make libbfd optional · 29a9c10e

Stanislav Fomichev authored Nov 12, 2018

Make it possible to build bpftool without libbfd. libbfd and libopcodes are
typically provided in dev/dbg packages (binutils-dev in Debian), which we
usually don't have installed on the fleet machines, and we'd like to have a
bpftool version that works without installing any additional packages.
This excludes support for disassembling JIT-ed code and prints an error if
the user tries to use these features.

Tested by:
cat > FEATURES_DUMP.bpftool <<EOF
feature-libbfd=0
feature-disassembler-four-args=1
feature-reallocarray=0
feature-libelf=1
feature-libelf-mmap=1
feature-bpf=1
EOF
FEATURES_DUMP=$PWD/FEATURES_DUMP.bpftool make
ldd bpftool | grep libbfd
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

29a9c10e

Merge branch 'socket-lookup-cg_sock' · ae9435f6

Alexei Starovoitov authored Nov 16, 2018

Andrey Ignatov says:

====================
This patch set makes bpf_sk_lookup_tcp, bpf_sk_lookup_udp and
bpf_sk_release helpers available in programs of type
BPF_PROG_TYPE_CGROUP_SOCK_ADDR.

Patch 1 is a fix for bpf_sk_lookup_udp that was already merged to bpf
(stable) tree. Here it's prerequisite for patch 3.

Patch 2 is the main patch in the set, it makes the helpers available for
BPF_PROG_TYPE_CGROUP_SOCK_ADDR and provides more details about use-case.

Patch 3 adds selftest for new functionality.

v1->v2:
- remove "Split bpf_sk_lookup" patch since it was already split by:
  commit c8123ead ("bpf: Extend the sk_lookup() helper to XDP
  hookpoint.");
- avoid unnecessary bpf_sock_addr_sk_lookup function.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

ae9435f6

selftest/bpf: Use bpf_sk_lookup_{tcp, udp} in test_sock_addr · 9108e3a0

Andrey Ignatov authored Nov 09, 2018

Use bpf_sk_lookup_tcp, bpf_sk_lookup_udp and bpf_sk_release helpers from
test_sock_addr programs to make sure they're available and can look up
and release a socket properly for IPv4/IPv6, TCP/UDP.

Reading from a few fields of returned struct bpf_sock is also tested.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

9108e3a0

bpf: Support socket lookup in CGROUP_SOCK_ADDR progs · 6c49e65e

Andrey Ignatov authored Nov 09, 2018

Make bpf_sk_lookup_tcp, bpf_sk_lookup_udp and bpf_sk_release helpers
available in programs of type BPF_PROG_TYPE_CGROUP_SOCK_ADDR.

Such programs operate on sockets and have access to socket and struct
sockaddr passed by user to system calls such as sys_bind, sys_connect,
sys_sendmsg.

It's useful to be able to look up other sockets from these programs.
E.g. a program attached to sys_connect may look up an IP:port endpoint and,
if there is a server socket bound to that endpoint ("server" can be defined
by saddr & sport being zero), redirect the client connection to it by
rewriting IP:port in the sockaddr passed to sys_connect.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

6c49e65e

bpf: Fix IPv6 dport byte order in bpf_sk_lookup_udp · cac6cc2f

Andrey Ignatov authored Nov 09, 2018

Lookup functions in sk_lookup have different expectations about byte
order of provided arguments.

Specifically __inet_lookup, __udp4_lib_lookup and __udp6_lib_lookup
expect dport to be in network byte order and do ntohs(dport) internally.

At the same time __inet6_lookup expects dport to be in host byte order
and names the argument hnum accordingly.

sk_lookup works correctly with __inet_lookup, __udp4_lib_lookup and
__inet6_lookup with regard to dport. But in the __udp6_lib_lookup case it
passes dport in host byte order instead of the expected network byte order,
which makes the result returned by bpf_sk_lookup_udp for IPv6 incorrect.

The patch fixes byte order of dport passed to __udp6_lib_lookup.

Originally sk_lookup properly handled UDPv6, but not TCPv6. 5ef0ae84
fixes TCPv6 but breaks UDPv6.
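
The two calling conventions can be modelled in userspace as below. Both
functions are hypothetical stand-ins that only return the host-order port
the lookup would search for; they are not the kernel lookup functions.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical lookup that, like __udp6_lib_lookup as described above,
 * takes dport in network byte order and converts it internally. */
static uint16_t lookup_expects_net_order(uint16_t dport_net)
{
	return ntohs(dport_net); /* host-order port actually searched for */
}

/* Hypothetical lookup that, like __inet6_lookup as described above,
 * takes a host-order port (hnum) and uses it as is. */
static uint16_t lookup_expects_host_order(uint16_t hnum)
{
	return hnum;
}
```

A caller holding the port in network byte order passes it unchanged to the
first kind of function and applies ntohs() before calling the second. On a
little-endian machine, mixing the two up makes the lookup search for a
byte-swapped port.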

Fixes: 5ef0ae84 ("bpf: Fix IPv6 dport byte-order in bpf_sk_lookup")
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Joe Stringer <joe@wand.net.nz>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

cac6cc2f

bpf: Remove unused variable in nsim_bpf · ac8acec9

Nathan Chancellor authored Nov 12, 2018

Clang warns:

drivers/net/netdevsim/bpf.c:557:30: error: unused variable 'state'
[-Werror,-Wunused-variable]
        struct nsim_bpf_bound_prog *state;
                                    ^
1 error generated.

The declaration should have been removed in commit b07ade27 ("bpf:
pass translate() as a callback and remove its ndo_bpf subcommand").
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

ac8acec9

bpf: libbpf: Fix bpf_program__next() API · a83d6e76

Martin KaFai Lau authored Nov 12, 2018

This patch restores the behavior in
commit eac7d845 ("tools: libbpf: don't return '.text' as a program for multi-function programs")
such that bpf_program__next() does not return pseudo programs in ".text".

Fixes: 0c19a9fb ("libbpf: cleanup after partial failure in bpf_object__pin")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

a83d6e76

selftests/bpf: Fix uninitialized duration warning · 5c86d212

Joe Stringer authored Nov 09, 2018

Daniel Borkmann reports:

test_progs.c: In function ‘main’:
test_progs.c:81:3: warning: ‘duration’ may be used uninitialized in this function [-Wmaybe-uninitialized]
   printf("%s:PASS:%s %d nsec\n", __func__, tag, duration);\
   ^~~~~~
test_progs.c:1706:8: note: ‘duration’ was declared here
  __u32 duration;
        ^~~~~~~~
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

5c86d212

11 Nov, 2018 4 commits

Merge branch 'narrow-loads' · 407be8d0

Alexei Starovoitov authored Nov 10, 2018

Andrey Ignatov says:

====================
This patch set adds support for narrow loads with offset > 0 to BPF
verifier.

Patch 1 provides more details and is the main patch in the set.
Patches 2 and 3 add new test cases to test_verifier and test_sock_addr
selftests.

v1->v2:
- fix -Wdeclaration-after-statement warning.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

407be8d0

selftests/bpf: Test narrow loads with off > 0 for bpf_sock_addr · e7605475

Andrey Ignatov authored Nov 10, 2018

Add more test cases for context bpf_sock_addr to test narrow loads with
offset > 0 for ctx->user_ip4 field (__u32):
* off=1, size=1;
* off=2, size=1;
* off=3, size=1;
* off=2, size=2.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

e7605475

selftests/bpf: Test narrow loads with off > 0 in test_verifier · 6c2afb67

Andrey Ignatov authored Nov 10, 2018

Test the following narrow loads in test_verifier for context __sk_buff:
* off=1, size=1 - ok;
* off=2, size=1 - ok;
* off=3, size=1 - ok;
* off=0, size=2 - ok;
* off=1, size=2 - fail;
* off=2, size=2 - ok;
* off=3, size=2 - fail.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

6c2afb67

bpf: Allow narrow loads with offset > 0 · 46f53a65

Andrey Ignatov authored Nov 10, 2018

Currently BPF verifier allows narrow loads for a context field only with
offset zero. E.g. if there is a __u32 field then only the following
loads are permitted:
  * off=0, size=1 (narrow);
  * off=0, size=2 (narrow);
  * off=0, size=4 (full).

On the other hand, LLVM can generate a load with an offset different from
zero that makes sense from the program logic point of view, but the verifier
doesn't accept it.

E.g. tools/testing/selftests/bpf/sendmsg4_prog.c has code:

  #define DST_IP4			0xC0A801FEU /* 192.168.1.254 */
  ...
  	if ((ctx->user_ip4 >> 24) == (bpf_htonl(DST_IP4) >> 24) &&

where ctx is struct bpf_sock_addr.

Some versions of LLVM can produce the following byte code for it:

       8:       71 12 07 00 00 00 00 00         r2 = *(u8 *)(r1 + 7)
       9:       67 02 00 00 18 00 00 00         r2 <<= 24
      10:       18 03 00 00 00 00 00 fe 00 00 00 00 00 00 00 00         r3 = 4261412864 ll
      12:       5d 32 07 00 00 00 00 00         if r2 != r3 goto +7 <LBB0_6>

where `*(u8 *)(r1 + 7)` means narrow load for ctx->user_ip4 with size=1
and offset=3 (7 - sizeof(ctx->user_family) = 3). This load is currently
rejected by verifier.

Verifier code that rejects such loads is in bpf_ctx_narrow_access_ok(),
which means any is_valid_access implementation that uses the function
works this way, e.g. bpf_skb_is_valid_access() for __sk_buff or
sock_addr_is_valid_access() for bpf_sock_addr.

The patch makes such loads supported. The offset can be in [0; size_default)
but has to be a multiple of the load size. E.g. for a __u32 field the
following loads are now supported:
  * off=0, size=1 (narrow);
  * off=1, size=1 (narrow);
  * off=2, size=1 (narrow);
  * off=3, size=1 (narrow);
  * off=0, size=2 (narrow);
  * off=2, size=2 (narrow);
  * off=0, size=4 (full).
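
The rule stated above can be written as a small predicate. This is a
userspace sketch of the rule only, not the kernel's
bpf_ctx_narrow_access_ok(); narrow_load_ok() is a hypothetical name, and
off is taken relative to the start of the field.

```c
#include <stdbool.h>
#include <stdint.h>

/* off: offset of the load relative to the start of the field
 * size: load size in bytes
 * size_default: full size of the field in bytes */
static bool narrow_load_ok(uint32_t off, uint32_t size, uint32_t size_default)
{
	if (size == 0 || size > size_default)
		return false;
	if (off >= size_default)
		return false;
	/* The offset has to be a multiple of the load size. */
	return off % size == 0;
}
```

For the byte code quoted earlier, the load at r1 + 7 with size=1 maps to
off=3 within user_ip4, so narrow_load_ok(3, 1, 4) is true.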
Reported-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

46f53a65

10 Nov, 2018 18 commits

Merge branch 'bpftool-flow-dissector' · f2cbf958

Alexei Starovoitov authored Nov 10, 2018

Stanislav Fomichev says:

====================
v5 changes:
* FILE -> PATH for load/loadall (can be either file or directory now)
* simpler implementation for __bpf_program__pin_name
* removed p_err for REQ_ARGS checks
* parse_atach_detach_args -> parse_attach_detach_args
* for -> while in bpf_object__pin_{programs,maps} recovery

v4 changes:
* addressed another round of comments/style issues from Jakub Kicinski &
  Quentin Monnet (thanks!)
* implemented bpf_object__pin_maps and bpf_object__pin_programs helpers and
  used them in bpf_program__pin
* added new pin_name to bpf_program so bpf_program__pin
  works with sections that contain '/'
* moved *loadall* command implementation into a separate patch
* added patch that implements *pinmaps* to pin maps when doing
  load/loadall

v3 changes:
* (maybe) better cleanup for partial failure in bpf_object__pin
* added special case in bpf_program__pin for programs with single
  instances

v2 changes:
* addressed comments/style issues from Jakub Kicinski & Quentin Monnet
* removed logic that populates jump table
* added cleanup for partial failure in bpf_object__pin

This patch series adds support for loading and attaching flow dissector
programs from the bpftool:

* first patch fixes flow dissector section name in the selftests (so
  libbpf auto-detection works)
* second patch adds proper cleanup to bpf_object__pin, parts of which are now
  being used to attach all flow dissector progs/maps
* third patch adds special case in bpf_program__pin for programs with
  single instances (we don't create <prog>/0 pin anymore, just <prog>)
* fourth patch adds pin_name to the bpf_program struct
  which is now used as a pin name in bpf_program__pin et al
* fifth patch adds *loadall* command that pins all programs, not just
  the first one
* sixth patch adds *pinmaps* argument to load/loadall to let users pin
  all maps of the obj file
* seventh patch adds actual flow_dissector support to the bpftool and
  an example
====================
Acked-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

f2cbf958

bpftool: support loading flow dissector · 092f0892

Stanislav Fomichev authored Nov 09, 2018

This commit adds support for loading/attaching/detaching flow
dissector program.

When `bpftool loadall` is called with a flow_dissector prog (i.e. when the
'type flow_dissector' argument is passed), we load and pin all programs.
The user is responsible for constructing the jump table for the tail calls.

The last argument of `bpftool attach` is made optional for this use
case.

Example:
bpftool prog load tools/testing/selftests/bpf/bpf_flow.o \
        /sys/fs/bpf/flow type flow_dissector \
	pinmaps /sys/fs/bpf/flow

bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
        key 0 0 0 0 \
        value pinned /sys/fs/bpf/flow/IP

bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
        key 1 0 0 0 \
        value pinned /sys/fs/bpf/flow/IPV6

bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
        key 2 0 0 0 \
        value pinned /sys/fs/bpf/flow/IPV6OP

bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
        key 3 0 0 0 \
        value pinned /sys/fs/bpf/flow/IPV6FR

bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
        key 4 0 0 0 \
        value pinned /sys/fs/bpf/flow/MPLS

bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
        key 5 0 0 0 \
        value pinned /sys/fs/bpf/flow/VLAN

bpftool prog attach pinned /sys/fs/bpf/flow/flow_dissector flow_dissector

Tested by using the above lines to load the prog in
the test_flow_dissector.sh selftest.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

092f0892

bpftool: add pinmaps argument to the load/loadall · 3767a94b

Stanislav Fomichev authored Nov 09, 2018

This new additional argument lets users pin all maps from the object at
specified path.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

3767a94b

bpftool: add loadall command · 77380998

Stanislav Fomichev authored Nov 09, 2018

This patch adds a new *loadall* command which slightly differs from the
existing *load*. The *load* command loads all programs from the obj file,
but pins only the first program. *loadall* pins all programs from the
obj file under the specified directory.

The intended usecase is flow_dissector, where we want to load a bunch
of progs, pin them all and after that construct a jump table.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

77380998

libbpf: add internal pin_name · 33a2c75c

Stanislav Fomichev authored Nov 09, 2018

pin_name is the same as section_name where '/' is replaced
by '_'. bpf_object__pin_programs is converted to use pin_name
to avoid the situation where section_name would require creating another
subdirectory for a pin (as, for example, when calling bpf_object__pin_programs
for programs in sections like "cgroup/connect6").
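
The mapping from section_name to pin_name can be sketched as below.
make_pin_name() is a hypothetical helper written for illustration; it is
not the libbpf implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of section_name with every '/'
 * replaced by '_', or NULL on allocation failure. Caller frees. */
static char *make_pin_name(const char *section_name)
{
	size_t len = strlen(section_name);
	char *name = malloc(len + 1);
	char *p;

	if (!name)
		return NULL;
	memcpy(name, section_name, len + 1);
	for (p = name; *p; p++)
		if (*p == '/')
			*p = '_';
	return name;
}
```

For the section "cgroup/connect6" this yields "cgroup_connect6", which can
be used as a single file name under the pin directory.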
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

33a2c75c

libbpf: bpf_program__pin: add special case for instances.nr == 1 · fd734c5c

Stanislav Fomichev authored Nov 09, 2018

When bpf_program has only one instance, don't create a subdirectory with
per-instance pin files (<prog>/0). Instead, just create a single pin file
for that single instance. This simplifies object pinning by not creating
unnecessary subdirectories.
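
The resulting path layout can be sketched as follows. instance_pin_path()
is a hypothetical helper for illustration, not a libbpf function.

```c
#include <stddef.h>
#include <stdio.h>

/* Writes the pin path for instance idx of a program that has nr
 * instances. Returns what snprintf() returns. */
static int instance_pin_path(char *buf, size_t len, const char *path,
			     int nr, int idx)
{
	if (nr == 1)
		return snprintf(buf, len, "%s", path); /* no <prog>/0 */
	return snprintf(buf, len, "%s/%d", path, idx);
}
```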

This can potentially break existing users that depend on the case
where '/0' is always created. However, I couldn't find any serious
usage of bpf_program__pin inside the kernel tree and I suppose there
should be none outside.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

fd734c5c

libbpf: cleanup after partial failure in bpf_object__pin · 0c19a9fb

Stanislav Fomichev authored Nov 09, 2018

bpftool will use bpf_object__pin in the next commits to pin all programs
and maps from the file; in case of a partial failure, we need to get
back to the clean state (undo previous program/map pins).

As part of a cleanup, I've added and exported separate routines to
pin all maps (bpf_object__pin_maps) and progs (bpf_object__pin_programs)
of an object.
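
The undo-on-failure pattern can be sketched in userspace as below. struct
item, pin_one(), unpin_one() and pin_all() are hypothetical names used for
illustration; the fail field exists only to simulate an error.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object that can be pinned. */
struct item {
	bool pinned;
	bool fail; /* simulate a pin error for this item */
};

static int pin_one(struct item *it)
{
	if (it->fail)
		return -1;
	it->pinned = true;
	return 0;
}

static void unpin_one(struct item *it)
{
	it->pinned = false;
}

/* Pin all items; on the first failure, unpin everything pinned so far
 * so that the caller is left with a clean state. */
static int pin_all(struct item *items, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (pin_one(&items[i]) < 0) {
			while (i > 0)
				unpin_one(&items[--i]);
			return -1;
		}
	}
	return 0;
}
```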
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

0c19a9fb

selftests/bpf: rename flow dissector section to flow_dissector · 108d50a9

Stanislav Fomichev authored Nov 09, 2018

Makes it compatible with the logic that derives program type
from section name in libbpf_prog_type_by_name.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

108d50a9

Merge branch 'device-ops-as-cb' · 0157edc8

Alexei Starovoitov authored Nov 10, 2018

Quentin Monnet says:

====================
For passing device functions for offloaded eBPF programs, there used to
be no place where to store the pointer without making the non-offloaded
programs pay a memory price.

As a consequence, three functions were called with ndo_bpf() through
specific commands. Now that we have struct bpf_offload_dev, and since none
of those operations rely on RTNL, we can turn these three commands into
hooks inside the struct bpf_prog_offload_ops, and pass them as part of
bpf_offload_dev_create().

This patch set changes the offload architecture to do so, and brings the
relevant changes to the nfp and netdevsim drivers.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

0157edc8

bpf: do not pass netdev to translate() and prepare() offload callbacks · 16a8cb5c

Quentin Monnet authored Nov 09, 2018

The kernel functions to prepare verifier and translate for offloaded
program retrieve "offload" from "prog", and "netdev" from "offload".
Then both "prog" and "netdev" are passed to the callbacks.

Simplify this by letting the drivers retrieve the net device themselves
from the offload object attached to prog - if they need it at all. There
is currently no need to pass the netdev as an argument to those
functions.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

16a8cb5c

bpf: pass prog instead of env to bpf_prog_offload_verifier_prep() · a40a2632

Quentin Monnet authored Nov 09, 2018

Function bpf_prog_offload_verifier_prep(), called from the kernel BPF
verifier to run a driver-specific callback for preparing for the
verification step for offloaded programs, takes a pointer to a struct
bpf_verifier_env object. However, no driver callback needs the whole
structure at this time: the two drivers supporting this, nfp and
netdevsim, only need a pointer to the struct bpf_prog instance held by
env.

Update the callback accordingly, on kernel side and in these two
drivers.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

a40a2632

bpf: pass destroy() as a callback and remove its ndo_bpf subcommand · eb911947

Quentin Monnet authored Nov 09, 2018

As part of the transition from ndo_bpf() to callbacks attached to struct
bpf_offload_dev for some of the eBPF offload operations, move the
functions related to program destruction to the struct and remove the
subcommand that was used to call them through the NDO.

Remove function __bpf_offload_ndo(), which is no longer used.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

eb911947

bpf: pass translate() as a callback and remove its ndo_bpf subcommand · b07ade27

Quentin Monnet authored Nov 09, 2018

As part of the transition from ndo_bpf() to callbacks attached to struct
bpf_offload_dev for some of the eBPF offload operations, move the
functions related to code translation to the struct and remove the
subcommand that was used to call them through the NDO.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

b07ade27

bpf: call verifier_prep from its callback in struct bpf_offload_dev · 00db12c3

Quentin Monnet authored Nov 09, 2018

In a way similar to the change previously brought to the verify_insn
hook and to the finalize callback, switch to the newly added ops in
struct bpf_offload_dev for calling the functions used to prepare driver
verifiers.

Since the dev_ops pointer in struct bpf_prog_offload is no longer used
by any callback, we can now remove it from struct bpf_prog_offload.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

00db12c3

bpf: call finalize() from its callback in struct bpf_offload_dev · 6dc18fa6

Quentin Monnet authored Nov 09, 2018

In a way similar to the change previously brought to the verify_insn
hook, switch to the newly added ops in struct bpf_offload_dev for
calling the functions used to perform final verification steps for
offloaded programs.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

6dc18fa6

bpf: call verify_insn from its callback in struct bpf_offload_dev · 341b3e7b

Quentin Monnet authored Nov 09, 2018

We intend to remove the dev_ops in struct bpf_prog_offload, and to only
keep the ops in struct bpf_offload_dev instead, which is accessible from
more locations for passing function pointers.

But dev_ops is used for calling the verify_insn hook. Switch to the
newly added ops in struct bpf_offload_dev instead.

To avoid table lookups for each eBPF instruction to verify, we remember
the offdev attached to a netdev and modify bpf_offload_find_netdev() to
avoid performing more than once a lookup for a given offload object.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

341b3e7b

bpf: pass a struct with offload callbacks to bpf_offload_dev_create() · 1385d755

Quentin Monnet authored Nov 09, 2018

For passing device functions for offloaded eBPF programs, there used to
be no place where to store the pointer without making the non-offloaded
programs pay a memory price.

As a consequence, three functions were called with ndo_bpf() through
specific commands. Now that we have struct bpf_offload_dev, and since
none of those operations rely on RTNL, we can turn these three commands
into hooks inside the struct bpf_prog_offload_ops, and pass them as part
of bpf_offload_dev_create().

This commit effectively passes a pointer to the struct to
bpf_offload_dev_create(). We temporarily have two struct
bpf_prog_offload_ops instances, one under offdev->ops and one under
offload->dev_ops. The next patches will make the transition towards the
former, so that offload->dev_ops can be removed, and callbacks relying
on ndo_bpf() added to offdev->ops as well.
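
The general shape of passing a callback table at device creation can be
sketched in userspace as below. All names here (struct offload_ops, struct
offload_dev, offload_dev_create(), offload_translate(), demo_translate(),
demo_ops) are hypothetical and loosely modelled on the prepare, translate
and destroy operations named in this series; they are not the kernel API.

```c
#include <stdlib.h>

struct prog; /* opaque in this sketch */

/* Hypothetical callback table supplied by a driver. */
struct offload_ops {
	int (*prepare)(struct prog *prog);
	int (*translate)(struct prog *prog);
	void (*destroy)(struct prog *prog);
};

/* Hypothetical device object that remembers the driver's callbacks. */
struct offload_dev {
	const struct offload_ops *ops;
};

/* The ops are handed over once, at creation time. */
static struct offload_dev *offload_dev_create(const struct offload_ops *ops)
{
	struct offload_dev *dev = malloc(sizeof(*dev));

	if (!dev)
		return NULL;
	dev->ops = ops;
	return dev;
}

/* Core code calls the hook directly instead of going through a
 * command-style dispatcher. */
static int offload_translate(struct offload_dev *dev, struct prog *prog)
{
	if (!dev->ops || !dev->ops->translate)
		return -1;
	return dev->ops->translate(prog);
}

/* Example driver callback and table, for demonstration only. */
static int demo_translate(struct prog *prog)
{
	(void)prog;
	return 42;
}

static const struct offload_ops demo_ops = {
	.translate = demo_translate,
};
```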

While at it, rename "nfp_bpf_analyzer_ops" as "nfp_bpf_dev_ops" (and
similarly for netdevsim).
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

1385d755

nfp: bpf: move nfp_bpf_analyzer_ops from verifier.c to offload.c · 1da6f573

Quentin Monnet authored Nov 09, 2018

We are about to add several new callbacks to the struct, all of them
defined in offload.c. Move the struct bpf_prog_offload_ops object in
that file. As a consequence, nfp_verify_insn() and nfp_finalize() can no
longer be static.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

1da6f573

09 Nov, 2018 5 commits

bpf: Extend the sk_lookup() helper to XDP hookpoint. · c8123ead

Nitin Hande authored Oct 28, 2018

This patch proposes to extend the sk_lookup() BPF API to the
XDP hookpoint. The sk_lookup() helper supports a lookup
on incoming packet to find the corresponding socket that will
receive this packet. Current support for this BPF API is
at the tc hookpoint. This patch extends the API to the XDP
hookpoint. An XDP program can map the incoming packet to the
5-tuple parameter and invoke the API to find the corresponding
socket structure.
Signed-off-by: Nitin Hande <Nitin.Hande@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

c8123ead

bpftool: Improve handling of ENOENT on map dumps · bf598a8f

David Ahern authored Nov 08, 2018

bpftool output is not user friendly when dumping a map with only a few
populated entries:

    $ bpftool map
    1: devmap  name tx_devmap  flags 0x0
            key 4B  value 4B  max_entries 64  memlock 4096B
    2: array  name tx_idxmap  flags 0x0
            key 4B  value 4B  max_entries 64  memlock 4096B

    $ bpftool map dump id 1
    key:
    00 00 00 00
    value:
    No such file or directory
    key:
    01 00 00 00
    value:
    No such file or directory
    key:
    02 00 00 00
    value:
    No such file or directory
    key: 03 00 00 00  value: 03 00 00 00

Handle ENOENT by keeping the line format sane and dumping
"<no entry>" for the value

    $ bpftool map dump id 1
    key: 00 00 00 00  value: <no entry>
    key: 01 00 00 00  value: <no entry>
    key: 02 00 00 00  value: <no entry>
    key: 03 00 00 00  value: 03 00 00 00
    ...
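
The formatting decision can be sketched as below. format_value() is a
hypothetical helper for illustration and is not the bpftool code; it takes
the errno of the failed lookup (or 0 on success) and produces the text for
the value column.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Formats the value column for one dumped key.
 * lookup_err is 0 on success or the errno reported by the lookup.
 * Returns the number of characters written, or -1 on other errors
 * or when the buffer is too small. */
static int format_value(char *buf, size_t len, int lookup_err,
			const unsigned char *value, size_t value_size)
{
	size_t i, pos = 0;

	if (lookup_err == ENOENT)
		return snprintf(buf, len, "<no entry>");
	if (lookup_err || len == 0)
		return -1;

	buf[0] = '\0';
	for (i = 0; i < value_size; i++) {
		int n = snprintf(buf + pos, len - pos, "%s%02x",
				 i ? " " : "", value[i]);

		if (n < 0 || (size_t)n >= len - pos)
			return -1;
		pos += (size_t)n;
	}
	return (int)pos;
}
```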
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bf598a8f

selftests/bpf: add a test case for sock_ops perf-event notification · 435f90a3

Sowmini Varadhan authored Nov 07, 2018

This patch provides a tcp_bpf based eBPF sample. The test:

- uses ncat(1) as the TCP client program to connect() to a port
  with the intention of triggering SYN retransmissions: we
  first install an iptables DROP rule to make sure ncat SYNs are
  resent (instead of aborting instantly after a TCP RST)

- has a bpf kernel module that sends a perf-event notification for
  each TCP retransmit, and also tracks the number of such notifications
  sent in the global_map

The test passes when the number of event notifications intercepted
in user-space matches the value in the global_map.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

435f90a3

bpf: add perf event notification support for sock_ops · a5a3a828

Sowmini Varadhan authored Nov 07, 2018

This patch allows eBPF programs that use sock_ops to send perf
based event notifications using bpf_perf_event_output(). Our main
use case for this is the following:

  We would like to monitor some subset of TCP sockets in user-space,
  (the monitoring application would define 4-tuples it wants to monitor)
  using TCP_INFO stats to analyze reported problems. The idea is to
  use those stats to see where the bottlenecks are likely to be ("is
  it application-limited?" or "is there evidence of BufferBloat in
  the path?" etc).

  Today we can do this by periodically polling for tcp_info, but this
  could be made more efficient if the kernel would asynchronously
  notify the application via tcp_info when some "interesting"
  thresholds (e.g., "RTT variance > X", or "total_retrans > Y" etc)
  are reached. And to make this effective, it is better if
  we could apply the threshold check *before* constructing the
  tcp_info netlink notification, so that we don't waste resources
  constructing notifications that will be discarded by the filter.

This work solves the problem by adding perf event based notification
support for sock_ops. The eBPF program can thus be designed to apply
any desired filters to the bpf_sock_ops and trigger a perf event
notification based on the evaluation from the filter. The user space
component can use these perf event notifications to either read any
state managed by the eBPF program, or issue a TCP_INFO netlink call
if desired.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

a5a3a828

Merge branch 'bpf-max-pkt-offset' · 185067a8

Daniel Borkmann authored Nov 09, 2018

Jiong Wang says:

====================
The maximum packet offset accessed by one BPF program is useful
information.

Packets can sometimes be split, and for some reasons (for example
performance) we may want to reject a BPF program if the maximum packet size
would trigger such a split. Normally the MTU value is treated as the maximum
packet size, but a BPF program does not always access the whole packet; it
may only access the head portion of the data.

We could let the verifier calculate the maximum packet offset ever used and
record it inside the prog auxiliary information structure as a new field
"max_pkt_offset".
====================
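
The bookkeeping described in the cover letter can be sketched in userspace
as below. struct pkt_access and max_pkt_offset() are hypothetical names,
and taking the offset of the last byte touched (off + size - 1) is an
assumption of this sketch; the cover letter does not state the exact
convention used by the verifier.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one packet access seen during verification. */
struct pkt_access {
	uint32_t off;  /* start offset into the packet */
	uint32_t size; /* number of bytes accessed */
};

/* Returns the highest byte offset touched by any access
 * (assumed convention: off + size - 1), or 0 if there is none. */
static uint32_t max_pkt_offset(const struct pkt_access *acc, size_t n)
{
	uint32_t max = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint32_t last;

		if (acc[i].size == 0)
			continue;
		last = acc[i].off + acc[i].size - 1;
		if (last > max)
			max = last;
	}
	return max;
}
```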
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

185067a8