Commits · 484b165306e18fd9a0d960e6ed2d6eff31665522 · Kirill Smelkov / linux

21 Dec, 2019 1 commit

xsk: Eliminate the lazy update threshold · 484b1653

Magnus Karlsson authored Dec 19, 2019

The lazy update threshold was introduced to keep the producer and
consumer some distance apart in the completion ring. This was
important in the beginning of the development of AF_XDP as the ring
format as that point in time was very sensitive to the producer and
consumer being on the same cache line. This is not the case
anymore as the current ring format does not degrade in any noticeable
way when this happens. Moreover, this threshold makes it impossible
to run the system with rings that have less than 128 entries.

So let us remove this threshold and just get one entry from the ring
as in all other functions. This will enable us to remove this function
in a later commit. Note that xskq_produce_addr_lazy followed by
xskq_produce_flush_addr_n are still not the same function as
xskq_produce_addr() as it operates on another cached pointer.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1576759171-28550-2-git-send-email-magnus.karlsson@intel.com

484b1653

20 Dec, 2019 16 commits

Merge branch 'replace-cg_bpf-prog' · 99cacdc6

Alexei Starovoitov authored Dec 19, 2019

Andrey Ignatov says:

====================
v3->v4:
- use OPTS_VALID and OPTS_GET to handle bpf_prog_attach_opts.

v2->v3:
- rely on DECLARE_LIBBPF_OPTS from libbpf_common.h;
- separate "required" and "optional" arguments in bpf_prog_attach_xattr;
- convert test_cgroup_attach to prog_tests;
- move the new selftest to prog_tests/cgroup_attach_multi.

v1->v2:
- move DECLARE_LIBBPF_OPTS from libbpf.h to bpf.h (patch 4);
- switch new libbpf API to OPTS framework;
- switch selftest to libbpf OPTS framework.

This patch set adds support for replacing cgroup-bpf programs attached with
BPF_F_ALLOW_MULTI flag so that any program in a list can be updated to a new
version without service interruption and order of programs can be preserved.

Please see patch 3 for details on the use-case and API changes.

Other patches:
Patch 1 is preliminary refactoring of __cgroup_bpf_attach to simplify it.
Patch 2 is minor cleanup of hierarchy_allows_attach.
Patch 4 extends libbpf API to support new set of attach attributes.
Patch 5 converts test_cgroup_attach to prog_tests.
Patch 6 adds selftest coverage for the new API.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

99cacdc6

selftests/bpf: Test BPF_F_REPLACE in cgroup_attach_multi · 06ac0186

Andrey Ignatov authored Dec 18, 2019

Test replacing a cgroup-bpf program attached with BPF_F_ALLOW_MULTI and
possible failure modes: invalid combination of flags, invalid
replace_bpf_fd, replacing a non-attachd to specified cgroup program.

Example of program replacing:

  # gdb -q --args ./test_progs --name=cgroup_attach_multi
  ...
  Breakpoint 1, test_cgroup_attach_multi () at cgroup_attach_multi.c:227
  (gdb)
  [1]+  Stopped                 gdb -q --args ./test_progs --name=cgroup_attach_multi
  # bpftool c s /mnt/cgroup2/cgroup-test-work-dir/cg1
  ID       AttachType      AttachFlags     Name
  2133     egress          multi
  2134     egress          multi
  # fg
  gdb -q --args ./test_progs --name=cgroup_attach_multi
  (gdb) c
  Continuing.

  Breakpoint 2, test_cgroup_attach_multi () at cgroup_attach_multi.c:233
  (gdb)
  [1]+  Stopped                 gdb -q --args ./test_progs --name=cgroup_attach_multi
  # bpftool c s /mnt/cgroup2/cgroup-test-work-dir/cg1
  ID       AttachType      AttachFlags     Name
  2139     egress          multi
  2134     egress          multi
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/7b9b83e8d5fb82e15b034341bd40b6fb2431eeba.1576741281.git.rdna@fb.com

06ac0186

selftests/bpf: Convert test_cgroup_attach to prog_tests · 257c8855

Andrey Ignatov authored Dec 18, 2019

Convert test_cgroup_attach to prog_tests.

This change does a lot of things but in many cases it's pretty expensive
to separate them, so they go in one commit. Nevertheless the logic is
ketp as is and changes made are just moving things around, simplifying
them (w/o changing the meaning of the tests) and making prog_tests
compatible:

* split the 3 tests in the file into 3 separate files in prog_tests/;

* rename the test functions to test_<file_base_name>;

* remove unused includes, constants, variables and functions from every
  test;

* replace `if`-s with or `if (CHECK())` where additional context should
  be logged and with `if (CHECK_FAIL())` where line number is enough;

* switch from `log_err()` to logging via `CHECK()`;

* replace `assert`-s with `CHECK_FAIL()` to avoid crashing the whole
  test_progs if one assertion fails;

* replace cgroup_helpers with test__join_cgroup() in
  cgroup_attach_override only, other tests need more fine-grained
  control for cgroup creation/deletion so cgroup_helpers are still used
  there;

* simplify cgroup_attach_autodetach by switching to easiest possible
  program since this test doesn't really need such a complicated program
  as cgroup_attach_multi does;

* remove test_cgroup_attach.c itself.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/0ff19cc64d2dc5cf404349f07131119480e10e32.1576741281.git.rdna@fb.com

257c8855

libbpf: Introduce bpf_prog_attach_xattr · cdbee383

Andrey Ignatov authored Dec 18, 2019

Introduce a new bpf_prog_attach_xattr function that, in addition to
program fd, target fd and attach type, accepts an extendable struct
bpf_prog_attach_opts.

bpf_prog_attach_opts relies on DECLARE_LIBBPF_OPTS macro to maintain
backward and forward compatibility and has the following "optional"
attach attributes:

* existing attach_flags, since it's not required when attaching in NONE
  mode. Even though it's quite often used in MULTI and OVERRIDE mode it
  seems to be a good idea to reduce number of arguments to
  bpf_prog_attach_xattr;

* newly introduced attribute of BPF_PROG_ATTACH command: replace_prog_fd
  that is fd of previously attached cgroup-bpf program to replace if
  BPF_F_REPLACE flag is used.

The new function is named to be consistent with other xattr-functions
(bpf_prog_test_run_xattr, bpf_create_map_xattr, bpf_load_program_xattr).

The struct bpf_prog_attach_opts is supposed to be used with
DECLARE_LIBBPF_OPTS macro.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/bd6e0732303eb14e4b79cb128268d9e9ad6db208.1576741281.git.rdna@fb.com

cdbee383

bpf: Support replacing cgroup-bpf program in MULTI mode · 7dd68b32

Andrey Ignatov authored Dec 18, 2019

The common use-case in production is to have multiple cgroup-bpf
programs per attach type that cover multiple use-cases. Such programs
are attached with BPF_F_ALLOW_MULTI and can be maintained by different
people.

Order of programs usually matters, for example imagine two egress
programs: the first one drops packets and the second one counts packets.
If they're swapped the result of counting program will be different.

It brings operational challenges with updating cgroup-bpf program(s)
attached with BPF_F_ALLOW_MULTI since there is no way to replace a
program:

* One way to update is to detach all programs first and then attach the
new version(s) again in the right order. This introduces an
interruption in the work a program is doing and may not be acceptable
(e.g. if it's egress firewall);

* Another way is attach the new version of a program first and only then
detach the old version. This introduces the time interval when two
versions of same program are working, what may not be acceptable if a
program is not idempotent. It also imposes additional burden on
program developers to make sure that two versions of their program can
co-exist.

Solve the problem by introducing a "replace" mode in BPF_PROG_ATTACH
command for cgroup-bpf programs being attached with BPF_F_ALLOW_MULTI
flag. This mode is enabled by newly introduced BPF_F_REPLACE attach flag
and bpf_attr.replace_bpf_fd attribute to pass fd of the old program to
replace

That way user can replace any program among those attached with
BPF_F_ALLOW_MULTI flag without the problems described above.

Details of the new API:

* If BPF_F_REPLACE is set but replace_bpf_fd doesn't have valid
descriptor of BPF program, BPF_PROG_ATTACH will return corresponding
error (EINVAL or EBADF).

* If replace_bpf_fd has valid descriptor of BPF program but such a
program is not attached to specified cgroup, BPF_PROG_ATTACH will
return ENOENT.

BPF_F_REPLACE is introduced to make the user intent clear, since
replace_bpf_fd alone can't be used for this (its default value, 0, is a
valid fd). BPF_F_REPLACE also makes it possible to extend the API in the
future (e.g. add BPF_F_BEFORE and BPF_F_AFTER if needed).
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Andrii Narkyiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/30cd850044a0057bdfcaaf154b7d2f39850ba813.1576741281.git.rdna@fb.com

7dd68b32

bpf: Remove unused new_flags in hierarchy_allows_attach() · 9fab329d

Andrey Ignatov authored Dec 18, 2019

new_flags is unused, remove it.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/2c49b30ab750f93cfef04a1e40b097d70c3a39a1.1576741281.git.rdna@fb.com

9fab329d

bpf: Simplify __cgroup_bpf_attach · 1020c1f2

Andrey Ignatov authored Dec 18, 2019

__cgroup_bpf_attach has a lot of identical code to handle two scenarios:
BPF_F_ALLOW_MULTI is set and unset.

Simplify it by splitting the two main steps:

* First, the decision is made whether a new bpf_prog_list entry should
  be allocated or existing entry should be reused for the new program.
  This decision is saved in replace_pl pointer;

* Next, replace_pl pointer is used to handle both possible states of
  BPF_F_ALLOW_MULTI flag (set / unset) instead of doing similar work for
  them separately.

This splitting, in turn, allows to make further simplifications:

* The check for attaching same program twice in BPF_F_ALLOW_MULTI mode
  can be done before allocating cgroup storage, so that if user tries to
  attach same program twice no alloc/free happens as it was before;

* pl_was_allocated becomes redundant so it's removed.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/c6193db6fe630797110b0d3ff06c125d093b834c.1576741281.git.rdna@fb.com

1020c1f2

Merge branch 'simplify-do_redirect' · c92bbaa0

Alexei Starovoitov authored Dec 19, 2019

Björn Töpel says:

====================
This series aims to simplify the XDP maps and
xdp_do_redirect_map()/xdp_do_flush_map(), and to crank out some more
performance from XDP_REDIRECT scenarios.

The first part of the series simplifies all XDP_REDIRECT capable maps,
so that __XXX_flush_map() does not require the map parameter, by
moving the flush list from the map to global scope.

This results in that the map_to_flush member can be removed from
struct bpf_redirect_info, and its corresponding logic.

Simpler code, and more performance due to that checks/code per-packet
is moved to flush.

Pre-series performance:
  $ sudo taskset -c 22 ./xdpsock -i enp134s0f0 -q 20 -n 1 -r -z

   sock0@enp134s0f0:20 rxdrop xdp-drv
                  pps         pkts        1.00
  rx              20,797,350  230,942,399
  tx              0           0

  $ sudo ./xdp_redirect_cpu --dev enp134s0f0 --cpu 22 xdp_cpu_map0

  Running XDP/eBPF prog_name:xdp_cpu_map5_lb_hash_ip_pairs
  XDP-cpumap      CPU:to  pps            drop-pps    extra-info
  XDP-RX          20      7723038        0           0
  XDP-RX          total   7723038        0
  cpumap_kthread  total   0              0           0
  redirect_err    total   0              0
  xdp_exception   total   0              0

Post-series performance:
  $ sudo taskset -c 22 ./xdpsock -i enp134s0f0 -q 20 -n 1 -r -z

   sock0@enp134s0f0:20 rxdrop xdp-drv
                  pps         pkts        1.00
  rx              21,524,979  86,835,327
  tx              0           0

  $ sudo ./xdp_redirect_cpu --dev enp134s0f0 --cpu 22 xdp_cpu_map0

  Running XDP/eBPF prog_name:xdp_cpu_map5_lb_hash_ip_pairs
  XDP-cpumap      CPU:to  pps            drop-pps    extra-info
  XDP-RX          20      7840124        0           0
  XDP-RX          total   7840124        0
  cpumap_kthread  total   0              0           0
  redirect_err    total   0              0
  xdp_exception   total   0              0

Results: +3.5% and +1.5% for the ubenchmarks.

v1->v2 [1]:
  * Removed 'unused-variable' compiler warning (Jakub)

[1] https://lore.kernel.org/bpf/20191218105400.2895-1-bjorn.topel@gmail.com/
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

c92bbaa0

xdp: Simplify __bpf_tx_xdp_map() · 1170beaa

Björn Töpel authored Dec 19, 2019

The explicit error checking is not needed. Simply return the error
instead.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20191219061006.21980-9-bjorn.topel@gmail.com

1170beaa

xdp: Remove map_to_flush and map swap detection · 332f22a6

Björn Töpel authored Dec 19, 2019

Now that all XDP maps that can be used with bpf_redirect_map() tracks
entries to be flushed in a global fashion, there is not need to track
that the map has changed and flush from xdp_do_generic_map()
anymore. All entries will be flushed in xdp_do_flush_map().

This means that the map_to_flush can be removed, and the corresponding
checks. Moving the flush logic to one place, xdp_do_flush_map(), give
a bulking behavior and performance boost.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20191219061006.21980-8-bjorn.topel@gmail.com

332f22a6

xdp: Make cpumap flush_list common for all map instances · cdfafe98

Björn Töpel authored Dec 19, 2019

The cpumap flush list is used to track entries that need to flushed
from via the xdp_do_flush_map() function. This list used to be
per-map, but there is really no reason for that. Instead make the
flush list global for all devmaps, which simplifies __cpu_map_flush()
and cpu_map_alloc().
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20191219061006.21980-7-bjorn.topel@gmail.com

cdfafe98

xdp: Make devmap flush_list common for all map instances · 96360004

Björn Töpel authored Dec 19, 2019

The devmap flush list is used to track entries that need to flushed
from via the xdp_do_flush_map() function. This list used to be
per-map, but there is really no reason for that. Instead make the
flush list global for all devmaps, which simplifies __dev_map_flush()
and dev_map_init_map().
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20191219061006.21980-6-bjorn.topel@gmail.com

96360004

xsk: Make xskmap flush_list common for all map instances · e312b9e7

Björn Töpel authored Dec 19, 2019

The xskmap flush list is used to track entries that need to flushed
from via the xdp_do_flush_map() function. This list used to be
per-map, but there is really no reason for that. Instead make the
flush list global for all xskmaps, which simplifies __xsk_map_flush()
and xsk_map_alloc().
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20191219061006.21980-5-bjorn.topel@gmail.com

e312b9e7

xdp: Fix graze->grace type-o in cpumap comments · fb5aacdf

Björn Töpel authored Dec 19, 2019

Simple spelling fix.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20191219061006.21980-4-bjorn.topel@gmail.com

fb5aacdf

xdp: Simplify cpumap cleanup · 4bc188c7

Björn Töpel authored Dec 19, 2019

After the RCU flavor consolidation [1], call_rcu() and
synchronize_rcu() waits for preempt-disable regions (NAPI) in addition
to the read-side critical sections. As a result of this, the cleanup
code in cpumap can be simplified

* There is no longer a need to flush in __cpu_map_entry_free, since we
  know that this has been done when the call_rcu() callback is
  triggered.

* When freeing the map, there is no need to explicitly wait for a
  flush. It's guaranteed to be done after the synchronize_rcu() call
  in cpu_map_free().

[1] https://lwn.net/Articles/777036/Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20191219061006.21980-3-bjorn.topel@gmail.com

4bc188c7

xdp: Simplify devmap cleanup · 0536b852

Björn Töpel authored Dec 19, 2019

After the RCU flavor consolidation [1], call_rcu() and
synchronize_rcu() waits for preempt-disable regions (NAPI) in addition
to the read-side critical sections. As a result of this, the cleanup
code in devmap can be simplified

* There is no longer a need to flush in __dev_map_entry_free, since we
  know that this has been done when the call_rcu() callback is
  triggered.

* When freeing the map, there is no need to explicitly wait for a
  flush. It's guaranteed to be done after the synchronize_rcu() call
  in dev_map_free(). The rcu_barrier() is still needed, so that the
  map is not freed prior the elements.

[1] https://lwn.net/Articles/777036/Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20191219061006.21980-2-bjorn.topel@gmail.com

0536b852

19 Dec, 2019 23 commits

bpf: Remove unnecessary assertion on fp_old · 5bf2fc1f

Aditya Pakki authored Dec 19, 2019

The two callers of bpf_prog_realloc - bpf_patch_insn_single and
bpf_migrate_filter dereference the struct fp_old, before passing
it to the function. Thus assertion to check fp_old is unnecessary
and can be removed.
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191219175735.19231-1-pakki001@umn.edu

5bf2fc1f

libbpf: Fix another __u64 printf warning · 7745ff98

Andrii Nakryiko authored Dec 18, 2019

Fix yet another printf warning for %llu specifier on ppc64le. This time size_t
casting won't work, so cast to verbose `unsigned long long`.

Fixes: 166750bc ("libbpf: Support libbpf-provided extern variables")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191219052103.3515-1-andriin@fb.com

7745ff98

libbpf: Fix printing of ulimit value · b5c7d0d0

Toke Høiland-Jørgensen authored Dec 19, 2019

Naresh pointed out that libbpf builds fail on 32-bit architectures because
rlimit.rlim_cur is defined as 'unsigned long long' on those architectures.
Fix this by using %zu in printf and casting to size_t.

Fixes: dc3a2d25 ("libbpf: Print hint about ulimit when getting permission denied error")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191219090236.905059-1-toke@redhat.com

b5c7d0d0

selftests/bpf: Fix test_attach_probe · 580205dd

Alexei Starovoitov authored Dec 18, 2019

Fix two issues in test_attach_probe:

1. it was not able to parse /proc/self/maps beyond the first line,
   since %s means parse string until white space.
2. offset has to be accounted for otherwise uprobed address is incorrect.

Fixes: 1e8611bb ("selftests/bpf: add kprobe/uprobe selftests")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20191219020442.1922617-1-ast@kernel.org

580205dd

libbpf: Add missing newline in opts validation macro · 12dd14b2

Toke Høiland-Jørgensen authored Dec 19, 2019

The error log output in the opts validation macro was missing a newline.

Fixes: 2ce8450e ("libbpf: add bpf_object__open_{file, mem} w/ extensible opts")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191219120714.928380-1-toke@redhat.com

12dd14b2

Merge branch 'bpf-riscv-jit-improvements' · 7800a3d5

Daniel Borkmann authored Dec 19, 2019

Björn Töpel says:

====================
This series contain one non-critical fix, support for far jumps, and
some optimizations for the BPF JIT.

Previously, the JIT only supported 12b branch targets for conditional
branches, and 21b for unconditional branches. Starting with this
series, 32b branching is supported.

As part of supporting far jumps, branch relaxation was introduced. The
idea is to start with a pessimistic jump (e.g. auipc/jalr) and for
each pass the JIT will have an opportunity to pick a better
instruction (e.g. jal) and shrink the image. Instead of two passes,
the JIT requires more passes. It typically converges after 3 passes.

The optimizations mentioned in the subject are for calls and tail
calls. In the tail call generation we can save one instruction by
using the offset in jalr. Calls are optimized by doing (auipc)/jal(r)
relative jumps instead of loading the entire absolute address and
doing jalr. This required that the JIT image allocator was made RISC-V
specific, so we can ensure that the JIT image and the kernel text are
in range (32b).

The last two patches of the series is not critical to the series, but
are two UAPI build issues for BPF events. A closer look from the
RV-folks would be much appreciated.

The test_bpf.ko module, selftests/bpf/test_verifier and
selftests/seccomp/seccomp_bpf pass all tests.

RISC-V is still missing proper kprobe and tracepoint support, so a lot
of BPF selftests cannot be run.

v1->v2: [1]
 * Removed unused function parameter from emit_branch()
 * Added patch to support far branch in tail call emit

[1] https://lore.kernel.org/bpf/20191209173136.29615-1-bjorn.topel@gmail.com/
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

7800a3d5

riscv, perf: Add arch specific perf_arch_bpf_user_pt_regs · 34bfc10a

Björn Töpel authored Dec 16, 2019

RISC-V was missing a proper perf_arch_bpf_user_pt_regs macro for
CONFIG_PERF_EVENT builds.
Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191216091343.23260-10-bjorn.topel@gmail.com

34bfc10a

riscv, bpf: Add missing uapi header for BPF_PROG_TYPE_PERF_EVENT programs · eb9928be

Björn Töpel authored Dec 16, 2019

Add missing uapi header the BPF_PROG_TYPE_PERF_EVENT programs by
exporting struct user_regs_struct instead of struct pt_regs which is
in-kernel only.
Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191216091343.23260-9-bjorn.topel@gmail.com

eb9928be

riscv, bpf: Optimize calls · e368b64f

Björn Töpel authored Dec 16, 2019

Instead of using emit_imm() and emit_jalr() which can expand to six
instructions, start using jal or auipc+jalr.
Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191216091343.23260-8-bjorn.topel@gmail.com

e368b64f

riscv, bpf: Provide RISC-V specific JIT image alloc/free · 7f3631e8

Björn Töpel authored Dec 16, 2019

This commit makes sure that the JIT images is kept close to the kernel
text, so BPF calls can use relative calling with auipc/jalr or jal
instead of loading the full 64-bit address and jalr.

The BPF JIT image region is 128 MB before the kernel text.
Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191216091343.23260-7-bjorn.topel@gmail.com

7f3631e8

riscv, bpf: Optimize BPF tail calls · fe8322b8

Björn Töpel authored Dec 16, 2019

Remove one addi, and instead use the offset part of jalr.
Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191216091343.23260-6-bjorn.topel@gmail.com

fe8322b8

riscv, bpf: Add support for far jumps and exits · 33203c02

Björn Töpel authored Dec 16, 2019

This commit add support for far (offset > 21b) jumps and exits.
Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Luke Nelson <lukenels@cs.washington.edu>
Link: https://lore.kernel.org/bpf/20191216091343.23260-5-bjorn.topel@gmail.com

33203c02

riscv, bpf: Add support for far branching when emitting tail call · 29d92edd

Björn Töpel authored Dec 16, 2019

Start use the emit_branch() function in the tail call emitter in order
to support far branching.
Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191216091343.23260-4-bjorn.topel@gmail.com

29d92edd

riscv, bpf: Add support for far branching · 7d1ef13f

Björn Töpel authored Dec 16, 2019

This commit adds branch relaxation to the BPF JIT, and with that
support for far (offset greater than 12b) branching.

The branch relaxation requires more than two passes to converge. For
most programs it is three passes, but for larger programs it can be
more.
Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Luke Nelson <lukenels@cs.washington.edu>
Link: https://lore.kernel.org/bpf/20191216091343.23260-3-bjorn.topel@gmail.com

7d1ef13f

riscv, bpf: Fix broken BPF tail calls · f1003b78

Björn Töpel authored Dec 16, 2019

The BPF JIT incorrectly clobbered the a0 register, and did not flag
usage of s5 register when BPF stack was being used.

Fixes: 2353ecc6 ("bpf, riscv: add BPF JIT for RV64G")
Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191216091343.23260-2-bjorn.topel@gmail.com

f1003b78

Merge branch 'libbpf-extern-followups' · a352a824

Alexei Starovoitov authored Dec 18, 2019

Andrii Nakryiko says:

====================
Based on latest feedback and discussions, this patch set implements the
following changes:

- Kconfig-provided externs have to be in .kconfig section, for which
  bpf_helpers.h provides convenient __kconfig macro (Daniel);
- instead of allowing to override Kconfig file path, switch this to ability to
  extend and override system Kconfig with user-provided custom values (Alexei);
- BTF is required when externs are used.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

a352a824

libbpf: BTF is required when externs are present · 630628cb

Andrii Nakryiko authored Dec 18, 2019

BTF is required to get type information about extern variables.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191219002837.3074619-4-andriin@fb.com

630628cb

libbpf: Allow to augment system Kconfig through extra optional config · 8601fd42

Andrii Nakryiko authored Dec 18, 2019

Instead of all or nothing approach of overriding Kconfig file location, allow
to extend it with extra values and override chosen subset of values though
optional user-provided extra config, passed as a string through open options'
.kconfig option. If same config key is present in both user-supplied config
and Kconfig, user-supplied one wins. This allows applications to more easily
test various conditions despite host kernel's real configuration. If all of
BPF object's __kconfig externs are satisfied from user-supplied config, system
Kconfig won't be read at all.

Simplify selftests by not needing to create temporary Kconfig files.
Suggested-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191219002837.3074619-3-andriin@fb.com

8601fd42

libbpf: Put Kconfig externs into .kconfig section · 81bfdd08

Andrii Nakryiko authored Dec 18, 2019

Move Kconfig-provided externs into custom .kconfig section. Add __kconfig into
bpf_helpers.h for user convenience. Update selftests accordingly.
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191219002837.3074619-2-andriin@fb.com

81bfdd08

libbpf: Add bpf_link__disconnect() API to preserve underlying BPF resource · d6958706

Andrii Nakryiko authored Dec 18, 2019

There are cases in which BPF resource (program, map, etc) has to outlive
userspace program that "installed" it in the system in the first place.
When BPF program is attached, libbpf returns bpf_link object, which
is supposed to be destroyed after no longer necessary through
bpf_link__destroy() API. Currently, bpf_link destruction causes both automatic
detachment and frees up any resources allocated to for bpf_link in-memory
representation. This is inconvenient for the case described above because of
coupling of detachment and resource freeing.

This patch introduces bpf_link__disconnect() API call, which marks bpf_link as
disconnected from its underlying BPF resouces. This means that when bpf_link
is destroyed later, all its memory resources will be freed, but BPF resource
itself won't be detached.

This design allows to follow strict and resource-leak-free design by default,
while giving easy and straightforward way for user code to opt for keeping BPF
resource attached beyond lifetime of a bpf_link. For some BPF programs (i.e.,
FS-based tracepoints, kprobes, raw tracepoint, etc), user has to make sure to
pin BPF program to prevent kernel to automatically detach it on process exit.
This should typically be achived by pinning BPF program (or map in some cases)
in BPF FS.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20191218225039.2668205-1-andriin@fb.com

d6958706

bpf: Allow to change skb mark in test_run · 6de6c1f8

Nikita V. Shirokov authored Dec 18, 2019

allow to pass skb's mark field into bpf_prog_test_run ctx
for BPF_PROG_TYPE_SCHED_CLS prog type. that would allow
to test bpf programs which are doing decision based on this
field
Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

6de6c1f8

bpftool: Work-around rst2man conversion bug · dacce641

Andrii Nakryiko authored Dec 18, 2019

Work-around what appears to be a bug in rst2man convertion tool, used to
create man pages out of reStructureText-formatted documents. If text line
starts with dot, rst2man will put it in resulting man file verbatim. This
seems to cause man tool to interpret it as a directive/command (e.g., `.bs`), and
subsequently not render entire line because it's unrecognized one.

Enclose '.xxx' words in extra formatting to work around.

Fixes: cb21ac58 ("bpftool: Add gen subcommand manpage")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com
Link: https://lore.kernel.org/bpf/20191218221707.2552199-1-andriin@fb.com

dacce641

bpftool: Simplify format string to not use positional args · 7c43e0d6

Andrii Nakryiko authored Dec 18, 2019

Change format string referring to just single argument out of two available.
Some versions of libc can reject such format string.
Reported-by: Nikita Shirokov <tehnerd@tehnerd.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20191218214314.2403729-1-andriin@fb.com

7c43e0d6