Commits · 680ee0456a5712309db9ec2692e908ea1d6b1644 · Kirill Smelkov / linux

03 Aug, 2023 3 commits

net: invert the netdevice.h vs xdp.h dependency · 680ee045

Jakub Kicinski authored Aug 02, 2023

xdp.h is far more specific and is included in only 67 other
files vs netdevice.h's 1538 include sites.
Make xdp.h include netdevice.h, instead of the other way around.
This reduces the size of an incremental allmodconfig build when
xdp.h is touched from 5947 to 662 objects.

Move bpf_prog_run_xdp() to xdp.h, which seems appropriate. filter.h
is a mega-header in its own right, so it is nice to avoid xdp.h
getting included there as well.

The only unfortunate part is that the typedef for xdp_features_t
has to move to netdevice.h, since it's embedded in struct net_device.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230803010230.1755386-4-kuba@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

net: move struct netdev_rx_queue out of netdevice.h · 49e47a5b

Jakub Kicinski authored Aug 02, 2023

struct netdev_rx_queue is touched in only a few places
and having it defined in netdevice.h brings in the dependency
on xdp.h, because struct xdp_rxq_info gets embedded in
struct netdev_rx_queue.

In prep for removal of xdp.h from netdevice.h move all
the netdev_rx_queue stuff to a new header.

We could technically break the new header up to avoid
the sysfs.h include but it's so rarely included it
doesn't seem to be worth it at this point.
Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230803010230.1755386-3-kuba@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

eth: add missing xdp.h includes in drivers · 92272ec4

Jakub Kicinski authored Aug 02, 2023

A handful of drivers currently expect to get xdp.h by virtue
of including netdevice.h. This will soon no longer be the case,
so add explicit includes.
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230803010230.1755386-2-kuba@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

02 Aug, 2023 7 commits

Merge branch 'bpf-xdp-add-tracepoint-to-xdp-attaching-failure' · 87dc2bb3

Alexei Starovoitov authored Aug 02, 2023

Leon Hwang says:

====================
bpf, xdp: Add tracepoint to xdp attaching failure

This series introduces a new tracepoint in bpf_xdp_link_attach(). With
this tracepoint, the error message is captured when an error occurs in
dev_xdp_attach(), e.g. invalid attach flags.

v4 -> v5:
* Initialise the extack variable.
* Fix code style issue of variable declaration lines.

v3 -> v4:
* Fix selftest-crashed issue.
====================

Link: https://lore.kernel.org/r/20230801142621.7925-1-hffilwlqm@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add testcase for xdp attaching failure tracepoint · 7fedbf32

Leon Hwang authored Aug 01, 2023

Add a test case for the XDP attach-failure tracepoint: a BPF tracepoint
program observes the failure when XDP is attached to a device with an
invalid flags option.

The BPF tracepoint program retrieves the error message from the
tracepoint and writes it to a perf buffer. The test code reads the error
message from the perf buffer and asserts that it is "Invalid XDP flags
for BPF link attachment".
Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
Link: https://lore.kernel.org/r/20230801142621.7925-3-hffilwlqm@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf, xdp: Add tracepoint to xdp attaching failure · bf4ea1d0

Leon Hwang authored Aug 01, 2023

When an error happens in dev_xdp_attach(), there should be a way to
report the error message to users, as the netlink approach does.

To avoid breaking uapi, adding a tracepoint in bpf_xdp_link_attach() is
an appropriate way to deliver the error message.

BPF libraries can then retrieve the error message from this
tracepoint and report it to users.
Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
Link: https://lore.kernel.org/r/20230801142621.7925-2-hffilwlqm@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: fix static assert compilation issue for test_cls_*.c · 416c6d01

Alan Maguire authored Aug 02, 2023

commit bdeeed34 ("libbpf: fix offsetof() and container_of() to work with CO-RE")

...was backported to stable trees such as 5.15. The problem is that with older
LLVM/clang (14/15) - which is often used for older kernels - we see compilation
failures in BPF selftests now:

In file included from progs/test_cls_redirect_subprogs.c:2:
progs/test_cls_redirect.c:90:2: error: static assertion expression is not an integral constant expression
        sizeof(flow_ports_t) !=
        ^~~~~~~~~~~~~~~~~~~~~~~
progs/test_cls_redirect.c:91:3: note: cast that performs the conversions of a reinterpret_cast is not allowed in a constant expression
                offsetofend(struct bpf_sock_tuple, ipv4.dport) -
                ^
progs/test_cls_redirect.c:32:3: note: expanded from macro 'offsetofend'
        (offsetof(TYPE, MEMBER) + sizeof((((TYPE *)0)->MEMBER)))
         ^
tools/testing/selftests/bpf/tools/include/bpf/bpf_helpers.h:86:33: note: expanded from macro 'offsetof'
                                 ^
In file included from progs/test_cls_redirect_subprogs.c:2:
progs/test_cls_redirect.c:95:2: error: static assertion expression is not an integral constant expression
        sizeof(flow_ports_t) !=
        ^~~~~~~~~~~~~~~~~~~~~~~
progs/test_cls_redirect.c:96:3: note: cast that performs the conversions of a reinterpret_cast is not allowed in a constant expression
                offsetofend(struct bpf_sock_tuple, ipv6.dport) -
                ^
progs/test_cls_redirect.c:32:3: note: expanded from macro 'offsetofend'
        (offsetof(TYPE, MEMBER) + sizeof((((TYPE *)0)->MEMBER)))
         ^
tools/testing/selftests/bpf/tools/include/bpf/bpf_helpers.h:86:33: note: expanded from macro 'offsetof'
                                 ^
2 errors generated.
make: *** [Makefile:594: tools/testing/selftests/bpf/test_cls_redirect_subprogs.bpf.o] Error 1

The problem is the new offsetof() does not play nice with static asserts.
Given that the context is a static assert (and CO-RE relocation is not
needed at compile time), offsetof() usage can be replaced by restoring
the original offsetof() definition as __builtin_offsetof().

Fixes: bdeeed34 ("libbpf: fix offsetof() and container_of() to work with CO-RE")
Reported-by: Colm Harrington <colm.harrington@oracle.com>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Tested-by: Yipeng Zou <zouyipeng@huawei.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230802073906.3197480-1-alan.maguire@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
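
The static-assert approach described in the commit above can be tried outside the BPF tree. The following is a minimal sketch, not the selftest code: tuple_sketch and ports_sketch_t are hypothetical stand-ins for struct bpf_sock_tuple and flow_ports_t, and offsetof_builtin and ports_span() are illustrative names. It assumes a GCC- or clang-compatible compiler, since __builtin_offsetof() is a compiler builtin.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct bpf_sock_tuple; not the real UAPI layout. */
struct tuple_sketch {
	struct {
		uint32_t saddr;
		uint32_t daddr;
		uint16_t sport;
		uint16_t dport;
	} ipv4;
};

/* Hypothetical stand-in for flow_ports_t. */
typedef struct {
	uint16_t src;
	uint16_t dst;
} ports_sketch_t;

/* The builtin form folds to an integer constant at compile time. */
#define offsetof_builtin(TYPE, MEMBER) __builtin_offsetof(TYPE, MEMBER)

/* Same shape as the offsetofend() macro quoted in the compiler output. */
#define offsetofend(TYPE, MEMBER) \
	(offsetof_builtin(TYPE, MEMBER) + sizeof((((TYPE *)0)->MEMBER)))

/* Compiles only because both operands are integral constant expressions. */
_Static_assert(sizeof(ports_sketch_t) ==
	       offsetofend(struct tuple_sketch, ipv4.dport) -
	       offsetof_builtin(struct tuple_sketch, ipv4.sport),
	       "ports type must cover the sport..dport span");

/* Run-time view of the same span, for inspection. */
size_t ports_span(void)
{
	return offsetofend(struct tuple_sketch, ipv4.dport) -
	       offsetof_builtin(struct tuple_sketch, ipv4.sport);
}
```

Because no CO-RE relocation is involved, the assertion is evaluated entirely by the compiler; ports_span() only exists so the value can be checked at run time as well.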

bpf: fix bpf_probe_read_kernel prototype mismatch · 6a5a148a

Arnd Bergmann authored Aug 01, 2023

bpf_probe_read_kernel() has a __weak definition in core.c and another
definition with an incompatible prototype in kernel/trace/bpf_trace.c,
when CONFIG_BPF_EVENTS is enabled.

Since the two are incompatible, there cannot be a shared declaration in
a header file, but the lack of a prototype causes a W=1 warning:

kernel/bpf/core.c:1638:12: error: no previous prototype for 'bpf_probe_read_kernel' [-Werror=missing-prototypes]

On 32-bit architectures, the local prototype

u64 __weak bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr)

passes arguments in different registers than the one in bpf_trace.c

BPF_CALL_3(bpf_probe_read_kernel, void *, dst, u32, size,
const void *, unsafe_ptr)

which uses 64-bit arguments in pairs of registers.

As both versions of the function are fairly simple and only really
differ in one line, just move them into a header file as an inline
function that does not add any overhead for the bpf_trace.c callers
and actually avoids a function call for the other one.

Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/ac25cb0f-b804-1649-3afb-1dc6138c2716@iogearbox.net/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230801111449.185301-1-arnd@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

riscv, bpf: Adapt bpf trampoline to optimized riscv ftrace framework · 25ad1065

Pu Lehui authored Jul 21, 2023

Commit 6724a76c ("riscv: ftrace: Reduce the detour code size to
half") halves the detour code size of kernel functions by using the
T0 register, and the upcoming DYNAMIC_FTRACE_WITH_DIRECT_CALLS support
for riscv builds on this optimization, so the riscv bpf trampoline
needs to be adapted accordingly. First, reduce the detour code size of
bpf programs. Second, handle the return address after the bpf
trampoline has executed. In addition, construct the frame of the
parent function; otherwise one layer is missed when unwinding.
The related tests have passed.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230721100627.2630326-1-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

libbpf: fix typos in Makefile · 94e38c95

Randy Dunlap authored Jul 21, 2023

Capitalize ABI (acronym) and fix spelling of "destination".

Fixes: 70681949 ("libbpf: Improve usability of libbpf Makefile")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: bpf@vger.kernel.org
Cc: Xin Liu <liuxin350@huawei.com>
Link: https://lore.kernel.org/r/20230722065236.17010-1-rdunlap@infradead.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

01 Aug, 2023 6 commits

tracing: bpf: use struct trace_entry in struct syscall_tp_t · d3c4db86

Yauheni Kaliuta authored Aug 01, 2023

A bpf tracepoint program takes struct trace_event_raw_sys_enter as
its argument, where trace_entry is the first field. Use the same type
in struct syscall_tp_t instead of unsigned long long: if trace_entry is
amended (for example by the RT patch), the program accesses data at the
wrong offset.
Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230801075222.7717-1-ykaliuta@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
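
The offset problem this commit describes can be illustrated in plain C. This is a sketch under assumptions, not the kernel's definitions: every struct and field name below is hypothetical, and entry_sketch models a trace_entry that a patch has grown beyond 8 bytes. It compares a layout that hard-codes an 8-byte placeholder against one that embeds the header type itself.

```c
#include <stddef.h>

/* Hypothetical trace_entry that an out-of-tree patch has grown. */
struct entry_sketch {
	unsigned short type;
	unsigned char flags;
	unsigned char preempt_count;
	int pid;
	int extra[2];	/* hypothetical added fields */
};

/* Hypothetical model of the record the tracepoint hands to the program. */
struct raw_sys_enter_sketch {
	struct entry_sketch ent;
	long id;
	unsigned long args[6];
};

/* Old style: a fixed 8-byte placeholder stands in for the header. */
struct tp_hardcoded {
	unsigned long long placeholder;
	long syscall_nr;
	unsigned long args[6];
};

/* New style: embed the header type, so later offsets follow its size. */
struct tp_embedded {
	struct entry_sketch ent;
	long syscall_nr;
	unsigned long args[6];
};

size_t raw_id_offset(void)
{
	return offsetof(struct raw_sys_enter_sketch, id);
}

size_t hardcoded_nr_offset(void)
{
	return offsetof(struct tp_hardcoded, syscall_nr);
}

size_t embedded_nr_offset(void)
{
	return offsetof(struct tp_embedded, syscall_nr);
}
```

With the enlarged header, hardcoded_nr_offset() no longer matches raw_id_offset(), while embedded_nr_offset() still does, which is the mismatch the commit removes.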

Merge branch 'Remove unused fields in cpumap & devmap' · 11108652

Martin KaFai Lau authored Jul 31, 2023

Hou Tao says:

====================
Patchset "Simplify xdp_do_redirect_map()/xdp_do_flush_map() and XDP
maps" [0] changed the per-map flush list to a global per-cpu flush list
for cpumap, devmap and xskmap, but it did not remove the now-unused
fields from cpumap and devmap. So just remove these unused fields.

Comments and suggestions are always welcome.

[0]: https://lore.kernel.org/bpf/20191219061006.21980-1-bjorn.topel@gmail.com
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

bpf, devmap: Remove unused dtab field from bpf_dtab_netdev · 1ea66e89

Hou Tao authored Jul 28, 2023

Commit 96360004 ("xdp: Make devmap flush_list common for all map
instances") removes the use of bpf_dtab_netdev::dtab in bq_enqueue(),
so just remove dtab from bpf_dtab_netdev.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230728014942.892272-3-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

bpf, cpumap: Remove unused cmap field from bpf_cpu_map_entry · 2d20bfc3

Hou Tao authored Jul 28, 2023

Since commit cdfafe98 ("xdp: Make cpumap flush_list common for all
map instances"), cmap is no longer used, so just remove it.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230728014942.892272-2-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

netfilter: bpf: Only define get_proto_defrag_hook() if necessary · 81584c23

Daniel Xu authored Jul 31, 2023

Before, we were getting this warning:

net/netfilter/nf_bpf_link.c:32:1: warning: 'get_proto_defrag_hook' defined but not used [-Wunused-function]

Guard the definition with CONFIG_NF_DEFRAG_IPV[4|6].

Fixes: 91721c2d ("netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307291213.fZ0zDmoG-lkp@intel.com/
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/b128b6489f0066db32c4772ae4aaee1480495929.1690840454.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
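
The guard pattern used for this warning can be sketched in userspace C. This is an illustration only: SKETCH_DEFRAG_IPV4 and SKETCH_DEFRAG_IPV6 are hypothetical macros standing in for the Kconfig options, and both function names are invented for the example. The point is that the static helper is compiled only when at least one of its callers is, so -Wunused-function has nothing to report.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the Kconfig symbols; toggle to experiment. */
#define SKETCH_DEFRAG_IPV4 1
/* #define SKETCH_DEFRAG_IPV6 1 */

#if defined(SKETCH_DEFRAG_IPV4) || defined(SKETCH_DEFRAG_IPV6)
/* Defined only when at least one of the callers below is compiled in. */
static const char *defrag_module_sketch(int family)
{
	return family == 4 ? "nf_defrag_ipv4" : "nf_defrag_ipv6";
}
#endif

/* Returns the defrag module name for a family, or NULL if not built in. */
const char *defrag_module_for(int family)
{
#if defined(SKETCH_DEFRAG_IPV4)
	if (family == 4)
		return defrag_module_sketch(family);
#endif
#if defined(SKETCH_DEFRAG_IPV6)
	if (family == 6)
		return defrag_module_sketch(family);
#endif
	return NULL;
}
```

Commenting out both macros removes the helper and every call to it together, so the file still builds cleanly with -Wunused-function enabled.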

bpf: Fix an array-index-out-of-bounds issue in disasm.c · e99688eb

Yonghong Song authored Jul 31, 2023

syzbot reported an array-index-out-of-bounds when printing out bpf
insns. Further investigation shows the insn is illegal but
is printed out due to log level 1 or 2 before actual insn verification
in do_check().

This particular illegal insn is a MOVSX insn with offset value 2.
The legal offset values for MOVSX are 8, 16 and 32.
The disasm sign-extension-size array index is calculated as
 (insn->off / 8) - 1
and offset value 2 gives an out-of-bound index -1.

Tighten the checking for MOVSX insn in disasm.c to avoid
array-index-out-of-bounds issue.

Reported-by: syzbot+3758842a6c01012aa73b@syzkaller.appspotmail.com
Fixes: f835bb62 ("bpf: Add kernel/bpftool asm support for new instructions")
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230731204534.1975311-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
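
The index arithmetic from this commit message can be checked with a small standalone sketch. The table and function names here are hypothetical and are not the contents of disasm.c; the sketch only assumes what the message states, namely that the index is (off / 8) - 1 and that 8, 16 and 32 are the legal offsets.

```c
#include <stddef.h>

/*
 * Index is (off / 8) - 1, so 8 -> 0, 16 -> 1, 32 -> 3.
 * Slot 2 stays NULL because 24 is not a legal offset.
 */
static const char *const movsx_size_str[4] = {
	[0] = "(s8)",
	[1] = "(s16)",
	[3] = "(s32)",
};

/* Returns the sign-extension label, or NULL for an illegal offset. */
const char *movsx_label(int off)
{
	/* Reject anything else first: off == 2 would otherwise index -1. */
	if (off != 8 && off != 16 && off != 32)
		return NULL;
	return movsx_size_str[off / 8 - 1];
}
```

Checking the offset before computing the index means an unverified instruction with offset 2 is reported as invalid instead of reading one element before the start of the table.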

31 Jul, 2023 1 commit

net: remove duplicate INDIRECT_CALLABLE_DECLARE of udp[6]_ehashfn · 74bdfab4

Lorenz Bauer authored Jul 31, 2023

There are already INDIRECT_CALLABLE_DECLARE in the hashtable
headers, no need to declare them again.

Fixes: 0f495f76 ("net: remove duplicate reuseport_lookup functions")
Suggested-by: Martin Lau <martin.lau@linux.dev>
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230731-indir-call-v1-1-4cd0aeaee64f@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

30 Jul, 2023 1 commit

docs/bpf: Fix malformed documentation · fb213ecb

Yonghong Song authored Jul 29, 2023

Two issues are fixed:
1. Malformed table due to newly-introduced BPF_MOVSX
2. Missing reference link for ``Sign-extension load operations``

Fixes: 245d4c40 ("docs/bpf: Add documentation for new instructions")
Cc: bpf@ietf.org
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307291840.Cqhj7uox-lkp@intel.com/
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230730004251.381307-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

28 Jul, 2023 22 commits

Merge branch 'support-defragmenting-ipv-4-6-packets-in-bpf' · eb03993a

Alexei Starovoitov authored Jul 28, 2023

Daniel Xu says:

====================
Support defragmenting IPv(4|6) packets in BPF

=== Context ===

In the context of a middlebox, fragmented packets are tricky to handle.
The full 5-tuple of a packet is often only available in the first
fragment which makes enforcing consistent policy difficult. There are
really only two stateless options, neither of which are very nice:

1. Enforce policy on first fragment and accept all subsequent fragments.
   This works but may let in certain attacks or allow data exfiltration.

2. Enforce policy on first fragment and drop all subsequent fragments.
   This does not really work because some protocols may rely on
   fragmentation. For example, DNS may rely on oversized UDP packets for
   large responses.

So stateful tracking is the only sane option. RFC 8900 [0] calls this
out as well in section 6.3:

    Middleboxes [...] should process IP fragments in a manner that is
    consistent with [RFC0791] and [RFC8200]. In many cases, middleboxes
    must maintain state in order to achieve this goal.

=== BPF related bits ===

Policy has traditionally been enforced from XDP/TC hooks. Both hooks
run before kernel reassembly facilities. However, with the new
BPF_PROG_TYPE_NETFILTER, we can rather easily hook into existing
netfilter reassembly infra.

The basic idea is we bump a refcnt on the netfilter defrag module and
then run the bpf prog after the defrag module runs. This allows bpf
progs to transparently see full, reassembled packets. The nice thing
about this is that progs don't have to carry around logic to detect
fragments.

=== Changelog ===

Changes from v5:

* Fix defrag disable codepaths

Changes from v4:

* Refactor module handling code to not sleep in rcu_read_lock()
* Also unify the v4 and v6 hook structs so they can share codepaths
* Fixed some checkpatch.pl formatting warnings

Changes from v3:

* Correctly initialize `addrlen` stack var for recvmsg()

Changes from v2:

* module_put() if ->enable() fails
* Fix CI build errors

Changes from v1:

* Drop bpf_program__attach_netfilter() patches
* static -> static const where appropriate
* Fix callback assignment order during registration
* Only request_module() if callbacks are missing
* Fix retval when modprobe fails in userspace
* Fix v6 defrag module name (nf_defrag_ipv6_hooks -> nf_defrag_ipv6)
* Simplify priority checking code
* Add warning if module doesn't assign callbacks in the future
* Take refcnt on module while defrag link is active

[0]: https://datatracker.ietf.org/doc/html/rfc8900
====================

Link: https://lore.kernel.org/r/cover.1689970773.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: selftests: Add defrag selftests · c313eae7