Commits · 6d26d985eeda89faedabbcf6607c37454b9691b0 · Kirill Smelkov / linux

22 Apr, 2023 7 commits

bpf: fix link failure with NETFILTER=y INET=n · 6d26d985

Florian Westphal authored Apr 22, 2023

Explicitly check if NETFILTER_BPF_LINK is enabled, else configs
that have NETFILTER=y but CONFIG_INET=n fail to link:

> kernel/bpf/syscall.o: undefined reference to `netfilter_prog_ops'
> kernel/bpf/verifier.o: undefined reference to `netfilter_verifier_ops'

Fixes: fd9c663b ("bpf: minimal support for programs hooked into netfilter framework")
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202304220903.fRZTJtxe-lkp@intel.com/
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230422073544.17634-1-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
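
The failure mode can be reproduced in miniature. The sketch below is a
user-space illustration, not the kernel change itself: the CONFIG_* macros
are defined by hand and prog_ops_registered() is a helper invented for this
example. It shows why a reference has to be guarded by the same option that
builds the definition; guarding it on CONFIG_NETFILTER alone would leave a
reference with no definition behind it.

```c
/* Stand-in for a config with NETFILTER=y where the (assumed) dependent
 * option NETFILTER_BPF_LINK ends up disabled. */
#define CONFIG_NETFILTER 1
/* #define CONFIG_NETFILTER_BPF_LINK 1 */

#ifdef CONFIG_NETFILTER_BPF_LINK
/* Only built when the BPF link support itself is enabled. */
const int netfilter_prog_ops = 1;
#endif

/* Hypothetical helper: reports whether the ops table gets referenced. */
int prog_ops_registered(void)
{
#ifdef CONFIG_NETFILTER_BPF_LINK	/* same guard as the definition */
	return netfilter_prog_ops;
#else
	return 0;
#endif
}
```

With the guard matching the definition, the disabled configuration simply
reports that no ops table is registered instead of failing to build.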


Merge tag 'mlx5-updates-2023-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · fbc1449d

Jakub Kicinski authored Apr 21, 2023

Saeed Mahameed says:

====================
mlx5-updates-2023-04-20

1) Dragos improves the RX page pool and provides some fixes to his
   previous series:
 1.1) Fix releasing page_pool for striding RQ and legacy RQ nonlinear case
 1.2) Hook NAPIs to page pools to gain more performance.

2) From Roi, some cleanups to the TC and eswitch modules.

3) Maher migrates vnic diagnostic counters reporting from debugfs to a
   dedicated devlink health reporter.

Maher Says:
===========
 net/mlx5: Expose vnic diagnostic counters using devlink

Currently, vnic diagnostic counters are exposed through the following
debugfs directory:

$ ls /sys/kernel/debug/mlx5/0000:08:00.0/esw/vf_0/vnic_diag/
cq_overrun
quota_exceeded_command
total_q_under_processor_handle
invalid_command
send_queue_priority_update_flow
nic_receive_steering_discard

The current design does not allow the hypervisor to view the diagnostic
counters of its VFs once the VFs are bound to a VM. In other words,
the counters are not exposed for representor interfaces.
Furthermore, the debugfs design is inconvenient to extend if the driver
needs to report more counters in the future.

As these counters pertain to vNIC health, it is more appropriate to
utilize the devlink health reporter to expose them.

Thus, this patchset includes the following changes:

* Drop the current vnic diagnostic counters debugfs interface.
* Add a vnic devlink health reporter for PFs/VFs core devices, which
  when diagnosed will dump vnic diagnostic counter values that are
  queried from FW.
* Add a vnic devlink health reporter for the representor interface, which
  serves the same purpose listed in the previous point, in addition to
  allowing the hypervisor to view its VFs diagnostic counters, even when
  the VFs are bound to external VMs.

Example of devlink health reporter usage:
$ devlink health diagnose pci/0000:08:00.0 reporter vnic
 vNIC env counters:
    total_error_queues: 0 send_queue_priority_update_flow: 0
    comp_eq_overrun: 0 async_eq_overrun: 0 cq_overrun: 0
    invalid_command: 0 quota_exceeded_command: 0
    nic_receive_steering_discard: 0

===========

4) SW steering fixes and improvements

Yevgeny Kliteynik Says:
=======================
This short patch series contains small fixes and improvements for
SW steering:

 - Patch 1: Fix dumping of legacy modify_hdr in debug dump to
   align to what is expected by parser
 - Patch 2: Have separate threshold for ICM sync per ICM type
 - Patch 3: Add more info to the steering debug dump - Linux
   version and device name
 - Patch 4: Keep track of number of buddies that are currently
   in use per domain per buddy type

=======================

* tag 'mlx5-updates-2023-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Update op_mode to op_mod for port selection
  net/mlx5: E-Switch, Remove unused mlx5_esw_offloads_vport_metadata_set()
  net/mlx5: E-Switch, Remove redundant dev arg from mlx5_esw_vport_alloc()
  net/mlx5: Include linux/pci.h for pci_msix_can_alloc_dyn()
  net/mlx5e: RX, Hook NAPIs to page pools
  net/mlx5e: RX, Fix XDP_TX page release for legacy rq nonlinear case
  net/mlx5e: RX, Fix releasing page_pool pages twice for striding RQ
  net/mlx5e: Add vnic devlink health reporter to representors
  net/mlx5: Add vnic devlink health reporter to PFs/VFs
  Revert "net/mlx5: Expose vnic diagnostic counters for eswitch managed vports"
  Revert "net/mlx5: Expose steering dropped packets counter"
  net/mlx5: DR, Add memory statistics for domain object
  net/mlx5: DR, Add more info in domain dbg dump
  net/mlx5: DR, Calculate sync threshold of each pool according to its type
  net/mlx5: DR, Fix dumping of legacy modify_hdr in debug dump
====================

Link: https://lore.kernel.org/r/20230421013850.349646-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 9a82cdc2

Jakub Kicinski authored Apr 21, 2023

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-04-21

We've added 71 non-merge commits during the last 8 day(s) which contain
a total of 116 files changed, 13397 insertions(+), 8896 deletions(-).

The main changes are:

1) Add a new BPF netfilter program type and minimal support to hook
   BPF programs to netfilter hooks such as prerouting or forward,
   from Florian Westphal.

2) Fix race between btf_put and btf_idr walk which caused a deadlock,
   from Alexei Starovoitov.

3) Second big batch to migrate test_verifier unit tests into test_progs
   for ease of readability and debugging, from Eduard Zingerman.

4) Add support for refcounted local kptrs to the verifier for allowing
   shared ownership, useful for adding a node to both the BPF list and
   rbtree, from Dave Marchevsky.

5) Migrate bpf_for(), bpf_for_each() and bpf_repeat() macros from BPF
  selftests into libbpf-provided bpf_helpers.h header and improve
  kfunc handling, from Andrii Nakryiko.

6) Support 64-bit pointers to kfuncs needed for archs like s390x,
   from Ilya Leoshkevich.

7) Support BPF progs under getsockopt with a NULL optval,
   from Stanislav Fomichev.

8) Improve verifier u32 scalar equality checking in order to enable
   LLVM transformations which earlier had to be disabled specifically
   for BPF backend, from Yonghong Song.

9) Extend bpftool's struct_ops object loading to support links,
   from Kui-Feng Lee.

10) Add xsk selftest follow-up fixes for hugepage allocated umem,
    from Magnus Karlsson.

11) Support BPF redirects from tc BPF to ifb devices,
    from Daniel Borkmann.

12) Add BPF support for integer type when accessing variable length
    arrays, from Feng Zhou.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (71 commits)
  selftests/bpf: verifier/value_ptr_arith converted to inline assembly
  selftests/bpf: verifier/value_illegal_alu converted to inline assembly
  selftests/bpf: verifier/unpriv converted to inline assembly
  selftests/bpf: verifier/subreg converted to inline assembly
  selftests/bpf: verifier/spin_lock converted to inline assembly
  selftests/bpf: verifier/sock converted to inline assembly
  selftests/bpf: verifier/search_pruning converted to inline assembly
  selftests/bpf: verifier/runtime_jit converted to inline assembly
  selftests/bpf: verifier/regalloc converted to inline assembly
  selftests/bpf: verifier/ref_tracking converted to inline assembly
  selftests/bpf: verifier/map_ptr_mixing converted to inline assembly
  selftests/bpf: verifier/map_in_map converted to inline assembly
  selftests/bpf: verifier/lwt converted to inline assembly
  selftests/bpf: verifier/loops1 converted to inline assembly
  selftests/bpf: verifier/jeq_infer_not_null converted to inline assembly
  selftests/bpf: verifier/direct_packet_access converted to inline assembly
  selftests/bpf: verifier/d_path converted to inline assembly
  selftests/bpf: verifier/ctx converted to inline assembly
  selftests/bpf: verifier/btf_ctx_access converted to inline assembly
  selftests/bpf: verifier/bpf_get_stack converted to inline assembly
  ...
====================

Link: https://lore.kernel.org/r/20230421211035.9111-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


net: dst: fix missing initialization of rt_uncached · 418a7307

Maxime Bizon authored Apr 20, 2023

xfrm_alloc_dst() followed by xfrm4_dst_destroy(), without a
xfrm4_fill_dst() call in between, causes the following BUG:

 BUG: spinlock bad magic on CPU#0, fbxhostapd/732
  lock: 0x890b7668, .magic: 890b7668, .owner: <none>/-1, .owner_cpu: 0
 CPU: 0 PID: 732 Comm: fbxhostapd Not tainted 6.3.0-rc6-next-20230414-00613-ge8de66369925-dirty #9
 Hardware name: Marvell Kirkwood (Flattened Device Tree)
  unwind_backtrace from show_stack+0x10/0x14
  show_stack from dump_stack_lvl+0x28/0x30
  dump_stack_lvl from do_raw_spin_lock+0x20/0x80
  do_raw_spin_lock from rt_del_uncached_list+0x30/0x64
  rt_del_uncached_list from xfrm4_dst_destroy+0x3c/0xbc
  xfrm4_dst_destroy from dst_destroy+0x5c/0xb0
  dst_destroy from rcu_process_callbacks+0xc4/0xec
  rcu_process_callbacks from __do_softirq+0xb4/0x22c
  __do_softirq from call_with_stack+0x1c/0x24
  call_with_stack from do_softirq+0x60/0x6c
  do_softirq from __local_bh_enable_ip+0xa0/0xcc

Patch "net: dst: Prevent false sharing vs. dst_entry:: __refcnt" moved
rt_uncached and rt_uncached_list fields from rtable struct to dst
struct, so they are no longer zeroed by memset_after(xdst, 0, u.dst) in
xfrm_alloc_dst().

Note that rt_uncached (list_head) was never properly initialized at
alloc time, but xfrm[46]_dst_destroy() is written in such a way that
it was not an issue thanks to the memset:

	if (xdst->u.rt.dst.rt_uncached_list)
		rt_del_uncached_list(&xdst->u.rt);

The route code does it the other way around: rt_uncached_list is
assumed to be valid iff the rt_uncached list_head is not empty:

void rt_del_uncached_list(struct rtable *rt)
{
        if (!list_empty(&rt->dst.rt_uncached)) {
                struct uncached_list *ul = rt->dst.rt_uncached_list;

                spin_lock_bh(&ul->lock);
                list_del_init(&rt->dst.rt_uncached);
                spin_unlock_bh(&ul->lock);
        }
}

This patch adds mandatory rt_uncached list_head initialization in
generic dst_init(), and adapts the xfrm[46]_dst_destroy logic to match the
rest of the code.

Fixes: d288a162 ("net: dst: Prevent false sharing vs. dst_entry:: __refcnt")
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/oe-lkp/202304162125.18b7bcdd-oliver.sang@intel.com
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
CC: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Maxime Bizon <mbizon@freebox.fr>
Link: https://lore.kernel.org/r/20230420182508.2417582-1-mbizon@freebox.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
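
Why the missing initialization matters can be shown with a small user-space
sketch. The list helpers mirror the semantics of the kernel's list_head, but
struct fake_dst and the fake_*() / destroy_would_lock() names are assumptions
made purely for illustration. The never-initialized head is zero-filled here
only to keep the example deterministic; in the kernel it may hold arbitrary
stale data.

```c
#include <stdbool.h>
#include <string.h>

/* Minimal doubly linked list head, same shape as the kernel's. */
struct list_head {
	struct list_head *next, *prev;
};

void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* An empty list is one whose head points back at itself. */
bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Hypothetical stand-in for the two dst fields discussed above. */
struct fake_dst {
	struct list_head rt_uncached;
	void *rt_uncached_list;
};

/* Allocation path that never runs init_list_head() on rt_uncached. */
void fake_alloc_uninitialized(struct fake_dst *d)
{
	memset(d, 0, sizeof(*d));
}

/* Allocation path with the mandatory list head initialization. */
void fake_dst_init(struct fake_dst *d)
{
	memset(d, 0, sizeof(*d));
	init_list_head(&d->rt_uncached);
}

/* Mirrors the test in rt_del_uncached_list(): a non-empty head means the
 * destroy path goes on to lock rt_uncached_list. */
bool destroy_would_lock(const struct fake_dst *d)
{
	return !list_empty(&d->rt_uncached);
}
```

A head that was never initialized does not point at itself, so list_empty()
reports it as non-empty and the destroy path would lock an invalid
rt_uncached_list; after fake_dst_init() the head is empty and the lock is
skipped.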


net: dsa: qca8k: fix LEDS_CLASS dependency · 33c1af8e

Arnd Bergmann authored Apr 20, 2023

With LEDS_CLASS=m, a built-in qca8k driver fails to link:

arm-linux-gnueabi-ld: drivers/net/dsa/qca/qca8k-leds.o: in function `qca8k_setup_led_ctrl':
qca8k-leds.c:(.text+0x1ea): undefined reference to `devm_led_classdev_register_ext'

Change the dependency to avoid the broken configuration.

Fixes: 1e264f9d ("net: dsa: qca8k: add LEDs basic support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230420213639.2243388-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


net/handshake: Fix section mismatch in handshake_exit · 6aa445e3

Geert Uytterhoeven authored Apr 20, 2023

If CONFIG_NET_NS=n (e.g. m68k/defconfig):

WARNING: modpost: vmlinux.o: section mismatch in reference: handshake_exit (section: .exit.text) -> handshake_genl_net_ops (section: .init.data)
ERROR: modpost: Section mismatches detected.

Fix this by dropping the __net_initdata tag from handshake_genl_net_ops.

Fixes: 3b3009ea ("net/handshake: Create a NETLINK service for handling handshake requests")
Reported-by: noreply@ellerman.id.au
Closes: http://kisskb.ellerman.id.au/kisskb/buildresult/14912987
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/20230420173723.3773434-1-geert@linux-m68k.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


net: phy: add basic driver for NXP CBTX PHY · f3b766d9

Vladimir Oltean authored Apr 18, 2023

The CBTX PHY is a Fast Ethernet PHY integrated into the SJA1110 A/B/C
automotive Ethernet switches.

It was hoped it would work with the Generic PHY driver, but alas, it
doesn't. The most important reason why is that the PHY is powered down
by default, and it needs a vendor register to power it on.

It has a linear memory map that is accessed over SPI by the SJA1110
switch driver, which exposes a fake MDIO controller. It has the
following (and only the following) standard clause 22 registers:

0x0: MII_BMCR
0x1: MII_BMSR
0x2: MII_PHYSID1
0x3: MII_PHYSID2
0x4: MII_ADVERTISE
0x5: MII_LPA
0x6: MII_EXPANSION
0x7: the missing MII_NPAGE for Next Page Transmit Register

Every other register is vendor-defined.

The register map extends beyond the standard clause 22 5-bit address
space of 0x20 registers; however, the driver does not need to access the
extra registers for now (and hopefully never will). If it ever needs to do that, it
is possible to implement a fake (software) page switching mechanism
between the PHY driver and the SJA1110 MDIO controller driver.

Also, Auto-MDIX is turned off by default in hardware; the driver turns
it on by default and reports the current status. I've tested this with a
VSC8514 link partner and a crossover cable, by forcing the mode on the
link partner, and seeing that the CBTX PHY always sees the reverse of
the mode forced on the VSC8514 (and that traffic works). The link
doesn't come up (as expected) if MDI modes are forced on both ends in
the same way (with the cross-over cable, that is).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230418190141.1040562-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


21 Apr, 2023 33 commits

selftests/bpf: verifier/value_ptr_arith converted to inline assembly · 4db10a82

Eduard Zingerman authored Apr 21, 2023

Test verifier/value_ptr_arith automatically converted to use inline assembly.

Test cases "sanitation: alu with different scalars 2" and
"sanitation: alu with different scalars 3" are updated to
avoid -ENOENT as return value, as __retval() annotation
only supports numeric literals.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-25-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


selftests/bpf: verifier/value_illegal_alu converted to inline assembly · efe25a33

Eduard Zingerman authored Apr 21, 2023

Test verifier/value_illegal_alu automatically converted to use inline assembly.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-24-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


selftests/bpf: verifier/unpriv converted to inline assembly · 82887c25

Eduard Zingerman authored Apr 21, 2023

Test verifier/unpriv semi-automatically converted to use inline assembly.

The verifier/unpriv.c had to be split in two parts:
- the bulk of the tests is in progs/verifier_unpriv.c;
- the single test that needs the `struct bpf_perf_event_data`
  definition is in progs/verifier_unpriv_perf.c.

The tests above can't be in a single file because:
- the first requires inclusion of the filter.h header
  (to get access to the BPF_ST_MEM macro, as the inline assembler
   does not support this instruction);
- the second requires vmlinux.h, which contains definitions
  conflicting with filter.h.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-23-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


selftests/bpf: verifier/subreg converted to inline assembly · 81d1d6dd

Eduard Zingerman authored Apr 21, 2023

Test verifier/subreg automatically converted to use inline assembly.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-22-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


selftests/bpf: verifier/spin_lock converted to inline assembly · f323a818

Eduard Zingerman authored Apr 21, 2023

Test verifier/spin_lock automatically converted to use inline assembly.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-21-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


selftests/bpf: verifier/sock converted to inline assembly · 426fc0e3

Eduard Zingerman authored Apr 21, 2023

Test verifier/sock automatically converted to use inline assembly.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-20-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


selftests/bpf: verifier/search_pruning converted to inline assembly · 034d9ad2

Eduard Zingerman authored Apr 21, 2023

Test verifier/search_pruning automatically converted to use inline assembly.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-19-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


selftests/bpf: verifier/runtime_jit converted to inline assembly · 65222842

Eduard Zingerman authored Apr 21, 2023

Test verifier/runtime_jit automatically converted to use inline assembly.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-18-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


selftests/bpf: verifier/regalloc converted to inline assembly · 16a42573

Eduard Zingerman authored Apr 21, 2023

Test verifier/regalloc automatically converted to use inline assembly.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-17-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


selftests/bpf: verifier/ref_tracking converted to inline assembly · 8be63279

Eduard Zingerman authored Apr 21, 2023

Test verifier/ref_tracking automatically converted to use inline assembly.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-16-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


selftests/bpf: verifier/map_ptr_mixing converted to inline assembly · aee1779f

Eduard Zingerman authored Apr 21, 2023

Test verifier/map_ptr_mixing automatically converted to use inline assembly.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-13-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


selftests/bpf: verifier/map_in_map converted to inline assembly · 4a400ef9

Eduard Zingerman authored Apr 21, 2023

Test verifier/map_in_map automatically converted to use inline assembly.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-12-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


selftests/bpf: verifier/lwt converted to inline assembly · b427ca57

Eduard Zingerman authored Apr 21, 2023

Test verifier/lwt automatically converted to use inline assembly.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-11-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


selftests/bpf: verifier/loops1 converted to inline assembly · a6fc14dc

Eduard Zingerman authored Apr 21, 2023

Test verifier/loops1 automatically converted to use inline assembly.

There are a few modifications for the converted tests.
"tracepoint" programs do not support test execution, so the program
type is changed to "xdp" (which supports test execution) for the
following tests that have __retval tags:
- bounded loop, count to 4
- bounded loop containing forward jump

Also, the __retval tag is removed for the test:
- bounded loop, count from positive unknown to 4

as its return value is a random number.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-10-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


selftests/bpf: verifier/jeq_infer_not_null converted to inline assembly · a5828e31

Eduard Zingerman authored Apr 21, 2023

Test verifier/jeq_infer_not_null automatically converted to use inline assembly.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-9-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


selftests/bpf: verifier/direct_packet_access converted to inline assembly · 0a372c9c

Eduard Zingerman authored Apr 21, 2023

Test verifier/direct_packet_access automatically converted to use inline assembly.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-8-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


selftests/bpf: verifier/d_path converted to inline assembly · 60802802

Eduard Zingerman authored Apr 21, 2023

Test verifier/d_path automatically converted to use inline assembly.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-7-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


selftests/bpf: verifier/ctx converted to inline assembly · fcd36964

Eduard Zingerman authored Apr 21, 2023

Test verifier/ctx automatically converted to use inline assembly.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-6-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


selftests/bpf: verifier/btf_ctx_access converted to inline assembly · 37467c79

Eduard Zingerman authored Apr 21, 2023

Test verifier/btf_ctx_access automatically converted to use inline assembly.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-5-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


selftests/bpf: verifier/bpf_get_stack converted to inline assembly · 965a3f91

Eduard Zingerman authored Apr 21, 2023

Test verifier/bpf_get_stack automatically converted to use inline assembly.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


selftests/bpf: verifier/bounds converted to inline assembly · c9233655

Eduard Zingerman authored Apr 21, 2023

Test verifier/bounds automatically converted to use inline assembly.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


selftests/bpf: Add notion of auxiliary programs for test_loader · 63bb645b

Eduard Zingerman authored Apr 21, 2023

Test cases that use the bpf_tail_call() intrinsic need several programs
to be loaded at the same time.
This commit adds the __auxiliary annotation to the set of annotations
supported by test_loader.c. Programs marked as auxiliary are always
loaded but are not treated as separate tests.

For example:

    void dummy_prog1(void);

    struct {
            __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
            __uint(max_entries, 4);
            __uint(key_size, sizeof(int));
            __array(values, void (void));
    } prog_map SEC(".maps") = {
            .values = {
                    [0] = (void *) &dummy_prog1,
            },
    };

    SEC("tc")
    __auxiliary
    __naked void dummy_prog1(void) {
            asm volatile ("r0 = 42; exit;");
    }

    SEC("tc")
    __description("reference tracking: check reference or tail call")
    __success __retval(0)
    __naked void check_reference_or_tail_call(void)
    {
            asm volatile (
            "r2 = %[prog_map] ll;"
            "r3 = 0;"
            "call %[bpf_tail_call];"
            "r0 = 0;"
            "exit;"
            :: __imm(bpf_tail_call),
               __imm_addr(prog_map)
            :  __clobber_all);
    }
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


Merge branch 'bpf: add netfilter program type' · d7a799ec

Alexei Starovoitov authored Apr 21, 2023

Florian Westphal says:

====================
Changes since last version:
- rework test case in last patch wrt. ctx->skb dereference etc (Alexei)
- pacify bpf ci tests, netfilter program type missed string translation
  in libbpf helper.

This still uses a runtime btf walk rather than extending the btf trace
array as Alexei suggested; I would do this later (or someone else can).

v1 cover letter:

Add minimal support to hook bpf programs to netfilter hooks, e.g.
PREROUTING or FORWARD.

For this, the most relevant parts for registering a netfilter
hook via the in-kernel api are exposed to userspace via bpf_link.

The new program type is 'tracing style', i.e. there is no context
access rewrite done by the verifier, and the function argument
(struct bpf_nf_ctx) isn't stable.
There is no support for direct packet access; the dynptr api should be
used instead.

With this it's possible to build a small test program such as:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

/* Verdicts are preprocessor defines, so vmlinux.h does not provide them. */
#define NF_DROP   0
#define NF_ACCEPT 1

extern int bpf_dynptr_from_skb(struct __sk_buff *skb, __u64 flags,
                               struct bpf_dynptr *ptr__uninit) __ksym;
extern void *bpf_dynptr_slice(const struct bpf_dynptr *ptr, uint32_t offset,
                              void *buffer, uint32_t buffer__sz) __ksym;

SEC("netfilter")
int nf_test(struct bpf_nf_ctx *ctx)
{
	struct nf_hook_state *state = ctx->state;
	struct sk_buff *skb = ctx->skb;
	const struct iphdr *iph;
	const struct tcphdr *th;
	struct iphdr _iph;	/* scratch buffers for bpf_dynptr_slice() */
	struct tcphdr _th;
	struct bpf_dynptr ptr;

	if (bpf_dynptr_from_skb((struct __sk_buff *)skb, 0, &ptr))
		return NF_DROP;

	iph = bpf_dynptr_slice(&ptr, 0, &_iph, sizeof(_iph));
	if (!iph)
		return NF_DROP;

	/* ihl counts 32-bit words, so shift by 2 to get a byte offset */
	th = bpf_dynptr_slice(&ptr, iph->ihl << 2, &_th, sizeof(_th));
	if (!th)
		return NF_DROP;

	bpf_printk("accept %x:%d->%x:%d, hook %d ifin %d\n",
		   iph->saddr, bpf_ntohs(th->source), iph->daddr,
		   bpf_ntohs(th->dest), state->hook, state->in->ifindex);
	return NF_ACCEPT;
}

Then, tail /sys/kernel/tracing/trace_pipe.

Changes since v3:
- uapi: remove 'reserved' struct member, s/prio/priority (Alexei)
- add ctx access test cases (Alexei, see last patch)
- some arm32 can only handle cmpxchg on u32 (build bot)
- Fix kdoc annotations (Simon Horman)
- bpftool: prefer p_err, not fprintf (Quentin)
- add test cases in separate patch

Changes since v2:
1. don't WARN when user calls 'bpftool link detach' twice;
   restrict attachment to ip+ip6 families, let's relax this
   later in case arp/bridge/netdev are needed too.
2. show netfilter links in 'bpftool net' output as well.

Changes since v1:
1. Don't fail to link when CONFIG_NETFILTER=n (build bot)
2. Use test_progs instead of test_verifier (Alexei)

Changes since last RFC version:
1. extend 'bpftool link show' to print prio/hooknum etc
2. extend 'nft list hooks' so it can print the bpf program id
3. Add an extra patch to artificially restrict bpf progs with
   same priority.  It's fine from a technical pov but it will
   cause ordering issues (most recent one comes first).
   Can be removed later.
4. Add test_run support for netfilter prog type and a small
   extension to verifier tests to make sure we can't return
   verdicts like NF_STOLEN.
5. Alter the netfilter part of the bpf_link uapi struct:
   - add flags/reserved members.
   Not used here except returning errors when they are nonzero.
   Plan is to allow the bpf_link users to enable netfilter
   defrag or conntrack engine by setting feature flags at
   link create time in the future.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


selftests/bpf: add missing netfilter return value and ctx access tests · 006c0e44

Florian Westphal authored Apr 21, 2023

Extend prog_tests with two test cases:

 # ./test_progs --allow=verifier_netfilter_retcode
 #278/1   verifier_netfilter_retcode/bpf_exit with invalid return code. test1:OK
 #278/2   verifier_netfilter_retcode/bpf_exit with valid return code. test2:OK
 #278/3   verifier_netfilter_retcode/bpf_exit with valid return code. test3:OK
 #278/4   verifier_netfilter_retcode/bpf_exit with invalid return code. test4:OK
 #278     verifier_netfilter_retcode:OK

This checks that only accept and drop (0,1) are permitted.

NF_QUEUE could be implemented later if we can guarantee that attachment
of such programs can be rejected if they get attached to a pf/hook that
doesn't support async reinjection.

NF_STOLEN could be implemented via trusted helpers that can guarantee
that the skb will eventually be freed.
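
The accepted return values can be summarized with a small sketch. The verdict
constants match the values in the kernel's netfilter uapi header;
nf_bpf_retval_allowed() is a hypothetical user-space stand-in for the
verifier's return-value check, not the verifier code itself.

```c
#include <stdbool.h>

/* Verdict values as defined by the netfilter uapi. */
enum {
	NF_DROP   = 0,
	NF_ACCEPT = 1,
	NF_STOLEN = 2,
	NF_QUEUE  = 3,
	NF_REPEAT = 4,
};

/* Hypothetical helper: only drop and accept are valid program results. */
bool nf_bpf_retval_allowed(int verdict)
{
	return verdict == NF_DROP || verdict == NF_ACCEPT;
}
```

Any other verdict, including NF_QUEUE and NF_STOLEN, is rejected in this
model, matching the "invalid return code" test cases listed above.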

v4: test case for bpf_nf_ctx access checks, requested by Alexei Starovoitov.
v5: also check ctx->{state,skb} can be dereferenced (Alexei).

 # ./test_progs --allow=verifier_netfilter_ctx
 #281/1   verifier_netfilter_ctx/netfilter invalid context access, size too short:OK
 #281/2   verifier_netfilter_ctx/netfilter invalid context access, size too short:OK
 #281/3   verifier_netfilter_ctx/netfilter invalid context access, past end of ctx:OK
 #281/4   verifier_netfilter_ctx/netfilter invalid context, write:OK
 #281/5   verifier_netfilter_ctx/netfilter valid context read and invalid write:OK
 #281/6   verifier_netfilter_ctx/netfilter test prog with skb and state read access:OK
 #281/7   verifier_netfilter_ctx/netfilter test prog with skb and state read access @unpriv:OK
 #281     verifier_netfilter_ctx:OK
Summary: 1/7 PASSED, 0 SKIPPED, 0 FAILED

This checks:
1/2: partial reads of ctx->{skb,state} are rejected
3. read access past sizeof(ctx) is rejected
4. write to ctx content, e.g. 'ctx->skb = NULL;' is rejected
5. ctx->state content cannot be altered
6. ctx->state and ctx->skb can be dereferenced
7. ... same program fails for unpriv (CAP_NET_ADMIN needed).

Link: https://lore.kernel.org/bpf/20230419021152.sjq4gttphzzy6b5f@dhcp-172-26-102-232.dhcp.thefacebook.com/
Link: https://lore.kernel.org/bpf/20230420201655.77kkgi3dh7fesoll@MacBook-Pro-6.local/
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-8-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


bpf: add test_run support for netfilter program type · 2b99ef22

Florian Westphal authored Apr 21, 2023

Add glue code so a bpf program can be run using userspace-provided
netfilter state and packet/skb.

Default is to use the ipv4:output hook point, but this can be overridden
by userspace. Userspace-provided netfilter state is restricted: only the
hook and protocol family can be overridden, and only to ipv4/ipv6.
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-7-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


tools: bpftool: print netfilter link info · d0fe92fb

Florian Westphal authored Apr 21, 2023

Dump protocol family, hook and priority value:
$ bpftool link
2: netfilter  prog 14
        ip input prio -128
        pids install(3264)
5: netfilter  prog 14
        ip6 forward prio 21
        pids a.out(3387)
9: netfilter  prog 14
        ip prerouting prio 123
        pids a.out(5700)
10: netfilter  prog 14
        ip input prio 21
        pids test2(5701)

v2: Quentin Monnet suggested to also add 'bpftool net' support:

$ bpftool net
xdp:

tc:

flow_dissector:

netfilter:

        ip prerouting prio 21 prog_id 14
        ip input prio -128 prog_id 14
        ip input prio 21 prog_id 14
        ip forward prio 21 prog_id 14
        ip output prio 21 prog_id 14
        ip postrouting prio 21 prog_id 14

'bpftool net' only dumps netfilter link type, links are sorted by protocol
family, hook and priority.
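
The sort order mentioned above (protocol family, then hook, then priority) can be expressed as a qsort() comparator. This is a sketch only: the row struct and function names are hypothetical and are not taken from bpftool's source.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical row describing one netfilter link. */
struct nf_link_row {
	unsigned int pf;       /* protocol family, e.g. 2 for ip, 10 for ip6 */
	unsigned int hooknum;  /* 0 prerouting .. 4 postrouting */
	int priority;          /* may be negative */
	unsigned int prog_id;
};

/* Three-way compare without the overflow risk of subtraction. */
static int cmp_uint(unsigned int a, unsigned int b)
{
	return (a > b) - (a < b);
}

/* Order rows by protocol family, then hook, then ascending priority. */
static int nf_link_row_cmp(const void *a, const void *b)
{
	const struct nf_link_row *x = a;
	const struct nf_link_row *y = b;
	int c;

	c = cmp_uint(x->pf, y->pf);
	if (c)
		return c;
	c = cmp_uint(x->hooknum, y->hooknum);
	if (c)
		return c;
	return (x->priority > y->priority) - (x->priority < y->priority);
}

/* Sort the rows in place. */
void nf_link_rows_sort(struct nf_link_row *rows, size_t n)
{
	assert(rows != NULL || n == 0);
	if (n > 1)
		qsort(rows, n, sizeof(*rows), nf_link_row_cmp);
}
```

With this ordering, an "ip input prio -128" row sorts before "ip input prio 21", which matches the sample output above.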

v5: fix bpf ci failure: libbpf needs small update to prog_type_name[]
    and probe_prog_load helper.
v4: don't fail with -EOPNOTSUPP in libbpf probe_prog_load, update
    prog_type_name[] with "netfilter" entry (bpf ci)
v3: fix bpf.h copy, 'reserved' member was removed (Alexei)
    use p_err, not fprintf (Quentin)
Suggested-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/eeeaac99-9053-90c2-aa33-cc1ecb1ae9ca@isovalent.com/
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-6-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

d0fe92fb

netfilter: disallow bpf hook attachment at same priority · 0bdc6da8

Florian Westphal authored Apr 21, 2023

This is just to avoid ordering issues between multiple bpf programs.
The restriction could be removed later if it turns out to be too cautious.

A bpf prog can still share its priority with a non-bpf hook; otherwise we'd
have to make conntrack hook registration fail just because a bpf program
has the same priority.
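
The rule can be modelled in a few lines of C. This is an illustrative sketch, not the netfilter core code: the types and the function name are hypothetical, and the -EBUSY return value is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical hook descriptor: who owns the hook and at which priority. */
enum sk_hook_type { SK_HOOK_OTHER = 0, SK_HOOK_BPF = 1 };

struct sk_hook {
	enum sk_hook_type type;
	int priority;
};

/*
 * Decide whether new_hook may be registered next to the n hooks in cur.
 * Only a bpf hook colliding with another bpf hook at the same priority
 * is rejected; any other combination may share a priority.
 */
int sk_hook_may_register(const struct sk_hook *cur, size_t n,
			 const struct sk_hook *new_hook)
{
	size_t i;

	assert(new_hook != NULL);
	assert(cur != NULL || n == 0);

	if (new_hook->type != SK_HOOK_BPF)
		return 0;

	for (i = 0; i < n; i++) {
		if (cur[i].type == SK_HOOK_BPF &&
		    cur[i].priority == new_hook->priority)
			return -EBUSY;
	}
	return 0;
}
```

A non-bpf hook such as conntrack is never refused in this model, whatever priorities the bpf programs already occupy.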
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-5-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

0bdc6da8

netfilter: nfnetlink hook: dump bpf prog id · 506a74db

Florian Westphal authored Apr 21, 2023

This allows userspace ("nft list hooks") to show which bpf program
is attached to which hook.

Without this, user only knows bpf prog is attached at prio
x, y, z at INPUT and FORWARD, but can't tell which program is where.

v4: kdoc fixups (Simon Horman)

Link: https://lore.kernel.org/bpf/ZEELzpNCnYJuZyod@corigine.com/
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-4-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

506a74db

bpf: minimal support for programs hooked into netfilter framework · fd9c663b

Florian Westphal authored Apr 21, 2023

This adds minimal support for BPF_PROG_TYPE_NETFILTER bpf programs
that will be invoked via the NF_HOOK() points in the ip stack.

Invocation incurs an indirect call. This is not a necessity: it's
possible to add 'DEFINE_BPF_DISPATCHER(nf_progs)' and handle the
program invocation with the same method already used for xdp progs.

This isn't done here to keep the size of this chunk down.

Verifier restricts verdicts to either DROP or ACCEPT.
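
As a rough model of that restriction: NF_DROP is 0 and NF_ACCEPT is 1 in the netfilter UAPI, so a program is acceptable only if every value it can return lies in [0, 1]. The helper below is a hypothetical sketch of such a range check, not the verifier's implementation; the constants are redefined locally so the block builds without kernel headers.

```c
#include <assert.h>

/* Same values as NF_DROP / NF_ACCEPT in the kernel UAPI. */
enum { SK_NF_DROP = 0, SK_NF_ACCEPT = 1 };

/*
 * Given the inclusive interval [lo, hi] of values a program might
 * return, report whether all of them are valid verdicts.
 * Returns 1 if the whole interval lies within [DROP, ACCEPT], else 0.
 */
int sk_retval_range_ok(long long lo, long long hi)
{
	if (lo > hi)
		return 0; /* not a valid interval */
	return lo >= SK_NF_DROP && hi <= SK_NF_ACCEPT;
}
```

Under this model a program whose return value could be 2 (NF_STOLEN in the UAPI) or negative would be refused at load time.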
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-3-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

fd9c663b

bpf: add bpf_link support for BPF_NETFILTER programs · 84601d6e

Florian Westphal authored Apr 21, 2023

Add bpf_link support skeleton.  To keep this reviewable, no bpf program
can be invoked yet: if a program is attached, only a C stub is called and
not the actual bpf program.

Defaults to 'y' if both netfilter and bpf syscall are enabled in kconfig.

Uapi example usage:
	union bpf_attr attr = { };

	attr.link_create.prog_fd = progfd;
	attr.link_create.attach_type = 0; /* unused */
	attr.link_create.netfilter.pf = PF_INET;
	attr.link_create.netfilter.hooknum = NF_INET_LOCAL_IN;
	attr.link_create.netfilter.priority = -128;

	err = bpf(BPF_LINK_CREATE, &attr, sizeof(attr));

... this would attach progfd to ipv4:input hook.

Such hook gets removed automatically if the calling program exits.

BPF_NETFILTER program invocation is added in followup change.

The NF_HOOK_OP_BPF enum will eventually be read from nfnetlink_hook. It
lets the kernel tell userspace which program is attached at a given hook
when the user runs the 'nft list hooks' command, rather than just the
priority and a not-very-helpful 'this hook runs a bpf prog but I can't
tell which one'.

Will also be used to disallow registration of two bpf programs with
same priority in a followup patch.

v4: arm32 cmpxchg only supports 32bit operand
    s/prio/priority/
v3: restrict prog attachment to ip/ip6 for now, let's lift restrictions if
    more use cases pop up (arptables, ebtables, netdev ingress/egress etc).
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-2-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

84601d6e

bpftool: Update doc to explain struct_ops register subcommand. · 45cea721

Kui-Feng Lee authored Apr 19, 2023

The "struct_ops register" subcommand now allows for an optional *LINK_DIR*
to be included. This specifies the directory path where bpftool will pin
struct_ops links with the same name as their corresponding map names.
Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/r/20230420002822.345222-2-kuifeng@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

45cea721

bpftool: Register struct_ops with a link. · 0232b788

Kui-Feng Lee authored Apr 19, 2023

You can include an optional path after specifying the object name for the
'struct_ops register' subcommand.

Since the commit 226bc6ae ("Merge branch 'Transit between BPF TCP
congestion controls.'") has been accepted, it is now possible to create a
link for a struct_ops. This can be done by defining a struct_ops in
SEC(".struct_ops.link") to make libbpf return a real link. If we don't pin
the links before leaving bpftool, they will disappear. To instruct bpftool
to pin the links in a directory with the names of the maps, we need to
provide the path of that directory.
Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/r/20230420002822.345222-1-kuifeng@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

0232b788

selftests/bpf: Verify optval=NULL case · 833d67ec

Stanislav Fomichev authored Apr 18, 2023

Make sure we get optlen exported instead of getting EFAULT.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230418225343.553806-3-sdf@google.com

833d67ec