Commits · b3fce974d4239bd46ae81bba07b59f255eb979d7 · Kirill Smelkov / linux

22 Jul, 2022 40 commits

Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · b3fce974

Jakub Kicinski authored Jul 22, 2022

Daniel Borkmann says:

====================
bpf-next 2022-07-22

We've added 73 non-merge commits during the last 12 day(s) which contain
a total of 88 files changed, 3458 insertions(+), 860 deletions(-).

The main changes are:

1) Implement BPF trampoline for arm64 JIT, from Xu Kuohai.

2) Add ksyscall/kretsyscall section support to libbpf to simplify tracing kernel
   syscalls through kprobe mechanism, from Andrii Nakryiko.

3) Allow for livepatch (KLP) and BPF trampolines to attach to the same kernel
   function, from Song Liu & Jiri Olsa.

4) Add new kfunc infrastructure for netfilter's CT e.g. to insert and change
   entries, from Kumar Kartikeya Dwivedi & Lorenzo Bianconi.

5) Add a ksym BPF iterator to allow for more flexible and efficient interactions
   with kernel symbols, from Alan Maguire.

6) Bug fixes in libbpf e.g. for uprobe binary path resolution, from Dan Carpenter.

7) Fix BPF subprog function names in stack traces, from Alexei Starovoitov.

8) libbpf support for writing custom perf event readers, from Jon Doron.

9) Switch to use SPDX tag for BPF helper man page, from Alejandro Colomar.

10) Fix xsk send-only sockets when in busy poll mode, from Maciej Fijalkowski.

11) Reparent BPF maps and their charging on memcg offlining, from Roman Gushchin.

12) Multiple follow-up fixes around BPF lsm cgroup infra, from Stanislav Fomichev.

13) Use bootstrap version of bpftool where possible to speed up builds, from Pu Lehui.

14) Cleanup BPF verifier's check_func_arg() handling, from Joanne Koong.

15) Make non-prealloced BPF map allocations low priority to play better with
    memcg limits, from Yafang Shao.

16) Fix BPF test runner to reject zero-length data for skbs, from Zhengchao Shao.

17) Various smaller cleanups and improvements all over the place.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (73 commits)
  bpf: Simplify bpf_prog_pack_[size|mask]
  bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)
  bpf, x64: Allow to use caller address from stack
  ftrace: Allow IPMODIFY and DIRECT ops on the same function
  ftrace: Add modify_ftrace_direct_multi_nolock
  bpf/selftests: Fix couldn't retrieve pinned program in xdp veth test
  bpf: Fix build error in case of !CONFIG_DEBUG_INFO_BTF
  selftests/bpf: Fix test_verifier failed test in unprivileged mode
  selftests/bpf: Add negative tests for new nf_conntrack kfuncs
  selftests/bpf: Add tests for new nf_conntrack kfuncs
  selftests/bpf: Add verifier tests for trusted kfunc args
  net: netfilter: Add kfuncs to set and change CT status
  net: netfilter: Add kfuncs to set and change CT timeout
  net: netfilter: Add kfuncs to allocate and insert CT
  net: netfilter: Deduplicate code in bpf_{xdp,skb}_ct_lookup
  bpf: Add documentation for kfuncs
  bpf: Add support for forcing kfunc args to be trusted
  bpf: Switch to new kfunc flags infrastructure
  tools/resolve_btfids: Add support for 8-byte BTF sets
  bpf: Introduce 8-byte BTF set
  ...
====================

Link: https://lore.kernel.org/r/20220722221218.29943-1-daniel@iogearbox.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b3fce974

Merge branch 'io_uring-zerocopy-send' of git://git.kernel.org/pub/scm/linux/kernel/git/kuba/linux · 3c47fb2f
Jakub Kicinski authored Jul 22, 2022
```
Pull in Pavel's patch from a shared branch.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
```
3c47fb2f

net: fix uninitialised msghdr->sg_from_iter · 2829a267

Pavel Begunkov authored Jul 21, 2022

Because of how struct msghdr is usually initialised some fields and
sg_from_iter in particular might be left out not initialised, so we
can't safely use it in __zerocopy_sg_from_iter().

For now use the callback only when there is ->msg_ubuf set relying on
the fact that they're used together and we properly zero ->msg_ubuf.

Fixes: ebe73a28 ("net: Allow custom iter handler in msghdr")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Message-Id: <ce8b68b41351488f79fd998b032b3c56e9b1cc6c.1658401817.git.asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2829a267

mlxsw: core: Fix use-after-free calling devl_unlock() in mlxsw_core_bus_device_unregister() · 1b5995e3

Jiri Pirko authored Jul 21, 2022

Do devl_unlock() before freeing the devlink in
mlxsw_core_bus_device_unregister() function.
Reported-by: Ido Schimmel <idosch@nvidia.com>
Fixes: 72a4c8c9 ("mlxsw: convert driver to use unlocked devlink API during init/fini")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220721142424.3975704-1-jiri@resnulli.usSigned-off-by: Jakub Kicinski <kuba@kernel.org>

1b5995e3

bpf: Simplify bpf_prog_pack_[size|mask] · ea2babac

Song Liu authored Jul 13, 2022

Simplify the logic that selects bpf_prog_pack_size, and always use
(PMD_SIZE * num_possible_nodes()). This is a good tradeoff, as most of
the performance benefit observed is from less direct map fragmentation [0].

Also, module_alloc(4MB) may not allocate 4MB aligned memory. Therefore,
we cannot use (ptr & bpf_prog_pack_mask) to find the correct address of
bpf_prog_pack. Fix this by checking the header address falls in the range
of pack->ptr and (pack->ptr + bpf_prog_pack_size).

  [0] https://lore.kernel.org/bpf/20220707223546.4124919-1-song@kernel.org/Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20220713204950.3015201-1-song@kernel.org

ea2babac

bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch) · 00963a2e

Song Liu authored Jul 19, 2022

When tracing a function with IPMODIFY ftrace_ops (livepatch), the bpf
trampoline must follow the instruction pointer saved on stack. This needs
extra handling for bpf trampolines with BPF_TRAMP_F_CALL_ORIG flag.

Implement bpf_tramp_ftrace_ops_func and use it for the ftrace_ops used
by BPF trampoline. This enables tracing functions with livepatch.

This also requires moving bpf trampoline to *_ftrace_direct_mult APIs.
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/all/20220602193706.2607681-2-song@kernel.org/
Link: https://lore.kernel.org/bpf/20220720002126.803253-5-song@kernel.org

00963a2e

bpf, x64: Allow to use caller address from stack · 316cba62

Jiri Olsa authored Jul 19, 2022

Currently we call the original function by using the absolute address
given at the JIT generation. That's not usable when having trampoline
attached to multiple functions, or the target address changes dynamically
(in case of live patch). In such cases we need to take the return address
from the stack.

Adding support to retrieve the original function address from the stack
by adding new BPF_TRAMP_F_ORIG_STACK flag for arch_prepare_bpf_trampoline
function.

Basically we take the return address of the 'fentry' call:

   function + 0: call fentry    # stores 'function + 5' address on stack
   function + 5: ...

The 'function + 5' address will be used as the address for the
original function to call.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220720002126.803253-4-song@kernel.org

316cba62

ftrace: Allow IPMODIFY and DIRECT ops on the same function · 53cd885b

Song Liu authored Jul 19, 2022

IPMODIFY (livepatch) and DIRECT (bpf trampoline) ops are both important
users of ftrace. It is necessary to allow them work on the same function
at the same time.

First, DIRECT ops no longer specify IPMODIFY flag. Instead, DIRECT flag is
handled together with IPMODIFY flag in __ftrace_hash_update_ipmodify().

Then, a callback function, ops_func, is added to ftrace_ops. This is used
by ftrace core code to understand whether the DIRECT ops can share with an
IPMODIFY ops. To share with IPMODIFY ops, the DIRECT ops need to implement
the callback function and adjust the direct trampoline accordingly.

If DIRECT ops is attached before the IPMODIFY ops, ftrace core code calls
ENABLE_SHARE_IPMODIFY_PEER on the DIRECT ops before registering the
IPMODIFY ops.

If IPMODIFY ops is attached before the DIRECT ops, ftrace core code calls
ENABLE_SHARE_IPMODIFY_SELF in __ftrace_hash_update_ipmodify. Owner of the
DIRECT ops may return 0 if the DIRECT trampoline can share with IPMODIFY,
so error code otherwise. The error code is propagated to
register_ftrace_direct_multi so that onwer of the DIRECT trampoline can
handle it properly.

For more details, please refer to comment before enum ftrace_ops_cmd.
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/all/20220602193706.2607681-2-song@kernel.org/
Link: https://lore.kernel.org/all/20220718055449.3960512-1-song@kernel.org/
Link: https://lore.kernel.org/bpf/20220720002126.803253-3-song@kernel.org

53cd885b

ftrace: Add modify_ftrace_direct_multi_nolock · f96f644a

Song Liu authored Jul 19, 2022

This is similar to modify_ftrace_direct_multi, but does not acquire
direct_mutex. This is useful when direct_mutex is already locked by the
user.
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/bpf/20220720002126.803253-2-song@kernel.org

f96f644a

bpf/selftests: Fix couldn't retrieve pinned program in xdp veth test · f664f9c6

Jie2x Zhou authored Jul 19, 2022

Before change:

  selftests: bpf: test_xdp_veth.sh
  Couldn't retrieve pinned program '/sys/fs/bpf/test_xdp_veth/progs/redirect_map_0': No such file or directory
  selftests: xdp_veth [SKIP]
  ok 20 selftests: bpf: test_xdp_veth.sh # SKIP

After change:

  PING 10.1.1.33 (10.1.1.33) 56(84) bytes of data.
  64 bytes from 10.1.1.33: icmp_seq=1 ttl=64 time=0.320 ms
  --- 10.1.1.33 ping statistics ---
  1 packets transmitted, 1 received, 0% packet loss, time 0ms
  rtt min/avg/max/mdev = 0.320/0.320/0.320/0.000 ms
  selftests: xdp_veth [PASS]

For the test case, the following can be found:

  ls /sys/fs/bpf/test_xdp_veth/progs/redirect_map_0
  ls: cannot access '/sys/fs/bpf/test_xdp_veth/progs/redirect_map_0': No such file or directory
  ls /sys/fs/bpf/test_xdp_veth/progs/
  xdp_redirect_map_0  xdp_redirect_map_1  xdp_redirect_map_2
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jie2x Zhou <jie2x.zhou@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220719082430.9916-1-jie2x.zhou@intel.com

f664f9c6

bpf: Fix build error in case of !CONFIG_DEBUG_INFO_BTF · e4234143

Kumar Kartikeya Dwivedi authored Jul 22, 2022

BTF_ID_FLAGS macro needs to be able to take 0 or 1 args, so make it a
variable argument. BTF_SET8_END is incorrect, it should just be empty.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: ab21d606 ("bpf: Introduce 8-byte BTF set")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220722113605.6513-1-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

e4234143

net: add missing includes and forward declarations under net/ · 949d6b40

Jakub Kicinski authored Jul 20, 2022

This patch adds missing includes to headers under include/net.
All these problems are currently masked by the existing users
including the missing dependency before the broken header.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

949d6b40

Merge branch 'sfc-E100-VF-respresenters' · 735dbc69

David S. Miller authored Jul 22, 2022

Edward Cree says:

====================
sfc: VF representors for EF100

This series adds representor netdevices for EF100 VFs, as a step towards
 supporting TC offload and vDPA usecases in future patches.
In this first series is basic netdevice creation and packet TX; the
 following series will add the RX path.

v3: dropped massive mcdi_pcol.h patch which was applied separately.
v2: converted comments on struct efx_nic members added in patch #4 to
 kernel-doc (Jakub).  While at it, also gave struct efx_rep its own kdoc
 since several members had comments on them.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

735dbc69

sfc: attach/detach EF100 representors along with their owning PF · 84e7fc25

Edward Cree authored Jul 20, 2022

Since representors piggy-back on the PF's queues for TX, they can
 only accept new TXes while the PF is up.  Thus, any operation which
 detaches the PF must first detach all its VFreps.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

84e7fc25

sfc: hook up ef100 representor TX · f72c38fa

Edward Cree authored Jul 20, 2022

Implement .ndo_start_xmit() by calling into the parent PF's TX path.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f72c38fa

sfc: support passing a representor to the EF100 TX path · 02443ab8

Edward Cree authored Jul 20, 2022

A non-null efv in __ef100_enqueue_skb() indicates that the packet is
from that representor, should be transmitted with a suitable option
descriptor (to instruct the switch to deliver it to the representee),
and should not be accounted to the parent PF's stats or BQL.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

02443ab8

sfc: determine representee m-port for EF100 representors · da56552d

Edward Cree authored Jul 20, 2022

An MAE port, or m-port, is a port (source/destination for traffic) on
 the Match-Action Engine (the internal switch on EF100).
Representors will use their representee's m-port for two purposes: as
 a destination override on TX from the representor, and as a source
 match in 'default rules' to steer representee traffic (when not
 matched by e.g. a TC flower rule) to representor RX via the parent
 PF's receive queue.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

da56552d

sfc: phys port/switch identification for ef100 reps · e1479556

Edward Cree authored Jul 20, 2022

Requires storing VF index in struct efx_rep.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1479556

sfc: add basic ethtool ops to ef100 reps · 5687eb34

Edward Cree authored Jul 20, 2022

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5687eb34

sfc: add skeleton ef100 VF representors · 08135eec

Edward Cree authored Jul 20, 2022

No net_device_ops yet, just a placeholder netdev created per VF.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

08135eec

sfc: detect ef100 MAE admin privilege/capability at probe time · 95287e1b

Edward Cree authored Jul 20, 2022

One PCIe function per network port (more precisely, per m-port group) is
 responsible for configuring the Match-Action Engine which performs
 switching and packet modification in the slice to support flower/OVS
 offload.  The GRP_MAE bit in the privilege mask indicates whether a
 given function has this capability.
At probe time, call MCDIs to read the calling function's privilege mask,
 and store the GRP_MAE bit in a new ef100_nic_data member.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

95287e1b

sfc: update EF100 register descriptions · 8ca353da

Edward Cree authored Jul 20, 2022

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8ca353da

ping: support ipv6 ping socket flow labels · 16576a03

Alan Brady authored Jul 20, 2022

Ping sockets don't appear to make any attempt to preserve flow labels
created and set by userspace using IPV6_FLOWINFO_SEND. Instead they are
clobbered by autolabels (if enabled) or zero.

Grab the flowlabel out of the msghdr similar to how rawv6_sendmsg does
it and move the memset up so it doesn't get zeroed after.
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

16576a03

net: marvell: prestera: use netif_is_any_bridge_port instead of open code · c497885e

Juhee Kang authored Jul 21, 2022

The open code which is netif_is_bridge_port() || netif_is_ovs_port() is
defined as a new helper function on netdev.h like netif_is_any_bridge_port
that can check both IFF flags in 1 go. So use netif_is_any_bridge_port()
function instead of open code. This patch doesn't change logic.
Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c497885e

mlxsw: use netif_is_any_bridge_port() instead of open code · 59ad2471

Juhee Kang authored Jul 21, 2022

The open code which is netif_is_bridge_port() || netif_is_ovs_port() is
defined as a new helper function on netdev.h like netif_is_any_bridge_port
that can check both IFF flags in 1 go. So use netif_is_any_bridge_port()
function instead of open code. This patch doesn't change logic.
Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

59ad2471

Merge branch 'New nf_conntrack kfuncs for insertion, changing timeout, status' · ac7ac432

Alexei Starovoitov authored Jul 21, 2022

Kumar Kartikeya Dwivedi says:

====================

Introduce the following new kfuncs:
 - bpf_{xdp,skb}_ct_alloc
 - bpf_ct_insert_entry
 - bpf_ct_{set,change}_timeout
 - bpf_ct_{set,change}_status

The setting of timeout and status on allocated or inserted/looked up CT
is same as the ctnetlink interface, hence code is refactored and shared
with the kfuncs. It is ensured allocated CT cannot be passed to kfuncs
that expected inserted CT, and vice versa. Please see individual patches
for details.

Changelog:
----------
v6 -> v7:
v6: https://lore.kernel.org/bpf/20220719132430.19993-1-memxor@gmail.com

 * Use .long to encode flags (Alexei)
 * Fix description of KF_RET_NULL in documentation (Toke)

v5 -> v6:
v5: https://lore.kernel.org/bpf/20220623192637.3866852-1-memxor@gmail.com

 * Introduce kfunc flags, rework verifier to work with them
 * Add documentation for kfuncs
 * Add comment explaining TRUSTED_ARGS kfunc flag (Alexei)
 * Fix missing offset check for trusted arguments (Alexei)
 * Change nf_conntrack test minimum delta value to 8

v4 -> v5:
v4: https://lore.kernel.org/bpf/cover.1653600577.git.lorenzo@kernel.org

 * Drop read-only PTR_TO_BTF_ID approach, use struct nf_conn___init (Alexei)
 * Drop acquire release pair code that is no longer required (Alexei)
 * Disable writes into nf_conn, use dedicated helpers (Florian, Alexei)
 * Refactor and share ctnetlink code for setting timeout and status
 * Do strict type matching on finding __ref suffix on argument to
   prevent passing nf_conn___init as nf_conn (offset = 0, match on walk)
 * Remove bpf_ct_opts parameter from bpf_ct_insert_entry
 * Update selftests for new additions, add more negative tests

v3 -> v4:
v3: https://lore.kernel.org/bpf/cover.1652870182.git.lorenzo@kernel.org

 * split bpf_xdp_ct_add in bpf_xdp_ct_alloc/bpf_skb_ct_alloc and
   bpf_ct_insert_entry
 * add verifier code to properly populate/configure ct entry
 * improve selftests

v2 -> v3:
v2: https://lore.kernel.org/bpf/cover.1652372970.git.lorenzo@kernel.org

 * add bpf_xdp_ct_add and bpf_ct_refresh_timeout kfunc helpers
 * remove conntrack dependency from selftests
 * add support for forcing kfunc args to be referenced and related selftests

v1 -> v2:
v1: https://lore.kernel.org/bpf/1327f8f5696ff2bc60400e8f3b79047914ccc837.1651595019.git.lorenzo@kernel.org

 * add bpf_ct_refresh_timeout kfunc selftest

Kumar Kartikeya Dwivedi (10):
  bpf: Introduce 8-byte BTF set
  tools/resolve_btfids: Add support for 8-byte BTF sets
  bpf: Switch to new kfunc flags infrastructure
  bpf: Add support for forcing kfunc args to be trusted
  bpf: Add documentation for kfuncs
  net: netfilter: Deduplicate code in bpf_{xdp,skb}_ct_lookup
  net: netfilter: Add kfuncs to set and change CT timeout
  selftests/bpf: Add verifier tests for trusted kfunc args
  selftests/bpf: Add negative tests for new nf_conntrack kfuncs
  selftests/bpf: Fix test_verifier failed test in unprivileged mode
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

ac7ac432

selftests/bpf: Fix test_verifier failed test in unprivileged mode · e3fa4735

Kumar Kartikeya Dwivedi authored Jul 21, 2022

Loading the BTF won't be permitted without privileges, hence only test
for privileged mode by setting the prog type. This makes the
test_verifier show 0 failures when unprivileged BPF is enabled.

Fixes: 41188e9e ("selftest/bpf: Test for use-after-free bug fix in inline_bpf_loop")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-14-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

e3fa4735

selftests/bpf: Add negative tests for new nf_conntrack kfuncs · c6f420ac

Kumar Kartikeya Dwivedi authored Jul 21, 2022

Test cases we care about and ensure improper usage is caught and
rejected by the verifier.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-13-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

c6f420ac

selftests/bpf: Add tests for new nf_conntrack kfuncs · 6eb7fba0

Lorenzo Bianconi authored Jul 21, 2022

Introduce selftests for the following kfunc helpers:
- bpf_xdp_ct_alloc
- bpf_skb_ct_alloc
- bpf_ct_insert_entry
- bpf_ct_set_timeout
- bpf_ct_change_timeout
- bpf_ct_set_status
- bpf_ct_change_status
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-12-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

6eb7fba0

selftests/bpf: Add verifier tests for trusted kfunc args · 8dd5e756

Kumar Kartikeya Dwivedi authored Jul 21, 2022

Make sure verifier rejects the bad cases and ensure the good case keeps
working. The selftests make use of the bpf_kfunc_call_test_ref kfunc
added in the previous patch only for verification.
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-11-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

8dd5e756

net: netfilter: Add kfuncs to set and change CT status · ef69aa3a

Lorenzo Bianconi authored Jul 21, 2022

Introduce bpf_ct_set_status and bpf_ct_change_status kfunc helpers in
order to set nf_conn field of allocated entry or update nf_conn status
field of existing inserted entry. Use nf_ct_change_status_common to
share the permitted status field changes between netlink and BPF side
by refactoring ctnetlink_change_status.

It is required to introduce two kfuncs taking nf_conn___init and nf_conn
instead of sharing one because KF_TRUSTED_ARGS flag causes strict type
checking. This would disallow passing nf_conn___init to kfunc taking
nf_conn, and vice versa. We cannot remove the KF_TRUSTED_ARGS flag as we
only want to accept refcounted pointers and not e.g. ct->master.

Hence, bpf_ct_set_* kfuncs are meant to be used on allocated CT, and
bpf_ct_change_* kfuncs are meant to be used on inserted or looked up
CT entry.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-10-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

ef69aa3a

net: netfilter: Add kfuncs to set and change CT timeout · 0b389236

Kumar Kartikeya Dwivedi authored Jul 21, 2022

Introduce bpf_ct_set_timeout and bpf_ct_change_timeout kfunc helpers in
order to change nf_conn timeout. This is same as ctnetlink_change_timeout,
hence code is shared between both by extracting it out to
__nf_ct_change_timeout. It is also updated to return an error when it
sees IPS_FIXED_TIMEOUT_BIT bit in ct->status, as that check was missing.

Apart from this, bpf_ct_set_timeout is only called for newly allocated
CT so it doesn't need to inspect the status field just yet. Sharing the
helpers even if it was possible would make timeout setting helper
sensitive to order of setting status and timeout after allocation.

Hence, bpf_ct_set_* kfuncs are meant to be used on allocated CT, and
bpf_ct_change_* kfuncs are meant to be used on inserted or looked up
CT entry.
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-9-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

0b389236

net: netfilter: Add kfuncs to allocate and insert CT · d7e79c97

Lorenzo Bianconi authored Jul 21, 2022

Introduce bpf_xdp_ct_alloc, bpf_skb_ct_alloc and bpf_ct_insert_entry
kfuncs in order to insert a new entry from XDP and TC programs.
Introduce bpf_nf_ct_tuple_parse utility routine to consolidate common
code.

We extract out a helper __nf_ct_set_timeout, used by the ctnetlink and
nf_conntrack_bpf code, extract it out to nf_conntrack_core, so that
nf_conntrack_bpf doesn't need a dependency on CONFIG_NF_CT_NETLINK.
Later this helper will be reused as a helper to set timeout of allocated
but not yet inserted CT entry.

The allocation functions return struct nf_conn___init instead of
nf_conn, to distinguish allocated CT from an already inserted or looked
up CT. This is later used to enforce restrictions on what kfuncs
allocated CT can be used with.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-8-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

d7e79c97

net: netfilter: Deduplicate code in bpf_{xdp,skb}_ct_lookup · aed8ee7f

Kumar Kartikeya Dwivedi authored Jul 21, 2022

Move common checks inside the common function, and maintain the only
difference the two being how to obtain the struct net * from ctx.
No functional change intended.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-7-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

aed8ee7f

bpf: Add documentation for kfuncs · 63e564eb

Kumar Kartikeya Dwivedi authored Jul 21, 2022

As the usage of kfuncs grows, we are starting to form consensus on the
kinds of attributes and annotations that kfuncs can have. To better help
developers make sense of the various options available at their disposal
to present an unstable API to the BPF users, document the various kfunc
flags and annotations, their expected usage, and explain the process of
defining and registering a kfunc set.

Cc: KP Singh <kpsingh@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-6-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

63e564eb

bpf: Add support for forcing kfunc args to be trusted · 56e948ff

Kumar Kartikeya Dwivedi authored Jul 21, 2022

Teach the verifier to detect a new KF_TRUSTED_ARGS kfunc flag, which
means each pointer argument must be trusted, which we define as a
pointer that is referenced (has non-zero ref_obj_id) and also needs to
have its offset unchanged, similar to how release functions expect their
argument. This allows a kfunc to receive pointer arguments unchanged
from the result of the acquire kfunc.

This is required to ensure that kfunc that operate on some object only
work on acquired pointers and not normal PTR_TO_BTF_ID with same type
which can be obtained by pointer walking. The restrictions applied to
release arguments also apply to trusted arguments. This implies that
strict type matching (not deducing type by recursively following members
at offset) and OBJ_RELEASE offset checks (ensuring they are zero) are
used for trusted pointer arguments.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-5-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

56e948ff

bpf: Switch to new kfunc flags infrastructure · a4703e31

Kumar Kartikeya Dwivedi authored Jul 21, 2022

Instead of populating multiple sets to indicate some attribute and then
researching the same BTF ID in them, prepare a single unified BTF set
which indicates whether a kfunc is allowed to be called, and also its
attributes if any at the same time. Now, only one call is needed to
perform the lookup for both kfunc availability and its attributes.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-4-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

a4703e31

tools/resolve_btfids: Add support for 8-byte BTF sets · ef2c6f37

Kumar Kartikeya Dwivedi authored Jul 21, 2022

A flag is a 4-byte symbol that may follow a BTF ID in a set8. This is
used in the kernel to tag kfuncs in BTF sets with certain flags. Add
support to adjust the sorting code so that it passes size as 8 bytes
for 8-byte BTF sets.

Cc: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220721134245.2450-3-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

ef2c6f37

bpf: Introduce 8-byte BTF set · ab21d606

Kumar Kartikeya Dwivedi authored Jul 21, 2022

Introduce support for defining flags for kfuncs using a new set of
macros, BTF_SET8_START/BTF_SET8_END, which define a set which contains
8 byte elements (each of which consists of a pair of BTF ID and flags),
using a new BTF_ID_FLAGS macro.

This will be used to tag kfuncs registered for a certain program type
as acquire, release, sleepable, ret_null, etc. without having to create
more and more sets which was proving to be an unscalable solution.

Now, when looking up whether a kfunc is allowed for a certain program,
we can also obtain its kfunc flags in the same call and avoid further
lookups.

The resolve_btfids change is split into a separate patch.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-2-memxor@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

ab21d606

net: ipv6: avoid accepting values greater than 2 for accept_untracked_na · b66eb3a6

Jaehee Park authored Jul 20, 2022

The accept_untracked_na sysctl changed from a boolean to an integer
when a new knob '2' was added. This patch provides a safeguard to avoid
accepting values that are not defined in the sysctl. When setting a
value greater than 2, the user will get an 'invalid argument' warning.

Fixes: aaa5f515 ("net: ipv6: new accept_untracked_na option to accept na only if in-network")
Signed-off-by: Jaehee Park <jhpark1013@gmail.com>
Suggested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Suggested-by: Roopa Prabhu <roopa@nvidia.com>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20220720183632.376138-1-jhpark1013@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b66eb3a6