Commits · a77c2cfd4ee4bfb5267653dad3de8ec45a58b0b7 · Kirill Smelkov / linux

11 Mar, 2022 10 commits

Merge branch 'bpf-lsm: Extend interoperability with IMA' · a77c2cfd

Alexei Starovoitov authored Mar 10, 2022

Roberto Sassu says:

====================
Extend the interoperability with IMA, to give wider flexibility for the
implementation of integrity-focused LSMs based on eBPF.

Patch 1 fixes some style issues.

Patches 2-6 give the ability to eBPF-based LSMs to take advantage of the
measurement capability of IMA without needing to setup a policy in IMA
(those LSMs might implement the policy capability themselves).

Patches 7-9 allow eBPF-based LSMs to evaluate files read by the kernel.

Changelog

v2:
- Add better description to patch 1 (suggested by Shuah)
- Recalculate digest if it is not fresh (when IMA_COLLECTED flag not set)
- Move declaration of bpf_ima_file_hash() at the end (suggested by
  Yonghong)
- Add tests to check if the digest has been recalculated
- Add deny test for bpf_kernel_read_file()
- Add description to tests

v1:
- Modify ima_file_hash() only and allow the usage of the function with the
  modified behavior by eBPF-based LSMs through the new function
  bpf_ima_file_hash() (suggested by Mimi)
- Make bpf_lsm_kernel_read_file() sleepable so that bpf_ima_inode_hash()
  and bpf_ima_file_hash() can be called inside the implementation of
  eBPF-based LSMs for this hook
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

a77c2cfd

selftests/bpf: Check that bpf_kernel_read_file() denies reading IMA policy · 7bae42b6

Roberto Sassu authored Mar 02, 2022

Check that bpf_kernel_read_file() denies the reading of an IMA policy, by
ensuring that ima_setup.sh exits with an error.
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220302111404.193900-10-roberto.sassu@huawei.com

7bae42b6

selftests/bpf: Add test for bpf_lsm_kernel_read_file() · e6dcf7bb

Roberto Sassu authored Mar 02, 2022

Test the ability of bpf_lsm_kernel_read_file() to call the sleepable
functions bpf_ima_inode_hash() or bpf_ima_file_hash() to obtain a
measurement of a loaded IMA policy.
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220302111404.193900-9-roberto.sassu@huawei.com

e6dcf7bb

bpf-lsm: Make bpf_lsm_kernel_read_file() as sleepable · df6b3039

Roberto Sassu authored Mar 02, 2022

Make bpf_lsm_kernel_read_file() as sleepable, so that bpf_ima_inode_hash()
or bpf_ima_file_hash() can be called inside the implementation of this
hook.
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220302111404.193900-8-roberto.sassu@huawei.com

df6b3039

selftests/bpf: Check if the digest is refreshed after a file write · 91e8fa25

Roberto Sassu authored Mar 02, 2022

Verify that bpf_ima_inode_hash() returns a non-fresh digest after a file
write, and that bpf_ima_file_hash() returns a fresh digest. Verification is
done by requesting the digest from the bprm_creds_for_exec hook, called
before ima_bprm_check().
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220302111404.193900-7-roberto.sassu@huawei.com

91e8fa25

selftests/bpf: Add test for bpf_ima_file_hash() · 27a77d0d

Roberto Sassu authored Mar 02, 2022

Add new test to ensure that bpf_ima_file_hash() returns the digest of the
executed files.
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220302111404.193900-6-roberto.sassu@huawei.com

27a77d0d

selftests/bpf: Move sample generation code to ima_test_common() · 2746de3c

Roberto Sassu authored Mar 02, 2022

Move sample generator code to ima_test_common() so that the new function
can be called by multiple LSM hooks.
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220302111404.193900-5-roberto.sassu@huawei.com

2746de3c

bpf-lsm: Introduce new helper bpf_ima_file_hash() · 174b1694

Roberto Sassu authored Mar 02, 2022

ima_file_hash() has been modified to calculate the measurement of a file on
demand, if it has not been already performed by IMA or the measurement is
not fresh. For compatibility reasons, ima_inode_hash() remains unchanged.

Keep the same approach in eBPF and introduce the new helper
bpf_ima_file_hash() to take advantage of the modified behavior of
ima_file_hash().
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220302111404.193900-4-roberto.sassu@huawei.com

174b1694

ima: Always return a file measurement in ima_file_hash() · 280fe836

Roberto Sassu authored Mar 02, 2022

__ima_inode_hash() checks if a digest has been already calculated by
looking for the integrity_iint_cache structure associated to the passed
inode.

Users of ima_file_hash() (e.g. eBPF) might be interested in obtaining the
information without having to setup an IMA policy so that the digest is
always available at the time they call this function.

In addition, they likely expect the digest to be fresh, e.g. recalculated
by IMA after a file write. Although getting the digest from the
bprm_committed_creds hook (as in the eBPF test) ensures that the digest is
fresh, as the IMA hook is executed before that hook, this is not always the
case (e.g. for the mmap_file hook).

Call ima_collect_measurement() in __ima_inode_hash(), if the file
descriptor is available (passed by ima_file_hash()) and the digest is not
available/not fresh, and store the file measurement in a temporary
integrity_iint_cache structure.

This change does not cause memory usage increase, due to using the
temporary integrity_iint_cache structure, and due to freeing the
ima_digest_data structure inside integrity_iint_cache before exiting from
__ima_inode_hash().

For compatibility reasons, the behavior of ima_inode_hash() remains
unchanged.
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Link: https://lore.kernel.org/bpf/20220302111404.193900-3-roberto.sassu@huawei.com

280fe836

ima: Fix documentation-related warnings in ima_main.c · bae60eef

Roberto Sassu authored Mar 02, 2022

Fix the following warnings in ima_main.c, displayed with W=n make argument:

security/integrity/ima/ima_main.c:432: warning: Function parameter or
                          member 'vma' not described in 'ima_file_mprotect'
security/integrity/ima/ima_main.c:636: warning: Function parameter or
                  member 'inode' not described in 'ima_post_create_tmpfile'
security/integrity/ima/ima_main.c:636: warning: Excess function parameter
                            'file' description in 'ima_post_create_tmpfile'
security/integrity/ima/ima_main.c:843: warning: Function parameter or
                     member 'load_id' not described in 'ima_post_load_data'
security/integrity/ima/ima_main.c:843: warning: Excess function parameter
                                   'id' description in 'ima_post_load_data'

Also, fix some style issues in the description of ima_post_create_tmpfile()
and ima_post_path_mknod().
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Link: https://lore.kernel.org/bpf/20220302111404.193900-2-roberto.sassu@huawei.com

bae60eef

10 Mar, 2022 12 commits

bpftool: Ensure bytes_memlock json output is correct · 357b3cc3

Chris J Arges authored Mar 09, 2022

If a BPF map is created over 2^32 the memlock value as displayed in JSON
format will be incorrect. Use atoll instead of atoi so that the correct
number is displayed.

  ```
  $ bpftool map create /sys/fs/bpf/test_bpfmap type hash key 4 \
    value 1024 entries 4194304 name test_bpfmap
  $ bpftool map list
  1: hash  name test_bpfmap  flags 0x0
          key 4B  value 1024B  max_entries 4194304  memlock 4328521728B
  $ sudo bpftool map list -j | jq .[].bytes_memlock
  33554432
  ```
Signed-off-by: Chris J Arges <carges@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/b6601087-0b11-33cc-904a-1133d1500a10@cloudflare.com

357b3cc3

bpf: Use offsetofend() to simplify macro definition · 1b773d00

Yuntao Wang authored Mar 11, 2022

Use offsetofend() instead of offsetof() + sizeof() to simplify
MIN_BPF_LINEINFO_SIZE macro definition.
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/bpf/20220310161518.534544-1-ytcoode@gmail.com

1b773d00

bpf: Fix comment for helper bpf_current_task_under_cgroup() · 58617014

Hengqi Chen authored Mar 10, 2022

Fix the descriptions of the return values of helper bpf_current_task_under_cgroup().

Fixes: c6b5fb86 ("bpf: add documentation for eBPF helpers (42-50)")
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220310155335.1278783-1-hengqi.chen@gmail.com

58617014

Merge branch 'bpf-tstamp-follow-ups' · 60695896

Daniel Borkmann authored Mar 10, 2022

Martin KaFai Lau says:

====================
This set is a follow up on the bpf side based on discussion [0].

Patch 1 is to remove some skbuff macros that are used in bpf filter.c.

Patch 2 and 3 are to simplify the bpf insn rewrite on __sk_buff->tstamp.

Patch 4 is to simplify the bpf uapi by modeling the __sk_buff->tstamp
and __sk_buff->tstamp_type (was delivery_time_type) the same as its kernel
counter part skb->tstamp and skb->mono_delivery_time.

Patch 5 is to adjust the bpf selftests due to changes in patch 4.

  [0]: https://lore.kernel.org/bpf/419d994e-ff61-7c11-0ec7-11fefcb0186e@iogearbox.net/
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

60695896

bpf: selftests: Update tests after s/delivery_time/tstamp/ change in bpf.h · 3daf0896

Martin KaFai Lau authored Mar 09, 2022

The previous patch made the follow changes:
- s/delivery_time_type/tstamp_type/
- s/bpf_skb_set_delivery_time/bpf_skb_set_tstamp/
- BPF_SKB_DELIVERY_TIME_* to BPF_SKB_TSTAMP_*

This patch is to change the test_tc_dtime.c to reflect the above.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220309090515.3712742-1-kafai@fb.com

3daf0896

bpf: Remove BPF_SKB_DELIVERY_TIME_NONE and rename s/delivery_time_/tstamp_/ · 9bb984f2

Martin KaFai Lau authored Mar 09, 2022

This patch is to simplify the uapi bpf.h regarding to the tstamp type
and use a similar way as the kernel to describe the value stored
in __sk_buff->tstamp.

My earlier thought was to avoid describing the semantic and
clock base for the rcv timestamp until there is more clarity
on the use case, so the __sk_buff->delivery_time_type naming instead
of __sk_buff->tstamp_type.

With some thoughts, it can reuse the UNSPEC naming.  This patch first
removes BPF_SKB_DELIVERY_TIME_NONE and also

rename BPF_SKB_DELIVERY_TIME_UNSPEC to BPF_SKB_TSTAMP_UNSPEC
and    BPF_SKB_DELIVERY_TIME_MONO   to BPF_SKB_TSTAMP_DELIVERY_MONO.

The semantic of BPF_SKB_TSTAMP_DELIVERY_MONO is the same:
__sk_buff->tstamp has delivery time in mono clock base.

BPF_SKB_TSTAMP_UNSPEC means __sk_buff->tstamp has the (rcv)
tstamp at ingress and the delivery time at egress.  At egress,
the clock base could be found from skb->sk->sk_clockid.
__sk_buff->tstamp == 0 naturally means NONE, so NONE is not needed.

With BPF_SKB_TSTAMP_UNSPEC for the rcv tstamp at ingress,
the __sk_buff->delivery_time_type is also renamed to __sk_buff->tstamp_type
which was also suggested in the earlier discussion:
https://lore.kernel.org/bpf/b181acbe-caf8-502d-4b7b-7d96b9fc5d55@iogearbox.net/

The above will then make __sk_buff->tstamp and __sk_buff->tstamp_type
the same as its kernel skb->tstamp and skb->mono_delivery_time
counter part.

The internal kernel function bpf_skb_convert_dtime_type_read() is then
renamed to bpf_skb_convert_tstamp_type_read() and it can be simplified
with the BPF_SKB_DELIVERY_TIME_NONE gone.  A BPF_ALU32_IMM(BPF_AND)
insn is also saved by using BPF_JMP32_IMM(BPF_JSET).

The bpf helper bpf_skb_set_delivery_time() is also renamed to
bpf_skb_set_tstamp().  The arg name is changed from dtime
to tstamp also.  It only allows setting tstamp 0 for
BPF_SKB_TSTAMP_UNSPEC and it could be relaxed later
if there is use case to change mono delivery time to
non mono.

prog->delivery_time_access is also renamed to prog->tstamp_type_access.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220309090509.3712315-1-kafai@fb.com

9bb984f2

bpf: Simplify insn rewrite on BPF_WRITE __sk_buff->tstamp · 9d90db97

Martin KaFai Lau authored Mar 09, 2022

BPF_JMP32_IMM(BPF_JSET) is used to save a BPF_ALU32_IMM(BPF_AND).

The skb->tc_at_ingress and skb->mono_delivery_time are at the same
offset, so only one BPF_LDX_MEM(BPF_B) is needed.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220309090502.3711982-1-kafai@fb.com

9d90db97

bpf: Simplify insn rewrite on BPF_READ __sk_buff->tstamp · 539de932

Martin KaFai Lau authored Mar 09, 2022

The skb->tc_at_ingress and skb->mono_delivery_time are at the same
byte offset.  Thus, only one BPF_LDX_MEM(BPF_B) is needed
and both bits can be tested together.

/* BPF_READ: a = __sk_buff->tstamp */
if (skb->tc_at_ingress && skb->mono_delivery_time)
	a = 0;
else
	a = skb->tstamp;
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220309090456.3711530-1-kafai@fb.com

539de932

bpf: net: Remove TC_AT_INGRESS_OFFSET and SKB_MONO_DELIVERY_TIME_OFFSET macro · 3b5d4ddf

Martin KaFai Lau authored Mar 09, 2022

This patch removes the TC_AT_INGRESS_OFFSET and
SKB_MONO_DELIVERY_TIME_OFFSET macros. Instead, PKT_VLAN_PRESENT_OFFSET
is used because all of them are at the same offset. Comment is added to
make it clear that changing the position of tc_at_ingress or
mono_delivery_time will require to adjust the defined macros.

The earlier discussion can be found here:
https://lore.kernel.org/bpf/419d994e-ff61-7c11-0ec7-11fefcb0186e@iogearbox.net/Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220309090450.3710955-1-kafai@fb.com

3b5d4ddf

bpf, test_run: Use kvfree() for memory allocated with kvmalloc() · 743bec1b

Yihao Han authored Mar 10, 2022

It is allocated with kvmalloc(), the corresponding release function
should not be kfree(), use kvfree() instead.

Generated by: scripts/coccinelle/api/kfree_mismatch.cocci

Fixes: b530e9e1 ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
Signed-off-by: Yihao Han <hanyihao@vivo.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20220310092828.13405-1-hanyihao@vivo.com

743bec1b

bpf: Initialise retval in bpf_prog_test_run_xdp() · eecbfd97

Toke Høiland-Jørgensen authored Mar 10, 2022

The kernel test robot pointed out that the newly added
bpf_test_run_xdp_live() runner doesn't set the retval in the caller (by
design), which means that the variable can be passed unitialised to
bpf_test_finish(). Fix this by initialising the variable properly.

Fixes: b530e9e1 ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220310110228.161869-1-toke@redhat.com

eecbfd97

bpftool: Restore support for BPF offload-enabled feature probing · f655c088

Niklas Söderlund authored Mar 10, 2022

Commit 1a56c18e ("bpftool: Stop supporting BPF offload-enabled
feature probing") removed the support to probe for BPF offload features.
This is still something that is useful for NFP NIC that can support
offloading of BPF programs.

The reason for the dropped support was that libbpf starting with v1.0
would drop support for passing the ifindex to the BPF prog/map/helper
feature probing APIs. In order to keep this useful feature for NFP
restore the functionality by moving it directly into bpftool.

The code restored is a simplified version of the code that existed in
libbpf which supposed passing the ifindex. The simplification is that it
only targets the cases where ifindex is given and call into libbpf for
the cases where it's not.

Before restoring support for probing offload features:

  # bpftool feature probe dev ens4np0
  Scanning system call availability...
  bpf() syscall is available

  Scanning eBPF program types...

  Scanning eBPF map types...

  Scanning eBPF helper functions...
  eBPF helpers supported for program type sched_cls:
  eBPF helpers supported for program type xdp:

  Scanning miscellaneous eBPF features...
  Large program size limit is NOT available
  Bounded loop support is NOT available
  ISA extension v2 is NOT available
  ISA extension v3 is NOT available

With support for probing offload features restored:

  # bpftool feature probe dev ens4np0
  Scanning system call availability...
  bpf() syscall is available

  Scanning eBPF program types...
  eBPF program_type sched_cls is available
  eBPF program_type xdp is available

  Scanning eBPF map types...
  eBPF map_type hash is available
  eBPF map_type array is available

  Scanning eBPF helper functions...
  eBPF helpers supported for program type sched_cls:
  	- bpf_map_lookup_elem
  	- bpf_get_prandom_u32
  	- bpf_perf_event_output
  eBPF helpers supported for program type xdp:
  	- bpf_map_lookup_elem
  	- bpf_get_prandom_u32
  	- bpf_perf_event_output
  	- bpf_xdp_adjust_head
  	- bpf_xdp_adjust_tail

  Scanning miscellaneous eBPF features...
  Large program size limit is NOT available
  Bounded loop support is NOT available
  ISA extension v2 is NOT available
  ISA extension v3 is NOT available
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220310121846.921256-1-niklas.soderlund@corigine.com

f655c088

09 Mar, 2022 10 commits

Merge branch 'Add support for transmitting packets using XDP in bpf_prog_run()' · de55c9a1

Alexei Starovoitov authored Mar 09, 2022

Toke Høiland-Jørgensen says:

====================

This series adds support for transmitting packets using XDP in
bpf_prog_run(), by enabling a new mode "live packet" mode which will handle
the XDP program return codes and redirect the packets to the stack or other
devices.

The primary use case for this is testing the redirect map types and the
ndo_xdp_xmit driver operation without an external traffic generator. But it
turns out to also be useful for creating a programmable traffic generator
in XDP, as well as injecting frames into the stack. A sample traffic
generator, which was included in previous versions of the series, but now
moved to xdp-tools, transmits up to 9 Mpps/core on my test machine.

To transmit the frames, the new mode instantiates a page_pool structure in
bpf_prog_run() and initialises the pages to contain XDP frames with the
data passed in by userspace. These frames can then be handled as though
they came from the hardware XDP path, and the existing page_pool code takes
care of returning and recycling them. The setup is optimised for high
performance with a high number of repetitions to support stress testing and
the traffic generator use case; see patch 1 for details.

v11:
- Fix override of return code in xdp_test_run_batch()
- Add Martin's ACKs to remaining patches

v10:
- Only propagate memory allocation errors from xdp_test_run_batch()
- Get rid of BPF_F_TEST_XDP_RESERVED; batch_size can be used to probe
- Check that batch_size is unset in non-XDP test_run funcs
- Lower the number of repetitions in the selftest to 10k
- Count number of recycled pages in the selftest
- Fix a few other nits from Martin, carry forward ACKs

v9:
- XDP_DROP packets in the selftest to ensure pages are recycled
- Fix a few issues reported by the kernel test robot
- Rewrite the documentation of the batch size to make it a bit clearer
- Rebase to newest bpf-next

v8:
- Make the batch size configurable from userspace
- Don't interrupt the packet loop on errors in do_redirect (this can be
  caught from the tracepoint)
- Add documentation of the feature
- Add reserved flag userspace can use to probe for support (kernel didn't
  check flags previously)
- Rebase to newest bpf-next, disallow live mode for jumbo frames

v7:
- Extend the local_bh_disable() to cover the full test run loop, to prevent
  running concurrently with the softirq. Fixes a deadlock with veth xmit.
- Reinstate the forwarding sysctl setting in the selftest, and bump up the
  number of packets being transmitted to trigger the above bug.
- Update commit message to make it clear that user space can select the
  ingress interface.

v6:
- Fix meta vs data pointer setting and add a selftest for it
- Add local_bh_disable() around code passing packets up the stack
- Create a new netns for the selftest and use a TC program instead of the
  forwarding hack to count packets being XDP_PASS'ed from the test prog.
- Check for the correct ingress ifindex in the selftest
- Rebase and drop patches 1-5 that were already merged

v5:
- Rebase to current bpf-next

v4:
- Fix a few code style issues (Alexei)
- Also handle the other return codes: XDP_PASS builds skbs and injects them
  into the stack, and XDP_TX is turned into a redirect out the same
  interface (Alexei).
- Drop the last patch adding an xdp_trafficgen program to samples/bpf; this
  will live in xdp-tools instead (Alexei).
- Add a separate bpf_test_run_xdp_live() function to test_run.c instead of
  entangling the new mode in the existing bpf_test_run().

v3:
- Reorder patches to make sure they all build individually (Patchwork)
- Remove a couple of unused variables (Patchwork)
- Remove unlikely() annotation in slow path and add back John's ACK that I
  accidentally dropped for v2 (John)

v2:
- Split up up __xdp_do_redirect to avoid passing two pointers to it (John)
- Always reset context pointers before each test run (John)
- Use get_mac_addr() from xdp_sample_user.h instead of rolling our own (Kumar)
- Fix wrong offset for metadata pointer
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

de55c9a1

selftests/bpf: Add selftest for XDP_REDIRECT in BPF_PROG_RUN · 55fcacca

Toke Høiland-Jørgensen authored Mar 09, 2022

This adds a selftest for the XDP_REDIRECT facility in BPF_PROG_RUN, that
redirects packets into a veth and counts them using an XDP program on the
other side of the veth pair and a TC program on the local side of the veth.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220309105346.100053-6-toke@redhat.com

55fcacca

selftests/bpf: Move open_netns() and close_netns() into network_helpers.c · a3033884

Toke Høiland-Jørgensen authored Mar 09, 2022

These will also be used by the xdp_do_redirect test being added in the next
commit.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220309105346.100053-5-toke@redhat.com

a3033884

libbpf: Support batch_size option to bpf_prog_test_run · 24592ad1

Toke Høiland-Jørgensen authored Mar 09, 2022

Add support for setting the new batch_size parameter to BPF_PROG_TEST_RUN
to libbpf; just add it as an option and pass it through to the kernel.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220309105346.100053-4-toke@redhat.com

24592ad1

Documentation/bpf: Add documentation for BPF_PROG_RUN · 1a7551f1

Toke Høiland-Jørgensen authored Mar 09, 2022

This adds documentation for the BPF_PROG_RUN command; a short overview of
the command itself, and a more verbose description of the "live packet"
mode for XDP introduced in the previous commit.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220309105346.100053-3-toke@redhat.com

1a7551f1

bpf: Add "live packet" mode for XDP in BPF_PROG_RUN · b530e9e1

Toke Høiland-Jørgensen authored Mar 09, 2022

This adds support for running XDP programs through BPF_PROG_RUN in a mode
that enables live packet processing of the resulting frames. Previous uses
of BPF_PROG_RUN for XDP returned the XDP program return code and the
modified packet data to userspace, which is useful for unit testing of XDP
programs.

The existing BPF_PROG_RUN for XDP allows userspace to set the ingress
ifindex and RXQ number as part of the context object being passed to the
kernel. This patch reuses that code, but adds a new mode with different
semantics, which can be selected with the new BPF_F_TEST_XDP_LIVE_FRAMES
flag.

When running BPF_PROG_RUN in this mode, the XDP program return codes will
be honoured: returning XDP_PASS will result in the frame being injected
into the networking stack as if it came from the selected networking
interface, while returning XDP_TX and XDP_REDIRECT will result in the frame
being transmitted out that interface. XDP_TX is translated into an
XDP_REDIRECT operation to the same interface, since the real XDP_TX action
is only possible from within the network drivers themselves, not from the
process context where BPF_PROG_RUN is executed.

Internally, this new mode of operation creates a page pool instance while
setting up the test run, and feeds pages from that into the XDP program.
The setup cost of this is amortised over the number of repetitions
specified by userspace.

To support the performance testing use case, we further optimise the setup
step so that all pages in the pool are pre-initialised with the packet
data, and pre-computed context and xdp_frame objects stored at the start of
each page. This makes it possible to entirely avoid touching the page
content on each XDP program invocation, and enables sending up to 9
Mpps/core on my test box.

Because the data pages are recycled by the page pool, and the test runner
doesn't re-initialise them for each run, subsequent invocations of the XDP
program will see the packet data in the state it was after the last time it
ran on that particular page. This means that an XDP program that modifies
the packet before redirecting it has to be careful about which assumptions
it makes about the packet content, but that is only an issue for the most
naively written programs.

Enabling the new flag is only allowed when not setting ctx_out and data_out
in the test specification, since using it means frames will be redirected
somewhere else, so they can't be returned.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220309105346.100053-2-toke@redhat.com

b530e9e1

Merge branch 'BPF test_progs tests improvement' · 3399dd9f

Andrii Nakryiko authored Mar 08, 2022

Mykola Lysenko says:

====================

First patch reduces the sample_freq to 1000 to ensure test will
work even when kernel.perf_event_max_sample_rate was reduced to 1000.

Patches for send_signal and find_vma tune the test implementation to
make sure needed thread is scheduled. Also, both tests will finish as
soon as possible after the test condition is met.
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

3399dd9f

Improve stability of find_vma BPF test · ba83af05

Mykola Lysenko authored Mar 08, 2022

Remove unneeded spleep and increase length of dummy CPU
intensive computation to guarantee test process execution.
Also, complete aforemention computation as soon as
test success criteria is met
Signed-off-by: Mykola Lysenko <mykolal@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220308200449.1757478-4-mykolal@fb.com

ba83af05

Improve send_signal BPF test stability · 1fd49864

Mykola Lysenko authored Mar 08, 2022

Substitute sleep with dummy CPU intensive computation.
Finish aforemention computation as soon as signal was
delivered to the test process. Make the BPF code to
only execute when PID global variable is set
Signed-off-by: Mykola Lysenko <mykolal@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220308200449.1757478-3-mykolal@fb.com

1fd49864

Mykola Lysenko authored Mar 08, 2022

Linux kernel may automatically reduce kernel.perf_event_max_sample_rate
value when running tests in parallel on slow systems. Linux kernel checks
against this limit when opening perf event with freq=1 parameter set.
The lower bound is 1000. This patch reduces sample_freq value to 1000
in all BPF tests that use sample_freq to ensure they always can open
perf event.
Signed-off-by: Mykola Lysenko <mykolal@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220308200449.1757478-2-mykolal@fb.com

d4b54054

08 Mar, 2022 8 commits

tools: Fix unavoidable GCC call in Clang builds · 7fd9fd46

Adrian Ratiu authored Mar 08, 2022

In ChromeOS and Gentoo we catch any unwanted mixed Clang/LLVM
and GCC/binutils usage via toolchain wrappers which fail builds.
This has revealed that GCC is called unconditionally in Clang
configured builds to populate GCC_TOOLCHAIN_DIR.

Allow the user to override CLANG_CROSS_FLAGS to avoid the GCC
call - in our case we set the var directly in the ebuild recipe.

In theory Clang could be able to autodetect these settings so
this logic could be removed entirely, but in practice as the
commit cebdb737 ("tools: Help cross-building with clang")
mentions, this does not always work, so giving distributions
more control to specify their flags & sysroot is beneficial.
Suggested-by: Manoj Gupta <manojgupta@chromium.com>
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/lkml/87czjk4osi.fsf@ryzen9.i-did-not-set--mail-host-address--so-tickle-me
Link: https://lore.kernel.org/bpf/20220308121428.81735-1-adrian.ratiu@collabora.com

7fd9fd46

selftests/bpf: Make test_lwt_ip_encap more stable and faster · d23a8720

Felix Maurer authored Mar 03, 2022

In test_lwt_ip_encap, the ingress IPv6 encap test failed from time to
time. The failure occured when an IPv4 ping through the IPv6 GRE
encapsulation did not receive a reply within the timeout. The IPv4 ping
and the IPv6 ping in the test used different timeouts (1 sec for IPv4
and 6 sec for IPv6), probably taking into account that IPv6 might need
longer to successfully complete. However, when IPv4 pings (with the
short timeout) are encapsulated into the IPv6 tunnel, the delays of IPv6
apply.

The actual reason for the long delays with IPv6 was that the IPv6
neighbor discovery sometimes did not complete in time. This was caused
by the outgoing interface only having a tentative link local address,
i.e., not having completed DAD for that lladdr. The ND was successfully
retried after 1 sec but that was too late for the ping timeout.

The IPv6 addresses for the test were already added with nodad. However,
for the lladdrs, DAD was still performed. We now disable DAD in the test
netns completely and just assume that the two lladdrs on each veth pair
do not collide. This removes all the delays for IPv6 traffic in the
test.

Without the delays, we can now also reduce the delay of the IPv6 ping to
1 sec. This makes the whole test complete faster because we don't need
to wait for the excessive timeout for each IPv6 ping that is supposed
to fail.

Fixes: 0fde56e4 ("selftests: bpf: add test_lwt_ip_encap selftest")
Signed-off-by: Felix Maurer <fmaurer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/4987d549d48b4e316cd5b3936de69c8d4bc75a4f.1646305899.git.fmaurer@redhat.com

d23a8720

bpf: Determine buf_info inside check_buffer_access() · 44e9a741

Shung-Hsi Yu authored Mar 07, 2022

Instead of determining buf_info string in the caller of check_buffer_access(),
we can determine whether the register type is read-only through
type_is_rdonly_mem() helper inside check_buffer_access() and construct
buf_info, making the code slightly cleaner.
Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/YiWYLnAkEZXBP/gH@syu-laptop

44e9a741

bpf/docs: Update list of architectures supported. · e878ae2d

KP Singh authored Mar 07, 2022

vmtest.sh also supports s390x now.
Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220307133048.1287644-2-kpsingh@kernel.org

e878ae2d

bpf/docs: Update vmtest docs for static linking · 5ad0a415

KP Singh authored Mar 07, 2022

Dynamic linking when compiling on the host can cause issues when the
libc version does not match the one in the VM image. Update the
docs to explain how to do this.

Before:
  ./vmtest.sh -- ./test_progs -t test_ima
  ./test_progs: /usr/lib/libc.so.6: version `GLIBC_2.33' not found (required by ./test_progs)

After:

  LDLIBS=-static ./vmtest.sh -- ./test_progs -t test_ima
  test_ima:OK
  Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
Reported-by: "Geyslan G. Bem" <geyslan@gmail.com>
Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220307133048.1287644-1-kpsingh@kernel.org

5ad0a415

bpf: Remove redundant slash · 4989135a

Yuntao Wang authored Mar 06, 2022

The trailing slash of LIBBPF_SRCS is redundant, remove it. Also inline
it as its only used in LIBBPF_INCLUDE.
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220305161013.361646-1-ytcoode@gmail.com

4989135a

libbpf: Fix array_size.cocci warning · 04b6de64

Guo Zhengkui authored Mar 06, 2022

Fix the following coccicheck warning:
tools/lib/bpf/bpf.c:114:31-32: WARNING: Use ARRAY_SIZE
tools/lib/bpf/xsk.c:484:34-35: WARNING: Use ARRAY_SIZE
tools/lib/bpf/xsk.c:485:35-36: WARNING: Use ARRAY_SIZE

It has been tested with gcc (Debian 8.3.0-6) 8.3.0 on x86_64.
Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220306023426.19324-1-guozhengkui@vivo.com

04b6de64

bpf: Replace strncpy() with strscpy() · 03b9c7fa

Yuntao Wang authored Mar 04, 2022

Using strncpy() on NUL-terminated strings is considered deprecated[1].
Moreover, if the length of 'task->comm' is less than the destination buffer
size, strncpy() will NUL-pad the destination buffer, which is a needless
performance penalty.

Replacing strncpy() with strscpy() fixes all these issues.

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-stringsSigned-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220304070408.233658-1-ytcoode@gmail.com

03b9c7fa