Commits · 080b7f5733fd1283509e667bb49d19cc1b245083 · Kirill Smelkov / linux

14 Jul, 2023 4 commits

selftests: mptcp: add fastclose env var · 080b7f57

Geliang Tang authored Jul 12, 2023

Use a new env var fastclose instead of passing fastclose to addr_nr_ns2.
It can be set with 'server' or 'client':

  addr_nr_ns2=fastclose_client \
          run_tests $ns1 $ns2 10.0.1.1

  ->

  fastclose=client \
          run_tests $ns1 $ns2 10.0.1.1.

With this change, the fullmesh flag setting code can be moved into
pm_nl_set_endpoint() from do_transfer().
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20230712-upstream-net-next-20230712-selftests-mptcp-use-local-env-v1-2-f1c8b62fbf95@tessares.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>

080b7f57

selftests: mptcp: set all env vars as local ones · 662aa22d

Geliang Tang authored Jul 12, 2023

It would be better to move the declaration of all the env variables to
do_transfer(), run_tests(), or pm_nl_set_endpoint() as local variables,
instead of exporting them globally at the beginning of the file.
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20230712-upstream-net-next-20230712-selftests-mptcp-use-local-env-v1-1-f1c8b62fbf95@tessares.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>

662aa22d

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · d2afa89f

Jakub Kicinski authored Jul 13, 2023

Alexei Starovoitov says:

====================
pull-request: bpf-next 2023-07-13

We've added 67 non-merge commits during the last 15 day(s) which contain
a total of 106 files changed, 4444 insertions(+), 619 deletions(-).

The main changes are:

1) Fix bpftool build in presence of stale vmlinux.h,
   from Alexander Lobakin.

2) Introduce bpf_me_mcache_free_rcu() and fix OOM under stress,
   from Alexei Starovoitov.

3) Teach verifier actual bounds of bpf_get_smp_processor_id()
   and fix perf+libbpf issue related to custom section handling,
   from Andrii Nakryiko.

4) Introduce bpf map element count, from Anton Protopopov.

5) Check skb ownership against full socket, from Kui-Feng Lee.

6) Support for up to 12 arguments in BPF trampoline, from Menglong Dong.

7) Export rcu_request_urgent_qs_task, from Paul E. McKenney.

8) Fix BTF walking of unions, from Yafang Shao.

9) Extend link_info for kprobe_multi and perf_event links,
   from Yafang Shao.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (67 commits)
  selftests/bpf: Add selftest for PTR_UNTRUSTED
  bpf: Fix an error in verifying a field in a union
  selftests/bpf: Add selftests for nested_trust
  bpf: Fix an error around PTR_UNTRUSTED
  selftests/bpf: add testcase for TRACING with 6+ arguments
  bpf, x86: allow function arguments up to 12 for TRACING
  bpf, x86: save/restore regs with BPF_DW size
  bpftool: Use "fallthrough;" keyword instead of comments
  bpf: Add object leak check.
  bpf: Convert bpf_cpumask to bpf_mem_cache_free_rcu.
  bpf: Introduce bpf_mem_free_rcu() similar to kfree_rcu().
  selftests/bpf: Improve test coverage of bpf_mem_alloc.
  rcu: Export rcu_request_urgent_qs_task()
  bpf: Allow reuse from waiting_for_gp_ttrace list.
  bpf: Add a hint to allocated objects.
  bpf: Change bpf_mem_cache draining process.
  bpf: Further refactor alloc_bulk().
  bpf: Factor out inc/dec of active flag into helpers.
  bpf: Refactor alloc_bulk().
  bpf: Let free_all() return the number of freed elements.
  ...
====================

Link: https://lore.kernel.org/r/20230714020910.80794-1-alexei.starovoitov@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

d2afa89f

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · eb1b24a9

Jakub Kicinski authored Jul 13, 2023

Cross-merge networking fixes after downstream PR.

No conflicts or adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

eb1b24a9

13 Jul, 2023 27 commits

selftests/bpf: Add selftest for PTR_UNTRUSTED · 1cd0e771

Yafang Shao authored Jul 13, 2023

Add a new selftest to check the PTR_UNTRUSTED condition. Below is the
result,

 #160     ptr_untrusted:OK
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230713025642.27477-5-laoar.shao@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

1cd0e771

bpf: Fix an error in verifying a field in a union · 33937607

Yafang Shao authored Jul 13, 2023

We are utilizing BPF LSM to monitor BPF operations within our container
environment. When we add support for raw_tracepoint, it hits below
error.

; (const void *)attr->raw_tracepoint.name);
27: (79) r3 = *(u64 *)(r2 +0)
access beyond the end of member map_type (mend:4) in struct (anon) with off 0 size 8

It can be reproduced with below BPF prog.

SEC("lsm/bpf")
int BPF_PROG(bpf_audit, int cmd, union bpf_attr *attr, unsigned int size)
{
	switch (cmd) {
	case BPF_RAW_TRACEPOINT_OPEN:
		bpf_printk("raw_tracepoint is %s", attr->raw_tracepoint.name);
		break;
	default:
		break;
	}
	return 0;
}

The reason is that when accessing a field in a union, such as bpf_attr,
if the field is located within a nested struct that is not the first
member of the union, it can result in incorrect field verification.

  union bpf_attr {
      struct {
          __u32 map_type; <<<< Actually it will find that field.
          __u32 key_size;
          __u32 value_size;
         ...
      };
      ...
      struct {
          __u64 name;    <<<< We want to verify this field.
          __u32 prog_fd;
      } raw_tracepoint;
  };

Considering the potential deep nesting levels, finding a perfect
solution to address this issue has proven challenging. Therefore, I
propose a solution where we simply skip the verification process if the
field in question is located within a union.

Fixes: 7e3617a7 ("bpf: Add array support to btf_struct_access")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230713025642.27477-4-laoar.shao@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

33937607

selftests/bpf: Add selftests for nested_trust · d2284d68

Yafang Shao authored Jul 13, 2023

Add selftests for nested_strust to check whehter PTR_UNTRUSTED is cleared
as expected, the result as follows:

 #141/1   nested_trust/test_read_cpumask:OK
 #141/2   nested_trust/test_skb_field:OK                    <<<<
 #141/3   nested_trust/test_invalid_nested_user_cpus:OK
 #141/4   nested_trust/test_invalid_nested_offset:OK
 #141/5   nested_trust/test_invalid_skb_field:OK            <<<<
 #141     nested_trust:OK

The #141/2 and #141/5 are newly added.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230713025642.27477-3-laoar.shao@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

d2284d68

bpf: Fix an error around PTR_UNTRUSTED · 7ce4dc3e

Yafang Shao authored Jul 13, 2023

Per discussion with Alexei, the PTR_UNTRUSTED flag should not been
cleared when we start to walk a new struct, because the struct in
question may be a struct nested in a union. We should also check and set
this flag before we walk its each member, in case itself is a union.
We will clear this flag if the field is BTF_TYPE_SAFE_RCU_OR_NULL.

Fixes: 6fcd486b ("bpf: Refactor RCU enforcement in the verifier.")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230713025642.27477-2-laoar.shao@gmail.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

7ce4dc3e

Merge branch 'bpf-x86-allow-function-arguments-up-to-12-for-tracing' · f892cac2

Alexei Starovoitov authored Jul 13, 2023

Menglong Dong says:

====================
bpf, x86: allow function arguments up to 12 for TRACING

From: Menglong Dong <imagedong@tencent.com>

For now, the BPF program of type BPF_PROG_TYPE_TRACING can only be used
on the kernel functions whose arguments count less than or equal to 6, if
not considering '> 8 bytes' struct argument. This is not friendly at all,
as too many functions have arguments count more than 6. According to the
current kernel version, below is a statistics of the function arguments
count:

argument count | function count
7              | 704
8              | 270
9              | 84
10             | 47
11             | 47
12             | 27
13             | 22
14             | 5
15             | 0
16             | 1

Therefore, let's enhance it by increasing the function arguments count
allowed in arch_prepare_bpf_trampoline(), for now, only x86_64.

In the 1st patch, we save/restore regs with BPF_DW size to make the code
in save_regs()/restore_regs() simpler.

In the 2nd patch, we make arch_prepare_bpf_trampoline() support to copy
function arguments in stack for x86 arch. Therefore, the maximum
arguments can be up to MAX_BPF_FUNC_ARGS for FENTRY, FEXIT and
MODIFY_RETURN. Meanwhile, we clean the potential garbage value when we
copy the arguments on-stack.

And the 3rd patch is for the testcases of the this series.

Changes since v9:
- fix the failed test cases of trampoline_count and get_func_args_test
  in the 3rd patch

Changes since v8:
- change the way to test fmod_ret in the 3rd patch

Changes since v7:
- split the testcases, and add fentry_many_args/fexit_many_args to
  DENYLIST.aarch64 in 3rd patch

Changes since v6:
- somit nits from commit message and comment in the 1st patch
- remove the inline in get_nr_regs() in the 1st patch
- rename some function and various in the 1st patch

Changes since v5:
- adjust the commit log of the 1st patch, avoiding confusing people that
  bugs exist in current code
- introduce get_nr_regs() to get the space that used to pass args on
  stack correct in the 2nd patch
- add testcases to tracing_struct.c instead of fentry_test.c and
  fexit_test.c

Changes since v4:
- consider the case of the struct in arguments can't be hold by regs
- add comment for some code
- add testcases for MODIFY_RETURN
- rebase to the latest

Changes since v3:
- try make the stack pointer 16-byte aligned. Not sure if I'm right :)
- introduce clean_garbage() to clean the grabage when argument count is 7
- use different data type in bpf_testmod_fentry_test{7,12}
- add testcase for grabage values in ctx

Changes since v2:
- keep MAX_BPF_FUNC_ARGS still
- clean garbage value in upper bytes in the 2nd patch
- move bpf_fentry_test{7,12} to bpf_testmod.c and rename them to
  bpf_testmod_fentry_test{7,12} meanwhile in the 3rd patch

Changes since v1:
- change the maximun function arguments to 14 from 12
- add testcases (Jiri Olsa)
- instead EMIT4 with EMIT3_off32 for "lea" to prevent overflow
====================

Link: https://lore.kernel.org/r/20230713040738.1789742-1-imagedong@tencent.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

f892cac2

selftests/bpf: add testcase for TRACING with 6+ arguments · 5e9cf77d

Menglong Dong authored Jul 13, 2023

Add fentry_many_args.c and fexit_many_args.c to test the fentry/fexit
with 7/11 arguments. As this feature is not supported by arm64 yet, we
disable these testcases for arm64 in DENYLIST.aarch64. We can combine
them with fentry_test.c/fexit_test.c when arm64 is supported too.

Correspondingly, add bpf_testmod_fentry_test7() and
bpf_testmod_fentry_test11() to bpf_testmod.c

Meanwhile, add bpf_modify_return_test2() to test_run.c to test the
MODIFY_RETURN with 7 arguments.

Add bpf_testmod_test_struct_arg_7/bpf_testmod_test_struct_arg_7 in
bpf_testmod.c to test the struct in the arguments.

And the testcases passed on x86_64:

./test_progs -t fexit
Summary: 5/14 PASSED, 0 SKIPPED, 0 FAILED

./test_progs -t fentry
Summary: 3/2 PASSED, 0 SKIPPED, 0 FAILED

./test_progs -t modify_return
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

./test_progs -t tracing_struct
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230713040738.1789742-4-imagedong@tencent.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

5e9cf77d

bpf, x86: allow function arguments up to 12 for TRACING · 473e3150

Menglong Dong authored Jul 13, 2023

For now, the BPF program of type BPF_PROG_TYPE_TRACING can only be used
on the kernel functions whose arguments count less than or equal to 6, if
not considering '> 8 bytes' struct argument. This is not friendly at all,
as too many functions have arguments count more than 6.

According to the current kernel version, below is a statistics of the
function arguments count:

argument count | function count
7              | 704
8              | 270
9              | 84
10             | 47
11             | 47
12             | 27
13             | 22
14             | 5
15             | 0
16             | 1

Therefore, let's enhance it by increasing the function arguments count
allowed in arch_prepare_bpf_trampoline(), for now, only x86_64.

For the case that we don't need to call origin function, which means
without BPF_TRAMP_F_CALL_ORIG, we need only copy the function arguments
that stored in the frame of the caller to current frame. The 7th and later
arguments are stored in "$rbp + 0x18", and they will be copied to the
stack area following where register values are saved.

For the case with BPF_TRAMP_F_CALL_ORIG, we need prepare the arguments
in stack before call origin function, which means we need alloc extra
"8 * (arg_count - 6)" memory in the top of the stack. Note, there should
not be any data be pushed to the stack before calling the origin function.
So 'rbx' value will be stored on a stack position higher than where stack
arguments are stored for BPF_TRAMP_F_CALL_ORIG.

According to the research of Yonghong, struct members should be all in
register or all on the stack. Meanwhile, the compiler will pass the
argument on regs if the remaining regs can hold the argument. Therefore,
we need save the arguments in order. Otherwise, disorder of the args can
happen. For example:

  struct foo_struct {
      long a;
      int b;
  };
  int foo(char, char, char, char, char, struct foo_struct,
          char);

the arg1-5,arg7 will be passed by regs, and arg6 will by stack. Therefore,
we should save/restore the arguments in the same order with the
declaration of foo(). And the args used as ctx in stack will be like this:

  reg_arg6   -- copy from regs
  stack_arg2 -- copy from stack
  stack_arg1
  reg_arg5   -- copy from regs
  reg_arg4
  reg_arg3
  reg_arg2
  reg_arg1

We use EMIT3_off32() or EMIT4() for "lea" and "sub". The range of the
imm in "lea" and "sub" is [-128, 127] if EMIT4() is used. Therefore,
we use EMIT3_off32() instead if the imm out of the range.

It works well for the FENTRY/FEXIT/MODIFY_RETURN.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230713040738.1789742-3-imagedong@tencent.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

473e3150

bpf, x86: save/restore regs with BPF_DW size · 02a6dfa8

Menglong Dong authored Jul 13, 2023

As we already reserve 8 byte in the stack for each reg, it is ok to
store/restore the regs in BPF_DW size. This will make the code in
save_regs()/restore_regs() simpler.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230713040738.1789742-2-imagedong@tencent.comSigned-off-by: Alexei Starovoitov <ast@kernel.org>

02a6dfa8

Merge tag 'net-6.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · b1983d42

Linus Torvalds authored Jul 13, 2023

Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter, wireless and ebpf.

  Current release - regressions:

   - netfilter: conntrack: gre: don't set assured flag for clash entries

   - wifi: iwlwifi: remove 'use_tfh' config to fix crash

  Previous releases - regressions:

   - ipv6: fix a potential refcount underflow for idev

   - icmp6: ifix null-ptr-deref of ip6_null_entry->rt6i_idev in
     icmp6_dev()

   - bpf: fix max stack depth check for async callbacks

   - eth: mlx5e:
      - check for NOT_READY flag state after locking
      - fix page_pool page fragment tracking for XDP

   - eth: igc:
      - fix tx hang issue when QBV gate is closed
      - fix corner cases for TSN offload

   - eth: octeontx2-af: Move validation of ptp pointer before its usage

   - eth: ena: fix shift-out-of-bounds in exponential backoff

  Previous releases - always broken:

   - core: prevent skb corruption on frag list segmentation

   - sched:
      - cls_fw: fix improper refcount update leads to use-after-free
      - sch_qfq: account for stab overhead in qfq_enqueue

   - netfilter:
      - report use refcount overflow
      - prevent OOB access in nft_byteorder_eval

   - wifi: mt7921e: fix init command fail with enabled device

   - eth: ocelot: fix oversize frame dropping for preemptible TCs

   - eth: fec: recycle pages for transmitted XDP frames"

* tag 'net-6.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits)
  selftests: tc-testing: add test for qfq with stab overhead
  net/sched: sch_qfq: account for stab overhead in qfq_enqueue
  selftests: tc-testing: add tests for qfq mtu sanity check
  net/sched: sch_qfq: reintroduce lmax bound check for MTU
  wifi: cfg80211: fix receiving mesh packets without RFC1042 header
  wifi: rtw89: debug: fix error code in rtw89_debug_priv_send_h2c_set()
  net: txgbe: fix eeprom calculation error
  net/sched: make psched_mtu() RTNL-less safe
  net: ena: fix shift-out-of-bounds in exponential backoff
  netdevsim: fix uninitialized data in nsim_dev_trap_fa_cookie_write()
  net/sched: flower: Ensure both minimum and maximum ports are specified
  MAINTAINERS: Add another mailing list for QUALCOMM ETHQOS ETHERNET DRIVER
  docs: netdev: update the URL of the status page
  wifi: iwlwifi: remove 'use_tfh' config to fix crash
  xdp: use trusted arguments in XDP hints kfuncs
  bpf: cpumap: Fix memory leak in cpu_map_update_elem
  wifi: airo: avoid uninitialized warning in airo_get_rate()
  octeontx2-pf: Add additional check for MCAM rules
  net: dsa: Removed unneeded of_node_put in felix_parse_ports_node
  net: fec: use netdev_err_once() instead of netdev_err()
  ...

b1983d42

Merge tag 'trace-v6.5-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · ebc27aac

Linus Torvalds authored Jul 13, 2023

Pull tracing fixes from Steven Rostedt:

 - Fix some missing-prototype warnings

 - Fix user events struct args (did not include size of struct)

   When creating a user event, the "struct" keyword is to denote that
   the size of the field will be passed in. But the parsing failed to
   handle this case.

 - Add selftest to struct sizes for user events

 - Fix sample code for direct trampolines.

   The sample code for direct trampolines attached to handle_mm_fault().
   But the prototype changed and the direct trampoline sample code was
   not updated. Direct trampolines needs to have the arguments correct
   otherwise it can fail or crash the system.

 - Remove unused ftrace_regs_caller_ret() prototype.

 - Quiet false positive of FORTIFY_SOURCE

   Due to backward compatibility, the structure used to save stack
   traces in the kernel had a fixed size of 8. This structure is
   exported to user space via the tracing format file. A change was made
   to allow more than 8 functions to be recorded, and user space now
   uses the size field to know how many functions are actually in the
   stack.

   But the structure still has size of 8 (even though it points into the
   ring buffer that has the required amount allocated to hold a full
   stack.

   This was fine until the fortifier noticed that the
   memcpy(&entry->caller, stack, size) was greater than the 8 functions
   and would complain at runtime about it.

   Hide this by using a pointer to the stack location on the ring buffer
   instead of using the address of the entry structure caller field.

 - Fix a deadloop in reading trace_pipe that was caused by a mismatch
   between ring_buffer_empty() returning false which then asked to read
   the data, but the read code uses rb_num_of_entries() that returned
   zero, and causing a infinite "retry".

 - Fix a warning caused by not using all pages allocated to store ftrace
   functions, where this can happen if the linker inserts a bunch of
   "NULL" entries, causing the accounting of how many pages needed to be
   off.

 - Fix histogram synthetic event crashing when the start event is
   removed and the end event is still using a variable from it

 - Fix memory leak in freeing iter->temp in tracing_release_pipe()

* tag 'trace-v6.5-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Fix memory leak of iter->temp when reading trace_pipe
  tracing/histograms: Add histograms to hist_vars if they have referenced variables
  tracing: Stop FORTIFY_SOURCE complaining about stack trace caller
  ftrace: Fix possible warning on checking all pages used in ftrace_process_locs()
  ring-buffer: Fix deadloop issue on reading trace_pipe
  tracing: arm64: Avoid missing-prototype warnings
  selftests/user_events: Test struct size match cases
  tracing/user_events: Fix struct arg size match check
  x86/ftrace: Remove unsued extern declaration ftrace_regs_caller_ret()
  arm64: ftrace: Add direct call trampoline samples support
  samples: ftrace: Save required argument registers in sample trampolines

ebc27aac

Merge tag 'for-linus-6.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 15999328

Linus Torvalds authored Jul 13, 2023

Pull xen fixes from Juergen Gross:

 - a cleanup of the Xen related ELF-notes

 - a fix for virtio handling in Xen dom0 when running Xen in a VM

* tag 'for-linus-6.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/virtio: Fix NULL deref when a bridge of PCI root bus has no parent
  x86/Xen: tidy xen-head.S

15999328

Merge tag 'sh-for-v6.5-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux · 9350cd01

Linus Torvalds authored Jul 13, 2023

Pull sh fixes from John Paul Adrian Glaubitz:
 "The sh updates introduced multiple regressions.

  In particular, the change a8ac2961 ("sh: Avoid using IRQ0 on SH3
  and SH4") causes several boards to hang during boot due to incorrect
  IRQ numbers.

  Geert Uytterhoeven has contributed patches that handle the virq offset
  in the IRQ code for the dreamcast, highlander and r2d boards while
  Artur Rojek has contributed a patch which handles the virq offset for
  the hd64461 companion chip"

* tag 'sh-for-v6.5-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
  sh: hd64461: Handle virq offset for offchip IRQ base and HD64461 IRQ
  sh: mach-dreamcast: Handle virq offset in cascaded IRQ demux
  sh: mach-highlander: Handle virq offset in cascaded IRL demux
  sh: mach-r2d: Handle virq offset in cascaded IRL demux

9350cd01

tracing: Fix memory leak of iter->temp when reading trace_pipe · d5a82189

Zheng Yejian authored Jul 13, 2023

kmemleak reports:
  unreferenced object 0xffff88814d14e200 (size 256):
    comm "cat", pid 336, jiffies 4294871818 (age 779.490s)
    hex dump (first 32 bytes):
      04 00 01 03 00 00 00 00 08 00 00 00 00 00 00 00  ................
      0c d8 c8 9b ff ff ff ff 04 5a ca 9b ff ff ff ff  .........Z......
    backtrace:
      [<ffffffff9bdff18f>] __kmalloc+0x4f/0x140
      [<ffffffff9bc9238b>] trace_find_next_entry+0xbb/0x1d0
      [<ffffffff9bc9caef>] trace_print_lat_context+0xaf/0x4e0
      [<ffffffff9bc94490>] print_trace_line+0x3e0/0x950
      [<ffffffff9bc95499>] tracing_read_pipe+0x2d9/0x5a0
      [<ffffffff9bf03a43>] vfs_read+0x143/0x520
      [<ffffffff9bf04c2d>] ksys_read+0xbd/0x160
      [<ffffffff9d0f0edf>] do_syscall_64+0x3f/0x90
      [<ffffffff9d2000aa>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8

when reading file 'trace_pipe', 'iter->temp' is allocated or relocated
in trace_find_next_entry() but not freed before 'trace_pipe' is closed.

To fix it, free 'iter->temp' in tracing_release_pipe().

Link: https://lore.kernel.org/linux-trace-kernel/20230713141435.1133021-1-zhengyejian1@huawei.com

Cc: stable@vger.kernel.org
Fixes: ff895103 ("tracing: Save off entry when peeking at next entry")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

d5a82189

Merge branch 'net-sched-fixes-for-sch_qfq' · 9d23aac8

Paolo Abeni authored Jul 13, 2023

Pedro Tammela says:

====================
net/sched: fixes for sch_qfq

Patch 1 fixes a regression introduced in 6.4 where the MTU size could be
bigger than 'lmax'.

Patch 3 fixes an issue where the code doesn't account for qdisc_pkt_len()
returning a size bigger then 'lmax'.

Patches 2 and 4 are selftests for the issues above.
====================

Link: https://lore.kernel.org/r/20230711210103.597831-1-pctammela@mojatatu.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

9d23aac8

selftests: tc-testing: add test for qfq with stab overhead · 137f6219

Pedro Tammela authored Jul 11, 2023

A packet with stab overhead greater than QFQ_MAX_LMAX should be dropped
by the QFQ qdisc as it can't handle such lengths.
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

137f6219

net/sched: sch_qfq: account for stab overhead in qfq_enqueue · 3e337087

Pedro Tammela authored Jul 11, 2023

Lion says:
-------
In the QFQ scheduler a similar issue to CVE-2023-31436
persists.

Consider the following code in net/sched/sch_qfq.c:

static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
                struct sk_buff **to_free)
{
     unsigned int len = qdisc_pkt_len(skb), gso_segs;

    // ...

     if (unlikely(cl->agg->lmax < len)) {
         pr_debug("qfq: increasing maxpkt from %u to %u for class %u",
              cl->agg->lmax, len, cl->common.classid);
         err = qfq_change_agg(sch, cl, cl->agg->class_weight, len);
         if (err) {
             cl->qstats.drops++;
             return qdisc_drop(skb, sch, to_free);
         }

    // ...

     }

Similarly to CVE-2023-31436, "lmax" is increased without any bounds
checks according to the packet length "len". Usually this would not
impose a problem because packet sizes are naturally limited.

This is however not the actual packet length, rather the
"qdisc_pkt_len(skb)" which might apply size transformations according to
"struct qdisc_size_table" as created by "qdisc_get_stab()" in
net/sched/sch_api.c if the TCA_STAB option was set when modifying the qdisc.

A user may choose virtually any size using such a table.

As a result the same issue as in CVE-2023-31436 can occur, allowing heap
out-of-bounds read / writes in the kmalloc-8192 cache.
-------

We can create the issue with the following commands:

tc qdisc add dev $DEV root handle 1: stab mtu 2048 tsize 512 mpu 0 \
overhead 999999999 linklayer ethernet qfq
tc class add dev $DEV parent 1: classid 1:1 htb rate 6mbit burst 15k
tc filter add dev $DEV parent 1: matchall classid 1:1
ping -I $DEV 1.1.1.2

This is caused by incorrectly assuming that qdisc_pkt_len() returns a
length within the QFQ_MIN_LMAX < len < QFQ_MAX_LMAX.

Fixes: 462dbc91 ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
Reported-by: Lion <nnamrec@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

3e337087

selftests: tc-testing: add tests for qfq mtu sanity check · c5a06fdc

Pedro Tammela authored Jul 11, 2023

QFQ only supports a certain bound of MTU size so make sure
we check for this requirement in the tests.
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

c5a06fdc

net/sched: sch_qfq: reintroduce lmax bound check for MTU · 158810b2

Pedro Tammela authored Jul 11, 2023

25369891 deletes a check for the case where no 'lmax' is
specified which 30379334 previously fixed as 'lmax'
could be set to the device's MTU without any bound checking
for QFQ_LMAX_MIN and QFQ_LMAX_MAX. Therefore, reintroduce the check.

Fixes: 25369891 ("net/sched: sch_qfq: refactor parsing of netlink parameters")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

158810b2

sh: hd64461: Handle virq offset for offchip IRQ base and HD64461 IRQ · 7c28a35e

Artur Rojek authored Jul 11, 2023

A recent change to start counting SuperH IRQ #s from 16 breaks support
for the Hitachi HD64461 companion chip.

Move the offchip IRQ base and HD64461 IRQ # by 16 in order to
accommodate for the new virq numbering rules.

Fixes: a8ac2961 ("sh: Avoid using IRQ0 on SH3 and SH4")
Signed-off-by: Artur Rojek <contact@artur-rojek.eu>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Link: https://lore.kernel.org/r/20230710233132.69734-1-contact@artur-rojek.euSigned-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>

7c28a35e

sh: mach-dreamcast: Handle virq offset in cascaded IRQ demux · 3d20f7a6

Geert Uytterhoeven authored Jul 09, 2023

Take into account the virq offset when translating cascaded interrupts.

Fixes: a8ac2961 ("sh: Avoid using IRQ0 on SH3 and SH4")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Link: https://lore.kernel.org/r/7d0cb246c9f1cd24bb1f637ec5cb67e799a4c3b8.1688908227.git.geert+renesas@glider.beSigned-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>

3d20f7a6

sh: mach-highlander: Handle virq offset in cascaded IRL demux · a2601b8d

Geert Uytterhoeven authored Jul 09, 2023

Take into account the virq offset when translating cascaded IRL
interrupts.

Fixes: a8ac2961 ("sh: Avoid using IRQ0 on SH3 and SH4")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Link: https://lore.kernel.org/r/4fcb0d08a2b372431c41e04312742dc9e41e1be4.1688908186.git.geert+renesas@glider.beSigned-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>

a2601b8d

sh: mach-r2d: Handle virq offset in cascaded IRL demux · ab8aa4f0

Geert Uytterhoeven authored Jul 09, 2023

When booting rts7751r2dplus_defconfig on QEMU, the system hangs due to
an interrupt storm on IRQ 20. IRQ 20 aka event 0x280 is a cascaded IRL
interrupt, which maps to IRQ_VOYAGER, the interrupt used by the Silicon
Motion SM501 multimedia companion chip. As rts7751r2d_irq_demux() does
not take into account the new virq offset, the interrupt is no longer
translated, leading to an unhandled interrupt.

Fix this by taking into account the virq offset when translating
cascaded IRL interrupts.

Fixes: a8ac2961 ("sh: Avoid using IRQ0 on SH3 and SH4")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/r/fbfea3ad-d327-4ad5-ac9c-648c7ca3fe1f@roeck-us.netSigned-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/2c99d5df41c40691f6c407b7b6a040d406bc81ac.1688901306.git.geert+renesas@glider.beSigned-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>

ab8aa4f0

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · b0b0ab6f

Jakub Kicinski authored Jul 12, 2023

Alexei Starovoitov says:

====================
pull-request: bpf 2023-07-12

We've added 5 non-merge commits during the last 7 day(s) which contain
a total of 7 files changed, 93 insertions(+), 28 deletions(-).

The main changes are:

1) Fix max stack depth check for async callbacks, from Kumar.

2) Fix inconsistent JIT image generation, from Björn.

3) Use trusted arguments in XDP hints kfuncs, from Larysa.

4) Fix memory leak in cpu_map_update_elem, from Pu.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  xdp: use trusted arguments in XDP hints kfuncs
  bpf: cpumap: Fix memory leak in cpu_map_update_elem
  riscv, bpf: Fix inconsistent JIT image generation
  selftests/bpf: Add selftest for check_stack_max_depth bug
  bpf: Fix max stack depth check for async callbacks
====================

Link: https://lore.kernel.org/r/20230712223045.40182-1-alexei.starovoitov@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b0b0ab6f

wifi: cfg80211: fix receiving mesh packets without RFC1042 header · fec3ebb5

Felix Fietkau authored Jul 11, 2023

Fix ethernet header length field after stripping the mesh header

Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/CT5GNZSK28AI.2K6M69OXM9RW5@syracuse/
Fixes: 986e43b1 ("wifi: mac80211: fix receiving A-MSDU frames on mesh interfaces")
Reported-and-tested-by: Nicolas Escande <nico.escande@gmail.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20230711115052.68430-1-nbd@nbd.nameSigned-off-by: Jakub Kicinski <kuba@kernel.org>

fec3ebb5

wifi: rtw89: debug: fix error code in rtw89_debug_priv_send_h2c_set() · 4f4626cd

Zhang Shurong authored Jul 06, 2023

If there is a failure during rtw89_fw_h2c_raw() rtw89_debug_priv_send_h2c
should return negative error code instead of a positive value count.
Fix this bug by returning correct error code.

Fixes: e3ec7017 ("rtw89: add Realtek 802.11ax driver")
Signed-off-by: Zhang Shurong <zhang_shurong@foxmail.com>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://lore.kernel.org/r/tencent_AD09A61BC4DA92AD1EB0790F5C850E544D07@qq.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

4f4626cd

net: wwan: t7xx: Add AP CLDMA · ba2274dc

Jose Ignacio Tornos Martinez authored Jul 11, 2023

At this moment with the current status, t7xx is not functional due to
problems like this after connection, if there is no activity:
[   57.370534] mtk_t7xx 0000:72:00.0: [PM] SAP suspend error: -110
[   57.370581] mtk_t7xx 0000:72:00.0: can't suspend
    (t7xx_pci_pm_runtime_suspend [mtk_t7xx] returned -110)
because after this, the traffic no longer works.

The complete series 'net: wwan: t7xx: fw flashing & coredump support'
was reverted because of issues with the pci implementation.
In order to have at least the modem working, it would be enough if just
the first commit of the series is re-applied:
d20ef656 net: wwan: t7xx: Add AP CLDMA
With that, the Application Processor would be controlled, correctly
suspended and the commented problems would be fixed (I am testing here
like this with no related issue).

This commit is independent of the others and not related to the
commented pci implementation for the new features: fw flashing and
coredump collection.

Use v2 patch version of d20ef656 as JinJian Song suggests
(https://patchwork.kernel.org/project/netdevbpf/patch/20230105154215.198828-1-m.chetan.kumar@linux.intel.com/).

Original text from the commit that would be re-applied:

    d20ef656 net: wwan: t7xx: Add AP CLDMA
    Author: Haijun Liu <haijun.liu@mediatek.com>
    Date:   Tue Aug 16 09:53:28 2022 +0530

    The t7xx device contains two Cross Layer DMA (CLDMA) interfaces to
    communicate with AP and Modem processors respectively. So far only
    MD-CLDMA was being used, this patch enables AP-CLDMA.

    Rename small Application Processor (sAP) to AP.
Signed-off-by: Haijun Liu <haijun.liu@mediatek.com>
Co-developed-by: Madhusmita Sahu <madhusmita.sahu@intel.com>
Signed-off-by: Madhusmita Sahu <madhusmita.sahu@intel.com>
Signed-off-by: Moises Veleta <moises.veleta@linux.intel.com>
Signed-off-by: Devegowda Chandrashekar <chandrashekar.devegowda@intel.com>
Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230711062817.6108-1-jtornosm@redhat.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

ba2274dc

ipv6: rpl: Remove redundant skb_dst_drop(). · c5ec13e3

Kuniyuki Iwashima authored Jul 10, 2023

RPL code has a pattern where skb_dst_drop() is called before
ip6_route_input().

However, ip6_route_input() calls skb_dst_drop() internally,
so we need not call skb_dst_drop() before ip6_route_input().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230710213511.5364-1-kuniyu@amazon.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

c5ec13e3

12 Jul, 2023 9 commits

Merge branch 'mlxsw-add-port-range-matching-support' · fa3530be

Jakub Kicinski authored Jul 12, 2023

Petr Machata says:

====================
mlxsw: Add port range matching support

Ido Schimmel writes:

Add port range matching support in mlxsw as part of tc-flower offload.

Patches #1-#7 gradually add port range matching support in mlxsw. See
patch #3 to understand how port range matching is implemented in the
device.

Patches #8-#10 add selftests.
====================

Link: https://lore.kernel.org/r/cover.1689092769.git.petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

fa3530be

selftests: forwarding: Add test cases for flower port range matching · 209218e4

Ido Schimmel authored Jul 11, 2023

Add test cases to verify that flower port range matching works
correctly. Test both source and destination port ranges, with different
combinations of IPv4/IPv6 and TCP/UDP, on both ingress and egress.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/9d47c9cd4522b2d335b13ce8f6c9b33199298cee.1689092769.git.petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

209218e4

selftests: mlxsw: Test port range registers' occupancy · 0a1a818d

Ido Schimmel authored Jul 11, 2023

Test that filters that match on the same port range, but with different
combination of IPv4/IPv6 and TCP/UDP all use the same port range
register by observing port range registers' occupancy via
devlink-resource.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/0a2eb63b234fb062ff011e80231868cc80000c81.1689092769.git.petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

0a1a818d

selftests: mlxsw: Add scale test for port ranges · 45c5a384

Ido Schimmel authored Jul 11, 2023

Query the maximum number of supported port range registers using
devlink-resource and test that this number can be reached by configuring
tc filters with different port ranges. Test that an error is returned in
case the maximum number is exceeded.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/48eee181270d9f291e09d1858c7b26a3f7fcc164.1689092769.git.petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

45c5a384

mlxsw: spectrum_flower: Add ability to match on port ranges · fe22f741

Ido Schimmel authored Jul 11, 2023

Add the ability to match on port ranges by utilizing the previously
added port range registers and the port range key element. Up to two
port range registers can be used for each filter, one for source port
and another for destination port.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/df4385a9592917e9a22ebff339e0463e4a8dfa82.1689092769.git.petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

fe22f741

mlxsw: spectrum_acl: Pass main driver structure to mlxsw_sp_acl_rulei_destroy() · 898979c7

Ido Schimmel authored Jul 11, 2023

The main driver structure will be needed in this function by a
subsequent patch, so pass it. No functional changes intended.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/24d96a4e21310e5de2951ace58263db35e44a0df.1689092769.git.petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

898979c7

mlxsw: spectrum_acl: Add port range key element · d65f24c9

Ido Schimmel authored Jul 11, 2023

Add the port range key element to supported key blocks so that it could
be used to match on the output of the port range registers. Each bit in
the element can be used to match on the output of the port range
register with the corresponding index.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/f0423f6ee9e36c6b0a426bc9995f42223c48f2db.1689092769.git.petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

d65f24c9

mlxsw: spectrum_port_range: Add devlink resource support · 74d6786c

Ido Schimmel authored Jul 11, 2023

Expose via devlink-resource the maximum number of port range registers
and their current occupancy. Besides the observability benefits, this
resource will be used by subsequent patches for scale and occupancy
tests.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/7945e0c715dc5efb1617f45f7560c1f1bd0bcf8a.1689092769.git.petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

74d6786c

mlxsw: spectrum_port_range: Add port range core · b3eb04be

Ido Schimmel authored Jul 11, 2023

The Spectrum ASICs have a fixed number of port range registers, each of
which maintains the following parameters:

* Minimum and maximum port.
* Apply port range for source port, destination port or both.
* Apply port range for TCP, UDP or both.
* Apply port range for IPv4, IPv6 or both.

Implement a port range core which takes care of the allocation and
configuration of these registers and exposes an API that allows
in-driver consumers (e.g., the ACL code) to request matching on a range
of either source or destination port.

These registers are going to be used for port range matching in the
flower classifier that already matches on EtherType being IPv4 / IPv6 and
IP protocol being TCP / UDP. As such, there is no need to limit these
registers to a specific EtherType or IP protocol, which will increase
the likelihood of a register being shared by multiple flower filters.

It is unlikely that a filter will match on the same range of both source
and destination ports, which is why each register is only configured to
match on either source or destination port. If a filter requires
matching on a range of both source and destination ports, it will
utilize two port range registers and match on the output of both.

For efficient lookup and traversal, use XArray to store the allocated
port range registers. The XArray uses RCU and an internal spinlock to
synchronise access, so there is no need for a dedicate lock.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/674f00539a0072d455847663b5feb504db51a259.1689092769.git.petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b3eb04be