1. 14 Jul, 2023 27 commits
  2. 13 Jul, 2023 13 commits
    • selftests/bpf: Add selftest for PTR_UNTRUSTED · 1cd0e771
      Yafang Shao authored
      Add a new selftest to check the PTR_UNTRUSTED condition. The result is
      below:
      
       #160     ptr_untrusted:OK

      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Link: https://lore.kernel.org/r/20230713025642.27477-5-laoar.shao@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Fix an error in verifying a field in a union · 33937607
      Yafang Shao authored
      We are utilizing BPF LSM to monitor BPF operations within our container
      environment. When we added support for raw_tracepoint, we hit the
      error below.
      
      ; (const void *)attr->raw_tracepoint.name);
      27: (79) r3 = *(u64 *)(r2 +0)
      access beyond the end of member map_type (mend:4) in struct (anon) with off 0 size 8
      
      It can be reproduced with the BPF prog below.
      
      SEC("lsm/bpf")
      int BPF_PROG(bpf_audit, int cmd, union bpf_attr *attr, unsigned int size)
      {
      	switch (cmd) {
      	case BPF_RAW_TRACEPOINT_OPEN:
      		bpf_printk("raw_tracepoint is %s", attr->raw_tracepoint.name);
      		break;
      	default:
      		break;
      	}
      	return 0;
      }
      
      The reason is that when accessing a field in a union, such as bpf_attr,
      if the field is located within a nested struct that is not the first
      member of the union, it can result in incorrect field verification.
      
        union bpf_attr {
            struct {
                __u32 map_type; <<<< Actually it will find that field.
                __u32 key_size;
                __u32 value_size;
               ...
            };
            ...
            struct {
                __u64 name;    <<<< We want to verify this field.
                __u32 prog_fd;
            } raw_tracepoint;
        };
      
      Considering the potential deep nesting levels, finding a perfect
      solution to address this issue has proven challenging. Therefore, I
      propose a solution where we simply skip the verification process if the
      field in question is located within a union.
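
      The overlap can be illustrated with a minimal userspace sketch. The
      union below is a hypothetical, cut-down stand-in for union bpf_attr;
      the names attr_sketch, first, map_type_end() and runs_past_map_type()
      are assumptions made for illustration and are not kernel code. It only
      shows that both nested structs start at offset 0, so a lookup by offset
      that stops at the first match sees the 4-byte map_type rather than the
      8-byte name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for union bpf_attr (not the kernel definition). */
union attr_sketch {
	struct {
		uint32_t map_type;
		uint32_t key_size;
		uint32_t value_size;
	} first;	/* anonymous in the commit message; named here for simplicity */
	struct {
		uint64_t name;
		uint32_t prog_fd;
	} raw_tracepoint;
};

/* Both nested structs start at the same offset inside the union. */
static_assert(offsetof(union attr_sketch, first.map_type) ==
	      offsetof(union attr_sketch, raw_tracepoint.name),
	      "nested structs overlap at offset 0");

/* End offset of map_type, matching the "mend:4" value in the error message. */
static size_t map_type_end(void)
{
	return offsetof(union attr_sketch, first.map_type) + sizeof(uint32_t);
}

/* Returns 1 if an access starts inside map_type but runs past its end. */
static int runs_past_map_type(size_t off, size_t size)
{
	return off < map_type_end() && off + size > map_type_end();
}
```

      In this model, the 8-byte read of raw_tracepoint.name at offset 0 is
      flagged as running past map_type, while a 4-byte read at the same
      offset is not.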
      
      Fixes: 7e3617a7 ("bpf: Add array support to btf_struct_access")
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Link: https://lore.kernel.org/r/20230713025642.27477-4-laoar.shao@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Add selftests for nested_trust · d2284d68
      Yafang Shao authored
      Add selftests for nested_trust to check whether PTR_UNTRUSTED is cleared
      as expected. The result is as follows:
      
       #141/1   nested_trust/test_read_cpumask:OK
       #141/2   nested_trust/test_skb_field:OK                    <<<<
       #141/3   nested_trust/test_invalid_nested_user_cpus:OK
       #141/4   nested_trust/test_invalid_nested_offset:OK
       #141/5   nested_trust/test_invalid_skb_field:OK            <<<<
       #141     nested_trust:OK
      
      Subtests #141/2 and #141/5 are newly added.

      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Link: https://lore.kernel.org/r/20230713025642.27477-3-laoar.shao@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Fix an error around PTR_UNTRUSTED · 7ce4dc3e
      Yafang Shao authored
      Per discussion with Alexei, the PTR_UNTRUSTED flag should not be
      cleared when we start to walk a new struct, because the struct in
      question may be a struct nested in a union. We should also check and set
      this flag before we walk each of its members, in case the struct itself
      is a union. We will clear this flag if the field is
      BTF_TYPE_SAFE_RCU_OR_NULL.
      
      Fixes: 6fcd486b ("bpf: Refactor RCU enforcement in the verifier.")
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Link: https://lore.kernel.org/r/20230713025642.27477-2-laoar.shao@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge branch 'bpf-x86-allow-function-arguments-up-to-12-for-tracing' · f892cac2
      Alexei Starovoitov authored
      Menglong Dong says:
      
      ====================
      bpf, x86: allow function arguments up to 12 for TRACING
      
      From: Menglong Dong <imagedong@tencent.com>
      
      For now, a BPF program of type BPF_PROG_TYPE_TRACING can only be used
      on kernel functions that take at most 6 arguments, leaving aside the
      case of '> 8 bytes' struct arguments. This is not friendly at all, as
      too many functions have more than 6 arguments. For the current kernel
      version, the statistics of function argument counts are as follows:
      
      argument count | function count
      7              | 704
      8              | 270
      9              | 84
      10             | 47
      11             | 47
      12             | 27
      13             | 22
      14             | 5
      15             | 0
      16             | 1
      
      Therefore, let's enhance it by increasing the function arguments count
      allowed in arch_prepare_bpf_trampoline(), for now, only x86_64.
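
      As a worked example of what the table implies, the sketch below sums the
      listed counts up to a chosen limit. The array is copied from the table;
      the helper name covered_functions() is an assumption made for
      illustration. With a limit of 12 it yields 1179 of the 1207 listed
      functions.

```c
#include <assert.h>

#define FIRST_ARG_COUNT 7	/* the table starts at 7 arguments */

/* Function counts from the table, for 7 through 16 arguments. */
static const int func_count[] = {704, 270, 84, 47, 47, 27, 22, 5, 0, 1};

/* Number of listed functions whose argument count is in [7, max_args]. */
static int covered_functions(int max_args)
{
	int n = (int)(sizeof(func_count) / sizeof(func_count[0]));
	int total = 0;

	assert(max_args >= 0);
	for (int i = 0; i < n && FIRST_ARG_COUNT + i <= max_args; i++)
		total += func_count[i];
	return total;
}
```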
      
      In the 1st patch, we save/restore regs with BPF_DW size to make the code
      in save_regs()/restore_regs() simpler.
      
      In the 2nd patch, we make arch_prepare_bpf_trampoline() support copying
      function arguments on the stack for the x86 arch. Therefore, the maximum
      number of arguments can be up to MAX_BPF_FUNC_ARGS for FENTRY, FEXIT and
      MODIFY_RETURN. Meanwhile, we clean the potential garbage value when we
      copy the arguments on-stack.

      And the 3rd patch contains the testcases for this series.
      
      Changes since v9:
      - fix the failed test cases of trampoline_count and get_func_args_test
        in the 3rd patch
      
      Changes since v8:
      - change the way to test fmod_ret in the 3rd patch
      
      Changes since v7:
      - split the testcases, and add fentry_many_args/fexit_many_args to
        DENYLIST.aarch64 in 3rd patch
      
      Changes since v6:
      - fix some nits in the commit message and comment in the 1st patch
      - remove the inline in get_nr_regs() in the 1st patch
      - rename some functions and variables in the 1st patch
      
      Changes since v5:
      - adjust the commit log of the 1st patch, to avoid giving the impression
        that bugs exist in the current code
      - introduce get_nr_regs() to correctly get the space used to pass args
        on the stack in the 2nd patch
      - add testcases to tracing_struct.c instead of fentry_test.c and
        fexit_test.c
      
      Changes since v4:
      - consider the case where a struct argument can't be held in regs
      - add comment for some code
      - add testcases for MODIFY_RETURN
      - rebase to the latest
      
      Changes since v3:
      - try to make the stack pointer 16-byte aligned. Not sure if I'm right :)
      - introduce clean_garbage() to clean the garbage when argument count is 7
      - use different data type in bpf_testmod_fentry_test{7,12}
      - add testcase for garbage values in ctx
      
      Changes since v2:
      - keep MAX_BPF_FUNC_ARGS still
      - clean garbage value in upper bytes in the 2nd patch
      - move bpf_fentry_test{7,12} to bpf_testmod.c and rename them to
        bpf_testmod_fentry_test{7,12} meanwhile in the 3rd patch
      
      Changes since v1:
      - change the maximum function arguments to 14 from 12
      - add testcases (Jiri Olsa)
      - use EMIT3_off32 instead of EMIT4 for "lea" to prevent overflow
      ====================
      
      Link: https://lore.kernel.org/r/20230713040738.1789742-1-imagedong@tencent.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: add testcase for TRACING with 6+ arguments · 5e9cf77d
      Menglong Dong authored
      Add fentry_many_args.c and fexit_many_args.c to test the fentry/fexit
      with 7/11 arguments. As this feature is not supported by arm64 yet, we
      disable these testcases for arm64 in DENYLIST.aarch64. We can combine
      them with fentry_test.c/fexit_test.c when arm64 is supported too.
      
      Correspondingly, add bpf_testmod_fentry_test7() and
      bpf_testmod_fentry_test11() to bpf_testmod.c
      
      Meanwhile, add bpf_modify_return_test2() to test_run.c to test the
      MODIFY_RETURN with 7 arguments.
      
      Add bpf_testmod_test_struct_arg_7/bpf_testmod_test_struct_arg_8 in
      bpf_testmod.c to test the struct in the arguments.
      
      And the testcases passed on x86_64:
      
      ./test_progs -t fexit
      Summary: 5/14 PASSED, 0 SKIPPED, 0 FAILED
      
      ./test_progs -t fentry
      Summary: 3/2 PASSED, 0 SKIPPED, 0 FAILED
      
      ./test_progs -t modify_return
      Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
      
      ./test_progs -t tracing_struct
      Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20230713040738.1789742-4-imagedong@tencent.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, x86: allow function arguments up to 12 for TRACING · 473e3150
      Menglong Dong authored
      For now, a BPF program of type BPF_PROG_TYPE_TRACING can only be used
      on kernel functions that take at most 6 arguments, leaving aside the
      case of '> 8 bytes' struct arguments. This is not friendly at all, as
      too many functions have more than 6 arguments.

      For the current kernel version, the statistics of function argument
      counts are as follows:
      
      argument count | function count
      7              | 704
      8              | 270
      9              | 84
      10             | 47
      11             | 47
      12             | 27
      13             | 22
      14             | 5
      15             | 0
      16             | 1
      
      Therefore, let's enhance it by increasing the function arguments count
      allowed in arch_prepare_bpf_trampoline(), for now, only x86_64.
      
      For the case where we don't need to call the origin function, which
      means without BPF_TRAMP_F_CALL_ORIG, we only need to copy the function
      arguments that are stored in the frame of the caller to the current
      frame. The 7th and later arguments are stored at "$rbp + 0x18", and they
      will be copied to the stack area following where register values are
      saved.

      For the case with BPF_TRAMP_F_CALL_ORIG, we need to prepare the arguments
      on the stack before calling the origin function, which means we need to
      allocate an extra "8 * (arg_count - 6)" bytes of memory at the top of the
      stack. Note that no data should be pushed to the stack before calling
      the origin function. So the 'rbx' value will be stored at a stack
      position higher than where stack arguments are stored for
      BPF_TRAMP_F_CALL_ORIG.
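
      The "8 * (arg_count - 6)" sizing can be written out as a small helper.
      This is only a sketch of the arithmetic: the name extra_stack_bytes()
      is an assumption, and it assumes every argument occupies exactly one
      8-byte slot, which does not hold for larger struct arguments.

```c
#include <assert.h>

#define NR_REG_ARGS 6	/* arguments passed in registers before the stack is used */

/*
 * Extra bytes to reserve at the top of the stack for the arguments that
 * do not fit in registers, following "8 * (arg_count - 6)".
 */
static int extra_stack_bytes(int arg_count)
{
	assert(arg_count >= 0);
	return arg_count > NR_REG_ARGS ? 8 * (arg_count - NR_REG_ARGS) : 0;
}
```

      For example, 7 arguments need 8 extra bytes and 12 arguments need 48,
      while 6 or fewer need none.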
      
      According to Yonghong's research, struct members should be all in
      registers or all on the stack. Meanwhile, the compiler will pass the
      argument in regs if the remaining regs can hold the argument. Therefore,
      we need to save the arguments in order. Otherwise, the args can end up
      out of order. For example:
      
        struct foo_struct {
            long a;
            int b;
        };
        int foo(char, char, char, char, char, struct foo_struct,
                char);
      
      Here arg1-5 and arg7 will be passed in regs, and arg6 will be passed on
      the stack. Therefore, we should save/restore the arguments in the same
      order as the declaration of foo(). And the args used as ctx on the stack
      will look like this:
      
        reg_arg6   -- copy from regs
        stack_arg2 -- copy from stack
        stack_arg1
        reg_arg5   -- copy from regs
        reg_arg4
        reg_arg3
        reg_arg2
        reg_arg1
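
      The placement rule described above can be modelled with a short sketch.
      It is a simplified model, not the ABI or the trampoline code: it assumes
      integer-class arguments only, 6 argument registers, and that an argument
      goes in registers only when all of its 8-byte slots fit in the remaining
      registers. The names arg_loc and classify_args() are hypothetical.

```c
#include <assert.h>

#define NR_ARG_REGS 6

enum arg_loc { ARG_IN_REGS, ARG_ON_STACK };

/*
 * Decide, in declaration order, where each argument is passed.
 * sizes[i] is the size in bytes of argument i; loc[i] receives the result.
 * An argument that does not fit entirely in the remaining registers goes
 * on the stack, and the remaining registers stay available to later args.
 */
static void classify_args(const int *sizes, int nr_args, enum arg_loc *loc)
{
	int regs_left = NR_ARG_REGS;

	for (int i = 0; i < nr_args; i++) {
		int slots;

		assert(sizes[i] > 0);
		slots = (sizes[i] + 7) / 8;	/* 8-byte slots needed */
		if (slots <= regs_left) {
			loc[i] = ARG_IN_REGS;
			regs_left -= slots;
		} else {
			loc[i] = ARG_ON_STACK;
		}
	}
}
```

      For foo() above, with sizes {1, 1, 1, 1, 1, 16, 1}, this model places
      arg1-5 and arg7 in registers and arg6 on the stack, which is why the
      save order has to follow the declaration rather than the storage class.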
      
      We use EMIT3_off32() or EMIT4() for "lea" and "sub". The range of the
      imm in "lea" and "sub" is [-128, 127] if EMIT4() is used. Therefore,
      we use EMIT3_off32() instead if the imm is out of that range.
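
      The range check behind that choice can be sketched as follows. The enum
      labels and the helpers fits_imm8() and pick_form() are hypothetical
      names used only for illustration; the sketch selects a form and does
      not emit any instruction bytes.

```c
#include <assert.h>
#include <stdint.h>

static_assert(INT8_MIN == -128 && INT8_MAX == 127, "imm8 range is [-128, 127]");

/* Hypothetical labels for the two encodings named in the text. */
enum emit_form { FORM_EMIT4, FORM_EMIT3_OFF32 };

/* Returns 1 if imm fits in a signed 8-bit immediate. */
static int fits_imm8(int32_t imm)
{
	return imm >= INT8_MIN && imm <= INT8_MAX;
}

/* Pick the short form when the immediate fits, the 32-bit form otherwise. */
static enum emit_form pick_form(int32_t imm)
{
	return fits_imm8(imm) ? FORM_EMIT4 : FORM_EMIT3_OFF32;
}
```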
      
      It works well for the FENTRY/FEXIT/MODIFY_RETURN.

      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20230713040738.1789742-3-imagedong@tencent.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, x86: save/restore regs with BPF_DW size · 02a6dfa8
      Menglong Dong authored
      As we already reserve 8 bytes on the stack for each reg, it is ok to
      store/restore the regs in BPF_DW size. This will make the code in
      save_regs()/restore_regs() simpler.

      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20230713040738.1789742-2-imagedong@tencent.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge tag 'net-6.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · b1983d42
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from netfilter, wireless and ebpf.
      
        Current release - regressions:
      
         - netfilter: conntrack: gre: don't set assured flag for clash entries
      
         - wifi: iwlwifi: remove 'use_tfh' config to fix crash
      
        Previous releases - regressions:
      
         - ipv6: fix a potential refcount underflow for idev
      
         - icmp6: fix null-ptr-deref of ip6_null_entry->rt6i_idev in
           icmp6_dev()
      
         - bpf: fix max stack depth check for async callbacks
      
         - eth: mlx5e:
            - check for NOT_READY flag state after locking
            - fix page_pool page fragment tracking for XDP
      
         - eth: igc:
            - fix tx hang issue when QBV gate is closed
            - fix corner cases for TSN offload
      
         - eth: octeontx2-af: Move validation of ptp pointer before its usage
      
         - eth: ena: fix shift-out-of-bounds in exponential backoff
      
        Previous releases - always broken:
      
         - core: prevent skb corruption on frag list segmentation
      
         - sched:
            - cls_fw: fix improper refcount update leads to use-after-free
            - sch_qfq: account for stab overhead in qfq_enqueue
      
         - netfilter:
            - report use refcount overflow
            - prevent OOB access in nft_byteorder_eval
      
         - wifi: mt7921e: fix init command fail with enabled device
      
         - eth: ocelot: fix oversize frame dropping for preemptible TCs
      
         - eth: fec: recycle pages for transmitted XDP frames"
      
      * tag 'net-6.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits)
        selftests: tc-testing: add test for qfq with stab overhead
        net/sched: sch_qfq: account for stab overhead in qfq_enqueue
        selftests: tc-testing: add tests for qfq mtu sanity check
        net/sched: sch_qfq: reintroduce lmax bound check for MTU
        wifi: cfg80211: fix receiving mesh packets without RFC1042 header
        wifi: rtw89: debug: fix error code in rtw89_debug_priv_send_h2c_set()
        net: txgbe: fix eeprom calculation error
        net/sched: make psched_mtu() RTNL-less safe
        net: ena: fix shift-out-of-bounds in exponential backoff
        netdevsim: fix uninitialized data in nsim_dev_trap_fa_cookie_write()
        net/sched: flower: Ensure both minimum and maximum ports are specified
        MAINTAINERS: Add another mailing list for QUALCOMM ETHQOS ETHERNET DRIVER
        docs: netdev: update the URL of the status page
        wifi: iwlwifi: remove 'use_tfh' config to fix crash
        xdp: use trusted arguments in XDP hints kfuncs
        bpf: cpumap: Fix memory leak in cpu_map_update_elem
        wifi: airo: avoid uninitialized warning in airo_get_rate()
        octeontx2-pf: Add additional check for MCAM rules
        net: dsa: Removed unneeded of_node_put in felix_parse_ports_node
        net: fec: use netdev_err_once() instead of netdev_err()
        ...
    • Merge tag 'trace-v6.5-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · ebc27aac
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Fix some missing-prototype warnings
      
       - Fix user events struct args (did not include size of struct)
      
         When creating a user event, the "struct" keyword is to denote that
         the size of the field will be passed in. But the parsing failed to
         handle this case.
      
       - Add selftest to struct sizes for user events
      
       - Fix sample code for direct trampolines.
      
         The sample code for direct trampolines attached to handle_mm_fault().
         But the prototype changed and the direct trampoline sample code was
         not updated. Direct trampolines need to have the correct arguments,
         otherwise they can fail or crash the system.
      
       - Remove unused ftrace_regs_caller_ret() prototype.
      
       - Quiet false positive of FORTIFY_SOURCE
      
         Due to backward compatibility, the structure used to save stack
         traces in the kernel had a fixed size of 8. This structure is
         exported to user space via the tracing format file. A change was made
         to allow more than 8 functions to be recorded, and user space now
         uses the size field to know how many functions are actually in the
         stack.
      
         But the structure still has a size of 8 (even though it points into
         the ring buffer that has the required amount allocated to hold a full
         stack).
      
         This was fine until the fortifier noticed that the
         memcpy(&entry->caller, stack, size) was greater than the 8 functions
         and would complain at runtime about it.
      
         Hide this by using a pointer to the stack location on the ring buffer
         instead of using the address of the entry structure caller field.
      
       - Fix a deadloop in reading trace_pipe caused by a mismatch:
         ring_buffer_empty() returned false, which prompted a read of the
         data, but the read code uses rb_num_of_entries(), which returned
         zero, causing an infinite "retry".
      
       - Fix a warning caused by not using all pages allocated to store ftrace
         functions. This can happen if the linker inserts a bunch of "NULL"
         entries, throwing off the accounting of how many pages are needed.
      
       - Fix histogram synthetic event crashing when the start event is
         removed and the end event is still using a variable from it
      
       - Fix memory leak by freeing iter->temp in tracing_release_pipe()
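
      The FORTIFY_SOURCE item in the list above can be illustrated with a
      userspace analogy. This is a sketch under stated assumptions, not the
      kernel code: struct stack_entry_sketch, caller_slot() and
      record_stack() are hypothetical names. It keeps the declared array at 8
      entries but copies through a pointer computed from the start of a
      larger buffer, so the copy is not tied to the 8-element field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define FIXED_CALLERS 8	/* declared array size, kept for compatibility */

/* Hypothetical stand-in for the exported stack-trace entry. */
struct stack_entry_sketch {
	int size;				/* number of recorded callers */
	unsigned long caller[FIXED_CALLERS];
};

/* Address of caller slot i, computed from the start of the buffer. */
static unsigned long *caller_slot(void *buf, int i)
{
	return (unsigned long *)((unsigned char *)buf +
		offsetof(struct stack_entry_sketch, caller)) + i;
}

/*
 * Record nr callers in a buffer large enough for all of them. The copy
 * goes through a pointer into the buffer rather than &entry->caller.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static void *record_stack(const unsigned long *stack, int nr)
{
	size_t bytes;
	void *buf;

	assert(nr >= 0);
	bytes = offsetof(struct stack_entry_sketch, caller) +
		(size_t)nr * sizeof(unsigned long);
	if (bytes < sizeof(struct stack_entry_sketch))
		bytes = sizeof(struct stack_entry_sketch);
	buf = malloc(bytes);
	if (!buf)
		return NULL;
	((struct stack_entry_sketch *)buf)->size = nr;
	memcpy(caller_slot(buf, 0), stack, (size_t)nr * sizeof(unsigned long));
	return buf;
}
```

      A reader of such a buffer would use the size field, not the declared
      array length, to know how many callers are present.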
      
      * tag 'trace-v6.5-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tracing: Fix memory leak of iter->temp when reading trace_pipe
        tracing/histograms: Add histograms to hist_vars if they have referenced variables
        tracing: Stop FORTIFY_SOURCE complaining about stack trace caller
        ftrace: Fix possible warning on checking all pages used in ftrace_process_locs()
        ring-buffer: Fix deadloop issue on reading trace_pipe
        tracing: arm64: Avoid missing-prototype warnings
        selftests/user_events: Test struct size match cases
        tracing/user_events: Fix struct arg size match check
        x86/ftrace: Remove unsued extern declaration ftrace_regs_caller_ret()
        arm64: ftrace: Add direct call trampoline samples support
        samples: ftrace: Save required argument registers in sample trampolines
    • Merge tag 'for-linus-6.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 15999328
      Linus Torvalds authored
      Pull xen fixes from Juergen Gross:
      
       - a cleanup of the Xen related ELF-notes
      
       - a fix for virtio handling in Xen dom0 when running Xen in a VM
      
      * tag 'for-linus-6.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/virtio: Fix NULL deref when a bridge of PCI root bus has no parent
        x86/Xen: tidy xen-head.S
    • Merge tag 'sh-for-v6.5-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux · 9350cd01
      Linus Torvalds authored
      Pull sh fixes from John Paul Adrian Glaubitz:
       "The sh updates introduced multiple regressions.
      
        In particular, the change a8ac2961 ("sh: Avoid using IRQ0 on SH3
        and SH4") causes several boards to hang during boot due to incorrect
        IRQ numbers.
      
        Geert Uytterhoeven has contributed patches that handle the virq offset
        in the IRQ code for the dreamcast, highlander and r2d boards while
        Artur Rojek has contributed a patch which handles the virq offset for
        the hd64461 companion chip"
      
      * tag 'sh-for-v6.5-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
        sh: hd64461: Handle virq offset for offchip IRQ base and HD64461 IRQ
        sh: mach-dreamcast: Handle virq offset in cascaded IRQ demux
        sh: mach-highlander: Handle virq offset in cascaded IRL demux
        sh: mach-r2d: Handle virq offset in cascaded IRL demux
    • tracing: Fix memory leak of iter->temp when reading trace_pipe · d5a82189
      Zheng Yejian authored
      kmemleak reports:
        unreferenced object 0xffff88814d14e200 (size 256):
          comm "cat", pid 336, jiffies 4294871818 (age 779.490s)
          hex dump (first 32 bytes):
            04 00 01 03 00 00 00 00 08 00 00 00 00 00 00 00  ................
            0c d8 c8 9b ff ff ff ff 04 5a ca 9b ff ff ff ff  .........Z......
          backtrace:
            [<ffffffff9bdff18f>] __kmalloc+0x4f/0x140
            [<ffffffff9bc9238b>] trace_find_next_entry+0xbb/0x1d0
            [<ffffffff9bc9caef>] trace_print_lat_context+0xaf/0x4e0
            [<ffffffff9bc94490>] print_trace_line+0x3e0/0x950
            [<ffffffff9bc95499>] tracing_read_pipe+0x2d9/0x5a0
            [<ffffffff9bf03a43>] vfs_read+0x143/0x520
            [<ffffffff9bf04c2d>] ksys_read+0xbd/0x160
            [<ffffffff9d0f0edf>] do_syscall_64+0x3f/0x90
            [<ffffffff9d2000aa>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
      
      When reading the file 'trace_pipe', 'iter->temp' is allocated or
      reallocated in trace_find_next_entry() but not freed before 'trace_pipe'
      is closed.
      
      To fix it, free 'iter->temp' in tracing_release_pipe().
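
      The shape of the fix can be sketched in userspace C. The struct and
      function names below (iter_sketch, iter_reserve_temp(), iter_release())
      are hypothetical stand-ins, not the kernel's; the sketch only shows a
      lazily allocated scratch buffer that the release path must free.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical iterator with a lazily allocated scratch buffer. */
struct iter_sketch {
	void *temp;
	size_t temp_size;
};

/*
 * Make sure the scratch buffer holds at least size bytes, replacing it
 * with a larger one if needed. Returns 0 on success, -1 on failure.
 */
static int iter_reserve_temp(struct iter_sketch *iter, size_t size)
{
	void *p;

	assert(iter != NULL);
	if (iter->temp && iter->temp_size >= size)
		return 0;
	p = malloc(size);
	if (!p)
		return -1;
	free(iter->temp);
	iter->temp = p;
	iter->temp_size = size;
	return 0;
}

/* Release path: free the scratch buffer so it does not leak on close. */
static void iter_release(struct iter_sketch *iter)
{
	assert(iter != NULL);
	free(iter->temp);
	iter->temp = NULL;
	iter->temp_size = 0;
}
```

      Without the free() in the release path, every open, read and close
      cycle in this model would leave one buffer behind, which matches the
      kmemleak report above.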
      
      Link: https://lore.kernel.org/linux-trace-kernel/20230713141435.1133021-1-zhengyejian1@huawei.com
      
      Cc: stable@vger.kernel.org
      Fixes: ff895103 ("tracing: Save off entry when peeking at next entry")
      Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>