1. 01 May, 2024 8 commits
  2. 29 Apr, 2024 13 commits
  3. 27 Apr, 2024 1 commit
    • Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · b2ff42c6
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2024-04-26
      
      We've added 12 non-merge commits during the last 22 day(s) which contain
      a total of 14 files changed, 168 insertions(+), 72 deletions(-).
      
      The main changes are:
      
      1) Fix BPF_PROBE_MEM in verifier and JIT to skip loads from vsyscall page,
         from Puranjay Mohan.
      
      2) Fix a crash in XDP with devmap broadcast redirect when the latter map
         is in process of being torn down, from Toke Høiland-Jørgensen.
      
      3) Fix arm64 and riscv64 BPF JITs to properly clear start time for BPF
         program runtime stats, from Xu Kuohai.
      
      4) Fix a sockmap KCSAN-reported data race in sk_psock_skb_ingress_enqueue,
         from Jason Xing.
      
      5) Fix BPF verifier error message in resolve_pseudo_ldimm64,
         from Anton Protopopov.
      
      6) Fix missing DEBUG_INFO_BTF_MODULES Kconfig menu item,
         from Andrii Nakryiko.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        selftests/bpf: Test PROBE_MEM of VSYSCALL_ADDR on x86-64
        bpf, x86: Fix PROBE_MEM runtime load check
        bpf: verifier: prevent userspace memory access
        xdp: use flags field to disambiguate broadcast redirect
        arm32, bpf: Reimplement sign-extension mov instruction
        riscv, bpf: Fix incorrect runtime stats
        bpf, arm64: Fix incorrect runtime stats
        bpf: Fix a verifier verbose message
        bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue
        MAINTAINERS: bpf: Add Lehui and Puranjay as riscv64 reviewers
        MAINTAINERS: Update email address for Puranjay Mohan
        bpf, kconfig: Fix DEBUG_INFO_BTF_MODULES Kconfig definition
      ====================
      
      Link: https://lore.kernel.org/r/20240426224248.26197-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  4. 26 Apr, 2024 11 commits
    • Fix a potential infinite loop in extract_user_to_sg() · 6a30653b
      David Howells authored
      Fix extract_user_to_sg() so that it will break out of the loop if
      iov_iter_extract_pages() returns 0 rather than looping around forever.
      
      [Note that I've included two fixes lines as the function got moved to a
      different file and renamed]
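
The loop-termination idea can be illustrated outside the kernel. The sketch below is a user-space analogy only: `fake_iter`, `fake_extract()` and `extract_all()` are hypothetical names, and `fake_extract()` merely stands in for iov_iter_extract_pages(). How the real function reports the zero-length case to its caller is not shown here.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for iov_iter_extract_pages(): yields at most
 * `chunk` bytes per call and returns 0 once nothing is left. */
struct fake_iter {
	size_t remaining;
	size_t chunk;
};

static ssize_t fake_extract(struct fake_iter *it, size_t want)
{
	size_t n = want < it->chunk ? want : it->chunk;

	if (n > it->remaining)
		n = it->remaining;
	it->remaining -= n;
	return (ssize_t)n;
}

/* Consume up to maxsize bytes; stop on error or when no progress is made. */
static ssize_t extract_all(struct fake_iter *it, size_t maxsize)
{
	ssize_t total = 0;

	while (maxsize > 0) {
		ssize_t len = fake_extract(it, maxsize);

		if (len < 0)
			return len;
		if (len == 0)	/* without this check the loop would never end */
			break;
		total += len;
		maxsize -= (size_t)len;
	}
	return total;
}
```

With the `len == 0` check removed, a source that runs dry before maxsize reaches zero would keep the loop spinning, which is the failure mode the commit describes.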
      
      Fixes: 85dd2c8f ("netfs: Add a function to extract a UBUF or IOVEC into a BVEC iterator")
      Fixes: f5f82cd1 ("Move netfs_extract_iter_to_sg() to lib/scatterlist.c")
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Jeff Layton <jlayton@kernel.org>
      cc: Steve French <sfrench@samba.org>
      cc: Herbert Xu <herbert@gondor.apana.org.au>
      cc: netfs@lists.linux.dev
      Link: https://lore.kernel.org/r/1967121.1714034372@warthog.procyon.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'bpf-prevent-userspace-memory-access' · a86538a2
      Alexei Starovoitov authored
      Puranjay Mohan says:
      
      ====================
      bpf: prevent userspace memory access
      
      V5: https://lore.kernel.org/bpf/20240324185356.59111-1-puranjay12@gmail.com/
      Changes in V6:
      - Disable the verifier's instrumentation in x86-64 and update the JIT to
        take care of vsyscall page in addition to userspace addresses.
      - Update bpf_testmod to test for vsyscall addresses.
      
      V4: https://lore.kernel.org/bpf/20240321124640.8870-1-puranjay12@gmail.com/
      Changes in V5:
      - Use TASK_SIZE_MAX + PAGE_SIZE, VSYSCALL_ADDR as userspace boundary in
        x86-64 JIT.
      - Added Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
      
      V3: https://lore.kernel.org/bpf/20240321120842.78983-1-puranjay12@gmail.com/
      Changes in V4:
      - Disable this feature on architectures that don't define
        CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.
      - By doing the above, we don't need anything explicitly for s390x.
      
      V2: https://lore.kernel.org/bpf/20240321101058.68530-1-puranjay12@gmail.com/
      Changes in V3:
      - Return 0 from bpf_arch_uaddress_limit() in disabled case because it
        returns u64.
      - Modify the check in verifier to not do instrumentation when uaddress_limit
        is 0.
      
      V1: https://lore.kernel.org/bpf/20240320105436.4781-1-puranjay12@gmail.com/
      Changes in V2:
      - Disable this feature on s390x.
      
      With BPF_PROBE_MEM, BPF allows de-referencing an untrusted pointer. To
      thwart invalid memory accesses, the JITs add an exception table entry for
      all such accesses. But in case the src_reg + offset is a userspace address,
      the BPF program might read that memory if the user has mapped it.
      
      x86-64 JIT already instruments the BPF_PROBE_MEM based loads with checks to
      skip loads from userspace addresses, but it doesn't check for the vsyscall page
      because it falls in the kernel address space but is considered a userspace
      page. The second patch in this series fixes the x86-64 JIT to also skip
      loads from the vsyscall page. The last patch updates the bpf_testmod so
      this address can be checked as part of the selftests.
      
      Other architectures don't have the complexity of the vsyscall address and
      just need to skip loads from the userspace. To make this more scalable and
      robust, the verifier is updated in the first patch to instrument
      BPF_PROBE_MEM to skip loads from the userspace addresses.
      ====================
      
      Link: https://lore.kernel.org/r/20240424100210.11982-1-puranjay@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Test PROBE_MEM of VSYSCALL_ADDR on x86-64 · 7cd6750d
      Puranjay Mohan authored
      The vsyscall is a legacy API for fast execution of system calls. It maps
      a page at address VSYSCALL_ADDR into the userspace program. This address
      is in the top 10MB of the address space:
      
      ffffffffff600000 - ffffffffff600fff |    4 kB | legacy vsyscall ABI
      
      The last commit fixes the x86-64 BPF JIT to skip accessing addresses in
      this memory region. Add this address to bpf_testmod_return_ptr() so we
      can make sure that it is fixed.
      
      After this change and without the previous commit, subprogs_extable
      selftest will crash the kernel.

      Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
      Link: https://lore.kernel.org/r/20240424100210.11982-4-puranjay@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, x86: Fix PROBE_MEM runtime load check · b599d7d2
      Puranjay Mohan authored
      When a load is marked PROBE_MEM - e.g. due to PTR_UNTRUSTED access - the
      address being loaded from is not necessarily valid. The BPF JIT sets up
      exception handlers for each such load which catch page faults and zero out
      the destination register.
      
      If the address for the load is outside kernel address space, the load
      will escape the exception handling and crash the kernel. To prevent this
      from happening, the JIT emits some instructions to verify that addr is > end
      of userspace addresses.
      
      x86 has a legacy vsyscall ABI where a page at address 0xffffffffff600000
      is mapped with user accessible permissions. The addresses in this page
      are considered userspace addresses by the fault handler. Therefore, a
      BPF program accessing this page will crash the kernel.
      
      This patch fixes the runtime checks to also check that the PROBE_MEM
      address is below VSYSCALL_ADDR.
      
      Example BPF program:
      
       SEC("fentry/tcp_v4_connect")
       int BPF_PROG(fentry_tcp_v4_connect, struct sock *sk)
       {
      	*(volatile unsigned long *)&sk->sk_tsq_flags;
      	return 0;
       }
      
      BPF Assembly:
      
       0: (79) r1 = *(u64 *)(r1 +0)
       1: (79) r1 = *(u64 *)(r1 +344)
       2: (b7) r0 = 0
       3: (95) exit
      
      			       x86-64 JIT
      			       ==========
      
                  BEFORE                                    AFTER
      	    ------                                    -----
      
       0:   nopl   0x0(%rax,%rax,1)             0:   nopl   0x0(%rax,%rax,1)
       5:   xchg   %ax,%ax                      5:   xchg   %ax,%ax
       7:   push   %rbp                         7:   push   %rbp
       8:   mov    %rsp,%rbp                    8:   mov    %rsp,%rbp
       b:   mov    0x0(%rdi),%rdi               b:   mov    0x0(%rdi),%rdi
      -------------------------------------------------------------------------------
       f:   movabs $0x100000000000000,%r11      f:   movabs $0xffffffffff600000,%r10
      19:   add    $0x2a0,%rdi                 19:   mov    %rdi,%r11
      20:   cmp    %r11,%rdi                   1c:   add    $0x2a0,%r11
      23:   jae    0x0000000000000029          23:   sub    %r10,%r11
      25:   xor    %edi,%edi                   26:   movabs $0x100000000a00000,%r10
      27:   jmp    0x000000000000002d          30:   cmp    %r10,%r11
      29:   mov    0x0(%rdi),%rdi              33:   ja     0x0000000000000039
      --------------------------------\        35:   xor    %edi,%edi
      2d:   xor    %eax,%eax           \       37:   jmp    0x0000000000000040
      2f:   leave                       \      39:   mov    0x2a0(%rdi),%rdi
      30:   ret                          \--------------------------------------------
                                               40:   xor    %eax,%eax
                                               42:   leave
                                               43:   ret
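
The AFTER column folds two range tests into one unsigned comparison. The C sketch below mirrors that arithmetic in user space as an illustration, not the JIT's source: the function name and both macro names are hypothetical, the vsyscall address is the one quoted above, and the limit is the immediate shown in the BEFORE column.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the commit text and disassembly above; the macro
 * names are made up for this sketch. */
#define VSYSCALL_ADDR_X86	0xffffffffff600000ULL
#define UADDR_LIMIT_EXAMPLE	0x0100000000000000ULL

/*
 * Returns true when the load would be performed and false when the
 * destination register would be zeroed instead. Subtracting
 * VSYSCALL_ADDR wraps the range [VSYSCALL_ADDR, 2^64) down to the
 * bottom of the unsigned range, so a single compare rejects both that
 * range and everything at or below the userspace limit.
 */
static bool x86_probe_mem_load_allowed(uint64_t src, int32_t off)
{
	uint64_t addr = src + (uint64_t)(int64_t)off;

	return (addr - VSYSCALL_ADDR_X86) >
	       (UADDR_LIMIT_EXAMPLE - VSYSCALL_ADDR_X86);
}
```

The right-hand side evaluates to 0x100000000a00000, which matches the second movabs immediate in the AFTER listing.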

      Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
      Link: https://lore.kernel.org/r/20240424100210.11982-3-puranjay@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: verifier: prevent userspace memory access · 66e13b61
      Puranjay Mohan authored
      With BPF_PROBE_MEM, BPF allows de-referencing an untrusted pointer. To
      thwart invalid memory accesses, the JITs add an exception table entry
      for all such accesses. But in case the src_reg + offset is a userspace
      address, the BPF program might read that memory if the user has
      mapped it.
      
      Make the verifier add guard instructions around such memory accesses and
      skip the load if the address falls into the userspace region.
      
      The JITs need to implement bpf_arch_uaddress_limit() to define where
      the userspace addresses end for that architecture; otherwise TASK_SIZE
      is taken as the default.
      
      The implementation is as follows:
      
      REG_AX =  SRC_REG
      if(offset)
      	REG_AX += offset;
      REG_AX >>= 32;
      if (REG_AX <= (uaddress_limit >> 32))
      	DST_REG = 0;
      else
      	DST_REG = *(size *)(SRC_REG + offset);
      
      Comparing just the upper 32 bits of the load address with the upper
      32 bits of uaddress_limit implies that the values are being aligned down
      to a 4GB boundary before comparison.
      
      The above means that all loads with address <= uaddress_limit + 4GB are
      skipped. This is acceptable because there is a large hole (much larger
      than 4GB) between userspace and kernel space memory, therefore a
      correctly functioning BPF program should not access this 4GB memory
      above the userspace.
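
The guard described above can be modelled as a small predicate. This is a user-space sketch under stated assumptions, not the verifier's code: `probe_mem_load_allowed()` is a hypothetical name, and the limit used in the usage note (0x800000000000) is only inferred from the 0x8000 immediate that appears in the instrumented BPF program further down.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Mirrors the pseudocode: compare the upper 32 bits of (src + offset)
 * with the upper 32 bits of uaddress_limit. Returns true when the load
 * would be performed and false when DST_REG would be set to 0.
 */
static bool probe_mem_load_allowed(uint64_t src, int64_t offset,
				   uint64_t uaddress_limit)
{
	uint64_t ax = src + (uint64_t)offset;	/* REG_AX = SRC_REG + offset */

	ax >>= 32;				/* REG_AX >>= 32 */
	return ax > (uaddress_limit >> 32);
}
```

With that assumed limit, an address such as 0x8000ffffffff is still rejected even though it lies above the limit, because its upper 32 bits equal 0x8000. That is the "up to 4GB above the limit" slack the text accepts.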
      
      Let's analyze what this patch does to the following fentry program
      dereferencing an untrusted pointer:
      
        SEC("fentry/tcp_v4_connect")
        int BPF_PROG(fentry_tcp_v4_connect, struct sock *sk)
        {
                      *(volatile long *)sk;
                      return 0;
        }
      
          BPF Program before              |           BPF Program after
          ------------------              |           -----------------
      
        0: (79) r1 = *(u64 *)(r1 +0)          0: (79) r1 = *(u64 *)(r1 +0)
        -----------------------------------------------------------------------
        1: (79) r1 = *(u64 *)(r1 +0) --\      1: (bf) r11 = r1
        ----------------------------\   \     2: (77) r11 >>= 32
        2: (b7) r0 = 0               \   \    3: (b5) if r11 <= 0x8000 goto pc+2
        3: (95) exit                  \   \-> 4: (79) r1 = *(u64 *)(r1 +0)
                                       \      5: (05) goto pc+1
                                        \     6: (b7) r1 = 0
                                         \--------------------------------------
                                              7: (b7) r0 = 0
                                              8: (95) exit
      
      As you can see from above, in the best case (off=0), 5 extra instructions
      are emitted.
      
      Now, we analyze the same program after it has gone through the JITs of
      ARM64 and RISC-V architectures. We follow the single load instruction
      that has the untrusted pointer and see what instrumentation has been
      added around it.
      
                                      x86-64 JIT
                                      ==========
           JIT's Instrumentation
                (upstream)
           ---------------------
      
         0:   nopl   0x0(%rax,%rax,1)
         5:   xchg   %ax,%ax
         7:   push   %rbp
         8:   mov    %rsp,%rbp
         b:   mov    0x0(%rdi),%rdi
        ---------------------------------
         f:   movabs $0x800000000000,%r11
        19:   cmp    %r11,%rdi
        1c:   jb     0x000000000000002a
        1e:   mov    %rdi,%r11
        21:   add    $0x0,%r11
        28:   jae    0x000000000000002e
        2a:   xor    %edi,%edi
        2c:   jmp    0x0000000000000032
        2e:   mov    0x0(%rdi),%rdi
        ---------------------------------
        32:   xor    %eax,%eax
        34:   leave
        35:   ret
      
      The x86-64 JIT already emits some instructions to protect against user
      memory access. This patch doesn't make any changes for the x86-64 JIT.
      
                                        ARM64 JIT
                                        =========
      
              No Instrumentation                      Verifier's Instrumentation
                 (upstream)                                  (This patch)
              -----------------                       --------------------------
      
         0:   add     x9, x30, #0x0                0:   add     x9, x30, #0x0
         4:   nop                                  4:   nop
         8:   paciasp                              8:   paciasp
         c:   stp     x29, x30, [sp, #-16]!        c:   stp     x29, x30, [sp, #-16]!
        10:   mov     x29, sp                     10:   mov     x29, sp
        14:   stp     x19, x20, [sp, #-16]!       14:   stp     x19, x20, [sp, #-16]!
        18:   stp     x21, x22, [sp, #-16]!       18:   stp     x21, x22, [sp, #-16]!
        1c:   stp     x25, x26, [sp, #-16]!       1c:   stp     x25, x26, [sp, #-16]!
        20:   stp     x27, x28, [sp, #-16]!       20:   stp     x27, x28, [sp, #-16]!
        24:   mov     x25, sp                     24:   mov     x25, sp
        28:   mov     x26, #0x0                   28:   mov     x26, #0x0
        2c:   sub     x27, x25, #0x0              2c:   sub     x27, x25, #0x0
        30:   sub     sp, sp, #0x0                30:   sub     sp, sp, #0x0
        34:   ldr     x0, [x0]                    34:   ldr     x0, [x0]
      --------------------------------------------------------------------------------
        38:   ldr     x0, [x0] ----------\        38:   add     x9, x0, #0x0
      -----------------------------------\\       3c:   lsr     x9, x9, #32
        3c:   mov     x7, #0x0            \\      40:   cmp     x9, #0x10, lsl #12
        40:   mov     sp, sp               \\     44:   b.ls    0x0000000000000050
        44:   ldp     x27, x28, [sp], #16   \\--> 48:   ldr     x0, [x0]
        48:   ldp     x25, x26, [sp], #16    \    4c:   b       0x0000000000000054
        4c:   ldp     x21, x22, [sp], #16     \   50:   mov     x0, #0x0
        50:   ldp     x19, x20, [sp], #16      \---------------------------------------
        54:   ldp     x29, x30, [sp], #16         54:   mov     x7, #0x0
        58:   add     x0, x7, #0x0                58:   mov     sp, sp
        5c:   autiasp                             5c:   ldp     x27, x28, [sp], #16
        60:   ret                                 60:   ldp     x25, x26, [sp], #16
        64:   nop                                 64:   ldp     x21, x22, [sp], #16
        68:   ldr     x10, 0x0000000000000070     68:   ldp     x19, x20, [sp], #16
        6c:   br      x10                         6c:   ldp     x29, x30, [sp], #16
                                                  70:   add     x0, x7, #0x0
                                                  74:   autiasp
                                                  78:   ret
                                                  7c:   nop
                                                  80:   ldr     x10, 0x0000000000000088
                                                  84:   br      x10
      
      There are 6 extra instructions added in ARM64 in the best case. This will
      become 7 in the worst case (off != 0).
      
                                 RISC-V JIT (RISCV_ISA_C Disabled)
                                 ==========
      
              No Instrumentation          Verifier's Instrumentation
                 (upstream)                      (This patch)
              -----------------           --------------------------
      
         0:   nop                            0:   nop
         4:   nop                            4:   nop
         8:   li      a6, 33                 8:   li      a6, 33
         c:   addi    sp, sp, -16            c:   addi    sp, sp, -16
        10:   sd      s0, 8(sp)             10:   sd      s0, 8(sp)
        14:   addi    s0, sp, 16            14:   addi    s0, sp, 16
        18:   ld      a0, 0(a0)             18:   ld      a0, 0(a0)
      ---------------------------------------------------------------
        1c:   ld      a0, 0(a0) --\         1c:   mv      t0, a0
      --------------------------\  \        20:   srli    t0, t0, 32
        20:   li      a5, 0      \  \       24:   lui     t1, 4096
        24:   ld      s0, 8(sp)   \  \      28:   sext.w  t1, t1
        28:   addi    sp, sp, 16   \  \     2c:   bgeu    t1, t0, 12
        2c:   sext.w  a0, a5        \  \--> 30:   ld      a0, 0(a0)
        30:   ret                    \      34:   j       8
                                      \     38:   li      a0, 0
                                       \------------------------------
                                            3c:   li      a5, 0
                                            40:   ld      s0, 8(sp)
                                            44:   addi    sp, sp, 16
                                            48:   sext.w  a0, a5
                                            4c:   ret
      
      There are 7 extra instructions added in RISC-V.
      
      Fixes: 80083428 ("bpf, arm64: Add BPF exception tables")
      Reported-by: Breno Leitao <leitao@debian.org>
      Suggested-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
      Link: https://lore.kernel.org/r/20240424100210.11982-2-puranjay@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • net l2tp: drop flow hash on forward · 42f853b4
      David Bauer authored
      Drop the flow-hash of the skb when forwarding to the L2TP netdev.
      
      This prevents the L2TP qdisc from using the flow-hash from the outer
      packet, which is identical for every flow within the tunnel.
      
      This does not affect every platform; it is specific to the Ethernet
      driver and depends on the platform including L4 information in the
      flow-hash.
      
      One such example is the Mediatek Filogic MT798x family of networking
      processors.
      
      Fixes: d9e31d17 ("l2tp: Add L2TP ethernet pseudowire support")
      Acked-by: James Chapman <jchapman@katalix.com>
      Signed-off-by: David Bauer <mail@david-bauer.net>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240424171110.13701-1-mail@david-bauer.net
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • nsh: Restore skb->{protocol,data,mac_header} for outer header in nsh_gso_segment(). · 4b911a96
      Kuniyuki Iwashima authored
      syzbot triggered various splats (see [0] and links) by a crafted GSO
      packet of VIRTIO_NET_HDR_GSO_UDP layering the following protocols:
      
        ETH_P_8021AD + ETH_P_NSH + ETH_P_IPV6 + IPPROTO_UDP
      
      NSH can encapsulate IPv4, IPv6, Ethernet, NSH, and MPLS.  As the inner
      protocol can be Ethernet, NSH GSO handler, nsh_gso_segment(), calls
      skb_mac_gso_segment() to invoke inner protocol GSO handlers.
      
      nsh_gso_segment() does the following for the original skb before
      calling skb_mac_gso_segment():
      
        1. reset skb->network_header
        2. save the original skb->{mac_header,mac_len} in a local variable
        3. pull the NSH header
        4. reset skb->mac_header
        5. set up skb->mac_len and skb->protocol for the inner protocol
      
      and does the following for the segmented skb:
      
        6. set htons(ETH_P_NSH) to skb->protocol
        7. push the NSH header
        8. restore skb->mac_header
        9. set skb->mac_header + mac_len to skb->network_header
       10. restore skb->mac_len
      
      There are two problems in 6-7 and 8-9.
      
        (a)
        After 6 & 7, skb->data points to the NSH header, so the outer header
        (ETH_P_8021AD in this case) is stripped when skb is sent out of netdev.
      
        Also, if NSH is encapsulated by NSH + Ethernet (so NSH-Ethernet-NSH),
        skb_pull() in the first nsh_gso_segment() will make skb->data point
        to the middle of the outer NSH or Ethernet header because the Ethernet
        header is not pulled by the second nsh_gso_segment().
      
        (b)
        While restoring skb->{mac_header,network_header} in 8 & 9,
        nsh_gso_segment() does not assume that the data in the linear
        buffer is shifted.
      
        However, udp6_ufo_fragment() could shift the data and change
        skb->mac_header accordingly as demonstrated by syzbot.
      
        If this happens, even the restored skb->mac_header points to
        the middle of the outer header.
      
      It seems nsh_gso_segment() has never worked with outer headers so far.
      
      At the end of nsh_gso_segment(), the outer header must be restored for
      the segmented skb, instead of the NSH header.
      
      To do that, let's calculate the outer header position relative to
      the inner header and set skb->{data,mac_header,protocol} properly.
      
      [0]:
      BUG: KMSAN: uninit-value in ipvlan_process_outbound drivers/net/ipvlan/ipvlan_core.c:524 [inline]
      BUG: KMSAN: uninit-value in ipvlan_xmit_mode_l3 drivers/net/ipvlan/ipvlan_core.c:602 [inline]
      BUG: KMSAN: uninit-value in ipvlan_queue_xmit+0xf44/0x16b0 drivers/net/ipvlan/ipvlan_core.c:668
       ipvlan_process_outbound drivers/net/ipvlan/ipvlan_core.c:524 [inline]
       ipvlan_xmit_mode_l3 drivers/net/ipvlan/ipvlan_core.c:602 [inline]
       ipvlan_queue_xmit+0xf44/0x16b0 drivers/net/ipvlan/ipvlan_core.c:668
       ipvlan_start_xmit+0x5c/0x1a0 drivers/net/ipvlan/ipvlan_main.c:222
       __netdev_start_xmit include/linux/netdevice.h:4989 [inline]
       netdev_start_xmit include/linux/netdevice.h:5003 [inline]
       xmit_one net/core/dev.c:3547 [inline]
       dev_hard_start_xmit+0x244/0xa10 net/core/dev.c:3563
       __dev_queue_xmit+0x33ed/0x51c0 net/core/dev.c:4351
       dev_queue_xmit include/linux/netdevice.h:3171 [inline]
       packet_xmit+0x9c/0x6b0 net/packet/af_packet.c:276
       packet_snd net/packet/af_packet.c:3081 [inline]
       packet_sendmsg+0x8aef/0x9f10 net/packet/af_packet.c:3113
       sock_sendmsg_nosec net/socket.c:730 [inline]
       __sock_sendmsg net/socket.c:745 [inline]
       __sys_sendto+0x735/0xa10 net/socket.c:2191
       __do_sys_sendto net/socket.c:2203 [inline]
       __se_sys_sendto net/socket.c:2199 [inline]
       __x64_sys_sendto+0x125/0x1c0 net/socket.c:2199
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      Uninit was created at:
       slab_post_alloc_hook mm/slub.c:3819 [inline]
       slab_alloc_node mm/slub.c:3860 [inline]
       __do_kmalloc_node mm/slub.c:3980 [inline]
       __kmalloc_node_track_caller+0x705/0x1000 mm/slub.c:4001
       kmalloc_reserve+0x249/0x4a0 net/core/skbuff.c:582
       __alloc_skb+0x352/0x790 net/core/skbuff.c:651
       skb_segment+0x20aa/0x7080 net/core/skbuff.c:4647
       udp6_ufo_fragment+0xcab/0x1150 net/ipv6/udp_offload.c:109
       ipv6_gso_segment+0x14be/0x2ca0 net/ipv6/ip6_offload.c:152
       skb_mac_gso_segment+0x3e8/0x760 net/core/gso.c:53
       nsh_gso_segment+0x6f4/0xf70 net/nsh/nsh.c:108
       skb_mac_gso_segment+0x3e8/0x760 net/core/gso.c:53
       __skb_gso_segment+0x4b0/0x730 net/core/gso.c:124
       skb_gso_segment include/net/gso.h:83 [inline]
       validate_xmit_skb+0x107f/0x1930 net/core/dev.c:3628
       __dev_queue_xmit+0x1f28/0x51c0 net/core/dev.c:4343
       dev_queue_xmit include/linux/netdevice.h:3171 [inline]
       packet_xmit+0x9c/0x6b0 net/packet/af_packet.c:276
       packet_snd net/packet/af_packet.c:3081 [inline]
       packet_sendmsg+0x8aef/0x9f10 net/packet/af_packet.c:3113
       sock_sendmsg_nosec net/socket.c:730 [inline]
       __sock_sendmsg net/socket.c:745 [inline]
       __sys_sendto+0x735/0xa10 net/socket.c:2191
       __do_sys_sendto net/socket.c:2203 [inline]
       __se_sys_sendto net/socket.c:2199 [inline]
       __x64_sys_sendto+0x125/0x1c0 net/socket.c:2199
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      CPU: 1 PID: 5101 Comm: syz-executor421 Not tainted 6.8.0-rc5-syzkaller-00297-gf2e367d6 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
      
      Fixes: c411ed85 ("nsh: add GSO support")
      Reported-and-tested-by: syzbot+42a0dc856239de4de60e@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=42a0dc856239de4de60e
      Reported-and-tested-by: syzbot+c298c9f0e46a3c86332b@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=c298c9f0e46a3c86332b
      Link: https://lore.kernel.org/netdev/20240415222041.18537-1-kuniyu@amazon.com/
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240424023549.21862-1-kuniyu@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'ensure-the-copied-buf-is-nul-terminated' · a5b1051a
      Jakub Kicinski authored
      Bui Quang Minh says:
      
      ====================
      Ensure the copied buf is NUL terminated (part)
      
      I found that some drivers contain an out-of-bounds read pattern like this:
      
      	kern_buf = memdup_user(user_buf, count);
      	...
      	sscanf(kern_buf, ...);
      
      The sscanf here stands in for any string-related function. Because
      memdup_user does not NUL-terminate the copy, this pattern can lead to
      an out-of-bounds read of kern_buf in such functions.
      
      This series fixes the above issue by replacing memdup_user with
      memdup_user_nul.
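
The difference can be shown with a user-space analogy. The sketch below is not the kernel implementation: `dup_nul()` and `parse_int_from_buf()` are hypothetical helpers that use malloc() and memcpy() to imitate what a NUL-terminating duplicate gives the parser.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Duplicate len bytes from src and append a terminating NUL, so that
 * string functions cannot run past the end of the copy. */
static char *dup_nul(const void *src, size_t len)
{
	char *p;

	if (len == SIZE_MAX)	/* len + 1 would overflow */
		return NULL;
	p = malloc(len + 1);
	if (!p)
		return NULL;
	memcpy(p, src, len);
	p[len] = '\0';
	return p;
}

/* Parse an integer from a buffer that is NOT guaranteed to be
 * NUL-terminated, e.g. bytes written to a debugfs file. */
static int parse_int_from_buf(const char *buf, size_t count, int *out)
{
	char *kern_buf = dup_nul(buf, count);
	int ok;

	if (!kern_buf)
		return -1;
	ok = sscanf(kern_buf, "%d", out) == 1;
	free(kern_buf);
	return ok ? 0 : -1;
}
```

Given the three bytes '4', '2', '7' with no terminator and a count of 2, the parser sees only "42". A plain duplicate of 2 bytes would leave sscanf free to read whatever follows the allocation.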
      
      v1: https://lore.kernel.org/r/20240422-fix-oob-read-v1-0-e02854c30174@gmail.com
      ====================
      
      Link: https://lore.kernel.org/r/20240424-fix-oob-read-v2-0-f1f1b53a10f4@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • octeontx2-af: avoid off-by-one read from userspace · f299ee70
      Bui Quang Minh authored
      We try to access count + 1 bytes from userspace with memdup_user(buffer,
      count + 1). However, userspace only provides a buffer of count bytes, and
      only these count bytes are verified to be okay to access. To ensure the
      copied buffer is NUL terminated, we use memdup_user_nul instead.
      
      Fixes: 3a2eb515 ("octeontx2-af: Fix an off by one in rvu_dbg_qsize_write()")
      Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
      Link: https://lore.kernel.org/r/20240424-fix-oob-read-v2-6-f1f1b53a10f4@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bna: ensure the copied buf is NUL terminated · 8c34096c
      Bui Quang Minh authored
      Currently, we allocate an nbytes-sized kernel buffer and copy nbytes from
      userspace to that buffer. Later, we use sscanf on this buffer but we don't
      ensure that the string is terminated inside the buffer. This can lead to
      an OOB read when using sscanf. Fix this issue by using memdup_user_nul
      instead of memdup_user.
      
      Fixes: 7afc5dbd ("bna: Add debugfs interface.")
      Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
      Link: https://lore.kernel.org/r/20240424-fix-oob-read-v2-2-f1f1b53a10f4@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ice: ensure the copied buf is NUL terminated · 666854ea
      Bui Quang Minh authored
      Currently, we allocate a count-sized kernel buffer and copy count bytes
      from userspace to that buffer. Later, we use sscanf on this buffer but we
      don't ensure that the string is terminated inside the buffer. This can lead
      to an OOB read when using sscanf. Fix this issue by using memdup_user_nul
      instead of memdup_user.
      
      Fixes: 96a9a934 ("ice: configure FW logging")
      Fixes: 73671c31 ("ice: enable FW logging")
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
      Link: https://lore.kernel.org/r/20240424-fix-oob-read-v2-1-f1f1b53a10f4@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  5. 25 Apr, 2024 7 commits
    • Merge tag 'net-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 52afb15e
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from netfilter, wireless and bluetooth.
      
        Nothing major, regression fixes are mostly in drivers, two more of
        those are flowing towards us thru various trees. I wish some of the
        changes went into -rc5, we'll try to keep an eye on frequency of PRs
        from sub-trees.
      
        Also disproportional number of fixes for bugs added in v6.4, strange
        coincidence.
      
        Current release - regressions:
      
         - igc: fix LED-related deadlock on driver unbind
      
         - wifi: mac80211: small fixes to recent clean up of the connection
           process
      
         - Revert "wifi: iwlwifi: bump FW API to 90 for BZ/SC devices", kernel
           doesn't have all the code to deal with that version, yet
      
         - Bluetooth:
             - set power_ctrl_enabled on NULL returned by gpiod_get_optional()
             - qca: fix invalid device address check, again
      
         - eth: ravb: fix registered interrupt names
      
        Current release - new code bugs:
      
         - wifi: mac80211: check EHT/TTLM action frame length
      
        Previous releases - regressions:
      
         - fix sk_memory_allocated_{add|sub} for architectures where
           __this_cpu_{add|sub}* are not IRQ-safe
      
         - dsa: mv88e6xx: fix link setup for 88E6250
      
        Previous releases - always broken:
      
         - ip: validate dev returned from __in_dev_get_rcu(), prevent possible
           null-derefs in a few places
      
         - switch number of for_each_rcu() loops using call_rcu() on the
           iterator to for_each_safe()
      
         - macsec: fix isolation of broadcast traffic in presence of offload
      
         - vxlan: drop packets from invalid source address
      
         - eth: mlxsw: trap and ACL programming fixes
      
         - eth: bnxt: PCIe error recovery fixes, fix counting dropped packets
      
         - Bluetooth:
             - lots of fixes for the command submission rework from v6.4
             - qca: fix NULL-deref on non-serdev suspend
      
        Misc:
      
         - tools: ynl: don't ignore errors in NLMSG_DONE messages"
      
      * tag 'net-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
        af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().
        net: b44: set pause params only when interface is up
        tls: fix lockless read of strp->msg_ready in ->poll
        dpll: fix dpll_pin_on_pin_register() for multiple parent pins
        net: ravb: Fix registered interrupt names
        octeontx2-af: fix the double free in rvu_npc_freemem()
        net: ethernet: ti: am65-cpts: Fix PTPv1 message type on TX packets
        ice: fix LAG and VF lock dependency in ice_reset_vf()
        iavf: Fix TC config comparison with existing adapter TC config
        i40e: Report MFS in decimal base instead of hex
        i40e: Do not use WQ_MEM_RECLAIM flag for workqueue
        net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()
        net/mlx5e: Advertise mlx5 ethernet driver updates sk_buff md_dst for MACsec
        macsec: Detect if Rx skb is macsec-related for offloading devices that update md_dst
        ethernet: Add helper for assigning packet type when dest address does not match device address
        macsec: Enable devices to advertise whether they update sk_buff md_dst during offloads
        net: phy: dp83869: Fix MII mode failure
        netfilter: nf_tables: honor table dormant flag from netdev release event path
        eth: bnxt: fix counting packets discarded due to OOM and netpoll
        igc: Fix LED-related deadlock on driver unbind
        ...
      52afb15e
    • Merge tag 'nfsd-6.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · e33c4963
      Linus Torvalds authored
      Pull nfsd fixes from Chuck Lever:
      
       - Revert some backchannel fixes that went into v6.9-rc
      
      * tag 'nfsd-6.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
        Revert "NFSD: Convert the callback workqueue to use delayed_work"
        Revert "NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down"
      e33c4963
    • Merge tag 'for-linus-2024042501' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid · f9e02329
      Linus Torvalds authored
      Pull HID fixes from Benjamin Tissoires:
      
       - A couple of i2c-hid fixes (Kenny Levinsen & Nam Cao)
      
       - A config issue with mcp-2221 when CONFIG_IIO is not enabled
         (Abdelrahman Morsy)
      
       - A dev_err fix in intel-ish-hid (Zhang Lixu)
      
       - A couple of mouse fixes for both nintendo and Logitech-dj (Nuno
         Pereira and Yaraslau Furman)
      
       - I'm changing my main kernel email address as it's way simpler for me
         than the Red Hat one (Benjamin Tissoires)
      
      * tag 'for-linus-2024042501' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
        HID: mcp-2221: cancel delayed_work only when CONFIG_IIO is enabled
        HID: logitech-dj: allow mice to use all types of reports
        HID: i2c-hid: Revert to await reset ACK before reading report descriptor
        HID: nintendo: Fix N64 controller being identified as mouse
        MAINTAINERS: update Benjamin's email address
        HID: intel-ish-hid: ipc: Fix dev_err usage with uninitialized dev->devc
        HID: i2c-hid: remove I2C_HID_READ_PENDING flag to prevent lock-up
      f9e02329
    • Merge tag 'nf-24-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · e8baa63f
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS fixes for net
      
      The following patchset contains two Netfilter/IPVS fixes for net:
      
      Patch #1 fixes SCTP checksumming for IPVS with gso packets,
      	 from Ismael Luceno.
      
      Patch #2 honors the dormant flag from the netdev event path to fix a
      	 possible double hook unregistration.
      
      * tag 'nf-24-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nf_tables: honor table dormant flag from netdev release event path
        ipvs: Fix checksumming on GSO of SCTP packets
      ====================
      
      Link: https://lore.kernel.org/r/20240425090149.1359547-1-pablo@netfilter.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      e8baa63f
    • af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc(). · 1971d13f
      Kuniyuki Iwashima authored
      syzbot reported a lockdep splat regarding unix_gc_lock and
      unix_state_lock().
      
      One is called from recvmsg() for a connected socket, and the other
      is called from GC for a TCP_LISTEN socket.
      
      So, the splat is false-positive.
      
      Let's add a dedicated lock class for the latter to suppress the splat.
      
      Note that this change is not necessary for net-next.git as the issue
      only applies to the old GC implementation.
      
      [0]:
      WARNING: possible circular locking dependency detected
      6.9.0-rc5-syzkaller-00007-g4d200843 #0 Not tainted
       -----------------------------------------------------
      kworker/u8:1/11 is trying to acquire lock:
      ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
      ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: __unix_gc+0x40e/0xf70 net/unix/garbage.c:302
      
      but task is already holding lock:
      ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
      ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
       -> #1 (unix_gc_lock){+.+.}-{2:2}:
             lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
             __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
             _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
             spin_lock include/linux/spinlock.h:351 [inline]
             unix_notinflight+0x13d/0x390 net/unix/garbage.c:140
             unix_detach_fds net/unix/af_unix.c:1819 [inline]
             unix_destruct_scm+0x221/0x350 net/unix/af_unix.c:1876
             skb_release_head_state+0x100/0x250 net/core/skbuff.c:1188
             skb_release_all net/core/skbuff.c:1200 [inline]
             __kfree_skb net/core/skbuff.c:1216 [inline]
             kfree_skb_reason+0x16d/0x3b0 net/core/skbuff.c:1252
             kfree_skb include/linux/skbuff.h:1262 [inline]
             manage_oob net/unix/af_unix.c:2672 [inline]
             unix_stream_read_generic+0x1125/0x2700 net/unix/af_unix.c:2749
             unix_stream_splice_read+0x239/0x320 net/unix/af_unix.c:2981
             do_splice_read fs/splice.c:985 [inline]
             splice_file_to_pipe+0x299/0x500 fs/splice.c:1295
             do_splice+0xf2d/0x1880 fs/splice.c:1379
             __do_splice fs/splice.c:1436 [inline]
             __do_sys_splice fs/splice.c:1652 [inline]
             __se_sys_splice+0x331/0x4a0 fs/splice.c:1634
             do_syscall_x64 arch/x86/entry/common.c:52 [inline]
             do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
             entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
       -> #0 (&u->lock){+.+.}-{2:2}:
             check_prev_add kernel/locking/lockdep.c:3134 [inline]
             check_prevs_add kernel/locking/lockdep.c:3253 [inline]
             validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
             __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
             lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
             __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
             _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
             spin_lock include/linux/spinlock.h:351 [inline]
             __unix_gc+0x40e/0xf70 net/unix/garbage.c:302
             process_one_work kernel/workqueue.c:3254 [inline]
             process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335
             worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
             kthread+0x2f0/0x390 kernel/kthread.c:388
             ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
             ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(unix_gc_lock);
                                     lock(&u->lock);
                                     lock(unix_gc_lock);
        lock(&u->lock);
      
       *** DEADLOCK ***
      
      3 locks held by kworker/u8:1/11:
       #0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3229 [inline]
       #0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_scheduled_works+0x8e0/0x17c0 kernel/workqueue.c:3335
       #1: ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3230 [inline]
       #1: ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_scheduled_works+0x91b/0x17c0 kernel/workqueue.c:3335
       #2: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
       #2: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261
      
      stack backtrace:
      CPU: 0 PID: 11 Comm: kworker/u8:1 Not tainted 6.9.0-rc5-syzkaller-00007-g4d200843 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
      Workqueue: events_unbound __unix_gc
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
       check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187
       check_prev_add kernel/locking/lockdep.c:3134 [inline]
       check_prevs_add kernel/locking/lockdep.c:3253 [inline]
       validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
       __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
       lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
       __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
       _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
       spin_lock include/linux/spinlock.h:351 [inline]
       __unix_gc+0x40e/0xf70 net/unix/garbage.c:302
       process_one_work kernel/workqueue.c:3254 [inline]
       process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335
       worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
       kthread+0x2f0/0x390 kernel/kthread.c:388
       ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
       </TASK>
      
      Fixes: 47d8ac01 ("af_unix: Fix garbage collector racing against connect()")
      Reported-and-tested-by: syzbot+fa379358c28cc87cc307@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=fa379358c28cc87cc307
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240424170443.9832-1-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      1971d13f
    • net: b44: set pause params only when interface is up · e3eb7dd4
      Peter Münster authored
      b44_free_rings() accesses b44::rx_buffers (and ::tx_buffers)
      unconditionally, but b44::rx_buffers is only valid when the
      device is up (the buffers get allocated in b44_open() and deallocated
      again in b44_close()). At any other time these are just NULL pointers.
      
      So if you try to change the pause params while the network interface
      is disabled/administratively down (which netifd likely tries to do),
      the driver dereferences a NULL pointer.
      
      Link: https://github.com/openwrt/openwrt/issues/13789
      Fixes: 1da177e4 (Linux-2.6.12-rc2)
      Cc: stable@vger.kernel.org
      Reported-by: Peter Münster <pm@a16n.net>
      Suggested-by: Jonas Gorski <jonas.gorski@gmail.com>
      Signed-off-by: Vaclav Svoboda <svoboda@neng.cz>
      Tested-by: Peter Münster <pm@a16n.net>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Peter Münster <pm@a16n.net>
      Reviewed-by: Michael Chan <michael.chan@broadcom.com>
      Link: https://lore.kernel.org/r/87y192oolj.fsf@a16n.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      e3eb7dd4
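The guard described in the b44 commit above can be sketched in plain C. This is a toy model under stated assumptions, not the driver code: `struct nic_sketch` and every function below are hypothetical, and the `running` flag stands in for whatever "interface is up" check the real driver uses. It shows only the shape of the fix: the new setting is always recorded, but the rings are touched only while they are allocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define RING_SIZE 4

/* Toy device state; all names here are hypothetical. */
struct nic_sketch {
    bool running;      /* true between nic_open() and nic_close() */
    int *rx_buffers;   /* allocated only while the device is up */
    bool pause_rx;     /* requested pause setting */
    int ring_resets;   /* how many times the rings were reinitialized */
};

static int nic_open(struct nic_sketch *nic)
{
    nic->rx_buffers = calloc(RING_SIZE, sizeof(*nic->rx_buffers));
    if (!nic->rx_buffers)
        return -1;
    nic->running = true;
    return 0;
}

static void nic_close(struct nic_sketch *nic)
{
    free(nic->rx_buffers);
    nic->rx_buffers = NULL;   /* rings are invalid while the device is down */
    nic->running = false;
}

/* Reinitialize the ring; only legal while the buffers exist. */
static void nic_reset_rings(struct nic_sketch *nic)
{
    assert(nic->rx_buffers != NULL);
    for (size_t i = 0; i < RING_SIZE; i++)
        nic->rx_buffers[i] = 0;
    nic->ring_resets++;
}

/* Record the new pause setting; touch the rings only if the device is up. */
static void nic_set_pause(struct nic_sketch *nic, bool pause_rx)
{
    nic->pause_rx = pause_rx;
    if (nic->running)
        nic_reset_rings(nic);
}
```

Calling `nic_set_pause()` on a device that was never opened just stores the flag, so the setting takes effect the next time the device is brought up.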
    • tls: fix lockless read of strp->msg_ready in ->poll · 0844370f
      Sabrina Dubroca authored
      tls_sk_poll is called without locking the socket, and needs to read
      strp->msg_ready (via tls_strp_msg_ready). Convert msg_ready to a bool
      and use READ_ONCE/WRITE_ONCE where needed. The remaining reads are
      only performed when the socket is locked.
      
      Fixes: 121dca78 ("tls: suppress wakeups unless we have a full record")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Link: https://lore.kernel.org/r/0b7ee062319037cf86af6b317b3d72f7bfcd2e97.1713797701.git.sd@queasysnail.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      0844370f
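The lockless-flag pattern from the tls commit above can be approximated in portable C. READ_ONCE/WRITE_ONCE are kernel macros and are not available in userspace, so this sketch assumes C11 relaxed atomics as the nearest analogue; `struct strp_sketch` and both helpers are hypothetical names, not the kernel's types. The idea shown is that a flag read without the lock is accessed through an explicit single load or store, so the compiler cannot tear, fuse, or re-read it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parser state; only the flag matters here. */
struct strp_sketch {
    atomic_bool msg_ready;
};

/* Writer side: publish the flag with one explicit store. */
static void strp_set_ready(struct strp_sketch *s, bool ready)
{
    assert(s != NULL);
    atomic_store_explicit(&s->msg_ready, ready, memory_order_relaxed);
}

/* Lockless reader, e.g. a poll handler: one explicit load of the flag. */
static bool strp_msg_ready(struct strp_sketch *s)
{
    assert(s != NULL);
    return atomic_load_explicit(&s->msg_ready, memory_order_relaxed);
}
```

Relaxed ordering is used because the sketch models only the single-access guarantee, not any ordering against other data; a reader that goes on to consume the record would still take the lock, as the commit message notes for the remaining reads.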