1. 18 Jun, 2024 6 commits
    • netns: Make get_net_ns() handle zero refcount net · ff960f9d
      Yue Haibing authored
      Syzkaller hit a warning:
      refcount_t: addition on 0; use-after-free.
      WARNING: CPU: 3 PID: 7890 at lib/refcount.c:25 refcount_warn_saturate+0xdf/0x1d0
      Modules linked in:
      CPU: 3 PID: 7890 Comm: tun Not tainted 6.10.0-rc3-00100-gcaa4f9578aba-dirty #310
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
      RIP: 0010:refcount_warn_saturate+0xdf/0x1d0
      Code: 41 49 04 31 ff 89 de e8 9f 1e cd fe 84 db 75 9c e8 76 26 cd fe c6 05 b6 41 49 04 01 90 48 c7 c7 b8 8e 25 86 e8 d2 05 b5 fe 90 <0f> 0b 90 90 e9 79 ff ff ff e8 53 26 cd fe 0f b6 1
      RSP: 0018:ffff8881067b7da0 EFLAGS: 00010286
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff811c72ac
      RDX: ffff8881026a2140 RSI: ffffffff811c72b5 RDI: 0000000000000001
      RBP: ffff8881067b7db0 R08: 0000000000000000 R09: 205b5d3730353139
      R10: 0000000000000000 R11: 205d303938375420 R12: ffff8881086500c4
      R13: ffff8881086500c4 R14: ffff8881086500b0 R15: ffff888108650040
      FS:  00007f5b2961a4c0(0000) GS:ffff88823bd00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055d7ed36fd18 CR3: 00000001482f6000 CR4: 00000000000006f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       ? show_regs+0xa3/0xc0
       ? __warn+0xa5/0x1c0
       ? refcount_warn_saturate+0xdf/0x1d0
       ? report_bug+0x1fc/0x2d0
       ? refcount_warn_saturate+0xdf/0x1d0
       ? handle_bug+0xa1/0x110
       ? exc_invalid_op+0x3c/0xb0
       ? asm_exc_invalid_op+0x1f/0x30
       ? __warn_printk+0xcc/0x140
       ? __warn_printk+0xd5/0x140
       ? refcount_warn_saturate+0xdf/0x1d0
       get_net_ns+0xa4/0xc0
       ? __pfx_get_net_ns+0x10/0x10
       open_related_ns+0x5a/0x130
       __tun_chr_ioctl+0x1616/0x2370
       ? __sanitizer_cov_trace_switch+0x58/0xa0
       ? __sanitizer_cov_trace_const_cmp2+0x1c/0x30
       ? __pfx_tun_chr_ioctl+0x10/0x10
       tun_chr_ioctl+0x2f/0x40
       __x64_sys_ioctl+0x11b/0x160
       x64_sys_call+0x1211/0x20d0
       do_syscall_64+0x9e/0x1d0
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      RIP: 0033:0x7f5b28f165d7
      Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 8
      RSP: 002b:00007ffc2b59c5e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5b28f165d7
      RDX: 0000000000000000 RSI: 00000000000054e3 RDI: 0000000000000003
      RBP: 00007ffc2b59c650 R08: 00007f5b291ed8c0 R09: 00007f5b2961a4c0
      R10: 0000000029690010 R11: 0000000000000246 R12: 0000000000400730
      R13: 00007ffc2b59cf40 R14: 0000000000000000 R15: 0000000000000000
       </TASK>
      Kernel panic - not syncing: kernel: panic_on_warn set ...
      
      This is triggered as below:
                ns0                                    ns1
      tun_set_iff() //dev is tun0
         tun->dev = dev
      //ip link set tun0 netns ns1
                                             put_net() //ref is 0
      __tun_chr_ioctl() //TUNGETDEVNETNS
         net = dev_net(tun->dev);
         open_related_ns(&net->ns, get_net_ns); //ns1
           get_net_ns()
              get_net() //addition on 0
      
      Use maybe_get_net() in get_net_ns() in case net's ref is zero to fix this.
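
The difference between an unconditional get and a "maybe" get can be sketched in userspace C. This is a minimal model, not the kernel code: `struct obj`, `obj_get()`, `obj_maybe_get()` and `obj_put()` are hypothetical names standing in for `struct net`, `get_net()`, `maybe_get_net()` and `put_net()`, and C11 atomics stand in for the kernel's `refcount_t`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net; only the reference count is modeled. */
struct obj {
	atomic_uint refs;
};

/* Unconditional get: only valid if the caller already holds a reference. */
void obj_get(struct obj *o)
{
	unsigned int old = atomic_fetch_add(&o->refs, 1);

	/* The kernel's refcount_t warns here ("addition on 0"); this model asserts. */
	assert(old != 0);
}

/* Conditional get: returns NULL instead of resurrecting a dead object. */
struct obj *obj_maybe_get(struct obj *o)
{
	unsigned int old = atomic_load(&o->refs);

	while (old != 0) {
		/* On failure, old is reloaded with the current count and we retry. */
		if (atomic_compare_exchange_weak(&o->refs, &old, old + 1))
			return o;
	}
	return NULL;
}

/* Drop one reference; the caller must own one. */
void obj_put(struct obj *o)
{
	unsigned int old = atomic_fetch_sub(&o->refs, 1);

	assert(old != 0);
}
```

In this model a caller that only has a bare pointer (as `__tun_chr_ioctl()` does via `dev_net(tun->dev)`) would use `obj_maybe_get()` and treat a NULL result as "the namespace is already going away".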
      
      Fixes: 0c3e0e3b ("tun: Add ioctl() TUNGETDEVNETNS cmd to allow obtaining real net ns of tun device")
      Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
      Link: https://lore.kernel.org/r/20240614131302.2698509-1-yuehaibing@huawei.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • xfrm6: check ip6_dst_idev() return value in xfrm6_get_saddr() · d4640105
      Eric Dumazet authored
      ip6_dst_idev() can return NULL; xfrm6_get_saddr() must act accordingly.
      
      syzbot reported:
      
      Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN PTI
      KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
      CPU: 1 PID: 12 Comm: kworker/u8:1 Not tainted 6.10.0-rc2-syzkaller-00383-gb8481381 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
      Workqueue: wg-kex-wg1 wg_packet_handshake_send_worker
       RIP: 0010:xfrm6_get_saddr+0x93/0x130 net/ipv6/xfrm6_policy.c:64
      Code: df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 97 00 00 00 4c 8b ab d8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 86 00 00 00 4d 8b 6d 00 e8 ca 13 47 01 48 b8 00
      RSP: 0018:ffffc90000117378 EFLAGS: 00010246
      RAX: dffffc0000000000 RBX: ffff88807b079dc0 RCX: ffffffff89a0d6d7
      RDX: 0000000000000000 RSI: ffffffff89a0d6e9 RDI: ffff88807b079e98
      RBP: ffff88807ad73248 R08: 0000000000000007 R09: fffffffffffff000
      R10: ffff88807b079dc0 R11: 0000000000000007 R12: ffffc90000117480
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff8880b9300000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f4586d00440 CR3: 0000000079042000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
        xfrm_get_saddr net/xfrm/xfrm_policy.c:2452 [inline]
        xfrm_tmpl_resolve_one net/xfrm/xfrm_policy.c:2481 [inline]
        xfrm_tmpl_resolve+0xa26/0xf10 net/xfrm/xfrm_policy.c:2541
        xfrm_resolve_and_create_bundle+0x140/0x2570 net/xfrm/xfrm_policy.c:2835
        xfrm_bundle_lookup net/xfrm/xfrm_policy.c:3070 [inline]
        xfrm_lookup_with_ifid+0x4d1/0x1e60 net/xfrm/xfrm_policy.c:3201
        xfrm_lookup net/xfrm/xfrm_policy.c:3298 [inline]
        xfrm_lookup_route+0x3b/0x200 net/xfrm/xfrm_policy.c:3309
        ip6_dst_lookup_flow+0x15c/0x1d0 net/ipv6/ip6_output.c:1256
        send6+0x611/0xd20 drivers/net/wireguard/socket.c:139
        wg_socket_send_skb_to_peer+0xf9/0x220 drivers/net/wireguard/socket.c:178
        wg_socket_send_buffer_to_peer+0x12b/0x190 drivers/net/wireguard/socket.c:200
        wg_packet_send_handshake_initiation+0x227/0x360 drivers/net/wireguard/send.c:40
        wg_packet_handshake_send_worker+0x1c/0x30 drivers/net/wireguard/send.c:51
        process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231
        process_scheduled_works kernel/workqueue.c:3312 [inline]
        worker_thread+0x6c8/0xf70 kernel/workqueue.c:3393
        kthread+0x2c1/0x3a0 kernel/kthread.c:389
        ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
        ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20240615154231.234442-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv6: prevent possible NULL dereference in rt6_probe() · b86762db
      Eric Dumazet authored
      syzbot caught a NULL dereference in rt6_probe() [1].

      Bail out if __in6_dev_get() returns NULL.
      
      [1]
      Oops: general protection fault, probably for non-canonical address 0xdffffc00000000cb: 0000 [#1] PREEMPT SMP KASAN PTI
      KASAN: null-ptr-deref in range [0x0000000000000658-0x000000000000065f]
      CPU: 1 PID: 22444 Comm: syz-executor.0 Not tainted 6.10.0-rc2-syzkaller-00383-gb8481381 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
       RIP: 0010:rt6_probe net/ipv6/route.c:656 [inline]
       RIP: 0010:find_match+0x8c4/0xf50 net/ipv6/route.c:758
      Code: 14 fd f7 48 8b 85 38 ff ff ff 48 c7 45 b0 00 00 00 00 48 8d b8 5c 06 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 19
      RSP: 0018:ffffc900034af070 EFLAGS: 00010203
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc90004521000
      RDX: 00000000000000cb RSI: ffffffff8990d0cd RDI: 000000000000065c
      RBP: ffffc900034af150 R08: 0000000000000005 R09: 0000000000000000
      R10: 0000000000000001 R11: 0000000000000002 R12: 000000000000000a
      R13: 1ffff92000695e18 R14: ffff8880244a1d20 R15: 0000000000000000
      FS:  00007f4844a5a6c0(0000) GS:ffff8880b9300000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001b31b27000 CR3: 000000002d42c000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
        rt6_nh_find_match+0xfa/0x1a0 net/ipv6/route.c:784
        nexthop_for_each_fib6_nh+0x26d/0x4a0 net/ipv4/nexthop.c:1496
        __find_rr_leaf+0x6e7/0xe00 net/ipv6/route.c:825
        find_rr_leaf net/ipv6/route.c:853 [inline]
        rt6_select net/ipv6/route.c:897 [inline]
        fib6_table_lookup+0x57e/0xa30 net/ipv6/route.c:2195
        ip6_pol_route+0x1cd/0x1150 net/ipv6/route.c:2231
        pol_lookup_func include/net/ip6_fib.h:616 [inline]
        fib6_rule_lookup+0x386/0x720 net/ipv6/fib6_rules.c:121
        ip6_route_output_flags_noref net/ipv6/route.c:2639 [inline]
        ip6_route_output_flags+0x1d0/0x640 net/ipv6/route.c:2651
        ip6_dst_lookup_tail.constprop.0+0x961/0x1760 net/ipv6/ip6_output.c:1147
        ip6_dst_lookup_flow+0x99/0x1d0 net/ipv6/ip6_output.c:1250
        rawv6_sendmsg+0xdab/0x4340 net/ipv6/raw.c:898
        inet_sendmsg+0x119/0x140 net/ipv4/af_inet.c:853
        sock_sendmsg_nosec net/socket.c:730 [inline]
        __sock_sendmsg net/socket.c:745 [inline]
        sock_write_iter+0x4b8/0x5c0 net/socket.c:1160
        new_sync_write fs/read_write.c:497 [inline]
        vfs_write+0x6b6/0x1140 fs/read_write.c:590
        ksys_write+0x1f8/0x260 fs/read_write.c:643
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      Fixes: 52e16356 ("[IPV6]: ROUTE: Add router_probe_interval sysctl.")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20240615151454.166404-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv6: prevent possible NULL deref in fib6_nh_init() · 2eab4543
      Eric Dumazet authored
      syzbot reminds us that in6_dev_get() can return NULL.
      
      fib6_nh_init()
          ip6_validate_gw(  &idev  )
              ip6_route_check_nh(  idev  )
                  *idev = in6_dev_get(dev); // can be NULL
      
      Oops: general protection fault, probably for non-canonical address 0xdffffc00000000bc: 0000 [#1] PREEMPT SMP KASAN PTI
      KASAN: null-ptr-deref in range [0x00000000000005e0-0x00000000000005e7]
      CPU: 0 PID: 11237 Comm: syz-executor.3 Not tainted 6.10.0-rc2-syzkaller-00249-gbe27b896 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
       RIP: 0010:fib6_nh_init+0x640/0x2160 net/ipv6/route.c:3606
      Code: 00 00 fc ff df 4c 8b 64 24 58 48 8b 44 24 28 4c 8b 74 24 30 48 89 c1 48 89 44 24 28 48 8d 98 e0 05 00 00 48 89 d8 48 c1 e8 03 <42> 0f b6 04 38 84 c0 0f 85 b3 17 00 00 8b 1b 31 ff 89 de e8 b8 8b
      RSP: 0018:ffffc900032775a0 EFLAGS: 00010202
      RAX: 00000000000000bc RBX: 00000000000005e0 RCX: 0000000000000000
      RDX: 0000000000000010 RSI: ffffc90003277a54 RDI: ffff88802b3a08d8
      RBP: ffffc900032778b0 R08: 00000000000002fc R09: 0000000000000000
      R10: 00000000000002fc R11: 0000000000000000 R12: ffff88802b3a08b8
      R13: 1ffff9200064eec8 R14: ffffc90003277a00 R15: dffffc0000000000
      FS:  00007f940feb06c0(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 00000000245e8000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
        ip6_route_info_create+0x99e/0x12b0 net/ipv6/route.c:3809
        ip6_route_add+0x28/0x160 net/ipv6/route.c:3853
        ipv6_route_ioctl+0x588/0x870 net/ipv6/route.c:4483
        inet6_ioctl+0x21a/0x280 net/ipv6/af_inet6.c:579
        sock_do_ioctl+0x158/0x460 net/socket.c:1222
        sock_ioctl+0x629/0x8e0 net/socket.c:1341
        vfs_ioctl fs/ioctl.c:51 [inline]
        __do_sys_ioctl fs/ioctl.c:907 [inline]
        __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      RIP: 0033:0x7f940f07cea9
      
      Fixes: 428604fb ("ipv6: do not set routes if disable_ipv6 has been enabled")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20240614082002.26407-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: mptcp: userspace_pm: fixed subtest names · e874557f
      Matthieu Baerts (NGI0) authored
      It is important to have fixed (sub)test names in TAP, because these
      names are used to identify them. If they are not fixed, tracking cannot
      be done.
      
      Some subtests from the userspace_pm selftest were using random numbers
      in their names: the client and server address IDs from $RANDOM, and the
      client port number randomly picked by the kernel when creating the
      connection. These values have been replaced by 'client' and 'server'
      words: that's even more helpful than showing random numbers. Note that
      the address IDs are incremented and decremented in the test: +1 or -1
      are then displayed in these cases.
      
      So as not to lose info that can be useful for debugging in case of issues,
      these random numbers are now displayed at the beginning of the test.
      
      Fixes: f589234e ("selftests: mptcp: userspace_pm: format subtests results in TAP")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240614-upstream-net-20240614-selftests-mptcp-uspace-pm-fixed-test-names-v1-1-460ad3edb429@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tcp: clear tp->retrans_stamp in tcp_rcv_fastopen_synack() · 9e046bb1
      Eric Dumazet authored
      Some applications were reporting ETIMEDOUT errors on apparently
      good looking flows, according to packet dumps.
      
      We were able to root cause the issue to an accidental setting
      of tp->retrans_stamp in the following scenario:
      
      - client sends TFO SYN with data.
      - server has TFO disabled, ACKs only SYN but not payload.
      - client receives SYNACK covering only SYN.
      - tcp_ack() eats SYN and sets tp->retrans_stamp to 0.
      - tcp_rcv_fastopen_synack() calls tcp_xmit_retransmit_queue()
        to retransmit TFO payload w/o SYN, sets tp->retrans_stamp to "now",
        but we are not in any loss recovery state.
      - TFO payload is ACKed.
      - we are not in any loss recovery state, and don't see any dupacks,
        so we don't get to any code path that clears tp->retrans_stamp.
      - tp->retrans_stamp stays non-zero for the lifetime of the connection.
      - after first RTO, tcp_clamp_rto_to_user_timeout() clamps second RTO
        to 1 jiffy due to bogus tp->retrans_stamp.
      - on clamped RTO with non-zero icsk_retransmits, retransmits_timed_out()
        sets start_ts from tp->retrans_stamp from TFO payload retransmit
        hours/days ago, and computes bogus long elapsed time for loss recovery,
        and suffers ETIMEDOUT early.
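
The effect of the stale timestamp can be illustrated with a small userspace model. This is a simplified sketch under stated assumptions, not the kernel's logic: `struct conn`, `on_retransmit()` and `timed_out()` are hypothetical names, timestamps are plain milliseconds, and the 120 s timeout used in the note below is an arbitrary illustrative value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of the connection state discussed above. */
struct conn {
	uint32_t retrans_stamp; /* ms timestamp of first retransmit, 0 if none */
};

/* Record the start of a loss-recovery episode only if none is in progress. */
void on_retransmit(struct conn *c, uint32_t now_ms)
{
	if (c->retrans_stamp == 0)
		c->retrans_stamp = now_ms;
}

/* Simplified timeout check: true once (now - start) reaches timeout_ms. */
bool timed_out(const struct conn *c, uint32_t now_ms, uint32_t timeout_ms)
{
	/* The check only makes sense once a recovery start was recorded. */
	assert(c->retrans_stamp != 0);

	/* Signed difference so that timestamp wraparound is tolerated. */
	return (int32_t)(now_ms - c->retrans_stamp - timeout_ms) >= 0;
}
```

With a stamp left over from the TFO payload retransmit at t = 1000 ms, the first real RTO an hour later (t = 3601000 ms) does not refresh the stamp, so `timed_out()` with a 120000 ms timeout reports expiry immediately. If the stamp is reset to 0 once the TFO payload has been retransmitted, the same RTO records a fresh start and the check returns false.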
      
      Fixes: a7abf3cd ("tcp: consider using standard rtx logic in tcp_rcv_fastopen_synack()")
      CC: stable@vger.kernel.org
      Co-developed-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Co-developed-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240614130615.396837-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 17 Jun, 2024 1 commit
    • netrom: Fix a memory leak in nr_heartbeat_expiry() · 0b913024
      Gavrilov Ilia authored
      syzbot reported a memory leak in nr_create() [0].
      
      Commit 409db27e ("netrom: Fix use-after-free of a listening socket.")
      added sock_hold() to the nr_heartbeat_expiry() function, where
      a) a socket has a SOCK_DESTROY flag or
      b) a listening socket has a SOCK_DEAD flag.
      
      But in case "a", when the SOCK_DESTROY flag is set, the file descriptor
      has already been closed and the nr_release() function has been called.
      So it makes no sense to hold the reference count because no one will
      call another nr_destroy_socket() and put it as in case "b".
      
      nr_connect
        nr_establish_data_link
          nr_start_heartbeat
      
      nr_release
        switch (nr->state)
        case NR_STATE_3
          nr->state = NR_STATE_2
          sock_set_flag(sk, SOCK_DESTROY);
      
                              nr_rx_frame
                                nr_process_rx_frame
                                  switch (nr->state)
                                  case NR_STATE_2
                                    nr_state2_machine()
                                      nr_disconnect()
                                        nr_sk(sk)->state = NR_STATE_0
                                        sock_set_flag(sk, SOCK_DEAD)
      
                              nr_heartbeat_expiry
                                switch (nr->state)
                                case NR_STATE_0
                                  if (sock_flag(sk, SOCK_DESTROY) ||
                                     (sk->sk_state == TCP_LISTEN
                                       && sock_flag(sk, SOCK_DEAD)))
                                     sock_hold()  // ( !!! )
                                     nr_destroy_socket()
      
      To fix the memory leak, let's call sock_hold() only for a listening socket.
      
      Found by InfoTeCS on behalf of Linux Verification Center
      (linuxtesting.org) with Syzkaller.
      
      [0]: https://syzkaller.appspot.com/bug?extid=d327a1f3b12e1e206c16
      
      Reported-by: syzbot+d327a1f3b12e1e206c16@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=d327a1f3b12e1e206c16
      Fixes: 409db27e ("netrom: Fix use-after-free of a listening socket.")
      Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 15 Jun, 2024 3 commits
  4. 14 Jun, 2024 8 commits
  5. 13 Jun, 2024 22 commits
    • net: mvpp2: use slab_build_skb for oversized frames · 4467c09b
      Aryan Srivastava authored
      Setting frag_size to 0 to indicate kmalloc has been deprecated;
      use slab_build_skb() directly.
      
      Fixes: ce098da1 ("skbuff: Introduce slab_build_skb()")
      Signed-off-by: Aryan Srivastava <aryan.srivastava@alliedtelesis.co.nz>
      Reviewed-by: Kees Cook <kees@kernel.org>
      Link: https://lore.kernel.org/r/20240613024900.3842238-1-aryan.srivastava@alliedtelesis.co.nz
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bpf: fix UML x86_64 compile failure · b99a95bc
      Maciej Żenczykowski authored
      pcpu_hot (defined in arch/x86) is not available on User Mode Linux (ARCH=um).
      
      Cc: Andrii Nakryiko <andrii@kernel.org>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Fixes: 1ae69210 ("bpf: inline bpf_get_smp_processor_id() helper")
      Signed-off-by: Maciej Żenczykowski <maze@google.com>
      Link: https://lore.kernel.org/r/20240613173146.2524647-1-maze@google.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Add test coverage for reg_set_min_max handling · ceb65eb6
      Daniel Borkmann authored
      Add a test case for the jmp32/k fix to ensure selftests have coverage.
      
      Before fix:
      
        # ./vmtest.sh -- ./test_progs -t verifier_or_jmp32_k
        [...]
        ./test_progs -t verifier_or_jmp32_k
        tester_init:PASS:tester_log_buf 0 nsec
        process_subtest:PASS:obj_open_mem 0 nsec
        process_subtest:PASS:specs_alloc 0 nsec
        run_subtest:PASS:obj_open_mem 0 nsec
        run_subtest:FAIL:unexpected_load_success unexpected success: 0
        #492/1   verifier_or_jmp32_k/or_jmp32_k: bit ops + branch on unknown value:FAIL
        #492     verifier_or_jmp32_k:FAIL
        Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
      
      After fix:
      
        # ./vmtest.sh -- ./test_progs -t verifier_or_jmp32_k
        [...]
        ./test_progs -t verifier_or_jmp32_k
        #492/1   verifier_or_jmp32_k/or_jmp32_k: bit ops + branch on unknown value:OK
        #492     verifier_or_jmp32_k:OK
        Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED

      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/r/20240613115310.25383-3-daniel@iogearbox.net
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Reduce stack consumption in check_stack_write_fixed_off · e73cd1cf
      Daniel Borkmann authored
      The fake_reg was moved into env->fake_reg given that it consumes a lot of
      stack space (120 bytes). Migrate the fake_reg in
      check_stack_write_fixed_off() as well now that we have it.

      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/r/20240613115310.25383-2-daniel@iogearbox.net
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Fix reg_set_min_max corruption of fake_reg · 92424801
      Daniel Borkmann authored
      Juan reported that after doing some changes to buzzer [0] and implementing
      a new fuzzing strategy guided by coverage, they noticed the following in
      one of the probes:
      
        [...]
        13: (79) r6 = *(u64 *)(r0 +0)         ; R0=map_value(ks=4,vs=8) R6_w=scalar()
        14: (b7) r0 = 0                       ; R0_w=0
        15: (b4) w0 = -1                      ; R0_w=0xffffffff
        16: (74) w0 >>= 1                     ; R0_w=0x7fffffff
        17: (5c) w6 &= w0                     ; R0_w=0x7fffffff R6_w=scalar(smin=smin32=0,smax=umax=umax32=0x7fffffff,var_off=(0x0; 0x7fffffff))
        18: (44) w6 |= 2                      ; R6_w=scalar(smin=umin=smin32=umin32=2,smax=umax=umax32=0x7fffffff,var_off=(0x2; 0x7ffffffd))
        19: (56) if w6 != 0x7ffffffd goto pc+1
        REG INVARIANTS VIOLATION (true_reg2): range bounds violation u64=[0x7fffffff, 0x7ffffffd] s64=[0x7fffffff, 0x7ffffffd] u32=[0x7fffffff, 0x7ffffffd] s32=[0x7fffffff, 0x7ffffffd] var_off=(0x7fffffff, 0x0)
        REG INVARIANTS VIOLATION (false_reg1): range bounds violation u64=[0x7fffffff, 0x7ffffffd] s64=[0x7fffffff, 0x7ffffffd] u32=[0x7fffffff, 0x7ffffffd] s32=[0x7fffffff, 0x7ffffffd] var_off=(0x7fffffff, 0x0)
        REG INVARIANTS VIOLATION (false_reg2): const tnum out of sync with range bounds u64=[0x0, 0xffffffffffffffff] s64=[0x8000000000000000, 0x7fffffffffffffff] u32=[0x0, 0xffffffff] s32=[0x80000000, 0x7fffffff] var_off=(0x7fffffff, 0x0)
        19: R6_w=0x7fffffff
        20: (95) exit
      
        from 19 to 21: R0=0x7fffffff R6=scalar(smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=0x7ffffffe,var_off=(0x2; 0x7ffffffd)) R7=map_ptr(ks=4,vs=8) R9=ctx() R10=fp0 fp-24=map_ptr(ks=4,vs=8) fp-40=mmmmmmmm
        21: R0=0x7fffffff R6=scalar(smin=umin=smin32=umin32=2,smax=umax=smax32=umax32=0x7ffffffe,var_off=(0x2; 0x7ffffffd)) R7=map_ptr(ks=4,vs=8) R9=ctx() R10=fp0 fp-24=map_ptr(ks=4,vs=8) fp-40=mmmmmmmm
        21: (14) w6 -= 2147483632             ; R6_w=scalar(smin=umin=umin32=2,smax=umax=0xffffffff,smin32=0x80000012,smax32=14,var_off=(0x2; 0xfffffffd))
        22: (76) if w6 s>= 0xe goto pc+1      ; R6_w=scalar(smin=umin=umin32=2,smax=umax=0xffffffff,smin32=0x80000012,smax32=13,var_off=(0x2; 0xfffffffd))
        23: (95) exit
      
        from 22 to 24: R0=0x7fffffff R6_w=14 R7=map_ptr(ks=4,vs=8) R9=ctx() R10=fp0 fp-24=map_ptr(ks=4,vs=8) fp-40=mmmmmmmm
        24: R0=0x7fffffff R6_w=14 R7=map_ptr(ks=4,vs=8) R9=ctx() R10=fp0 fp-24=map_ptr(ks=4,vs=8) fp-40=mmmmmmmm
        24: (14) w6 -= 14                     ; R6_w=0
        [...]
      
      What can be seen here is a register invariant violation on line 19. After
      the binary-or in line 18, the verifier knows that bit 2 is set but knows
      nothing about the rest of the content which was loaded from a map value,
      meaning, range is [2,0x7fffffff] with var_off=(0x2; 0x7ffffffd). When in
      line 19 the verifier analyzes the branch, it splits the register states
      in reg_set_min_max() into the registers of the true branch (true_reg1,
      true_reg2) and the registers of the false branch (false_reg1, false_reg2).
      
      Since the test is w6 != 0x7ffffffd, the src_reg is a known constant.
      Internally, the verifier creates a "fake" register initialized as scalar
      to the value of 0x7ffffffd, and then passes it onto reg_set_min_max(). Now,
      for line 19, it is mathematically impossible to take the false branch of
      this program, yet the verifier analyzes it. It is impossible because the
      second bit of r6 will be set due to the prior or operation and the
      constant in the condition has that bit unset (hex(fd) == binary(1111 1101)).
      
      When the verifier first analyzes the false / fall-through branch, it will
      compute an intersection between the var_off of r6 and of the constant. This
      is because the verifier creates a "fake" register initialized to the value
      of the constant. The intersection result later refines both registers in
      regs_refine_cond_op():
      
        [...]
        t = tnum_intersect(tnum_subreg(reg1->var_off), tnum_subreg(reg2->var_off));
        reg1->var_off = tnum_with_subreg(reg1->var_off, t);
        reg2->var_off = tnum_with_subreg(reg2->var_off, t);
        [...]
      
      Since the verifier is analyzing the false branch of the conditional jump,
      reg1 is equal to false_reg1 and reg2 is equal to false_reg2, i.e. the reg2
      is the "fake" register that was meant to hold a constant value. The resulting
      var_off of the intersection says that both registers now hold a known value
      of var_off=(0x7fffffff, 0x0) or in other words: this operation manages to
      make the verifier think that the "constant" value that was passed in the
      jump operation now holds a different value.
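
The var_off value seen in the log can be reproduced with a few lines of userspace C. The sketch below is intended to mirror the arithmetic of the kernel's tnum_intersect(); treat that correspondence as an assumption rather than a copy of the kernel source. `struct tnum` is redeclared locally with `uint64_t` fields.

```c
#include <assert.h>
#include <stdint.h>

/* Tristate number: bits set in mask are unknown, all other bits equal value. */
struct tnum {
	uint64_t value;
	uint64_t mask;
};

/* Intersection: keep every bit known by either side, unknown only if both are. */
struct tnum tnum_intersect(struct tnum a, struct tnum b)
{
	uint64_t v = a.value | b.value;
	uint64_t mu = a.mask & b.mask;

	/* A well-formed tnum never has a bit set in both value and mask. */
	assert((a.value & a.mask) == 0 && (b.value & b.mask) == 0);

	struct tnum r = { v & ~mu, mu };
	return r;
}
```

Feeding in r6's var_off (0x2; 0x7ffffffd) and the constant (0x7ffffffd; 0x0) gives value 0x7fffffff with mask 0x0, which is the var_off=(0x7fffffff, 0x0) reported in the invariant violation: the two inputs disagree on bit 1, and the intersection silently resolves the conflict by setting it.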
      
      Normally this would not be an issue since it should not influence the true
      branch, however, false_reg2 and true_reg2 are pointers to the same "fake"
      register. Meaning, the false branch can influence the results of the true
      branch. In line 24, the verifier assumes R6_w=0, but the actual runtime
      value in this case is 1. The fix is simply not passing in the same "fake"
      register location as inputs to reg_set_min_max(), but instead making a
      copy. Moving the fake_reg into the env also reduces stack consumption by
      120 bytes. With this, the verifier successfully rejects invalid accesses
      from the test program.
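
Why sharing one "fake" register between both branches matters can be shown with a toy model. Everything here is hypothetical and heavily simplified: `struct reg` holds a single value, and `refine()` only imitates the property that the refinement step writes to both of its operands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical one-field register state; the verifier's real state is richer. */
struct reg {
	uint64_t known_value;
};

/* Toy refinement: like the var_off intersection, it writes to BOTH operands. */
void refine(struct reg *reg1, struct reg *reg2)
{
	assert(reg1 != NULL && reg2 != NULL);
	reg1->known_value |= reg2->known_value;
	reg2->known_value = reg1->known_value;
}

/* Buggy shape: the false branch refines the same object the true branch reads. */
uint64_t true_branch_constant_shared(uint64_t r6_bits, uint64_t k)
{
	struct reg false_reg1 = { r6_bits };
	struct reg fake = { k };

	refine(&false_reg1, &fake); /* false_reg2 and true_reg2 both point here */
	return fake.known_value;    /* what the true branch now sees */
}

/* Fixed shape: the false branch gets its own copy of the constant. */
uint64_t true_branch_constant_copied(uint64_t r6_bits, uint64_t k)
{
	struct reg false_reg1 = { r6_bits };
	struct reg fake = { k };
	struct reg fake_copy = fake;

	refine(&false_reg1, &fake_copy);
	return fake.known_value;    /* untouched by the false-branch analysis */
}
```

With r6_bits = 0x2 and k = 0x7ffffffd, the shared variant hands the true branch 0x7fffffff as its "constant", while the copied variant still returns 0x7ffffffd.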
      
        [0] https://github.com/google/buzzer
      
      Fixes: 67420501 ("bpf: generalize reg_set_min_max() to handle non-const register comparisons")
      Reported-by: Juan José López Jaimez <jjlopezjaimez@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/r/20240613115310.25383-1-daniel@iogearbox.net
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Merge tag 'net-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · d20f6b3d
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from bluetooth and netfilter.
      
        Slim pickings this time, probably a combination of summer, DevConf.cz,
        and the end of first half of the year at corporations.
      
        Current release - regressions:
      
         - Revert "igc: fix a log entry using uninitialized netdev", it traded
           lack of netdev name in a printk() for a crash
      
        Previous releases - regressions:
      
         - Bluetooth: L2CAP: fix rejecting L2CAP_CONN_PARAM_UPDATE_REQ
      
         - geneve: fix incorrectly setting lengths of inner headers in the
           skb, confusing the drivers and causing mangled packets
      
         - sched: initialize noop_qdisc owner to avoid false-positive
           recursion detection (recursing on CPU 0), which bubbles up to user
           space as a sendmsg() error, while noop_qdisc should silently drop
      
         - netdevsim: fix backwards compatibility in nsim_get_iflink()
      
        Previous releases - always broken:
      
         - netfilter: ipset: fix race between namespace cleanup and gc in the
           list:set type"
      
      * tag 'net-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (35 commits)
        bnxt_en: Adjust logging of firmware messages in case of released token in __hwrm_send()
        af_unix: Read with MSG_PEEK loops if the first unread byte is OOB
        bnxt_en: Cap the size of HWRM_PORT_PHY_QCFG forwarded response
        gve: Clear napi->skb before dev_kfree_skb_any()
        ionic: fix use after netif_napi_del()
        Revert "igc: fix a log entry using uninitialized netdev"
        net: bridge: mst: fix suspicious rcu usage in br_mst_set_state
        net: bridge: mst: pass vlan group directly to br_mst_vlan_set_state
        net/ipv6: Fix the RT cache flush via sysctl using a previous delay
        net: stmmac: replace priv->speed with the portTransmitRate from the tc-cbs parameters
        gve: ignore nonrelevant GSO type bits when processing TSO headers
        net: pse-pd: Use EOPNOTSUPP error code instead of ENOTSUPP
        netfilter: Use flowlabel flow key when re-routing mangled packets
        netfilter: ipset: Fix race between namespace cleanup and gc in the list:set type
        netfilter: nft_inner: validate mandatory meta and payload
        tcp: use signed arithmetic in tcp_rtx_probe0_timed_out()
        mailmap: map Geliang's new email address
        mptcp: pm: update add_addr counters after connect
        mptcp: pm: inc RmAddr MIB counter once per RM_ADDR ID
        mptcp: ensure snd_una is properly initialized on connect
        ...
      d20f6b3d
    • Linus Torvalds's avatar
      Merge tag 'nfs-for-6.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · fd88e181
      Linus Torvalds authored
      Pull NFS client fixes from Trond Myklebust:
       "Bugfixes:
         - NFSv4.2: Fix a memory leak in nfs4_set_security_label
         - NFSv2/v3: abort nfs_atomic_open_v23 if the name is too long.
         - NFS: Add appropriate memory barriers to the sillyrename code
         - Propagate readlink errors in nfs_symlink_filler
         - NFS: don't invalidate dentries on transient errors
         - NFS: fix unnecessary synchronous writes in random write workloads
         - NFSv4.1: enforce rootpath check when deciding whether or not to trunk
      
        Other:
         - Change email address for Trond Myklebust due to email server concerns"
      
      * tag 'nfs-for-6.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        NFS: add barriers when testing for NFS_FSDATA_BLOCKED
        SUNRPC: return proper error from gss_wrap_req_priv
        NFSv4.1 enforce rootpath check in fs_location query
        NFS: abort nfs_atomic_open_v23 if name is too long.
        nfs: don't invalidate dentries on transient errors
        nfs: Avoid flushing many pages with NFS_FILE_SYNC
        nfs: propagate readlink errors in nfs_symlink_filler
        MAINTAINERS: Change email address for Trond Myklebust
        NFSv4: Fix memory leak in nfs4_set_security_label
      fd88e181
    • Linus Torvalds's avatar
      Merge tag 'fixes-2024-06-13' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock · 3572597c
      Linus Torvalds authored
      Pull memblock fixes from Mike Rapoport:
       "Fix validation of NUMA coverage.
      
        memblock_validate_numa_coverage() was checking for an unset node ID
        using NUMA_NO_NODE, but x86 used MAX_NUMNODES when no node ID was
        specified by buggy firmware.
      
        Update memblock to substitute MAX_NUMNODES with NUMA_NO_NODE in
        memblock_set_node() and use NUMA_NO_NODE in x86::numa_init()"
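
The sentinel substitution described above can be modelled in userspace as a rough sketch. NUMA_NO_NODE is -1 in the kernel; the MAX_NUMNODES value of 64 and the helper names normalize_nid() and uncovered_regions() are assumptions made for illustration, not the kernel's code.

```python
NUMA_NO_NODE = -1   # the kernel's "no node" sentinel
MAX_NUMNODES = 64   # illustrative only; the real value depends on kernel config


def normalize_nid(nid):
    """Map the MAX_NUMNODES sentinel onto NUMA_NO_NODE (hypothetical helper)."""
    return NUMA_NO_NODE if nid == MAX_NUMNODES else nid


def uncovered_regions(regions):
    """Return the (start, size, nid) tuples whose node ID is unset.

    Without the normalization step, a region tagged MAX_NUMNODES would
    slip past a check that only looks for NUMA_NO_NODE.
    """
    return [r for r in regions if normalize_nid(r[2]) == NUMA_NO_NODE]
```

With normalization in place, a coverage check only has to compare against a single sentinel.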
      
      * tag 'fixes-2024-06-13' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
        x86/mm/numa: Use NUMA_NO_NODE when calling memblock_set_node()
        memblock: make memblock_set_node() also warn about use of MAX_NUMNODES
      3572597c
    • Wojciech Drewek's avatar
      ice: implement AQ download pkg retry · a27f6ac9
      Wojciech Drewek authored
      ice_aqc_opc_download_pkg (0x0C40) AQ sporadically returns an error due
      to a FW issue. Fix this by retrying five times before moving to
      Safe Mode. Sleep for 20 ms before retrying. This was tested with the
      4.40 firmware.
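
A minimal userspace sketch of the retry policy described above, not the driver's code. It assumes "five times" means five attempts in total, and the callable send_cmd is a hypothetical stand-in for issuing the AQ command.

```python
import time

MAX_ATTEMPTS = 5        # assumption: five attempts in total
RETRY_DELAY_S = 0.020   # 20 ms pause before each retry


def download_pkg_with_retry(send_cmd, sleep=time.sleep):
    """Call send_cmd() until it succeeds or the attempt budget is spent.

    send_cmd is a hypothetical callable returning True on success and
    False on a sporadic firmware error. Returns (ok, attempts_used).
    """
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if send_cmd():
            return True, attempt
        if attempt < MAX_ATTEMPTS:
            sleep(RETRY_DELAY_S)  # back off before the next attempt
    # Budget exhausted: the caller would fall back to Safe Mode here.
    return False, MAX_ATTEMPTS
```

Passing sleep as a parameter keeps the sketch testable without real delays.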
      
      Fixes: c7648810 ("ice: Implement Dynamic Device Personalization (DDP) download")
      Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Reviewed-by: Brett Creeley <brett.creeley@amd.com>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      a27f6ac9
    • Paul Greenwalt's avatar
      ice: fix 200G link speed message log · aeccadb2
      Paul Greenwalt authored
      Commit 24407a01 ("ice: Add 200G speed/phy type use") added support
      for 200G PHY speeds, but did not include 200G link speed message
      support. As a result the driver incorrectly reports Unknown for 200G
      link speed.
      
      Fix this by adding 200G support to ice_print_link_msg().
      
      Fixes: 24407a01 ("ice: Add 200G speed/phy type use")
      Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
      Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      aeccadb2
    • En-Wei Wu's avatar
      ice: avoid IRQ collision to fix init failure on ACPI S3 resume · bc69ad74
      En-Wei Wu authored
      The bug report at https://bugzilla.kernel.org/show_bug.cgi?id=218906
      describes irdma breaking and reporting that hardware initialization
      failed after suspend/resume with an Intel E810 NIC (tested on 6.9.0-rc5).

      The problem is caused by a collision between the irq numbers
      requested by irdma and the irq numbers requested by other drivers
      after suspend/resume.

      The irq numbers used by irdma are derived from ice's ice_pf->msix_entries,
      which stores mappings between MSI-X index and Linux interrupt number.
      It is supposed to be cleaned up on suspend and rebuilt on resume, but
      it is not. As a result, on resume irdma calls request_irq() with the
      old irq numbers stored in the stale ice_pf->msix_entries, and these
      eventually collide with other drivers.
      
      This patch fixes this problem. On suspend, we call ice_deinit_rdma() to
      clean up the ice_pf->msix_entries (and free the MSI-X vectors used by
      irdma if we've dynamically allocated them). On resume, we call
      ice_init_rdma() to rebuild the ice_pf->msix_entries (and allocate the
      MSI-X vectors if we would like to dynamically allocate them).
      
      Fixes: f9f5301e ("ice: Register auxiliary device to provide RDMA")
      Tested-by: Cyrus Lien <cyrus.lien@canonical.com>
      Signed-off-by: En-Wei Wu <en-wei.wu@canonical.com>
      Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      bc69ad74
    • Aleksandr Mishin's avatar
      bnxt_en: Adjust logging of firmware messages in case of released token in __hwrm_send() · a9b97418
      Aleksandr Mishin authored
      If the token is released because token->state == BNXT_HWRM_DEFERRED,
      the released token (set to NULL) is used in log messages. This issue is
      expected to be prevented by the HWRM_ERR_CODE_PF_UNAVAILABLE error code,
      but that error code is only returned by recent firmware, so some
      firmware may not return it. This may lead to a NULL pointer dereference.
      Fix this by adding a token pointer check.
      
      Found by Linux Verification Center (linuxtesting.org) with SVACE.
      
      Fixes: 8fa4219d ("bnxt_en: add dynamic debug support for HWRM messages")
      Suggested-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
      Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Reviewed-by: Michael Chan <michael.chan@broadcom.com>
      Link: https://lore.kernel.org/r/20240611082547.12178-1-amishin@t-argos.ru
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      a9b97418
    • Rao Shoaib's avatar
      af_unix: Read with MSG_PEEK loops if the first unread byte is OOB · a6736a0a
      Rao Shoaib authored
      A read with the MSG_PEEK flag loops if the first byte to read is an
      OOB byte. Commit 22dd70eb ("af_unix: Don't peek OOB data without MSG_OOB.")
      addresses the loop issue but does not address the issue that no data
      beyond the OOB byte can be read.
      
      >>> from socket import *
      >>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
      >>> c1.send(b'a', MSG_OOB)
      1
      >>> c1.send(b'b')
      1
      >>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
      b'b'
      
      >>> from socket import *
      >>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
      >>> c2.setsockopt(SOL_SOCKET, SO_OOBINLINE, 1)
      >>> c1.send(b'a', MSG_OOB)
      1
      >>> c1.send(b'b')
      1
      >>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
      b'a'
      >>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
      b'a'
      >>> c2.recv(1, MSG_DONTWAIT)
      b'a'
      >>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
      b'b'
      >>>
      
      Fixes: 314001f0 ("af_unix: Add OOB support")
      Signed-off-by: Rao Shoaib <Rao.Shoaib@oracle.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240611084639.2248934-1-Rao.Shoaib@oracle.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      a6736a0a
    • Michael Chan's avatar
      bnxt_en: Cap the size of HWRM_PORT_PHY_QCFG forwarded response · 7d9df38c
      Michael Chan authored
      Firmware interface 1.10.2.118 has increased the size of
      HWRM_PORT_PHY_QCFG response beyond the maximum size that can be
      forwarded.  When the VF's link state is not the default auto state,
      the PF will need to forward the response back to the VF to indicate
      the forced state.  This regression may cause the VF to fail to
      initialize.
      
      Fix it by capping the HWRM_PORT_PHY_QCFG response to the maximum
      96 bytes.  The SPEEDS2_SUPPORTED flag needs to be cleared because the
      new speeds2 fields are beyond the legacy structure.  Also modify
      bnxt_hwrm_fwd_resp() to print a warning if the message size exceeds 96
      bytes to make this failure more obvious.
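
The capping rule above can be sketched roughly as follows. The 96-byte limit comes from the text; the function name cap_forwarded_response(), the logger name, and folding the truncation and the warning into one helper are assumptions for illustration. The SPEEDS2_SUPPORTED flag handling is left out because its layout is not given here.

```python
import logging

MAX_FWD_RESP_LEN = 96  # byte cap stated in the commit text

log = logging.getLogger("fwd_resp_sketch")  # hypothetical logger name


def cap_forwarded_response(resp: bytes) -> bytes:
    """Truncate a response to the forwardable maximum.

    Logs a warning when the input was larger than the cap, so an
    oversized message is visible instead of failing silently.
    """
    if len(resp) > MAX_FWD_RESP_LEN:
        log.warning("response of %d bytes exceeds %d, truncating",
                    len(resp), MAX_FWD_RESP_LEN)
    return resp[:MAX_FWD_RESP_LEN]
```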
      
      Fixes: 84a911db ("bnxt_en: Update firmware interface to 1.10.2.118")
      Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Link: https://lore.kernel.org/r/20240612231736.57823-1-michael.chan@broadcom.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      7d9df38c
    • Ziwei Xiao's avatar
      gve: Clear napi->skb before dev_kfree_skb_any() · 6f4d93b7
      Ziwei Xiao authored
      gve_rx_free_skb incorrectly leaves napi->skb referencing an skb after it
      is freed with dev_kfree_skb_any(). This can result in a subsequent call
      to napi_get_frags returning a dangling pointer.
      
      Fix this by clearing napi->skb before the skb is freed.
      
      Fixes: 9b8dd5e5 ("gve: DQO: Add RX path")
      Cc: stable@vger.kernel.org
      Reported-by: Shailend Chand <shailend@google.com>
      Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
      Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
      Reviewed-by: Shailend Chand <shailend@google.com>
      Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
      Link: https://lore.kernel.org/r/20240612001654.923887-1-ziweixiao@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      6f4d93b7
    • Taehee Yoo's avatar
      ionic: fix use after netif_napi_del() · 79f18a41
      Taehee Yoo authored
      When queues are started, netif_napi_add() and napi_enable() are called.
      If there are 4 queues and only 3 queues are used for the current
      configuration, only 3 queues' napi should be registered and enabled.
      ionic_qcq_enable() checks whether the .poll pointer is not NULL so that
      it enables only the napi of queues in use. The napi of unused queues is
      not registered by netif_napi_add(), so its .poll pointer is NULL.
      But this check cannot tell whether a napi was unregistered, because
      netif_napi_del() doesn't reset the .poll pointer to NULL.
      So ionic_qcq_enable() calls napi_enable() for a queue that was
      unregistered by netif_napi_del().
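
A toy userspace model of the state confusion described above. The Napi class, the registered flag, and enable_if_registered() are hypothetical and only illustrate why a non-NULL .poll is not proof of registration; they are not the driver's actual fix.

```python
class Napi:
    """Toy model of a NAPI context with only the fields the text mentions."""

    def __init__(self):
        self.poll = None         # NULL until netif_napi_add() runs
        self.registered = False  # hypothetical flag, not a kernel field
        self.enabled = False


def netif_napi_add(napi, poll_fn):
    napi.poll = poll_fn
    napi.registered = True


def netif_napi_del(napi):
    # As described above, .poll is NOT reset to NULL here.
    napi.registered = False
    napi.enabled = False


def enable_if_poll_set(napi):
    """The check described above: a non-NULL .poll is taken as 'in use'."""
    if napi.poll is not None:
        napi.enabled = True
    return napi.enabled


def enable_if_registered(napi):
    """Hypothetical alternative keyed on explicit registration state."""
    if napi.registered:
        napi.enabled = True
    return napi.enabled
```

In this model, a napi that was added and then deleted still passes enable_if_poll_set(), which corresponds to the napi_enable() call on an unregistered queue.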
      
      Reproducer:
         ethtool -L <interface name> rx 1 tx 1 combined 0
         ethtool -L <interface name> rx 0 tx 0 combined 1
         ethtool -L <interface name> rx 0 tx 0 combined 4
      
      Splat looks like:
      kernel BUG at net/core/dev.c:6666!
      Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
      CPU: 3 PID: 1057 Comm: kworker/3:3 Not tainted 6.10.0-rc2+ #16
      Workqueue: events ionic_lif_deferred_work [ionic]
      RIP: 0010:napi_enable+0x3b/0x40
      Code: 48 89 c2 48 83 e2 f6 80 b9 61 09 00 00 00 74 0d 48 83 bf 60 01 00 00 00 74 03 80 ce 01 f0 4f
      RSP: 0018:ffffb6ed83227d48 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff97560cda0828 RCX: 0000000000000029
      RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff97560cda0a28
      RBP: ffffb6ed83227d50 R08: 0000000000000400 R09: 0000000000000001
      R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
      R13: ffff97560ce3c1a0 R14: 0000000000000000 R15: ffff975613ba0a20
      FS:  0000000000000000(0000) GS:ffff975d5f780000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f8f734ee200 CR3: 0000000103e50000 CR4: 00000000007506f0
      PKRU: 55555554
      Call Trace:
       <TASK>
       ? die+0x33/0x90
       ? do_trap+0xd9/0x100
       ? napi_enable+0x3b/0x40
       ? do_error_trap+0x83/0xb0
       ? napi_enable+0x3b/0x40
       ? napi_enable+0x3b/0x40
       ? exc_invalid_op+0x4e/0x70
       ? napi_enable+0x3b/0x40
       ? asm_exc_invalid_op+0x16/0x20
       ? napi_enable+0x3b/0x40
       ionic_qcq_enable+0xb7/0x180 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8]
       ionic_start_queues+0xc4/0x290 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8]
       ionic_link_status_check+0x11c/0x170 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8]
       ionic_lif_deferred_work+0x129/0x280 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8]
       process_one_work+0x145/0x360
       worker_thread+0x2bb/0x3d0
       ? __pfx_worker_thread+0x10/0x10
       kthread+0xcc/0x100
       ? __pfx_kthread+0x10/0x10
       ret_from_fork+0x2d/0x50
       ? __pfx_kthread+0x10/0x10
       ret_from_fork_asm+0x1a/0x30
      
      Fixes: 0f3154e6 ("ionic: Add Tx and Rx handling")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Reviewed-by: Brett Creeley <brett.creeley@amd.com>
      Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
      Link: https://lore.kernel.org/r/20240612060446.1754392-1-ap420073@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      79f18a41
    • Sasha Neftin's avatar
      Revert "igc: fix a log entry using uninitialized netdev" · 8eef5c3c
      Sasha Neftin authored
      This reverts commit 86167183.
      
      igc_ptp_init() needs to be called before igc_reset(), otherwise a kernel
      crash could be observed. Following the corresponding discussions [1] and
      [2], revert this commit.
      
      Link: https://lore.kernel.org/all/8fb634f8-7330-4cf4-a8ce-485af9c0a61a@intel.com/ [1]
      Link: https://lore.kernel.org/all/87o78rmkhu.fsf@intel.com/ [2]
      Fixes: 86167183 ("igc: fix a log entry using uninitialized netdev")
      Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
      Tested-by: Naama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://lore.kernel.org/r/20240611162456.961631-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      8eef5c3c
    • Jakub Kicinski's avatar
      Merge branch 'net-bridge-mst-fix-suspicious-rcu-usage-warning' · b60b1bdc
      Jakub Kicinski authored
      Nikolay Aleksandrov says:
      
      ====================
      net: bridge: mst: fix suspicious rcu usage warning
      
      This set fixes a suspicious RCU usage warning triggered by syzbot[1] in
      the bridge's MST code. After I converted br_mst_set_state to RCU, I
      forgot to update the vlan group dereference helper. Fix it by using
      the proper helper; in order to do that we need to pass the vlan group,
      which is already obtained correctly by the callers for their respective
      context. Patch 01 is a requirement for the fix in patch 02.
      
      Note I did consider rcu_dereference_rtnl() but the churn is much bigger
      and in every part of the bridge. We can do that as a cleanup in
      net-next.
      
      [1] https://syzkaller.appspot.com/bug?extid=9bbe2de1bc9d470eb5fe
       =============================
       WARNING: suspicious RCU usage
       6.10.0-rc2-syzkaller-00235-g8a929806 #0 Not tainted
       -----------------------------
       net/bridge/br_private.h:1599 suspicious rcu_dereference_protected() usage!
      
       other info that might help us debug this:
      
       rcu_scheduler_active = 2, debug_locks = 1
       4 locks held by syz-executor.1/5374:
        #0: ffff888022d50b18 (&mm->mmap_lock){++++}-{3:3}, at: mmap_read_lock include/linux/mmap_lock.h:144 [inline]
        #0: ffff888022d50b18 (&mm->mmap_lock){++++}-{3:3}, at: __mm_populate+0x1b0/0x460 mm/gup.c:2111
        #1: ffffc90000a18c00 ((&p->forward_delay_timer)){+.-.}-{0:0}, at: call_timer_fn+0xc0/0x650 kernel/time/timer.c:1789
        #2: ffff88805fb2ccb8 (&br->lock){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
        #2: ffff88805fb2ccb8 (&br->lock){+.-.}-{2:2}, at: br_forward_delay_timer_expired+0x50/0x440 net/bridge/br_stp_timer.c:86
        #3: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline]
        #3: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:781 [inline]
        #3: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: br_mst_set_state+0x171/0x7a0 net/bridge/br_mst.c:105
      
       stack backtrace:
       CPU: 1 PID: 5374 Comm: syz-executor.1 Not tainted 6.10.0-rc2-syzkaller-00235-g8a929806 #0
       Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
       Call Trace:
        <IRQ>
        __dump_stack lib/dump_stack.c:88 [inline]
        dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
        lockdep_rcu_suspicious+0x221/0x340 kernel/locking/lockdep.c:6712
        nbp_vlan_group net/bridge/br_private.h:1599 [inline]
        br_mst_set_state+0x29e/0x7a0 net/bridge/br_mst.c:106
        br_set_state+0x28a/0x7b0 net/bridge/br_stp.c:47
        br_forward_delay_timer_expired+0x176/0x440 net/bridge/br_stp_timer.c:88
        call_timer_fn+0x18e/0x650 kernel/time/timer.c:1792
        expire_timers kernel/time/timer.c:1843 [inline]
        __run_timers kernel/time/timer.c:2417 [inline]
        __run_timer_base+0x66a/0x8e0 kernel/time/timer.c:2428
        run_timer_base kernel/time/timer.c:2437 [inline]
        run_timer_softirq+0xb7/0x170 kernel/time/timer.c:2447
        handle_softirqs+0x2c4/0x970 kernel/softirq.c:554
        __do_softirq kernel/softirq.c:588 [inline]
        invoke_softirq kernel/softirq.c:428 [inline]
        __irq_exit_rcu+0xf4/0x1c0 kernel/softirq.c:637
        irq_exit_rcu+0x9/0x30 kernel/softirq.c:649
        instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
        sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043
        </IRQ>
        <TASK>
      ====================
      
      Link: https://lore.kernel.org/r/20240609103654.914987-1-razor@blackwall.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b60b1bdc
    • Nikolay Aleksandrov's avatar
      net: bridge: mst: fix suspicious rcu usage in br_mst_set_state · 546ceb1d
      Nikolay Aleksandrov authored
      I converted br_mst_set_state to RCU to avoid a vlan use-after-free
      but forgot to change the vlan group dereference helper. Switch to vlan
      group RCU deref helper to fix the suspicious rcu usage warning.
      
      Fixes: 3a7c1661 ("net: bridge: mst: fix vlan use-after-free")
      Reported-by: syzbot+9bbe2de1bc9d470eb5fe@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=9bbe2de1bc9d470eb5fe
      Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
      Link: https://lore.kernel.org/r/20240609103654.914987-3-razor@blackwall.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      546ceb1d
    • Nikolay Aleksandrov's avatar
      net: bridge: mst: pass vlan group directly to br_mst_vlan_set_state · 36c92936
      Nikolay Aleksandrov authored
      Pass the already obtained vlan group pointer to br_mst_vlan_set_state()
      instead of dereferencing it again. Each caller has already correctly
      dereferenced it for their context. This change is required for the
      following suspicious RCU dereference fix. No functional changes
      intended.
      
      Fixes: 3a7c1661 ("net: bridge: mst: fix vlan use-after-free")
      Reported-by: syzbot+9bbe2de1bc9d470eb5fe@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=9bbe2de1bc9d470eb5fe
      Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
      Link: https://lore.kernel.org/r/20240609103654.914987-2-razor@blackwall.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      36c92936
    • Stanislav Fomichev's avatar
      26ba7c3f
    • Petr Pavlu's avatar
      net/ipv6: Fix the RT cache flush via sysctl using a previous delay · 14a20e5b
      Petr Pavlu authored
      The net.ipv6.route.flush system parameter takes a value which specifies
      a delay used during the flush operation for aging exception routes. The
      written value is however not used in the currently requested flush and
      instead utilized only in the next one.
      
      A problem is that ipv6_sysctl_rtcache_flush() first reads the old value
      of net->ipv6.sysctl.flush_delay into a local delay variable and then
      calls proc_dointvec() which actually updates the sysctl based on the
      provided input.
      
      Fix the problem by switching the order of the two operations.
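
The ordering problem above can be reproduced in a small userspace model. Ipv6Sysctl, flush_buggy(), and flush_fixed() are hypothetical names; the assignment to flush_delay stands in for the update that proc_dointvec() performs.

```python
class Ipv6Sysctl:
    """Toy stand-in for net->ipv6.sysctl, holding only flush_delay."""

    def __init__(self, flush_delay=0):
        self.flush_delay = flush_delay


def flush_buggy(sysctl, written_value):
    """Old order: read the delay first, then store the written value."""
    delay = sysctl.flush_delay          # still the previous write
    sysctl.flush_delay = written_value  # models the proc_dointvec() update
    return delay                        # delay used by this flush


def flush_fixed(sysctl, written_value):
    """Swapped order: store the written value, then read it back."""
    sysctl.flush_delay = written_value  # models the proc_dointvec() update
    delay = sysctl.flush_delay          # now the value just written
    return delay
```

With the old order, writing 5 and then 7 flushes with delays 0 and 5; with the swapped order each flush uses the value just written.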
      
      Fixes: 4990509f ("[NETNS][IPV6]: Make sysctls route per namespace.")
      Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20240607112828.30285-1-petr.pavlu@suse.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      14a20e5b