1. 10 Oct, 2024 21 commits
  2. 09 Oct, 2024 9 commits
  3. 08 Oct, 2024 10 commits
    • net/sched: accept TCA_STAB only for root qdisc · 3cb7cf15
      Eric Dumazet authored
      Most qdiscs maintain their backlog using qdisc_pkt_len(skb)
      on the assumption it is invariant between the enqueue()
      and dequeue() handlers.
      
      Unfortunately syzbot can crash a host rather easily using
      a TBF + SFQ combination, with an STAB on SFQ [1]
      
      We can't support TCA_STAB at an arbitrary level; this would
      require maintaining per-qdisc storage.
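
The invariant described above can be illustrated with a toy model. This is only a sketch in Python, not kernel code: `Packet`, `ToyQdisc`, `stab_overhead`, and `run` are hypothetical names, and the size table is reduced to a fixed per-packet overhead. It assumes every level shares a single per-packet length slot, so a size table on a child rewrites the length after the parent has already accounted for it.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    wire_len: int
    pkt_len: int = 0  # single per-packet length slot shared by every level

    def __post_init__(self):
        self.pkt_len = self.wire_len


class ToyQdisc:
    """Toy qdisc: tracks a byte backlog and may apply a size table."""

    def __init__(self, child=None, stab_overhead=None):
        self.child = child
        self.stab_overhead = stab_overhead  # None means no size table
        self.backlog = 0
        self.queue = []

    def enqueue(self, pkt):
        if self.stab_overhead is not None:
            # The size table rewrites the shared length slot in place.
            pkt.pkt_len = pkt.wire_len + self.stab_overhead
        self.backlog += pkt.pkt_len
        if self.child is not None:
            self.child.enqueue(pkt)
        else:
            self.queue.append(pkt)

    def dequeue(self):
        if self.child is not None:
            pkt = self.child.dequeue()
        else:
            pkt = self.queue.pop(0)
        # Assumes pkt_len is the same value that enqueue() saw.
        self.backlog -= pkt.pkt_len
        return pkt


def run(root):
    """Enqueue and dequeue one packet, then return the root backlog."""
    root.enqueue(Packet(wire_len=100))
    root.dequeue()
    return root.backlog


stab_on_child = run(ToyQdisc(child=ToyQdisc(stab_overhead=20)))
stab_on_root = run(ToyQdisc(child=ToyQdisc(), stab_overhead=20))
print(stab_on_child, stab_on_root)
```

With the size table on the child, the root's backlog does not return to zero after the queue drains; with the size table on the root only, it does.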
      
      [1]
      [   88.796496] BUG: kernel NULL pointer dereference, address: 0000000000000000
      [   88.798611] #PF: supervisor read access in kernel mode
      [   88.799014] #PF: error_code(0x0000) - not-present page
      [   88.799506] PGD 0 P4D 0
      [   88.799829] Oops: Oops: 0000 [#1] SMP NOPTI
      [   88.800569] CPU: 14 UID: 0 PID: 2053 Comm: b371744477 Not tainted 6.12.0-rc1-virtme #1117
      [   88.801107] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
      [   88.801779] RIP: 0010:sfq_dequeue (net/sched/sch_sfq.c:272 net/sched/sch_sfq.c:499) sch_sfq
      [ 88.802544] Code: 0f b7 50 12 48 8d 04 d5 00 00 00 00 48 89 d6 48 29 d0 48 8b 91 c0 01 00 00 48 c1 e0 03 48 01 c2 66 83 7a 1a 00 7e c0 48 8b 3a <4c> 8b 07 4c 89 02 49 89 50 08 48 c7 47 08 00 00 00 00 48 c7 07 00
      All code
      ========
         0:	0f b7 50 12          	movzwl 0x12(%rax),%edx
         4:	48 8d 04 d5 00 00 00 	lea    0x0(,%rdx,8),%rax
         b:	00
         c:	48 89 d6             	mov    %rdx,%rsi
         f:	48 29 d0             	sub    %rdx,%rax
        12:	48 8b 91 c0 01 00 00 	mov    0x1c0(%rcx),%rdx
        19:	48 c1 e0 03          	shl    $0x3,%rax
        1d:	48 01 c2             	add    %rax,%rdx
        20:	66 83 7a 1a 00       	cmpw   $0x0,0x1a(%rdx)
        25:	7e c0                	jle    0xffffffffffffffe7
        27:	48 8b 3a             	mov    (%rdx),%rdi
        2a:*	4c 8b 07             	mov    (%rdi),%r8		<-- trapping instruction
        2d:	4c 89 02             	mov    %r8,(%rdx)
        30:	49 89 50 08          	mov    %rdx,0x8(%r8)
        34:	48 c7 47 08 00 00 00 	movq   $0x0,0x8(%rdi)
        3b:	00
        3c:	48                   	rex.W
        3d:	c7                   	.byte 0xc7
        3e:	07                   	(bad)
      	...
      
      Code starting with the faulting instruction
      ===========================================
         0:	4c 8b 07             	mov    (%rdi),%r8
         3:	4c 89 02             	mov    %r8,(%rdx)
         6:	49 89 50 08          	mov    %rdx,0x8(%r8)
         a:	48 c7 47 08 00 00 00 	movq   $0x0,0x8(%rdi)
        11:	00
        12:	48                   	rex.W
        13:	c7                   	.byte 0xc7
        14:	07                   	(bad)
      	...
      [   88.803721] RSP: 0018:ffff9a1f892b7d58 EFLAGS: 00000206
      [   88.804032] RAX: 0000000000000000 RBX: ffff9a1f8420c800 RCX: ffff9a1f8420c800
      [   88.804560] RDX: ffff9a1f81bc1440 RSI: 0000000000000000 RDI: 0000000000000000
      [   88.805056] RBP: ffffffffc04bb0e0 R08: 0000000000000001 R09: 00000000ff7f9a1f
      [   88.805473] R10: 000000000001001b R11: 0000000000009a1f R12: 0000000000000140
      [   88.806194] R13: 0000000000000001 R14: ffff9a1f886df400 R15: ffff9a1f886df4ac
      [   88.806734] FS:  00007f445601a740(0000) GS:ffff9a2e7fd80000(0000) knlGS:0000000000000000
      [   88.807225] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   88.807672] CR2: 0000000000000000 CR3: 000000050cc46000 CR4: 00000000000006f0
      [   88.808165] Call Trace:
      [   88.808459]  <TASK>
      [   88.808710] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434)
      [   88.809261] ? page_fault_oops (arch/x86/mm/fault.c:715)
      [   88.809561] ? exc_page_fault (./arch/x86/include/asm/irqflags.h:26 ./arch/x86/include/asm/irqflags.h:87 ./arch/x86/include/asm/irqflags.h:147 arch/x86/mm/fault.c:1489 arch/x86/mm/fault.c:1539)
      [   88.809806] ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623)
      [   88.810074] ? sfq_dequeue (net/sched/sch_sfq.c:272 net/sched/sch_sfq.c:499) sch_sfq
      [   88.810411] sfq_reset (net/sched/sch_sfq.c:525) sch_sfq
      [   88.810671] qdisc_reset (./include/linux/skbuff.h:2135 ./include/linux/skbuff.h:2441 ./include/linux/skbuff.h:3304 ./include/linux/skbuff.h:3310 net/sched/sch_generic.c:1036)
      [   88.810950] tbf_reset (./include/linux/timekeeping.h:169 net/sched/sch_tbf.c:334) sch_tbf
      [   88.811208] qdisc_reset (./include/linux/skbuff.h:2135 ./include/linux/skbuff.h:2441 ./include/linux/skbuff.h:3304 ./include/linux/skbuff.h:3310 net/sched/sch_generic.c:1036)
      [   88.811484] netif_set_real_num_tx_queues (./include/linux/spinlock.h:396 ./include/net/sch_generic.h:768 net/core/dev.c:2958)
      [   88.811870] __tun_detach (drivers/net/tun.c:590 drivers/net/tun.c:673)
      [   88.812271] tun_chr_close (drivers/net/tun.c:702 drivers/net/tun.c:3517)
      [   88.812505] __fput (fs/file_table.c:432 (discriminator 1))
      [   88.812735] task_work_run (kernel/task_work.c:230)
      [   88.813016] do_exit (kernel/exit.c:940)
      [   88.813372] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:58 (discriminator 4))
      [   88.813639] ? handle_mm_fault (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:97 ./arch/x86/include/asm/irqflags.h:155 ./include/linux/memcontrol.h:1022 ./include/linux/memcontrol.h:1045 ./include/linux/memcontrol.h:1052 mm/memory.c:5928 mm/memory.c:6088)
      [   88.813867] do_group_exit (kernel/exit.c:1070)
      [   88.814138] __x64_sys_exit_group (kernel/exit.c:1099)
      [   88.814490] x64_sys_call (??:?)
      [   88.814791] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
      [   88.815012] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
      [   88.815495] RIP: 0033:0x7f44560f1975
      
      Fixes: 175f9c1b ("net_sched: Add size table for qdiscs")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://patch.msgid.link/20241007184130.3960565-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • e1000e: change I219 (19) devices to ADP · 9d9e5347
      Vitaly Lifshits authored
      Sporadic issues, such as PHY access loss, have been observed on I219 (19)
      devices. It was found that these devices have hardware more closely
      related to ADP than MTP and the issues were caused by taking MTP-specific
      flows.
      
      Change the MAC and board types of these devices from MTP to ADP to
      correctly reflect the LAN hardware, and flows, of these devices.
      
      Fixes: db2d737d ("e1000e: Separate MTP board type from ADP")
      Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
      Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • igb: Do not bring the device up after non-fatal error · 330a699e
      Mohamed Khalfella authored
      Commit 004d2506 ("igb: Fix igb_down hung on surprise removal")
      changed igb_io_error_detected() to ignore non-fatal PCIe errors in order
      to avoid a hung task that can happen when igb_down() is called multiple
      times. This caused an issue when processing transient non-fatal errors.
      igb_io_resume(), which is called after igb_io_error_detected(), assumes
      that the device was brought down by igb_io_error_detected() if the
      interface is up. This resulted in a panic with the stack trace below.
      
      [ T3256] igb 0000:09:00.0 haeth0: igb: haeth0 NIC Link is Down
      [  T292] pcieport 0000:00:1c.5: AER: Uncorrected (Non-Fatal) error received: 0000:09:00.0
      [  T292] igb 0000:09:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
      [  T292] igb 0000:09:00.0:   device [8086:1537] error status/mask=00004000/00000000
      [  T292] igb 0000:09:00.0:    [14] CmpltTO
      [  200.105524,009][  T292] igb 0000:09:00.0: AER:   TLP Header: 00000000 00000000 00000000 00000000
      [  T292] pcieport 0000:00:1c.5: AER: broadcast error_detected message
      [  T292] igb 0000:09:00.0: Non-correctable non-fatal error reported.
      [  T292] pcieport 0000:00:1c.5: AER: broadcast mmio_enabled message
      [  T292] pcieport 0000:00:1c.5: AER: broadcast resume message
      [  T292] ------------[ cut here ]------------
      [  T292] kernel BUG at net/core/dev.c:6539!
      [  T292] invalid opcode: 0000 [#1] PREEMPT SMP
      [  T292] RIP: 0010:napi_enable+0x37/0x40
      [  T292] Call Trace:
      [  T292]  <TASK>
      [  T292]  ? die+0x33/0x90
      [  T292]  ? do_trap+0xdc/0x110
      [  T292]  ? napi_enable+0x37/0x40
      [  T292]  ? do_error_trap+0x70/0xb0
      [  T292]  ? napi_enable+0x37/0x40
      [  T292]  ? napi_enable+0x37/0x40
      [  T292]  ? exc_invalid_op+0x4e/0x70
      [  T292]  ? napi_enable+0x37/0x40
      [  T292]  ? asm_exc_invalid_op+0x16/0x20
      [  T292]  ? napi_enable+0x37/0x40
      [  T292]  igb_up+0x41/0x150
      [  T292]  igb_io_resume+0x25/0x70
      [  T292]  report_resume+0x54/0x70
      [  T292]  ? report_frozen_detected+0x20/0x20
      [  T292]  pci_walk_bus+0x6c/0x90
      [  T292]  ? aer_print_port_info+0xa0/0xa0
      [  T292]  pcie_do_recovery+0x22f/0x380
      [  T292]  aer_process_err_devices+0x110/0x160
      [  T292]  aer_isr+0x1c1/0x1e0
      [  T292]  ? disable_irq_nosync+0x10/0x10
      [  T292]  irq_thread_fn+0x1a/0x60
      [  T292]  irq_thread+0xe3/0x1a0
      [  T292]  ? irq_set_affinity_notifier+0x120/0x120
      [  T292]  ? irq_affinity_notify+0x100/0x100
      [  T292]  kthread+0xe2/0x110
      [  T292]  ? kthread_complete_and_exit+0x20/0x20
      [  T292]  ret_from_fork+0x2d/0x50
      [  T292]  ? kthread_complete_and_exit+0x20/0x20
      [  T292]  ret_from_fork_asm+0x11/0x20
      [  T292]  </TASK>
      
      To fix this issue, igb_io_resume() checks if the interface is running and
      the device is not down. This means igb_io_error_detected() did not bring
      the device down and there is no need to bring it up.
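
The check described in this message can be sketched with a toy state machine. This is a Python model, not driver code: `ToyAdapter`, the `fatal` and `guarded` parameters, and `outcome` are hypothetical, and a double NAPI enable is modelled as a `RuntimeError` rather than a kernel BUG.

```python
class ToyAdapter:
    """Toy model of an interface that is up when an error is reported."""

    def __init__(self):
        self.running = True       # stands in for netif_running()
        self.down = False         # stands in for the driver's "down" state bit
        self.napi_enabled = True

    def bring_up(self):
        if self.napi_enabled:
            # Enabling an already enabled NAPI is the failure being modelled.
            raise RuntimeError("NAPI enabled twice")
        self.napi_enabled = True
        self.down = False

    def bring_down(self):
        self.napi_enabled = False
        self.down = True

    def io_error_detected(self, fatal):
        if not fatal:
            return  # non-fatal error: the device is left up
        if self.running:
            self.bring_down()

    def io_resume(self, guarded):
        if not self.running:
            return
        if guarded and not self.down:
            return  # nothing was brought down, so nothing to bring up
        self.bring_up()


def outcome(fatal, guarded):
    """Run error_detected followed by resume and report what happened."""
    adapter = ToyAdapter()
    adapter.io_error_detected(fatal)
    try:
        adapter.io_resume(guarded)
    except RuntimeError:
        return "crash"
    return "ok"


print(outcome(fatal=False, guarded=False))
print(outcome(fatal=False, guarded=True))
print(outcome(fatal=True, guarded=True))
```

Only the unguarded resume after a non-fatal error fails in this model; the fatal path still brings the device down and back up.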
      Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
      Reviewed-by: Yuanyuan Zhong <yzhong@purestorage.com>
      Fixes: 004d2506 ("igb: Fix igb_down hung on surprise removal")
      Reviewed-by: Simon Horman <horms@kernel.org>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • i40e: Fix macvlan leak by synchronizing access to mac_filter_hash · dac6c7b3
      Aleksandr Loktionov authored
      This patch addresses a macvlan leak issue in the i40e driver caused by
      concurrent access to vsi->mac_filter_hash. The leak occurs when multiple
      threads attempt to modify the mac_filter_hash simultaneously, leading to
      inconsistent state and potential memory leaks.
      
      To fix this, we now wrap the calls to i40e_del_mac_filter() and zeroing
      vf->default_lan_addr.addr with spin_lock/unlock_bh(&vsi->mac_filter_hash_lock),
      ensuring atomic operations and preventing concurrent access.
      
      Additionally, we add lockdep_assert_held(&vsi->mac_filter_hash_lock) in
      i40e_add_mac_filter() to help catch similar issues in the future.
      
      Reproduction steps:
      1. Spawn VFs and configure port vlan on them.
      2. Trigger concurrent macvlan operations (e.g., adding and deleting
      	portvlan and/or mac filters).
      3. Observe the potential memory leak and inconsistent state in the
      	mac_filter_hash.
      
      This synchronization ensures the integrity of the mac_filter_hash and prevents
      the described leak.
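
The locking pattern described in this message can be sketched as a toy model. This is Python, not driver code: `ToyVsi`, `assert_held`, `clear_default_mac`, and `churn` are hypothetical, a `threading.Lock` stands in for the spinlock, and `Lock.locked()` is only a rough substitute for `lockdep_assert_held()` because it reports that some thread holds the lock, not that the caller does.

```python
import threading


class ToyVsi:
    def __init__(self):
        self.mac_filter_hash_lock = threading.Lock()
        self.mac_filter_hash = set()


def assert_held(lock):
    # Rough stand-in for lockdep_assert_held(); see the caveat above.
    if not lock.locked():
        raise RuntimeError("mac_filter_hash_lock must be held")


def add_mac_filter(vsi, mac):
    assert_held(vsi.mac_filter_hash_lock)
    vsi.mac_filter_hash.add(mac)


def del_mac_filter(vsi, mac):
    vsi.mac_filter_hash.discard(mac)


def clear_default_mac(vsi, vf):
    # Deleting the filter and zeroing the address happen under one lock hold.
    with vsi.mac_filter_hash_lock:
        del_mac_filter(vsi, vf["default_lan_addr"])
        vf["default_lan_addr"] = None


def churn(vsi, vf, mac, rounds):
    for _ in range(rounds):
        with vsi.mac_filter_hash_lock:
            add_mac_filter(vsi, mac)
            vf["default_lan_addr"] = mac
        clear_default_mac(vsi, vf)


vsi = ToyVsi()
vfs = [{"default_lan_addr": None} for _ in range(4)]
threads = [
    threading.Thread(target=churn, args=(vsi, vf, f"02:00:00:00:00:0{i}", 200))
    for i, vf in enumerate(vfs)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

leaked = set(vsi.mac_filter_hash)
print(len(leaked))
```

Every add is paired with a delete under the same lock, so no filter is left behind once all threads finish; calling `add_mac_filter()` without the lock raises immediately.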
      
      Fixes: fed0d9f1 ("i40e: Fix VF's MAC Address change on VM")
      Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
      Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Fix increasing MSI-X on VF · bce9af1b
      Marcin Szycik authored
      Increasing MSI-X value on a VF leads to invalid memory operations. This
      is caused by not reallocating some arrays.
      
      Reproducer:
        modprobe ice
        echo 0 > /sys/bus/pci/devices/$PF_PCI/sriov_drivers_autoprobe
        echo 1 > /sys/bus/pci/devices/$PF_PCI/sriov_numvfs
        echo 17 > /sys/bus/pci/devices/$VF0_PCI/sriov_vf_msix_count
      
      Default MSI-X is 16, so 17 and above triggers this issue.
      
      KASAN reports:
      
        BUG: KASAN: slab-out-of-bounds in ice_vsi_alloc_ring_stats+0x38d/0x4b0 [ice]
        Read of size 8 at addr ffff8888b937d180 by task bash/28433
        (...)
      
        Call Trace:
         (...)
         ? ice_vsi_alloc_ring_stats+0x38d/0x4b0 [ice]
         kasan_report+0xed/0x120
         ? ice_vsi_alloc_ring_stats+0x38d/0x4b0 [ice]
         ice_vsi_alloc_ring_stats+0x38d/0x4b0 [ice]
         ice_vsi_cfg_def+0x3360/0x4770 [ice]
         ? mutex_unlock+0x83/0xd0
         ? __pfx_ice_vsi_cfg_def+0x10/0x10 [ice]
         ? __pfx_ice_remove_vsi_lkup_fltr+0x10/0x10 [ice]
         ice_vsi_cfg+0x7f/0x3b0 [ice]
         ice_vf_reconfig_vsi+0x114/0x210 [ice]
         ice_sriov_set_msix_vec_count+0x3d0/0x960 [ice]
         sriov_vf_msix_count_store+0x21c/0x300
         (...)
      
        Allocated by task 28201:
         (...)
         ice_vsi_cfg_def+0x1c8e/0x4770 [ice]
         ice_vsi_cfg+0x7f/0x3b0 [ice]
         ice_vsi_setup+0x179/0xa30 [ice]
         ice_sriov_configure+0xcaa/0x1520 [ice]
         sriov_numvfs_store+0x212/0x390
         (...)
      
      To fix it, use ice_vsi_rebuild() instead of ice_vf_reconfig_vsi(). This
      causes the required arrays to be reallocated taking the new queue count
      into account (ice_vsi_realloc_stat_arrays()). Set req_txq and req_rxq
      before ice_vsi_rebuild(), so that realloc uses the newly set queue
      count.
      
      Additionally, ice_vsi_rebuild() does not remove VSI filters
      (ice_fltr_remove_all()), so ice_vf_init_host_cfg() is no longer
      necessary.
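
The difference between reconfiguring and rebuilding can be illustrated with a toy model. This is a Python sketch, not driver code: `ToyVsi`, `reconfig`, `rebuild`, and `try_resize` are hypothetical, and the model assumes one stats slot per queue with the queue count equal to the requested value.

```python
class ToyVsi:
    def __init__(self, num_q):
        self.num_q = num_q
        self.ring_stats = [0] * num_q  # sized once, at setup time

    def reconfig(self, new_q):
        # Changes the queue count but keeps the old array.
        self.num_q = new_q

    def rebuild(self, new_q):
        # Grows the per-queue array before the new count is used.
        self.num_q = new_q
        missing = new_q - len(self.ring_stats)
        if missing > 0:
            self.ring_stats.extend([0] * missing)

    def alloc_ring_stats(self):
        # Touches one slot per queue, like a per-ring stats setup would.
        for q in range(self.num_q):
            self.ring_stats[q] += 1


def try_resize(method_name, new_q):
    """Start from 16 queues, resize with the named method, then use the array."""
    vsi = ToyVsi(16)
    getattr(vsi, method_name)(new_q)
    try:
        vsi.alloc_ring_stats()
    except IndexError:
        return "out of bounds"
    return "ok"


print(try_resize("reconfig", 17))
print(try_resize("rebuild", 17))
```

In Python the stale array surfaces as an `IndexError`; in the kernel the equivalent access is the slab-out-of-bounds read that KASAN reports.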
      Reported-by: Jacob Keller <jacob.e.keller@intel.com>
      Fixes: 2a2cb4c6 ("ice: replace ice_vf_recreate_vsi() with ice_vf_reconfig_vsi()")
      Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Flush FDB entries before reset · fbcb968a
      Wojciech Drewek authored
      Triggering a reset while in switchdev mode causes
      errors [1]. By this time the rules have already been removed
      from HW, because switch content is flushed on reset.
      SW, however, still thinks the rules exist, so when we get a
      SWITCHDEV_FDB_DEL_TO_DEVICE notification we try to
      delete a nonexistent rule.
      
      We can avoid these errors by clearing the rules
      early in the reset flow, before they are removed from HW.
      The switchdev API will get notified that the rules were removed,
      so we won't get a SWITCHDEV_FDB_DEL_TO_DEVICE notification.
      Remove the unnecessary ice_clear_sw_switch_recipes().
      
      [1]
      ice 0000:01:00.0: Failed to delete FDB forward rule, err: -2
      ice 0000:01:00.0: Failed to delete FDB guard rule, err: -2
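
The ordering problem described in this message can be sketched with a toy model. This is Python, not driver code: `ToyBridge`, `reset(flush_first=...)`, `late_fdb_del_events`, and `simulate` are hypothetical, and the error strings are the toy's own, not driver output.

```python
class ToyBridge:
    def __init__(self):
        self.sw_rules = set()  # what software believes is programmed
        self.hw_rules = set()  # what hardware actually holds
        self.errors = []

    def add_fdb(self, mac):
        self.sw_rules.add(mac)
        self.hw_rules.add(mac)

    def delete_rule(self, mac):
        if mac not in self.hw_rules:
            self.errors.append(f"rule for {mac} missing in hardware")
        self.hw_rules.discard(mac)
        self.sw_rules.discard(mac)

    def reset(self, flush_first):
        if flush_first:
            # Clear software entries while hardware still has the rules.
            for mac in sorted(self.sw_rules):
                self.delete_rule(mac)
        self.hw_rules.clear()  # the reset wipes switch content

    def late_fdb_del_events(self):
        # Deletion requests for whatever software still tracks after reset.
        for mac in sorted(self.sw_rules):
            self.delete_rule(mac)


def simulate(flush_first):
    bridge = ToyBridge()
    bridge.add_fdb("02:00:00:00:00:01")
    bridge.add_fdb("02:00:00:00:00:02")
    bridge.reset(flush_first)
    bridge.late_fdb_del_events()
    return bridge.errors


print(len(simulate(flush_first=False)), len(simulate(flush_first=True)))
```

Without the early flush, each tracked rule produces one failed delete after the reset; with it, software and hardware are emptied together and no late delete is attempted.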
      
      Fixes: 7c945a1a ("ice: Switchdev FDB events support")
      Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
      Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Fix netif_is_ice() in Safe Mode · 8e60dbcb
      Marcin Szycik authored
      netif_is_ice() works by checking the pointer to netdev ops. However, it
      only checks for the default ice_netdev_ops, not ice_netdev_safe_mode_ops,
      so in Safe Mode it always returns false, which is unintuitive. While it
      doesn't look like netif_is_ice() is currently being called anywhere in Safe
      Mode, this could change and potentially lead to unexpected behaviour.
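
The pointer comparison described above can be sketched as follows. This is a Python model, not driver code: `NetdevOps`, `NetDevice`, `netif_is_ice_old`, and `netif_is_ice_fixed` are hypothetical, and the fixed variant assumes the remedy is to accept either ops table, which the message implies but does not spell out.

```python
class NetdevOps:
    """Stands in for a table of netdev callbacks; identity is what matters."""

    def __init__(self, name):
        self.name = name


ICE_NETDEV_OPS = NetdevOps("ice_netdev_ops")
ICE_NETDEV_SAFE_MODE_OPS = NetdevOps("ice_netdev_safe_mode_ops")
OTHER_DRIVER_OPS = NetdevOps("other_driver_ops")


class NetDevice:
    def __init__(self, netdev_ops):
        self.netdev_ops = netdev_ops


def netif_is_ice_old(dev):
    # Only recognises the default ops table.
    return dev is not None and dev.netdev_ops is ICE_NETDEV_OPS


def netif_is_ice_fixed(dev):
    # Assumed fix: recognise the Safe Mode ops table as well.
    return dev is not None and (
        dev.netdev_ops is ICE_NETDEV_OPS
        or dev.netdev_ops is ICE_NETDEV_SAFE_MODE_OPS
    )


safe_mode_dev = NetDevice(ICE_NETDEV_SAFE_MODE_OPS)
print(netif_is_ice_old(safe_mode_dev), netif_is_ice_fixed(safe_mode_dev))
```

Both variants still reject a missing device and a device owned by another driver; they differ only for a device running with the Safe Mode ops.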
      
      Fixes: df006dd4 ("ice: Add initial support framework for LAG")
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
      Reviewed-by: Brett Creeley <brett.creeley@amd.com>
      Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Fix entering Safe Mode · b972060a
      Marcin Szycik authored
      If the DDP package is missing or corrupted, the driver should enter Safe Mode.
      Instead, an error is returned and probe fails.
      
      To fix this, don't exit init if ice_init_ddp_config() returns an error.
      
      Repro:
      * Remove or rename DDP package (/lib/firmware/intel/ice/ddp/ice.pkg)
      * Load ice
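
The intended fallback can be sketched as a toy probe function. This is Python, not driver code: `probe`, `ddp_package_ok`, and `tolerate_ddp_error` are hypothetical names, and the whole DDP setup is reduced to a single boolean.

```python
def probe(ddp_package_ok, tolerate_ddp_error):
    """Return the mode the toy driver ends up in, or raise if probe fails."""
    if not ddp_package_ok and not tolerate_ddp_error:
        # Old behaviour in this model: a DDP config error aborts init.
        raise OSError("probe failed: DDP config error")
    # New behaviour in this model: keep going and fall back to Safe Mode.
    return "normal" if ddp_package_ok else "safe mode"


def try_probe(ddp_package_ok, tolerate_ddp_error):
    try:
        return probe(ddp_package_ok, tolerate_ddp_error)
    except OSError:
        return "probe failed"


print(try_probe(ddp_package_ok=False, tolerate_ddp_error=False))
print(try_probe(ddp_package_ok=False, tolerate_ddp_error=True))
```

A healthy package gives the normal mode in both variants; only the handling of a missing or corrupted package changes.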
      
      Fixes: cc5776fe ("ice: Enable switching default Tx scheduler topology")
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
      Reviewed-by: Brett Creeley <brett.creeley@amd.com>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • selftests: make kselftest-clean remove libynl outputs · 1fd9e4f2
      Greg Thelen authored
      Starting with 6.12 commit 85585b4b ("selftests: add ncdevmem, netcat
      for devmem TCP") kselftest-all creates additional outputs that
      kselftest-clean does not clean up:
        $ make defconfig
        $ make kselftest-all
        $ make kselftest-clean
        $ git clean -ndxf | grep tools/net
        Would remove tools/net/ynl/lib/__pycache__/
        Would remove tools/net/ynl/lib/ynl.a
        Would remove tools/net/ynl/lib/ynl.d
        Would remove tools/net/ynl/lib/ynl.o
      
      Make kselftest-clean remove the newly added net/ynl outputs.
      
      Fixes: 85585b4b ("selftests: add ncdevmem, netcat for devmem TCP")
      Signed-off-by: Greg Thelen <gthelen@google.com>
      Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Link: https://patch.msgid.link/20241005215600.852260-1-gthelen@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'selftests-net-add-missing-gitignore-and-extra_clean-entries' · c0a30936
      Jakub Kicinski authored
      Javier Carrasco says:
      
      ====================
      selftests: net: add missing gitignore and EXTRA_CLEAN entries.
      
      This series is a cherry-pick on top of v6.12-rc1 from the one I sent
      for selftests with other patches that were not net-related:
      
      https://lore.kernel.org/all/20240925-selftests-gitignore-v3-0-9db896474170@gmail.com/
      
      The patches have not been modified, and the Reviewed-by tags have
      been kept.
      
      v1: https://lore.kernel.org/20240930-net-selftests-gitignore-v1-0-65225a855946@gmail.com
      ====================
      
      Link: https://patch.msgid.link/20241005-net-selftests-gitignore-v2-0-3a0b2876394a@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>