1. 21 Oct, 2023 6 commits
  2. 20 Oct, 2023 8 commits
    • igb: Fix potential memory leak in igb_add_ethtool_nfc_entry · 8c0b48e0
      Mateusz Palczewski authored
      Add a check for the return value of igb_update_ethtool_nfc_entry() so that,
      if it fails, the memory allocated for input is freed.
      
      Fixes: 0e71def2 ("igb: add support of RX network flow classification")
      Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
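The igb fix follows the usual error-path rule: whoever allocates `input` must free it if the next step fails. Below is a minimal userspace sketch of that rule, not the igb code; `struct nfc_filter`, `update_filter_table()` and `add_filter()` are hypothetical names, and `malloc()`/`free()` stand in for the kernel allocators.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the igb NFC filter entry. */
struct nfc_filter {
	unsigned int location;
};

/* Counts allocations that are still live, so a leak is observable. */
static int live_filters;

/* Hypothetical update step: rejects out-of-range locations. */
static int update_filter_table(struct nfc_filter *input)
{
	if (input->location >= 16)
		return -EINVAL;
	/* On success the (unmodeled) table takes ownership of input. */
	return 0;
}

static int add_filter(unsigned int location)
{
	struct nfc_filter *input = malloc(sizeof(*input));
	int err;

	if (!input)
		return -ENOMEM;
	live_filters++;
	input->location = location;

	err = update_filter_table(input);
	if (err) {
		/* The missing step: without it, input leaks on failure. */
		free(input);
		live_filters--;
		return err;
	}
	return 0;
}
```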
    • treewide: Spelling fix in comment · fb71ba0e
      Kunwu Chan authored
      reques -> request
      
      Fixes: 09dde54c ("PS3: gelic: Add wireless support for PS3")
      Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
      Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • i40e: Fix I40E_FLAG_VF_VLAN_PRUNING value · 665e7d83
      Ivan Vecera authored
      Commit c87c938f ("i40e: Add VF VLAN pruning") added a new
      PF flag, I40E_FLAG_VF_VLAN_PRUNING, but its value collides with
      the existing I40E_FLAG_TOTAL_PORT_SHUTDOWN_ENABLED flag.
      
      Move the affected flag to the end of the flags and fix its value.
      
      Reproducer:
      [root@cnb-03 ~]# ethtool --set-priv-flags enp2s0f0np0 link-down-on-close on
      [root@cnb-03 ~]# ethtool --set-priv-flags enp2s0f0np0 vf-vlan-pruning on
      [root@cnb-03 ~]# ethtool --set-priv-flags enp2s0f0np0 link-down-on-close off
      [ 6323.142585] i40e 0000:02:00.0: Setting link-down-on-close not supported on this port (because total-port-shutdown is enabled)
      netlink error: Operation not supported
      [root@cnb-03 ~]# ethtool --set-priv-flags enp2s0f0np0 vf-vlan-pruning off
      [root@cnb-03 ~]# ethtool --set-priv-flags enp2s0f0np0 link-down-on-close off
      
      The link-down-on-close flag cannot be modified after setting vf-vlan-pruning
      because vf-vlan-pruning shares its bit with the total-port-shutdown flag,
      which prevents any modification of the link-down-on-close flag.
      
      Fixes: c87c938f ("i40e: Add VF VLAN pruning")
      Cc: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Cc: Simon Horman <horms@kernel.org>
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: David S. Miller <davem@davemloft.net>
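Two flag names that share one bit cannot be set or cleared independently, which is why enabling vf-vlan-pruning looked like enabling total-port-shutdown. The sketch below shows this with hypothetical flag names and bit positions; they are placeholders, not the real i40e definitions, and `BIT_ULL` is redefined here for userspace.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(n) (1ULL << (n))

/* Hypothetical layout reproducing the bug: two flags share bit 5. */
#define FLAG_TOTAL_PORT_SHUTDOWN BIT_ULL(5)
#define FLAG_VF_VLAN_PRUNING_BAD BIT_ULL(5) /* collides with the flag above */
#define FLAG_VF_VLAN_PRUNING_OK  BIT_ULL(6) /* moved to an unused bit */

/* Models the check that refuses link-down-on-close changes. */
static bool link_down_on_close_changeable(uint64_t flags)
{
	return !(flags & FLAG_TOTAL_PORT_SHUTDOWN);
}
```

With the colliding value, setting only the pruning flag already makes `link_down_on_close_changeable()` return false; with the moved value it does not.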
    • iavf: initialize waitqueues before starting watchdog_task · 7db31110
      Michal Schmidt authored
      It is not safe to initialize the waitqueues after queueing the
      watchdog_task, because the task will be using them.
      
      The chance of this causing a real problem is very small, because
      there will be some sleeping before any of the waitqueues get used.
      I got a crash only after inserting an artificial sleep in iavf_probe.
      
      Queue the watchdog_task as the last step in iavf_probe. Add a comment to
      prevent repeating the mistake.
      
      Fixes: fe2647ab ("i40evf: prevent VF close returning before state transitions to DOWN")
      Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
      Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
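The ordering rule behind the iavf fix (initialize everything a task uses before the task can run) can be modeled deterministically. This is a hypothetical model, not iavf code: `queue_task()` runs the task immediately, which is the worst case the commit describes, and the `wq_ready` flag stands in for the waitqueue initialization.

```c
#include <stdbool.h>

/* Hypothetical adapter state; wq_ready models initialized waitqueues. */
struct adapter {
	bool wq_ready;
	bool task_saw_uninit_wq;
};

static void watchdog_task(struct adapter *a)
{
	/* The real task may wait on or wake the waitqueues. */
	if (!a->wq_ready)
		a->task_saw_uninit_wq = true;
}

/* Worst case: the queued work runs before the caller continues. */
static void queue_task(struct adapter *a)
{
	watchdog_task(a);
}

static void probe_buggy(struct adapter *a)
{
	queue_task(a);      /* the task may already be running ... */
	a->wq_ready = true; /* ... before the waitqueues are set up */
}

static void probe_fixed(struct adapter *a)
{
	a->wq_ready = true; /* set up everything the task uses first */
	queue_task(a);      /* queue the task as the very last step */
}
```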
    • r8169: fix the KCSAN reported data race in rtl_rx while reading desc->opts1 · f97eee48
      Mirsad Goran Todorovac authored
      KCSAN reported the following data-race bug:
      
      ==================================================================
      BUG: KCSAN: data-race in rtl8169_poll (drivers/net/ethernet/realtek/r8169_main.c:4430 drivers/net/ethernet/realtek/r8169_main.c:4583) r8169
      
      race at unknown origin, with read to 0xffff888117e43510 of 4 bytes by interrupt on cpu 21:
      rtl8169_poll (drivers/net/ethernet/realtek/r8169_main.c:4430 drivers/net/ethernet/realtek/r8169_main.c:4583) r8169
      __napi_poll (net/core/dev.c:6527)
      net_rx_action (net/core/dev.c:6596 net/core/dev.c:6727)
      __do_softirq (kernel/softirq.c:553)
      __irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632)
      irq_exit_rcu (kernel/softirq.c:647)
      sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1074 (discriminator 14))
      asm_sysvec_apic_timer_interrupt (./arch/x86/include/asm/idtentry.h:645)
      cpuidle_enter_state (drivers/cpuidle/cpuidle.c:291)
      cpuidle_enter (drivers/cpuidle/cpuidle.c:390)
      call_cpuidle (kernel/sched/idle.c:135)
      do_idle (kernel/sched/idle.c:219 kernel/sched/idle.c:282)
      cpu_startup_entry (kernel/sched/idle.c:378 (discriminator 1))
      start_secondary (arch/x86/kernel/smpboot.c:210 arch/x86/kernel/smpboot.c:294)
      secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:433)
      
      value changed: 0x80003fff -> 0x3402805f
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 21 PID: 0 Comm: swapper/21 Tainted: G             L     6.6.0-rc2-kcsan-00143-gb5cbe7c0 #41
      Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
      ==================================================================
      
      drivers/net/ethernet/realtek/r8169_main.c:
      ==========================================
         4429
       → 4430                 status = le32_to_cpu(desc->opts1);
         4431                 if (status & DescOwn)
         4432                         break;
         4433
         4434                 /* This barrier is needed to keep us from reading
         4435                  * any other fields out of the Rx descriptor until
         4436                  * we know the status of DescOwn
         4437                  */
         4438                 dma_rmb();
         4439
         4440                 if (unlikely(status & RxRES)) {
         4441                         if (net_ratelimit())
         4442                                 netdev_warn(dev, "Rx ERROR. status = %08x\n",
      
      Marco Elver explained that dma_rmb() doesn't prevent the compiler from tearing the
      access to desc->opts1, which can be written to concurrently. READ_ONCE() should
      prevent that from happening:
      
         4429
       → 4430                 status = le32_to_cpu(READ_ONCE(desc->opts1));
         4431                 if (status & DescOwn)
         4432                         break;
         4433
      
      As a consequence of this fix, this KCSAN warning was eliminated.
      
      Fixes: 6202806e ("r8169: drop member opts1_mask from struct rtl8169_private")
      Suggested-by: Marco Elver <elver@google.com>
      Cc: Heiner Kallweit <hkallweit1@gmail.com>
      Cc: nic_swsd@realtek.com
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: netdev@vger.kernel.org
      Link: https://lore.kernel.org/lkml/dc7fc8fa-4ea4-e9a9-30a6-7c83e6b53188@alu.unizg.hr/
      Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
      Acked-by: Marco Elver <elver@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
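Here is a userspace approximation of the rtl_rx fix. `READ_ONCE()` forces a single, untorn load of `opts1` by accessing it through a `volatile` lvalue, and the ownership test runs on that one snapshot. Several pieces are assumptions: the macro is a simplified stand-in for the kernel's `READ_ONCE()` (it relies on the GCC/Clang `__typeof__` extension), `le32_to_cpu()` and `cpu_to_le32()` are local stand-ins, the descriptor struct is trimmed, and `DescOwn` is assumed to be bit 31, which matches the 0x80003fff -> 0x3402805f change in the report.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's READ_ONCE(): one volatile load. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

#define DescOwn (1u << 31) /* assumed: set while the NIC owns the descriptor */

/* Stand-in for le32_to_cpu(): byte-swap only on big-endian hosts. */
static inline uint32_t le32_to_cpu(uint32_t v)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	return __builtin_bswap32(v);
#else
	return v;
#endif
}

/* The conversion is symmetric, so it also serves as cpu_to_le32(). */
static inline uint32_t cpu_to_le32(uint32_t v)
{
	return le32_to_cpu(v);
}

/* Trimmed RX descriptor; the NIC writes opts1 via DMA. */
struct rx_desc {
	uint32_t opts1;
	uint32_t opts2;
	uint64_t addr;
};

/* Returns false while the NIC owns the descriptor. */
static bool rx_desc_ready(struct rx_desc *desc, uint32_t *status_out)
{
	/* opts1 is loaded exactly once; all later tests use this copy. */
	uint32_t status = le32_to_cpu(READ_ONCE(desc->opts1));

	if (status & DescOwn)
		return false;
	/* The driver issues dma_rmb() here before reading other fields. */
	*status_out = status;
	return true;
}
```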
    • r8169: fix the KCSAN reported data-race in rtl_tx while reading TxDescArray[entry].opts1 · dcf75a0f
      Mirsad Goran Todorovac authored
      KCSAN reported the following data-race:
      
      ==================================================================
      BUG: KCSAN: data-race in rtl8169_poll (drivers/net/ethernet/realtek/r8169_main.c:4368 drivers/net/ethernet/realtek/r8169_main.c:4581) r8169
      
      race at unknown origin, with read to 0xffff888140d37570 of 4 bytes by interrupt on cpu 21:
      rtl8169_poll (drivers/net/ethernet/realtek/r8169_main.c:4368 drivers/net/ethernet/realtek/r8169_main.c:4581) r8169
      __napi_poll (net/core/dev.c:6527)
      net_rx_action (net/core/dev.c:6596 net/core/dev.c:6727)
      __do_softirq (kernel/softirq.c:553)
      __irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632)
      irq_exit_rcu (kernel/softirq.c:647)
      sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1074 (discriminator 14))
      asm_sysvec_apic_timer_interrupt (./arch/x86/include/asm/idtentry.h:645)
      cpuidle_enter_state (drivers/cpuidle/cpuidle.c:291)
      cpuidle_enter (drivers/cpuidle/cpuidle.c:390)
      call_cpuidle (kernel/sched/idle.c:135)
      do_idle (kernel/sched/idle.c:219 kernel/sched/idle.c:282)
      cpu_startup_entry (kernel/sched/idle.c:378 (discriminator 1))
      start_secondary (arch/x86/kernel/smpboot.c:210 arch/x86/kernel/smpboot.c:294)
      secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:433)
      
      value changed: 0xb0000042 -> 0x00000000
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 21 PID: 0 Comm: swapper/21 Tainted: G             L     6.6.0-rc2-kcsan-00143-gb5cbe7c0 #41
      Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
      ==================================================================
      
      The read side is in
      
      drivers/net/ethernet/realtek/r8169_main.c
      =========================================
         4355 static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp,
         4356                    int budget)
         4357 {
         4358         unsigned int dirty_tx, bytes_compl = 0, pkts_compl = 0;
         4359         struct sk_buff *skb;
         4360
         4361         dirty_tx = tp->dirty_tx;
         4362
         4363         while (READ_ONCE(tp->cur_tx) != dirty_tx) {
         4364                 unsigned int entry = dirty_tx % NUM_TX_DESC;
         4365                 u32 status;
         4366
       → 4367                 status = le32_to_cpu(tp->TxDescArray[entry].opts1);
         4368                 if (status & DescOwn)
         4369                         break;
         4370
         4371                 skb = tp->tx_skb[entry].skb;
         4372                 rtl8169_unmap_tx_skb(tp, entry);
         4373
         4374                 if (skb) {
         4375                         pkts_compl++;
         4376                         bytes_compl += skb->len;
         4377                         napi_consume_skb(skb, budget);
         4378                 }
         4379                 dirty_tx++;
         4380         }
         4381
         4382         if (tp->dirty_tx != dirty_tx) {
         4383                 dev_sw_netstats_tx_add(dev, pkts_compl, bytes_compl);
         4384                 WRITE_ONCE(tp->dirty_tx, dirty_tx);
         4385
         4386                 netif_subqueue_completed_wake(dev, 0, pkts_compl, bytes_compl,
         4387                                               rtl_tx_slots_avail(tp),
         4388                                               R8169_TX_START_THRS);
         4389                 /*
         4390                  * 8168 hack: TxPoll requests are lost when the Tx packets are
         4391                  * too close. Let's kick an extra TxPoll request when a burst
         4392                  * of start_xmit activity is detected (if it is not detected,
         4393                  * it is slow enough). -- FR
         4394                  * If skb is NULL then we come here again once a tx irq is
         4395                  * triggered after the last fragment is marked transmitted.
         4396                  */
         4397                 if (READ_ONCE(tp->cur_tx) != dirty_tx && skb)
         4398                         rtl8169_doorbell(tp);
         4399         }
         4400 }
      
      tp->TxDescArray[entry].opts1 is reported to have a data-race and READ_ONCE() fixes
      this KCSAN warning.
      
         4366
       → 4367                 status = le32_to_cpu(READ_ONCE(tp->TxDescArray[entry].opts1));
         4368                 if (status & DescOwn)
         4369                         break;
         4370
      
      Cc: Heiner Kallweit <hkallweit1@gmail.com>
      Cc: nic_swsd@realtek.com
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Marco Elver <elver@google.com>
      Cc: netdev@vger.kernel.org
      Link: https://lore.kernel.org/lkml/dc7fc8fa-4ea4-e9a9-30a6-7c83e6b53188@alu.unizg.hr/
      Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
      Acked-by: Marco Elver <elver@google.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: David S. Miller <davem@davemloft.net>
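The reclaim loop in `rtl_tx()` can be sketched as a ring walk that stops at the first descriptor the NIC still owns, with the producer index and the descriptor status both loaded through `READ_ONCE()`. This is a single-threaded model with hypothetical simplifications: there are no skbs, no DMA unmapping and no endianness conversion, the ring has 4 entries instead of `NUM_TX_DESC`, and the macros are simplified userspace stand-ins for the kernel's.

```c
#include <stdint.h>

#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

#define RING_SIZE 4u       /* hypothetical; the driver uses NUM_TX_DESC */
#define DescOwn   (1u << 31)

struct tx_ring {
	uint32_t opts1[RING_SIZE]; /* DescOwn stays set until the NIC is done */
	unsigned int cur_tx;       /* producer counter, advanced by xmit */
	unsigned int dirty_tx;     /* consumer counter, advanced here */
};

/* Returns the number of descriptors reclaimed in this pass. */
static unsigned int tx_reclaim(struct tx_ring *r)
{
	unsigned int dirty_tx = r->dirty_tx;
	unsigned int done = 0;

	while (READ_ONCE(r->cur_tx) != dirty_tx) {
		unsigned int entry = dirty_tx % RING_SIZE;
		uint32_t status = READ_ONCE(r->opts1[entry]);

		if (status & DescOwn)
			break; /* the NIC has not finished this one yet */
		done++;
		dirty_tx++;
	}
	WRITE_ONCE(r->dirty_tx, dirty_tx);
	return done;
}
```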
    • r8169: fix the KCSAN reported data-race in rtl_tx() while reading tp->cur_tx · c1c0ce31
      Mirsad Goran Todorovac authored
      KCSAN reported the following data-race:
      
      ==================================================================
      BUG: KCSAN: data-race in rtl8169_poll [r8169] / rtl8169_start_xmit [r8169]
      
      write (marked) to 0xffff888102474b74 of 4 bytes by task 5358 on cpu 29:
      rtl8169_start_xmit (drivers/net/ethernet/realtek/r8169_main.c:4254) r8169
      dev_hard_start_xmit (./include/linux/netdevice.h:4889 ./include/linux/netdevice.h:4903 net/core/dev.c:3544 net/core/dev.c:3560)
      sch_direct_xmit (net/sched/sch_generic.c:342)
      __dev_queue_xmit (net/core/dev.c:3817 net/core/dev.c:4306)
      ip_finish_output2 (./include/linux/netdevice.h:3082 ./include/net/neighbour.h:526 ./include/net/neighbour.h:540 net/ipv4/ip_output.c:233)
      __ip_finish_output (net/ipv4/ip_output.c:311 net/ipv4/ip_output.c:293)
      ip_finish_output (net/ipv4/ip_output.c:328)
      ip_output (net/ipv4/ip_output.c:435)
      ip_send_skb (./include/net/dst.h:458 net/ipv4/ip_output.c:127 net/ipv4/ip_output.c:1486)
      udp_send_skb (net/ipv4/udp.c:963)
      udp_sendmsg (net/ipv4/udp.c:1246)
      inet_sendmsg (net/ipv4/af_inet.c:840 (discriminator 4))
      sock_sendmsg (net/socket.c:730 net/socket.c:753)
      __sys_sendto (net/socket.c:2177)
      __x64_sys_sendto (net/socket.c:2185)
      do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
      entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
      
      read to 0xffff888102474b74 of 4 bytes by interrupt on cpu 21:
      rtl8169_poll (drivers/net/ethernet/realtek/r8169_main.c:4397 drivers/net/ethernet/realtek/r8169_main.c:4581) r8169
      __napi_poll (net/core/dev.c:6527)
      net_rx_action (net/core/dev.c:6596 net/core/dev.c:6727)
      __do_softirq (kernel/softirq.c:553)
      __irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632)
      irq_exit_rcu (kernel/softirq.c:647)
      common_interrupt (arch/x86/kernel/irq.c:247 (discriminator 14))
      asm_common_interrupt (./arch/x86/include/asm/idtentry.h:636)
      cpuidle_enter_state (drivers/cpuidle/cpuidle.c:291)
      cpuidle_enter (drivers/cpuidle/cpuidle.c:390)
      call_cpuidle (kernel/sched/idle.c:135)
      do_idle (kernel/sched/idle.c:219 kernel/sched/idle.c:282)
      cpu_startup_entry (kernel/sched/idle.c:378 (discriminator 1))
      start_secondary (arch/x86/kernel/smpboot.c:210 arch/x86/kernel/smpboot.c:294)
      secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:433)
      
      value changed: 0x002f4815 -> 0x002f4816
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 21 PID: 0 Comm: swapper/21 Tainted: G             L     6.6.0-rc2-kcsan-00143-gb5cbe7c0 #41
      Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
      ==================================================================
      
      The write side of drivers/net/ethernet/realtek/r8169_main.c is:
      ==================
         4251         /* rtl_tx needs to see descriptor changes before updated tp->cur_tx */
         4252         smp_wmb();
         4253
       → 4254         WRITE_ONCE(tp->cur_tx, tp->cur_tx + frags + 1);
         4255
         4256         stop_queue = !netif_subqueue_maybe_stop(dev, 0, rtl_tx_slots_avail(tp),
         4257                                                 R8169_TX_STOP_THRS,
         4258                                                 R8169_TX_START_THRS);
      
      The read side is the function rtl_tx():
      
         4355 static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp,
         4356                    int budget)
         4357 {
         4358         unsigned int dirty_tx, bytes_compl = 0, pkts_compl = 0;
         4359         struct sk_buff *skb;
         4360
         4361         dirty_tx = tp->dirty_tx;
         4362
         4363         while (READ_ONCE(tp->cur_tx) != dirty_tx) {
         4364                 unsigned int entry = dirty_tx % NUM_TX_DESC;
         4365                 u32 status;
         4366
         4367                 status = le32_to_cpu(tp->TxDescArray[entry].opts1);
         4368                 if (status & DescOwn)
         4369                         break;
         4370
         4371                 skb = tp->tx_skb[entry].skb;
         4372                 rtl8169_unmap_tx_skb(tp, entry);
         4373
         4374                 if (skb) {
         4375                         pkts_compl++;
         4376                         bytes_compl += skb->len;
         4377                         napi_consume_skb(skb, budget);
         4378                 }
         4379                 dirty_tx++;
         4380         }
         4381
         4382         if (tp->dirty_tx != dirty_tx) {
         4383                 dev_sw_netstats_tx_add(dev, pkts_compl, bytes_compl);
         4384                 WRITE_ONCE(tp->dirty_tx, dirty_tx);
         4385
         4386                 netif_subqueue_completed_wake(dev, 0, pkts_compl, bytes_compl,
         4387                                               rtl_tx_slots_avail(tp),
         4388                                               R8169_TX_START_THRS);
         4389                 /*
         4390                  * 8168 hack: TxPoll requests are lost when the Tx packets are
         4391                  * too close. Let's kick an extra TxPoll request when a burst
         4392                  * of start_xmit activity is detected (if it is not detected,
         4393                  * it is slow enough). -- FR
         4394                  * If skb is NULL then we come here again once a tx irq is
         4395                  * triggered after the last fragment is marked transmitted.
         4396                  */
       → 4397                 if (tp->cur_tx != dirty_tx && skb)
         4398                         rtl8169_doorbell(tp);
         4399         }
         4400 }
      
      As the code shows, an earlier-detected data race on tp->cur_tx was fixed at
      line 4363:
      
         4363         while (READ_ONCE(tp->cur_tx) != dirty_tx) {
      
      but the same solution is required for protecting the other access to tp->cur_tx:
      
       → 4397                 if (READ_ONCE(tp->cur_tx) != dirty_tx && skb)
         4398                         rtl8169_doorbell(tp);
      
      The write at line 4254 is protected with WRITE_ONCE(), but the read at line 4397
      might suffer read tearing under some compiler optimisations.
      
      The fix eliminated the KCSAN data-race report for this bug.
      
      It is yet to be evaluated what happens if tp->cur_tx changes between the test at line 4363
      and the one at line 4397. The compiler should certainly not cache this value in a register
      for that long, since asynchronous writes to tp->cur_tx at line 4254 might have occurred
      in the meantime.
      
      Fixes: 94d8a98e ("r8169: reduce number of workaround doorbell rings")
      Cc: Heiner Kallweit <hkallweit1@gmail.com>
      Cc: nic_swsd@realtek.com
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Marco Elver <elver@google.com>
      Cc: netdev@vger.kernel.org
      Link: https://lore.kernel.org/lkml/dc7fc8fa-4ea4-e9a9-30a6-7c83e6b53188@alu.unizg.hr/
      Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
      Acked-by: Marco Elver <elver@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
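Because `READ_ONCE()` performs a fresh load at each use, the doorbell check at line 4397 can observe a `cur_tx` that `rtl8169_start_xmit()` advanced after the loop condition at line 4363 was last evaluated. The single-threaded sketch below models that interleaving explicitly; `struct tx_state`, `producer_xmit()` and `need_extra_doorbell()` are hypothetical, the macros are simplified userspace stand-ins, and the sketch does not settle the open question raised at the end of the commit message.

```c
#include <stdbool.h>

#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

struct tx_state {
	unsigned int cur_tx; /* free-running producer counter */
};

/* Producer side, modeled on line 4254: publish with WRITE_ONCE(). */
static void producer_xmit(struct tx_state *tp, unsigned int frags)
{
	WRITE_ONCE(tp->cur_tx, tp->cur_tx + frags + 1);
}

/* Consumer side, modeled on line 4397: reload cur_tx, no cached copy. */
static bool need_extra_doorbell(struct tx_state *tp, unsigned int dirty_tx,
				bool last_skb_present)
{
	return READ_ONCE(tp->cur_tx) != dirty_tx && last_skb_present;
}
```

In the test below, the loop has drained everything (`dirty_tx == cur_tx`), then a transmit advances `cur_tx` from 0x002f4815 to 0x002f4816, as in the KCSAN report, and the later check sees the new value.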
    • i40e: xsk: remove count_mask · 913eda2b
      Maciej Fijalkowski authored
      The cited commit introduced a neat way of updating next_to_clean that does
      not require boundary checks on each increment. This was done by masking
      the new value with a (ring length - 1) mask. The problem is that this is
      only valid for power-of-2 ring sizes; for any other size this
      assumption cannot be made. In turn, it leads to cleaning descriptors
      out of order, as well as splats:
      
      [ 1388.411915] Workqueue: events xp_release_deferred
      [ 1388.411919] RIP: 0010:xp_free+0x1a/0x50
      [ 1388.411921] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 55 48 8b 57 70 48 8d 47 70 48 89 e5 48 39 d0 74 06 <5d> c3 cc cc cc cc 48 8b 57 60 83 82 b8 00 00 00 01 48 8b 57 60 48
      [ 1388.411922] RSP: 0018:ffa0000000a83cb0 EFLAGS: 00000206
      [ 1388.411923] RAX: ff11000119aa5030 RBX: 000000000000001d RCX: ff110001129b6e50
      [ 1388.411924] RDX: ff11000119aa4fa0 RSI: 0000000055555554 RDI: ff11000119aa4fc0
      [ 1388.411925] RBP: ffa0000000a83cb0 R08: 0000000000000000 R09: 0000000000000000
      [ 1388.411926] R10: 0000000000000001 R11: 0000000000000000 R12: ff11000115829b80
      [ 1388.411927] R13: 000000000000005f R14: 0000000000000000 R15: ff11000119aa4fc0
      [ 1388.411928] FS:  0000000000000000(0000) GS:ff11000277e00000(0000) knlGS:0000000000000000
      [ 1388.411929] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1388.411930] CR2: 00007f1f564e6c14 CR3: 000000000783c005 CR4: 0000000000771ef0
      [ 1388.411931] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 1388.411931] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 1388.411932] PKRU: 55555554
      [ 1388.411933] Call Trace:
      [ 1388.411934]  <IRQ>
      [ 1388.411935]  ? show_regs+0x6e/0x80
      [ 1388.411937]  ? watchdog_timer_fn+0x1d2/0x240
      [ 1388.411939]  ? __pfx_watchdog_timer_fn+0x10/0x10
      [ 1388.411941]  ? __hrtimer_run_queues+0x10e/0x290
      [ 1388.411945]  ? clockevents_program_event+0xae/0x130
      [ 1388.411947]  ? hrtimer_interrupt+0x105/0x240
      [ 1388.411949]  ? __sysvec_apic_timer_interrupt+0x54/0x150
      [ 1388.411952]  ? sysvec_apic_timer_interrupt+0x7f/0x90
      [ 1388.411955]  </IRQ>
      [ 1388.411955]  <TASK>
      [ 1388.411956]  ? asm_sysvec_apic_timer_interrupt+0x1f/0x30
      [ 1388.411958]  ? xp_free+0x1a/0x50
      [ 1388.411960]  i40e_xsk_clean_rx_ring+0x5d/0x100 [i40e]
      [ 1388.411968]  i40e_clean_rx_ring+0x14c/0x170 [i40e]
      [ 1388.411977]  i40e_queue_pair_disable+0xda/0x260 [i40e]
      [ 1388.411986]  i40e_xsk_pool_setup+0x192/0x1d0 [i40e]
      [ 1388.411993]  i40e_reconfig_rss_queues+0x1f0/0x1450 [i40e]
      [ 1388.412002]  xp_disable_drv_zc+0x73/0xf0
      [ 1388.412004]  ? mutex_lock+0x17/0x50
      [ 1388.412007]  xp_release_deferred+0x2b/0xc0
      [ 1388.412010]  process_one_work+0x178/0x350
      [ 1388.412011]  ? __pfx_worker_thread+0x10/0x10
      [ 1388.412012]  worker_thread+0x2f7/0x420
      [ 1388.412014]  ? __pfx_worker_thread+0x10/0x10
      [ 1388.412015]  kthread+0xf8/0x130
      [ 1388.412017]  ? __pfx_kthread+0x10/0x10
      [ 1388.412019]  ret_from_fork+0x3d/0x60
      [ 1388.412021]  ? __pfx_kthread+0x10/0x10
      [ 1388.412023]  ret_from_fork_asm+0x1b/0x30
      [ 1388.412026]  </TASK>
      
      It comes from picking wrong ring entries when cleaning xsk buffers
      during pool detach.
      
      Remove the count_mask logic and use a boundary check when updating
      next_to_process (which used to be next_to_clean).
      
      Fixes: c8a8ca34 ("i40e: remove unnecessary memory writes of the next to clean pointer")
      Reported-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
      Tested-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Link: https://lore.kernel.org/r/20231018163908.40841-1-maciej.fijalkowski@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
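The i40e xsk failure mode is plain integer arithmetic: masking with `count - 1` acts as a modulo only when `count` is a power of two. Below is a sketch of both ways of advancing the index; `next_masked()` and `next_checked()` are hypothetical helper names, not the i40e functions.

```c
#include <stdint.h>

/* Removed approach: correct only when count is a power of two. */
static uint32_t next_masked(uint32_t ntp, uint32_t count)
{
	uint32_t count_mask = count - 1;

	return (ntp + 1) & count_mask;
}

/* Restored approach: an explicit boundary check works for any size. */
static uint32_t next_checked(uint32_t ntp, uint32_t count)
{
	ntp++;
	if (ntp == count)
		ntp = 0;
	return ntp;
}
```

For a 6-entry ring, `count - 1` is 0b101. Advancing from index 1 therefore yields 0 and advancing from index 5 yields 4, so entries are skipped or revisited. That matches the out-of-order cleaning described in the commit.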
  3. 19 Oct, 2023 26 commits
    • Merge tag 'net-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · ce55c22e
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from bluetooth, netfilter, WiFi.
      
        Feels like an uptick in regression fixes, mostly for older releases.
        The hfsc fix, tcp_disconnect() and Intel WWAN fixes stand out as
        fairly clear-cut user reported regressions. The mlx5 DMA bug was
        causing strife for s390x folks. The fixes themselves are not
        particularly scary, tho. No open investigations / outstanding reports
        at the time of writing.
      
        Current release - regressions:
      
         - eth: mlx5: perform DMA operations in the right locations, make
           devices usable on s390x, again
      
         - sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner
           curve, previous fix of rejecting invalid config broke some scripts
      
         - rfkill: reduce data->mtx scope in rfkill_fop_open, avoid deadlock
      
         - revert "ethtool: Fix mod state of verbose no_mask bitset", needs
           more work
      
        Current release - new code bugs:
      
         - tcp: fix listen() warning with v4-mapped-v6 address
      
        Previous releases - regressions:
      
         - tcp: allow tcp_disconnect() again when threads are waiting, it was
           denied to plug a constant source of bugs but turns out .NET depends
           on it
      
         - eth: mlx5: fix double-free if buffer refill fails under OOM
      
         - revert "net: wwan: iosm: enable runtime pm support for 7560", it's
           causing regressions and the WWAN team at Intel disappeared
      
         - tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a
           single skb, fix single-stream perf regression on some devices
      
        Previous releases - always broken:
      
         - Bluetooth:
            - fix issues in legacy BR/EDR PIN code pairing
            - correctly bounds check and pad HCI_MON_NEW_INDEX name
      
         - netfilter:
            - more fixes / follow ups for the large "commit protocol" rework,
              which went in as a fix to 6.5
            - fix null-derefs on netlink attrs which user may not pass in
      
         - tcp: fix excessive TLP and RACK timeouts from HZ rounding (bless
           Debian for keeping HZ=250 alive)
      
         - net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation, prevent
           letting frankenstein UDP super-frames from getting into the stack
      
         - net: fix interface altnames when ifc moves to a new namespace
      
         - eth: qed: fix the size of the RX buffers
      
         - mptcp: avoid sending RST when closing the initial subflow"
      
      * tag 'net-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits)
        Revert "ethtool: Fix mod state of verbose no_mask bitset"
        selftests: mptcp: join: no RST when rm subflow/addr
        mptcp: avoid sending RST when closing the initial subflow
        mptcp: more conservative check for zero probes
        tcp: check mptcp-level constraints for backlog coalescing
        selftests: mptcp: join: correctly check for no RST
        net: ti: icssg-prueth: Fix r30 CMDs bitmasks
        selftests: net: add very basic test for netdev names and namespaces
        net: move altnames together with the netdevice
        net: avoid UAF on deleted altname
        net: check for altname conflicts when changing netdev's netns
        net: fix ifname in netlink ntf during netns move
        net: ethernet: ti: Fix mixed module-builtin object
        net: phy: bcm7xxx: Add missing 16nm EPHY statistics
        ipv4: fib: annotate races around nh->nh_saddr_genid and nh->nh_saddr
        tcp_bpf: properly release resources on error paths
        net/sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner curve
        net: mdio-mux: fix C45 access returning -EIO after API change
        tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a single skb
        octeon_ep: update BQL sent bytes before ringing doorbell
        ...
    • Merge tag 'loongarch-fixes-6.6-3' of... · 74e9347e
      Linus Torvalds authored
      Merge tag 'loongarch-fixes-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
      
      Pull LoongArch fixes from Huacai Chen:
       "Fix 4-level pagetable building, disable WUC for pgprot_writecombine()
        like ioremap_wc(), use correct annotation for exception handlers, and
        a trivial cleanup"
      
      * tag 'loongarch-fixes-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
        LoongArch: Disable WUC for pgprot_writecombine() like ioremap_wc()
        LoongArch: Replace kmap_atomic() with kmap_local_page() in copy_user_highpage()
        LoongArch: Export symbol invalid_pud_table for modules building
        LoongArch: Use SYM_CODE_* to annotate exception handlers
    • Merge tag 'slab-fixes-for-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab · 54fb58ae
      Linus Torvalds authored
      Pull slab fix from Vlastimil Babka:
      
       - stable fix to prevent kernel warnings with KASAN_HW_TAGS on arm64
         due to improperly resolved kmalloc alignment restrictions (Catalin
         Marinas)
      
      * tag 'slab-fixes-for-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
        mm: slab: Do not create kmalloc caches smaller than arch_slab_minalign()
    • Merge tag 'seccomp-v6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 189b7562
      Linus Torvalds authored
      Pull seccomp fix from Kees Cook:
      
       - Fix seccomp_unotify perf benchmark for 32-bit (Jiri Slaby)
      
      * tag 'seccomp-v6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        perf/benchmark: fix seccomp_unotify benchmark for 32-bit
    • Merge tag 'v6.6-rc7.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · ea1cc20c
      Linus Torvalds authored
      Pull vfs fix from Christian Brauner:
       "An openat() call from io_uring triggering an audit call can apparently
        cause the refcount of struct filename to be incremented from multiple
        threads concurrently during async execution, triggering a refcount
        underflow and hitting a BUG_ON(). That bug has been lurking around
        since at least v5.16 apparently.
      
        Switch to an atomic counter to fix that. The underflow check is
        downgraded from a BUG_ON() to a WARN_ON_ONCE() but we could easily
        remove that check altogether tbh"
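
      A rough userspace sketch of the switch to an atomic counter follows. It
      assumes C11 <stdatomic.h> as a stand-in for the kernel's atomic_t, and
      struct filename_sketch, getname_ref() and putname_ref() are hypothetical
      names, not the fs/namei.c code. Each increment and decrement is a single
      atomic read-modify-write, so concurrent io_uring workers cannot lose
      updates, and an underflow only warns once, in the spirit of WARN_ON_ONCE().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct filename; only the refcount matters. */
struct filename_sketch {
	atomic_int refcnt;
};

/* Emulates the "once" part of WARN_ON_ONCE(). */
static atomic_flag underflow_warned = ATOMIC_FLAG_INIT;

/* Take a reference: one atomic increment, safe against concurrent callers. */
static void getname_ref(struct filename_sketch *name)
{
	atomic_fetch_add(&name->refcnt, 1);
}

/* Drop a reference; returns true if the caller dropped the last one and
 * should free the object. */
static bool putname_ref(struct filename_sketch *name)
{
	/* Single atomic read-modify-write: no lost updates between threads. */
	int old = atomic_fetch_sub(&name->refcnt, 1);

	if (old <= 0) {
		/* Underflow: warn once instead of halting like BUG_ON(). */
		if (!atomic_flag_test_and_set(&underflow_warned))
			fprintf(stderr, "WARNING: filename refcount underflow\n");
		return false;
	}
	return old == 1;
}
```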
      
      * tag 'v6.6-rc7.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        audit,io_uring: io_uring openat triggers audit reference count underflow
    • Kory Maincent
      Revert "ethtool: Fix mod state of verbose no_mask bitset" · 52451502
      Kory Maincent authored
      This reverts commit 108a36d0.
      
      It was reported that this fix breaks the possibility to remove existing WoL
      flags. For example:
      ~$ ethtool lan2
      ...
              Supports Wake-on: pg
              Wake-on: d
      ...
      ~$ ethtool -s lan2 wol gp
      ~$ ethtool lan2
      ...
              Wake-on: pg
      ...
      ~$ ethtool -s lan2 wol d
      ~$ ethtool lan2
      ...
              Wake-on: pg
      ...
      
      This worked correctly before this commit because we were always updating
      a zero bitmap (since commit 66991703 ("ethtool: fix application of
      verbose no_mask bitset"), that is) so that the rest was left zero
      naturally. But now the 1->0 change (old_val is true, bit not present in
      netlink nest) no longer works.
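
      To illustrate the 1->0 problem, here is a simplified C model under stated
      assumptions: the bitmap is a plain uint32_t, SK_WAKE_PHY and SK_WAKE_MAGIC
      are illustrative values for this sketch only, and apply_no_mask() and
      apply_no_mask_reverted() are hypothetical helpers, not the
      ethnl_update_bitset() code. With "wol d" the request lists no bits, so
      only the zero-based update clears the old "pg" flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values for this sketch only. */
#define SK_WAKE_PHY	(1u << 0)	/* "p" */
#define SK_WAKE_MAGIC	(1u << 1)	/* "g" */

/* Behaviour kept by the revert: start from a zero bitmap and set only the
 * bits listed in the request, so every unlisted bit ends up cleared. */
static uint32_t apply_no_mask(uint32_t old, uint32_t listed, bool *mod)
{
	uint32_t new_val = listed;

	*mod = (new_val != old);
	return new_val;
}

/* Simplified model of the reverted behaviour: start from the old value, so
 * a bit that was set but is absent from the request (the 1->0 transition)
 * is never cleared. */
static uint32_t apply_no_mask_reverted(uint32_t old, uint32_t listed, bool *mod)
{
	uint32_t new_val = old | listed;

	*mod = (new_val != old);
	return new_val;
}
```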

      Reported-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Reported-by: Michal Kubecek <mkubecek@suse.cz>
      Closes: https://lore.kernel.org/netdev/20231019095140.l6fffnszraeb6iiw@lion.mk-sys.cz/
      Cc: stable@vger.kernel.org
      Fixes: 108a36d0 ("ethtool: Fix mod state of verbose no_mask bitset")
      Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
      Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
      Link: https://lore.kernel.org/r/20231019-feature_ptp_bitset_fix-v1-1-70f3c429a221@bootlin.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Linus Torvalds
      Merge tag 'ntfs3_for_6.6' of https://github.com/Paragon-Software-Group/linux-ntfs3 · f69d00d1
      Linus Torvalds authored
      Pull ntfs3 fixes from Konstantin Komarov:
      
       - memory leak
      
       - some logic errors, NULL dereferences
      
       - some code was refactored
      
       - more sanity checks
      
      * tag 'ntfs3_for_6.6' of https://github.com/Paragon-Software-Group/linux-ntfs3:
        fs/ntfs3: Avoid possible memory leak
        fs/ntfs3: Fix directory element type detection
        fs/ntfs3: Fix possible null-pointer dereference in hdr_find_e()
        fs/ntfs3: Fix OOB read in ntfs_init_from_boot
        fs/ntfs3: fix panic about slab-out-of-bounds caused by ntfs_list_ea()
        fs/ntfs3: Fix NULL pointer dereference on error in attr_allocate_frame()
        fs/ntfs3: Fix possible NULL-ptr-deref in ni_readpage_cmpr()
        fs/ntfs3: Do not allow to change label if volume is read-only
        fs/ntfs3: Add more info into /proc/fs/ntfs3/<dev>/volinfo
        fs/ntfs3: Refactoring and comments
        fs/ntfs3: Fix alternative boot searching
        fs/ntfs3: Allow repeated call to ntfs3_put_sbi
        fs/ntfs3: Use inode_set_ctime_to_ts instead of inode_set_ctime
        fs/ntfs3: Fix shift-out-of-bounds in ntfs_fill_super
        fs/ntfs3: fix deadlock in mark_as_free_ex
        fs/ntfs3: Add more attributes checks in mi_enum_attr()
        fs/ntfs3: Use kvmalloc instead of kmalloc(... __GFP_NOWARN)
        fs/ntfs3: Write immediately updated ntfs state
        fs/ntfs3: Add ckeck in ni_update_parent()
    • Jakub Kicinski
      Merge branch 'mptcp-fixes-for-v6-6' · 1c1f14f9
      Jakub Kicinski authored
      Mat Martineau says:
      
      ====================
      mptcp: Fixes for v6.6
      
      Patch 1 corrects the logic for MP_JOIN tests where 0 RSTs are expected.
      
      Patch 2 ensures MPTCP packets are not incorrectly coalesced in the TCP
      backlog queue.
      
      Patch 3 avoids a zero-window probe and associated WARN_ON_ONCE() in an
      expected MPTCP reinjection scenario.
      
      Patches 4 & 5 allow an initial MPTCP subflow to be closed cleanly
      instead of always sending RST. Associated selftest is updated.
      ====================
      
      Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-0-17ecb002e41d@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Matthieu Baerts
      selftests: mptcp: join: no RST when rm subflow/addr · 2cfaa8b3
      Matthieu Baerts authored
      Recently, we noticed that some RSTs were wrongly generated when removing
      the initial subflow.

      This patch makes sure RSTs are not sent when removing any subflows or any
      addresses.
      
      Fixes: c2b2ae39 ("mptcp: handle correctly disconnect() failures")
      Cc: stable@vger.kernel.org
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
      Signed-off-by: Mat Martineau <martineau@kernel.org>
      Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-5-17ecb002e41d@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Geliang Tang
      mptcp: avoid sending RST when closing the initial subflow · 14c56686
      Geliang Tang authored
      When closing the first subflow, the MPTCP protocol unconditionally
      calls tcp_disconnect(), which in turn generates a reset if the subflow
      is established.
      
      That is unexpected and different from what MPTCP does with MPJ
      subflows, where resets are generated only on FASTCLOSE and other edge
      scenarios.
      
      We can't reuse the MPJ subflow code for the first subflow: MPTCP
      cleans MPJ subflows up completely via a tcp_close() call, while it
      must keep the first subflow socket alive for later reuse, due to
      implementation constraints.

      This patch adds a new helper __mptcp_subflow_disconnect() that
      encapsulates logic similar to tcp_close(), issuing a reset only when
      the MPTCP_CF_FASTCLOSE flag is set, and performing a clean shutdown
      otherwise.
      
      Fixes: c2b2ae39 ("mptcp: handle correctly disconnect() failures")
      Cc: stable@vger.kernel.org
      Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
      Co-developed-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Geliang Tang <geliang.tang@suse.com>
      Signed-off-by: Mat Martineau <martineau@kernel.org>
      Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-4-17ecb002e41d@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Paolo Abeni
      mptcp: more conservative check for zero probes · 72377ab2
      Paolo Abeni authored
      Christoph reported that the MPTCP protocol can find the subflow-level
      write queue unexpectedly not empty while crafting a zero-window probe,
      hitting a warning:
      
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 188 at net/mptcp/protocol.c:1312 mptcp_sendmsg_frag+0xc06/0xe70
      Modules linked in:
      CPU: 0 PID: 188 Comm: kworker/0:2 Not tainted 6.6.0-rc2-g1176aa719d7a #47
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
      Workqueue: events mptcp_worker
      RIP: 0010:mptcp_sendmsg_frag+0xc06/0xe70 net/mptcp/protocol.c:1312
      RAX: 47d0530de347ff6a RBX: 47d0530de347ff6b RCX: ffff8881015d3c00
      RDX: ffff8881015d3c00 RSI: 47d0530de347ff6b RDI: 47d0530de347ff6b
      RBP: 47d0530de347ff6b R08: ffffffff8243c6a8 R09: ffffffff82042d9c
      R10: 0000000000000002 R11: ffffffff82056850 R12: ffff88812a13d580
      R13: 0000000000000001 R14: ffff88812b375e50 R15: ffff88812bbf3200
      FS:  0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000695118 CR3: 0000000115dfc001 CR4: 0000000000170ef0
      Call Trace:
       <TASK>
       __subflow_push_pending+0xa4/0x420 net/mptcp/protocol.c:1545
       __mptcp_push_pending+0x128/0x3b0 net/mptcp/protocol.c:1614
       mptcp_release_cb+0x218/0x5b0 net/mptcp/protocol.c:3391
       release_sock+0xf6/0x100 net/core/sock.c:3521
       mptcp_worker+0x6e8/0x8f0 net/mptcp/protocol.c:2746
       process_scheduled_works+0x341/0x690 kernel/workqueue.c:2630
       worker_thread+0x3a7/0x610 kernel/workqueue.c:2784
       kthread+0x143/0x180 kernel/kthread.c:388
       ret_from_fork+0x4d/0x60 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:304
       </TASK>
      
      The root cause of the issue is that expectations are wrong: e.g. due
      to MPTCP-level re-injection we can hit the critical condition.
      
      Explicitly avoid the zero-window probe when the subflow write queue
      is not empty and drop the related warnings.

      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/444
      Fixes: f70cad10 ("mptcp: stop relying on tcp_tx_skb_cache")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <martineau@kernel.org>
      Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-3-17ecb002e41d@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Paolo Abeni
      tcp: check mptcp-level constraints for backlog coalescing · 6db8a37d
      Paolo Abeni authored
      The MPTCP protocol can acquire the subflow-level socket lock and
      thereby cause the TCP backlog to be used. When inserting new skbs into the
      backlog, the stack will try to coalesce them.
      
      Currently, we have no check in place to ensure that such coalescing
      will respect the MPTCP-level DSS, and that may cause data stream
      corruption, as reported by Christoph.
      
      Address the issue by adding the relevant admission check for coalescing
      in tcp_add_backlog().
      
      Note the issue is not easy to reproduce, as the MPTCP protocol tries
      hard to avoid acquiring the subflow-level socket lock.
      
      Fixes: 648ef4b8 ("mptcp: Implement MPTCP receive path")
      Cc: stable@vger.kernel.org
      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/420
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <martineau@kernel.org>
      Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-2-17ecb002e41d@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Matthieu Baerts
      selftests: mptcp: join: correctly check for no RST · b134a580
      Matthieu Baerts authored
      The commit mentioned below was more tolerant with the number of RST seen
      during a test because in some uncontrollable situations, multiple RST
      can be generated.
      
      But it was not taking into account the case where no RST are expected:
      this validation was then no longer reporting issues for the 0 RST case
      because it is not possible to have less than 0 RST in the counter. This
      patch fixes the issue by adding a specific condition.
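
      The selftest itself is a shell script (mptcp_join.sh); the C model below
      only sketches the counting logic, and rst_check_tolerant() and
      rst_check_fixed() are hypothetical names. A ">=" comparison tolerates
      extra RSTs but can never fail when 0 are expected, which is why the fix
      special-cases that value.

```c
#include <stdbool.h>

/* Tolerant check: more RSTs than expected are accepted, since some
 * situations can generate extra ones. With expected == 0 it can never
 * fail, because the counter cannot go below 0. */
static bool rst_check_tolerant(int count, int expected)
{
	return count >= expected;
}

/* Fixed check: when no RST is expected, any RST is an error. */
static bool rst_check_fixed(int count, int expected)
{
	if (expected == 0)
		return count == 0;
	return count >= expected;
}
```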
      
      Fixes: 6bf41020 ("selftests: mptcp: update and extend fastclose test-cases")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
      Signed-off-by: Mat Martineau <martineau@kernel.org>
      Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-1-17ecb002e41d@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • MD Danish Anwar
      net: ti: icssg-prueth: Fix r30 CMDs bitmasks · 389db4fd
      MD Danish Anwar authored
      The bitmasks for EMAC_PORT_DISABLE and EMAC_PORT_FORWARD r30 commands are
      wrong in the driver.
      
      Update the bitmasks of these commands to the correct ones as used by the
      ICSSG firmware. These bitmasks are backwards compatible and work with
      any ICSSG firmware version.
      
      Fixes: e9b4ece7 ("net: ti: icssg-prueth: Add Firmware config and classification APIs.")
      Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20231018150715.3085380-1-danishanwar@ti.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Linus Torvalds
      Merge tag 'for-6.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 7cf4bea7
      Linus Torvalds authored
      Pull btrfs fix from David Sterba:
       "Fix a bug in chunk size decision that could lead to suboptimal
        placement and filling patterns"
      
      * tag 'for-6.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: fix stripe length calculation for non-zoned data chunk allocation
    • Paolo Abeni
      Merge branch 'net-fix-bugs-in-device-netns-move-and-rename' · f7d86df4
      Paolo Abeni authored
      Jakub Kicinski says:
      
      ====================
      net: fix bugs in device netns-move and rename
      
      Daniel reported issues with the uevents generated during netdev
      namespace move, if the netdev is getting renamed at the same time.
      
      While the issue that he actually cares about is not fixed here,
      there is a bunch of seemingly obvious other bugs in this code.
      Fix the purely networking bugs while the discussion around
      the uevent fix is still ongoing.
      ====================
      
      Link: https://lore.kernel.org/r/20231018013817.2391509-1-kuba@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Jakub Kicinski
      selftests: net: add very basic test for netdev names and namespaces · 3920431d
      Jakub Kicinski authored
      Add selftest for fixes around naming netdevs and namespaces.

      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Jakub Kicinski
      net: move altnames together with the netdevice · 8e15aee6
      Jakub Kicinski authored
      The altname nodes are currently not moved to the new netns
      when netdevice itself moves:
      
        [ ~]# ip netns add test
        [ ~]# ip -netns test link add name eth0 type dummy
        [ ~]# ip -netns test link property add dev eth0 altname some-name
        [ ~]# ip -netns test link show dev some-name
        2: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
            link/ether 1e:67:ed:19:3d:24 brd ff:ff:ff:ff:ff:ff
            altname some-name
        [ ~]# ip -netns test link set dev eth0 netns 1
        [ ~]# ip link
        ...
        3: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
            link/ether 02:40:88:62:ec:b8 brd ff:ff:ff:ff:ff:ff
            altname some-name
        [ ~]# ip li show dev some-name
        Device "some-name" does not exist.
      
      Remove them from the hash table when device is unlisted
      and add back when listed again.
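
      A minimal model of the fix, assuming the per-netns name hash table can be
      flattened into an array of (netns, name) entries; every type and helper
      here is hypothetical. It shows the altnames being hashed and unhashed
      together with the real name, so after the move "some-name" resolves in
      the new namespace only.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAMES 16
#define NAME_LEN 16

/* Hypothetical name table, flattened to an array of (netns, name) pairs. */
struct name_entry {
	char name[NAME_LEN];
	int netns;
	bool used;
};

static struct name_entry names[MAX_NAMES];

static bool name_lookup(int netns, const char *name)
{
	for (int i = 0; i < MAX_NAMES; i++)
		if (names[i].used && names[i].netns == netns &&
		    strcmp(names[i].name, name) == 0)
			return true;
	return false;
}

static void name_add(int netns, const char *name)
{
	for (int i = 0; i < MAX_NAMES; i++) {
		if (!names[i].used) {
			snprintf(names[i].name, NAME_LEN, "%s", name);
			names[i].netns = netns;
			names[i].used = true;
			return;
		}
	}
}

static void name_del(int netns, const char *name)
{
	for (int i = 0; i < MAX_NAMES; i++)
		if (names[i].used && names[i].netns == netns &&
		    strcmp(names[i].name, name) == 0)
			names[i].used = false;
}

/* Hypothetical device: a real name plus a few altnames. */
struct dev_sketch {
	const char *name;
	const char *altnames[4];
	int n_alt;
	int netns;
};

/* "List": hash the real name and every altname into the device's netns. */
static void list_dev_names(struct dev_sketch *dev)
{
	name_add(dev->netns, dev->name);
	for (int i = 0; i < dev->n_alt; i++)
		name_add(dev->netns, dev->altnames[i]);
}

/* "Unlist": remove the real name and every altname from the old netns. */
static void unlist_dev_names(struct dev_sketch *dev)
{
	name_del(dev->netns, dev->name);
	for (int i = 0; i < dev->n_alt; i++)
		name_del(dev->netns, dev->altnames[i]);
}

static void move_netns(struct dev_sketch *dev, int new_netns)
{
	unlist_dev_names(dev);
	dev->netns = new_netns;
	list_dev_names(dev);
}
```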
      
      Fixes: 36fbf1e5 ("net: rtnetlink: add linkprop commands to add and delete alternative ifnames")
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Jakub Kicinski
      net: avoid UAF on deleted altname · 1a83f4a7
      Jakub Kicinski authored
      Altnames are accessed under RCU (dev_get_by_name_rcu())
      but freed by kfree() with no synchronization point.
      
      Each node has one or two allocations (node and a variable-size
      name, sometimes the name is netdev->name). Adding rcu_heads
      here is a bit tedious. Besides, most code that unlists the names
      already has RCU barriers, so take the simpler approach of adding
      synchronize_rcu(). Note that the one on the unregistration path
      (which matters more) is removed by the next fix.
      
      Fixes: ff927412 ("net: introduce name_node struct to be used in hashlist")
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Jakub Kicinski
      net: check for altname conflicts when changing netdev's netns · 7663d522
      Jakub Kicinski authored
      It's currently possible to create an altname conflicting
      with an altname or real name of another device by creating
      it in another netns and moving it over:
      
       [ ~]$ ip link add dev eth0 type dummy
      
       [ ~]$ ip netns add test
       [ ~]$ ip -netns test link add dev ethX netns test type dummy
       [ ~]$ ip -netns test link property add dev ethX altname eth0
       [ ~]$ ip -netns test link set dev ethX netns 1
      
       [ ~]$ ip link
       ...
       3: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
           link/ether 02:40:88:62:ec:b8 brd ff:ff:ff:ff:ff:ff
       ...
       5: ethX: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
           link/ether 26:b7:28:78:38:0f brd ff:ff:ff:ff:ff:ff
           altname eth0
      
      Create a macro for walking the altnames; this hopefully makes
      it clearer that the list we walk contains only altnames,
      which is otherwise not entirely intuitive.
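
      A sketch of the check, assuming a device is just a real name plus a
      NULL-terminated altname array; for_each_altname(), name_taken() and
      can_move() are hypothetical and only mirror the idea of a macro that
      walks altnames exclusively. Using the example above, ethX cannot move
      into a namespace where eth0 already exists.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device model: a real name plus a NULL-terminated altname list. */
struct dev_names {
	const char *name;
	const char *const *altnames;
};

/* Walk only the altnames, never the real name. */
#define for_each_altname(dev, alt) \
	for (const char *const *alt = (dev)->altnames; alt && *alt; alt++)

/* Return true if @name clashes with a real name or an altname in @ns. */
static bool name_taken(const struct dev_names *ns, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(ns[i].name, name) == 0)
			return true;
		for_each_altname(&ns[i], alt)
			if (strcmp(*alt, name) == 0)
				return true;
	}
	return false;
}

/* Before moving @dev into @ns, reject it if its name or any altname is taken. */
static bool can_move(const struct dev_names *dev,
		     const struct dev_names *ns, size_t n)
{
	if (name_taken(ns, n, dev->name))
		return false;
	for_each_altname(dev, alt)
		if (name_taken(ns, n, *alt))
			return false;
	return true;
}
```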
      
      Fixes: 36fbf1e5 ("net: rtnetlink: add linkprop commands to add and delete alternative ifnames")
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Jakub Kicinski
      net: fix ifname in netlink ntf during netns move · 311cca40
      Jakub Kicinski authored
      dev_get_valid_name() overwrites the netdev's name on success.
      This makes it hard to use in prepare-commit-like fashion,
      where we do validation first, and "commit" to the change
      later.
      
      Factor out a helper which lets us save the new name to a buffer.
      Use it to fix the problem of notification on netns move having
      incorrect name:
      
       5: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
           link/ether be:4d:58:f9:d5:40 brd ff:ff:ff:ff:ff:ff
       6: eth1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
           link/ether 1e:4a:34:36:e3:cd brd ff:ff:ff:ff:ff:ff
      
       [ ~]# ip link set dev eth0 netns 1 name eth1
      
      ip monitor inside netns:
       Deleted inet eth0
       Deleted inet6 eth0
       Deleted 5: eth1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
           link/ether be:4d:58:f9:d5:40 brd ff:ff:ff:ff:ff:ff new-netnsid 0 new-ifindex 7
      
      Name is reported as eth1 in old netns for ifindex 5, already renamed.
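
      The prepare/commit split can be sketched as below, as a model built on
      stated assumptions: prep_valid_name() and move_and_rename() are
      hypothetical, and only a subset of the kernel's interface-name checks is
      modelled. Validation writes into a local buffer, so the "Deleted"
      notification can still carry the old name (eth0) before the new one is
      committed.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SK_IFNAMSIZ 16

struct netdev_sketch {
	char name[SK_IFNAMSIZ];
};

/* Validate @want and copy it to @out; never touches the device itself.
 * Only a subset of the kernel's name checks is modelled here. */
static int prep_valid_name(const char *want, char *out, size_t outsz)
{
	size_t len = strlen(want);

	if (len == 0 || len >= outsz)
		return -1;
	if (strcmp(want, ".") == 0 || strcmp(want, "..") == 0)
		return -1;
	for (const char *p = want; *p; p++)
		if (*p == '/' || *p == ':' || isspace((unsigned char)*p))
			return -1;
	memcpy(out, want, len + 1);
	return 0;
}

/* Prepare, notify with the old name, then commit the new name. */
static int move_and_rename(struct netdev_sketch *dev, const char *want,
			   char *ntf_name, size_t ntf_size)
{
	char new_name[SK_IFNAMSIZ];

	if (prep_valid_name(want, new_name, sizeof(new_name)))
		return -1;	/* validation failed: nothing changed */
	/* The "Deleted" notification is built before the rename. */
	snprintf(ntf_name, ntf_size, "%s", dev->name);
	/* Commit. */
	snprintf(dev->name, sizeof(dev->name), "%s", new_name);
	return 0;
}
```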
      
      Fixes: d9031024 ("net: device name allocation cleanups")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • MD Danish Anwar
      net: ethernet: ti: Fix mixed module-builtin object · a602ee31
      MD Danish Anwar authored
      With CONFIG_TI_K3_AM65_CPSW_NUSS=y and CONFIG_TI_ICSSG_PRUETH=m,
      k3-cppi-desc-pool.o is linked to a module and also to vmlinux even though
      the expected CFLAGS are different between builtins and modules.
      
      The build system is complaining about the following:
      
      k3-cppi-desc-pool.o is added to multiple modules: icssg-prueth
      ti-am65-cpsw-nuss
      
      Introduce the new module, k3-cppi-desc-pool, to provide the common
      functions to ti-am65-cpsw-nuss and icssg-prueth.
      
      Fixes: 128d5874 ("net: ti: icssg-prueth: Add ICSSG ethernet driver")
      Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
      Link: https://lore.kernel.org/r/20231018064936.3146846-1-danishanwar@ti.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Jakub Kicinski
      Merge tag 'nf-23-10-18' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 9b9ac46c
      Jakub Kicinski authored
      Florian Westphal says:
      
      ====================
      netfilter: updates for net
      
      First patch, from Phil Sutter, reduces number of audit notifications
      when userspace requests to re-set stateful objects.
      This change also comes with a selftest update.
      
      Second patch, also from Phil, moves the nftables audit selftest
      to its own netns to avoid interference with the init netns.
      
      Third patch, from Pablo Neira, fixes an inconsistency with the "rbtree"
      set backend: When set element X has expired, a request to delete element
      X should fail (like with all other backends).
      
      Finally, patch four, also from Pablo, reverts a recent attempt to speed
      up abort of a large pending update with the "pipapo" set backend.
      
      It could cause stray references to remain in the set, which then
      results in a double-free.
      
      * tag 'nf-23-10-18' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nf_tables: revert do not remove elements if set backend implements .abort
        netfilter: nft_set_rbtree: .deactivate fails if element has expired
        selftests: netfilter: Run nft_audit.sh in its own netns
        netfilter: nf_tables: audit log object reset once per table
      ====================
      
      Link: https://lore.kernel.org/r/20231018125605.27299-1-fw@strlen.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Jakub Kicinski
      Merge tag 'wireless-2023-10-18' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless · 88343fbe
      Jakub Kicinski authored
      Johannes Berg says:
      
      ====================
      A few more fixes:
       * prevent value bounce/glitch in rfkill GPIO probe
       * fix lockdep report in rfkill
       * fix error path leak in mac80211 key handling
       * use system_unbound_wq for wiphy work since it
         can take longer
      
      * tag 'wireless-2023-10-18' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
        net: rfkill: reduce data->mtx scope in rfkill_fop_open
        net: rfkill: gpio: prevent value glitch during probe
        wifi: mac80211: fix error path key leak
        wifi: cfg80211: use system_unbound_wq for wiphy work
      ====================
      
      Link: https://lore.kernel.org/r/20231018071041.8175-2-johannes@sipsolutions.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Florian Fainelli
      net: phy: bcm7xxx: Add missing 16nm EPHY statistics · 6200e00e
      Florian Fainelli authored
      The .probe() function would allocate the necessary space and ensure that
      the library call sizes the number of statistics, but the callbacks
      necessary to fetch the names and values were not wired up.

      Reported-by: Justin Chen <justin.chen@broadcom.com>
      Fixes: f68d08c4 ("net: phy: bcm7xxx: Add EPHY entry for 72165")
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20231017205119.416392-1-florian.fainelli@broadcom.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Eric Dumazet
      ipv4: fib: annotate races around nh->nh_saddr_genid and nh->nh_saddr · 195374d8
      Eric Dumazet authored
      syzbot reported a data-race while accessing nh->nh_saddr_genid [1]
      
      Add annotations, but leave the code lazy as intended.
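
      A userspace analogue of the annotated lazy cache follows, assuming C11
      relaxed atomics as a stand-in for READ_ONCE()/WRITE_ONCE(); struct
      nh_sketch, compute_saddr() and result_prefsrc() are hypothetical. The
      race itself stays: two CPUs may both refill the cache, or a reader may
      pair a fresh genid with a slightly older address, but the accesses are
      now explicitly marked as intentionally racy.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical global generation counter, bumped when addresses change. */
static atomic_uint dev_addr_genid;

struct nh_sketch {
	_Atomic uint32_t saddr;		/* cached preferred source address */
	atomic_uint saddr_genid;	/* generation the cache was filled at */
};

/* Hypothetical slow path: recompute the preferred source (10.0.0.1). */
static uint32_t compute_saddr(void)
{
	return 0x0a000001;
}

static uint32_t update_saddr(struct nh_sketch *nh)
{
	uint32_t saddr = compute_saddr();

	/* WRITE_ONCE()-style stores: racing writers are tolerated. */
	atomic_store_explicit(&nh->saddr, saddr, memory_order_relaxed);
	atomic_store_explicit(&nh->saddr_genid,
			      atomic_load_explicit(&dev_addr_genid, memory_order_relaxed),
			      memory_order_relaxed);
	return saddr;
}

static uint32_t result_prefsrc(struct nh_sketch *nh)
{
	/* READ_ONCE()-style loads: on a genid mismatch, just recompute. */
	if (atomic_load_explicit(&nh->saddr_genid, memory_order_relaxed) ==
	    atomic_load_explicit(&dev_addr_genid, memory_order_relaxed))
		return atomic_load_explicit(&nh->saddr, memory_order_relaxed);
	return update_saddr(nh);
}
```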
      
      [1]
      BUG: KCSAN: data-race in fib_select_path / fib_select_path
      
      write to 0xffff8881387166f0 of 4 bytes by task 6778 on cpu 1:
      fib_info_update_nhc_saddr net/ipv4/fib_semantics.c:1334 [inline]
      fib_result_prefsrc net/ipv4/fib_semantics.c:1354 [inline]
      fib_select_path+0x292/0x330 net/ipv4/fib_semantics.c:2269
      ip_route_output_key_hash_rcu+0x659/0x12c0 net/ipv4/route.c:2810
      ip_route_output_key_hash net/ipv4/route.c:2644 [inline]
      __ip_route_output_key include/net/route.h:134 [inline]
      ip_route_output_flow+0xa6/0x150 net/ipv4/route.c:2872
      send4+0x1f5/0x520 drivers/net/wireguard/socket.c:61
      wg_socket_send_skb_to_peer+0x94/0x130 drivers/net/wireguard/socket.c:175
      wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200
      wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline]
      wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51
      process_one_work kernel/workqueue.c:2630 [inline]
      process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703
      worker_thread+0x525/0x730 kernel/workqueue.c:2784
      kthread+0x1d7/0x210 kernel/kthread.c:388
      ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147
      ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
      
      read to 0xffff8881387166f0 of 4 bytes by task 6759 on cpu 0:
      fib_result_prefsrc net/ipv4/fib_semantics.c:1350 [inline]
      fib_select_path+0x1cb/0x330 net/ipv4/fib_semantics.c:2269
      ip_route_output_key_hash_rcu+0x659/0x12c0 net/ipv4/route.c:2810
      ip_route_output_key_hash net/ipv4/route.c:2644 [inline]
      __ip_route_output_key include/net/route.h:134 [inline]
      ip_route_output_flow+0xa6/0x150 net/ipv4/route.c:2872
      send4+0x1f5/0x520 drivers/net/wireguard/socket.c:61
      wg_socket_send_skb_to_peer+0x94/0x130 drivers/net/wireguard/socket.c:175
      wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200
      wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline]
      wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51
      process_one_work kernel/workqueue.c:2630 [inline]
      process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703
      worker_thread+0x525/0x730 kernel/workqueue.c:2784
      kthread+0x1d7/0x210 kernel/kthread.c:388
      ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147
      ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
      
      value changed: 0x959d3217 -> 0x959d3218
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 6759 Comm: kworker/u4:15 Not tainted 6.6.0-rc4-syzkaller-00029-gcbf3a2cb #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/06/2023
      Workqueue: wg-kex-wg1 wg_packet_handshake_send_worker
      
      Fixes: 436c3b66 ("ipv4: Invalidate nexthop cache nh_saddr more correctly.")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20231017192304.82626-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>