1. 27 Jun, 2024 3 commits
    • selftest: af_unix: Add msg_oob.c. · d098d772
      Kuniyuki Iwashima authored
      AF_UNIX's MSG_OOB functionality lacked thorough testing, and we found
      some bizarre behaviour.
      
      The new selftest validates every MSG_OOB operation against TCP as a
      reference implementation.
      
      This patch adds only a few basic send() and recv() tests, all of which
      currently pass.
      
      The following patches will add more test cases for SO_OOBINLINE, SIGURG,
      EPOLLPRI, and SIOCATMARK.
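      To illustrate the kind of semantics these tests pin down, here is a
      minimal userspace sketch of an "OOB byte read ahead of in-band data"
      case on an AF_UNIX stream socketpair. It assumes a kernel built with
      CONFIG_AF_UNIX_OOB (otherwise send() with MSG_OOB is expected to fail,
      reported here as a negative errno). The helper names are hypothetical
      and are not taken from msg_oob.c. As with TCP, the last byte of a
      MSG_OOB send becomes the OOB byte.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: turn a failed or short transfer into -errno/-EIO. */
static int io_result(ssize_t got, ssize_t want)
{
	if (got == want)
		return 0;
	return got < 0 ? -errno : -EIO;
}

/*
 * Hypothetical helper: send "hello" with MSG_OOB on an AF_UNIX stream
 * socketpair, read the OOB byte first, then the remaining in-band data.
 * Returns 0 on success or a negative errno (e.g. -EOPNOTSUPP when the
 * kernel lacks CONFIG_AF_UNIX_OOB).
 */
static int unix_oob_ahead(char *oob, char *inband, size_t inband_len)
{
	int sv[2];
	int ret;
	ssize_t n;

	if (inband_len < 2)
		return -EINVAL;
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -errno;

	/* The last byte of a MSG_OOB send ('o') becomes the OOB byte. */
	ret = io_result(send(sv[0], "hello", 5, MSG_OOB), 5);
	if (ret)
		goto out;

	/* Read the single OOB byte ahead of the in-band data. */
	ret = io_result(recv(sv[1], oob, 1, MSG_OOB | MSG_DONTWAIT), 1);
	if (ret)
		goto out;

	/* Read whatever in-band data is left, without blocking. */
	n = recv(sv[1], inband, inband_len - 1, MSG_DONTWAIT);
	if (n < 0) {
		ret = -errno;
		goto out;
	}
	inband[n] = '\0';
out:
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

      On a kernel with AF_UNIX OOB support, this should yield 'o' as the OOB
      byte and "hell" as the in-band data, matching what TCP does for the
      same sequence.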
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • selftest: af_unix: Remove test_unix_oob.c. · 7d139181
      Kuniyuki Iwashima authored
      test_unix_oob.c does not fully cover AF_UNIX's MSG_OOB functionality,
      so there are discrepancies between AF_UNIX's and TCP's behaviour that
      it does not catch.

      Also, the test uses fork() to create a message producer, which makes
      it hard to understand and to extend with more test cases.
      
      Let's remove test_unix_oob.c and rewrite a new test.
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • tracing/net_sched: NULL pointer dereference in perf_trace_qdisc_reset() · bab49231
      Yunseong Kim authored
      In TRACE_EVENT(qdisc_reset), a NULL dereference occurred at

       qdisc->dev_queue->dev <NULL> ->name

      This situation was simulated with a bunch of veths and Bluetooth
      disconnection and reconnection.

      During qdisc initialization, qdisc was being set to noop_queue.
      In veth_init_queues(), the initial tx_num was reduced back to one,
      causing qdisc_reset() to be called with noop, which led to the kernel
      panic.
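      The failure mode is a plain NULL pointer chain. The following is a
      minimal userspace model of guarding it, assuming heavily simplified,
      hypothetical structs (not the kernel's definitions) and a lookup that
      records "(null)" when no device is attached. It illustrates the idea
      of guarding the dereference and is not necessarily the exact change
      made by this patch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct net_device {
	char name[16];
};

struct netdev_queue {
	struct net_device *dev;	/* NULL for the model's noop queue */
};

struct Qdisc {
	struct netdev_queue *dev_queue;
};

/* Same pointer chain as qdisc->dev_queue->dev in the report. */
static struct net_device *model_qdisc_dev(const struct Qdisc *q)
{
	return q->dev_queue->dev;
}

/* Name to record in the trace event; never dereferences a NULL dev. */
static const char *model_dev_name(const struct Qdisc *q)
{
	const struct net_device *dev = model_qdisc_dev(q);

	return dev ? dev->name : "(null)";
}

/* Copy the name into a fixed-size buffer, like a tracepoint record. */
static void model_record_dev(char out[16], const struct Qdisc *q)
{
	snprintf(out, 16, "%s", model_dev_name(q));
}
```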
      
      The GitHub gist linked below contains the C source converted from the
      syz-execprog reproducer and three vmcore-dmesg logs from reproduced
      crashes.
      
       https://gist.github.com/yskelg/cc64562873ce249cdd0d5a358b77d740
      
      Yeoreum and I ran two fuzzing tools simultaneously.

      One process runs syz-executor: https://github.com/google/syzkaller

       $ ./syz-execprog -executor=./syz-executor -repeat=1 -sandbox=setuid \
          -enable=none -collide=false log1

      The other process runs the perf fuzzer:
       https://github.com/deater/perf_event_tests/tree/master/fuzzer
      
       $ perf_event_tests/fuzzer/perf_fuzzer
      
      I think this affects the following kernel versions:

       Linux kernel v6.7.10+, v6.8, v6.9, and possibly v6.10.

      This was introduced by 51270d57. I think this patch is necessary.
      Previously, the tracepoint showed an unintended string value for name.

      I've reproduced it three times on my Fedora 40 debug kernel without
      any other modules or patches.
      
       version: 6.10.0-0.rc2.20240608gitdc772f82.29.fc41.aarch64+debug
      
      [ 5287.164555] veth0_vlan: left promiscuous mode
      [ 5287.164929] veth1_macvtap: left promiscuous mode
      [ 5287.164950] veth0_macvtap: left promiscuous mode
      [ 5287.164983] veth1_vlan: left promiscuous mode
      [ 5287.165008] veth0_vlan: left promiscuous mode
      [ 5287.165450] veth1_macvtap: left promiscuous mode
      [ 5287.165472] veth0_macvtap: left promiscuous mode
      [ 5287.165502] veth1_vlan: left promiscuous mode
      …
      [ 5297.598240] bridge0: port 2(bridge_slave_1) entered blocking state
      [ 5297.598262] bridge0: port 2(bridge_slave_1) entered forwarding state
      [ 5297.598296] bridge0: port 1(bridge_slave_0) entered blocking state
      [ 5297.598313] bridge0: port 1(bridge_slave_0) entered forwarding state
      [ 5297.616090] 8021q: adding VLAN 0 to HW filter on device bond0
      [ 5297.620405] bridge0: port 1(bridge_slave_0) entered disabled state
      [ 5297.620730] bridge0: port 2(bridge_slave_1) entered disabled state
      [ 5297.627247] 8021q: adding VLAN 0 to HW filter on device team0
      [ 5297.629636] bridge0: port 1(bridge_slave_0) entered blocking state
      …
      [ 5298.002798] bridge_slave_0: left promiscuous mode
      [ 5298.002869] bridge0: port 1(bridge_slave_0) entered disabled state
      [ 5298.309444] bond0 (unregistering): (slave bond_slave_0): Releasing backup interface
      [ 5298.315206] bond0 (unregistering): (slave bond_slave_1): Releasing backup interface
      [ 5298.320207] bond0 (unregistering): Released all slaves
      [ 5298.354296] hsr_slave_0: left promiscuous mode
      [ 5298.360750] hsr_slave_1: left promiscuous mode
      [ 5298.374889] veth1_macvtap: left promiscuous mode
      [ 5298.374931] veth0_macvtap: left promiscuous mode
      [ 5298.374988] veth1_vlan: left promiscuous mode
      [ 5298.375024] veth0_vlan: left promiscuous mode
      [ 5299.109741] team0 (unregistering): Port device team_slave_1 removed
      [ 5299.185870] team0 (unregistering): Port device team_slave_0 removed
      …
      [ 5300.155443] Bluetooth: hci3: unexpected cc 0x0c03 length: 249 > 1
      [ 5300.155724] Bluetooth: hci3: unexpected cc 0x1003 length: 249 > 9
      [ 5300.155988] Bluetooth: hci3: unexpected cc 0x1001 length: 249 > 9
      …
      [ 5301.075531] team0: Port device team_slave_1 added
      [ 5301.085515] bridge0: port 1(bridge_slave_0) entered blocking state
      [ 5301.085531] bridge0: port 1(bridge_slave_0) entered disabled state
      [ 5301.085588] bridge_slave_0: entered allmulticast mode
      [ 5301.085800] bridge_slave_0: entered promiscuous mode
      [ 5301.095617] bridge0: port 1(bridge_slave_0) entered blocking state
      [ 5301.095633] bridge0: port 1(bridge_slave_0) entered disabled state
      …
      [ 5301.149734] bond0: (slave bond_slave_0): Enslaving as an active interface with an up link
      [ 5301.173234] bond0: (slave bond_slave_0): Enslaving as an active interface with an up link
      [ 5301.180517] bond0: (slave bond_slave_1): Enslaving as an active interface with an up link
      [ 5301.193481] hsr_slave_0: entered promiscuous mode
      [ 5301.204425] hsr_slave_1: entered promiscuous mode
      [ 5301.210172] debugfs: Directory 'hsr0' with parent 'hsr' already present!
      [ 5301.210185] Cannot create hsr debugfs directory
      [ 5301.224061] bond0: (slave bond_slave_1): Enslaving as an active interface with an up link
      [ 5301.246901] bond0: (slave bond_slave_0): Enslaving as an active interface with an up link
      [ 5301.255934] team0: Port device team_slave_0 added
      [ 5301.256480] team0: Port device team_slave_1 added
      [ 5301.256948] team0: Port device team_slave_0 added
      …
      [ 5301.435928] hsr_slave_0: entered promiscuous mode
      [ 5301.446029] hsr_slave_1: entered promiscuous mode
      [ 5301.455872] debugfs: Directory 'hsr0' with parent 'hsr' already present!
      [ 5301.455884] Cannot create hsr debugfs directory
      [ 5301.502664] hsr_slave_0: entered promiscuous mode
      [ 5301.513675] hsr_slave_1: entered promiscuous mode
      [ 5301.526155] debugfs: Directory 'hsr0' with parent 'hsr' already present!
      [ 5301.526164] Cannot create hsr debugfs directory
      [ 5301.563662] hsr_slave_0: entered promiscuous mode
      [ 5301.576129] hsr_slave_1: entered promiscuous mode
      [ 5301.580259] debugfs: Directory 'hsr0' with parent 'hsr' already present!
      [ 5301.580270] Cannot create hsr debugfs directory
      [ 5301.590269] 8021q: adding VLAN 0 to HW filter on device bond0
      
      [ 5301.595872] KASAN: null-ptr-deref in range [0x0000000000000130-0x0000000000000137]
      [ 5301.595877] Mem abort info:
      [ 5301.595881]   ESR = 0x0000000096000006
      [ 5301.595885]   EC = 0x25: DABT (current EL), IL = 32 bits
      [ 5301.595889]   SET = 0, FnV = 0
      [ 5301.595893]   EA = 0, S1PTW = 0
      [ 5301.595896]   FSC = 0x06: level 2 translation fault
      [ 5301.595900] Data abort info:
      [ 5301.595903]   ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000
      [ 5301.595907]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
      [ 5301.595911]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
      [ 5301.595915] [dfff800000000026] address between user and kernel address ranges
      [ 5301.595971] Internal error: Oops: 0000000096000006 [#1] SMP
      …
      [ 5301.596076] CPU: 2 PID: 102769 Comm: syz-executor.3 Kdump: loaded Tainted: G        W         -------  ---  6.10.0-0.rc2.20240608gitdc772f82.29.fc41.aarch64+debug #1
      [ 5301.596080] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.21805430.BA64.2305221830 05/22/2023
      [ 5301.596082] pstate: 01400005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
      [ 5301.596085] pc : strnlen+0x40/0x88
      [ 5301.596114] lr : trace_event_get_offsets_qdisc_reset+0x6c/0x2b0
      [ 5301.596124] sp : ffff8000beef6b40
      [ 5301.596126] x29: ffff8000beef6b40 x28: dfff800000000000 x27: 0000000000000001
      [ 5301.596131] x26: 6de1800082c62bd0 x25: 1ffff000110aa9e0 x24: ffff800088554f00
      [ 5301.596136] x23: ffff800088554ec0 x22: 0000000000000130 x21: 0000000000000140
      [ 5301.596140] x20: dfff800000000000 x19: ffff8000beef6c60 x18: ffff7000115106d8
      [ 5301.596143] x17: ffff800121bad000 x16: ffff800080020000 x15: 0000000000000006
      [ 5301.596147] x14: 0000000000000002 x13: ffff0001f3ed8d14 x12: ffff700017ddeda5
      [ 5301.596151] x11: 1ffff00017ddeda4 x10: ffff700017ddeda4 x9 : ffff800082cc5eec
      [ 5301.596155] x8 : 0000000000000004 x7 : 00000000f1f1f1f1 x6 : 00000000f2f2f200
      [ 5301.596158] x5 : 00000000f3f3f3f3 x4 : ffff700017dded80 x3 : 00000000f204f1f1
      [ 5301.596162] x2 : 0000000000000026 x1 : 0000000000000000 x0 : 0000000000000130
      [ 5301.596166] Call trace:
      [ 5301.596175]  strnlen+0x40/0x88
      [ 5301.596179]  trace_event_get_offsets_qdisc_reset+0x6c/0x2b0
      [ 5301.596182]  perf_trace_qdisc_reset+0xb0/0x538
      [ 5301.596184]  __traceiter_qdisc_reset+0x68/0xc0
      [ 5301.596188]  qdisc_reset+0x43c/0x5e8
      [ 5301.596190]  netif_set_real_num_tx_queues+0x288/0x770
      [ 5301.596194]  veth_init_queues+0xfc/0x130 [veth]
      [ 5301.596198]  veth_newlink+0x45c/0x850 [veth]
      [ 5301.596202]  rtnl_newlink_create+0x2c8/0x798
      [ 5301.596205]  __rtnl_newlink+0x92c/0xb60
      [ 5301.596208]  rtnl_newlink+0xd8/0x130
      [ 5301.596211]  rtnetlink_rcv_msg+0x2e0/0x890
      [ 5301.596214]  netlink_rcv_skb+0x1c4/0x380
      [ 5301.596225]  rtnetlink_rcv+0x20/0x38
      [ 5301.596227]  netlink_unicast+0x3c8/0x640
      [ 5301.596231]  netlink_sendmsg+0x658/0xa60
      [ 5301.596234]  __sock_sendmsg+0xd0/0x180
      [ 5301.596243]  __sys_sendto+0x1c0/0x280
      [ 5301.596246]  __arm64_sys_sendto+0xc8/0x150
      [ 5301.596249]  invoke_syscall+0xdc/0x268
      [ 5301.596256]  el0_svc_common.constprop.0+0x16c/0x240
      [ 5301.596259]  do_el0_svc+0x48/0x68
      [ 5301.596261]  el0_svc+0x50/0x188
      [ 5301.596265]  el0t_64_sync_handler+0x120/0x130
      [ 5301.596268]  el0t_64_sync+0x194/0x198
      [ 5301.596272] Code: eb15001f 54000120 d343fc02 12000801 (38f46842)
      [ 5301.596285] SMP: stopping secondary CPUs
      [ 5301.597053] Starting crashdump kernel...
      [ 5301.597057] Bye!
      
      After applying our patch, I did not see any kernel panics.

      We've found a simple reproducer:
      
       # echo 1 > /sys/kernel/debug/tracing/events/qdisc/qdisc_reset/enable
      
       # ip link add veth0 type veth peer name veth1
      
       Error: Unknown device type.
      
      However, without our patch applied, I tested the upstream 6.10.0-rc3
      kernel with the qdisc_reset event and the ip command on my QEMU virtual
      machine.

      These two commands always cause a kernel panic.
      
      Linux version: 6.10.0-rc3
      
      [    0.000000] Linux version 6.10.0-rc3-00164-g44ef20ba-dirty (paran@fedora) (gcc (GCC) 14.1.1 20240522 (Red Hat 14.1.1-4), GNU ld version 2.41-34.fc40) #20 SMP PREEMPT Sat Jun 15 16:51:25 KST 2024
      
      Kernel panic message:
      
      [  615.236484] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
      [  615.237250] Dumping ftrace buffer:
      [  615.237679]    (ftrace buffer empty)
      [  615.238097] Modules linked in: veth crct10dif_ce virtio_gpu
      virtio_dma_buf drm_shmem_helper drm_kms_helper zynqmp_fpga xilinx_can
      xilinx_spi xilinx_selectmap xilinx_core xilinx_pr_decoupler versal_fpga
      uvcvideo uvc videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev
      videobuf2_common mc usbnet deflate zstd ubifs ubi rcar_canfd rcar_can
      omap_mailbox ntb_msi_test ntb_hw_epf lattice_sysconfig_spi
      lattice_sysconfig ice40_spi gpio_xilinx dwmac_altr_socfpga mdio_regmap
      stmmac_platform stmmac pcs_xpcs dfl_fme_region dfl_fme_mgr dfl_fme_br
      dfl_afu dfl fpga_region fpga_bridge can can_dev br_netfilter bridge stp
      llc atl1c ath11k_pci mhi ath11k_ahb ath11k qmi_helpers ath10k_sdio
      ath10k_pci ath10k_core ath mac80211 libarc4 cfg80211 drm fuse backlight ipv6
      Jun 22 02:36:53 … kernel: Unable to handle kernel paging request at virtual address dfff8…
      [  615.2…] CPU: 1 PID: 468 Comm: ip Not tainted 6.10.0-rc3-00164-g44ef20ba-dirty #20
      [  615.252376] Hardware name: linux,dummy-virt (DT)
      [  615.253220] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      [  615.254433] pc : strnlen+0x6c/0xe0
      [  615.255096] lr : trace_event_get_offsets_qdisc_reset+0x94/0x3d0
      [  615.256088] sp : ffff800080b269a0
      [  615.256615] x29: ffff800080b269a0 x28: ffffc070f3f98500 x27: 0000000000000001
      [  615.257831] x26: 0000000000000010 x25: ffffc070f3f98540 x24: ffffc070f619cf60
      [  615.259020] x23: 0000000000000128 x22: 0000000000000138 x21: dfff800000000000
      [  615.260241] x20: ffffc070f631ad00 x19: 0000000000000128 x18: ffffc070f448b800
      [  615.261454] x17: 0000000000000000 x16: 0000000000000001 x15: ffffc070f4ba2a90
      [  615.262635] x14: ffff700010164d73 x13: 1ffff80e1e8d5eb3 x12: 1ffff00010164d72
      [  615.263877] x11: ffff700010164d72 x10: dfff800000000000 x9 : ffffc070e85d6184
      [  615.265047] x8 : ffffc070e4402070 x7 : 000000000000f1f1 x6 : 000000001504a6d3
      [  615.266336] x5 : ffff28ca21122140 x4 : ffffc070f5043ea8 x3 : 0000000000000000
      [  615.267528] x2 : 0000000000000025 x1 : 0000000000000000 x0 : 0000000000000000
      [  615.268747] Call trace:
      [  615.269180]  strnlen+0x6c/0xe0
      [  615.269767]  trace_event_get_offsets_qdisc_reset+0x94/0x3d0
      [  615.270716]  trace_event_raw_event_qdisc_reset+0xe8/0x4e8
      [  615.271667]  __traceiter_qdisc_reset+0xa0/0x140
      [  615.272499]  qdisc_reset+0x554/0x848
      [  615.273134]  netif_set_real_num_tx_queues+0x360/0x9a8
      [  615.274050]  veth_init_queues+0x110/0x220 [veth]
      [  615.275110]  veth_newlink+0x538/0xa50 [veth]
      [  615.276172]  __rtnl_newlink+0x11e4/0x1bc8
      [  615.276944]  rtnl_newlink+0xac/0x120
      [  615.277657]  rtnetlink_rcv_msg+0x4e4/0x1370
      [  615.278409]  netlink_rcv_skb+0x25c/0x4f0
      [  615.279122]  rtnetlink_rcv+0x48/0x70
      [  615.279769]  netlink_unicast+0x5a8/0x7b8
      [  615.280462]  netlink_sendmsg+0xa70/0x1190
      
      Yeoreum and I don't know whether our patch fixes the underlying cause,
      but we think the priority is to prevent the kernel panic, so we're
      sending this patch.
      
      Fixes: 51270d57 ("tracing/net_sched: Fix tracepoints that save qdisc_dev() as a string")
      Link: https://lore.kernel.org/lkml/20240229143432.273b4871@gandalf.local.home/t/
      Cc: netdev@vger.kernel.org
      Tested-by: Yunseong Kim <yskelg@gmail.com>
      Signed-off-by: Yunseong Kim <yskelg@gmail.com>
      Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
      Link: https://lore.kernel.org/r/20240624173320.24945-4-yskelg@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  2. 26 Jun, 2024 2 commits
    • net: usb: qmi_wwan: add Telit FN912 compositions · 77453e2b
      Daniele Palmas authored
      Add the following Telit FN912 compositions:
      
      0x3000: rmnet + tty (AT/NMEA) + tty (AT) + tty (diag)
      T:  Bus=03 Lev=01 Prnt=03 Port=07 Cnt=01 Dev#=  8 Spd=480  MxCh= 0
      D:  Ver= 2.01 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
      P:  Vendor=1bc7 ProdID=3000 Rev=05.15
      S:  Manufacturer=Telit Cinterion
      S:  Product=FN912
      S:  SerialNumber=92c4c4d8
      C:  #Ifs= 4 Cfg#= 1 Atr=e0 MxPwr=500mA
      I:  If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan
      E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=82(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
      I:  If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=60 Driver=option
      E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=84(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      I:  If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
      E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=86(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      I:  If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
      E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      
      0x3001: rmnet + tty (AT) + tty (diag) + DPL (data packet logging) + adb
      T:  Bus=03 Lev=01 Prnt=03 Port=07 Cnt=01 Dev#=  7 Spd=480  MxCh= 0
      D:  Ver= 2.01 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
      P:  Vendor=1bc7 ProdID=3001 Rev=05.15
      S:  Manufacturer=Telit Cinterion
      S:  Product=FN912
      S:  SerialNumber=92c4c4d8
      C:  #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA
      I:  If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan
      E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=82(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
      I:  If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
      E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=84(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      I:  If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
      E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:  If#= 3 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=ff Prot=80 Driver=(none)
      E:  Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:  If#= 4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=usbfs
      E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
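      In both dumps, interface 0 (Prot=50) is the one bound to qmi_wwan,
      while the remaining interfaces go to option, usbfs, or no driver.
      The following is a hypothetical userspace sketch of that (vendor,
      product, interface number) matching. It uses a made-up table type,
      not the kernel's struct usb_device_id or its match macros.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical match entry; NOT the kernel's struct usb_device_id. */
struct toy_iface_match {
	uint16_t vendor;
	uint16_t product;
	uint8_t ifnum;
	const char *driver;
};

/* Interface 0 of both FN912 compositions is the rmnet/QMI interface. */
static const struct toy_iface_match toy_fn912_table[] = {
	{ 0x1bc7, 0x3000, 0, "qmi_wwan" },
	{ 0x1bc7, 0x3001, 0, "qmi_wwan" },
};

/* Return the driver for (vendor, product, interface), or NULL if none. */
static const char *toy_lookup_driver(uint16_t vendor, uint16_t product,
				     uint8_t ifnum)
{
	size_t n = sizeof(toy_fn912_table) / sizeof(toy_fn912_table[0]);

	for (size_t i = 0; i < n; i++) {
		const struct toy_iface_match *m = &toy_fn912_table[i];

		if (m->vendor == vendor && m->product == product &&
		    m->ifnum == ifnum)
			return m->driver;
	}
	return NULL;
}
```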
      Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
      Acked-by: Bjørn Mork <bjorn@mork.no>
      Link: https://patch.msgid.link/20240625102236.69539-1-dnlplm@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed TFO · 5dfe9d27
      Neal Cardwell authored
      Testing determined that the recent commit 9e046bb1 ("tcp: clear
      tp->retrans_stamp in tcp_rcv_fastopen_synack()") has a race, and does
      not always ensure retrans_stamp is 0 after a TFO payload retransmit.
      
      If transmit completion for the SYN+data skb happens after the client
      TCP stack receives the SYNACK (which sometimes happens), then
      retrans_stamp can erroneously remain non-zero for the lifetime of the
      connection, causing a premature ETIMEDOUT later.
      
      Testing and tracing showed that the buggy scenario is the following
      somewhat tricky sequence:
      
      + Client attempts a TFO handshake. tcp_send_syn_data() sends SYN + TFO
        cookie + data in a single packet in the syn_data skb. It hands the
        syn_data skb to tcp_transmit_skb(), which makes a clone. Crucially,
        it then reuses the same original (non-clone) syn_data skb,
        transforming it by advancing the seq by one byte and removing the
        SYN bit, and enqueues the resulting payload-only skb in the
        sk->tcp_rtx_queue.
      
      + Client sets retrans_stamp to the start time of the three-way
        handshake.
      
      + Cookie mismatches or server has TFO disabled, and server only ACKs
        SYN.
      
      + tcp_ack() sees SYN is acked, tcp_clean_rtx_queue() clears
        retrans_stamp.
      
      + Since the client SYN was acked but not the payload, the TFO failure
        code path in tcp_rcv_fastopen_synack() tries to retransmit the
        payload skb.  However, in some cases the transmit completion for the
        clone of the syn_data (which had SYN + TFO cookie + data) hasn't
        happened.  In those cases, skb_still_in_host_queue() returns true
        for the retransmitted TFO payload, because the clone of the syn_data
        skb has not had its tx completion.
      
      + Because skb_still_in_host_queue() finds skb_fclone_busy() is true,
        it sets the TSQ_THROTTLED bit and the retransmit does not happen in
        the tcp_rcv_fastopen_synack() call chain.
      
      + The tcp_rcv_fastopen_synack() code next implicitly assumes the
        retransmit process is finished, and sets retrans_stamp to 0 to clear
        it, but this is later overwritten (see below).
      
      + Later, upon tx completion, tcp_tsq_write() calls
        tcp_xmit_retransmit_queue(), which puts the retransmit in flight and
        sets retrans_stamp to a non-zero value.
      
      + The client receives an ACK for the retransmitted TFO payload data.
      
      + Since we're in CA_Open and there are no dupacks/SACKs/DSACKs/ECN to
        make tcp_ack_is_dubious() true and make us call
        tcp_fastretrans_alert() and reach a code path that clears
        retrans_stamp, retrans_stamp stays nonzero.
      
      + Later, if there is a TLP, RTO, RTO sequence, then the connection
        will suffer an early ETIMEDOUT due to the erroneously ancient
        retrans_stamp.
      
      The fix: this commit refactors the code to have
      tcp_rcv_fastopen_synack() retransmit by reusing the relevant parts of
      tcp_simple_retransmit() that enter CA_Loss (without changing cwnd) and
      call tcp_xmit_retransmit_queue(). We have tcp_simple_retransmit() and
      tcp_rcv_fastopen_synack() share code in this way because in both cases
      we get a packet indicating non-congestion loss (MTU reduction or TFO
      failure) and thus in both cases we want to retransmit as many packets
      as cwnd allows, without reducing cwnd. And given that retransmits will
      set retrans_stamp to a non-zero value (and may do so in a later
      calling context due to TSQ), we also want to enter CA_Loss so that we
      track when all retransmitted packets are ACked and clear retrans_stamp
      when that happens (to ensure later recurring RTOs are using the
      correct retrans_stamp and don't declare ETIMEDOUT prematurely).
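      To make the sequence concrete, here is a toy userspace model with
      hypothetical names that tracks only the state discussed above. It
      replays the steps: the stamp is cleared when the SYN is ACKed, the TFO
      retransmit is deferred because the clone is still busy, the deferred
      retransmit re-stamps it, and a clean ACK arrives. This is a
      simplification, not the kernel's congestion-control state machine.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model of the race; not kernel code. */
enum toy_ca_state { TOY_CA_OPEN, TOY_CA_LOSS };

struct toy_sock {
	uint32_t retrans_stamp;		/* 0 means "no retransmit in progress" */
	enum toy_ca_state ca_state;
	bool rtx_deferred;		/* held back, like TSQ_THROTTLED */
};

/* Putting a retransmit in flight stamps it if not already stamped. */
static void toy_retransmit(struct toy_sock *sk, uint32_t now)
{
	if (!sk->retrans_stamp)
		sk->retrans_stamp = now;
}

/* TFO failure path; clone_busy models skb_fclone_busy() being true. */
static void toy_fastopen_synack(struct toy_sock *sk, bool clone_busy,
				bool fixed, uint32_t now)
{
	if (fixed)
		sk->ca_state = TOY_CA_LOSS;	/* track until rtx is ACKed */
	if (clone_busy)
		sk->rtx_deferred = true;	/* retransmit happens later */
	else
		toy_retransmit(sk, now);
	if (!fixed)
		sk->retrans_stamp = 0;		/* old code: assume rtx is done */
}

/* TX completion of the clone lets the deferred retransmit go out. */
static void toy_tx_complete(struct toy_sock *sk, uint32_t now)
{
	if (sk->rtx_deferred) {
		sk->rtx_deferred = false;
		toy_retransmit(sk, now);
	}
}

/* Clean ACK of the retransmitted data: no dupacks, SACKs or ECN. */
static void toy_clean_ack(struct toy_sock *sk)
{
	if (sk->ca_state == TOY_CA_LOSS) {	/* only CA_Loss clears it */
		sk->ca_state = TOY_CA_OPEN;
		sk->retrans_stamp = 0;
	}
}

/* Replays the sequence; returns the final retrans_stamp. */
static uint32_t toy_run(bool fixed)
{
	/* retrans_stamp starts at the handshake start time. */
	struct toy_sock sk = { .retrans_stamp = 100, .ca_state = TOY_CA_OPEN };

	sk.retrans_stamp = 0;			/* SYN ACKed: stamp cleared */
	toy_fastopen_synack(&sk, true, fixed, 101);
	toy_tx_complete(&sk, 105);
	toy_clean_ack(&sk);
	return sk.retrans_stamp;
}
```

      In this model toy_run(false) leaves retrans_stamp at 105, the stale
      value that would later drive a premature ETIMEDOUT, while
      toy_run(true), which enters CA_Loss first, ends with it cleared to 0.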
      
      Fixes: 9e046bb1 ("tcp: clear tp->retrans_stamp in tcp_rcv_fastopen_synack()")
      Fixes: a7abf3cd ("tcp: consider using standard rtx logic in tcp_rcv_fastopen_synack()")
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Link: https://patch.msgid.link/20240624144323.2371403-1-ncardwell.sw@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. 25 Jun, 2024 5 commits
    • ionic: use dev_consume_skb_any outside of napi · 84b767f9
      Shannon Nelson authored
      If we're not in a NAPI softirq context, we need to be careful about
      how we call napi_consume_skb(); specifically, we need to call it with
      budget==0 to signal that we're not in a safe context.
      
      This was found during configuration stress testing, with traffic
      running alongside a loop that changes the queue configuration, when
      this curious note popped out:
      
      [ 4371.402645] BUG: using smp_processor_id() in preemptible [00000000] code: ethtool/20545
      [ 4371.402897] caller is napi_skb_cache_put+0x16/0x80
      [ 4371.403120] CPU: 25 PID: 20545 Comm: ethtool Kdump: loaded Tainted: G           OE      6.10.0-rc3-netnext+ #8
      [ 4371.403302] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 01/23/2021
      [ 4371.403460] Call Trace:
      [ 4371.403613]  <TASK>
      [ 4371.403758]  dump_stack_lvl+0x4f/0x70
      [ 4371.403904]  check_preemption_disabled+0xc1/0xe0
      [ 4371.404051]  napi_skb_cache_put+0x16/0x80
      [ 4371.404199]  ionic_tx_clean+0x18a/0x240 [ionic]
      [ 4371.404354]  ionic_tx_cq_service+0xc4/0x200 [ionic]
      [ 4371.404505]  ionic_tx_flush+0x15/0x70 [ionic]
      [ 4371.404653]  ? ionic_lif_qcq_deinit.isra.23+0x5b/0x70 [ionic]
      [ 4371.404805]  ionic_txrx_deinit+0x71/0x190 [ionic]
      [ 4371.404956]  ionic_reconfigure_queues+0x5f5/0xff0 [ionic]
      [ 4371.405111]  ionic_set_ringparam+0x2e8/0x3e0 [ionic]
      [ 4371.405265]  ethnl_set_rings+0x1f1/0x300
      [ 4371.405418]  ethnl_default_set_doit+0xbb/0x160
      [ 4371.405571]  genl_family_rcv_msg_doit+0xff/0x130
      	[...]
      
      I found that ionic_tx_clean() calls napi_consume_skb() which calls
      napi_skb_cache_put(), but before that last call is the note
          /* Zero budget indicate non-NAPI context called us, like netpoll */
      and
          DEBUG_NET_WARN_ON_ONCE(!in_softirq());
      
      Those are pretty big hints that we're doing it wrong.  We can pass a
      context hint down through the calls to let ionic_tx_clean() know what
      we're doing so it can call napi_consume_skb() correctly.
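      Passing a context hint down so that the budget reflects where we are
      can be sketched with a hypothetical userspace model. The function and
      enum names below are illustrative only and do not match ionic's actual
      signatures. The one real behaviour modelled is that napi_consume_skb()
      with a zero budget hands the skb to dev_consume_skb_any().

```c
#include <stdbool.h>

/* Which free path a consume call would take in this hypothetical model. */
enum consume_path {
	PATH_DEV_CONSUME_ANY,	/* safe from any context */
	PATH_NAPI_CACHE,	/* per-CPU skb cache; NAPI softirq only */
};

/* Models napi_consume_skb(skb, budget): a zero budget means "not called
 * from NAPI", so the skb goes to dev_consume_skb_any() instead. */
static enum consume_path model_napi_consume_skb(int budget)
{
	return budget ? PATH_NAPI_CACHE : PATH_DEV_CONSUME_ANY;
}

/* Hypothetical tx-clean helper that takes a context hint from its caller. */
static enum consume_path model_tx_clean(bool in_napi)
{
	return model_napi_consume_skb(in_napi ? 1 : 0);
}

/* The NAPI poll path passes true. */
static enum consume_path model_napi_poll(void)
{
	return model_tx_clean(true);
}

/* A teardown path, such as the ethtool ring reconfiguration in the trace
 * above, passes false. */
static enum consume_path model_queue_teardown(void)
{
	return model_tx_clean(false);
}
```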
      
      Fixes: 386e6986 ("ionic: Make use napi_consume_skb")
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Link: https://patch.msgid.link/20240624175015.4520-1-shannon.nelson@amd.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: microchip: fix wrong register write when masking interrupt · b1c4b4d4
      Tristram Ha authored
      The switch global port interrupt mask, REG_SW_PORT_INT_MASK__4, is
      defined as 0x001C in ksz9477_reg.h.  The designers used a 32-bit value
      in anticipation of higher port counts in future products, but
      currently the maximum port count is 7 and the effective value is 0x7F
      in register 0x001F.  Each port has its own interrupt mask register,
      defined as 0x#01F, which uses only 4 bits for different interrupts.
      
      The developer who implemented the current interrupt mechanism in the
      switch driver noticed similarities between masking port interrupts in
      the global interrupt register and masking individual interrupts in
      each port, and so used the same code to handle both.  He updated the
      code to use the new macro REG_SW_PORT_INT_MASK__1, which is defined as
      0x1F in ksz_common.h, but forgot to change the 32-bit write to an
      8-bit write, even though the mask registers are now 0x1F and 0x#01F.
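      A toy register file shows why the access width matters. This sketch
      assumes a byte-addressed register space with big-endian multi-byte
      accesses (an assumption made for illustration) and uses hypothetical
      helpers. It masks port 1's interrupts at 0x101F, following the 0x#01F
      pattern described above.

```c
#include <stdint.h>
#include <string.h>

/* Toy byte-addressed register file; not a real switch. */
static uint8_t regs[0x10000];

/* Hypothetical 8-bit register write. */
static void write8(uint16_t addr, uint8_t val)
{
	regs[addr] = val;
}

/* Hypothetical 32-bit write, assumed big-endian: touches addr..addr+3. */
static void write32(uint16_t addr, uint32_t val)
{
	regs[addr] = (uint8_t)(val >> 24);
	regs[addr + 1] = (uint8_t)(val >> 16);
	regs[addr + 2] = (uint8_t)(val >> 8);
	regs[addr + 3] = (uint8_t)val;
}

/* Mask all four interrupts of port 1 (register 0x101F) with a write of
 * 'width' bytes, after filling the register file with a 0xAA pattern. */
static void mask_port1(int width)
{
	memset(regs, 0xAA, sizeof(regs));
	if (width == 4)
		write32(0x101F, 0x0F);	/* buggy: 32-bit access */
	else
		write8(0x101F, 0x0F);	/* fixed: 8-bit access */
}
```

      In this model the 32-bit write stores 0x00 in the mask register itself
      and overwrites the three registers after it (the 0x0F lands at
      0x1022), whereas the 8-bit write changes only 0x101F.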
      
      In addition all KSZ switches other than the KSZ9897/KSZ9893 and LAN937X
      families use only 8-bit access and so this common code will eventually
      be changed to accommodate them.
      
      Fixes: e1add7dd ("net: dsa: microchip: use common irq routines for girq and pirq")
      Signed-off-by: Tristram Ha <tristram.ha@microchip.com>
      Link: https://lore.kernel.org/r/1719009262-2948-1-git-send-email-Tristram.Ha@microchip.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Fix race for duplicate reqsk on identical SYN · ff46e3b4
      luoxuanqiang authored
      When bonding is configured in BOND_MODE_BROADCAST mode, if two identical
      SYN packets are received at the same time and processed on different CPUs,
      it can potentially create the same sk (sock) but two different reqsk
      (request_sock) in tcp_conn_request().
      
      These two different reqsk will respond with two SYNACK packets, and since
      the generation of the seq (ISN) incorporates a timestamp, the final two
      SYNACK packets will have different seq values.
      
      The consequence is that when the client receives the earlier SYNACK
      packet and replies with an ACK, we reset (RST) it.
      
      ========================================================================
      
      This behavior is consistently reproducible in my local setup,
      which comprises:
      
                        | NETA1 ------ NETB1 |
      PC_A --- bond --- |                    | --- bond --- PC_B
                        | NETA2 ------ NETB2 |
      
      - PC_A is the Server and has two network cards, NETA1 and NETA2. I have
        bonded these two cards using BOND_MODE_BROADCAST mode and configured
        them to be handled by different CPUs.
      
      - PC_B is the Client, also equipped with two network cards, NETB1 and
        NETB2, which are also bonded and configured in BOND_MODE_BROADCAST mode.
      
      If the client attempts a TCP connection to the server, it might encounter
      a failure. Capturing packets from the server side reveals:
      
      10.10.10.10.45182 > localhost: Flags [S], seq 320236027,
      10.10.10.10.45182 > localhost: Flags [S], seq 320236027,
      localhost > 10.10.10.10.45182: Flags [S.], seq 2967855116,
      localhost > 10.10.10.10.45182: Flags [S.], seq 2967855123, <==
      10.10.10.10.45182 > localhost: Flags [.], ack 4294967290,
      10.10.10.10.45182 > localhost: Flags [.], ack 4294967290,
      localhost > 10.10.10.10.45182: Flags [R], seq 2967855117, <==
      localhost > 10.10.10.10.45182: Flags [R], seq 2967855117,
      
      Two SYNACKs with different seq numbers are sent by localhost,
      resulting in an anomaly.
      
      ========================================================================
      
      The attempted solution is as follows:
      Add a return value to inet_csk_reqsk_queue_hash_add() to confirm
      whether the ehash insertion succeeded (so far, the only reason for an
      unsuccessful insertion is that a reqsk for the same connection has
      already been inserted). If the insertion fails, release the reqsk.

      Due to the refcnt, Kuniyuki suggests also adding a return value check
      for the DCCP module; if ehash insertion fails, indicating that a reqsk
      for the same connection was already inserted, simply release the
      reqsk as well.

      In addition, in reqsk_queue_hash_req(), req->rsk_timer is now started
      only after a successful insertion.
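      The fix relies on the ehash insertion reporting whether it won. A
      minimal userspace sketch follows, assuming a hypothetical table with
      one slot per bucket (so distinct connections never share a bucket, and
      a bucket collision stands in for "same connection") and a C11
      compare-and-swap in place of the kernel's locked ehash insertion. The
      toy_* names are illustrative only.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical request socket keyed by a connection hash. */
struct toy_reqsk {
	uint32_t key;
};

#define TOY_BUCKETS 64
static _Atomic(struct toy_reqsk *) toy_ehash[TOY_BUCKETS];

/* Insert only if no reqsk for this connection exists yet.
 * Returns true if 'req' was inserted, false if another one won. */
static bool toy_reqsk_hash_add(struct toy_reqsk *req)
{
	struct toy_reqsk *expected = NULL;

	return atomic_compare_exchange_strong(&toy_ehash[req->key % TOY_BUCKETS],
					      &expected, req);
}

/* Caller side: free the loser instead of letting it send a second
 * SYNACK. Returns the inserted reqsk, or NULL if none was inserted. */
static struct toy_reqsk *toy_conn_request(uint32_t key)
{
	struct toy_reqsk *req = malloc(sizeof(*req));

	if (!req)
		return NULL;
	req->key = key;
	if (!toy_reqsk_hash_add(req)) {
		free(req);	/* duplicate SYN: drop this reqsk */
		return NULL;
	}
	/* Only the winner would send a SYNACK and arm its timer. */
	return req;
}
```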
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: luoxuanqiang <luoxuanqiang@kylinos.cn>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240621013929.1386815-1-luoxuanqiang@kylinos.cn
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • ibmvnic: Add tx check to prevent skb leak · 0983d288
      Nick Child authored
      Below is a summary of how the driver stores a reference to an skb during
      transmit:
          tx_buff[free_map[consumer_index]]->skb = new_skb;
          free_map[consumer_index] = IBMVNIC_INVALID_MAP;
          consumer_index++;
      Where variable data looks like this:
          free_map == [4, IBMVNIC_INVALID_MAP, IBMVNIC_INVALID_MAP, 0, 3]
                                                     	consumer_index^
          tx_buff == [skb=null, skb=<ptr>, skb=<ptr>, skb=null, skb=null]
      
      The driver already checks that free_map[consumer_index] points to a
      valid index, but there was no check that the tx_buff entry at that
      index holds an unused (NULL) skb pointer. So if the free_map and
      tx_buff lists ever became out of sync, the driver risked leaking an
      skb. This could in turn cause TCP congestion control to stop sending
      packets, eventually leading to ETIMEDOUT.
      
      Therefore, add a conditional to ensure that the skb pointer is NULL. If
      it is not, warn the user (because this is still a bug that should be
      fixed) and free the old skb to prevent the memory leak and TCP problems.
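
      A user-space sketch of the added check, assuming plain heap pointers in
      place of skbs: free() stands in for freeing the skb, a message on stderr
      stands in for the driver warning, and the existing index validity check
      is modelled as an assert. struct toy_tx_buff, TOY_INVALID_MAP,
      toy_tx_store() and toy_demo_stale_slot() are hypothetical names. The
      demo replays the free_map layout from the example with consumer_index
      at position 3, optionally pretending that tx_buff[0] was never cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define TOY_INVALID_MAP (-1)	/* stands in for IBMVNIC_INVALID_MAP */

/* Hypothetical tx buffer slot; skb is any heap object here. */
struct toy_tx_buff {
	void *skb;
};

/* Store new_skb in the slot chosen by free_map[*consumer_index].
 * If that slot still holds an skb, free_map and tx_buff are out of sync:
 * warn and free the stale pointer instead of leaking it.
 * Returns true if a stale skb had to be freed. */
static bool toy_tx_store(struct toy_tx_buff *tx_buff, int *free_map,
			 int *consumer_index, int pool_size, void *new_skb)
{
	int idx = free_map[*consumer_index];
	bool stale = false;

	/* pre-existing check: the map entry must be a valid index */
	assert(idx >= 0 && idx < pool_size);
	/* added check: the chosen slot must not already hold an skb */
	if (tx_buff[idx].skb) {
		fprintf(stderr, "tx_buff[%d] still holds an skb, freeing it\n", idx);
		free(tx_buff[idx].skb);
		stale = true;
	}
	tx_buff[idx].skb = new_skb;
	free_map[*consumer_index] = TOY_INVALID_MAP;
	(*consumer_index)++;
	return stale;
}

/* Replays the example state with consumer_index at position 3, optionally
 * pretending tx_buff[0] was never cleared. Returns 1 if a stale skb was freed. */
static int toy_demo_stale_slot(bool out_of_sync)
{
	struct toy_tx_buff tx_buff[5] = { { NULL } };
	int free_map[5] = { 4, TOY_INVALID_MAP, TOY_INVALID_MAP, 0, 3 };
	int consumer_index = 3;
	bool stale;
	int i;

	if (out_of_sync)
		tx_buff[0].skb = malloc(16);
	stale = toy_tx_store(tx_buff, free_map, &consumer_index, 5, malloc(16));
	for (i = 0; i < 5; i++)
		free(tx_buff[i].skb);
	return stale ? 1 : 0;
}
```

      Without the added branch, the out-of-sync case would overwrite
      tx_buff[0].skb and lose the only reference to the old buffer.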
      Signed-off-by: Nick Child <nnac123@linux.ibm.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      0983d288
    • Jakub Kicinski's avatar
      Merge tag 'for-netdev' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 482000cf
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2024-06-24
      
      We've added 12 non-merge commits during the last 10 day(s) which contain
      a total of 10 files changed, 412 insertions(+), 16 deletions(-).
      
      The main changes are:
      
      1) Fix a BPF verifier issue validating may_goto with a negative offset,
         from Alexei Starovoitov.
      
      2) Fix a BPF verifier validation bug with may_goto combined with jump to
         the first instruction, also from Alexei Starovoitov.
      
      3) Fix a bug with overrunning reservations in BPF ring buffer,
         from Daniel Borkmann.
      
      4) Fix a bug in BPF verifier due to missing proper var_off setting related
         to movsx instruction, from Yonghong Song.
      
      5) Silence unnecessary syzkaller-triggered warning in __xdp_reg_mem_model(),
         from Daniil Dulov.
      
      * tag 'for-netdev' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        xdp: Remove WARN() from __xdp_reg_mem_model()
        selftests/bpf: Add tests for may_goto with negative offset.
        bpf: Fix may_goto with negative offset.
        selftests/bpf: Add more ring buffer test coverage
        bpf: Fix overrunning reservations in ringbuf
        selftests/bpf: Tests with may_goto and jumps to the 1st insn
        bpf: Fix the corner case with may_goto and jump to the 1st insn.
        bpf: Update BPF LSM maintainer list
        bpf: Fix remap of arena.
        selftests/bpf: Add a few tests to cover
        bpf: Add missed var_off setting in coerce_subreg_to_size_sx()
        bpf: Add missed var_off setting in set_sext32_default_val()
      ====================
      
      Link: https://patch.msgid.link/20240624124330.8401-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      482000cf
  4. 24 Jun, 2024 5 commits
  5. 23 Jun, 2024 8 commits
  6. 22 Jun, 2024 4 commits
  7. 21 Jun, 2024 13 commits
    • Daniel Borkmann's avatar
      bpf: Fix overrunning reservations in ringbuf · cfa1a232
      Daniel Borkmann authored
      The BPF ring buffer is internally implemented as a power-of-2 sized
      circular buffer with two logical, ever-increasing counters: consumer_pos
      is the consumer counter, showing up to which logical position the
      consumer has consumed the data, and producer_pos is the producer counter,
      denoting the amount of data reserved by all producers.
      
      Each time a record is reserved, the producer that "owns" the record
      successfully advances the producer counter. In user space, each time a
      record is read, the consumer of the data advances the consumer counter
      once it has finished processing. Both counters are stored in separate
      pages so that, from user space, the producer counter is read-only and the
      consumer counter is read-write.
      
      One aspect that simplifies, and thus speeds up, the implementation of
      both producers and consumers is that the data area is mapped twice
      contiguously back-to-back in virtual memory. No special measures are
      therefore needed for samples that wrap around at the end of the circular
      buffer data area: the page after the last data page is the first data
      page again, so the sample still appears completely contiguous in virtual
      memory.
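
      A small sketch of the position arithmetic this relies on, assuming the
      data area size is a power of two so that a logical position reduces to
      a physical offset with pos & mask (mask = size - 1). rb_phys_off() and
      rb_record_contiguous() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Physical offset of an ever-increasing logical position inside a data
 * area of mask + 1 bytes. */
static uint64_t rb_phys_off(uint64_t logical_pos, uint64_t mask)
{
	assert(((mask + 1) & mask) == 0);	/* size is a power of two */
	return logical_pos & mask;
}

/* With the data pages mapped twice back to back, the virtual window is
 * 2 * size bytes long. A record of len <= size bytes that starts anywhere
 * in the first copy therefore ends inside the window and needs no wrap
 * handling; it simply runs into the second copy of the same pages.
 * (The second condition always holds once len <= size; it is spelled
 * out to show why.) */
static bool rb_record_contiguous(uint64_t logical_pos, uint64_t len, uint64_t mask)
{
	uint64_t size = mask + 1;

	return len <= size && rb_phys_off(logical_pos, mask) + len <= 2 * size;
}
```

      For a 0x4000-byte ring (mask 0x3fff), logical position 0x4010 maps to
      offset 0x10, and a 0x20-byte record starting at 0x3ff0 ends at 0x4010 in
      the doubled mapping instead of wrapping.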
      
      Each record has a struct bpf_ringbuf_hdr { u32 len; u32 pg_off; } header
      for book-keeping the length and offset, and this header is inaccessible
      to the BPF program. Helpers like bpf_ringbuf_reserve() return
      `(void *)hdr + BPF_RINGBUF_HDR_SZ` for the BPF program to use.
      Bing-Jhong and Muhammad reported, however, that it is possible to make a
      second allocated memory chunk overlap with the first chunk, and as a
      result the BPF program is able to edit the first chunk's header.
      
      For example, consider the creation of a BPF_MAP_TYPE_RINGBUF map with a
      size of 0x4000. Next, the consumer_pos is modified to 0x3000 /before/ a
      call to bpf_ringbuf_reserve() is made. This will allocate a chunk A,
      which is in [0x0,0x3008], and the BPF program is able to edit
      [0x8,0x3008]. Now, let's allocate a chunk B with size 0x3000. This will
      succeed because consumer_pos was edited ahead of time to pass the
      `new_prod_pos - cons_pos > rb->mask` check. Chunk B will be in range
      [0x3008,0x6010], and the BPF program is able to edit [0x3010,0x6010].
      Due to the ring buffer memory layout mentioned earlier, the ranges
      [0x0,0x4000] and [0x4000,0x8000] point to the same data pages. This
      means that chunk B at [0x4000,0x4008] is chunk A's header.
      bpf_ringbuf_submit() / bpf_ringbuf_discard() use the header's pg_off to
      locate the bpf_ringbuf itself via bpf_ringbuf_restore_from_rec(). Once
      chunk B has modified chunk A's header, bpf_ringbuf_commit() refers to
      the wrong page, which could cause a crash.
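
      The aliasing in this example can be checked with the same mask
      arithmetic. The hypothetical helpers below (not kernel functions) find
      the first logical position in a range that maps onto the same physical
      byte as a given header position, treating ranges as half-open. With
      mask 0x3fff, chunk A's header at position 0x0 and chunk B's writable
      range [0x3010,0x6010), that position is 0x4000, inside B's range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Smallest logical position p >= start with (p & mask) == (hdr_pos & mask),
 * i.e. the first position at or after start that maps onto the same
 * physical byte as hdr_pos. The unsigned difference may wrap; that is
 * harmless because the ring size divides 2^64. */
static uint64_t rb_first_alias(uint64_t start, uint64_t hdr_pos, uint64_t mask)
{
	assert(((mask + 1) & mask) == 0);	/* size is a power of two */
	return start + ((hdr_pos - start) & mask);
}

/* True if the half-open logical range [start, end) contains a position
 * that aliases the first byte of the header at hdr_pos. */
static bool rb_range_hits_hdr(uint64_t start, uint64_t end,
			      uint64_t hdr_pos, uint64_t mask)
{
	return rb_first_alias(start, hdr_pos, mask) < end;
}
```

      Chunk A's own writable range [0x8,0x3008) does not reach its header
      again (the first alias is 0x4000), which is why the problem only shows
      up once a second chunk is allowed to extend that far.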
      
      Fix it by calculating the oldest pending_pos and checking whether the
      range from the oldest outstanding record to the newest would span beyond
      the ring buffer size. If that is the case, reject the request. We've
      tested with the ring buffer benchmark in BPF selftests
      (./benchs/run_bench_ringbufs.sh) before/after the fix, and while it seems
      a bit slower on some benchmarks, the difference is not significant
      enough to matter.
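
      A sketch of the resulting space check, assuming the position of the
      oldest outstanding record (pending_pos) is already known; how it is
      tracked is outside this sketch. rb_may_reserve(), rb_rec_len() and
      TOY_HDR_SZ are hypothetical names. The 8-byte header size follows from
      the two u32 fields of struct bpf_ringbuf_hdr; rounding record lengths
      up to 8 bytes is an assumption consistent with the sizes in the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_HDR_SZ 8	/* struct bpf_ringbuf_hdr: u32 len + u32 pg_off */

/* Bytes a record occupies in the ring: header plus data, rounded up to 8. */
static uint64_t rb_rec_len(uint64_t size)
{
	return (size + TOY_HDR_SZ + 7) & ~(uint64_t)7;
}

/* Space check for a reservation ending at new_prod_pos. All positions are
 * ever-increasing logical counters, so unsigned subtraction yields distances.
 * pend_pos is the position of the oldest record that is reserved but not yet
 * committed or discarded (equal to the producer position if there is none). */
static bool rb_may_reserve(uint64_t mask, uint64_t cons_pos,
			   uint64_t pend_pos, uint64_t new_prod_pos)
{
	assert(((mask + 1) & mask) == 0);	/* size is a power of two */

	/* existing check: stay at most size - 1 bytes ahead of the consumer */
	if (new_prod_pos - cons_pos > mask)
		return false;
	/* added check: oldest outstanding to newest record must also fit,
	 * so two live records can never alias the same bytes */
	if (new_prod_pos - pend_pos > mask)
		return false;
	return true;
}
```

      With the numbers from the example, chunk A (new producer position
      0x3008, nothing pending before position 0x0) passes both checks, while
      chunk B (new producer position 0x6010, chunk A still pending at 0x0)
      now fails the second check even though it still passes the consumer
      check.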
      
      Fixes: 457f4436 ("bpf: Implement BPF ring buffer and verifier support for it")
      Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
      Reported-by: Muhammad Ramdhan <ramdhan@starlabs.sg>
      Co-developed-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
      Co-developed-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20240621140828.18238-1-daniel@iogearbox.net
      cfa1a232
    • Alexei Starovoitov's avatar
      bpf: Fix the corner case with may_goto and jump to the 1st insn. · 5337ac4c
      Alexei Starovoitov authored
      When the following program is processed by the verifier:
      L1: may_goto L2
          goto L1
      L2: w0 = 0
          exit
      
      the may_goto insn is first converted to:
      L1: r11 = *(u64 *)(r10 -8)
          if r11 == 0x0 goto L2
          r11 -= 1
          *(u64 *)(r10 -8) = r11
          goto L1
      L2: w0 = 0
          exit
      
      then later as the last step the verifier inserts:
        *(u64 *)(r10 -8) = BPF_MAX_LOOPS
      as the first insn of the program to initialize the loop count.
      
      When the first insn happens to be a branch target of some jmp, the
      bpf_patch_insn_data() logic will produce:
      L1: *(u64 *)(r10 -8) = BPF_MAX_LOOPS
          r11 = *(u64 *)(r10 -8)
          if r11 == 0x0 goto L2
          r11 -= 1
          *(u64 *)(r10 -8) = r11
          goto L1
      L2: w0 = 0
          exit
      
      because instruction patching adjusts all jmps and calls, but for this
      particular corner case it's incorrect and the L1 label should be one
      instruction down, like:
          *(u64 *)(r10 -8) = BPF_MAX_LOOPS
      L1: r11 = *(u64 *)(r10 -8)
          if r11 == 0x0 goto L2
          r11 -= 1
          *(u64 *)(r10 -8) = r11
          goto L1
      L2: w0 = 0
          exit
      
      and that's what this patch is fixing.
      After the bpf_patch_insn_data() call, invoke adjust_jmp_off() to make
      all jmps that point to the newly inserted BPF_ST insn point to the insn
      after it instead.
      
      Note that bpf_patch_insn_data() cannot easily be changed to accommodate
      this logic, since jumps that point before or after a sequence of patched
      instructions have to be adjusted with the full length of the patch.
      
      Conceptually it's somewhat similar to an "insert" of instructions
      between other instructions, with unusual semantics. For example, an
      "insert" before the 1st insn would require adjusting CALL insns to point
      to the newly inserted 1st insn, but not adjusting JMP insns that point
      to the 1st insn, while still adjusting JMP insns that cross over the 1st
      insn (point to the insn before or after it); hence, use simple
      adjust_jmp_off() logic to fix this corner case. Ideally
      bpf_patch_insn_data() would carry auxiliary info saying where 'the start
      of the newly inserted patch is', but that would be too complex for a
      backport.
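
      A simplified model of the adjustment, using a hypothetical struct
      toy_insn that keeps only a jump flag and a BPF-style pc-relative offset
      (target = own index + 1 + off). toy_adjust_jmp_off() follows the idea
      described above, retargeting only the jumps that land exactly on the
      inserted instruction; the kernel's adjust_jmp_off() works on real
      struct bpf_insn encodings, and this model ignores those details. The
      demo encodes the patched program from the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified instruction. */
struct toy_insn {
	bool is_jmp;	/* jump with a pc-relative offset */
	int off;	/* target = own index + 1 + off, as in BPF */
};

/* Retarget every jump that lands exactly on tgt_idx by delta instructions,
 * leaving all other jumps untouched. */
static void toy_adjust_jmp_off(struct toy_insn *insns, size_t cnt,
			       int tgt_idx, int delta)
{
	size_t i;

	for (i = 0; i < cnt; i++) {
		if (!insns[i].is_jmp)
			continue;
		if ((int)i + 1 + insns[i].off != tgt_idx)
			continue;
		insns[i].off += delta;
	}
}

/* The patched program from the text, with "goto L1" wrongly landing on the
 * inserted BPF_ST. Returns the index "goto L1" targets after the fix. */
static int toy_demo_goto_l1_target(void)
{
	struct toy_insn prog[8] = {
		[0] = { false, 0 },	/* *(u64 *)(r10 -8) = BPF_MAX_LOOPS (inserted) */
		[1] = { false, 0 },	/* L1: r11 = *(u64 *)(r10 -8) */
		[2] = { true, 3 },	/* if r11 == 0x0 goto L2 (index 6) */
		[3] = { false, 0 },	/* r11 -= 1 */
		[4] = { false, 0 },	/* *(u64 *)(r10 -8) = r11 */
		[5] = { true, -6 },	/* goto L1, currently lands on index 0 */
		[6] = { false, 0 },	/* L2: w0 = 0 */
		[7] = { false, 0 },	/* exit */
	};

	toy_adjust_jmp_off(prog, 8, 0, 1);
	assert(2 + 1 + prog[2].off == 6);	/* jump to L2 is unchanged */
	return 5 + 1 + prog[5].off;
}
```

      After the adjustment, "goto L1" (index 5) lands on index 1, the load of
      r11, so the loop counter is initialized once instead of on every
      iteration, while the conditional jump to L2 is left alone.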
      
      Fixes: 011832b9 ("bpf: Introduce may_goto instruction")
      Reported-by: Zac Ecob <zacecob@protonmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Eduard Zingerman <eddyz87@gmail.com>
      Closes: https://lore.kernel.org/bpf/CAADnVQJ_WWx8w4b=6Gc2EpzAjgv+6A0ridnMz2TvS2egj4r3Gw@mail.gmail.com/
      Link: https://lore.kernel.org/bpf/20240619011859.79334-1-alexei.starovoitov@gmail.com
      5337ac4c
    • David S. Miller's avatar
      Merge branch 'mlxsw-fixes' · 8406b56a
      David S. Miller authored
      Petr Machata says:
      
      ====================
      mlxsw: Fixes
      
      This patchset fixes an issue with mlxsw driver initialization, and a
      memory corruption issue in shared buffer occupancy handling.
      
      v3:
      - Drop the core thermal fix; it's no longer relevant.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8406b56a
    • Ido Schimmel's avatar
      mlxsw: spectrum_buffers: Fix memory corruptions on Spectrum-4 systems · c28947de
      Ido Schimmel authored
      The following two shared buffer operations make use of the Shared Buffer
      Status Register (SBSR):
      
       # devlink sb occupancy snapshot pci/0000:01:00.0
       # devlink sb occupancy clearmax pci/0000:01:00.0
      
      The register has two masks of 256 bits to denote which ingress /
      egress ports the register should operate on. Spectrum-4 has more than
      256 ports, so the register was extended by the cited commit with a new
      'port_page' field.
      
      However, when filling the register's payload, the driver specifies the
      ports as absolute numbers and not relative to the first port of the port
      page, resulting in memory corruptions [1].
      
      Fix by specifying the ports relative to the first port of the port page.
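
      A sketch of the relative numbering, assuming port page p covers local
      ports p * 256 through p * 256 + 255 and that bit n of a mask lives in
      byte n / 8. struct toy_sbsr, toy_rel_port(), toy_sbsr_ingress_set() and
      toy_demo_mask_byte() are hypothetical and do not reflect the real SBSR
      payload layout or the mlxsw register helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PORTS_PER_PAGE 256			/* one 256-bit mask */
#define TOY_MASK_BYTES (TOY_PORTS_PER_PAGE / 8)

/* Hypothetical SBSR-like payload: one ingress port mask for a port page. */
struct toy_sbsr {
	unsigned int port_page;
	uint8_t ingress_mask[TOY_MASK_BYTES];
};

/* Bit index of local_port inside the mask of its page. */
static unsigned int toy_rel_port(unsigned int local_port, unsigned int port_page)
{
	unsigned int first = port_page * TOY_PORTS_PER_PAGE;

	assert(local_port >= first && local_port - first < TOY_PORTS_PER_PAGE);
	return local_port - first;
}

/* Set the bit for local_port. Using the absolute port number here instead
 * would index past the 32-byte mask for any port >= 256, the kind of
 * out-of-bounds write the fix removes. */
static void toy_sbsr_ingress_set(struct toy_sbsr *p, unsigned int local_port)
{
	unsigned int bit = toy_rel_port(local_port, p->port_page);

	p->ingress_mask[bit / 8] |= (uint8_t)(1u << (bit % 8));
}

/* Select local_port on its own page and return one byte of the mask. */
static uint8_t toy_demo_mask_byte(unsigned int local_port, unsigned int byte)
{
	struct toy_sbsr p;

	assert(byte < TOY_MASK_BYTES);
	memset(&p, 0, sizeof(p));
	p.port_page = local_port / TOY_PORTS_PER_PAGE;
	toy_sbsr_ingress_set(&p, local_port);
	return p.ingress_mask[byte];
}
```

      For local port 300 on page 1, the relative bit is 44 (byte 5, bit 4);
      the absolute number would address byte 37 of a 32-byte mask.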
      
      [1]
      BUG: KASAN: slab-use-after-free in mlxsw_sp_sb_occ_snapshot+0xb6d/0xbc0
      Read of size 1 at addr ffff8881068cb00f by task devlink/1566
      [...]
      Call Trace:
       <TASK>
       dump_stack_lvl+0xc6/0x120
       print_report+0xce/0x670
       kasan_report+0xd7/0x110
       mlxsw_sp_sb_occ_snapshot+0xb6d/0xbc0
       mlxsw_devlink_sb_occ_snapshot+0x75/0xb0
       devlink_nl_sb_occ_snapshot_doit+0x1f9/0x2a0
       genl_family_rcv_msg_doit+0x20c/0x300
       genl_rcv_msg+0x567/0x800
       netlink_rcv_skb+0x170/0x450
       genl_rcv+0x2d/0x40
       netlink_unicast+0x547/0x830
       netlink_sendmsg+0x8d4/0xdb0
       __sys_sendto+0x49b/0x510
       __x64_sys_sendto+0xe5/0x1c0
       do_syscall_64+0xc1/0x1d0
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      [...]
      Allocated by task 1:
       kasan_save_stack+0x33/0x60
       kasan_save_track+0x14/0x30
       __kasan_kmalloc+0x8f/0xa0
       copy_verifier_state+0xbc2/0xfb0
       do_check_common+0x2c51/0xc7e0
       bpf_check+0x5107/0x9960
       bpf_prog_load+0xf0e/0x2690
       __sys_bpf+0x1a61/0x49d0
       __x64_sys_bpf+0x7d/0xc0
       do_syscall_64+0xc1/0x1d0
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      Freed by task 1:
       kasan_save_stack+0x33/0x60
       kasan_save_track+0x14/0x30
       kasan_save_free_info+0x3b/0x60
       poison_slab_object+0x109/0x170
       __kasan_slab_free+0x14/0x30
       kfree+0xca/0x2b0
       free_verifier_state+0xce/0x270
       do_check_common+0x4828/0xc7e0
       bpf_check+0x5107/0x9960
       bpf_prog_load+0xf0e/0x2690
       __sys_bpf+0x1a61/0x49d0
       __x64_sys_bpf+0x7d/0xc0
       do_syscall_64+0xc1/0x1d0
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      Fixes: f8538aec ("mlxsw: Add support for more than 256 ports in SBSR register")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c28947de
    • Ido Schimmel's avatar
      mlxsw: pci: Fix driver initialization with Spectrum-4 · 0602697d
      Ido Schimmel authored
      The cited commit added support for a new reset flow ("all reset") which
      is deeper than the existing reset flow ("software reset") and allows the
      device's PCI firmware to be upgraded.
      
      In the new flow the driver first tells the firmware that "all reset" is
      required by issuing a new reset command (i.e., MRSR.command=6) and then
      triggers the reset by having the PCI core issue a secondary bus reset
      (SBR).
      
      However, due to a race condition in the device's firmware the device is
      not always able to recover from this reset, resulting in initialization
      failures [1].
      
      New firmware versions include a fix for the bug and advertise it using a
      new capability bit in the Management Capabilities Mask (MCAM) register.
      
      Avoid initialization failures by reading the new capability bit and
      triggering the new reset flow only if the bit is set. If the bit is not
      set, trigger a normal PCI hot reset by skipping the call to the
      Management Reset and Shutdown Register (MRSR).
      
      Normal PCI hot reset is weaker than "all reset", but it results in a
      fully operational driver and allows users to flash a new firmware, if
      they want to.
      
      [1]
      mlxsw_spectrum4 0000:01:00.0: not ready 1023ms after bus reset; waiting
      mlxsw_spectrum4 0000:01:00.0: not ready 2047ms after bus reset; waiting
      mlxsw_spectrum4 0000:01:00.0: not ready 4095ms after bus reset; waiting
      mlxsw_spectrum4 0000:01:00.0: not ready 8191ms after bus reset; waiting
      mlxsw_spectrum4 0000:01:00.0: not ready 16383ms after bus reset; waiting
      mlxsw_spectrum4 0000:01:00.0: not ready 32767ms after bus reset; waiting
      mlxsw_spectrum4 0000:01:00.0: not ready 65535ms after bus reset; giving up
      mlxsw_spectrum4 0000:01:00.0: PCI function reset failed with -25
      mlxsw_spectrum4 0000:01:00.0: cannot register bus device
      mlxsw_spectrum4: probe of 0000:01:00.0 failed with error -25
      
      Fixes: f257c73e ("mlxsw: pci: Add support for new reset flow")
      Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Tested-by: Maksym Yaremchuk <maksymy@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0602697d
    • Kuniyuki Iwashima's avatar
      selftest: af_unix: Add Kconfig file. · 11b006d6
      Kuniyuki Iwashima authored
      The diag_uid selftest failed on NIPA, where the received nlmsg_type is
      NLMSG_ERROR [0], because CONFIG_UNIX_DIAG is not set [1] by default and
      sock_diag_lock_handler() failed to load the module.
      
        # # Starting 2 tests from 2 test cases.
        # #  RUN           diag_uid.uid.1 ...
        # # diag_uid.c:159:1:Expected nlh->nlmsg_type (2) == SOCK_DIAG_BY_FAMILY (20)
        # # 1: Test terminated by assertion
        # #          FAIL  diag_uid.uid.1
        # not ok 1 diag_uid.uid.1
      
      Let's add all AF_UNIX Kconfig options to the config file under the
      af_unix directory so that NIPA picks them up.
      
      Fixes: ac011361 ("af_unix: Add test for sock_diag and UDIAG_SHOW_UID.")
      Link: https://netdev-3.bots.linux.dev/vmksft-net/results/644841/104-diag-uid/stdout [0]
      Link: https://netdev-3.bots.linux.dev/vmksft-net/results/644841/config [1]
      Reported-by: Jakub Kicinski <kuba@kernel.org>
      Closes: https://lore.kernel.org/netdev/20240617073033.0cbb829d@kernel.org/
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      11b006d6
    • Shannon Nelson's avatar
      net: remove drivers@pensando.io from MAINTAINERS · 2490785e
      Shannon Nelson authored
      Our corporate overlords have been changing the domains around
      again and this mailing list has gone away.
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2490785e
    • Eric Dumazet's avatar
      net: add softirq safety to netdev_rename_lock · 62e58ddb
      Eric Dumazet authored
      syzbot reported a lockdep violation involving the bridge driver [1].
      
      Make sure netdev_rename_lock is softirq safe to fix this issue.
      
      [1]
      WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
      6.10.0-rc2-syzkaller-00249-gbe27b896 #0 Not tainted
         -----------------------------------------------------
      syz-executor.2/9449 [HC0[0]:SC0[2]:HE0:SE0] is trying to acquire:
       ffffffff8f5de668 (netdev_rename_lock.seqcount){+.+.}-{0:0}, at: rtnl_fill_ifinfo+0x38e/0x2270 net/core/rtnetlink.c:1839
      
      and this task is already holding:
       ffff888060c64cb8 (&br->lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
       ffff888060c64cb8 (&br->lock){+.-.}-{2:2}, at: br_port_slave_changelink+0x3d/0x150 net/bridge/br_netlink.c:1212
      which would create a new lock dependency:
       (&br->lock){+.-.}-{2:2} -> (netdev_rename_lock.seqcount){+.+.}-{0:0}
      
      but this new dependency connects a SOFTIRQ-irq-safe lock:
       (&br->lock){+.-.}-{2:2}
      
      ... which became SOFTIRQ-irq-safe at:
         lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
         __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
         _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
         spin_lock include/linux/spinlock.h:351 [inline]
         br_forward_delay_timer_expired+0x50/0x440 net/bridge/br_stp_timer.c:86
         call_timer_fn+0x18e/0x650 kernel/time/timer.c:1792
         expire_timers kernel/time/timer.c:1843 [inline]
         __run_timers kernel/time/timer.c:2417 [inline]
         __run_timer_base+0x66a/0x8e0 kernel/time/timer.c:2428
         run_timer_base kernel/time/timer.c:2437 [inline]
         run_timer_softirq+0xb7/0x170 kernel/time/timer.c:2447
         handle_softirqs+0x2c4/0x970 kernel/softirq.c:554
         __do_softirq kernel/softirq.c:588 [inline]
         invoke_softirq kernel/softirq.c:428 [inline]
         __irq_exit_rcu+0xf4/0x1c0 kernel/softirq.c:637
         irq_exit_rcu+0x9/0x30 kernel/softirq.c:649
         instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
         sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043
         asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
         lock_acquire+0x264/0x550 kernel/locking/lockdep.c:5758
         fs_reclaim_acquire+0xaf/0x140 mm/page_alloc.c:3800
         might_alloc include/linux/sched/mm.h:334 [inline]
         slab_pre_alloc_hook mm/slub.c:3890 [inline]
         slab_alloc_node mm/slub.c:3980 [inline]
         kmalloc_trace_noprof+0x3d/0x2c0 mm/slub.c:4147
         kmalloc_noprof include/linux/slab.h:660 [inline]
         kzalloc_noprof include/linux/slab.h:778 [inline]
         class_dir_create_and_add drivers/base/core.c:3255 [inline]
         get_device_parent+0x2a7/0x410 drivers/base/core.c:3315
         device_add+0x325/0xbf0 drivers/base/core.c:3645
         netdev_register_kobject+0x17e/0x320 net/core/net-sysfs.c:2136
         register_netdevice+0x11d5/0x19e0 net/core/dev.c:10375
         nsim_init_netdevsim drivers/net/netdevsim/netdev.c:690 [inline]
         nsim_create+0x647/0x890 drivers/net/netdevsim/netdev.c:750
         __nsim_dev_port_add+0x6c0/0xae0 drivers/net/netdevsim/dev.c:1390
         nsim_dev_port_add_all drivers/net/netdevsim/dev.c:1446 [inline]
         nsim_dev_reload_create drivers/net/netdevsim/dev.c:1498 [inline]
         nsim_dev_reload_up+0x69b/0x8e0 drivers/net/netdevsim/dev.c:985
         devlink_reload+0x478/0x870 net/devlink/dev.c:474
         devlink_nl_reload_doit+0xbd6/0xe50 net/devlink/dev.c:586
         genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
         genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
         genl_rcv_msg+0xb14/0xec0 net/netlink/genetlink.c:1210
         netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2564
         genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
         netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
         netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
         netlink_sendmsg+0x8db/0xcb0 net/netlink/af_netlink.c:1905
         sock_sendmsg_nosec net/socket.c:730 [inline]
         __sock_sendmsg+0x221/0x270 net/socket.c:745
         ____sys_sendmsg+0x525/0x7d0 net/socket.c:2585
         ___sys_sendmsg net/socket.c:2639 [inline]
         __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2668
         do_syscall_x64 arch/x86/entry/common.c:52 [inline]
         do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
        entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      to a SOFTIRQ-irq-unsafe lock:
       (netdev_rename_lock.seqcount){+.+.}-{0:0}
      
      ... which became SOFTIRQ-irq-unsafe at:
      ...
         lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
         do_write_seqcount_begin_nested include/linux/seqlock.h:469 [inline]
         do_write_seqcount_begin include/linux/seqlock.h:495 [inline]
         write_seqlock include/linux/seqlock.h:823 [inline]
         dev_change_name+0x184/0x920 net/core/dev.c:1229
         do_setlink+0xa4b/0x41f0 net/core/rtnetlink.c:2880
         __rtnl_newlink net/core/rtnetlink.c:3696 [inline]
         rtnl_newlink+0x180b/0x20a0 net/core/rtnetlink.c:3743
         rtnetlink_rcv_msg+0x89b/0x1180 net/core/rtnetlink.c:6635
         netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2564
         netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
         netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
         netlink_sendmsg+0x8db/0xcb0 net/netlink/af_netlink.c:1905
         sock_sendmsg_nosec net/socket.c:730 [inline]
         __sock_sendmsg+0x221/0x270 net/socket.c:745
         __sys_sendto+0x3a4/0x4f0 net/socket.c:2192
         __do_sys_sendto net/socket.c:2204 [inline]
         __se_sys_sendto net/socket.c:2200 [inline]
         __x64_sys_sendto+0xde/0x100 net/socket.c:2200
         do_syscall_x64 arch/x86/entry/common.c:52 [inline]
         do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
        entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      other info that might help us debug this:
      
       Possible interrupt unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(netdev_rename_lock.seqcount);
                                     local_irq_disable();
                                     lock(&br->lock);
                                     lock(netdev_rename_lock.seqcount);
        <Interrupt>
          lock(&br->lock);
      
       *** DEADLOCK ***
      
      3 locks held by syz-executor.2/9449:
        #0: ffffffff8f5e7448 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
        #0: ffffffff8f5e7448 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x1180 net/core/rtnetlink.c:6632
        #1: ffff888060c64cb8 (&br->lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
        #1: ffff888060c64cb8 (&br->lock){+.-.}-{2:2}, at: br_port_slave_changelink+0x3d/0x150 net/bridge/br_netlink.c:1212
        #2: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline]
        #2: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:781 [inline]
        #2: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: team_change_rx_flags+0x29/0x330 drivers/net/team/team_core.c:1767
      
      the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
      -> (&br->lock){+.-.}-{2:2} {
         HARDIRQ-ON-W at:
                           lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
                           __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
                           _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
                           spin_lock_bh include/linux/spinlock.h:356 [inline]
                           br_add_if+0xb34/0xef0 net/bridge/br_if.c:682
                           do_set_master net/core/rtnetlink.c:2701 [inline]
                           do_setlink+0xe70/0x41f0 net/core/rtnetlink.c:2907
                           __rtnl_newlink net/core/rtnetlink.c:3696 [inline]
                           rtnl_newlink+0x180b/0x20a0 net/core/rtnetlink.c:3743
                           rtnetlink_rcv_msg+0x89b/0x1180 net/core/rtnetlink.c:6635
                           netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2564
                           netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
                           netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
                           netlink_sendmsg+0x8db/0xcb0 net/netlink/af_netlink.c:1905
                           sock_sendmsg_nosec net/socket.c:730 [inline]
                           __sock_sendmsg+0x221/0x270 net/socket.c:745
                           __sys_sendto+0x3a4/0x4f0 net/socket.c:2192
                           __do_sys_sendto net/socket.c:2204 [inline]
                           __se_sys_sendto net/socket.c:2200 [inline]
                           __x64_sys_sendto+0xde/0x100 net/socket.c:2200
                           do_syscall_x64 arch/x86/entry/common.c:52 [inline]
                           do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
                          entry_SYSCALL_64_after_hwframe+0x77/0x7f
         IN-SOFTIRQ-W at:
                           lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
                           __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
                           _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
                           spin_lock include/linux/spinlock.h:351 [inline]
                           br_forward_delay_timer_expired+0x50/0x440 net/bridge/br_stp_timer.c:86
                           call_timer_fn+0x18e/0x650 kernel/time/timer.c:1792
                           expire_timers kernel/time/timer.c:1843 [inline]
                           __run_timers kernel/time/timer.c:2417 [inline]
                           __run_timer_base+0x66a/0x8e0 kernel/time/timer.c:2428
                           run_timer_base kernel/time/timer.c:2437 [inline]
                           run_timer_softirq+0xb7/0x170 kernel/time/timer.c:2447
                           handle_softirqs+0x2c4/0x970 kernel/softirq.c:554
                           __do_softirq kernel/softirq.c:588 [inline]
                           invoke_softirq kernel/softirq.c:428 [inline]
                           __irq_exit_rcu+0xf4/0x1c0 kernel/softirq.c:637
                           irq_exit_rcu+0x9/0x30 kernel/softirq.c:649
                           instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
                           sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043
                           asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
                           lock_acquire+0x264/0x550 kernel/locking/lockdep.c:5758
                           fs_reclaim_acquire+0xaf/0x140 mm/page_alloc.c:3800
                           might_alloc include/linux/sched/mm.h:334 [inline]
                           slab_pre_alloc_hook mm/slub.c:3890 [inline]
                           slab_alloc_node mm/slub.c:3980 [inline]
                           kmalloc_trace_noprof+0x3d/0x2c0 mm/slub.c:4147
                           kmalloc_noprof include/linux/slab.h:660 [inline]
                           kzalloc_noprof include/linux/slab.h:778 [inline]
                           class_dir_create_and_add drivers/base/core.c:3255 [inline]
                           get_device_parent+0x2a7/0x410 drivers/base/core.c:3315
                           device_add+0x325/0xbf0 drivers/base/core.c:3645
                           netdev_register_kobject+0x17e/0x320 net/core/net-sysfs.c:2136
                           register_netdevice+0x11d5/0x19e0 net/core/dev.c:10375
                           nsim_init_netdevsim drivers/net/netdevsim/netdev.c:690 [inline]
                           nsim_create+0x647/0x890 drivers/net/netdevsim/netdev.c:750
                           __nsim_dev_port_add+0x6c0/0xae0 drivers/net/netdevsim/dev.c:1390
                           nsim_dev_port_add_all drivers/net/netdevsim/dev.c:1446 [inline]
                           nsim_dev_reload_create drivers/net/netdevsim/dev.c:1498 [inline]
                           nsim_dev_reload_up+0x69b/0x8e0 drivers/net/netdevsim/dev.c:985
                           devlink_reload+0x478/0x870 net/devlink/dev.c:474
                           devlink_nl_reload_doit+0xbd6/0xe50 net/devlink/dev.c:586
                           genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
                           genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
                           genl_rcv_msg+0xb14/0xec0 net/netlink/genetlink.c:1210
                           netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2564
                           genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
                           netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
                           netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
                           netlink_sendmsg+0x8db/0xcb0 net/netlink/af_netlink.c:1905
                           sock_sendmsg_nosec net/socket.c:730 [inline]
                           __sock_sendmsg+0x221/0x270 net/socket.c:745
                           ____sys_sendmsg+0x525/0x7d0 net/socket.c:2585
                           ___sys_sendmsg net/socket.c:2639 [inline]
                           __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2668
                           do_syscall_x64 arch/x86/entry/common.c:52 [inline]
                           do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
                          entry_SYSCALL_64_after_hwframe+0x77/0x7f
         INITIAL USE at:
                          lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
                          __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
                          _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
                          spin_lock_bh include/linux/spinlock.h:356 [inline]
                          br_add_if+0xb34/0xef0 net/bridge/br_if.c:682
                          do_set_master net/core/rtnetlink.c:2701 [inline]
                          do_setlink+0xe70/0x41f0 net/core/rtnetlink.c:2907
                          __rtnl_newlink net/core/rtnetlink.c:3696 [inline]
                          rtnl_newlink+0x180b/0x20a0 net/core/rtnetlink.c:3743
                          rtnetlink_rcv_msg+0x89b/0x1180 net/core/rtnetlink.c:6635
                          netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2564
                          netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
                          netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
                          netlink_sendmsg+0x8db/0xcb0 net/netlink/af_netlink.c:1905
                          sock_sendmsg_nosec net/socket.c:730 [inline]
                          __sock_sendmsg+0x221/0x270 net/socket.c:745
                          __sys_sendto+0x3a4/0x4f0 net/socket.c:2192
                          __do_sys_sendto net/socket.c:2204 [inline]
                          __se_sys_sendto net/socket.c:2200 [inline]
                          __x64_sys_sendto+0xde/0x100 net/socket.c:2200
                          do_syscall_x64 arch/x86/entry/common.c:52 [inline]
                          do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
                         entry_SYSCALL_64_after_hwframe+0x77/0x7f
       }
       ... key      at: [<ffffffff94b9a1a0>] br_dev_setup.__key+0x0/0x20
      
      the dependencies between the lock to be acquired
       and SOFTIRQ-irq-unsafe lock:
      -> (netdev_rename_lock.seqcount){+.+.}-{0:0} {
         HARDIRQ-ON-W at:
                           lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
                           do_write_seqcount_begin_nested include/linux/seqlock.h:469 [inline]
                           do_write_seqcount_begin include/linux/seqlock.h:495 [inline]
                           write_seqlock include/linux/seqlock.h:823 [inline]
                           dev_change_name+0x184/0x920 net/core/dev.c:1229
                           do_setlink+0xa4b/0x41f0 net/core/rtnetlink.c:2880
                           __rtnl_newlink net/core/rtnetlink.c:3696 [inline]
                           rtnl_newlink+0x180b/0x20a0 net/core/rtnetlink.c:3743
                           rtnetlink_rcv_msg+0x89b/0x1180 net/core/rtnetlink.c:6635
                           netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2564
                           netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
                           netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
                           netlink_sendmsg+0x8db/0xcb0 net/netlink/af_netlink.c:1905
                           sock_sendmsg_nosec net/socket.c:730 [inline]
                           __sock_sendmsg+0x221/0x270 net/socket.c:745
                           __sys_sendto+0x3a4/0x4f0 net/socket.c:2192
                           __do_sys_sendto net/socket.c:2204 [inline]
                           __se_sys_sendto net/socket.c:2200 [inline]
                           __x64_sys_sendto+0xde/0x100 net/socket.c:2200
                           do_syscall_x64 arch/x86/entry/common.c:52 [inline]
                           do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
                          entry_SYSCALL_64_after_hwframe+0x77/0x7f
         SOFTIRQ-ON-W at:
                           lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
                           do_write_seqcount_begin_nested include/linux/seqlock.h:469 [inline]
                           do_write_seqcount_begin include/linux/seqlock.h:495 [inline]
                           write_seqlock include/linux/seqlock.h:823 [inline]
                           dev_change_name+0x184/0x920 net/core/dev.c:1229
                           do_setlink+0xa4b/0x41f0 net/core/rtnetlink.c:2880
                           __rtnl_newlink net/core/rtnetlink.c:3696 [inline]
                           rtnl_newlink+0x180b/0x20a0 net/core/rtnetlink.c:3743
                           rtnetlink_rcv_msg+0x89b/0x1180 net/core/rtnetlink.c:6635
                           netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2564
                           netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
                           netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
                           netlink_sendmsg+0x8db/0xcb0 net/netlink/af_netlink.c:1905
                           sock_sendmsg_nosec net/socket.c:730 [inline]
                           __sock_sendmsg+0x221/0x270 net/socket.c:745
                           __sys_sendto+0x3a4/0x4f0 net/socket.c:2192
                           __do_sys_sendto net/socket.c:2204 [inline]
                           __se_sys_sendto net/socket.c:2200 [inline]
                           __x64_sys_sendto+0xde/0x100 net/socket.c:2200
                           do_syscall_x64 arch/x86/entry/common.c:52 [inline]
                           do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
                          entry_SYSCALL_64_after_hwframe+0x77/0x7f
         INITIAL USE at:
                          lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
                          do_write_seqcount_begin_nested include/linux/seqlock.h:469 [inline]
                          do_write_seqcount_begin include/linux/seqlock.h:495 [inline]
                          write_seqlock include/linux/seqlock.h:823 [inline]
                          dev_change_name+0x184/0x920 net/core/dev.c:1229
                          do_setlink+0xa4b/0x41f0 net/core/rtnetlink.c:2880
                          __rtnl_newlink net/core/rtnetlink.c:3696 [inline]
                          rtnl_newlink+0x180b/0x20a0 net/core/rtnetlink.c:3743
                          rtnetlink_rcv_msg+0x89b/0x1180 net/core/rtnetlink.c:6635
                          netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2564
                          netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
                          netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
                          netlink_sendmsg+0x8db/0xcb0 net/netlink/af_netlink.c:1905
                          sock_sendmsg_nosec net/socket.c:730 [inline]
                          __sock_sendmsg+0x221/0x270 net/socket.c:745
                          __sys_sendto+0x3a4/0x4f0 net/socket.c:2192
                          __do_sys_sendto net/socket.c:2204 [inline]
                          __se_sys_sendto net/socket.c:2200 [inline]
                          __x64_sys_sendto+0xde/0x100 net/socket.c:2200
                          do_syscall_x64 arch/x86/entry/common.c:52 [inline]
                          do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
                         entry_SYSCALL_64_after_hwframe+0x77/0x7f
         INITIAL READ USE at:
                               lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
                               seqcount_lockdep_reader_access include/linux/seqlock.h:72 [inline]
                               read_seqbegin include/linux/seqlock.h:772 [inline]
                               netdev_copy_name+0x168/0x2c0 net/core/dev.c:949
                               rtnl_fill_ifinfo+0x38e/0x2270 net/core/rtnetlink.c:1839
                               rtmsg_ifinfo_build_skb+0x18a/0x260 net/core/rtnetlink.c:4073
                               rtmsg_ifinfo_event net/core/rtnetlink.c:4107 [inline]
                               rtmsg_ifinfo+0x91/0x1b0 net/core/rtnetlink.c:4116
                               register_netdevice+0x1665/0x19e0 net/core/dev.c:10422
                               register_netdev+0x3b/0x50 net/core/dev.c:10512
                               loopback_net_init+0x73/0x150 drivers/net/loopback.c:217
                               ops_init+0x359/0x610 net/core/net_namespace.c:139
                               __register_pernet_operations net/core/net_namespace.c:1247 [inline]
                               register_pernet_operations+0x2cb/0x660 net/core/net_namespace.c:1320
                               register_pernet_device+0x33/0x80 net/core/net_namespace.c:1407
                               net_dev_init+0xfcd/0x10d0 net/core/dev.c:11956
                               do_one_initcall+0x248/0x880 init/main.c:1267
                               do_initcall_level+0x157/0x210 init/main.c:1329
                               do_initcalls+0x3f/0x80 init/main.c:1345
                               kernel_init_freeable+0x435/0x5d0 init/main.c:1578
                               kernel_init+0x1d/0x2b0 init/main.c:1467
                               ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
                               ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
       }
       ... key      at: [<ffffffff8f5de668>] netdev_rename_lock+0x8/0xa0
       ... acquired at:
          lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
          seqcount_lockdep_reader_access include/linux/seqlock.h:72 [inline]
          read_seqbegin include/linux/seqlock.h:772 [inline]
          netdev_copy_name+0x168/0x2c0 net/core/dev.c:949
          rtnl_fill_ifinfo+0x38e/0x2270 net/core/rtnetlink.c:1839
          rtmsg_ifinfo_build_skb+0x18a/0x260 net/core/rtnetlink.c:4073
          rtmsg_ifinfo_event net/core/rtnetlink.c:4107 [inline]
          rtmsg_ifinfo+0x91/0x1b0 net/core/rtnetlink.c:4116
          __dev_notify_flags+0xf7/0x400 net/core/dev.c:8816
          __dev_set_promiscuity+0x152/0x5a0 net/core/dev.c:8588
          dev_set_promiscuity+0x51/0xe0 net/core/dev.c:8608
          team_change_rx_flags+0x203/0x330 drivers/net/team/team_core.c:1771
          dev_change_rx_flags net/core/dev.c:8541 [inline]
          __dev_set_promiscuity+0x406/0x5a0 net/core/dev.c:8585
          dev_set_promiscuity+0x51/0xe0 net/core/dev.c:8608
          br_port_clear_promisc net/bridge/br_if.c:135 [inline]
          br_manage_promisc+0x505/0x590 net/bridge/br_if.c:172
          nbp_update_port_count net/bridge/br_if.c:242 [inline]
          br_port_flags_change+0x161/0x1f0 net/bridge/br_if.c:761
          br_setport+0xcb5/0x16d0 net/bridge/br_netlink.c:1000
          br_port_slave_changelink+0x135/0x150 net/bridge/br_netlink.c:1213
          __rtnl_newlink net/core/rtnetlink.c:3689 [inline]
          rtnl_newlink+0x169f/0x20a0 net/core/rtnetlink.c:3743
          rtnetlink_rcv_msg+0x89b/0x1180 net/core/rtnetlink.c:6635
          netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2564
          netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
          netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
          netlink_sendmsg+0x8db/0xcb0 net/netlink/af_netlink.c:1905
          sock_sendmsg_nosec net/socket.c:730 [inline]
          __sock_sendmsg+0x221/0x270 net/socket.c:745
          ____sys_sendmsg+0x525/0x7d0 net/socket.c:2585
          ___sys_sendmsg net/socket.c:2639 [inline]
          __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2668
          do_syscall_x64 arch/x86/entry/common.c:52 [inline]
          do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
         entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      stack backtrace:
      CPU: 0 PID: 9449 Comm: syz-executor.2 Not tainted 6.10.0-rc2-syzkaller-00249-gbe27b896 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
      Call Trace:
       <TASK>
        __dump_stack lib/dump_stack.c:88 [inline]
        dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
        print_bad_irq_dependency kernel/locking/lockdep.c:2626 [inline]
        check_irq_usage kernel/locking/lockdep.c:2865 [inline]
        check_prev_add kernel/locking/lockdep.c:3138 [inline]
        check_prevs_add kernel/locking/lockdep.c:3253 [inline]
        validate_chain+0x4de0/0x5900 kernel/locking/lockdep.c:3869
        __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
        lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
        seqcount_lockdep_reader_access include/linux/seqlock.h:72 [inline]
        read_seqbegin include/linux/seqlock.h:772 [inline]
        netdev_copy_name+0x168/0x2c0 net/core/dev.c:949
        rtnl_fill_ifinfo+0x38e/0x2270 net/core/rtnetlink.c:1839
        rtmsg_ifinfo_build_skb+0x18a/0x260 net/core/rtnetlink.c:4073
        rtmsg_ifinfo_event net/core/rtnetlink.c:4107 [inline]
        rtmsg_ifinfo+0x91/0x1b0 net/core/rtnetlink.c:4116
        __dev_notify_flags+0xf7/0x400 net/core/dev.c:8816
        __dev_set_promiscuity+0x152/0x5a0 net/core/dev.c:8588
        dev_set_promiscuity+0x51/0xe0 net/core/dev.c:8608
        team_change_rx_flags+0x203/0x330 drivers/net/team/team_core.c:1771
        dev_change_rx_flags net/core/dev.c:8541 [inline]
        __dev_set_promiscuity+0x406/0x5a0 net/core/dev.c:8585
        dev_set_promiscuity+0x51/0xe0 net/core/dev.c:8608
        br_port_clear_promisc net/bridge/br_if.c:135 [inline]
        br_manage_promisc+0x505/0x590 net/bridge/br_if.c:172
        nbp_update_port_count net/bridge/br_if.c:242 [inline]
        br_port_flags_change+0x161/0x1f0 net/bridge/br_if.c:761
        br_setport+0xcb5/0x16d0 net/bridge/br_netlink.c:1000
        br_port_slave_changelink+0x135/0x150 net/bridge/br_netlink.c:1213
        __rtnl_newlink net/core/rtnetlink.c:3689 [inline]
        rtnl_newlink+0x169f/0x20a0 net/core/rtnetlink.c:3743
        rtnetlink_rcv_msg+0x89b/0x1180 net/core/rtnetlink.c:6635
        netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2564
        netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
        netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
        netlink_sendmsg+0x8db/0xcb0 net/netlink/af_netlink.c:1905
        sock_sendmsg_nosec net/socket.c:730 [inline]
        __sock_sendmsg+0x221/0x270 net/socket.c:745
        ____sys_sendmsg+0x525/0x7d0 net/socket.c:2585
        ___sys_sendmsg net/socket.c:2639 [inline]
        __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2668
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      RIP: 0033:0x7f3b3047cf29
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f3b311740c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007f3b305b4050 RCX: 00007f3b3047cf29
      RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000008
      RBP: 00007f3b304ec074 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000000006e R14: 00007f3b305b4050 R15: 00007ffca2f3dc68
       </TASK>
      
      Fixes: 0840556e ("net: Protect dev->name by seqlock.")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
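      The report above reduces to one lockdep rule: a SOFTIRQ-safe lock class
      must not be held while acquiring a SOFTIRQ-unsafe one. Here
      netdev_rename_lock.seqcount is SOFTIRQ-unsafe because dev_change_name()
      write-locks it with softirqs enabled (the SOFTIRQ-ON-W entry), and it is
      read by netdev_copy_name() on the path shown under "acquired at". The lock
      class keyed at br_dev_setup.__key, which this excerpt shows being taken
      with spin_lock_bh() in br_add_if(), is assumed here to be the
      SOFTIRQ-safe side. A simplified sketch of only this single-edge check
      follows; struct lock_class and the helper functions are hypothetical,
      and real lockdep tracks many more usage states and whole dependency
      chains.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified lock class: only the two usage facts
 * needed for the SOFTIRQ-safe -> SOFTIRQ-unsafe rule are recorded. */
struct lock_class {
	const char *name;         /* e.g. a lockdep key or seqcount name */
	bool used_in_softirq;     /* ever acquired from softirq context */
	bool acquired_softirq_on; /* ever acquired with softirqs enabled */
};

/* SOFTIRQ-safe: the lock is taken in softirq context. */
static bool softirq_safe(const struct lock_class *l)
{
	return l->used_in_softirq;
}

/* SOFTIRQ-unsafe: the lock is taken somewhere with softirqs enabled. */
static bool softirq_unsafe(const struct lock_class *l)
{
	return l->acquired_softirq_on;
}

/* Taking 'next' while holding 'held' is flagged when 'held' is
 * SOFTIRQ-safe and 'next' is SOFTIRQ-unsafe: a softirq can interrupt a
 * 'next' holder on one CPU and spin on 'held', while another CPU holds
 * 'held' and waits for 'next', which deadlocks. */
static bool bad_softirq_dependency(const struct lock_class *held,
				   const struct lock_class *next)
{
	return softirq_safe(held) && softirq_unsafe(next);
}
```

      In this model, making every writer of the unsafe lock acquire it with
      softirqs disabled clears acquired_softirq_on, and the dependency is no
      longer flagged.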
    • ionic: fix kernel panic due to multi-buffer handling · e3f02f32
      Taehee Yoo authored
      ionic_run_xdp() currently does not handle multi-buffer packets properly
      for XDP_TX and XDP_REDIRECT.
      When a jumbo frame is received, ionic_run_xdp() first builds an xdp
      frame from all the necessary pages in the rx descriptor.
      If the action is XDP_TX or XDP_REDIRECT, it should then unmap the DMA
      mapping and reset the page pointer to NULL for every page, not only the
      first one.
      It does not do this for the SG pages, so they are unexpectedly reused,
      which eventually causes a kernel panic.
      
      Oops: general protection fault, probably for non-canonical address 0x504f4e4dbebc64ff: 0000 [#1] PREEMPT SMP NOPTI
      CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.10.0-rc3+ #25
      RIP: 0010:xdp_return_frame+0x42/0x90
      Code: 01 75 12 5b 4c 89 e6 5d 31 c9 41 5c 31 d2 41 5d e9 73 fd ff ff 44 8b 6b 20 0f b7 43 0a 49 81 ed 68 01 00 00 49 29 c5 49 01 fd <41> 80 7d0
      RSP: 0018:ffff99d00122ce08 EFLAGS: 00010202
      RAX: 0000000000005453 RBX: ffff8d325f904000 RCX: 0000000000000001
      RDX: 00000000670e1000 RSI: 000000011f90d000 RDI: 504f4e4d4c4b4a49
      RBP: ffff99d003907740 R08: 0000000000000000 R09: 0000000000000000
      R10: 000000011f90d000 R11: 0000000000000000 R12: ffff8d325f904010
      R13: 504f4e4dbebc64fd R14: ffff8d3242b070c8 R15: ffff99d0039077c0
      FS:  0000000000000000(0000) GS:ffff8d399f780000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f41f6c85e38 CR3: 000000037ac30000 CR4: 00000000007506f0
      PKRU: 55555554
      Call Trace:
       <IRQ>
       ? die_addr+0x33/0x90
       ? exc_general_protection+0x251/0x2f0
       ? asm_exc_general_protection+0x22/0x30
       ? xdp_return_frame+0x42/0x90
       ionic_tx_clean+0x211/0x280 [ionic 15881354510e6a9c655c59c54812b319ed2cd015]
       ionic_tx_cq_service+0xd3/0x210 [ionic 15881354510e6a9c655c59c54812b319ed2cd015]
       ionic_txrx_napi+0x41/0x1b0 [ionic 15881354510e6a9c655c59c54812b319ed2cd015]
       __napi_poll.constprop.0+0x29/0x1b0
       net_rx_action+0x2c4/0x350
       handle_softirqs+0xf4/0x320
       irq_exit_rcu+0x78/0xa0
       common_interrupt+0x77/0x90
      
      Fixes: 5377805d ("ionic: implement xdp frags support")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
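      The ownership rule described above can be sketched in plain C. This is
      a minimal illustration under stated assumptions: struct rx_buf,
      MAX_BUFS, fake_dma_unmap() and the release functions are hypothetical
      stand-ins rather than ionic's real types or helpers, and the DMA unmap
      is modelled by a counter. It shows why releasing only the head buffer
      leaves the SG buffers pointing at pages the in-flight XDP frame now owns.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BUFS 4 /* hypothetical: head buffer plus up to 3 SG buffers */

struct rx_buf {
	void *page;     /* NULL once ownership has moved to the XDP frame */
	int dma_mapped; /* stands in for a live DMA mapping */
};

static int unmap_calls; /* number of modelled DMA unmaps */

static void fake_dma_unmap(struct rx_buf *buf)
{
	buf->dma_mapped = 0;
	unmap_calls++;
}

/* Buggy pattern: only the first buffer is released, so buffers 1..n-1
 * still point at pages that now belong to the frame and may be reused. */
static void release_bufs_head_only(struct rx_buf *bufs, size_t nbufs)
{
	assert(nbufs >= 1 && nbufs <= MAX_BUFS);
	fake_dma_unmap(&bufs[0]);
	bufs[0].page = NULL;
}

/* Fixed pattern: every buffer that contributed a fragment is released. */
static void release_bufs_all(struct rx_buf *bufs, size_t nbufs)
{
	assert(nbufs >= 1 && nbufs <= MAX_BUFS);
	for (size_t i = 0; i < nbufs; i++) {
		fake_dma_unmap(&bufs[i]);
		bufs[i].page = NULL;
	}
}

/* Counts buffers that still claim a page the driver no longer owns. */
static size_t stale_pages(const struct rx_buf *bufs, size_t nbufs)
{
	size_t n = 0;

	for (size_t i = 0; i < nbufs; i++)
		if (bufs[i].page != NULL)
			n++;
	return n;
}
```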
    • net: pse-pd: Kconfig: Fix missing firmware loader config select · 7eadf500
      Kory Maincent authored
      Selecting FW_UPLOAD alone is not sufficient, as it allows the firmware
      loader API to be built as a module while the pd692x0 driver is built in.
      Select FW_LOADER as well to fix this issue.
      Reported-by: kernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202406200632.hSChnX0g-lkp@intel.com/
      Fixes: 9a993845 ("net: pse-pd: Add PD692x0 PSE controller driver")
      Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: fix incorrect software timestamping report · a95b031c
      Hangbin Liu authored
      The __ethtool_get_ts_info() function returns directly if the device has a
      get_ts_info() method. For a bond with an active slave, this works
      correctly, as we simply return the real device's timestamping
      information. However, when there is no active slave, we only check the
      slave's TX software timestamp information. We still need to set the PHC
      index and RX timestamp information manually; otherwise, the result looks
      like this:
      
        Time stamping parameters for bond0:
        Capabilities:
                software-transmit
        PTP Hardware Clock: 0
        Hardware Transmit Timestamp Modes: none
        Hardware Receive Filter Modes: none
      
      This issue does not affect VLAN or MACVLAN devices, as they only have one
      downlink and can directly use the downlink's timestamping information.
      
      Fixes: b8768dc4 ("net: ethtool: Refactor identical get_ts_info implementations.")
      Reported-by: Liang Li <liali@redhat.com>
      Closes: https://issues.redhat.com/browse/RHEL-42409
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Acked-by: Kory Maincent <kory.maincent@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
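      The missing step can be pictured with a user-space sketch that assumes a
      simplified stand-in for the kernel's timestamping-info structure:
      struct demo_ts_info, the DEMO_TS_* flags and both fill functions are
      hypothetical and only mirror the meaning of the software
      SOF_TIMESTAMPING_* bits. It contrasts reporting just the TX software bit
      (leaving a zeroed phc_index that reads as "PTP Hardware Clock: 0") with
      also advertising RX software timestamps and no PHC (phc_index = -1).

```c
#include <assert.h>

/* Hypothetical stand-ins for the software timestamping capability bits. */
#define DEMO_TS_TX_SOFTWARE (1u << 1)
#define DEMO_TS_RX_SOFTWARE (1u << 3)
#define DEMO_TS_SOFTWARE    (1u << 4)

struct demo_ts_info {
	unsigned int so_timestamping; /* capability bitmask */
	int phc_index;                /* -1 means "no PTP hardware clock" */
};

/* Incomplete answer: only the TX software bit is set, so a zeroed
 * phc_index is reported as if PHC 0 existed. */
static void fill_no_active_slave_incomplete(struct demo_ts_info *info)
{
	info->so_timestamping |= DEMO_TS_TX_SOFTWARE;
}

/* Complete answer: also advertise RX software timestamps and no PHC. */
static void fill_no_active_slave(struct demo_ts_info *info)
{
	info->so_timestamping |= DEMO_TS_TX_SOFTWARE | DEMO_TS_RX_SOFTWARE |
				 DEMO_TS_SOFTWARE;
	info->phc_index = -1;
}
```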
    • net: mvpp2: fill-in dev_port attribute · 00418d55
      Aryan Srivastava authored
      Fill in the dev_port attribute so that user space can identify multiple
      ports on the same CP unit.
      Signed-off-by: Aryan Srivastava <aryan.srivastava@alliedtelesis.co.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
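      From user space, the attribute is exposed as
      /sys/class/net/<iface>/dev_port. A minimal reader sketch follows;
      read_dev_port() and read_iface_dev_port() are hypothetical helpers, and
      the first takes a full path so it can be exercised against any file.

```c
#include <assert.h>
#include <stdio.h>

/* Reads a decimal dev_port value such as the one exposed at
 * /sys/class/net/<iface>/dev_port. Returns -1 on any error. */
static long read_dev_port(const char *path)
{
	FILE *f = fopen(path, "r");
	long port;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &port) != 1)
		port = -1;
	fclose(f);
	return port;
}

/* Builds the sysfs path for an interface name and reads its dev_port. */
static long read_iface_dev_port(const char *ifname)
{
	char path[256];
	int n = snprintf(path, sizeof(path), "/sys/class/net/%s/dev_port",
			 ifname);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	return read_dev_port(path);
}
```

      Ports that share a parent device but report different dev_port values
      can then be told apart without relying on interface naming.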