1. 09 Oct, 2024 5 commits
  2. 08 Oct, 2024 18 commits
  3. 07 Oct, 2024 4 commits
  4. 04 Oct, 2024 13 commits
    • net: phy: bcm84881: Fix some error handling paths · 9234a254
      Christophe JAILLET authored
      If phy_read_mmd() fails, the error code stored in 'bmsr' should be returned
      instead of 'val' which is likely to be 0.
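
The pattern described above can be sketched in plain userspace C. This is an illustration under assumptions, not the driver's code: `mock_read_mmd()` is a hypothetical stand-in for phy_read_mmd(), and the register numbers 1 and 2 are made up. It only shows why returning 'val' after a failed second read hides the error.

```c
#include <errno.h>

/* Hypothetical stand-in for phy_read_mmd(): returns the register value,
 * or a negative errno on failure. Register 2 is made to fail here. */
static int mock_read_mmd(int reg)
{
	if (reg == 2)
		return -EIO;
	return 0; /* a successful read that happens to return 0 */
}

/* Buggy shape: when the second read fails, 'val' is returned. 'val' holds
 * the successful (likely 0) result of the first read, so the error is lost. */
static int aneg_done_buggy(void)
{
	int bmsr, val;

	val = mock_read_mmd(1);
	if (val < 0)
		return val;

	bmsr = mock_read_mmd(2);
	if (bmsr < 0)
		return val; /* wrong variable */

	return 1;
}

/* Fixed shape: propagate the error code that was stored in 'bmsr'. */
static int aneg_done_fixed(void)
{
	int bmsr, val;

	val = mock_read_mmd(1);
	if (val < 0)
		return val;

	bmsr = mock_read_mmd(2);
	if (bmsr < 0)
		return bmsr;

	return 1;
}
```

With this mock, the buggy variant reports 0 to its caller even though the second read failed, while the fixed variant reports the negative error code.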
      
      Fixes: 75f4d8d1 ("net: phy: add Broadcom BCM84881 PHY driver")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Link: https://patch.msgid.link/3e1755b0c40340d00e089d6adae5bca2f8c79e53.1727982168.git.christophe.jaillet@wanadoo.fr
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: Fix an unsafe loop on the list · 1dae9f11
      Anastasia Kovaleva authored
      The kernel may crash when deleting a genetlink family if there are still
      listeners for that family:
      
      Oops: Kernel access of bad area, sig: 11 [#1]
        ...
        NIP [c000000000c080bc] netlink_update_socket_mc+0x3c/0xc0
        LR [c000000000c0f764] __netlink_clear_multicast_users+0x74/0xc0
        Call Trace:
          __netlink_clear_multicast_users+0x74/0xc0
          genl_unregister_family+0xd4/0x2d0
      
      Change the unsafe loop on the list to a safe one, because inside the
      loop there is an element removal from this list.
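
As a minimal sketch of why removal inside a loop needs a "safe" traversal, here is a userspace singly linked list. The kernel fix relies on the kernel's own list iterators; the `node`, `push`, `remove_group_safe`, and `count` names below are hypothetical and exist only for this illustration. The key step is saving the successor before the current element can be freed.

```c
#include <stdlib.h>

struct node {
	int group;
	struct node *next;
};

/* Push a new node at the head of the list and return the new head. */
static struct node *push(struct node *head, int group)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return head; /* allocation failed: leave the list unchanged */
	n->group = group;
	n->next = head;
	return n;
}

/* Safe traversal: 'next' is read before 'cur' may be unlinked and freed,
 * so the loop never dereferences a freed element. An unsafe loop would
 * advance with cur = cur->next after the free. */
static struct node *remove_group_safe(struct node *head, int group)
{
	struct node **link = &head;
	struct node *cur, *next;

	for (cur = head; cur; cur = next) {
		next = cur->next;
		if (cur->group == group) {
			*link = next;
			free(cur);
		} else {
			link = &cur->next;
		}
	}
	return head;
}

/* Count the elements that remain on the list. */
static int count(const struct node *head)
{
	int n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```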
      
      Fixes: b8273570 ("genetlink: fix netns vs. netlink table locking (2)")
      Cc: stable@vger.kernel.org
      Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com>
      Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://patch.msgid.link/20241003104431.12391-1-a.kovaleva@yadro.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Bluetooth: btusb: Don't fail external suspend requests · 61071229
      Luiz Augusto von Dentz authored
      Commit 4e0a1d8b
      ("Bluetooth: btusb: Don't suspend when there are connections")
      introduced a check for connections to prevent auto-suspend, but it
      ignored the fact that the .suspend callback can also be called for
      external suspend requests, about which
      Documentation/driver-api/usb/power-management.rst states the following:
      
       'External suspend calls should never be allowed to fail in this way,
       only autosuspend calls.  The driver can tell them apart by applying
       the :c:func:`PMSG_IS_AUTO` macro to the message argument to the
       ``suspend`` method; it will return True for internal PM events
       (autosuspend) and False for external PM events.'
      
      In addition, align system suspend with USB suspend by using
      hci_suspend_dev, since otherwise the stack would expect events
      such as advertising reports, which may not be delivered while the
      transport is suspended.
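
The rule quoted from the USB power-management documentation can be sketched as follows. This is a userspace mock, not the btusb code: `pm_message_t`, `PM_EVENT_AUTO`, and `PMSG_IS_AUTO` are redefined locally so the example compiles on its own, and `sketch_suspend()` with its `has_connections` parameter is a hypothetical handler.

```c
#include <errno.h>
#include <stdbool.h>

/* Local mock of the kernel definitions, for illustration only. */
#define PM_EVENT_SUSPEND 0x0002
#define PM_EVENT_AUTO    0x0400

typedef struct {
	int event;
} pm_message_t;

#define PMSG_IS_AUTO(msg) (((msg).event & PM_EVENT_AUTO) != 0)

/* Hypothetical suspend handler: only an autosuspend request may be
 * refused because connections exist. An external suspend request is
 * never failed for that reason. */
static int sketch_suspend(pm_message_t message, bool has_connections)
{
	if (PMSG_IS_AUTO(message) && has_connections)
		return -EBUSY;

	return 0;
}
```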
      
      Fixes: 4e0a1d8b ("Bluetooth: btusb: Don't suspend when there are connections")
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Tested-by: Kiran K <kiran.k@intel.com>
    • Bluetooth: hci_conn: Fix UAF in hci_enhanced_setup_sync · 18fd04ad
      Luiz Augusto von Dentz authored
      This checks if the ACL connection remains valid as it could be destroyed
      while hci_enhanced_setup_sync is pending on cmd_sync leading to the
      following trace:
      
      BUG: KASAN: slab-use-after-free in hci_enhanced_setup_sync+0x91b/0xa60
      Read of size 1 at addr ffff888002328ffd by task kworker/u5:2/37
      
      CPU: 0 UID: 0 PID: 37 Comm: kworker/u5:2 Not tainted 6.11.0-rc6-01300-g810be445d8d6 #7099
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
      Workqueue: hci0 hci_cmd_sync_work
      Call Trace:
       <TASK>
       dump_stack_lvl+0x5d/0x80
       ? hci_enhanced_setup_sync+0x91b/0xa60
       print_report+0x152/0x4c0
       ? hci_enhanced_setup_sync+0x91b/0xa60
       ? __virt_addr_valid+0x1fa/0x420
       ? hci_enhanced_setup_sync+0x91b/0xa60
       kasan_report+0xda/0x1b0
       ? hci_enhanced_setup_sync+0x91b/0xa60
       hci_enhanced_setup_sync+0x91b/0xa60
       ? __pfx_hci_enhanced_setup_sync+0x10/0x10
       ? __pfx___mutex_lock+0x10/0x10
       hci_cmd_sync_work+0x1c2/0x330
       process_one_work+0x7d9/0x1360
       ? __pfx_lock_acquire+0x10/0x10
       ? __pfx_process_one_work+0x10/0x10
       ? assign_work+0x167/0x240
       worker_thread+0x5b7/0xf60
       ? __kthread_parkme+0xac/0x1c0
       ? __pfx_worker_thread+0x10/0x10
       ? __pfx_worker_thread+0x10/0x10
       kthread+0x293/0x360
       ? __pfx_kthread+0x10/0x10
       ret_from_fork+0x2f/0x70
       ? __pfx_kthread+0x10/0x10
       ret_from_fork_asm+0x1a/0x30
       </TASK>
      
      Allocated by task 34:
       kasan_save_stack+0x30/0x50
       kasan_save_track+0x14/0x30
       __kasan_kmalloc+0x8f/0xa0
       __hci_conn_add+0x187/0x17d0
       hci_connect_sco+0x2e1/0xb90
       sco_sock_connect+0x2a2/0xb80
       __sys_connect+0x227/0x2a0
       __x64_sys_connect+0x6d/0xb0
       do_syscall_64+0x71/0x140
       entry_SYSCALL_64_after_hwframe+0x76/0x7e
      
      Freed by task 37:
       kasan_save_stack+0x30/0x50
       kasan_save_track+0x14/0x30
       kasan_save_free_info+0x3b/0x60
       __kasan_slab_free+0x101/0x160
       kfree+0xd0/0x250
       device_release+0x9a/0x210
       kobject_put+0x151/0x280
       hci_conn_del+0x448/0xbf0
       hci_abort_conn_sync+0x46f/0x980
       hci_cmd_sync_work+0x1c2/0x330
       process_one_work+0x7d9/0x1360
       worker_thread+0x5b7/0xf60
       kthread+0x293/0x360
       ret_from_fork+0x2f/0x70
       ret_from_fork_asm+0x1a/0x30
      
      Cc: stable@vger.kernel.org
      Fixes: e07a06b4 ("Bluetooth: Convert SCO configure_datapath to hci_sync")
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    • Bluetooth: RFCOMM: FIX possible deadlock in rfcomm_sk_state_change · 08d19142
      Luiz Augusto von Dentz authored
      rfcomm_sk_state_change attempts to use sock_lock, so it must never be
      called with it locked, but rfcomm_sock_ioctl always attempts to lock it,
      causing the following trace:
      
      ======================================================
      WARNING: possible circular locking dependency detected
      6.8.0-syzkaller-08951-gfe46a7dd #0 Not tainted
      ------------------------------------------------------
      syz-executor386/5093 is trying to acquire lock:
      ffff88807c396258 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1671 [inline]
      ffff88807c396258 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}, at: rfcomm_sk_state_change+0x5b/0x310 net/bluetooth/rfcomm/sock.c:73
      
      but task is already holding lock:
      ffff88807badfd28 (&d->lock){+.+.}-{3:3}, at: __rfcomm_dlc_close+0x226/0x6a0 net/bluetooth/rfcomm/core.c:491
      
      Reported-by: syzbot+d7ce59b06b3eb14fd218@syzkaller.appspotmail.com
      Tested-by: syzbot+d7ce59b06b3eb14fd218@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=d7ce59b06b3eb14fd218
      Fixes: 3241ad82 ("[Bluetooth] Add timestamp support to L2CAP, RFCOMM and SCO")
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    • net: pse-pd: Fix enabled status mismatch · dda3529d
      Kory Maincent authored
      PSE controllers like the TPS23881 can forcefully turn off their
      configuration state. In such cases, the is_enabled() and get_status()
      callbacks will report the PSE as disabled, while admin_state_enabled
      will show it as enabled. This mismatch can lead the user to attempt
      to enable it, but no action is taken as admin_state_enabled remains set.
      
      The solution is to disable the PSE before enabling it, ensuring the
      actual status matches admin_state_enabled.
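
The mismatch and the "disable before enabling" idea can be modelled with a small userspace mock. Everything here is an assumption made for illustration: `struct mock_pse`, `enable_old()`, `disable()`, and `enable_fixed()` are hypothetical and do not reproduce the PSE or regulator framework code.

```c
#include <stdbool.h>

struct mock_pse {
	bool admin_state_enabled; /* what the framework believes */
	bool hw_enabled;          /* what the controller is actually doing */
};

/* Old behaviour in this model: nothing happens if the admin state is
 * already set, even when the hardware has turned itself off. */
static void enable_old(struct mock_pse *p)
{
	if (p->admin_state_enabled)
		return;

	p->hw_enabled = true;
	p->admin_state_enabled = true;
}

/* Disabling clears both the hardware state and the bookkeeping. */
static void disable(struct mock_pse *p)
{
	p->hw_enabled = false;
	p->admin_state_enabled = false;
}

/* Fixed behaviour in this model: disable first so the bookkeeping is
 * reset, then enable, so both states end up consistent. */
static void enable_fixed(struct mock_pse *p)
{
	disable(p);
	enable_old(p);
}
```

In this model, a controller that forced itself off leaves `admin_state_enabled` set, so `enable_old()` is a no-op, while `enable_fixed()` brings the hardware back up.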
      
      Fixes: d83e1376 ("net: pse-pd: Use regulator framework within PSE framework")
      Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://patch.msgid.link/20241002121706.246143-1-kory.maincent@bootlin.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: net: no_forwarding: fix VID for $swp2 in one_bridge_two_pvids() test · 9f49d14e
      Kacper Ludwinski authored
      Currently, the second bridge command overwrites the first one.
      Fix this by adding this VID to the interface behind $swp2.
      
      The one_bridge_two_pvids() test intends to check that there is no
      leakage of traffic between bridge ports which have a single VLAN - the
      PVID VLAN.
      
      Because of a typo, port $swp1 is configured with a PVID twice (second
      command overwrites first), and $swp2 isn't configured at all (and since
      the bridge vlan_default_pvid property is set to 0, this port will not
      have a PVID at all, so it will drop all untagged and priority-tagged
      traffic).
      
      So, instead of testing the configuration that was intended, we are
      testing a different one, where one port has PVID 2 and the other has
      no PVID. This incorrect version of the test should also pass, but is
      ineffective for its purpose, so fix the typo.
      
      This typo has an impact on results of the test,
      potentially leading to wrong conclusions regarding
      the functionality of a network device.
      
      The test results:
      
      TEST: Switch ports in VLAN-aware bridge with different PVIDs:
      	Unicast non-IP untagged   [ OK ]
      	Multicast non-IP untagged   [ OK ]
      	Broadcast non-IP untagged   [ OK ]
      	Unicast IPv4 untagged   [ OK ]
      	Multicast IPv4 untagged   [ OK ]
      	Unicast IPv6 untagged   [ OK ]
      	Multicast IPv6 untagged   [ OK ]
      	Unicast non-IP VID 1   [ OK ]
      	Multicast non-IP VID 1   [ OK ]
      	Broadcast non-IP VID 1   [ OK ]
      	Unicast IPv4 VID 1   [ OK ]
      	Multicast IPv4 VID 1   [ OK ]
      	Unicast IPv6 VID 1   [ OK ]
      	Multicast IPv6 VID 1   [ OK ]
      	Unicast non-IP VID 4094   [ OK ]
      	Multicast non-IP VID 4094   [ OK ]
      	Broadcast non-IP VID 4094   [ OK ]
      	Unicast IPv4 VID 4094   [ OK ]
      	Multicast IPv4 VID 4094   [ OK ]
      	Unicast IPv6 VID 4094   [ OK ]
      	Multicast IPv6 VID 4094   [ OK ]
      
      Fixes: 476a4f05 ("selftests: forwarding: add a no_forwarding.sh test")
      Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
      Signed-off-by: Kacper Ludwinski <kac.ludwinski@icloud.com>
      Link: https://patch.msgid.link/20241002051016.849-1-kac.ludwinski@icloud.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'ibmvnic-fix-for-send-scrq-direct' · 500257db
      Jakub Kicinski authored
      Nick Child says:
      
      ====================
      ibmvnic: Fix for send scrq direct
      
      This is a v2 of a patchset (now just patch) which addresses a
      bug in a new feature which is causing major link UP issues with
      certain physical cards.
      
      For a full summary of the issue:
        1. During vnic initialization we get the following values from vnic
           server regarding "Transmit / Receive Descriptor Requirement" (see
            PAPR Table 584. CAPABILITIES Commands):
          - LSO Tx frame = 0x0F , header offsets + L2, L3, L4 headers required
          - CSO Tx frame = 0x0C , header offsets + L2 header required
          - standard frame = 0x0C , header offsets + L2 header required
        2. Assume we are dealing with only "standard frames" from now on (no
           CSO, no LSO)
        3. When using 100G backing device, we don't hand vnic server any header
           information and TX is successful
        4. When using 25G backing device, we don't hand vnic server any header
           information and TX fails and we get "Adapter Error" transport events.
      The obvious issue here is that vnic client should be respecting the 0x0C
      header requirement for standard frames.  But 100G cards will also give
      0x0C despite the fact that we know TX works if we ignore it. That being
      said, we still must respect values given from the managing server. Will
      need to work with them going forward to hopefully get 100G cards to
      return 0x00 for this bitstring so the performance gains of using
      send_subcrq_direct can be continued.
      ====================
      
      Link: https://patch.msgid.link/20241001163200.1802522-1-nnac123@linux.ibm.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ibmvnic: Inspect header requirements before using scrq direct · de390657
      Nick Child authored
      Previously, the TX header requirement for standard frames was ignored.
      This requirement is a bitstring sent from the VIOS which maps to the
      type of header information needed during TX. If no header information
      is needed, then send subcrq direct can be used (which can be more
      performant).
      
      This bitstring was previously ignored for standard packets (AKA non LSO,
      non CSO) due to the belief that the bitstring was over-cautionary. It
      turns out that there are some configurations where the backing device
      does need header information for transmission of standard packets. If
      the information is not supplied then this causes continuous "Adapter
      error" transport events. Therefore, this bitstring should be respected
      and observed before considering the use of send subcrq direct.
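
A minimal sketch of the decision described above, assuming the requirement arrives as a one-byte bitstring. The values 0x0C and 0x0F are the ones quoted in the series cover letter; the macro names and the `can_use_direct()` helper are hypothetical and are not taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Header-requirement values as quoted in the cover letter. */
#define HDR_REQ_NONE     0x00 /* no header information required */
#define HDR_REQ_STANDARD 0x0C /* header offsets + L2 header required */
#define HDR_REQ_LSO      0x0F /* header offsets + L2, L3, L4 headers required */

/* Hypothetical helper: the direct path is only an option when the
 * server asks for no header information at all. Any set bit means the
 * descriptor must carry headers, so the direct path is skipped. */
static bool can_use_direct(uint8_t tx_hdr_requirement)
{
	return tx_hdr_requirement == HDR_REQ_NONE;
}
```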
      
      Fixes: 74839f7a ("ibmvnic: Introduce send sub-crq direct")
      Signed-off-by: Nick Child <nnac123@linux.ibm.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20241001163200.1802522-2-nnac123@linux.ibm.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'netfilter-br_netfilter-fix-panic-with-metadata_dst-skb' · 69ea1d4a
      Jakub Kicinski authored
      Andy Roulin says:
      
      ====================
      netfilter: br_netfilter: fix panic with metadata_dst skb
      
      There's a kernel panic possible in the br_netfilter module when sending
      untagged traffic via a VxLAN device. Traceback is included below.
      This happens during the check for fragmentation in br_nf_dev_queue_xmit
      if the MTU on the VxLAN device is not big enough.
      
      It is dependent on:
      1) the br_netfilter module being loaded;
      2) net.bridge.bridge-nf-call-iptables set to 1;
      3) a bridge with a VxLAN (single-vxlan-device) netdevice as a bridge port;
      4) untagged frames with size higher than the VxLAN MTU forwarded/flooded
      
      This case was never supported in the first place, so the first patch drops
      such packets.
      
      A regression selftest is added as part of the second patch.
      
      PING 10.0.0.2 (10.0.0.2) from 0.0.0.0 h1-eth0: 2000(2028) bytes of data.
      [  176.291791] Unable to handle kernel NULL pointer dereference at
      virtual address 0000000000000110
      [  176.292101] Mem abort info:
      [  176.292184]   ESR = 0x0000000096000004
      [  176.292322]   EC = 0x25: DABT (current EL), IL = 32 bits
      [  176.292530]   SET = 0, FnV = 0
      [  176.292709]   EA = 0, S1PTW = 0
      [  176.292862]   FSC = 0x04: level 0 translation fault
      [  176.293013] Data abort info:
      [  176.293104]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
      [  176.293488]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
      [  176.293787]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
      [  176.293995] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000043ef5000
      [  176.294166] [0000000000000110] pgd=0000000000000000,
      p4d=0000000000000000
      [  176.294827] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
      [  176.295252] Modules linked in: vxlan ip6_udp_tunnel udp_tunnel veth
      br_netfilter bridge stp llc ipv6 crct10dif_ce
      [  176.295923] CPU: 0 PID: 188 Comm: ping Not tainted
      6.8.0-rc3-g5b3fbd61 #2
      [  176.296314] Hardware name: linux,dummy-virt (DT)
      [  176.296535] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS
      BTYPE=--)
      [  176.296808] pc : br_nf_dev_queue_xmit+0x390/0x4ec [br_netfilter]
      [  176.297382] lr : br_nf_dev_queue_xmit+0x2ac/0x4ec [br_netfilter]
      [  176.297636] sp : ffff800080003630
      [  176.297743] x29: ffff800080003630 x28: 0000000000000008 x27:
      ffff6828c49ad9f8
      [  176.298093] x26: ffff6828c49ad000 x25: 0000000000000000 x24:
      00000000000003e8
      [  176.298430] x23: 0000000000000000 x22: ffff6828c4960b40 x21:
      ffff6828c3b16d28
      [  176.298652] x20: ffff6828c3167048 x19: ffff6828c3b16d00 x18:
      0000000000000014
      [  176.298926] x17: ffffb0476322f000 x16: ffffb7e164023730 x15:
      0000000095744632
      [  176.299296] x14: ffff6828c3f1c880 x13: 0000000000000002 x12:
      ffffb7e137926a70
      [  176.299574] x11: 0000000000000001 x10: ffff6828c3f1c898 x9 :
      0000000000000000
      [  176.300049] x8 : ffff6828c49bf070 x7 : 0008460f18d5f20e x6 :
      f20e0100bebafeca
      [  176.300302] x5 : ffff6828c7f918fe x4 : ffff6828c49bf070 x3 :
      0000000000000000
      [  176.300586] x2 : 0000000000000000 x1 : ffff6828c3c7ad00 x0 :
      ffff6828c7f918f0
      [  176.300889] Call trace:
      [  176.301123]  br_nf_dev_queue_xmit+0x390/0x4ec [br_netfilter]
      [  176.301411]  br_nf_post_routing+0x2a8/0x3e4 [br_netfilter]
      [  176.301703]  nf_hook_slow+0x48/0x124
      [  176.302060]  br_forward_finish+0xc8/0xe8 [bridge]
      [  176.302371]  br_nf_hook_thresh+0x124/0x134 [br_netfilter]
      [  176.302605]  br_nf_forward_finish+0x118/0x22c [br_netfilter]
      [  176.302824]  br_nf_forward_ip.part.0+0x264/0x290 [br_netfilter]
      [  176.303136]  br_nf_forward+0x2b8/0x4e0 [br_netfilter]
      [  176.303359]  nf_hook_slow+0x48/0x124
      [  176.303803]  __br_forward+0xc4/0x194 [bridge]
      [  176.304013]  br_flood+0xd4/0x168 [bridge]
      [  176.304300]  br_handle_frame_finish+0x1d4/0x5c4 [bridge]
      [  176.304536]  br_nf_hook_thresh+0x124/0x134 [br_netfilter]
      [  176.304978]  br_nf_pre_routing_finish+0x29c/0x494 [br_netfilter]
      [  176.305188]  br_nf_pre_routing+0x250/0x524 [br_netfilter]
      [  176.305428]  br_handle_frame+0x244/0x3cc [bridge]
      [  176.305695]  __netif_receive_skb_core.constprop.0+0x33c/0xecc
      [  176.306080]  __netif_receive_skb_one_core+0x40/0x8c
      [  176.306197]  __netif_receive_skb+0x18/0x64
      [  176.306369]  process_backlog+0x80/0x124
      [  176.306540]  __napi_poll+0x38/0x17c
      [  176.306636]  net_rx_action+0x124/0x26c
      [  176.306758]  __do_softirq+0x100/0x26c
      [  176.307051]  ____do_softirq+0x10/0x1c
      [  176.307162]  call_on_irq_stack+0x24/0x4c
      [  176.307289]  do_softirq_own_stack+0x1c/0x2c
      [  176.307396]  do_softirq+0x54/0x6c
      [  176.307485]  __local_bh_enable_ip+0x8c/0x98
      [  176.307637]  __dev_queue_xmit+0x22c/0xd28
      [  176.307775]  neigh_resolve_output+0xf4/0x1a0
      [  176.308018]  ip_finish_output2+0x1c8/0x628
      [  176.308137]  ip_do_fragment+0x5b4/0x658
      [  176.308279]  ip_fragment.constprop.0+0x48/0xec
      [  176.308420]  __ip_finish_output+0xa4/0x254
      [  176.308593]  ip_finish_output+0x34/0x130
      [  176.308814]  ip_output+0x6c/0x108
      [  176.308929]  ip_send_skb+0x50/0xf0
      [  176.309095]  ip_push_pending_frames+0x30/0x54
      [  176.309254]  raw_sendmsg+0x758/0xaec
      [  176.309568]  inet_sendmsg+0x44/0x70
      [  176.309667]  __sys_sendto+0x110/0x178
      [  176.309758]  __arm64_sys_sendto+0x28/0x38
      [  176.309918]  invoke_syscall+0x48/0x110
      [  176.310211]  el0_svc_common.constprop.0+0x40/0xe0
      [  176.310353]  do_el0_svc+0x1c/0x28
      [  176.310434]  el0_svc+0x34/0xb4
      [  176.310551]  el0t_64_sync_handler+0x120/0x12c
      [  176.310690]  el0t_64_sync+0x190/0x194
      [  176.311066] Code: f9402e61 79402aa2 927ff821 f9400023 (f9408860)
      [  176.315743] ---[ end trace 0000000000000000 ]---
      [  176.316060] Kernel panic - not syncing: Oops: Fatal exception in
      interrupt
      [  176.316371] Kernel Offset: 0x37e0e3000000 from 0xffff800080000000
      [  176.316564] PHYS_OFFSET: 0xffff97d780000000
      [  176.316782] CPU features: 0x0,88000203,3c020000,0100421b
      [  176.317210] Memory Limit: none
      [  176.317527] ---[ end Kernel panic - not syncing: Oops: Fatal
      Exception in interrupt ]---\
      ====================
      
      Link: https://patch.msgid.link/20241001154400.22787-1-aroulin@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: add regression test for br_netfilter panic · bc4d22b7
      Andy Roulin authored
      Add a new netfilter selftest to test against br_netfilter panics when
      VxLAN single-device is used together with untagged traffic and high MTU.
      
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Andy Roulin <aroulin@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Link: https://patch.msgid.link/20241001154400.22787-3-aroulin@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • netfilter: br_netfilter: fix panic with metadata_dst skb · f9ff7665
      Andy Roulin authored
      Fix a kernel panic in the br_netfilter module when sending untagged
      traffic via a VxLAN device.
      This happens during the check for fragmentation in br_nf_dev_queue_xmit.
      
      It is dependent on:
      1) the br_netfilter module being loaded;
      2) net.bridge.bridge-nf-call-iptables set to 1;
      3) a bridge with a VxLAN (single-vxlan-device) netdevice as a bridge port;
      4) untagged frames with size higher than the VxLAN MTU forwarded/flooded
      
      When forwarding the untagged packet to the VxLAN bridge port, before
      the netfilter hooks are called, br_handle_egress_vlan_tunnel is called and
      changes the skb_dst to the tunnel dst. The tunnel_dst is a metadata type
      of dst, i.e., skb_valid_dst(skb) is false, and metadata->dst.dev is NULL.
      
      Then in the br_netfilter hooks, in br_nf_dev_queue_xmit, there's a check
      for frames that need to be fragmented: frames larger than the MTU of the
      VxLAN device end up calling br_nf_ip_fragment, which in turn calls
      ip_skb_dst_mtu.
      
      ip_skb_dst_mtu tries to use the skb_dst(skb) as if it were a valid dst
      with a valid dst->dev, thus the crash.
      
      This case was never supported in the first place, so drop the packet
      instead.
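
The drop decision can be sketched with mock types. This is a simplified userspace model under stated assumptions, not the br_netfilter code: `struct mock_dst`, `struct mock_skb`, the `is_metadata` flag, and `queue_xmit_sketch()` are all hypothetical. The flag stands in for the case where skb_valid_dst(skb) is false.

```c
#include <stdbool.h>
#include <stddef.h>

struct mock_dev {
	unsigned int mtu;
};

struct mock_dst {
	bool is_metadata;     /* stands in for !skb_valid_dst(skb) */
	struct mock_dev *dev; /* NULL for a metadata dst */
};

struct mock_skb {
	unsigned int len;
	struct mock_dst *dst;
};

enum verdict { VERDICT_XMIT, VERDICT_FRAGMENT, VERDICT_DROP };

/* Oversized frames would normally be fragmented, which needs a real dst
 * with a device behind it. With a metadata dst that device is missing,
 * so the frame is dropped instead of dereferencing a NULL pointer. */
static enum verdict queue_xmit_sketch(const struct mock_skb *skb,
				      unsigned int port_mtu)
{
	if (skb->len <= port_mtu)
		return VERDICT_XMIT;

	if (!skb->dst || skb->dst->is_metadata || !skb->dst->dev)
		return VERDICT_DROP;

	return VERDICT_FRAGMENT;
}
```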
      
      PING 10.0.0.2 (10.0.0.2) from 0.0.0.0 h1-eth0: 2000(2028) bytes of data.
      [  176.291791] Unable to handle kernel NULL pointer dereference at
      virtual address 0000000000000110
      [  176.292101] Mem abort info:
      [  176.292184]   ESR = 0x0000000096000004
      [  176.292322]   EC = 0x25: DABT (current EL), IL = 32 bits
      [  176.292530]   SET = 0, FnV = 0
      [  176.292709]   EA = 0, S1PTW = 0
      [  176.292862]   FSC = 0x04: level 0 translation fault
      [  176.293013] Data abort info:
      [  176.293104]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
      [  176.293488]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
      [  176.293787]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
      [  176.293995] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000043ef5000
      [  176.294166] [0000000000000110] pgd=0000000000000000,
      p4d=0000000000000000
      [  176.294827] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
      [  176.295252] Modules linked in: vxlan ip6_udp_tunnel udp_tunnel veth
      br_netfilter bridge stp llc ipv6 crct10dif_ce
      [  176.295923] CPU: 0 PID: 188 Comm: ping Not tainted
      6.8.0-rc3-g5b3fbd61 #2
      [  176.296314] Hardware name: linux,dummy-virt (DT)
      [  176.296535] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS
      BTYPE=--)
      [  176.296808] pc : br_nf_dev_queue_xmit+0x390/0x4ec [br_netfilter]
      [  176.297382] lr : br_nf_dev_queue_xmit+0x2ac/0x4ec [br_netfilter]
      [  176.297636] sp : ffff800080003630
      [  176.297743] x29: ffff800080003630 x28: 0000000000000008 x27:
      ffff6828c49ad9f8
      [  176.298093] x26: ffff6828c49ad000 x25: 0000000000000000 x24:
      00000000000003e8
      [  176.298430] x23: 0000000000000000 x22: ffff6828c4960b40 x21:
      ffff6828c3b16d28
      [  176.298652] x20: ffff6828c3167048 x19: ffff6828c3b16d00 x18:
      0000000000000014
      [  176.298926] x17: ffffb0476322f000 x16: ffffb7e164023730 x15:
      0000000095744632
      [  176.299296] x14: ffff6828c3f1c880 x13: 0000000000000002 x12:
      ffffb7e137926a70
      [  176.299574] x11: 0000000000000001 x10: ffff6828c3f1c898 x9 :
      0000000000000000
      [  176.300049] x8 : ffff6828c49bf070 x7 : 0008460f18d5f20e x6 :
      f20e0100bebafeca
      [  176.300302] x5 : ffff6828c7f918fe x4 : ffff6828c49bf070 x3 :
      0000000000000000
      [  176.300586] x2 : 0000000000000000 x1 : ffff6828c3c7ad00 x0 :
      ffff6828c7f918f0
      [  176.300889] Call trace:
      [  176.301123]  br_nf_dev_queue_xmit+0x390/0x4ec [br_netfilter]
      [  176.301411]  br_nf_post_routing+0x2a8/0x3e4 [br_netfilter]
      [  176.301703]  nf_hook_slow+0x48/0x124
      [  176.302060]  br_forward_finish+0xc8/0xe8 [bridge]
      [  176.302371]  br_nf_hook_thresh+0x124/0x134 [br_netfilter]
      [  176.302605]  br_nf_forward_finish+0x118/0x22c [br_netfilter]
      [  176.302824]  br_nf_forward_ip.part.0+0x264/0x290 [br_netfilter]
      [  176.303136]  br_nf_forward+0x2b8/0x4e0 [br_netfilter]
      [  176.303359]  nf_hook_slow+0x48/0x124
      [  176.303803]  __br_forward+0xc4/0x194 [bridge]
      [  176.304013]  br_flood+0xd4/0x168 [bridge]
      [  176.304300]  br_handle_frame_finish+0x1d4/0x5c4 [bridge]
      [  176.304536]  br_nf_hook_thresh+0x124/0x134 [br_netfilter]
      [  176.304978]  br_nf_pre_routing_finish+0x29c/0x494 [br_netfilter]
      [  176.305188]  br_nf_pre_routing+0x250/0x524 [br_netfilter]
      [  176.305428]  br_handle_frame+0x244/0x3cc [bridge]
      [  176.305695]  __netif_receive_skb_core.constprop.0+0x33c/0xecc
      [  176.306080]  __netif_receive_skb_one_core+0x40/0x8c
      [  176.306197]  __netif_receive_skb+0x18/0x64
      [  176.306369]  process_backlog+0x80/0x124
      [  176.306540]  __napi_poll+0x38/0x17c
      [  176.306636]  net_rx_action+0x124/0x26c
      [  176.306758]  __do_softirq+0x100/0x26c
      [  176.307051]  ____do_softirq+0x10/0x1c
      [  176.307162]  call_on_irq_stack+0x24/0x4c
      [  176.307289]  do_softirq_own_stack+0x1c/0x2c
      [  176.307396]  do_softirq+0x54/0x6c
      [  176.307485]  __local_bh_enable_ip+0x8c/0x98
      [  176.307637]  __dev_queue_xmit+0x22c/0xd28
      [  176.307775]  neigh_resolve_output+0xf4/0x1a0
      [  176.308018]  ip_finish_output2+0x1c8/0x628
      [  176.308137]  ip_do_fragment+0x5b4/0x658
      [  176.308279]  ip_fragment.constprop.0+0x48/0xec
      [  176.308420]  __ip_finish_output+0xa4/0x254
      [  176.308593]  ip_finish_output+0x34/0x130
      [  176.308814]  ip_output+0x6c/0x108
      [  176.308929]  ip_send_skb+0x50/0xf0
      [  176.309095]  ip_push_pending_frames+0x30/0x54
      [  176.309254]  raw_sendmsg+0x758/0xaec
      [  176.309568]  inet_sendmsg+0x44/0x70
      [  176.309667]  __sys_sendto+0x110/0x178
      [  176.309758]  __arm64_sys_sendto+0x28/0x38
      [  176.309918]  invoke_syscall+0x48/0x110
      [  176.310211]  el0_svc_common.constprop.0+0x40/0xe0
      [  176.310353]  do_el0_svc+0x1c/0x28
      [  176.310434]  el0_svc+0x34/0xb4
      [  176.310551]  el0t_64_sync_handler+0x120/0x12c
      [  176.310690]  el0t_64_sync+0x190/0x194
      [  176.311066] Code: f9402e61 79402aa2 927ff821 f9400023 (f9408860)
      [  176.315743] ---[ end trace 0000000000000000 ]---
      [  176.316060] Kernel panic - not syncing: Oops: Fatal exception in
      interrupt
      [  176.316371] Kernel Offset: 0x37e0e3000000 from 0xffff800080000000
      [  176.316564] PHYS_OFFSET: 0xffff97d780000000
      [  176.316782] CPU features: 0x0,88000203,3c020000,0100421b
      [  176.317210] Memory Limit: none
      [  176.317527] ---[ end Kernel panic - not syncing: Oops: Fatal
      Exception in interrupt ]---\
      
      Fixes: 11538d03 ("bridge: vlan dst_metadata hooks in ingress and egress paths")
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Andy Roulin <aroulin@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Link: https://patch.msgid.link/20241001154400.22787-2-aroulin@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: sja1105: fix reception from VLAN-unaware bridges · 1f9fc48f
      Vladimir Oltean authored
      The blamed commit introduced an unexpected regression in the sja1105
      driver. Packets from VLAN-unaware bridge ports get received correctly,
      but the protocol stack can't seem to decode them properly.
      
      For ds->untag_bridge_pvid users (thus also sja1105), the blamed commit
      did introduce a functional change: dsa_switch_rcv() used to call
      dsa_untag_bridge_pvid(), which looked like this:
      
      	err = br_vlan_get_proto(br, &proto);
      	if (err)
      		return skb;
      
      	/* Move VLAN tag from data to hwaccel */
      	if (!skb_vlan_tag_present(skb) && skb->protocol == htons(proto)) {
      		skb = skb_vlan_untag(skb);
      		if (!skb)
      			return NULL;
      	}
      
      and now it calls dsa_software_vlan_untag() which has just this:
      
      	/* Move VLAN tag from data to hwaccel */
      	if (!skb_vlan_tag_present(skb)) {
      		skb = skb_vlan_untag(skb);
      		if (!skb)
      			return NULL;
      	}
      
      thus lacks any skb->protocol == bridge VLAN protocol check. That check
      is deferred until a later check for skb->vlan_proto (in the hwaccel area).
      
      The new code is problematic because, for VLAN-untagged packets,
      skb_vlan_untag() blindly takes the 4 bytes starting with the EtherType
      and turns them into a hwaccel VLAN tag. This is what breaks the protocol
      stack.
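
To make the failure mode concrete, here is a simplified userspace model of the statement above. It is not skb_vlan_untag() itself: `struct tag` and `blind_untag()` are hypothetical, and the frame is a raw byte buffer with the EtherType at offset 12. It shows what an unconditional untag extracts from a VLAN-untagged IPv4 frame.

```c
#include <stdint.h>

#define ETH_P_IP    0x0800
#define ETH_P_8021Q 0x8100

struct tag {
	uint16_t tpid; /* what gets treated as the VLAN protocol */
	uint16_t tci;  /* what gets treated as the VLAN TCI */
};

/* Takes the 4 bytes starting at the EtherType (offset 12) and interprets
 * them as a VLAN tag, with no check that the frame is really tagged. */
static struct tag blind_untag(const uint8_t *frame)
{
	struct tag t;

	t.tpid = (uint16_t)((frame[12] << 8) | frame[13]);
	t.tci = (uint16_t)((frame[14] << 8) | frame[15]);
	return t;
}
```

For an untagged IPv4 frame, the "TPID" comes out as 0x0800 and the "TCI" is the first two bytes of the IP header, so the data that follows no longer lines up with what the protocol stack expects.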
      
      It would be tempting to "make it work as before" and only call
      skb_vlan_untag() for those packets with the skb->protocol actually
      representing a VLAN.
      
      But the premise of the newly introduced dsa_software_vlan_untag() core
      function is not wrong. Drivers set ds->untag_bridge_pvid or
      ds->untag_vlan_aware_bridge_pvid presumably because they send all
      traffic to the CPU reception path as VLAN-tagged. So why should we spend
      any additional CPU cycles assuming that the packet may be VLAN-untagged?
      And why does the sja1105 driver opt into ds->untag_bridge_pvid if it
      doesn't always deliver packets to the CPU as VLAN-tagged?
      
      The answer to the latter question is indeed more interesting: it doesn't
      need to. This got done in commit 884be12f ("net: dsa: sja1105: add
      support for imprecise RX"), because I thought it would be needed, but I
      didn't realize that it doesn't actually make a difference.
      
      As explained in the commit message of the blamed patch, ds->untag_bridge_pvid
      only makes a difference in the VLAN-untagged receive path of a bridge port.
      However, in that operating mode, tag_sja1105.c makes use of VLAN tags
      with the ETH_P_SJA1105 TPID, and it decodes and consumes these VLAN tags
      as if they were DSA tags (aka tag_8021q operation). Even if commit
      884be12f ("net: dsa: sja1105: add support for imprecise RX") added
      this logic in sja1105_bridge_vlan_add():
      
      	/* Always install bridge VLANs as egress-tagged on the CPU port. */
      	if (dsa_is_cpu_port(ds, port))
      		flags = 0;
      
      that was for _bridge_ VLANs, which are _not_ committed to hardware
      in VLAN-unaware mode (aka the mode where ds->untag_bridge_pvid does
      anything at all). Even prior to that change, the tag_8021q VLANs
      were always installed as egress-tagged on the CPU port, see
      dsa_switch_tag_8021q_vlan_add():
      
      	u16 flags = 0; // egress-tagged, non-PVID
      
      	if (dsa_port_is_user(dp))
      		flags |= BRIDGE_VLAN_INFO_UNTAGGED |
      			 BRIDGE_VLAN_INFO_PVID;
      
      	err = dsa_port_do_tag_8021q_vlan_add(dp, info->vid,
      					     flags);
      	if (err)
      		return err;
      
      Whether the sja1105 driver needs the new flag, ds->untag_vlan_aware_bridge_pvid,
      rather than ds->untag_bridge_pvid, is a separate discussion. To fix the
      current bug in VLAN-unaware bridge mode, I would argue that the sja1105
      driver should not request something it doesn't need, rather than
      complicating the core DSA helper. Whereas before the blamed commit, this
      setting was harmless, now it has caused breakage.
      
      Fixes: 93e4649e ("net: dsa: provide a software untagging function on RX for VLAN-aware bridges")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://patch.msgid.link/20241001140206.50933-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>