1. 11 Jul, 2023 5 commits
    • Merge branch 'net-fec-fix-some-issues-of-ndo_xdp_xmit' · c0dbbdf5
      Paolo Abeni authored
      Wei Fang says:
      
      ====================
      net: fec: fix some issues of ndo_xdp_xmit()
      
      We encountered some issues when testing the ndo_xdp_xmit() interface
      of the fec driver on i.MX8MP and i.MX93 platforms. These issues are
      easy to reproduce, and the specific reproduction steps are as follows.
      
      step1: The ethernet port of a board (board A) is connected to the EQOS
      port of i.MX8MP/i.MX93, and the FEC port of i.MX8MP/i.MX93 is connected
      to another ethernet port, such as a switch port.
      
      step2: Board A uses the pktgen_sample03_burst_single_flow.sh to generate
      and send packets to i.MX8MP/i.MX93. The command is shown below.
      ./pktgen_sample03_burst_single_flow.sh -i eth0 -d 192.168.6.8 -m \
      56:bf:0d:68:b0:9e -s 1500
      
      step3: i.MX8MP/i.MX93 use the xdp_redirect bpf program to redirect the
      XDP frames from EQOS port to FEC port. The command is shown below.
      ./xdp_redirect eth1 eth0
      
      After a few moments, warning or error logs are printed to the
      console. For more details, please refer to the commit message of
      each patch.
      ====================
      
      Link: https://lore.kernel.org/r/20230706081012.2278063-1-wei.fang@nxp.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: fec: use netdev_err_once() instead of netdev_err() · 84a10947
      Wei Fang authored
      When heavy XDP traffic is being transmitted, the console prints
      the error log continuously if there are not enough BDs to
      accommodate the frames. The log looks like this:
      
      [  160.013112] fec 30be0000.ethernet eth0: NOT enough BD for SG!
      [  160.023116] fec 30be0000.ethernet eth0: NOT enough BD for SG!
      [  160.028926] fec 30be0000.ethernet eth0: NOT enough BD for SG!
      [  160.038946] fec 30be0000.ethernet eth0: NOT enough BD for SG!
      [  160.044758] fec 30be0000.ethernet eth0: NOT enough BD for SG!
      
      This log is not only repetitive and redundant, it also degrades
      XDP performance. So we use netdev_err_once() instead of
      netdev_err() now.
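
The difference between logging on every failure and logging once can be sketched with a toy model. This is not the kernel's netdev_err_once() implementation; the class, its methods, and the "txq_submit" call-site key are hypothetical names used only for illustration.

```python
class OnceLogger:
    """Toy model of a log-once helper: each call site prints at most once."""

    def __init__(self):
        self._seen = set()   # call-site keys that have already logged
        self.lines = []      # stands in for the console

    def err(self, msg):
        """Unconditional logging: every call produces a console line."""
        self.lines.append(msg)

    def err_once(self, site, msg):
        """Log only the first time a given call site is hit."""
        if site in self._seen:
            return False
        self._seen.add(site)
        self.lines.append(msg)
        return True
```

With err(), five failed transmissions produce five console lines; with err_once(), only the first one does, so the hot path no longer pays for repeated console output.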
      
      Fixes: 6d6b39f1 ("net: fec: add initial XDP support")
      Signed-off-by: Wei Fang <wei.fang@nxp.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: fec: increase the size of tx ring and update tx_wake_threshold · 56b3c6ba
      Wei Fang authored
      When the XDP feature is enabled and heavy XDP traffic is being
      transmitted, there is a considerable probability that the available
      tx BDs are insufficient. This causes some XDP frames to be
      discarded, and the "NOT enough BD for SG!" error log appears
      in the console (as shown below).
      
      [  160.013112] fec 30be0000.ethernet eth0: NOT enough BD for SG!
      [  160.023116] fec 30be0000.ethernet eth0: NOT enough BD for SG!
      [  160.028926] fec 30be0000.ethernet eth0: NOT enough BD for SG!
      [  160.038946] fec 30be0000.ethernet eth0: NOT enough BD for SG!
      [  160.044758] fec 30be0000.ethernet eth0: NOT enough BD for SG!
      
      In the case of heavy XDP traffic, tx BDs are sometimes recycled
      more slowly than XDP frames are sent. There may be several
      specific reasons: for example, the interrupt is not serviced
      in time, or the NAPI callback function is inefficient because
      all the queues (tx queues and rx queues) share the same NAPI.
      
      After trying various methods, I think that increasing the size of
      the tx BD ring is simple and effective. Perhaps the best solution is
      to allocate a NAPI instance for each queue to improve the efficiency
      of the NAPI callback, but that is a larger change and I didn't try
      it. Perhaps it will be implemented in a future patch.
      
      This patch also updates the tx_wake_threshold of the tx ring, which
      was derived from the size of the tx ring in the previous logic.
      Otherwise, the tx_wake_threshold would be too high (403 BDs), which
      is more likely to impact the slow path in the case of heavy XDP
      traffic, because the XDP path and the slow path share the tx BD
      rings. Following Jakub's suggestion, the tx_wake_threshold is at
      least equal to tx_stop_threshold + 2 * MAX_SKB_FRAGS: if a queue of
      hundreds of entries is overflowing, we should be able to apply a
      hysteresis of a few tens of entries.
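
The threshold arithmetic can be checked with a short sketch. The constants (MAX_SKB_FRAGS = 17, FEC_MAX_TSO_SEGS = 100, a 1024-entry tx ring), the derivation of the stop threshold, and the ring-size-based formula are assumptions that do not appear in the commit message; they were chosen because together they reproduce the 403-BD figure quoted above. The function names are hypothetical.

```python
# Assumed values; none of them is stated in the commit message.
MAX_SKB_FRAGS = 17        # assumed kernel default
FEC_MAX_TSO_SEGS = 100    # assumed driver constant
TX_RING_SIZE = 1024       # assumed size of the enlarged tx ring

# Assumed stop threshold: worst-case number of BDs needed for one skb.
TX_STOP_THRESHOLD = FEC_MAX_TSO_SEGS * 2 + MAX_SKB_FRAGS


def wake_threshold_from_ring(ring_size, stop):
    """Assumed previous logic: the wake threshold scales with the ring size."""
    return (ring_size - stop) // 2


def wake_threshold_with_hysteresis(stop, max_frags=MAX_SKB_FRAGS):
    """Lower bound from the commit message: stop + 2 * MAX_SKB_FRAGS."""
    return stop + 2 * max_frags
```

Under these assumptions the ring-size-based formula gives 403 BDs for a 1024-entry ring, while the hysteresis-based bound gives 251 BDs, so the queue can be woken much earlier.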
      
      Fixes: 6d6b39f1 ("net: fec: add initial XDP support")
      Signed-off-by: Wei Fang <wei.fang@nxp.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: fec: recycle pages for transmitted XDP frames · 20f79739
      Wei Fang authored
      Once the XDP frames have been successfully transmitted through the
      ndo_xdp_xmit() interface, it's the driver's responsibility to free
      the frames so that the page_pool can recycle the pages and reuse
      them. However, this action is not implemented in the fec driver.
      This leads to a user-visible problem: the console prints the
      following warning log.
      
      [  157.568851] page_pool_release_retry() stalled pool shutdown 1389 inflight 60 sec
      [  217.983446] page_pool_release_retry() stalled pool shutdown 1389 inflight 120 sec
      [  278.399006] page_pool_release_retry() stalled pool shutdown 1389 inflight 181 sec
      [  338.812885] page_pool_release_retry() stalled pool shutdown 1389 inflight 241 sec
      [  399.226946] page_pool_release_retry() stalled pool shutdown 1389 inflight 302 sec
      
      Therefore, to solve this issue, we free XDP frames via xdp_return_frame()
      while cleaning the tx BD ring.
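
Why unreturned frames stall pool shutdown can be shown with a toy model. This is only a sketch: ToyPagePool and clean_tx_ring() are hypothetical stand-ins, and the put() call merely plays the role that xdp_return_frame() has in the driver.

```python
class ToyPagePool:
    """Toy stand-in for a page pool; it only counts pages handed out."""

    def __init__(self):
        self.inflight = 0

    def alloc(self):
        self.inflight += 1
        return object()

    def put(self, page):
        self.inflight -= 1


def clean_tx_ring(ring, pool, return_frames):
    """Walk completed tx slots; optionally hand each frame back to the pool."""
    for i, page in enumerate(ring):
        if page is not None and return_frames:
            pool.put(page)   # plays the role of xdp_return_frame()
        ring[i] = None       # the slot is reusable either way
```

If the ring is cleaned without returning frames, the in-flight count never reaches zero, which corresponds to the "stalled pool shutdown ... inflight" warnings shown above.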
      
      Fixes: 6d6b39f1 ("net: fec: add initial XDP support")
      Signed-off-by: Wei Fang <wei.fang@nxp.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: fec: dynamically set the NETDEV_XDP_ACT_NDO_XMIT feature of XDP · be7ecbe7
      Wei Fang authored
      When an XDP program is installed or uninstalled, fec_restart() is
      invoked to reset the MAC and the buffer descriptor rings. It's
      reasonable not to transmit any packets during the reset. However,
      the NETDEV_XDP_ACT_NDO_XMIT bit of xdp_features is enabled by
      default, so fec_enet_xdp_xmit() may be invoked before the reset
      has finished. In this case, the redirected XDP frames might be
      dropped and the available transmit BDs may be incorrectly deemed
      insufficient. So this patch disables the NETDEV_XDP_ACT_NDO_XMIT
      feature by default and dynamically configures it when the bpf
      program is installed or uninstalled.
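
The ordering described above can be sketched as a toy model. It assumes the feature is withdrawn before the reset and advertised again only afterwards, and only while a program is installed; ToyFecDev and its attributes are hypothetical names, not the driver's data structures.

```python
class ToyFecDev:
    """Toy device that advertises redirect-target support dynamically."""

    def __init__(self):
        self.can_ndo_xmit = False        # feature is off by default
        self.seen_during_reset = []      # flag value observed at each reset

    def _restart(self):
        # Record whether the feature was advertised while "resetting".
        self.seen_during_reset.append(self.can_ndo_xmit)

    def set_xdp_prog(self, prog):
        """Install (prog is not None) or uninstall (prog is None) a program."""
        self.can_ndo_xmit = False        # stop advertising before the reset
        self._restart()
        if prog is not None:
            self.can_ndo_xmit = True     # advertise only after the reset
```

In this model no reset ever observes the feature as enabled, so peers would not redirect frames to the device while its rings are being rebuilt.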
      
      Fixes: e4ac7cc6 ("net: fec: turn on XDP features")
      Signed-off-by: Wei Fang <wei.fang@nxp.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  2. 10 Jul, 2023 3 commits
  3. 09 Jul, 2023 2 commits
  4. 08 Jul, 2023 8 commits
    • udp6: fix udp6_ehashfn() typo · 51d03e2f
      Eric Dumazet authored
      Amit Klein reported that udp6_ehash_secret was initialized but never used.
      
      Fixes: 1bbdceef ("inet: convert inet_ehash_secret and ipv6_hash_secret to net_get_random_once")
      Reported-by: Amit Klein <aksecurity@gmail.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
      Cc: David Ahern <dsahern@kernel.org>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • icmp6: Fix null-ptr-deref of ip6_null_entry->rt6i_idev in icmp6_dev(). · 2aaa8a15
      Kuniyuki Iwashima authored
      With some IPv6 Ext Hdr (RPL, SRv6, etc.), we can send a packet that
      has the link-local address as src and dst IP and will be forwarded to
      an external IP in the IPv6 Ext Hdr.
      
      For example, the script below generates a packet whose src IP is the
      link-local address and dst is updated to 11::.
      
        # for f in $(find /proc/sys/net/ -name '*seg6_enabled*'); do echo 1 > $f; done
        # python3
        >>> from socket import *
        >>> from scapy.all import *
        >>>
        >>> SRC_ADDR = DST_ADDR = "fe80::5054:ff:fe12:3456"
        >>>
        >>> pkt = IPv6(src=SRC_ADDR, dst=DST_ADDR)
        >>> pkt /= IPv6ExtHdrSegmentRouting(type=4, addresses=["11::", "22::"], segleft=1)
        >>>
        >>> sk = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW)
        >>> sk.sendto(bytes(pkt), (DST_ADDR, 0))
      
      For such a packet, we call ip6_route_input() to look up a route for the
      next destination in these three functions depending on the header type.
      
        * ipv6_rthdr_rcv()
        * ipv6_rpl_srh_rcv()
        * ipv6_srh_rcv()
      
      If no route is found, ip6_null_entry is set to skb, and the following
      dst_input(skb) calls ip6_pkt_drop().
      
      Finally, in icmp6_dev(), we dereference skb_rt6_info(skb)->rt6i_idev->dev
      as the input device is the loopback interface.  Then, we have to check if
      skb_rt6_info(skb)->rt6i_idev is NULL or not to avoid NULL pointer deref
      for ip6_null_entry.
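
The check described above can be modelled in a few lines. This is a sketch of the idea only, not the kernel function: the dataclasses are hypothetical stand-ins, and a Route whose idev is None plays the role of ip6_null_entry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dev:
    name: str


@dataclass
class Idev:
    dev: Dev


@dataclass
class Route:
    idev: Optional[Idev]   # None models ip6_null_entry's missing idev


@dataclass
class Skb:
    dev: Dev
    is_loopback: bool
    rt: Optional[Route]


def icmp6_dev(skb):
    """Pick the device for the ICMPv6 error, tolerating a route without idev."""
    dev = skb.dev
    if skb.is_loopback:
        rt = skb.rt
        # Only follow the route's idev when it actually exists.
        if rt is not None and rt.idev is not None:
            dev = rt.idev.dev
    return dev
```

When the route has no idev, the function falls back to the skb's own device instead of dereferencing a missing pointer.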
      
      BUG: kernel NULL pointer dereference, address: 0000000000000000
       PF: supervisor read access in kernel mode
       PF: error_code(0x0000) - not-present page
      PGD 0 P4D 0
      Oops: 0000 [#1] PREEMPT SMP PTI
      CPU: 0 PID: 157 Comm: python3 Not tainted 6.4.0-11996-gb121d614371c #35
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      RIP: 0010:icmp6_send (net/ipv6/icmp.c:436 net/ipv6/icmp.c:503)
      Code: fe ff ff 48 c7 40 30 c0 86 5d 83 e8 c6 44 1c 00 e9 c8 fc ff ff 49 8b 46 58 48 83 e0 fe 0f 84 4a fb ff ff 48 8b 80 d0 00 00 00 <48> 8b 00 44 8b 88 e0 00 00 00 e9 34 fb ff ff 4d 85 ed 0f 85 69 01
      RSP: 0018:ffffc90000003c70 EFLAGS: 00000286
      RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00000000000000e0
      RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffff888006d72a18
      RBP: ffffc90000003d80 R08: 0000000000000000 R09: 0000000000000001
      R10: ffffc90000003d98 R11: 0000000000000040 R12: ffff888006d72a10
      R13: 0000000000000000 R14: ffff8880057fb800 R15: ffffffff835d86c0
      FS:  00007f9dc72ee740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 00000000057b2000 CR4: 00000000007506f0
      PKRU: 55555554
      Call Trace:
       <IRQ>
       ip6_pkt_drop (net/ipv6/route.c:4513)
       ipv6_rthdr_rcv (net/ipv6/exthdrs.c:640 net/ipv6/exthdrs.c:686)
       ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:437 (discriminator 5))
       ip6_input_finish (./include/linux/rcupdate.h:781 net/ipv6/ip6_input.c:483)
       __netif_receive_skb_one_core (net/core/dev.c:5455)
       process_backlog (./include/linux/rcupdate.h:781 net/core/dev.c:5895)
       __napi_poll (net/core/dev.c:6460)
       net_rx_action (net/core/dev.c:6529 net/core/dev.c:6660)
       __do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:554)
       do_softirq (kernel/softirq.c:454 kernel/softirq.c:441)
       </IRQ>
       <TASK>
       __local_bh_enable_ip (kernel/softirq.c:381)
       __dev_queue_xmit (net/core/dev.c:4231)
       ip6_finish_output2 (./include/net/neighbour.h:544 net/ipv6/ip6_output.c:135)
       rawv6_sendmsg (./include/net/dst.h:458 ./include/linux/netfilter.h:303 net/ipv6/raw.c:656 net/ipv6/raw.c:914)
       sock_sendmsg (net/socket.c:725 net/socket.c:748)
       __sys_sendto (net/socket.c:2134)
       __x64_sys_sendto (net/socket.c:2146 net/socket.c:2142 net/socket.c:2142)
       do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
       entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
      RIP: 0033:0x7f9dc751baea
      Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
      RSP: 002b:00007ffe98712c38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 00007ffe98712cf8 RCX: 00007f9dc751baea
      RDX: 0000000000000060 RSI: 00007f9dc6460b90 RDI: 0000000000000003
      RBP: 00007f9dc56e8be0 R08: 00007ffe98712d70 R09: 000000000000001c
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007f9dc6af5d1b
       </TASK>
      Modules linked in:
      CR2: 0000000000000000
       ---[ end trace 0000000000000000 ]---
      RIP: 0010:icmp6_send (net/ipv6/icmp.c:436 net/ipv6/icmp.c:503)
      Code: fe ff ff 48 c7 40 30 c0 86 5d 83 e8 c6 44 1c 00 e9 c8 fc ff ff 49 8b 46 58 48 83 e0 fe 0f 84 4a fb ff ff 48 8b 80 d0 00 00 00 <48> 8b 00 44 8b 88 e0 00 00 00 e9 34 fb ff ff 4d 85 ed 0f 85 69 01
      RSP: 0018:ffffc90000003c70 EFLAGS: 00000286
      RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00000000000000e0
      RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffff888006d72a18
      RBP: ffffc90000003d80 R08: 0000000000000000 R09: 0000000000000001
      R10: ffffc90000003d98 R11: 0000000000000040 R12: ffff888006d72a10
      R13: 0000000000000000 R14: ffff8880057fb800 R15: ffffffff835d86c0
      FS:  00007f9dc72ee740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 00000000057b2000 CR4: 00000000007506f0
      PKRU: 55555554
      Kernel panic - not syncing: Fatal exception in interrupt
      Kernel Offset: disabled
      
      Fixes: 4832c30d ("net: ipv6: put host and anycast routes on device with address")
      Reported-by: Wang Yufen <wangyufen@huawei.com>
      Closes: https://lore.kernel.org/netdev/c41403a9-c2f6-3b7e-0c96-e1901e605cd0@huawei.com/
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 's390-ism-fixes' · bbffab69
      David S. Miller authored
      Niklas Schnelle says:
      
      ====================
      s390/ism: Fixes to client handling
      
      This is v2 of the patch previously titled "s390/ism: Detangle ISM client
      IRQ and event forwarding". As suggested by Paolo Abeni I split the patch
      up. While doing so I noticed another problem that was fixed by this patch
      concerning the way the workqueues access the client structs. This means the
      second patch turning the workqueues into simple direct calls also fixes
      a problem. Finally, I split off a third patch just for fixing
      ism_unregister_client()'s error path.
      
      The code after these 3 patches is identical to the result of the v1 patch
      except that I also turned the dev_err() for still registered DMBs into
      a WARN().
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/ism: Do not unregister clients with registered DMBs · 266deeea
      Niklas Schnelle authored
      When ism_unregister_client() is called but the client still has DMBs
      registered it returns -EBUSY and prints an error. This only happens
      after the client has already been unregistered however. This is
      unexpected as the unregister claims to have failed. Furthermore as this
      implies a client bug a WARN() is more appropriate. Thus move the
      deregistration after the check and use WARN().
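
The reordering can be illustrated with a toy registry. This is a sketch under simplifying assumptions: ToyIsmCore and its fields are hypothetical, and the WARN() from the real fix is left out.

```python
import errno


class ToyIsmCore:
    """Toy registry of clients and the DMBs they own."""

    def __init__(self):
        self.clients = set()
        self.dmbs = {}   # dmb id -> owning client id

    def register_client(self, client_id):
        self.clients.add(client_id)

    def register_dmb(self, dmb_id, client_id):
        self.dmbs[dmb_id] = client_id

    def unregister_client(self, client_id):
        """Check for leftover DMBs first; deregister only if there are none."""
        if client_id in self.dmbs.values():
            return -errno.EBUSY          # client stays registered
        self.clients.discard(client_id)
        return 0
```

With the check placed first, an -EBUSY return really means that nothing was changed, which matches what a caller would expect from a failed unregister.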
      
      Fixes: 89e7d2ba ("net/ism: Add new API for client registration")
      Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/ism: Fix and simplify add()/remove() callback handling · 76631ffa
      Niklas Schnelle authored
      Previously the clients_lock was protecting the clients array against
      concurrent addition/removal of clients but was also accessed from IRQ
      context. This meant that it had to be a spinlock and that the add() and
      remove() callbacks in which clients need to do allocation and take
      mutexes can't be called under the clients_lock. To work around this these
      callbacks were moved to workqueues. This not only introduced significant
      complexity but is also subtly broken in at least one way.
      
      In ism_dev_init() and ism_dev_exit() clients[i]->tgt_ism is used to
      communicate the added/removed ISM device to the work function. While
      write access to clients[i]->tgt_ism is protected by the clients_lock
      and the code waits until there is no pending add/remove work before
      and after setting clients[i]->tgt_ism, this is not enough. The problem
      is that the wait happens based on per ISM device counters. Thus a
      concurrent ism_dev_init()/ism_dev_exit() for a different ISM device
      may overwrite a clients[i]->tgt_ism between unlocking the clients_lock
      and the subsequent wait for the work to finish.
      
      Thankfully with the clients_lock no longer held in IRQ context it can be
      turned into a mutex which can be held during the calls to add()/remove()
      completely removing the need for the workqueues and the associated
      broken housekeeping including the per ISM device counters and the
      clients[i]->tgt_ism.
      
      Fixes: 89e7d2ba ("net/ism: Add new API for client registration")
      Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/ism: Fix locking for forwarding of IRQs and events to clients · 6b5c13b5
      Niklas Schnelle authored
      The clients array references all registered clients and is protected by
      the clients_lock. Besides its use as a general list of clients, the clients
      array is accessed in ism_handle_irq() to forward ISM device events to
      clients.
      
      While the clients_lock is taken in the IRQ handler when calling
      handle_event() it is however incorrectly not held during the
      client->handle_irq() call and for the preceding clients[] access leaving
      it unprotected against concurrent client (un-)registration.
      
      Furthermore the accesses to ism->sba_client_arr[] in ism_register_dmb()
      and ism_unregister_dmb() are not protected by any lock. This is
      especially problematic as the client ID from the ism->sba_client_arr[]
      is not checked against NO_CLIENT and neither is the client pointer
      checked.
      
      Instead of expanding the use of the clients_lock further add a separate
      array in struct ism_dev which references clients subscribed to the
      device's events and IRQs. This array is protected by ism->lock which is
      already taken in ism_handle_irq() and can be taken outside the IRQ
      handler when adding/removing subscribers or accessing
      ism->sba_client_arr[]. This also means that the clients_lock is no
      longer taken in IRQ context.
      
      Fixes: 89e7d2ba ("net/ism: Add new API for client registration")
      Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
      Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: prevent skb corruption on frag list segmentation · c329b261
      Paolo Abeni authored
      Ian reported several skb corruptions triggered by rx-gro-list,
      collecting several similar oopses like this one:
      
      [   62.624003] BUG: kernel NULL pointer dereference, address: 00000000000000c0
      [   62.631083] #PF: supervisor read access in kernel mode
      [   62.636312] #PF: error_code(0x0000) - not-present page
      [   62.641541] PGD 0 P4D 0
      [   62.644174] Oops: 0000 [#1] PREEMPT SMP NOPTI
      [   62.648629] CPU: 1 PID: 913 Comm: napi/eno2-79 Not tainted 6.4.0 #364
      [   62.655162] Hardware name: Supermicro Super Server/A2SDi-12C-HLN4F, BIOS 1.7a 10/13/2022
      [   62.663344] RIP: 0010:__udp_gso_segment (./include/linux/skbuff.h:2858 ./include/linux/udp.h:23 net/ipv4/udp_offload.c:228 net/ipv4/udp_offload.c:261 net/ipv4/udp_offload.c:277)
      [   62.687193] RSP: 0018:ffffbd3a83b4f868 EFLAGS: 00010246
      [   62.692515] RAX: 00000000000000ce RBX: 0000000000000000 RCX: 0000000000000000
      [   62.699743] RDX: ffffa124def8a000 RSI: 0000000000000079 RDI: ffffa125952a14d4
      [   62.706970] RBP: ffffa124def8a000 R08: 0000000000000022 R09: 00002000001558c9
      [   62.714199] R10: 0000000000000000 R11: 00000000be554639 R12: 00000000000000e2
      [   62.721426] R13: ffffa125952a1400 R14: ffffa125952a1400 R15: 00002000001558c9
      [   62.728654] FS:  0000000000000000(0000) GS:ffffa127efa40000(0000) knlGS:0000000000000000
      [   62.736852] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   62.742702] CR2: 00000000000000c0 CR3: 00000001034b0000 CR4: 00000000003526e0
      [   62.749948] Call Trace:
      [   62.752498]  <TASK>
      [   62.779267] inet_gso_segment (net/ipv4/af_inet.c:1398)
      [   62.787605] skb_mac_gso_segment (net/core/gro.c:141)
      [   62.791906] __skb_gso_segment (net/core/dev.c:3403 (discriminator 2))
      [   62.800492] validate_xmit_skb (./include/linux/netdevice.h:4862 net/core/dev.c:3659)
      [   62.804695] validate_xmit_skb_list (net/core/dev.c:3710)
      [   62.809158] sch_direct_xmit (net/sched/sch_generic.c:330)
      [   62.813198] __dev_queue_xmit (net/core/dev.c:3805 net/core/dev.c:4210)
      net/netfilter/core.c:626)
      [   62.821093] br_dev_queue_push_xmit (net/bridge/br_forward.c:55)
      [   62.825652] maybe_deliver (net/bridge/br_forward.c:193)
      [   62.829420] br_flood (net/bridge/br_forward.c:233)
      [   62.832758] br_handle_frame_finish (net/bridge/br_input.c:215)
      [   62.837403] br_handle_frame (net/bridge/br_input.c:298 net/bridge/br_input.c:416)
      [   62.851417] __netif_receive_skb_core.constprop.0 (net/core/dev.c:5387)
      [   62.866114] __netif_receive_skb_list_core (net/core/dev.c:5570)
      [   62.871367] netif_receive_skb_list_internal (net/core/dev.c:5638 net/core/dev.c:5727)
      [   62.876795] napi_complete_done (./include/linux/list.h:37 ./include/net/gro.h:434 ./include/net/gro.h:429 net/core/dev.c:6067)
      [   62.881004] ixgbe_poll (drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:3191)
      [   62.893534] __napi_poll (net/core/dev.c:6498)
      [   62.897133] napi_threaded_poll (./include/linux/netpoll.h:89 net/core/dev.c:6640)
      [   62.905276] kthread (kernel/kthread.c:379)
      [   62.913435] ret_from_fork (arch/x86/entry/entry_64.S:314)
      [   62.917119]  </TASK>
      
      In the critical scenario, rx-gro-list GRO-ed packets are fed, via a
      bridge, both to the local input path and to an egress device (tun).
      
      The segmentation of such packets unsafely writes to the cloned skbs
      with shared heads.
      
      This change addresses the issue by uncloning as needed the
      to-be-segmented skbs.
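
The hazard and the remedy can be sketched with a toy copy-on-write buffer. This is not how skbs are implemented; ToySkb, its user count, and segment() are hypothetical and only model "clones share a head, so take a private copy before writing".

```python
class ToySkb:
    """Toy packet buffer whose header area may be shared between clones."""

    def __init__(self, data):
        self._head = {"data": bytearray(data), "users": 1}

    def clone(self):
        """Return a new ToySkb that shares this one's head."""
        other = ToySkb.__new__(ToySkb)
        other._head = self._head
        self._head["users"] += 1
        return other

    def is_shared(self):
        return self._head["users"] > 1

    def unclone(self):
        """Give this buffer a private copy of the head if it is shared."""
        if self.is_shared():
            self._head["users"] -= 1
            self._head = {"data": bytearray(self._head["data"]), "users": 1}

    def write(self, offset, value):
        self._head["data"][offset] = value

    def read(self, offset):
        return self._head["data"][offset]


def segment(skb, new_first_byte):
    """Rewrite the header in place, uncloning first so clones are unaffected."""
    skb.unclone()
    skb.write(0, new_first_byte)
```

Without the unclone() call in segment(), the write would also be visible through every other clone, which is the kind of corruption the report describes.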
      
      Reported-by: Ian Kumlien <ian.kumlien@gmail.com>
      Tested-by: Ian Kumlien <ian.kumlien@gmail.com>
      Fixes: 3a1296a3 ("net: Support GRO/GSO fraglist chaining.")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bgmac: postpone turning IRQs off to avoid SoC hangs · e7731194
      Rafał Miłecki authored
      Turning IRQs off is done by accessing Ethernet controller registers.
      That can't be done until device's clock is enabled. It results in a SoC
      hang otherwise.
      
      This bug remained unnoticed for years as most bootloaders keep all
      Ethernet interfaces turned on. It seems to only affect a niche SoC
      family BCM47189. It has two Ethernet controllers but CFE bootloader uses
      only the first one.
      
      Fixes: 34322615 ("net: bgmac: Mask interrupts during probe")
      Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
      Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 07 Jul, 2023 15 commits
    • udp6: add a missing call into udp_fail_queue_rcv_skb tracepoint · 8139dccd
      Ivan Babrou authored
      The tracepoint has existed for 12 years, but it only covered udp
      over the legacy IPv4 protocol. Having it enabled for udp6 removes
      the unnecessary difference in error visibility.
      
      Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
      Fixes: 296f7ea7 ("udp: add tracepoints for queueing skb to rcvbuf")
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ionic: remove dead device fail path · 3a7af34f
      Shannon Nelson authored
      Remove the probe error path code that leaves the driver bound
      to the device, but with essentially a dead device.  This was
      useful maybe twice early in the driver's life and no longer
      makes sense to keep.
      
      Fixes: 30a1e6d0 ("ionic: keep ionic dev on lif init fail")
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ionic: remove WARN_ON to prevent panic_on_warn · abfb2a58
      Nitya Sunkad authored
      Remove unnecessary early code development check and the WARN_ON
      that it uses.  The irq alloc and free paths have long been
      cleaned up and this check shouldn't have stuck around so long.
      
      Fixes: 77ceb68e ("ionic: Add notifyq support")
      Signed-off-by: Nitya Sunkad <nitya.sunkad@amd.com>
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • octeontx2-af: Move validation of ptp pointer before its usage · 7709fbd4
      Sai Krishna authored
      Moved PTP pointer validation before its use to avoid smatch warning.
      Also used kzalloc/kfree instead of devm_kzalloc/devm_kfree.
      
      Fixes: 2ef4e45d ("octeontx2-af: Add PTP PPS Errata workaround on CN10K silicon")
      Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: Sai Krishna <saikrishnag@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • octeontx2-af: Promisc enable/disable through mbox · af42088b
      Ratheesh Kannoth authored
      In legacy silicon, promiscuous mode is only modified
      through CGX mbox messages. In CN10KB silicon, it is modified
      from CGX mbox and NIX. This breaks legacy application
      behaviour. Fix this by removing call from NIX.
      
      Fixes: d6c9784b ("octeontx2-af: Invoke exact match functions if supported")
      Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · b61aac02
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2023-07-05 (igc)
      
      This series contains updates to igc driver only.
      
      Husaini adds check to increment Qbv change error counter only on taprio
      Qbvs. He also removes delay during Tx ring configuration and
      resolves Tx hang that could occur when transmitting on a gate to be
      closed.
      
      Prasad Koya reports ethtool link mode as TP (twisted pair).
      
      Tee Min corrects value for max SDU.
      
      Aravindhan ensures that registers for PPS are always programmed to occur
      in future.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gve: Set default duplex configuration to full · 0503efea
      Junfeng Guo authored
      The duplex mode was previously left unset in the driver, so the
      default parameter was 0, which corresponds to half duplex. This
      might mislead users into having incorrect expectations about the
      driver's transmission capabilities.
      Set the default duplex configuration to full, as the driver runs in
      full duplex mode at this point.
      
      Fixes: 7e074d5a ("gve: Enable Link Speed Reporting in the driver.")
      Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Message-ID: <20230706044128.2726747-1-junfeng.guo@intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 41b9eff0
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2023-07-05 (ice)
      
      This series contains updates to ice driver only.
      
      Sridhar fixes incorrect comparison of max Tx rate limit to occur against
      each TC value rather than the aggregate. He also resolves an issue with
      the wrong VSI being used when setting max Tx rate when TCs are enabled.
      
      * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        ice: Fix tx queue rate limit when TCs are configured
        ice: Fix max_rate check while configuring TX rate limits
      ====================
      
      Link: https://lore.kernel.org/r/20230705201346.49370-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'mlx5-fixes-2023-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 4863b57b
      Jakub Kicinski authored
      Saeed Mahameed says:
      
      ====================
      mlx5 fixes 2023-07-05
      
      This series provides bug fixes to mlx5 driver.
      
      * tag 'mlx5-fixes-2023-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
        net/mlx5e: RX, Fix page_pool page fragment tracking for XDP
        net/mlx5: Query hca_cap_2 only when supported
        net/mlx5e: TC, CT: Offload ct clear only once
        net/mlx5e: Check for NOT_READY flag state after locking
        net/mlx5: Register a unique thermal zone per device
        net/mlx5e: RX, Fix flush and close release flow of regular rq for legacy rq
        net/mlx5e: fix memory leak in mlx5e_ptp_open
        net/mlx5e: fix memory leak in mlx5e_fs_tt_redirect_any_create
        net/mlx5e: fix double free in mlx5e_destroy_flow_table
      ====================
      
      Link: https://lore.kernel.org/r/20230705175757.284614-1-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      4863b57b
    • M A Ramdhan's avatar
      net/sched: cls_fw: Fix improper refcount update leads to use-after-free · 0323bce5
      M A Ramdhan authored
      If tcf_change_indev() fails, fw_set_parms() returns an error
      immediately, after tcf_bind_filter() has already incremented or
      decremented the reference counter.  An attacker who can drive the
      reference counter to zero can get the referenced object freed,
      leading to a use-after-free.
      
      In order to prevent this, move the point of possible failure above the
      point where the TC_FW_CLASSID is handled.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: M A Ramdhan <ramdhan@starlabs.sg>
      Signed-off-by: M A Ramdhan <ramdhan@starlabs.sg>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
      Message-ID: <20230705161530.52003-1-ramdhan@starlabs.sg>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      0323bce5
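      The ordering problem described in the cls_fw commit above can be
      illustrated with a toy model. This is only a sketch: struct toy_filter,
      toy_change_indev() and both toy_set_parms_*() functions are hypothetical
      names invented for the illustration, and they do not reproduce the
      kernel code. The point is simply that the step that can fail should run
      before any reference count is touched.

```c
#include <assert.h>

/* Hypothetical toy state; none of these names come from the kernel. */
struct toy_filter {
	int class_refs; /* stands in for the reference counter */
	int ifindex;    /* stands in for the bound input device */
};

/* Stand-in for a lookup that can fail: negative input means failure. */
static int toy_change_indev(int requested_ifindex)
{
	return requested_ifindex > 0 ? requested_ifindex : -1;
}

/* Problematic order: the refcount is changed before the fallible step. */
static int toy_set_parms_buggy(struct toy_filter *f, int requested_ifindex)
{
	int ret;

	f->class_refs++; /* side effect happens first */
	ret = toy_change_indev(requested_ifindex);
	if (ret < 0)
		return ret; /* error return leaves the refcount modified */
	f->ifindex = ret;
	return 0;
}

/* Reordered: the fallible step runs first, so an error changes nothing. */
static int toy_set_parms_fixed(struct toy_filter *f, int requested_ifindex)
{
	int ret = toy_change_indev(requested_ifindex);

	if (ret < 0)
		return ret; /* no state was modified yet */
	f->ifindex = ret;
	f->class_refs++;
	return 0;
}
```

      In this model a failed call to toy_set_parms_buggy() still changes
      class_refs, while a failed call to toy_set_parms_fixed() leaves the
      structure untouched.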
    • Quan Zhou's avatar
      wifi: mt76: mt7921e: fix init command fail with enabled device · 525c469e
      Quan Zhou authored
      In the cases below, the chip may be in an unpredictable state when
      the driver's probe() runs:
      * The system reboot flow does not work properly, such as a kernel oops
        while rebooting, and the driver does not go back to its default
        status at that moment.
      * Similarly, if the device was enabled in BIOS or UEFI, the system may
        switch to Linux without the driver having been fully shut down.
      
      To avoid the problem, force push the device back to default in probe()
      * mt7921e_mcu_fw_pmctrl() : return control privilege to chip side.
      * mt7921_wfsys_reset()    : cleanup chip config before resource init.
      
      Error log
      [59007.600714] mt7921e 0000:02:00.0: ASIC revision: 79220010
      [59010.889773] mt7921e 0000:02:00.0: Message 00000010 (seq 1) timeout
      [59010.889786] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59014.217839] mt7921e 0000:02:00.0: Message 00000010 (seq 2) timeout
      [59014.217852] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59017.545880] mt7921e 0000:02:00.0: Message 00000010 (seq 3) timeout
      [59017.545893] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59020.874086] mt7921e 0000:02:00.0: Message 00000010 (seq 4) timeout
      [59020.874099] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59024.202019] mt7921e 0000:02:00.0: Message 00000010 (seq 5) timeout
      [59024.202033] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59027.530082] mt7921e 0000:02:00.0: Message 00000010 (seq 6) timeout
      [59027.530096] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59030.857888] mt7921e 0000:02:00.0: Message 00000010 (seq 7) timeout
      [59030.857904] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59034.185946] mt7921e 0000:02:00.0: Message 00000010 (seq 8) timeout
      [59034.185961] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59037.514249] mt7921e 0000:02:00.0: Message 00000010 (seq 9) timeout
      [59037.514262] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59040.842362] mt7921e 0000:02:00.0: Message 00000010 (seq 10) timeout
      [59040.842375] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59040.923845] mt7921e 0000:02:00.0: hardware init failed
      
      Cc: stable@vger.kernel.org
      Fixes: 5c14a5f9 ("mt76: mt7921: introduce mt7921e support")
      Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
      Tested-by: Juan Martinez <juan.martinez@amd.com>
      Co-developed-by: Leon Yen <leon.yen@mediatek.com>
      Signed-off-by: Leon Yen <leon.yen@mediatek.com>
      Signed-off-by: Quan Zhou <quan.zhou@mediatek.com>
      Signed-off-by: Deren Wu <deren.wu@mediatek.com>
      Message-ID: <39fcb7cee08d4ab940d38d82f21897483212483f.1688569385.git.deren.wu@mediatek.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      525c469e
    • Jakub Kicinski's avatar
      Merge branch 'fix-dropping-of-oversize-preemptible-frames-with-felix-dsa-driver' · 1ce1a745
      Jakub Kicinski authored
      Vladimir Oltean says:
      
      ====================
      Fix dropping of oversize preemptible frames with felix DSA driver
      
      It has been reported that preemptible traffic doesn't completely behave
      as expected. Namely, large packets should be able to be squeezed
      (through fragmentation) through taprio time slots smaller than the
      transmission time of the full frame. That does not happen due to logic
      in the driver (for oversize frame dropping with taprio) that was not
      updated in order for this use case to work.
      
      I am not sure whether it qualifies as "net" material, because some
      structural changes are involved, and it is a "never worked" scenario.
      OTOH, this is a complaint coming from users for a v6.4 kernel.
      It's up to maintainers to decide whether this series can be considered;
      I've submitted it as non-RFC in the optimistic case that it will be :)
      
      Demo script illustrating the issue below.
      
      add_taprio()
      {
      	local ifname=$1
      
      	echo "Creating root taprio"
      	tc qdisc replace dev $ifname handle 8001: parent root stab overhead 24 taprio \
      		num_tc 8 \
      		map 0 1 2 3 4 5 6 7 \
      		queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
      		base-time 0 \
      		sched-entry S 01 1216 \
      		sched-entry S fe 12368 \
      		fp P E E E E E E E \
      		flags 0x2
      }
      
      remove_taprio()
      {
      	local ifname=$1
      
      	echo "Removing taprio"
      	tc qdisc del dev $ifname root
      }
      
      ip netns add ns0
      ip link set eno0 netns ns0 && ip -n ns0 link set eno0 up && ip -n ns0 addr add 192.168.100.1/24 dev eno0
      ip addr add 192.168.100.2/24 dev swp0 && ip link set swp0 up
      ip netns exec ns0 ethtool --set-mm eno0 pmac-enabled on verify-enabled off tx-enabled on
      ethtool --set-mm swp0 pmac-enabled on verify-enabled off tx-enabled on
      add_taprio swp0
      
      ping 192.168.100.1 -s 1000 -c 5 # sent through TC0
      ethtool -I --show-mm swp0 | grep MACMergeFragCountTx # should increase
      
      ip addr flush swp0 && ip link set swp0 down
      remove_taprio swp0
      ethtool --set-mm swp0 pmac-enabled off verify-enabled off tx-enabled off
      ip netns exec ns0 ethtool --set-mm eno0 pmac-enabled off verify-enabled off tx-enabled off
      ip netns del ns0
      ====================
      
      Link: https://lore.kernel.org/r/20230705104422.49025-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      1ce1a745
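      To see why the 1000-byte ping in the demo script above has to be
      fragmented, it helps to convert the sched-entry durations into bytes.
      The sketch below does that arithmetic. The link speed is an assumption
      (the script does not state it; 1000 Mb/s is used in the note that
      follows), slot_capacity_bytes() is a hypothetical helper, and the
      "stab overhead 24" adjustment is ignored.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Whole bytes that can be transmitted during one gate interval.
 *   interval_ns: sched-entry duration in nanoseconds
 *   link_mbps:   link speed in Mb/s (assumed, not given by the script)
 *
 * bits  = interval_ns * link_mbps / 1000
 * bytes = bits / 8
 */
static uint64_t slot_capacity_bytes(uint64_t interval_ns, uint64_t link_mbps)
{
	return interval_ns * link_mbps / 8000;
}
```

      Under the 1000 Mb/s assumption, the 1216 ns entry for TC0 holds 152
      bytes and the 12368 ns entry holds 1546 bytes, so a 1000-byte payload
      cannot cross the TC0 window in one piece.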
    • Vladimir Oltean's avatar
      net: mscc: ocelot: fix oversize frame dropping for preemptible TCs · c6efb4ae
      Vladimir Oltean authored
      This switch implements Hold/Release in a strange way, with no control
      from the user as required by IEEE 802.1Q-2018 through Set-And-Hold-MAC
      and Set-And-Release-MAC, but rather, it emits HOLD requests implicitly
      based on the schedule.
      
      Namely, when the gate of a preemptible TC is about to close (actually
      QSYS::PREEMPTION_CFG.HOLD_ADVANCE octet times in advance of this event),
      the QSYS seems to emit a HOLD request pulse towards the MAC which
      preempts the currently transmitted packet, and further packets are held
      back in the queue system.
      
      This allows large frames to be squeezed through small time slots,
      because HOLD requests initiated by the gate events result in the frame
      being segmented in multiple fragments, the bit time of which is equal to
      the size of the time slot.
      
      It has been reported that the vsc9959_tas_guard_bands_update() logic
      breaks this, because it doesn't take preemptible TCs into account, and
      enables oversized frame dropping when the time slot doesn't allow a full
      MTU to be sent, but it does allow 2*minFragSize to be sent (128B).
      Packets larger than 128B are dropped instead of being sent in multiple
      fragments.
      
      Confusingly, the manual says:
      
      | For guard band, SDU calculation of a traffic class of a port, if
      | preemption is enabled (through 'QSYS::PREEMPTION_CFG.P_QUEUES') then
      | QSYS::PREEMPTION_CFG.HOLD_ADVANCE is used, otherwise
      | QSYS::QMAXSDU_CFG_*.QMAXSDU_* is used.
      
      but this only refers to the static guard band durations, and the
      QMAXSDU_CFG_* registers have dual purpose - the other being oversized
      frame dropping, which takes place irrespective of whether frames are
      preemptible or express.
      
      So, to fix the problem, we need to call vsc9959_tas_guard_bands_update()
      from ocelot_port_update_active_preemptible_tcs(), and modify the guard
      band logic to consider a different (lower) oversize limit for
      preemptible traffic classes.
      
      Fixes: 403ffc2c ("net: mscc: ocelot: add support for preemptible traffic classes")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Message-ID: <20230705104422.49025-4-vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      c6efb4ae
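      One possible reading of the rule described in the commit above is
      sketched below: an express traffic class needs a time slot large enough
      for a full frame, while a preemptible one only needs room for
      2*minFragSize. This is a toy decision rule, not the driver's logic.
      The helper names are hypothetical, and the 64-byte minFragSize is
      inferred from the "2*minFragSize ... (128B)" remark in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed from "2*minFragSize ... (128B)" in the commit message. */
#define TOY_MIN_FRAG_SIZE 64u

/*
 * Smallest slot capacity (in bytes) that still lets frames of a traffic
 * class through: a full frame for express TCs, two minimum-size
 * fragments for preemptible TCs. Hypothetical helper.
 */
static uint32_t toy_required_slot_bytes(uint32_t max_frame_bytes,
					bool preemptible)
{
	return preemptible ? 2u * TOY_MIN_FRAG_SIZE : max_frame_bytes;
}

/* True if the slot is too small and an oversize limit would be needed. */
static bool toy_needs_oversize_limit(uint32_t slot_bytes,
				     uint32_t max_frame_bytes,
				     bool preemptible)
{
	return slot_bytes < toy_required_slot_bytes(max_frame_bytes,
						    preemptible);
}
```

      With a 152-byte slot and a 1518-byte maximum frame, the express case
      needs a limit while the preemptible case does not, which matches the
      behavior the commit message asks for.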
    • Vladimir Oltean's avatar
      net: dsa: felix: make vsc9959_tas_guard_bands_update() visible to ocelot->ops · c6081914
      Vladimir Oltean authored
      In a future change we will need to make
      ocelot_port_update_active_preemptible_tcs() call
      vsc9959_tas_guard_bands_update(), but that is currently not possible,
      since the ocelot switch lib does not have access to functions private to
      the DSA wrapper.
      
      Move the pointer to vsc9959_tas_guard_bands_update() from felix->info
      (which is private to the DSA driver) to ocelot->ops (which is also
      visible to the ocelot switch lib).
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Message-ID: <20230705104422.49025-3-vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      c6081914
    • Vladimir Oltean's avatar
      net: mscc: ocelot: extend ocelot->fwd_domain_lock to cover ocelot->tas_lock · 009d30f1
      Vladimir Oltean authored
      In a future commit we will have to call vsc9959_tas_guard_bands_update()
      from ocelot_port_update_active_preemptible_tcs(), and that will be
      impossible due to the AB/BA locking dependencies between
      ocelot->tas_lock and ocelot->fwd_domain_lock.
      
      Just like we did in commit 3ff468ef ("net: mscc: ocelot: remove
      struct ocelot_mm_state :: lock"), the only solution is to expand the
      scope of ocelot->fwd_domain_lock for it to also serialize changes made
      to the Time-Aware Shaper, because those will have to result in a
      recalculation of cut-through TCs, which is something that depends on the
      forwarding domain.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Message-ID: <20230705104422.49025-2-vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      009d30f1
  6. 06 Jul, 2023 2 commits
  7. 05 Jul, 2023 5 commits
    • Thadeu Lima de Souza Cascardo's avatar
      netfilter: nf_tables: prevent OOB access in nft_byteorder_eval · caf3ef74
      Thadeu Lima de Souza Cascardo authored
      When evaluating byteorder expressions with size 2, a union with 32-bit and
      16-bit members is used. Since the 16-bit members are aligned to 32-bit,
      the array accesses will be out-of-bounds.
      
      It may lead to a stack-out-of-bounds access like the one below:
      
      [   23.095215] ==================================================================
      [   23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
      [   23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
      [   23.096358]
      [   23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
      [   23.096770] Call Trace:
      [   23.096910]  <IRQ>
      [   23.097030]  dump_stack_lvl+0x60/0xc0
      [   23.097218]  print_report+0xcf/0x630
      [   23.097388]  ? nft_byteorder_eval+0x13c/0x320
      [   23.097577]  ? kasan_addr_to_slab+0xd/0xc0
      [   23.097760]  ? nft_byteorder_eval+0x13c/0x320
      [   23.097949]  kasan_report+0xc9/0x110
      [   23.098106]  ? nft_byteorder_eval+0x13c/0x320
      [   23.098298]  __asan_load2+0x83/0xd0
      [   23.098453]  nft_byteorder_eval+0x13c/0x320
      [   23.098659]  nft_do_chain+0x1c8/0xc50
      [   23.098852]  ? __pfx_nft_do_chain+0x10/0x10
      [   23.099078]  ? __kasan_check_read+0x11/0x20
      [   23.099295]  ? __pfx___lock_acquire+0x10/0x10
      [   23.099535]  ? __pfx___lock_acquire+0x10/0x10
      [   23.099745]  ? __kasan_check_read+0x11/0x20
      [   23.099929]  nft_do_chain_ipv4+0xfe/0x140
      [   23.100105]  ? __pfx_nft_do_chain_ipv4+0x10/0x10
      [   23.100327]  ? lock_release+0x204/0x400
      [   23.100515]  ? nf_hook.constprop.0+0x340/0x550
      [   23.100779]  nf_hook_slow+0x6c/0x100
      [   23.100977]  ? __pfx_nft_do_chain_ipv4+0x10/0x10
      [   23.101223]  nf_hook.constprop.0+0x334/0x550
      [   23.101443]  ? __pfx_ip_local_deliver_finish+0x10/0x10
      [   23.101677]  ? __pfx_nf_hook.constprop.0+0x10/0x10
      [   23.101882]  ? __pfx_ip_rcv_finish+0x10/0x10
      [   23.102071]  ? __pfx_ip_local_deliver_finish+0x10/0x10
      [   23.102291]  ? rcu_read_lock_held+0x4b/0x70
      [   23.102481]  ip_local_deliver+0xbb/0x110
      [   23.102665]  ? __pfx_ip_rcv+0x10/0x10
      [   23.102839]  ip_rcv+0x199/0x2a0
      [   23.102980]  ? __pfx_ip_rcv+0x10/0x10
      [   23.103140]  __netif_receive_skb_one_core+0x13e/0x150
      [   23.103362]  ? __pfx___netif_receive_skb_one_core+0x10/0x10
      [   23.103647]  ? mark_held_locks+0x48/0xa0
      [   23.103819]  ? process_backlog+0x36c/0x380
      [   23.103999]  __netif_receive_skb+0x23/0xc0
      [   23.104179]  process_backlog+0x91/0x380
      [   23.104350]  __napi_poll.constprop.0+0x66/0x360
      [   23.104589]  ? net_rx_action+0x1cb/0x610
      [   23.104811]  net_rx_action+0x33e/0x610
      [   23.105024]  ? _raw_spin_unlock+0x23/0x50
      [   23.105257]  ? __pfx_net_rx_action+0x10/0x10
      [   23.105485]  ? mark_held_locks+0x48/0xa0
      [   23.105741]  __do_softirq+0xfa/0x5ab
      [   23.105956]  ? __dev_queue_xmit+0x765/0x1c00
      [   23.106193]  do_softirq.part.0+0x49/0xc0
      [   23.106423]  </IRQ>
      [   23.106547]  <TASK>
      [   23.106670]  __local_bh_enable_ip+0xf5/0x120
      [   23.106903]  __dev_queue_xmit+0x789/0x1c00
      [   23.107131]  ? __pfx___dev_queue_xmit+0x10/0x10
      [   23.107381]  ? find_held_lock+0x8e/0xb0
      [   23.107585]  ? lock_release+0x204/0x400
      [   23.107798]  ? neigh_resolve_output+0x185/0x350
      [   23.108049]  ? mark_held_locks+0x48/0xa0
      [   23.108265]  ? neigh_resolve_output+0x185/0x350
      [   23.108514]  neigh_resolve_output+0x246/0x350
      [   23.108753]  ? neigh_resolve_output+0x246/0x350
      [   23.109003]  ip_finish_output2+0x3c3/0x10b0
      [   23.109250]  ? __pfx_ip_finish_output2+0x10/0x10
      [   23.109510]  ? __pfx_nf_hook+0x10/0x10
      [   23.109732]  __ip_finish_output+0x217/0x390
      [   23.109978]  ip_finish_output+0x2f/0x130
      [   23.110207]  ip_output+0xc9/0x170
      [   23.110404]  ip_push_pending_frames+0x1a0/0x240
      [   23.110652]  raw_sendmsg+0x102e/0x19e0
      [   23.110871]  ? __pfx_raw_sendmsg+0x10/0x10
      [   23.111093]  ? lock_release+0x204/0x400
      [   23.111304]  ? __mod_lruvec_page_state+0x148/0x330
      [   23.111567]  ? find_held_lock+0x8e/0xb0
      [   23.111777]  ? find_held_lock+0x8e/0xb0
      [   23.111993]  ? __rcu_read_unlock+0x7c/0x2f0
      [   23.112225]  ? aa_sk_perm+0x18a/0x550
      [   23.112431]  ? filemap_map_pages+0x4f1/0x900
      [   23.112665]  ? __pfx_aa_sk_perm+0x10/0x10
      [   23.112880]  ? find_held_lock+0x8e/0xb0
      [   23.113098]  inet_sendmsg+0xa0/0xb0
      [   23.113297]  ? inet_sendmsg+0xa0/0xb0
      [   23.113500]  ? __pfx_inet_sendmsg+0x10/0x10
      [   23.113727]  sock_sendmsg+0xf4/0x100
      [   23.113924]  ? move_addr_to_kernel.part.0+0x4f/0xa0
      [   23.114190]  __sys_sendto+0x1d4/0x290
      [   23.114391]  ? __pfx___sys_sendto+0x10/0x10
      [   23.114621]  ? __pfx_mark_lock.part.0+0x10/0x10
      [   23.114869]  ? lock_release+0x204/0x400
      [   23.115076]  ? find_held_lock+0x8e/0xb0
      [   23.115287]  ? rcu_is_watching+0x23/0x60
      [   23.115503]  ? __rseq_handle_notify_resume+0x6e2/0x860
      [   23.115778]  ? __kasan_check_write+0x14/0x30
      [   23.116008]  ? blkcg_maybe_throttle_current+0x8d/0x770
      [   23.116285]  ? mark_held_locks+0x28/0xa0
      [   23.116503]  ? do_syscall_64+0x37/0x90
      [   23.116713]  __x64_sys_sendto+0x7f/0xb0
      [   23.116924]  do_syscall_64+0x59/0x90
      [   23.117123]  ? irqentry_exit_to_user_mode+0x25/0x30
      [   23.117387]  ? irqentry_exit+0x77/0xb0
      [   23.117593]  ? exc_page_fault+0x92/0x140
      [   23.117806]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
      [   23.118081] RIP: 0033:0x7f744aee2bba
      [   23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
      [   23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      [   23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
      [   23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
      [   23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
      [   23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
      [   23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
      [   23.121617]  </TASK>
      [   23.121749]
      [   23.121845] The buggy address belongs to the virtual mapping at
      [   23.121845]  [ffffc90000000000, ffffc90000009000) created by:
      [   23.121845]  irq_init_percpu_irqstack+0x1cf/0x270
      [   23.122707]
      [   23.122803] The buggy address belongs to the physical page:
      [   23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
      [   23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
      [   23.123998] page_type: 0xffffffff()
      [   23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
      [   23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
      [   23.125023] page dumped because: kasan: bad access detected
      [   23.125326]
      [   23.125421] Memory state around the buggy address:
      [   23.125682]  ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   23.126072]  ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
      [   23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
      [   23.126840]                                               ^
      [   23.127138]  ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
      [   23.127522]  ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
      [   23.127906] ==================================================================
      [   23.128324] Disabling lock debugging due to kernel taint
      
      Using simple s16 pointers for the 16-bit accesses fixes the problem. For
      the 32-bit accesses, src and dst can be used directly.
      
      Fixes: 96518518 ("netfilter: add nftables")
      Cc: stable@vger.kernel.org
      Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      caf3ef74
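      The out-of-bounds access described in the nft_byteorder_eval commit
      above comes down to array stride. The sketch below is a standalone
      illustration, not the kernel code: union reg_view and the helper
      functions are hypothetical, and a 4-byte union is assumed (true where
      uint32_t occupies four bytes). It compares how many bytes are spanned
      when 16-bit values are walked as union elements versus through a plain
      16-bit pointer, and shows a byte swap that stays in bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a union with 32-bit and 16-bit members. */
union reg_view {
	uint32_t u32;
	uint16_t u16;
};

/*
 * Bytes spanned when `len` bytes of 16-bit values are indexed as an
 * array of unions: each step advances by sizeof(union reg_view).
 */
static size_t span_via_union(size_t len)
{
	size_t count = len / sizeof(uint16_t);

	return count * sizeof(union reg_view);
}

/* Bytes spanned when the same values are indexed as plain 16-bit items. */
static size_t span_via_u16(size_t len)
{
	size_t count = len / sizeof(uint16_t);

	return count * sizeof(uint16_t);
}

/* Swap the bytes of each 16-bit value without leaving `len` bytes. */
static void swap16_buf(uint16_t *dst, const uint16_t *src, size_t len)
{
	for (size_t i = 0; i < len / sizeof(uint16_t); i++)
		dst[i] = (uint16_t)((src[i] << 8) | (src[i] >> 8));
}
```

      For an 8-byte operand, the union walk spans 16 bytes while the 16-bit
      pointer walk spans exactly 8, which is why plain 16-bit pointers avoid
      the overrun.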
    • Linus Torvalds's avatar
      Merge tag 'net-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 68433066
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from bluetooth, bpf and wireguard.
      
        Current release - regressions:
      
         - nvme-tcp: fix comma-related oops after sendpage changes
      
        Current release - new code bugs:
      
         - ptp: make max_phase_adjustment sysfs device attribute invisible
           when not supported
      
        Previous releases - regressions:
      
         - sctp: fix potential deadlock on &net->sctp.addr_wq_lock
      
         - mptcp:
            - ensure subflow is unhashed before cleaning the backlog
            - do not rely on implicit state check in mptcp_listen()
      
        Previous releases - always broken:
      
         - net: fix net_dev_start_xmit trace event vs skb_transport_offset()
      
         - Bluetooth:
            - fix use-bdaddr-property quirk
            - L2CAP: fix multiple UaFs
            - ISO: use hci_sync for setting CIG parameters
            - hci_event: fix Set CIG Parameters error status handling
            - hci_event: fix parsing of CIS Established Event
            - MGMT: fix marking SCAN_RSP as not connectable
      
         - wireguard: queuing: use saner cpu selection wrapping
      
         - sched: act_ipt: various bug fixes for iptables <> TC interactions
      
         - sched: act_pedit: add size check for TCA_PEDIT_PARMS_EX
      
         - dsa: fixes for receiving PTP packets with 8021q and sja1105 tagging
      
         - eth: sfc: fix null-deref in devlink port without MAE access
      
         - eth: ibmvnic: do not reset dql stats on NON_FATAL err
      
        Misc:
      
         - xsk: honor SO_BINDTODEVICE on bind"
      
      * tag 'net-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits)
        nfp: clean mc addresses in application firmware when closing port
        selftests: mptcp: pm_nl_ctl: fix 32-bit support
        selftests: mptcp: depend on SYN_COOKIES
        selftests: mptcp: userspace_pm: report errors with 'remove' tests
        selftests: mptcp: userspace_pm: use correct server port
        selftests: mptcp: sockopt: return error if wrong mark
        selftests: mptcp: sockopt: use 'iptables-legacy' if available
        selftests: mptcp: connect: fail if nft supposed to work
        mptcp: do not rely on implicit state check in mptcp_listen()
        mptcp: ensure subflow is unhashed before cleaning the backlog
        s390/qeth: Fix vipa deletion
        octeontx-af: fix hardware timestamp configuration
        net: dsa: sja1105: always enable the send_meta options
        net: dsa: tag_sja1105: fix MAC DA patching from meta frames
        net: Replace strlcpy with strscpy
        pptp: Fix fib lookup calls.
        mlxsw: spectrum_router: Fix an IS_ERR() vs NULL check
        net/sched: act_pedit: Add size check for TCA_PEDIT_PARMS_EX
        xsk: Honor SO_BINDTODEVICE on bind
        ptp: Make max_phase_adjustment sysfs device attribute invisible when not supported
        ...
      68433066
    • Linus Torvalds's avatar
      Merge tag 'f2fs-for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs · 73a3fcda
      Linus Torvalds authored
      Pull f2fs updates from Jaegeuk Kim:
       "In this cycle, we've mainly investigated the zoned block device
        support along with patches such as correcting write pointers between
        f2fs and storage, adding asynchronous zone reset flow, and managing
        the number of open zones.
      
        Other than them, f2fs adds another mount option, "errors=x" to specify
        how to handle when it detects an unexpected behavior at runtime.
      
        Enhancements:
         - support 'errors=remount-ro|continue|panic' mount option
         - enforce some inode flag policies
         - allow .tmp compression given extensions
         - add some ioctls to manage the f2fs compression
         - improve looped node chain flow
         - avoid issuing small-sized discard commands during checkpoint
         - implement an asynchronous zone reset
      
        Bug fixes:
         - fix deadlock in xattr and inode page lock
         - fix and add sanity check in some error paths
         - fix to avoid NULL pointer dereference f2fs_write_end_io() along
           with put_super
         - set proper flags to quota files
         - fix potential deadlock due to unpaired node_write lock use
         - fix over-estimating free section during FG GC
         - fix the wrong condition to determine atomic context
      
        As usual, also there are a number of patches with code refactoring and
        minor clean-ups"
      
      * tag 'f2fs-for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (46 commits)
        f2fs: fix to do sanity check on direct node in truncate_dnode()
        f2fs: only set release for file that has compressed data
        f2fs: fix compile warning in f2fs_destroy_node_manager()
        f2fs: fix error path handling in truncate_dnode()
        f2fs: fix deadlock in i_xattr_sem and inode page lock
        f2fs: remove unneeded page uptodate check/set
        f2fs: update mtime and ctime in move file range method
        f2fs: compress tmp files given extension
        f2fs: refactor struct f2fs_attr macro
        f2fs: convert to use sbi directly
        f2fs: remove redundant assignment to variable err
        f2fs: do not issue small discard commands during checkpoint
        f2fs: check zone write pointer points to the end of zone
        f2fs: add f2fs_ioc_get_compress_blocks
        f2fs: cleanup MIN_INLINE_XATTR_SIZE
        f2fs: add helper to check compression level
        f2fs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method
        f2fs: do more sanity check on inode
        f2fs: compress: fix to check validity of i_compress_flag field
        f2fs: add sanity compress level check for compressed file
        ...
      73a3fcda
    • Linus Torvalds's avatar
      Merge tag 'xfs-6.5-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · bb8e7e9f
      Linus Torvalds authored
      Pull more xfs updates from Darrick Wong:
      
       - Fix some ordering problems with log items during log recovery
      
       - Don't deadlock the system by trying to flush busy freed extents while
         holding on to busy freed extents
      
       - Improve validation of log geometry parameters when reading the
         primary superblock
      
       - Validate the length field in the AGF header
      
       - Fix recordset filtering bugs when re-calling GETFSMAP to return more
         results when the resultset didn't previously fit in the caller's
         buffer
      
       - Fix integer overflows in GETFSMAP when working with rt volumes larger
         than 2^32 fsblocks
      
       - Fix GETFSMAP reporting the undefined space beyond the last rtextent
      
       - Fix filtering bugs in GETFSMAP's log device backend if the log ever
         becomes longer than 2^32 fsblocks
      
       - Improve validation of file offsets in the GETFSMAP range parameters
      
       - Fix an off by one bug in the pmem media failure notification
         computation
      
       - Validate the length field in the AGI header too
      
      * tag 'xfs-6.5-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: Remove unneeded semicolon
        xfs: AGI length should be bounds checked
        xfs: fix the calculation for "end" and "length"
        xfs: fix xfs_btree_query_range callers to initialize btree rec fully
        xfs: validate fsmap offsets specified in the query keys
        xfs: fix logdev fsmap query result filtering
        xfs: clean up the rtbitmap fsmap backend
        xfs: fix getfsmap reporting past the last rt extent
        xfs: fix integer overflows in the fsmap rtbitmap and logdev backends
        xfs: fix interval filtering in multi-step fsmap queries
        xfs: fix bounds check in xfs_defer_agfl_block()
        xfs: AGF length has never been bounds checked
        xfs: journal geometry is not properly bounds checked
        xfs: don't block in busy flushing when freeing extents
        xfs: allow extent free intents to be retried
        xfs: pass alloc flags through to xfs_extent_busy_flush()
        xfs: use deferred frees for btree block freeing
        xfs: don't reverse order of items in bulk AIL insertion
        xfs: remove redundant initializations of pointers drop_leaf and save_leaf
      bb8e7e9f
    • Linus Torvalds's avatar
      Merge tag 'pwm/for-6.5-rc1' of... · ace1ba1c
      Linus Torvalds authored
      Merge tag 'pwm/for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm
      
      Pull pwm updates from Thierry Reding:
       "There's a little bit of everything in here: we've got various
        improvements and cleanups to drivers, some fixes across the board and
        a bit of new hardware support"
      
      * tag 'pwm/for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (22 commits)
        dt-bindings: pwm: convert pwm-bcm2835 bindings to YAML
        pwm: Add Renesas RZ/G2L MTU3a PWM driver
        pwm: mtk_disp: Fix the disable flow of disp_pwm
        dt-bindings: pwm: restrict node name suffixes
        pwm: pca9685: Switch i2c driver back to use .probe()
        pwm: ab8500: Fix error code in probe()
        MAINTAINERS: add pwm to PolarFire SoC entry
        pwm: add microchip soft ip corePWM driver
        pwm: sysfs: Do not apply state to already disabled PWMs
        pwm: imx-tpm: force 'real_period' to be zero in suspend
        pwm: meson: make full use of common clock framework
        pwm: meson: don't use hdmi/video clock as mux parent
        pwm: meson: switch to using struct clk_parent_data for mux parents
        pwm: meson: remove not needed check in meson_pwm_calc
        pwm: meson: fix handling of period/duty if greater than UINT_MAX
        pwm: meson: modify and simplify calculation in meson_pwm_get_state
        dt-bindings: pwm: Add R-Car V3U device tree bindings
        dt-bindings: pwm: imx: add i.MX8QXP compatible
        pwm: mediatek: Add support for MT7981
        dt-bindings: pwm: mediatek: Add mediatek,mt7981 compatible
        ...
      ace1ba1c