1. 08 Jul, 2023 6 commits
    • Merge branch 's390-ism-fixes' · bbffab69
      David S. Miller authored
      Niklas Schnelle says:
      
      ====================
      s390/ism: Fixes to client handling
      
      This is v2 of the patch previously titled "s390/ism: Detangle ISM client
      IRQ and event forwarding". As suggested by Paolo Abeni I split the patch
      up. While doing so I noticed another problem that was fixed by this patch
      concerning the way the workqueues access the client structs. This means the
      second patch turning the workqueues into simple direct calls also fixes
      a problem. Finally I split off a third patch just for fixing
      ism_unregister_client()'s error path.
      
      The code after these 3 patches is identical to the result of the v1 patch
      except that I also turned the dev_err() for still registered DMBs into
      a WARN().
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/ism: Do not unregister clients with registered DMBs · 266deeea
      Niklas Schnelle authored
      When ism_unregister_client() is called but the client still has DMBs
      registered it returns -EBUSY and prints an error. This only happens
      after the client has already been unregistered however. This is
      unexpected as the unregister claims to have failed. Furthermore as this
      implies a client bug a WARN() is more appropriate. Thus move the
      deregistration after the check and use WARN().
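The reordering can be illustrated with a minimal userspace sketch. It assumes a simplified, hypothetical client table (`struct client`, `dmb_count`, `client_table` are illustrative names, not the driver's structures) and uses a stderr message as a stand-in for WARN(). The busy check runs before anything is removed, so a failed unregister leaves the client registered.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_CLIENTS 8

/* Hypothetical, simplified client: only what the ordering needs. */
struct client {
	int id;
	int dmb_count;          /* DMBs this client still has registered */
};

static struct client *client_table[MAX_CLIENTS];

static int register_client(struct client *c)
{
	assert(c != NULL);
	if (c->id < 0 || c->id >= MAX_CLIENTS || client_table[c->id])
		return -EINVAL;
	client_table[c->id] = c;
	return 0;
}

/* Check first, then unregister: on -EBUSY the client stays in the table. */
static int unregister_client(struct client *c)
{
	assert(c != NULL);
	if (c->id < 0 || c->id >= MAX_CLIENTS || client_table[c->id] != c)
		return -EINVAL;
	if (c->dmb_count > 0) {
		/* Stand-in for WARN(): leftover DMBs indicate a client bug. */
		fprintf(stderr, "WARN: client %d still has %d DMB(s)\n",
			c->id, c->dmb_count);
		return -EBUSY;
	}
	client_table[c->id] = NULL;
	return 0;
}
```

With the old order (remove first, check second), the -EBUSY return would describe a client that is in fact already gone; here the return value and the table state agree.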
      
      Fixes: 89e7d2ba ("net/ism: Add new API for client registration")
      Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/ism: Fix and simplify add()/remove() callback handling · 76631ffa
      Niklas Schnelle authored
      Previously the clients_lock was protecting the clients array against
      concurrent addition/removal of clients but was also accessed from IRQ
      context. This meant that it had to be a spinlock and that the add() and
      remove() callbacks in which clients need to do allocation and take
      mutexes can't be called under the clients_lock. To work around this these
      callbacks were moved to workqueues. This not only introduced significant
      complexity but is also subtly broken in at least one way.
      
      In ism_dev_init() and ism_dev_exit() clients[i]->tgt_ism is used to
      communicate the added/removed ISM device to the work function. While
      write access to clients[i]->tgt_ism is protected by the clients_lock and
      the code waits until there is no pending add/remove work before and after
      setting clients[i]->tgt_ism, this is not enough. The problem is that the
      wait happens based on per ISM device counters. Thus a concurrent
      ism_dev_init()/ism_dev_exit() for a different ISM device may overwrite
      a clients[i]->tgt_ism between unlocking the clients_lock and the
      subsequent wait for the work to finish.
      
      Thankfully with the clients_lock no longer held in IRQ context it can be
      turned into a mutex which can be held during the calls to add()/remove()
      completely removing the need for the workqueues and the associated
      broken housekeeping including the per ISM device counters and the
      clients[i]->tgt_ism.
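A minimal userspace sketch of the resulting pattern, assuming POSIX threads (`pthread_mutex_t` in place of the kernel mutex) and hypothetical names (`struct client_ops`, `clients_lock`, `dev_init`, `demo_add`). Because the lock is a sleeping mutex rather than a spinlock, each client's add() callback, which may allocate or take other mutexes, is called directly while the lock is held; no work item or per-client `tgt_ism` handoff is involved. A remove() path would be symmetric.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_CLIENTS 8

struct ism_dev;                          /* opaque device (hypothetical) */

struct client_ops {
	/* add() may allocate memory or take other mutexes, so it must not
	 * be called under a spinlock. */
	void (*add)(struct ism_dev *dev, void *priv);
	void *priv;
};

static pthread_mutex_t clients_lock = PTHREAD_MUTEX_INITIALIZER;
static struct client_ops *clients[MAX_CLIENTS];

static void register_client(int slot, struct client_ops *ops)
{
	assert(slot >= 0 && slot < MAX_CLIENTS);
	pthread_mutex_lock(&clients_lock);
	clients[slot] = ops;
	pthread_mutex_unlock(&clients_lock);
}

/* New device: call every client's add() directly while holding the mutex.
 * The device is passed as an argument, so there is no shared per-client
 * "target device" field for a concurrent call to overwrite. */
static void dev_init(struct ism_dev *dev)
{
	pthread_mutex_lock(&clients_lock);
	for (int i = 0; i < MAX_CLIENTS; i++)
		if (clients[i] && clients[i]->add)
			clients[i]->add(dev, clients[i]->priv);
	pthread_mutex_unlock(&clients_lock);
}

/* Example client callback: counts how many devices it has seen. */
static void demo_add(struct ism_dev *dev, void *priv)
{
	(void)dev;
	++*(int *)priv;
}
```

Since dev_init() returns only after every add() has run, callers need no separate "wait for pending work" step and no per-device counters.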
      
      Fixes: 89e7d2ba ("net/ism: Add new API for client registration")
      Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/ism: Fix locking for forwarding of IRQs and events to clients · 6b5c13b5
      Niklas Schnelle authored
      The clients array references all registered clients and is protected by
      the clients_lock. Besides its use as a general list of clients, the clients
      array is accessed in ism_handle_irq() to forward ISM device events to
      clients.
      
      While the clients_lock is taken in the IRQ handler when calling
      handle_event() it is however incorrectly not held during the
      client->handle_irq() call and for the preceding clients[] access leaving
      it unprotected against concurrent client (un-)registration.
      
      Furthermore the accesses to ism->sba_client_arr[] in ism_register_dmb()
      and ism_unregister_dmb() are not protected by any lock. This is
      especially problematic as the client ID from the ism->sba_client_arr[]
      is not checked against NO_CLIENT and neither is the client pointer
      checked.
      
      Instead of expanding the use of the clients_lock further add a separate
      array in struct ism_dev which references clients subscribed to the
      device's events and IRQs. This array is protected by ism->lock which is
      already taken in ism_handle_irq() and can be taken outside the IRQ
      handler when adding/removing subscribers or accessing
      ism->sba_client_arr[]. This also means that the clients_lock is no
      longer taken in IRQ context.
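A sketch of the per-device subscriber scheme, assuming C11 atomics for a toy spinlock (it does not model interrupt masking) and hypothetical names (`struct dev`, `subs`, `dev_map_dmb`, `handle_irq_for_dmb`). It mirrors the two points above: the IRQ path reads a per-device array under the device's own lock, and it validates both the client ID from `sba_client_arr` against `NO_CLIENT` and the client pointer before forwarding.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CLIENTS 8
#define MAX_DMBS    16
#define NO_CLIENT   0xff

struct client {
	int irqs;                                /* IRQs forwarded to this client */
};

struct dev {
	atomic_bool lock;                        /* toy spinlock standing in for ism->lock */
	struct client *subs[MAX_CLIENTS];        /* clients subscribed to this device */
	unsigned char sba_client_arr[MAX_DMBS];  /* DMB index -> owning client ID */
};

static void dev_lock(struct dev *d)
{
	while (atomic_exchange_explicit(&d->lock, true, memory_order_acquire))
		;                                    /* spin until released */
}

static void dev_unlock(struct dev *d)
{
	atomic_store_explicit(&d->lock, false, memory_order_release);
}

static void dev_setup(struct dev *d)
{
	atomic_init(&d->lock, false);
	for (int i = 0; i < MAX_CLIENTS; i++)
		d->subs[i] = NULL;
	for (int i = 0; i < MAX_DMBS; i++)
		d->sba_client_arr[i] = NO_CLIENT;
}

/* Outside IRQ context: (un)subscribe a client under the device lock. */
static void dev_set_sub(struct dev *d, int id, struct client *c)
{
	assert(id >= 0 && id < MAX_CLIENTS);
	dev_lock(d);
	d->subs[id] = c;
	dev_unlock(d);
}

/* Outside IRQ context: record which client owns a DMB, under the same lock. */
static void dev_map_dmb(struct dev *d, int dmb, unsigned char id)
{
	assert(dmb >= 0 && dmb < MAX_DMBS);
	dev_lock(d);
	d->sba_client_arr[dmb] = id;
	dev_unlock(d);
}

/* IRQ path: forward only if the DMB has an owner and that owner is
 * subscribed. Returns 1 if the IRQ was delivered. */
static int handle_irq_for_dmb(struct dev *d, int dmb)
{
	int delivered = 0;

	assert(dmb >= 0 && dmb < MAX_DMBS);
	dev_lock(d);
	unsigned char id = d->sba_client_arr[dmb];
	if (id != NO_CLIENT && id < MAX_CLIENTS && d->subs[id]) {
		d->subs[id]->irqs++;
		delivered = 1;
	}
	dev_unlock(d);
	return delivered;
}
```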
      
      Fixes: 89e7d2ba ("net/ism: Add new API for client registration")
      Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
      Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: prevent skb corruption on frag list segmentation · c329b261
      Paolo Abeni authored
      Ian reported several skb corruptions triggered by rx-gro-list,
      collecting several similar oopses:
      
      [   62.624003] BUG: kernel NULL pointer dereference, address: 00000000000000c0
      [   62.631083] #PF: supervisor read access in kernel mode
      [   62.636312] #PF: error_code(0x0000) - not-present page
      [   62.641541] PGD 0 P4D 0
      [   62.644174] Oops: 0000 [#1] PREEMPT SMP NOPTI
      [   62.648629] CPU: 1 PID: 913 Comm: napi/eno2-79 Not tainted 6.4.0 #364
      [   62.655162] Hardware name: Supermicro Super Server/A2SDi-12C-HLN4F, BIOS 1.7a 10/13/2022
      [   62.663344] RIP: 0010:__udp_gso_segment (./include/linux/skbuff.h:2858
      ./include/linux/udp.h:23 net/ipv4/udp_offload.c:228 net/ipv4/udp_offload.c:261
      net/ipv4/udp_offload.c:277)
      [   62.687193] RSP: 0018:ffffbd3a83b4f868 EFLAGS: 00010246
      [   62.692515] RAX: 00000000000000ce RBX: 0000000000000000 RCX: 0000000000000000
      [   62.699743] RDX: ffffa124def8a000 RSI: 0000000000000079 RDI: ffffa125952a14d4
      [   62.706970] RBP: ffffa124def8a000 R08: 0000000000000022 R09: 00002000001558c9
      [   62.714199] R10: 0000000000000000 R11: 00000000be554639 R12: 00000000000000e2
      [   62.721426] R13: ffffa125952a1400 R14: ffffa125952a1400 R15: 00002000001558c9
      [   62.728654] FS:  0000000000000000(0000) GS:ffffa127efa40000(0000)
      knlGS:0000000000000000
      [   62.736852] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   62.742702] CR2: 00000000000000c0 CR3: 00000001034b0000 CR4: 00000000003526e0
      [   62.749948] Call Trace:
      [   62.752498]  <TASK>
      [   62.779267] inet_gso_segment (net/ipv4/af_inet.c:1398)
      [   62.787605] skb_mac_gso_segment (net/core/gro.c:141)
      [   62.791906] __skb_gso_segment (net/core/dev.c:3403 (discriminator 2))
      [   62.800492] validate_xmit_skb (./include/linux/netdevice.h:4862
      net/core/dev.c:3659)
      [   62.804695] validate_xmit_skb_list (net/core/dev.c:3710)
      [   62.809158] sch_direct_xmit (net/sched/sch_generic.c:330)
      [   62.813198] __dev_queue_xmit (net/core/dev.c:3805 net/core/dev.c:4210)
      net/netfilter/core.c:626)
      [   62.821093] br_dev_queue_push_xmit (net/bridge/br_forward.c:55)
      [   62.825652] maybe_deliver (net/bridge/br_forward.c:193)
      [   62.829420] br_flood (net/bridge/br_forward.c:233)
      [   62.832758] br_handle_frame_finish (net/bridge/br_input.c:215)
      [   62.837403] br_handle_frame (net/bridge/br_input.c:298
      net/bridge/br_input.c:416)
      [   62.851417] __netif_receive_skb_core.constprop.0 (net/core/dev.c:5387)
      [   62.866114] __netif_receive_skb_list_core (net/core/dev.c:5570)
      [   62.871367] netif_receive_skb_list_internal (net/core/dev.c:5638
      net/core/dev.c:5727)
      [   62.876795] napi_complete_done (./include/linux/list.h:37
      ./include/net/gro.h:434 ./include/net/gro.h:429 net/core/dev.c:6067)
      [   62.881004] ixgbe_poll (drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:3191)
      [   62.893534] __napi_poll (net/core/dev.c:6498)
      [   62.897133] napi_threaded_poll (./include/linux/netpoll.h:89
      net/core/dev.c:6640)
      [   62.905276] kthread (kernel/kthread.c:379)
      [   62.913435] ret_from_fork (arch/x86/entry/entry_64.S:314)
      [   62.917119]  </TASK>
      
      In the critical scenario, rx-gro-list GRO-ed packets are fed, via a
      bridge, both to the local input path and to an egress device (tun).
      
      The segmentation of such packets unsafely writes to the cloned skbs
      with shared heads.
      
      This change addresses the issue by uncloning as needed the
      to-be-segmented skbs.
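The principle behind the fix, copy-on-write before modifying a possibly shared buffer, can be sketched in userspace C. This is a simplified model, not the kernel's `sk_buff` API: `struct head`, `struct buf`, and `buf_unclone()` are hypothetical stand-ins for a reference-counted data head shared by clones and for the kernel's unclone helpers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct head {
	int refcnt;                 /* number of bufs pointing at this head */
	size_t len;
	unsigned char data[64];
};

struct buf {
	struct head *head;
};

static struct head *head_new(const void *src, size_t len)
{
	struct head *h = malloc(sizeof(*h));

	if (!h)
		return NULL;
	h->refcnt = 1;
	h->len = len > sizeof(h->data) ? sizeof(h->data) : len;
	memcpy(h->data, src, h->len);
	return h;
}

/* A clone shares the head; no data is copied. */
static void buf_clone(struct buf *dst, const struct buf *src)
{
	dst->head = src->head;
	dst->head->refcnt++;
}

/* Give b a private head if the current one is shared. Returns 0 on success. */
static int buf_unclone(struct buf *b)
{
	assert(b->head != NULL);
	if (b->head->refcnt == 1)
		return 0;               /* already private: nothing to do */
	struct head *copy = head_new(b->head->data, b->head->len);
	if (!copy)
		return -1;
	b->head->refcnt--;
	b->head = copy;
	return 0;
}

static void buf_release(struct buf *b)
{
	if (--b->head->refcnt == 0)
		free(b->head);
	b->head = NULL;
}
```

Writing through `b.head` without the buf_unclone() call would also change what `a` sees, which corresponds to the local-input and tun-egress copies in the bridge scenario above corrupting each other.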
      Reported-by: Ian Kumlien <ian.kumlien@gmail.com>
      Tested-by: Ian Kumlien <ian.kumlien@gmail.com>
      Fixes: 3a1296a3 ("net: Support GRO/GSO fraglist chaining.")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bgmac: postpone turning IRQs off to avoid SoC hangs · e7731194
      Rafał Miłecki authored
      Turning IRQs off is done by accessing Ethernet controller registers.
      That can't be done until the device's clock is enabled; otherwise
      the SoC hangs.
      
      This bug remained unnoticed for years as most bootloaders keep all
      Ethernet interfaces turned on. It seems to only affect a niche SoC
      family BCM47189. It has two Ethernet controllers but CFE bootloader uses
      only the first one.
      
      Fixes: 34322615 ("net: bgmac: Mask interrupts during probe")
      Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
      Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 07 Jul, 2023 15 commits
    • udp6: add a missing call into udp_fail_queue_rcv_skb tracepoint · 8139dccd
      Ivan Babrou authored
      The tracepoint has existed for 12 years, but it only covered udp
      over the legacy IPv4 protocol. Having it enabled for udp6 removes
      the unnecessary difference in error visibility.
      Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
      Fixes: 296f7ea7 ("udp: add tracepoints for queueing skb to rcvbuf")
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ionic: remove dead device fail path · 3a7af34f
      Shannon Nelson authored
      Remove the probe error path code that leaves the driver bound
      to the device, but with essentially a dead device.  This was
      useful maybe twice early in the driver's life and no longer
      makes sense to keep.
      
      Fixes: 30a1e6d0 ("ionic: keep ionic dev on lif init fail")
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ionic: remove WARN_ON to prevent panic_on_warn · abfb2a58
      Nitya Sunkad authored
      Remove unnecessary early code development check and the WARN_ON
      that it uses.  The irq alloc and free paths have long been
      cleaned up and this check shouldn't have stuck around so long.
      
      Fixes: 77ceb68e ("ionic: Add notifyq support")
      Signed-off-by: Nitya Sunkad <nitya.sunkad@amd.com>
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • octeontx2-af: Move validation of ptp pointer before its usage · 7709fbd4
      Sai Krishna authored
      Moved PTP pointer validation before its use to avoid a smatch warning.
      Also used kzalloc/kfree instead of devm_kzalloc/devm_kfree.
      
      Fixes: 2ef4e45d ("octeontx2-af: Add PTP PPS Errata workaround on CN10K silicon")
      Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com>
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: Sai Krishna <saikrishnag@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • octeontx2-af: Promisc enable/disable through mbox · af42088b
      Ratheesh Kannoth authored
      In legacy silicon, promiscuous mode is only modified
      through CGX mbox messages. In CN10KB silicon, it is modified
      from CGX mbox and NIX. This breaks legacy application
      behaviour. Fix this by removing the call from NIX.
      
      Fixes: d6c9784b ("octeontx2-af: Invoke exact match functions if supported")
      Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · b61aac02
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2023-07-05 (igc)
      
      This series contains updates to igc driver only.
      
      Husaini adds a check to increment the Qbv change error counter only on taprio
      Qbvs. He also removes a delay during Tx ring configuration and
      resolves Tx hang that could occur when transmitting on a gate to be
      closed.
      
      Prasad Koya reports ethtool link mode as TP (twisted pair).
      
      Tee Min corrects value for max SDU.
      
      Aravindhan ensures that registers for PPS are always programmed to occur
      in the future.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gve: Set default duplex configuration to full · 0503efea
      Junfeng Guo authored
      The duplex mode was previously unset in the driver, so it kept the default
      value of 0, which corresponds to half duplex. This could give users
      incorrect expectations about the driver's transmission capabilities.
      Set the default duplex configuration to full, as the driver runs in
      full duplex mode at this point.
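Why an unset field reads as half duplex: in the ethtool UAPI (include/uapi/linux/ethtool.h), `DUPLEX_HALF` is 0x00 and `DUPLEX_FULL` is 0x01, so a zero-initialized settings structure reports half duplex until the driver fills the field in. The struct and functions below are hypothetical simplifications, and the two constants are redefined locally so the sketch builds without kernel headers.

```c
#include <assert.h>
#include <string.h>

/* Values match include/uapi/linux/ethtool.h; redefined here so the
 * sketch builds without kernel headers. */
#define DUPLEX_HALF 0x00
#define DUPLEX_FULL 0x01

struct link_settings {          /* hypothetical subset of ethtool link settings */
	unsigned int speed;         /* Mb/s */
	unsigned char duplex;
};

/* Before the fix: only speed is reported; duplex keeps its zero default. */
static void get_link_settings_old(struct link_settings *s, unsigned int speed)
{
	assert(s != NULL);
	memset(s, 0, sizeof(*s));
	s->speed = speed;
}

/* After the fix: duplex is explicitly reported as full. */
static void get_link_settings_new(struct link_settings *s, unsigned int speed)
{
	get_link_settings_old(s, speed);
	s->duplex = DUPLEX_FULL;
}
```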
      
      Fixes: 7e074d5a ("gve: Enable Link Speed Reporting in the driver.")
      Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Message-ID: <20230706044128.2726747-1-junfeng.guo@intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 41b9eff0
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2023-07-05 (ice)
      
      This series contains updates to ice driver only.
      
      Sridhar fixes incorrect comparison of max Tx rate limit to occur against
      each TC value rather than the aggregate. He also resolves an issue with
      the wrong VSI being used when setting max Tx rate when TCs are enabled.
      
      * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        ice: Fix tx queue rate limit when TCs are configured
        ice: Fix max_rate check while configuring TX rate limits
      ====================
      
      Link: https://lore.kernel.org/r/20230705201346.49370-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'mlx5-fixes-2023-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 4863b57b
      Jakub Kicinski authored
      Saeed Mahameed says:
      
      ====================
      mlx5 fixes 2023-07-05
      
      This series provides bug fixes to mlx5 driver.
      
      * tag 'mlx5-fixes-2023-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
        net/mlx5e: RX, Fix page_pool page fragment tracking for XDP
        net/mlx5: Query hca_cap_2 only when supported
        net/mlx5e: TC, CT: Offload ct clear only once
        net/mlx5e: Check for NOT_READY flag state after locking
        net/mlx5: Register a unique thermal zone per device
        net/mlx5e: RX, Fix flush and close release flow of regular rq for legacy rq
        net/mlx5e: fix memory leak in mlx5e_ptp_open
        net/mlx5e: fix memory leak in mlx5e_fs_tt_redirect_any_create
        net/mlx5e: fix double free in mlx5e_destroy_flow_table
      ====================
      
      Link: https://lore.kernel.org/r/20230705175757.284614-1-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/sched: cls_fw: Fix improper refcount update leads to use-after-free · 0323bce5
      M A Ramdhan authored
      In the event of a failure in tcf_change_indev(), fw_set_parms() will
      immediately return an error after incrementing or decrementing the
      reference counter in tcf_bind_filter(). If an attacker can drive the
      reference counter to zero, the referenced object is freed, leading to a
      use-after-free.
      
      In order to prevent this, move the point of possible failure above the
      point where the TC_FW_CLASSID is handled.
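The fix follows a general rule: perform every step that can fail before any step whose side effects would otherwise have to be undone. A minimal sketch, with hypothetical `struct klass`, `struct filter`, `bind_class()`, and `change_indev()` standing in for the class, the refcount adjustment done by tcf_bind_filter(), and tcf_change_indev():

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct klass {
	int refcnt;
};

struct filter {
	struct klass *bound;        /* class this filter is bound to, if any */
	int ifindex;
};

/* Hypothetical stand-in for tcf_change_indev(): may fail. */
static int change_indev(struct filter *f, int ifindex)
{
	if (ifindex <= 0)
		return -EINVAL;
	f->ifindex = ifindex;
	return 0;
}

/* Stand-in for the binding step: moves a reference to class k. */
static void bind_class(struct filter *f, struct klass *k)
{
	if (f->bound)
		f->bound->refcnt--;
	k->refcnt++;
	f->bound = k;
}

/* Buggy order: refcount changed, then an early return on failure. */
static int set_parms_buggy(struct filter *f, struct klass *k, int ifindex)
{
	assert(f != NULL && k != NULL);
	bind_class(f, k);
	return change_indev(f, ifindex);
}

/* Fixed order: the fallible step runs first, so failure has no side effects. */
static int set_parms_fixed(struct filter *f, struct klass *k, int ifindex)
{
	assert(f != NULL && k != NULL);
	int err = change_indev(f, ifindex);
	if (err)
		return err;
	bind_class(f, k);
	return 0;
}
```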
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: M A Ramdhan <ramdhan@starlabs.sg>
      Signed-off-by: M A Ramdhan <ramdhan@starlabs.sg>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
      Message-ID: <20230705161530.52003-1-ramdhan@starlabs.sg>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • wifi: mt76: mt7921e: fix init command fail with enabled device · 525c469e
      Quan Zhou authored
      In some cases, such as those below, the driver may find the chip in an
      unpredictable state in probe():
      * The system reboot flow does not work properly (for example, a kernel
        oops while rebooting), so the driver does not return the chip to its
        default state.
      * Similarly, if the device was enabled in BIOS or UEFI, the system may
        switch to Linux without the device having been fully shut down.
      
      To avoid the problem, force the device back to its default state in probe():
      * mt7921e_mcu_fw_pmctrl() : return control privilege to chip side.
      * mt7921_wfsys_reset()    : cleanup chip config before resource init.
      
      Error log
      [59007.600714] mt7921e 0000:02:00.0: ASIC revision: 79220010
      [59010.889773] mt7921e 0000:02:00.0: Message 00000010 (seq 1) timeout
      [59010.889786] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59014.217839] mt7921e 0000:02:00.0: Message 00000010 (seq 2) timeout
      [59014.217852] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59017.545880] mt7921e 0000:02:00.0: Message 00000010 (seq 3) timeout
      [59017.545893] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59020.874086] mt7921e 0000:02:00.0: Message 00000010 (seq 4) timeout
      [59020.874099] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59024.202019] mt7921e 0000:02:00.0: Message 00000010 (seq 5) timeout
      [59024.202033] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59027.530082] mt7921e 0000:02:00.0: Message 00000010 (seq 6) timeout
      [59027.530096] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59030.857888] mt7921e 0000:02:00.0: Message 00000010 (seq 7) timeout
      [59030.857904] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59034.185946] mt7921e 0000:02:00.0: Message 00000010 (seq 8) timeout
      [59034.185961] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59037.514249] mt7921e 0000:02:00.0: Message 00000010 (seq 9) timeout
      [59037.514262] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59040.842362] mt7921e 0000:02:00.0: Message 00000010 (seq 10) timeout
      [59040.842375] mt7921e 0000:02:00.0: Failed to get patch semaphore
      [59040.923845] mt7921e 0000:02:00.0: hardware init failed
      
      Cc: stable@vger.kernel.org
      Fixes: 5c14a5f9 ("mt76: mt7921: introduce mt7921e support")
      Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
      Tested-by: Juan Martinez <juan.martinez@amd.com>
      Co-developed-by: Leon Yen <leon.yen@mediatek.com>
      Signed-off-by: Leon Yen <leon.yen@mediatek.com>
      Signed-off-by: Quan Zhou <quan.zhou@mediatek.com>
      Signed-off-by: Deren Wu <deren.wu@mediatek.com>
      Message-ID: <39fcb7cee08d4ab940d38d82f21897483212483f.1688569385.git.deren.wu@mediatek.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'fix-dropping-of-oversize-preemptible-frames-with-felix-dsa-driver' · 1ce1a745
      Jakub Kicinski authored
      Vladimir Oltean says:
      
      ====================
      Fix dropping of oversize preemptible frames with felix DSA driver
      
      It has been reported that preemptible traffic doesn't completely behave
      as expected. Namely, large packets should be able to be squeezed
      (through fragmentation) through taprio time slots smaller than the
      transmission time of the full frame. That does not happen due to logic
      in the driver (for oversize frame dropping with taprio) that was not
      updated in order for this use case to work.
      
      I am not sure whether it qualifies as "net" material, because some
      structural changes are involved, and it is a "never worked" scenario.
      OTOH, this is a complaint coming from users for a v6.4 kernel.
      It's up to maintainers to decide whether this series can be considered;
      I've submitted it as non-RFC in the optimistic case that it will be :)
      
      Demo script illustrating the issue below.
      
      add_taprio()
      {
      	local ifname=$1
      
      	echo "Creating root taprio"
      	tc qdisc replace dev $ifname handle 8001: parent root stab overhead 24 taprio \
      		num_tc 8 \
      		map 0 1 2 3 4 5 6 7 \
      		queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
      		base-time 0 \
      		sched-entry S 01 1216 \
      		sched-entry S fe 12368 \
      		fp P E E E E E E E \
      		flags 0x2
      }
      
      remove_taprio()
      {
      	local ifname=$1
      
      	echo "Removing taprio"
      	tc qdisc del dev $ifname root
      }
      
      ip netns add ns0
      ip link set eno0 netns ns0 && ip -n ns0 link set eno0 up && ip -n ns0 addr add 192.168.100.1/24 dev eno0
      ip addr add 192.168.100.2/24 dev swp0 && ip link set swp0 up
      ip netns exec ns0 ethtool --set-mm eno0 pmac-enabled on verify-enabled off tx-enabled on
      ethtool --set-mm swp0 pmac-enabled on verify-enabled off tx-enabled on
      add_taprio swp0
      
      ping 192.168.100.1 -s 1000 -c 5 # sent through TC0
      ethtool -I --show-mm swp0 | grep MACMergeFragCountTx # should increase
      
      ip addr flush swp0 && ip link set swp0 down
      remove_taprio swp0
      ethtool --set-mm swp0 pmac-enabled off verify-enabled off tx-enabled off
      ip netns exec ns0 ethtool --set-mm eno0 pmac-enabled off verify-enabled off tx-enabled off
      ip netns del ns0
      ====================
      
      Link: https://lore.kernel.org/r/20230705104422.49025-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: mscc: ocelot: fix oversize frame dropping for preemptible TCs · c6efb4ae
      Vladimir Oltean authored
      This switch implements Hold/Release in a strange way, with no control
      from the user as required by IEEE 802.1Q-2018 through Set-And-Hold-MAC
      and Set-And-Release-MAC, but rather, it emits HOLD requests implicitly
      based on the schedule.
      
      Namely, when the gate of a preemptible TC is about to close (actually
      QSYS::PREEMPTION_CFG.HOLD_ADVANCE octet times in advance of this event),
      the QSYS seems to emit a HOLD request pulse towards the MAC which
      preempts the currently transmitted packet, and further packets are held
      back in the queue system.
      
      This allows large frames to be squeezed through small time slots,
      because HOLD requests initiated by the gate events result in the frame
      being segmented in multiple fragments, the bit time of which is equal to
      the size of the time slot.
      
      It has been reported that the vsc9959_tas_guard_bands_update() logic
      breaks this, because it doesn't take preemptible TCs into account, and
      enables oversized frame dropping when the time slot doesn't allow a full
      MTU to be sent, but it does allow 2*minFragSize to be sent (128B).
      Packets larger than 128B are dropped instead of being sent in multiple
      fragments.
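To make the numbers concrete: a gate open for t nanoseconds on a link of S Mb/s passes about t * S / 8000 octets. Taking the 1216 ns TC0 slot from the demo script in the cover letter above and assuming (this is not stated there) a 1 Gb/s link, that is 152 octets: too short for the ~1 kB ping frames as a whole, yet more than 2 * 64 = 128 octets, which is the case this commit describes. The helper names are hypothetical, and the sketch ignores per-frame overhead such as preamble and inter-frame gap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIN_FRAG_SIZE 64u       /* octets; 2 * 64 = 128 B as in the commit text */

/* Octets that fit in a gate-open window of slot_ns at speed_mbps
 * (1 Mb/s = 1 bit per microsecond, so bits = ns * Mb/s / 1000). */
static uint64_t slot_octets(uint64_t slot_ns, uint64_t speed_mbps)
{
	assert(speed_mbps > 0);
	return slot_ns * speed_mbps / 8000u;
}

/* Express TC: a frame can only go out if it fits in the slot whole. */
static bool express_fits(uint64_t frame_len, uint64_t slot_ns, uint64_t speed_mbps)
{
	return frame_len <= slot_octets(slot_ns, speed_mbps);
}

/* Preemptible TC: the slot only needs to carry fragments; the commit's
 * criterion is whether it can fit 2 * minFragSize. */
static bool preemptible_slot_usable(uint64_t slot_ns, uint64_t speed_mbps)
{
	return slot_octets(slot_ns, speed_mbps) >= 2 * MIN_FRAG_SIZE;
}
```

Treating the preemptible TC like an express one (the `express_fits` test) is what drops the frames; the slot is still usable for fragmented transmission.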
      
      Confusingly, the manual says:
      
      | For guard band, SDU calculation of a traffic class of a port, if
      | preemption is enabled (through 'QSYS::PREEMPTION_CFG.P_QUEUES') then
      | QSYS::PREEMPTION_CFG.HOLD_ADVANCE is used, otherwise
      | QSYS::QMAXSDU_CFG_*.QMAXSDU_* is used.
      
      but this only refers to the static guard band durations, and the
      QMAXSDU_CFG_* registers have dual purpose - the other being oversized
      frame dropping, which takes place irrespective of whether frames are
      preemptible or express.
      
      So, to fix the problem, we need to call vsc9959_tas_guard_bands_update()
      from ocelot_port_update_active_preemptible_tcs(), and modify the guard
      band logic to consider a different (lower) oversize limit for
      preemptible traffic classes.
      
      Fixes: 403ffc2c ("net: mscc: ocelot: add support for preemptible traffic classes")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Message-ID: <20230705104422.49025-4-vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: felix: make vsc9959_tas_guard_bands_update() visible to ocelot->ops · c6081914
      Vladimir Oltean authored
      In a future change we will need to make
      ocelot_port_update_active_preemptible_tcs() call
      vsc9959_tas_guard_bands_update(), but that is currently not possible,
      since the ocelot switch lib does not have access to functions private to
      the DSA wrapper.
      
      Move the pointer to vsc9959_tas_guard_bands_update() from felix->info
      (which is private to the DSA driver) to ocelot->ops (which is also
      visible to the ocelot switch lib).
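The indirection can be sketched with a function-pointer table. `struct switch_ops`, `struct sw`, `lib_update_preemptible_tcs()`, and `wrapper_guard_bands_update()` are hypothetical names loosely modeling ocelot->ops, ocelot_port_update_active_preemptible_tcs(), and the DSA-private vsc9959_tas_guard_bands_update(); the real signatures may differ. The library calls through a pointer it can see, while the implementation stays in the wrapper.

```c
#include <assert.h>
#include <stddef.h>

struct sw;

/* Visible to the switch library (models ocelot->ops). */
struct switch_ops {
	void (*tas_guard_bands_update)(struct sw *sw, int port);
};

struct sw {
	const struct switch_ops *ops;
	int guard_band_updates;     /* counts calls, for demonstration */
};

/* Library code: knows only the ops table, not the wrapper's private functions. */
static void lib_update_preemptible_tcs(struct sw *sw, int port)
{
	assert(sw != NULL);
	/* ... update the active preemptible TCs for the port ... */
	if (sw->ops && sw->ops->tas_guard_bands_update)
		sw->ops->tas_guard_bands_update(sw, port);
}

/* Wrapper (DSA driver) side: the implementation stays private to it. */
static void wrapper_guard_bands_update(struct sw *sw, int port)
{
	(void)port;
	sw->guard_band_updates++;
}

static const struct switch_ops wrapper_ops = {
	.tas_guard_bands_update = wrapper_guard_bands_update,
};
```

The NULL check lets switch variants without a time-aware shaper leave the hook unset.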
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Message-ID: <20230705104422.49025-3-vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: mscc: ocelot: extend ocelot->fwd_domain_lock to cover ocelot->tas_lock · 009d30f1
      Vladimir Oltean authored
      In a future commit we will have to call vsc9959_tas_guard_bands_update()
      from ocelot_port_update_active_preemptible_tcs(), and that will be
      impossible due to the AB/BA locking dependencies between
      ocelot->tas_lock and ocelot->fwd_domain_lock.
      
      Just like we did in commit 3ff468ef ("net: mscc: ocelot: remove
      struct ocelot_mm_state :: lock"), the only solution is to expand the
      scope of ocelot->fwd_domain_lock for it to also serialize changes made
      to the Time-Aware Shaper, because those will have to result in a
      recalculation of cut-through TCs, which is something that depends on the
      forwarding domain.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Message-ID: <20230705104422.49025-2-vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. 06 Jul, 2023 2 commits
  4. 05 Jul, 2023 17 commits
    • netfilter: nf_tables: prevent OOB access in nft_byteorder_eval · caf3ef74
      Thadeu Lima de Souza Cascardo authored
      When evaluating byteorder expressions with size 2, a union with 32-bit and
      16-bit members is used. Since the 16-bit members are aligned to 32-bit,
      the array accesses will be out-of-bounds.
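Why the union causes the overrun: an array element of a union is as large as its largest member, so stepping through an array of a union of `u32` and `u16` advances 4 bytes per element even when only 2-byte values are meant. The sketch below (standard C plus POSIX `ntohs()`, hypothetical helper names) computes the span each access pattern covers and shows an in-bounds 16-bit conversion; it models the problem rather than reproducing nft_byteorder_eval() itself.

```c
#include <arpa/inet.h>          /* ntohs */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

union reg_elem {
	uint32_t u32;
	uint16_t u16;
};                              /* sizeof == 4: the u32 member sets the stride */

/* Bytes touched when len bytes of 16-bit values are walked as union elements. */
static size_t union_walk_span(size_t len)
{
	return (len / 2) * sizeof(union reg_elem);
}

/* Bytes touched when the same values are walked as plain 16-bit units. */
static size_t u16_walk_span(size_t len)
{
	return (len / 2) * sizeof(uint16_t);
}

/* In-bounds conversion of len bytes of big-endian 16-bit values to host order. */
static void ntohs_buf(uint8_t *dst, const uint8_t *src, size_t len)
{
	for (size_t i = 0; i < len / 2; i++) {
		uint16_t v;

		memcpy(&v, src + 2 * i, sizeof(v));
		v = ntohs(v);
		memcpy(dst + 2 * i, &v, sizeof(v));
	}
}
```

For a 4-byte operand the union walk spans 8 bytes, twice the buffer, which is the kind of stack read KASAN reports below.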
      
      It may lead to a stack-out-of-bounds access like the one below:
      
      [   23.095215] ==================================================================
      [   23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
      [   23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
      [   23.096358]
      [   23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
      [   23.096770] Call Trace:
      [   23.096910]  <IRQ>
      [   23.097030]  dump_stack_lvl+0x60/0xc0
      [   23.097218]  print_report+0xcf/0x630
      [   23.097388]  ? nft_byteorder_eval+0x13c/0x320
      [   23.097577]  ? kasan_addr_to_slab+0xd/0xc0
      [   23.097760]  ? nft_byteorder_eval+0x13c/0x320
      [   23.097949]  kasan_report+0xc9/0x110
      [   23.098106]  ? nft_byteorder_eval+0x13c/0x320
      [   23.098298]  __asan_load2+0x83/0xd0
      [   23.098453]  nft_byteorder_eval+0x13c/0x320
      [   23.098659]  nft_do_chain+0x1c8/0xc50
      [   23.098852]  ? __pfx_nft_do_chain+0x10/0x10
      [   23.099078]  ? __kasan_check_read+0x11/0x20
      [   23.099295]  ? __pfx___lock_acquire+0x10/0x10
      [   23.099535]  ? __pfx___lock_acquire+0x10/0x10
      [   23.099745]  ? __kasan_check_read+0x11/0x20
      [   23.099929]  nft_do_chain_ipv4+0xfe/0x140
      [   23.100105]  ? __pfx_nft_do_chain_ipv4+0x10/0x10
      [   23.100327]  ? lock_release+0x204/0x400
      [   23.100515]  ? nf_hook.constprop.0+0x340/0x550
      [   23.100779]  nf_hook_slow+0x6c/0x100
      [   23.100977]  ? __pfx_nft_do_chain_ipv4+0x10/0x10
      [   23.101223]  nf_hook.constprop.0+0x334/0x550
      [   23.101443]  ? __pfx_ip_local_deliver_finish+0x10/0x10
      [   23.101677]  ? __pfx_nf_hook.constprop.0+0x10/0x10
      [   23.101882]  ? __pfx_ip_rcv_finish+0x10/0x10
      [   23.102071]  ? __pfx_ip_local_deliver_finish+0x10/0x10
      [   23.102291]  ? rcu_read_lock_held+0x4b/0x70
      [   23.102481]  ip_local_deliver+0xbb/0x110
      [   23.102665]  ? __pfx_ip_rcv+0x10/0x10
      [   23.102839]  ip_rcv+0x199/0x2a0
      [   23.102980]  ? __pfx_ip_rcv+0x10/0x10
      [   23.103140]  __netif_receive_skb_one_core+0x13e/0x150
      [   23.103362]  ? __pfx___netif_receive_skb_one_core+0x10/0x10
      [   23.103647]  ? mark_held_locks+0x48/0xa0
      [   23.103819]  ? process_backlog+0x36c/0x380
      [   23.103999]  __netif_receive_skb+0x23/0xc0
      [   23.104179]  process_backlog+0x91/0x380
      [   23.104350]  __napi_poll.constprop.0+0x66/0x360
      [   23.104589]  ? net_rx_action+0x1cb/0x610
      [   23.104811]  net_rx_action+0x33e/0x610
      [   23.105024]  ? _raw_spin_unlock+0x23/0x50
      [   23.105257]  ? __pfx_net_rx_action+0x10/0x10
      [   23.105485]  ? mark_held_locks+0x48/0xa0
      [   23.105741]  __do_softirq+0xfa/0x5ab
      [   23.105956]  ? __dev_queue_xmit+0x765/0x1c00
      [   23.106193]  do_softirq.part.0+0x49/0xc0
      [   23.106423]  </IRQ>
      [   23.106547]  <TASK>
      [   23.106670]  __local_bh_enable_ip+0xf5/0x120
      [   23.106903]  __dev_queue_xmit+0x789/0x1c00
      [   23.107131]  ? __pfx___dev_queue_xmit+0x10/0x10
      [   23.107381]  ? find_held_lock+0x8e/0xb0
      [   23.107585]  ? lock_release+0x204/0x400
      [   23.107798]  ? neigh_resolve_output+0x185/0x350
      [   23.108049]  ? mark_held_locks+0x48/0xa0
      [   23.108265]  ? neigh_resolve_output+0x185/0x350
      [   23.108514]  neigh_resolve_output+0x246/0x350
      [   23.108753]  ? neigh_resolve_output+0x246/0x350
      [   23.109003]  ip_finish_output2+0x3c3/0x10b0
      [   23.109250]  ? __pfx_ip_finish_output2+0x10/0x10
      [   23.109510]  ? __pfx_nf_hook+0x10/0x10
      [   23.109732]  __ip_finish_output+0x217/0x390
      [   23.109978]  ip_finish_output+0x2f/0x130
      [   23.110207]  ip_output+0xc9/0x170
      [   23.110404]  ip_push_pending_frames+0x1a0/0x240
      [   23.110652]  raw_sendmsg+0x102e/0x19e0
      [   23.110871]  ? __pfx_raw_sendmsg+0x10/0x10
      [   23.111093]  ? lock_release+0x204/0x400
      [   23.111304]  ? __mod_lruvec_page_state+0x148/0x330
      [   23.111567]  ? find_held_lock+0x8e/0xb0
      [   23.111777]  ? find_held_lock+0x8e/0xb0
      [   23.111993]  ? __rcu_read_unlock+0x7c/0x2f0
      [   23.112225]  ? aa_sk_perm+0x18a/0x550
      [   23.112431]  ? filemap_map_pages+0x4f1/0x900
      [   23.112665]  ? __pfx_aa_sk_perm+0x10/0x10
      [   23.112880]  ? find_held_lock+0x8e/0xb0
      [   23.113098]  inet_sendmsg+0xa0/0xb0
      [   23.113297]  ? inet_sendmsg+0xa0/0xb0
      [   23.113500]  ? __pfx_inet_sendmsg+0x10/0x10
      [   23.113727]  sock_sendmsg+0xf4/0x100
      [   23.113924]  ? move_addr_to_kernel.part.0+0x4f/0xa0
      [   23.114190]  __sys_sendto+0x1d4/0x290
      [   23.114391]  ? __pfx___sys_sendto+0x10/0x10
      [   23.114621]  ? __pfx_mark_lock.part.0+0x10/0x10
      [   23.114869]  ? lock_release+0x204/0x400
      [   23.115076]  ? find_held_lock+0x8e/0xb0
      [   23.115287]  ? rcu_is_watching+0x23/0x60
      [   23.115503]  ? __rseq_handle_notify_resume+0x6e2/0x860
      [   23.115778]  ? __kasan_check_write+0x14/0x30
      [   23.116008]  ? blkcg_maybe_throttle_current+0x8d/0x770
      [   23.116285]  ? mark_held_locks+0x28/0xa0
      [   23.116503]  ? do_syscall_64+0x37/0x90
      [   23.116713]  __x64_sys_sendto+0x7f/0xb0
      [   23.116924]  do_syscall_64+0x59/0x90
      [   23.117123]  ? irqentry_exit_to_user_mode+0x25/0x30
      [   23.117387]  ? irqentry_exit+0x77/0xb0
      [   23.117593]  ? exc_page_fault+0x92/0x140
      [   23.117806]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
      [   23.118081] RIP: 0033:0x7f744aee2bba
      [   23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
      [   23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      [   23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
      [   23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
      [   23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
      [   23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
      [   23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
      [   23.121617]  </TASK>
      [   23.121749]
      [   23.121845] The buggy address belongs to the virtual mapping at
      [   23.121845]  [ffffc90000000000, ffffc90000009000) created by:
      [   23.121845]  irq_init_percpu_irqstack+0x1cf/0x270
      [   23.122707]
      [   23.122803] The buggy address belongs to the physical page:
      [   23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
      [   23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
      [   23.123998] page_type: 0xffffffff()
      [   23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
      [   23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
      [   23.125023] page dumped because: kasan: bad access detected
      [   23.125326]
      [   23.125421] Memory state around the buggy address:
      [   23.125682]  ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   23.126072]  ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
      [   23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
      [   23.126840]                                               ^
      [   23.127138]  ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
      [   23.127522]  ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
      [   23.127906] ==================================================================
      [   23.128324] Disabling lock debugging due to kernel taint
      
      Using simple s16 pointers for the 16-bit accesses fixes the problem. For
      the 32-bit accesses, src and dst can be used directly.
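      The sketch below shows why the union-based loop overruns. An array of a
      4-byte union advances 4 bytes per element even when only the 16-bit
      member is read, while a plain 16-bit pointer advances 2 bytes. It is a
      user-space illustration only: `union reg_view`, `union_offset()`,
      `u16_offset()` and `swap16_block()` are hypothetical names, and `ntohs()`
      from <arpa/inet.h> stands in for the kernel helper.

```c
#include <arpa/inet.h>  /* ntohs(), htons() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the old union: sizeof is 4, so indexing an
 * array of it strides 4 bytes per element. */
union reg_view {
	uint32_t u32;
	uint16_t u16;
};

/* Byte offset of element i when indexing through the union. */
static size_t union_offset(size_t i)
{
	return i * sizeof(union reg_view);
}

/* Byte offset of element i when indexing through a 16-bit pointer. */
static size_t u16_offset(size_t i)
{
	return i * sizeof(uint16_t);
}

/* Corrected loop: convert len bytes of big-endian 16-bit values through
 * plain 16-bit pointers, so element i sits at byte 2*i and the loop never
 * reads past src + len. */
static void swap16_block(uint16_t *dst, const uint16_t *src, size_t len)
{
	size_t i;

	assert(len % 2 == 0);
	for (i = 0; i < len / 2; i++)
		dst[i] = ntohs(src[i]);
}
```

      For len = 8 the loop runs 4 times: through the union the last 16-bit
      read starts at byte 12 and ends at byte 14, past the 8-byte buffer,
      whereas through a 16-bit pointer it ends exactly at byte 8.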
      
      Fixes: 96518518 ("netfilter: add nftables")
      Cc: stable@vger.kernel.org
      Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      caf3ef74
    • Linus Torvalds's avatar
      Merge tag 'net-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 68433066
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from bluetooth, bpf and wireguard.
      
        Current release - regressions:
      
         - nvme-tcp: fix comma-related oops after sendpage changes
      
        Current release - new code bugs:
      
         - ptp: make max_phase_adjustment sysfs device attribute invisible
           when not supported
      
        Previous releases - regressions:
      
         - sctp: fix potential deadlock on &net->sctp.addr_wq_lock
      
         - mptcp:
            - ensure subflow is unhashed before cleaning the backlog
            - do not rely on implicit state check in mptcp_listen()
      
        Previous releases - always broken:
      
         - net: fix net_dev_start_xmit trace event vs skb_transport_offset()
      
         - Bluetooth:
            - fix use-bdaddr-property quirk
            - L2CAP: fix multiple UaFs
            - ISO: use hci_sync for setting CIG parameters
            - hci_event: fix Set CIG Parameters error status handling
            - hci_event: fix parsing of CIS Established Event
            - MGMT: fix marking SCAN_RSP as not connectable
      
         - wireguard: queuing: use saner cpu selection wrapping
      
         - sched: act_ipt: various bug fixes for iptables <> TC interactions
      
         - sched: act_pedit: add size check for TCA_PEDIT_PARMS_EX
      
         - dsa: fixes for receiving PTP packets with 8021q and sja1105 tagging
      
         - eth: sfc: fix null-deref in devlink port without MAE access
      
         - eth: ibmvnic: do not reset dql stats on NON_FATAL err
      
        Misc:
      
         - xsk: honor SO_BINDTODEVICE on bind"
      
      * tag 'net-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits)
        nfp: clean mc addresses in application firmware when closing port
        selftests: mptcp: pm_nl_ctl: fix 32-bit support
        selftests: mptcp: depend on SYN_COOKIES
        selftests: mptcp: userspace_pm: report errors with 'remove' tests
        selftests: mptcp: userspace_pm: use correct server port
        selftests: mptcp: sockopt: return error if wrong mark
        selftests: mptcp: sockopt: use 'iptables-legacy' if available
        selftests: mptcp: connect: fail if nft supposed to work
        mptcp: do not rely on implicit state check in mptcp_listen()
        mptcp: ensure subflow is unhashed before cleaning the backlog
        s390/qeth: Fix vipa deletion
        octeontx-af: fix hardware timestamp configuration
        net: dsa: sja1105: always enable the send_meta options
        net: dsa: tag_sja1105: fix MAC DA patching from meta frames
        net: Replace strlcpy with strscpy
        pptp: Fix fib lookup calls.
        mlxsw: spectrum_router: Fix an IS_ERR() vs NULL check
        net/sched: act_pedit: Add size check for TCA_PEDIT_PARMS_EX
        xsk: Honor SO_BINDTODEVICE on bind
        ptp: Make max_phase_adjustment sysfs device attribute invisible when not supported
        ...
      68433066
    • Linus Torvalds's avatar
      Merge tag 'f2fs-for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs · 73a3fcda
      Linus Torvalds authored
      Pull f2fs updates from Jaegeuk Kim:
       "In this cycle, we've mainly investigated the zoned block device
        support along with patches such as correcting write pointers between
        f2fs and storage, adding asynchronous zone reset flow, and managing
        the number of open zones.
      
        Beyond that, f2fs adds a new mount option, "errors=x", to specify
        how to handle unexpected behavior detected at runtime.
      
        Enhancements:
         - support 'errors=remount-ro|continue|panic' mount option
         - enforce some inode flag policies
         - allow .tmp compression given extensions
         - add some ioctls to manage the f2fs compression
         - improve looped node chain flow
         - avoid issuing small-sized discard commands during checkpoint
         - implement an asynchronous zone reset
      
        Bug fixes:
         - fix deadlock in xattr and inode page lock
         - fix and add sanity check in some error paths
         - fix to avoid NULL pointer dereference in f2fs_write_end_io() along
           with put_super
         - set proper flags to quota files
         - fix potential deadlock due to unpaired node_write lock use
         - fix over-estimating free section during FG GC
         - fix the wrong condition to determine atomic context
      
        As usual, there are also a number of patches with code refactoring and
        minor clean-ups"
      
      * tag 'f2fs-for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (46 commits)
        f2fs: fix to do sanity check on direct node in truncate_dnode()
        f2fs: only set release for file that has compressed data
        f2fs: fix compile warning in f2fs_destroy_node_manager()
        f2fs: fix error path handling in truncate_dnode()
        f2fs: fix deadlock in i_xattr_sem and inode page lock
        f2fs: remove unneeded page uptodate check/set
        f2fs: update mtime and ctime in move file range method
        f2fs: compress tmp files given extension
        f2fs: refactor struct f2fs_attr macro
        f2fs: convert to use sbi directly
        f2fs: remove redundant assignment to variable err
        f2fs: do not issue small discard commands during checkpoint
        f2fs: check zone write pointer points to the end of zone
        f2fs: add f2fs_ioc_get_compress_blocks
        f2fs: cleanup MIN_INLINE_XATTR_SIZE
        f2fs: add helper to check compression level
        f2fs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method
        f2fs: do more sanity check on inode
        f2fs: compress: fix to check validity of i_compress_flag field
        f2fs: add sanity compress level check for compressed file
        ...
      73a3fcda
    • Linus Torvalds's avatar
      Merge tag 'xfs-6.5-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · bb8e7e9f
      Linus Torvalds authored
      Pull more xfs updates from Darrick Wong:
      
       - Fix some ordering problems with log items during log recovery
      
       - Don't deadlock the system by trying to flush busy freed extents while
         holding on to busy freed extents
      
       - Improve validation of log geometry parameters when reading the
         primary superblock
      
       - Validate the length field in the AGF header
      
       - Fix recordset filtering bugs when re-calling GETFSMAP to return more
         results when the resultset didn't previously fit in the caller's
         buffer
      
       - Fix integer overflows in GETFSMAP when working with rt volumes larger
         than 2^32 fsblocks
      
       - Fix GETFSMAP reporting the undefined space beyond the last rtextent
      
       - Fix filtering bugs in GETFSMAP's log device backend if the log ever
         becomes longer than 2^32 fsblocks
      
       - Improve validation of file offsets in the GETFSMAP range parameters
      
       - Fix an off-by-one bug in the pmem media failure notification
         computation
      
       - Validate the length field in the AGI header too
      
      * tag 'xfs-6.5-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: Remove unneeded semicolon
        xfs: AGI length should be bounds checked
        xfs: fix the calculation for "end" and "length"
        xfs: fix xfs_btree_query_range callers to initialize btree rec fully
        xfs: validate fsmap offsets specified in the query keys
        xfs: fix logdev fsmap query result filtering
        xfs: clean up the rtbitmap fsmap backend
        xfs: fix getfsmap reporting past the last rt extent
        xfs: fix integer overflows in the fsmap rtbitmap and logdev backends
        xfs: fix interval filtering in multi-step fsmap queries
        xfs: fix bounds check in xfs_defer_agfl_block()
        xfs: AGF length has never been bounds checked
        xfs: journal geometry is not properly bounds checked
        xfs: don't block in busy flushing when freeing extents
        xfs: allow extent free intents to be retried
        xfs: pass alloc flags through to xfs_extent_busy_flush()
        xfs: use deferred frees for btree block freeing
        xfs: don't reverse order of items in bulk AIL insertion
        xfs: remove redundant initializations of pointers drop_leaf and save_leaf
      bb8e7e9f
    • Linus Torvalds's avatar
      Merge tag 'pwm/for-6.5-rc1' of... · ace1ba1c
      Linus Torvalds authored
      Merge tag 'pwm/for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm
      
      Pull pwm updates from Thierry Reding:
       "There's a little bit of everything in here: we've got various
        improvements and cleanups to drivers, some fixes across the board and
        a bit of new hardware support"
      
      * tag 'pwm/for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (22 commits)
        dt-bindings: pwm: convert pwm-bcm2835 bindings to YAML
        pwm: Add Renesas RZ/G2L MTU3a PWM driver
        pwm: mtk_disp: Fix the disable flow of disp_pwm
        dt-bindings: pwm: restrict node name suffixes
        pwm: pca9685: Switch i2c driver back to use .probe()
        pwm: ab8500: Fix error code in probe()
        MAINTAINERS: add pwm to PolarFire SoC entry
        pwm: add microchip soft ip corePWM driver
        pwm: sysfs: Do not apply state to already disabled PWMs
        pwm: imx-tpm: force 'real_period' to be zero in suspend
        pwm: meson: make full use of common clock framework
        pwm: meson: don't use hdmi/video clock as mux parent
        pwm: meson: switch to using struct clk_parent_data for mux parents
        pwm: meson: remove not needed check in meson_pwm_calc
        pwm: meson: fix handling of period/duty if greater than UINT_MAX
        pwm: meson: modify and simplify calculation in meson_pwm_get_state
        dt-bindings: pwm: Add R-Car V3U device tree bindings
        dt-bindings: pwm: imx: add i.MX8QXP compatible
        pwm: mediatek: Add support for MT7981
        dt-bindings: pwm: mediatek: Add mediatek,mt7981 compatible
        ...
      ace1ba1c
    • Linus Torvalds's avatar
      Merge tag 'devicetree-for-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux · b9861581
      Linus Torvalds authored
      Pull more devicetree updates from Rob Herring:
      
       - Whitespace clean-ups in binding examples
      
       - Restrict node name suffixes to "-[0-9]+" for cases of multiple
         instances which don't have unit-addresses
      
       - Convert brcm,kona-wdt and cdns,wdt-r1p2 watchdog bindings to DT
         schema
      
      * tag 'devicetree-for-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
        dt-bindings: soc: qcom: stats: Update maintainer email
        dt-bindings: cleanup DTS example whitespaces
        dt-bindings: timestamp: restrict node name suffixes
        dt-bindings: slimbus: restrict node name suffixes
        dt-bindings: watchdog: restrict node name suffixes
        dt-bindings: watchdog: brcm,kona-wdt: convert txt file to yaml
        dt-bindings: watchdog: cdns,wdt-r1p2: Convert cadence watchdog to yaml
      b9861581
    • Aravindhan Gunasekaran's avatar
      igc: Handle PPS start time programming for past time values · 84a192e4
      Aravindhan Gunasekaran authored
      I225/6 hardware can be programmed to start PPS output once
      the time in the Target Time registers is reached. The time
      programmed in these registers must always be in the future;
      only then is PPS output triggered when the SYSTIM register
      reaches the programmed value. There are two modes in i225/6
      hardware to program PPS: pulse mode and clock mode.
      
      Issues were reported where PPS is not generated when the
      start time is in the past.
      
      Example 1, "echo 0 0 0 2 0 > /sys/class/ptp/ptp0/period"
      
      In the current implementation, a value of '0' is programmed
      into the Target Time registers and PPS output is in pulse mode.
      The interrupt that should fire when the SYSTIM register reaches
      the Target Time is therefore never triggered, and no PPS
      output is generated.
      
      Example 2, "echo 0 0 0 1 0 > /sys/class/ptp/ptp0/period"
      
      In this case, a value of '0' is programmed into the Target Time
      registers and PPS output is in clock mode. Here, the HW tries to
      catch up with the current time by incrementing the Target Time
      register. This catch-up time seems to vary with the programmed
      PPS period, as per the HW design. In my experiments, the delay
      ranged from a few tens of seconds to a few minutes. The PPS
      output is only generated after the Target Time register reaches
      the current time.
      
      In my experiments, I also observed that PPS stopped working with
      the following test and could not recover until the module was
      removed and loaded again.
      
      1) echo 0 <future time> 0 1 0 > /sys/class/ptp/ptp1/period
      2) echo 0 0 0 1 0 > /sys/class/ptp/ptp1/period
      3) echo 0 0 0 1 0 > /sys/class/ptp/ptp1/period
      
      After this, PPS did not work even when I re-programmed it with
      proper values. I could only get it working again by reloading
      the driver.
      
      This patch calculates an appropriate future time value and
      programs it into the Target Time registers.
      
      Fixes: 5e91c72e ("igc: Fix PPS delta between two synchronized end-points")
      Signed-off-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com>
      Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
      Tested-by: Naama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      84a192e4
    • Tan Tee Min's avatar
      igc: Include the length/type field and VLAN tag in queueMaxSDU · 25102893
      Tan Tee Min authored
      IEEE 802.1Q does not clearly define what constitutes an
      SDU (Service Data Unit), but IEEE Std 802.3 clause 3.1.2 does define
      the MAC service primitives, and clause 3.2.7 defines the MAC Client
      Data for Q-tagged frames.
      
      These clauses show that the mac_service_data_unit (MSDU) does NOT contain the
      preamble, destination and source address, or FCS. The MSDU does contain
      the length/type field, MAC client data, VLAN tag and any padding
      data (prior to the FCS).
      
      Thus, the maximum 802.3 frame size that is allowed to be transmitted
      should be QueueMaxSDU (MSDU) + 16 (6-byte SA + 6-byte DA + 4-byte FCS).
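      The arithmetic can be checked with a small sketch that counts the
      length/type field and the VLAN tag as part of the MSDU, as described
      above. `msdu_len()` and `max_frame_len()` are hypothetical helpers, not
      igc code, and padding is ignored for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Frame bytes outside the MSDU (preamble/SFD not counted):
 * 6-byte SA + 6-byte DA + 4-byte FCS. */
#define NON_MSDU_OVERHEAD (6 + 6 + 4)

/* MSDU length for a given MAC client data length: the 2-byte length/type
 * field plus, for Q-tagged frames, the 4-byte VLAN tag. */
static uint32_t msdu_len(uint32_t client_data, bool vlan_tagged)
{
	return client_data + 2 + (vlan_tagged ? 4 : 0);
}

/* Largest 802.3 frame (DA through FCS) that fits a queueMaxSDU limit. */
static uint32_t max_frame_len(uint32_t queue_max_sdu)
{
	return queue_max_sdu + NON_MSDU_OVERHEAD;
}
```

      For a 1500-byte payload this gives 1518 bytes untagged and 1522 bytes
      with a VLAN tag, the familiar maximum 802.3 frame sizes.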
      
      Fixes: 92a0dcb8 ("igc: offload queue max SDU from tc-taprio")
      Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com>
      Reviewed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
      Tested-by: Naama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      25102893
    • Prasad Koya's avatar
      igc: set TP bit in 'supported' and 'advertising' fields of ethtool_link_ksettings · 9ac3fc2f
      Prasad Koya authored
      Set the TP bit in the 'supported' and 'advertising' fields, since
      i225/226 parts only support twisted-pair copper.
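      In the kernel this is typically done with
      ethtool_link_ksettings_add_link_mode(cmd, supported, TP) and the same
      call for advertising. The user-space sketch below only illustrates the
      bit manipulation: `struct link_modes` and `link_modes_add_tp()` are
      hypothetical, the kernel uses wider bitmaps, and LINK_MODE_TP_BIT mirrors
      the value of ETHTOOL_LINK_MODE_TP_BIT (7) in <linux/ethtool.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors ETHTOOL_LINK_MODE_TP_BIT from <linux/ethtool.h>. */
#define LINK_MODE_TP_BIT 7

/* Hypothetical, simplified stand-in for the link-mode bitmaps. */
struct link_modes {
	uint64_t supported;
	uint64_t advertising;
};

/* Report twisted-pair as both a supported and an advertised port type. */
static void link_modes_add_tp(struct link_modes *m)
{
	m->supported |= UINT64_C(1) << LINK_MODE_TP_BIT;
	m->advertising |= UINT64_C(1) << LINK_MODE_TP_BIT;
}
```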
      
      Fixes: 8c5ad0da ("igc: Add ethtool support")
      Signed-off-by: Prasad Koya <prasad@arista.com>
      Acked-by: Sasha Neftin <sasha.neftin@intel.com>
      Tested-by: Naama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      9ac3fc2f
    • Yinjun Zhang's avatar
      nfp: clean mc addresses in application firmware when closing port · cc7eab25
      Yinjun Zhang authored
      When moving devices from one namespace to another, mc addresses are
      cleaned in software but not removed from the application firmware.
      The mc addresses therefore remain in the firmware and cause a resource
      leak.
      
      Use `__dev_mc_unsync()` to clean the mc addresses when closing the port.
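      The idea is that the close path must tell the firmware to forget every
      address it was given before the host-side list is dropped. The sketch
      below models that in user space. `struct fw_mc_table`, `fw_mc_del()`
      and `port_close_unsync()` are hypothetical names; the loop only plays
      the role that __dev_mc_unsync() plays with its unsync callback and is
      not nfp code.

```c
#include <assert.h>
#include <string.h>

#define MC_SLOTS 8
#define MAC_LEN 6

/* Hypothetical model of the multicast filter held in application firmware. */
struct fw_mc_table {
	unsigned char addr[MC_SLOTS][MAC_LEN];
	int used[MC_SLOTS];
};

/* Hypothetical "unsync" callback: remove one address from the firmware. */
static int fw_mc_del(struct fw_mc_table *fw, const unsigned char *addr)
{
	int i;

	for (i = 0; i < MC_SLOTS; i++) {
		if (fw->used[i] && memcmp(fw->addr[i], addr, MAC_LEN) == 0) {
			fw->used[i] = 0;
			return 0;
		}
	}
	return -1; /* address was not programmed */
}

/* Close path: call the unsync callback for every address previously
 * synced to the device, then empty the host-side list. */
static void port_close_unsync(struct fw_mc_table *fw,
			      unsigned char host[][MAC_LEN], int *n_host)
{
	int i;

	for (i = 0; i < *n_host; i++)
		fw_mc_del(fw, host[i]);
	*n_host = 0;
}
```

      Without the unsync step, only `*n_host` would be reset and the firmware
      slots would stay marked as used, which is the leak described above.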
      
      Fixes: e20aa071 ("nfp: fix schedule in atomic context when sync mc address")
      Cc: stable@vger.kernel.org
      Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
      Acked-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: Louis Peens <louis.peens@corigine.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Message-ID: <20230705052818.7122-1-louis.peens@corigine.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      cc7eab25
    • Jakub Kicinski's avatar
      Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · fdaff05b
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2023-07-05
      
      We've added 2 non-merge commits during the last 1 day(s) which contain
      a total of 3 files changed, 16 insertions(+), 4 deletions(-).
      
      The main changes are:
      
      1) Fix BTF to warn but not return an error for a NULL BTF, so that modules
         can still be loaded under CONFIG_DEBUG_INFO_BTF, from SeongJae Park.
      
      2) Fix xsk sockets to honor SO_BINDTODEVICE in bind(), from Ilya Maximets.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        xsk: Honor SO_BINDTODEVICE on bind
        bpf, btf: Warn but return no error for NULL btf from __register_btf_kfunc_id_set()
      ====================
      
      Link: https://lore.kernel.org/r/20230705171716.6494-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      fdaff05b
    • Dragos Tatulea's avatar
      net/mlx5e: RX, Fix page_pool page fragment tracking for XDP · 7abd955a
      Dragos Tatulea authored
      Currently mlx5e releases pages directly to the page_pool for XDP_TX and
      does page fragment counting for XDP_REDIRECT. RX pages from the
      page_pool are leaking on XDP_REDIRECT because the xdp core releases
      only one fragment out of MLX5E_PAGECNT_BIAS_MAX and the page is then
      marked as "skip release", which prevents the driver from releasing it.
      
      One fix would be to take an extra fragment for XDP_REDIRECT and not set the
      "skip release" bit so that the release on the driver side can handle the
      remaining bias fragments. But this would be a shortsighted solution.
      Instead, this patch converges the two XDP paths (XDP_TX and XDP_REDIRECT) to
      always do fragment tracking. The "skip release" bit is no longer
      necessary for XDP.
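      The converged accounting can be modelled as a single reference count:
      the driver pre-charges the page with the full bias, each XDP consumer
      drops the one fragment it was handed, and the driver drops the bias
      fragments it never handed out. This is a simplified user-space sketch;
      `struct frag_page`, its helpers and the bias value of 64 are
      hypothetical and do not reproduce the mlx5e or page_pool code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bias; stands in for MLX5E_PAGECNT_BIAS_MAX. */
#define PAGECNT_BIAS_MAX 64

struct frag_page {
	int pp_frag_count; /* fragment references still held on the page */
	int frags;         /* fragments handed out by the driver */
};

/* Driver takes the page and pre-charges the full bias. */
static void frag_page_init(struct frag_page *fp)
{
	fp->pp_frag_count = PAGECNT_BIAS_MAX;
	fp->frags = 0;
}

/* XDP_TX or XDP_REDIRECT: both paths now track the handed-out fragment. */
static void frag_page_hand_out(struct frag_page *fp)
{
	fp->frags++;
}

/* Consumer (e.g. the xdp core after a redirect) drops its one fragment.
 * Returns true if this freed the page. */
static bool frag_page_consumer_put(struct frag_page *fp)
{
	return --fp->pp_frag_count == 0;
}

/* Driver release: drop the bias fragments nobody was given.
 * Returns true if this freed the page. */
static bool frag_page_driver_release(struct frag_page *fp)
{
	fp->pp_frag_count -= PAGECNT_BIAS_MAX - fp->frags;
	return fp->pp_frag_count == 0;
}
```

      Whichever side drops its references last sees the count reach zero,
      so no "skip release" marking is needed for the page to be returned.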
      
      Fixes: 6f574284 ("net/mlx5e: RX, Enable skb page recycling through the page_pool")
      Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      7abd955a
    • Maher Sanalla's avatar
      net/mlx5: Query hca_cap_2 only when supported · 6496357a
      Maher Sanalla authored
      On vport enable, where the fw's HCA caps are queried, the driver queries
      hca_caps_2 without checking whether the fw actually supports them. This
      causes a false failure of the VFs' vport load and blocks SR-IOV
      enablement on old devices such as CX4, where hca_caps_2 support is missing.
      
      Thus, add a check for hca_caps_2 support before accessing them.
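      The shape of the fix is a capability guard in front of the query. The
      sketch below is a user-space model under that assumption:
      `struct dev_caps`, `query_hca_cap_2()` and `vport_enable()` are
      hypothetical names, not mlx5 functions, and the failing query stands in
      for what old firmware would return.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical view of what the firmware advertises. */
struct dev_caps {
	bool hca_cap_2_supported; /* advertised in the general caps */
	int hca_cap_2_queries;    /* how many times the query was issued */
};

/* Hypothetical query; fails on firmware without hca_cap_2 (e.g. CX4). */
static int query_hca_cap_2(struct dev_caps *caps)
{
	caps->hca_cap_2_queries++;
	return caps->hca_cap_2_supported ? 0 : -1;
}

/* Vport enable: only query hca_cap_2 when the firmware advertises it, so
 * missing support is not reported as a vport load failure. */
static int vport_enable(struct dev_caps *caps)
{
	if (!caps->hca_cap_2_supported)
		return 0; /* nothing to query */
	return query_hca_cap_2(caps);
}
```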
      
      Fixes: e5b9642a ("net/mlx5: E-Switch, Implement devlink port function cmds to control migratable")
      Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
      Reviewed-by: Shay Drory <shayd@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      6496357a
    • Yevgeny Kliteynik's avatar
      net/mlx5e: TC, CT: Offload ct clear only once · f7a48511
      Yevgeny Kliteynik authored
      A non-clear CT action causes a flow rule split, while a CT clear action
      does not; it is just a header rewrite on the current flow rule.
      But CT offload is done in post_parse and is per CT action instance,
      so a CT clear offload is parsed multiple times, while it is deleted only once.
      
      Fix this by post-parsing the CT action only once per flow attribute
      (which is per flow rule), using an offloaded flag in ct_attr.
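      The guard amounts to a once-only flag on the per-rule attribute, which
      keeps a single offload balanced against the single delete. A minimal
      sketch, assuming a simplified attribute; `struct ct_attr_model` and
      `ct_post_parse()` are hypothetical and not the mlx5e structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-flow-attribute CT state. */
struct ct_attr_model {
	bool offloaded;    /* set once the CT action has been post-parsed */
	int offload_count; /* number of offloads actually performed */
};

/* Called once per CT action instance. Several "ct clear" actions can share
 * one flow attribute, so only the first call offloads. */
static void ct_post_parse(struct ct_attr_model *attr)
{
	if (attr->offloaded)
		return;
	attr->offload_count++;
	attr->offloaded = true;
}
```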
      
      Fixes: 08fe94ec ("net/mlx5e: TC, Remove special handling of CT action")
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      f7a48511
    • Vlad Buslov's avatar
      net/mlx5e: Check for NOT_READY flag state after locking · 65e64640
      Vlad Buslov authored
      Currently the check for the NOT_READY flag is performed before obtaining the
      necessary lock. This opens the possibility of a race condition when the flow
      is concurrently removed from unready_flows list by the workqueue task,
      which causes a double-removal from the list and a crash[0]. Fix the issue
      by moving the flag check inside the section protected by
      uplink_priv->unready_flows_lock mutex.
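      The fixed ordering is the usual check-under-lock pattern. Below is a
      minimal user-space sketch with POSIX threads. `struct flow`,
      `struct unready_list` and `flow_delete_unready()` are hypothetical
      names, the mutex stands in for uplink_priv->unready_flows_lock, and the
      sketch assumes the workqueue side also clears the flag while holding
      the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flow with membership in a singly linked "unready" list. */
struct flow {
	bool not_ready;    /* stands in for the NOT_READY flag */
	struct flow *next;
};

struct unready_list {
	pthread_mutex_t lock; /* stands in for unready_flows_lock */
	struct flow *head;
};

/* Unlink f from the list; caller holds list->lock. */
static void unready_list_del(struct unready_list *list, struct flow *f)
{
	struct flow **pp;

	for (pp = &list->head; *pp; pp = &(*pp)->next) {
		if (*pp == f) {
			*pp = f->next;
			f->next = NULL;
			return;
		}
	}
}

/* Test and clear the flag only while holding the lock, so a worker that
 * already removed the flow (and cleared the flag) cannot cause a second
 * removal. */
static void flow_delete_unready(struct unready_list *list, struct flow *f)
{
	pthread_mutex_lock(&list->lock);
	if (f->not_ready) {
		unready_list_del(list, f);
		f->not_ready = false;
	}
	pthread_mutex_unlock(&list->lock);
}
```

      Checking the flag before taking the lock, as the old code did, leaves a
      window in which both paths see it set and both unlink the same entry.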
      
      [0]:
      [44376.389654] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] SMP
      [44376.391665] CPU: 7 PID: 59123 Comm: tc Not tainted 6.4.0-rc4+ #1
      [44376.392984] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [44376.395342] RIP: 0010:mlx5e_tc_del_fdb_flow+0xb3/0x340 [mlx5_core]
      [44376.396857] Code: 00 48 8b b8 68 ce 02 00 e8 8a 4d 02 00 4c 8d a8 a8 01 00 00 4c 89 ef e8 8b 79 88 e1 48 8b 83 98 06 00 00 48 8b 93 90 06 00 00 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 90 06
      [44376.399167] RSP: 0018:ffff88812cc97570 EFLAGS: 00010246
      [44376.399680] RAX: dead000000000122 RBX: ffff8881088e3800 RCX: ffff8881881bac00
      [44376.400337] RDX: dead000000000100 RSI: ffff88812cc97500 RDI: ffff8881242f71b0
      [44376.401001] RBP: ffff88811cbb0940 R08: 0000000000000400 R09: 0000000000000001
      [44376.401663] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88812c944000
      [44376.402342] R13: ffff8881242f71a8 R14: ffff8881222b4000 R15: 0000000000000000
      [44376.402999] FS:  00007f0451104800(0000) GS:ffff88852cb80000(0000) knlGS:0000000000000000
      [44376.403787] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [44376.404343] CR2: 0000000000489108 CR3: 0000000123a79003 CR4: 0000000000370ea0
      [44376.405004] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [44376.405665] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [44376.406339] Call Trace:
      [44376.406651]  <TASK>
      [44376.406939]  ? die_addr+0x33/0x90
      [44376.407311]  ? exc_general_protection+0x192/0x390
      [44376.407795]  ? asm_exc_general_protection+0x22/0x30
      [44376.408292]  ? mlx5e_tc_del_fdb_flow+0xb3/0x340 [mlx5_core]
      [44376.408876]  __mlx5e_tc_del_fdb_peer_flow+0xbc/0xe0 [mlx5_core]
      [44376.409482]  mlx5e_tc_del_flow+0x42/0x210 [mlx5_core]
      [44376.410055]  mlx5e_flow_put+0x25/0x50 [mlx5_core]
      [44376.410529]  mlx5e_delete_flower+0x24b/0x350 [mlx5_core]
      [44376.411043]  tc_setup_cb_reoffload+0x22/0x80
      [44376.411462]  fl_reoffload+0x261/0x2f0 [cls_flower]
      [44376.411907]  ? mlx5e_rep_indr_setup_ft_cb+0x160/0x160 [mlx5_core]
      [44376.412481]  ? mlx5e_rep_indr_setup_ft_cb+0x160/0x160 [mlx5_core]
      [44376.413044]  tcf_block_playback_offloads+0x76/0x170
      [44376.413497]  tcf_block_unbind+0x7b/0xd0
      [44376.413881]  tcf_block_setup+0x17d/0x1c0
      [44376.414269]  tcf_block_offload_cmd.isra.0+0xf1/0x130
      [44376.414725]  tcf_block_offload_unbind+0x43/0x70
      [44376.415153]  __tcf_block_put+0x82/0x150
      [44376.415532]  ingress_destroy+0x22/0x30 [sch_ingress]
      [44376.415986]  qdisc_destroy+0x3b/0xd0
      [44376.416343]  qdisc_graft+0x4d0/0x620
      [44376.416706]  tc_get_qdisc+0x1c9/0x3b0
      [44376.417074]  rtnetlink_rcv_msg+0x29c/0x390
      [44376.419978]  ? rep_movs_alternative+0x3a/0xa0
      [44376.420399]  ? rtnl_calcit.isra.0+0x120/0x120
      [44376.420813]  netlink_rcv_skb+0x54/0x100
      [44376.421192]  netlink_unicast+0x1f6/0x2c0
      [44376.421573]  netlink_sendmsg+0x232/0x4a0
      [44376.421980]  sock_sendmsg+0x38/0x60
      [44376.422328]  ____sys_sendmsg+0x1d0/0x1e0
      [44376.422709]  ? copy_msghdr_from_user+0x6d/0xa0
      [44376.423127]  ___sys_sendmsg+0x80/0xc0
      [44376.423495]  ? ___sys_recvmsg+0x8b/0xc0
      [44376.423869]  __sys_sendmsg+0x51/0x90
      [44376.424226]  do_syscall_64+0x3d/0x90
      [44376.424587]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
      [44376.425046] RIP: 0033:0x7f045134f887
      [44376.425403] Code: 0a 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
      [44376.426914] RSP: 002b:00007ffd63a82b98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [44376.427592] RAX: ffffffffffffffda RBX: 000000006481955f RCX: 00007f045134f887
      [44376.428195] RDX: 0000000000000000 RSI: 00007ffd63a82c00 RDI: 0000000000000003
      [44376.428796] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
      [44376.429404] R10: 00007f0451208708 R11: 0000000000000246 R12: 0000000000000001
      [44376.430039] R13: 0000000000409980 R14: 000000000047e538 R15: 0000000000485400
      [44376.430644]  </TASK>
      [44376.430907] Modules linked in: mlx5_ib mlx5_core act_mirred act_tunnel_key cls_flower vxlan dummy sch_ingress openvswitch nsh rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay zram zsmalloc fuse [last unloaded: mlx5_core]
      [44376.433936] ---[ end trace 0000000000000000 ]---
      [44376.434373] RIP: 0010:mlx5e_tc_del_fdb_flow+0xb3/0x340 [mlx5_core]
      [44376.434951] Code: 00 48 8b b8 68 ce 02 00 e8 8a 4d 02 00 4c 8d a8 a8 01 00 00 4c 89 ef e8 8b 79 88 e1 48 8b 83 98 06 00 00 48 8b 93 90 06 00 00 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 90 06
      [44376.436452] RSP: 0018:ffff88812cc97570 EFLAGS: 00010246
      [44376.436924] RAX: dead000000000122 RBX: ffff8881088e3800 RCX: ffff8881881bac00
      [44376.437530] RDX: dead000000000100 RSI: ffff88812cc97500 RDI: ffff8881242f71b0
      [44376.438179] RBP: ffff88811cbb0940 R08: 0000000000000400 R09: 0000000000000001
      [44376.438786] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88812c944000
      [44376.439393] R13: ffff8881242f71a8 R14: ffff8881222b4000 R15: 0000000000000000
      [44376.439998] FS:  00007f0451104800(0000) GS:ffff88852cb80000(0000) knlGS:0000000000000000
      [44376.440714] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [44376.441225] CR2: 0000000000489108 CR3: 0000000123a79003 CR4: 0000000000370ea0
      [44376.441843] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [44376.442471] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
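
The register dump is worth reading closely. RAX (0xdead000000000122) and RDX (0xdead000000000100) match the x86_64 values of the kernel's LIST_POISON2 and LIST_POISON1, which list_del() stores into an entry's prev/next pointers after unlinking it. The faulting bytes <48> 89 42 08 decode to mov %rax,0x8(%rdx), i.e. a next->prev store, which is consistent with list_del() being run a second time on a flow entry that had already been removed. The userspace model below is a sketch of that failure mode only, assuming a 64-bit host; the list helpers are re-implemented for illustration and this is not the fix made by this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace copies of the x86_64 kernel poison values (64-bit host assumed):
 * POISON_POINTER_DELTA plus 0x100 and 0x122. */
#define POISON_POINTER_DELTA 0xdead000000000000ULL
#define LIST_POISON1 ((struct list_head *)(uintptr_t)(POISON_POINTER_DELTA + 0x100))
#define LIST_POISON2 ((struct list_head *)(uintptr_t)(POISON_POINTER_DELTA + 0x122))

struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert @e right after @head. */
static void list_add(struct list_head *e, struct list_head *head)
{
	e->next = head->next;
	e->prev = head;
	head->next->prev = e;
	head->next = e;
}

/* Unlink @e from its neighbours without touching @e's own pointers. */
static void list_unlink(struct list_head *e)
{
	e->next->prev = e->prev;	/* the next->prev store that appears to fault above */
	e->prev->next = e->next;
}

/* Models the kernel's list_del(): unlink, then poison. A second call on the
 * same entry would write through LIST_POISON1, so this model never does it. */
static void list_del(struct list_head *e)
{
	list_unlink(e);
	e->next = LIST_POISON1;
	e->prev = LIST_POISON2;
}

/* Models list_del_init(): the entry is left self-linked, so removing it
 * again is a harmless no-op. */
static void list_del_init(struct list_head *e)
{
	list_unlink(e);
	list_init(e);
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* True if @e still carries list_del() poison, i.e. it was already removed. */
static bool list_is_poisoned(const struct list_head *e)
{
	return e->next == LIST_POISON1 && e->prev == LIST_POISON2;
}
```

After list_del() the entry's pointers hold exactly the RDX/RAX values seen in the oops, while list_del_init() makes repeated removal safe.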
      
      Fixes: ad86755b ("net/mlx5e: Protect unready flows with dedicated lock")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • Saeed Mahameed
      net/mlx5: Register a unique thermal zone per device · 631079e0
      Saeed Mahameed authored
      Prior to this patch, only one "mlx5" thermal zone could be registered,
      regardless of the number of individual mlx5 devices in the system.
      
      To fix this, set up a unique name per device so that each device
      registers its own thermal zone.
      
      To avoid registering a thermal zone for a virtual device (VF/SF), add
      a check for the PF device type.
      
      The new name is the concatenation of "mlx5_" and "<PCI_DEV_BDF>", which
      also helps associate a thermal zone with its PCI device.
      
      $ lspci | grep ConnectX
      00:04.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
      00:05.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
      
      $ cat /sys/devices/virtual/thermal/thermal_zone0/type
      mlx5_0000:00:04.0
      $ cat /sys/devices/virtual/thermal/thermal_zone1/type
      mlx5_0000:00:05.0
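
To illustrate the naming scheme, here is a minimal sketch that builds the zone type from a PCI domain/bus/device/function and refuses to name non-PF functions. struct pci_fn and mlx5_tz_name() are hypothetical; in the kernel the BDF string is what pci_name() returns. The 20-byte buffer is an assumption matching THERMAL_NAME_LENGTH, and "mlx5_0000:00:04.0" (17 characters plus the terminator) fits within it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumed to match THERMAL_NAME_LENGTH (20) in the thermal UAPI header. */
#define TZ_NAME_LEN 20

/* Hypothetical description of one PCI function. */
struct pci_fn {
	unsigned int domain, bus, dev, fn;
	bool is_pf;	/* false for VFs / SFs */
};

/* Writes "mlx5_<domain>:<bus>:<dev>.<fn>" into @out.
 * Returns 0 on success, or -1 if @f is not a PF (no zone should be
 * registered) or the name would be truncated. */
static int mlx5_tz_name(const struct pci_fn *f, char *out, size_t len)
{
	int n;

	if (!f->is_pf)
		return -1;
	n = snprintf(out, len, "mlx5_%04x:%02x:%02x.%x",
		     f->domain, f->bus, f->dev, f->fn);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Returning an error for non-PF functions keeps the VF/SF check and the name construction in one place, so a caller registers a zone only when a name was produced.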
      
      Fixes: c1fef618 ("net/mlx5: Implement thermal zone")
      CC: Sandipan Patra <spatra@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • Dragos Tatulea
      net/mlx5e: RX, Fix flush and close release flow of regular rq for legacy rq · 2e2d1965
      Dragos Tatulea authored
      Regular (non-XSK) RQs get flushed on XSK setup and re-activated on XSK
      close. If the same regular RQ is closed (on a config change, for
      example) soon after the XSK close, a double release occurs because the
      missing WQEs are released a second time.
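
A minimal userspace model of this failure mode, assuming a hypothetical per-slot ownership flag. struct model_rq and the model_rq_* helpers are invented for illustration; they are not mlx5 code and not the actual change made by this commit. As long as every release path checks and clears the flag, WQEs that were left missing after re-activation are released only once when the RQ is later closed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_RQ_SIZE 8	/* hypothetical ring size */

/* Hypothetical legacy RQ: each WQE slot either owns a page or has already
 * given it back. */
struct model_rq {
	bool owns_page[MODEL_RQ_SIZE];
	unsigned int releases[MODEL_RQ_SIZE];	/* pages released per slot */
};

/* Post a fresh page to slot @i. */
static void model_rq_post(struct model_rq *rq, size_t i)
{
	rq->owns_page[i] = true;
}

static void model_rq_init(struct model_rq *rq)
{
	for (size_t i = 0; i < MODEL_RQ_SIZE; i++) {
		rq->releases[i] = 0;
		model_rq_post(rq, i);
	}
}

/* Release slot @i only if it still owns a page; clearing the flag is what
 * stops a later flush or close from releasing the same page again. */
static void model_rq_release(struct model_rq *rq, size_t i)
{
	if (!rq->owns_page[i])
		return;
	rq->owns_page[i] = false;
	rq->releases[i]++;
}

/* Both the flush (XSK setup) and the close walk every slot. */
static void model_rq_release_all(struct model_rq *rq)
{
	for (size_t i = 0; i < MODEL_RQ_SIZE; i++)
		model_rq_release(rq, i);
}
```

In a flush, partial repost, close sequence, reposted slots are released once per posted page and the missing slots are not released again at close.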
      
      Fixes: 3f93f829 ("net/mlx5e: RX, Defer page release in legacy rq for better recycling")
      Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>