1. 16 Aug, 2024 15 commits
    • net: dsa: provide a software untagging function on RX for VLAN-aware bridges · 93e4649e
      Vladimir Oltean authored
      Through code analysis, I realized that the ds->untag_bridge_pvid logic
      is contradictory - see the newly added FIXME above the kernel-doc for
      dsa_software_untag_vlan_unaware_bridge().
      
      Moreover, for the Felix driver, I need something very similar, but which
      is actually _not_ contradictory: untag the bridge PVID on RX, but for
      VLAN-aware bridges. The existing logic does it for VLAN-unaware bridges.
      
      Since I don't want to change the functionality of drivers which were
      supposedly properly tested with the ds->untag_bridge_pvid flag, I have
      introduced a new one: ds->untag_vlan_aware_bridge_pvid, and I have
      refactored the DSA reception code into a common path for both flags.
      
      TODO: both flags should be unified under a single ds->software_vlan_untag,
      which users of both current flags should set. This is not something that
      can be carried out right away. It needs very careful examination of all
      drivers which make use of this functionality, since some of them
      actually get this wrong in the first place.
      
      For example, commit 9130c2d3 ("net: dsa: microchip: ksz8795: Use
      software untagging on CPU port") uses this in a driver which has
      ds->configure_vlan_while_not_filtering = true. The latter mechanism has
      been known for many years to be broken by design:
      https://lore.kernel.org/netdev/CABumfLzJmXDN_W-8Z=p9KyKUVi_HhS7o_poBkeKHS2BkAiyYpw@mail.gmail.com/
      and we have the situation of 2 bugs canceling each other. There is no
      private VLAN, and the port follows the PVID of the VLAN-unaware bridge.
      So, it's kinda ok for that driver to use the ds->untag_bridge_pvid
      mechanism, in a broken way.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: serialize access to the injection/extraction groups · c5e12ac3
      Vladimir Oltean authored
      As explained by Horatiu Vultur in commit 603ead96 ("net: sparx5: Add
      spinlock for frame transmission from CPU") which is for a similar
      hardware design, multiple CPUs can simultaneously perform injection
      or extraction. There are only 2 register groups for injection and 2
      for extraction, and the driver only uses one of each. So we'd better
      serialize access using spin locks, otherwise frame corruption is
      possible.
      
      Note that unlike in sparx5, FDMA in ocelot does not have this issue
      because struct ocelot_fdma_tx_ring already contains an xmit_lock.
      
      I guess this is mostly a problem for NXP LS1028A, as that is dual core.
      I don't think VSC7514 is. So I'm blaming the commit where LS1028A (aka
      the felix DSA driver) started using register-based packet injection and
      extraction.
      
      Fixes: 0a6f17c6 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: fix QoS class for injected packets with "ocelot-8021q" · e1b9e802
      Vladimir Oltean authored
      There are 2 distinct code paths (listed below) in the source code which
      set up an injection header for Ocelot(-like) switches. Code path (2)
      lacks the QoS class and source port being set correctly. Especially the
      improper QoS classification is a problem for the "ocelot-8021q"
      alternative DSA tagging protocol, because we support tc-taprio and each
      packet needs to be scheduled precisely through its time slot. This
      includes PTP, which is normally assigned to a traffic class other than
      0, but would be sent through TC 0 nonetheless.
      
      The code paths are:
      
      (1) ocelot_xmit_common() from net/dsa/tag_ocelot.c - called only by the
          standard "ocelot" DSA tagging protocol which uses NPI-based
          injection - sets up bit fields in the tag manually to account for
          a small difference (destination port offset) between Ocelot and
          Seville. Namely, ocelot_ifh_set_dest() is omitted from
          ocelot_xmit_common(), because there's also seville_ifh_set_dest().
      
      (2) ocelot_ifh_set_basic(), called by:
          - ocelot_fdma_prepare_skb() for FDMA transmission of the ocelot
            switchdev driver
          - ocelot_port_xmit() -> ocelot_port_inject_frame() for
            register-based transmission of the ocelot switchdev driver
          - felix_port_deferred_xmit() -> ocelot_port_inject_frame() for the
            DSA tagger ocelot-8021q when it must transmit PTP frames (also
            through register-based injection).
          sets the bit fields according to its own logic.
      
      The problem is that (2) doesn't call ocelot_ifh_set_qos_class().
      Copying that logic from ocelot_xmit_common() fixes that.
      
      Unfortunately, although desirable, it is not easily possible to
      de-duplicate code paths (1) and (2), and make net/dsa/tag_ocelot.c
      directly call ocelot_ifh_set_basic(), because of the ocelot/seville
      difference. This is the "minimal" fix with some logic duplicated (but
      at least more consolidated).
      
      Fixes: 0a6f17c6 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: use ocelot_xmit_get_vlan_info() also for FDMA and register injection · 67c3ca2c
      Vladimir Oltean authored
      Problem description
      -------------------
      
      On an NXP LS1028A (felix DSA driver) with the following configuration:
      
      - ocelot-8021q tagging protocol
      - VLAN-aware bridge (with STP) spanning at least swp0 and swp1
      - 8021q VLAN upper interfaces on swp0 and swp1: swp0.700, swp1.700
      - ptp4l on swp0.700 and swp1.700
      
      we see that the ptp4l instances do not see each other's traffic,
      and they all go to the grand master state due to the
      ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES condition.
      
      Jumping to the conclusion for the impatient
      -------------------------------------------
      
      There is a zero-day bug in the ocelot switchdev driver in the way it
      handles VLAN-tagged packet injection. The correct logic already exists in
      the source code, in function ocelot_xmit_get_vlan_info() added by commit
      5ca721c5 ("net: dsa: tag_ocelot: set the classified VLAN during xmit").
      But it is used only for normal NPI-based injection with the DSA "ocelot"
      tagging protocol. The other injection code paths (register-based and
      FDMA-based) roll their own wrong logic. This affects and was noticed on
      the DSA "ocelot-8021q" protocol because it uses register-based injection.
      
      By moving ocelot_xmit_get_vlan_info() to a place that's common for both
      the DSA tagger and the ocelot switch library, it can also be called from
      ocelot_port_inject_frame() in ocelot.c.
      
      We need to touch the lines with ocelot_ifh_port_set()'s prototype
      anyway, so let's rename it to something clearer regarding what it does,
      and add a kernel-doc. ocelot_ifh_set_basic() should do.
      
      Investigation notes
      -------------------
      
      Debugging reveals that PTP event frames (those carrying timestamps, like
      Sync) injected into swp0.700 (but also swp1.700) hit the wire
      with two VLAN tags:
      
      00000000: 01 1b 19 00 00 00 00 01 02 03 04 05 81 00 02 bc
                                                    ~~~~~~~~~~~
      00000010: 81 00 02 bc 88 f7 00 12 00 2c 00 00 02 00 00 00
                ~~~~~~~~~~~
      00000020: 00 00 00 00 00 00 00 00 00 00 00 01 02 ff fe 03
      00000030: 04 05 00 01 00 04 00 00 00 00 00 00 00 00 00 00
      00000040: 00 00
      
      The second (unexpected) VLAN tag makes felix_check_xtr_pkt() ->
      ptp_classify_raw() fail to see these as PTP packets at the link
      partner's receiving end, and return PTP_CLASS_NONE (because the BPF
      classifier is not written to expect 2 VLAN tags).
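
      To make the hexdump above easier to read, here is a minimal sketch that
      walks the Ethernet header and counts stacked VLAN tags. It is not kernel
      code: count_vlan_tags(), read_be16() and the TOY_* constants are
      hypothetical names used only for illustration, and the sample array is
      just the first 32 bytes of the dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ETH_ALEN     6
#define TOY_VLAN_HLEN    4
#define TOY_ETH_P_8021Q  0x8100
#define TOY_ETH_P_8021AD 0x88A8

/* First 32 bytes of the frame shown in the hexdump above. */
static const uint8_t sample_frame[] = {
	0x01, 0x1b, 0x19, 0x00, 0x00, 0x00, 0x00, 0x01,
	0x02, 0x03, 0x04, 0x05, 0x81, 0x00, 0x02, 0xbc,
	0x81, 0x00, 0x02, 0xbc, 0x88, 0xf7, 0x00, 0x12,
	0x00, 0x2c, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00,
};

/* Read a big-endian 16-bit value from a byte buffer. */
static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Hypothetical helper: count the VLAN tags that follow the two MAC
 * addresses and store the first non-VLAN EtherType in *ethertype.
 * Returns -1 if the buffer ends before an EtherType is found.
 */
static int count_vlan_tags(const uint8_t *frame, size_t len,
			   uint16_t *ethertype)
{
	size_t off = 2 * TOY_ETH_ALEN;
	int tags = 0;

	assert(ethertype != NULL);

	while (off + 2 <= len) {
		uint16_t proto = read_be16(frame + off);

		if (proto != TOY_ETH_P_8021Q && proto != TOY_ETH_P_8021AD) {
			*ethertype = proto;
			return tags;
		}
		tags++;
		off += TOY_VLAN_HLEN;
	}

	return -1;
}
```

      On the sample bytes this finds two 802.1Q tags, each with TCI 0x02bc
      (VID 700), before the PTP EtherType 0x88f7. A classifier that skips at
      most one tag would stop at the second 0x8100 instead.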
      
      The reason why packets have 2 VLAN tags is because the transmission
      code treats VLAN incorrectly.
      
      Neither ocelot switchdev, nor felix DSA, declare the NETIF_F_HW_VLAN_CTAG_TX
      feature. Therefore, at xmit time, all VLANs should be in the skb head,
      and none should be in the hwaccel area. This is done by:
      
      static struct sk_buff *validate_xmit_vlan(struct sk_buff *skb,
      					  netdev_features_t features)
      {
      	if (skb_vlan_tag_present(skb) &&
      	    !vlan_hw_offload_capable(features, skb->vlan_proto))
      		skb = __vlan_hwaccel_push_inside(skb);
      	return skb;
      }
      
      But ocelot_port_inject_frame() handles things incorrectly:
      
      	ocelot_ifh_port_set(ifh, port, rew_op, skb_vlan_tag_get(skb));
      
      void ocelot_ifh_port_set(void *ifh, int port, u32 rew_op, u32 vlan_tag)
      {
      	(...)
      	if (vlan_tag)
      		ocelot_ifh_set_vlan_tci(ifh, vlan_tag);
      	(...)
      }
      
      The way __vlan_hwaccel_push_inside() pushes the tag inside the skb head
      is by calling:
      
      static inline void __vlan_hwaccel_clear_tag(struct sk_buff *skb)
      {
      	skb->vlan_present = 0;
      }
      
      which does _not_ zero out skb->vlan_tci as seen by skb_vlan_tag_get().
      This means that ocelot, when it calls skb_vlan_tag_get(), sees
      (and uses) a residual skb->vlan_tci, while the same VLAN tag is
      _already_ in the skb head.
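
      The stale-value effect can be reproduced in a toy model. The sketch
      below is not kernel code: struct toy_skb and the toy_*() helpers are
      hypothetical stand-ins for the two sk_buff fields and accessors named
      above, and it only demonstrates that clearing the "present" flag leaves
      the TCI readable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the two sk_buff fields discussed above. */
struct toy_skb {
	bool vlan_present;
	uint16_t vlan_tci;
};

/* Models skb_vlan_tag_present(): only looks at the flag. */
static bool toy_vlan_tag_present(const struct toy_skb *skb)
{
	return skb->vlan_present;
}

/* Models skb_vlan_tag_get(): returns the TCI without checking the flag. */
static uint16_t toy_vlan_tag_get(const struct toy_skb *skb)
{
	return skb->vlan_tci;
}

/* Models __vlan_hwaccel_clear_tag(): clears the flag, keeps the TCI. */
static void toy_hwaccel_clear_tag(struct toy_skb *skb)
{
	skb->vlan_present = false;
}

/* Pattern described above: trusts the TCI unconditionally. */
static uint16_t toy_ifh_tci_unchecked(const struct toy_skb *skb)
{
	return toy_vlan_tag_get(skb);
}

/* Same read, but gated on the "present" flag. */
static uint16_t toy_ifh_tci_checked(const struct toy_skb *skb)
{
	return toy_vlan_tag_present(skb) ? toy_vlan_tag_get(skb) : 0;
}
```

      After toy_hwaccel_clear_tag(), the unchecked reader still returns the
      old TCI while the checked one returns 0. Gating on the flag only hides
      the residue; it does not by itself give the driver the right VLAN.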
      
      The trivial fix for double VLAN headers is to replace the content of
      ocelot_ifh_port_set() with:
      
      	if (skb_vlan_tag_present(skb))
      		ocelot_ifh_set_vlan_tci(ifh, skb_vlan_tag_get(skb));
      
      but this would not be correct either, because, as mentioned,
      vlan_hw_offload_capable() is false for us, so we'd be inserting dead
      code and we'd always transmit packets with VID=0 in the injection frame
      header.
      
      I can't actually test the ocelot switchdev driver and rely exclusively
      on code inspection, but I don't think traffic from 8021q uppers has ever
      been injected properly, and not double-tagged. Thus I'm blaming the
      introduction of VLAN fields in the injection header - early driver code.
      
      As hinted at in the early conclusion, what we _want_ to happen for
      VLAN transmission was already described once in commit 5ca721c5
      ("net: dsa: tag_ocelot: set the classified VLAN during xmit").
      
      ocelot_xmit_get_vlan_info() intends to ensure that if the port through
      which we're transmitting is under a VLAN-aware bridge, the outer VLAN
      tag from the skb head is stripped from there and inserted into the
      injection frame header (so that the packet is processed in hardware
      through that actual VLAN). And in all other cases, the packet is sent
      with VID=0 in the injection frame header, since the port is VLAN-unaware
      and has logic to strip this VID on egress (making it invisible to the
      wire).
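
      The intent described above can be sketched on a plain byte buffer. This
      is a simplified model under stated assumptions, not the kernel function:
      toy_xmit_get_vlan_info() is a hypothetical name, the TPID is hardcoded
      to 802.1Q, and the handling of untagged frames under a VLAN-aware
      bridge is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_ALEN    6
#define TOY_VLAN_HLEN   4
#define TOY_ETH_P_8021Q 0x8100

/*
 * Hypothetical model: if the port is under a VLAN-aware bridge and the
 * frame starts with an 802.1Q tag, strip that tag from the frame and
 * report its TCI for the injection header. In every other case the
 * frame is left alone and the reported TCI is 0.
 * Returns the (possibly shorter) frame length.
 */
static size_t toy_xmit_get_vlan_info(uint8_t *frame, size_t len,
				     bool vlan_aware, uint16_t *ifh_tci)
{
	size_t off = 2 * TOY_ETH_ALEN;
	uint16_t tpid;

	assert(ifh_tci != NULL);
	*ifh_tci = 0;

	/* Need room for the tag plus the inner EtherType. */
	if (!vlan_aware || len < off + TOY_VLAN_HLEN + 2)
		return len;

	tpid = (uint16_t)((frame[off] << 8) | frame[off + 1]);
	if (tpid != TOY_ETH_P_8021Q)
		return len;

	*ifh_tci = (uint16_t)((frame[off + 2] << 8) | frame[off + 3]);

	/* Close the 4-byte gap left by the removed tag. */
	memmove(frame + off, frame + off + TOY_VLAN_HLEN,
		len - off - TOY_VLAN_HLEN);

	return len - TOY_VLAN_HLEN;
}
```

      With the tag moved out of the frame and into the header field, the
      packet carries the VLAN exactly once, which is the property the
      double-tagged capture was missing.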
      
      Fixes: 08d02364 ("net: mscc: fix the injection header")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: net: bridge_vlan_aware: test that other TPIDs are seen as untagged · e29b82ef
      Vladimir Oltean authored
      The bridge VLAN implementation w.r.t. VLAN protocol is described in
      merge commit 1a0b20b2 ("Merge branch 'bridge-next'"). We are only
      sensitive to those VLAN tags whose TPID is equal to the bridge's
      vlan_protocol. Thus, an 802.1ad VLAN should be treated as 802.1Q-untagged.
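
      The classification rule stated above can be written as a one-line
      decision. The sketch below is a simplified model, not bridge code:
      toy_bridge_classify_vid() and its parameters are hypothetical, and it
      ignores everything except the TPID comparison and the PVID fallback.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_ETH_P_8021Q  0x8100
#define TOY_ETH_P_8021AD 0x88A8

/*
 * Hypothetical model: a tag only counts if its TPID equals the
 * bridge's vlan_protocol. Any other TPID, or a VID of 0, makes the
 * frame look untagged, so it is classified to the port's PVID.
 */
static uint16_t toy_bridge_classify_vid(uint16_t frame_tpid,
					uint16_t frame_vid,
					uint16_t bridge_vlan_protocol,
					uint16_t port_pvid)
{
	assert(frame_vid < 4096);

	if (frame_tpid == bridge_vlan_protocol && frame_vid != 0)
		return frame_vid;

	return port_pvid;
}
```

      For example, an 802.1ad frame with VID 100 arriving on an 802.1Q-aware
      bridge port whose PVID is 1 is classified to VLAN 1 in this model,
      while the same VID under an 802.1Q TPID stays in VLAN 100.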
      
      Add 3 tests which validate that:
      - 802.1ad-tagged traffic is learned into the PVID of an 802.1Q-aware
        bridge
      - Double-tagged traffic is forwarded when just the PVID of the port is
        present in the VLAN group of the ports
      - Double-tagged traffic is not forwarded when the PVID of the port is
        absent from the VLAN group of the ports
      
      The test passes with both veth and ocelot.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Tested-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: net: local_termination: add PTP frames to the mix · 23797950
      Vladimir Oltean authored
      A breakage in the felix DSA driver shows we do not have enough test
      coverage. More generally, PTP traffic is sufficiently special that
      drivers are likely to treat it differently.
      
      This is not meant to be a full PTP test, it just makes sure that PTP
      packets sent to the different addresses corresponding to their profiles
      are received correctly. The local_termination selftest seemed like the
      most appropriate place for this addition.
      
      PTP RX/TX in some cases makes no sense (over a bridge) and this is why
      $skip_ptp exists. And in others - PTP over a bridge port - the IP stack
      needs convincing through the available bridge netfilter hooks to leave
      the PTP packets alone and not let the bridge rx_handler steal them. It is
      safe to assume that users have that figured out already. This is a
      driver level test, and by using tcpdump, all that extra setup is out of
      scope here.
      
      send_non_ip() was an unfinished idea; written but never used.
      Replace it with a more generic send_raw(), and send 3 PTP packet types
      times 3 transports.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: net: local_termination: don't use xfail_on_veth() · 9aa3749c
      Vladimir Oltean authored
      xfail_on_veth() for this test is an incorrect approximation which gives
      false positives and false negatives.
      
      When local_termination fails with "reception succeeded, but should have failed",
      it is because the DUT ($h2) accepts packets even when not configured as
      promiscuous. This is not something specific to veth; even the bridge
      behaves that way, but this is not captured by the xfail_on_veth test.
      
      The IFF_UNICAST_FLT flag is not explicitly exported to user space, but
      it can somewhat be determined from the interface's behavior. We have to
      create a macvlan upper with a different MAC address. This forces a
      dev_uc_add() call in the kernel. When the unicast filtering list is
      not empty, but the device doesn't support IFF_UNICAST_FLT,
      __dev_set_rx_mode() force-enables promiscuity on the interface, to
      ensure correct behavior (that the requested address is received).
      
      We can monitor the change in the promiscuity flag and infer from it
      whether the device supports unicast filtering.
      
      There is no equivalent thing for allmulti, unfortunately. We never know
      what's hiding behind a device which has allmulti=off. Whether it will
      actually perform RX multicast filtering of unknown traffic is a strong
      "maybe". The bridge driver, for example, completely ignores the flag.
      We'll have to keep the xfail behavior, but instead of XFAIL on just
      veth, always XFAIL.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: net: local_termination: introduce new tests which capture VLAN behavior · 5fea8bb0
      Vladimir Oltean authored
      Add more coverage to the local termination selftest as follows:
      - 8021q upper of $h2
      - 8021q upper of $h2, where $h2 is a port of a VLAN-unaware bridge
      - 8021q upper of $h2, where $h2 is a port of a VLAN-aware bridge
      - 8021q upper of VLAN-unaware br0, which is the upper of $h2
      - 8021q upper of VLAN-aware br0, which is the upper of $h2
      
      Especially the cases with traffic sent through the VLAN upper of a
      VLAN-aware bridge port will be immediately relevant when we will start
      transmitting PTP packets as an additional kind of traffic.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: net: local_termination: add one more test for VLAN-aware bridges · 5b8e7418
      Vladimir Oltean authored
      The current bridge() test is for packet reception on a VLAN-unaware
      bridge. Some things are different enough with VLAN-aware bridges that
      it's worth renaming this test into vlan_unaware_bridge(), and add a new
      vlan_aware_bridge() test.
      
      The two will share the same implementation: bridge() becomes a common
      function, which receives $vlan_filtering as an argument. Rename it to
      test_bridge() at the same time, because just bridge() pollutes the
      global namespace and we cannot invoke the binary with the same name from
      the iproute2 package currently.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: net: local_termination: parameterize test name · df7cf5cc
      Vladimir Oltean authored
      There are upcoming tests which verify the RX filtering of a bridge
      (or bridge port), but under differing vlan_filtering conditions.
      Since we currently print $h2 (the DUT) in the log_test() output, it
      becomes necessary to make a further distinction between tests, to not
      give the user the impression that the exact same thing is run twice.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: net: local_termination: parameterize sending interface · 4261fa35
      Vladimir Oltean authored
      In future changes we will want to subject the DUT, $h2, to additional
      VLAN-tagged traffic. For that, we need to run the tests using $h1.100 as
      a sending interface, rather than the currently hardcoded $h1.
      
      Add a parameter to run_test() and modify its 2 callers to explicitly
      pass $h1, as was implicit before.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: net: local_termination: refactor macvlan creation/deletion · 8d019b15
      Vladimir Oltean authored
      This will be used in other subtests as well; make new macvlan_create()
      and macvlan_destroy() functions.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • MAINTAINERS: add selftests to network drivers · b153b3c7
      Jakub Kicinski authored
      tools/testing/selftests/drivers/net/ is not listed under
      networking entries. Add it to NETWORKING DRIVERS.
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20240814142832.3473685-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bnxt_en: Don't clear ntuple filters and rss contexts during ethtool ops · c948c097
      Pavan Chebbi authored
      The driver currently blindly deletes its cache of RSS contexts and
      ntuple filters when the ethtool channel count is changing.  It also
      deletes the ntuple filters cache when the default indirection table
      is changing.
      
      The core will not allow ethtool channels to drop below any that
      have been configured as ntuple destinations since this commit from 2022:
      
      47f3ecf4 ("ethtool: Fail number of channels change when it conflicts with rxnfc")
      
      So there is absolutely no need to delete the ntuple filters and
      RSS contexts when changing ethtool channels.
      
      It is also unnecessary to delete ntuple filters when the default
      RSS indirection table is changing.
      
      Remove bnxt_clear_usr_fltrs() and bnxt_clear_rss_ctxis() from the
      ethtool ops and change them to static functions.
      
      This bug will cause confusion to the end user and causes failure when
      running the rss_ctx.py selftest.
      
      Fixes: 1018319f ("bnxt_en: Invalidate user filters when needed")
      Reported-by: Jakub Kicinski <kuba@kernel.org>
      Closes: https://lore.kernel.org/netdev/20240725111912.7bc17cf6@kernel.org/
      Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
      Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Link: https://patch.msgid.link/20240814225429.199280-1-michael.chan@broadcom.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • virtio_net: move netdev_tx_reset_queue() call before RX napi enable · b96ed2c9
      Jiri Pirko authored
      During suspend/resume the following BUG was hit:
      ------------[ cut here ]------------
      kernel BUG at lib/dynamic_queue_limits.c:99!
      Internal error: Oops - BUG: 0 [#1] SMP ARM
      Modules linked in: bluetooth ecdh_generic ecc libaes
      CPU: 1 PID: 1282 Comm: rtcwake Not tainted
      6.10.0-rc3-00732-gc8bd1f7f #15240
      Hardware name: Generic DT based system
      PC is at dql_completed+0x270/0x2cc
      LR is at __free_old_xmit+0x120/0x198
      pc : [<c07ffa54>]    lr : [<c0c42bf4>]    psr: 80000013
      ...
      Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
      Control: 10c5387d  Table: 43a4406a  DAC: 00000051
      ...
      Process rtcwake (pid: 1282, stack limit = 0xfbc21278)
      Stack: (0xe0805e80 to 0xe0806000)
      ...
      Call trace:
        dql_completed from __free_old_xmit+0x120/0x198
        __free_old_xmit from free_old_xmit+0x44/0xe4
        free_old_xmit from virtnet_poll_tx+0x88/0x1b4
        virtnet_poll_tx from __napi_poll+0x2c/0x1d4
        __napi_poll from net_rx_action+0x140/0x2b4
        net_rx_action from handle_softirqs+0x11c/0x350
        handle_softirqs from call_with_stack+0x18/0x20
        call_with_stack from do_softirq+0x48/0x50
        do_softirq from __local_bh_enable_ip+0xa0/0xa4
        __local_bh_enable_ip from virtnet_open+0xd4/0x21c
        virtnet_open from virtnet_restore+0x94/0x120
        virtnet_restore from virtio_device_restore+0x110/0x1f4
        virtio_device_restore from dpm_run_callback+0x3c/0x100
        dpm_run_callback from device_resume+0x12c/0x2a8
        device_resume from dpm_resume+0x12c/0x1e0
        dpm_resume from dpm_resume_end+0xc/0x18
        dpm_resume_end from suspend_devices_and_enter+0x1f0/0x72c
        suspend_devices_and_enter from pm_suspend+0x270/0x2a0
        pm_suspend from state_store+0x68/0xc8
        state_store from kernfs_fop_write_iter+0x10c/0x1cc
        kernfs_fop_write_iter from vfs_write+0x2b0/0x3dc
        vfs_write from ksys_write+0x5c/0xd4
        ksys_write from ret_fast_syscall+0x0/0x54
      Exception stack(0xe8bf1fa8 to 0xe8bf1ff0)
      ...
      ---[ end trace 0000000000000000 ]---
      
      After virtnet_napi_enable() is called, the following path is hit:
        __napi_poll()
          -> virtnet_poll()
            -> virtnet_poll_cleantx()
              -> netif_tx_wake_queue()
      
      That wakes the TX queue and allows skbs to be submitted and accounted by
      BQL counters.
      
      Then netdev_tx_reset_queue() is called that resets BQL counters and
      eventually leads to the BUG in dql_completed().
      
      Move virtnet_napi_tx_enable(), which does the BQL counter reset, before RX
      napi enable to avoid the issue.
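
      Why the ordering matters can be shown with a toy model of the byte
      counters. This is a sketch under assumptions, not the kernel's dql code:
      struct toy_dql and the toy_dql_*() helpers are hypothetical, and the
      check in toy_dql_completed() is assumed to mirror the consistency check
      that dql_completed() trips over in the trace above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the queued/completed byte counters used by BQL. */
struct toy_dql {
	unsigned int num_queued;
	unsigned int num_completed;
};

/* Account bytes handed to the device for transmission. */
static void toy_dql_queued(struct toy_dql *dql, unsigned int bytes)
{
	dql->num_queued += bytes;
}

/* Forget all accounting, as a queue reset does. */
static void toy_dql_reset(struct toy_dql *dql)
{
	dql->num_queued = 0;
	dql->num_completed = 0;
}

/*
 * Account completed bytes. Returns false where this model assumes the
 * real code would hit its BUG: more bytes are being completed than are
 * currently outstanding.
 */
static bool toy_dql_completed(struct toy_dql *dql, unsigned int bytes)
{
	if (bytes > dql->num_queued - dql->num_completed)
		return false;

	dql->num_completed += bytes;
	return true;
}
```

      In this model, queueing 100 bytes, then resetting, then completing those
      100 bytes fails the check, because the reset erased the record of the
      outstanding bytes. Resetting first and queueing afterwards keeps the
      counters consistent.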
      Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Closes: https://lore.kernel.org/netdev/e632e378-d019-4de7-8f13-07c572ab37a9@samsung.com/
      Fixes: c8bd1f7f ("virtio_net: add support for Byte Queue Limits")
      Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Link: https://patch.msgid.link/20240814122500.1710279-1-jiri@resnulli.us
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 15 Aug, 2024 18 commits
    • Merge tag 'net-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · a4a35f6c
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from wireless and netfilter
      
        Current release - regressions:
      
         - udp: fall back to software USO if IPv6 extension headers are
           present
      
         - wifi: iwlwifi: correctly lookup DMA address in SG table
      
        Current release - new code bugs:
      
         - eth: mlx5e: fix queue stats access to non-existing channels splat
      
        Previous releases - regressions:
      
         - eth: mlx5e: take state lock during tx timeout reporter
      
         - eth: mlxbf_gige: disable RX filters until RX path initialized
      
         - eth: igc: fix reset adapter logics when tx mode change
      
        Previous releases - always broken:
      
         - tcp: update window clamping condition
      
         - netfilter:
            - nf_queue: drop packets with cloned unconfirmed conntracks
            - nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests
      
         - vsock: fix recursive ->recvmsg calls
      
         - dsa: vsc73xx: fix MDIO bus access and PHY opera
      
         - eth: gtp: pull network headers in gtp_dev_xmit()
      
         - eth: igc: fix packet still tx after gate close by reducing i226 MAC
           retry buffer
      
         - eth: mana: fix RX buf alloc_size alignment and atomic op panic
      
         - eth: hns3: fix a deadlock problem when config TC during resetting"
      
      * tag 'net-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (58 commits)
        net: hns3: use correct release function during uninitialization
        net: hns3: void array out of bound when loop tnl_num
        net: hns3: fix a deadlock problem when config TC during resetting
        net: hns3: use the user's cfg after reset
        net: hns3: fix wrong use of semaphore up
        selftests: net: lib: kill PIDs before del netns
        pse-core: Conditionally set current limit during PI regulator registration
        net: thunder_bgx: Fix netdev structure allocation
        net: ethtool: Allow write mechanism of LPL and both LPL and EPL
        vsock: fix recursive ->recvmsg calls
        selftest: af_unix: Fix kselftest compilation warnings
        netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests
        netfilter: nf_tables: Introduce nf_tables_getobj_single
        netfilter: nf_tables: Audit log dump reset after the fact
        selftests: netfilter: add test for br_netfilter+conntrack+queue combination
        netfilter: nf_queue: drop packets with cloned unconfirmed conntracks
        netfilter: flowtable: initialise extack before use
        netfilter: nfnetlink: Initialise extack before use in ACKs
        netfilter: allow ipv6 fragments to arrive on different devices
        tcp: Update window clamping condition
        ...
    • Merge tag 'media/v6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 20573d8e
      Linus Torvalds authored
      Pull media fixes from Mauro Carvalho Chehab:
       "Two regression fixes:
      
         - fix atomisp support for ISP2400
      
         - fix dvb-usb regression for TeVii s480 dual DVB-S2 S660 board"
      
      * tag 'media/v6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
        media: atomisp: Fix streaming no longer working on BYT / ISP2400 devices
        media: Revert "media: dvb-usb: Fix unexpected infinite loop in dvb_usb_read_remote_control()"
    • Merge tag 'ata-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux · 6e80a1fd
      Linus Torvalds authored
      Pull ata fix from Niklas Cassel:
      
       - Revert a recent change to sense data generation.
      
         Sense data can be in either fixed format or descriptor format.
      
         The D_SENSE bit in the Control mode page controls which format to
         generate. All places but one respected the D_SENSE bit.
      
         The recent change fixed the one place that didn't respect the D_SENSE
         bit. However, it turns out that hdparm, hddtemp and udisks
         (incorrectly) assume sense data in descriptor format.
      
         Therefore, even while the change was technically correct, revert it,
         since even if these user space programs are fixed to (correctly) look
         at the format type before parsing the data, older versions of these
         tools will be around roughly forever.
      
      * tag 'ata-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
        Revert "ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error"
    • Merge tag 'nf-24-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 9c5af2d7
      Paolo Abeni authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Ignores ifindex for types other than mcast/linklocal in ipv6 frag
         reasm, from Tom Hughes.
      
      2) Initialize extack for begin/end netlink message marker in batch,
         from Donald Hunter.
      
      3) Initialize extack for flowtable offload support, also from Donald.
      
      4) Dropped packets with cloned unconfirmed conntracks in nfqueue,
         later it should be possible to explore lookup after reinject but
         Florian prefers this approach at this stage. From Florian Westphal.
      
      5) Add selftest for cloned unconfirmed conntracks in nfqueue for
         previous update.
      
      6) Audit after filling netlink header successfully in object dump,
         from Phil Sutter.
      
      7-8) Fix concurrent dump and reset which could result in underflow
           counter / quota objects.
      
      netfilter pull request 24-08-15
      
      * tag 'nf-24-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests
        netfilter: nf_tables: Introduce nf_tables_getobj_single
        netfilter: nf_tables: Audit log dump reset after the fact
        selftests: netfilter: add test for br_netfilter+conntrack+queue combination
        netfilter: nf_queue: drop packets with cloned unconfirmed conntracks
        netfilter: flowtable: initialise extack before use
        netfilter: nfnetlink: Initialise extack before use in ACKs
        netfilter: allow ipv6 fragments to arrive on different devices
      ====================
      
      Link: https://patch.msgid.link/20240814222042.150590-1-pablo@netfilter.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      9c5af2d7
    • Paolo Abeni's avatar
      Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver' · 34dfdf21
      Paolo Abeni authored
      Jijie Shao says:
      
      ====================
      There are some bugfixes for the HNS3 ethernet driver
      ====================
      
      Link: https://patch.msgid.link/20240813141024.1707252-1-shaojijie@huawei.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      34dfdf21
    • Peiyang Wang's avatar
      net: hns3: use correct release function during uninitialization · 7660833d
      Peiyang Wang authored
      pci_request_regions() is called to request PCI I/O and memory resources
      when the driver is initialized. Therefore, when the driver is
      uninitialized, pci_release_regions() should be used to release both the
      PCI I/O and memory resources, instead of pci_release_mem_regions(),
      which releases memory resources only.
      Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com>
      Signed-off-by: Jijie Shao <shaojijie@huawei.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      7660833d
    • Peiyang Wang's avatar
      net: hns3: void array out of bound when loop tnl_num · 86db7bfb
      Peiyang Wang authored
      When querying the SSU register information, the driver loops tnl_num
      times. However, tnl_num comes from hardware, while the length of the
      array is a fixed value. To avoid an out-of-bounds array access, make
      sure the loop count is not greater than the length of the array.
      Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com>
      Signed-off-by: Jijie Shao <shaojijie@huawei.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      86db7bfb
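
      The bounds check described above can be sketched as follows. This is
      a minimal illustration only, not the driver's code: SSU_REG_TABLE_LEN,
      bounded_count() and sum_ssu_regs() are hypothetical names, and the
      table length of 8 is an assumption.

      ```c
      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical fixed table size (assumption, not the driver's value). */
      #define SSU_REG_TABLE_LEN 8

      /* Clamp a hardware-reported count to the table length before looping. */
      static size_t bounded_count(uint32_t hw_count, size_t table_len)
      {
      	return hw_count < table_len ? hw_count : table_len;
      }

      /* Sum the first hw_count entries without reading past the table. */
      static uint32_t sum_ssu_regs(const uint32_t regs[SSU_REG_TABLE_LEN],
      			     uint32_t hw_count)
      {
      	size_t n = bounded_count(hw_count, SSU_REG_TABLE_LEN);
      	uint32_t sum = 0;

      	for (size_t i = 0; i < n; i++)
      		sum += regs[i];
      	return sum;
      }
      ```

      A count larger than the table is silently truncated here; a real
      driver might prefer to log or reject such a value instead.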
    • Jie Wang's avatar
      net: hns3: fix a deadlock problem when config TC during resetting · be5e816d
      Jie Wang authored
      Configuring TC during the reset process may cause a deadlock. The flow
      is as below:
                                   pf reset start
                                       │
                                       ▼
                                    ......
      setup tc                         │
          │                            ▼
          ▼                      DOWN: napi_disable()
      napi_disable()(skip)             │
          │                            │
          ▼                            ▼
        ......                      ......
          │                            │
          ▼                            │
      napi_enable()                    │
                                       ▼
                                 UINIT: netif_napi_del()
                                       │
                                       ▼
                                    ......
                                       │
                                       ▼
                                 INIT: netif_napi_add()
                                       │
                                       ▼
                                    ......                 global reset start
                                       │                      │
                                       ▼                      ▼
                                 UP: napi_enable()(skip)    ......
                                       │                      │
                                       ▼                      ▼
                                    ......                 napi_disable()
      
      During the reset process, the driver will DOWN the port and then UINIT.
      In this case, the setup tc process will UP the port before UINIT, which
      causes the problem. Add a DOWN process in UINIT to fix it.
      
      Fixes: bb6b94a8 ("net: hns3: Add reset interface implementation in client")
      Signed-off-by: Jie Wang <wangjie125@huawei.com>
      Signed-off-by: Jijie Shao <shaojijie@huawei.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      be5e816d
    • Peiyang Wang's avatar
      net: hns3: use the user's cfg after reset · 30545e17
      Peiyang Wang authored
      Consider the following case, in which the user changes the speed and
      then resets the net interface. Before the hw has changed the speed
      successfully, the driver gets the old speed from the hw via the timer
      task. After the reset, the previous speed is configured to the hw. As a
      result, the new speed is configured successfully but lost after the PF
      reset. The following diagram shows this more directly.
      
      +------+              +----+                 +----+
      | USER |              | PF |                 | HW |
      +---+--+              +-+--+                 +-+--+
          |  ethtool -s 100G  |                      |
          +------------------>|   set speed 100G     |
          |                   +--------------------->|
          |                   |  set successfully    |
          |                   |<---------------------+---+
          |                   |query cfg (timer task)|   |
          |                   +--------------------->|   | handle speed
          |                   |     return 200G      |   | changing event
          |  ethtool --reset  |<---------------------+   | (100G)
          +------------------>|  cfg previous speed  |<--+
          |                   |  after reset (200G)  |
          |                   +--------------------->|
          |                   |                      +---+
          |                   |query cfg (timer task)|   |
          |                   +--------------------->|   | handle speed
          |                   |     return 100G      |   | changing event
          |                   |<---------------------+   | (200G)
          |                   |                      |<--+
          |                   |query cfg (timer task)|
          |                   +--------------------->|
          |                   |     return 200G      |
          |                   |<---------------------+
          |                   |                      |
          v                   v                      v
      
      This patch saves the new speed if the hw changes speed successfully,
      and uses it after a successful reset.
      
      Fixes: 2d03eacc ("net: hns3: Only update mac configuation when necessary")
      Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com>
      Signed-off-by: Jijie Shao <shaojijie@huawei.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      30545e17
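
      The idea of restoring the user's requested speed rather than the last
      polled value can be sketched as a toy model. Everything below is
      hypothetical (struct toy_mac and the toy_* helpers are not driver
      names); it only illustrates why the two values must be kept apart.

      ```c
      /* Toy model: keep the user's request separate from the polled value. */
      struct toy_mac {
      	unsigned int polled_speed; /* last value read back by the timer task */
      	unsigned int req_speed;    /* last speed the hw accepted from the user */
      };

      /* Called when the hw reports that the user's speed change succeeded. */
      static void toy_user_set_speed(struct toy_mac *mac, unsigned int speed)
      {
      	mac->req_speed = speed;
      }

      /* Timer task: may observe a stale speed while the hw is still switching. */
      static void toy_timer_poll(struct toy_mac *mac, unsigned int reported)
      {
      	mac->polled_speed = reported;
      }

      /* After reset, reconfigure from the user's request, not the polled value. */
      static unsigned int toy_speed_after_reset(const struct toy_mac *mac)
      {
      	return mac->req_speed;
      }
      ```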
    • Jie Wang's avatar
      net: hns3: fix wrong use of semaphore up · 8445d9d3
      Jie Wang authored
      Currently, if an hns3 PF or VF FLR reset fails after five retries,
      the reset done process will directly release the semaphore,
      which has already been released in hclge_reset_prepare_general.
      This will cause the down operation to fail.

      So this patch fixes it by adding a reset state check. The up operation
      is only called after a successful PF FLR reset.
      
      Fixes: 8627bded ("net: hns3: refactor the precedure of PF FLR")
      Fixes: f28368bb ("net: hns3: refactor the procedure of VF FLR")
      Signed-off-by: Jie Wang <wangjie125@huawei.com>
      Signed-off-by: Jijie Shao <shaojijie@huawei.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      8445d9d3
    • Matthieu Baerts (NGI0)'s avatar
      selftests: net: lib: kill PIDs before del netns · 7965a7f3
      Matthieu Baerts (NGI0) authored
      When deleting a netns, it is possible to still have some tasks running,
      e.g. background tasks like tcpdump that were not stopped because the
      test has been interrupted.

      Before deleting the netns, it is then safer to kill all attached PIDs,
      if any. That should reduce some noise after the end of some tests, and
      help with the debugging of some issues. That's why this modification is
      seen as a "fix".
      
      Fixes: 25ae948b ("selftests/net: add lib.sh")
      Acked-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Acked-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
      Link: https://patch.msgid.link/20240813-upstream-net-20240813-selftests-net-lib-kill-v1-1-27b689b248b8@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      7965a7f3
    • Oleksij Rempel's avatar
      pse-core: Conditionally set current limit during PI regulator registration · cdc90f75
      Oleksij Rempel authored
      Fix an issue where `devm_regulator_register()` would fail for PSE
      controllers that do not support current limit control, such as simple
      GPIO-based controllers like the podl-pse-regulator. The
      `REGULATOR_CHANGE_CURRENT` flag and `max_uA` constraint are now
      conditionally set only if the `pi_set_current_limit` operation is
      supported. This change prevents the regulator registration routine from
      attempting to call `pse_pi_set_current_limit()`, which would return
      `-EOPNOTSUPP` and cause the registration to fail.
      
      Fixes: 4a83abce ("net: pse-pd: Add new power limit get and set c33 features")
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
      Tested-by: Kyle Swenson <kyle.swenson@est.tech>
      Link: https://patch.msgid.link/20240813073719.2304633-1-o.rempel@pengutronix.de
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      cdc90f75
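
      The conditional setup described above can be sketched with toy types.
      This is not the regulator or PSE core API: struct toy_ops,
      struct toy_constraints and the TOY_* flags are stand-ins invented for
      this illustration.

      ```c
      #include <stddef.h>

      /* Stand-in capability flags (hypothetical, not the regulator API). */
      #define TOY_CHANGE_STATUS  (1u << 0)
      #define TOY_CHANGE_CURRENT (1u << 1)

      struct toy_ops {
      	/* Optional: NULL when the controller cannot limit current. */
      	int (*set_current_limit)(int id, int max_ua);
      };

      struct toy_constraints {
      	unsigned int valid_ops_mask;
      	int max_uA;
      };

      /* Example callback for a controller that does support current limits. */
      static int toy_set_limit(int id, int max_ua)
      {
      	(void)id;
      	(void)max_ua;
      	return 0;
      }

      /* Advertise current control only when the operation is implemented. */
      static void toy_fill_constraints(const struct toy_ops *ops,
      				 struct toy_constraints *c, int max_ua)
      {
      	c->valid_ops_mask = TOY_CHANGE_STATUS;
      	c->max_uA = 0;
      	if (ops->set_current_limit) {
      		c->valid_ops_mask |= TOY_CHANGE_CURRENT;
      		c->max_uA = max_ua;
      	}
      }
      ```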
    • Marc Zyngier's avatar
      net: thunder_bgx: Fix netdev structure allocation · 1f1b1942
      Marc Zyngier authored
      Commit 94833add ("net: thunderx: Unembed netdev structure") had
      a go at dynamically allocating the netdev structures for the thunderx_bgx
      driver.  This change results in my ThunderX box catching fire (to be fair,
      it is what it does best).
      
      The issues with this change are that:
      
      - bgx_lmac_enable() is called *after* bgx_acpi_register_phy() and
        bgx_init_of_phy(), both expecting netdev to be a valid pointer.
      
      - bgx_init_of_phy() populates the MAC addresses for *all* LMACs
        attached to a given BGX instance, and thus needs netdev for each of
        them to have been allocated.
      
      There are a few things to be said about how the driver mixes LMAC and
      BGX states which leads to this sorry state, but that's beside the point.
      
      To address this, go back to a situation where all netdev structures
      are allocated before the driver starts relying on them, and move the
      freeing of these structures to driver removal. Someone brave enough
      can always go and restructure the driver if they want.
      
      Fixes: 94833add ("net: thunderx: Unembed netdev structure")
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Cc: Breno Leitao <leitao@debian.org>
      Cc: Sunil Goutham <sgoutham@marvell.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Breno Leitao <leitao@debian.org>
      Link: https://patch.msgid.link/20240812141322.1742918-1-maz@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      1f1b1942
    • Danielle Ratson's avatar
      net: ethtool: Allow write mechanism of LPL and both LPL and EPL · fde25c20
      Danielle Ratson authored
      Section 9.4.2 of the CMIS 5.2 standard defines four types of supported
      firmware update mechanism: none, only LPL, only EPL, both LPL and EPL.

      Currently, only the LPL (Local Payload) type of write firmware block is
      supported. However, if the module supports both LPL and EPL, the
      flashing process wrongly fails for not supporting LPL.

      Fix that by allowing the write mechanism to be LPL or both LPL and
      EPL.
      
      Fixes: c4f78134 ("ethtool: cmis_fw_update: add a layer for supporting firmware update using CDB")
      Reported-by: Vladyslav Mykhaliuk <vmykhaliuk@nvidia.com>
      Signed-off-by: Danielle Ratson <danieller@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Link: https://patch.msgid.link/20240812140824.3718826-1-danieller@nvidia.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      fde25c20
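
      The relaxed check can be sketched as below. The enum and its values
      are hypothetical and do not reproduce the CMIS register encoding or
      the kernel's constants; only the four-way distinction comes from the
      commit message.

      ```c
      #include <stdbool.h>

      /* The four advertised mechanisms (illustrative values only). */
      enum fw_write_mechanism {
      	FW_WRITE_NONE,
      	FW_WRITE_LPL,
      	FW_WRITE_EPL,
      	FW_WRITE_BOTH,
      };

      /* LPL writes are usable when the module offers LPL alone or LPL and EPL. */
      static bool lpl_write_allowed(enum fw_write_mechanism m)
      {
      	return m == FW_WRITE_LPL || m == FW_WRITE_BOTH;
      }
      ```

      A check of the form `m == FW_WRITE_LPL` alone would reject modules
      that advertise both mechanisms, which is the failure described above.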
    • Cong Wang's avatar
      vsock: fix recursive ->recvmsg calls · 69139d29
      Cong Wang authored
      After a vsock socket has been added to a BPF sockmap, its prot->recvmsg
      has been replaced with vsock_bpf_recvmsg(). Thus the following
      recursion could happen:
      
      vsock_bpf_recvmsg()
       -> __vsock_recvmsg()
        -> vsock_connectible_recvmsg()
         -> prot->recvmsg()
          -> vsock_bpf_recvmsg() again
      
      We need to fix it by calling the original ->recvmsg() without any BPF
      sockmap logic in __vsock_recvmsg().
      
      Fixes: 634f1a71 ("vsock: support sockmap")
      Reported-by: syzbot+bdb4bd87b5e22058e2a4@syzkaller.appspotmail.com
      Tested-by: syzbot+bdb4bd87b5e22058e2a4@syzkaller.appspotmail.com
      Cc: Bobby Eshleman <bobby.eshleman@bytedance.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Link: https://patch.msgid.link/20240812022153.86512-1-xiyou.wangcong@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      69139d29
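
      The shape of the fix can be sketched with a toy dispatch table. None
      of the names below are vsock symbols; struct toy_sock and the toy_*
      functions are hypothetical and only model "the inner helper must not
      dispatch through the replaceable hook".

      ```c
      struct toy_sock;
      typedef int (*toy_recvmsg_fn)(struct toy_sock *sk);

      struct toy_sock {
      	toy_recvmsg_fn recvmsg; /* replaceable hook, standing in for prot->recvmsg */
      	int hook_calls;         /* how many times the override ran */
      };

      /* Original receive path: does the real work, never looks at the hook. */
      static int toy_plain_recvmsg(struct toy_sock *sk)
      {
      	(void)sk;
      	return 0;
      }

      /* Inner helper used by the override: calls the original path directly. */
      static int toy_inner_recvmsg(struct toy_sock *sk)
      {
      	return toy_plain_recvmsg(sk);
      }

      /* Override installed when the socket joins the map. */
      static int toy_bpf_recvmsg(struct toy_sock *sk)
      {
      	sk->hook_calls++;
      	return toy_inner_recvmsg(sk);
      }

      /* Public entry point: the only place that dispatches through the hook. */
      static int toy_recvmsg(struct toy_sock *sk)
      {
      	return sk->recvmsg ? sk->recvmsg(sk) : toy_plain_recvmsg(sk);
      }
      ```

      If toy_inner_recvmsg() called toy_recvmsg() instead, the override
      would re-enter itself without bound, mirroring the call chain above.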
    • Jakub Kicinski's avatar
      Merge tag 'wireless-2024-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless · b2ca1661
      Jakub Kicinski authored
      Kalle Valo says:
      
      ====================
      wireless fixes for v6.11
      
      We have a few fixes to drivers. The most important here is a fix for
      iwlwifi which caused major slowdowns for several users.
      
      * tag 'wireless-2024-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
        wifi: iwlwifi: correctly lookup DMA address in SG table
        wifi: mt76: mt7921: fix NULL pointer access in mt7921_ipv6_addr_change
        wifi: brcmfmac: cfg80211: Handle SSID based pmksa deletion
        wifi: rtlwifi: rtl8192du: Initialise value32 in _rtl92du_init_queue_reserved_page
        wifi: ath12k: use 128 bytes aligned iova in transmit path for WCN7850
      ====================
      
      Link: https://patch.msgid.link/20240814171606.E14A0C116B1@smtp.kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b2ca1661
    • Abhinav Jain's avatar
      selftest: af_unix: Fix kselftest compilation warnings · 6c569b77
      Abhinav Jain authored
      Change expected_buf from (const void *) to (const char *)
      in function __recvpair().
      This change fixes the below warnings during test compilation:
      
      ```
      In file included from msg_oob.c:14:
      msg_oob.c: In function ‘__recvpair’:
      
      ../../kselftest_harness.h:106:40: warning: format ‘%s’ expects argument
      of type ‘char *’,but argument 6 has type ‘const void *’ [-Wformat=]
      
      ../../kselftest_harness.h:101:17: note: in expansion of macro ‘__TH_LOG’
      msg_oob.c:235:17: note: in expansion of macro ‘TH_LOG’
      
      ../../kselftest_harness.h:106:40: warning: format ‘%s’ expects argument
      of type ‘char *’,but argument 6 has type ‘const void *’ [-Wformat=]
      
      ../../kselftest_harness.h:101:17: note: in expansion of macro ‘__TH_LOG’
      msg_oob.c:259:25: note: in expansion of macro ‘TH_LOG’
      ```
      
      Fixes: d098d772 ("selftest: af_unix: Add msg_oob.c.")
      Signed-off-by: Abhinav Jain <jain.abhinav177@gmail.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://patch.msgid.link/20240814080743.1156166-1-jain.abhinav177@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      6c569b77
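
      The type change behind this warning fix can be illustrated in
      isolation. format_expected() is a hypothetical helper, not code from
      msg_oob.c; it only shows that a `const char *` parameter matches what
      `%s` expects, whereas a `const void *` argument triggers -Wformat.

      ```c
      #include <stdio.h>
      #include <string.h>

      /* Takes const char * so that "%s" receives the type it expects. */
      static int format_expected(char *out, size_t out_len,
      			   const char *expected_buf)
      {
      	return snprintf(out, out_len, "expected: %s", expected_buf);
      }
      ```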
    • Linus Torvalds's avatar
      Merge tag 'for-6.11-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 1fb91896
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
      
       - extend tree-checker verification of directory item type
      
       - fix regression in page/folio and extent state tracking in xarray, the
         dirty status can get out of sync and can cause problems e.g. a hang
      
       - in send, detect last extent and allow to clone it instead of sending
         it as write, reduces amount of data transferred in the stream
      
       - fix checking extent references when cleaning deleted subvolumes
      
       - fix one more case in the extent map shrinker, let it run only in the
         kswapd context so it does not cause latency spikes during other
         operations
      
      * tag 'for-6.11-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: fix invalid mapping of extent xarray state
        btrfs: send: allow cloning non-aligned extent if it ends at i_size
        btrfs: only run the extent map shrinker from kswapd tasks
        btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type
        btrfs: check delayed refs when we're checking if a ref exists
      1fb91896
  3. 14 Aug, 2024 7 commits