1. 04 Mar, 2022 11 commits
  2. 03 Mar, 2022 29 commits
    • Jacob Keller's avatar
      ice: convert VF storage to hash table with krefs and RCU · 3d5985a1
      Jacob Keller authored
      The ice driver stores VF structures in a simple array which is allocated
      once at the time of VF creation. The VF structures are then accessed
      from the array by their VF ID. The ID must be between 0 and the number
      of allocated VFs.
      
      Multiple threads can access this table:
      
       * .ndo operations such as .ndo_get_vf_cfg or .ndo_set_vf_trust
       * interrupts, such as due to messages from the VF using the virtchnl
         communication
       * processing such as device reset
       * commands to add or remove VFs
      
      The current implementation does not keep track of when all threads are
      done operating on a VF and can potentially result in use-after-free
      issues caused by one thread accessing a VF structure after it has been
      released when removing VFs. Some of these are prevented with various
      state flags and checks.
      
      In addition, this structure is quite static and does not support a
      planned future where virtualization can be more dynamic. As we begin to
      look at supporting Scalable IOV with the ice driver (as opposed to just
      supporting Single Root IOV), this structure is not sufficient.
      
      In the future, VFs will be able to be added and removed individually and
      dynamically.
      
      To allow for this, and to better protect against a whole class of
      use-after-free bugs, replace the VF storage with a combination of a hash
      table and krefs to reference track all of the accesses to VFs through
      the hash table.
      
      A hash table still allows efficient look up of the VF given its ID, but
      also allows adding and removing VFs. It does not require contiguous VF
      IDs.
      
      The use of krefs allows the cleanup of the VF memory to be delayed until
      after all threads have released their reference (by calling ice_put_vf).
      
      To prevent corruption of the hash table, a combination of RCU and the
      mutex table_lock are used. Addition and removal from the hash table use
      the RCU-aware hash macros. This allows simple read-only look ups that
      iterate to locate a single VF can be fast using RCU. Accesses which
      modify the hash table, or which can't take RCU because they sleep, will
      hold the mutex lock.
      
      By using this design, we have a stronger guarantee that the VF structure
      can't be released until after all threads are finished operating on it.
      We also pave the way for the more dynamic Scalable IOV implementation in
      the future.
      Signed-off-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Tested-by: default avatarKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      3d5985a1
    • Jakub Kicinski's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 80901bff
      Jakub Kicinski authored
      net/batman-adv/hard-interface.c
        commit 690bb6fb ("batman-adv: Request iflink once in batadv-on-batadv check")
        commit 6ee3c393 ("batman-adv: Demote batadv-on-batadv skip error message")
      https://lore.kernel.org/all/20220302163049.101957-1-sw@simonwunderlich.de/
      
      net/smc/af_smc.c
        commit 4d08b7b5 ("net/smc: Fix cleanup when register ULP fails")
        commit 462791bb ("net/smc: add sysctl interface for SMC")
      https://lore.kernel.org/all/20220302112209.355def40@canb.auug.org.au/Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      80901bff
    • Linus Torvalds's avatar
      Merge tag 'net-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · b949c21f
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from can, xfrm, wifi, bluetooth, and netfilter.
      
        Lots of various size fixes, the length of the tag speaks for itself.
        Most of the 5.17-relevant stuff comes from xfrm, wifi and bt trees
        which had been lagging as you pointed out previously. But there's also
        a larger than we'd like portion of fixes for bugs from previous
        releases.
      
        Three more fixes still under discussion, including and xfrm revert for
        uAPI error.
      
        Current release - regressions:
      
         - iwlwifi: don't advertise TWT support, prevent FW crash
      
         - xfrm: fix the if_id check in changelink
      
         - xen/netfront: destroy queues before real_num_tx_queues is zeroed
      
         - bluetooth: fix not checking MGMT cmd pending queue, make scanning
           work again
      
        Current release - new code bugs:
      
         - mptcp: make SIOCOUTQ accurate for fallback socket
      
         - bluetooth: access skb->len after null check
      
         - bluetooth: hci_sync: fix not using conn_timeout
      
         - smc: fix cleanup when register ULP fails
      
         - dsa: restore error path of dsa_tree_change_tag_proto
      
         - iwlwifi: fix build error for IWLMEI
      
         - iwlwifi: mvm: propagate error from request_ownership to the user
      
        Previous releases - regressions:
      
         - xfrm: fix pMTU regression when reported pMTU is too small
      
         - xfrm: fix TCP MSS calculation when pMTU is close to 1280
      
         - bluetooth: fix bt_skb_sendmmsg not allocating partial chunks
      
         - ipv6: ensure we call ipv6_mc_down() at most once, prevent leaks
      
         - ipv6: prevent leaks in igmp6 when input queues get full
      
         - fix up skbs delta_truesize in UDP GRO frag_list
      
         - eth: e1000e: fix possible HW unit hang after an s0ix exit
      
         - eth: e1000e: correct NVM checksum verification flow
      
         - ptp: ocp: fix large time adjustments
      
        Previous releases - always broken:
      
         - tcp: make tcp_read_sock() more robust in presence of urgent data
      
         - xfrm: distinguishing SAs and SPs by if_id in xfrm_migrate
      
         - xfrm: fix xfrm_migrate issues when address family changes
      
         - dcb: flush lingering app table entries for unregistered devices
      
         - smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error
      
         - mac80211: fix EAPoL rekey fail in 802.3 rx path
      
         - mac80211: fix forwarded mesh frames AC & queue selection
      
         - netfilter: nf_queue: fix socket access races and bugs
      
         - batman-adv: fix ToCToU iflink problems and check the result belongs
           to the expected net namespace
      
         - can: gs_usb, etas_es58x: fix opened_channel_cnt's accounting
      
         - can: rcar_canfd: register the CAN device when fully ready
      
         - eth: igb, igc: phy: drop premature return leaking HW semaphore
      
         - eth: ixgbe: xsk: change !netif_carrier_ok() handling in
           ixgbe_xmit_zc(), prevent live lock when link goes down
      
         - eth: stmmac: only enable DMA interrupts when ready
      
         - eth: sparx5: move vlan checks before any changes are made
      
         - eth: iavf: fix races around init, removal, resets and vlan ops
      
         - ibmvnic: more reset flow fixes
      
        Misc:
      
         - eth: fix return value of __setup handlers"
      
      * tag 'net-5.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (92 commits)
        ipv6: fix skb drops in igmp6_event_query() and igmp6_event_report()
        net: dsa: make dsa_tree_change_tag_proto actually unwind the tag proto change
        ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc()
        selftests: mlxsw: resource_scale: Fix return value
        selftests: mlxsw: tc_police_scale: Make test more robust
        net: dcb: disable softirqs in dcbnl_flush_dev()
        bnx2: Fix an error message
        sfc: extend the locking on mcdi->seqno
        net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error cause by server
        net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error generated by client
        net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe()
        tcp: make tcp_read_sock() more robust
        bpf, sockmap: Do not ignore orig_len parameter
        net: ipa: add an interconnect dependency
        net: fix up skbs delta_truesize in UDP GRO frag_list
        iwlwifi: mvm: return value for request_ownership
        nl80211: Update bss channel on channel switch for P2P_CLIENT
        iwlwifi: fix build error for IWLMEI
        ptp: ocp: Add ptp_ocp_adjtime_coarse for large adjustments
        batman-adv: Don't expect inter-netns unique iflink indices
        ...
      b949c21f
    • Jacob Keller's avatar
      ice: introduce VF accessor functions · fb916db1
      Jacob Keller authored
      Before we switch the VF data structure storage mechanism to a hash,
      introduce new accessor functions to define the new interface.
      
      * ice_get_vf_by_id is a function used to obtain a reference to a VF from
        the table based on its VF ID
      * ice_has_vfs is used to quickly check if any VFs are configured
      * ice_get_num_vfs is used to get an exact count of how many VFs are
        configured
      
      We can drop the old ice_validate_vf_id function, since every caller was
      just going to immediately access the VF table to get a reference
      anyways. This way we simply use the single ice_get_vf_by_id to both
      validate the VF ID is within range and that there exists a VF with that
      ID.
      
      This change enables us to more easily convert the codebase to the hash
      table since most callers now properly use the interface.
      Signed-off-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Tested-by: default avatarKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      fb916db1
    • Jacob Keller's avatar
      ice: factor VF variables to separate structure · 000773c0
      Jacob Keller authored
      We maintain a number of values for VFs within the ice_pf structure. This
      includes the VF table, the number of allocated VFs, the maximum number
      of supported SR-IOV VFs, the number of queue pairs per VF, the number of
      MSI-X vectors per VF, and a bitmap of the VFs with detected MDD events.
      
      We're about to add a few more variables to this list. Clean this up
      first by extracting these members out into a new ice_vfs structure
      defined in ice_virtchnl_pf.h
      Signed-off-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Tested-by: default avatarKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      000773c0
    • Linus Torvalds's avatar
      Merge tag 'mips-fixes-5.17_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux · e58bd49d
      Linus Torvalds authored
      Pull MIPS fixes from Thomas Bogendoerfer:
      
       - Fix memory detection for MT7621 devices
      
       - Fix setnocoherentio kernel option
      
       - Fix warning when CONFIG_SCHED_CORE is enabled
      
      * tag 'mips-fixes-5.17_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
        MIPS: ralink: mt7621: use bitwise NOT instead of logical
        mips: setup: fix setnocoherentio() boolean setting
        MIPS: smp: fill in sibling and core maps earlier
        MIPS: ralink: mt7621: do memory detection on KSEG1
      e58bd49d
    • Linus Torvalds's avatar
      Merge tag 'auxdisplay-for-linus-v5.17-rc7' of git://github.com/ojeda/linux · 4d5ae234
      Linus Torvalds authored
      Pull auxdisplay fixes from Miguel Ojeda:
       "A few lcd2s fixes from Andy Shevchenko"
      
      * tag 'auxdisplay-for-linus-v5.17-rc7' of git://github.com/ojeda/linux:
        auxdisplay: lcd2s: Use proper API to free the instance of charlcd object
        auxdisplay: lcd2s: Fix memory leak in ->remove()
        auxdisplay: lcd2s: Fix lcd2s_redefine_char() feature
      4d5ae234
    • Eric Dumazet's avatar
      ipv6: fix skb drops in igmp6_event_query() and igmp6_event_report() · 2d3916f3
      Eric Dumazet authored
      While investigating on why a synchronize_net() has been added recently
      in ipv6_mc_down(), I found that igmp6_event_query() and igmp6_event_report()
      might drop skbs in some cases.
      
      Discussion about removing synchronize_net() from ipv6_mc_down()
      will happen in a different thread.
      
      Fixes: f185de28 ("mld: add new workqueues for process mld events")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Taehee Yoo <ap420073@gmail.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20220303173728.937869-1-eric.dumazet@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      2d3916f3
    • Jacob Keller's avatar
      ice: convert ice_for_each_vf to include VF entry iterator · c4c2c7db
      Jacob Keller authored
      The ice_for_each_vf macro is intended to be used to loop over all VFs.
      The current implementation relies on an iterator that is the index into
      the VF array in the PF structure. This forces all users to perform a
      look up themselves.
      
      This abstraction forces a lot of duplicate work on callers and leaks the
      interface implementation to the caller. Replace this with an
      implementation that includes the VF pointer the primary iterator. This
      version simplifies callers which just want to iterate over every VF, as
      they no longer need to perform their own lookup.
      
      The "i" iterator value is replaced with a new unsigned int "bkt"
      parameter, as this will match the necessary interface for replacing
      the VF array with a hash table. For now, the bkt is the VF ID, but in
      the future it will simply be the hash bucket index. Document that it
      should not be treated as a VF ID.
      
      This change aims to simplify switching from the array to a hash table. I
      considered alternative implementations such as an xarray but decided
      that the hash table was the simplest and most suitable implementation. I
      also looked at methods to hide the bkt iterator entirely, but I couldn't
      come up with a feasible solution that worked for hash table iterators.
      Signed-off-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Tested-by: default avatarKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      c4c2c7db
    • Jacob Keller's avatar
      ice: use ice_for_each_vf for iteration during removal · 19281e86
      Jacob Keller authored
      When removing VFs, the driver takes a weird approach of assigning
      pf->num_alloc_vfs to 0 before iterating over the VFs using a temporary
      variable.
      
      This logic has been in the driver for a long time, and seems to have
      been carried forward from i40e.
      
      We want to refactor the way VFs are stored, and iterating over the data
      structure without the ice_for_each_vf interface impedes this work.
      
      The logic relies on implicitly using the num_alloc_vfs as a sort of
      "safe guard" for accessing VF data.
      
      While this sort of guard makes sense for Single Root IOV where all VFs
      are added at once, the data structures don't work for VFs which can be
      added and removed dynamically. We also have a separate state flag,
      ICE_VF_DEINIT_IN_PROGRESS which is a stronger protection against
      concurrent removal and access.
      
      Avoid the custom tmp iteration and replace it with the standard
      ice_for_each_vf iterator. Delay the assignment of num_alloc_vfs until
      after this loop finishes.
      Signed-off-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Tested-by: default avatarKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      19281e86
    • Jacob Keller's avatar
      ice: remove checks in ice_vc_send_msg_to_vf · 59e1f857
      Jacob Keller authored
      The ice_vc_send_msg_to_vf function is used by the PF to send a response
      to a VF. This function has overzealous checks to ensure its not passed a
      NULL VF pointer and to ensure that the passed in struct ice_vf has a
      valid vf_id sub-member.
      
      These checks have existed since commit 1071a835 ("ice: Implement
      virtchnl commands for AVF support") and function as simple sanity
      checks.
      
      We are planning to refactor the ice driver to use a hash table along
      with appropriate locks in a future refactor. This change will modify how
      the ice_validate_vf_id function works. Instead of a simple >= check to
      ensure the VF ID is between some range, it will check the hash table to
      see if the specified VF ID is actually in the table. This requires that
      the function properly lock the table to prevent race conditions.
      
      The checks may seem ok at first glance, but they don't really provide
      much benefit.
      
      In order for ice_vc_send_msg_to_vf to have these checks fail, the
      callers must either (1) pass NULL as the VF, (2) construct an invalid VF
      pointer manually, or (3) be using a VF pointer which becomes invalid
      after they obtain it properly using ice_get_vf_by_id.
      
      For (1), a cursory glance over callers of ice_vc_send_msg_to_vf can show
      that in most cases the functions already operate assuming their VF
      pointer is valid, such as by derferencing vf->pf or other members.
      
      They obtain the VF pointer by accessing the VF array using the VF ID,
      which can never produce a NULL value (since its a simple address
      operation on the array it will not be NULL.
      
      The sole exception for (1) is that ice_vc_process_vf_msg will forward a
      NULL VF pointer to this function as part of its goto error handler
      logic. This requires some minor cleanup to simply exit immediately when
      an invalid VF ID is detected (Rather than use the same error flow as
      the rest of the function).
      
      For (2), it is unexpected for a flow to construct a VF pointer manually
      instead of accessing the VF array. Defending against this is likely to
      just hide bad programming.
      
      For (3), it is definitely true that VF pointers could become invalid,
      for example if a thread is processing a VF message while the VF gets
      removed. However, the correct solution is not to add additional checks
      like this which do not guarantee to prevent the race. Instead we plan to
      solve the root of the problem by preventing the possibility entirely.
      
      This solution will require the change to a hash table with proper
      locking and reference counts of the VF structures. When this is done,
      ice_validate_vf_id will require locking of the hash table. This will be
      problematic because all of the callers of ice_vc_send_msg_to_vf will
      already have to take the lock to obtain the VF pointer anyways. With a
      mutex, this leads to a double lock that could hang the kernel thread.
      
      Avoid this by removing the checks which don't provide much value, so
      that we can safely add the necessary protections properly.
      Signed-off-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Tested-by: default avatarKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      59e1f857
    • Jacob Keller's avatar
      ice: move VFLR acknowledge during ice_free_vfs · 44efe75f
      Jacob Keller authored
      After removing all VFs, the driver clears the VFLR indication for VFs.
      This has been in ice since the beginning of SR-IOV support in the ice
      driver.
      
      The implementation was copied from i40e, and the motivation for the VFLR
      indication clearing is described in the commit f7414531 ("i40e:
      acknowledge VFLR when disabling SR-IOV")
      
      The commit explains that we need to clear the VFLR indication because
      the virtual function undergoes a VFLR event. If we don't indicate that
      it is complete it can cause an issue when VFs are re-enabled due to
      a "phantom" VFLR.
      
      The register block read was added under a pci_vfs_assigned check
      originally. This was done because we added the check after calling
      pci_disable_sriov. This was later moved to disable SRIOV earlier in the
      flow so that the VF drivers could be torn down before we removed
      functionality.
      
      Move the VFLR acknowledge into the main loop that tears down VF
      resources. This avoids using the tmp value for iterating over VFs
      multiple times. The result will make it easier to refactor the VF array
      in a future change.
      
      It's possible we might want to modify this flow to also stop checking
      pci_vfs_assigned. However, it seems reasonable to keep this change: we
      should only clear the VFLR if we actually disabled SR-IOV.
      Signed-off-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Tested-by: default avatarKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      44efe75f
    • Jacob Keller's avatar
      ice: move clear_malvf call in ice_free_vfs · 294627a6
      Jacob Keller authored
      The ice_mbx_clear_malvf function is used to clear the indication and
      count of how many times a VF was detected as malicious. During
      ice_free_vfs, we use this function to ensure that all removed VFs are
      reset to a clean state.
      
      The call currently is done at the end of ice_free_vfs() using a tmp
      value to iterate over all of the entries in the bitmap.
      
      This separate iteration using tmp is problematic for a planned refactor
      of the VF array data structure. To avoid this, lets move the call
      slightly higher into the function inside the loop where we teardown all
      of the VFs. This avoids one use of the tmp value used for iteration.
      We'll fix the other user in a future change.
      Signed-off-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Tested-by: default avatarKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      294627a6
    • Jacob Keller's avatar
      ice: pass num_vfs to ice_set_per_vf_res() · cd0f4f3b
      Jacob Keller authored
      We are planning to replace the simple array structure tracking VFs with
      a hash table. This change will also remove the "num_alloc_vfs" variable.
      
      Instead, new access functions to use the hash table as the source of
      truth will be introduced. These will generally be equivalent to existing
      checks, except during VF initialization.
      
      Specifically, ice_set_per_vf_res() cannot use the hash table as it will
      be operating prior to VF structures being inserted into the hash table.
      
      Instead of using pf->num_alloc_vfs, simply pass the num_vfs value in
      from the caller.
      
      Note that a sub-function of ice_set_per_vf_res, ice_determine_res, also
      implicitly depends on pf->num_alloc_vfs. Replace ice_determine_res with
      a simpler inline implementation based on rounddown_pow_of_two. Note that
      we must explicitly check that the argument is non-zero since it does not
      play well with zero as a value.
      
      Instead of using the function and while loop, simply calculate the
      number of queues we have available by dividing by num_vfs. Check if the
      desired queues are available. If not, round down to the nearest power of
      2 that fits within our available queues.
      
      This matches the behavior of ice_determine_res but is easier to follow
      as simple in-line logic. Remove ice_determine_res entirely.
      
      With this change, we no longer depend on the pf->num_alloc_vfs during
      the initialization phase of VFs. This will allow us to safely remove it
      in a future planned refactor of the VF data structures.
      Signed-off-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Tested-by: default avatarKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      cd0f4f3b
    • Jacob Keller's avatar
      ice: store VF pointer instead of VF ID · b03d519d
      Jacob Keller authored
      The VSI structure contains a vf_id field used to associate a VSI with a
      VF. This is used mainly for ICE_VSI_VF as well as partially for
      ICE_VSI_CTRL associated with the VFs.
      
      This API was designed with the idea that VFs are stored in a simple
      array that was expected to be static throughout most of the driver's
      life.
      
      We plan on refactoring VF storage in a few key ways:
      
        1) converting from a simple static array to a hash table
        2) using krefs to track VF references obtained from the hash table
        3) use RCU to delay release of VF memory until after all references
           are dropped
      
      This is motivated by the goal to ensure that the lifetime of VF
      structures is accounted for, and prevent various use-after-free bugs.
      
      With the existing vsi->vf_id, the reference tracking for VFs would
      become somewhat convoluted, because each VSI maintains a vf_id field
      which will then require performing a look up. This means all these flows
      will require reference tracking and proper usage of rcu_read_lock, etc.
      
      We know that the VF VSI will always be backed by a valid VF structure,
      because the VSI is created during VF initialization and removed before
      the VF is destroyed. Rely on this and store a reference to the VF in the
      VSI structure instead of storing a VF ID. This will simplify the usage
      and avoid the need to perform lookups on the hash table in the future.
      
      For ICE_VSI_VF, it is expected that vsi->vf is always non-NULL after
      ice_vsi_alloc succeeds. Because of this, use WARN_ON when checking if a
      vsi->vf pointer is valid when dealing with VF VSIs. This will aid in
      debugging code which violates this assumption and avoid more disastrous
      panics.
      Signed-off-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Tested-by: default avatarKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      b03d519d
    • Jacob Keller's avatar
      ice: refactor unwind cleanup in eswitch mode · df830543
      Jacob Keller authored
      The code for supporting eswitch mode and port representors on VFs uses
      an unwind based cleanup flow when handling errors.
      
      These flows are used to cleanup and get everything back to the state
      prior to attempting to switch from legacy to representor mode or back.
      
      The unwind iterations make sense, but complicate a plan to refactor the
      VF array structure. In the future we won't have a clean method of
      reversing an iteration of the VFs.
      
      Instead, we can change the cleanup flow to just iterate over all VF
      structures and clean up appropriately.
      
      First notice that ice_repr_add_for_all_vfs and ice_repr_rem_from_all_vfs
      have an additional step of re-assigning the VC ops. There is no good
      reason to do this outside of ice_repr_add and ice_repr_rem. It can
      simply be done as the last step of these functions.
      
      Second, make sure ice_repr_rem is safe to call on a VF which does not
      have a representor. Check if vf->repr is NULL first and exit early if
      so.
      
      Move ice_repr_rem_from_all_vfs above ice_repr_add_for_all_vfs so that we
      can call it from the cleanup function.
      
      In ice_eswitch.c, replace the unwind iteration with a call to
      ice_eswitch_release_reprs. This will go through all of the VFs and
      revert the VF back to the standard model without the eswitch mode.
      
      To make this safe, ensure this function checks whether or not the
      represent or has been moved. Rely on the metadata destination in
      vf->repr->dst. This must be NULL if the representor has not been moved
      to eswitch mode.
      
      Ensure that we always re-assign this value back to NULL after freeing
      it, and move the ice_eswitch_release_reprs so that it can be called from
      the setup function.
      
      With these changes, eswitch cleanup no longer uses an unwind flow that
      is problematic for the planned VF data structure change.
      Signed-off-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Tested-by: default avatarSandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      df830543
    • Vladimir Oltean's avatar
      net: dsa: make dsa_tree_change_tag_proto actually unwind the tag proto change · e1bec7fa
      Vladimir Oltean authored
      The blamed commit said one thing but did another. It explains that we
      should restore the "return err" to the original "goto out_unwind_tagger",
      but instead it replaced it with "goto out_unlock".
      
      When DSA_NOTIFIER_TAG_PROTO fails after the first switch of a
      multi-switch tree, the switches would end up not using the same tagging
      protocol.
      
      Fixes: 0b0e2ff1 ("net: dsa: restore error path of dsa_tree_change_tag_proto")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20220303154249.1854436-1-vladimir.oltean@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      e1bec7fa
    • Maciej Fijalkowski's avatar
      ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc() · 6c7273a2
      Maciej Fijalkowski authored
      Commit c685c69f ("ixgbe: don't do any AF_XDP zero-copy transmit if
      netif is not OK") addressed the ring transient state when
      MEM_TYPE_XSK_BUFF_POOL was being configured which in turn caused the
      interface to through down/up. Maurice reported that when carrier is not
      ok and xsk_pool is present on ring pair, ksoftirqd will consume 100% CPU
      cycles due to the constant NAPI rescheduling as ixgbe_poll() states that
      there is still some work to be done.
      
      To fix this, do not set work_done to false for a !netif_carrier_ok().
      
      Fixes: c685c69f ("ixgbe: don't do any AF_XDP zero-copy transmit if netif is not OK")
      Reported-by: default avatarMaurice Baijens <maurice.baijens@ellips.com>
      Tested-by: default avatarMaurice Baijens <maurice.baijens@ellips.com>
      Signed-off-by: default avatarMaciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: default avatarSandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      6c7273a2
    • Jakub Kicinski's avatar
      Merge branch 'selftests-mlxsw-a-couple-of-fixes' · 312f2d50
      Jakub Kicinski authored
      Ido Schimmel says:
      
      ====================
      selftests: mlxsw: A couple of fixes
      
      Patch #1 fixes a breakage due to a change in iproute2 output. The real
      problem is not iproute2, but the fact that the check was not strict
      enough. Fixed by using JSON output instead. Targeting at net so that the
      test will pass as part of old and new kernels regardless of iproute2
      version.
      
      Patch #2 fixes an issue uncovered by the first one.
      ====================
      
      Link: https://lore.kernel.org/r/20220302161447.217447-1-idosch@nvidia.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      312f2d50
    • Amit Cohen's avatar
      selftests: mlxsw: resource_scale: Fix return value · 196f9bc0
      Amit Cohen authored
      The test runs several test cases and is supposed to return an error in
      case at least one of them failed.
      
      Currently, the check of the return value of each test case is in the
      wrong place, which can result in the wrong return value. For example:
      
       # TESTS='tc_police' ./resource_scale.sh
       TEST: 'tc_police' [default] 968                                     [FAIL]
               tc police offload count failed
       Error: mlxsw_spectrum: Failed to allocate policer index.
       We have an error talking to the kernel
       Command failed /tmp/tmp.i7Oc5HwmXY:969
       TEST: 'tc_police' [default] overflow 969                            [ OK ]
       ...
       TEST: 'tc_police' [ipv4_max] overflow 969                           [ OK ]
      
       $ echo $?
       0
      
      Fix this by moving the check to be done after each test case.
      
      Fixes: 059b18e2 ("selftests: mlxsw: Return correct error code in resource scale test")
      Signed-off-by: default avatarAmit Cohen <amcohen@nvidia.com>
      Reviewed-by: default avatarPetr Machata <petrm@nvidia.com>
      Signed-off-by: default avatarIdo Schimmel <idosch@nvidia.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      196f9bc0
    • Amit Cohen's avatar
      selftests: mlxsw: tc_police_scale: Make test more robust · dc975207
      Amit Cohen authored
      The test adds tc filters and checks how many of them were offloaded by
      grepping for 'in_hw'.
      
      iproute2 commit f4cd4f127047 ("tc: add skip_hw and skip_sw to control
      action offload") added offload indication to tc actions, producing the
      following output:
      
       $ tc filter show dev swp2 ingress
       ...
       filter protocol ipv6 pref 1000 flower chain 0 handle 0x7c0
         eth_type ipv6
         dst_ip 2001:db8:1::7bf
         skip_sw
         in_hw in_hw_count 1
               action order 1:  police 0x7c0 rate 10Mbit burst 100Kb mtu 2Kb action drop overhead 0b
               ref 1 bind 1
               not_in_hw
               used_hw_stats immediate
      
      The current grep expression matches on both 'in_hw' and 'not_in_hw',
      resulting in incorrect results.
      
      Fix that by using JSON output instead.
      
      Fixes: 5061e773 ("selftests: mlxsw: Add scale test for tc-police")
      Signed-off-by: default avatarAmit Cohen <amcohen@nvidia.com>
      Reviewed-by: default avatarPetr Machata <petrm@nvidia.com>
      Signed-off-by: default avatarIdo Schimmel <idosch@nvidia.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      dc975207
    • Vladimir Oltean's avatar
      net: dcb: disable softirqs in dcbnl_flush_dev() · 10b6bb62
      Vladimir Oltean authored
      Ido Schimmel points out that since commit 52cff74e ("dcbnl : Disable
      software interrupts before taking dcb_lock"), the DCB API can be called
      by drivers from softirq context.
      
      One such in-tree example is the chelsio cxgb4 driver:
      dcb_rpl
      -> cxgb4_dcb_handle_fw_update
         -> dcb_ieee_setapp
      
      If the firmware for this driver happened to send an event which resulted
      in a call to dcb_ieee_setapp() at the exact same time as another
      DCB-enabled interface was unregistering on the same CPU, the softirq
      would deadlock, because the interrupted process was already holding the
      dcb_lock in dcbnl_flush_dev().
      
      Fix this unlikely event by using spin_lock_bh() in dcbnl_flush_dev() as
      in the rest of the dcbnl code.
      
      Fixes: 91b0383f ("net: dcb: flush lingering app table entries for unregistered devices")
      Reported-by: default avatarIdo Schimmel <idosch@idosch.org>
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20220302193939.1368823-1-vladimir.oltean@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      10b6bb62
    • Christophe JAILLET's avatar
      bnx2: Fix an error message · 8ccffe9a
      Christophe JAILLET authored
      Fix an error message and report the correct failing function.
      Signed-off-by: default avatarChristophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8ccffe9a
    • David S. Miller's avatar
      Merge branch 'ptp-ocp-next' · 25bf4df4
      David S. Miller authored
      Jonathan Lemon says:
      
      ====================
      ptp: ocp: TOD and monitoring updates
      
      Add a series of patches for monitoring the status of the
      driver and adjusting TOD handling, especially around leap seconds.
      
      Add documentation for the new sysfs nodes.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      25bf4df4
    • Jonathan Lemon's avatar
      docs: ABI: Document new timecard sysfs nodes. · 4db07317
      Jonathan Lemon authored
      Add documentation for the tod_correction, clock_status_drift,
      and clock_status_offset nodes.
      Signed-off-by: default avatarJonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4db07317
    • Vadim Fedorenko's avatar
      ptp: ocp: adjust utc_tai_offset to TOD info · e68462a0
      Vadim Fedorenko authored
      utc_tai_offset is used to correct IRIG, DCF and NMEA outputs and is
      set during initialisation but is not corrected during leap second
      announce event.  Add watchdog code to control this correction.
      Signed-off-by: default avatarVadim Fedorenko <vadfed@fb.com>
      Signed-off-by: default avatarJonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e68462a0
    • Vadim Fedorenko's avatar
      ptp: ocp: add tod_correction attribute · 44a412d1
      Vadim Fedorenko authored
      TOD correction register is used to compensate for leap seconds in
      different domains.  Export it as an attribute with write access.
      Signed-off-by: default avatarVadim Fedorenko <vadfed@fb.com>
      Signed-off-by: default avatarJonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      44a412d1
    • Vadim Fedorenko's avatar
      ptp: ocp: Expose clock status drift and offset · 2f23f486
      Vadim Fedorenko authored
      Monitoring of clock variance could be done through checking
      the offset and the drift updates that are applied to atomic
      clocks.  Expose these values as attributes for the timecard.
      Signed-off-by: default avatarVadim Fedorenko <vadfed@fb.com>
      Signed-off-by: default avatarJonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2f23f486
    • Vadim Fedorenko's avatar
      ptp: ocp: add TOD debug information · 9f492c4c
      Vadim Fedorenko authored
      TOD information is currently displayed only on module load,
      which doesn't provide updated information as the system runs.
      
      Create a debug file which provides the current TOD status information,
      and move the information display there.
      Signed-off-by: default avatarVadim Fedorenko <vadfed@fb.com>
      Signed-off-by: default avatarJonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9f492c4c