1. 22 Aug, 2019 23 commits
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 7ee7f3e8
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2019-08-22
      
      This series contains updates to i40e driver only.
      
      Arnd Bergmann reduces stack usage that was causing warnings on
      32-bit architectures: two functions with large on-stack structures
      were being inlined together, so they are now marked noinline_for_stack
      to prevent the compiler from combining them.
      
      Mauro S. M. Rodrigues fixes an issue when reading an EEPROM from SFP
      modules that comply with SFF-8472 but do not implement the Digital
      Diagnostic Monitoring (DDM) interface for i40e.
      
      Huhai found we were not checking the return value for configuring the
      transmit ring and continuing with XDP configuration of the transmit
      ring.
      
      Beilei fixes an issue of shifting signed 32-bit integers.
      
      Sylwia adds support for "packet drop mode" to the set MAC config
      admin queue command.  This bit controls the behavior when a no-drop
      packet is blocking a TC queue.  She also adds support for persistent
      LLDP by checking the LLDP flag and reading the LLDP from the NVM when
      enabled.
      
      Adrian fixes the "recovery mode" check to take into account which device
      we are on, since x710 devices have 4 register values to check for status
      and x722 devices only have 2 register values to check.
      
      Piotr Azarewicz bumps the supported firmware API version to 1.9 which
      extends the PHY access admin queue command support.
      
      Jake makes sure the traffic class stats for a VEB are reset when the VEB
      stats are reset.
      
      Slawomir fixes a NULL pointer dereference where the VSI pointer was not
      updated before passing it to the i40e_set_vf_mac() when the VF is in a
      reset state, so wait for the reset to complete.
      
      Grzegorz removes the i40e_update_dcb_config() which was not using the
      correct NVM reads, so call i40e_init_dcb() in its place to correctly
      update the DCB configuration.
      
      Piotr Kwapulinski expands the scope of i40e_set_mac_type() since this is
      needed during probe to determine if we are in recovery mode.  Fixed the
      driver reset path when in recovery mode.
      
      Marcin fixed an issue where we were breaking out of a loop too early
      when trying to get the PHY capabilities.
      
      v2: Combined patch 7 & 9 in the original series, since both patches
          bumped firmware API version.  Also combined patches 12 & 13 in the
          original series, since one increased the scope of checking for MAC
          and the follow-on patch made use of function within the new scope.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • i40e: fix retrying in i40e_aq_get_phy_capabilities · 1b5f5d38
      Marcin Formela authored
      Fixed a bug where the driver was breaking out of the loop and
      reporting an error without retrying first.
      Signed-off-by: Marcin Formela <marcin.formela@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
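The retry behaviour described in this commit can be sketched in plain C. This is a minimal userspace illustration, not the driver code: the status codes, the retry budget and every `sketch_` name are hypothetical, and the command sender is a stand-in that fails a configurable number of times.

```c
/* Hypothetical status codes and retry budget; not the driver's real values. */
enum sketch_status { SKETCH_OK = 0, SKETCH_ERR_EAGAIN = 1, SKETCH_ERR_FATAL = 2 };
#define SKETCH_MAX_TRIES 3

static int sketch_fail_first; /* how many calls should fail transiently */
static int sketch_attempts;   /* how many calls were made */

/* Stand-in for sending the admin queue command. */
static enum sketch_status sketch_send_cmd(void)
{
	sketch_attempts++;
	if (sketch_fail_first > 0) {
		sketch_fail_first--;
		return SKETCH_ERR_EAGAIN;
	}
	return SKETCH_OK;
}

/* Retry on a transient error instead of leaving the loop on the first failure. */
static enum sketch_status sketch_get_phy_caps(void)
{
	enum sketch_status status = SKETCH_ERR_FATAL;
	int tries;

	for (tries = 0; tries < SKETCH_MAX_TRIES; tries++) {
		status = sketch_send_cmd();
		if (status == SKETCH_OK)
			break; /* success: stop retrying */
		if (status != SKETCH_ERR_EAGAIN)
			break; /* non-transient error: give up */
		/* transient error: loop around and try again */
	}
	return status;
}
```

With two transient failures queued, the third attempt succeeds; with more failures than the budget allows, the last transient error is returned after three attempts.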
    • i40e: Persistent LLDP support · 65c275e4
      Sylwia Wnuczko authored
      This patch adds a function to read NVM module data and uses it to
      read current LLDP agent configuration from NVM API version 1.8.
      Signed-off-by: Sylwia Wnuczko <sylwia.wnuczko@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: allow reset in recovery mode · a39f165d
      Piotr Kwapulinski authored
      The driver waits after issuing a reset and gives up when the reset
      takes too long. This is implemented by invoking the PF reset in a loop;
      after a defined number of unsuccessful PF reset attempts it returns an
      error. Without this patch, PF reset fails when the NIC is in recovery mode.

      Also make i40e_set_mac_type() public, which the i40e driver requires for
      recovery mode handling. Without this change, recovery mode could not be
      detected in i40e_probe().
      Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: Remove function i40e_update_dcb_config() · 541d9731
      Grzegorz Siwik authored
      This patch removes function i40e_update_dcb_config(). Instead of
      i40e_update_dcb_config() we use i40e_init_dcb(), which implements the
      correct NVM read.
      Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: Fix crash caused by stress setting of VF MAC addresses · 9889707b
      Slawomir Laba authored
      Update the VSI pointer passed to the i40e_set_vf_mac function.
      If the VF is in a reset state, the driver waits in i40e_set_vf_mac
      for the reset to complete, yet after the reset the vsi pointer
      that was passed into this function is no longer valid.

      The patch updates the local VSI pointer directly from the pf->vsi array,
      by using the id stored in the VF pointer (lan_vsi_idx).

      Without this commit the driver might occasionally trigger a general
      protection fault in the kernel and bring down the OS entirely.
      Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
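The stale-pointer problem in this commit can be illustrated with a small userspace model. Only `lan_vsi_idx` and the `pf->vsi` array come from the commit message; the structures, the reset stand-in and all `sketch_` names are assumptions made for the example.

```c
#include <stddef.h>

#define SKETCH_MAX_VSI 4

struct sketch_vsi { int id; };
struct sketch_pf { struct sketch_vsi *vsi[SKETCH_MAX_VSI]; };
struct sketch_vf { struct sketch_pf *pf; int lan_vsi_idx; int in_reset; };

static struct sketch_vsi sketch_old_vsi = { 1 };
static struct sketch_vsi sketch_new_vsi = { 2 };

/* Stand-in for waiting on a VF reset: the reset replaces the VSI object. */
static void sketch_wait_for_reset(struct sketch_vf *vf)
{
	if (vf->in_reset) {
		vf->pf->vsi[vf->lan_vsi_idx] = &sketch_new_vsi;
		vf->in_reset = 0;
	}
}

/* Returns the VSI the MAC update would operate on. */
static struct sketch_vsi *sketch_set_vf_mac(struct sketch_vf *vf)
{
	struct sketch_vsi *vsi;

	sketch_wait_for_reset(vf);
	/* Look the VSI up only after the wait, so the pointer cannot be stale. */
	vsi = vf->pf->vsi[vf->lan_vsi_idx];
	return vsi;
}
```

Fetching the pointer before the wait would leave the caller holding the old object; re-reading it through the index afterwards always yields the current one.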
    • i40e: reset veb.tc_stats when resetting veb.stats · 1e0303fd
      Jacob Keller authored
      The stats structure for the VEB switch statistics is reset periodically,
      but the tc_stats are not reset at the same time.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: Update FW API version to 1.9 · f93b3fd9
      Piotr Azarewicz authored
      The upcoming FW increments the API version to 1.9 due to extended PHY
      access AQ command support. SW is ready for that support as well.
      Signed-off-by: Piotr Azarewicz <piotr.azarewicz@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: check_recovery_mode had wrong if statement · d4256c8e
      Adrian Podlawski authored
      The function check_recovery_mode had a wrong if statement.
      Now we check the proper FWS1B register values, which indicate
      recovery mode. Recovery mode has 4 values for x710 and 2 for x722,
      which is why 6 different flags are defined in the code.
      The if statement now takes into account both the MAC type
      and the register value.
      Without those changes the driver could show the wrong state.
      Signed-off-by: Adrian Podlawski <adrian.podlawski@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
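A check that depends on both the MAC type and the register value might look like the sketch below. The commit message gives only the counts (4 codes for x710, 2 for x722); the numeric codes, the enum and all `sketch_` names here are placeholders, not the driver's definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_mac_type { SKETCH_MAC_X710, SKETCH_MAC_X722 };

/* Placeholder FWS1B codes: 4 for x710, 2 for x722 (values are assumptions). */
static const uint32_t sketch_x710_codes[] = { 0x30, 0x31, 0x32, 0x33 };
static const uint32_t sketch_x722_codes[] = { 0x0B, 0x0C };

static bool sketch_code_in(const uint32_t *codes, size_t n, uint32_t fws1b)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (codes[i] == fws1b)
			return true;
	return false;
}

/* A register value only counts as recovery mode for the matching MAC type. */
static bool sketch_check_recovery_mode(enum sketch_mac_type mac, uint32_t fws1b)
{
	switch (mac) {
	case SKETCH_MAC_X710:
		return sketch_code_in(sketch_x710_codes, 4, fws1b);
	case SKETCH_MAC_X722:
		return sketch_code_in(sketch_x722_codes, 2, fws1b);
	}
	return false;
}
```

The point of the split is that a value meaning "recovery" on one device family is not treated as such on the other.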
    • i40e: Add drop mode parameter to set mac config · d802c760
      Sylwia Wnuczko authored
      This patch adds "drop mode" parameter to set mac config AQ command.
      This bit controls the behavior when a no-drop packet is blocking a TC
      queue.
      0 – The PF driver is notified.
      1 – The blocking packet is dropped and then the PF driver is notified.
      Signed-off-by: Sylwia Wnuczko <sylwia.wnuczko@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: fix shifts of signed values · fb598262
      Beilei Xing authored
      This patch fixes the following error reported by cppcheck:
      (error) Shifting signed 32-bit value by 31 bits is undefined behaviour
      Signed-off-by: Beilei Xing <beilei.xing@intel.com>
      Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
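The class of error cppcheck reports here is easy to reproduce outside the driver. With a 32-bit `int`, `1 << 31` shifts into the sign bit and is undefined; making the shifted operand unsigned avoids that. The macro and function names below are illustrative only and do not reflect the actual lines changed by the patch.

```c
#include <stdint.h>

/*
 * Undefined behaviour when int is 32 bits wide, because the shift
 * moves a 1 into the sign bit of a signed value:
 *   #define SKETCH_BIT31_BAD (1 << 31)
 */

/* Unsigned operand: well defined for n in 0..31. */
#define SKETCH_BIT(n) (1U << (n))

static uint32_t sketch_set_top_bit(uint32_t reg)
{
	return reg | SKETCH_BIT(31);
}
```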
    • i40e: add check on i40e_configure_tx_ring() return value · 408bfc38
      huhai authored
      When i40e_configure_tx_ring(vsi->tx_rings[i]) returns an error, we should
      exit from i40e_vsi_configure_tx and return the error, instead of continuing
      to check whether XDP is enabled and configuring the XDP transmit ring.
      Signed-off-by: huhai <huhai@kylinos.cn>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
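The control flow this commit asks for can be sketched as follows. It is a simplified userspace model under stated assumptions: the ring structure, the failure flag and the `sketch_` functions are invented for the example, and only the idea (return the error before touching the XDP rings) comes from the commit message.

```c
struct sketch_ring { int fail; int configured; };

/* Stand-in for configuring one transmit ring; returns 0 or a negative error. */
static int sketch_configure_tx_ring(struct sketch_ring *ring)
{
	if (ring->fail)
		return -1;
	ring->configured = 1;
	return 0;
}

static int sketch_vsi_configure_tx(struct sketch_ring *tx, struct sketch_ring *xdp,
				   int n, int xdp_enabled)
{
	int err = 0;
	int i;

	for (i = 0; i < n && !err; i++)
		err = sketch_configure_tx_ring(&tx[i]);

	/* Stop here on error; do not go on to the XDP rings. */
	if (err || !xdp_enabled)
		return err;

	for (i = 0; i < n && !err; i++)
		err = sketch_configure_tx_ring(&xdp[i]);

	return err;
}
```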
    • i40e: Check if transceiver implements DDM before access · bc6c1eaa
      Mauro S. M. Rodrigues authored
      Similar to the ixgbe issue fixed in:
      655c9141 ("ixgbe: Check DDM existence in transceiver before access")

      i40e has the same issue when reading the EEPROM from SFP modules that comply
      with SFF-8472 but do not implement the Digital Diagnostic Monitoring (DDM)
      interface described in it. The existence of this area is indicated by bit
      6 of byte 92, which is set to 1 if implemented.

      Without this patch, because this bit is not checked, i40e fails to read the
      SFP module's EEPROM with the following message:

      ethtool -m enP51p1s0f0
      Cannot get Module EEPROM data: Input/output error

      This happens because it fails to read the additional 256 bytes in which the
      DDM data was assumed to exist.
      Signed-off-by: "Mauro S. M. Rodrigues" <maurosr@linux.vnet.ibm.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
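The bit test described above (bit 6 of byte 92) is small enough to show directly. The sketch assumes the module's base EEPROM page is already in a buffer and that the base area is 256 bytes long; that base length and all `SKETCH_`/`sketch_` names are assumptions, while the byte offset, the bit and the extra 256 bytes come from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DIAG_TYPE_OFFSET 92        /* byte 92 of the module EEPROM */
#define SKETCH_DDM_IMPLEMENTED  (1U << 6) /* bit 6: DDM area is implemented */
#define SKETCH_BASE_LEN         256       /* assumed size of the base area */
#define SKETCH_DDM_LEN          256       /* additional DDM bytes */

static bool sketch_has_ddm(const uint8_t *eeprom)
{
	return (eeprom[SKETCH_DIAG_TYPE_OFFSET] & SKETCH_DDM_IMPLEMENTED) != 0;
}

/* Only report the extra 256 bytes when the module says they exist. */
static size_t sketch_eeprom_len(const uint8_t *eeprom)
{
	if (sketch_has_ddm(eeprom))
		return SKETCH_BASE_LEN + SKETCH_DDM_LEN;
	return SKETCH_BASE_LEN;
}
```

A module without the bit set then advertises only the base area, so a reader never requests the DDM bytes that would fail with an I/O error.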
    • i40e: reduce stack usage in i40e_set_fc · 33b16568
      Arnd Bergmann authored
      The functions i40e_aq_get_phy_abilities_resp() and i40e_set_fc() each
      have a giant structure on the stack, which makes each one use a stack
      frame larger than 500 bytes.

      As clang decides to inline one function into the other, we get a warning
      for exceeding the frame size limit on 32-bit architectures:
      
      drivers/net/ethernet/intel/i40e/i40e_common.c:1654:23: error: stack frame size of 1116 bytes in function 'i40e_set_fc' [-Werror,-Wframe-larger-than=]
      
      When building with gcc, the inlining does not happen, but i40e_set_fc()
      calls i40e_aq_get_phy_abilities_resp() anyway, so they add up on the
      kernel stack just as much.
      
      The parts that actually use large stacks don't overlap, so make sure
      each one is a separate function, and mark them as noinline_for_stack to
      prevent the compilers from combining them again.
      
      Fixes: 0a862b43 ("i40e/i40evf: Add module_types and update_link_info")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
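The restructuring described in this commit can be imitated in userspace. The sketch assumes GCC or Clang and uses `__attribute__((noinline))` as a stand-in for the kernel's `noinline_for_stack`; the structure size and the `sketch_` functions are invented for illustration.

```c
#include <string.h>

/* Userspace stand-in for noinline_for_stack; assumes GCC or Clang. */
#define sketch_noinline_for_stack __attribute__((noinline))

struct sketch_big { unsigned char raw[512]; };

/* First large frame: lives only while the query runs. */
static sketch_noinline_for_stack int sketch_query(void)
{
	struct sketch_big resp;

	memset(&resp, 0, sizeof(resp));
	resp.raw[0] = 7;
	return resp.raw[0];
}

/* Second large frame: created after the first one has been released. */
static sketch_noinline_for_stack int sketch_apply(int ability)
{
	struct sketch_big cfg;

	memset(&cfg, 0, sizeof(cfg));
	cfg.raw[1] = (unsigned char)(ability + 1);
	return cfg.raw[1];
}

/* The caller holds neither buffer, so the two frames never add up. */
static int sketch_set_fc(void)
{
	int ability = sketch_query();

	return sketch_apply(ability);
}
```

Because the two helpers run one after the other and cannot be merged by the compiler, peak stack use is roughly one large structure rather than two.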
    • nexthops: remove redundant assignment to variable err · c76c9925
      Colin Ian King authored
      Variable err is initialized to a value that is never read and it is
      re-assigned later. The initialization is redundant and can be removed.
      
      Addresses-Coverity: ("Unused Value")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlx5-hyperv' · 8da3803d
      David S. Miller authored
      Haiyang Zhang says:
      
      ====================
      Add software backchannel and mlx5e HV VHCA stats
      
      This patch set adds paravirtual backchannel in software in pci_hyperv,
      which is required by the mlx5e driver HV VHCA stats agent.
      
      The stats agent is responsible for running a periodic rx/tx packets/bytes
      stats update.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Add mlx5e HV VHCA stats agent · cef35af3
      Eran Ben Elisha authored
      The HV VHCA stats agent is responsible for running a periodic rx/tx
      packets/bytes stats update. Currently the supported format is version
      MLX5_HV_VHCA_STATS_VERSION. Block ID 1 is dedicated to statistics data
      transfer from the VF to the PF.

      The reporter fetches the statistics data from all opened channels, fills
      a buffer with it and sends it to mlx5_hv_vhca_write_agent.

      As the stats layer should include some metadata per block (sequence and
      offset), the HV VHCA layer modifies the buffer before actually sending it
      over block 1.
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5: Add HV VHCA control agent · 29ddad43
      Eran Ben Elisha authored
      The control agent is responsible for the control block (ID 0). It should
      update the PF via this block about every capability change. In addition,
      upon block 0 invalidate, it should activate all other supported agents
      with data requests from the PF.

      Upon agent create/destroy, the invalidate callback of the control agent
      is called in order to update the PF driver about this change.

      The control agent is an integral part of HV VHCA and will be created
      and destroyed as part of the HV VHCA init/cleanup flow.
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5: Add HV VHCA infrastructure · 87175120
      Eran Ben Elisha authored
      HV VHCA is a layer which provides a PF to VF communication channel based on
      the HyperV PCI config channel. It implements Mellanox's Inter VHCA control
      communication protocol. The protocol contains a control block in order to
      pass messages between the PF and VF drivers, and data blocks in order to
      pass actual data.

      The infrastructure is agent based. Each agent is responsible for
      contiguous buffer blocks in the VHCA config space. The infrastructure
      binds agents to their blocks, and an agent can only read/write
      the buffer blocks assigned to it. Each agent provides three
      callbacks (control, invalidate, cleanup). Control is invoked when
      block-0 is invalidated with a command that concerns this agent. Invalidate
      is invoked if one of the blocks assigned to this agent was
      invalidated. Cleanup is invoked before the agent is freed in
      order to clean up all of its open resources or deferred works.
      
      Block-0 serves as the control block. All execution commands from the PF
      will be written by the PF over this block. VF will ack on those by
      writing on block-0 as well. Its format is described by struct
      mlx5_hv_vhca_control_block layout.
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
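The agent model with its three callbacks can be pictured with a small userspace sketch. Only the callback names (control, invalidate, cleanup) come from the commit message; representing an agent's blocks as a 64-bit mask, the dispatch function and every `sketch_` name are assumptions and do not mirror the mlx5 structures.

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_agent {
	uint64_t blocks; /* bitmask of blocks bound to this agent (assumed layout) */
	void (*control)(struct sketch_agent *agent, int cmd);
	void (*invalidate)(struct sketch_agent *agent, uint64_t mask);
	void (*cleanup)(struct sketch_agent *agent);
	int invalidate_calls; /* bookkeeping for the example only */
};

/* Example invalidate callback: just counts how often it was called. */
static void sketch_count_invalidate(struct sketch_agent *agent, uint64_t mask)
{
	(void)mask;
	agent->invalidate_calls++;
}

/* Notify only the agents that own at least one of the invalidated blocks. */
static void sketch_dispatch_invalidate(struct sketch_agent *agents, size_t n,
				       uint64_t mask)
{
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t hit = agents[i].blocks & mask;

		if (hit && agents[i].invalidate)
			agents[i].invalidate(&agents[i], hit);
	}
}
```

Binding each agent to its own blocks keeps the dispatch simple: an agent is never told about, and never touches, blocks it does not own.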
    • net/mlx5: Add wrappers for HyperV PCIe operations · 913d14e8
      Eran Ben Elisha authored
      Add wrapper functions for HyperV PCIe read / write /
      block_invalidate_register operations.  This will be used as an
      infrastructure in the downstream patch for software communication.
      
      This will be enabled by default if CONFIG_PCI_HYPERV_INTERFACE is set.
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • PCI: hv: Add a Hyper-V PCI interface driver for software backchannel interface · 348dd93e
      Haiyang Zhang authored
      This interface driver is a helper driver that allows other drivers to
      have a common interface with the Hyper-V PCI frontend driver.
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • PCI: hv: Add a paravirtual backchannel in software · e5d2f910
      Dexuan Cui authored
      Windows SR-IOV provides a backchannel mechanism in software for communication
      between a VF driver and a PF driver.  These "configuration blocks" are
      similar in concept to PCI configuration space, but instead of doing reads and
      writes in 32-bit chunks through a very slow path, packets of up to 128 bytes
      can be sent or received asynchronously.
      
      Nearly every SR-IOV device contains just such a communications channel in
      hardware, so using this one in software is usually optional.  Using the
      software channel, however, allows driver implementers to leverage software
      tools that fuzz the communications channel looking for vulnerabilities.
      
      The usage model for these packets puts the responsibility for reading or
      writing on the VF driver.  The VF driver sends a read or a write packet,
      indicating which "block" is being referred to by number.
      
      If the PF driver wishes to initiate communication, it can "invalidate" one or
      more of the first 64 blocks.  This invalidation is delivered via a callback
      supplied by the VF driver to this driver.
      
      No protocol is implied, except that supplied by the PF and VF drivers.
      Signed-off-by: Jake Oshins <jakeo@microsoft.com>
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
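Two limits in this description lend themselves to a short sketch: packets carry at most 128 bytes, and only the first 64 blocks can be invalidated. The code below assumes the invalidated blocks are represented as a 64-bit mask with one bit per block; that representation and all `SKETCH_`/`sketch_` names are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BLOCK_MAX_LEN  128u /* largest packet, per the description */
#define SKETCH_INVALIDATE_MAX 64u  /* only the first 64 blocks can be invalidated */

/* A read or write request must fit in a single packet. */
static bool sketch_len_ok(size_t len)
{
	return len <= SKETCH_BLOCK_MAX_LEN;
}

/* Build a mask with one bit per invalidated block; ids >= 64 are ignored. */
static uint64_t sketch_invalidate_mask(const unsigned int *ids, size_t n)
{
	uint64_t mask = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (ids[i] < SKETCH_INVALIDATE_MAX)
			mask |= (uint64_t)1 << ids[i];
	return mask;
}
```

A VF driver's invalidation callback could then test individual bits of such a mask to decide which blocks to re-read.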
    • Merge tag 'mlx5-updates-2019-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · fed07ef3
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5 tc flow handling for concurrent execution (Part 3)
      
      This series includes updates to mlx5 ethernet and core driver:
      
      Vlad submits part 3 of 3 part series to allow TC flow handling
      for concurrent execution.
      
      Vlad says:
      ==========
      
      Structure mlx5e_neigh_hash_entry and the code that uses it are refactored
      in the following ways:
      
      - Extend neigh_hash_entry with rcu and modify its users to always take
        reference to the structure when using it (neigh_hash_entry has already
        had atomic reference counter which was only used when scheduling neigh
        update on workqueue from atomic context of neigh update netevent).
      
      - Always use mlx5e_neigh_update_table->encap_lock when modifying neigh
        update hash table and list. Originally, this lock was only used to
        synchronize with netevent handler function, which is called from bh
        context and cannot use rtnl lock for synchronization. Use rcu read lock
        instead of encap_lock to look up nhe in atomic context of netevent event
        handler function. Convert encap_lock to mutex to allow creating new
        neigh hash entries while holding it, which is safe to do because the
        lock is no longer used in atomic context.
      
      - Rcu-ify mlx5e_neigh_hash_entry->encap_list by changing operations on
        encap list to their rcu counterparts and extending encap structure
        with rcu_head to free the encap instances after rcu grace period. This
        allows fast traversal of list of encaps attached to nhe under rcu read
        lock protection.
      
      - Take encap_table_lock when accessing encap entries in neigh update and
        neigh stats update code to protect from concurrent encap entry
        insertion or removal.
      
      This approach leads to potential race conditions in which neigh update and
      neigh stats update code can access encap and flow entries that are not
      fully initialized or are being destroyed, or neigh can change state
      without updating encaps that are created concurrently. Prevent these
      issues by the following changes in flow and encap initialization:
      
      - Extend mlx5e_tc_flow with 'init_done' completion. Modify neigh update
        to wait for both encap and flow completions to prevent concurrent
        access to a structure that is being initialized by tc.
      
      - Skip structures that failed during initialization: encaps with
        encap_id<0 and flows that don't have OFFLOADED flag set.
      
      - To ensure that no new flows are added to encap when it is being
        accessed by neigh update or neigh stats update, take encap_table_lock
        mutex.
      
      - To prevent concurrent deletion by tc, ensure that neigh update and
        neigh stats update hold references to encap and flow instances while
        using them.
      
      With changes presented in this patch set it is now safe to execute tc
      concurrently with neigh update and neigh stats update. However, these
      two workqueue tasks modify same flow "tmp_list" field to store flows
      with reference taken in temporary list to release the references after
      update operation finishes and should not be executed concurrently with
      each other.
      
      Last 3 patches of this series provide 3 new mlx5 trace points to track
      mlx5 tc requests and mlx5 neigh updates.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
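The rule that users "always take reference to the structure when using it" is commonly paired with a take-unless-zero operation, so a lookup cannot revive an entry that is already being freed. The sketch below shows that idea with C11 atomics in userspace; it is not the mlx5 code, which would use the kernel's own reference counting and RCU primitives, and all `sketch_` names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_nhe {
	atomic_int refcnt; /* 0 means the entry is on its way to being freed */
};

/* Take a reference only if the entry is still live. */
static bool sketch_nhe_get(struct sketch_nhe *nhe)
{
	int old = atomic_load(&nhe->refcnt);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&nhe->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller released the last one. */
static bool sketch_nhe_put(struct sketch_nhe *nhe)
{
	return atomic_fetch_sub(&nhe->refcnt, 1) == 1;
}
```

A reader that finds an entry in the hash table would call the get helper and skip the entry if it returns false, then call the put helper when finished.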
  2. 21 Aug, 2019 17 commits