1. 28 Mar, 2017 19 commits
    • net/mlx5e: Add offloading of NIC TC pedit (header re-write) actions · 2f4fe4ca
      Or Gerlitz authored
      This includes calling the parsing code that translates from pedit
      speak to the HW API, allocation (deallocation) of a modify header
      context and setting the modify header id associated with this
      context to the FTE of that flow.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Add parsing of TC pedit actions to HW format · d79b6df6
      Or Gerlitz authored
      Parse/translate a set of TC pedit actions into the HW API format.
      
      User-space provides a set of keys, each made of: command (add or set),
      header-type, and byte offset within that header, along with a 32 bit mask and value.
      
      The mask dictates which bits of the 32 bit word starting at the offset we should
      be dealing with, but under negative polarity (unset bits are to be modified).
      
      We do a 1st pass over the set of keys while using the header-type and offset to
      fill the masks and the values into a data-structure containing all the
      supported network headers.
      
      We then do a 2nd pass over the set of re-write fields supported by the HW,
      where for each such candidate field, we use the masks filled on the 1st pass to
      decide whether to offload its re-write.
      
      In case offloading is required, we fill a HW descriptor with the following:
      
      (1) the header field to modify
      (2) the bit offset within the field from where to modify (set command only)
      (3) the value to set/add
      (4) the length in bits 1...32 to modify (set command only)
      
      Note that it's possible for a given pedit mask to dictate modifying the
      same header field multiple times or to modify multiple header fields.
      Currently such combinations are not supported for offloading, hence, for set
      commands, the offset within the field is always zero, and the length to modify
      is the field size.
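
The two passes described above can be sketched in plain user-space C. This is only an illustration under stated assumptions: every type, constant and function name below (`pedit_key`, `hdr_image`, `hw_fields`, `parse_pedit`) is hypothetical and not taken from the kernel or the mlx5 driver, the field table is a made-up subset, and rejecting partial-field masks outright is one possible reading of "such combinations are not supported".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names here are hypothetical stand-ins, not kernel or driver symbols. */
enum cmd { CMD_SET, CMD_ADD, CMD_MAX };
enum hdr { HDR_ETH, HDR_IP4, HDR_MAX };
#define HDR_BYTES 20u

struct pedit_key {
	enum cmd cmd;
	enum hdr htype;
	uint32_t offset; /* byte offset of the 32 bit word within the header */
	uint32_t mask;   /* negative polarity: 0 bits are the ones to modify */
	uint32_t val;    /* most significant byte is the first byte on the wire */
};

/* Per-header byte images filled by the 1st pass (mask: 1 bits = modify). */
struct hdr_image { uint8_t mask[HDR_BYTES]; uint8_t val[HDR_BYTES]; };

/* Fields the imaginary HW can rewrite: id, header, byte offset, byte size. */
struct hw_field { int id; enum hdr htype; uint32_t offset; uint32_t size; };
static const struct hw_field hw_fields[] = {
	{ 1, HDR_ETH, 12, 2 }, /* ethertype */
	{ 2, HDR_IP4,  8, 1 }, /* TTL */
	{ 3, HDR_IP4, 12, 4 }, /* source address */
	{ 4, HDR_IP4, 16, 4 }, /* destination address */
};
#define NUM_FIELDS (sizeof(hw_fields) / sizeof(hw_fields[0]))

struct hw_action { enum cmd cmd; int field; uint32_t bit_offset, length, data; };

/* Returns the number of descriptors written, or -1 if not offloadable. */
int parse_pedit(const struct pedit_key *keys, int nkeys,
		struct hw_action *out, int max_out)
{
	struct hdr_image img[CMD_MAX][HDR_MAX];
	int n = 0;

	memset(img, 0, sizeof(img));

	/* 1st pass: scatter each key's word into the byte images. */
	for (int k = 0; k < nkeys; k++) {
		struct hdr_image *h = &img[keys[k].cmd][keys[k].htype];
		uint32_t modify = ~keys[k].mask; /* flip to "1 means modify" */

		if (keys[k].offset > HDR_BYTES - 4)
			return -1;
		for (uint32_t i = 0; i < 4; i++) {
			uint32_t shift = 24 - 8 * i;

			h->mask[keys[k].offset + i] |= (uint8_t)(modify >> shift);
			h->val[keys[k].offset + i] |=
				(uint8_t)((keys[k].val & modify) >> shift);
		}
	}

	/* 2nd pass: walk the rewritable fields and emit one descriptor each. */
	for (int c = 0; c < CMD_MAX; c++) {
		for (size_t f = 0; f < NUM_FIELDS; f++) {
			const struct hw_field *fld = &hw_fields[f];
			struct hdr_image *h = &img[c][fld->htype];
			uint8_t any = 0, all = 0xff;
			uint32_t data = 0;

			for (uint32_t i = 0; i < fld->size; i++) {
				any |= h->mask[fld->offset + i];
				all &= h->mask[fld->offset + i];
				data = (data << 8) | h->val[fld->offset + i];
			}
			if (!any)
				continue; /* field not touched by any key */
			if (all != 0xff || n == max_out)
				return -1; /* partial rewrite, or out of room */
			/* Offset is always 0 and length is the full field size. */
			out[n++] = (struct hw_action){ (enum cmd)c, fld->id, 0,
						       8 * fld->size, data };
			memset(&h->mask[fld->offset], 0, fld->size);
		}
	}

	/* Bits left in any mask belong to fields the HW cannot rewrite. */
	for (int c = 0; c < CMD_MAX; c++)
		for (int t = 0; t < HDR_MAX; t++)
			for (uint32_t i = 0; i < HDR_BYTES; i++)
				if (img[c][t].mask[i])
					return -1;
	return n;
}
```

In this sketch a key with mask 0x00ffffff at IPv4 offset 8 selects only the TTL byte, so it yields a single descriptor of length 8, while a key that covers half of an address field is refused.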
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Amir Vadai <amir@vadai.me>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/sched: Add accessor functions to pedit keys for offloading drivers · ffe2e217
      Or Gerlitz authored
      HW drivers will use the header-type and command fields from the extended
      keys, and some fields (e.g. mask, val, offset) from the legacy keys.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Introduce alloc/dealloc modify header context commands · 2de24fed
      Or Gerlitz authored
      Implement the low-level commands to support packet header re-write.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Introduce modify header structures, commands and steering action definitions · 2a69cb9f
      Or Gerlitz authored
      Add the definitions related to creation/deletion of a modify header
      context and the modify header steering action which are used for HW
      packet header modify (re-write) as part of steering. Add as well the
      modify header id into two intermediate structs and set it to the FTE.
      
      Note that as the push/pop vlan steering actions are emulated by the
      eswitch management code, we're not breaking any compatibility while
      changing their values to make room for the modify header action which
      is not emulated and whose value is part of the FW API. The new bit
      values for the emulated actions are at the end of the possible range.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Reorder few command cases to reflect their natural order · a750276f
      Or Gerlitz authored
      Move the commands related to scheduling elements and vport qos to
      a suitable location (according to the MLX5_CMD_OP enum values) in
      the command string and internal error helpers.
      
      This patch doesn't change any functionality.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Add helper to initialize a flow steering actions struct instance · e753b2b5
      Or Gerlitz authored
      There are a bunch of places in the code where the intermediate struct
      that keeps the elements related to flow actions is initialized with
      the same default values. Put that into a small DECLARE type helper.
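
A DECLARE type helper of this kind might look like the sketch below. The struct layout, the macro name and the default values are assumptions made for illustration; they are not copied from the driver.

```c
#include <stdint.h>

/* Hypothetical names and default values, for illustration only. */
#define ACTION_FWD_DEST  0x4u
#define DEFAULT_FLOW_TAG 0x0u

struct flow_act {
	uint32_t action;
	uint32_t flow_tag;
	uint32_t encap_id;
};

/* Declare a local struct flow_act pre-filled with the shared defaults. */
#define DECLARE_FLOW_ACT(name)			\
	struct flow_act name = {		\
		.action = ACTION_FWD_DEST,	\
		.flow_tag = DEFAULT_FLOW_TAG,	\
		.encap_id = 0,			\
	}

/* Example caller: one line replaces the repeated open-coded initializer. */
uint32_t default_action(void)
{
	DECLARE_FLOW_ACT(act);

	return act.action;
}
```

Keeping the defaults in one macro means a new member with a non-zero default only has to be added in a single place.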
      
      This patch doesn't change any functionality.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Properly deal with resource cleanup when adding TC flow fails · aa0cbbae
      Or Gerlitz authored
      The code for adding tc fdb flows leaves things half set when it fails
      in the middle. Currently we are not leaking things (e.g. eswitch
      vlan reference, encap reference and HW resources) since the main
      code to add flower rules does a cleanup by calling mlx5e_tc_del_flow().
      
      This cleanup only works because we check there whether the HW rule
      for the flow we are attempting to delete is valid before touching it, and
      because under the current possible combinations of supported actions it's okay
      to go and blindly deref or delete all the action related resources (encap, vlan).
      
      Instead, do things properly, namely make sure that if add flow fails we
      clean up everything that was allocated or referenced. Now, the flow delete code can
      blindly deref/deallocate both the rule and the actions related resources and
      when more action combinations are introduced (such as the upcoming header
      re-write) we are fine with clear and robust code.
      
      While here, align all of nic/fdb parse actions/add flow functions to get
      mlx5e_tc_flow struct param and pick the attributes or whatever else needed
      from there.
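
The cleanup discipline described above is the usual goto-unwind pattern. The following is a minimal sketch with made-up resources and a test knob (`fail_at`) to force a failure; none of these names come from the driver.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the vlan/encap references and the HW rule. */
struct flow { bool vlan_ref; bool encap_ref; bool hw_rule; };

int fail_at; /* test knob: step number that should fail, 0 for none */

static int take_vlan(struct flow *f)
{
	if (fail_at == 1)
		return -1;
	f->vlan_ref = true;
	return 0;
}

static int take_encap(struct flow *f)
{
	if (fail_at == 2)
		return -1;
	f->encap_ref = true;
	return 0;
}

static int install_rule(struct flow *f)
{
	if (fail_at == 3)
		return -1;
	f->hw_rule = true;
	return 0;
}

/* On failure, release exactly what was taken so far, in reverse order. */
int add_flow(struct flow *f)
{
	int err;

	err = take_vlan(f);
	if (err)
		return err;
	err = take_encap(f);
	if (err)
		goto err_put_vlan;
	err = install_rule(f);
	if (err)
		goto err_put_encap;
	return 0;

err_put_encap:
	f->encap_ref = false;
err_put_vlan:
	f->vlan_ref = false;
	return err;
}

/* Only ever called for fully added flows, so no validity checks needed. */
void del_flow(struct flow *f)
{
	f->hw_rule = false;
	f->encap_ref = false;
	f->vlan_ref = false;
}
```

Because `add_flow()` never returns with a half-built flow, `del_flow()` can release everything unconditionally.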
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Add intermediate struct for TC flow parsing attributes · 17091853
      Or Gerlitz authored
      Add intermediate structure to store attributes parsed from TC filter
      matching/actions parts which are soon to be configured into the HW.
      
      Currently it holds the flow matching spec after parsing. More
      content is to be added in a downstream patch.
      
      This patch doesn't change any functionality.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Add NIC attributes for offloaded TC flows · 3bc4b7bf
      Or Gerlitz authored
      Add a structure that contains the attributes related to offloaded
      NIC flows. Currently it has the actions and flow tag.
      
      While here, do xmas tree cleanup of the TC configure function.
      
      This patch doesn't change any functionality.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Add prefix for e-switch offloaded TC flow attributes · ecf5bb79
      Or Gerlitz authored
      Add esw_ prefix to the flow attributes attached to offloaded e-switch
      TC flows. This is a pre-step to add attributes to offloaded NIC TC flows.
      
      Also, save one pointer's worth of space by using gcc's zero size array; this would
      be beneficial for environments where 100Ks (or Ms) of flows are offloaded.
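
To illustrate the pointer saving, here is a small sketch comparing a struct that points at separately allocated attributes with one that carries them as a trailing array. The struct and function names are hypothetical, and the sketch uses the C99 flexible array member (`attr[]`), which plays the same role as the gcc zero size array (`attr[0]`) mentioned above.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical attribute block; the real one holds far more. */
struct esw_attr { int action; int vlan; };

/* Variant A: attributes sit in a second allocation behind a pointer. */
struct flow_with_ptr {
	unsigned long cookie;
	struct esw_attr *attr;
};

/* Variant B: attributes trail the struct inside the same allocation. */
struct flow_with_tail {
	unsigned long cookie;
	struct esw_attr attr[]; /* gcc's older spelling is attr[0] */
};

/* One allocation covers the flow and its single attribute block. */
struct flow_with_tail *flow_alloc(void)
{
	return calloc(1, sizeof(struct flow_with_tail) +
			 sizeof(struct esw_attr));
}
```

Variant B drops the pointer member from every flow and needs one allocation instead of two, which adds up when very large numbers of flows are offloaded.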
      
      This patch doesn't change any functionality.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • Merge tag 'mlx5e-failsafe' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · cc628c96
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5e-failsafe 27-03-2017
      
      This series provides a fail-safe mechanism to allow safely re-configuring
      the mlx5e netdevice and provides resiliency against sporadic
      configuration failures.
      
      To enable this we do some refactoring and code reorganizing to allow
      breaking the driver's open/close flows into stages:
            open -> activate -> deactivate -> close.
      
      In addition we need to allow creating fresh HW ring resources
      (mlx5e_channels) with their own "new" set of parameters, while keeping
      the current ones running and active until the new channels are
      successfully created with the new configuration, and only then can we
      safely replace (switch) old channels with new ones.
      
      For that we introduce the mlx5e_channels object and an API to manage it:
       - channels = open_channels(new_params):
         open fresh TX/RX channels
       - activate_channels(channels):
         redirect traffic to them and attach them to the netdev
       - deactivate_channels(channels):
         stop traffic and detach from netdev
       - close_channels(channels):
         free the TX/RX HW resources of those channels
      
      With the above strategy it is straightforward to achieve the desired
      behavior of fail-safe configuration.  In pseudo code:
      
      make_new_config(new_params)
      {
      	old_channels = current_active_channels;
      	new_channels = open_channels(new_params);
      	if (!new_channels)
      		return "Failed, but current channels are still active :)";
      
      	deactivate_channels(old_channels); /* Can't fail */
      	set_hw_new_state();                /* If needed  */
      	activate_channels(new_channels);   /* Can't fail */
      	close_channels(old_channels);
      	current_active_channels = new_channels;
      
      	return "SUCCESS";
      }
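
The same idea as a compilable user-space sketch. The types and functions below are simplified, hypothetical stand-ins for the driver's objects; opening channels is simulated with malloc(), and a non-positive channel count stands for a configuration that fails to open.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the driver's objects. */
struct params { int num_channels; };
struct channels { struct params params; int active; };

struct channels *active_channels; /* NULL until the first configuration */

/* The only step that may fail: build fresh channels from new params. */
struct channels *open_channels(const struct params *p)
{
	struct channels *c;

	if (p->num_channels <= 0)
		return NULL; /* simulated configuration failure */
	c = malloc(sizeof(*c));
	if (!c)
		return NULL;
	c->params = *p;
	c->active = 0;
	return c;
}

void activate_channels(struct channels *c)   { c->active = 1; }
void deactivate_channels(struct channels *c) { c->active = 0; }
void close_channels(struct channels *c)      { free(c); }

/* Returns 0 on success; on failure the old channels keep running. */
int switch_config(const struct params *new_params)
{
	struct channels *old = active_channels;
	struct channels *fresh = open_channels(new_params);

	if (!fresh)
		return -1; /* nothing was torn down yet */

	if (old)
		deactivate_channels(old); /* cannot fail */
	activate_channels(fresh);         /* cannot fail */
	active_channels = fresh;
	if (old)
		close_channels(old);
	return 0;
}
```

The point of the ordering is that every step after the single fallible one cannot fail, so a bad configuration never leaves the device without working channels.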
      
      At the top of this series, we change the following flows to be fail-safe:
      ethtool:
         - ring parameters
         - coalesce parameters
         - tx copy break parameters
         - cqe compressing/moderation mode setting (priv flags)
      ndos:
         - tc setup
         - set features: LRO
         - change mtu
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bond-link-status-fixes' · 95ed0edd
      David S. Miller authored
      Mahesh Bandewar says:
      
      ====================
      link-status fixes for mii-monitoring
      
      The mii monitoring is divided into two phases - inspect and commit. The
      inspect phase technically should not make any changes to the state and
      should defer them to the commit phase. However, we detected link state
      inconsistencies on several machines and discovered that they are the
      result of some inconsistent updates to link states and the assumption
      that you *always* get rtnl-mutex. In reality, when trylock() fails to
      acquire rtnl-mutex, the commit phase is postponed until the next mii-mon
      run. In the next round, because of the state change performed in the
      previous inspect-run, no change is detected and the commit phase is
      skipped. This results in an inconsistent state until the next link event
      happens (if it ever happens).
      
      During the commit phase, it's assumed that the speed and duplex fetch
      is always successful, but that's not always the case. However, the
      slave state is marked UP irrespective of the outcome of the speed /
      duplex fetch operation. If the fetch results in insane values for either
      of these two fields, then keeping the internal link state UP is not going
      to provide fruitful results either.
      
      Please see the individual patches for more details.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: correctly update link status during mii-commit phase · b5bf0f5b
      Mahesh Bandewar authored
      bond_miimon_commit() marks the link UP after attempting to get the speed
      and duplex settings for the link. There is a possibility that
      bond_update_speed_duplex() could fail. This is another place where it
      could result in an inconsistent bonding link state.
      
      With this patch, the link is marked UP and processed further only if
      the retrieved speed and duplex values are sane.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: make speed, duplex setting consistent with link state · c4adfc82
      Mahesh Bandewar authored
      bond_update_speed_duplex() retrieves speed and duplex settings. There
      is a possibility of failure in retrieving these values but the caller has
      to assume it's always successful. This leads to having inconsistent
      slave link settings. If these (speed, duplex) values cannot be
      retrieved, then keeping the link UP causes problems.
      
      The updated bond_update_speed_duplex() returns 0 on success if it
      retrieves sane values for speed and duplex. On failure it returns 1
      and marks the link down.
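
The return convention described above, together with a commit step that only marks the link UP on success, could be sketched as follows. The struct, the constants and the simulated device query are hypothetical simplifications, not the bonding driver's code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; the real bonding structures look different. */
enum { SPEED_UNKNOWN_VAL = -1 };
enum { DUPLEX_HALF_VAL = 0, DUPLEX_FULL_VAL = 1, DUPLEX_UNKNOWN_VAL = 0xff };
enum link_state { LINK_DOWN, LINK_UP };

struct slave {
	/* What the simulated device reports when queried. */
	bool query_ok;
	int hw_speed;
	int hw_duplex;
	/* Bonding's view of the slave. */
	int speed;
	int duplex;
	enum link_state link;
};

/* Returns 0 with sane values stored; returns 1 and marks the link down. */
int update_speed_duplex(struct slave *s)
{
	bool sane = s->query_ok && s->hw_speed > 0 &&
		    (s->hw_duplex == DUPLEX_HALF_VAL ||
		     s->hw_duplex == DUPLEX_FULL_VAL);

	if (!sane) {
		s->speed = SPEED_UNKNOWN_VAL;
		s->duplex = DUPLEX_UNKNOWN_VAL;
		s->link = LINK_DOWN;
		return 1;
	}
	s->speed = s->hw_speed;
	s->duplex = s->hw_duplex;
	return 0;
}

/* Commit step: the link goes UP only if speed and duplex were fetched. */
void commit_link_up(struct slave *s)
{
	if (update_speed_duplex(s))
		return; /* link stays down, state remains consistent */
	s->link = LINK_UP;
}
```

With this shape a slave can never end up UP while still carrying unknown speed or duplex values.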
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: improve link-status update in mii-monitoring · de77ecd4
      Mahesh Bandewar authored
      The primary issue is that the mii-inspect phase updates link-state and
      expects changes to be committed during the mii-commit phase. If, after
      the inspect phase, it fails to acquire rtnl-mutex, the commit
      phase (bond_mii_commit) doesn't get to run. This partially updated
      state stays and makes the internal-state inconsistent.
      
      e.g. setup bond0 => slaves: eth1, eth2
      eth1 goes DOWN -> UP
         mii_monitor()
      	mii-inspect()
      	    bond_set_slave_link_state(eth1, UP, DontNotify)
      	rtnl_trylock() <- fails!
      
      Next mii-monitor round
      eth1: No change
         mii_monitor()
      	mii-inspect()
      	    eth1->link == current-status (ethtool_ops->get_link)
      	    no-change-detected
      
      End result:
          eth1:
            Link = BOND_LINK_UP
            Speed = 0xfffff  [SpeedUnknown]
            Duplex = 0xff    [DuplexUnknown]
      
      This doesn't always happen but for some unlucky machines in a large set
      of machines it creates problems.
      
      The fix for this is to avoid making changes during inspect phase and
      postpone them until acquiring the rtnl-mutex / invoking commit phase.
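
The fix can be pictured as a propose/commit split: inspect only records a proposal, and the committed state changes solely under the lock. The sketch below models the trylock outcome with a boolean; all names are hypothetical and much simpler than the bonding code.

```c
#include <stdbool.h>

enum link_state { LINK_DOWN, LINK_UP, LINK_NOCHANGE };

/* Hypothetical slave: committed state plus a pending proposal. */
struct slave {
	enum link_state link;     /* committed, externally visible state */
	enum link_state link_new; /* proposal from inspect, or NOCHANGE */
};

/* Inspect: record a proposal only; never touch the committed state. */
bool inspect(struct slave *s, enum link_state carrier)
{
	s->link_new = (carrier != s->link) ? carrier : LINK_NOCHANGE;
	return s->link_new != LINK_NOCHANGE;
}

/* Commit: apply the proposal; meant to run with the lock held. */
void commit(struct slave *s)
{
	if (s->link_new == LINK_NOCHANGE)
		return;
	s->link = s->link_new;
	s->link_new = LINK_NOCHANGE;
}

/* One monitoring round; lock_ok models whether the trylock succeeded. */
void monitor_round(struct slave *s, enum link_state carrier, bool lock_ok)
{
	if (!inspect(s, carrier))
		return; /* nothing to do */
	if (!lock_ok)
		return; /* retry next round; committed state is untouched */
	commit(s);
}
```

Since a failed trylock leaves `link` as it was, the next round compares against the old committed state, detects the change again, and gets another chance to commit it.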
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: split bond_set_slave_link_state into two parts · f307668b
      Mahesh Bandewar authored
      Split the function into two phases, (a) propose and (b) commit, without
      changing the semantics of the original API.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 205ed44e
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2017-03-27
      
      This series contains updates to i40e and i40evf only.
      
      Alex updates the driver code so that we can do bulk updates of the page
      reference count instead of just incrementing it by one reference at a
      time.  Fixed an issue where we were not resetting skb back to NULL when
      we have freed it.  Cleaned up the i40e_process_skb_fields() to align with
      other Intel drivers.  Removed FCoE code, since it is not supported in any
      of the Fortville/Fortpark hardware, so there is not much point in carrying
      the code around, especially if it is broken and untested.
      
      Harshitha fixes a bug in the driver where the calculation of the RSS size
      was not taking into account the number of traffic classes enabled.
      
      Robert fixes a potential race condition during VF reset, eliminating
      IOMMU DMAR faults caused by VF hardware when the OS initiates a VF
      reset and we modify the VF's settings before the reset is finished.
      
      Bimmy removes a delay that is no longer needed, since it was only needed
      for preproduction hardware.
      
      Colin King fixes a null pointer dereference, where VSI was being
      dereferenced before the VSI NULL check.
      
      Jake fixes an issue with the recent addition of the "client code" to the
      driver, where we attempt to use an uninitialized variable, so correctly
      initialize the params variable by calling i40e_client_get_params().
      
      v2: dropped patch 5 of the original series from Carolyn since we need
          more documentation and a rationale for the added delay, so Carolyn is
          taking the time to update the patch before we re-submit it for
          kernel inclusion.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 27 Mar, 2017 21 commits