1. 04 Sep, 2019 8 commits
    • dpaa2-eth: Minor refactoring in ethtool stats · ae90a6f0
      Ioana Radulescu authored
      As we prepare to read more pages from the DPNI stat counters,
      reorganize the code a bit to make it easier to extend.
      Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 2c1f9e26
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      100GbE Intel Wired LAN Driver Updates 2019-09-03
      
      This series contains updates to ice driver only.
      
      Anirudh adds the ability for the driver to handle EMP resets correctly
      by adding the logic to the existing ice_reset_subtask().
      
      Jeb fixes up the logic to properly free up the resources for a switch
      rule whether or not it was successful in the removal.
      
      Brett fixes up the reporting of ITR values to let the user know odd ITR
      values are not allowed.  Fixes the driver to only disable VLAN pruning
      on VLAN deletion when the VLAN being deleted is the last VLAN on the VF
      VSI.
      
      Chinh updates the driver to determine the TSA value from the priority
      value when in CEE mode.
      
      Bruce aligns the driver with the hardware specification by ensuring that
      a PF reset is done as part of the unload logic.  Also update the driver
      unloading field, based on the latest hardware specification, which
      allows us to remove an unnecessary endian conversion.  Moves #defines
      based on their need in the code.
      
      Jesse adds the current state of auto-negotiation in the link up message.
      In addition, adds additional information to inform the user of an issue
      with the topology/configuration of the link.
      
      Usha updates the driver to allow the maximum TCs that the firmware
      supports, rather than hard coding to a set value.
      
      Dave updates the DCB initialization flow to handle the case of an actual
      error during DCB init.  Updated the driver to report the current stats,
      even when the netdev is down, which aligns with our other drivers.
      
      Mitch fixes the VF reset code flows to ensure that it properly calls
      ice_dis_vsi_txq() to notify the firmware that the VF is being reset.
      
      Michal fixes the driver so the DCB is not enabled when the SW LLDP is
      activated, which was causing a communication issue with other NICs.  The
      problem lies in that DCB was being enabled without checking the number
      of TCs.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5-updates-2019-09-01-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 94810bd3
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2019-09-01  (Software steering support)
      
      Abstract:
      --------
      Mellanox ConnectX devices support packet matching, packet modification and
      redirection. These functionalities are also referred to as flow-steering.
      To configure a steering rule, the rule is written to the device owned
      memory. This memory is accessed and cached by the device when processing
      a packet.
      Steering rules are constructed from multiple steering entries (STE).
      
      Rules are configured using the Firmware command interface. The Firmware
      processes the given driver command and translates it to STEs, then
      writes them to the device memory in the current steering tables.
      This process is slow due to the architecture of the command interface and
      the processing complexity of each rule.
      
      The highlight of this patchset is to cut out the middleman (the firmware)
      and program steering rules into the device directly from the driver, with
      no firmware intervention whatsoever.
      
      Motivation:
      -----------
      Software (driver managed) steering allows for high rule insertion rates
      compared to the FW steering described above. This is achieved by using
      internal RDMA writes to the device owned memory instead of the slow
      command interface to program steering rules.

      Software (driver managed) steering doesn't depend on new FW
      for new steering functionality; new implementations can be done in the
      driver, skipping the FW layer.
      
      Performance:
      ------------
      On a single core, the new approach allows programming ~300K rules per
      sec. (Measured via a direct raw test against the new mlx5 sw steering
      layer, without any kernel layer involved.)
      
      Test: TC L2 rules
      33K/s with Software steering (this patchset).
      5K/s  with FW and current driver.
      This will improve OVS based solution performance.
      
      Architecture and implementation details:
      ----------------------------------------
      Software steering will be dynamically selected via devlink device
      parameter. Example:
      $ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
                pci/0000:06:00.0:
                name flow_steering_mode type driver-specific
                values:
                   cmode runtime value smfs
      
      The mlx5 software steering module, a.k.a. DR (Direct Rule), is implemented
      and contained in the mlx5/core/steering directory and controlled by the
      MLX5_SW_STEERING kconfig flag.

      The mlx5 core steering layer (fs_core) already provides a shim layer for
      implementing different steering mechanisms. Software steering
      leverages that, as seen at the end of this series.
      
      When Software Steering is supported for a specific steering domain
      (NIC/RDMA/Vport/ESwitch, etc.), rules targeting this domain are
      created using SW steering instead of FW.
      
      The implementation includes:
      Domain - The steering domain is the object in which all other objects
          reside. It holds the memory allocator, send engine, locks and other
          shared data needed by lower objects such as table, matcher, rule and
          action. Each domain can contain multiple tables. A domain is
          equivalent to the namespaces (e.g. NIC/RDMA/Vport/ESwitch, etc.)
          currently implemented in mlx5_core fs_core (flow steering core).
      
      Table - Table objects hold multiple matchers. Each table has a level,
          which is used to prevent processing loops. Packets are directed to
          a table once it is set as the root table; this is done by fs_core
          using a FW command. A packet is processed inside the table matcher
          by matcher until a successful hit; otherwise the packet performs
          the default action.
      
      Matcher - Matcher objects specify the field mask used for matching
          when processing a packet. A matcher belongs to a table. Each
          matcher can hold multiple rules, each rule with different matching
          values corresponding to the matcher mask. Each matcher has a
          priority that determines the rule processing order inside the table.
      
      Action - Action objects are created to specify different steering actions
          such as count, reformat (encapsulate, decapsulate, ...), modify
          header, forward to table and many other actions. When creating a rule
          a sequence of actions can be provided to be executed on a successful
          match.
      
      Rule - Rule objects are used to specify a specific match on packets as
          well as the actions that should be executed. A rule belongs to a
          matcher.
      
      STE - This layer holds the specific STE format for the device and
          converts the requested rule to STEs. Each rule is constructed from
          an STE chain, and multiple rules form a steering graph. Each node in
          the graph is a hash table containing multiple STEs. The index of each
          STE in the hash table is calculated using a CRC32 hash function.
      
      Memory pool - Used for managing and caching device owned memory for rule
          insertion. The memory is allocated using the DM (device memory) API.

      Communication with device - Layer for standard RDMA operations using an
          RC QP to configure the device steering.
      
      Command utility - This module holds all of the FW commands that are
          required for SW steering to function.
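
The relationship between tables, matchers and rules described above can be sketched roughly as follows. This is only an illustration in C, not the mlx5 implementation: every type and function name here (`sketch_table`, `sketch_matcher`, `sketch_rule`, `sketch_table_process`) is hypothetical, the packet is reduced to a single 32-bit field, matchers are assumed to be stored in priority order, and a linear scan stands in for the CRC32-indexed STE hash tables.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical actions; the real layer supports many more. */
enum sketch_action { SKETCH_DEFAULT, SKETCH_DROP, SKETCH_FORWARD };

/* A rule carries the match value (already masked) and its action. */
struct sketch_rule {
	uint32_t value;
	enum sketch_action action;
};

/* A matcher owns one mask and the rules that share it. */
struct sketch_matcher {
	uint32_t mask;
	const struct sketch_rule *rules;
	size_t num_rules;
};

/* A table holds matchers; array order is assumed to be priority order. */
struct sketch_table {
	const struct sketch_matcher *matchers;
	size_t num_matchers;
};

/*
 * Walk the table matcher by matcher until a rule hits.
 * If nothing hits, the packet takes the default action.
 */
static enum sketch_action
sketch_table_process(const struct sketch_table *tbl, uint32_t pkt_fields)
{
	for (size_t m = 0; m < tbl->num_matchers; m++) {
		const struct sketch_matcher *mt = &tbl->matchers[m];
		uint32_t masked = pkt_fields & mt->mask;

		for (size_t r = 0; r < mt->num_rules; r++) {
			if (mt->rules[r].value == masked)
				return mt->rules[r].action;
		}
	}
	return SKETCH_DEFAULT;
}
```

The early return on the first hit mirrors the "matcher by matcher until a successful hit" description; the fall-through return models the default action.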
      
      Patch planning and files:
      -------------------------
      1) The first patch adds support for flow steering actions to the fs_cmd
      shim layer.

      2) The next 12 patches add one file per Software steering
      functionality/module as described above. (See patches with title: DR, *)
      
      3) Add CONFIG_MLX5_SW_STEERING for software steering support and enable
      build with the new files
      
      4) Next two patches will add the support for software steering in mlx5
      steering shim layer
      net/mlx5: Add API to set the namespace steering mode
      net/mlx5: Add direct rule fs_cmd implementation
      
      5) The last two patches add the new devlink parameter to select the mlx5
      steering mode; it is valid only in switchdev mode for now.
      Two modes are supported:
          1. DMFS - Device managed flow steering
          2. SMFS - Software/Driver managed flow steering.
      
          In the DMFS mode, the HW steering entities are created through the
          FW. In the SMFS mode these entities are created through the driver
          directly.

          The driver will use the devlink steering mode only if the steering
          domain supports it; for now SMFS manages only the switchdev
          eswitch steering domain.
      
          User command examples:
          - Set SMFS flow steering mode::
      
              $ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime
      
          - Read device flow steering mode::
      
              $ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
                pci/0000:06:00.0:
                name flow_steering_mode type driver-specific
                values:
                   cmode runtime value smfs
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ice: Only disable VLAN pruning for the VF when all VLANs are removed · cd186e51
      Brett Creeley authored
      Currently, if the VF adds a VLAN, VLAN pruning is enabled for that VSI.
      However, when a VLAN is deleted, VLAN pruning is disabled even if other
      VLAN(s) still exist for the VF. Fix this by only disabling VLAN pruning on
      the VF VSI when removing the last VLAN (i.e. vf->num_vlan == 0).
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
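
The counting logic this commit describes can be sketched as below. It is a simplified illustration, assuming a per-VF VLAN counter like the `vf->num_vlan` mentioned in the message; the struct, the `pruning_enabled` flag and both helper names are hypothetical stand-ins, not the ice driver's code.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the driver's per-VF state. */
struct sketch_vf {
	unsigned int num_vlan;  /* VLANs currently configured on the VF VSI */
	bool pruning_enabled;   /* whether VLAN pruning is active on the VSI */
};

/* Adding any VLAN enables pruning for the VSI. */
static void sketch_vf_add_vlan(struct sketch_vf *vf)
{
	vf->num_vlan++;
	vf->pruning_enabled = true;
}

/* Deleting a VLAN disables pruning only if it was the last one. */
static void sketch_vf_del_vlan(struct sketch_vf *vf)
{
	if (vf->num_vlan == 0)
		return;	/* nothing to delete */

	vf->num_vlan--;
	if (vf->num_vlan == 0)
		vf->pruning_enabled = false;
}
```

With two VLANs added, the first deletion leaves pruning enabled and only the second one turns it off.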
    • ice: Remove enable DCB when SW LLDP is activated · 03bba020
      Michal Swiatkowski authored
      Remove code that enables DCB in initialization when SW LLDP is
      activated. The DCB flag is already set or reset earlier in
      ice_init_pf_dcb based on the number of TCs, so there is no need to
      overwrite it.
      
      Setting DCB without checking number of TCs can cause communication
      problems with other cards. Host card sends packet with VLAN priority
      tag, but client card doesn't strip this tag and ping doesn't work.
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Report stats when VSI is down · 3d57fd10
      Dave Ertman authored
      There is currently a check in get_ndo_stats that
      returns before updating stats if the VSI is down
      or there are no Tx or Rx queues.  This causes the
      netdev to report zero stats when the netdev is down.
      
      Remove the check so that the behavior of reporting
      stats is the same as it was in IXGBE.
      Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Always notify FW of VF reset · 06914ac2
      Mitch Williams authored
      The call to ice_dis_vsi_txq() acts as the notification to the firmware
      that the VF is being reset. Because of this, we need to make this call
      every time we reset, regardless of whatever else we do to stop the Tx
      queues.
      
      Without this change, VF resets would fail to complete on interfaces that
      were up and running.
      Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • ice: Correctly handle return values for init DCB · 473ca574
      Dave Ertman authored
      In the init path for DCB, the call to ice_init_dcb()
      can return a non-zero value for either an actual
      error, or due to the FW lldp engine being stopped.
      
      We are currently treating all non-zero values only as
      an indication that the FW LLDP engine is stopped.
      
      Check for an actual error in the DCB init flow.
      Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
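
The distinction this commit draws, between a real failure and the FW LLDP engine merely being stopped, might look like the sketch below. The log does not show how the driver tells the two cases apart, so the result codes, the mode enum and the function name are all hypothetical, and the fallback to a software LLDP mode is an assumption made for illustration.

```c
/* Hypothetical result codes; the driver's actual values are not in the log. */
enum sketch_dcb_init_result {
	SKETCH_DCB_INIT_OK = 0,
	SKETCH_DCB_INIT_FW_LLDP_STOPPED,
	SKETCH_DCB_INIT_ERROR,
};

/* Hypothetical outcomes of the init flow. */
enum sketch_dcb_mode {
	SKETCH_DCB_MODE_FW_LLDP,  /* init succeeded, FW LLDP engine running */
	SKETCH_DCB_MODE_SW_LLDP,  /* FW LLDP engine stopped (assumed fallback) */
	SKETCH_DCB_MODE_FAILED,   /* actual error: do not treat as "stopped" */
};

/* Treat only the "stopped" code as benign; anything else non-zero fails. */
static enum sketch_dcb_mode
sketch_handle_dcb_init(enum sketch_dcb_init_result res)
{
	switch (res) {
	case SKETCH_DCB_INIT_OK:
		return SKETCH_DCB_MODE_FW_LLDP;
	case SKETCH_DCB_INIT_FW_LLDP_STOPPED:
		return SKETCH_DCB_MODE_SW_LLDP;
	default:
		return SKETCH_DCB_MODE_FAILED;
	}
}
```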
  2. 03 Sep, 2019 28 commits
  3. 02 Sep, 2019 4 commits
    • Merge branch 'mvpp2-per-cpu-buffers' · 67538eb5
      David S. Miller authored
      Matteo Croce says:
      
      ====================
      mvpp2: per-cpu buffers
      
      This patchset works around a PP2 HW limitation which prevents the use
      of per-cpu rx buffers.
      The first patch is just a refactor to prepare for the second one.
      The second one allocates percpu buffers if the following conditions are met:
      - CPU number is less than or equal to 4
      - no port is using jumbo frames

      If these conditions are not met at load time, or if jumbo frames are
      enabled later on, the driver reverts to the shared allocation.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mvpp2: percpu buffers · 7d04b0b1
      Matteo Croce authored
      Every mvpp2 unit can use up to 8 buffers mapped by the BM (the HW buffer
      manager). The HW will place the frames in the buffer pool depending on the
      frame size: short (< 128 bytes), long (< 1664) or jumbo (up to 9856).
      
      As any unit can have up to 4 ports, the driver allocates only 2 pools,
      one for short and one for long frames, and shares them between ports.
      When the first port MTU is set higher than 1664 bytes, a third pool is
      allocated for jumbo frames.
      
      This shared allocation makes it impossible to use percpu allocators,
      and creates contention between HW queues.

      If possible, i.e. if the number of possible CPUs is less than 8 and jumbo
      frames are not used, switch to a new scheme: allocate 8 per-cpu pools for
      short and long frames and bind every pool to an RXQ.
      
      When the first port MTU is set higher than 1664 bytes, the allocation
      scheme is reverted to the old behaviour (3 shared pools), and when all
      ports MTU are lowered, the per-cpu buffers are allocated again.
      Signed-off-by: Matteo Croce <mcroce@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
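
A rough sketch of the size classes and the scheme selection described in this commit follows. The thresholds (128, 1664, 9856) come from the message; the enum, macros and function names are hypothetical, and the CPU bound is passed in by the caller rather than hard-coded, since that is a detail of the driver not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Size limits taken from the commit message. */
#define SKETCH_SHORT_LIMIT 128   /* short frames: < 128 bytes */
#define SKETCH_LONG_LIMIT  1664  /* long frames:  < 1664 bytes */
#define SKETCH_JUMBO_MAX   9856  /* jumbo frames: up to 9856 bytes */

enum sketch_frame_class {
	SKETCH_FRAME_SHORT,
	SKETCH_FRAME_LONG,
	SKETCH_FRAME_JUMBO,
	SKETCH_FRAME_TOO_BIG,
};

/* Pick the buffer pool class for a frame of the given length. */
static enum sketch_frame_class sketch_classify_frame(size_t len)
{
	if (len < SKETCH_SHORT_LIMIT)
		return SKETCH_FRAME_SHORT;
	if (len < SKETCH_LONG_LIMIT)
		return SKETCH_FRAME_LONG;
	if (len <= SKETCH_JUMBO_MAX)
		return SKETCH_FRAME_JUMBO;
	return SKETCH_FRAME_TOO_BIG;
}

/*
 * Per-cpu pools are usable only with few enough CPUs and no jumbo frames,
 * i.e. no port MTU above the long-frame limit. max_cpus is caller-supplied.
 */
static bool sketch_can_use_percpu_pools(unsigned int possible_cpus,
					unsigned int max_cpus,
					unsigned int highest_port_mtu)
{
	return possible_cpus <= max_cpus &&
	       highest_port_mtu <= SKETCH_LONG_LIMIT;
}
```

Re-evaluating `sketch_can_use_percpu_pools()` whenever a port MTU changes would give the switch back and forth between the shared and per-cpu schemes that the message describes.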
    • mvpp2: refactor BM pool functions · 13616361
      Matteo Croce authored
      Refactor mvpp2_bm_pool_create(), mvpp2_bm_pool_destroy() and
      mvpp2_bm_pools_init() so that they accept a struct device instead
      of a struct platform_device, as they just need platform_device->dev.
      
      Removing this dependency makes the BM code more reusable in contexts
      where we don't have a pointer to the platform_device.
      Signed-off-by: Matteo Croce <mcroce@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: Fix off-by-one number of calls to devlink_port_unregister · 4ba0ebbc
      Vladimir Oltean authored
      When a function such as dsa_slave_create fails, currently the following
      stack trace can be seen:
      
      [    2.038342] sja1105 spi0.1: Probed switch chip: SJA1105T
      [    2.054556] sja1105 spi0.1: Reset switch and programmed static config
      [    2.063837] sja1105 spi0.1: Enabled switch tagging
      [    2.068706] fsl-gianfar soc:ethernet@2d90000 eth2: error -19 setting up slave phy
      [    2.076371] ------------[ cut here ]------------
      [    2.080973] WARNING: CPU: 1 PID: 21 at net/core/devlink.c:6184 devlink_free+0x1b4/0x1c0
      [    2.088954] Modules linked in:
      [    2.092005] CPU: 1 PID: 21 Comm: kworker/1:1 Not tainted 5.3.0-rc6-01360-g41b52e38d2b6-dirty #1746
      [    2.100912] Hardware name: Freescale LS1021A
      [    2.105162] Workqueue: events deferred_probe_work_func
      [    2.110287] [<c03133a4>] (unwind_backtrace) from [<c030d8cc>] (show_stack+0x10/0x14)
      [    2.117992] [<c030d8cc>] (show_stack) from [<c10b08d8>] (dump_stack+0xb4/0xc8)
      [    2.125180] [<c10b08d8>] (dump_stack) from [<c0349d04>] (__warn+0xe0/0xf8)
      [    2.132018] [<c0349d04>] (__warn) from [<c0349e34>] (warn_slowpath_null+0x40/0x48)
      [    2.139549] [<c0349e34>] (warn_slowpath_null) from [<c0f19d74>] (devlink_free+0x1b4/0x1c0)
      [    2.147772] [<c0f19d74>] (devlink_free) from [<c1064fc0>] (dsa_switch_teardown+0x60/0x6c)
      [    2.155907] [<c1064fc0>] (dsa_switch_teardown) from [<c1065950>] (dsa_register_switch+0x8e4/0xaa8)
      [    2.164821] [<c1065950>] (dsa_register_switch) from [<c0ba7fe4>] (sja1105_probe+0x21c/0x2ec)
      [    2.173216] [<c0ba7fe4>] (sja1105_probe) from [<c0b35948>] (spi_drv_probe+0x80/0xa4)
      [    2.180920] [<c0b35948>] (spi_drv_probe) from [<c0a4c1cc>] (really_probe+0x108/0x400)
      [    2.188711] [<c0a4c1cc>] (really_probe) from [<c0a4c694>] (driver_probe_device+0x78/0x1bc)
      [    2.196933] [<c0a4c694>] (driver_probe_device) from [<c0a4a3dc>] (bus_for_each_drv+0x58/0xb8)
      [    2.205414] [<c0a4a3dc>] (bus_for_each_drv) from [<c0a4c024>] (__device_attach+0xd0/0x168)
      [    2.213637] [<c0a4c024>] (__device_attach) from [<c0a4b1d0>] (bus_probe_device+0x84/0x8c)
      [    2.221772] [<c0a4b1d0>] (bus_probe_device) from [<c0a4b72c>] (deferred_probe_work_func+0x84/0xc4)
      [    2.230686] [<c0a4b72c>] (deferred_probe_work_func) from [<c03650a4>] (process_one_work+0x218/0x510)
      [    2.239772] [<c03650a4>] (process_one_work) from [<c03660d8>] (worker_thread+0x2a8/0x5c0)
      [    2.247908] [<c03660d8>] (worker_thread) from [<c036b348>] (kthread+0x148/0x150)
      [    2.255265] [<c036b348>] (kthread) from [<c03010e8>] (ret_from_fork+0x14/0x2c)
      [    2.262444] Exception stack(0xea965fb0 to 0xea965ff8)
      [    2.267466] 5fa0:                                     00000000 00000000 00000000 00000000
      [    2.275598] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      [    2.283729] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
      [    2.290333] ---[ end trace ca5d506728a0581a ]---
      
      devlink_free is complaining right here:
      
      	WARN_ON(!list_empty(&devlink->port_list));
      
      This happens because devlink_port_unregister is no longer done right
      away in dsa_port_setup when a DSA_PORT_TYPE_USER has failed.
      Vivien said about this change that:
      
          Also no need to call devlink_port_unregister from within dsa_port_setup
          as this step is inconditionally handled by dsa_port_teardown on error.
      
      which is not really true. The devlink_port_unregister function _is_
      being called unconditionally from within dsa_port_setup, but not for
      this port that just failed, just for the previous ones which were set
      up.
      
      ports_teardown:
      	for (i = 0; i < port; i++)
      		dsa_port_teardown(&ds->ports[i]);
      
      Initially I was tempted to fix this by extending the "for" loop to also
      cover the port that failed during setup. But this could have potentially
      unforeseen consequences unrelated to devlink_port or even other types of
      ports than user ports, which I can't really test for. For example, if
      for some reason devlink_port_register itself would fail, then
      unconditionally unregistering it in dsa_port_teardown would not be a
      smart idea. The list might go on.
      
      So just make dsa_port_setup undo the setup it had done upon failure, and
      let the for loop undo the work of setting up the previous ports, which
      are guaranteed to be brought up to a consistent state.
      
      Fixes: 955222ca ("net: dsa: use a single switch statement for port setup")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
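
The unwinding pattern chosen in this fix can be illustrated with a small userspace model. This is a sketch, not the DSA code: the struct, its flags and the `sketch_*` function names are hypothetical, a boolean stands in for devlink port registration, and a test hook simulates the slave creation failure (`-ENODEV`, the `-19` in the log above).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one port's setup state. */
struct sketch_port {
	bool devlink_registered; /* stands in for devlink_port_register() */
	bool slave_created;      /* stands in for dsa_slave_create() */
	bool fail_slave_create;  /* test hook: force the second step to fail */
};

/* Set up one port; on failure, undo what this call itself already did. */
static int sketch_port_setup(struct sketch_port *p)
{
	p->devlink_registered = true;

	if (p->fail_slave_create) {
		p->devlink_registered = false; /* unwind our own registration */
		return -ENODEV;
	}

	p->slave_created = true;
	return 0;
}

/* Full teardown of a port that was completely set up. */
static void sketch_port_teardown(struct sketch_port *p)
{
	p->slave_created = false;
	p->devlink_registered = false;
}

/* Set up all ports; on failure, tear down only the earlier ones. */
static int sketch_switch_setup(struct sketch_port *ports, int n)
{
	int port, i, err = 0;

	for (port = 0; port < n; port++) {
		err = sketch_port_setup(&ports[port]);
		if (err)
			goto ports_teardown;
	}
	return 0;

ports_teardown:
	for (i = 0; i < port; i++)
		sketch_port_teardown(&ports[i]);
	return err;
}

/* Count ports still registered; non-zero after a failure would be a leak. */
static int sketch_ports_registered(const struct sketch_port *ports, int n)
{
	int i, count = 0;

	for (i = 0; i < n; i++)
		if (ports[i].devlink_registered)
			count++;
	return count;
}
```

Because the failing port cleans up after itself, the `i < port` loop bound stays correct and no registration is left behind for the final list-empty check to trip over.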