1. 14 Jul, 2018 9 commits
    • Merge branch 'mlxsw-VRRP' · f5c64e56
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      mlxsw: Add VRRP support
      
      When a router that is acting as the default gateway of a host stops
      functioning, the host will encounter packet loss until the router starts
      functioning again.
      
      To increase the reliability of the default gateway without performing
      reconfiguration on the host, a host can use a Virtual Router Redundancy
      Protocol (VRRP) Router. This virtual router is composed of several
      routers where only one is actually forwarding packets from the host (the
      master router) while the other routers act as backup routers. The
      election of the master router is determined by the VRRP protocol [1].
      
      Packets addressed to the virtual router are always sent to the virtual
      router MAC address (IPv4: 00-00-5E-00-01-XX, IPv6: 00-00-5E-00-02-XX).
      Such packets can only be accepted by the master router and must be
      discarded by the backup routers.
      
      In Linux, VRRP is usually implemented by configuring a macvlan with the
      virtual router MAC on top of the router interface that is connected to
      the host / LAN. The macvlan on the master router is assigned the virtual
      IP (VIP) that the host uses as its gateway.
      
      In order to support VRRP in mlxsw, we first need to enable macvlan upper
      devices on top of mlxsw netdevs and their uppers. This is done by the
      first patch, which also takes care of sanitizing macvlan configurations
      that are not currently supported by the driver.
      
      The second patch directs packets whose destination MAC address matches
      that of a macvlan to the router so that they will undergo an L3 lookup. This is
      consistent with the kernel's behavior where the macvlan's Rx handler
      will re-inject such packets to the Rx path so that they will be picked
      up by the IPvX protocol handlers and undergo an L3 lookup. Note that the
      driver prevents the macvlans from being enslaved to other devices, to
      ensure the packets will be picked up by the protocol handler and not by
      another Rx handler.
      
      The third patch adds packet traps for VRRP control packets for both IPv4
      and IPv6. Finally, the last patch optimizes the reception of VRRP MACs
      by potentially skipping one L2 lookup for them.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Optimize processing of VRRP MACs · c3a49540
      Ido Schimmel authored
      Hosts using a VRRP router send their packets with a destination MAC of
      the VRRP router which is of the following form [1]:
      
      IPv4 - 00-00-5E-00-01-{VRID}
      IPv6 - 00-00-5E-00-02-{VRID}
      
      Where VRID is the ID of the virtual router. Such packets are directed to
      the router block in the ASIC by an FDB entry that was added in the
      previous patch.
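
The MAC layout above can be illustrated with a small helper. This is only a sketch: the function names are hypothetical and not part of the driver, and the 1..255 VRID range is taken from RFC 5798 rather than from this commit.

```python
# Hypothetical helpers illustrating the VRRP virtual MAC layout; not driver code.

def vrrp_virtual_mac(vrid, ipv6=False):
    """Build 00-00-5E-00-01-{VRID} (IPv4) or 00-00-5E-00-02-{VRID} (IPv6)."""
    if not 1 <= vrid <= 255:  # assumption: VRID range as defined by RFC 5798
        raise ValueError(f"VRID must be in 1..255, got {vrid}")
    family_octet = 0x02 if ipv6 else 0x01
    octets = (0x00, 0x00, 0x5E, 0x00, family_octet, vrid)
    return "-".join(f"{octet:02X}" for octet in octets)


def parse_vrrp_mac(mac):
    """Return (family, vrid) if mac is a VRRP virtual MAC, else None."""
    octets = [int(part, 16) for part in mac.replace(":", "-").split("-")]
    if len(octets) != 6 or octets[:4] != [0x00, 0x00, 0x5E, 0x00]:
        return None
    if octets[4] not in (0x01, 0x02) or octets[5] == 0:
        return None
    return ("IPv4" if octets[4] == 0x01 else "IPv6", octets[5])
```

For example, `vrrp_virtual_mac(7)` gives `00-00-5E-00-01-07`, and parsing that string back yields `("IPv4", 7)`.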
      
      However, in certain cases it is possible to skip this FDB lookup and
      send such packets directly to the router. This is accomplished by adding
      these special MAC addresses to the RIF cache. If the cache is hit, the
      packet will skip the L2 lookup and ingress the router with the RIF
      specified in the cache entry.
      
      1. https://tools.ietf.org/html/rfc5798#section-7.3
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Add VRRP traps · 11566d34
      Ido Schimmel authored
      Virtual Router Redundancy Protocol packets are used to communicate the
      state of the Master router associated with the virtual router ID (VRID).
      
      These are link-local multicast packets sent with IP protocol 112 that
      are trapped in the router block in the ASIC.
      
      Add a trap for these packets and mark the trapped packets to prevent
      them from potentially being re-flooded by the bridge driver.
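
As a rough illustration of which packets such a trap is meant to match, the check below classifies a packet by IP protocol and destination address. It is a sketch only: the function name is hypothetical, and the multicast groups (224.0.0.18 and ff02::12) come from RFC 5798, not from this commit message.

```python
import ipaddress

VRRP_IP_PROTOCOL = 112  # from the commit message

# Assumption: VRRP multicast groups as assigned in RFC 5798.
VRRP_GROUPS = {
    4: ipaddress.ip_address("224.0.0.18"),
    6: ipaddress.ip_address("ff02::12"),
}


def is_vrrp_control_packet(ip_protocol, dst):
    """Return True if the packet looks like a VRRP advertisement."""
    addr = ipaddress.ip_address(dst)
    return ip_protocol == VRRP_IP_PROTOCOL and addr == VRRP_GROUPS[addr.version]
```

The hardware trap works on parsed header fields in the ASIC; this only mirrors the matching criteria in software.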
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Direct macvlans' MACs to router · 2db99378
      Ido Schimmel authored
      An IP packet received on a netdev with a macvlan upper whose MAC matches
      the packet's destination MAC will be re-injected to the Rx path as if it
      was received by the macvlan, and will then undergo an L3 lookup.
      
      Reflect this functionality to the ASIC by programming FDB entries that
      will direct MACs of macvlan uppers to the router.
      
      In a similar fashion to router interfaces (RIFs) that are programmed
      upon the addition of the first IP address on an interface and destroyed
      upon the removal of the last IP address, the FDB entries for the macvlan
      are added and destroyed based on the addition of the first and removal
      of the last IP address on the macvlan.
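
The first-address/last-address lifecycle described above amounts to reference counting. A minimal sketch, assuming a hypothetical `MacvlanFdbTracker` class that stands in for the driver's bookkeeping (the real driver programs hardware FDB entries, not a Python set):

```python
class MacvlanFdbTracker:
    """Hypothetical model: one FDB entry per macvlan MAC, kept while it has IPs."""

    def __init__(self):
        self.addr_count = {}  # macvlan MAC -> number of IP addresses configured
        self.fdb = set()      # MACs currently directed to the router

    def add_ip(self, mac):
        count = self.addr_count.get(mac, 0) + 1
        self.addr_count[mac] = count
        if count == 1:
            # First IP address on the macvlan: program the FDB entry.
            self.fdb.add(mac)

    def del_ip(self, mac):
        count = self.addr_count[mac] - 1
        if count == 0:
            # Last IP address removed: destroy the FDB entry.
            del self.addr_count[mac]
            self.fdb.discard(mac)
        else:
            self.addr_count[mac] = count
```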
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Enable macvlan upper devices · c5516185
      Ido Schimmel authored
      In order to allow more unicast MAC addresses (e.g., VRRP virtual MAC) to
      be directed to the router we need to enable macvlan uppers on top of
      mlxsw netdevs.
      
      Allow macvlan upper devices on top of mlxsw netdevs and sanitize
      configurations that can't work. For example, a macvlan can't be enslaved
      to a bridge as without ACLs the device doesn't take the destination MAC
      into account when classifying a packet to a bridge instance (i.e., a
      FID).
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: remove redundant rcv_nxt update · ff0432e5
      Yafang Shao authored
      tcp_rcv_nxt_update() is already executed in tcp_data_queue().
      This line is redundant.
      
      See below:
      	tcp_queue_rcv
      		tcp_rcv_nxt_update(tcp_sk(sk), TCP_SKB_CB(skb)->end_seq);
      	tcp_rcv_nxt_update(tp, TCP_SKB_CB(skb)->end_seq); <<<< redundant
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: mvpp2_cls_flow_get() can be static · 9cee8c43
      kbuild test robot authored
      Fixes: f9358e12 ("net: mvpp2: split ingress traffic into multiple flows")
      Signed-off-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • of: mdio: Support fixed links in of_phy_get_and_connect() · 6eb9c9da
      Linus Walleij authored
      With a simple extension of of_phy_get_and_connect(), drivers
      that use it can also support fixed links on e.g. RGMII, so
      in addition to:
      
      ethernet-port {
      	phy-mode = "rgmii";
      	phy-handle = <&foo>;
      };
      
      This setup with a fixed-link node and no phy-handle will
      now also work just fine:
      
      ethernet-port {
      	phy-mode = "rgmii";
      	fixed-link {
      		speed = <1000>;
      		full-duplex;
      		pause;
      	};
      };
      
      This is very helpful for connecting random ethernet ports
      to e.g. DSA switches that typically reside on fixed links.
      
      The phy-mode is still there as the fixed link in this case
      is still an RGMII link.
      
      Tested on the Cortina Gemini driver with the Vitesse DSA
      router chip on a fixed 1Gbit link.
      Suggested-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: refactor flower walk to iterate over idr · 01683a14
      Vlad Buslov authored
      Extend struct tcf_walker with additional 'cookie' field. It is intended to
      be used by classifier walk implementations to continue iteration directly
      from particular filter, instead of iterating 'skip' number of times.
      
      Change flower walk implementation to save filter handle in 'cookie'. Each
      time flower walk is called, it looks up filter with saved handle directly
      with idr, instead of iterating over filter linked list 'skip' number of
      times. This change improves complexity of dumping flower classifier from
      quadratic to linearithmic. (assuming idr lookup has logarithmic complexity)
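
The difference between the two walk strategies can be sketched as follows. This is an illustration under assumptions, not the kernel code: a sorted Python list with `bisect` stands in for the idr, and the function names and the "resume at last handle + 1" convention are chosen for the example.

```python
import bisect


def walk_with_skip(handles, skip, batch):
    """Old style: step over 'skip' filters on every call (linear per call)."""
    out = []
    for index, handle in enumerate(handles):
        if index < skip:
            continue
        if len(out) == batch:
            break
        out.append(handle)
    return out


def walk_with_cookie(sorted_handles, cookie, batch):
    """New style: jump straight to the first handle >= cookie (log-time lookup)."""
    start = bisect.bisect_left(sorted_handles, cookie)
    chunk = sorted_handles[start:start + batch]
    # Assumption for this sketch: resume right after the last handle returned.
    next_cookie = chunk[-1] + 1 if chunk else cookie
    return chunk, next_cookie
```

Dumping n filters in fixed-size batches with `walk_with_skip` re-walks the already dumped prefix on every call, which is where the quadratic cost comes from; with a cookie each call pays only for the lookup and the batch itself.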
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reported-by: Simon Horman <simon.horman@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 13 Jul, 2018 31 commits
    • net: ipmr: add support for passing full packet on wrong vif · c921c207
      Nikolay Aleksandrov authored
      This patch adds support for IGMPMSG_WRVIFWHOLE which is used to pass
      full packet and real vif id when the incoming interface is wrong.
      While the RP and FHR are setting up state we need to be sending the
      registers encapsulated with all the data inside otherwise we lose it.
      The RP then decapsulates it and forwards it to the interested parties.
      Currently with WRONGVIF we can only be sending empty register packets
      and will lose that data.
      This behaviour can be enabled by using MRT_PIM with
      val == IGMPMSG_WRVIFWHOLE. This doesn't prevent IGMPMSG_WRONGVIF from
      happening, it happens in addition to it, also it is controlled by the same
      throttling parameters as WRONGVIF (i.e. 1 packet per 3 seconds currently).
      Both messages are generated to keep backwards compatibility and avoid
      breaking someone who was enabling MRT_PIM with val == 4, since any
      positive val is accepted and treated the same.
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: gemini: Indicate that we can handle jumboframes · 430ac34d
      Linus Walleij authored
      The hardware supposedly handles frames up to 10236 bytes and
      implements .ndo_change_mtu() so accept 10236 minus the ethernet
      header for a VLAN tagged frame on the netdevices. Use
      ETH_MIN_MTU as minimum MTU.
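
The resulting MTU bounds can be worked out explicitly. This is a sketch of the arithmetic only; the constant values (14-byte Ethernet header, 4-byte VLAN tag, ETH_MIN_MTU of 68) are the usual Linux definitions and are stated here as assumptions, since the commit message does not spell them out.

```python
HW_MAX_FRAME = 10236  # from the commit message

# Assumptions: standard Linux header sizes and minimum MTU.
ETH_HLEN = 14
VLAN_HLEN = 4
ETH_MIN_MTU = 68

# Largest payload that still fits once a VLAN-tagged Ethernet header is added.
MAX_MTU = HW_MAX_FRAME - (ETH_HLEN + VLAN_HLEN)


def mtu_is_valid(mtu):
    """Return True if the requested MTU lies within the advertised range."""
    return ETH_MIN_MTU <= mtu <= MAX_MTU
```

Under these assumptions the maximum MTU comes out as 10236 - 18 = 10218 bytes.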
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: gemini: Move main init to port · 06d51513
      Linus Walleij authored
      The initialization sequence for the ethernet, setting up
      interrupt routing and such things, needs to be done after
      both the ports are clocked and reset. Before this the
      config will not "take". Move the initialization to the
      port probe function and keep track of init status in
      the state.
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: gemini: Allow multiple ports to instantiate · 60cc7767
      Linus Walleij authored
      The code was not tested with two ports actually in use at
      the same time. (I blame this on lack of actual hardware using
      that feature.) Now after locating a system using both ports,
      add necessary fix to make both ports come up.
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: gemini: Improve connection prints · 9ab5c929
      Linus Walleij authored
      Switch over to using a module parameter and debug prints
      that can be controlled by this or ethtool like everyone
      else. Demote all other prints to debug messages.
      
      The phy_print_status() was already in place, albeit never
      really used because the debuglevel hiding it had to be
      set up using ethtool.
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: gemini: Look up L3 maxlen from table · cedca418
      Linus Walleij authored
      The code to calculate the hardware register enumerator
      for the maximum L3 length isn't entirely simple to read.
      Use the existing defines and rewrite the function into a
      table look-up.
      Acked-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'devlink-Add-support-for-region-access' · 750c721e
      David S. Miller authored
      Alex Vesker says:
      
      ====================
      devlink: Add support for region access
      
      This is a proposal which will allow access to driver defined address
      regions using devlink. Each device can create its supported address
      regions and register them. A device which exposes a region will allow
      access to it using devlink.
      
      The suggested implementation will allow exposing regions to the user,
      reading and dumping snapshots taken from different regions.
      A snapshot represents a memory image of a region taken by the driver.
      
      If a device collects a snapshot of an address region it can be later
      exposed using devlink region read or dump commands.
      This functionality allows for future analyses on the snapshots to be
      done.
      
      The major benefit of this support is not only to provide access to
      internal address regions which were inaccessible to the user but also
      to provide an additional way to debug complex error states using the
      region snapshots.
      
      Implemented commands:
      $ devlink region help
      $ devlink region show [ DEV/REGION ]
      $ devlink region del DEV/REGION snapshot SNAPSHOT_ID
      $ devlink region dump DEV/REGION [ snapshot SNAPSHOT_ID ]
      $ devlink region read DEV/REGION [ snapshot SNAPSHOT_ID ]
      	address ADDRESS length LENGTH
      
      Show all of the exposed regions with region sizes:
      $ devlink region show
      pci/0000:00:05.0/cr-space: size 1048576 snapshot [1 2]
      pci/0000:00:05.0/fw-health: size 64 snapshot [1 2]
      
      Delete a snapshot using:
      $ devlink region del pci/0000:00:05.0/cr-space snapshot 1
      
      Dump a snapshot:
      $ devlink region dump pci/0000:00:05.0/fw-health snapshot 1
      0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
      0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8
      0000000000000020 0016 0bb8 0016 1720 0000 0000 c00f 3ffc
      0000000000000030 bada cce5 bada cce5 bada cce5 bada cce5
      
      Read a specific part of a snapshot:
      $ devlink region read pci/0000:00:05.0/fw-health snapshot 1 address 0
      	length 16
      0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
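
For later analysis, the textual dump format shown above can be turned back into raw bytes. This is a hedged sketch: `parse_region_dump` is a hypothetical helper, and it assumes each line is an address followed by hex groups whose bytes appear in display order.

```python
SAMPLE_DUMP = """\
0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8
0000000000000020 0016 0bb8 0016 1720 0000 0000 c00f 3ffc
0000000000000030 bada cce5 bada cce5 bada cce5 bada cce5
"""


def parse_region_dump(text):
    """Parse 'ADDRESS hex hex ...' lines into (base_address, bytes)."""
    base = None
    data = bytearray()
    for line in text.strip().splitlines():
        addr_field, *groups = line.split()
        addr = int(addr_field, 16)
        if base is None:
            base = addr
        if addr != base + len(data):
            raise ValueError(f"non-contiguous dump at address {addr:#x}")
        # Assumption: bytes inside each group are printed in memory order.
        data += bytes.fromhex("".join(groups))
    return base, bytes(data)
```

Parsing `SAMPLE_DUMP` yields a base address of 0 and 64 bytes of data, which matches the `size 64` reported for the fw-health region in the `devlink region show` output.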
      
      For more information you can check devlink-region.8 man page
      
      Future:
      There is a plan to extend the support to include a write command
      as well as read and dump of live regions.
      
      v1->v2:
      -Add a parameter to enable devlink region snapshot
      -Allocate snapshot memory using kvmalloc
      -Introduce destructor function devlink_snapshot_data_dest_t to avoid
       double allocation
      
      v2->v3:
      -Fix incorrect comment in devlink.h for DEVLINK_ATTR_REGION_SIZE
       from u32 to u64
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: Use devlink region_snapshot parameter · 3c641ba4
      Alex Vesker authored
      This parameter enables capturing a region snapshot of the crspace
      during critical errors. The parameter is disabled by default and
      can be enabled using devlink param commands.
      It can be configured both at runtime and at driver init.
      Signed-off-by: Alex Vesker <valex@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Add generic parameters region_snapshot · f6a69885
      Alex Vesker authored
      region_snapshot - When set enables capturing region snapshots
      Signed-off-by: Alex Vesker <valex@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: Add Crdump FW snapshot support · bedc989b
      Alex Vesker authored
      Crdump allows the driver to create a snapshot of the FW PCI
      crspace and health buffer during a critical FW issue.
      In case of a FW command timeout, FW getting stuck or a non zero
      value on the catastrophic buffer, a snapshot will be taken.
      
      The snapshot is exposed using devlink, cr-space, fw-health
      address regions are registered on init and snapshots are attached
      once a new snapshot is collected by the driver.
      Signed-off-by: Alex Vesker <valex@mellanox.com>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: Add health buffer address capability · 523f9eb1
      Alex Vesker authored
      Health buffer address is a 32 bit PCI address offset provided by
      the FW. This offset is used for reading FW health debug data
      located on the shared CR space. Cr space is accessible in both
      driver and FW and allows for different queries and configurations.
      Health buffer size is always 64B of readable data followed by a
      lock which is used to block volatile CR space access.
      Signed-off-by: Alex Vesker <valex@mellanox.com>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Add support for region snapshot read command · 4e54795a
      Alex Vesker authored
      Add support for DEVLINK_CMD_REGION_READ_GET used for both reading
      and dumping region data. Read allows reading from a region specific
      address for given length. Dump allows reading the full region.
      If only a snapshot ID is provided, a snapshot dump will be done.
      If snapshot ID, address and length are provided, a snapshot read
      will be done.
      
      This is used for snapshot access and will be used in the same
      way to access current data on the region.
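
The read-versus-dump rule described above can be modelled in a few lines. This is a sketch under assumptions: `region_read` and the dict of snapshots are hypothetical stand-ins for the netlink handling, and the bounds checks are illustrative rather than taken from the kernel.

```python
def region_read(snapshots, snapshot_id, address=None, length=None):
    """Return the whole snapshot (dump) or a slice of it (read)."""
    data = snapshots[snapshot_id]
    if address is None and length is None:
        # Only a snapshot ID was given: dump the full region.
        return data
    if address is None or length is None:
        raise ValueError("address and length must be given together")
    if address < 0 or length < 0 or address + length > len(data):
        raise ValueError("requested range is outside the region")
    # Snapshot ID, address and length were given: read that range.
    return data[address:address + length]
```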
      Signed-off-by: Alex Vesker <valex@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Add support for region snapshot delete command · 866319bb
      Alex Vesker authored
      Add support for DEVLINK_CMD_REGION_DEL used
      for deleting a snapshot from a region. The snapshot ID is required.
      Also added notification support for NEW and DEL of snapshots.
      Signed-off-by: Alex Vesker <valex@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Extend the support querying for region snapshot IDs · a006d467
      Alex Vesker authored
      Extend the support for DEVLINK_CMD_REGION_GET command to also
      return the IDs of the snapshots currently present on the region.
      Each reply will include a nested snapshots attribute that
      can contain multiple snapshot attributes each with an ID.
      Signed-off-by: Alex Vesker <valex@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Add support for region get command · d8db7ea5
      Alex Vesker authored
      Add support for DEVLINK_CMD_REGION_GET command which is used for
      querying for the supported DEV/REGION values of devlink devices.
      The support is both for doit and dumpit.
      
      Reply includes:
        BUS_NAME, DEVICE_NAME, REGION_NAME, REGION_SIZE
      Signed-off-by: Alex Vesker <valex@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Add support for creating region snapshots · d7e52722
      Alex Vesker authored
      Each device address region can store multiple snapshots,
      each snapshot is identified using a different numerical ID.
      This ID is used when deleting a snapshot or showing an address
      region specific snapshot. This patch exposes a callback to add
      a new snapshot to an address region.
      The snapshot will be deleted using the destructor function
      when the region is destroyed or when a snapshot delete command
      is issued from the devlink user tool.
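
The snapshot lifecycle described above (numeric IDs, a destructor invoked on delete or on region destruction) could be modelled as below. The `Region` class and its method names are hypothetical and only mirror the behaviour in the commit message, not the devlink API.

```python
class Region:
    """Hypothetical model of an address region holding snapshots by ID."""

    def __init__(self, name, size):
        self.name = name
        self.size = size
        self.snapshots = {}  # snapshot ID -> (data, destructor)

    def add_snapshot(self, snapshot_id, data, destructor):
        if snapshot_id in self.snapshots:
            raise ValueError(f"snapshot {snapshot_id} already exists")
        if len(data) != self.size:
            raise ValueError("snapshot size does not match region size")
        self.snapshots[snapshot_id] = (data, destructor)

    def del_snapshot(self, snapshot_id):
        # Used for an explicit delete command from the user tool.
        data, destructor = self.snapshots.pop(snapshot_id)
        destructor(data)

    def destroy(self):
        # Destroying the region releases every remaining snapshot.
        for snapshot_id in list(self.snapshots):
            self.del_snapshot(snapshot_id)
```

Passing the destructor along with the data lets the owner of the buffer decide how it is freed, which is the role the commit message assigns to the destructor function.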
      Signed-off-by: Alex Vesker <valex@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Add callback to query for snapshot id before snapshot create · ccadfa44
      Alex Vesker authored
      To constrain the driver's snapshot ID selection, a new callback
      is introduced for the driver to get the snapshot ID before creating
      a new snapshot. This will also allow giving the same ID to multiple
      snapshots taken of different regions at the same time.
      Signed-off-by: Alex Vesker <valex@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Add support for creating and destroying regions · b16ebe92
      Alex Vesker authored
      This allows a device to register its supported address regions.
      Each address region can be accessed directly, for example by reading
      the snapshots taken of this address space.
      Drivers are not limited in the name selection for different regions.
      An example of a region-name can be: pci cr-space, register-space.
      Signed-off-by: Alex Vesker <valex@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mvpp2-add-RSS-support' · 23c9ef2b
      David S. Miller authored
      Maxime Chevallier says:
      
      ====================
      net: mvpp2: add RSS support
      
      This series adds support for RSS on PPv2. There already was some code to
      handle the RSS tables, but the driver was missing all the classification
      steps required to actually use these tables.
      
      RSS is used through the classifier, using at least 2 lookups :
       - One using the C2 engine, a TCAM engine that matches the packet based on
         some header-extracted fields, assigns the default rx queue for that
         packet and tags it for RSS
       - One using the C3Hx engine, which computes the hash that's used to perform
         the lookup in the RSS table.
      
      Since RSS spreads the load across CPUs, we need to make sure that packets
      from the same flow are always assigned the same rx queue, to prevent
      re-ordering.
      
      This series therefore adds a classification step based on the Header Parser,
      that separates ingress traffic into 52 flows, based on some L2, L3 and L4
      parameters.
      
      Patches 1 and 2 fix some header issues, from the driver splitting
      
      Patches 3 to 7 make sure the correct receive queue setup is used for RSS
      
      Patches 8 to 14 deal with the way we handle the RSS tables
      
      Patch 15 implements basic classifier configuration, by using it to assign the
      default receive queue
      
      Patch 16 implements the ingress traffic splitting into multiple flows
      
      Patch 17 adds RSS support, by using the needed classification steps
      
      Patch 18 adds the required ethtool ops to configure the flow hash parameters
      
      This was tested on MacchiatoBin, giving some nice performance improvements
      using ip forwarding (going from 5Gbps to 9.6Gbps total throughput).
      
      RSS is disabled by default.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: allow setting RSS flow hash parameters with ethtool · 436d4fdb
      Maxime Chevallier authored
      This commit allows setting the RSS hash generation parameters from
      ethtool. When setting parameters for a given flow type from ethtool
      (e.g. tcp4), all the corresponding flows in the flow table are updated,
      according to the supported hash parameters.
      
      For example, when configuring TCP over IPv4 hash parameters to be
      src/dst IP  + src/dst port ("ethtool -N eth0 rx-flow-hash tcp4 sdfn"),
      we only set the "src/dst port" hash parameters on the non-fragmented TCP
      over IPv4 flows.
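
The `sdfn` string in the ethtool command above is a set of one-letter field selectors. The decoder below is an illustrative sketch: the letter meanings follow the ethtool(8) man page, while the function name and the "drop L4 fields on fragmented flows" rule are this example's reading of the commit message.

```python
# Letter meanings as documented for 'ethtool -N ... rx-flow-hash'.
HASH_FIELDS = {
    "m": "L2 destination address",
    "v": "VLAN tag",
    "t": "L3 protocol",
    "s": "source IP",
    "d": "destination IP",
    "f": "L4 bytes 0-1 (source port)",
    "n": "L4 bytes 2-3 (destination port)",
    "r": "discard",
}
L4_LETTERS = {"f", "n"}


def effective_hash_fields(spec, fragmented):
    """List the fields hashed for a flow, skipping L4 fields when fragmented."""
    fields = []
    for letter in spec:
        if letter not in HASH_FIELDS:
            raise ValueError(f"unknown rx-flow-hash letter: {letter!r}")
        if fragmented and letter in L4_LETTERS:
            continue  # port fields only apply to non-fragmented flows
        fields.append(HASH_FIELDS[letter])
    return fields
```

With `spec="sdfn"`, a non-fragmented TCP over IPv4 flow hashes on both IPs and both ports, while a fragmented one falls back to the two IP addresses.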
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: add an RSS classification step for each flow · d33ec452
      Maxime Chevallier authored
      One of the classification actions that can be performed is to compute a
      hash of the packet header based on some header fields, and look up an RSS
      table based on this hash to determine the final RxQ.
      
      This is done by adding one lookup entry per flow per port, so that we
      can configure the hash generation parameters for each flow and each
      port.
      
      There are 2 possible engines that can be used for RSS hash generation :
      
       - C3HA, that generates a hash based on up to 4 header-extracted fields
       - C3HB, that does the same as C3HA, but also includes L4 info in the hash
      
      There are a lot of fields that can be extracted from the header. For now,
      we only use the ones that we can configure using ethtool :
       - DST MAC address
       - L3 info
       - Source IP
       - Destination IP
       - Source port
       - Destination port
      
      The C3HB engine is selected when we use L4 fields (src/dst port).
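
The engine choice and the hash-to-queue step can be sketched in software. Everything here is illustrative: the field names and functions are hypothetical, and CRC32 merely stands in for whatever hash the C3H engines compute in hardware.

```python
import zlib

L4_FIELDS = {"src_port", "dst_port"}


def select_engine(hash_fields):
    """Pick C3HB when any L4 field takes part in the hash, else C3HA."""
    return "C3HB" if L4_FIELDS & set(hash_fields) else "C3HA"


def rss_queue(packet, hash_fields, rss_table):
    """Map a packet to an RxQ through a hash-indexed RSS table."""
    # Stand-in hash: CRC32 over the selected fields in a fixed order.
    key = "|".join(str(packet[name]) for name in sorted(hash_fields)).encode()
    return rss_table[zlib.crc32(key) % len(rss_table)]
```

Because the hash depends only on the selected header fields, every packet of a given flow indexes the same RSS table entry and therefore lands on the same RxQ, which is what prevents re-ordering.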
      
                     Header parser          Dec table
       Ingress pkt  +-------------+ flow id +----------------------------+
      ------------->| TCAM + SRAM |-------->|TCP IPv4 w/ VLAN, not frag  |
                    +-------------+         |TCP IPv4 w/o VLAN, not frag |
                                            |TCP IPv4 w/ VLAN, frag      |--+
                                            |etc.                        |  |
                                            +----------------------------+  |
                                                                            |
                                                  Flow table                |
        +---------+   +------------+         +--------------------------+   |
        | RSS tbl |<--| Classifier |<--------| flow 0: C2 lookup        |   |
        +---------+   +------------+         |         C3 lookup port 0 |   |
                       |         |           |         C3 lookup port 1 |   |
               +-----------+ +-------------+ |         ...              |   |
               | C2 engine | | C3H engines | | flow 1: C2 lookup        |<--+
               +-----------+ +-------------+ |         C3 lookup port 0 |
                                             |         ...              |
                                             | ...                      |
                                             | flow 51 : C2 lookup      |
                                             |           ...            |
                                             +--------------------------+
      
      The C2 engine also gains the role of enabling and disabling the RSS
      table lookup for this packet.
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: split ingress traffic into multiple flows · f9358e12
      Maxime Chevallier authored
      The PPv2 classifier allows performing classification operations on each
      ingress packet, based on the flow the packet is assigned to.
      
      The current code uses only 1 flow per port, and the only classification
      action consists of assigning the rx queue to the packet, depending on the
      port.
      
      In preparation for adding RSS support, we have to split all incoming
      traffic into different flows. Since RSS assigns a rx queue depending on
      the hash of some header fields, we have to make sure that the hash is
      generated in a consistent way for all packets in the same flow.
      
      What we call a "flow" is actually a set of attributes attached to a
      packet that depends on various L2/L3/L4 info.
      
      This patch introduces 52 flows, which are a combination of various L2, L3
      and L4 attributes :
       - Whether or not the packet has a VLAN tag
       - Whether the packet is IPv4, IPv6 or something else
       - Whether the packet is TCP, UDP or something else
       - Whether or not the packet is fragmented at L3 level.
      
      The flow is associated with a packet by the Header Parser. Each flow
      corresponds to an entry in the decoding table. This entry then points to
      the sequence of classification lookups to be performed by the
      classifier, represented in the flow table.
      
      For now, the only lookup we perform is a C2 lookup to set the default
      rx queue.
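
The lookup chain described above can be sketched as follows. This is only an illustrative model in Python, not driver code: the attribute names, the flow id numbering and the table layouts are assumptions made for the example, and only three of the 52 flows are shown.

```python
from typing import NamedTuple


class PacketInfo(NamedTuple):
    """Attributes extracted by the header parser (names are hypothetical)."""
    vlan: bool        # packet carries a VLAN tag
    l3: str           # "ipv4", "ipv6" or "other"
    l4: str           # "tcp", "udp" or "other"
    fragmented: bool  # fragmented at L3 level


# Decoding table: packet attributes -> flow id (illustrative subset).
DEC_TABLE = {
    PacketInfo(True, "ipv4", "tcp", False): 0,
    PacketInfo(False, "ipv4", "tcp", False): 1,
    PacketInfo(True, "ipv4", "tcp", True): 2,
}

# Flow table: flow id -> ordered list of lookups to perform.
# For now every flow only performs a C2 lookup.
FLOW_TABLE = {flow_id: ["C2"] for flow_id in DEC_TABLE.values()}


def c2_lookup(port_first_rxq: int) -> int:
    """Model of the C2 engine: it only sets the default rx queue."""
    return port_first_rxq


def classify(pkt: PacketInfo, port_first_rxq: int) -> int:
    """Return the rx queue chosen for a packet, or -1 if no lookup set one."""
    flow_id = DEC_TABLE[pkt]
    rxq = -1
    for engine in FLOW_TABLE[flow_id]:
        if engine == "C2":
            rxq = c2_lookup(port_first_rxq)
    return rxq
```

In this model every flow still ends up in the port's default rx queue; the point of the split is that later lookups can be added per flow without touching the decoding table.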
      
                     Header parser          Dec table
       Ingress pkt  +-------------+ flow id +----------------------------+
      ------------->| TCAM + SRAM |-------->|TCP IPv4 w/ VLAN, not frag  |
                    +-------------+         |TCP IPv4 w/o VLAN, not frag |
                                            |TCP IPv4 w/ VLAN, frag      |--+
                                            |etc.                        |  |
                                            +----------------------------+  |
                                                                            |
                                                 Flow table                 |
                      +------------+        +---------------------+         |
           To RxQ <---| Classifier |<-------| flow 0: C2 lookup   |<--------+
                      +------------+        | flow 1: C2 lookup   |
                             |              | ...                 |
                      +------------+        | flow 51 : C2 lookup |
                      | C2 engine  |        +---------------------+
                      +------------+
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: use classifier to assign default rx queue · b1a962c6
      Maxime Chevallier authored
      The PPv2 Controller has a classifier that can perform multiple lookup
      operations for each packet, using different engines.
      
      One of these engines is the C2 engine, which performs TCAM based lookups
      on data extracted from the packet header. When a packet matches an
      entry, the engine sets various attributes, used to perform
      classification operations.
      
      One of these attributes is the rx queue to which the packet should be sent.
      The current code uses the lookup_id table (also called decoding table)
      to assign the rx queue. However, this only works if we use one entry per
      port in the decoding table, which won't be the case once we add RSS
      lookups.
      
      This patch uses the C2 engine to assign the rx queue to each packet.
      
      The C2 engine is used through the flow table, which dictates what
      classification operations are done for a given flow.
      
      Right now, we have one flow per port, which contains every ingress
      packet for this port.
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: rename per-port RSS init function · e6e21c02
      Maxime Chevallier authored
      The mvpp22_init_rss function configures the RSS parameters for each
      port, so rename it accordingly. Since this function relies on the
      classifier configuration, move its call right after the classifier
      config.
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: make sure we don't spread load on disabled CPUs · 2a2f467d
      Maxime Chevallier authored
      When filling the RSS table, we have to make sure that the rx queue is
      attached to an online CPU.
      
      This patch does not add full support for CPU hotplug; rather, it makes
      sure that we don't break networking on systems booted with the maxcpus
      parameter.
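
A minimal sketch of the idea, written in Python rather than the driver's C: when the indirection table is built, only rx queues whose CPU is online are used. The one-queue-per-CPU mapping and the function name are assumptions made for the example, as is the 32-entry table size.

```python
from typing import List, Set

RSS_ENTRIES = 32  # assumed number of entries in one RSS table


def fill_rss_table(nrxqs: int, online_cpus: Set[int]) -> List[int]:
    """Build an indirection table that only uses queues on online CPUs.

    Assumes rx queue i is served by CPU i (hypothetical mapping).
    """
    usable = [q for q in range(nrxqs) if q in online_cpus]
    if not usable:
        raise ValueError("no rx queue is attached to an online CPU")
    # Spread the table entries round-robin over the usable queues.
    return [usable[i % len(usable)] for i in range(RSS_ENTRIES)]
```

For instance, on a system with four possible CPUs booted with maxcpus=2, `fill_rss_table(4, {0, 1})` would only ever reference queues 0 and 1.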
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: improve the distribution of packets on CPUs when using RSS · 662ae3fe
      Antoine Tenart authored
      This patch adds an extra indirection when writing the indirection table
      into the RSS hardware table, to improve the distribution of packets across
      CPUs. For example, if 2 queues are used on a multi-core system, this new
      indirection will choose two queues on two different CPUs instead of the
      first two queues, which are both on the first CPU.
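
One possible remapping with this property is sketched below in Python. The formula and the function name are assumptions made for illustration, not necessarily what the driver does; the sketch also assumes that queues are laid out contiguously per CPU, so that queue q belongs to CPU q // per_cpu.

```python
def spread_rxq(rxq: int, total_rxqs: int, num_cpus: int) -> int:
    """Map a logical queue index to a hardware queue index so that
    consecutive logical queues land on different CPUs (illustrative)."""
    per_cpu = total_rxqs // num_cpus
    return (rxq * per_cpu + rxq // num_cpus) % total_rxqs
```

With 8 queues on 2 CPUs (4 queues per CPU), logical queues 0 and 1 map to hardware queues 0 and 4, which sit on CPU 0 and CPU 1 respectively.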
      Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: RSS indirection table support · 8179642b
      Antoine Tenart authored
      This patch adds support for the RSS indirection table, allowing the
      ethtool -x and -X options to be used to dump and set this table.
      Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
      [Maxime: Small warning fixes, use one table per port]
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: use one RSS table per port · a27a254c
      Maxime Chevallier authored
      The PPv2 Controller has 8 RSS Tables of 32 entries each. A lookup in the
      RXQ2RSS_TABLE is performed for each incoming packet, and the RSS Table
      to be used is chosen according to the default rx queue that would be
      used for the packet.
      
      This default rx queue is set in the Lookup_id Table (also called
      Decoding Table), and is equal to the port->first_rxq.
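
The two-step lookup can be modelled roughly as below. This Python sketch is an assumption-laden illustration: `bind_port` is a hypothetical helper, using the port id as the table index is one way to get one table per port, and indexing the table with the hash modulo the table size is assumed rather than taken from the hardware documentation.

```python
from typing import Dict, List

NUM_RSS_TABLES = 8      # RSS tables in the PPv2 Controller
RSS_TABLE_ENTRIES = 32  # entries per RSS table

# Each RSS table maps a hash bucket to an rx queue.
rss_tables: List[List[int]] = [
    [0] * RSS_TABLE_ENTRIES for _ in range(NUM_RSS_TABLES)
]

# Model of RXQ2RSS_TABLE: default rx queue -> index of the RSS table to use.
rxq2rss_table: Dict[int, int] = {}


def bind_port(port_id: int, first_rxq: int, rxqs: List[int]) -> None:
    """Give a port its own RSS table (hypothetical helper)."""
    rxq2rss_table[first_rxq] = port_id
    rss_tables[port_id] = [
        rxqs[i % len(rxqs)] for i in range(RSS_TABLE_ENTRIES)
    ]


def rss_rxq(default_rxq: int, pkt_hash: int) -> int:
    """Pick the final rx queue from the table tied to the default rx queue."""
    table = rss_tables[rxq2rss_table[default_rxq]]
    return table[pkt_hash % RSS_TABLE_ENTRIES]  # modulo indexing is assumed
```

Because the table is selected through the default rx queue, which equals port->first_rxq, each port ends up with its own table.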
      
      Since the Classifier itself isn't active at the moment, this doesn't
      have a direct effect: the default rx queue is currently the one where
      all packets end up.
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: fix RSS register definitions · 4b86097b
      Maxime Chevallier authored
      There is no RSS_TABLE register in the PPv2 Controller. The register 0x1510
      which was specified is actually named "RSS_HASH_SEL", but isn't used by
      this driver at all.
      
      Based on how this register was used, it should have been the
      RXQ2RSS_TABLE register, which selects the RSS table that will be used
      for the incoming packet.
      
      The RSS_TABLE_POINTER is actually a field of this RXQ2RSS_TABLE
      register.
      
      Since RSS tables are actually not used by the driver for now, this
      commit does not fix a runtime bug.
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: fix a typo in the RSS code · 132baa03
      Antoine Tenart authored
      Cosmetic patch fixing a typo in one of the RSS comments.
      Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: use only one rx queue per port per CPU · f8c6ba84
      Maxime Chevallier authored
      The number of receive queues per port is:
       - MVPP2_DEFAULT_RXQ if in single queue mode
       - MVPP2_DEFAULT_RXQ * num_possible_cpus if in multi queue mode
      
      with MVPP2_DEFAULT_RXQ = 4.
      
      However, we don't use the extra rx queues at the moment; we really only
      need one per port per CPU until some more advanced classification rules
      are implemented.
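
The effect on the queue count can be worked through with a small Python sketch. The "before" function follows the rule stated above; the "after" function assumes the change amounts to using one queue wherever four were used before, which is an interpretation rather than something spelled out here. Function names are hypothetical.

```python
MVPP2_DEFAULT_RXQ = 4  # value before this change


def nrxqs_before(multi_queue: bool, num_possible_cpus: int) -> int:
    """Receive queues per port before the change."""
    if multi_queue:
        return MVPP2_DEFAULT_RXQ * num_possible_cpus
    return MVPP2_DEFAULT_RXQ


def nrxqs_after(multi_queue: bool, num_possible_cpus: int) -> int:
    """Receive queues per port after the change (assumed: one per CPU)."""
    return num_possible_cpus if multi_queue else 1
```

On a system with four possible CPUs in multi queue mode, this goes from 16 rx queues per port down to 4.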
      Suggested-by: Stefan Chulski <stefanc@marvell.com>
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>