1. 03 Jun, 2021 19 commits
  2. 02 Jun, 2021 21 commits
    • Merge branch 'devlink-rate-objects' · 270d47dc
      David S. Miller authored
      Dmytro Linkin says:
      
      ====================
      devlink: rate objects API
      
      Resending without RFC.
      
      Currently the kernel provides a way to change the tx rate of a single VF
      in switchdev mode via the tc-police action. When many VFs are configured,
      managing their rates becomes a non-trivial task and some grouping
      mechanism is required. Implementing such grouping in tc-police would
      bring flow-related limitations and unwanted complications, such as:
      - tc-police is a policer, while users have requested a traffic shaper,
        so a shared tc-police action is not suitable;
      - flows require a net device to be placed on, which means "groups" would
        not have a net device instance of their own. With the previous point
        in mind, a solution was reviewed in which the representor has a
        policer and the driver uses a shaper if the qdisc contains a group of
        VFs; such an approach is ugly, complicated and misleading;
      - TC is ingress only, while configuring the "other" side of the wire
        looks more like a "real" picture where shaping is outside of the
        steering world, similar to the "ip link" command;

      Given the above, devlink is the most appropriate place.
      
      This series introduces a devlink API for managing the tx rate of a single
      devlink port or of a group by invoking callbacks (see below) of the
      corresponding driver. A devlink port or a group can also be added to a
      parent group, where the driver is responsible for handling the rates of
      the group's elements. To achieve all of that, a new rate object is added.
      It can be one of two types:
      - leaf - represents a single devlink port; created/destroyed by the
        driver and bound to the devlink port. For example, a driver may
        create a leaf rate object for every devlink port associated with a VF.
        Since a leaf has a 1:1 mapping to its devlink port, in user space it is
        referred to as pci/<bus_addr>/<port_index>;
      - node - represents a group of rate objects; created/deleted by request
        from user space; initially empty (no rate objects added). In user
        space it is referred to as pci/<bus_addr>/<node_name>, where the node
        name can be anything except a decimal number, to avoid collisions with
        leafs.
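The naming rule above (a decimal last component means a leaf, anything else means a node) can be illustrated with a small user-space sketch. This is not devlink or iproute2 code; `RateHandle` and `parse_rate_handle` are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple


class RateHandle(NamedTuple):
    device: str  # e.g. "pci/0000:03:00.0"
    kind: str    # "leaf" or "node"
    ident: str   # port index (as text) for a leaf, node name for a node


def parse_rate_handle(handle: str) -> RateHandle:
    """Classify a rate object handle by its last path component.

    Hypothetical helper: a purely decimal last component is read as a
    devlink port index (leaf); any other name is read as a node.
    """
    device, sep, ident = handle.rpartition("/")
    if not sep or not device or not ident:
        raise ValueError(f"malformed rate handle: {handle!r}")
    is_decimal = ident.isascii() and ident.isdigit()
    return RateHandle(device, "leaf" if is_decimal else "node", ident)
```

Because a node name may never be a decimal number, the two name spaces cannot collide, so no extra type flag is needed in the handle itself.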
      
      devlink_ops extended with following callbacks:
      - rate_{leaf|node}_tx_{share|max}_set
      - rate_node_{new|del}
      - rate_{leaf|node}_parent_set
      
      KAPI provides:
      - creation/destruction of the leaf rate object associated with devlink
        port
      - destruction of rate nodes, to allow a vendor driver to free allocated
        resources on driver removal or for any other reason that requires
        nodes to be destroyed
      
      UAPI provides:
      - dumping all or single rate objects
      - setting tx_{share|max} of rate object of any type
      - creating/deleting node rate object
      - setting/unsetting parent of any rate object
      
      Added devlink rate object support for netdevsim driver
      
      Issues/open questions:
      - Does the user need a DEVLINK_CMD_RATE_DEL_ALL_CHILD command to clean all
        children of a particular parent node? For example:
        $ devlink port function rate flush netdevsim/netdevsim10/group
      - priv pointer passed to the callbacks is a source of bugs; in leaf case
        driver can embed rate object into internal structure and use
        container_of() on it; in node case it cannot be done since nodes are
        created from userspace
      
      v1->v2:
      - fixed kernel-doc for devlink_rate_leaf_{create|destroy}()
      - s/func/function/ for all devlink port command occurrences
      
      v2->v3:
      - devlink:
        - added devlink_rate_nodes_destroy() function
      - netdevsim:
        - added call of devlink_rate_nodes_destroy() function
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Documentation: devlink rate objects · b62767e7
      Dmytro Linkin authored
      Add devlink rate objects section at devlink port documentation.
      Add devlink rate support info at netdevsim devlink documentation.
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftest: netdevsim: Add devlink rate grouping test · 1a9c0482
      Dmytro Linkin authored
      Test verifies that netdevsim correctly implements devlink ops callbacks
      that set node as a parent of devlink leaf or node rate object.
      Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdevsim: Allow setting parent node of rate objects · f3d101b4
      Dmytro Linkin authored
      Implement new devlink ops that allow setting rate node as a parent for
      devlink port (leaf) or another devlink node through devlink API.
      Expose parent names to netdevsim debugfs in read only mode.
      Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Allow setting parent node of rate objects · d7555984
      Dmytro Linkin authored
      Refactor DEVLINK_CMD_RATE_{GET|SET} command handlers to support setting
      a node as a parent for another rate object (leaf or node) by means of
      new attribute DEVLINK_ATTR_RATE_PARENT_NODE_NAME. Extend devlink ops
      with new callbacks rate_{leaf|node}_parent_set() to set node as a parent
      for rate object to allow supporting drivers to implement rate grouping
      through devlink. Driver implementations are allowed to support leafs
      or node children only. Invoking the callback with NULL as parent should
      be treated by the driver as an unset-parent action.
      Extend rate object struct with reference counter to disallow deleting a
      node with any child pointing to it. User should unset parent for the
      child explicitly.
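The reference-counting rule described above can be modelled in a few lines of user-space Python. This is only a sketch of the bookkeeping, not the kernel implementation; `RateObject`, `set_parent` and `delete_node` are hypothetical names, and dropping the deleted node's own reference on its parent is an assumption.

```python
class RateObject:
    """Hypothetical model of a rate object (leaf or node)."""

    def __init__(self, name, kind):
        self.name = name
        self.kind = kind      # "leaf" or "node"
        self.parent = None    # parent node, or None when unset
        self.refcnt = 0       # number of children pointing at this node


def set_parent(child, parent):
    """Set or unset (parent=None) the parent of a rate object."""
    if parent is not None and parent.kind != "node":
        raise ValueError("only a node can be a parent")
    if child.parent is not None:
        child.parent.refcnt -= 1
    child.parent = parent
    if parent is not None:
        parent.refcnt += 1


def delete_node(nodes, name):
    """Delete a node; refuse while any child still points at it."""
    node = nodes[name]
    if node.refcnt:
        raise RuntimeError(f"node {name} still has children")
    if node.parent is not None:
        set_parent(node, None)  # assumption: release our own parent ref
    del nodes[name]
```

With group1 parented to group2, deleting group2 fails until `set_parent(group1, None)` is called, which mirrors the `noparent` step in the example.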
      
      Example:
      
      $ devlink port function rate add netdevsim/netdevsim10/group1
      
      $ devlink port function rate add netdevsim/netdevsim10/group2
      
      $ devlink port function rate set netdevsim/netdevsim10/group1 parent group2
      
      $ devlink port function rate show netdevsim/netdevsim10/group1
      netdevsim/netdevsim10/group1: type node parent group2
      
      $ devlink port function rate set netdevsim/netdevsim10/group1 noparent
      Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftest: netdevsim: Add devlink rate nodes test · 413ee943
      Dmytro Linkin authored
      Test verifies that it is possible to create, delete and set min/max tx
      rate of devlink rate node on netdevsim VF.
      Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdevsim: Implement support for devlink rate nodes · 885226f5
      Dmytro Linkin authored
      Implement new devlink ops that allow creation, deletion and setting of
      shared/max tx rate of devlink rate nodes through devlink API.
      Expose rate node and its tx rates to netdevsim debugfs.
      Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Introduce rate nodes · a8ecb93e
      Dmytro Linkin authored
      Implement support for DEVLINK_CMD_RATE_{NEW|DEL} commands that are used
      to create and delete devlink rate nodes. Add new attribute
      DEVLINK_ATTR_RATE_NODE_NAME that specifies the node name string. The node
      name is an alphanumeric identifier. No valid node name can be a devlink
      port index, i.e. a decimal number. Extend devlink ops with new callbacks
      rate_node_{new|del}() and rate_node_tx_{share|max}_set() to allow
      supporting drivers to implement port rate grouping and setting the tx
      rate of rate nodes through devlink.
      Expose the devlink_rate_nodes_destroy() function to allow a vendor driver
      to properly clean up internally allocated resources for the nodes if the
      driver goes down or for any other reason that requires the nodes to be
      destroyed.
      Disallow moving a device from switchdev to legacy mode if any node
      exists on that device. The user must explicitly delete nodes before
      switching mode.
      
      Example:
      
      $ devlink port function rate add netdevsim/netdevsim10/group1
      
      $ devlink port function rate set netdevsim/netdevsim10/group1 \
              tx_share 10mbit tx_max 100mbit
      
      Add + set command can be combined:
      
      $ devlink port function rate add netdevsim/netdevsim10/group1 \
              tx_share 10mbit tx_max 100mbit
      
      $ devlink port function rate show netdevsim/netdevsim10/group1
      netdevsim/netdevsim10/group1: type node tx_share 10mbit tx_max 100mbit
      
      $ devlink port function rate del netdevsim/netdevsim10/group1
      Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftest: netdevsim: Add devlink port shared/max tx rate test · 31f07233
      Dmytro Linkin authored
      Test verifies that netdevsim VFs can set and retrieve shared/max tx
      rate through new devlink API.
      Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdevsim: Implement devlink rate leafs tx rate support · 605c4f8f
      Dmytro Linkin authored
      Implement new devlink ops that allow shared and max tx rate control for
      devlink port rate objects (leafs) through devlink API.
      
      Expose rate values of VF ports to netdevsim debugfs.
      Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Allow setting tx rate for devlink rate leaf objects · 1897db2e
      Dmytro Linkin authored
      Implement support for DEVLINK_CMD_RATE_SET command with new attributes
      DEVLINK_ATTR_RATE_TX_{SHARE|MAX} that are used to set devlink rate
      shared/max tx rate values. Extend devlink ops with new callbacks
      rate_leaf_tx_{share|max}_set() to allow supporting drivers to implement
      rate control through devlink.
      
      New attributes are optional. Driver implementations are allowed to
      support either or both of them.
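One way to picture "optional attributes, optional callbacks" is a dispatcher that rejects a request up front when it carries an attribute the driver has no callback for. The sketch below is a user-space model under that assumption; `rate_leaf_set` and `MaxOnlyDriver` are hypothetical, and returning `-EOPNOTSUPP` for an unsupported attribute is assumed rather than taken from the commit text.

```python
import errno


class MaxOnlyDriver:
    """Hypothetical driver that implements tx_max but not tx_share."""

    def rate_leaf_tx_max_set(self, leaf, value):
        leaf["tx_max"] = value


def rate_leaf_set(ops, leaf, attrs):
    """Apply whichever of tx_share/tx_max appear in attrs.

    Every attribute present is checked for a matching callback before
    any callback runs, so an unsupported request changes nothing.
    """
    present = [key for key in ("tx_share", "tx_max") if key in attrs]
    for key in present:
        if not hasattr(ops, f"rate_leaf_{key}_set"):
            return -errno.EOPNOTSUPP
    for key in present:
        getattr(ops, f"rate_leaf_{key}_set")(leaf, attrs[key])
    return 0
```

Checking support for all attributes first avoids a half-applied request when tx_share and tx_max arrive together and only one is supported.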
      
      Shared rate example:
      
      $ devlink port function rate set netdevsim/netdevsim10/0 tx_share 10mbit
      
      $ devlink port function rate show netdevsim/netdevsim10/0
      netdevsim/netdevsim10/0: type leaf tx_share 10mbit
      
      Max rate example:
      
      $ devlink port function rate set netdevsim/netdevsim10/0 tx_max 100mbit
      
      $ devlink port function rate show netdevsim/netdevsim10/0
      netdevsim/netdevsim10/0: type leaf tx_max 100mbit
      Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftest: netdevsim: Add devlink rate test · a27d8e35
      Dmytro Linkin authored
      Test verifies that all netdevsim VF ports have rate leaf object created
      by default.
      Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdevsim: Register devlink rate leaf objects per VF · 885dfe12
      Dmytro Linkin authored
      Register devlink rate leaf objects per VF.
      Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Introduce rate object · 4677efc4
      Dmytro Linkin authored
      Allow registering rate object for devlink ports with dedicated
      devlink_rate_leaf_{create|destroy}() API. Implement new netlink
      DEVLINK_CMD_RATE_GET command that is used to retrieve rate object info.
      Add new DEVLINK_CMD_RATE_{NEW|DEL} commands that are used for
      notifications when creating/deleting leaf rate object.
      
      Rate API is intended to be used for rate limiting of individual
      devlink ports (leafs) and their aggregates (nodes).
      
      Example:
      
      $ devlink port show
      pci/0000:03:00.0/0
      pci/0000:03:00.0/1
      
      $ devlink port function rate show
      pci/0000:03:00.0/0: type leaf
      pci/0000:03:00.0/1: type leaf
      Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdevsim: Implement legacy/switchdev mode for VFs · 160dc373
      Dmytro Linkin authored
      Implement callbacks to set/get eswitch mode value. Add helpers to check
      current mode.
      
      Instantiate VFs' net devices and devlink ports on switchdev enabling and
      remove them on legacy enabling. Changing number of VFs while in
      switchdev mode triggers VFs creation/deletion.
      
      Also disable NDO API callback to set VF rate, since it's legacy API.
      Switchdev API to set VF rate will be implemented in one of the next
      patches.
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdevsim: Implement VFs · 92ba1f29
      Dmytro Linkin authored
      Allow creation of netdevsim ports for VFs along with allocations of
      corresponding net devices and devlink ports.
      Add enums and helpers to distinguish PFs' ports from VFs' ports.
      
      Ports creation/deletion debugfs API intended to be used with physical
      ports only.
      VFs instantiation will be done in one of the next patches.
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdevsim: Implement port types and indexing · 814b9ce6
      Dmytro Linkin authored
      Define type of ports, which netdevsim driver currently operates with as
      PF. Define new port type - VF, which will be implemented in following
      patches. Add helper functions to distinguish them. Add helper function
      to get VF index from port index.
      
      Add port indexing logic where PFs' indexes start from 0 and VFs' indexes
      start from NSIM_DEV_VF_PORT_INDEX_BASE.
      All ports use the same index pool, which means that a PF port may be
      created with an index from the VFs' index range.
      The maximum number of VFs that the driver can allocate is limited by
      UINT_MAX - BASE.
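The indexing scheme can be sketched as follows. This is a user-space model, not netdevsim code: the numeric value chosen for `NSIM_DEV_VF_PORT_INDEX_BASE` is an assumption for illustration, and `Port`, `vf_port` and `vf_id_of` are hypothetical names. Since a PF port may land in the VF index range, the sketch tells port types apart by an explicit type field rather than by index.

```python
from dataclasses import dataclass

UINT_MAX = 2**32 - 1
NSIM_DEV_VF_PORT_INDEX_BASE = 128  # assumed value, for illustration only
MAX_VFS = UINT_MAX - NSIM_DEV_VF_PORT_INDEX_BASE


@dataclass(frozen=True)
class Port:
    port_type: str   # "PF" or "VF"
    port_index: int


def vf_port(vf_id: int) -> Port:
    """Build the port for VF number vf_id (0-based)."""
    if not 0 <= vf_id < MAX_VFS:
        raise ValueError(f"vf_id {vf_id} out of range")
    return Port("VF", NSIM_DEV_VF_PORT_INDEX_BASE + vf_id)


def vf_id_of(port: Port) -> int:
    """Recover the VF number from a VF port's index."""
    if port.port_type != "VF":
        raise ValueError("not a VF port")
    return port.port_index - NSIM_DEV_VF_PORT_INDEX_BASE
```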
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdevsim: Disable VFs on nsim_dev_reload_destroy() call · 32ac15d8
      Dmytro Linkin authored
      Move VFs disabling from device release() to nsim_dev_reload_destroy() to
      make VFs disabling and ports removal simultaneous.
      This is a requirement for VFs ports implemented in next patches.
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdevsim: Add max_vfs to bus_dev · d3953819
      Dmytro Linkin authored
      Currently there is no limit to the number of VFs netdevsim can enable.
      In real systems this value exists and is used by the driver.
      For example, some features might need to consider this value when
      allocating memory.

      Expose the max_vfs variable to debugfs as a configurable resource. If VFs
      are configured (num_vfs != 0), changing max_vfs is not allowed.
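The interaction between max_vfs and num_vfs can be modelled as below. This is a hypothetical user-space sketch, not the netdevsim bus code: `BusDev` and its methods are illustrative names, and rejecting a num_vfs above max_vfs is an assumption about how the limit is enforced.

```python
class BusDev:
    """Hypothetical model of a bus device with a configurable VF limit."""

    def __init__(self, max_vfs):
        self.max_vfs = max_vfs
        self.num_vfs = 0

    def set_num_vfs(self, num_vfs):
        # Assumption: enabling more VFs than max_vfs is rejected.
        if not 0 <= num_vfs <= self.max_vfs:
            raise ValueError("num_vfs exceeds max_vfs")
        self.num_vfs = num_vfs

    def set_max_vfs(self, max_vfs):
        # max_vfs may only change while no VFs are configured.
        if self.num_vfs != 0:
            raise RuntimeError("VFs are configured; disable them first")
        self.max_vfs = max_vfs
```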
      Co-developed-by: Yuval Avnery <yuvalav@nvidia.com>
      Signed-off-by: Yuval Avnery <yuvalav@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'nfp-ct-offload' · 53c7bb55
      David S. Miller authored
      Simon Horman says:
      
      ====================
      Introduce conntrack offloading to the nfp driver
      
      Louis Peens says:
      
      This is the first in a series of patches to offload conntrack
      to the nfp. The approach followed is to flatten out three
      different flow rules into a single offloaded flow. The three
      different flows are:
      
      1) The rule sending the packet to conntrack (pre_ct)
      2) The rule matching on +trk+est after a packet has been through
         conntrack. (post_ct)
      3) The rule received via callback from the netfilter (nft)
      
      In order to offload a flow we need a combination of all three flows, but
      they could be added/deleted at different times and in different order.
      
      To solve this we save potential offloadable CT flows in the driver,
      and every time we receive a callback we check against these saved flows
      for valid merges. Once we have a valid combination of all three flows
      this will be offloaded to the NFP. This is demonstrated in the diagram
      below.
      
      	+-------------+                      +----------+
      	| pre_ct flow +--------+             | nft flow |
      	+-------------+        v             +------+---+
      	                  +----------+              |
      	                  | tc_merge +--------+     |
      	                  +----------+        v     v
      	+--------------+       ^           +-------------+
      	| post_ct flow +-------+       +---+nft_tc merge |
      	+--------------+               |   +-------------+
      	                               |
      	                               |
      	                               |
      	                               v
      	                        Offload to nfp
      
      This series is only up to the point of the pre_ct and post_ct
      merges into the tc_merge. Follow up series will continue
      to add the nft flows and merging of these flows with the result
      of the pre_ct and post_ct merged flows.
      
      Changes since v2:
      - nfp: flower-ct: add zone table entry when handling pre/post_ct flows
          Fixed another docstring. Should finally have the patch check
          environment properly configured now to avoid more of these.
      - nfp: flower-ct: add tc merge functionality
          Fixed warning found by "kernel test robot <lkp@intel.com>"
          Added code comment explaining chain_index comparison
      
      Changes since v1:
      - nfp: flower-ct: add ct zone table
          Fixed unused variable compile warning
          Fixed missing colon in struct description
      ====================
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: flower-ct: add tc merge functionality · 3c863c30
      Louis Peens authored
      Add merging of pre/post_ct flow rules into the tc_merge table.
      Pre_ct flows need to be merged with post_ct flows and vice versa.
      
      This needs to be done for all flows in the same zone table, as well
      as with the wc_zone_table, which is for flows masking out ct_zone
      info.
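The merge bookkeeping described above can be sketched as a small user-space model. It is not the nfp driver code: `Flow`, `CtTables` and `WILDCARD` are hypothetical names, and treating "same chain index" as the merge condition is an assumption made only to give the sketch a concrete check.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

WILDCARD = None  # key for flows that mask out ct_zone (the wc_zone_table)


@dataclass(frozen=True)
class Flow:
    cookie: int
    kind: str             # "pre_ct" or "post_ct"
    zone: Optional[int]   # ct zone, or WILDCARD
    chain: int            # assumed merge key (see lead-in)


class CtTables:
    def __init__(self):
        self.zones = defaultdict(list)  # zone -> saved flows
        self.tc_merge = []              # (pre_ct cookie, post_ct cookie)

    def add(self, flow):
        """Save a flow and merge it with every compatible saved flow."""
        if flow.zone is WILDCARD:
            # A wildcard flow is checked against every zone table.
            candidates = [f for fl in self.zones.values() for f in fl]
        else:
            # Same zone table, plus the wildcard zone table.
            candidates = self.zones[flow.zone] + self.zones[WILDCARD]
        for other in candidates:
            if other.kind != flow.kind and other.chain == flow.chain:
                pre, post = (flow, other) if flow.kind == "pre_ct" else (other, flow)
                self.tc_merge.append((pre.cookie, post.cookie))
        self.zones[flow.zone].append(flow)
```

Because every new flow is checked against the flows already saved, the result does not depend on whether the pre_ct or the post_ct rule arrives first.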
      
      Cleanup happens when all the tables are cleared up, and prints a
      warning traceback, as this is not expected in the final version.
      At this point we are not actually returning success for the offload,
      so we do not get any delete requests for flows, so we can't delete
      them that way yet. This means that cleanup happens in what would
      usually be an exception path.
      Signed-off-by: Louis Peens <louis.peens@corigine.com>
      Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>