Commit 270d47dc authored by David S. Miller

Merge branch 'devlink-rate-objects'

Dmytro Linkin says:

====================
devlink: rate objects API

Resending without RFC.

Currently the kernel provides a way to change the tx rate of a single VF
in switchdev mode via the tc-police action. When lots of VFs are
configured, management of their rates becomes a non-trivial task and some
grouping mechanism is required. Implementing such grouping in tc-police
would bring flow-related limitations and unwanted complications, such as:
- tc-police is a policer and there is a user request for a traffic
  shaper, so a shared tc-police action is not suitable;
- flows require a net device to be placed on, which means "groups" would
  not have a net device instance of their own. Taking the previous point
  into account, a solution was reviewed in which the representor has a
  policer and the driver uses a shaper if the qdisc contains a group of
  VFs; such an approach is ugly, complicated and misleading;
- TC is ingress only, while configuring the "other" side of the wire looks
  more like a "real" picture where shaping is outside of the steering
  world, similar to the "ip link" command.

Given that, devlink is the most appropriate place.

This series introduces a devlink API for managing the tx rate of a single
devlink port or of a group by invoking callbacks (see below) of the
corresponding driver. A devlink port or a group can also be added to a
parent group, in which case the driver is responsible for handling the
rates of the group's elements. To achieve all of that, a new rate object
is added. It can be one of two types:
- leaf - represents a single devlink port; created/destroyed by the
  driver and bound to the devlink port. For example, a driver may create
  a leaf rate object for every devlink port associated with a VF. Since a
  leaf has a 1:1 mapping to its devlink port, in userspace it is referred
  to as pci/<bus_addr>/<port_index>;
- node - represents a group of rate objects; created/deleted by request
  from userspace; initially empty (no rate objects added). In userspace
  it is referred to as pci/<bus_addr>/<node_name>, where the node name
  can be anything except a decimal number, to avoid collisions with leafs.
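
The naming rule for nodes can be illustrated with a small userspace check.
This is only a sketch of the rule as stated above, not the devlink
implementation; the function name rate_node_name_is_valid() is made up for
illustration, and rejecting the empty string is an extra assumption.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if `name` could serve as a rate node
 * name, i.e. it is non-empty and is not a plain decimal number (which
 * would be indistinguishable from a leaf's port index).
 */
bool rate_node_name_is_valid(const char *name)
{
	size_t len = strlen(name);

	/* Assumption: an empty name is never acceptable. */
	if (len == 0)
		return false;

	/* strspn() returns the length of the leading run of digits; if that
	 * run covers the whole string, the name is a decimal number. */
	return strspn(name, "0123456789") != len;
}
```

With this check, "group1" and "1a" are accepted, while "128" is rejected
because it could be mistaken for a port index.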

devlink_ops is extended with the following callbacks:
- rate_{leaf|node}_tx_{share|max}_set
- rate_node_{new|del}
- rate_{leaf|node}_parent_set

KAPI provides:
- creation/destruction of the leaf rate object associated with devlink
  port
- destruction of rate nodes, to allow a vendor driver to free allocated
  resources on driver removal or for other reasons that require node
  destruction

UAPI provides:
- dumping all or single rate objects
- setting tx_{share|max} of rate object of any type
- creating/deleting node rate object
- setting/unsetting parent of any rate object
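
The selftest included in this series sets rates in mbit and then checks the
value reported by the API with "* 8 / 1000000", which suggests that
tx_share and tx_max are reported in bytes per second. The conversion can be
sketched as follows; the helper names are hypothetical and the unit
interpretation is inferred from the selftest arithmetic only.

```c
#include <stdint.h>

/* Hypothetical helper: megabits per second -> bytes per second. */
uint64_t mbit_to_bytes_per_sec(uint64_t mbit)
{
	return mbit * 1000000ULL / 8;
}

/* Hypothetical helper: bytes per second -> megabits per second,
 * mirroring the "* 8 / 1000000" arithmetic used by the selftest. */
uint64_t bytes_per_sec_to_mbit(uint64_t bytes_per_sec)
{
	return bytes_per_sec * 8 / 1000000ULL;
}
```

Under this reading, a rate set as 10mbit would be reported as 1250000.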

Added devlink rate object support for the netdevsim driver.

Issues/open questions:
- Does the user need a DEVLINK_CMD_RATE_DEL_ALL_CHILD command to clean all
  children of a particular parent node? For example:
  $ devlink port function rate flush netdevsim/netdevsim10/group
- The priv pointer passed to the callbacks is a source of bugs; in the
  leaf case the driver can embed the rate object into an internal
  structure and use container_of() on it; in the node case this cannot be
  done, since nodes are created from userspace
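
The container_of() idea for the leaf case can be sketched in userspace as
follows. Everything here is hypothetical: struct demo_rate and struct
demo_vf are stand-ins made up for illustration, the macro is a simplified
userspace version of the kernel's container_of(), and the callback returns
the VF index only to make the recovered pointer visible.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for a leaf rate object. */
struct demo_rate {
	uint64_t tx_share;
	uint64_t tx_max;
};

/* Hypothetical driver-private per-VF state with the rate object embedded. */
struct demo_vf {
	unsigned int vf_index;
	struct demo_rate rate;
};

/*
 * Hypothetical leaf callback: the enclosing per-VF structure is recovered
 * from the embedded rate object, so no separate priv pointer is needed.
 */
int demo_leaf_tx_max_set(struct demo_rate *rate, uint64_t tx_max)
{
	struct demo_vf *vf = container_of(rate, struct demo_vf, rate);

	vf->rate.tx_max = tx_max;
	return (int)vf->vf_index;
}
```

The same trick does not carry over to nodes, because a node is not part of
any driver structure at the moment userspace asks for it to be created.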

v1->v2:
- fixed kernel-doc for devlink_rate_leaf_{create|destroy}()
- s/func/function/ for all devlink port command occurrences

v2->v3:
- devlink:
  - added devlink_rate_nodes_destroy() function
- netdevsim:
  - added call of devlink_rate_nodes_destroy() function
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
parents 53c7bb55 b62767e7
...@@ -164,6 +164,41 @@ device to instantiate the subfunction device on particular PCI function. ...@@ -164,6 +164,41 @@ device to instantiate the subfunction device on particular PCI function.
A subfunction device is created on the :ref:`Documentation/driver-api/auxiliary_bus.rst <auxiliary_bus>`. A subfunction device is created on the :ref:`Documentation/driver-api/auxiliary_bus.rst <auxiliary_bus>`.
At this point a matching subfunction driver binds to the subfunction's auxiliary device. At this point a matching subfunction driver binds to the subfunction's auxiliary device.
Rate object management
======================
Devlink provides an API to manage the tx rates of a single devlink port or a group.
This is done through rate objects, which can be one of two types:
``leaf``
Represents a single devlink port; created/destroyed by the driver. Since a leaf
has a 1:1 mapping to its devlink port, in userspace it is referred to as
``pci/<bus_addr>/<port_index>``;
``node``
Represents a group of rate objects (leafs and/or nodes); created/deleted by
request from userspace; initially empty (no rate objects added). In
userspace it is referred to as ``pci/<bus_addr>/<node_name>``, where
``node_name`` can be any identifier except a decimal number, to avoid
collisions with leafs.
The API allows configuring the following rate object parameters:
``tx_share``
Minimum TX rate value shared among all other rate objects, or among rate
objects that are part of the parent group, if it is part of the same group.
``tx_max``
Maximum TX rate value.
``parent``
Parent node name. Parent node rate limits are considered as additional limits
to all node children limits. ``tx_max`` is an upper limit for children.
``tx_share`` is a total bandwidth distributed among children.
Driver implementations are allowed to support both or either rate object types
and setting methods of their parameters.
Terms and Definitions Terms and Definitions
===================== =====================
......
...@@ -57,6 +57,32 @@ entries, FIB rule entries and nexthops that the driver will allow. ...@@ -57,6 +57,32 @@ entries, FIB rule entries and nexthops that the driver will allow.
$ devlink resource set netdevsim/netdevsim0 path /nexthops size 16 $ devlink resource set netdevsim/netdevsim0 path /nexthops size 16
$ devlink dev reload netdevsim/netdevsim0 $ devlink dev reload netdevsim/netdevsim0
Rate objects
============
The ``netdevsim`` driver supports rate objects management, which includes:
- registering/unregistering leaf rate objects per VF devlink port;
- creation/deletion of node rate objects;
- setting tx_share and tx_max rate values for any rate object type;
- setting parent node for any rate object type.
Rate nodes and their parameters are exposed in ``netdevsim`` debugfs in RO mode.
For example, for a created rate node named ``some_group``:
.. code:: shell
$ ls /sys/kernel/debug/netdevsim/netdevsim0/rate_groups/some_group
rate_parent tx_max tx_share
The same parameters are exposed for leaf objects in the corresponding ports directories.
For example:
.. code:: shell
$ ls /sys/kernel/debug/netdevsim/netdevsim0/ports/1
dev ethtool rate_parent tx_max tx_share
Driver-specific Traps Driver-specific Traps
===================== =====================
......
...@@ -27,21 +27,34 @@ static struct nsim_bus_dev *to_nsim_bus_dev(struct device *dev) ...@@ -27,21 +27,34 @@ static struct nsim_bus_dev *to_nsim_bus_dev(struct device *dev)
static int nsim_bus_dev_vfs_enable(struct nsim_bus_dev *nsim_bus_dev, static int nsim_bus_dev_vfs_enable(struct nsim_bus_dev *nsim_bus_dev,
unsigned int num_vfs) unsigned int num_vfs)
{ {
nsim_bus_dev->vfconfigs = kcalloc(num_vfs, struct nsim_dev *nsim_dev;
sizeof(struct nsim_vf_config), int err = 0;
GFP_KERNEL | __GFP_NOWARN);
if (nsim_bus_dev->max_vfs < num_vfs)
return -ENOMEM;
if (!nsim_bus_dev->vfconfigs) if (!nsim_bus_dev->vfconfigs)
return -ENOMEM; return -ENOMEM;
nsim_bus_dev->num_vfs = num_vfs; nsim_bus_dev->num_vfs = num_vfs;
return 0; nsim_dev = dev_get_drvdata(&nsim_bus_dev->dev);
if (nsim_esw_mode_is_switchdev(nsim_dev)) {
err = nsim_esw_switchdev_enable(nsim_dev, NULL);
if (err)
nsim_bus_dev->num_vfs = 0;
}
return err;
} }
static void nsim_bus_dev_vfs_disable(struct nsim_bus_dev *nsim_bus_dev) void nsim_bus_dev_vfs_disable(struct nsim_bus_dev *nsim_bus_dev)
{ {
kfree(nsim_bus_dev->vfconfigs); struct nsim_dev *nsim_dev;
nsim_bus_dev->vfconfigs = NULL;
nsim_bus_dev->num_vfs = 0; nsim_bus_dev->num_vfs = 0;
nsim_dev = dev_get_drvdata(&nsim_bus_dev->dev);
if (nsim_esw_mode_is_switchdev(nsim_dev))
nsim_esw_legacy_enable(nsim_dev, NULL);
} }
static ssize_t static ssize_t
...@@ -56,7 +69,7 @@ nsim_bus_dev_numvfs_store(struct device *dev, struct device_attribute *attr, ...@@ -56,7 +69,7 @@ nsim_bus_dev_numvfs_store(struct device *dev, struct device_attribute *attr,
if (ret) if (ret)
return ret; return ret;
rtnl_lock(); mutex_lock(&nsim_bus_dev->vfs_lock);
if (nsim_bus_dev->num_vfs == num_vfs) if (nsim_bus_dev->num_vfs == num_vfs)
goto exit_good; goto exit_good;
if (nsim_bus_dev->num_vfs && num_vfs) { if (nsim_bus_dev->num_vfs && num_vfs) {
...@@ -74,7 +87,7 @@ nsim_bus_dev_numvfs_store(struct device *dev, struct device_attribute *attr, ...@@ -74,7 +87,7 @@ nsim_bus_dev_numvfs_store(struct device *dev, struct device_attribute *attr,
exit_good: exit_good:
ret = count; ret = count;
exit_unlock: exit_unlock:
rtnl_unlock(); mutex_unlock(&nsim_bus_dev->vfs_lock);
return ret; return ret;
} }
...@@ -92,6 +105,79 @@ static struct device_attribute nsim_bus_dev_numvfs_attr = ...@@ -92,6 +105,79 @@ static struct device_attribute nsim_bus_dev_numvfs_attr =
__ATTR(sriov_numvfs, 0664, nsim_bus_dev_numvfs_show, __ATTR(sriov_numvfs, 0664, nsim_bus_dev_numvfs_show,
nsim_bus_dev_numvfs_store); nsim_bus_dev_numvfs_store);
ssize_t nsim_bus_dev_max_vfs_read(struct file *file,
char __user *data,
size_t count, loff_t *ppos)
{
struct nsim_bus_dev *nsim_bus_dev = file->private_data;
char buf[11];
ssize_t len;
len = snprintf(buf, sizeof(buf), "%u\n", nsim_bus_dev->max_vfs);
if (len < 0)
return len;
return simple_read_from_buffer(data, count, ppos, buf, len);
}
ssize_t nsim_bus_dev_max_vfs_write(struct file *file,
const char __user *data,
size_t count, loff_t *ppos)
{
struct nsim_bus_dev *nsim_bus_dev = file->private_data;
struct nsim_vf_config *vfconfigs;
ssize_t ret;
char buf[10];
u32 val;
if (*ppos != 0)
return 0;
if (count >= sizeof(buf))
return -ENOSPC;
mutex_lock(&nsim_bus_dev->vfs_lock);
/* Reject if VFs are configured */
if (nsim_bus_dev->num_vfs) {
ret = -EBUSY;
goto unlock;
}
ret = copy_from_user(buf, data, count);
if (ret) {
ret = -EFAULT;
goto unlock;
}
buf[count] = '\0';
ret = kstrtouint(buf, 10, &val);
if (ret) {
ret = -EIO;
goto unlock;
}
/* max_vfs limited by the maximum number of provided port indexes */
if (val > NSIM_DEV_VF_PORT_INDEX_MAX - NSIM_DEV_VF_PORT_INDEX_BASE) {
ret = -ERANGE;
goto unlock;
}
vfconfigs = kcalloc(val, sizeof(struct nsim_vf_config), GFP_KERNEL | __GFP_NOWARN);
if (!vfconfigs) {
ret = -ENOMEM;
goto unlock;
}
kfree(nsim_bus_dev->vfconfigs);
nsim_bus_dev->vfconfigs = vfconfigs;
nsim_bus_dev->max_vfs = val;
*ppos += count;
ret = count;
unlock:
mutex_unlock(&nsim_bus_dev->vfs_lock);
return ret;
}
static ssize_t static ssize_t
new_port_store(struct device *dev, struct device_attribute *attr, new_port_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count) const char *buf, size_t count)
...@@ -113,7 +199,7 @@ new_port_store(struct device *dev, struct device_attribute *attr, ...@@ -113,7 +199,7 @@ new_port_store(struct device *dev, struct device_attribute *attr,
mutex_lock(&nsim_bus_dev->nsim_bus_reload_lock); mutex_lock(&nsim_bus_dev->nsim_bus_reload_lock);
devlink_reload_disable(devlink); devlink_reload_disable(devlink);
ret = nsim_dev_port_add(nsim_bus_dev, port_index); ret = nsim_dev_port_add(nsim_bus_dev, NSIM_DEV_PORT_TYPE_PF, port_index);
devlink_reload_enable(devlink); devlink_reload_enable(devlink);
mutex_unlock(&nsim_bus_dev->nsim_bus_reload_lock); mutex_unlock(&nsim_bus_dev->nsim_bus_reload_lock);
return ret ? ret : count; return ret ? ret : count;
...@@ -142,7 +228,7 @@ del_port_store(struct device *dev, struct device_attribute *attr, ...@@ -142,7 +228,7 @@ del_port_store(struct device *dev, struct device_attribute *attr,
mutex_lock(&nsim_bus_dev->nsim_bus_reload_lock); mutex_lock(&nsim_bus_dev->nsim_bus_reload_lock);
devlink_reload_disable(devlink); devlink_reload_disable(devlink);
ret = nsim_dev_port_del(nsim_bus_dev, port_index); ret = nsim_dev_port_del(nsim_bus_dev, NSIM_DEV_PORT_TYPE_PF, port_index);
devlink_reload_enable(devlink); devlink_reload_enable(devlink);
mutex_unlock(&nsim_bus_dev->nsim_bus_reload_lock); mutex_unlock(&nsim_bus_dev->nsim_bus_reload_lock);
return ret ? ret : count; return ret ? ret : count;
...@@ -168,9 +254,6 @@ static const struct attribute_group *nsim_bus_dev_attr_groups[] = { ...@@ -168,9 +254,6 @@ static const struct attribute_group *nsim_bus_dev_attr_groups[] = {
static void nsim_bus_dev_release(struct device *dev) static void nsim_bus_dev_release(struct device *dev)
{ {
struct nsim_bus_dev *nsim_bus_dev = to_nsim_bus_dev(dev);
nsim_bus_dev_vfs_disable(nsim_bus_dev);
} }
static struct device_type nsim_bus_dev_type = { static struct device_type nsim_bus_dev_type = {
...@@ -311,6 +394,8 @@ static struct bus_type nsim_bus = { ...@@ -311,6 +394,8 @@ static struct bus_type nsim_bus = {
.num_vf = nsim_num_vf, .num_vf = nsim_num_vf,
}; };
#define NSIM_BUS_DEV_MAX_VFS 4
static struct nsim_bus_dev * static struct nsim_bus_dev *
nsim_bus_dev_new(unsigned int id, unsigned int port_count) nsim_bus_dev_new(unsigned int id, unsigned int port_count)
{ {
...@@ -329,15 +414,28 @@ nsim_bus_dev_new(unsigned int id, unsigned int port_count) ...@@ -329,15 +414,28 @@ nsim_bus_dev_new(unsigned int id, unsigned int port_count)
nsim_bus_dev->dev.type = &nsim_bus_dev_type; nsim_bus_dev->dev.type = &nsim_bus_dev_type;
nsim_bus_dev->port_count = port_count; nsim_bus_dev->port_count = port_count;
nsim_bus_dev->initial_net = current->nsproxy->net_ns; nsim_bus_dev->initial_net = current->nsproxy->net_ns;
nsim_bus_dev->max_vfs = NSIM_BUS_DEV_MAX_VFS;
mutex_init(&nsim_bus_dev->nsim_bus_reload_lock); mutex_init(&nsim_bus_dev->nsim_bus_reload_lock);
mutex_init(&nsim_bus_dev->vfs_lock);
/* Disallow using nsim_bus_dev */ /* Disallow using nsim_bus_dev */
smp_store_release(&nsim_bus_dev->init, false); smp_store_release(&nsim_bus_dev->init, false);
nsim_bus_dev->vfconfigs = kcalloc(nsim_bus_dev->max_vfs,
sizeof(struct nsim_vf_config),
GFP_KERNEL | __GFP_NOWARN);
if (!nsim_bus_dev->vfconfigs) {
err = -ENOMEM;
goto err_nsim_bus_dev_id_free;
}
err = device_register(&nsim_bus_dev->dev); err = device_register(&nsim_bus_dev->dev);
if (err) if (err)
goto err_nsim_bus_dev_id_free; goto err_nsim_vfs_free;
return nsim_bus_dev; return nsim_bus_dev;
err_nsim_vfs_free:
kfree(nsim_bus_dev->vfconfigs);
err_nsim_bus_dev_id_free: err_nsim_bus_dev_id_free:
ida_free(&nsim_bus_dev_ids, nsim_bus_dev->dev.id); ida_free(&nsim_bus_dev_ids, nsim_bus_dev->dev.id);
err_nsim_bus_dev_free: err_nsim_bus_dev_free:
...@@ -351,6 +449,7 @@ static void nsim_bus_dev_del(struct nsim_bus_dev *nsim_bus_dev) ...@@ -351,6 +449,7 @@ static void nsim_bus_dev_del(struct nsim_bus_dev *nsim_bus_dev)
smp_store_release(&nsim_bus_dev->init, false); smp_store_release(&nsim_bus_dev->init, false);
device_unregister(&nsim_bus_dev->dev); device_unregister(&nsim_bus_dev->dev);
ida_free(&nsim_bus_dev_ids, nsim_bus_dev->dev.id); ida_free(&nsim_bus_dev_ids, nsim_bus_dev->dev.id);
kfree(nsim_bus_dev->vfconfigs);
kfree(nsim_bus_dev); kfree(nsim_bus_dev);
} }
......
This diff is collapsed.
...@@ -113,6 +113,11 @@ static int nsim_set_vf_rate(struct net_device *dev, int vf, int min, int max) ...@@ -113,6 +113,11 @@ static int nsim_set_vf_rate(struct net_device *dev, int vf, int min, int max)
struct netdevsim *ns = netdev_priv(dev); struct netdevsim *ns = netdev_priv(dev);
struct nsim_bus_dev *nsim_bus_dev = ns->nsim_bus_dev; struct nsim_bus_dev *nsim_bus_dev = ns->nsim_bus_dev;
if (nsim_esw_mode_is_switchdev(ns->nsim_dev)) {
pr_err("Not supported in switchdev mode. Please use devlink API.\n");
return -EOPNOTSUPP;
}
if (vf >= nsim_bus_dev->num_vfs) if (vf >= nsim_bus_dev->num_vfs)
return -EINVAL; return -EINVAL;
...@@ -261,6 +266,18 @@ static const struct net_device_ops nsim_netdev_ops = { ...@@ -261,6 +266,18 @@ static const struct net_device_ops nsim_netdev_ops = {
.ndo_get_devlink_port = nsim_get_devlink_port, .ndo_get_devlink_port = nsim_get_devlink_port,
}; };
static const struct net_device_ops nsim_vf_netdev_ops = {
.ndo_start_xmit = nsim_start_xmit,
.ndo_set_rx_mode = nsim_set_rx_mode,
.ndo_set_mac_address = eth_mac_addr,
.ndo_validate_addr = eth_validate_addr,
.ndo_change_mtu = nsim_change_mtu,
.ndo_get_stats64 = nsim_get_stats64,
.ndo_setup_tc = nsim_setup_tc,
.ndo_set_features = nsim_set_features,
.ndo_get_devlink_port = nsim_get_devlink_port,
};
static void nsim_setup(struct net_device *dev) static void nsim_setup(struct net_device *dev)
{ {
ether_setup(dev); ether_setup(dev);
...@@ -280,6 +297,49 @@ static void nsim_setup(struct net_device *dev) ...@@ -280,6 +297,49 @@ static void nsim_setup(struct net_device *dev)
dev->max_mtu = ETH_MAX_MTU; dev->max_mtu = ETH_MAX_MTU;
} }
static int nsim_init_netdevsim(struct netdevsim *ns)
{
int err;
ns->netdev->netdev_ops = &nsim_netdev_ops;
err = nsim_udp_tunnels_info_create(ns->nsim_dev, ns->netdev);
if (err)
return err;
rtnl_lock();
err = nsim_bpf_init(ns);
if (err)
goto err_utn_destroy;
nsim_ipsec_init(ns);
err = register_netdevice(ns->netdev);
if (err)
goto err_ipsec_teardown;
rtnl_unlock();
return 0;
err_ipsec_teardown:
nsim_ipsec_teardown(ns);
nsim_bpf_uninit(ns);
err_utn_destroy:
rtnl_unlock();
nsim_udp_tunnels_info_destroy(ns->netdev);
return err;
}
static int nsim_init_netdevsim_vf(struct netdevsim *ns)
{
int err;
ns->netdev->netdev_ops = &nsim_vf_netdev_ops;
rtnl_lock();
err = register_netdevice(ns->netdev);
rtnl_unlock();
return err;
}
struct netdevsim * struct netdevsim *
nsim_create(struct nsim_dev *nsim_dev, struct nsim_dev_port *nsim_dev_port) nsim_create(struct nsim_dev *nsim_dev, struct nsim_dev_port *nsim_dev_port)
{ {
...@@ -299,33 +359,15 @@ nsim_create(struct nsim_dev *nsim_dev, struct nsim_dev_port *nsim_dev_port) ...@@ -299,33 +359,15 @@ nsim_create(struct nsim_dev *nsim_dev, struct nsim_dev_port *nsim_dev_port)
ns->nsim_dev_port = nsim_dev_port; ns->nsim_dev_port = nsim_dev_port;
ns->nsim_bus_dev = nsim_dev->nsim_bus_dev; ns->nsim_bus_dev = nsim_dev->nsim_bus_dev;
SET_NETDEV_DEV(dev, &ns->nsim_bus_dev->dev); SET_NETDEV_DEV(dev, &ns->nsim_bus_dev->dev);
dev->netdev_ops = &nsim_netdev_ops;
nsim_ethtool_init(ns); nsim_ethtool_init(ns);
if (nsim_dev_port_is_pf(nsim_dev_port))
err = nsim_udp_tunnels_info_create(nsim_dev, dev); err = nsim_init_netdevsim(ns);
else
err = nsim_init_netdevsim_vf(ns);
if (err) if (err)
goto err_free_netdev; goto err_free_netdev;
rtnl_lock();
err = nsim_bpf_init(ns);
if (err)
goto err_utn_destroy;
nsim_ipsec_init(ns);
err = register_netdevice(dev);
if (err)
goto err_ipsec_teardown;
rtnl_unlock();
return ns; return ns;
err_ipsec_teardown:
nsim_ipsec_teardown(ns);
nsim_bpf_uninit(ns);
err_utn_destroy:
rtnl_unlock();
nsim_udp_tunnels_info_destroy(dev);
err_free_netdev: err_free_netdev:
free_netdev(dev); free_netdev(dev);
return ERR_PTR(err); return ERR_PTR(err);
...@@ -337,9 +379,12 @@ void nsim_destroy(struct netdevsim *ns) ...@@ -337,9 +379,12 @@ void nsim_destroy(struct netdevsim *ns)
rtnl_lock(); rtnl_lock();
unregister_netdevice(dev); unregister_netdevice(dev);
if (nsim_dev_port_is_pf(ns->nsim_dev_port)) {
nsim_ipsec_teardown(ns); nsim_ipsec_teardown(ns);
nsim_bpf_uninit(ns); nsim_bpf_uninit(ns);
}
rtnl_unlock(); rtnl_unlock();
if (nsim_dev_port_is_pf(ns->nsim_dev_port))
nsim_udp_tunnels_info_destroy(dev); nsim_udp_tunnels_info_destroy(dev);
free_netdev(dev); free_netdev(dev);
} }
......
...@@ -197,11 +197,22 @@ static inline void nsim_dev_psample_exit(struct nsim_dev *nsim_dev) ...@@ -197,11 +197,22 @@ static inline void nsim_dev_psample_exit(struct nsim_dev *nsim_dev)
} }
#endif #endif
enum nsim_dev_port_type {
NSIM_DEV_PORT_TYPE_PF,
NSIM_DEV_PORT_TYPE_VF,
};
#define NSIM_DEV_VF_PORT_INDEX_BASE 128
#define NSIM_DEV_VF_PORT_INDEX_MAX UINT_MAX
struct nsim_dev_port { struct nsim_dev_port {
struct list_head list; struct list_head list;
struct devlink_port devlink_port; struct devlink_port devlink_port;
unsigned int port_index; unsigned int port_index;
enum nsim_dev_port_type port_type;
struct dentry *ddir; struct dentry *ddir;
struct dentry *rate_parent;
char *parent_name;
struct netdevsim *ns; struct netdevsim *ns;
}; };
...@@ -212,6 +223,8 @@ struct nsim_dev { ...@@ -212,6 +223,8 @@ struct nsim_dev {
struct dentry *ddir; struct dentry *ddir;
struct dentry *ports_ddir; struct dentry *ports_ddir;
struct dentry *take_snapshot; struct dentry *take_snapshot;
struct dentry *max_vfs;
struct dentry *nodes_ddir;
struct bpf_offload_dev *bpf_dev; struct bpf_offload_dev *bpf_dev;
bool bpf_bind_accept; bool bpf_bind_accept;
bool bpf_bind_verifier_accept; bool bpf_bind_verifier_accept;
...@@ -247,8 +260,22 @@ struct nsim_dev { ...@@ -247,8 +260,22 @@ struct nsim_dev {
u32 sleep; u32 sleep;
} udp_ports; } udp_ports;
struct nsim_dev_psample *psample; struct nsim_dev_psample *psample;
u16 esw_mode;
}; };
int nsim_esw_legacy_enable(struct nsim_dev *nsim_dev, struct netlink_ext_ack *extack);
int nsim_esw_switchdev_enable(struct nsim_dev *nsim_dev, struct netlink_ext_ack *extack);
static inline bool nsim_esw_mode_is_legacy(struct nsim_dev *nsim_dev)
{
return nsim_dev->esw_mode == DEVLINK_ESWITCH_MODE_LEGACY;
}
static inline bool nsim_esw_mode_is_switchdev(struct nsim_dev *nsim_dev)
{
return nsim_dev->esw_mode == DEVLINK_ESWITCH_MODE_SWITCHDEV;
}
static inline struct net *nsim_dev_net(struct nsim_dev *nsim_dev) static inline struct net *nsim_dev_net(struct nsim_dev *nsim_dev)
{ {
return devlink_net(priv_to_devlink(nsim_dev)); return devlink_net(priv_to_devlink(nsim_dev));
...@@ -259,8 +286,10 @@ void nsim_dev_exit(void); ...@@ -259,8 +286,10 @@ void nsim_dev_exit(void);
int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev); int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev);
void nsim_dev_remove(struct nsim_bus_dev *nsim_bus_dev); void nsim_dev_remove(struct nsim_bus_dev *nsim_bus_dev);
int nsim_dev_port_add(struct nsim_bus_dev *nsim_bus_dev, int nsim_dev_port_add(struct nsim_bus_dev *nsim_bus_dev,
enum nsim_dev_port_type type,
unsigned int port_index); unsigned int port_index);
int nsim_dev_port_del(struct nsim_bus_dev *nsim_bus_dev, int nsim_dev_port_del(struct nsim_bus_dev *nsim_bus_dev,
enum nsim_dev_port_type type,
unsigned int port_index); unsigned int port_index);
struct nsim_fib_data *nsim_fib_create(struct devlink *devlink, struct nsim_fib_data *nsim_fib_create(struct devlink *devlink,
...@@ -269,6 +298,23 @@ void nsim_fib_destroy(struct devlink *devlink, struct nsim_fib_data *fib_data); ...@@ -269,6 +298,23 @@ void nsim_fib_destroy(struct devlink *devlink, struct nsim_fib_data *fib_data);
u64 nsim_fib_get_val(struct nsim_fib_data *fib_data, u64 nsim_fib_get_val(struct nsim_fib_data *fib_data,
enum nsim_resource_id res_id, bool max); enum nsim_resource_id res_id, bool max);
ssize_t nsim_bus_dev_max_vfs_read(struct file *file,
char __user *data,
size_t count, loff_t *ppos);
ssize_t nsim_bus_dev_max_vfs_write(struct file *file,
const char __user *data,
size_t count, loff_t *ppos);
void nsim_bus_dev_vfs_disable(struct nsim_bus_dev *nsim_bus_dev);
static inline bool nsim_dev_port_is_pf(struct nsim_dev_port *nsim_dev_port)
{
return nsim_dev_port->port_type == NSIM_DEV_PORT_TYPE_PF;
}
static inline bool nsim_dev_port_is_vf(struct nsim_dev_port *nsim_dev_port)
{
return nsim_dev_port->port_type == NSIM_DEV_PORT_TYPE_VF;
}
#if IS_ENABLED(CONFIG_XFRM_OFFLOAD) #if IS_ENABLED(CONFIG_XFRM_OFFLOAD)
void nsim_ipsec_init(struct netdevsim *ns); void nsim_ipsec_init(struct netdevsim *ns);
void nsim_ipsec_teardown(struct netdevsim *ns); void nsim_ipsec_teardown(struct netdevsim *ns);
...@@ -308,7 +354,9 @@ struct nsim_bus_dev { ...@@ -308,7 +354,9 @@ struct nsim_bus_dev {
struct net *initial_net; /* Purpose of this is to carry net pointer struct net *initial_net; /* Purpose of this is to carry net pointer
* during the probe time only. * during the probe time only.
*/ */
unsigned int max_vfs;
unsigned int num_vfs; unsigned int num_vfs;
struct mutex vfs_lock; /* Protects vfconfigs */
struct nsim_vf_config *vfconfigs; struct nsim_vf_config *vfconfigs;
/* Lock for devlink->reload_enabled in netdevsim module */ /* Lock for devlink->reload_enabled in netdevsim module */
struct mutex nsim_bus_reload_lock; struct mutex nsim_bus_reload_lock;
......
...@@ -34,6 +34,7 @@ struct devlink_ops; ...@@ -34,6 +34,7 @@ struct devlink_ops;
struct devlink { struct devlink {
struct list_head list; struct list_head list;
struct list_head port_list; struct list_head port_list;
struct list_head rate_list;
struct list_head sb_list; struct list_head sb_list;
struct list_head dpipe_table_list; struct list_head dpipe_table_list;
struct list_head resource_list; struct list_head resource_list;
...@@ -133,6 +134,24 @@ struct devlink_port_attrs { ...@@ -133,6 +134,24 @@ struct devlink_port_attrs {
}; };
}; };
struct devlink_rate {
struct list_head list;
enum devlink_rate_type type;
struct devlink *devlink;
void *priv;
u64 tx_share;
u64 tx_max;
struct devlink_rate *parent;
union {
struct devlink_port *devlink_port;
struct {
char *name;
refcount_t refcnt;
};
};
};
struct devlink_port { struct devlink_port {
struct list_head list; struct list_head list;
struct list_head param_list; struct list_head param_list;
...@@ -152,6 +171,8 @@ struct devlink_port { ...@@ -152,6 +171,8 @@ struct devlink_port {
struct delayed_work type_warn_dw; struct delayed_work type_warn_dw;
struct list_head reporter_list; struct list_head reporter_list;
struct mutex reporters_lock; /* Protects reporter_list */ struct mutex reporters_lock; /* Protects reporter_list */
struct devlink_rate *devlink_rate;
}; };
struct devlink_port_new_attrs { struct devlink_port_new_attrs {
...@@ -1453,6 +1474,30 @@ struct devlink_ops { ...@@ -1453,6 +1474,30 @@ struct devlink_ops {
struct devlink_port *port, struct devlink_port *port,
enum devlink_port_fn_state state, enum devlink_port_fn_state state,
struct netlink_ext_ack *extack); struct netlink_ext_ack *extack);
/**
* Rate control callbacks.
*/
int (*rate_leaf_tx_share_set)(struct devlink_rate *devlink_rate, void *priv,
u64 tx_share, struct netlink_ext_ack *extack);
int (*rate_leaf_tx_max_set)(struct devlink_rate *devlink_rate, void *priv,
u64 tx_max, struct netlink_ext_ack *extack);
int (*rate_node_tx_share_set)(struct devlink_rate *devlink_rate, void *priv,
u64 tx_share, struct netlink_ext_ack *extack);
int (*rate_node_tx_max_set)(struct devlink_rate *devlink_rate, void *priv,
u64 tx_max, struct netlink_ext_ack *extack);
int (*rate_node_new)(struct devlink_rate *rate_node, void **priv,
struct netlink_ext_ack *extack);
int (*rate_node_del)(struct devlink_rate *rate_node, void *priv,
struct netlink_ext_ack *extack);
int (*rate_leaf_parent_set)(struct devlink_rate *child,
struct devlink_rate *parent,
void *priv_child, void *priv_parent,
struct netlink_ext_ack *extack);
int (*rate_node_parent_set)(struct devlink_rate *child,
struct devlink_rate *parent,
void *priv_child, void *priv_parent,
struct netlink_ext_ack *extack);
}; };
static inline void *devlink_priv(struct devlink *devlink) static inline void *devlink_priv(struct devlink *devlink)
...@@ -1512,6 +1557,9 @@ void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 contro ...@@ -1512,6 +1557,9 @@ void devlink_port_attrs_pci_vf_set(struct devlink_port *devlink_port, u32 contro
void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port, void devlink_port_attrs_pci_sf_set(struct devlink_port *devlink_port,
u32 controller, u16 pf, u32 sf, u32 controller, u16 pf, u32 sf,
bool external); bool external);
int devlink_rate_leaf_create(struct devlink_port *port, void *priv);
void devlink_rate_leaf_destroy(struct devlink_port *devlink_port);
void devlink_rate_nodes_destroy(struct devlink *devlink);
int devlink_sb_register(struct devlink *devlink, unsigned int sb_index, int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
u32 size, u16 ingress_pools_count, u32 size, u16 ingress_pools_count,
u16 egress_pools_count, u16 ingress_tc_count, u16 egress_pools_count, u16 ingress_tc_count,
......
...@@ -126,6 +126,11 @@ enum devlink_command { ...@@ -126,6 +126,11 @@ enum devlink_command {
DEVLINK_CMD_HEALTH_REPORTER_TEST, DEVLINK_CMD_HEALTH_REPORTER_TEST,
DEVLINK_CMD_RATE_GET, /* can dump */
DEVLINK_CMD_RATE_SET,
DEVLINK_CMD_RATE_NEW,
DEVLINK_CMD_RATE_DEL,
/* add new commands above here */ /* add new commands above here */
__DEVLINK_CMD_MAX, __DEVLINK_CMD_MAX,
DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1 DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
...@@ -206,6 +211,11 @@ enum devlink_port_flavour { ...@@ -206,6 +211,11 @@ enum devlink_port_flavour {
*/ */
}; };
enum devlink_rate_type {
DEVLINK_RATE_TYPE_LEAF,
DEVLINK_RATE_TYPE_NODE,
};
enum devlink_param_cmode { enum devlink_param_cmode {
DEVLINK_PARAM_CMODE_RUNTIME, DEVLINK_PARAM_CMODE_RUNTIME,
DEVLINK_PARAM_CMODE_DRIVERINIT, DEVLINK_PARAM_CMODE_DRIVERINIT,
...@@ -534,6 +544,13 @@ enum devlink_attr { ...@@ -534,6 +544,13 @@ enum devlink_attr {
DEVLINK_ATTR_RELOAD_ACTION_STATS, /* nested */ DEVLINK_ATTR_RELOAD_ACTION_STATS, /* nested */
DEVLINK_ATTR_PORT_PCI_SF_NUMBER, /* u32 */ DEVLINK_ATTR_PORT_PCI_SF_NUMBER, /* u32 */
DEVLINK_ATTR_RATE_TYPE, /* u16 */
DEVLINK_ATTR_RATE_TX_SHARE, /* u64 */
DEVLINK_ATTR_RATE_TX_MAX, /* u64 */
DEVLINK_ATTR_RATE_NODE_NAME, /* string */
DEVLINK_ATTR_RATE_PARENT_NODE_NAME, /* string */
/* add new attributes above here, update the policy in devlink.c */ /* add new attributes above here, update the policy in devlink.c */
__DEVLINK_ATTR_MAX, __DEVLINK_ATTR_MAX,
......
This diff is collapsed.
...@@ -5,12 +5,13 @@ lib_dir=$(dirname $0)/../../../net/forwarding ...@@ -5,12 +5,13 @@ lib_dir=$(dirname $0)/../../../net/forwarding
ALL_TESTS="fw_flash_test params_test regions_test reload_test \ ALL_TESTS="fw_flash_test params_test regions_test reload_test \
netns_reload_test resource_test dev_info_test \ netns_reload_test resource_test dev_info_test \
empty_reporter_test dummy_reporter_test" empty_reporter_test dummy_reporter_test rate_test"
NUM_NETIFS=0 NUM_NETIFS=0
source $lib_dir/lib.sh source $lib_dir/lib.sh
BUS_ADDR=10 BUS_ADDR=10
PORT_COUNT=4 PORT_COUNT=4
VF_COUNT=4
DEV_NAME=netdevsim$BUS_ADDR DEV_NAME=netdevsim$BUS_ADDR
SYSFS_NET_DIR=/sys/bus/netdevsim/devices/$DEV_NAME/net/ SYSFS_NET_DIR=/sys/bus/netdevsim/devices/$DEV_NAME/net/
DEBUGFS_DIR=/sys/kernel/debug/netdevsim/$DEV_NAME/ DEBUGFS_DIR=/sys/kernel/debug/netdevsim/$DEV_NAME/
...@@ -507,6 +508,170 @@ dummy_reporter_test() ...@@ -507,6 +508,170 @@ dummy_reporter_test()
log_test "dummy reporter test" log_test "dummy reporter test"
} }
rate_leafs_get()
{
local handle=$1
cmd_jq "devlink port function rate show -j" \
'.[] | to_entries | .[] | select(.value.type == "leaf") | .key | select(contains("'$handle'"))'
}
rate_nodes_get()
{
local handle=$1
cmd_jq "devlink port function rate show -j" \
'.[] | to_entries | .[] | select(.value.type == "node") | .key | select(contains("'$handle'"))'
}
rate_attr_set()
{
local handle=$1
local name=$2
local value=$3
local units=$4
devlink port function rate set $handle $name $value$units
}
rate_attr_get()
{
local handle=$1
local name=$2
cmd_jq "devlink port function rate show $handle -j" '.[][].'$name
}
rate_attr_tx_rate_check()
{
local handle=$1
local name=$2
local rate=$3
local debug_file=$4
rate_attr_set $handle $name $rate mbit
check_err $? "Failed to set $name value"
local debug_value=$(cat $debug_file)
check_err $? "Failed to read $name value from debugfs"
[ "$debug_value" == "$rate" ]
check_err $? "Unexpected $name debug value $debug_value != $rate"
local api_value=$(( $(rate_attr_get $handle $name) * 8 / 1000000 ))
check_err $? "Failed to get $name attr value"
[ "$api_value" == "$rate" ]
check_err $? "Unexpected $name attr value $api_value != $rate"
}
rate_attr_parent_check()
{
local handle=$1
local parent=$2
local debug_file=$3
rate_attr_set $handle parent $parent
check_err $? "Failed to set parent"
debug_value=$(cat $debug_file)
check_err $? "Failed to get parent debugfs value"
[ "$debug_value" == "$parent" ]
check_err $? "Unexpected parent debug value $debug_value != $parent"
api_value=$(rate_attr_get $handle parent)
check_err $? "Failed to get parent attr value"
[ "$api_value" == "$parent" ]
check_err $? "Unexpected parent attr value $api_value != $parent"
}
rate_node_add()
{
local handle=$1
devlink port function rate add $handle
}
rate_node_del()
{
local handle=$1
devlink port function rate del $handle
}
rate_test()
{
RET=0
echo $VF_COUNT > /sys/bus/netdevsim/devices/$DEV_NAME/sriov_numvfs
devlink dev eswitch set $DL_HANDLE mode switchdev
local leafs=`rate_leafs_get $DL_HANDLE`
local num_leafs=`echo $leafs | wc -w`
[ "$num_leafs" == "$VF_COUNT" ]
check_err $? "Expected $VF_COUNT rate leafs but got $num_leafs"
rate=10
for r_obj in $leafs
do
rate_attr_tx_rate_check $r_obj tx_share $rate \
$DEBUGFS_DIR/ports/${r_obj##*/}/tx_share
rate=$(($rate+10))
done
rate=100
for r_obj in $leafs
do
rate_attr_tx_rate_check $r_obj tx_max $rate \
$DEBUGFS_DIR/ports/${r_obj##*/}/tx_max
rate=$(($rate+100))
done
local node1_name='group1'
local node1="$DL_HANDLE/$node1_name"
rate_node_add "$node1"
check_err $? "Failed to add node $node1"
local num_nodes=`rate_nodes_get $DL_HANDLE | wc -w`
[ $num_nodes == 1 ]
check_err $? "Expected 1 rate node in output but got $num_nodes"
local node_tx_share=10
rate_attr_tx_rate_check $node1 tx_share $node_tx_share \
$DEBUGFS_DIR/rate_nodes/${node1##*/}/tx_share
local node_tx_max=100
rate_attr_tx_rate_check $node1 tx_max $node_tx_max \
$DEBUGFS_DIR/rate_nodes/${node1##*/}/tx_max
rate_node_del "$node1"
check_err $? "Failed to delete node $node1"
local num_nodes=`rate_nodes_get $DL_HANDLE | wc -w`
[ $num_nodes == 0 ]
check_err $? "Expected 0 rate node but got $num_nodes"
local node1_name='group1'
local node1="$DL_HANDLE/$node1_name"
rate_node_add "$node1"
check_err $? "Failed to add node $node1"
rate_attr_parent_check $r_obj $node1_name \
$DEBUGFS_DIR/ports/${r_obj##*/}/rate_parent
local node2_name='group2'
local node2="$DL_HANDLE/$node2_name"
rate_node_add "$node2"
check_err $? "Failed to add node $node2"
rate_attr_parent_check $node2 $node1_name \
$DEBUGFS_DIR/rate_nodes/$node2_name/rate_parent
rate_node_del "$node2"
check_err $? "Failed to delete node $node2"
rate_attr_set "$r_obj" noparent
check_err $? "Failed to unset $r_obj parent node"
rate_node_del "$node1"
check_err $? "Failed to delete node $node1"
log_test "rate test"
}
setup_prepare() setup_prepare()
{ {
modprobe netdevsim modprobe netdevsim
......