Commit a3799729 authored by Paolo Abeni's avatar Paolo Abeni

Merge branch 'net-page_pool-add-netlink-based-introspection'

Jakub Kicinski says:

====================
net: page_pool: add netlink-based introspection

We recently started to deploy newer kernels / drivers at Meta,
making significant use of page pools for the first time.
We immediately run into page pool leaks both real and false positive
warnings. As Eric pointed out/predicted there's no guarantee that
applications will read / close their sockets so a page pool page
may be stuck in a socket (but not leaked) forever. This happens
a lot in our fleet. Most of these are obviously due to application
bugs but we should not be printing kernel warnings due to minor
application resource leaks.

Conversely the page pool memory may get leaked at runtime, and
we have no way to detect / track that, unless someone reconfigures
the NIC and destroys the page pools which leaked the pages.

The solution presented here is to expose the memory use of page
pools via netlink. This allows for continuous monitoring of memory
used by page pools, regardless if they were destroyed or not.
Sample in patch 15 can print the memory use and recycling
efficiency:

$ ./page-pool
    eth0[2]	page pools: 10 (zombies: 0)
		refs: 41984 bytes: 171966464 (refs: 0 bytes: 0)
		recycling: 90.3% (alloc: 656:397681 recycle: 89652:270201)

v4:
 - use dev_net(netdev)->loopback_dev
 - extend inflight doc
v3: https://lore.kernel.org/all/20231122034420.1158898-1-kuba@kernel.org/
 - ID is still here, can't decide if it matters
 - rename destroyed -> detach-time, good enough?
 - fix build for netsec
v2: https://lore.kernel.org/r/20231121000048.789613-1-kuba@kernel.org
 - hopefully fix build with PAGE_POOL=n
v1: https://lore.kernel.org/all/20231024160220.3973311-1-kuba@kernel.org/
 - The main change compared to the RFC is that the API now exposes
   outstanding references and byte counts even for "live" page pools.
   The warning is no longer printed if page pool is accessible via netlink.
RFC: https://lore.kernel.org/all/20230816234303.3786178-1-kuba@kernel.org/
====================

Link: https://lore.kernel.org/r/20231126230740.2148636-1-kuba@kernel.orgSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
parents a2147245 637567e4
......@@ -86,6 +86,112 @@ attribute-sets:
See Documentation/networking/xdp-rx-metadata.rst for more details.
type: u64
enum: xdp-rx-metadata
-
name: page-pool
attributes:
-
name: id
doc: Unique ID of a Page Pool instance.
type: uint
checks:
min: 1
max: u32-max
-
name: ifindex
doc: |
ifindex of the netdev to which the pool belongs.
May be reported as 0 if the page pool was allocated for a netdev
which got destroyed already (page pools may outlast their netdevs
because they wait for all memory to be returned).
type: u32
checks:
min: 1
max: s32-max
-
name: napi-id
doc: Id of NAPI using this Page Pool instance.
type: uint
checks:
min: 1
max: u32-max
-
name: inflight
type: uint
doc: |
Number of outstanding references to this page pool (allocated
but yet to be freed pages). Allocated pages may be held in
socket receive queues, driver receive ring, page pool recycling
ring, the page pool cache, etc.
-
name: inflight-mem
type: uint
doc: |
Amount of memory held by inflight pages.
-
name: detach-time
type: uint
doc: |
Seconds in CLOCK_BOOTTIME of when Page Pool was detached by
the driver. Once detached Page Pool can no longer be used to
allocate memory.
Page Pools wait for all the memory allocated from them to be freed
before truly disappearing. "Detached" Page Pools cannot be
"re-attached", they are just waiting to disappear.
Attribute is absent if Page Pool has not been detached, and
can still be used to allocate new memory.
-
name: page-pool-info
subset-of: page-pool
attributes:
-
name: id
-
name: ifindex
-
name: page-pool-stats
doc: |
Page pool statistics, see docs for struct page_pool_stats
for information about individual statistics.
attributes:
-
name: info
doc: Page pool identifying information.
type: nest
nested-attributes: page-pool-info
-
name: alloc-fast
type: uint
value: 8 # reserve some attr ids in case we need more metadata later
-
name: alloc-slow
type: uint
-
name: alloc-slow-high-order
type: uint
-
name: alloc-empty
type: uint
-
name: alloc-refill
type: uint
-
name: alloc-waive
type: uint
-
name: recycle-cached
type: uint
-
name: recycle-cache-full
type: uint
-
name: recycle-ring
type: uint
-
name: recycle-ring-full
type: uint
-
name: recycle-released-refcnt
type: uint
operations:
list:
......@@ -120,8 +226,74 @@ operations:
doc: Notification about device configuration being changed.
notify: dev-get
mcgrp: mgmt
-
name: page-pool-get
doc: |
Get / dump information about Page Pools.
(Only Page Pools associated with a net_device can be listed.)
attribute-set: page-pool
do:
request:
attributes:
- id
reply: &pp-reply
attributes:
- id
- ifindex
- napi-id
- inflight
- inflight-mem
- detach-time
dump:
reply: *pp-reply
config-cond: page-pool
-
name: page-pool-add-ntf
doc: Notification about page pool appearing.
notify: page-pool-get
mcgrp: page-pool
config-cond: page-pool
-
name: page-pool-del-ntf
doc: Notification about page pool disappearing.
notify: page-pool-get
mcgrp: page-pool
config-cond: page-pool
-
name: page-pool-change-ntf
doc: Notification about page pool configuration being changed.
notify: page-pool-get
mcgrp: page-pool
config-cond: page-pool
-
name: page-pool-stats-get
doc: Get page pool statistics.
attribute-set: page-pool-stats
do:
request:
attributes:
- info
reply: &pp-stats-reply
attributes:
- info
- alloc-fast
- alloc-slow
- alloc-slow-high-order
- alloc-empty
- alloc-refill
- alloc-waive
- recycle-cached
- recycle-cache-full
- recycle-ring
- recycle-ring-full
- recycle-released-refcnt
dump:
reply: *pp-stats-reply
config-cond: page-pool-stats
mcast-groups:
list:
-
name: mgmt
-
name: page-pool
......@@ -41,6 +41,11 @@ Architecture overview
| Fast cache | | ptr-ring cache |
+-----------------+ +------------------+
Monitoring
==========
Information about page pools on the system can be accessed via the netdev
genetlink family (see Documentation/netlink/specs/netdev.yaml).
API interface
=============
The number of pools created **must** match the number of hardware queues
......@@ -107,8 +112,9 @@ page_pool_get_stats() and structures described below are available.
It takes a pointer to a ``struct page_pool`` and a pointer to a struct
page_pool_stats allocated by the caller.
The API will fill in the provided struct page_pool_stats with
statistics about the page_pool.
Older drivers expose page pool statistics via ethtool or debugfs.
The same statistics are accessible via the netlink netdev family
in a driver-independent fashion.
.. kernel-doc:: include/net/page_pool/types.h
:identifiers: struct page_pool_recycle_stats
......
......@@ -3331,6 +3331,7 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
pp.pool_size += bp->rx_ring_size;
pp.nid = dev_to_node(&bp->pdev->dev);
pp.napi = &rxr->bnapi->napi;
pp.netdev = bp->dev;
pp.dev = &bp->pdev->dev;
pp.dma_dir = bp->rx_dir;
pp.max_len = PAGE_SIZE;
......
......@@ -902,6 +902,7 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params,
pp_params.nid = node;
pp_params.dev = rq->pdev;
pp_params.napi = rq->cq.napi;
pp_params.netdev = rq->netdev;
pp_params.dma_dir = rq->buff.map_dir;
pp_params.max_len = PAGE_SIZE;
......
......@@ -2137,6 +2137,7 @@ static int mana_create_page_pool(struct mana_rxq *rxq, struct gdma_context *gc)
pprm.pool_size = RX_BUFFERS_PER_QUEUE;
pprm.nid = gc->numa_node;
pprm.napi = &rxq->rx_cq.napi;
pprm.netdev = rxq->ndev;
rxq->page_pool = page_pool_create(&pprm);
......
......@@ -1302,6 +1302,8 @@ static int netsec_setup_rx_dring(struct netsec_priv *priv)
.dma_dir = xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE,
.offset = NETSEC_RXBUF_HEADROOM,
.max_len = NETSEC_RX_BUF_SIZE,
.napi = &priv->napi,
.netdev = priv->ndev,
};
int i, err;
......
......@@ -1119,6 +1119,26 @@ static inline void hlist_move_list(struct hlist_head *old,
old->first = NULL;
}
/**
* hlist_splice_init() - move all entries from one list to another
* @from: hlist_head from which entries will be moved
* @last: last entry on the @from list
* @to: hlist_head to which entries will be moved
*
* @to can be empty, @from must contain at least @last.
*/
static inline void hlist_splice_init(struct hlist_head *from,
struct hlist_node *last,
struct hlist_head *to)
{
if (to->first)
to->first->pprev = &last->next;
last->next = to->first;
to->first = from->first;
from->first->pprev = &to->first;
from->first = NULL;
}
#define hlist_entry(ptr, type, member) container_of(ptr,type,member)
#define hlist_for_each(pos, head) \
......
......@@ -2447,6 +2447,10 @@ struct net_device {
#if IS_ENABLED(CONFIG_DPLL)
struct dpll_pin *dpll_pin;
#endif
#if IS_ENABLED(CONFIG_PAGE_POOL)
/** @page_pools: page pools created for this netdevice */
struct hlist_head page_pools;
#endif
};
#define to_net_dev(d) container_of(d, struct net_device, dev)
......
......@@ -83,6 +83,8 @@
/********** net/core/skbuff.c **********/
#define SKB_LIST_POISON_NEXT ((void *)(0x800 + POISON_POINTER_DELTA))
/********** net/ **********/
#define NET_PTR_POISON ((void *)(0x801 + POISON_POINTER_DELTA))
/********** kernel/bpf/ **********/
#define BPF_PTR_POISON ((void *)(0xeB9FUL + POISON_POINTER_DELTA))
......
......@@ -55,16 +55,12 @@
#include <net/page_pool/types.h>
#ifdef CONFIG_PAGE_POOL_STATS
/* Deprecated driver-facing API, use netlink instead */
int page_pool_ethtool_stats_get_count(void);
u8 *page_pool_ethtool_stats_get_strings(u8 *data);
u64 *page_pool_ethtool_stats_get(u64 *data, void *stats);
/*
* Drivers that wish to harvest page pool stats and report them to users
* (perhaps via ethtool, debugfs, or another mechanism) can allocate a
* struct page_pool_stats call page_pool_get_stats to get stats for the specified pool.
*/
bool page_pool_get_stats(struct page_pool *pool,
bool page_pool_get_stats(const struct page_pool *pool,
struct page_pool_stats *stats);
#else
static inline int page_pool_ethtool_stats_get_count(void)
......
......@@ -5,6 +5,7 @@
#include <linux/dma-direction.h>
#include <linux/ptr_ring.h>
#include <linux/types.h>
#define PP_FLAG_DMA_MAP BIT(0) /* Should page_pool do the DMA
* map/unmap
......@@ -48,6 +49,7 @@ struct pp_alloc_cache {
* @pool_size: size of the ptr_ring
* @nid: NUMA node id to allocate from pages from
* @dev: device, for DMA pre-mapping purposes
* @netdev: netdev this pool will serve (leave as NULL if none or multiple)
* @napi: NAPI which is the sole consumer of pages, otherwise NULL
* @dma_dir: DMA mapping direction
* @max_len: max DMA sync memory size for PP_FLAG_DMA_SYNC_DEV
......@@ -66,6 +68,7 @@ struct page_pool_params {
unsigned int offset;
);
struct_group_tagged(page_pool_params_slow, slow,
struct net_device *netdev;
/* private: used by test code only */
void (*init_callback)(struct page *page, void *arg);
void *init_arg;
......@@ -187,6 +190,13 @@ struct page_pool {
/* Slow/Control-path information follows */
struct page_pool_params_slow slow;
/* User-facing fields, protected by page_pools_lock */
struct {
struct hlist_node list;
u64 detach_time;
u32 napi_id;
u32 id;
} user;
};
struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp);
......
......@@ -64,16 +64,52 @@ enum {
NETDEV_A_DEV_MAX = (__NETDEV_A_DEV_MAX - 1)
};
enum {
NETDEV_A_PAGE_POOL_ID = 1,
NETDEV_A_PAGE_POOL_IFINDEX,
NETDEV_A_PAGE_POOL_NAPI_ID,
NETDEV_A_PAGE_POOL_INFLIGHT,
NETDEV_A_PAGE_POOL_INFLIGHT_MEM,
NETDEV_A_PAGE_POOL_DETACH_TIME,
__NETDEV_A_PAGE_POOL_MAX,
NETDEV_A_PAGE_POOL_MAX = (__NETDEV_A_PAGE_POOL_MAX - 1)
};
enum {
NETDEV_A_PAGE_POOL_STATS_INFO = 1,
NETDEV_A_PAGE_POOL_STATS_ALLOC_FAST = 8,
NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW,
NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW_HIGH_ORDER,
NETDEV_A_PAGE_POOL_STATS_ALLOC_EMPTY,
NETDEV_A_PAGE_POOL_STATS_ALLOC_REFILL,
NETDEV_A_PAGE_POOL_STATS_ALLOC_WAIVE,
NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHED,
NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHE_FULL,
NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING,
NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING_FULL,
NETDEV_A_PAGE_POOL_STATS_RECYCLE_RELEASED_REFCNT,
__NETDEV_A_PAGE_POOL_STATS_MAX,
NETDEV_A_PAGE_POOL_STATS_MAX = (__NETDEV_A_PAGE_POOL_STATS_MAX - 1)
};
enum {
NETDEV_CMD_DEV_GET = 1,
NETDEV_CMD_DEV_ADD_NTF,
NETDEV_CMD_DEV_DEL_NTF,
NETDEV_CMD_DEV_CHANGE_NTF,
NETDEV_CMD_PAGE_POOL_GET,
NETDEV_CMD_PAGE_POOL_ADD_NTF,
NETDEV_CMD_PAGE_POOL_DEL_NTF,
NETDEV_CMD_PAGE_POOL_CHANGE_NTF,
NETDEV_CMD_PAGE_POOL_STATS_GET,
__NETDEV_CMD_MAX,
NETDEV_CMD_MAX = (__NETDEV_CMD_MAX - 1)
};
#define NETDEV_MCGRP_MGMT "mgmt"
#define NETDEV_MCGRP_PAGE_POOL "page-pool"
#endif /* _UAPI_LINUX_NETDEV_H */
......@@ -18,7 +18,7 @@ obj-y += dev.o dev_addr_lists.o dst.o netevent.o \
obj-$(CONFIG_NETDEV_ADDR_LIST_TEST) += dev_addr_lists_test.o
obj-y += net-sysfs.o
obj-$(CONFIG_PAGE_POOL) += page_pool.o
obj-$(CONFIG_PAGE_POOL) += page_pool.o page_pool_user.o
obj-$(CONFIG_PROC_FS) += net-procfs.o
obj-$(CONFIG_NET_PKTGEN) += pktgen.o
obj-$(CONFIG_NETPOLL) += netpoll.o
......
......@@ -10,11 +10,42 @@
#include <uapi/linux/netdev.h>
/* Integer value ranges */
static const struct netlink_range_validation netdev_a_page_pool_id_range = {
.min = 1ULL,
.max = 4294967295ULL,
};
static const struct netlink_range_validation netdev_a_page_pool_ifindex_range = {
.min = 1ULL,
.max = 2147483647ULL,
};
/* Common nested types */
const struct nla_policy netdev_page_pool_info_nl_policy[NETDEV_A_PAGE_POOL_IFINDEX + 1] = {
[NETDEV_A_PAGE_POOL_ID] = NLA_POLICY_FULL_RANGE(NLA_UINT, &netdev_a_page_pool_id_range),
[NETDEV_A_PAGE_POOL_IFINDEX] = NLA_POLICY_FULL_RANGE(NLA_U32, &netdev_a_page_pool_ifindex_range),
};
/* NETDEV_CMD_DEV_GET - do */
static const struct nla_policy netdev_dev_get_nl_policy[NETDEV_A_DEV_IFINDEX + 1] = {
[NETDEV_A_DEV_IFINDEX] = NLA_POLICY_MIN(NLA_U32, 1),
};
/* NETDEV_CMD_PAGE_POOL_GET - do */
#ifdef CONFIG_PAGE_POOL
static const struct nla_policy netdev_page_pool_get_nl_policy[NETDEV_A_PAGE_POOL_ID + 1] = {
[NETDEV_A_PAGE_POOL_ID] = NLA_POLICY_FULL_RANGE(NLA_UINT, &netdev_a_page_pool_id_range),
};
#endif /* CONFIG_PAGE_POOL */
/* NETDEV_CMD_PAGE_POOL_STATS_GET - do */
#ifdef CONFIG_PAGE_POOL_STATS
static const struct nla_policy netdev_page_pool_stats_get_nl_policy[NETDEV_A_PAGE_POOL_STATS_INFO + 1] = {
[NETDEV_A_PAGE_POOL_STATS_INFO] = NLA_POLICY_NESTED(netdev_page_pool_info_nl_policy),
};
#endif /* CONFIG_PAGE_POOL_STATS */
/* Ops table for netdev */
static const struct genl_split_ops netdev_nl_ops[] = {
{
......@@ -29,10 +60,39 @@ static const struct genl_split_ops netdev_nl_ops[] = {
.dumpit = netdev_nl_dev_get_dumpit,
.flags = GENL_CMD_CAP_DUMP,
},
#ifdef CONFIG_PAGE_POOL
{
.cmd = NETDEV_CMD_PAGE_POOL_GET,
.doit = netdev_nl_page_pool_get_doit,
.policy = netdev_page_pool_get_nl_policy,
.maxattr = NETDEV_A_PAGE_POOL_ID,
.flags = GENL_CMD_CAP_DO,
},
{
.cmd = NETDEV_CMD_PAGE_POOL_GET,
.dumpit = netdev_nl_page_pool_get_dumpit,
.flags = GENL_CMD_CAP_DUMP,
},
#endif /* CONFIG_PAGE_POOL */
#ifdef CONFIG_PAGE_POOL_STATS
{
.cmd = NETDEV_CMD_PAGE_POOL_STATS_GET,
.doit = netdev_nl_page_pool_stats_get_doit,
.policy = netdev_page_pool_stats_get_nl_policy,
.maxattr = NETDEV_A_PAGE_POOL_STATS_INFO,
.flags = GENL_CMD_CAP_DO,
},
{
.cmd = NETDEV_CMD_PAGE_POOL_STATS_GET,
.dumpit = netdev_nl_page_pool_stats_get_dumpit,
.flags = GENL_CMD_CAP_DUMP,
},
#endif /* CONFIG_PAGE_POOL_STATS */
};
static const struct genl_multicast_group netdev_nl_mcgrps[] = {
[NETDEV_NLGRP_MGMT] = { "mgmt", },
[NETDEV_NLGRP_PAGE_POOL] = { "page-pool", },
};
struct genl_family netdev_nl_family __ro_after_init = {
......
......@@ -11,11 +11,22 @@
#include <uapi/linux/netdev.h>
/* Common nested types */
extern const struct nla_policy netdev_page_pool_info_nl_policy[NETDEV_A_PAGE_POOL_IFINDEX + 1];
int netdev_nl_dev_get_doit(struct sk_buff *skb, struct genl_info *info);
int netdev_nl_dev_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb);
int netdev_nl_page_pool_get_doit(struct sk_buff *skb, struct genl_info *info);
int netdev_nl_page_pool_get_dumpit(struct sk_buff *skb,
struct netlink_callback *cb);
int netdev_nl_page_pool_stats_get_doit(struct sk_buff *skb,
struct genl_info *info);
int netdev_nl_page_pool_stats_get_dumpit(struct sk_buff *skb,
struct netlink_callback *cb);
enum {
NETDEV_NLGRP_MGMT,
NETDEV_NLGRP_PAGE_POOL,
};
extern struct genl_family netdev_nl_family;
......
......@@ -23,6 +23,8 @@
#include <trace/events/page_pool.h>
#include "page_pool_priv.h"
#define DEFER_TIME (msecs_to_jiffies(1000))
#define DEFER_WARN_INTERVAL (60 * HZ)
......@@ -69,7 +71,7 @@ static const char pp_stats[][ETH_GSTRING_LEN] = {
* is passed to this API which is filled in. The caller can then report
* those stats to the user (perhaps via ethtool, debugfs, etc.).
*/
bool page_pool_get_stats(struct page_pool *pool,
bool page_pool_get_stats(const struct page_pool *pool,
struct page_pool_stats *stats)
{
int cpu = 0;
......@@ -238,6 +240,18 @@ static int page_pool_init(struct page_pool *pool,
return 0;
}
static void page_pool_uninit(struct page_pool *pool)
{
ptr_ring_cleanup(&pool->ring, NULL);
if (pool->p.flags & PP_FLAG_DMA_MAP)
put_device(pool->p.dev);
#ifdef CONFIG_PAGE_POOL_STATS
free_percpu(pool->recycle_stats);
#endif
}
/**
* page_pool_create() - create a page pool.
* @params: parameters, see struct page_pool_params
......@@ -252,13 +266,21 @@ struct page_pool *page_pool_create(const struct page_pool_params *params)
return ERR_PTR(-ENOMEM);
err = page_pool_init(pool, params);
if (err < 0) {
pr_warn("%s() gave up with errno %d\n", __func__, err);
kfree(pool);
return ERR_PTR(err);
}
if (err < 0)
goto err_free;
err = page_pool_list(pool);
if (err)
goto err_uninit;
return pool;
err_uninit:
page_pool_uninit(pool);
err_free:
pr_warn("%s() gave up with errno %d\n", __func__, err);
kfree(pool);
return ERR_PTR(err);
}
EXPORT_SYMBOL(page_pool_create);
......@@ -507,7 +529,7 @@ EXPORT_SYMBOL(page_pool_alloc_pages);
*/
#define _distance(a, b) (s32)((a) - (b))
static s32 page_pool_inflight(struct page_pool *pool)
s32 page_pool_inflight(const struct page_pool *pool, bool strict)
{
u32 release_cnt = atomic_read(&pool->pages_state_release_cnt);
u32 hold_cnt = READ_ONCE(pool->pages_state_hold_cnt);
......@@ -515,8 +537,13 @@ static s32 page_pool_inflight(struct page_pool *pool)
inflight = _distance(hold_cnt, release_cnt);
trace_page_pool_release(pool, inflight, hold_cnt, release_cnt);
WARN(inflight < 0, "Negative(%d) inflight packet-pages", inflight);
if (strict) {
trace_page_pool_release(pool, inflight, hold_cnt, release_cnt);
WARN(inflight < 0, "Negative(%d) inflight packet-pages",
inflight);
} else {
inflight = max(0, inflight);
}
return inflight;
}
......@@ -821,14 +848,8 @@ static void __page_pool_destroy(struct page_pool *pool)
if (pool->disconnect)
pool->disconnect(pool);
ptr_ring_cleanup(&pool->ring, NULL);
if (pool->p.flags & PP_FLAG_DMA_MAP)
put_device(pool->p.dev);
#ifdef CONFIG_PAGE_POOL_STATS
free_percpu(pool->recycle_stats);
#endif
page_pool_unlist(pool);
page_pool_uninit(pool);
kfree(pool);
}
......@@ -865,7 +886,7 @@ static int page_pool_release(struct page_pool *pool)
int inflight;
page_pool_scrub(pool);
inflight = page_pool_inflight(pool);
inflight = page_pool_inflight(pool, true);
if (!inflight)
__page_pool_destroy(pool);
......@@ -876,18 +897,21 @@ static void page_pool_release_retry(struct work_struct *wq)
{
struct delayed_work *dwq = to_delayed_work(wq);
struct page_pool *pool = container_of(dwq, typeof(*pool), release_dw);
void *netdev;
int inflight;
inflight = page_pool_release(pool);
if (!inflight)
return;
/* Periodic warning */
if (time_after_eq(jiffies, pool->defer_warn)) {
/* Periodic warning for page pools the user can't see */
netdev = READ_ONCE(pool->slow.netdev);
if (time_after_eq(jiffies, pool->defer_warn) &&
(!netdev || netdev == NET_PTR_POISON)) {
int sec = (s32)((u32)jiffies - (u32)pool->defer_start) / HZ;
pr_warn("%s() stalled pool shutdown %d inflight %d sec\n",
__func__, inflight, sec);
pr_warn("%s() stalled pool shutdown: id %u, %d inflight %d sec\n",
__func__, pool->user.id, inflight, sec);
pool->defer_warn = jiffies + DEFER_WARN_INTERVAL;
}
......@@ -932,6 +956,7 @@ void page_pool_destroy(struct page_pool *pool)
if (!page_pool_release(pool))
return;
page_pool_detached(pool);
pool->defer_start = jiffies;
pool->defer_warn = jiffies + DEFER_WARN_INTERVAL;
......
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __PAGE_POOL_PRIV_H
#define __PAGE_POOL_PRIV_H
s32 page_pool_inflight(const struct page_pool *pool, bool strict);
int page_pool_list(struct page_pool *pool);
void page_pool_detached(struct page_pool *pool);
void page_pool_unlist(struct page_pool *pool);
#endif
This diff is collapsed.
......@@ -64,16 +64,52 @@ enum {
NETDEV_A_DEV_MAX = (__NETDEV_A_DEV_MAX - 1)
};
enum {
NETDEV_A_PAGE_POOL_ID = 1,
NETDEV_A_PAGE_POOL_IFINDEX,
NETDEV_A_PAGE_POOL_NAPI_ID,
NETDEV_A_PAGE_POOL_INFLIGHT,
NETDEV_A_PAGE_POOL_INFLIGHT_MEM,
NETDEV_A_PAGE_POOL_DETACH_TIME,
__NETDEV_A_PAGE_POOL_MAX,
NETDEV_A_PAGE_POOL_MAX = (__NETDEV_A_PAGE_POOL_MAX - 1)
};
enum {
NETDEV_A_PAGE_POOL_STATS_INFO = 1,
NETDEV_A_PAGE_POOL_STATS_ALLOC_FAST = 8,
NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW,
NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW_HIGH_ORDER,
NETDEV_A_PAGE_POOL_STATS_ALLOC_EMPTY,
NETDEV_A_PAGE_POOL_STATS_ALLOC_REFILL,
NETDEV_A_PAGE_POOL_STATS_ALLOC_WAIVE,
NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHED,
NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHE_FULL,
NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING,
NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING_FULL,
NETDEV_A_PAGE_POOL_STATS_RECYCLE_RELEASED_REFCNT,
__NETDEV_A_PAGE_POOL_STATS_MAX,
NETDEV_A_PAGE_POOL_STATS_MAX = (__NETDEV_A_PAGE_POOL_STATS_MAX - 1)
};
enum {
NETDEV_CMD_DEV_GET = 1,
NETDEV_CMD_DEV_ADD_NTF,
NETDEV_CMD_DEV_DEL_NTF,
NETDEV_CMD_DEV_CHANGE_NTF,
NETDEV_CMD_PAGE_POOL_GET,
NETDEV_CMD_PAGE_POOL_ADD_NTF,
NETDEV_CMD_PAGE_POOL_DEL_NTF,
NETDEV_CMD_PAGE_POOL_CHANGE_NTF,
NETDEV_CMD_PAGE_POOL_STATS_GET,
__NETDEV_CMD_MAX,
NETDEV_CMD_MAX = (__NETDEV_CMD_MAX - 1)
};
#define NETDEV_MCGRP_MGMT "mgmt"
#define NETDEV_MCGRP_PAGE_POOL "page-pool"
#endif /* _UAPI_LINUX_NETDEV_H */
This diff is collapsed.
......@@ -21,6 +21,16 @@ const char *netdev_xdp_act_str(enum netdev_xdp_act value);
const char *netdev_xdp_rx_metadata_str(enum netdev_xdp_rx_metadata value);
/* Common nested types */
struct netdev_page_pool_info {
struct {
__u32 id:1;
__u32 ifindex:1;
} _present;
__u64 id;
__u32 ifindex;
};
/* ============== NETDEV_CMD_DEV_GET ============== */
/* NETDEV_CMD_DEV_GET - do */
struct netdev_dev_get_req {
......@@ -87,4 +97,165 @@ struct netdev_dev_get_ntf {
void netdev_dev_get_ntf_free(struct netdev_dev_get_ntf *rsp);
/* ============== NETDEV_CMD_PAGE_POOL_GET ============== */
/* NETDEV_CMD_PAGE_POOL_GET - do */
struct netdev_page_pool_get_req {
struct {
__u32 id:1;
} _present;
__u64 id;
};
static inline struct netdev_page_pool_get_req *
netdev_page_pool_get_req_alloc(void)
{
return calloc(1, sizeof(struct netdev_page_pool_get_req));
}
void netdev_page_pool_get_req_free(struct netdev_page_pool_get_req *req);
static inline void
netdev_page_pool_get_req_set_id(struct netdev_page_pool_get_req *req, __u64 id)
{
req->_present.id = 1;
req->id = id;
}
struct netdev_page_pool_get_rsp {
struct {
__u32 id:1;
__u32 ifindex:1;
__u32 napi_id:1;
__u32 inflight:1;
__u32 inflight_mem:1;
__u32 detach_time:1;
} _present;
__u64 id;
__u32 ifindex;
__u64 napi_id;
__u64 inflight;
__u64 inflight_mem;
__u64 detach_time;
};
void netdev_page_pool_get_rsp_free(struct netdev_page_pool_get_rsp *rsp);
/*
* Get / dump information about Page Pools.
(Only Page Pools associated with a net_device can be listed.)
*/
struct netdev_page_pool_get_rsp *
netdev_page_pool_get(struct ynl_sock *ys, struct netdev_page_pool_get_req *req);
/* NETDEV_CMD_PAGE_POOL_GET - dump */
struct netdev_page_pool_get_list {
struct netdev_page_pool_get_list *next;
struct netdev_page_pool_get_rsp obj __attribute__((aligned(8)));
};
void netdev_page_pool_get_list_free(struct netdev_page_pool_get_list *rsp);
struct netdev_page_pool_get_list *
netdev_page_pool_get_dump(struct ynl_sock *ys);
/* NETDEV_CMD_PAGE_POOL_GET - notify */
struct netdev_page_pool_get_ntf {
__u16 family;
__u8 cmd;
struct ynl_ntf_base_type *next;
void (*free)(struct netdev_page_pool_get_ntf *ntf);
struct netdev_page_pool_get_rsp obj __attribute__((aligned(8)));
};
void netdev_page_pool_get_ntf_free(struct netdev_page_pool_get_ntf *rsp);
/* ============== NETDEV_CMD_PAGE_POOL_STATS_GET ============== */
/* NETDEV_CMD_PAGE_POOL_STATS_GET - do */
struct netdev_page_pool_stats_get_req {
struct {
__u32 info:1;
} _present;
struct netdev_page_pool_info info;
};
static inline struct netdev_page_pool_stats_get_req *
netdev_page_pool_stats_get_req_alloc(void)
{
return calloc(1, sizeof(struct netdev_page_pool_stats_get_req));
}
void
netdev_page_pool_stats_get_req_free(struct netdev_page_pool_stats_get_req *req);
static inline void
netdev_page_pool_stats_get_req_set_info_id(struct netdev_page_pool_stats_get_req *req,
__u64 id)
{
req->_present.info = 1;
req->info._present.id = 1;
req->info.id = id;
}
static inline void
netdev_page_pool_stats_get_req_set_info_ifindex(struct netdev_page_pool_stats_get_req *req,
__u32 ifindex)
{
req->_present.info = 1;
req->info._present.ifindex = 1;
req->info.ifindex = ifindex;
}
struct netdev_page_pool_stats_get_rsp {
struct {
__u32 info:1;
__u32 alloc_fast:1;
__u32 alloc_slow:1;
__u32 alloc_slow_high_order:1;
__u32 alloc_empty:1;
__u32 alloc_refill:1;
__u32 alloc_waive:1;
__u32 recycle_cached:1;
__u32 recycle_cache_full:1;
__u32 recycle_ring:1;
__u32 recycle_ring_full:1;
__u32 recycle_released_refcnt:1;
} _present;
struct netdev_page_pool_info info;
__u64 alloc_fast;
__u64 alloc_slow;
__u64 alloc_slow_high_order;
__u64 alloc_empty;
__u64 alloc_refill;
__u64 alloc_waive;
__u64 recycle_cached;
__u64 recycle_cache_full;
__u64 recycle_ring;
__u64 recycle_ring_full;
__u64 recycle_released_refcnt;
};
void
netdev_page_pool_stats_get_rsp_free(struct netdev_page_pool_stats_get_rsp *rsp);
/*
* Get page pool statistics.
*/
struct netdev_page_pool_stats_get_rsp *
netdev_page_pool_stats_get(struct ynl_sock *ys,
struct netdev_page_pool_stats_get_req *req);
/* NETDEV_CMD_PAGE_POOL_STATS_GET - dump */
struct netdev_page_pool_stats_get_list {
struct netdev_page_pool_stats_get_list *next;
struct netdev_page_pool_stats_get_rsp obj __attribute__((aligned(8)));
};
void
netdev_page_pool_stats_get_list_free(struct netdev_page_pool_stats_get_list *rsp);
struct netdev_page_pool_stats_get_list *
netdev_page_pool_stats_get_dump(struct ynl_sock *ys);
#endif /* _LINUX_NETDEV_GEN_H */
......@@ -239,7 +239,7 @@ int ynl_error_parse(struct ynl_parse_arg *yarg, const char *msg);
#ifndef MNL_HAS_AUTO_SCALARS
static inline uint64_t mnl_attr_get_uint(const struct nlattr *attr)
{
if (mnl_attr_get_len(attr) == 4)
if (mnl_attr_get_payload_len(attr) == 4)
return mnl_attr_get_u32(attr);
return mnl_attr_get_u64(attr);
}
......
ethtool
devlink
netdev
page-pool
\ No newline at end of file
......@@ -18,7 +18,7 @@ include $(wildcard *.d)
all: $(BINS)
$(BINS): ../lib/ynl.a ../generated/protos.a
$(BINS): ../lib/ynl.a ../generated/protos.a $(SRCS)
@echo -e '\tCC sample $@'
@$(COMPILE.c) $(CFLAGS_$@) $@.c -o $@.o
@$(LINK.c) $@.o -o $@ $(LDLIBS)
......
// SPDX-License-Identifier: GPL-2.0
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <ynl.h>
#include <net/if.h>
#include "netdev-user.h"
struct stat {
unsigned int ifc;
struct {
unsigned int cnt;
size_t refs, bytes;
} live[2];
size_t alloc_slow, alloc_fast, recycle_ring, recycle_cache;
};
struct stats_array {
unsigned int i, max;
struct stat *s;
};
static struct stat *find_ifc(struct stats_array *a, unsigned int ifindex)
{
unsigned int i;
for (i = 0; i < a->i; i++) {
if (a->s[i].ifc == ifindex)
return &a->s[i];
}
a->i++;
if (a->i == a->max) {
a->max *= 2;
a->s = reallocarray(a->s, a->max, sizeof(*a->s));
}
a->s[i].ifc = ifindex;
return &a->s[i];
}
static void count(struct stat *s, unsigned int l,
struct netdev_page_pool_get_rsp *pp)
{
s->live[l].cnt++;
if (pp->_present.inflight)
s->live[l].refs += pp->inflight;
if (pp->_present.inflight_mem)
s->live[l].bytes += pp->inflight_mem;
}
int main(int argc, char **argv)
{
struct netdev_page_pool_stats_get_list *pp_stats;
struct netdev_page_pool_get_list *pools;
struct stats_array a = {};
struct ynl_error yerr;
struct ynl_sock *ys;
ys = ynl_sock_create(&ynl_netdev_family, &yerr);
if (!ys) {
fprintf(stderr, "YNL: %s\n", yerr.msg);
return 1;
}
a.max = 128;
a.s = calloc(a.max, sizeof(*a.s));
if (!a.s)
goto err_close;
pools = netdev_page_pool_get_dump(ys);
if (!pools)
goto err_free;
ynl_dump_foreach(pools, pp) {
struct stat *s = find_ifc(&a, pp->ifindex);
count(s, 1, pp);
if (pp->_present.destroyed)
count(s, 0, pp);
}
netdev_page_pool_get_list_free(pools);
pp_stats = netdev_page_pool_stats_get_dump(ys);
if (!pp_stats)
goto err_free;
ynl_dump_foreach(pp_stats, pp) {
struct stat *s = find_ifc(&a, pp->info.ifindex);
if (pp->_present.alloc_fast)
s->alloc_fast += pp->alloc_fast;
if (pp->_present.alloc_slow)
s->alloc_slow += pp->alloc_slow;
if (pp->_present.recycle_ring)
s->recycle_ring += pp->recycle_ring;
if (pp->_present.recycle_cached)
s->recycle_cache += pp->recycle_cached;
}
netdev_page_pool_stats_get_list_free(pp_stats);
for (unsigned int i = 0; i < a.i; i++) {
char ifname[IF_NAMESIZE];
struct stat *s = &a.s[i];
const char *name;
double recycle;
if (!s->ifc) {
name = "<orphan>\t";
} else {
name = if_indextoname(s->ifc, ifname);
if (name)
printf("%8s", name);
printf("[%d]\t", s->ifc);
}
printf("page pools: %u (zombies: %u)\n",
s->live[1].cnt, s->live[0].cnt);
printf("\t\trefs: %zu bytes: %zu (refs: %zu bytes: %zu)\n",
s->live[1].refs, s->live[1].bytes,
s->live[0].refs, s->live[0].bytes);
/* We don't know how many pages are sitting in cache and ring
* so we will under-count the recycling rate a bit.
*/
recycle = (double)(s->recycle_ring + s->recycle_cache) /
(s->alloc_fast + s->alloc_slow) * 100;
printf("\t\trecycling: %.1lf%% (alloc: %zu:%zu recycle: %zu:%zu)\n",
recycle, s->alloc_slow, s->alloc_fast,
s->recycle_ring, s->recycle_cache);
}
ynl_sock_destroy(ys);
return 0;
err_free:
free(a.s);
err_close:
fprintf(stderr, "YNL: %s\n", ys->err.msg);
ynl_sock_destroy(ys);
return 2;
}
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment