Commit 1970dc6f authored by John Hubbard, committed by Linus Torvalds

mm/gup: /proc/vmstat: pin_user_pages (FOLL_PIN) reporting

Now that pages are "DMA-pinned" via pin_user_page*(), and unpinned via
unpin_user_pages*(), we need some visibility into whether all of this is
working correctly.

Add two new fields to /proc/vmstat:

    nr_foll_pin_acquired
    nr_foll_pin_released

These are documented in Documentation/core-api/pin_user_pages.rst.  They
represent the number of pages (since boot time) that have been pinned
("nr_foll_pin_acquired") and unpinned ("nr_foll_pin_released"), via
pin_user_pages*() and unpin_user_pages*().

In the absence of long-running DMA or RDMA operations that hold pages
pinned, the above two fields will normally be equal.
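
As an illustration only (this sketch is not part of the commit;
demo_pin_range() is a made-up name and FOLL_WRITE is just an example
flag), a typical caller of the pin_user_pages*() APIs, whose activity
these counters record, might look like this::

    #include <linux/mm.h>

    static int demo_pin_range(unsigned long start, int nr_pages,
                              struct page **pages)
    {
            int pinned;

            /* pin_user_pages_fast() applies FOLL_PIN internally. */
            pinned = pin_user_pages_fast(start, nr_pages, FOLL_WRITE,
                                         pages);
            if (pinned < 0)
                    return pinned;

            /* ... DMA to/from the pinned pages would happen here ... */

            /* Each unpin balances one acquired pin. */
            unpin_user_pages(pages, pinned);
            return 0;
    }

Each page successfully pinned above adds one to nr_foll_pin_acquired,
and each unpin adds one to nr_foll_pin_released.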

Also: update Documentation/core-api/pin_user_pages.rst, to remove an
earlier (now confirmed untrue) claim about a performance problem with
/proc/vmstat.

Also: update Documentation/core-api/pin_user_pages.rst to rename the new
/proc/vmstat entries to the names listed above.
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: http://lkml.kernel.org/r/20200211001536.1027652-9-jhubbard@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent 47e29d32
Documentation/core-api/pin_user_pages.rst
@@ -208,12 +208,35 @@ has the following new calls to exercise the new pin*() wrapper functions:
 You can monitor how many total dma-pinned pages have been acquired and released
 since the system was booted, via two new /proc/vmstat entries: ::
 
-    /proc/vmstat/nr_foll_pin_requested
-    /proc/vmstat/nr_foll_pin_requested
+    /proc/vmstat/nr_foll_pin_acquired
+    /proc/vmstat/nr_foll_pin_released
 
-Those are both going to show zero, unless CONFIG_DEBUG_VM is set. This is
-because there is a noticeable performance drop in unpin_user_page(), when they
-are activated.
+Under normal conditions, these two values will be equal unless there are any
+long-term [R]DMA pins in place, or during pin/unpin transitions.
+
+* nr_foll_pin_acquired: This is the number of logical pins that have been
+  acquired since the system was powered on. For huge pages, the head page is
+  pinned once for each page (head page and each tail page) within the huge page.
+  This follows the same sort of behavior that get_user_pages() uses for huge
+  pages: the head page is refcounted once for each tail or head page in the huge
+  page, when get_user_pages() is applied to a huge page.
+
+* nr_foll_pin_released: The number of logical pins that have been released since
+  the system was powered on. Note that pages are released (unpinned) on a
+  PAGE_SIZE granularity, even if the original pin was applied to a huge page.
+  Because of the pin count behavior described above in "nr_foll_pin_acquired",
+  the accounting balances out, so that after doing this::
+
+    pin_user_pages(huge_page);
+    for (each page in huge_page)
+        unpin_user_page(page);
+
+...the following is expected::
+
+    nr_foll_pin_released == nr_foll_pin_acquired
+
+(...unless it was already out of balance due to a long-term RDMA pin being in
+place.)
+
 References
 ==========
...
include/linux/mmzone.h
@@ -243,6 +243,8 @@ enum node_stat_item {
         NR_DIRTIED,             /* page dirtyings since bootup */
         NR_WRITTEN,             /* page writings since bootup */
         NR_KERNEL_MISC_RECLAIMABLE,     /* reclaimable non-slab kernel pages */
+        NR_FOLL_PIN_ACQUIRED,   /* via: pin_user_page(), gup flag: FOLL_PIN */
+        NR_FOLL_PIN_RELEASED,   /* pages returned via unpin_user_page() */
         NR_VM_NODE_STAT_ITEMS
 };
...
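The two counters are ordinary node_stat_item entries, so /proc/vmstat
reports their system-wide sums. As a hedged sketch (not part of this
commit; demo_report_pin_stats() is a made-up helper name), kernel code
could read the same totals through the existing global_node_page_state()
accessor::

    #include <linux/vmstat.h>
    #include <linux/printk.h>

    static void demo_report_pin_stats(void)
    {
            /* The same system-wide totals that /proc/vmstat shows. */
            pr_info("foll_pin acquired: %lu released: %lu\n",
                    global_node_page_state(NR_FOLL_PIN_ACQUIRED),
                    global_node_page_state(NR_FOLL_PIN_RELEASED));
    }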
mm/gup.c
@@ -86,6 +86,8 @@ static __maybe_unused struct page *try_grab_compound_head(struct page *page,
         if (flags & FOLL_GET)
                 return try_get_compound_head(page, refs);
         else if (flags & FOLL_PIN) {
+                int orig_refs = refs;
+
                 /*
                  * When pinning a compound page of order > 1 (which is what
                  * hpage_pincount_available() checks for), use an exact count to
@@ -104,6 +106,9 @@ static __maybe_unused struct page *try_grab_compound_head(struct page *page,
                 if (hpage_pincount_available(page))
                         hpage_pincount_add(page, refs);
 
+                mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED,
+                                    orig_refs);
+
                 return page;
         }
@@ -158,6 +163,8 @@ bool __must_check try_grab_page(struct page *page, unsigned int flags)
                  * once, so that the page really is pinned.
                  */
                 page_ref_add(page, refs);
+
+                mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED, 1);
         }
 
         return true;
@@ -178,6 +185,7 @@ static bool __unpin_devmap_managed_user_page(struct page *page)
         count = page_ref_sub_return(page, refs);
+        mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED, 1);
 
         /*
          * devmap page refcounts are 1-based, rather than 0-based: if
          * refcount is 1, then the page is free and the refcount is
@@ -228,6 +236,8 @@ void unpin_user_page(struct page *page)
         if (page_ref_sub_and_test(page, refs))
                 __put_page(page);
+
+        mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED, 1);
 }
 EXPORT_SYMBOL(unpin_user_page);
@@ -2014,6 +2024,9 @@ EXPORT_SYMBOL(get_user_pages_unlocked);
 static void put_compound_head(struct page *page, int refs, unsigned int flags)
 {
         if (flags & FOLL_PIN) {
+                mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED,
+                                    refs);
+
                 if (hpage_pincount_available(page))
                         hpage_pincount_sub(page, refs);
                 else
...
mm/vmstat.c
@@ -1168,6 +1168,8 @@ const char * const vmstat_text[] = {
         "nr_dirtied",
         "nr_written",
         "nr_kernel_misc_reclaimable",
+        "nr_foll_pin_acquired",
+        "nr_foll_pin_released",
 
         /* enum writeback_stat_item counters */
         "nr_dirty_threshold",
...
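To make the huge page accounting described in the documentation hunk
concrete: pinning one 2MB x86-64 huge page as 512 PAGE_SIZE pages adds
512 to nr_foll_pin_acquired, and unpinning each subpage adds 512 to
nr_foll_pin_released, so the two counters balance. A hedged sketch, not
part of the commit (demo_huge_page_balance() and DEMO_HUGE_NR are
made-up names)::

    #include <linux/mm.h>

    #define DEMO_HUGE_NR 512    /* 2MB / 4KB PAGE_SIZE on x86-64 */

    static void demo_huge_page_balance(unsigned long huge_va,
                                       struct page **pages)
    {
            int i, pinned;

            /* One logical pin per subpage is recorded on the head page. */
            pinned = pin_user_pages_fast(huge_va, DEMO_HUGE_NR,
                                         FOLL_WRITE, pages);
            if (pinned <= 0)
                    return;

            /* nr_foll_pin_acquired has grown by 'pinned'. */

            for (i = 0; i < pinned; i++)
                    unpin_user_page(pages[i]); /* PAGE_SIZE granularity */

            /* nr_foll_pin_released has grown by the same amount. */
    }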