Commit 1b5946a8 authored by Kirill A. Shutemov, committed by Linus Torvalds

thp: update Documentation/{vm/transhuge,filesystems/proc}.txt

Add info about tmpfs/shmem with huge pages.

Link: http://lkml.kernel.org/r/1466021202-61880-38-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent 779750d2
@@ -436,6 +436,7 @@ Private_Dirty: 0 kB
Referenced: 892 kB
Anonymous: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
@@ -464,6 +465,8 @@ accessed.
a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
and a page is modified, the file page is replaced by a private anonymous copy.
"AnonHugePages" shows the amount of memory backed by transparent hugepages.
"ShmemPmdMapped" shows the amount of shared (shmem/tmpfs) memory backed by
huge pages.
"Shared_Hugetlb" and "Private_Hugetlb" show the amounts of memory backed by
hugetlbfs pages, which are *not* counted in the "RSS" or "PSS" fields for
historical reasons, and are not included in the {Shared,Private}_{Clean,Dirty}
fields.
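For a quick per-process look at these fields (the PID below is a placeholder),
something like the following works:

grep -E 'AnonHugePages|ShmemPmdMapped|_Hugetlb' /proc/1234/smaps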
@@ -868,6 +871,9 @@ VmallocTotal: 112216 kB
VmallocUsed: 428 kB
VmallocChunk: 111088 kB
AnonHugePages: 49152 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
MemTotal: Total usable RAM (i.e. physical RAM minus a few reserved
          bits and the kernel binary code)
@@ -912,6 +918,9 @@ MemAvailable: An estimate of how much memory is available for starting new
AnonHugePages: Non-file backed huge pages mapped into userspace page tables
Mapped: files which have been mmapped, such as libraries
Shmem: Total memory used by shared memory (shmem) and tmpfs
ShmemHugePages: Memory used by shared memory (shmem) and tmpfs allocated
                with huge pages
ShmemPmdMapped: Shared memory mapped into userspace with huge pages
Slab: in-kernel data structures cache
SReclaimable: Part of Slab that might be reclaimed, such as caches
SUnreclaim: Part of Slab that cannot be reclaimed under memory pressure
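For example, the new system-wide shmem counters can be read directly:

grep -E 'ShmemHugePages|ShmemPmdMapped' /proc/meminfo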
...
@@ -9,8 +9,8 @@ using huge pages for the backing of virtual memory with huge pages
that supports the automatic promotion and demotion of page sizes and
without the shortcomings of hugetlbfs.

Currently it only works for anonymous memory mappings and tmpfs/shmem,
but in the future it can expand to other filesystems.

The reason applications are running faster is because of two
factors. The first factor is almost completely irrelevant and it's not
@@ -57,10 +57,6 @@ miss is going to run faster.
feature that applies to all dynamic high order allocations in the
kernel)
Transparent Hugepage Support maximizes the usefulness of free memory
if compared to the reservation approach of hugetlbfs by allowing all
unused memory to be used as cache or other movable (or even unmovable
@@ -94,21 +90,21 @@ madvise(MADV_HUGEPAGE) on their critical mmapped regions.
== sysfs ==
Transparent Hugepage Support for anonymous memory can be entirely disabled
(mostly for debugging purposes) or only enabled inside MADV_HUGEPAGE
regions (to avoid the risk of consuming more memory resources) or enabled
system wide. This can be achieved with one of:
echo always >/sys/kernel/mm/transparent_hugepage/enabled
echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
echo never >/sys/kernel/mm/transparent_hugepage/enabled
It's also possible to limit defrag efforts in the VM to generate
anonymous hugepages in case they're not immediately free to madvise
regions, or to never try to defrag memory and simply fall back to regular
pages unless hugepages are immediately available. Clearly if we spend CPU
time to defrag memory, we would expect to gain even more by the fact we
use hugepages later instead of regular pages. This isn't always
guaranteed, but it may be more likely in case the allocation is for a
MADV_HUGEPAGE region.
@@ -133,9 +129,9 @@ that have used madvise(MADV_HUGEPAGE). This is the default behaviour.
"never" should be self-explanatory. "never" should be self-explanatory.
By default, the kernel tries to use the huge zero page on read page faults
to anonymous mappings. It's possible to disable the huge zero page by
writing 0 or enable it back by writing 1:
echo 0 >/sys/kernel/mm/transparent_hugepage/use_zero_page
echo 1 >/sys/kernel/mm/transparent_hugepage/use_zero_page
@@ -204,21 +200,67 @@ Support by passing the parameter "transparent_hugepage=always" or
"transparent_hugepage=madvise" or "transparent_hugepage=never" "transparent_hugepage=madvise" or "transparent_hugepage=never"
(without "") to the kernel command line. (without "") to the kernel command line.
== Hugepages in tmpfs/shmem ==
You can control hugepage allocation policy in tmpfs with mount option
"huge=". It can have following values:
- "always":
Attempt to allocate huge pages every time we need a new page;
- "never":
Do not allocate huge pages;
- "within_size":
Only allocate a huge page if it will be fully within i_size.
Also respect fadvise()/madvise() hints;
- "advise:
Only allocate huge pages if requested with fadvise()/madvise();
The default policy is "never".
"mount -o remount,huge= /mountpoint" works fine after mount: remounting
huge=never will not attempt to break up huge pages at all, just stop more
from being allocated.
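For example (the mount point is arbitrary):

mount -t tmpfs -o huge=always tmpfs /mnt/mytmpfs
mount -o remount,huge=never /mnt/mytmpfs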
There's also a sysfs knob to control hugepage allocation policy for the
internal shmem mount: /sys/kernel/mm/transparent_hugepage/shmem_enabled.
The mount is used for SysV SHM, memfds, shared anonymous mmaps (of /dev/zero
or MAP_ANONYMOUS), GPU drivers' DRM objects, and Ashmem.
In addition to the policies listed above, shmem_enabled allows two further
values:
- "deny":
For use in emergencies, to force the huge option off from
all mounts;
- "force":
Force the huge option on for all - very useful for testing;
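For instance, to force huge pages on for every shmem user while testing:

echo force >/sys/kernel/mm/transparent_hugepage/shmem_enabled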
== Need of application restart ==
The transparent_hugepage/enabled values and tmpfs mount option only affect
future behavior. So to make them effective you need to restart any
application that could have been using hugepages. This also applies to the
regions registered in khugepaged.
== Monitoring usage ==
The number of anonymous transparent huge pages currently used by the
system is available by reading the AnonHugePages field in /proc/meminfo.
To identify what applications are using anonymous transparent huge pages,
it is necessary to read /proc/PID/smaps and count the AnonHugePages fields
for each mapping.
The number of file transparent huge pages mapped to userspace is available
by reading the ShmemPmdMapped and ShmemHugePages fields in /proc/meminfo.
To identify what applications are mapping file transparent huge pages, it
is necessary to read /proc/PID/smaps and count the ShmemPmdMapped fields
for each mapping.
Note that reading the smaps file is expensive and reading it
frequently will incur overhead.
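As a sketch (the PID is a placeholder), the per-process totals can be
summed with awk:

awk '/AnonHugePages/ {sum += $2} END {print sum " kB"}' /proc/1234/smaps
awk '/ShmemPmdMapped/ {sum += $2} END {print sum " kB"}' /proc/1234/smaps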
There are a number of counters in /proc/vmstat that may be used to
monitor how successfully the system is providing huge pages for use.
@@ -238,6 +280,12 @@ thp_collapse_alloc_failed is incremented if khugepaged found a range
of pages that should be collapsed into one huge page but failed
the allocation.
thp_file_alloc is incremented every time a file huge page is successfully
allocated.
thp_file_mapped is incremented every time a file huge page is mapped into
user address space.
thp_split_page is incremented every time a huge page is split into base
pages. This can happen for a variety of reasons but a common
reason is that a huge page is old and is being reclaimed.
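All of the thp_* counters can be watched together with, for example:

grep thp_ /proc/vmstat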
@@ -403,19 +451,27 @@ pages:
on relevant sub-page of the compound page.
- map/unmap of the whole compound page accounted in compound_mapcount
  (stored in first tail page). For file huge pages, we also increment
  ->_mapcount of all sub-pages in order to have race-free detection of
  last unmap of subpages.
PageDoubleMap() indicates that the page is *possibly* mapped with PTEs.

For anonymous pages, PageDoubleMap() also indicates that ->_mapcount in all
subpages is offset up by one. This additional reference is required to get
race-free detection of unmap of subpages when we have them mapped with both
PMDs and PTEs.
This is an optimization required to lower the overhead of per-subpage
mapcount tracking. The alternative is to alter ->_mapcount in all subpages
on each map/unmap of the whole compound page.
For anonymous pages, we set PG_double_map when a PMD of the page is split
for the first time, but the page still has a PMD mapping. The additional
references go away with the last compound_mapcount.

File pages get PG_double_map set on the first map of the page with PTE, and
it goes away when the page gets evicted from the page cache.
split_huge_page internally has to distribute the refcounts in the head
page to the tail pages before clearing all PG_head/tail bits from the page
@@ -427,7 +483,7 @@ sum of mapcount of all sub-pages plus one (split_huge_page caller must
have reference for head page).
split_huge_page uses migration entries to stabilize page->_refcount and
page->_mapcount of anonymous pages. File pages are just unmapped.
We are safe against physical memory scanners too: the only legitimate way
a scanner can get a reference to a page is get_page_unless_zero().
...