Commit 9e122cc1 authored by David Hildenbrand, committed by Linus Torvalds

memory-hotplug.rst: document the "auto-movable" online policy

Commit e83a437f ("mm/memory_hotplug: introduce "auto-movable" online
policy") introduced a new memory online policy to automatically select a
zone for memory blocks to be onlined.  It added a way to set the active
online policy and tunables for the auto-movable online policy.

Follow-up commits tweaked the "auto-movable" policy to also consider
memory device details when selecting zones for memory blocks to be
onlined.

Let's document the new toggles and how the two online policies we have
work.

[david@redhat.com: updates]
  Link: https://lkml.kernel.org/r/20211011082058.6076-4-david@redhat.com

Link: https://lkml.kernel.org/r/20210930144117.23641-4-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent a8db400f
@@ -165,9 +165,8 @@ Or alternatively::

         % echo 1 > /sys/devices/system/memory/memoryXXX/online

-The kernel will select the target zone automatically, usually defaulting to
-``ZONE_NORMAL`` unless ``movable_node`` has been specified on the kernel
-command line or if the memory block would intersect the ZONE_MOVABLE already.
+The kernel will select the target zone automatically, depending on the
+configured ``online_policy``.

 One can explicitly request to associate an offline memory block with
 ZONE_MOVABLE by::
@@ -198,6 +197,9 @@ Auto-onlining can be enabled by writing ``online``, ``online_kernel`` or

         % echo online > /sys/devices/system/memory/auto_online_blocks

+Similarly to manual onlining, with ``online`` the kernel will select the
+target zone automatically, depending on the configured ``online_policy``.
+
 Modifying the auto-online behavior will only affect all subsequently added
 memory blocks only.

@@ -393,11 +395,16 @@ command line parameters are relevant:
 ======================== =======================================================
 ``memhp_default_state``  configure auto-onlining by essentially setting
                          ``/sys/devices/system/memory/auto_online_blocks``.
-``movable_node``         configure automatic zone selection in the kernel. When
-                         set, the kernel will default to ZONE_MOVABLE, unless
-                         other zones can be kept contiguous.
+``movable_node``         configure automatic zone selection in the kernel when
+                         using the ``contig-zones`` online policy. When
+                         set, the kernel will default to ZONE_MOVABLE when
+                         onlining a memory block, unless other zones can be kept
+                         contiguous.
 ======================== =======================================================

+See Documentation/admin-guide/kernel-parameters.txt for a more generic
+description of these command line parameters.
+
 Module Parameters
 ------------------

@@ -414,20 +421,114 @@ and they can be observed (and some even modified at runtime) via::

 The following module parameters are currently defined:

-======================== =======================================================
-``memmap_on_memory``     read-write: Allocate memory for the memmap from the
-                         added memory block itself. Even if enabled, actual
-                         support depends on various other system properties and
-                         should only be regarded as a hint whether the behavior
-                         would be desired.
-
-                         While allocating the memmap from the memory block
-                         itself makes memory hotplug less likely to fail and
-                         keeps the memmap on the same NUMA node in any case, it
-                         can fragment physical memory in a way that huge pages
-                         in bigger granularity cannot be formed on hotplugged
-                         memory.
-======================== =======================================================
+================================ ===============================================
+``memmap_on_memory``             read-write: Allocate memory for the memmap from
+                                 the added memory block itself. Even if enabled,
+                                 actual support depends on various other system
+                                 properties and should only be regarded as a
+                                 hint whether the behavior would be desired.
+
+                                 While allocating the memmap from the memory
+                                 block itself makes memory hotplug less likely
+                                 to fail and keeps the memmap on the same NUMA
+                                 node in any case, it can fragment physical
+                                 memory in a way that huge pages in bigger
+                                 granularity cannot be formed on hotplugged
+                                 memory.
+``online_policy``                read-write: Set the basic policy used for
+                                 automatic zone selection when onlining memory
+                                 blocks without specifying a target zone.
+                                 ``contig-zones`` has been the kernel default
+                                 before this parameter was added. After an
+                                 online policy was configured and memory was
+                                 online, the policy should not be changed
+                                 anymore.
+
+                                 When set to ``contig-zones``, the kernel will
+                                 try keeping zones contiguous. If a memory block
+                                 intersects multiple zones or no zone, the
+                                 behavior depends on the ``movable_node`` kernel
+                                 command line parameter: default to ZONE_MOVABLE
+                                 if set, default to the applicable kernel zone
+                                 (usually ZONE_NORMAL) if not set.
+
+                                 When set to ``auto-movable``, the kernel will
+                                 try onlining memory blocks to ZONE_MOVABLE if
+                                 possible according to the configuration and
+                                 memory device details. With this policy, one
+                                 can avoid zone imbalances when eventually
+                                 hotplugging a lot of memory later and still
+                                 wanting to be able to hotunplug as much as
+                                 possible reliably, very desirable in
+                                 virtualized environments. This policy ignores
+                                 the ``movable_node`` kernel command line
+                                 parameter and isn't really applicable in
+                                 environments that require it (e.g., bare metal
+                                 with hotunpluggable nodes) where hotplugged
+                                 memory might be exposed via the
+                                 firmware-provided memory map early during boot
+                                 to the system instead of getting detected,
+                                 added and onlined later during boot (such as
+                                 done by virtio-mem or by some hypervisors
+                                 implementing emulated DIMMs). As one example, a
+                                 hotplugged DIMM will be onlined either
+                                 completely to ZONE_MOVABLE or completely to
+                                 ZONE_NORMAL, not a mixture.
+                                 As another example, as many memory blocks
+                                 belonging to a virtio-mem device will be
+                                 onlined to ZONE_MOVABLE as possible,
+                                 special-casing units of memory blocks that can
+                                 only get hotunplugged together. *This policy
+                                 does not protect from setups that are
+                                 problematic with ZONE_MOVABLE and does not
+                                 change the zone of memory blocks dynamically
+                                 after they were onlined.*
+``auto_movable_ratio``           read-write: Set the maximum MOVABLE:KERNEL
+                                 memory ratio in % for the ``auto-movable``
+                                 online policy. Whether the ratio applies only
+                                 for the system across all NUMA nodes or also
+                                 per NUMA nodes depends on the
+                                 ``auto_movable_numa_aware`` configuration.
+
+                                 All accounting is based on present memory pages
+                                 in the zones combined with accounting per
+                                 memory device. Memory dedicated to the CMA
+                                 allocator is accounted as MOVABLE, although
+                                 residing on one of the kernel zones. The
+                                 possible ratio depends on the actual workload.
+                                 The kernel default is "301" %, for example,
+                                 allowing for hotplugging 24 GiB to a 8 GiB VM
+                                 and automatically onlining all hotplugged
+                                 memory to ZONE_MOVABLE in many setups. The
+                                 additional 1% deals with some pages being not
+                                 present, for example, because of some firmware
+                                 allocations.
+
+                                 Note that ZONE_NORMAL memory provided by one
+                                 memory device does not allow for more
+                                 ZONE_MOVABLE memory for a different memory
+                                 device. As one example, onlining memory of a
+                                 hotplugged DIMM to ZONE_NORMAL will not allow
+                                 for another hotplugged DIMM to get onlined to
+                                 ZONE_MOVABLE automatically. In contrast, memory
+                                 hotplugged by a virtio-mem device that got
+                                 onlined to ZONE_NORMAL will allow for more
+                                 ZONE_MOVABLE memory within *the same*
+                                 virtio-mem device.
+``auto_movable_numa_aware``      read-write: Configure whether the
+                                 ``auto_movable_ratio`` in the ``auto-movable``
+                                 online policy also applies per NUMA
+                                 node in addition to the whole system across all
+                                 NUMA nodes. The kernel default is "Y".
+
+                                 Disabling NUMA awareness can be helpful when
+                                 dealing with NUMA nodes that should be
+                                 completely hotunpluggable, onlining the memory
+                                 completely to ZONE_MOVABLE automatically if
+                                 possible.
+
+                                 Parameter availability depends on CONFIG_NUMA.
+================================ ===============================================

 ZONE_MOVABLE
 ============
......
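
The ``contig-zones`` fallback described in the ``online_policy`` row can be modelled in a few lines. This is only an illustrative Python sketch, not kernel code. The function name, its boolean inputs and the zone strings are invented for this example, and it assumes that "intersects" means the memory block's range overlaps the span of a zone.

```python
def contig_zones_default(in_kernel_zone: bool,
                         in_movable_zone: bool,
                         movable_node: bool,
                         kernel_zone: str = "ZONE_NORMAL") -> str:
    """Toy model of zone selection under the ``contig-zones`` policy.

    in_kernel_zone / in_movable_zone: whether the memory block's range
    intersects the applicable kernel zone or ZONE_MOVABLE (hypothetical
    inputs; the kernel derives them from zone spans).
    movable_node: whether ``movable_node`` was given on the command line.
    """
    # Exactly one zone is intersected: stay in it to keep zones contiguous.
    if in_kernel_zone != in_movable_zone:
        return kernel_zone if in_kernel_zone else "ZONE_MOVABLE"
    # Both zones or no zone: fall back to the movable_node setting.
    return "ZONE_MOVABLE" if movable_node else kernel_zone
```

In this model ``movable_node`` only matters in the ambiguous case, which matches the wording that the kernel defaults to ZONE_MOVABLE "unless other zones can be kept contiguous".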
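
The arithmetic behind the ``auto_movable_ratio`` default of 301 % can be checked with a small Python sketch. It is a deliberate simplification: it works on byte counts for the whole system only and ignores the per-device, per-node and CMA accounting described in the table. The helper names and the 16 MiB of non-present memory are assumptions made for this example.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def max_movable_bytes(kernel_bytes: int, ratio_percent: int = 301) -> int:
    """Upper bound on MOVABLE memory for a given amount of KERNEL memory."""
    return kernel_bytes * ratio_percent // 100


def fits_in_movable(kernel_bytes: int, movable_bytes: int, new_bytes: int,
                    ratio_percent: int = 301) -> bool:
    """True if onlining new_bytes as MOVABLE keeps MOVABLE:KERNEL within the ratio."""
    return movable_bytes + new_bytes <= max_movable_bytes(kernel_bytes, ratio_percent)


# Hypothetical 8 GiB VM in which 16 MiB are not present
# (for example because of firmware allocations).
present_kernel = 8 * GIB - 16 * MIB
hotplugged = 24 * GIB

print(fits_in_movable(present_kernel, 0, hotplugged, ratio_percent=300))
print(fits_in_movable(present_kernel, 0, hotplugged, ratio_percent=301))
```

With these made-up numbers a plain 300 % ratio falls just short of admitting all 24 GiB, while 301 % suffices, which is the situation the "additional 1%" in the description is meant to cover.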