    mm/memory_hotplug: introduce "auto-movable" online policy · e83a437f
    David Hildenbrand authored
    When onlining without specifying a zone (using "online" instead of
    "online_kernel" or "online_movable"), we currently select a zone such that
    existing zones are kept contiguous.  This online policy made sense in the
    past, when contiguous zones were required.
    
    We'd like to implement smarter policies, however:
    
    * User space has little insight.  As one example, it has no idea which
      memory blocks logically belong together (e.g., to a DIMM or to a
      virtio-mem device).
    
    * Drivers that add memory in separate memory blocks, especially
      virtio-mem, want memory to get onlined right from the kernel when
      adding.
    
    So we really want onlining to different zones to be managed in the
    kernel, but configured by user space.
    
    We see more and more cases where we might eventually hotplug a lot of
    memory (e.g., grow a 2 GiB VM to 64 GiB), however:
    
    * Resizing happens dynamically, in smaller steps in both directions
      (e.g., 2 GiB -> 8 GiB -> 4 GiB -> 16 GiB ...)
    
    * We still want as much flexibility as possible, especially the ability
      to hotunplug as much memory as possible later.
    
    We can really only use "online_movable" if we know upfront the amount of
    memory we are going to hotplug, and we know that it won't result
    in a zone imbalance.  So in our example, a 2 GiB VM that could grow to 64
    GiB could currently not use "online_movable", and instead, "online_kernel"
    would have to be used, resulting in worse (no) memory hotunplug
    reliability.
    
    Let's add a new "auto-movable" online policy that considers the current
    zone ratios (global, per-node) to determine whether a memory block can
    be onlined to ZONE_MOVABLE:
    
    	MOVABLE : KERNEL
    
    However, internally we'll only consider the following ratio for now:
    
    	MOVABLE : KERNEL_EARLY
    
    For now, hotplugged KERNEL memory does not allow for more MOVABLE
    memory, because there is no coordination across memory devices.
    In follow-up patches, we will allow for more KERNEL memory within a memory
    device to allow for more MOVABLE memory within the same memory device --
    which only makes sense for special memory device types.
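The MOVABLE : KERNEL_EARLY check can be sketched in C roughly as follows. This is a minimal illustration that assumes the page counts for the scope being checked are already known. `struct zone_stats` and `can_online_movable()` are hypothetical names, not the kernel's implementation, which may differ in rounding and in how it gathers its counters. Hotplugged KERNEL memory is subtracted, so it never allows for more MOVABLE memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical page counters for one scope (whole system or one node). */
struct zone_stats {
	uint64_t kernel_present;    /* present pages in kernel zones */
	uint64_t kernel_hotplugged; /* of those, pages that were hotplugged */
	uint64_t movable_present;   /* present pages in ZONE_MOVABLE */
};

/*
 * Hypothetical helper: return true if onlining nr_pages more pages to
 * ZONE_MOVABLE keeps MOVABLE : KERNEL_EARLY within ratio_percent.
 */
static bool can_online_movable(const struct zone_stats *s, uint64_t nr_pages,
			       unsigned int ratio_percent)
{
	/* KERNEL_EARLY excludes hotplugged kernel memory. */
	uint64_t kernel_early = s->kernel_present - s->kernel_hotplugged;
	/* Include the range we are about to online. */
	uint64_t movable = s->movable_present + nr_pages;

	/* Compare movable * 100 <= kernel_early * ratio to avoid fractions. */
	return movable * 100 <= kernel_early * (uint64_t)ratio_percent;
}
```

With 4 KiB pages, 8 GiB of boot-time kernel memory (2097152 pages) and a ratio of 300 permit up to 24 GiB (6291456 pages) in ZONE_MOVABLE. One page more is refused, even if additional KERNEL memory was hotplugged in the meantime.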
    
    We base our calculation on "present pages"; see the code comments for
    details.  Hotplugged memory will get onlined to ZONE_MOVABLE if the
    configured ratio allows for it.  Depending on the setup, this can result
    in fragmented zones, which can make compaction slower and the dynamic
    allocation of gigantic pages less reliable when not using CMA (... which
    is already pretty unreliable).
    
    The old policy will remain the default and is called "contig-zones".  In
    follow-up patches, our new policy will use additional information, such as
    memory groups, to make even smarter decisions across memory blocks.
    
    Configuration:
    
    * memory_hotplug.online_policy is used to switch between the two policies
      and defaults to "contig-zones".
    
    * memory_hotplug.auto_movable_ratio defines the maximum ratio in
      percent and defaults to "301", allowing, e.g., most 8 GiB machines to
      grow to 32 GiB and have all hotplugged memory in ZONE_MOVABLE.  The
      additional percent accounts for a handful of lost present pages (e.g.,
      firmware allocations).  User space is expected to adjust this ratio when
      enabling the new "auto-movable" policy, though.
    
    * memory_hotplug.auto_movable_numa_aware considers NUMA node stats in
      addition to global stats and defaults to "true".
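The following C sketch shows why the default of 301 helps and what memory_hotplug.auto_movable_numa_aware adds. It assumes 4 KiB pages. As an example, it also assumes an 8 GiB machine whose early kernel zones lose 16 MiB of present pages to firmware (8176 MiB, i.e., 2093056 pages). `struct ratio_stats`, `within_ratio()` and `auto_movable_allowed()` are hypothetical helpers for illustration, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-scope counters, in pages (global or one node). */
struct ratio_stats {
	uint64_t kernel_early; /* early (boot) present pages in kernel zones */
	uint64_t movable;      /* present pages in ZONE_MOVABLE */
};

/* Hypothetical: MOVABLE (incl. nr_pages) : KERNEL_EARLY <= ratio_percent. */
static bool within_ratio(const struct ratio_stats *s, uint64_t nr_pages,
			 unsigned int ratio_percent)
{
	return (s->movable + nr_pages) * 100 <=
	       s->kernel_early * (uint64_t)ratio_percent;
}

/*
 * Hypothetical: always check the global stats; if numa_aware is set,
 * additionally check the stats of the node the memory is added to.
 */
static bool auto_movable_allowed(const struct ratio_stats *global,
				 const struct ratio_stats *node,
				 uint64_t nr_pages, unsigned int ratio_percent,
				 bool numa_aware)
{
	if (!within_ratio(global, nr_pages, ratio_percent))
		return false;
	return !numa_aware || within_ratio(node, nr_pages, ratio_percent);
}
```

With those numbers, hotplugging 24 GiB (6291456 pages) passes the check at 301 but fails at 300, because 8176 MiB * 3 = 24528 MiB falls just short of 24576 MiB. With NUMA awareness enabled, memory added to a node that has no early kernel memory would not be onlined to ZONE_MOVABLE under this sketch.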
    
    Note: just like the old policy, the new policy won't take things like
    unmovable huge pages or memory ballooning that doesn't support balloon
    compaction into account.  User space has to configure onlining
    accordingly.
    
    Link: https://lkml.kernel.org/r/20210806124715.17090-3-david@redhat.com
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Cc: Anshuman Khandual <anshuman.khandual@arm.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Hui Zhu <teawater@gmail.com>
    Cc: Jason Wang <jasowang@redhat.com>
    Cc: Len Brown <lenb@kernel.org>
    Cc: Marek Kedzierski <mkedzier@redhat.com>
    Cc: "Michael S. Tsirkin" <mst@redhat.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Mike Rapoport <rppt@kernel.org>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
    Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
    Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
    Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>