    drivers/base/memory: rename MMOP_ONLINE_KEEP to MMOP_ONLINE · 956f8b44
    David Hildenbrand authored
    Patch series "mm/memory_hotplug: allow to specify a default online_type", v3.
    
    Distributions nowadays use udev rules ([1] [2]) to specify if and how to
    online hotplugged memory.  The rules seem to get more complex with many
    special cases.  Due to the various special cases,
    CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE cannot be used.  All memory hotplug
    is handled via udev rules.
    
    Every time we hotplug memory, the udev rule will come to the same
    conclusion.  Hyper-V in particular (and soon virtio-mem as well) adds a
    lot of memory in separate memory blocks and waits for that memory to get
    onlined by user space before adding more memory blocks (so as not to add
    memory faster than it is getting onlined).  This of course slows down the
    whole memory hotplug process.
    
    To make the job of distributions easier and to avoid udev rules that get
    more and more complicated, let's extend the mechanism provided by
    - /sys/devices/system/memory/auto_online_blocks
    - "memhp_default_state=" on the kernel cmdline
    to also accept "online_movable" and "online_kernel".
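
As a rough illustration of the extended interface, the sketch below validates an online type and writes it to the policy file. The helper names valid_online_type and set_default_online_type are hypothetical and not part of this series; the optional second argument exists only so the function can be tried against a scratch file instead of sysfs. Kernels without this series are expected to reject "online_kernel" and "online_movable", in which case the write fails.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of the kernel or udev.

# Accept exactly the four online types named in this series.
valid_online_type() {
  case "$1" in
    offline|online|online_kernel|online_movable) return 0 ;;
    *) return 1 ;;
  esac
}

# $1: online type to use for memory hotplugged in the future
# $2: policy file (assumed default: the sysfs attribute named above)
# Returns 2 for an unknown type, 1 if the file is missing or not writable.
set_default_online_type() {
  policy_file="${2:-/sys/devices/system/memory/auto_online_blocks}"
  valid_online_type "$1" || return 2
  [ -w "$policy_file" ] || return 1
  printf '%s\n' "$1" > "$policy_file"
}
```

For example, `set_default_online_type online_movable` (as root) would request the same default as booting with "memhp_default_state=online_movable".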
    
    === Example /usr/libexec/config-memhotplug ===
    
    #!/bin/bash
    
    VIRT=`systemd-detect-virt --vm`
    ARCH=`uname -p`
    
    sense_virtio_mem() {
      if [ -d "/sys/bus/virtio/drivers/virtio_mem/" ]; then
        DEVICES=`find /sys/bus/virtio/drivers/virtio_mem/ -maxdepth 1 -type l | wc -l`
        if [ $DEVICES != "0" ]; then
            return 0
        fi
      fi
      return 1
    }
    
    if [ ! -e "/sys/devices/system/memory/auto_online_blocks" ]; then
      echo "Memory hotplug configuration support missing in the kernel"
      exit 1
    fi
    
    if grep "memhp_default_state=" /proc/cmdline > /dev/null; then
      echo "Memory hotplug configuration overridden in kernel cmdline (memhp_default_state=)"
      exit 1
    fi
    
    if [ $VIRT == "microsoft" ]; then
      echo "Detected Hyper-V on $ARCH"
      # Hyper-V wants all memory in ZONE_NORMAL
      ONLINE_TYPE="online_kernel"
    elif sense_virtio_mem; then
      echo "Detected virtio-mem on $ARCH"
      # virtio-mem wants all memory in ZONE_NORMAL
      ONLINE_TYPE="online_kernel"
    elif [ $ARCH == "s390x" ] || [ $ARCH == "s390" ]; then
      echo "Detected $ARCH"
      # standby memory should not be onlined automatically
      ONLINE_TYPE="offline"
    elif [ $ARCH == "ppc64" ] || [ $ARCH == "ppc64le" ]; then
      echo "Detected" $ARCH
      # PPC64 onlines all hotplugged memory right from the kernel
      ONLINE_TYPE="offline"
    elif [ $VIRT == "none" ]; then
      echo "Detected bare-metal on $ARCH"
      # Bare metal users expect hotplugged memory to be unpluggable. We assume
      # that ZONE imbalances on such enterprise servers cannot happen and that
      # this is properly documented
      ONLINE_TYPE="online_movable"
    else
      # TODO: Hypervisors that want to unplug DIMMs and can guarantee that ZONE
      # imbalances won't happen
      echo "Detected $VIRT on $ARCH"
      # Usually, ballooning is used in virtual environments, so memory should go to
      # ZONE_NORMAL. However, sometimes "movable_node" is relevant.
      ONLINE_TYPE="online"
    fi
    
    echo "Selected online_type:" $ONLINE_TYPE
    
    # Configure what to do with memory that will be hotplugged in the future
    echo $ONLINE_TYPE 2>/dev/null > /sys/devices/system/memory/auto_online_blocks
    if [ $? != "0" ]; then
      echo "Memory hotplug cannot be configured (e.g., old kernel or missing permissions)"
      # A backup udev rule should handle old kernels if necessary
      exit 1
    fi
    
    # Process all already plugged blocks (e.g., DIMMs, but also Hyper-V or virtio-mem)
    if [ $ONLINE_TYPE != "offline" ]; then
      for MEMORY in /sys/devices/system/memory/memory*; do
        STATE=`cat $MEMORY/state`
        if [ $STATE == "offline" ]; then
            echo $ONLINE_TYPE > $MEMORY/state
        fi
      done
    fi
    
    === Example /usr/lib/systemd/system/config-memhotplug.service ===
    
    [Unit]
    Description=Configure memory hotplug behavior
    DefaultDependencies=no
    Conflicts=shutdown.target
    Before=sysinit.target shutdown.target
    After=systemd-modules-load.service
    ConditionPathExists=|/sys/devices/system/memory/auto_online_blocks
    
    [Service]
    ExecStart=/usr/libexec/config-memhotplug
    Type=oneshot
    TimeoutSec=0
    RemainAfterExit=yes
    
    [Install]
    WantedBy=sysinit.target
    
    === Example modification to the 40-redhat.rules [2] ===
    
    : diff --git a/40-redhat.rules b/40-redhat.rules-new
    : index 2c690e5..168fd03 100644
    : --- a/40-redhat.rules
    : +++ b/40-redhat.rules-new
    : @@ -6,6 +6,9 @@ SUBSYSTEM=="cpu", ACTION=="add", TEST=="online", ATTR{online}=="0", ATTR{online}
    :  # Memory hotadd request
    :  SUBSYSTEM!="memory", GOTO="memory_hotplug_end"
    :  ACTION!="add", GOTO="memory_hotplug_end"
    : +# memory hotplug behavior configured
    : +PROGRAM=="grep online /sys/devices/system/memory/auto_online_blocks", GOTO="memory_hotplug_end"
    : +
    :  PROGRAM="/bin/uname -p", RESULT=="s390*", GOTO="memory_hotplug_end"
    :
    :  ENV{.state}="online"
    
    ===
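
The "grep online" test in the rule added above works as a substring match: "online", "online_kernel" and "online_movable" all contain "online", while "offline" does not. The sketch below mirrors that check in a small function so the behavior can be tried against a scratch file; hotplug_policy_configured is a hypothetical name, and the optional path argument is an assumption made for testability.

```shell
#!/bin/sh
# Sketch only: mirrors the substring test used by the udev rule above.
# hotplug_policy_configured is a hypothetical helper name.

# $1: policy file (assumed default: the sysfs attribute used by the rule)
# Succeeds if the kernel is set to online hotplugged memory by itself,
# i.e. the file contains "online", "online_kernel" or "online_movable".
hotplug_policy_configured() {
  policy_file="${1:-/sys/devices/system/memory/auto_online_blocks}"
  grep -q online "$policy_file" 2>/dev/null
}
```

With "offline" in the file (or with the file missing) the function fails, which corresponds to the udev rule not taking the early GOTO and falling through to the remaining rules.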
    
    [1] https://github.com/lnykryn/systemd-rhel/pull/281
    [2] https://github.com/lnykryn/systemd-rhel/blob/staging/rules/40-redhat.rules
    
    This patch (of 8):
    
    The name is misleading and it's not really clear what is "kept".  Let's
    just name it like the online_type name we expose to user space ("online").
    
    Add some documentation to the types.

    Signed-off-by: David Hildenbrand <david@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
    Reviewed-by: Baoquan He <bhe@redhat.com>
    Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: "Rafael J. Wysocki" <rafael@kernel.org>
    Cc: Wei Yang <richard.weiyang@gmail.com>
    Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
    Cc: Yumei Huang <yuhuang@redhat.com>
    Cc: Igor Mammedov <imammedo@redhat.com>
    Cc: Eduardo Habkost <ehabkost@redhat.com>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Haiyang Zhang <haiyangz@microsoft.com>
    Cc: K. Y. Srinivasan <kys@microsoft.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Stephen Hemminger <sthemmin@microsoft.com>
    Cc: Wei Liu <wei.liu@kernel.org>
    Link: http://lkml.kernel.org/r/20200319131221.14044-1-david@redhat.com
    Link: http://lkml.kernel.org/r/20200317104942.11178-2-david@redhat.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>