    dax: Assign RAM regions to memory-hotplug by default · e9ee9fe3
    Dan Williams authored
    The default mode for device-dax instances is backwards for RAM-regions,
    as evidenced by the fact that it tends to catch end users by surprise:
    "Where is my memory?" Recall that platforms are increasingly shipping
    with performance-differentiated memory pools beyond typical DRAM and
    NUMA effects. This includes HBM (high-bandwidth-memory) and CXL (dynamic
    interleave, varied media types, and future fabric attached
    possibilities).
    
    For this reason the EFI_MEMORY_SP (EFI Special Purpose Memory => Linux
    'Soft Reserved') attribute is expected to be applied to all memory-pools
    that are not the general purpose pool. This designation gives an
    Operating System a chance to defer usage of a memory pool until later in
    the boot process where its performance properties can be interrogated
    and administrator policy can be applied.
    
    'Soft Reserved' memory can be anything from too limited and precious to
    be part of the general purpose pool (HBM), too slow to host hot kernel
    data structures (some PMEM media), or anything in between. However, in
    the absence of an explicit policy, the memory should at least be made
    usable by default. The current device-dax default hides all
    non-general-purpose memory behind a device interface.
    
    The expectation is that the distribution of users that want the memory
    online by default vs device-dedicated-access by default follows the
    Pareto principle. A small number of enlightened users may want to do
    userspace memory management through a device, but general users just
    want the kernel to make the memory available with an option to get more
    advanced later.
    
    Arrange for all device-dax instances not backed by PMEM to default to
    attaching to the dax_kmem driver. From there the baseline memory hotplug
    policy (CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE / memhp_default_state=)
    gates whether the memory comes online or stays offline. If it stays
    offline, it can be reliably converted back to device-mode, where it
    can be partitioned or fronted by a userspace allocator.
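
    As a rough illustration of how the baseline hotplug policy can be
    inspected at runtime, the sketch below reads the sysfs attribute that
    reflects memhp_default_state=. The helper name memhp_policy and the
    POLICY_FILE variable are illustrative assumptions, not part of this
    patch.

```shell
#!/bin/sh
# Sketch: read the default memory-hotplug onlining policy from sysfs.
# POLICY_FILE is an illustrative variable name; its default is the
# kernel's auto_online_blocks attribute.
POLICY_FILE="${POLICY_FILE:-/sys/devices/system/memory/auto_online_blocks}"

# Print the policy string ("offline", "online", ...), or "unknown" when
# the attribute is not readable (e.g. memory hotplug is not available).
memhp_policy() {
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo "unknown"
    fi
}

memhp_policy "$POLICY_FILE"
```

    A result of "offline" suggests that dax_kmem-attached memory will stay
    offline until an administrator or udev rule onlines it.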
    
    So, if someone wants device-dax instances for their 'Soft Reserved'
    memory:
    
    1/ Build a kernel with CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n or boot
       with memhp_default_state=offline, or roll the dice and hope that the
       kernel has not pinned a page in that memory before step 2.
    
    2/ Write a udev rule to convert the target dax device(s) from
       'system-ram' mode to 'devdax' mode:
    
       daxctl reconfigure-device $dax -m devdax -f
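
    The two steps above could be combined in a small helper that a udev
    rule or an administrator invokes. This is only a sketch: to_devdax,
    POLICY_FILE, and DRY_RUN are hypothetical names, and the only real
    command used is the daxctl invocation quoted above.

```shell
#!/bin/sh
# Sketch of steps 1 and 2: warn when the hotplug default is not "offline",
# then convert the device. to_devdax, POLICY_FILE and DRY_RUN are
# illustrative names, not part of daxctl or the kernel.
POLICY_FILE="${POLICY_FILE:-/sys/devices/system/memory/auto_online_blocks}"

to_devdax() {
    dax="$1"
    # Fall back to "unknown" if the sysfs attribute cannot be read.
    policy=$(cat "$POLICY_FILE" 2>/dev/null || echo unknown)
    if [ "$policy" != "offline" ]; then
        echo "warning: hotplug default is '$policy'; memory may already be in use" >&2
    fi
    # With DRY_RUN set, the command is printed instead of executed.
    ${DRY_RUN:+echo} daxctl reconfigure-device "$dax" -m devdax -f
}
```

    For example, "DRY_RUN=1 to_devdax dax0.0" prints the command that
    would be run for a device named dax0.0 without touching the system.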
    
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Reviewed-by: Gregory Price <gregory.price@memverge.com>
    Tested-by: Fan Ni <fan.ni@samsung.com>
    Reviewed-by: Dave Jiang <dave.jiang@intel.com>
    Link: https://lore.kernel.org/r/167602003336.1924368.6809503401422267885.stgit@dwillia2-xfh.jf.intel.com
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>