• Dan Williams's avatar
    mm/sparsemem: support sub-section hotplug · ba72b4c8
    Dan Williams authored
    The libnvdimm sub-system has suffered a series of hacks and broken
    workarounds for the memory-hotplug implementation's awkward
    section-aligned (128MB) granularity.
    
    For example the following backtrace is emitted when attempting
    arch_add_memory() with physical address ranges that intersect 'System
    RAM' (RAM) with 'Persistent Memory' (PMEM) within a given section:
    
        # cat /proc/iomem | grep -A1 -B1 Persistent\ Memory
        100000000-1ffffffff : System RAM
        200000000-303ffffff : Persistent Memory (legacy)
        304000000-43fffffff : System RAM
        440000000-23ffffffff : Persistent Memory
        2400000000-43bfffffff : Persistent Memory
          2400000000-43bfffffff : namespace2.0
    
        WARNING: CPU: 38 PID: 928 at arch/x86/mm/init_64.c:850 add_pages+0x5c/0x60
        [..]
        RIP: 0010:add_pages+0x5c/0x60
        [..]
        Call Trace:
         devm_memremap_pages+0x460/0x6e0
         pmem_attach_disk+0x29e/0x680 [nd_pmem]
         ? nd_dax_probe+0xfc/0x120 [libnvdimm]
         nvdimm_bus_probe+0x66/0x160 [libnvdimm]
    
    It was discovered that the problem goes beyond RAM vs PMEM collisions as
    some platform produce PMEM vs PMEM collisions within a given section.
    The libnvdimm workaround for that case revealed that the libnvdimm
    section-alignment-padding implementation has been broken for a long
    while.
    
    A fix for that long-standing breakage introduces as many problems as it
    solves as it would require a backward-incompatible change to the
    namespace metadata interpretation.  Instead of that dubious route [1],
    address the root problem in the memory-hotplug implementation.
    
    Note that EEXIST is no longer treated as success as that is how
    sparse_add_section() reports subsection collisions, it was also obviated
    by recent changes to perform the request_region() for 'System RAM'
    before arch_add_memory() in the add_memory() sequence.
    
    [1] https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwillia2-desk3.amr.corp.intel.com
    
    [osalvador@suse.de: fix deactivate_section for early sections]
      Link: http://lkml.kernel.org/r/20190715081549.32577-2-osalvador@suse.de
    Link: http://lkml.kernel.org/r/156092354368.979959.6232443923440952359.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: default avatarDan Williams <dan.j.williams@intel.com>
    Signed-off-by: default avatarOscar Salvador <osalvador@suse.de>
    Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
    Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Logan Gunthorpe <logang@deltatee.com>
    Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Jane Chu <jane.chu@oracle.com>
    Cc: Jeff Moyer <jmoyer@redhat.com>
    Cc: Jérôme Glisse <jglisse@redhat.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Mike Rapoport <rppt@linux.ibm.com>
    Cc: Toshi Kani <toshi.kani@hpe.com>
    Cc: Wei Yang <richardw.yang@linux.intel.com>
    Cc: Jason Gunthorpe <jgg@mellanox.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    ba72b4c8
memory_hotplug.c 47.5 KB