    mm: replace memmap_context by meminit_context · c1d0da83
    Laurent Dufour authored
    Patch series "mm: fix memory to node bad links in sysfs", v3.
    
    Sometimes, firmware may expose interleaved memory layout like this:
    
     Early memory node ranges
       node   1: [mem 0x0000000000000000-0x000000011fffffff]
       node   2: [mem 0x0000000120000000-0x000000014fffffff]
       node   1: [mem 0x0000000150000000-0x00000001ffffffff]
       node   0: [mem 0x0000000200000000-0x000000048fffffff]
       node   2: [mem 0x0000000490000000-0x00000007ffffffff]
    
    In that case, we can see memory blocks assigned to multiple nodes in
    sysfs:
    
      $ ls -l /sys/devices/system/memory/memory21
      total 0
      lrwxrwxrwx 1 root root     0 Aug 24 05:27 node1 -> ../../node/node1
      lrwxrwxrwx 1 root root     0 Aug 24 05:27 node2 -> ../../node/node2
      -rw-r--r-- 1 root root 65536 Aug 24 05:27 online
      -r--r--r-- 1 root root 65536 Aug 24 05:27 phys_device
      -r--r--r-- 1 root root 65536 Aug 24 05:27 phys_index
      drwxr-xr-x 2 root root     0 Aug 24 05:27 power
      -r--r--r-- 1 root root 65536 Aug 24 05:27 removable
      -rw-r--r-- 1 root root 65536 Aug 24 05:27 state
      lrwxrwxrwx 1 root root     0 Aug 24 05:25 subsystem -> ../../../../bus/memory
      -rw-r--r-- 1 root root 65536 Aug 24 05:25 uevent
      -r--r--r-- 1 root root 65536 Aug 24 05:27 valid_zones
    
    The same applies in the node's directory with a memory21 link in both
    the node1 and node2's directory.
    
    This is wrong but doesn't prevent the system from running.  However,
    when one of these memory blocks is later hot-unplugged and then
    hot-plugged, the system detects an inconsistency in the sysfs layout
    and a BUG_ON() is raised:
    
      kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084!
      LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
      Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4
      CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25
      Call Trace:
        add_memory_resource+0x23c/0x340 (unreliable)
        __add_memory+0x5c/0xf0
        dlpar_add_lmb+0x1b4/0x500
        dlpar_memory+0x1f8/0xb80
        handle_dlpar_errorlog+0xc0/0x190
        dlpar_store+0x198/0x4a0
        kobj_attr_store+0x30/0x50
        sysfs_kf_write+0x64/0x90
        kernfs_fop_write+0x1b0/0x290
        vfs_write+0xe8/0x290
        ksys_write+0xdc/0x130
        system_call_exception+0x160/0x270
        system_call_common+0xf0/0x27c
    
    This has been seen on PowerPC LPAR.
    
    The root cause of this issue is that when a node's memory is registered,
    the range used can overlap another node's range, so the memory block
    is registered to multiple nodes in sysfs.
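    
    The overlap can be illustrated with a small stand-alone model.  This is
    a sketch, not kernel code: it assumes that a node is registered using
    the span from its lowest to its highest address, and that memory blocks
    are 256 MiB (the block size is not stated in this message).  The names
    mem_range, BLOCK_SIZE and block_in_node_span() are hypothetical.  The
    ranges are the ones from the log above.

```c
#include <stddef.h>
#include <stdint.h>

/* Early memory node ranges, copied from the log in this message. */
struct mem_range {
	int nid;
	uint64_t start;
	uint64_t end;	/* inclusive */
};

static const struct mem_range ranges[] = {
	{ 1, 0x0000000000000000ULL, 0x000000011fffffffULL },
	{ 2, 0x0000000120000000ULL, 0x000000014fffffffULL },
	{ 1, 0x0000000150000000ULL, 0x00000001ffffffffULL },
	{ 0, 0x0000000200000000ULL, 0x000000048fffffffULL },
	{ 2, 0x0000000490000000ULL, 0x00000007ffffffffULL },
};

#define NR_RANGES (sizeof(ranges) / sizeof(ranges[0]))

/* ASSUMPTION: 256 MiB memory blocks; the message does not give the size. */
#define BLOCK_SIZE 0x10000000ULL

/*
 * Return 1 if memory block 'block' intersects the span of node 'nid',
 * where the span runs from the node's lowest start address to its
 * highest end address, holes included.  Return 0 otherwise.
 */
static int block_in_node_span(unsigned int block, int nid)
{
	uint64_t lo = UINT64_MAX, hi = 0;
	uint64_t bstart = (uint64_t)block * BLOCK_SIZE;
	uint64_t bend = bstart + BLOCK_SIZE - 1;
	int found = 0;
	size_t i;

	for (i = 0; i < NR_RANGES; i++) {
		if (ranges[i].nid != nid)
			continue;
		found = 1;
		if (ranges[i].start < lo)
			lo = ranges[i].start;
		if (ranges[i].end > hi)
			hi = ranges[i].end;
	}

	return found && bstart <= hi && bend >= lo;
}
```

    Under these assumptions, block 21 starts at 0x150000000.  It belongs to
    node 1 only, but it also falls inside the span of node 2
    (0x120000000-0x7ffffffff), so block_in_node_span(21, 1) and
    block_in_node_span(21, 2) both return 1 while block_in_node_span(21, 0)
    returns 0.  That matches the node1 and node2 links shown for memory21.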
    
    There are two issues here:
    
     (a) The sysfs memory and node's layouts are broken due to these
         multiple links
    
     (b) The link errors in link_mem_sections() should not lead to a system
         panic.
    
    To address (a), register_mem_sect_under_node() should not rely on the
    system state to detect whether the link operation is triggered by a
    hotplug operation or not.  This is addressed by patches 1 and 2 of
    this series.
    
    Issue (b) will be addressed separately.
    
    This patch (of 2):
    
    The memmap_context enum is used to detect whether a memory operation is
    due to a hot-add operation or happening at boot time.
    
    Make it general to the hotplug operation and rename it as
    meminit_context.
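    
    As a rough sketch of the renamed type: the enumerator names
    MEMINIT_EARLY and MEMINIT_HOTPLUG are assumed here rather than quoted
    from this message, and meminit_is_hotplug() is a hypothetical helper
    added only to show how a caller might test the context.

```c
/*
 * Sketch only: distinguishes memory initialised at boot from memory
 * initialised by a hot-add operation.  Enumerator names are assumed.
 */
enum meminit_context {
	MEMINIT_EARLY,		/* memory set up at boot time */
	MEMINIT_HOTPLUG,	/* memory set up by a hot-add operation */
};

/* Hypothetical helper: return 1 for a hot-add context, 0 for boot. */
static int meminit_is_hotplug(enum meminit_context context)
{
	return context == MEMINIT_HOTPLUG;
}
```

    Passing the context explicitly lets a caller know why memory is being
    initialised without having to inspect the global system state.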
    
    There is no functional change introduced by this patch.
    
    Suggested-by: David Hildenbrand <david@redhat.com>
    Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Oscar Salvador <osalvador@suse.de>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: "Rafael J . Wysocki" <rafael@kernel.org>
    Cc: Nathan Lynch <nathanl@linux.ibm.com>
    Cc: Scott Cheloha <cheloha@linux.ibm.com>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Fenghua Yu <fenghua.yu@intel.com>
    Cc: <stable@vger.kernel.org>
    Link: https://lkml.kernel.org/r/20200915094143.79181-1-ldufour@linux.ibm.com
    Link: https://lkml.kernel.org/r/20200915132624.9723-1-ldufour@linux.ibm.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>