    mm: memmap defer init doesn't work as expected · dc2da7b4
    Baoquan He authored
    VMware observed a performance regression during memmap init on their
    platform, and bisected it to commit 73a6e474 ("mm: memmap_init:
    iterate over memblock regions rather that check each PFN").
    
    Before the commit:
    
      [0.033176] Normal zone: 1445888 pages used for memmap
      [0.033176] Normal zone: 89391104 pages, LIFO batch:63
      [0.035851] ACPI: PM-Timer IO Port: 0x448
    
    With the commit:
    
      [0.026874] Normal zone: 1445888 pages used for memmap
      [0.026875] Normal zone: 89391104 pages, LIFO batch:63
      [2.028450] ACPI: PM-Timer IO Port: 0x448
    
    The root cause is that the current deferred memmap init doesn't work as
    expected.
    
    Before the commit, memmap_init_zone() initialized the memmap of one whole
    zone at a time: all low zones of a NUMA node were initialized fully, and
    memmap init of the last zone in that node was deferred.  However, since
    commit 73a6e474, memmap_init() iterates over the memblock regions inside
    one zone and calls memmap_init_zone() to do memmap init for each region.
    
    E.g., on VMware's system the memory layout is as below; there are two
    memory regions in node 2.  The current code mistakenly initializes the
    whole 1st region [mem 0xab00000000-0xfcffffffff], then defers memmap init
    after initializing only one memory section of the 2nd region [mem
    0x10000000000-0x1033fffffff].  In fact, we expect only one memory
    section's memmap to be initialized for the whole zone at this point.
    That's why more time is spent during early boot.
    
    [    0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
    [    0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
    [    0.008843] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x55ffffffff]
    [    0.008844] ACPI: SRAT: Node 1 PXM 1 [mem 0x5600000000-0xaaffffffff]
    [    0.008844] ACPI: SRAT: Node 2 PXM 2 [mem 0xab00000000-0xfcffffffff]
    [    0.008845] ACPI: SRAT: Node 2 PXM 2 [mem 0x10000000000-0x1033fffffff]
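
    For a sense of scale, the two node 2 ranges above can be converted into
    page counts.  The helper below is only an illustrative sketch:
    region_pages() is a hypothetical name, and 4 KiB pages (PAGE_SHIFT of 12)
    are assumed rather than taken from the report.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, as on a typical x86_64 configuration. */
#define PAGE_SHIFT 12

/*
 * Number of pages covered by an inclusive physical address range,
 * written the way the SRAT lines print it: [mem first_byte-last_byte].
 */
static uint64_t region_pages(uint64_t first_byte, uint64_t last_byte)
{
	return (last_byte - first_byte + 1) >> PAGE_SHIFT;
}
```

    Under that assumption the 1st region covers 85983232 pages and the 2nd
    covers 3407872 pages.  Their sum, 89391104, agrees with the Normal zone
    page count in the log excerpts above, so nearly all of the zone lies in
    the region that was being initialized eagerly.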
    
    Now, let's add a parameter 'zone_end_pfn' to memmap_init_zone() to pass
    down the real zone end pfn so that defer_init() can use it to judge
    whether init should be deferred zone-wide.
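
    The effect of passing the zone end instead of the region end can be
    illustrated with a small user-space model.  This is a simplified sketch,
    not the kernel code: model_defer_init(), count_early_init() and struct
    defer_state are hypothetical names, the node is assumed to end where its
    last zone ends, and PAGES_PER_SECTION is assumed to be 32768 (128 MiB
    sections with 4 KiB pages).

```c
#include <limits.h>
#include <stdbool.h>

/* Assumption: 128 MiB sections with 4 KiB pages. */
#define PAGES_PER_SECTION 32768UL

struct region {
	unsigned long start_pfn;
	unsigned long end_pfn;		/* exclusive */
};

struct defer_state {
	unsigned long prev_end_pfn;
	unsigned long nr_initialised;
	unsigned long first_deferred_pfn;	/* ULONG_MAX: nothing deferred yet */
};

/* Simplified stand-in for the decision made for each pfn. */
static bool model_defer_init(struct defer_state *s, unsigned long node_end_pfn,
			     unsigned long pfn, unsigned long end_pfn)
{
	if (s->prev_end_pfn != end_pfn) {
		s->prev_end_pfn = end_pfn;
		s->nr_initialised = 0;
	}

	/* Anything that ends below the node end is treated as a low zone. */
	if (end_pfn < node_end_pfn)
		return false;

	if (s->first_deferred_pfn != ULONG_MAX)
		return true;

	/* Initialize one section eagerly, then defer at a section boundary. */
	s->nr_initialised++;
	if (s->nr_initialised > PAGES_PER_SECTION &&
	    (pfn & (PAGES_PER_SECTION - 1)) == 0) {
		s->first_deferred_pfn = pfn;
		return true;
	}
	return false;
}

/*
 * Count pages initialized before deferral kicks in.  With pass_zone_end
 * false, each region's own end pfn is passed down; with it true, the end
 * pfn of the whole zone is passed down instead.
 */
static unsigned long count_early_init(const struct region *r, int nr_regions,
				      bool pass_zone_end)
{
	unsigned long zone_end = r[nr_regions - 1].end_pfn;
	struct defer_state s = { 0, 0, ULONG_MAX };
	unsigned long count = 0;

	for (int i = 0; i < nr_regions; i++) {
		unsigned long end_arg = pass_zone_end ? zone_end : r[i].end_pfn;

		for (unsigned long pfn = r[i].start_pfn; pfn < r[i].end_pfn; pfn++) {
			if (model_defer_init(&s, zone_end, pfn, end_arg))
				break;
			count++;
		}
	}
	return count;
}
```

    Fed with the two node 2 regions as pfn ranges (0xab00000-0xfd00000 and
    0x10000000-0x10340000), the model initializes 86016000 pages early when
    the region end is passed down, because the 1st region looks like a low
    zone, and only 32768 pages (one section) when the zone end is passed down.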
    
    Link: https://lkml.kernel.org/r/20201223080811.16211-1-bhe@redhat.com
    Link: https://lkml.kernel.org/r/20201223080811.16211-2-bhe@redhat.com
    Fixes: commit 73a6e474 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
    Signed-off-by: Baoquan He <bhe@redhat.com>
    Reported-by: Rahul Gopakumar <gopakumarr@vmware.com>
    Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
memory_hotplug.c 50.7 KB