    mm: deferred_init_memmap improvements

    Patch series "complete deferred page initialization", v12.
    
    SMP machines can benefit from the DEFERRED_STRUCT_PAGE_INIT config
    option, which defers initializing struct pages until all CPUs have
    been started, so the work can be done in parallel.
    
    However, this feature is sub-optimal: the deferred page
    initialization code expects that the struct pages have already been
    zeroed, and that zeroing is done early in boot by a single thread.
    We also access that memory and set flags before the struct pages
    are initialized.  All of this is fixed in this patchset.
    
    In this work we do the following:
     - Never read a struct page until it has been initialized
     - Never set any fields in a struct page before it is initialized
     - Zero the struct page at the beginning of struct page
       initialization (see the sketch below)
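
    To illustrate the last point, here is a minimal userspace sketch,
    not the kernel's actual code: init_single_page() and the simplified
    struct stand in for __init_single_page() and the real struct page.

      #include <stdio.h>
      #include <string.h>

      struct page {
              unsigned long flags;
              unsigned long private;
              /* ... */
      };

      /* Clear the whole struct up front, instead of relying on the
       * memblock allocator having zeroed the memory early in boot
       * with a single thread; only then set individual fields. */
      static void init_single_page(struct page *page)
      {
              memset(page, 0, sizeof(*page));
              page->flags = 0x1UL;    /* placeholder flag setup */
      }

      int main(void)
      {
              struct page page;

              init_single_page(&page);
              printf("flags=%#lx private=%lu\n", page.flags, page.private);
              return 0;
      }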
    
    ==========================================================================
    Performance improvements on x86 machine with 8 nodes:
    Intel(R) Xeon(R) CPU E7-8895 v3 @ 2.60GHz and 1T of memory:
                            TIME          SPEED UP
    base no deferred:       95.796233s
    fix no deferred:        79.978956s    19.77%
    
    base deferred:          77.254713s
    fix deferred:           55.050509s    40.34%
    ==========================================================================
    SPARC M6 3600 MHz with 15T of memory
                            TIME          SPEED UP
    base no deferred:       358.335727s
    fix no deferred:        302.320936s   18.52%
    
    base deferred:          237.534603s
    fix deferred:           182.103003s   30.44%
    ==========================================================================
    Raw dmesg output with timestamps:
    x86 base no deferred:    https://hastebin.com/ofunepurit.scala
    x86 base deferred:       https://hastebin.com/ifazegeyas.scala
    x86 fix no deferred:     https://hastebin.com/pegocohevo.scala
    x86 fix deferred:        https://hastebin.com/ofupevikuk.scala
    sparc base no deferred:  https://hastebin.com/ibobeteken.go
    sparc base deferred:     https://hastebin.com/fariqimiyu.go
    sparc fix no deferred:   https://hastebin.com/muhegoheyi.go
    sparc fix deferred:      https://hastebin.com/xadinobutu.go
    
    This patch (of 11):
    
    deferred_init_memmap() is called when struct pages are initialized
    later in boot by slave CPUs.  This patch simplifies and optimizes
    this function, and also fixes a couple of issues (described below).
    
    The main change is that now we are iterating through free memblock areas
    instead of all configured memory.  Thus, we do not have to check if the
    struct page has already been initialized.
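
    The following userspace sketch models that loop shape; the range
    table and the function names are hypothetical stand-ins for
    memblock's free-range list and for_each_free_mem_range().

      #include <stdio.h>

      struct mem_range {
              unsigned long start_pfn;
              unsigned long end_pfn;
      };

      /* Hypothetical table standing in for memblock's free ranges. */
      static const struct mem_range free_ranges[] = {
              { 0x100, 0x200 },
              { 0x400, 0x800 },       /* note the hole at 0x200-0x400 */
      };

      static void deferred_init_range(unsigned long spfn, unsigned long epfn)
      {
              /* Every pfn in [spfn, epfn) comes from a free range, so
               * its struct page cannot have been initialized earlier:
               * no "already initialized?" check is needed. */
              printf("init pfns %#lx-%#lx\n", spfn, epfn);
      }

      int main(void)
      {
              size_t i, n = sizeof(free_ranges) / sizeof(free_ranges[0]);

              for (i = 0; i < n; i++)
                      deferred_init_range(free_ranges[i].start_pfn,
                                          free_ranges[i].end_pfn);
              return 0;
      }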
    
    =====
    In deferred_init_memmap() where all deferred struct pages are
    initialized we have a check like this:
    
      if (page->flags) {
    	VM_BUG_ON(page_zone(page) != zone);
    	goto free_range;
      }
    
    This way we check whether the current deferred page has already
    been initialized.  It works because the memory for struct pages has
    been zeroed, and the only way flags can be non-zero is if the page
    already went through __init_single_page().  But once we change the
    current behavior and no longer zero the memory in the memblock
    allocator, we cannot trust anything inside a "struct page" until it
    is initialized.  This patch fixes this.
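
    A small userspace model of why that check becomes unsound once the
    backing memory is no longer pre-zeroed (the simplified struct page
    is illustrative only):

      #include <stdio.h>
      #include <string.h>

      struct page {
              unsigned long flags;
      };

      int main(void)
      {
              struct page page;

              /* Model an allocator that no longer zeroes memory: the
               * struct page backing store carries stale bytes. */
              memset(&page, 0xa5, sizeof(page));

              /* The old check misreads stale bytes as "already
               * initialized" and skips initialization entirely. */
              if (page.flags)
                      printf("skipped init: flags=%#lx (stale)\n",
                             page.flags);
              return 0;
      }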
    
    The deferred_init_memmap() is re-written to loop through only free
    memory ranges provided by memblock.
    
    Note: this first issue is relevant only when the following change is
    merged:
    
    =====
    This patch fixes another existing issue on systems that have holes
    within zones, i.e., where CONFIG_HOLES_IN_ZONE is defined.
    
    In for_each_mem_pfn_range() we have code like this:
    
      if (!pfn_valid_within(pfn))
            goto free_range;
    
    Note: 'page' is not set to NULL and is not incremented, but 'pfn'
    advances.  This means that if deferred struct pages are enabled on
    systems with this kind of hole, Linux would get memory corruption.
    I have fixed this issue by defining a new macro that performs all
    the necessary operations when we free the current set of pages
    (a sketch of its shape follows).
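
    As a sketch, such a macro could flush the accumulated run of pages
    and reset all of the loop state, including 'page', in one place.
    The names below, including deferred_free_range(), are illustrative
    and may not match the patch exactly:

      /* Stub: frees 'nr' contiguous pages starting at 'pfn'. */
      static void deferred_free_range(unsigned long pfn, unsigned long nr)
      {
      }

      /* Flush the current run and reset the state, so 'page' can
       * never go stale across a hole while 'pfn' advances. */
      #define DEFERRED_FREE(nr_free, free_base_pfn, page)             \
      ({                                                              \
              unsigned long nr = (nr_free);                           \
                                                                      \
              deferred_free_range((free_base_pfn), nr);               \
              (free_base_pfn) = 0;                                    \
              (nr_free) = 0;                                          \
              (page) = NULL;                                          \
              nr;                                                     \
      })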
    
    [pasha.tatashin@oracle.com: buddy page accessed before initialized]
      Link: http://lkml.kernel.org/r/20171102170221.7401-2-pasha.tatashin@oracle.com
    Link: http://lkml.kernel.org/r/20171013173214.27300-2-pasha.tatashin@oracle.com
    Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
    Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
    Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
    Reviewed-by: Bob Picco <bob.picco@oracle.com>
    Tested-by: Bob Picco <bob.picco@oracle.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Cc: Christian Borntraeger <borntraeger@de.ibm.com>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: David S. Miller <davem@davemloft.net>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Will Deacon <will.deacon@arm.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Sam Ravnborg <sam@ravnborg.org>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Alexander Potapenko <glider@google.com>
    Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>