• Lin Feng's avatar
    init/main: fix broken buffer_init when DEFERRED_STRUCT_PAGE_INIT set · ba8f3587
    Lin Feng authored
    In the booting phase if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set,
    we have following callchain:
    
    start_kernel
    ...
      mm_init
        mem_init
         memblock_free_all
           reset_all_zones_managed_pages
           free_low_memory_core_early
    ...
      buffer_init
        nr_free_buffer_pages
          zone->managed_pages
    ...
      rest_init
        kernel_init
          kernel_init_freeable
            page_alloc_init_late
              kthread_run(deferred_init_memmap, NODE_DATA(nid), "pgdatinit%d", nid);
              wait_for_completion(&pgdat_init_all_done_comp);
              ...
              files_maxfiles_init
    
    It's clear that buffer_init depends on zone->managed_pages, but it's reset
    in reset_all_zones_managed_pages after that pages are readded into
    zone->managed_pages, but when buffer_init runs this process is half done
    and most of them will finally be added till deferred_init_memmap done.  In
    large memory couting of nr_free_buffer_pages drifts too much, also
    drifting from kernels to kernels on same hardware.
    
    Fix is simple, it delays buffer_init run till deferred_init_memmap all
    done.
    
    But as corrected by this patch, max_buffer_heads becomes very large, the
    value is roughly as many as 4 times of totalram_pages, formula:
    max_buffer_heads = nrpages * (10%) * (PAGE_SIZE / sizeof(struct
    buffer_head));
    
    Say in a 64GB memory box we have 16777216 pages, then max_buffer_heads
    turns out to be roughly 67,108,864.  In common cases, should a buffer_head
    be mapped to one page/block(4KB)?  So max_buffer_heads never exceeds
    totalram_pages.  IMO it's likely to make buffer_heads_over_limit bool
    value alwasy false, then make codes 'if (buffer_heads_over_limit)' test in
    vmscan unnecessary.
    
    So this patch will change the original behavior related to
    buffer_heads_over_limit in vmscan since we used a half done value of
    zone->managed_pages before, or should we use a smaller factor(<10%) in
    previous formula.
    
    akpm: I think this is OK - the max_buffer_heads code is only needed on
    highmem machines, to prevent ZONE_NORMAL from being consumed by large
    amounts of buffer_heads attached to highmem pagecache.  This problem will
    not occur on 64-bit machines, so this feature's non-functionality on such
    machines is a feature, not a bug.
    
    Link: https://lkml.kernel.org/r/20201123110500.103523-1-linf@wangsu.comSigned-off-by: default avatarLin Feng <linf@wangsu.com>
    Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    ba8f3587
page_alloc.c 248 KB