• Vlastimil Babka's avatar
    mm, compaction: introduce kcompactd · 698b1b30
    Vlastimil Babka authored
    Memory compaction can be currently performed in several contexts:
    
     - kswapd balancing a zone after a high-order allocation failure
     - direct compaction to satisfy a high-order allocation, including THP
       page fault attemps
     - khugepaged trying to collapse a hugepage
     - manually from /proc
    
    The purpose of compaction is two-fold.  The obvious purpose is to
    satisfy a (pending or future) high-order allocation, and is easy to
    evaluate.  The other purpose is to keep overal memory fragmentation low
    and help the anti-fragmentation mechanism.  The success wrt the latter
    purpose is more
    
    The current situation wrt the purposes has a few drawbacks:
    
     - compaction is invoked only when a high-order page or hugepage is not
       available (or manually).  This might be too late for the purposes of
       keeping memory fragmentation low.
     - direct compaction increases latency of allocations.  Again, it would
       be better if compaction was performed asynchronously to keep
       fragmentation low, before the allocation itself comes.
     - (a special case of the previous) the cost of compaction during THP
       page faults can easily offset the benefits of THP.
     - kswapd compaction appears to be complex, fragile and not working in
       some scenarios.  It could also end up compacting for a high-order
       allocation request when it should be reclaiming memory for a later
       order-0 request.
    
    To improve the situation, we should be able to benefit from an
    equivalent of kswapd, but for compaction - i.e. a background thread
    which responds to fragmentation and the need for high-order allocations
    (including hugepages) somewhat proactively.
    
    One possibility is to extend the responsibilities of kswapd, which could
    however complicate its design too much.  It should be better to let
    kswapd handle reclaim, as order-0 allocations are often more critical
    than high-order ones.
    
    Another possibility is to extend khugepaged, but this kthread is a
    single instance and tied to THP configs.
    
    This patch goes with the option of a new set of per-node kthreads called
    kcompactd, and lays the foundations, without introducing any new
    tunables.  The lifecycle mimics kswapd kthreads, including the memory
    hotplug hooks.
    
    For compaction, kcompactd uses the standard compaction_suitable() and
    ompact_finished() criteria and the deferred compaction functionality.
    Unlike direct compaction, it uses only sync compaction, as there's no
    allocation latency to minimize.
    
    This patch doesn't yet add a call to wakeup_kcompactd.  The kswapd
    compact/reclaim loop for high-order pages will be replaced by waking up
    kcompactd in the next patch with the description of what's wrong with
    the old approach.
    
    Waking up of the kcompactd threads is also tied to kswapd activity and
    follows these rules:
     - we don't want to affect any fastpaths, so wake up kcompactd only from
       the slowpath, as it's done for kswapd
     - if kswapd is doing reclaim, it's more important than compaction, so
       don't invoke kcompactd until kswapd goes to sleep
     - the target order used for kswapd is passed to kcompactd
    
    Future possible future uses for kcompactd include the ability to wake up
    kcompactd on demand in special situations, such as when hugepages are
    not available (currently not done due to __GFP_NO_KSWAPD) or when a
    fragmentation event (i.e.  __rmqueue_fallback()) occurs.  It's also
    possible to perform periodic compaction with kcompactd.
    
    [arnd@arndb.de: fix build errors with kcompactd]
    [paul.gortmaker@windriver.com: don't use modular references for non modular code]
    Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
    Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
    Cc: Hugh Dickins <hughd@google.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    698b1b30
vmstat.c 42.2 KB