• Yang Shi's avatar
    mm: vmscan: correct some vmscan counters for THP swapout · 98879b3b
    Yang Shi authored
    Commit bd4c82c2 ("mm, THP, swap: delay splitting THP after swapped
    out"), THP can be swapped out in a whole.  But, nr_reclaimed and some
    other vm counters still get inc'ed by one even though a whole THP (512
    pages) gets swapped out.
    
    This doesn't make too much sense to memory reclaim.
    
    For example, direct reclaim may just need reclaim SWAP_CLUSTER_MAX
    pages, reclaiming one THP could fulfill it.  But, if nr_reclaimed is not
    increased correctly, direct reclaim may just waste time to reclaim more
    pages, SWAP_CLUSTER_MAX * 512 pages in worst case.
    
    And, it may cause pgsteal_{kswapd|direct} is greater than
    pgscan_{kswapd|direct}, like the below:
    
    pgsteal_kswapd 122933
    pgsteal_direct 26600225
    pgscan_kswapd 174153
    pgscan_direct 14678312
    
    nr_reclaimed and nr_scanned must be fixed in parallel otherwise it would
    break some page reclaim logic, e.g.
    
    vmpressure: this looks at the scanned/reclaimed ratio so it won't change
    semantics as long as scanned & reclaimed are fixed in parallel.
    
    compaction/reclaim: compaction wants a certain number of physical pages
    freed up before going back to compacting.
    
    kswapd priority raising: kswapd raises priority if we scan fewer pages
    than the reclaim target (which itself is obviously expressed in order-0
    pages).  As a result, kswapd can falsely raise its aggressiveness even
    when it's making great progress.
    
    Other than nr_scanned and nr_reclaimed, some other counters, e.g.
    pgactivate, nr_skipped, nr_ref_keep and nr_unmap_fail need to be fixed too
    since they are user visible via cgroup, /proc/vmstat or trace points,
    otherwise they would be underreported.
    
    When isolating pages from LRUs, nr_taken has been accounted in base page,
    but nr_scanned and nr_skipped are still accounted in THP.  It doesn't make
    too much sense too since this may cause trace point underreport the
    numbers as well.
    
    So accounting those counters in base page instead of accounting THP as one
    page.
    
    nr_dirty, nr_unqueued_dirty, nr_congested and nr_writeback are used by
    file cache, so they are not impacted by THP swap.
    
    This change may result in lower steal/scan ratio in some cases since THP
    may get split during page reclaim, then a part of tail pages get reclaimed
    instead of the whole 512 pages, but nr_scanned is accounted by 512,
    particularly for direct reclaim.  But, this should be not a significant
    issue.
    
    Link: http://lkml.kernel.org/r/1559025859-72759-2-git-send-email-yang.shi@linux.alibaba.comSigned-off-by: default avatarYang Shi <yang.shi@linux.alibaba.com>
    Reviewed-by: default avatar"Huang, Ying" <ying.huang@intel.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Hillf Danton <hdanton@sina.com>
    Cc: Josef Bacik <josef@toxicpanda.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    98879b3b
vmscan.c 122 KB