    mm, page_alloc: actually ignore mempolicies for high priority allocations · d6a24df0
    Vlastimil Babka authored
    __alloc_pages_slowpath() has for a long time contained code to ignore
    node restrictions from memory policies for high priority allocations.
    The current code that resets the zonelist iterator, however, does
    effectively nothing since commit 7810e678 ("mm, page_alloc: do not
    break __GFP_THISNODE by zonelist reset") removed a buggy zonelist reset.
    Even before that commit, mempolicy restrictions were still not ignored,
    as they are passed in ac->nodemask which is untouched by the code.
    
    We can either remove the code, or make it work as intended.  Since
    ac->nodemask can be set from the task's mempolicy via alloc_pages_current()
    and thus also alloc_pages(), it may indeed affect kernel allocations,
    and it makes sense to ignore it to allow progress for high priority
    allocations.
    
    Thus, this patch resets ac->nodemask to NULL in such cases.  This
    assumes all callers can handle it (i.e. there are no guarantees as in
    the case of __GFP_THISNODE), which seems to be the case.  The same
    assumption has already been present in check_retry_cpuset() for some time.
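
    The idea can be illustrated with a small userspace model.  This is only
    a sketch, not the kernel code: struct alloc_context_model, pick_node(),
    alloc_slowpath_model(), the bitmask representation of the nodemask and
    the flag value are all hypothetical, and the exact trigger condition
    (no cpuset enforcement, or access to reserves) is an assumption based
    on the description above.

```c
#include <stddef.h>

#define MAX_NODES 4

/* Hypothetical flag value; only its meaning mirrors the text above. */
#define ALLOC_CPUSET 0x1u

/* Simplified stand-in for the allocation context. */
struct alloc_context_model {
	const unsigned int *nodemask;	/* NULL means "any node is allowed" */
};

/* Return the first allowed node that still has free pages, or -1. */
static int pick_node(const struct alloc_context_model *ac,
		     const unsigned long free_pages[MAX_NODES])
{
	for (int node = 0; node < MAX_NODES; node++) {
		int allowed = !ac->nodemask || (*ac->nodemask & (1u << node));

		if (allowed && free_pages[node] > 0)
			return node;
	}
	return -1;
}

/*
 * Model of the slow path: try within the mempolicy nodemask first.
 * If that fails and the allocation is considered high priority
 * (assumed condition), drop the nodemask and retry on all nodes.
 */
static int alloc_slowpath_model(struct alloc_context_model *ac,
				unsigned int alloc_flags,
				unsigned int reserve_flags,
				const unsigned long free_pages[MAX_NODES])
{
	int node = pick_node(ac, free_pages);

	if (node >= 0)
		return node;

	if (!(alloc_flags & ALLOC_CPUSET) || reserve_flags) {
		ac->nodemask = NULL;	/* ignore the mempolicy restriction */
		node = pick_node(ac, free_pages);
	}
	return node;
}
```

    In this model, a task restricted to a depleted node 0 fails the
    allocation when the restriction is enforced, but a high priority
    request falls back to another node that still has free pages.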
    
    The expected effect is that high priority kernel allocations in the
    context of userspace tasks (e.g. OOM victims) restricted by mempolicies
    will have a higher chance to succeed if they are restricted to nodes with
    depleted memory, while there are other nodes with free memory left.
    
    It's not a new intention, but for the first time the code will match the
    intention, AFAICS.  It was intended by commit 183f6371 ("mm: ignore
    mempolicies when using ALLOC_NO_WATERMARK") in v3.6 but I think it never
    really worked, as mempolicy restriction was already encoded in nodemask,
    not zonelist, at that time.
    
    So originally that was for ALLOC_NO_WATERMARK only.  Then it was
    adjusted by e46e7b77 ("mm, page_alloc: recalculate the preferred
    zoneref if the context can ignore memory policies") and cd04ae1e
    ("mm, oom: do not rely on TIF_MEMDIE for memory reserves access") to the
    current state.  So even GFP_ATOMIC would now ignore mempolicies after
    the initial attempts fail - if the code worked as people thought it
    did.
    
    Link: http://lkml.kernel.org/r/20180612122624.8045-1-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Mel Gorman <mgorman@techsingularity.net>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>