    oom: allow !__GFP_FS allocations access emergency reserves like __GFP_NOFAIL · fa175d10
    Andrea Arcangeli authored
    With the previous two commits I cannot reproduce any ext4-related
    livelocks anymore, however I hit ext4 memory corruption. ext4 assumes
    it can handle alloc_pages failing, and in some places it doesn't use
    __GFP_NOFAIL, but it actually cannot. No surprise, as those error
    paths could never run before, so they're likely untested.
    
    I logged the stack traces of all the ext4 failures that led to the
    final ext4 corruption; at least one of them should be the culprit (the
    last ones are the most likely). The actual bug in the error paths
    should be found by code review (or the error paths should be deleted
    and __GFP_NOFAIL added to the gfp_mask).
    
    Until ext4 is fixed, it is safer to treat !__GFP_FS like __GFP_NOFAIL
    if TIF_MEMDIE is not set (so we cannot exercise any new allocation
    error path in kernel threads, because they're never picked as OOM
    killer victims and TIF_MEMDIE never gets set on them).
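
The rule described above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the actual patch: the `MODEL_*` flag values, `struct model_task`, and `model_treat_as_nofail()` are hypothetical names invented for this sketch, and the `memdie` field merely stands in for the TIF_MEMDIE thread flag.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this model; the real kernel values differ. */
#define MODEL_GFP_FS     0x1u
#define MODEL_GFP_NOFAIL 0x2u

/* Stand-in for a task; memdie models TIF_MEMDIE (set on OOM victims). */
struct model_task {
    bool memdie;
};

/*
 * Returns true when the allocation should be handled like __GFP_NOFAIL:
 * either the caller asked for it explicitly, or the allocation lacks
 * __GFP_FS and the task has not been picked as an OOM killer victim.
 */
static bool model_treat_as_nofail(unsigned int gfp_mask,
                                  const struct model_task *task)
{
    if (gfp_mask & MODEL_GFP_NOFAIL)
        return true;
    if (!(gfp_mask & MODEL_GFP_FS) && !task->memdie)
        return true;
    return false;
}
```

In this model a kernel thread never has `memdie` set, so its !__GFP_FS allocations always take the no-fail path and no new error path is exercised there. Only a task already chosen as an OOM victim can see such an allocation fail.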
    
    I assume other filesystems may also have come to depend on this
    accommodating allocator behavior, which cannot fail an allocation
    invoked by a kernel thread, but the longer we keep the
    __GFP_NOFAIL behavior in should_alloc_retry for small order
    allocations, the less robust these error paths will become and the
    harder it will be to remove this livelock-prone assumption in
    should_alloc_retry. In fact we should remove that assumption not just
    for !__GFP_FS allocations.
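
The small-order assumption referred to above can be modeled roughly as follows. This is a simplified userspace sketch, not the kernel function: `model_should_alloc_retry()` and the `MODEL_*` flag values are hypothetical, the suspend check is left out, and the extra logic the kernel applies to higher orders is collapsed into "do not retry". The costly-order threshold of 3 matches the kernel's PAGE_ALLOC_COSTLY_ORDER.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this model; the real kernel values differ. */
#define MODEL_GFP_NORETRY 0x1u
#define MODEL_GFP_NOFAIL  0x2u

/* Same value as the kernel's PAGE_ALLOC_COSTLY_ORDER. */
#define MODEL_COSTLY_ORDER 3u

/* Simplified model of the retry decision for a failed allocation. */
static bool model_should_alloc_retry(unsigned int gfp_mask, unsigned int order)
{
    /* Do not loop if the caller explicitly asked not to. */
    if (gfp_mask & MODEL_GFP_NORETRY)
        return false;

    /* Always retry if the caller explicitly asked for it. */
    if (gfp_mask & MODEL_GFP_NOFAIL)
        return true;

    /*
     * The assumption discussed in the text: small orders retry forever,
     * which makes them behave as if __GFP_NOFAIL had been passed.
     */
    if (order <= MODEL_COSTLY_ORDER)
        return true;

    /* Simplification: higher orders are modeled as giving up. */
    return false;
}
```

The third branch is why callers' error paths for small allocations effectively never run: an order-0 request without any special flag is retried just like an explicit no-fail request.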
    
    In practice with this fix there's no regression and all livelocks are
    still gone. The only risk in this approach is exhausting the
    emergency reserves earlier than before, but only during OOM (during
    normal runtime, the reliability of GFP_ATOMIC and other __GFP_MEMALLOC
    allocations is not affected). Clearly this actually reduces
    the livelock risk (verified in practice too), so it is a low-risk net
    improvement to the OOM handling with no risk of regression, because
    this way no new allocation error path is exercised.
page_alloc.c 185 KB