Commit fa175d10 authored by Andrea Arcangeli

oom: allow !__GFP_FS allocations access emergency reserves like __GFP_NOFAIL

With the previous two commits I can no longer reproduce any ext4-related
livelocks; however, I hit ext4 memory corruption. ext4 assumes it can
handle alloc_pages failing and does not use __GFP_NOFAIL in some places,
but it actually cannot. This is no surprise: those error paths could
never run before, so they are likely untested.

I logged the stack traces of all the ext4 failures that led to the
final ext4 corruption; at least one of them should be the culprit (the
last ones are the most probable). The actual bug in the error paths
should be found by code review (or the error paths should be deleted
and __GFP_NOFAIL added to the gfp_mask).

Until ext4 is fixed, it is safer to treat !__GFP_FS like __GFP_NOFAIL
if TIF_MEMDIE is not set (so we cannot exercise any new allocation
error path in kernel threads, because they're never picked as OOM
killer victims and TIF_MEMDIE never gets set on them).
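
As a rough illustration of the rule described above, here is a small
userspace model of the condition used by the patched gfp_emergency().
It is only a sketch: the MODEL_* bit values and the model_* function
names are made up for this example and are not the kernel's real gfp
encoding, and neither the TIF_MEMDIE check nor the recomputation of
alloc_flags is modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; NOT the kernel's real gfp flag encoding. */
#define MODEL_GFP_FS         0x01u
#define MODEL_GFP_NOFAIL     0x02u
#define MODEL_GFP_MEMALLOC   0x04u
#define MODEL_GFP_NOMEMALLOC 0x08u

/*
 * Mirrors the condition of the patched gfp_emergency(): an order-0
 * allocation that is either __GFP_NOFAIL or !__GFP_FS.
 */
bool model_wants_emergency(unsigned int gfp_mask, unsigned int order)
{
	return ((gfp_mask & MODEL_GFP_NOFAIL) || !(gfp_mask & MODEL_GFP_FS)) &&
	       order == 0;
}

/*
 * Returns the updated mask: access to the emergency reserves is granted
 * by setting MEMALLOC and clearing NOMEMALLOC, as the patch does.
 */
unsigned int model_gfp_emergency(unsigned int gfp_mask, unsigned int order)
{
	if (model_wants_emergency(gfp_mask, order)) {
		gfp_mask |= MODEL_GFP_MEMALLOC;
		gfp_mask &= ~MODEL_GFP_NOMEMALLOC;
	}
	return gfp_mask;
}

/* Hypothetical self-check covering the two cases the commit cares about. */
void model_self_check(void)
{
	/* A __GFP_FS allocation without __GFP_NOFAIL is left untouched. */
	assert(model_gfp_emergency(MODEL_GFP_FS, 0) == MODEL_GFP_FS);
	/* A !__GFP_FS order-0 allocation gains access to the reserves. */
	assert(model_gfp_emergency(0u, 0) == MODEL_GFP_MEMALLOC);
}
```

In this model, higher-order allocations never get the reserves,
matching the !order test in the patch.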

I assume other filesystems may also have come to depend on this
accommodating allocator behavior, which cannot fail an allocation
invoked by a kernel thread. But the longer we keep the __GFP_NOFAIL
behavior in should_alloc_retry for small-order allocations, the less
robust these error paths will become and the harder it will be to
remove this livelock-prone assumption from should_alloc_retry. In fact
we should remove that assumption, and not just for !__GFP_FS
allocations.
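
The small-order rule discussed above can be sketched in userspace as
follows. This models only the single branch of should_alloc_retry that
this commit touches; the rest of the function is not reproduced. The
MODEL_* names and the small_order_retry_* helpers are hypothetical, and
MODEL_GFP_FS is an illustrative bit, not the kernel's value.
PAGE_ALLOC_COSTLY_ORDER is 3 in the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_GFP_FS                  0x01u /* illustrative bit only */
#define MODEL_PAGE_ALLOC_COSTLY_ORDER 3u    /* same value as the kernel's */

/* Before this commit: only __GFP_FS small-order allocations always retry. */
bool small_order_retry_before(unsigned int gfp_mask, unsigned int order)
{
	return order <= MODEL_PAGE_ALLOC_COSTLY_ORDER &&
	       (gfp_mask & MODEL_GFP_FS) != 0;
}

/* After this commit: every small-order allocation always retries. */
bool small_order_retry_after(unsigned int gfp_mask, unsigned int order)
{
	(void)gfp_mask; /* the __GFP_FS test is dropped by the patch */
	return order <= MODEL_PAGE_ALLOC_COSTLY_ORDER;
}

/* Hypothetical self-check: the only behavioral change is for !__GFP_FS. */
void small_order_self_check(void)
{
	assert(!small_order_retry_before(0u, 0));
	assert(small_order_retry_after(0u, 0));
}
```

The two versions differ only for !__GFP_FS masks with order <= 3, which
is exactly the case where the ext4 error paths were being newly
exercised.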

In practice with this fix there's no regression and all livelocks are
still gone. The only risk in this approach is exhausting the emergency
reserves earlier than before, but only during OOM (during normal
runtime, the reliability of GFP_ATOMIC or other __GFP_MEMALLOC
allocations is not affected). This clearly reduces the livelock risk
(verified in practice too), so it is a low-risk net improvement to the
OOM handling, with no risk of regression because no new allocation
error paths are exercised.
parent 47fb3887
@@ -2359,7 +2359,7 @@ should_alloc_retry(gfp_t gfp_mask, unsigned int order,
 	 * the PG_lock and in turn preventing the OOM killer victim
 	 * task to exit).
 	 */
-	if (order <= PAGE_ALLOC_COSTLY_ORDER && (gfp_mask & __GFP_FS))
+	if (order <= PAGE_ALLOC_COSTLY_ORDER)
 		return 1;
 	/*
@@ -2377,16 +2377,17 @@ should_alloc_retry(gfp_t gfp_mask, unsigned int order,
 static inline int gfp_to_alloc_flags(gfp_t gfp_mask);
-static void gfp_nofail_emergency(gfp_t *gfp_mask, int *alloc_flags,
+static void gfp_emergency(gfp_t *gfp_mask, int *alloc_flags,
 			  unsigned int order)
 {
 	/*
 	 * If we reached an out of memory condition in the context of
-	 * a __GFP_NOFAIL (in turn livelock prone) allocation try to
-	 * give access to the emergency pools, otherwise we could
-	 * livelock.
+	 * a __GFP_NOFAIL or a !__GFP_FS (in turn livelock prone)
+	 * allocation try to give access to the emergency pools,
+	 * otherwise we could livelock.
 	 */
-	if ((*gfp_mask & __GFP_NOFAIL) && !order) {
+	if (((*gfp_mask & __GFP_NOFAIL) || !(*gfp_mask & __GFP_FS)) &&
+	    !order) {
 		*gfp_mask |= __GFP_MEMALLOC;
 		*gfp_mask &= ~__GFP_NOMEMALLOC;
 		*alloc_flags = gfp_to_alloc_flags(*gfp_mask);
@@ -2434,6 +2435,7 @@ __alloc_pages_may_oom(gfp_t *gfp_mask, unsigned int order, int *alloc_flags,
 			goto out;
 		/* The OOM killer does not compensate for light reclaim */
 		if (!(*gfp_mask & __GFP_FS)) {
+			gfp_emergency(gfp_mask, alloc_flags, order);
 			/*
 			 * XXX: Page reclaim didn't yield anything,
 			 * and the OOM killer can't be invoked, but
@@ -2446,7 +2448,7 @@ __alloc_pages_may_oom(gfp_t *gfp_mask, unsigned int order, int *alloc_flags,
 		if (*gfp_mask & __GFP_THISNODE)
 			goto out;
 	} else
-		gfp_nofail_emergency(gfp_mask, alloc_flags, order);
+		gfp_emergency(gfp_mask, alloc_flags, order);
 	/* Exhausted what can be done so it's blamo time */
 	if (out_of_memory(ac->zonelist, *gfp_mask, order, ac->nodemask, false)
 	    || WARN_ON_ONCE(*gfp_mask & __GFP_NOFAIL))