oom: allow !__GFP_FS allocations to access emergency reserves like __GFP_NOFAIL

With the previous two commits I can no longer reproduce any ext4 related livelocks, but I now hit ext4 memory corruption. ext4 assumes it can handle alloc_pages failing and does not use __GFP_NOFAIL in some places, but it actually cannot handle the failure. That is no surprise: those error paths could never run before, so they are likely untested.

I logged the stack traces of all the ext4 allocation failures that led to the final ext4 corruption; at least one of them should be the culprit (the last ones are the most probable). The actual bug in the error paths should be found by code review (or the error paths should be deleted and __GFP_NOFAIL added to the gfp_mask).

Until ext4 is fixed, it is safer to treat !__GFP_FS like __GFP_NOFAIL if TIF_MEMDIE is not set. This way we cannot exercise any new allocation error path in kernel threads, because they are never picked as OOM killer victims and TIF_MEMDIE never gets set on them.

I assume other filesystems may also have become complacent about this accommodating allocator behavior, which cannot fail an allocation invoked by a kernel thread. But the longer we keep the __GFP_NOFAIL behavior in should_alloc_retry for small order allocations, the less robust these error paths will become and the harder it will be to remove this livelock prone assumption from should_alloc_retry. In fact we should remove that assumption for all allocations, not just for !__GFP_FS ones.

In practice with this fix there is no regression and all livelocks are still gone. The only risk in this approach is exhausting the emergency reserves earlier than before, but only during OOM (during normal runtime the reliability of GFP_ATOMIC and other __GFP_MEMALLOC allocations is not affected). This clearly reduces the livelock risk (verified in practice too), so it is a low risk net improvement to the OOM handling with no risk of regression, because this way no new allocation error paths are exercised.
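
The rule described above can be sketched as a small userspace model. This is not the patch itself: the MODEL_* flag values and the helper names are hypothetical stand-ins for __GFP_FS, __GFP_NOFAIL and a TIF_MEMDIE check, and the model only covers the decision of whether a request is handled like __GFP_NOFAIL during OOM.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this model; NOT the kernel's real values. */
#define MODEL_GFP_FS     0x1u  /* stands in for __GFP_FS */
#define MODEL_GFP_NOFAIL 0x2u  /* stands in for __GFP_NOFAIL */

/*
 * Returns true if, during OOM, the request is handled like __GFP_NOFAIL
 * (allowed to dip into the emergency reserves rather than take an
 * allocation error path).
 *
 * task_is_memdie models "TIF_MEMDIE is set on the current task". Kernel
 * threads are never picked as OOM victims, so for them it is always false.
 */
static bool treat_as_nofail_on_oom(unsigned int gfp_mask, bool task_is_memdie)
{
	/* Explicit __GFP_NOFAIL keeps its existing meaning. */
	if (gfp_mask & MODEL_GFP_NOFAIL)
		return true;

	/* !__GFP_FS gets the same treatment unless the task is an OOM victim. */
	if (!(gfp_mask & MODEL_GFP_FS) && !task_is_memdie)
		return true;

	return false;
}

/* Usage example covering the cases discussed in the commit message. */
static void model_self_check(void)
{
	/* Kernel thread doing a !__GFP_FS allocation: treated like NOFAIL. */
	assert(treat_as_nofail_on_oom(0u, false));
	/* OOM victim doing a !__GFP_FS allocation: not promoted. */
	assert(!treat_as_nofail_on_oom(0u, true));
	/* Allocation with __GFP_FS set: this rule does not apply. */
	assert(!treat_as_nofail_on_oom(MODEL_GFP_FS, false));
	/* Explicit __GFP_NOFAIL: unchanged, regardless of TIF_MEMDIE. */
	assert(treat_as_nofail_on_oom(MODEL_GFP_FS | MODEL_GFP_NOFAIL, true));
}
```

In this model a kernel thread can never reach the "not promoted" case for a !__GFP_FS request, which is why no new allocation error path is exercised there.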