• Michal Hocko's avatar
    mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress · 373ccbe5
    Michal Hocko authored
    Tetsuo Handa has reported that the system might basically livelock in
    OOM condition without triggering the OOM killer.
    
    The issue is caused by internal dependency of the direct reclaim on
    vmstat counter updates (via zone_reclaimable) which are performed from
    the workqueue context.  If all the current workers get assigned to an
    allocation request, though, they will be looping inside the allocator
    trying to reclaim memory but zone_reclaimable can see stalled numbers so
    it will consider a zone reclaimable even though it has been scanned way
    too much.  WQ concurrency logic will not consider this situation as a
    congested workqueue because it relies that worker would have to sleep in
    such a situation.  This also means that it doesn't try to spawn new
    workers or invoke the rescuer thread if the one is assigned to the
    queue.
    
    In order to fix this issue we need to do two things.  First we have to
    let wq concurrency code know that we are in trouble so we have to do a
    short sleep.  In order to prevent from issues handled by 0e093d99
    ("writeback: do not sleep on the congestion queue if there are no
    congested BDIs or if significant congestion is not being encountered in
    the current zone") we limit the sleep only to worker threads which are
    the ones of the interest anyway.
    
    The second thing to do is to create a dedicated workqueue for vmstat and
    mark it WQ_MEM_RECLAIM to note it participates in the reclaim and to
    have a spare worker thread for it.
    Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
    Reported-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Cristopher Lameter <clameter@sgi.com>
    Cc: Joonsoo Kim <js1304@gmail.com>
    Cc: Arkadiusz Miskiewicz <arekm@maven.pl>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    373ccbe5
vmstat.c 41.3 KB