Commit 0bc1f8b0 authored by Chen Yucong, committed by Linus Torvalds

hwpoison: fix the handling path of victimized page frames that belong to non-LRU

Until now, the kernel has applied the same policy to victimized page
frames that belong to kernel space (reserved/slab subsystem) and to
non-LRU pages (unknown page state).  In other words, the result of
handling either kind of victimized page frame is (IGNORED | FAILED),
and the return value of memory_failure() is -EBUSY.

This patch keeps memory_failure() from returning early merely because
(!PageLRU(p)) is true, and it ensures that action_result() can report
more precise information ("reserved kernel", "kernel slab", and
"unknown page state") instead of "non LRU", especially for memory
errors that are detected by memory scrubbing.
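
The reporting change can be pictured with a small user-space model. This is
only a sketch under stated assumptions: the flag bits, struct state_entry,
classify(), report_old() and report_new() are hypothetical names invented for
illustration, loosely modeled on a first-match table lookup, and are not the
kernel's definitions.

```c
/* Hypothetical flag bits for a user-space model; not the kernel's values. */
enum {
	F_RESERVED = 1u << 0,
	F_SLAB     = 1u << 1,
	F_LRU      = 1u << 2,
};

struct state_entry {
	unsigned int mask;	/* bits to examine */
	unsigned int res;	/* value those bits must have to match */
	const char *msg;	/* label that would be reported */
};

/* First match wins; the all-zero last entry matches any flags. */
static const struct state_entry table[] = {
	{ F_RESERVED, F_RESERVED, "reserved kernel" },
	{ F_SLAB,     F_SLAB,     "kernel slab" },
	{ F_LRU,      F_LRU,      "LRU" },
	{ 0,          0,          "unknown page state" },
};

static const char *classify(unsigned int flags)
{
	const struct state_entry *e;

	/* Terminates because the catch-all entry always matches. */
	for (e = table; ; e++)
		if ((flags & e->mask) == e->res)
			return e->msg;
}

/* Model of the old path: any non-LRU page is reported before the lookup. */
static const char *report_old(unsigned int flags)
{
	if (!(flags & F_LRU))
		return "non LRU";
	return classify(flags);
}

/* Model of the patched path: non-LRU pages also reach the table lookup. */
static const char *report_new(unsigned int flags)
{
	return classify(flags);
}
```

In this model a slab page (F_SLAB set, F_LRU clear) is labeled "non LRU" by
report_old() and "kernel slab" by report_new(), while a page with no flags set
falls through to "unknown page state".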

Andi said:

: While running the mcelog test suite on 3.14 I hit the following VM_BUG_ON:
:
: soft_offline: 0x56d4: unknown non LRU page type 3ffff800008000
: page:ffffea000015b400 count:3 mapcount:2097169 mapping:          (null) index:0xffff8800056d7000
: page flags: 0x3ffff800004081(locked|slab|head)
: ------------[ cut here ]------------
: kernel BUG at mm/rmap.c:1495!
:
: I think what happened is that a LRU page turned into a slab page in
: parallel with offlining.  memory_failure initially tests for this case,
: but doesn't retest later after the page has been locked.
:
: ...
:
: I ran this patch in a loop over night with some stress plus
: the mcelog test suite running in a loop. I cannot guarantee it hit it,
: but it should have given it a good beating.
:
: The kernel survived with no messages, although the mcelog test suite
: got killed at some point because it couldn't fork anymore. Probably
: some unrelated problem.
:
: So the patch is ok for me for .16.

Signed-off-by: Chen Yucong <slaoub@gmail.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent b27ebf77
@@ -895,7 +895,7 @@ static int hwpoison_user_mappings(struct page *p, unsigned long pfn,
 	struct page *hpage = *hpagep;
 	struct page *ppage;
 
-	if (PageReserved(p) || PageSlab(p))
+	if (PageReserved(p) || PageSlab(p) || !PageLRU(p))
 		return SWAP_SUCCESS;
 
 	/*
@@ -1159,9 +1159,6 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
 					action_result(pfn, "free buddy, 2nd try", DELAYED);
 				return 0;
 			}
-			action_result(pfn, "non LRU", IGNORED);
-			put_page(p);
-			return -EBUSY;
 		}
 	}
 
@@ -1194,6 +1191,9 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
 		return 0;
 	}
 
+	if (!PageHuge(p) && !PageTransTail(p) && !PageLRU(p))
+		goto identify_page_state;
+
 	/*
 	 * For error on the tail page, we should set PG_hwpoison
 	 * on the head page to show that the hugepage is hwpoisoned
@@ -1243,6 +1243,7 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
 		goto out;
 	}
 
+identify_page_state:
 	res = -EBUSY;
 	/*
 	 * The first check uses the current page flags which may not have any