    mm/gup: check page poison status for coredump. · d3378e86
    Aili Yao authored
    When we dump core for a user process because of a signal, that signal may
    be a SIGBUS with code BUS_MCEERR_AR or BUS_MCEERR_AO, which means it
    resulted from an ECC memory failure such as SRAR or SRAO. We expect the
    memory recovery work to have finished correctly, in which case
    get_dump_page() will not return the error page, because memory_failure()
    has already set the process's pte for it invalid.
    
    But memory_failure() may fail, leaving the process's pte for the page
    still valid. With the current code we then return the poisoned page and
    dump it, which leads to a system panic because the access happens in
    kernel code.
    
    So check the poison status in get_dump_page(), and if TRUE, return NULL.
    
    There may be other scenarios where it is also better to check the poison
    status than to panic, so make a wrapper for this check. Thanks to David
    (<david@redhat.com>) for the suggestion.
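
The check described above can be sketched in plain userspace C. This is only a model under stated assumptions, not the kernel code: `struct page` here is a hypothetical stand-in, its fields model what the kernel would query through page-flag helpers, `dump_page_or_null()` is a made-up name for the tail of get_dump_page(), and the huge-page branch is an assumption about what such a wrapper may also want to cover. Only the name is_page_poisoned() and the "return NULL if poisoned" behaviour come from the message itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct page. */
struct page {
    bool hwpoison;      /* models the hardware-poison page flag */
    bool huge;          /* models "this page belongs to a huge page" */
    struct page *head;  /* models the compound head; points to itself
                         * for a head page or a non-compound page */
};

/*
 * Wrapper for the poison check. The argument must not be NULL,
 * matching the note in the changelog; callers check first.
 */
static bool is_page_poisoned(const struct page *page)
{
    if (page->hwpoison)
        return true;
    /* Assumption: for a huge page, poison may be recorded on the head. */
    if (page->huge && page->head->hwpoison)
        return true;
    return false;
}

/*
 * Models the end of get_dump_page(): ret == 1 means the lookup pinned
 * exactly one page. A poisoned page is never handed to the dumper.
 */
static struct page *dump_page_or_null(struct page *page, int ret)
{
    if (ret == 1 && is_page_poisoned(page))
        return NULL;
    return (ret == 1) ? page : NULL;
}
```

Checking `ret == 1` before calling the wrapper is what keeps a NULL page from ever reaching is_page_poisoned() in this sketch.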
    
    [akpm@linux-foundation.org: s/0/false/]
    [yaoaili@kingsoft.com: is_page_poisoned() arg cannot be null, per Matthew]
    
    Link: https://lkml.kernel.org/r/20210322115233.05e4e82a@alex-virtual-machine
    Link: https://lkml.kernel.org/r/20210319104437.6f30e80d@alex-virtual-machine
    Signed-off-by: Aili Yao <yaoaili@kingsoft.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Aili Yao <yaoaili@kingsoft.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
gup.c 81.1 KB