• David Hildenbrand's avatar
    mm: streamline COW logic in do_swap_page() · c145e0b4
    David Hildenbrand authored
    Currently we have a different COW logic when:
    * triggering a read-fault to swapin first and then trigger a write-fault
      -> do_swap_page() + do_wp_page()
    * triggering a write-fault to swapin
      -> do_swap_page() + do_wp_page() only if we fail reuse in do_swap_page()
    
    The COW logic in do_swap_page() is different than our reuse logic in
    do_wp_page().  The COW logic in do_wp_page() -- page_count() == 1 -- makes
    currently sure that we certainly don't have a remaining reference, e.g.,
    via GUP, on the target page we want to reuse: if there is any unexpected
    reference, we have to copy to avoid information leaks.
    
    As do_swap_page() behaves differently, in environments with swap enabled
    we can currently have an unintended information leak from the parent to
    the child, similar as known from CVE-2020-29374:
    
    	1. Parent writes to anonymous page
    	-> Page is mapped writable and modified
    	2. Page is swapped out
    	-> Page is unmapped and replaced by swap entry
    	3. fork()
    	-> Swap entries are copied to child
    	4. Child pins page R/O
    	-> Page is mapped R/O into child
    	5. Child unmaps page
    	-> Child still holds GUP reference
    	6. Parent writes to page
    	-> Page is reused in do_swap_page()
    	-> Child can observe changes
    
    Exchanging 2. and 3. should have the same effect.
    
    Let's apply the same COW logic as in do_wp_page(), conditionally trying to
    remove the page from the swapcache after freeing the swap entry, however,
    before actually mapping our page.  We can change the order now that we use
    try_to_free_swap(), which doesn't care about the mapcount, instead of
    reuse_swap_page().
    
    To handle references from the LRU pagevecs, conditionally drain the local
    LRU pagevecs when required, however, don't consider the page_count() when
    deciding whether to drain to keep it simple for now.
    
    Link: https://lkml.kernel.org/r/20220131162940.210846-5-david@redhat.comSigned-off-by: default avatarDavid Hildenbrand <david@redhat.com>
    Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Don Dutile <ddutile@redhat.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Jann Horn <jannh@google.com>
    Cc: Jason Gunthorpe <jgg@nvidia.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Liang Zhang <zhangliang5@huawei.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Mike Rapoport <rppt@linux.ibm.com>
    Cc: Nadav Amit <nadav.amit@gmail.com>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Rik van Riel <riel@surriel.com>
    Cc: Roman Gushchin <roman.gushchin@linux.dev>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Yang Shi <shy828301@gmail.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    c145e0b4
memory.c 150 KB