    mm: fix unexpected zeroed page mapping with zram swap · e914d8f0
    Minchan Kim authored
    When two processes share an address space via CLONE_VM, a user process
    can be corrupted by unexpectedly seeing a zeroed page.
    
          CPU A                        CPU B
    
      do_swap_page                do_swap_page
      SWP_SYNCHRONOUS_IO path     SWP_SYNCHRONOUS_IO path
      swap_readpage valid data
        swap_slot_free_notify
          delete zram entry
                                  swap_readpage zeroed(invalid) data
                                  pte_lock
                                  map the *zero data* to userspace
                                  pte_unlock
      pte_lock
      if (!pte_same)
        goto out_nomap;
      pte_unlock
      return and next refault will
      read zeroed data
    
    The swap_slot_free_notify is bogus for the CLONE_VM case since the
    refcount of the swap slot is not increased at copy_mm, so it cannot
    tell whether it's safe to discard data from the backing device.  In
    that case, the only lock it could rely on to synchronize swap slot
    freeing is the page table lock.  Thus, this patch gets rid of the
    swap_slot_free_notify function.  With this patch, CPU A will see
    correct data.
    
          CPU A                        CPU B
    
      do_swap_page                do_swap_page
      SWP_SYNCHRONOUS_IO path     SWP_SYNCHRONOUS_IO path
                                  swap_readpage original data
                                  pte_lock
                                  map the original data
                                  swap_free
                                    swap_range_free
                                      bd_disk->fops->swap_slot_free_notify
      swap_readpage read zeroed data
                                  pte_unlock
      pte_lock
      if (!pte_same)
        goto out_nomap;
      pte_unlock
      return
      on next refault will see mapped data by CPU B
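
    The two interleavings above can be replayed in a toy, single-threaded
    model.  This is only a sketch and not kernel code: struct pte_model,
    read_slot, try_map, run_before and run_after are hypothetical names,
    the value 42 stands in for the original page contents, and a freed
    slot is assumed to read back as zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical, not kernel code): one swap slot, one shared PTE. */
enum { ORIGINAL = 42, ZERO = 0 };

struct pte_model {
    bool slot_present;  /* backing device still holds the data */
    bool pte_is_swap;   /* PTE still contains the swap entry */
    int  mapped;        /* value visible to userspace, -1 if nothing mapped */
};

/* Assumption: reading a slot that was already freed yields zeroes. */
static int read_slot(const struct pte_model *m)
{
    return m->slot_present ? ORIGINAL : ZERO;
}

/*
 * Models the section under the page table lock: map the data only if the
 * PTE still holds the swap entry (the pte_same check). When
 * free_under_lock is set, the slot is freed here, as swap_free does.
 */
static bool try_map(struct pte_model *m, int data, bool free_under_lock)
{
    if (!m->pte_is_swap)
        return false;               /* pte_same failed: goto out_nomap */
    m->mapped = data;
    m->pte_is_swap = false;
    if (free_under_lock)
        m->slot_present = false;
    return true;
}

/* First diagram: the slot is freed right after CPU A's read, outside the lock. */
static int run_before(void)
{
    struct pte_model m = { true, true, -1 };
    int a = read_slot(&m);          /* CPU A reads valid data */
    m.slot_present = false;         /* CPU A: slot freed after the read */
    int b = read_slot(&m);          /* CPU B reads zeroed data */
    bool b_mapped = try_map(&m, b, false);
    bool a_mapped = try_map(&m, a, false);
    assert(b_mapped && !a_mapped);  /* CPU A loses the race and drops its page */
    return m.mapped;
}

/* Second diagram: the slot is only freed under the lock, after mapping. */
static int run_after(void)
{
    struct pte_model m = { true, true, -1 };
    int b = read_slot(&m);          /* CPU B reads original data */
    bool b_mapped = try_map(&m, b, true);
    int a = read_slot(&m);          /* CPU A reads zeroes, slot already freed */
    bool a_mapped = try_map(&m, a, true);
    assert(b_mapped && !a_mapped);  /* CPU A's zeroed page is never mapped */
    return m.mapped;
}
```

    In this model run_before() leaves the zero value mapped, while
    run_after() leaves the original value mapped: CPU A may still read
    zeroes, but the pte_same check discards that page before it can reach
    userspace.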
    
    The concern with this patch is that it could increase memory
    consumption, since zram may keep the compressed form while the
    uncompressed form is also in the address space.  However, in most
    cases zram uses no readahead and do_swap_page is followed by
    swap_free, so the compressed form will be freed from zram quickly.
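
    The memory argument can be illustrated with a small accounting
    sketch.  It is a toy model under stated assumptions, not kernel code:
    struct slot_model and the model_* helpers are hypothetical, the 1024
    byte compressed size is made up, and the slot is assumed to have a
    single reference so that the first swap_free releases it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sizes, for illustration only. */
enum { PAGE_BYTES = 4096, COMPRESSED_BYTES = 1024 };

struct slot_model {
    int  swap_count;       /* references to the swap slot */
    bool compressed_held;  /* compressed copy still stored in zram */
    bool page_resident;    /* uncompressed copy in the address space */
};

/* Total bytes held for this page across zram and the address space. */
static size_t footprint(const struct slot_model *s)
{
    return (size_t)(s->compressed_held ? COMPRESSED_BYTES : 0) +
           (size_t)(s->page_resident ? PAGE_BYTES : 0);
}

/* Swap-in: the page becomes resident; the compressed copy is kept for now. */
static void model_swapin(struct slot_model *s)
{
    assert(s->compressed_held);
    s->page_resident = true;
}

/* swap_free: drop one reference; the last one releases the compressed copy. */
static void model_swap_free(struct slot_model *s)
{
    assert(s->swap_count > 0);
    if (--s->swap_count == 0)
        s->compressed_held = false;
}

/* Records the footprint before swap-in, after swap-in, and after swap_free. */
static void run_swapin_sequence(size_t out[3])
{
    struct slot_model s = { 1, true, false };
    out[0] = footprint(&s);
    model_swapin(&s);
    out[1] = footprint(&s);
    model_swap_free(&s);
    out[2] = footprint(&s);
}
```

    Both forms are held only between the swap-in and the following
    swap_free; how long that window lasts in practice depends on the
    workload.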
    
    Link: https://lkml.kernel.org/r/YjTVVxIAsnKAXjTd@google.com
    Fixes: 0bcac06f ("mm, swap: skip swapcache for swapin of synchronous device")
    Reported-by: Ivan Babrou <ivan@cloudflare.com>
    Tested-by: Ivan Babrou <ivan@cloudflare.com>
    Signed-off-by: Minchan Kim <minchan@kernel.org>
    Cc: Nitin Gupta <ngupta@vflare.org>
    Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: <stable@vger.kernel.org>	[4.14+]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>