Commit 5b51072e authored by Andrea Arcangeli's avatar Andrea Arcangeli Committed by Linus Torvalds

userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem

Userfaultfd did not create private memory when UFFDIO_COPY was invoked
on a MAP_PRIVATE shmem mapping.  Instead it wrote to the shmem file,
even when that had not been opened for writing.  Though, fortunately,
that could only happen where there was a hole in the file.

Fix the shmem-backed implementation of UFFDIO_COPY to create private
memory for MAP_PRIVATE mappings.  The hugetlbfs-backed implementation
was already correct.

This change is visible to userland, if userfaultfd has been used in
unintended ways: so it introduces a small risk of incompatibility, but
is necessary in order to respect file permissions.

An app that uses UFFDIO_COPY for anything like postcopy live migration
won't notice the difference, and in fact it'll run faster because there
will be no copy-on-write and memory waste in the tmpfs pagecache
anymore.

Userfaults on MAP_PRIVATE shmem keep triggering only on file holes like
before.

The real zeropage can also be built on a MAP_PRIVATE shmem mapping
through UFFDIO_ZEROPAGE and that's safe because the zeropage pte is
never dirty, in turn even an mprotect upgrading the vma permission from
PROT_READ to PROT_READ|PROT_WRITE won't make the zeropage pte writable.

Link: http://lkml.kernel.org/r/20181126173452.26955-3-aarcange@redhat.com
Fixes: 4c27fe4c ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Signed-off-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
Reported-by: default avatarMike Rapoport <rppt@linux.ibm.com>
Reviewed-by: default avatarHugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
parent 9e368259
...@@ -380,7 +380,17 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm, ...@@ -380,7 +380,17 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
{ {
ssize_t err; ssize_t err;
if (vma_is_anonymous(dst_vma)) { /*
* The normal page fault path for a shmem will invoke the
* fault, fill the hole in the file and COW it right away. The
* result generates plain anonymous memory. So when we are
* asked to fill an hole in a MAP_PRIVATE shmem mapping, we'll
* generate anonymous memory directly without actually filling
* the hole. For the MAP_PRIVATE case the robustness check
* only happens in the pagetable (to verify it's still none)
* and not in the radix tree.
*/
if (!(dst_vma->vm_flags & VM_SHARED)) {
if (!zeropage) if (!zeropage)
err = mcopy_atomic_pte(dst_mm, dst_pmd, dst_vma, err = mcopy_atomic_pte(dst_mm, dst_pmd, dst_vma,
dst_addr, src_addr, page); dst_addr, src_addr, page);
...@@ -489,7 +499,8 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm, ...@@ -489,7 +499,8 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
* dst_vma. * dst_vma.
*/ */
err = -ENOMEM; err = -ENOMEM;
if (vma_is_anonymous(dst_vma) && unlikely(anon_vma_prepare(dst_vma))) if (!(dst_vma->vm_flags & VM_SHARED) &&
unlikely(anon_vma_prepare(dst_vma)))
goto out_unlock; goto out_unlock;
while (src_addr < src_start + len) { while (src_addr < src_start + len) {
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment