Commit 11b914ee authored by David Hildenbrand, committed by Andrew Morton

mm/memory: move page_count() check into validate_page_before_insert()

Patch series "mm/memory: cleanly support zeropage in vm_insert_page*(),
vm_map_pages*() and vmf_insert_mixed()", v2.

There is interest in mapping zeropages via vm_insert_pages() [1] into
MAP_SHARED mappings.

For now, we only get zeropages in MAP_SHARED mappings via
vmf_insert_mixed() from FSDAX code, and I think it's a bit shaky in some
cases: we refcount the zeropage when mapping it, but not necessarily always
when unmapping it ... and we should actually never refcount it.
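To make the special-casing explicit: the shared zeropage is recognized by its
PFN, not by a page flag or by its refcount, so a path that skips the refcount
when mapping it must also skip it when unmapping.  A minimal sketch follows
(hypothetical helper name; is_zero_pfn()/page_to_pfn() are existing kernel
primitives, and nothing here is code from this series):

	/* Sketch only: identify the shared zeropage by its PFN. */
	static inline bool page_is_shared_zeropage(struct page *page)
	{
		return is_zero_pfn(page_to_pfn(page));
	}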

It's all a bit tricky, especially how zeropages in MAP_SHARED mappings
interact with GUP (FOLL_LONGTERM), mprotect(), write-faults and s390x
forbidding the shared zeropage (rewrite [2] is now upstream).

This series tries to take the careful approach of only allowing the
zeropage where it is likely safe to use (which should cover the existing
FSDAX use case and [1]), preventing it from accidentally getting mapped
writable during a write fault, mprotect() etc., and preventing issues with
FOLL_LONGTERM in the future with other users.
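To make "likely safe" a bit more concrete, here is a purely illustrative
sketch (hypothetical helper name, not code from this series) of the kind of
condition under which the zeropage cannot end up writable: in a COW (private)
mapping a write fault replaces it with an anonymous page, and a mapping
without VM_WRITE and VM_MAYWRITE can never gain write permission via
mprotect().

	/* Illustrative only, not from this series. */
	static bool zeropage_cannot_become_writable(struct vm_area_struct *vma)
	{
		/* Write faults on the zeropage in COW mappings unshare it. */
		if (is_cow_mapping(vma->vm_flags))
			return true;
		/* Without VM_WRITE/VM_MAYWRITE, mprotect() cannot add PROT_WRITE. */
		return !(vma->vm_flags & (VM_WRITE | VM_MAYWRITE));
	}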

Tested with a patch from Vincent that uses the zeropage in the context of
[1].

[1] https://lkml.kernel.org/r/20240430111354.637356-1-vdonnefort@google.com
[2] https://lkml.kernel.org/r/20240411161441.910170-1-david@redhat.com


This patch (of 3):

By moving the page_count() check into validate_page_before_insert(), we'll
now also cover the case where insert_page() is called from
__vm_insert_mixed(), which sounds like the right thing to do.

Link: https://lkml.kernel.org/r/20240522125713.775114-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
parent c66b0a05
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1981,6 +1981,8 @@ static int validate_page_before_insert(struct page *page)
 {
 	struct folio *folio = page_folio(page);
 
+	if (!folio_ref_count(folio))
+		return -EINVAL;
 	if (folio_test_anon(folio) || folio_test_slab(folio) ||
 	    page_has_type(page))
 		return -EINVAL;
@@ -2035,8 +2037,6 @@ static int insert_page_in_batch_locked(struct vm_area_struct *vma, pte_t *pte,
 {
 	int err;
 
-	if (!page_count(page))
-		return -EINVAL;
 	err = validate_page_before_insert(page);
 	if (err)
 		return err;
@@ -2170,8 +2170,6 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
 {
 	if (addr < vma->vm_start || addr >= vma->vm_end)
 		return -EFAULT;
-	if (!page_count(page))
-		return -EINVAL;
 	if (!(vma->vm_flags & VM_MIXEDMAP)) {
 		BUG_ON(mmap_read_trylock(vma->vm_mm));
 		BUG_ON(vma->vm_flags & VM_PFNMAP);
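For readability, here is how validate_page_before_insert() looks after this
patch, reconstructed from the hunks above; the trailing flush_dcache_folio()
and return 0 lines lie past the hunk context and are assumed unchanged:

	static int validate_page_before_insert(struct page *page)
	{
		struct folio *folio = page_folio(page);

		/*
		 * Check moved here from the callers, so that insert_page(),
		 * and thus the __vm_insert_mixed() path, is covered as well.
		 */
		if (!folio_ref_count(folio))
			return -EINVAL;
		if (folio_test_anon(folio) || folio_test_slab(folio) ||
		    page_has_type(page))
			return -EINVAL;
		flush_dcache_folio(folio);
		return 0;
	}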