Commit 134fca90 authored by Jiri Kosina's avatar Jiri Kosina Committed by Linus Torvalds

mm/mincore.c: make mincore() more conservative

The semantics of what mincore() considers to be resident is not
completely clear, but Linux has always (since 2.3.52, which is when
mincore() was initially done) treated it as "page is available in page
cache".

That's potentially a problem, as that [in]directly exposes
meta-information about pagecache / memory mapping state even about
memory not strictly belonging to the process executing the syscall,
opening possibilities for sidechannel attacks.

Change the semantics of mincore() so that it only reveals pagecache
information for non-anonymous mappings that belog to files that the
calling process could (if it tried to) successfully open for writing;
otherwise we'd be including shared non-exclusive mappings, which

 - is the sidechannel

 - is not the usecase for mincore(), as that's primarily used for data,
   not (shared) text

[jkosina@suse.cz: v2]
  Link: http://lkml.kernel.org/r/20190312141708.6652-2-vbabka@suse.cz
[mhocko@suse.com: restructure can_do_mincore() conditions]
Link: http://lkml.kernel.org/r/nycvar.YFH.7.76.1903062342020.19912@cbobk.fhfr.pmSigned-off-by: default avatarJiri Kosina <jkosina@suse.cz>
Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
Acked-by: default avatarJosh Snyder <joshs@netflix.com>
Acked-by: default avatarMichal Hocko <mhocko@suse.com>
Originally-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
Originally-by: default avatarDominique Martinet <asmadeus@codewreck.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Kevin Easton <kevin@guarana.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Cyril Hrubis <chrubis@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Daniel Gruss <daniel@gruss.cc>
Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
parent 97500a4a
...@@ -169,6 +169,22 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, ...@@ -169,6 +169,22 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
return 0; return 0;
} }
static inline bool can_do_mincore(struct vm_area_struct *vma)
{
if (vma_is_anonymous(vma))
return true;
if (!vma->vm_file)
return false;
/*
* Reveal pagecache information only for non-anonymous mappings that
* correspond to the files the calling process could (if tried) open
* for writing; otherwise we'd be including shared non-exclusive
* mappings, which opens a side channel.
*/
return inode_owner_or_capable(file_inode(vma->vm_file)) ||
inode_permission(file_inode(vma->vm_file), MAY_WRITE) == 0;
}
/* /*
* Do a chunk of "sys_mincore()". We've already checked * Do a chunk of "sys_mincore()". We've already checked
* all the arguments, we hold the mmap semaphore: we should * all the arguments, we hold the mmap semaphore: we should
...@@ -189,8 +205,13 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v ...@@ -189,8 +205,13 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v
vma = find_vma(current->mm, addr); vma = find_vma(current->mm, addr);
if (!vma || addr < vma->vm_start) if (!vma || addr < vma->vm_start)
return -ENOMEM; return -ENOMEM;
mincore_walk.mm = vma->vm_mm;
end = min(vma->vm_end, addr + (pages << PAGE_SHIFT)); end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
if (!can_do_mincore(vma)) {
unsigned long pages = DIV_ROUND_UP(end - addr, PAGE_SIZE);
memset(vec, 1, pages);
return pages;
}
mincore_walk.mm = vma->vm_mm;
err = walk_page_range(addr, end, &mincore_walk); err = walk_page_range(addr, end, &mincore_walk);
if (err < 0) if (err < 0)
return err; return err;
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment