• Dave Hansen's avatar
    x86/mpx, mm/core: Fix recursive munmap() corruption · 5a28fc94
    Dave Hansen authored
    This is a bit of a mess, to put it mildly.  But, it's a bug
    that only seems to have showed up in 4.20 but wasn't noticed
    until now, because nobody uses MPX.
    
    MPX has the arch_unmap() hook inside of munmap() because MPX
    uses bounds tables that protect other areas of memory.  When
    memory is unmapped, there is also a need to unmap the MPX
    bounds tables.  Barring this, unused bounds tables can eat 80%
    of the address space.
    
    But, the recursive do_munmap() that gets called vi arch_unmap()
    wreaks havoc with __do_munmap()'s state.  It can result in
    freeing populated page tables, accessing bogus VMA state,
    double-freed VMAs and more.
    
    See the "long story" further below for the gory details.
    
    To fix this, call arch_unmap() before __do_unmap() has a chance
    to do anything meaningful.  Also, remove the 'vma' argument
    and force the MPX code to do its own, independent VMA lookup.
    
    == UML / unicore32 impact ==
    
    Remove unused 'vma' argument to arch_unmap().  No functional
    change.
    
    I compile tested this on UML but not unicore32.
    
    == powerpc impact ==
    
    powerpc uses arch_unmap() well to watch for munmap() on the
    VDSO and zeroes out 'current->mm->context.vdso_base'.  Moving
    arch_unmap() makes this happen earlier in __do_munmap().  But,
    'vdso_base' seems to only be used in perf and in the signal
    delivery that happens near the return to userspace.  I can not
    find any likely impact to powerpc, other than the zeroing
    happening a little earlier.
    
    powerpc does not use the 'vma' argument and is unaffected by
    its removal.
    
    I compile-tested a 64-bit powerpc defconfig.
    
    == x86 impact ==
    
    For the common success case this is functionally identical to
    what was there before.  For the munmap() failure case, it's
    possible that some MPX tables will be zapped for memory that
    continues to be in use.  But, this is an extraordinarily
    unlikely scenario and the harm would be that MPX provides no
    protection since the bounds table got reset (zeroed).
    
    I can't imagine anyone doing this:
    
    	ptr = mmap();
    	// use ptr
    	ret = munmap(ptr);
    	if (ret)
    		// oh, there was an error, I'll
    		// keep using ptr.
    
    Because if you're doing munmap(), you are *done* with the
    memory.  There's probably no good data in there _anyway_.
    
    This passes the original reproducer from Richard Biener as
    well as the existing mpx selftests/.
    
    The long story:
    
    munmap() has a couple of pieces:
    
     1. Find the affected VMA(s)
     2. Split the start/end one(s) if neceesary
     3. Pull the VMAs out of the rbtree
     4. Actually zap the memory via unmap_region(), including
        freeing page tables (or queueing them to be freed).
     5. Fix up some of the accounting (like fput()) and actually
        free the VMA itself.
    
    This specific ordering was actually introduced by:
    
      dd2283f2 ("mm: mmap: zap pages with read mmap_sem in munmap")
    
    during the 4.20 merge window.  The previous __do_munmap() code
    was actually safe because the only thing after arch_unmap() was
    remove_vma_list().  arch_unmap() could not see 'vma' in the
    rbtree because it was detached, so it is not even capable of
    doing operations unsafe for remove_vma_list()'s use of 'vma'.
    
    Richard Biener reported a test that shows this in dmesg:
    
      [1216548.787498] BUG: Bad rss-counter state mm:0000000017ce560b idx:1 val:551
      [1216548.787500] BUG: non-zero pgtables_bytes on freeing mm: 24576
    
    What triggered this was the recursive do_munmap() called via
    arch_unmap().  It was freeing page tables that has not been
    properly zapped.
    
    But, the problem was bigger than this.  For one, arch_unmap()
    can free VMAs.  But, the calling __do_munmap() has variables
    that *point* to VMAs and obviously can't handle them just
    getting freed while the pointer is still in use.
    
    I tried a couple of things here.  First, I tried to fix the page
    table freeing problem in isolation, but I then found the VMA
    issue.  I also tried having the MPX code return a flag if it
    modified the rbtree which would force __do_munmap() to re-walk
    to restart.  That spiralled out of control in complexity pretty
    fast.
    
    Just moving arch_unmap() and accepting that the bonkers failure
    case might eat some bounds tables seems like the simplest viable
    fix.
    
    This was also reported in the following kernel bugzilla entry:
    
      https://bugzilla.kernel.org/show_bug.cgi?id=203123
    
    There are some reports that this commit triggered this bug:
    
      dd2283f2 ("mm: mmap: zap pages with read mmap_sem in munmap")
    
    While that commit certainly made the issues easier to hit, I believe
    the fundamental issue has been with us as long as MPX itself, thus
    the Fixes: tag below is for one of the original MPX commits.
    
    [ mingo: Minor edits to the changelog and the patch. ]
    Reported-by: default avatarRichard Biener <rguenther@suse.de>
    Reported-by: default avatarH.J. Lu <hjl.tools@gmail.com>
    Signed-off-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
    Reviewed-by Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: default avatarYang Shi <yang.shi@linux.alibaba.com>
    Acked-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Guan Xuetao <gxt@pku.edu.cn>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Jeff Dike <jdike@addtoit.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Rik van Riel <riel@surriel.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: linux-arch@vger.kernel.org
    Cc: linux-mm@kvack.org
    Cc: linux-um@lists.infradead.org
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: stable@vger.kernel.org
    Fixes: dd2283f2 ("mm: mmap: zap pages with read mmap_sem in munmap")
    Link: http://lkml.kernel.org/r/20190419194747.5E1AD6DC@viggo.jf.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
    5a28fc94
mpx.h 3.05 KB