    mm: include VM_MIXEDMAP flag in the VM_SPECIAL list to avoid m(un)locking · 9050d7eb
    Vlastimil Babka authored
    Daniel Borkmann reported a VM_BUG_ON assertion failing:
    
      ------------[ cut here ]------------
      kernel BUG at mm/mlock.c:528!
      invalid opcode: 0000 [#1] SMP
      Modules linked in: ccm arc4 iwldvm [...]
       video
      CPU: 3 PID: 2266 Comm: netsniff-ng Not tainted 3.14.0-rc2+ #8
      Hardware name: LENOVO 2429BP3/2429BP3, BIOS G4ET37WW (1.12 ) 05/29/2012
      task: ffff8801f87f9820 ti: ffff88002cb44000 task.ti: ffff88002cb44000
      RIP: 0010:[<ffffffff81171ad0>]  [<ffffffff81171ad0>] munlock_vma_pages_range+0x2e0/0x2f0
      Call Trace:
        do_munmap+0x18f/0x3b0
        vm_munmap+0x41/0x60
        SyS_munmap+0x22/0x30
        system_call_fastpath+0x1a/0x1f
      RIP   munlock_vma_pages_range+0x2e0/0x2f0
      ---[ end trace a0088dcf07ae10f2 ]---
    
    because munlock_vma_pages_range() thinks it's unexpectedly in the middle
    of a THP page.  This can be reproduced with default config since 3.11
    kernels.  A reproducer can be found in the kernel's selftest directory
    for networking [1] by running ./psock_tpacket.
    
    The problem is that an order=2 compound page (allocated by
    alloc_one_pg_vec_page()) is part of the munlocked VM_MIXEDMAP vma (mapped
    by packet_mmap()) and is mistaken for a THP page and assumed to be order=9.
    
    The checks for THP in munlock came with commit ff6a6da6 ("mm:
    accelerate munlock() treatment of THP pages"), i.e.  since 3.9, but did
    not trigger a bug.  They only made munlock_vma_pages_range() skip ahead
    to the next 512-page-aligned page whenever it encountered the head page
    of such a compound page.  This is harmless for vma's where mlocking has
    no effect anyway, but it can distort the accounting.
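
    The effect of the mistaken order can be sketched in user-space C.  The
    helper pages_to_skip() below is hypothetical and not kernel code; it
    assumes the skip is computed by rounding the current page index up to
    the next boundary of the assumed compound order.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a kernel function): number of pages to advance
 * from page index idx so that the next index is aligned to (1 << order),
 * i.e. the remainder of a compound page of that order is skipped.
 */
static unsigned long pages_to_skip(unsigned long idx, unsigned int order)
{
    unsigned long mask;

    /* Guard against an undefined shift. */
    assert(order < 8 * sizeof(unsigned long));

    mask = (1UL << order) - 1;      /* order=2 -> 3, order=9 -> 511 */
    return 1 + (~idx & mask);       /* distance to the next aligned index */
}
```

    For a head page at index 0, the real order=2 page calls for a skip of 4
    pages, while the assumed order=9 yields a skip of 512, so 508 pages
    beyond the compound page are passed over as well.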
    
    Since commit 7225522b ("mm: munlock: batch non-THP page isolation
    and munlock+putback using pagevec") this can trigger a VM_BUG_ON in the
    PageTransHuge() check.
    
    This patch fixes the issue by adding VM_MIXEDMAP flag to VM_SPECIAL, a
    list of flags that make vma's non-mlockable and non-mergeable.  The
    reasoning is that VM_MIXEDMAP vma's are similar to VM_PFNMAP, which is
    already on the VM_SPECIAL list, and both are intended for non-LRU pages
    where mlocking makes no sense anyway.  Related LKML discussion can be
    found in [2].
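
    The mask change can be illustrated with a small user-space sketch.  The
    bit values are illustrative only, the masks are reduced to the two flags
    discussed here (any other flags on the real list are omitted), and
    vma_is_mlockable() is a hypothetical helper rather than a kernel
    function.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values; the authoritative ones are in the kernel headers. */
#define VM_PFNMAP       0x00000400UL
#define VM_MIXEDMAP     0x10000000UL

/* Simplified masks: other flags on the real VM_SPECIAL list are omitted. */
#define VM_SPECIAL_OLD  (VM_PFNMAP)
#define VM_SPECIAL_NEW  (VM_PFNMAP | VM_MIXEDMAP)

/* Hypothetical helper: a vma is mlockable only if no special flag is set. */
static bool vma_is_mlockable(unsigned long vm_flags, unsigned long special_mask)
{
    return (vm_flags & special_mask) == 0;
}

/* Sanity check of the intended behaviour before and after the change. */
void check_special_masks(void)
{
    /* Before: a VM_MIXEDMAP vma was still considered mlockable. */
    assert(vma_is_mlockable(VM_MIXEDMAP, VM_SPECIAL_OLD));
    /* After: it is excluded, just like a VM_PFNMAP vma. */
    assert(!vma_is_mlockable(VM_MIXEDMAP, VM_SPECIAL_NEW));
    assert(!vma_is_mlockable(VM_PFNMAP, VM_SPECIAL_NEW));
}
```

    With the extended mask, a VM_MIXEDMAP vma such as the one set up by
    packet_mmap() is never treated as mlocked, so the munlock path is not
    expected to walk its compound pages in the first place.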
    
     [1] tools/testing/selftests/net/psock_tpacket
     [2] https://lkml.org/lkml/2014/1/10/427

    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
    Reported-by: Daniel Borkmann <dborkman@redhat.com>
    Tested-by: Daniel Borkmann <dborkman@redhat.com>
    Cc: Thomas Hellstrom <thellstrom@vmware.com>
    Cc: John David Anglin <dave.anglin@bell.net>
    Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
    Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
    Cc: Carsten Otte <cotte@de.ibm.com>
    Cc: Jared Hulbert <jaredeh@gmail.com>
    Tested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Acked-by: Rik van Riel <riel@redhat.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: <stable@vger.kernel.org> [3.11.x+]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>