    mm/page_alloc.c: don't create ZONE_MOVABLE beyond the end of a node · ddbc84f3
    Alistair Popple authored
    ZONE_MOVABLE uses the remaining memory in each node.  Its starting pfn
    is also aligned to MAX_ORDER_NR_PAGES.  It is possible for the remaining
    memory in a node to be less than MAX_ORDER_NR_PAGES, meaning there is
    not enough room for ZONE_MOVABLE on that node.
    
    Unfortunately, this condition is not checked. As a result,
    zone_movable_pfn[] can be set to a pfn greater than the last pfn in the
    node.
    
    calculate_node_totalpages() then sets zone->present_pages to a value
    greater than zone->spanned_pages. That is invalid, because spanned_pages
    represents the maximum number of pages in a zone, assuming no holes.
    
    Subsequently, free_area_init_core() can observe a zone of size zero that
    still has present pages. In that case it skips setting up the zone,
    including the initialisation of its free_lists[].
    
    However, populated_zone() checks zone->present_pages to decide whether a
    zone has memory available. Iterators such as walk_zones_in_node() rely
    on this check. pagetypeinfo_showfree() uses that iterator to walk the
    free_list of each zone in each node, and it assumes the lists are
    initialised because the zone is not empty.
    
    Because free_area_init_core() never initialised the free_lists[],
    reading /proc/pagetypeinfo results in the following kernel crash:
    
      BUG: kernel NULL pointer dereference, address: 0000000000000000
      #PF: supervisor read access in kernel mode
      #PF: error_code(0x0000) - not-present page
      PGD 0 P4D 0
      Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI
      CPU: 0 PID: 456 Comm: cat Not tainted 5.16.0 #461
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
      RIP: 0010:pagetypeinfo_show+0x163/0x460
      Code: 9e 82 e8 80 57 0e 00 49 8b 06 b9 01 00 00 00 4c 39 f0 75 16 e9 65 02 00 00 48 83 c1 01 48 81 f9 a0 86 01 00 0f 84 48 02 00 00 <48> 8b 00 4c 39 f0 75 e7 48 c7 c2 80 a2 e2 82 48 c7 c6 79 ef e3 82
      RSP: 0018:ffffc90001c4bd10 EFLAGS: 00010003
      RAX: 0000000000000000 RBX: ffff88801105f638 RCX: 0000000000000001
      RDX: 0000000000000001 RSI: 000000000000068b RDI: ffff8880163dc68b
      RBP: ffffc90001c4bd90 R08: 0000000000000001 R09: ffff8880163dc67e
      R10: 656c6261766f6d6e R11: 6c6261766f6d6e55 R12: ffff88807ffb4a00
      R13: ffff88807ffb49f8 R14: ffff88807ffb4580 R15: ffff88807ffb3000
      FS:  00007f9c83eff5c0(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 0000000013c8e000 CR4: 0000000000350ef0
      Call Trace:
       seq_read_iter+0x128/0x460
       proc_reg_read_iter+0x51/0x80
       new_sync_read+0x113/0x1a0
       vfs_read+0x136/0x1d0
       ksys_read+0x70/0xf0
       __x64_sys_read+0x1a/0x20
       do_syscall_64+0x3b/0xc0
       entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    Fix this by checking that the aligned zone_movable_pfn[] does not exceed
    the end of the node and, if it does, skipping the creation of a movable
    zone on that node.
    
    Link: https://lkml.kernel.org/r/20220215025831.2113067-1-apopple@nvidia.com
    Fixes: 2a1e274a ("Create the ZONE_MOVABLE zone")
    Signed-off-by: Alistair Popple <apopple@nvidia.com>
    Acked-by: David Hildenbrand <david@redhat.com>
    Acked-by: Mel Gorman <mgorman@techsingularity.net>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Zi Yan <ziy@nvidia.com>
    Cc: Anshuman Khandual <anshuman.khandual@arm.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>