    mm, memory_hotplug: fix off-by-one in is_pageblock_removable · 891cb2a7
    Michal Hocko authored
    Rong Chen has reported the following boot crash:
    
        PGD 0 P4D 0
        Oops: 0000 [#1] PREEMPT SMP PTI
        CPU: 1 PID: 239 Comm: udevd Not tainted 5.0.0-rc4-00149-gefad4e47 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
        RIP: 0010:page_mapping+0x12/0x80
        Code: 5d c3 48 89 df e8 0e ad 02 00 85 c0 75 da 89 e8 5b 5d c3 0f 1f 44 00 00 53 48 89 fb 48 8b 43 08 48 8d 50 ff a8 01 48 0f 45 da <48> 8b 53 08 48 8d 42 ff 83 e2 01 48 0f 44 c3 48 83 38 ff 74 2f 48
        RSP: 0018:ffff88801fa87cd8 EFLAGS: 00010202
        RAX: ffffffffffffffff RBX: fffffffffffffffe RCX: 000000000000000a
        RDX: fffffffffffffffe RSI: ffffffff820b9a20 RDI: ffff88801e5c0000
        RBP: 6db6db6db6db6db7 R08: ffff88801e8bb000 R09: 0000000001b64d13
        R10: ffff88801fa87cf8 R11: 0000000000000001 R12: ffff88801e640000
        R13: ffffffff820b9a20 R14: ffff88801f145258 R15: 0000000000000001
        FS:  00007fb2079817c0(0000) GS:ffff88801dd00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000006 CR3: 000000001fa82000 CR4: 00000000000006a0
        Call Trace:
         __dump_page+0x14/0x2c0
         is_mem_section_removable+0x24c/0x2c0
         removable_show+0x87/0xa0
         dev_attr_show+0x25/0x60
         sysfs_kf_seq_show+0xba/0x110
         seq_read+0x196/0x3f0
         __vfs_read+0x34/0x180
         vfs_read+0xa0/0x150
         ksys_read+0x44/0xb0
         do_syscall_64+0x5e/0x4a0
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
    
    and bisected it down to commit efad4e47 ("mm, memory_hotplug:
    is_mem_section_removable do not pass the end of a zone").
    
    The reason for the crash is that the mapping is garbage for a poisoned
    (uninitialized) page.  This shouldn't happen, as all pages within the
    zone's boundary should be initialized.
    
    Later debugging revealed that the actual problem is an off-by-one when
    evaluating the end_page.  'start_pfn + nr_pages' (or 'zone_end_pfn')
    refers to a pfn after the range and as such it might belong to a
    different memory section.
    
    Together with CONFIG_SPARSEMEM, this makes the loop condition
    completely bogus, because pointer arithmetic doesn't work for pages
    from two different sections in that memory model.
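    
    The off-by-one can be illustrated with a small userspace sketch.  This
    is a toy model and not kernel code: the section size and the toy_*
    helpers are hypothetical names chosen for the illustration.  It only
    shows that the exclusive end pfn of a range that fills a section
    already belongs to the next section.

```c
#include <stdbool.h>

/* Hypothetical toy value, not the kernel's real section size. */
#define TOY_PAGES_PER_SECTION 8UL

/* Index of the memory section a pfn belongs to in this toy model. */
static unsigned long toy_pfn_to_section(unsigned long pfn)
{
	return pfn / TOY_PAGES_PER_SECTION;
}

/*
 * Returns true when the exclusive end pfn (start_pfn + nr_pages) lies in
 * a different section than the last pfn that is really part of the range.
 * Assumes nr_pages > 0.
 */
static bool toy_end_crosses_section(unsigned long start_pfn,
				    unsigned long nr_pages)
{
	unsigned long last_pfn = start_pfn + nr_pages - 1;
	unsigned long end_pfn = start_pfn + nr_pages;

	return toy_pfn_to_section(last_pfn) != toy_pfn_to_section(end_pfn);
}
```

    In this model a range covering a whole section, such as
    toy_end_crosses_section(0, 8), reports a crossing, so a struct page
    derived from the end pfn cannot safely be compared against pages of
    the range itself.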
    
    Fix the issue by reworking is_pageblock_removable to be pfn based and
    to use struct page only where necessary.  This makes the code slightly
    easier to follow and removes the problematic pointer arithmetic
    completely.
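    
    A minimal userspace sketch of the pfn based approach, under stated
    assumptions: the toy_* names, the sizes and the 'movable' flag are
    hypothetical and stand in for the real kernel helpers and checks.  It
    is not the code of this patch.  The loop walks pfns, clamps the end to
    the zone end, and converts a pfn to a page only for pfns inside the
    range.

```c
#include <stdbool.h>

/* Hypothetical toy sizes, not the kernel's real values. */
#define TOY_PAGES_PER_SECTION	8UL
#define TOY_PAGEBLOCK_NR_PAGES	4UL
#define TOY_NR_SECTIONS		2UL

struct toy_page {
	bool movable;	/* stand-in for the real removability checks */
};

/* One array per section, standing in for the per-section mem_map. */
static struct toy_page toy_section_map[TOY_NR_SECTIONS][TOY_PAGES_PER_SECTION];

/* Look up the page of a pfn through its section; pfn must be below 16 here. */
static struct toy_page *toy_pfn_to_page(unsigned long pfn)
{
	return &toy_section_map[pfn / TOY_PAGES_PER_SECTION]
			       [pfn % TOY_PAGES_PER_SECTION];
}

/* Check the pageblock that starts at pfn; struct page is used only here. */
static bool toy_pageblock_removable(unsigned long pfn)
{
	return toy_pfn_to_page(pfn)->movable;
}

/* Walk the range by pfn; the exclusive end pfn is never turned into a page. */
static bool toy_range_removable(unsigned long start_pfn,
				unsigned long nr_pages,
				unsigned long zone_end_pfn)
{
	unsigned long end_pfn = start_pfn + nr_pages;
	unsigned long pfn;

	/* Do not walk past the end of the zone. */
	if (end_pfn > zone_end_pfn)
		end_pfn = zone_end_pfn;

	for (pfn = start_pfn; pfn < end_pfn; pfn += TOY_PAGEBLOCK_NR_PAGES) {
		if (!toy_pageblock_removable(pfn))
			return false;
	}
	return true;
}
```

    Because the loop condition compares plain integers, it stays valid
    even when the end pfn falls into another section.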
    
    Link: http://lkml.kernel.org/r/20190218181544.14616-1-mhocko@kernel.org
    Fixes: efad4e47 ("mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone")
    Signed-off-by: Michal Hocko <mhocko@suse.com>
    Reported-by: <rong.a.chen@intel.com>
    Tested-by: <rong.a.chen@intel.com>
    Acked-by: Mike Rapoport <rppt@linux.ibm.com>
    Reviewed-by: Oscar Salvador <osalvador@suse.de>
    Cc: Matthew Wilcox <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
memory_hotplug.c 48.1 KB