    mm, memory_hotplug: do not clear numa_node association after hot_remove · 46a3679b
    Michal Hocko authored
    Per-cpu numa_node provides a default node for each possible cpu.  The
    association gets initialized during the boot when the architecture
    specific code explores cpu->NUMA affinity.  When the whole NUMA node is
    removed, however, we clear this association:
    
    try_offline_node
      check_and_unmap_cpu_on_node
        unmap_cpu_on_node
          numa_clear_node
            numa_set_node(cpu, NUMA_NO_NODE)
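    
    The effect of this call chain can be modelled in a few lines of userspace
    C.  This is only a sketch, not kernel code: the cpu_node array, the
    MODEL_NR_CPUS constant and all model_* helpers are hypothetical stand-ins
    for the per-cpu numa_node variable and for numa_set_node,
    numa_clear_node, cpu_to_node and unmap_cpu_on_node.

```c
#define NUMA_NO_NODE   (-1)  /* same value the kernel uses */
#define MODEL_NR_CPUS  4     /* hypothetical, for illustration only */

/* Hypothetical stand-in for the per-cpu numa_node variable:
 * cpus 0-1 sit on node 0, cpus 2-3 on node 1. */
static int cpu_node[MODEL_NR_CPUS] = { 0, 0, 1, 1 };

static void model_numa_set_node(int cpu, int node)
{
	cpu_node[cpu] = node;
}

static void model_numa_clear_node(int cpu)
{
	model_numa_set_node(cpu, NUMA_NO_NODE);
}

static int model_cpu_to_node(int cpu)
{
	return cpu_node[cpu];
}

/* Models the machinery this patch removes: every cpu that belongs to
 * the offlined node loses its association. */
static void model_unmap_cpus_on_node(int nid)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (model_cpu_to_node(cpu) == nid)
			model_numa_clear_node(cpu);
}
```

    After model_unmap_cpus_on_node(1), model_cpu_to_node(2) and
    model_cpu_to_node(3) return NUMA_NO_NODE while cpus 0 and 1 are
    untouched.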
    
    This means that whoever calls cpu_to_node for a cpu associated with such a
    node will get NUMA_NO_NODE.  This is problematic for two reasons.  First,
    it is fragile because __alloc_pages_node would simply blow up on an
    out-of-bounds access.  We have encountered this when loading the kvm module
    
      BUG: unable to handle kernel paging request at 00000000000021c0
      IP: __alloc_pages_nodemask+0x93/0xb70
      PGD 800000ffe853e067 PUD 7336bbc067 PMD 0
      Oops: 0000 [#1] SMP
      [...]
      CPU: 88 PID: 1223749 Comm: modprobe Tainted: G        W          4.4.156-94.64-default #1
      RIP: __alloc_pages_nodemask+0x93/0xb70
      RSP: 0018:ffff887354493b40  EFLAGS: 00010202
      RAX: 00000000000021c0 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 0000000000000002 RDI: 00000000014000c0
      RBP: 00000000014000c0 R08: ffffffffffffffff R09: 0000000000000000
      R10: ffff88fffc89e790 R11: 0000000000014000 R12: 0000000000000101
      R13: ffffffffa0772cd4 R14: ffffffffa0769ac0 R15: 0000000000000000
      FS:  00007fdf2f2f1700(0000) GS:ffff88fffc880000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000000021c0 CR3: 00000077205ee000 CR4: 0000000000360670
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
        alloc_vmcs_cpu+0x3d/0x90 [kvm_intel]
        hardware_setup+0x781/0x849 [kvm_intel]
        kvm_arch_hardware_setup+0x28/0x190 [kvm]
        kvm_init+0x7c/0x2d0 [kvm]
        vmx_init+0x1e/0x32c [kvm_intel]
        do_one_initcall+0xca/0x1f0
        do_init_module+0x5a/0x1d7
        load_module+0x1393/0x1c90
        SYSC_finit_module+0x70/0xa0
        entry_SYSCALL_64_fastpath+0x1e/0xb7
      DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xb7
    
    on an older kernel, but the code is basically the same in the current Linus
    tree as well.  alloc_vmcs_cpu could use alloc_pages_node, which recognizes
    NUMA_NO_NODE and translates it to numa_mem_id, but that is wrong as well
    because it would use the affinity of the local CPU, which might be quite
    far from the original node.  It is also reasonable to expect that
    cpu_to_node will provide a sane value, and there might be many more
    callers like that.
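    
    Both failure modes described above can be illustrated with a small
    userspace model.  It is a sketch under stated assumptions, not the
    kernel implementation: model_nodes, MODEL_MAX_NODES, model_node_data and
    model_resolve_node are hypothetical names.  model_node_data adds a bounds
    check purely so the example is safe to run; the point of the oops is that
    an unchecked node lookup with nid == NUMA_NO_NODE reads out of bounds.

```c
#include <stddef.h>

#define NUMA_NO_NODE     (-1)
#define MODEL_MAX_NODES  4    /* hypothetical, for illustration only */

struct model_pgdat {
	int node_id;
};

/* Hypothetical stand-in for the per-node data array. */
static struct model_pgdat model_nodes[MODEL_MAX_NODES] = {
	{ 0 }, { 1 }, { 2 }, { 3 },
};

/* Checked node lookup.  Without the range check, nid == -1 would index
 * one element before the array, which is the out-of-bounds access. */
static struct model_pgdat *model_node_data(int nid)
{
	if (nid < 0 || nid >= MODEL_MAX_NODES)
		return NULL;
	return &model_nodes[nid];
}

/* Models the NUMA_NO_NODE translation: the request silently lands on
 * the caller's local node, not on the node the cpu originally had. */
static int model_resolve_node(int nid, int local_node)
{
	return nid == NUMA_NO_NODE ? local_node : nid;
}
```

    With a caller running on node 0, model_resolve_node(NUMA_NO_NODE, 0)
    yields node 0 even if the cpu in question originally belonged to node 3,
    which is the locality problem mentioned above.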
    
    The second problem is that __register_one_node relies on cpu_to_node to
    properly associate cpus back to the node when it is onlined.  We do not
    want to lose that link as there is no arch independent way to get it from
    the early boot time AFAICS.
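    
    Why the lost link matters at online time can be sketched as follows.
    The function below is a hypothetical userspace model of the cpu-linking
    loop in __register_one_node; it merely counts the cpus that would be
    linked to the node, and the cpu_node table passed in is an assumption
    made for the example.

```c
#define NUMA_NO_NODE   (-1)
#define MODEL_NR_CPUS  4     /* hypothetical, for illustration only */

/* Returns how many cpus would be linked under node nid, given a table
 * mapping each cpu to its node. */
static int model_register_one_node(const int *cpu_node, int nid)
{
	int linked = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (cpu_node[cpu] == nid)
			linked++;  /* the kernel would link the cpu here */

	return linked;
}
```

    With the association kept ({0, 0, 1, 1}) onlining node 1 links two cpus;
    with it cleared ({0, 0, NUMA_NO_NODE, NUMA_NO_NODE}) it links none.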
    
    Drop the whole check_and_unmap_cpu_on_node machinery and keep the
    association to fix both issues.  The NODE_DATA(nid) is not deallocated so
    it will stay in place and if anybody wants to allocate from that node then
    a fallback node will be used.
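    
    The fallback behaviour can be pictured with the following sketch.  The
    kernel does this through per-node zonelists; the code below only models
    the idea, and struct model_node, its fallback array and model_pick_node
    are hypothetical names introduced for illustration.

```c
#include <stdbool.h>

#define MODEL_MAX_NODES 3    /* hypothetical, for illustration only */

struct model_node {
	bool has_memory;                /* false once the node is offlined */
	int  fallback[MODEL_MAX_NODES]; /* preference order, self first */
};

/* Walk the preferred node's fallback order and return the first node
 * that still has memory, or -1 if none does. */
static int model_pick_node(const struct model_node *nodes, int preferred)
{
	for (int i = 0; i < MODEL_MAX_NODES; i++) {
		int nid = nodes[preferred].fallback[i];

		if (nodes[nid].has_memory)
			return nid;
	}
	return -1;
}
```

    Because the per-node structure stays in place, a request that names the
    offlined node still has a fallback order to walk and ends up on a
    neighbouring node instead of crashing.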
    
    Thanks to Vlastimil Babka for his live system debugging skills that helped
    debug the issue.
    
    Link: http://lkml.kernel.org/r/20181108100413.966-1-mhocko@kernel.org
    Fixes: e13fe869 ("cpu-hotplug,memory-hotplug: clear cpu_to_node() when offlining the node")
    Signed-off-by: Michal Hocko <mhocko@suse.com>
    Debugged-by: Vlastimil Babka <vbabka@suse.cz>
    Reported-by: Miroslav Benes <mbenes@suse.cz>
    Acked-by: Anshuman Khandual <anshuman.khandual@arm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>