• Srivatsa S. Bhat's avatar
    x86/mce: Rework cmci_rediscover() to play well with CPU hotplug · 7a0c819d
    Srivatsa S. Bhat authored
    Dave Jones reports that offlining a CPU leads to this trace:
    
    numa_remove_cpu cpu 1 node 0: mask now 0,2-3
    smpboot: CPU 1 is now offline
    BUG: using smp_processor_id() in preemptible [00000000] code:
    cpu-offline.sh/10591
    caller is cmci_rediscover+0x6a/0xe0
    Pid: 10591, comm: cpu-offline.sh Not tainted 3.9.0-rc3+ #2
    Call Trace:
     [<ffffffff81333bbd>] debug_smp_processor_id+0xdd/0x100
     [<ffffffff8101edba>] cmci_rediscover+0x6a/0xe0
     [<ffffffff815f5b9f>] mce_cpu_callback+0x19d/0x1ae
     [<ffffffff8160ea66>] notifier_call_chain+0x66/0x150
     [<ffffffff8107ad7e>] __raw_notifier_call_chain+0xe/0x10
     [<ffffffff8104c2e3>] cpu_notify+0x23/0x50
     [<ffffffff8104c31e>] cpu_notify_nofail+0xe/0x20
     [<ffffffff815ef082>] _cpu_down+0x302/0x350
     [<ffffffff815ef106>] cpu_down+0x36/0x50
     [<ffffffff815f1c9d>] store_online+0x8d/0xd0
     [<ffffffff813edc48>] dev_attr_store+0x18/0x30
     [<ffffffff81226eeb>] sysfs_write_file+0xdb/0x150
     [<ffffffff811adfb2>] vfs_write+0xa2/0x170
     [<ffffffff811ae16c>] sys_write+0x4c/0xa0
     [<ffffffff81613019>] system_call_fastpath+0x16/0x1b
    
    However, a look at cmci_rediscover shows that it can be simplified quite
    a bit, apart from solving the above issue. It invokes functions that
    take spin locks with interrupts disabled, and hence it can run in atomic
    context. Also, it is run in the CPU_POST_DEAD phase, so the dying CPU
    is already dead and out of the cpu_online_mask. So take these points into
    account and simplify the code, and thereby also fix the above issue.
    Reported-by: default avatarDave Jones <davej@redhat.com>
    Signed-off-by: default avatarSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
    Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
    7a0c819d
mce.h 6.89 KB