• Thomas Gleixner's avatar
    cpu/hotplug: Prevent crash when CPU bringup fails on CONFIG_HOTPLUG_CPU=n · 206b9235
    Thomas Gleixner authored
    Tianyu reported a crash in a CPU hotplug teardown callback when booting a
    kernel which has CONFIG_HOTPLUG_CPU disabled with the 'nosmt' boot
    parameter.
    
    It turns out that the SMP=y CONFIG_HOTPLUG_CPU=n case has been broken
    forever in case that a bringup callback fails. Unfortunately this issue was
    not recognized when the CPU hotplug code was reworked, so the shortcoming
    just stayed in place.
    
    When a bringup callback fails, the CPU hotplug code rolls back the
    operation and takes the CPU offline.
    
    The 'nosmt' command line argument uses a bringup failure to abort the
    bringup of SMT sibling CPUs. This partial bringup is required due to the
    MCE misdesign on Intel CPUs.
    
    With CONFIG_HOTPLUG_CPU=y the rollback works perfectly fine, but
    CONFIG_HOTPLUG_CPU=n lacks essential mechanisms to exercise the low level
    teardown of a CPU including the synchronizations in various facilities like
    RCU, NOHZ and others.
    
    As a consequence the teardown callbacks which must be executed on the
    outgoing CPU within stop machine with interrupts disabled are executed on
    the control CPU in interrupt enabled and preemptible context causing the
    kernel to crash and burn. The pre state machine code has a different
    failure mode which is more subtle and resulting in a less obvious use after
    free crash because the control side frees resources which are still in use
    by the undead CPU.
    
    But this is not a x86 only problem. Any architecture which supports the
    SMP=y HOTPLUG_CPU=n combination suffers from the same issue. It's just less
    likely to be triggered because in 99.99999% of the cases all bringup
    callbacks succeed.
    
    The easy solution of making HOTPLUG_CPU mandatory for SMP is not working on
    all architectures as the following architectures have either no hotplug
    support at all or not all subarchitectures support it:
    
     alpha, arc, hexagon, openrisc, riscv, sparc (32bit), mips (partial).
    
    Crashing the kernel in such a situation is not an acceptable state
    either.
    
    Implement a minimal rollback variant by limiting the teardown to the point
    where all regular teardown callbacks have been invoked and leave the CPU in
    the 'dead' idle state. This has the following consequences:
    
     - the CPU is brought down to the point where the stop_machine takedown
       would happen.
    
     - the CPU stays there forever and is idle
    
     - The CPU is cleared in the CPU active mask, but not in the CPU online
       mask which is a legit state.
    
     - Interrupts are not forced away from the CPU
    
     - All facilities which only look at online mask would still see it, but
       that is the case during normal hotplug/unplug operations as well. It's
       just a (way) longer time frame.
    
    This will expose issues, which haven't been exposed before or only seldom,
    because now the normally transient state of being non active but online is
    a permanent state. In testing this exposed already an issue vs. work queues
    where the vmstat code schedules work on the almost dead CPU which ends up
    in an unbound workqueue and triggers 'preemtible context' warnings. This is
    not a problem of this change, it merily exposes an already existing issue.
    Still this is better than crashing fully without a chance to debug it.
    
    This is mainly thought as workaround for those architectures which do not
    support HOTPLUG_CPU. All others should enforce HOTPLUG_CPU for SMP.
    
    Fixes: 2e1a3483 ("cpu/hotplug: Split out the state walk into functions")
    Reported-by: default avatarTianyu Lan <Tianyu.Lan@microsoft.com>
    Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
    Tested-by: default avatarTianyu Lan <Tianyu.Lan@microsoft.com>
    Acked-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Konrad Wilk <konrad.wilk@oracle.com>
    Cc: Josh Poimboeuf <jpoimboe@redhat.com>
    Cc: Mukesh Ojha <mojha@codeaurora.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Jiri Kosina <jkosina@suse.cz>
    Cc: Rik van Riel <riel@surriel.com>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Micheal Kelley <michael.h.kelley@microsoft.com>
    Cc: "K. Y. Srinivasan" <kys@microsoft.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: K. Y. Srinivasan <kys@microsoft.com>
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190326163811.503390616@linutronix.de
    206b9235
cpu.c 55 KB