    debug lockups: Improve lockup detection · c1dc0b9c
    Ingo Molnar authored
    When debugging a recent lockup bug I found various deficiencies
    in how our current lockup detection helpers work:
    
     - SysRq-L is not very efficient as it uses a workqueue, hence
       it cannot punch through hard lockups and cannot see through
       most soft lockups either.
    
     - The SysRq-L code depends on the NMI watchdog - which is off
       by default.
    
     - We don't print backtraces from the RCU code's built-in
       'RCU state machine is stuck' debug code. This debug
       code tends to be one of the first (and only) mechanisms
       that show that a lockup has occurred.
    
    This patch changes the code so that we:
    
     - Trigger the NMI backtrace code from SysRq-L instead of using
       a workqueue (which cannot punch through hard lockups)
    
     - Trigger print-all-CPU-backtraces from the RCU lockup detection
       code
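
    As a rough user-space illustration of the two call sites above (not
    the kernel code itself), both the SysRq-L handler and the RCU stall
    check can funnel into one backtrace request. All names here
    (sim_request_backtraces(), sim_sysrq_showallcpus(),
    sim_rcu_check_stall()) and the timeout handling are hypothetical
    stand-ins chosen for the sketch:

```c
#include <stdbool.h>
#include <stdio.h>

/* Counts how often an all-CPU backtrace was requested (simulation only). */
static int sim_backtrace_requests;

/* Hypothetical stand-in for "trigger the NMI backtrace code". */
static void sim_request_backtraces(void)
{
    sim_backtrace_requests++;
    printf("requesting backtraces from all CPUs\n");
}

/* Hypothetical SysRq-L handler: calls the trigger directly, no workqueue. */
static void sim_sysrq_showallcpus(void)
{
    sim_request_backtraces();
}

/*
 * Hypothetical RCU stall check: if the current grace period has been
 * running for at least 'timeout' ticks, report it and ask for backtraces.
 * Returns true when a stall was reported.
 */
static bool sim_rcu_check_stall(unsigned long now, unsigned long gp_start,
                                unsigned long timeout)
{
    if (now - gp_start < timeout)
        return false;

    printf("RCU state machine appears stuck\n");
    sim_request_backtraces();
    return true;
}
```

    Calling sim_sysrq_showallcpus() once and then
    sim_rcu_check_stall(100, 0, 50) would leave sim_backtrace_requests
    at 2; a check made before the timeout expires requests nothing.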
    
    Also decouple the backtrace printing code from the NMI watchdog:
    
     - Don't use variable-size cpumasks (they might not be initialized
       and they are a bit more fragile anyway)
    
     - Trigger an NMI immediately via an IPI, instead of waiting
       for the NMI tick to occur. This is a lot faster and can
       produce more relevant backtraces. It will also work if the
       NMI watchdog is disabled.
    
     - Don't print the 'dazed and confused' message when we print
       a backtrace from the NMI
    
     - Do a show_regs() plus a dump_stack() to get maximum info
       out of the dump. Worst-case we get two stacktraces - which
       is not a big deal. Sometimes, if register content is
       corrupted, the precise stack walker in show_regs() won't
       give us a full backtrace - in this case dump_stack() will
       do it.
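
    The mask handshake described above can be sketched in plain
    user-space C. This is a simplified simulation under stated
    assumptions, not the patch itself: SIM_NR_CPUS, sim_backtrace_mask,
    sim_nmi_handler() and sim_trigger_all_cpu_backtrace() are
    hypothetical names, the "NMI" is an ordinary function call, and the
    register and stack dumps are reduced to a printf():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define SIM_NR_CPUS 4   /* hypothetical fixed CPU count for the simulation */

/* Fixed-size mask, one bit per CPU, statically allocated so it is always valid. */
static unsigned long sim_backtrace_mask;

/*
 * Stand-in for the per-CPU NMI handler. If this CPU's bit is set, the NMI
 * was a backtrace request: dump state, then clear the bit to acknowledge.
 * Returns true if a dump was printed.
 */
static bool sim_nmi_handler(int cpu)
{
    unsigned long bit;

    assert(cpu >= 0 && cpu < SIM_NR_CPUS);
    bit = 1UL << cpu;

    if (!(sim_backtrace_mask & bit))
        return false;   /* not a backtrace request, take the normal NMI path */

    /* The real handler would dump registers and the stack here. */
    printf("NMI backtrace for cpu %d\n", cpu);

    sim_backtrace_mask &= ~bit;
    return true;
}

/*
 * Stand-in for the requester: mark every CPU, deliver the "NMI" right away
 * instead of waiting for a periodic tick, and count the CPUs that answered.
 */
static int sim_trigger_all_cpu_backtrace(void)
{
    int cpu, dumped = 0;

    sim_backtrace_mask = (1UL << SIM_NR_CPUS) - 1;

    for (cpu = 0; cpu < SIM_NR_CPUS; cpu++)
        if (sim_nmi_handler(cpu))
            dumped++;

    /* A real requester would wait, with a bound, for the mask to empty. */
    return dumped;
}
```

    Because the handler only checks and clears its own bit, a request
    does not depend on the NMI watchdog being enabled, and an NMI that
    arrives with the bit clear falls through to normal handling.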
    
    Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    LKML-Reference: <new-submission>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
nmi.c 13.1 KB