    [PATCH] check nmi watchdog is broken · 67701ae9
    Jack F Vogel authored
    A bug against an xSeries system showed up recently noting that the
    check_nmi_watchdog() test was failing.
    
    I have been investigating it and discovered that, on both i386 and x86_64,
    the recent change to the routine to use the cpu_callin_map has uncovered a
    problem.  Prior to that change, on an SMP box, the test was trivially
    passing because all CPUs were found to not yet be online.  Now that the
    callin_map shows them as present, the test goes on to check the NMI
    counters, which have not yet begun to increment, so it announces that a
    CPU is stuck and bails out.
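
The failure mode described above can be illustrated with a small userspace model. This is only a sketch, not the kernel routine: the struct, the function names, the CPU count and the tick threshold (`MODEL_NR_CPUS`, `MODEL_MIN_TICKS`) are all hypothetical, and the "wait" is replaced by a parameter saying how many NMIs arrive during it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS   4   /* hypothetical CPU count for the model */
#define MODEL_MIN_TICKS 5   /* hypothetical "is it counting?" threshold */

/* Model state: a per-CPU NMI counter and a per-CPU "called in" flag. */
struct nmi_model {
    unsigned int nmi_count[MODEL_NR_CPUS];
    bool called_in[MODEL_NR_CPUS];
};

/* Mark one CPU as visible to the check (stand-in for the callin map). */
static void model_mark_called_in(struct nmi_model *m, size_t cpu)
{
    assert(cpu < MODEL_NR_CPUS);
    m->called_in[cpu] = true;
}

/*
 * Snapshot the counters, let `ticks_during_wait` NMIs arrive on every CPU,
 * then compare. Returns the number of CPUs that would be reported stuck.
 * CPUs that are not marked as called in are skipped, so they pass trivially.
 */
static int model_check_watchdog(struct nmi_model *m,
                                unsigned int ticks_during_wait)
{
    unsigned int prev[MODEL_NR_CPUS];
    int stuck = 0;
    size_t cpu;

    for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        prev[cpu] = m->nmi_count[cpu];

    /* Stand-in for the delay during which NMIs should be delivered. */
    for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        m->nmi_count[cpu] += ticks_during_wait;

    for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        if (!m->called_in[cpu])
            continue;
        if (m->nmi_count[cpu] - prev[cpu] <= MODEL_MIN_TICKS)
            stuck++;
    }
    return stuck;
}
```

With no CPU marked as called in, the model reports nothing stuck even when no NMIs arrive, which corresponds to the old trivial pass. With every CPU marked and zero ticks during the wait, every CPU is reported stuck. With every CPU marked and enough ticks, the check passes.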
    
    On all the systems I have access to for testing, the announcement of
    failure is also bogus: by the time you can log in and check
    /proc/interrupts, the NMI count is happily incrementing on all CPUs.  It's
    just that the test is being done too early.
    
    I have tried moving the call to the test around a bit, and it was always
    too early.  I finally hit on this proposed solution: it delays the routine
    via a late_initcall(), which seems like the right solution to me.
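
Why deferring the check helps can be sketched with a userspace model of level-ordered init calls. Everything here is an assumption made for illustration: the level names and numbers, the `start_nmi_source()` step and the level it runs at are hypothetical, and the code does not use the kernel's real initcall machinery. The only point it makes is that a check registered at the last level runs after whatever starts the counters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical levels; only their relative order matters in this model. */
enum model_level { MODEL_LEVEL_ARCH = 3, MODEL_LEVEL_DEVICE = 6, MODEL_LEVEL_LATE = 7 };

typedef int (*model_initcall_fn)(void);

struct model_initcall {
    enum model_level level;
    model_initcall_fn fn;
};

static int nmi_ticking;        /* 1 once the modeled NMI source is running */
static int check_result = -1;  /* -1 = not run, 0 = pass, 1 = "stuck" */

/* Hypothetical step that gets the NMI counters moving. */
static int start_nmi_source(void)
{
    nmi_ticking = 1;
    return 0;
}

/* Modeled check: it passes only if the source is already ticking. */
static int check_watchdog(void)
{
    check_result = nmi_ticking ? 0 : 1;
    return 0;
}

/* Run the calls level by level, lowest level first. */
static void run_initcalls(const struct model_initcall *calls, size_t n)
{
    int lvl;
    size_t i;

    assert(calls != NULL || n == 0);
    for (lvl = 0; lvl <= MODEL_LEVEL_LATE; lvl++)
        for (i = 0; i < n; i++)
            if ((int)calls[i].level == lvl)
                calls[i].fn();
}
```

Registering `check_watchdog` at `MODEL_LEVEL_ARCH` runs it before `start_nmi_source` and yields a "stuck" result in this model. Registering it at `MODEL_LEVEL_LATE` runs it afterwards, and the check passes.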
    Signed-off-by: Adrian Bunk <bunk@stusta.de>
    Cc: Andi Kleen <ak@muc.de>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
nmi.c 11.1 KB