    perf, x86: Try to handle unknown nmis with an enabled PMU · 4177c42a
    Robert Richter authored
    When the PMU is enabled it is valid to have unhandled nmis: two
    events could trigger 'simultaneously', raising two back-to-back
    NMIs. If the first NMI handles both, the latter will be empty
    and daze the CPU.
    
    The solution to avoid an 'unknown nmi' message in this case was
    simply to stop the nmi handler chain when the PMU is enabled by
    stating the nmi was handled. This has the drawback that a) we
    cannot detect unknown nmis anymore, and b) subsequent nmi
    handlers are not called.
    
    This patch addresses both drawbacks. Now we check whether an
    unknown NMI could be a PMU back-to-back NMI. If not, we pass it
    on and let the kernel handle the unknown nmi.
    
    This is a debug log:
    
     cpu #6, nmi #32333, skip_nmi #32330, handled = 1, time = 1934364430
     cpu #6, nmi #32334, skip_nmi #32330, handled = 1, time = 1934704616
     cpu #6, nmi #32335, skip_nmi #32336, handled = 2, time = 1936032320
     cpu #6, nmi #32336, skip_nmi #32336, handled = 0, time = 1936034139
     cpu #6, nmi #32337, skip_nmi #32336, handled = 1, time = 1936120100
     cpu #6, nmi #32338, skip_nmi #32336, handled = 1, time = 1936404607
     cpu #6, nmi #32339, skip_nmi #32336, handled = 1, time = 1937983416
     cpu #6, nmi #32340, skip_nmi #32341, handled = 2, time = 1938201032
     cpu #6, nmi #32341, skip_nmi #32341, handled = 0, time = 1938202830
     cpu #6, nmi #32342, skip_nmi #32341, handled = 1, time = 1938443743
     cpu #6, nmi #32343, skip_nmi #32341, handled = 1, time = 1939956552
     cpu #6, nmi #32344, skip_nmi #32341, handled = 1, time = 1940073224
     cpu #6, nmi #32345, skip_nmi #32341, handled = 1, time = 1940485677
     cpu #6, nmi #32346, skip_nmi #32347, handled = 2, time = 1941947772
     cpu #6, nmi #32347, skip_nmi #32347, handled = 1, time = 1941949818
     cpu #6, nmi #32348, skip_nmi #32347, handled = 0, time = 1941951591
     Uhhuh. NMI received for unknown reason 00 on CPU 6.
     Do you have a strange power saving mode enabled?
     Dazed and confused, but trying to continue
    
    Deltas:
    
     nmi #32334 340186
     nmi #32335 1327704
     nmi #32336 1819      <<<< back-to-back nmi [1]
     nmi #32337 85961
     nmi #32338 284507
     nmi #32339 1578809
     nmi #32340 217616
     nmi #32341 1798      <<<< back-to-back nmi [2]
     nmi #32342 240913
     nmi #32343 1512809
     nmi #32344 116672
     nmi #32345 412453
     nmi #32346 1462095   <<<< 1st nmi (standard) handling 2 counters
     nmi #32347 2046      <<<< 2nd nmi (back-to-back) handling one counter
     nmi #32348 1773      <<<< 3rd nmi (back-to-back) handling no counter! [3]
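
The deltas above are the differences between consecutive `time` values in the debug log. A minimal sketch of that arithmetic follows; `nmi_deltas` is a hypothetical helper written for illustration and is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: given n timestamps from the debug log, store the
 * difference between each pair of consecutive timestamps in out[] and
 * return the number of deltas written (n - 1, or 0 if n < 2).
 */
static size_t nmi_deltas(const unsigned long long *t, size_t n,
                         unsigned long long *out)
{
    size_t i;

    if (n < 2)
        return 0;

    for (i = 0; i + 1 < n; i++) {
        /* The log is ordered by time, so deltas are never negative. */
        assert(t[i + 1] >= t[i]);
        out[i] = t[i + 1] - t[i];
    }
    return n - 1;
}
```

Feeding it the timestamps of nmi #32333 to #32336 reproduces the first three deltas listed above, including the short gap that marks back-to-back nmi [1].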
    
    For back-to-back nmi detection there are the following rules:
    
    The PMU nmi handler was handling more than one counter and no
    counter was handled in the subsequent nmi (see [1] and [2]
    above).
    
    There is another case if there are two subsequent back-to-back
    nmis [3]. The 2nd is detected as back-to-back because the first
    handled more than one counter. If the second handles one counter
    and the 3rd handles nothing, we drop the 3rd nmi because it
    could be a back-to-back nmi.
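
The two rules can be sketched as a small user-space state machine. This is an illustration under stated assumptions, not the kernel code from the patch: `struct pmu_nmi_state`, `enum nmi_verdict`, and `classify_nmi` are hypothetical names, and the NMI number and handled-counter count are passed in as plain arguments rather than read from per-CPU data.

```c
#include <assert.h>

/* Hypothetical per-CPU bookkeeping for the sketch. */
struct pmu_nmi_state {
    unsigned int marked; /* number of the NMI that may be dropped if empty */
    int handled;         /* counters handled by the NMI that set 'marked' */
};

enum nmi_verdict { NMI_HANDLED, NMI_SWALLOWED, NMI_UNKNOWN };

/*
 * Classify NMI number 'this_nmi', in which the PMU handler serviced
 * 'handled' counters, and update the bookkeeping for the next NMI.
 */
static enum nmi_verdict classify_nmi(struct pmu_nmi_state *st,
                                     unsigned int this_nmi, int handled)
{
    assert(handled >= 0);

    if (handled == 0) {
        /* Empty NMI: drop it only if the previous NMI marked it. */
        return st->marked == this_nmi ? NMI_SWALLOWED : NMI_UNKNOWN;
    }

    /*
     * Rule 1: more than one counter handled, so the next NMI may be
     * an empty back-to-back NMI.
     * Rule 2: this NMI was itself marked by an NMI that handled more
     * than one counter, so the mark is carried over to the next one.
     */
    if (handled > 1 || (st->marked == this_nmi && st->handled > 1)) {
        st->marked = this_nmi + 1;
        st->handled = handled;
    }
    return NMI_HANDLED;
}
```

Replaying case [3] through this sketch (nmi #32346 handling two counters, #32347 handling one, #32348 handling none) drops the third nmi instead of reporting it as unknown, while a further empty nmi would still be passed on as unknown.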
    
    Signed-off-by: Robert Richter <robert.richter@amd.com>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    [ renamed nmi variable to pmu_nmi to avoid clash with .nmi in entry.S ]
    Signed-off-by: Don Zickus <dzickus@redhat.com>
    Cc: peterz@infradead.org
    Cc: gorcunov@gmail.com
    Cc: fweisbec@gmail.com
    Cc: ying.huang@intel.com
    Cc: ming.m.lin@intel.com
    Cc: eranian@google.com
    LKML-Reference: <1283454469-1909-3-git-send-email-dzickus@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>