    x86/perf/amd: Resolve NMI latency issues for active PMCs · 6d3edaae
    Lendacky, Thomas authored
    On AMD processors, the detection of an overflowed PMC counter in the NMI
    handler relies on the current value of the PMC. So, for example, to check
    for overflow on a 48-bit counter, bit 47 is checked to see if it is 1 (not
    overflowed) or 0 (overflowed).
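
    As a minimal sketch of that check (not the kernel's code), the test can
    be written as below. PMC_WIDTH, pmc_overflowed() and pmc_start_value()
    are hypothetical names used only for illustration. The sketch also
    assumes the counter is started at the negated sample period truncated to
    the counter width, which is why bit 47 reads as 1 until the counter wraps.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed counter width for this sketch; the example above uses 48 bits. */
#define PMC_WIDTH 48
#define PMC_MASK  ((1ULL << PMC_WIDTH) - 1)

/*
 * Hypothetical helper: start value for a given sample period.
 * Assumption: the counter is programmed with -period, truncated to the
 * counter width, so it counts up and wraps to 0 after `period` events.
 */
static uint64_t pmc_start_value(uint64_t period)
{
	return (0 - period) & PMC_MASK;
}

/*
 * Hypothetical helper: report overflow from the current counter value.
 * Top bit (bit 47) set   -> still counting up, not overflowed.
 * Top bit (bit 47) clear -> the counter has wrapped, overflowed.
 */
static bool pmc_overflowed(uint64_t value)
{
	return (value & (1ULL << (PMC_WIDTH - 1))) == 0;
}
```

    Because the check looks only at the current value, an NMI that arrives
    after the overflow has already been handled and the counter reloaded
    sees bit 47 set again and finds nothing to do.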
    
    When the perf NMI handler executes it does not know in advance which PMC
    counters have overflowed. As such, the NMI handler will process all active
    PMC counters that have overflowed. NMI latency in newer AMD processors can
    result in multiple overflowed PMC counters being processed in one NMI and
    then a subsequent NMI, that does not appear to be a back-to-back NMI, not
    finding any PMC counters that have overflowed. This may appear to be an
    unhandled NMI resulting in either a panic or a series of messages,
    depending on how the kernel was configured.
    
    To mitigate this issue, add an AMD handle_irq callback function,
    amd_pmu_handle_irq(), that will invoke the common x86_pmu_handle_irq()
    function and upon return perform some additional processing that will
    indicate if the NMI has been handled or would have been handled had an
    earlier NMI not handled the overflowed PMC. Using a per-CPU variable, the
    minimum of the number of active PMCs and 2 will be set whenever a PMC is
    active. This is used to indicate the possible number of NMIs that can
    still occur. The value of 2 is used for when an NMI does not arrive at
    the LAPIC in time to be collapsed into an already pending NMI. Each time
    the function is called without having handled an overflowed counter, the
    per-CPU value is checked. If the value is non-zero, it is decremented and
    the handler indicates that it handled the NMI. If the value is zero, then
    the handler indicates that it did not handle the NMI.
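
    The bookkeeping described above can be modeled in user space roughly as
    follows. This is a sketch under stated assumptions, not the patch itself:
    nmi_budget stands in for the per-CPU variable on a single CPU,
    model_handle_nmi() is a hypothetical name, and the model assumes the
    value is refreshed whenever an NMI actually handles an overflow.

```c
#include <stdbool.h>

/* Stand-in for the per-CPU variable (this sketch models a single CPU). */
static unsigned int nmi_budget;

/*
 * Hypothetical model of the processing done after the common handler runs.
 *
 * handled: number of overflowed PMCs the common handler processed
 * active:  number of PMCs currently active on this CPU
 *
 * Returns true if the NMI should be reported as handled.
 */
static bool model_handle_nmi(unsigned int handled, unsigned int active)
{
	if (handled) {
		/* Assumption: refresh the budget to min(active, 2). */
		nmi_budget = active < 2 ? active : 2;
		return true;
	}

	/* Nothing overflowed and no late NMIs are expected: not ours. */
	if (nmi_budget == 0)
		return false;

	/* Claim this NMI as one an earlier NMI already took care of. */
	nmi_budget--;
	return true;
}
```

    With three active PMCs, one NMI that handles two overflows sets the value
    to 2. The next two NMIs that find nothing are still claimed, and a third
    is reported as not handled.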

    Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: <stable@vger.kernel.org> # 4.14.x-
    Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Stephane Eranian <eranian@google.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vince Weaver <vincent.weaver@maine.edu>
    Link: https://lkml.kernel.org/r/Message-ID:
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
core.c 22.1 KB