• Kan Liang's avatar
    perf/x86/intel: Limit the period on Haswell · 25dfc9e3
    Kan Liang authored
    Running the ltp test cve-2015-3290 concurrently reports the following
    warnings.
    
    perfevents: irq loop stuck!
      WARNING: CPU: 31 PID: 32438 at arch/x86/events/intel/core.c:3174
      intel_pmu_handle_irq+0x285/0x370
      Call Trace:
       <NMI>
       ? __warn+0xa4/0x220
       ? intel_pmu_handle_irq+0x285/0x370
       ? __report_bug+0x123/0x130
       ? intel_pmu_handle_irq+0x285/0x370
       ? __report_bug+0x123/0x130
       ? intel_pmu_handle_irq+0x285/0x370
       ? report_bug+0x3e/0xa0
       ? handle_bug+0x3c/0x70
       ? exc_invalid_op+0x18/0x50
       ? asm_exc_invalid_op+0x1a/0x20
       ? irq_work_claim+0x1e/0x40
       ? intel_pmu_handle_irq+0x285/0x370
       perf_event_nmi_handler+0x3d/0x60
       nmi_handle+0x104/0x330
    
    Thanks to Thomas Gleixner's analysis, the issue is caused by the low
    initial period (1) of the frequency estimation algorithm, which triggers
    the defects of the HW, specifically erratum HSW11 and HSW143. (For the
    details, please refer https://lore.kernel.org/lkml/87plq9l5d2.ffs@tglx/)
    
    The HSW11 requires a period larger than 100 for the INST_RETIRED.ALL
    event, but the initial period in the freq mode is 1. The erratum is the
    same as the BDM11, which has been supported in the kernel. A minimum
    period of 128 is enforced as well on HSW.
    
    HSW143 is regarding that the fixed counter 1 may overcount 32 with the
    Hyper-Threading is enabled. However, based on the test, the hardware
    has more issues than it tells. Besides the fixed counter 1, the message
    'interrupt took too long' can be observed on any counter which was armed
    with a period < 32 and two events expired in the same NMI. A minimum
    period of 32 is enforced for the rest of the events.
    The recommended workaround code of the HSW143 is not implemented.
    Because it only addresses the issue for the fixed counter. It brings
    extra overhead through extra MSR writing. No related overcounting issue
    has been reported so far.
    
    Fixes: 3a632cb2 ("perf/x86/intel: Add simple Haswell PMU support")
    Reported-by: default avatarLi Huafei <lihuafei1@huawei.com>
    Suggested-by: default avatarThomas Gleixner <tglx@linutronix.de>
    Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
    Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/all/20240819183004.3132920-1-kan.liang@linux.intel.com
    Closes: https://lore.kernel.org/lkml/20240729223328.327835-1-lihuafei1@huawei.com/
    25dfc9e3
core.c 208 KB