    x86: Remove vendor checks from prefer_mwait_c1_over_halt · aebef63c
    Wyes Karny authored
    Remove vendor checks from the prefer_mwait_c1_over_halt() function. Restore
    the decision tree that selects MWAIT C1 as the default idle state based on
    CPUID checks, as done by Thomas Gleixner in
    commit 09fd4b4e ("x86: use cpuid to check MWAIT support for C1").
    
    The decision tree was removed in
    commit 69fb3676 ("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param")
    
    Prefer MWAIT when the following conditions are satisfied:
        1. CPUID_Fn00000001_ECX [Monitor] should be set
        2. CPUID_Fn00000005 should be supported
        3. If CPUID_Fn00000005_ECX [EMX] is set then there should be
           at least one C1 substate available, indicated by
           CPUID_Fn00000005_EDX [MWaitC1SubStates] bits.
    
    Otherwise, use HLT for the default_idle function.
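
    The three conditions above can be sketched as a standalone predicate.
    This is a minimal illustration, not the kernel code: the struct
    cpuid_snapshot type, the macro names, and prefer_mwait_c1() are
    hypothetical names chosen for this example, and the kernel-side checks
    that are not part of the list above are left out. The bit positions
    are the architecturally documented ones: leaf 0x1 ECX bit 3 (Monitor),
    leaf 0x5 ECX bit 0 (EMX), and leaf 0x5 EDX bits 7:4 (MWaitC1SubStates).

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit definitions for the CPUID fields named in the conditions above. */
#define CPUID1_ECX_MONITOR       (1u << 3)  /* Fn00000001_ECX[Monitor] */
#define CPUID5_ECX_EMX           (1u << 0)  /* Fn00000005_ECX[EMX] */
#define CPUID5_EDX_C1_SUBSTATES  0xf0u      /* Fn00000005_EDX[7:4] */
#define CPUID_MWAIT_LEAF         0x5u

/* Hypothetical container for the raw CPUID values the decision needs. */
struct cpuid_snapshot {
    uint32_t max_basic_leaf;  /* EAX returned by CPUID leaf 0x0 */
    uint32_t leaf1_ecx;       /* ECX returned by CPUID leaf 0x1 */
    uint32_t leaf5_ecx;       /* ECX returned by CPUID leaf 0x5 */
    uint32_t leaf5_edx;       /* EDX returned by CPUID leaf 0x5 */
};

/* Returns true when MWAIT C1 should be preferred over HLT. */
static bool prefer_mwait_c1(const struct cpuid_snapshot *c)
{
    /* 1. MONITOR/MWAIT must be advertised. */
    if (!(c->leaf1_ecx & CPUID1_ECX_MONITOR))
        return false;

    /* 2. The MWAIT leaf itself must be supported. */
    if (c->max_basic_leaf < CPUID_MWAIT_LEAF)
        return false;

    /* 3a. Without MWAIT extensions, plain MWAIT (EAX=0, ECX=0) is usable. */
    if (!(c->leaf5_ecx & CPUID5_ECX_EMX))
        return true;

    /* 3b. With extensions, require at least one C1 sub-state. */
    return (c->leaf5_edx & CPUID5_EDX_C1_SUBSTATES) != 0;
}
```

    Taking the register values as plain integers keeps the predicate
    testable on any machine. On x86 with GCC or Clang, the inputs could
    be filled in from userspace with __get_cpuid_max() and __get_cpuid()
    from <cpuid.h>.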
    
    HPC customers who want to optimize for lower latency are known to
    disable Global C-States in the BIOS. In fact, some vendors allow
    choosing a BIOS 'performance' profile which explicitly disables
    C-States.  In this scenario, the cpuidle driver will not be loaded and
    the kernel will continue with the default idle state chosen at boot
    time. On AMD systems, the default idle state is currently HLT, which has
    a higher exit latency than MWAIT.
    
    The reasons for the choice of HLT over MWAIT on AMD systems are:
    
    1. Families prior to 10h didn't support MWAIT
    2. Families 10h-15h supported MWAIT, but not MWAIT C1. Hence it was
       preferable to use HLT as the default state on these systems.
    
    However, AMD Family 17h and later support MWAIT as well as MWAIT C1, so
    it is preferable to use MWAIT as the default idle state on these
    systems, as it has lower exit latencies.
    
    The table below shows the exit latency for HLT and MWAIT on an AMD
    Zen 3 system. Exit latency is measured by issuing a wakeup (IPI) to
    another CPU and measuring how many clock cycles it took to wake up. Each
    iteration measures 10K wakeups by pinning source and destination.
    
    HLT:
    
    25.0000th percentile  :      1900 ns
    50.0000th percentile  :      2000 ns
    75.0000th percentile  :      2300 ns
    90.0000th percentile  :      2500 ns
    95.0000th percentile  :      2600 ns
    99.0000th percentile  :      2800 ns
    99.5000th percentile  :      3000 ns
    99.9000th percentile  :      3400 ns
    99.9500th percentile  :      3600 ns
    99.9900th percentile  :      5900 ns
      Min latency         :      1700 ns
      Max latency         :      5900 ns
    Total Samples      9999
    
    MWAIT:
    
    25.0000th percentile  :      1400 ns
    50.0000th percentile  :      1500 ns
    75.0000th percentile  :      1700 ns
    90.0000th percentile  :      1800 ns
    95.0000th percentile  :      1900 ns
    99.0000th percentile  :      2300 ns
    99.5000th percentile  :      2500 ns
    99.9000th percentile  :      3200 ns
    99.9500th percentile  :      3500 ns
    99.9900th percentile  :      4600 ns
      Min latency         :      1200 ns
      Max latency         :      4600 ns
    Total Samples      9997
    
    Improvement (99th percentile): 21.74%
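
    The quoted percentage is consistent with expressing the gap between
    the two 99th-percentile values relative to the smaller one. The helper
    below is only a sketch of that arithmetic; the formula is inferred
    from the numbers rather than stated in this message, and pct_gap() is
    a hypothetical name.

```c
/*
 * Gap between two measurements, as a percentage of the smaller one.
 * Assumed formula: (larger - smaller) / smaller * 100.
 */
static double pct_gap(double larger, double smaller)
{
    return (larger - smaller) / smaller * 100.0;
}
```

    With the 99th-percentile latencies above, pct_gap(2800.0, 2300.0)
    evaluates to roughly 21.74.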
    
    Below is another result, from the context_switch2 micro-benchmark, which
    brings out the impact of improved wakeup latency through increased
    context-switches per second.
    
    with HLT:
    -------------------------------
    50.0000th percentile  :  190184
    75.0000th percentile  :  191032
    90.0000th percentile  :  192314
    95.0000th percentile  :  192520
    99.0000th percentile  :  192844
    MIN  :  190148
    MAX  :  192852
    
    with MWAIT:
    -------------------------------
    50.0000th percentile  :  277444
    75.0000th percentile  :  278268
    90.0000th percentile  :  278888
    95.0000th percentile  :  279164
    99.0000th percentile  :  280504
    MIN  :  273278
    MAX  :  281410
    
    Improvement (99th percentile): ~45.46%

    Signed-off-by: Wyes Karny <wyes.karny@amd.com>
    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Tested-by: Zhang Rui <rui.zhang@intel.com>
    Link: https://ozlabs.org/~anton/junkcode/context_switch2.c
    Link: https://lkml.kernel.org/r/0cc675d8fd1f55e41b510e10abf2e21b6e9803d5.1654538381.git-series.wyes.karny@amd.com