x86: Remove vendor checks from prefer_mwait_c1_over_halt
Remove vendor checks from prefer_mwait_c1_over_halt function. Restore the decision tree to support MWAIT C1 as the default idle state based on CPUID checks as done by Thomas Gleixner in commit 09fd4b4e ("x86: use cpuid to check MWAIT support for C1") The decision tree is removed in commit 69fb3676 ("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param") Prefer MWAIT when the following conditions are satisfied: 1. CPUID_Fn00000001_ECX [Monitor] should be set 2. CPUID_Fn00000005 should be supported 3. If CPUID_Fn00000005_ECX [EMX] is set then there should be at least one C1 substate available, indicated by CPUID_Fn00000005_EDX [MWaitC1SubStates] bits. Otherwise use HLT for default_idle function. HPC customers who want to optimize for lower latency are known to disable Global C-States in the BIOS. In fact, some vendors allow choosing a BIOS 'performance' profile which explicitly disables C-States. In this scenario, the cpuidle driver will not be loaded and the kernel will continue with the default idle state chosen at boot time. On AMD systems currently the default idle state is HLT which has a higher exit latency compared to MWAIT. The reason for the choice of HLT over MWAIT on AMD systems is: 1. Families prior to 10h didn't support MWAIT 2. Families 10h-15h supported MWAIT, but not MWAIT C1. Hence it was preferable to use HLT as the default state on these systems. However, AMD Family 17h onwards supports MWAIT as well as MWAIT C1. And it is preferable to use MWAIT as the default idle state on these systems, as it has lower exit latencies. The below table represents the exit latency for HLT and MWAIT on AMD Zen 3 system. Exit latency is measured by issuing a wakeup (IPI) to other CPU and measuring how many clock cycles it took to wakeup. Each iteration measures 10K wakeups by pinning source and destination. HLT: 25.0000th percentile : 1900 ns 50.0000th percentile : 2000 ns 75.0000th percentile : 2300 ns 90.0000th percentile : 2500 ns 95.0000th percentile : 2600 ns 99.0000th percentile : 2800 ns 99.5000th percentile : 3000 ns 99.9000th percentile : 3400 ns 99.9500th percentile : 3600 ns 99.9900th percentile : 5900 ns Min latency : 1700 ns Max latency : 5900 ns Total Samples 9999 MWAIT: 25.0000th percentile : 1400 ns 50.0000th percentile : 1500 ns 75.0000th percentile : 1700 ns 90.0000th percentile : 1800 ns 95.0000th percentile : 1900 ns 99.0000th percentile : 2300 ns 99.5000th percentile : 2500 ns 99.9000th percentile : 3200 ns 99.9500th percentile : 3500 ns 99.9900th percentile : 4600 ns Min latency : 1200 ns Max latency : 4600 ns Total Samples 9997 Improvement (99th percentile): 21.74% Below is another result for context_switch2 micro-benchmark, which brings out the impact of improved wakeup latency through increased context-switches per second. with HLT: ------------------------------- 50.0000th percentile : 190184 75.0000th percentile : 191032 90.0000th percentile : 192314 95.0000th percentile : 192520 99.0000th percentile : 192844 MIN : 190148 MAX : 192852 with MWAIT: ------------------------------- 50.0000th percentile : 277444 75.0000th percentile : 278268 90.0000th percentile : 278888 95.0000th percentile : 279164 99.0000th percentile : 280504 MIN : 273278 MAX : 281410 Improvement(99th percentile): ~ 45.46% Signed-off-by: Wyes Karny <wyes.karny@amd.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Zhang Rui <rui.zhang@intel.com> Link: https://ozlabs.org/~anton/junkcode/context_switch2.c Link: https://lkml.kernel.org/r/0cc675d8fd1f55e41b510e10abf2e21b6e9803d5.1654538381.git-series.wyes.karny@amd.com
Showing
Please register or sign in to comment