1. 08 Jun, 2022 2 commits
    • x86: Remove vendor checks from prefer_mwait_c1_over_halt · aebef63c
      Wyes Karny authored
      Remove the vendor checks from the prefer_mwait_c1_over_halt() function
      and restore the decision tree that selects MWAIT C1 as the default idle
      state based on CPUID checks, as done by Thomas Gleixner in
      commit 09fd4b4e ("x86: use cpuid to check MWAIT support for C1").
      
      That decision tree was removed in
      commit 69fb3676 ("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param")
      
      Prefer MWAIT when the following conditions are satisfied:
          1. CPUID_Fn00000001_ECX [Monitor] should be set
          2. CPUID_Fn00000005 should be supported
          3. If CPUID_Fn00000005_ECX [EMX] is set then there should be
             at least one C1 substate available, indicated by
             CPUID_Fn00000005_EDX [MWaitC1SubStates] bits.
      
      Otherwise, use HLT for the default_idle() function.
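      The three conditions above can be sketched as a pure function over the raw
      CPUID register values. This is a minimal illustration, not the kernel's
      actual code: the function name, signature, and macro names are simplified
      stand-ins, while the bit positions follow the CPUID leaf definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_ECX_MONITOR       (1u << 3)    /* CPUID_Fn00000001_ECX[Monitor] */
#define CPUID5_ECX_EMX           (1u << 0)    /* CPUID_Fn00000005_ECX[EMX] */
#define CPUID5_EDX_C1_SUBSTATES  0x000000f0u  /* CPUID_Fn00000005_EDX bits 7:4 */

/* Decide whether MWAIT C1 should be preferred over HLT, given the maximum
 * supported standard CPUID leaf and the relevant register values. */
static bool prefer_mwait_c1(uint32_t max_leaf, uint32_t leaf1_ecx,
                            uint32_t leaf5_ecx, uint32_t leaf5_edx)
{
    if (!(leaf1_ecx & CPUID1_ECX_MONITOR))  /* 1. Monitor bit must be set */
        return false;
    if (max_leaf < 5)                       /* 2. leaf 5 must be supported */
        return false;
    if (leaf5_ecx & CPUID5_ECX_EMX)         /* 3. if EMX is set, require a */
        return (leaf5_edx & CPUID5_EDX_C1_SUBSTATES) != 0; /* C1 substate */
    return true;                            /* EMX clear: conditions 1+2 suffice */
}
```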
      
      HPC customers who want to optimize for lower latency are known to
      disable Global C-States in the BIOS. In fact, some vendors allow
      choosing a BIOS 'performance' profile which explicitly disables
      C-States.  In this scenario, the cpuidle driver will not be loaded and
      the kernel will continue with the default idle state chosen at boot
      time. On AMD systems currently the default idle state is HLT which has
      a higher exit latency compared to MWAIT.
      
      The reason for the choice of HLT over MWAIT on AMD systems is:
      
      1. Families prior to 10h didn't support MWAIT
      2. Families 10h-15h supported MWAIT, but not MWAIT C1. Hence it was
         preferable to use HLT as the default state on these systems.
      
      However, AMD Family 17h onwards supports MWAIT as well as MWAIT C1. And
      it is preferable to use MWAIT as the default idle state on these
      systems, as it has lower exit latencies.
      
      The table below shows the exit latency for HLT and MWAIT on an AMD
      Zen 3 system. Exit latency is measured by issuing a wakeup (IPI) to
      another CPU and counting how many clock cycles it takes to wake up.
      Each iteration measures 10K wakeups with source and destination pinned.
      
      HLT:
      
      25.0000th percentile  :      1900 ns
      50.0000th percentile  :      2000 ns
      75.0000th percentile  :      2300 ns
      90.0000th percentile  :      2500 ns
      95.0000th percentile  :      2600 ns
      99.0000th percentile  :      2800 ns
      99.5000th percentile  :      3000 ns
      99.9000th percentile  :      3400 ns
      99.9500th percentile  :      3600 ns
      99.9900th percentile  :      5900 ns
        Min latency         :      1700 ns
        Max latency         :      5900 ns
      Total Samples      9999
      
      MWAIT:
      
      25.0000th percentile  :      1400 ns
      50.0000th percentile  :      1500 ns
      75.0000th percentile  :      1700 ns
      90.0000th percentile  :      1800 ns
      95.0000th percentile  :      1900 ns
      99.0000th percentile  :      2300 ns
      99.5000th percentile  :      2500 ns
      99.9000th percentile  :      3200 ns
      99.9500th percentile  :      3500 ns
      99.9900th percentile  :      4600 ns
        Min latency         :      1200 ns
        Max latency         :      4600 ns
      Total Samples      9997
      
      Improvement (99th percentile): 21.74%
      
      Below is another result, for the context_switch2 micro-benchmark, which
      shows the impact of the improved wakeup latency through increased
      context switches per second.
      
      with HLT:
      -------------------------------
      50.0000th percentile  :  190184
      75.0000th percentile  :  191032
      90.0000th percentile  :  192314
      95.0000th percentile  :  192520
      99.0000th percentile  :  192844
      MIN  :  190148
      MAX  :  192852
      
      with MWAIT:
      -------------------------------
      50.0000th percentile  :  277444
      75.0000th percentile  :  278268
      90.0000th percentile  :  278888
      95.0000th percentile  :  279164
      99.0000th percentile  :  280504
      MIN  :  273278
      MAX  :  281410
      
      Improvement (99th percentile): ~45.46%
      
      Signed-off-by: Wyes Karny <wyes.karny@amd.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Zhang Rui <rui.zhang@intel.com>
      Link: https://ozlabs.org/~anton/junkcode/context_switch2.c
      Link: https://lkml.kernel.org/r/0cc675d8fd1f55e41b510e10abf2e21b6e9803d5.1654538381.git-series.wyes.karny@amd.com
    • x86: Handle idle=nomwait cmdline properly for x86_idle · 8bcedb4c
      Wyes Karny authored
      When the kernel is booted with idle=nomwait, do not use MWAIT as the
      default idle state.
      
      If the user boots the kernel with idle=nomwait, it is a clear
      direction not to use MWAIT as the default idle state.
      However, the current code does not take this into consideration
      when selecting the default idle state on x86.
      
      Fix it by checking for the idle=nomwait boot option in
      prefer_mwait_c1_over_halt().
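      The shape of the fix can be sketched as a guard at the top of the
      selection function. The names IDLE_NOMWAIT and boot_option_idle_override
      mirror the kernel's idle-override machinery, but the surrounding code
      here is a simplified, hypothetical stand-in, not the actual patch:

```c
#include <assert.h>
#include <stdbool.h>

enum idle_override { IDLE_NO_OVERRIDE, IDLE_NOMWAIT };

/* Set during early boot when "idle=nomwait" is parsed from the cmdline. */
static enum idle_override boot_option_idle_override = IDLE_NO_OVERRIDE;

/* Stand-in for the CPUID-based checks described in the previous commit. */
static bool mwait_c1_supported(void)
{
    return true;
}

static bool prefer_mwait_c1_over_halt(void)
{
    /* The fix: idle=nomwait overrides any CPUID-based preference. */
    if (boot_option_idle_override == IDLE_NOMWAIT)
        return false;
    return mwait_c1_supported();
}
```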
      
      Also update the documentation around idle=nomwait appropriately.
      
      [ dhansen: tweak commit message ]
      Signed-off-by: Wyes Karny <wyes.karny@amd.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Tested-by: Zhang Rui <rui.zhang@intel.com>
      Link: https://lkml.kernel.org/r/fdc2dc2d0a1bc21c2f53d989ea2d2ee3ccbc0dbe.1654538381.git-series.wyes.karny@amd.com
  3. 27 Apr, 2022 1 commit
    • x86/pm: Fix false positive kmemleak report in msr_build_context() · b0b592cf
      Matthieu Baerts authored
      Since
      
        e2a1256b ("x86/speculation: Restore speculation related MSRs during S3 resume")
      
      kmemleak reports this issue:
      
        unreferenced object 0xffff888009cedc00 (size 256):
          comm "swapper/0", pid 1, jiffies 4294693823 (age 73.764s)
          hex dump (first 32 bytes):
            00 00 00 00 00 00 00 00 48 00 00 00 00 00 00 00  ........H.......
            00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          backtrace:
            msr_build_context (include/linux/slab.h:621)
            pm_check_save_msr (arch/x86/power/cpu.c:520)
            do_one_initcall (init/main.c:1298)
            kernel_init_freeable (init/main.c:1370)
            kernel_init (init/main.c:1504)
            ret_from_fork (arch/x86/entry/entry_64.S:304)
      
      Reproducer:
      
        - boot the VM with a debug kernel config (see
          https://github.com/multipath-tcp/mptcp_net-next/issues/268)
        - wait ~1 minute
        - start a kmemleak scan
      
      The root cause here is alignment within the packed struct saved_context
      (from suspend_64.h). Kmemleak only searches for pointers that are
      aligned (see how pointers are scanned in kmemleak.c), but pahole shows
      that the saved_msrs struct member and all members after it in the
      structure are unaligned:
      
        struct saved_context {
          struct pt_regs             regs;                 /*     0   168 */
          /* --- cacheline 2 boundary (128 bytes) was 40 bytes ago --- */
          u16                        ds;                   /*   168     2 */
      
          ...
      
          u64                        misc_enable;          /*   232     8 */
          bool                       misc_enable_saved;    /*   240     1 */
      
         /* Note below odd offset values for the remainder of this struct */
      
          struct saved_msrs          saved_msrs;           /*   241    16 */
          /* --- cacheline 4 boundary (256 bytes) was 1 bytes ago --- */
          long unsigned int          efer;                 /*   257     8 */
          u16                        gdt_pad;              /*   265     2 */
          struct desc_ptr            gdt_desc;             /*   267    10 */
          u16                        idt_pad;              /*   277     2 */
          struct desc_ptr            idt;                  /*   279    10 */
          u16                        ldt;                  /*   289     2 */
          u16                        tss;                  /*   291     2 */
          long unsigned int          tr;                   /*   293     8 */
          long unsigned int          safety;               /*   301     8 */
          long unsigned int          return_address;       /*   309     8 */
      
          /* size: 317, cachelines: 5, members: 25 */
          /* last cacheline: 61 bytes */
        } __attribute__((__packed__));
      
      Move misc_enable_saved to the end of the struct declaration so that
      saved_msrs fits in before the cacheline 4 boundary.
      
      The comment above the saved_context declaration says to fix the
      wakeup_64.S file and __save/__restore_processor_state() if the struct is
      modified: it looks like all the accesses in wakeup_64.S are done through
      offsets that are computed at build time. Update that comment accordingly.
      
      In the end, the false-positive kmemleak report is due to a limitation
      of kmemleak, but it is always good to avoid unaligned members for
      optimisation purposes.
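      The alignment effect can be demonstrated with a small self-contained
      example. The struct members below are hypothetical stand-ins for
      saved_context, kept to a few fields to show how a 1-byte member inside a
      packed struct shifts everything after it off pointer alignment (the
      GCC/Clang packed attribute is assumed):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct saved_msrs: holds a heap pointer kmemleak must find. */
struct msrs {
    void *array;
    int   num;
};

/* Before the fix: the bool sits before the pointer-holding member, so in a
 * packed struct the pointer inside 'saved_msrs' lands at an odd offset. */
struct ctx_before {
    uint64_t    misc_enable;       /* offset 0, 8 bytes */
    bool        misc_enable_saved; /* offset 8, 1 byte  */
    struct msrs saved_msrs;        /* offset 9: unaligned */
} __attribute__((packed));

/* After the fix: the bool is moved to the end, restoring alignment. */
struct ctx_after {
    uint64_t    misc_enable;       /* offset 0, 8 bytes */
    struct msrs saved_msrs;        /* offset 8: pointer-aligned */
    bool        misc_enable_saved; /* offset 16 */
} __attribute__((packed));
```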
      
      Please note that it looks like this issue is not new, e.g.
      
        https://lore.kernel.org/all/9f1bb619-c4ee-21c4-a251-870bd4db04fa@lwfinger.net/
        https://lore.kernel.org/all/94e48fcd-1dbd-ebd2-4c91-f39941735909@molgen.mpg.de/
      
        [ bp: Massage + cleanup commit message. ]
      
      Fixes: 7a9c2dd0 ("x86/pm: Introduce quirk framework to save/restore extra MSR registers around suspend/resume")
      Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Link: https://lore.kernel.org/r/20220426202138.498310-1-matthieu.baerts@tessares.net