    x86/hpet: Reduce HPET counter read contention · f99fd22e
    Waiman Long authored
    On a large system with many CPUs, using HPET as the clock source can
    have a significant impact on the overall system performance because
    of the following reasons:
     1) There is a single HPET counter shared by all the CPUs.
     2) HPET counter reading is a very slow operation.
    
    Using HPET as the default clock source may happen when, for example,
    the TSC clock calibration exceeds the allowable tolerance. Sometimes
    the performance slowdown can be so severe that the system may crash
    because of an NMI watchdog soft lockup.
    
    During the TSC clock calibration process, the default clock source
    is temporarily set to HPET. On systems with many CPUs, an NMI
    watchdog soft lockup may occasionally occur during the short period
    in which HPET clocking is active, as shown in the kernel log below:
    
    [   71.646504] hpet0: 8 comparators, 64-bit 14.318180 MHz counter
    [   71.655313] Switching to clocksource hpet
    [   95.679135] BUG: soft lockup - CPU#144 stuck for 23s! [swapper/144:0]
    [   95.693363] BUG: soft lockup - CPU#145 stuck for 23s! [swapper/145:0]
    [   95.695580] BUG: soft lockup - CPU#582 stuck for 23s! [swapper/582:0]
    [   95.698128] BUG: soft lockup - CPU#357 stuck for 23s! [swapper/357:0]
    
    This patch addresses the above issues by reducing HPET read contention.
    When more than one CPU is trying to access the HPET at the same time,
    it is more efficient for only one CPU in the group to read the HPET
    counter and share it with the rest of the group than for each group
    member to read the HPET counter individually.
    
    This is done by using a combination quadword that contains a 32-bit
    stored HPET value and a 32-bit spinlock.  The CPU that gets the lock
    is responsible for reading the HPET counter and storing it in
    the quadword. The others monitor the HPET value and lock status for
    changes and pick up the latest stored HPET value accordingly. This
    change is only enabled on 64-bit SMP configurations.
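
    The scheme described above can be illustrated with a minimal
    userspace sketch. This is not the kernel code: it assumes C11
    atomics in place of the kernel's spinlock primitives, and the names
    slow_counter_read(), shared_counter_read(), lockval and LOCK_BIT are
    hypothetical. The slow hardware read is simulated by a plain counter.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the slow, shared hardware counter. */
static _Atomic uint32_t fake_hw_counter;

static uint32_t slow_counter_read(void)
{
	return atomic_fetch_add(&fake_hw_counter, 1) + 1;
}

/*
 * Combined quadword (layout is an assumption of this sketch):
 *   bit 0       lock flag
 *   bits 32-63  last counter value stored by a lock holder
 */
#define LOCK_BIT 1ULL
static _Atomic uint64_t lockval;

static uint32_t shared_counter_read(void)
{
	/* Snapshot lock state and stored value in one atomic load. */
	uint64_t old = atomic_load_explicit(&lockval, memory_order_acquire);
	uint64_t cur;

	if (!(old & LOCK_BIT)) {
		uint64_t expected = old;

		/* Try to become the one reader for the current group. */
		if (atomic_compare_exchange_strong_explicit(
			    &lockval, &expected, old | LOCK_BIT,
			    memory_order_acquire, memory_order_relaxed)) {
			uint32_t v = slow_counter_read();

			/* Publish the value and drop the lock in one store. */
			atomic_store_explicit(&lockval, (uint64_t)v << 32,
					      memory_order_release);
			return v;
		}
	}

	/*
	 * Contended: wait until the stored value changes or the lock is
	 * released, then use whatever value is stored at that point.
	 */
	do {
		cur = atomic_load_explicit(&lockval, memory_order_acquire);
	} while ((cur >> 32) == (old >> 32) && (cur & LOCK_BIT));

	return (uint32_t)(cur >> 32);
}
```

    In this sketch only the thread that wins the compare-and-exchange
    touches the slow counter; every other concurrent caller spins on a
    single cache line and returns the shared value. Publishing the value
    and releasing the lock with one 64-bit store is a simplification
    chosen for the sketch.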
    
    On a 4-socket Haswell-EX box with 144 threads (HT on), running the
    AIM7 compute workload (1500 users) on a 4.8-rc1 kernel (HZ=1000)
    with and without the patch has the following performance numbers
    (with HPET or TSC as clock source):
    
    TSC		= 1042431 jobs/min
    HPET w/o patch	=  798068 jobs/min
    HPET with patch	= 1029445 jobs/min
    
    The perf profile showed a reduction of the %CPU time consumed by
    read_hpet from 11.19% without patch to 1.24% with patch.
    
    [ tglx: It's really sad that we need to have such hacks just to deal with
      	the fact that cpu vendors have not managed to fix the TSC wreckage
      	within 15+ years. Were They Forgetting? ]
    Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
    Tested-by: Prarit Bhargava <prarit@redhat.com>
    Cc: Scott J Norton <scott.norton@hpe.com>
    Cc: Douglas Hatch <doug.hatch@hpe.com>
    Cc: Randy Wright <rwright@hpe.com>
    Cc: Dave Hansen <dave.hansen@intel.com>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Borislav Petkov <bp@suse.de>
    Link: http://lkml.kernel.org/r/1473182530-29175-1-git-send-email-Waiman.Long@hpe.com
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>