    MDEV-19845: Adaptive spin loops · 042fc295
    Marko Mäkelä authored
    Starting with the Intel Skylake microarchitecture, the PAUSE
    instruction latency is about 140 clock cycles instead of the earlier 10.
    On AMD processors, the latency could be 10 or 50 clock cycles,
    depending on microarchitecture.
    
    Because of this wide range of latencies, let us scale the loops around
    the PAUSE instruction based on timing results at server startup.
    
    my_cpu_relax_multiplier: New variable: How many times to invoke PAUSE
    in a loop. Only defined for IA-32 and AMD64.
    
    my_cpu_init(): Use RDTSC to time two unrolled runs of 16 PAUSE
    instructions each, and initialize my_cpu_relax_multiplier based on
    the quicker of the two runs. This form of calibration was
    suggested by Mikhail Sinyavin from Intel.
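
    The calibration described above can be sketched roughly as follows.
    This is an illustration, not the committed code: the sketch_ names,
    the 30-cycle threshold and the multipliers 200 and 20 are assumptions
    chosen for the example, and <x86intrin.h> assumes GCC or Clang.

```c
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
# include <x86intrin.h>            /* __rdtsc(), _mm_pause() on GCC and Clang */
# define RELAX()  _mm_pause()
# define CYCLES() __rdtsc()
#else
# define RELAX()  ((void) 0)       /* placeholder: no PAUSE on other ISAs */
# define CYCLES() ((uint64_t) 0)   /* placeholder: no RDTSC on other ISAs */
#endif

/* 16 PAUSE instructions, unrolled so that no loop overhead is measured */
#define PAUSE4  RELAX(); RELAX(); RELAX(); RELAX()
#define PAUSE16 PAUSE4; PAUSE4; PAUSE4; PAUSE4

/* Assumed values for illustration only */
enum {
  FAST_PAUSE_MULTIPLIER = 200,     /* PAUSE is cheap: spin more times */
  SLOW_PAUSE_MULTIPLIER = 20,      /* PAUSE is expensive: spin fewer times */
  SLOW_PAUSE_CYCLES     = 30       /* per-PAUSE threshold in clock cycles */
};

/* Hypothetical stand-in for my_cpu_relax_multiplier */
unsigned sketch_relax_multiplier = FAST_PAUSE_MULTIPLIER;

/* Pick the multiplier from the quicker of the two timed runs. */
unsigned choose_multiplier(uint64_t run1, uint64_t run2)
{
  uint64_t quicker = run1 < run2 ? run1 : run2;
  return quicker > (uint64_t) SLOW_PAUSE_CYCLES * 16
    ? SLOW_PAUSE_MULTIPLIER : FAST_PAUSE_MULTIPLIER;
}

/* Hypothetical stand-in for my_cpu_init() */
void sketch_cpu_init(void)
{
  uint64_t t0 = CYCLES();
  PAUSE16;
  uint64_t t1 = CYCLES();
  PAUSE16;
  uint64_t t2 = CYCLES();
  sketch_relax_multiplier = choose_multiplier(t1 - t0, t2 - t1);
}
```

    Taking the quicker run reduces the chance that an interrupt or a
    context switch during one measurement inflates the result.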
    
    LF_BACKOFF(), ut_delay(): Use my_cpu_relax_multiplier when available.
    
    ut_delay(): Define inline in my_cpu.h.
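
    A spin loop scaled by the calibrated multiplier could look like the
    sketch below. The names sketch_spin_count() and sketch_ut_delay(),
    the default of 200 and the division by 4 are assumptions made for
    illustration; the actual scaling in my_cpu.h may differ.

```c
#if defined(__i386__) || defined(__x86_64__)
# include <x86intrin.h>            /* _mm_pause() on GCC and Clang */
# define RELAX() _mm_pause()
#else
# define RELAX() ((void) 0)        /* placeholder for other ISAs */
#endif

/* Hypothetical stand-in for my_cpu_relax_multiplier; 200 is assumed */
static unsigned sketch_relax_multiplier = 200;

/* Number of PAUSE invocations for a requested delay (assumed scaling) */
static inline unsigned sketch_spin_count(unsigned delay, unsigned multiplier)
{
  return multiplier / 4 * delay;
}

/* Spin for a duration roughly proportional to delay on any x86 CPU */
static inline void sketch_ut_delay(unsigned delay)
{
  unsigned i = sketch_spin_count(delay, sketch_relax_multiplier);
  while (i--)
    RELAX();
}
```

    With these assumed values, a request of delay=6 would execute 300
    PAUSE instructions where PAUSE is cheap and 30 where it is
    expensive, so the wall-clock delay stays in a similar range.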
    
    UT_COMPILER_BARRIER(): Remove. This does not seem to have any effect,
    because in our ut_delay() implementation, no computations are being
    performed inside the loop. The purpose of UT_COMPILER_BARRIER() was to
    prohibit the compiler from reordering computations. It was not
    emitting any code.