    cpufreq: powernv: Ramp-down global pstate slower than local-pstate · eaa2c3ae
    Akshay Adiga authored
    The frequency transition latency from pmin to pmax is observed to be on
    the order of a few milliseconds, so sudden frequency ramp-up requests
    usually incur a performance penalty.
    
    This patch set solves this problem by using an entity called "global
    pstates". The global pstate is a chip-level entity, so the global entity
    (voltage) is managed across the cores. The local pstate is a core-level
    entity, so the local entity (frequency) is managed across threads.
    
    This patch brings down the global pstate at a slower rate than the local
    pstate. Holding the global pstate higher than the local pstate makes
    subsequent ramp-ups faster.
    
    A per-policy structure is maintained to keep track of the global and
    local pstate changes. The global pstate is brought down using a parabolic
    equation. The ramp-down time to pmin is set to ~5 seconds. To make sure
    that the global pstate is dropped at regular intervals, a timer is
    queued every 2 seconds during the ramp-down phase, which eventually brings
    the global pstate down to the local pstate.
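
    The ramp-down described above can be sketched roughly as follows. This
    is an illustration only, not the driver code: the function names, the
    5120 ms window and the shift by 18 are assumptions chosen so that a
    quadratic reaches 100% at about 5 seconds, and pstates are modelled as
    plain integers where a larger value means a higher voltage/frequency.

```c
/* Assumed ramp-down window (~5 s); the commit message only says "~5 seconds". */
#define RAMP_DOWN_TIME_MS 5120u

/*
 * Hypothetical helper: percentage of the global/local gap to give back
 * after `ms` milliseconds. Quadratic in time, so the drop is slow at
 * first and faster later: 0% at 0 ms, 100% at 5120 ms.
 */
static unsigned int ramp_down_percent(unsigned int ms)
{
    if (ms >= RAMP_DOWN_TIME_MS)
        return 100;
    return (ms * ms) >> 18; /* 5120 * 5120 == 100 << 18 */
}

/*
 * Hypothetical helper: global pstate to request `elapsed_ms` after the
 * ramp-down started from `highest_pstate`, given the current local pstate.
 */
static int global_pstate_after(unsigned int elapsed_ms,
                               int highest_pstate, int local_pstate)
{
    int gap = highest_pstate - local_pstate;

    /* Local pstate is at or above the old peak: global simply follows it. */
    if (gap <= 0)
        return local_pstate;

    /* Give back the computed share of the gap; never drops below local. */
    return highest_pstate - (int)(ramp_down_percent(elapsed_ms) * (unsigned int)gap / 100);
}
```

    With a timer firing every 2 seconds, this sketch would be evaluated at
    2000 ms and 4000 ms, giving back about 15% and 61% of the gap, and the
    next evaluation past the window brings the global pstate down to the
    local pstate.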
    
    Iozone results show a fairly consistent performance boost.
    YCSB on redis shows improved max latencies in most cases.
    
    Iozone write/rewrite tests were run with file sizes of 200704 KB and
    401408 KB and different record sizes. The following table shows I/O
    operations/sec with and without the patch.
    
    Iozone results (in op/sec, mean over 3 iterations)
    ----------------------------------------------------------------------
    file size-                      with            without         %
    recordsize-IOtype               patch           patch           change
    ----------------------------------------------------------------------
    200704-1-SeqWrite               1616532         1615425         0.06
    200704-1-Rewrite                2423195         2303130         5.21
    200704-2-SeqWrite               1628577         1602620         1.61
    200704-2-Rewrite                2428264         2312154         5.02
    200704-4-SeqWrite               1617605         1617182         0.02
    200704-4-Rewrite                2430524         2351238         3.37
    200704-8-SeqWrite               1629478         1600436         1.81
    200704-8-Rewrite                2415308         2298136         5.09
    200704-16-SeqWrite              1619632         1618250         0.08
    200704-16-Rewrite               2396650         2352591         1.87
    200704-32-SeqWrite              1632544         1598083         2.15
    200704-32-Rewrite               2425119         2329743         4.09
    200704-64-SeqWrite              1617812         1617235         0.03
    200704-64-Rewrite               2402021         2321080         3.48
    200704-128-SeqWrite             1631998         1600256         1.98
    200704-128-Rewrite              2422389         2304954         5.09
    200704-256-SeqWrite             1617065         1616962         0.00
    200704-256-Rewrite              2432539         2301980         5.67
    200704-512-SeqWrite             1632599         1598656         2.12
    200704-512-Rewrite              2429270         2323676         4.54
    200704-1024-SeqWrite            1618758         1616156         0.16
    200704-1024-Rewrite             2431631         2315889         4.99
    401408-1-SeqWrite               1631479         1608132         1.45
    401408-1-Rewrite                2501550         2459409         1.71
    401408-2-SeqWrite               1617095         1626069         -0.55
    401408-2-Rewrite                2507557         2443621         2.61
    401408-4-SeqWrite               1629601         1611869         1.10
    401408-4-Rewrite                2505909         2462098         1.77
    401408-8-SeqWrite               1617110         1626968         -0.60
    401408-8-Rewrite                2512244         2456827         2.25
    401408-16-SeqWrite              1632609         1609603         1.42
    401408-16-Rewrite               2500792         2451405         2.01
    401408-32-SeqWrite              1619294         1628167         -0.54
    401408-32-Rewrite               2510115         2451292         2.39
    401408-64-SeqWrite              1632709         1603746         1.80
    401408-64-Rewrite               2506692         2433186         3.02
    401408-128-SeqWrite             1619284         1627461         -0.50
    401408-128-Rewrite              2518698         2453361         2.66
    401408-256-SeqWrite             1634022         1610681         1.44
    401408-256-Rewrite              2509987         2446328         2.60
    401408-512-SeqWrite             1617524         1628016         -0.64
    401408-512-Rewrite              2504409         2442899         2.51
    401408-1024-SeqWrite            1629812         1611566         1.13
    401408-1024-Rewrite             2507620         2442968         2.64
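
    The "% change" column appears to be relative to the run without the
    patch, which matches the rows above when recomputed. A minimal sketch
    of that arithmetic (the function name is hypothetical):

```c
/* Percent change of the patched result relative to the unpatched one. */
static double percent_change(double with_patch, double without_patch)
{
    return (with_patch - without_patch) / without_patch * 100.0;
}
```

    For example, the 200704-1-Rewrite row (2423195 vs 2303130) works out
    to roughly 5.21, and a negative value means the patched run was slower.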
    
    Tested with a YCSB workload (50% update + 50% read) over redis with
    1 million records and 1 million operations. Each test was carried out
    with a target operations-per-second rate and persistence disabled.
    
    Max latency (in us, mean over 5 iterations)
    ---------------------------------------------------------------
    op/s    Operation       with patch      without patch   %change
    ---------------------------------------------------------------
    15000   Read            61480.6         50261.4         22.32
    15000   cleanup         215.2           293.6           -26.70
    15000   update          25666.2         25163.8         2.00
    
    25000   Read            32626.2         89525.4         -63.56
    25000   cleanup         292.2           263.0           11.10
    25000   update          32293.4         90255.0         -64.22
    
    35000   Read            34783.0         33119.0         5.02
    35000   cleanup         321.2           395.8           -18.8
    35000   update          36047.0         38747.8         -6.97
    
    40000   Read            38562.2         42357.4         -8.96
    40000   cleanup         371.8           384.6           -3.33
    40000   update          27861.4         41547.8         -32.94
    
    45000   Read            42271.0         88120.6         -52.03
    45000   cleanup         263.6           383.0           -31.17
    45000   update          29755.8         81359.0         -63.43
    
    (test without target op/s)
    47659   Read            83061.4         136440.6        -39.12
    47659   cleanup         195.8           193.8           1.03
    47659   update          73429.4         124971.8        -41.24

    Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
    Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
    Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>