    timers: Fix excessive granularity of new timers after a nohz idle · 2fe59f50
    Nicholas Piggin authored
    When a timer base is idle, it is forwarded when a new timer is added
    to ensure that granularity does not become excessive. When not idle,
    the timer tick is expected to increment the base.
    
    However, there are several problems:
    
    - If an existing timer is modified, the base is forwarded only after
      the index is calculated.
    
    - The base is not forwarded by add_timer_on.
    
    - There is a window after a timer is restarted from a nohz idle, after
      it is marked not-idle and before the timer tick on this CPU, where a
      timer may be added but the ancient base does not get forwarded.
    
    These result in excessive granularity (a 1 jiffy timeout can blow out
    to 100s of jiffies), which causes the RCU lockup detector to trigger,
    among other things.
    
    Fix this by keeping track of whether the timer base has been idle
    since it was last run or forwarded, and if so then forward it before
    adding a new timer.
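
    As a rough illustration of the idea above (not the kernel code), the
    following user-space C sketch models a timer base with a flag that
    records whether the base has been idle since it was last run or
    forwarded. Every name in it (model_base, model_forward,
    level_granularity, the 63-jiffy level span) is a hypothetical
    stand-in chosen for the example, and the wheel-level model is
    deliberately simplified.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a per-CPU timer base. */
struct model_base {
    unsigned long clk;        /* last time the wheel was advanced */
    bool is_idle;             /* CPU is currently in nohz idle */
    bool must_forward_clk;    /* idle since last run/forward */
};

/*
 * Simplified wheel model: the further a timer's expiry is from the base
 * clock, the coarser the level it lands in. Each level is assumed to be
 * 8x coarser than the previous one; the 63-jiffy span is an assumption.
 */
static unsigned long level_granularity(unsigned long delta)
{
    unsigned long gran = 1;
    unsigned long limit = 63;

    for (int lvl = 0; lvl < 8 && delta >= limit; lvl++) {
        gran <<= 3;
        limit <<= 3;
    }
    return gran;
}

static void model_enter_idle(struct model_base *b)
{
    b->is_idle = true;
    b->must_forward_clk = true;
}

/* Leaving idle does not clear the flag: clk is still stale. */
static void model_exit_idle(struct model_base *b)
{
    b->is_idle = false;
}

/* The timer tick advances the base and clears the flag. */
static void model_run_timers(struct model_base *b, unsigned long now)
{
    b->clk = now;
    b->must_forward_clk = false;
}

/* Forward the base only if it has been idle since it last advanced. */
static void model_forward(struct model_base *b, unsigned long now)
{
    if (!b->must_forward_clk)
        return;
    b->must_forward_clk = b->is_idle;
    if ((long)(now - b->clk) > 0)
        b->clk = now;
}

/* Returns the granularity (in jiffies) the new timer would see. */
static unsigned long model_add_timer(struct model_base *b, unsigned long now,
                                     unsigned long timeout, bool forward_first)
{
    if (forward_first)
        model_forward(b, now);
    return level_granularity(now + timeout - b->clk);
}
```

    In this model, a base that went idle at clk = 1000 and woke at
    jiffies = 1500 gives a 1 jiffy timer a coarse slot if the base is
    not forwarded first, and a 1 jiffy slot if it is.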
    
    There is still a case where mod_timer optimises the case of a pending
    timer mod with the same expiry time, where the timer can see excessive
    granularity relative to the new, shorter interval. A comment is added,
    but it's not changed because it is an important fastpath for
    networking.
    
    This has been tested and found to fix the RCU softlockup messages.
    
    Testing was also done with tracing to measure requested versus
    achieved wakeup latencies for all non-deferrable timers in an idle
    system (with no lockup watchdogs running). Wakeup latency relative to
    absolute latency is calculated (note this suffers from round-up skew
    at low absolute times) and analysed:
    
                 max     avg      std
    upstream   506.0    1.20     4.68
    patched      2.0    1.08     0.15
    
    The bug was noticed due to the lockup detector Kconfig changes
    dropping it out of people's .configs and resulting in larger base
    clk skew. When the lockup detectors are enabled, no CPU can go idle for
    longer than 4 seconds, which limits the granularity errors.
    Sub-optimal timer behaviour is observable on a smaller scale in that
    case:
    
                 max    avg     std
    upstream     9.0    1.05     0.19
    patched      2.0    1.04     0.11
    
    Fixes: a683f390 ("timers: Forward the wheel clock whenever possible")
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Tested-by: David Miller <davem@davemloft.net>
    Cc: dzickus@redhat.com
    Cc: sfr@canb.auug.org.au
    Cc: mpe@ellerman.id.au
    Cc: Stephen Boyd <sboyd@codeaurora.org>
    Cc: linuxarm@huawei.com
    Cc: abdhalee@linux.vnet.ibm.com
    Cc: John Stultz <john.stultz@linaro.org>
    Cc: akpm@linux-foundation.org
    Cc: paulmck@linux.vnet.ibm.com
    Cc: torvalds@linux-foundation.org
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/20170822084348.21436-1-npiggin@gmail.com