  1. 10 Jun, 2013 1 commit
    • 2.6.32.y: timekeeping: Fix nohz issue with commit 61b76840 · d556d326
      John Stultz authored
      Commit 61b76840 ("time: Avoid
      making adjustments if we haven't accumulated anything")
      introduced a regression with nohz.
      
      Basically, with kernels from roughly 2.6.20 through 2.6.32,
      we accumulate time in half-second chunks rather than on every
      timer tick. This was added because when NOHZ landed, if you
      were idle for a few seconds, you had to spin for every tick
      we skipped in the accumulation loop, which created some bad
      latencies.
      
      However, this required that we create the xtime_cache() which
      was still updated each tick, so that filesystem timestamps,
      etc continued to see time increment normally.
      
      Of course, the xtime_cache is updated at the bottom of
      update_wall_time(). So the early return on
      (offset < timekeeper.cycle_interval), added by the problematic
      commit causes the xtime_cache to not be updated.
      
      This can cause code using current_kernel_time() (like the mqueue
      code) or hrtimer_get_softirq_time(), which uses the non-updated
      xtime_cache, to see timers fire with very coarse half-second
      granularity.
      
      Many thanks to Romain for describing the issue clearly,
      providing a test case to reproduce it, and helping to test
      the solution.
      
      This change is for 2.6.32-stable ONLY!
      
      Cc: stable@vger.kernel.org
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Romain Francoise <romain@orebokech.com>
      Reported-by: Romain Francoise <romain@orebokech.com>
      Tested-by: Romain Francoise <romain@orebokech.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      d556d326
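A minimal sketch (hypothetical simplified types, not the actual kernel code) of the control-flow problem described above: the early return lands before the xtime_cache refresh at the bottom of update_wall_time(), so lock-free readers stop seeing time advance between accumulations.

```c
#include <assert.h>

typedef struct { long long nsec; } tstamp;

static tstamp xtime;        /* accumulated wall time */
static tstamp xtime_cache;  /* per-tick cache read without xtime_lock */

/* Regressed shape: when less than a full cycle_interval is pending,
 * nothing below the early return runs, including the cache refresh. */
void update_wall_time_regressed(long long offset, long long cycle_interval)
{
    if (offset < cycle_interval)
        return;                      /* xtime_cache refresh skipped! */
    xtime.nsec += offset;
    xtime_cache = xtime;             /* refresh at the bottom */
}

/* Fixed shape for 2.6.32-stable: refresh the cache before bailing out,
 * so current_kernel_time() users keep seeing per-tick updates. */
void update_wall_time_fixed(long long offset, long long cycle_interval)
{
    if (offset < cycle_interval) {
        xtime_cache = xtime;         /* keep readers' view current */
        return;
    }
    xtime.nsec += offset;
    xtime_cache = xtime;
}
```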
  2. 07 Oct, 2012 10 commits
  3. 09 Dec, 2011 1 commit
  4. 23 Jun, 2011 1 commit
  5. 23 May, 2011 1 commit
    • Fix time() inconsistencies caused by intermediate xtime_cache values being read · 2a4027a4
      john stultz authored
      Currently with 2.6.32-longterm, it's possible for time() to occasionally
      return a value one second earlier than the previous time() call.
      
      This happens because update_xtime_cache() does:
      	xtime_cache = xtime;
      	timespec_add_ns(&xtime_cache, nsec);
      
      It's possible that xtime is 1 sec, 999 msec, and nsec is 1 msec, resulting
      in an xtime_cache of 2 sec, 0 msec.
      
      get_seconds() (which is used by sys_time()) does not take the
      xtime_lock, which is ok as the xtime.tv_sec value is a long and can be
      atomically read safely.
      
      The problem occurs on the next call to update_xtime_cache() if xtime has
      not increased:
      	/* This sets xtime_cache back to 1 sec, 999 msec */
      	xtime_cache = xtime;
      	/* A get_seconds() call here sees a 1 second inconsistency */
      	timespec_add_ns(&xtime_cache, nsec);
      
      
      In order to resolve this, we could add locking to get_seconds(), but it
      needs to be lock free, as it is called from the machine check handler,
      opening a possible deadlock.
      
      So instead, this patch introduces an intermediate value for the
      calculations, so that we only assign xtime_cache once with the correct
      time, using ACCESS_ONCE to make sure the compiler doesn't optimize out
      any intermediate values.
      
      The xtime_cache manipulations were removed with 2.6.35, so that kernel
      and later do not need this change.
      
      In 2.6.33 and 2.6.34 the logarithmic accumulation should make it so
      xtime is updated each tick, so it is unlikely that two updates to
      xtime_cache could occur while the difference between xtime and
      xtime_cache crosses the second boundary. However, the paranoid might
      want to pull this into 2.6.33/34-longterm just to be sure.
      
      Thanks to Stephen for helping finally narrow down the root cause and
      many hours of help with testing and validation. Also thanks to Max,
      Andi, Eric and Paul for review of earlier attempts and helping clarify
      what is possible with regard to out of order execution.
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      2a4027a4
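A minimal sketch of the fix (simplified globals; the real code lives in kernel/time/timekeeping.c): build the new cache value in a local variable and publish it with a single store, so a lock-free get_seconds() reader can never observe the intermediate "rolled back" value. ACCESS_ONCE is modeled here as a volatile-qualified access.

```c
#include <assert.h>
#include <time.h>

#define ACCESS_ONCE(x) (*(volatile struct timespec *)&(x))

struct timespec xtime;
struct timespec xtime_cache;

/* Normalize: carry whole seconds out of tv_nsec. */
static void add_ns(struct timespec *ts, long ns)
{
    ts->tv_nsec += ns;
    while (ts->tv_nsec >= 1000000000L) {
        ts->tv_nsec -= 1000000000L;
        ts->tv_sec++;
    }
}

void update_xtime_cache(long nsec)
{
    struct timespec ts = xtime;    /* intermediate value, never visible */
    add_ns(&ts, nsec);
    ACCESS_ONCE(xtime_cache) = ts; /* single publish of the final value */
}
```

Readers only ever see either the old or the new xtime_cache, never the transient xtime copy.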
  6. 13 Aug, 2010 1 commit
    • timekeeping: Fix clock_gettime vsyscall time warp · 8aa31494
      Lin Ming authored
      commit 0696b711 upstream.
      
      Since commit 0a544198 "timekeeping: Move NTP adjusted clock multiplier
      to struct timekeeper" the clock multiplier of vsyscall is updated with
      the unmodified clock multiplier of the clock source and not with the
      NTP adjusted multiplier of the timekeeper.
      
      This causes user-space-observable time warps:
      new CLOCK-warp maximum: 120 nsecs,  00000025c337c537 -> 00000025c337c4bf
      
      Add a new argument "mult" to update_vsyscall() and hand in the
      timekeeping internal NTP adjusted multiplier.
      Signed-off-by: Lin Ming <ming.m.lin@intel.com>
      Cc: "Zhang Yanmin" <yanmin_zhang@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Tony Luck <tony.luck@intel.com>
      LKML-Reference: <1258436990.17765.83.camel@minggr.sh.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Kurt Garloff <garloff@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      8aa31494
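A sketch (hypothetical structures, loosely following the commit's description) of the interface change: update_vsyscall() receives the timekeeper's NTP-adjusted multiplier instead of re-reading the unmodified multiplier from the clocksource, so the vsyscall cycles-to-nanoseconds conversion matches the syscall path.

```c
#include <assert.h>

struct clocksource   { unsigned int mult, shift; };
struct vsyscall_gtod { unsigned int mult, shift; };

struct vsyscall_gtod vsyscall_gtod_data;

/* New argument "mult": the timekeeping-internal NTP-adjusted multiplier. */
void update_vsyscall(const struct clocksource *clock, unsigned int mult)
{
    vsyscall_gtod_data.mult  = mult;         /* adjusted, not clock->mult */
    vsyscall_gtod_data.shift = clock->shift;
}

/* Cycles-to-nanoseconds conversion used by the vsyscall readers. */
unsigned long long vsyscall_cyc2ns(unsigned long long cycles)
{
    return (cycles * vsyscall_gtod_data.mult) >> vsyscall_gtod_data.shift;
}
```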
  7. 23 Feb, 2010 1 commit
  8. 28 Jan, 2010 1 commit
    • nohz: Prevent clocksource wrapping during idle · a9238ce3
      Jon Hunter authored
      commit 98962465 upstream.
      
      The dynamic tick allows the kernel to sleep for periods longer than a
      single tick, but it does not limit the sleep time currently. In the
      worst case the kernel could sleep longer than the wrap around time of
      the time keeping clock source which would result in losing track of
      time.
      
      Prevent this by limiting it to the safe maximum sleep time of the
      current time keeping clock source. The value is calculated when the
      clock source is registered.
      
      [ tglx: simplified the code a bit and massaged the commit msg ]
      Signed-off-by: Jon Hunter <jon-hunter@ti.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <1250617512-23567-2-git-send-email-jon-hunter@ti.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      a9238ce3
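A sketch of the idea with our own simplified helper (the in-kernel calculation also accounts for the maximum NTP adjustment): derive the longest safe idle time from the counter's wrap point with a generous margin, and let nohz clamp its sleep length to it.

```c
#include <assert.h>
#include <stdint.h>

/* Scale raw counter cycles to nanoseconds. */
static uint64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    return (cycles * mult) >> shift;
}

/* mask: largest raw counter value before wraparound.
 * Returns the maximum time nohz may sleep without losing track of time. */
uint64_t clocksource_max_idle_ns(uint64_t mask, uint32_t mult, uint32_t shift)
{
    uint64_t wrap_ns = cyc2ns(mask, mult, shift);
    return wrap_ns / 2;    /* stay well clear of the wrap boundary */
}
```

Computed once at clocksource registration, as the commit describes.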
  9. 11 Oct, 2009 1 commit
  10. 25 Aug, 2009 1 commit
  11. 21 Aug, 2009 1 commit
    • time: Introduce CLOCK_REALTIME_COARSE · da15cfda
      john stultz authored
      After talking with some application writers who want very fast, but not
      fine-grained, timestamps, I decided to try to implement new clock_ids
      for clock_gettime(): CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE,
      which return the time at the last tick. This is very fast as we don't
      have to access any hardware (which can be very painful if you're using
      something like the acpi_pm clocksource), and we can even use the vdso
      clock_gettime() method to avoid the syscall. The only trade-off is that
      you get only low-resolution, tick-grained time.
      
      This isn't a new idea; I know Ingo has a patch in the -rt tree that made
      the vsyscall gettimeofday() return coarse-grained time when the
      vsyscall64 sysctl was set to 2. However, that affects all applications
      on a system.
      
      With this method, applications can choose the proper speed/granularity
      trade-off for themselves.
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: nikolag@ca.ibm.com
      Cc: Darren Hart <dvhltc@us.ibm.com>
      Cc: arjan@infradead.org
      Cc: jonathan@jonmasters.org
      LKML-Reference: <1250734414.6897.5.camel@localhost.localdomain>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      da15cfda
  12. 15 Aug, 2009 11 commits
  13. 07 Jul, 2009 2 commits
    • timekeeping: Move ktime_get() functions to timekeeping.c · a40f262c
      Thomas Gleixner authored
      The ktime_get() functions for GENERIC_TIME=n are still located in
      hrtimer.c. Move them to time/timekeeping.c where they belong.
      
      LKML-Reference: <new-submission>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      a40f262c
    • timekeeping: optimized ktime_get[_ts] for GENERIC_TIME=y · 951ed4d3
      Martin Schwidefsky authored
      The generic ktime_get function defined in kernel/hrtimer.c is suboptimal
      for GENERIC_TIME=y:
      
       0)               |  ktime_get() {
       0)               |    ktime_get_ts() {
       0)               |      getnstimeofday() {
       0)               |        read_tod_clock() {
       0)   0.601 us    |        }
       0)   1.938 us    |      }
       0)               |      set_normalized_timespec() {
       0)   0.602 us    |      }
       0)   4.375 us    |    }
       0)   5.523 us    |  }
      
      Overall there are two read_seqbegin/read_seqretry loops and a lot of
      unnecessary struct timespec calculations. ktime_get returns a nanosecond
      value which is the sum of xtime, wall_to_monotonic and the nanosecond
      delta from the clock source.
      
      ktime_get can be optimized for GENERIC_TIME=y. The new version only calls
      clocksource_read:
      
       0)               |  ktime_get() {
       0)               |    read_tod_clock() {
       0)   0.610 us    |    }
       0)   1.977 us    |  }
      
      It uses a single read_seqbegin/read_seqretry loop and just adds everything
      up to a nanosecond value.
      
      ktime_get_ts is optimized in a similar fashion.
      
      [ tglx: added WARN_ON(timekeeping_suspended) as in getnstimeofday() ]
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Acked-by: john stultz <johnstul@us.ibm.com>
      LKML-Reference: <20090707112728.3005244d@skybase>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      951ed4d3
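The optimized shape can be sketched as follows (simplified flat state and a stand-in delta function, not the kernel's structs): one seqcount retry loop that sums xtime, wall_to_monotonic and the unaccumulated clocksource delta directly into a nanosecond value, with no intermediate timespec arithmetic.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t sample_delta(void) { return 5; }  /* stand-in clocksource read */

static unsigned seq;                          /* seqcount: unchanged = stable */
static uint64_t xtime_ns;                     /* wall time in ns */
static int64_t  wall_to_mono_ns;              /* offset to monotonic, in ns */
static uint64_t (*clock_delta_ns)(void) = sample_delta;

uint64_t ktime_get_ns(void)
{
    unsigned start;
    uint64_t ns;

    do {
        start = seq;                              /* read_seqbegin() */
        ns = xtime_ns + (uint64_t)wall_to_mono_ns + clock_delta_ns();
    } while (start != seq);                       /* read_seqretry() */

    return ns;
}
```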
  14. 15 May, 2009 1 commit
    • sched, timers: move calc_load() to scheduler · dce48a84
      Thomas Gleixner authored
      Dimitri Sivanich noticed that xtime_lock is held write locked across
      calc_load() which iterates over all online CPUs. That can cause long
      latencies for xtime_lock readers on large SMP systems. 
      
      The load average calculation is a rough estimate anyway, so there is
      no real need to protect the readers vs. the update. It's not a problem
      when the avenrun array is updated while a reader copies the values.
      
      Instead of iterating over all online CPUs let the scheduler_tick code
      update the number of active tasks shortly before the avenrun update
      happens. The avenrun update itself is handled by the CPU which calls
      do_timer().
      
      [ Impact: reduce xtime_lock write locked section ]
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      dce48a84
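The avenrun update itself is an 11-bit fixed-point exponential moving average of the active task count; constants as in the kernel's &lt;linux/sched.h&gt;:

```c
#include <assert.h>

#define FSHIFT   11                    /* bits of fixed-point precision */
#define FIXED_1  (1UL << FSHIFT)       /* 1.0 in fixed point */
#define EXP_1    1884                  /* 1/exp(5sec/1min) in fixed point */

/* load and active are fixed-point values; returns the new smoothed load. */
unsigned long calc_load(unsigned long load, unsigned long exp,
                        unsigned long active)
{
    load *= exp;
    load += active * (FIXED_1 - exp);
    return load >> FSHIFT;
}
```

Because the result only depends on the previous value and the sampled task count, readers copying avenrun mid-update at worst see a slightly stale estimate, which is why dropping the xtime_lock protection is safe.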
  15. 02 May, 2009 1 commit
  16. 21 Apr, 2009 1 commit
  17. 31 Dec, 2008 1 commit
    • sched_clock: prevent scd->clock from moving backwards, take #2 · 1c5745aa
      Thomas Gleixner authored
      Redo:
      
        5b7dba4f: sched_clock: prevent scd->clock from moving backwards
      
      which had to be reverted due to s2ram hangs:
      
        ca7e716c: Revert "sched_clock: prevent scd->clock from moving backwards"
      
      ... this time with resume restoring GTOD later in the sequence
      taken into account as well.
      
      The "timekeeping_suspended" flag is not very nice but we cannot call into
      GTOD before it has been properly resumed and the scheduler will run very
      early in the resume sequence.
      
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      1c5745aa
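The clamp the title refers to can be sketched as (our own minimal helper, not the scheduler's per-CPU code):

```c
#include <assert.h>
#include <stdint.h>

/* Never let the reported clock move backwards, even if the raw input
 * does: remember the last value handed out and only move forward. */
uint64_t clamp_forward(uint64_t *last, uint64_t raw_now)
{
    if (raw_now > *last)
        *last = raw_now;
    return *last;
}
```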
  18. 04 Dec, 2008 1 commit
    • time: catch xtime_nsec underflows and fix them · 6c9bacb4
      john stultz authored
      Impact: fix time warp bug
      
      Alex Shi, along with Yanmin Zhang have been noticing occasional time
      inconsistencies recently. Through their great diagnosis, they found that
      the xtime_nsec value used in update_wall_time was occasionally going
      negative. After looking through the code for a while, I realized we have
      the possibility of an underflow when three conditions are met in
      update_wall_time():
      
        1) We have accumulated a second's worth of nanoseconds, so we
           incremented xtime.tv_sec and appropriately decremented xtime_nsec.
           (This doesn't cause xtime_nsec to go negative, but it can cause it
            to be small.)
      
        2) The remaining offset value is large, but just slightly less than
           cycle_interval.
      
        3) clocksource_adjust() is speeding up the clock, causing a
           corrective amount (compensating for the increase in the multiplier
           being multiplied against the unaccumulated offset value) to be
           subtracted from xtime_nsec.
      
      This can cause xtime_nsec to underflow.
      
      Unfortunately, since we notify the NTP subsystem via second_overflow()
      whenever we accumulate a full second, and this affects the error
      accumulation that has already occurred, we cannot simply revert the
      accumulated second from xtime nor move the second accumulation to after
      the clocksource_adjust call without a change in behavior.
      
      This leaves us with (at least) two options:
      
      1) Simply return from clocksource_adjust() without making a change if we
         notice the adjustment would cause xtime_nsec to go negative.
      
      This would work, but I'm concerned that if a large adjustment was needed
      (due to the error being large), it may be possible to get stuck with an
      ever increasing error that becomes too large to correct (since it may
      always force xtime_nsec negative). This may just be paranoia on my part.
      
      2) Catch xtime_nsec when it is negative, then add the amount by which it
         is negative back to both xtime_nsec and the error.
      
      This second method is consistent with how we've handled earlier rounding
      issues, and also has the benefit that the error being added is always in
      the opposite direction and is always equal to or smaller than the
      correction being applied. So the risk of a corner case where things get
      out of control is lessened.
      
      This patch fixes bug 11970, as tested by Yanmin Zhang
      http://bugzilla.kernel.org/show_bug.cgi?id=11970
      
      Reported-by: alex.shi@intel.com
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Acked-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
      Tested-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      6c9bacb4
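Option 2 can be sketched as follows (plain signed nanoseconds here; the real code works on shifted fixed-point values): when xtime_nsec has underflowed, add the shortfall back to both xtime_nsec and the NTP error, so the correction is re-applied gradually instead of being lost.

```c
#include <assert.h>
#include <stdint.h>

struct tk_state {
    int64_t xtime_nsec;   /* may briefly go negative */
    int64_t ntp_error;    /* error fed back into clocksource_adjust() */
};

void fixup_xtime_nsec(struct tk_state *tk)
{
    if (tk->xtime_nsec < 0) {
        int64_t neg = -tk->xtime_nsec;
        tk->xtime_nsec += neg;   /* back to exactly zero */
        tk->ntp_error  += neg;   /* retry the correction later */
    }
}
```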
  19. 24 Sep, 2008 1 commit
    • timekeeping: fix rounding problem during clock update · 5cd1c9c5
      Roman Zippel authored
      Due to a rounding problem during a clock update it's possible for readers
      to observe the clock jumping back by 1nsec.  The following simplified
      example demonstrates the problem:
      
      cycle	xtime
      0	0
      1000	999999.6
      2000	1999999.2
      3000	2999998.8
      ...
      
      1500 =	1499999.4
      =	0.0 + 1499999.4
      =	999999.6 + 499999.8
      
      When reading the clock only the full nanosecond part is used, while
      timekeeping internally keeps nanosecond fractions.  If the clock is now
      updated at cycle 1500 here, a nanosecond is missing due to the truncation.
      
      The simple fix is to round up the xtime value during the update. This also
      changes the distance to the reference time, but the adjustment will
      automatically ensure that it stays under control.
      Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      5cd1c9c5
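The truncation-versus-round-up behavior can be demonstrated in fixed point (8 fraction bits is our example choice; the kernel's shift differs per clocksource): time is kept with sub-nanosecond fraction bits, readers see only whole nanoseconds, and publishing a truncated value at update time can land 1 ns behind what readers already derived.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_SHIFT 8   /* sub-nanosecond fraction bits (example value) */

/* Truncating publish: drops the fraction, can step 1 ns backwards
 * relative to what a reader computed just before the update. */
uint64_t publish_trunc(uint64_t time_shifted)
{
    return time_shifted >> FRAC_SHIFT;
}

/* Round-up publish, as the fix does on the update path. */
uint64_t publish_round_up(uint64_t time_shifted)
{
    return (time_shifted + ((1ULL << FRAC_SHIFT) - 1)) >> FRAC_SHIFT;
}
```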
  20. 21 Aug, 2008 1 commit
    • clocksource: introduce CLOCK_MONOTONIC_RAW · 2d42244a
      John Stultz authored
      In talking with Josip Loncaric about his work on clock synchronization (see
      btime.sf.net), he mentioned that for really close synchronization it is
      useful to have access to "hardware time", that is, a notion of time that is
      not in any way adjusted by the clock slewing done to keep close time sync.
      
      Part of the issue is if we are using the kernel's ntp adjusted
      representation of time in order to measure how we should correct time, we
      can run into what Paul McKenney aptly described as "Painting a road using
      the lines we're painting as the guide".
      
      I had been thinking of a similar problem, and was trying to come up with a
      way to give users access to a purely hardware based time representation
      that avoided users having to know the underlying frequency and mask values
      needed to deal with the wide variety of possible underlying hardware
      counters.
      
      My solution is to introduce CLOCK_MONOTONIC_RAW.  This exposes a
      nanosecond-based time value that increments starting at bootup and has no
      frequency adjustments made to it whatsoever.
      
      The time is accessed from userspace via the posix_clock_gettime() syscall,
      passing CLOCK_MONOTONIC_RAW as the clock_id.
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      2d42244a