1. 17 Apr, 2013 1 commit
    • Pavel Emelyanov's avatar
      posix timers: Allocate timer id per process (v2) · 5ed67f05
      Pavel Emelyanov authored
      Currently kernel generates IDs for posix timers in a global manner --
      there's a kernel-wide IDR tree from which IDs are created. This makes
      it impossible to recreate a timer with a desired ID (in particular
      this is done by the CRIU checkpoint-restore project) -- since these
      IDs are global it may happen, that at the time we recreate a timer, the
      ID we want for it is already busy by some other timer.
      
      In order to address this, replace the IDR tree with a global hash
      table for timers and makes timer IDs unique per signal_struct (to
      which timers are linked anyway). With this, two timers belonging to
      different processes may have equal IDs and we can recreate either of
      them with the ID we want.
      Signed-off-by: default avatarPavel Emelyanov <xemul@parallels.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Matthew Helsley <matt.helsley@gmail.com>
      Link: http://lkml.kernel.org/r/513D9FF5.9010004@parallels.comSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      5ed67f05
  2. 11 Apr, 2013 1 commit
  3. 08 Apr, 2013 2 commits
    • David Engraf's avatar
      hrtimer: Fix ktime_add_ns() overflow on 32bit architectures · 51fd36f3
      David Engraf authored
      One can trigger an overflow when using ktime_add_ns() on a 32bit
      architecture not supporting CONFIG_KTIME_SCALAR.
      
      When passing a very high value for u64 nsec, e.g. 7881299347898368000
      the do_div() function converts this value to seconds (7881299347) which
      is still to high to pass to the ktime_set() function as long. The result
      in is a negative value.
      
      The problem on my system occurs in the tick-sched.c,
      tick_nohz_stop_sched_tick() when time_delta is set to
      timekeeping_max_deferment(). The check for time_delta < KTIME_MAX is
      valid, thus ktime_add_ns() is called with a too large value resulting in
      a negative expire value. This leads to an endless loop in the ticker code:
      
      time_delta: 7881299347898368000
      expires = ktime_add_ns(last_update, time_delta)
      expires: negative value
      
      This fix caps the value to KTIME_MAX.
      
      This error doesn't occurs on 64bit or architectures supporting
      CONFIG_KTIME_SCALAR (e.g. ARM, x86-32).
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDavid Engraf <david.engraf@sysgo.com>
      [jstultz: Minor tweaks to commit message & header]
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      51fd36f3
    • Prarit Bhargava's avatar
      hrtimer: Add expiry time overflow check in hrtimer_interrupt · 8f294b5a
      Prarit Bhargava authored
      The settimeofday01 test in the LTP testsuite effectively does
      
              gettimeofday(current time);
              settimeofday(Jan 1, 1970 + 100 seconds);
              settimeofday(current time);
      
      This test causes a stack trace to be displayed on the console during the
      setting of timeofday to Jan 1, 1970 + 100 seconds:
      
      [  131.066751] ------------[ cut here ]------------
      [  131.096448] WARNING: at kernel/time/clockevents.c:209 clockevents_program_event+0x135/0x140()
      [  131.104935] Hardware name: Dinar
      [  131.108150] Modules linked in: sg nfsv3 nfs_acl nfsv4 auth_rpcgss nfs dns_resolver fscache lockd sunrpc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables kvm_amd kvm sp5100_tco bnx2 i2c_piix4 crc32c_intel k10temp fam15h_power ghash_clmulni_intel amd64_edac_mod pcspkr serio_raw edac_mce_amd edac_core microcode xfs libcrc32c sr_mod sd_mod cdrom ata_generic crc_t10dif pata_acpi radeon i2c_algo_bit drm_kms_helper ttm drm ahci pata_atiixp libahci libata usb_storage i2c_core dm_mirror dm_region_hash dm_log dm_mod
      [  131.176784] Pid: 0, comm: swapper/28 Not tainted 3.8.0+ #6
      [  131.182248] Call Trace:
      [  131.184684]  <IRQ>  [<ffffffff810612af>] warn_slowpath_common+0x7f/0xc0
      [  131.191312]  [<ffffffff8106130a>] warn_slowpath_null+0x1a/0x20
      [  131.197131]  [<ffffffff810b9fd5>] clockevents_program_event+0x135/0x140
      [  131.203721]  [<ffffffff810bb584>] tick_program_event+0x24/0x30
      [  131.209534]  [<ffffffff81089ab1>] hrtimer_interrupt+0x131/0x230
      [  131.215437]  [<ffffffff814b9600>] ? cpufreq_p4_target+0x130/0x130
      [  131.221509]  [<ffffffff81619119>] smp_apic_timer_interrupt+0x69/0x99
      [  131.227839]  [<ffffffff8161805d>] apic_timer_interrupt+0x6d/0x80
      [  131.233816]  <EOI>  [<ffffffff81099745>] ? sched_clock_cpu+0xc5/0x120
      [  131.240267]  [<ffffffff814b9ff0>] ? cpuidle_wrap_enter+0x50/0xa0
      [  131.246252]  [<ffffffff814b9fe9>] ? cpuidle_wrap_enter+0x49/0xa0
      [  131.252238]  [<ffffffff814ba050>] cpuidle_enter_tk+0x10/0x20
      [  131.257877]  [<ffffffff814b9c89>] cpuidle_idle_call+0xa9/0x260
      [  131.263692]  [<ffffffff8101c42f>] cpu_idle+0xaf/0x120
      [  131.268727]  [<ffffffff815f8971>] start_secondary+0x255/0x257
      [  131.274449] ---[ end trace 1151a50552231615 ]---
      
      When we change the system time to a low value like this, the value of
      timekeeper->offs_real will be a negative value.
      
      It seems that the WARN occurs because an hrtimer has been started in the time
      between the releasing of the timekeeper lock and the IPI call (via a call to
      on_each_cpu) in clock_was_set() in the do_settimeofday() code.  The end result
      is that a REALTIME_CLOCK timer has been added with softexpires = expires =
      KTIME_MAX.  The hrtimer_interrupt() fires/is called and the loop at
      kernel/hrtimer.c:1289 is executed.  In this loop the code subtracts the
      clock base's offset (which was set to timekeeper->offs_real in
      do_settimeofday()) from the current hrtimer_cpu_base->expiry value (which
      was KTIME_MAX):
      
      	KTIME_MAX - (a negative value) = overflow
      
      A simple check for an overflow can resolve this problem.  Using KTIME_MAX
      instead of the overflow value will result in the hrtimer function being run,
      and the reprogramming of the timer after that.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarRik van Riel <riel@redhat.com>
      Signed-off-by: default avatarPrarit Bhargava <prarit@redhat.com>
      [jstultz: Tweaked commit subject]
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      8f294b5a
  4. 04 Apr, 2013 12 commits
  5. 03 Apr, 2013 1 commit
  6. 28 Mar, 2013 1 commit
    • Christian Daudt's avatar
      ARM: bcm281xx: Add timer driver (driver portion) · 8011657b
      Christian Daudt authored
      This adds support for the Broadcom timer, used in the following SoCs:
      BCM11130, BCM11140, BCM11351, BCM28145, BCM28155
      
      Updates from V6:
      - Split DT portion into a separate patch
      
      Updates from V5:
      - Rebase to latest arm-soc/for-next
      
      Updates from V4:
      - Switch code to use CLOCKSOURCE_OF_DECLARE
      
      Updates from V3:
      - Migrate to 3.9 timer framework updates
      
      Updates from V2:
      - prepend static fns + fields with kona_
      
      Updates from V1:
      - Rename bcm_timer.c to bcm_kona_timer.c
      - Pull .h into bcm_kona_timer.c
      - Make timers static
      - Clean up comment block
      - Switched to using clockevents_config_and_register
      - Added an error to the get_timer loop if it repeats too much
      - Added to Documentation/devicetree/bindings/arm/bcm/bcm,kona-timer.txt
      - Added missing readl to timer_disable_and_clear
      
      Note: bcm,kona-timer was kept as the 'compatible' field to make it
      specific enough for when there are multiple bcm timers (bcm,timer is
      too generic).
      Signed-off-by: default avatarChristian Daudt <csd@broadcom.com>
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Reviewed-by: default avatarStephen Warren <swarren@nvidia.com>
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      8011657b
  7. 25 Mar, 2013 3 commits
  8. 22 Mar, 2013 8 commits
  9. 15 Mar, 2013 7 commits
    • Feng Tang's avatar
      timekeeping: utilize the suspend-nonstop clocksource to count suspended time · e445cf1c
      Feng Tang authored
      There are some new processors whose TSC clocksource won't stop during
      suspend. Currently, after system resumes, kernel will use persistent
      clock or RTC to compensate the sleep time, but with these nonstop
      clocksources, we could skip the special compensation from external
      sources, and just use current clocksource for time recounting.
      
      This can solve some time drift bugs caused by some not-so-accurate or
      error-prone RTC devices.
      
      The current way to count suspended time is first try to use the persistent
      clock, and then try the RTC if persistent clock can't be used. This
      patch will change the trying order to:
      	suspend-nonstop clocksource -> persistent clock -> RTC
      
      When counting the sleep time with nonstop clocksource, use an accurate way
      suggested by Jason Gunthorpe to cover very large delta cycles.
      Signed-off-by: default avatarFeng Tang <feng.tang@intel.com>
      [jstultz: Small optimization, avoiding re-reading the clocksource]
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      e445cf1c
    • Feng Tang's avatar
      x86: tsc: Add support for new S3_NONSTOP feature · 82f9c080
      Feng Tang authored
      Add support for new S3_NONSTOP feature
      Signed-off-by: default avatarFeng Tang <feng.tang@intel.com>
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      82f9c080
    • Feng Tang's avatar
      clocksource: Add new feature flag CLOCK_SOURCE_SUSPEND_NONSTOP · 5caf4636
      Feng Tang authored
      Some x86 processors have a TSC clocksource, which continues to run
      even when system is suspended. Also most OMAP platforms have a
      32 KHz timer which has similar capability. Add a feature flag so that
      it could be utilized.
      Signed-off-by: default avatarFeng Tang <feng.tang@intel.com>
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      5caf4636
    • Feng Tang's avatar
      x86: Add cpu capability flag X86_FEATURE_NONSTOP_TSC_S3 · c54fdbb2
      Feng Tang authored
      On some new Intel Atom processors (Penwell and Cloverview), there is
      a feature that the TSC won't stop in S3 state, say the TSC value
      won't be reset to 0 after resume. This feature makes TSC a more reliable
      clocksource and could benefit the timekeeping code during system
      suspend/resume cycle, so add a flag for it.
      Signed-off-by: default avatarFeng Tang <feng.tang@intel.com>
      [jstultz: Fix checkpatch warning]
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      c54fdbb2
    • Prarit Bhargava's avatar
      x86: Do full rtc synchronization with ntp · 3195ef59
      Prarit Bhargava authored
      Every 11 minutes ntp attempts to update the x86 rtc with the current
      system time.  Currently, the x86 code only updates the rtc if the system
      time is within +/-15 minutes of the current value of the rtc. This
      was done originally to avoid setting the RTC if the RTC was in localtime
      mode (common with Windows dualbooting).  Other architectures do a full
      synchronization and now that we have better infrastructure to detect
      when the RTC is in localtime, there is no reason that x86 should be
      software limited to a 30 minute window.
      
      This patch changes the behavior of the kernel to do a full synchronization
      (year, month, day, hour, minute, and second) of the rtc when ntp requests
      a synchronization between the system time and the rtc.
      
      I've used the RTC library functions in this patchset as they do all the
      required bounds checking.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: x86@kernel.org
      Cc: Matt Fleming <matt.fleming@intel.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: linux-efi@vger.kernel.org
      Signed-off-by: default avatarPrarit Bhargava <prarit@redhat.com>
      [jstultz: Tweak commit message, fold in build fix found by fengguang
      Also add select RTC_LIB to X86, per new dependency, as found by prarit]
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      3195ef59
    • John Stultz's avatar
      timekeeping: Use inject_offset in warp_clock · 7859e404
      John Stultz authored
      When warping the clock (from a local time RTC), use
      timekeeping_inject_offset() to atomically add the offset.
      
      This avoids any minor time error caused by the delay between
      reading the time, and then setting the adjusted time.
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      7859e404
    • Dong Zhu's avatar
      timekeeping: Avoid adjust kernel time once hwclock kept in UTC time · c30bd099
      Dong Zhu authored
      If the Hardware Clock kept in local time,kernel will adjust the time
      to be UTC time.But if Hardware Clock kept in UTC time,system will make
      a dummy settimeofday call first (sys_tz.tz_minuteswest = 0) to make sure
      the time is not shifted,so at this point I think maybe it is not necessary
      to set the kernel time once the sys_tz.tz_minuteswest is zero.
      Signed-off-by: default avatarDong Zhu <bluezhudong@gmail.com>
      [jstultz: Updated to merge with conflicting changes ]
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      c30bd099
  10. 13 Mar, 2013 4 commits
    • Thomas Gleixner's avatar
      x86: Use tick broadcast expired check · 35b61edb
      Thomas Gleixner authored
      Avoid going back into deep idle if the tick broadcast IPI is about to
      fire.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Arjan van de Veen <arjan@infradead.org>
      Cc: x86@kernel.org
      Link: http://lkml.kernel.org/r/20130306111537.702278273@linutronix.de
      35b61edb
    • Thomas Gleixner's avatar
      arm: Use tick broadcast expired check · 80bbe9f2
      Thomas Gleixner authored
      Avoid going back into deep idle if the tick broadcast IPI is about to
      fire.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: LAK <linux-arm-kernel@lists.infradead.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Arjan van de Veen <arjan@infradead.org>
      Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Tested-by: default avatarSantosh Shilimkar <santosh.shilimkar@ti.com>
      Cc: Jason Liu <liu.h.jason@gmail.com>
      Link: http://lkml.kernel.org/r/20130306111537.640722922@linutronix.de
      80bbe9f2
    • Thomas Gleixner's avatar
      tick: Provide a check for a forced broadcast pending · eaa907c5
      Thomas Gleixner authored
      On the CPU which gets woken along with the target CPU of the broadcast
      the following happens:
      
        deep_idle()
      			<-- spurious wakeup
        broadcast_exit()
          set forced bit
        
        enable interrupts
          
      			<-- Nothing happens
      
        disable interrupts
      
        broadcast_enter()
      			<-- Here we observe the forced bit is set
        deep_idle()
      
      Now after that the target CPU of the broadcast runs the broadcast
      handler and finds the other CPU in both the broadcast and the forced
      mask, sends the IPI and stuff gets back to normal.
      
      So it's not actually harmful, just more evidence for the theory, that
      hardware designers have access to very special drug supplies.
      
      Now there is no point in going back to deep idle just to wake up again
      right away via an IPI. Provide a check which allows the idle code to
      avoid the deep idle transition.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: LAK <linux-arm-kernel@lists.infradead.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Arjan van de Veen <arjan@infradead.org>
      Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Tested-by: default avatarSantosh Shilimkar <santosh.shilimkar@ti.com>
      Cc: Jason Liu <liu.h.jason@gmail.com>
      Link: http://lkml.kernel.org/r/20130306111537.565418308@linutronix.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      eaa907c5
    • Thomas Gleixner's avatar
      tick: Handle broadcast wakeup of multiple cpus · 989dcb64
      Thomas Gleixner authored
      Some brilliant hardware implementations wake multiple cores when the
      broadcast timer fires. This leads to the following interesting
      problem:
      
      CPU0				CPU1
      wakeup from idle		wakeup from idle
      
      leave broadcast mode		leave broadcast mode
       restart per cpu timer		 restart per cpu timer
       	     	 		go back to idle
      handle broadcast
       (empty mask)			
      				enter broadcast mode
      				programm broadcast device
      enter broadcast mode
      programm broadcast device
      
      So what happens is that due to the forced reprogramming of the cpu
      local timer, we need to set a event in the future. Now if we manage to
      go back to idle before the timer fires, we switch off the timer and
      arm the broadcast device with an already expired time (covered by
      forced mode). So in the worst case we repeat the above ping pong
      forever.
      					
      Unfortunately we have no information about what caused the wakeup, but
      we can check current time against the expiry time of the local cpu. If
      the local event is already in the past, we know that the broadcast
      timer is about to fire and send an IPI. So we mark ourself as an IPI
      target even if we left broadcast mode and avoid the reprogramming of
      the local cpu timer.
      
      This still leaves the possibility that a CPU which is not handling the
      broadcast interrupt is going to reach idle again before the IPI
      arrives. This can't be solved in the core code and will be handled in
      follow up patches.
      Reported-by: default avatarJason Liu <liu.h.jason@gmail.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: LAK <linux-arm-kernel@lists.infradead.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Arjan van de Veen <arjan@infradead.org>
      Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Tested-by: default avatarSantosh Shilimkar <santosh.shilimkar@ti.com>
      Link: http://lkml.kernel.org/r/20130306111537.492045206@linutronix.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      989dcb64