1. 06 Jul, 2013 1 commit
  2. 05 Jul, 2013 2 commits
    • hrtimers: Move SMP function call to thread context · 5ec2481b
      Thomas Gleixner authored
      smp_call_function_* must not be called from softirq context.
      
      But clock_was_set(), which calls on_each_cpu(), is called from softirq
      context to implement a delayed clock_was_set() for the timer interrupt
      handler, though that path almost never gets invoked. A recent change in
      the resume code uses the softirq-based delayed clock_was_set() to
      support Xen's resume mechanism.
      
      linux-next contains a new warning which fires if smp_call_function_*
      is called from softirq context, and that Xen change triggers it.
      
      Fix this by moving the delayed clock_was_set() call to a work context.
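      The deferral can be sketched as a small userspace model, not kernel code. All names below (softirq_clock_was_set_delayed(), run_worker(), the counters) are hypothetical stand-ins: a flag plays the role of a queued work item and run_worker() plays the kernel worker thread. The point is only that the softirq path records pending work, while the function that would call on_each_cpu() runs later in thread context; in the kernel a workqueue (schedule_work()) fills that role.

```c
#include <stdbool.h>

/* Hypothetical model of the two execution contexts involved. */
enum model_context { CTX_THREAD, CTX_SOFTIRQ };

static enum model_context current_ctx = CTX_THREAD;
static bool work_pending;       /* stands in for a queued work item */
static int clock_was_set_calls; /* times the expensive path ran */
static int bad_context_calls;   /* cross-CPU calls made from softirq */

/* Stand-in for clock_was_set(): the real one calls on_each_cpu(),
 * which must not happen in softirq context, so count violations. */
static void model_clock_was_set(void)
{
    if (current_ctx == CTX_SOFTIRQ)
        bad_context_calls++;
    clock_was_set_calls++;
}

/* Softirq side: only mark the work as pending, never call directly. */
static void softirq_clock_was_set_delayed(void)
{
    current_ctx = CTX_SOFTIRQ;
    work_pending = true;
    current_ctx = CTX_THREAD;
}

/* Worker side: runs in thread context and performs the deferred call. */
static void run_worker(void)
{
    current_ctx = CTX_THREAD;
    if (work_pending) {
        work_pending = false;
        model_clock_was_set();
    }
}
```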
      Reported-and-tested-by: Artem Savkov <artem.savkov@gmail.com>
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Wilk <konrad.wilk@oracle.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: xen-devel@lists.xen.org
      Cc: stable@vger.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • clocksource: Reselect clocksource when watchdog validated high-res capability · 332962f2
      Thomas Gleixner authored
      Up to commit 5d33b883 (clocksource: Always verify highres capability)
      we had no sanity check when selecting a clocksource to prevent a
      non-highres-capable clocksource from being used when the system had
      already switched to highres/nohz mode.
      
      The new sanity check works, as Alex and Tim found out: it prevents the
      TSC from being used. This happens because on x86 the boot process
      looks like this:
      
       tsc_start_freqency_validation(TSC);
       clocksource_register(HPET);
       clocksource_done_booting();
      	clocksource_select()
      		Selects HPET which is valid for high-res
      
       switch_to_highres();
      
       clocksource_register(TSC);
       	TSC is not selected, because it is not yet
      	flagged as VALID_HIGH_RES
      
       clocksource_watchdog()
      	Validates TSC for highres, but that does not make TSC
      	the current clocksource.
      
      Before the sanity check was added, we installed the TSC unvalidated,
      which worked most of the time. If the TSC was really detected as
      unstable, then the unstable logic removed it and installed HPET again.
      
      The sanity check is correct and needed, so the watchdog needs to kick
      off a reselection of the clocksource when it qualifies the TSC as a
      valid high-res clocksource.
      
      To solve this, we mark the clocksource which got the flag
      CLOCK_SOURCE_VALID_FOR_HRES set by the watchdog with a new flag
      CLOCK_SOURCE_RESELECT and trigger the watchdog thread. The watchdog
      thread evaluates the flag and invokes clocksource_select() when set.
      
      The clocksource_done_booting() code is about to install the first real
      clocksource anyway, so it should not have to go through
      clocksource_select() and tick_oneshot_notify() pointlessly. To avoid
      that, split out the list walk code of clocksource_watchdog_kthread()
      and invoke select/notify only when called from
      clocksource_watchdog_kthread().
      
      So clocksource_done_booting() can utilize the same split-out code
      without the select/notify invocation and the clocksource_mutex
      unlock/relock dance.
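      A small userspace model of the reselection flow, under stated assumptions: the struct, the ratings and the CS_VALID_FOR_HRES / CS_RESELECT macros are hypothetical stand-ins for the kernel's clocksource list and the CLOCK_SOURCE_VALID_FOR_HRES / CLOCK_SOURCE_RESELECT flags named above. It shows that once highres mode is active, selection skips unvalidated clocksources, and that the watchdog flagging the TSC is what triggers the reselection.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this model only. */
#define CS_VALID_FOR_HRES (1u << 0)
#define CS_RESELECT       (1u << 1)

struct model_clocksource {
    const char *name;
    int rating;
    unsigned int flags;
};

static struct model_clocksource cs_list[] = {
    { "hpet", 250, CS_VALID_FOR_HRES },
    { "tsc",  300, 0 },   /* registered late, not yet validated */
};
#define NR_CS (sizeof(cs_list) / sizeof(cs_list[0]))

static bool highres_active = true;  /* already switched to highres/nohz */
static struct model_clocksource *curr_cs;

/* Pick the best-rated clocksource; in highres mode, skip any
 * clocksource not flagged as valid for high resolution. */
static void model_clocksource_select(void)
{
    struct model_clocksource *best = NULL;

    for (size_t i = 0; i < NR_CS; i++) {
        struct model_clocksource *cs = &cs_list[i];
        if (highres_active && !(cs->flags & CS_VALID_FOR_HRES))
            continue;
        if (!best || cs->rating > best->rating)
            best = cs;
    }
    if (best)
        curr_cs = best;
}

/* Watchdog validates a clocksource and asks for a reselection. */
static void model_watchdog_validate(struct model_clocksource *cs)
{
    cs->flags |= CS_VALID_FOR_HRES | CS_RESELECT;
}

/* Watchdog thread: consume RESELECT flags, reselect if any was set. */
static void model_watchdog_kthread(void)
{
    bool reselect = false;

    for (size_t i = 0; i < NR_CS; i++) {
        if (cs_list[i].flags & CS_RESELECT) {
            cs_list[i].flags &= ~CS_RESELECT;
            reselect = true;
        }
    }
    if (reselect)
        model_clocksource_select();
}
```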
      Reported-and-tested-by: Alex Shi <alex.shi@intel.com>
      Cc: Hans Peter Anvin <hpa@linux.intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <andi.kleen@intel.com>
      Tested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1307042239150.11637@ionos.tec.linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  3. 04 Jul, 2013 2 commits
  4. 03 Jul, 2013 6 commits
    • posix_timers: fix racy timer delta caching on task exit · a0b2062b
      Frederic Weisbecker authored
      When a task exits, we cache the remaining cputime delta of its timers
      before they expire.
      
      This is done from the following places:
      
      * When the task is reaped: we iterate through its list of
        posix cpu timers and store the remaining timer delta in
        the timer struct instead of the absolute value.
        (See posix_cpu_timers_exit() / posix_cpu_timers_exit_group().)
      
      * When we call posix_cpu_timer_get() or posix_cpu_timer_schedule():
        if the timer's task is seen as dying from these places, the same
        conversion from absolute to relative expiry time is performed,
        and then the task reference is released.
        (See clear_dead_task().)
      
      The relevance of this caching is questionable, but that is a separate
      and deeper debate.
      
      The big issue here is that these two sources of caching don't mix
      well together.
      
      More specifically, the caching can easily be done twice, resulting
      in a wrong delta because the elapsed clock gets spuriously subtracted
      a second time. This can happen in the following scenario:
      
      1) The task exits and gets reaped: we call posix_cpu_timers_exit()
         and the absolute timer expiry values are converted to a relative
         delta.
      
      2) timer_gettime() -> posix_cpu_timer_get() is called and relies on
         clear_dead_task() because tsk->exit_state == EXIT_DEAD.
         The elapsed clock gets subtracted from the delta again and we
         return a wrong result.
      
      To fix this, just remove the caching done on task reaping time.  It
      doesn't bring much value on its own.  The caching done from
      posix_cpu_timer_get/schedule is enough.
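      The double conversion can be reproduced with plain arithmetic. This sketch assumes times are simple nanosecond counts and that the timer was armed at thread cputime 0; cache_delta() and the two wrappers are hypothetical helpers, not the kernel functions. With a 10 second expiry and 2 seconds of elapsed cputime, converting twice yields 6 seconds instead of 8, which matches the before/after output of the test program in this changelog.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Convert an absolute expiry into the delta remaining at 'now'. */
static uint64_t cache_delta(uint64_t expires, uint64_t now)
{
    return expires > now ? expires - now : 0;
}

/* Buggy path: converted once at reap time, then again on read. */
static uint64_t remaining_double_cached(uint64_t expires, uint64_t now)
{
    uint64_t delta = cache_delta(expires, now); /* at task reaping */
    return cache_delta(delta, now);             /* clear_dead_task() again */
}

/* Fixed path: only the read side performs the conversion. */
static uint64_t remaining_single_cached(uint64_t expires, uint64_t now)
{
    return cache_delta(expires, now);
}
```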
      
      And it would also be hard to get it really right: we could make it put and
      clear the target task in the timer struct so that readers know whether they
      are dealing with a relative cached value or an absolute one.  But that would
      be racy.  The only safe way to do it would be to lock the itimer->it_lock so
      that we know nobody reads the cputime expiry value while we modify it and its
      target task reference.  Doing so would involve some funny workarounds to
      avoid a circular locking dependency with the sighand lock.  There is just no
      reason to maintain this.
      
      The user-visible effect of this patch can be observed by running the
      following code: it creates a subthread that arms a posix cpu timer
      which expires after 10 seconds. But then the subthread only busy loops for 2
      seconds and exits. The parent reaps the subthread and reads the timer value.
      Its expected value should be the initial timer's expiration value
      minus the cputime elapsed in the subthread. Roughly 10 - 2 = 8 seconds:
      
      	/* Build with: gcc -o posix_cpu_timers posix_cpu_timers.c -pthread -lrt */
      	#include <sys/time.h>
      	#include <stdio.h>
      	#include <string.h>
      	#include <unistd.h>
      	#include <time.h>
      	#include <pthread.h>
      
      	static timer_t id;
      	static struct itimerspec val = { .it_value.tv_sec = 10, }, new;
      
      	static void *thread(void *unused)
      	{
      		int err;
      		struct timeval start, end, diff;
      
      		/* Timer measured in this thread's CPU time */
      		err = timer_create(CLOCK_THREAD_CPUTIME_ID, NULL, &id);
      		if (err < 0) {
      			perror("Can't create timer");
      			return NULL;
      		}
      
      		/* Arm 10 sec timer */
      		err = timer_settime(id, 0, &val, NULL);
      		if (err < 0) {
      			perror("Can't set timer");
      			return NULL;
      		}
      
      		/* Exit after 2 seconds of busy looping, i.e. ~2s of cputime */
      		gettimeofday(&start, NULL);
      		do {
      			gettimeofday(&end, NULL);
      			timersub(&end, &start, &diff);
      		} while (diff.tv_sec < 2);
      
      		return NULL;
      	}
      
      	int main(int argc, char **argv)
      	{
      		pthread_t pthread;
      		int err;
      
      		err = pthread_create(&pthread, NULL, thread, NULL);
      		if (err) {
      			/* pthread_create() returns the error code, it does not set errno */
      			fprintf(stderr, "Can't create thread: %s\n", strerror(err));
      			return 1;
      		}
      		pthread_join(pthread, NULL);
      		/* Just wait a little bit to make sure the child got reaped */
      		sleep(1);
      		err = timer_gettime(id, &new);
      		if (err)
      			perror("Can't get timer value");
      		printf("%ld %ld\n", (long)new.it_value.tv_sec, new.it_value.tv_nsec);
      
      		return 0;
      	}
      
      Before the patch:
      
             $ ./posix_cpu_timers
             6 2278074
      
      After the patch:
      
            $ ./posix_cpu_timers
            8 1158766
      
      Before the patch, two extra seconds of elapsed time were spuriously accounted.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: Olivier Langlois <olivier@trillion01.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • posix-timers: correctly get dying task time sample in posix_cpu_timer_schedule() · 76cdcdd9
      Frederic Weisbecker authored
      In order to re-arm a timer after it fired, we take a sample of the current
      process or thread cputime.
      
      If the task is dying though, we don't arm anything but we cache the
      remaining timer expiration delta for further reads.
      
      Something similar is performed in posix_cpu_timer_get() but here we forget
      to take the process wide cputime sample before caching it.
      
      As a result we are storing random stack content, causing every
      subsequent read of that timer to return junk values.
      
      Fix this by taking the appropriate sample in the case of process wide
      timers.
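      A minimal sketch of the fixed control flow, assuming a hypothetical model_timer type and fixed sample values in place of the kernel's thread and thread-group cputime accounting: every branch assigns now before it is used, so the process-wide case no longer reads an uninitialized value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a timer is either per-thread or process-wide. */
struct model_timer {
    bool process_wide;
    uint64_t expires;   /* absolute expiry in ns of cputime */
};

/* Hypothetical clock samples standing in for the real accounting. */
static uint64_t thread_sample = 3000;
static uint64_t process_sample = 5000;

/* Cache the remaining delta for a dying task. Both branches assign
 * 'now'; the process-wide sample is the one that was missing. */
static uint64_t cache_remaining(const struct model_timer *t)
{
    uint64_t now;

    if (t->process_wide)
        now = process_sample;
    else
        now = thread_sample;

    return t->expires > now ? t->expires - now : 0;
}
```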
      
      This probably doesn't matter much in practice because, at this stage, the
      thread is the last one in the group and we reached exit_notify().  This
      implies that we called exit_itimers() and there should be no more timers
      to handle for that task.
      
      So this is likely dead code anyway but let's fix the current logic
      and the warning that came along:
      
          kernel/posix-cpu-timers.c: In function 'posix_cpu_timer_schedule':
          kernel/posix-cpu-timers.c:1127: warning: 'now' may be used uninitialized in this function
      
      Then we can start to think further about cleaning up that code.
      Reported-by: Andrew Morton <akpm@linux-foundation.org>
      Reported-by: Chen Gang <gang.chen@asianux.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Chen Gang <gang.chen@asianux.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: Olivier Langlois <olivier@trillion01.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests: add basic posix timers selftests · 0bc4b0cf
      Frederic Weisbecker authored
      Add some initial basic tests for a few posix timers interfaces such as
      setitimer() and timer_settime().
      
      These simply check that expiration happens in a reasonable timeframe after
      expected elapsed clock time (user time, user + system time, real time,
      ...).
      
      This is helpful for finding basic breakages while hacking
      on this subsystem.
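      A check of this kind might look like the following. It is modeled loosely on the description above, not on the actual selftest source; check_itimer_real() and the tolerance values are assumptions. It arms a one-shot ITIMER_REAL with setitimer() and verifies that SIGALRM arrives no earlier than requested and within a generous slack.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

static volatile sig_atomic_t fired;

static void on_alarm(int sig)
{
    (void)sig;
    fired = 1;
}

static double elapsed_sec(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) + (b->tv_nsec - a->tv_nsec) / 1e9;
}

/* Arm a one-shot ITIMER_REAL for 'msec' ms; succeed if it fires no
 * earlier than requested and at most 'slack_msec' ms late. */
static bool check_itimer_real(long msec, long slack_msec)
{
    struct sigaction sa;
    struct itimerval it;
    struct timespec start, now;
    double target = msec / 1000.0;
    double limit = (msec + slack_msec) / 1000.0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) < 0)
        return false;

    memset(&it, 0, sizeof(it));          /* it_interval = 0: one-shot */
    it.it_value.tv_sec = msec / 1000;
    it.it_value.tv_usec = (msec % 1000) * 1000;

    fired = 0;
    clock_gettime(CLOCK_MONOTONIC, &start);
    if (setitimer(ITIMER_REAL, &it, NULL) < 0)
        return false;

    /* Busy-wait for the signal, giving up well after the limit. */
    do {
        clock_gettime(CLOCK_MONOTONIC, &now);
    } while (!fired && elapsed_sec(&start, &now) < limit + 1.0);

    clock_gettime(CLOCK_MONOTONIC, &now);
    double e = elapsed_sec(&start, &now);
    return fired && e >= target && e <= limit;
}
```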
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: Olivier Langlois <olivier@trillion01.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • posix_cpu_timers: consolidate expired timers check · 2473f3e7
      Frederic Weisbecker authored
      Consolidate the code shared by the per-thread and per-process timer
      lists at tick time.
      
      List traversal, expiry check and subsequent updates can be shared in a
      common helper.
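      The shared helper can be sketched as follows, assuming a hypothetical array-based model_cpu_timer list sorted by expiry, with 0 meaning "no pending expiry". The same function serves both the per-thread and the per-process list; the kernel version walks linked lists and its details differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one timer list entry; lists are sorted. */
struct model_cpu_timer {
    uint64_t expires;
    int fired;
};

/* Shared helper: fire every timer whose expiry is <= now and return
 * the next pending expiry, or 0 if none remain. */
static uint64_t check_timers_list(struct model_cpu_timer *list, size_t n,
                                  uint64_t now)
{
    for (size_t i = 0; i < n; i++) {
        if (list[i].fired)
            continue;
        if (list[i].expires > now)
            return list[i].expires; /* sorted: later ones not expired */
        list[i].fired = 1;
    }
    return 0;
}
```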
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: Olivier Langlois <olivier@trillion01.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • posix_cpu_timers: consolidate timer list cleanups · 1a7fa510
      Frederic Weisbecker authored
      Cleaning up the posix cpu timers on task exit shares some common code
      among timer list types, most notably the list traversal and expiry time
      update.
      
      Unify this in a common helper.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: Olivier Langlois <olivier@trillion01.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • posix_cpu_timer: consolidate expiry time type · 55ccb616
      Frederic Weisbecker authored
      The posix cpu timer expiry time is stored in a union of two types: a
      64-bit field if we rely on precise scheduler accounting, or a cputime_t
      if we rely on jiffies.
      
      This results in quite a lot of duplicate code and special cases to
      handle the two types.
      
      Just unify this into a single 64-bit field.  cputime_t can always fit
      into it.
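      A sketch of why the single field removes the special cases, assuming a hypothetical model_cputime_t defined as unsigned long (at most 64 bits on Linux): the union needs a branch on every comparison, while the unified field compares directly.

```c
#include <stdint.h>

/* Hypothetical stand-in for cputime_t; the kernel's type varies. */
typedef unsigned long model_cputime_t;

/* Before: two representations, so each comparison needs a branch. */
union old_expiry {
    model_cputime_t cpu;   /* jiffies-based cputime */
    uint64_t sched;        /* scheduler nanoseconds */
};

static int old_expired(union old_expiry exp, union old_expiry now, int use_sched)
{
    return use_sched ? now.sched >= exp.sched : now.cpu >= exp.cpu;
}

/* After: one 64-bit field; a cputime value widens into it losslessly. */
static int new_expired(uint64_t exp, uint64_t now)
{
    return now >= exp;
}
```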
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: Olivier Langlois <olivier@trillion01.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  5. 02 Jul, 2013 3 commits
    • tick: Sanitize broadcast control logic · 07bd1172
      Thomas Gleixner authored
      The recent implementation of a generic dummy timer resulted in a
      different registration order of per cpu local timers which made the
      broadcast control logic go belly up.
      
      If the dummy timer is the first clock event device which is registered
      for a CPU, then it is installed, the broadcast timer is initialized
      and the CPU is marked as broadcast target.
      
      If a real clock event device is installed after that, we can fail to
      take the CPU out of the broadcast mask. In the worst case we end up
      with two periodic timer events firing for the same CPU. One from the
      per cpu hardware device and one from the broadcast.
      
      Now the problem is that we have no way to distinguish whether the
      system is in a state which makes broadcasting necessary or the
      broadcast bit was set due to the installation of the nonfunctional
      dummy timer.
      
      To solve this we need to keep track of the system state separately and
      provide more detailed logic to decide whether we keep the CPU in
      broadcast mode or not.
      
      The old decision logic only clears the broadcast mode if the newly
      installed clock event device is not affected by power states.
      
      The new logic clears the broadcast mode if one of the following is
      true:
      
        - The new device is not affected by power states.
      
        - The system is not in a power state affected mode.
      
        - The system has switched to oneshot mode. The oneshot broadcast is
          controlled from the deep idle state. The CPU is not in idle at
          this point, so it's safe to remove it from the mask.
      
      If we clear the broadcast bit for the CPU when a new device is
      installed, we also shut down the broadcast device when this was the
      last CPU in the broadcast mask.
      
      If the broadcast bit is kept, then we leave the new device in shutdown
      state and rely on the broadcast to deliver the timer interrupts via
      the broadcast IPIs.
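      The new decision can be expressed as a small predicate. This is a hypothetical model: the three booleans stand for the conditions listed above (the per-device one corresponds to the kernel's CLOCK_EVT_FEAT_C3STOP feature), and the CPU mask is a plain bitmask rather than a struct cpumask.

```c
#include <stdbool.h>

/* Hypothetical inputs to the decision described above. */
struct bc_state {
    bool dev_stops_in_deep_idle; /* device affected by power states */
    bool system_power_affected;  /* system in a power-state-affected mode */
    bool oneshot_mode;           /* system already switched to oneshot */
};

/* New logic: clear the CPU's broadcast bit if any condition holds.
 * The old logic checked only the first one. */
static bool should_clear_broadcast(const struct bc_state *s)
{
    return !s->dev_stops_in_deep_idle ||
           !s->system_power_affected ||
           s->oneshot_mode;
}

/* Apply the decision for 'cpu'; *shutdown reports whether the broadcast
 * device can be shut down because the last CPU just left the mask. */
static unsigned long update_broadcast_mask(unsigned long mask, int cpu,
                                           const struct bc_state *s,
                                           bool *shutdown)
{
    *shutdown = false;
    if (should_clear_broadcast(s)) {
        mask &= ~(1UL << cpu);
        *shutdown = (mask == 0);
    }
    return mask;
}
```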
      Reported-and-tested-by: Stehle Vincent-B46079 <B46079@freescale.com>
      Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1307012153060.4013@ionos.tec.linutronix.de
      Cc: stable@vger.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • tick: Prevent uncontrolled switch to oneshot mode · 1f73a980
      Thomas Gleixner authored
      When the system switches from periodic to oneshot mode, the broadcast
      logic makes it possible for a CPU which has not yet switched to
      oneshot mode to put its own clock event device into oneshot mode
      without updating the state and the timer handler.
      
      CPU0				CPU1
      				per cpu tickdev is in periodic mode
      				and switched to broadcast
      
      Switch to oneshot mode
       tick_broadcast_switch_to_oneshot()
        cpumask_copy(tick_oneshot_broacast_mask,
      	       tick_broadcast_mask);
      
        broadcast device mode = oneshot
      
      				Timer interrupt
      						
      				irq_enter()
      				 tick_check_oneshot_broadcast()
      				  dev->set_mode(ONESHOT);
      
      				tick_handle_periodic()
      				 if (dev->mode == ONESHOT)
      				   dev->next_event += period;
      				   FAIL.
      
      We fail, because dev->next_event contains KTIME_MAX, if the device was
      in periodic mode before the uncontrolled switch to oneshot happened.
      
      We must copy the broadcast bits over to the oneshot mask, because
      otherwise a CPU which relies on the broadcast would no longer be woken
      up after the broadcast device switched to oneshot mode.
      
      So we need to verify in tick_check_oneshot_broadcast() whether the CPU
      has already switched to oneshot mode. If not, leave the device
      untouched and let the CPU switch to oneshot mode in a controlled way.
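      A hypothetical model of the added check: the device is only reprogrammed on broadcast entry if the CPU has already completed its own switch to oneshot mode. The types and names are assumptions; in the kernel the state lives in the per-CPU tick device.

```c
#include <stdbool.h>

enum model_mode { MODE_PERIODIC, MODE_ONESHOT, MODE_SHUTDOWN };

/* Hypothetical per-CPU tick state: the device's hardware mode and
 * whether the CPU itself has completed the switch to oneshot. */
struct model_tickdev {
    enum model_mode dev_mode;
    bool cpu_in_oneshot;
};

/* Interrupt-entry check for a CPU in the oneshot broadcast mask. A CPU
 * still in periodic mode is left untouched so that its own, controlled
 * switch later updates state and handler together.
 * Returns true if the device was (re)programmed to oneshot. */
static bool model_check_oneshot_broadcast(struct model_tickdev *td)
{
    if (!td->cpu_in_oneshot)
        return false;
    td->dev_mode = MODE_ONESHOT;
    return true;
}
```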
      
      This is a long-standing bug, which was never noticed because x86, the
      main user of the broadcast, cannot run into that scenario, AFAICT. The
      nonarchitected timer mess of ARM creates a gazillion differently
      broken abominations which trigger the shortcomings of that broadcast
      code, which should never have been necessary in the first place.
      Reported-and-tested-by: Stehle Vincent-B46079 <B46079@freescale.com>
      Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1307012153060.4013@ionos.tec.linutronix.de
      Cc: stable@vger.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • tick: Make oneshot broadcast robust vs. CPU offlining · c9b5a266
      Thomas Gleixner authored
      In periodic mode we remove offline cpus from the broadcast propagation
      mask. In oneshot mode we fail to do so. This has not been a problem so
      far, but the recent changes to the broadcast propagation introduced a
      situation which can result in a NULL pointer dereference.
      
      What happens is:
      
      CPU0			CPU1
      			idle()
      			  arch_idle()
      			    tick_broadcast_oneshot_control(OFF);
      			      set cpu1 in tick_broadcast_force_mask
      			  if (cpu_offline())
      			     arch_cpu_dead()
      
      cpu_dead_cleanup(cpu1)
       cpu1 tickdevice pointer = NULL
      
      broadcast interrupt
        dereference cpu1 tickdevice pointer -> OOPS
      
      We dereference the pointer because cpu1 is still set in
      tick_broadcast_force_mask and tick_do_broadcast() expects a valid
      cpumask and therefore lacks any further checks.
      
      Remove the cpu from the tick_broadcast_force_mask before we set the
      tick device pointer to NULL. Also add a sanity check to the oneshot
      broadcast function, so we can detect such issues without crashing the
      machine.
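      The ordering fix and the sanity check can be modeled in userspace as follows. MODEL_NR_CPUS, the arrays and both functions are hypothetical; force_mask stands in for tick_broadcast_force_mask. Clearing the mask bit before dropping the device pointer means a later broadcast never visits the dead CPU, and the NULL check turns any remaining inconsistency into a counted warning instead of a crash.

```c
#include <stddef.h>

#define MODEL_NR_CPUS 4

struct model_clockevent { int events; };

static struct model_clockevent devs[MODEL_NR_CPUS];
static struct model_clockevent *tick_dev[MODEL_NR_CPUS];
static unsigned long force_mask;   /* models tick_broadcast_force_mask */
static int sanity_warnings;

/* CPU teardown: clear the force-mask bit *before* dropping the pointer. */
static void model_cpu_dead_cleanup(int cpu)
{
    force_mask &= ~(1UL << cpu);
    tick_dev[cpu] = NULL;
}

/* Broadcast to every CPU in 'mask'; warn and skip on a NULL device
 * instead of dereferencing it. */
static void model_do_broadcast(unsigned long mask)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        if (!(mask & (1UL << cpu)))
            continue;
        if (!tick_dev[cpu]) {
            sanity_warnings++;
            continue;
        }
        tick_dev[cpu]->events++;
    }
}
```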
      Reported-by: Prarit Bhargava <prarit@redhat.com>
      Cc: athorlton@sgi.com
      Cc: CAI Qian <caiqian@redhat.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1306261303260.4013@ionos.tec.linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  6. 30 Jun, 2013 6 commits
    • Linux 3.10 · 8bb495e3
      Linus Torvalds authored
    • Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc · f0277dce
      Linus Torvalds authored
      Pull another powerpc fix from Benjamin Herrenschmidt:
       "I mentioned that while we had fixed the kernel crashes, EEH error
        recovery didn't always recover...  It appears that I had a fix for
        that already in powerpc-next (with a stable CC).
      
        I cherry-picked it today and did a few tests and it seems that things
        now work quite well.  The patch is also pretty simple, so I see no
        reason to wait before merging it."
      
      * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
        powerpc/eeh: Fix fetching bus for single-dev-PE
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 4b483802
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "This is a set of seven bug fixes.  Several fcoe fixes for locking
        problems, initiator issues and a VLAN API change, all of which could
        eventually lead to data corruption, one fix for a qla2xxx locking
        problem which could lead to multiple completions of the same request
        (and subsequent data corruption) and a use after free in the ipr
        driver.  Plus one minor MAINTAINERS file update"
      
      (only six bugfixes in this pull, since I had already pulled the fcoe API
      fix directly from Robert Love)
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        [SCSI] ipr: Avoid target_destroy accessing memory after it was freed
        [SCSI] qla2xxx: Fix for locking issue between driver ISR and mailbox routines
        MAINTAINERS: Fix fcoe mailing list
        libfc: extend ex_lock to protect all of fc_seq_send
        libfc: Correct check for initiator role
        libfcoe: Fix Conflicting FCFs issue in the fabric
    • powerpc/eeh: Fix fetching bus for single-dev-PE · ea461abf
      Gavin Shan authored
      While running Linux as a guest on top of phyp, we may have a
      PE that contains a single PCI device. However, we didn't return
      its PCI bus correctly, which leads to failure when recovering from
      EEH errors for a single-dev-PE. This patch fixes the issue.
      
      Cc: <stable@vger.kernel.org> # v3.7+
      Cc: Steve Best <sbest@us.ibm.com>
      Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc · 6c355bea
      Linus Torvalds authored
      Pull powerpc fixes from Ben Herrenschmidt:
       "We discovered some breakage in our "EEH" (PCI Error Handling) code
        while doing error injection, due to a couple of regressions.  One of
        them is due to a patch (37f02195 "powerpc/pci: fix PCI-e devices
        rescan issue on powerpc platform") that, in hindsight, I shouldn't
        have merged considering that it caused more problems than it solved.
      
        Please pull those two fixes.  One for a simple EEH address cache
        initialization issue.  The other one is a patch from Guenter that I
        had originally planned to put in 3.11 but which happens to also fix
        that other regression (a kernel oops during EEH error handling and
        possibly hotplug).
      
        With those two, the couple of test machines I've hammered with error
        injection are remaining up now.  EEH appears to still fail to recover
        on some devices, so there is another problem that Gavin is looking
        into but at least it's no longer crashing the kernel."
      
      * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
        powerpc/pci: Improve device hotplug initialization
        powerpc/eeh: Add eeh_dev to the cache during boot
    • ARM: dt: Only print warning, not WARN() on bad cpu map in device tree · 8d5bc1a6
      Olof Johansson authored
      Due to recent changes and expectations of proper cpu bindings, a WARN()
      now hits on boot for many of the in-tree devicetrees because of badly
      formatted /cpus nodes.
      
      Downgrade this to a pr_warn() to be less alarmist, since it's not a
      new problem.
      
      Tested on Arndale, Cubox, Seaboard and Panda ES. Panda hits the WARN
      without this, the others do not.
      Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Olof Johansson <olof@lixom.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 29 Jun, 2013 11 commits
  8. 28 Jun, 2013 9 commits
    • x86: xen: Sync the CMOS RTC as well as the Xen wallclock · 47433b8c
      David Vrabel authored
      Adjustments to Xen's persistent clock via update_persistent_clock()
      don't actually persist, as the Xen wallclock is a software only clock
      and modifications to it do not modify the underlying CMOS RTC.
      
      The x86_platform.set_wallclock hook is there to keep the hardware RTC
      synchronized. On a guest this is pointless.
      
      On Dom0 we can use the native implementation, which actually updates the
      hardware RTC, but we still need to keep the software emulation of RTC
      for the guests up to date. The subscription to the pvclock_notifier
      allows us to emulate this easily. The notifier is called at every tick
      and when the clock was set.
      
      Right now we only use that notifier when the clock was set, but because
      it is called periodically from the timekeeping update code, we can
      utilize it to emulate the NTP-driven drift compensation of
      update_persistent_clock() for the Xen wall (software) clock.
      
      Add an 11-minute periodic update to the pvclock_gtod notifier callback
      to achieve that. The static variable 'next', which maintains that
      11-minute update cycle, is protected by the core code serialization, so
      there is no need to add a Xen-specific serialization mechanism.
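      The notifier logic can be sketched like this, assuming a hypothetical callback that receives a was_set flag and the current time in nanoseconds; the counter stands in for the hypercall that updates the Xen wallclock. An update happens when the clock was stepped or when the 11-minute interval has elapsed.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NSEC_PER_SEC 1000000000ULL
#define SYNC_INTERVAL_NS (11ULL * 60 * MODEL_NSEC_PER_SEC)

static int wallclock_updates;  /* counts the (expensive) hypercalls */

/* Hypothetical notifier callback. 'next' needs no lock here because
 * callers are assumed to be serialized by the core code. */
static void model_pvclock_gtod_notify(bool was_set, uint64_t now)
{
    static uint64_t next;

    if (!was_set && now < next)
        return;                 /* ordinary tick: nothing to do */

    wallclock_updates++;        /* stands in for the wallclock sync */
    next = now + SYNC_INTERVAL_NS;
}
```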
      
      [ tglx: Massaged changelog and added a few comments ]
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: <xen-devel@lists.xen.org>
      Link: http://lkml.kernel.org/r/1372329348-20841-6-git-send-email-david.vrabel@citrix.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: xen: Sync the wallclock when the system time is set · 5584880e
      David Vrabel authored
      Currently the Xen wallclock is only updated every 11 minutes if NTP is
      synchronized to its clock source (using the sync_cmos_clock() work).
      If a guest is started before NTP is synchronized it may see an
      incorrect wallclock time.
      
      Use the pvclock_gtod notifier chain to receive a notification when the
      system time has changed and update the wallclock to match.
      
      This chain is called on every timer tick and we want to avoid an extra
      (expensive) hypercall on every tick.  Because dom0 has historically
      never provided a very accurate wallclock and guests do not expect one,
      we can do this simply: the wallclock is only updated if the clock was
      set.
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: <xen-devel@lists.xen.org>
      Link: http://lkml.kernel.org/r/1372329348-20841-5-git-send-email-david.vrabel@citrix.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • timekeeping: Indicate that clock was set in the pvclock gtod notifier · 780427f0
      David Vrabel authored
      If the clock was set (stepped), set the action parameter to functions
      in the pvclock gtod notifier chain to non-zero.  This allows the
      callee to only do work if the clock was stepped.
      
      This will be used on Xen as the synchronization of the Xen wallclock
      to the control domain's (dom0) system time will be done with this
      notifier and updating on every timer tick is unnecessary and too
      expensive.
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: <xen-devel@lists.xen.org>
      Link: http://lkml.kernel.org/r/1372329348-20841-4-git-send-email-david.vrabel@citrix.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • timekeeping: Pass flags instead of multiple bools to timekeeping_update() · 04397fe9
      David Vrabel authored
      Instead of passing multiple bools to timekeeping_update(), define
      flags and use a single 'action' parameter.  It is then more obvious
      what each timekeeping_update() call does.
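      A sketch of the readability gain, using hypothetical TK_FLAG_* names (the kernel's actual flag names may differ): the bool version says little at the call site, while the flag version spells out what each call requests.

```c
#include <stdbool.h>

/* Hypothetical flag names for this sketch. */
#define TK_FLAG_CLEAR_NTP     (1u << 0)
#define TK_FLAG_MIRROR        (1u << 1)
#define TK_FLAG_CLOCK_WAS_SET (1u << 2)

/* Records what an update call was asked to do. */
struct tk_update_log {
    bool ntp_cleared;
    bool mirrored;
    bool notified_was_set;
};

/* Before: update(log, true, false) is opaque at the call site. */
static void update_bools(struct tk_update_log *log, bool clearntp, bool mirror)
{
    log->ntp_cleared = clearntp;
    log->mirrored = mirror;
    log->notified_was_set = false;
}

/* After: one 'action' word; nonzero bits convert to true. */
static void update_flags(struct tk_update_log *log, unsigned int action)
{
    log->ntp_cleared = action & TK_FLAG_CLEAR_NTP;
    log->mirrored = action & TK_FLAG_MIRROR;
    log->notified_was_set = action & TK_FLAG_CLOCK_WAS_SET;
}
```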
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: <xen-devel@lists.xen.org>
      Link: http://lkml.kernel.org/r/1372329348-20841-3-git-send-email-david.vrabel@citrix.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • David Vrabel
      xen: Remove clock_was_set() call in the resume path · 0eb07165
      David Vrabel authored
      commit 359cdd3f (xen: maintain clock offset over save/restore) added
      a clock_was_set() call to the xen resume code to propagate the
      system time changes. With the modified hrtimer resume code, which
      makes sure that all CPUs are notified, this call is no longer necessary.
      
      [ tglx: Separated it from the hrtimer change ]
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: <xen-devel@lists.xen.org>
      Link: http://lkml.kernel.org/r/1372329348-20841-2-git-send-email-david.vrabel@citrix.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • David Vrabel
      hrtimers: Support resuming with two or more CPUs online (but stopped) · 7c4c3a0f
      David Vrabel authored
      hrtimers_resume() only reprograms the timers for the current CPU as it
      assumes that all other CPUs are offline at this point in the resume
      process. If other CPUs are online then their timers will not be
      corrected and they may fire at the wrong time.
      
      When running as a Xen guest, this assumption is not true.  Non-boot
      CPUs are only stopped with IRQs disabled rather than being offlined.
      This is a performance optimization, as offlining the CPUs would add an
      unacceptable amount of downtime during a live migration (more than
      200 ms for a 4-VCPU guest).
      
      hrtimers_resume() cannot call on_each_cpu(retrigger_next_event,...)
      as the other CPUs will be stopped with IRQs disabled.  Instead, defer
      the call to the next softirq.
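      A userspace simulation of the deferral, under simplifying assumptions
      (a fixed CPU count, per-CPU state modelled as a bool array, and a plain
      function standing in for the softirq): the resume path reprograms only
      the local CPU and sets a pending flag, and the next softirq run
      performs the all-CPU retrigger. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_NR_CPUS 4

/* Simulated per-CPU state: has this CPU's next event been reprogrammed? */
static bool cpu_reprogrammed[SIM_NR_CPUS];
/* Set by the resume path, consumed by the next simulated softirq. */
static bool retrigger_pending;

static void sim_retrigger_next_event(int cpu)
{
	cpu_reprogrammed[cpu] = true;
}

/* Other CPUs are stopped with IRQs disabled, so a cross-CPU call could not
 * complete here: reprogram the local CPU and defer the rest. */
static void sim_hrtimers_resume(int this_cpu)
{
	assert(this_cpu >= 0 && this_cpu < SIM_NR_CPUS);
	sim_retrigger_next_event(this_cpu);
	retrigger_pending = true;
}

/* Runs later, once all CPUs are executing again. */
static void sim_softirq(void)
{
	if (!retrigger_pending)
		return;
	retrigger_pending = false;
	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		sim_retrigger_next_event(cpu);	/* stand-in for on_each_cpu() */
}
```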
      
      [ tglx: Separated the xen change out ]
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: <xen-devel@lists.xen.org>
      Link: http://lkml.kernel.org/r/1372329348-20841-2-git-send-email-david.vrabel@citrix.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • Akira Takeuchi
      mn10300: Use early_param() to parse "mem=" parameter · e3f12a53
      Akira Takeuchi authored
      This fixes a problem where the "init=" option may not be passed to the
      kernel correctly.
      
      The mn10300 parse_mem_cmdline() removes the "mem=" string from
      redboot_command_line. init_setup() then parses the "init=" option from
      static_command_line, which is a copy of redboot_command_line, and stores
      a pointer to the init option in the execute_command variable.
      
      Since upstream commit 026cee00 (params: <level>_initcall-like kernel
      parameters), static_command_line is overwritten by saved_command_line in
      do_initcall_level(). Note that saved_command_line is a command line
      that still includes the "mem=" string.
      
      As a result, execute_command may point to a garbage string, offset by
      the length of the "mem=" parameter.
      I noticed this problem when using a command line like this:
      
          mem=128M console=ttyS0,115200 init=/bin/sh
      
      Here is the processing flow of command line parameters.
          start_kernel()
            setup_arch(&command_line)
               parse_mem_cmdline(cmdline_p)
                 * strcpy(boot_command_line, redboot_command_line);
                 * Remove "mem=xxx" from redboot_command_line.
                 * *cmdline_p = redboot_command_line;
            setup_command_line(command_line) <-- command_line is redboot_command_line
              * strcpy(saved_command_line, boot_command_line)
              * strcpy(static_command_line, command_line)
            parse_early_param()
              strlcpy(tmp_cmdline, boot_command_line, COMMAND_LINE_SIZE);
              parse_early_options(tmp_cmdline);
                parse_args("early options", cmdline, NULL, 0, 0, 0, do_early_param);
            parse_args("Booting ..", static_command_line, ...);
              init_setup() <-- save the pointer in execute_command
            rest_init()
              kernel_thread(kernel_init, NULL, CLONE_FS | CLONE_SIGHAND);
      
      At this point, execute_command points to the string "/bin/sh".
      
          kernel_init()
            kernel_init_freeable()
              do_basic_setup()
                do_initcalls()
                  do_initcall_level()
                    (*) strcpy(static_command_line, saved_command_line);
      
      Here, execute_command ends up pointing to the string "200"!
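      The stale pointer can be reproduced in a few lines of userspace C. This
      is a simplified model with hypothetical names: one buffer stands in for
      static_command_line, and replacing spaces with NULs stands in for the
      in-place tokenizing that parse_args() does.

```c
#include <assert.h>
#include <string.h>

#define SIM_CMDLINE_SIZE 256

/* Single buffer standing in for static_command_line. */
static char sim_static_cmdline[SIM_CMDLINE_SIZE];

/* Split on spaces in place, roughly as an in-place argument parser would. */
static void sim_tokenize(char *s)
{
	for (; *s; s++)
		if (*s == ' ')
			*s = '\0';
}

/* Return what a pointer to the "init=" value refers to after the buffer
 * is overwritten with the full (unstripped) command line. */
static const char *sim_stale_init_ptr(const char *full, const char *stripped)
{
	assert(strlen(full) < SIM_CMDLINE_SIZE);
	assert(strlen(stripped) < SIM_CMDLINE_SIZE);

	strcpy(sim_static_cmdline, stripped);	/* setup_command_line() */
	char *init = strstr(sim_static_cmdline, "init=");
	assert(init != NULL);
	const char *execute_command = init + strlen("init=");	/* init_setup() */

	strcpy(sim_static_cmdline, full);	/* do_initcall_level() */
	sim_tokenize(sim_static_cmdline);	/* parse_args() on the new copy */
	return execute_command;			/* same offset, new contents */
}
```

      With full = "mem=128M console=ttyS0,115200 init=/bin/sh" and the
      stripped line "console=ttyS0,115200 init=/bin/sh", the returned pointer
      lands inside "115200" and reads "200". Passing the same string for both
      arguments models a command line from which "mem=" was not removed; the
      pointer then still refers to "/bin/sh".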
      Signed-off-by: David Howells <dhowells@redhat.com>
    • Akira Takeuchi
      mn10300: Allow to pass array name to get_user() · c6dc9f0a
      Akira Takeuchi authored
      This fixes the following compile error:
      
      CC block/scsi_ioctl.o
      block/scsi_ioctl.c: In function 'sg_scsi_ioctl':
      block/scsi_ioctl.c:449: error: invalid initializer
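      The error is consistent with a uaccess macro that declares a local
      variable of type __typeof__(ptr): when ptr is an array, that is an
      array type, and initializing an array from a pointer is invalid. One
      common array-safe pattern, sketched here with a hypothetical
      read_first() macro (GNU statement expressions, so GCC or Clang), is to
      declare a pointer to the element type instead. This illustrates the
      technique and is not necessarily the exact mn10300 change.

```c
#include <assert.h>

/*
 * Problematic shape (not compiled here): with an array argument,
 * __typeof__(ptr) is an array type, so the initializer is invalid:
 *   ({ __typeof__(ptr) p_ = (ptr); *p_; })
 *
 * Array-safe shape: take the element type and declare a pointer to it.
 * An array argument decays to a pointer in the initializer, and plain
 * pointer arguments keep working.
 */
#define read_first(ptr) \
	({ const __typeof__(*(ptr)) *rf_p_ = (ptr); *rf_p_; })

/* Hypothetical structure with an array member, the kind of argument
 * that trips the __typeof__(ptr) shape. */
struct sim_cmd {
	unsigned int inlen;
	unsigned char data[8];
};
```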
      Signed-off-by: David Howells <dhowells@redhat.com>
    • Bart Van Assche
      timer: Fix jiffies wrap behavior of round_jiffies_common() · 9e04d380
      Bart Van Assche authored
      A direct comparison of jiffies-related values does not work when
      jiffies wraps around. Replace it with time_is_after_jiffies().
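      A sketch of the wrap-safe comparison, modelled on the kernel's
      time_after() idiom and using hypothetical helper names: subtracting the
      unsigned counters and reading the result as signed gives the right
      answer across a wrap, while a plain > does not. sim_keep_if_future() is
      a hypothetical analogue of the "keep the rounded value only if it is
      still in the future" check a helper like round_jiffies_common() makes.

```c
#include <assert.h>
#include <limits.h>

/* Wrap-safe "a is after b" for free-running unsigned tick counters, in the
 * style of the kernel's time_after(). The unsigned difference is read as
 * signed; like the kernel, this relies on two's-complement conversion. */
static int sim_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Direct comparison: wrong once the counter has wrapped past ULONG_MAX. */
static int sim_naive_after(unsigned long a, unsigned long b)
{
	return a > b;
}

/* Hypothetical final step of a rounding helper: keep the rounded value j
 * only if it is still in the future relative to 'now'. */
static unsigned long sim_keep_if_future(unsigned long j, unsigned long original,
					unsigned long now)
{
	return sim_time_after(j, now) ? j : original;
}
```

      For example, with now = ULONG_MAX - 5, the value now + 10 wraps to 4:
      sim_time_after(4, now) is true, while sim_naive_after(4, now) is false.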
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Link: http://lkml.kernel.org/r/519BC066.5080600@acm.org
      Cc: stable@vger.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>