  1. 22 Feb, 2012 1 commit
    • sched/events: Revert trace_sched_stat_sleeptime() · 8c79a045
      Peter Zijlstra authored
      Commit 1ac9bc69 ("sched/tracing: Add a new tracepoint for sleeptime")
      added a new sched:sched_stat_sleeptime tracepoint.
      
      It's broken: the first sample we get on a task might be bad because
      of a stale sleep_start value that wasn't reset at the last task switch
      because the tracepoint was not active.
      
      It also breaks the existing schedstat samples due to the side
      effects of:
      
      -               se->statistics.sleep_start = 0;
      ...
      -               se->statistics.block_start = 0;
      
      Nor do I see a way to fix it without adding overhead to the scheduler
      fast path, which I'm not willing to do for the sake of redundant
      instrumentation.
      
      Most importantly, sleep time information can already be constructed
      by tracing context switches and wakeups, and taking the timestamp
      difference between the schedule-out, the wakeup and the schedule-in.
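      A minimal sketch of that reconstruction, using a made-up, simplified
      event record (enum ev_type / struct event) rather than the kernel's
      actual sched_switch/sched_wakeup tracepoint formats:

        #include <stdio.h>

        /* Hypothetical, simplified trace records: real sched_switch/sched_wakeup
         * events carry more fields, but timestamps and prev_state are all the
         * reconstruction needs. */
        enum ev_type { SWITCH_OUT, WAKEUP, SWITCH_IN };

        struct event {
                enum ev_type type;
                unsigned long long ts_ns;   /* event timestamp in ns */
                int prev_state;             /* SWITCH_OUT only: 0 == still runnable */
        };

        int main(void)
        {
                /* One task's life as the tracer sees it: it blocks at t=1000,
                 * is woken at t=6000 and gets back on the CPU at t=6500. */
                struct event trace[] = {
                        { SWITCH_OUT, 1000, 2 },    /* non-zero prev_state: a real sleep */
                        { WAKEUP,     6000, 0 },
                        { SWITCH_IN,  6500, 0 },
                };
                unsigned long long sleep_start = 0, wakeup_ts = 0;
                unsigned int i;

                for (i = 0; i < sizeof(trace) / sizeof(trace[0]); i++) {
                        switch (trace[i].type) {
                        case SWITCH_OUT:
                                if (trace[i].prev_state)    /* only genuine sleeps count */
                                        sleep_start = trace[i].ts_ns;
                                break;
                        case WAKEUP:
                                wakeup_ts = trace[i].ts_ns;
                                break;
                        case SWITCH_IN:
                                if (sleep_start) {
                                        printf("sleep time : %llu ns\n", wakeup_ts - sleep_start);
                                        printf("wakeup lat : %llu ns\n", trace[i].ts_ns - wakeup_ts);
                                        sleep_start = 0;
                                }
                                break;
                        }
                }
                return 0;
        }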
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrew Vagin <avagin@openvz.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/n/tip-pc4c9qhl8q6vg3bs4j6k0rbd@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 26 Jan, 2012 2 commits
    • sched/s390: Fix compile error in sched/core.c · db7e527d
      Christian Borntraeger authored
      Commit 029632fb ("sched: Make
      separate sched*.c translation units") removed the include of
      asm/mutex.h from sched.c.
      
      This breaks the combination of:
      
       CONFIG_MUTEX_SPIN_ON_OWNER=yes
       CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX=yes
      
      as on s390 without mutex debugging:
      
        CC      kernel/sched/core.o
        kernel/sched/core.c: In function ‘mutex_spin_on_owner’:
        kernel/sched/core.c:3287: error: implicit declaration of function ‘arch_mutex_cpu_relax’
      
      Let's re-add the include to kernel/sched/core.c.
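      The fix is a one-liner; a sketch of the restored include (its exact
      placement in the file may differ from the actual commit):

        /* kernel/sched/core.c: bring back the arch mutex header so that
         * arch_mutex_cpu_relax() is declared when CONFIG_MUTEX_SPIN_ON_OWNER
         * and CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX are both set. */
        #include <asm/mutex.h>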
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1326268696-30904-1-git-send-email-borntraeger@de.ibm.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Fix rq->nr_uninterruptible update race · 4ca9b72b
      Peter Zijlstra authored
      KOSAKI Motohiro noticed the following race:
      
       > CPU0                    CPU1
       > --------------------------------------------------------
       > deactivate_task()
       >                         task->state = TASK_UNINTERRUPTIBLE;
       > activate_task()
       >    rq->nr_uninterruptible--;
       >
       >                         schedule()
       >                           deactivate_task()
       >                             rq->nr_uninterruptible++;
       >
      
      Kosaki-San's scenario is possible when CPU0 runs
      __sched_setscheduler() against CPU1's current @task.
      
      __sched_setscheduler() does a dequeue/enqueue in order to move
      the task to its new queue (position) to reflect the newly provided
      scheduling parameters. However, it should be completely invariant with
      respect to nr_uninterruptible accounting: sched_setscheduler() doesn't
      affect readiness to run, merely the policy on when to run.
      
      So convert the inappropriate activate/deactivate_task usage to
      enqueue/dequeue_task, which avoids the nr_uninterruptible accounting.
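      A sketch of that substitution in __sched_setscheduler(), assuming the
      3.3-era helper signatures; an illustrative fragment, not the exact
      upstream diff:

        /* Fragment for illustration only. dequeue_task()/enqueue_task() move
         * the task off and back onto its runqueue without touching
         * rq->nr_uninterruptible; the old activate_task()/deactivate_task()
         * calls did the load accounting as well. */
        on_rq = p->on_rq;
        if (on_rq)
                dequeue_task(rq, p, 0);         /* was: deactivate_task(rq, p, 0) */

        /* ... switch the task to its new policy / priority ... */

        if (on_rq)
                enqueue_task(rq, p, 0);         /* was: activate_task(rq, p, 0) */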
      
      Also convert the two other sites that still use activate/deactivate_task:
      __migrate_task() and normalize_task(). These sites aren't really a
      problem, since __migrate_task() is only ever called on a non-running
      task (and is therefore immune to the described problem) and
      normalize_task() isn't ever used on regular systems.
      
      Also remove the comments from activate/deactivate_task since they're
      misleading at best.
      Reported-by: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1327486224.2614.45.camel@laptop
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 10 Jan, 2012 1 commit
  4. 23 Dec, 2011 2 commits
  5. 21 Dec, 2011 2 commits
  6. 16 Dec, 2011 1 commit
  7. 06 Dec, 2011 10 commits
  8. 17 Nov, 2011 2 commits
  9. 16 Nov, 2011 3 commits
  10. 14 Nov, 2011 2 commits
  11. 06 Oct, 2011 5 commits
  12. 04 Oct, 2011 2 commits
  13. 30 Sep, 2011 1 commit
    • posix-cpu-timers: Cure SMP wobbles · d670ec13
      Peter Zijlstra authored
      David reported:
      
        Attached below is a watered-down version of rt/tst-cpuclock2.c from
        GLIBC.  Just build it with "gcc -o test test.c -lpthread -lrt" or
        similar.
      
        Run it several times, and you will see cases where the main thread
        will measure a process clock difference before and after the nanosleep
        which is smaller than the cpu-burner thread's individual thread clock
        difference.  This doesn't make any sense since the cpu-burner thread
        is part of the top-level process's thread group.
      
        I've reproduced this on both x86-64 and sparc64 (using both 32-bit and
        64-bit binaries).
      
        For example:
      
        [davem@boricha build-x86_64-linux]$ ./test
        process: before(0.001221967) after(0.498624371) diff(497402404)
        thread:  before(0.000081692) after(0.498316431) diff(498234739)
        self:    before(0.001223521) after(0.001240219) diff(16698)
        [davem@boricha build-x86_64-linux]$ 
      
        The diff of 'process' should always be >= the diff of 'thread'.
      
        I make sure to wrap the 'thread' clock measurements the most tightly
        around the nanosleep() call, and that the 'process' clock measurements
        are the outer-most ones.
      
        ---
        #include <unistd.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>
        #include <fcntl.h>
        #include <string.h>
        #include <errno.h>
        #include <pthread.h>
      
        static pthread_barrier_t barrier;
      
        static void *chew_cpu(void *arg)
        {
      	  pthread_barrier_wait(&barrier);
      	  while (1)
      		  __asm__ __volatile__("" : : : "memory");
      	  return NULL;
        }
      
        int main(void)
        {
      	  clockid_t process_clock, my_thread_clock, th_clock;
      	  struct timespec process_before, process_after;
      	  struct timespec me_before, me_after;
      	  struct timespec th_before, th_after;
      	  struct timespec sleeptime;
      	  unsigned long diff;
      	  pthread_t th;
      	  int err;
      
      	  err = clock_getcpuclockid(0, &process_clock);
      	  if (err)
      		  return 1;
      
      	  err = pthread_getcpuclockid(pthread_self(), &my_thread_clock);
      	  if (err)
      		  return 1;
      
      	  pthread_barrier_init(&barrier, NULL, 2);
      	  err = pthread_create(&th, NULL, chew_cpu, NULL);
      	  if (err)
      		  return 1;
      
      	  err = pthread_getcpuclockid(th, &th_clock);
      	  if (err)
      		  return 1;
      
      	  pthread_barrier_wait(&barrier);
      
      	  err = clock_gettime(process_clock, &process_before);
      	  if (err)
      		  return 1;
      
      	  err = clock_gettime(my_thread_clock, &me_before);
      	  if (err)
      		  return 1;
      
      	  err = clock_gettime(th_clock, &th_before);
      	  if (err)
      		  return 1;
      
      	  sleeptime.tv_sec = 0;
      	  sleeptime.tv_nsec = 500000000;
      	  nanosleep(&sleeptime, NULL);
      
      	  err = clock_gettime(th_clock, &th_after);
      	  if (err)
      		  return 1;
      
      	  err = clock_gettime(my_thread_clock, &me_after);
      	  if (err)
      		  return 1;
      
      	  err = clock_gettime(process_clock, &process_after);
      	  if (err)
      		  return 1;
      
      	  diff = process_after.tv_nsec - process_before.tv_nsec;
      	  printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
      		 process_before.tv_sec, process_before.tv_nsec,
      		 process_after.tv_sec, process_after.tv_nsec, diff);
      	  diff = th_after.tv_nsec - th_before.tv_nsec;
      	  printf("thread:  before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
      		 th_before.tv_sec, th_before.tv_nsec,
      		 th_after.tv_sec, th_after.tv_nsec, diff);
      	  diff = me_after.tv_nsec - me_before.tv_nsec;
      	  printf("self:    before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
      		 me_before.tv_sec, me_before.tv_nsec,
      		 me_after.tv_sec, me_after.tv_nsec, diff);
      
      	  return 0;
        }
      
      This is due to us using p->se.sum_exec_runtime in
      thread_group_cputime() where we iterate the thread group and sum all
      data. This does not take time since the last schedule operation (tick
      or otherwise) into account. We can cure this by using
      task_sched_runtime() at the cost of having to take locks.
      
      This also means we can (and must) do away with
      thread_group_sched_runtime() since the modified thread_group_cputime()
      is now more accurate and would deadlock when called from
      thread_group_sched_runtime().
      
      Aside from that, it makes the function safe on 32-bit systems: the old
      code added t->se.sum_exec_runtime unprotected, and sum_exec_runtime is
      a 64-bit value that could be changed on another CPU at the same time.
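      A sketch of the idea behind the fix, assuming the 3.1-era
      thread_group_cputime() summation loop; an illustrative fragment, not
      the exact upstream diff:

        /* Fragment for illustration only. task_sched_runtime() takes the
         * thread's runqueue lock and folds in the time run since the last
         * scheduler update, so the sum is current and the 64-bit counter is
         * not read torn on 32-bit systems. */
        t = tsk;
        do {
                times->utime = cputime_add(times->utime, t->utime);
                times->stime = cputime_add(times->stime, t->stime);
                times->sum_exec_runtime += task_sched_runtime(t);  /* was: t->se.sum_exec_runtime */
        } while_each_thread(tsk, t);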
      Reported-by: David Miller <davem@davemloft.net>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: stable@kernel.org
      Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins
      Tested-by: David Miller <davem@davemloft.net>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  14. 29 Sep, 2011 2 commits
    • nohz: Remove nohz_cpu_mask · fc0763f5
      Shi, Alex authored
      RCU no longer uses this global variable, nor does anyone else.  This
      commit therefore removes this variable.  This reduces memory footprint
      and also removes some atomic instructions and memory barriers from
      the dyntick-idle path.
      Signed-off-by: Alex Shi <alex.shi@intel.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Restore checks for blocking in RCU read-side critical sections · b3fbab05
      Paul E. McKenney authored
      Long ago, using TREE_RCU with PREEMPT would result in "scheduling
      while atomic" diagnostics if you blocked in an RCU read-side critical
      section.  However, PREEMPT now implies TREE_PREEMPT_RCU, which defeats
      this diagnostic.  This commit therefore adds a replacement diagnostic
      based on PROVE_RCU.
      
      Because rcu_lockdep_assert() and lockdep_rcu_dereference() are now being
      used for things that have nothing to do with rcu_dereference(), rename
      lockdep_rcu_dereference() to lockdep_rcu_suspicious() and add a third
      argument that is a string indicating what is suspicious.  This third
      argument is passed in from a new third argument to rcu_lockdep_assert().
      Update all calls to rcu_lockdep_assert() to add an informative third
      argument.
      
      Also, add a pair of rcu_lockdep_assert() calls from within
      rcu_note_context_switch(), one complaining if a context switch occurs
      in an RCU-bh read-side critical section and another complaining if a
      context switch occurs in an RCU-sched read-side critical section.
      These are present only if the PROVE_RCU kernel parameter is enabled.
      
      Finally, fix some checkpatch whitespace complaints in lockdep.c.
      
      Again, you must enable PROVE_RCU to see these new diagnostics.  But you
      are enabling PROVE_RCU to check out new RCU uses in any case, aren't you?
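      For illustration, a hypothetical snippet of the kind of bug the restored
      diagnostic flags: with TREE_PREEMPT_RCU, rcu_read_lock() no longer
      disables preemption, so the old "scheduling while atomic" warning stays
      silent, but the PROVE_RCU-based check now reports the block ('gp' and
      'p' are made-up names):

        rcu_read_lock();
        p = rcu_dereference(gp);        /* 'gp' is a hypothetical RCU-protected pointer */
        msleep(10);                     /* BUG: blocking inside an RCU read-side critical section */
        rcu_read_unlock();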
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  15. 27 Sep, 2011 1 commit
  16. 26 Sep, 2011 1 commit
  17. 20 Sep, 2011 1 commit
  18. 08 Sep, 2011 1 commit
    • posix-cpu-timers: Cure SMP accounting oddities · e8abccb7
      Peter Zijlstra authored
      David reported:
      
        Attached below is a watered-down version of rt/tst-cpuclock2.c from
        GLIBC.  Just build it with "gcc -o test test.c -lpthread -lrt" or
        similar.
      
        Run it several times, and you will see cases where the main thread
        will measure a process clock difference before and after the nanosleep
        which is smaller than the cpu-burner thread's individual thread clock
        difference.  This doesn't make any sense since the cpu-burner thread
        is part of the top-level process's thread group.
      
        I've reproduced this on both x86-64 and sparc64 (using both 32-bit and
        64-bit binaries).
      
        For example:
      
        [davem@boricha build-x86_64-linux]$ ./test
        process: before(0.001221967) after(0.498624371) diff(497402404)
        thread:  before(0.000081692) after(0.498316431) diff(498234739)
        self:    before(0.001223521) after(0.001240219) diff(16698)
        [davem@boricha build-x86_64-linux]$
      
        The diff of 'process' should always be >= the diff of 'thread'.
      
        I make sure to wrap the 'thread' clock measurements the most tightly
        around the nanosleep() call, and that the 'process' clock measurements
        are the outer-most ones.
      
        ---
        #include <unistd.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>
        #include <fcntl.h>
        #include <string.h>
        #include <errno.h>
        #include <pthread.h>
      
        static pthread_barrier_t barrier;
      
        static void *chew_cpu(void *arg)
        {
      	  pthread_barrier_wait(&barrier);
      	  while (1)
      		  __asm__ __volatile__("" : : : "memory");
      	  return NULL;
        }
      
        int main(void)
        {
      	  clockid_t process_clock, my_thread_clock, th_clock;
      	  struct timespec process_before, process_after;
      	  struct timespec me_before, me_after;
      	  struct timespec th_before, th_after;
      	  struct timespec sleeptime;
      	  unsigned long diff;
      	  pthread_t th;
      	  int err;
      
      	  err = clock_getcpuclockid(0, &process_clock);
      	  if (err)
      		  return 1;
      
      	  err = pthread_getcpuclockid(pthread_self(), &my_thread_clock);
      	  if (err)
      		  return 1;
      
      	  pthread_barrier_init(&barrier, NULL, 2);
      	  err = pthread_create(&th, NULL, chew_cpu, NULL);
      	  if (err)
      		  return 1;
      
      	  err = pthread_getcpuclockid(th, &th_clock);
      	  if (err)
      		  return 1;
      
      	  pthread_barrier_wait(&barrier);
      
      	  err = clock_gettime(process_clock, &process_before);
      	  if (err)
      		  return 1;
      
      	  err = clock_gettime(my_thread_clock, &me_before);
      	  if (err)
      		  return 1;
      
      	  err = clock_gettime(th_clock, &th_before);
      	  if (err)
      		  return 1;
      
      	  sleeptime.tv_sec = 0;
      	  sleeptime.tv_nsec = 500000000;
      	  nanosleep(&sleeptime, NULL);
      
      	  err = clock_gettime(th_clock, &th_after);
      	  if (err)
      		  return 1;
      
      	  err = clock_gettime(my_thread_clock, &me_after);
      	  if (err)
      		  return 1;
      
      	  err = clock_gettime(process_clock, &process_after);
      	  if (err)
      		  return 1;
      
      	  diff = process_after.tv_nsec - process_before.tv_nsec;
      	  printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
      		 process_before.tv_sec, process_before.tv_nsec,
      		 process_after.tv_sec, process_after.tv_nsec, diff);
      	  diff = th_after.tv_nsec - th_before.tv_nsec;
      	  printf("thread:  before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
      		 th_before.tv_sec, th_before.tv_nsec,
      		 th_after.tv_sec, th_after.tv_nsec, diff);
      	  diff = me_after.tv_nsec - me_before.tv_nsec;
      	  printf("self:    before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
      		 me_before.tv_sec, me_before.tv_nsec,
      		 me_after.tv_sec, me_after.tv_nsec, diff);
      
      	  return 0;
        }
      
      This is due to us using p->se.sum_exec_runtime in
      thread_group_cputime() where we iterate the thread group and sum all
      data. This does not take time since the last schedule operation (tick
      or otherwise) into account. We can cure this by using
      task_sched_runtime() at the cost of having to take locks.
      
      This also means we can (and must) do away with
      thread_group_sched_runtime() since the modified thread_group_cputime()
      is now more accurate and would deadlock when called from
      thread_group_sched_runtime().
      Reported-by: David Miller <davem@davemloft.net>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins
      Cc: stable@kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>