    sched/core: Fix task and run queue sched_info::run_delay inconsistencies · 1de64443
    Peter Zijlstra authored
    Mike Meyer reported the following bug:
    
    > During evaluation of some performance data, it was discovered thread
    > and run queue run_delay accounting data was inconsistent with the other
    > accounting data that was collected.  Further investigation found under
    > certain circumstances execution time was leaking into the task and
    > run queue accounting of run_delay.
    >
    > Consider the following sequence:
    >
    >     a. thread is running.
    >     b. thread moves between cgroups, changes scheduling class or priority.
    >     c. thread sleeps OR
    >     d. thread involuntarily gives up cpu.
    >
    > a. implies:
    >
    >     thread->sched_info.last_queued = 0
    >
    > a. and b. result in the following:
    >
    >     1. dequeue_task(rq, thread)
    >
    >            sched_info_dequeued(rq, thread)
    >                delta = 0
    >
    >                sched_info_reset_dequeued(thread)
    >                    thread->sched_info.last_queued = 0
    >
    >                thread->sched_info.run_delay += delta
    >
    >     2. enqueue_task(rq, thread)
    >
    >            sched_info_queued(rq, thread)
    >
    >                /* thread is still on cpu at this point. */
    >                thread->sched_info.last_queued = task_rq(thread)->clock;
    >
    > c. results in:
    >
    >     dequeue_task(rq, thread)
    >
    >         sched_info_dequeued(rq, thread)
    >
    >             /* delta is execution time not run_delay. */
    >             delta = task_rq(thread)->clock - thread->sched_info.last_queued
    >
    >         sched_info_reset_dequeued(thread)
    >             thread->sched_info.last_queued = 0
    >
    >         thread->sched_info.run_delay += delta
    >
    >     Since thread was running between enqueue_task(rq, thread) and
    >     dequeue_task(rq, thread), the delta above is really execution
    >     time and not run_delay.
    >
    > d. results in:
    >
    >     __sched_info_switch(thread, next_thread)
    >
    >         sched_info_depart(rq, thread)
    >
    >             sched_info_queued(rq, thread)
    >
    >                 /* last_queued not updated due to being non-zero */
    >                 return
    >
    >     Since thread was running between enqueue_task(rq, thread) and
    >     __sched_info_switch(thread, next_thread), the execution time
    >     between enqueue_task(rq, thread) and
    >     __sched_info_switch(thread, next_thread) now will become
    >     associated with run_delay due to when last_queued was last updated.
    >
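
    The two leaks in the report can be reproduced in a small user-space
    model. This is only a sketch under stated assumptions, not kernel
    code: rq_clock, charge_delay() and the simulate_*() functions are
    hypothetical, and the hook bodies are reduced to the last_queued and
    run_delay bookkeeping described above.

```c
#include <assert.h>

/* Toy user-space model of the accounting in the report; not kernel code. */
struct sched_info {
	unsigned long long last_queued; /* 0 means "no wait is being timed" */
	unsigned long long run_delay;   /* accumulated time charged as waiting */
};

static unsigned long long rq_clock; /* stands in for task_rq(thread)->clock */

/* Hypothetical helper: charge the time since last_queued to run_delay. */
static void charge_delay(struct sched_info *si)
{
	unsigned long long delta = 0;

	assert(rq_clock >= si->last_queued);
	if (si->last_queued)
		delta = rq_clock - si->last_queued;
	si->last_queued = 0;
	si->run_delay += delta;
}

/* Start timing a wait, unless one is already being timed. */
static void sched_info_queued(struct sched_info *si)
{
	if (!si->last_queued)
		si->last_queued = rq_clock;
}

static void sched_info_dequeued(struct sched_info *si) { charge_delay(si); }
static void sched_info_arrive(struct sched_info *si)   { charge_delay(si); }
static void sched_info_depart(struct sched_info *si)   { sched_info_queued(si); }

/* Pre-fix behaviour: the hooks run on every dequeue and enqueue. */
static void dequeue_task(struct sched_info *si) { sched_info_dequeued(si); }
static void enqueue_task(struct sched_info *si) { sched_info_queued(si); }

/* Steps a, b, c: attribute change while running, then sleep. */
unsigned long long simulate_sleep_leak(void)
{
	struct sched_info si = { 0, 0 };

	rq_clock = 100;      /* a. thread is running, last_queued == 0 */
	dequeue_task(&si);   /* b. delta = 0 */
	enqueue_task(&si);   /*    last_queued = 100 while still on cpu */
	rq_clock = 150;      /* thread executes for 50 ticks */
	dequeue_task(&si);   /* c. sleep: execution time is charged */
	return si.run_delay;
}

/* Steps a, b, d: attribute change while running, then preemption. */
unsigned long long simulate_preempt_leak(void)
{
	struct sched_info si = { 0, 0 };

	rq_clock = 100;
	dequeue_task(&si);
	enqueue_task(&si);       /* last_queued = 100 while still on cpu */
	rq_clock = 150;          /* thread executes for 50 ticks */
	sched_info_depart(&si);  /* d. last_queued is non-zero, so not updated */
	rq_clock = 170;          /* thread really waits for 20 ticks */
	sched_info_arrive(&si);  /* charged from tick 100, not tick 150 */
	return si.run_delay;
}
```

    In this model the sleeping thread never waited at all yet ends with a
    run_delay of 50, and the preempted thread waited 20 ticks but is
    charged 70.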
    
    This alternative patch solves the problem by not calling
    sched_info_{de,}queued() in {de,en}queue_task(). Therefore the
    sched_info state is preserved and things work as expected.
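
    A minimal user-space sketch of that idea follows. The commit message
    does not say how {de,en}queue_task() decide when to skip the hooks;
    the DEQUEUE_SAVE and ENQUEUE_RESTORE flag names and values used here
    are assumptions for illustration, as are rq_clock and
    simulate_fixed().

```c
#include <assert.h>

/* Toy user-space model; not kernel code. Flag names/values are assumed. */
#define DEQUEUE_SAVE    0x02 /* dequeue that is immediately followed by ... */
#define ENQUEUE_RESTORE 0x02 /* ... a matching enqueue of the same task */

struct sched_info {
	unsigned long long last_queued; /* 0 means "no wait is being timed" */
	unsigned long long run_delay;   /* accumulated time charged as waiting */
};

static unsigned long long rq_clock; /* stands in for task_rq(thread)->clock */

static void sched_info_queued(struct sched_info *si)
{
	if (!si->last_queued)
		si->last_queued = rq_clock;
}

static void sched_info_dequeued(struct sched_info *si)
{
	unsigned long long delta = 0;

	assert(rq_clock >= si->last_queued);
	if (si->last_queued)
		delta = rq_clock - si->last_queued;
	si->last_queued = 0;
	si->run_delay += delta;
}

/* Skip the accounting hook for a temporary save/restore dequeue. */
static inline void dequeue_task(struct sched_info *si, int flags)
{
	if (!(flags & DEQUEUE_SAVE))
		sched_info_dequeued(si);
}

/* Skip the accounting hook for the matching re-enqueue. */
static inline void enqueue_task(struct sched_info *si, int flags)
{
	if (!(flags & ENQUEUE_RESTORE))
		sched_info_queued(si);
}

/* Same a, b, c sequence as in the report, with the hooks skipped in b. */
unsigned long long simulate_fixed(void)
{
	struct sched_info si = { 0, 0 };

	rq_clock = 100;                       /* a. thread is running */
	dequeue_task(&si, DEQUEUE_SAVE);      /* b. sched_info untouched */
	enqueue_task(&si, ENQUEUE_RESTORE);   /*    last_queued stays 0 */
	rq_clock = 150;                       /* thread executes for 50 ticks */
	dequeue_task(&si, 0);                 /* c. sleep: delta = 0 */
	return si.run_delay;
}
```

    Because last_queued is still 0 when the thread finally sleeps, no
    execution time is charged and run_delay stays at 0.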
    
    By inlining the {de,en}queue_task() functions the new condition
    becomes (mostly) a compile-time constant and we'll not emit any new
    branch instructions.
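
    The following sketch illustrates why inlining helps. The flag name,
    the stub hook and the two caller functions are hypothetical, and
    whether the test is actually folded away depends on the compiler and
    its optimization level.

```c
#include <assert.h>

#define ENQUEUE_RESTORE 0x02 /* assumed flag name and value */

static int hook_calls; /* counts calls to the stand-in accounting hook */

static void sched_info_queued_stub(void)
{
	hook_calls++;
}

/*
 * Once this is inlined into a caller that passes a literal flags value,
 * the compiler can evaluate the test at compile time for that caller.
 */
static inline void enqueue_task(int flags)
{
	if (!(flags & ENQUEUE_RESTORE))
		sched_info_queued_stub();
}

/* flags is the literal 0: the test is always true, no branch is needed. */
void wakeup_path(void)
{
	enqueue_task(0);
}

/* flags is the literal ENQUEUE_RESTORE: the hook call can be dropped. */
void change_attr_path(void)
{
	enqueue_task(ENQUEUE_RESTORE);
}

int hook_call_count(void)
{
	return hook_calls;
}
```

    Callers that pass a flags value computed at run time still need the
    test, which is presumably why the condition is only "mostly" a
    compile-time constant.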
    
    It even shrinks the text segment by 130 bytes (due to inlining
    {en,de}queue_task()):
    
    $ size defconfig-build/kernel/sched/core.o defconfig-build/kernel/sched/core.o.orig
       text    data     bss     dec     hex filename
      64019   23378    2344   89741   15e8d defconfig-build/kernel/sched/core.o
      64149   23378    2344   89871   15f0f defconfig-build/kernel/sched/core.o.orig

    Reported-by: Mike Meyer <Mike.Meyer@Teradata.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20150930154413.GO3604@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar <mingo@kernel.org>