    sched/rt: Fix task stack corruption under __ARCH_WANT_INTERRUPTS_ON_CTXSW · cb297a3e
    Chanho Min authored
    This issue happens under the following conditions:
    
     1. preemption is off
     2. __ARCH_WANT_INTERRUPTS_ON_CTXSW is defined
     3. RT scheduling class
     4. SMP system
    
    Sequence is as follows:
    
     1.Suppose the current task is A and schedule() starts.
     2.Task A is enqueued as a pushable task at the entry of schedule().
       __schedule
        prev = rq->curr;
        ...
        put_prev_task
         put_prev_task_rt
          enqueue_pushable_task
     3.Task B is picked as the next task; rq->curr is set to task B and
       context_switch is started.
       next = pick_next_task(rq);
       rq->curr = next;
     4.At the entry of context_switch, this cpu's rq->lock is released.
       context_switch
        prepare_task_switch
         prepare_lock_switch
          raw_spin_unlock_irq(&rq->lock);
     5.Shortly after rq->lock is released, an interrupt occurs and IRQ context starts.
     6.try_to_wake_up(), called by the ISR, acquires rq->lock.
        try_to_wake_up
         ttwu_remote
          rq = __task_rq_lock(p)
          ttwu_do_wakeup(rq, p, wake_flags);
            task_woken_rt
     7.push_rt_task picks task A, which was enqueued earlier.
       task_woken_rt
        push_rt_tasks(rq)
         next_task = pick_next_pushable_task(rq)
     8.In find_lock_lowest_rq(), if double_lock_balance() returns 0,
       lowest_rq can be the remote rq.
       (But if preemption is on, double_lock_balance() always returns 1 and
       this doesn't happen.)
       push_rt_task
        find_lock_lowest_rq
         if (double_lock_balance(rq, lowest_rq))..
     9.find_lock_lowest_rq returns the available rq, and task A is migrated to
       the remote cpu/rq.
       push_rt_task
        ...
        deactivate_task(rq, next_task, 0);
        set_task_cpu(next_task, lowest_rq->cpu);
        activate_task(lowest_rq, next_task, 0);
     10. But task A is still in IRQ context on this cpu.
         So task A is scheduled by two cpus at the same time until the IRQ returns.
         Task A's stack is corrupted.
    
    To fix it, don't migrate an RT task if it's still running.
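
The guard described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel patch itself: struct task, struct rq, task_is_running() and pick_pushable() are hypothetical stand-ins invented here, and treating a task as running when it is rq->curr or still has on_cpu set is a simplification of what the kernel checks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's task and runqueue. */
struct task {
    const char *name;
    bool on_cpu;          /* task is still executing on a cpu */
    int nr_cpus_allowed;  /* number of cpus the task may run on */
};

struct rq {
    struct task *curr;         /* task the scheduler selected to run */
    struct task *pushable[4];  /* migration candidates, highest priority first */
    size_t nr_pushable;
};

/* Simplified "is this task running?" test (assumption, see lead-in). */
static bool task_is_running(const struct rq *rq, const struct task *p)
{
    return rq->curr == p || p->on_cpu;
}

/* Return the first pushable task that is safe to migrate, or NULL. */
static struct task *pick_pushable(const struct rq *rq)
{
    for (size_t i = 0; i < rq->nr_pushable; i++) {
        struct task *p = rq->pushable[i];

        /* Never hand out a task that is still running on this cpu. */
        if (!task_is_running(rq, p) && p->nr_cpus_allowed > 1)
            return p;
    }
    return NULL;
}
```

In the scenario above, rq->curr already points at task B while task A is on the pushable list and still executing in IRQ context, so the model returns NULL for A until A has actually left the cpu.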

    Signed-off-by: Chanho Min <chanho.min@lge.com>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Acked-by: Steven Rostedt <rostedt@goodmis.org>
    Cc: <stable@kernel.org>
    Link: http://lkml.kernel.org/r/CAOAMb1BHA=5fm7KTewYyke6u-8DP0iUuJMpgQw54vNeXFsGpoQ@mail.gmail.com
    Signed-off-by: Ingo Molnar <mingo@elte.hu>