    sched: fix race in schedule() · 0e1f3483
    Hiroshi Shimamoto authored
    Fix a hard-to-trigger crash seen in the -rt kernel that also affects
    the vanilla scheduler.
    
    There is a race condition between schedule() and some dequeue/enqueue
    functions: rt_mutex_setprio(), __setscheduler() and sched_move_task().
    
    When scheduling to idle, idle_balance() is called to pull tasks from
    other busy processors, and it might drop the rq lock. While the lock
    is dropped, those three functions can encounter a task with on_rq=0
    and running=1. The current task should be put whenever it is running,
    regardless of on_rq.
    
    Here is a possible scenario:
    
       CPU0                               CPU1
        |                              schedule()
        |                              ->deactivate_task()
        |                              ->idle_balance()
        |                              -->load_balance_newidle()
    rt_mutex_setprio()                     |
        |                              --->double_lock_balance()
        *get lock                          *rel lock
        * on_rq=0, running=1               |
        * sched_class is changed           |
        *rel lock                          *get lock
        :                                  |
                                           :
                                       ->put_prev_task_rt()
                                       ->pick_next_task_fair()
                                           => panic
    
    The current process of CPU1 (P1) is scheduling. P1 is deactivated, and
    the scheduler looks for another process on another CPU's runqueue
    because CPU1 is about to go idle. idle_balance(),
    load_balance_newidle() and double_lock_balance() are called, and
    double_lock_balance() can drop the rq lock. Meanwhile, CPU0 is trying
    to boost the priority of P1. As a result of the boost, only P1's prio
    and sched_class are changed to RT; the sched entities of P1 and P1's
    group are never put. This leaves the cfs_rq invalid: it has a curr
    but no leaf. When pick_next_task_fair() is then called, the kernel
    panics.

    Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
sched.c 194 KB