    sched/rt: Fix double enqueue caused by rt_effective_prio · f558c2b8
    Peter Zijlstra authored
    Double enqueues in rt runqueues (list) have been reported while running
    a simple test that spawns a number of threads doing a short sleep/run
    pattern while being concurrently setscheduled between rt and fair class.
    
      WARNING: CPU: 3 PID: 2825 at kernel/sched/rt.c:1294 enqueue_task_rt+0x355/0x360
      CPU: 3 PID: 2825 Comm: setsched__13
      RIP: 0010:enqueue_task_rt+0x355/0x360
      Call Trace:
       __sched_setscheduler+0x581/0x9d0
       _sched_setscheduler+0x63/0xa0
       do_sched_setscheduler+0xa0/0x150
       __x64_sys_sched_setscheduler+0x1a/0x30
       do_syscall_64+0x33/0x40
       entry_SYSCALL_64_after_hwframe+0x44/0xae
    
      list_add double add: new=ffff9867cb629b40, prev=ffff9867cb629b40,
    		       next=ffff98679fc67ca0.
      kernel BUG at lib/list_debug.c:31!
      invalid opcode: 0000 [#1] PREEMPT_RT SMP PTI
      CPU: 3 PID: 2825 Comm: setsched__13
      RIP: 0010:__list_add_valid+0x41/0x50
      Call Trace:
       enqueue_task_rt+0x291/0x360
       __sched_setscheduler+0x581/0x9d0
       _sched_setscheduler+0x63/0xa0
       do_sched_setscheduler+0xa0/0x150
       __x64_sys_sched_setscheduler+0x1a/0x30
       do_syscall_64+0x33/0x40
       entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    __sched_setscheduler() uses rt_effective_prio() to handle proper queuing
    of priority boosted tasks that are setscheduled while being boosted.
    rt_effective_prio() is, however, called twice per
    __sched_setscheduler() call: first directly by __sched_setscheduler()
    before dequeuing the task and then by __setscheduler() to actually do
    the priority change. If the priority of the pi_top_task is being changed
    concurrently, however, the two calls might return different results.
    If, for example, the first call returned the same rt priority the task
    was running at and the second one a fair priority, the task won't be
    removed from the rt list (on_list still set) and will then be enqueued
    in the fair runqueue. When eventually setscheduled back to rt it will
    be seen as already enqueued and the WARNING/BUG will be issued.
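    
    The sequence described above can be sketched as a small user-space toy
    model. This is not kernel code: struct toy_task, toy_effective_prio()
    and toy_setscheduler_racy() are hypothetical names, the two samples of
    the pi_top_task priority are passed in explicitly to stand in for the
    concurrent change, and the queues are reduced to two flags.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_RT_PRIO 100 /* priorities below this are rt, as in the kernel */

struct toy_task {
	int prio;        /* effective priority the task is queued at */
	bool on_rt_list; /* stands in for the rt entity's on_list flag */
	bool on_fair_rq; /* task sits in the (toy) fair runqueue */
};

/* Hypothetical stand-in for rt_effective_prio(): the lower value wins. */
static int toy_effective_prio(int requested, int top_prio)
{
	return top_prio < requested ? top_prio : requested;
}

/*
 * Racy flow: the boost is sampled twice. top_first and top_second are the
 * pi_top_task priorities seen by the first and the second sample.
 */
static void toy_setscheduler_racy(struct toy_task *p, int requested,
				  int top_first, int top_second)
{
	bool move;

	/* Toy precondition: the task is queued exactly once. */
	assert(p->on_rt_list != p->on_fair_rq);

	/* First sample: decides whether the task must leave its queue. */
	move = toy_effective_prio(requested, top_first) != p->prio;
	if (move) {
		p->on_rt_list = false;
		p->on_fair_rq = false;
	}

	/* Second sample: decides the priority, and so the class, really set. */
	p->prio = toy_effective_prio(requested, top_second);

	if (p->prio < TOY_MAX_RT_PRIO)
		p->on_rt_list = true;
	else
		p->on_fair_rq = true;
}
```

    In this model, a task queued at rt priority 10 that is setscheduled to
    120 while the first sample still sees a donor at 10 and the second sees
    no boost ends up flagged as queued on both the rt list and the fair
    runqueue.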
    
    Fix this by calling rt_effective_prio() only once and then reusing the
    return value. While at it, refactor the code for clarity. Concurrent
    priority inheritance handling is still safe and will eventually converge
    to a new state by following the inheritance chain(s).
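    
    A minimal user-space sketch of the "sample once, reuse the value" idea
    follows. It is a toy model, not the actual patch: struct toy_task,
    toy_effective_prio() and toy_setscheduler_once() are hypothetical
    names, and the queues are reduced to two flags.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_RT_PRIO 100 /* priorities below this are rt, as in the kernel */

struct toy_task {
	int prio;        /* effective priority the task is queued at */
	bool on_rt_list; /* stands in for the rt entity's on_list flag */
	bool on_fair_rq; /* task sits in the (toy) fair runqueue */
};

/* Hypothetical stand-in for rt_effective_prio(): the lower value wins. */
static int toy_effective_prio(int requested, int top_prio)
{
	return top_prio < requested ? top_prio : requested;
}

/*
 * Fixed flow: the boost is sampled once and the same value drives both
 * the dequeue decision and the priority that is finally set.
 */
static void toy_setscheduler_once(struct toy_task *p, int requested,
				  int top_prio)
{
	int newprio;

	/* Toy precondition: the task is queued exactly once. */
	assert(p->on_rt_list != p->on_fair_rq);

	newprio = toy_effective_prio(requested, top_prio);

	/* Leave the current queue only if the priority really changes. */
	if (newprio != p->prio) {
		p->on_rt_list = false;
		p->on_fair_rq = false;
	}

	p->prio = newprio;

	if (p->prio < TOY_MAX_RT_PRIO)
		p->on_rt_list = true;
	else
		p->on_fair_rq = true;
}
```

    Because both decisions see the same value, a later change of the donor
    priority can only affect a later call; within one call the toy task is
    always left on exactly one queue.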
    
    Fixes: 0782e63b ("sched: Handle priority boosted tasks proper in setscheduler()")
    [squashed Peterz changes; added changelog]
    Reported-by: Mark Simmons <msimmons@redhat.com>
    Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/20210803104501.38333-1-juri.lelli@redhat.com