    sched/core: Avoid obvious double update_rq_clock warning · 2679a837
    Hao Jia authored
    When we use raw_spin_rq_lock() to acquire the rq lock and have to
    update the rq clock while holding the lock, the kernel may issue
    a WARN_DOUBLE_CLOCK warning.
    
    Since we directly use raw_spin_rq_lock() to acquire the rq lock instead of
    rq_lock(), there is no corresponding change to rq->clock_update_flags.
    In particular, after we have obtained the rq lock of another CPU, the
    rq->clock_update_flags of that CPU may still be RQCF_UPDATED, and a
    subsequent call to update_rq_clock() then triggers the WARN_DOUBLE_CLOCK
    warning.
    
    So we need to clear RQCF_UPDATED of rq->clock_update_flags to avoid
    the WARN_DOUBLE_CLOCK warning.
    
    For the sched_rt_period_timer() and migrate_task_rq_dl() cases
    we simply replace raw_spin_rq_lock()/raw_spin_rq_unlock() with
    rq_lock()/rq_unlock().
    
    For the {pull,push}_{rt,dl}_task() cases, we add the
    double_rq_clock_clear_update() function to clear RQCF_UPDATED of
    rq->clock_update_flags, and call double_rq_clock_clear_update()
    before double_lock_balance()/double_rq_lock() returns to avoid the
    WARN_DOUBLE_CLOCK warning.
    
    Some call trace reports:
    Call Trace 1:
     <IRQ>
     sched_rt_period_timer+0x10f/0x3a0
     ? enqueue_top_rt_rq+0x110/0x110
     __hrtimer_run_queues+0x1a9/0x490
     hrtimer_interrupt+0x10b/0x240
     __sysvec_apic_timer_interrupt+0x8a/0x250
     sysvec_apic_timer_interrupt+0x9a/0xd0
     </IRQ>
     <TASK>
     asm_sysvec_apic_timer_interrupt+0x12/0x20
    
    Call Trace 2:
     <TASK>
     activate_task+0x8b/0x110
     push_rt_task.part.108+0x241/0x2c0
     push_rt_tasks+0x15/0x30
     finish_task_switch+0xaa/0x2e0
     ? __switch_to+0x134/0x420
     __schedule+0x343/0x8e0
     ? hrtimer_start_range_ns+0x101/0x340
     schedule+0x4e/0xb0
     do_nanosleep+0x8e/0x160
     hrtimer_nanosleep+0x89/0x120
     ? hrtimer_init_sleeper+0x90/0x90
     __x64_sys_nanosleep+0x96/0xd0
     do_syscall_64+0x34/0x90
     entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    Call Trace 3:
     <TASK>
     deactivate_task+0x93/0xe0
     pull_rt_task+0x33e/0x400
     balance_rt+0x7e/0x90
     __schedule+0x62f/0x8e0
     do_task_dead+0x3f/0x50
     do_exit+0x7b8/0xbb0
     do_group_exit+0x2d/0x90
     get_signal+0x9df/0x9e0
     ? preempt_count_add+0x56/0xa0
     ? __remove_hrtimer+0x35/0x70
     arch_do_signal_or_restart+0x36/0x720
     ? nanosleep_copyout+0x39/0x50
     ? do_nanosleep+0x131/0x160
     ? audit_filter_inodes+0xf5/0x120
     exit_to_user_mode_prepare+0x10f/0x1e0
     syscall_exit_to_user_mode+0x17/0x30
     do_syscall_64+0x40/0x90
     entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    Call Trace 4:
     update_rq_clock+0x128/0x1a0
     migrate_task_rq_dl+0xec/0x310
     set_task_cpu+0x84/0x1e4
     try_to_wake_up+0x1d8/0x5c0
     wake_up_process+0x1c/0x30
     hrtimer_wakeup+0x24/0x3c
     __hrtimer_run_queues+0x114/0x270
     hrtimer_interrupt+0xe8/0x244
     arch_timer_handler_phys+0x30/0x50
     handle_percpu_devid_irq+0x88/0x140
     generic_handle_domain_irq+0x40/0x60
     gic_handle_irq+0x48/0xe0
     call_on_irq_stack+0x2c/0x60
     do_interrupt_handler+0x80/0x84
    
    Steps to reproduce:
    1. Enable CONFIG_SCHED_DEBUG when compiling the kernel
    2. echo 1 > /sys/kernel/debug/clear_warn_once
       echo "WARN_DOUBLE_CLOCK" > /sys/kernel/debug/sched/features
       echo "NO_RT_PUSH_IPI" > /sys/kernel/debug/sched/features
    3. Run some rt/dl tasks that periodically work and sleep, e.g.
       create 2*n rt or dl (90% running) tasks via rt-app on a system
       with n CPUs. Dietmar Eggemann reported Call Trace 4 when running
       on a PREEMPT_RT kernel.
    
    Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
    Link: https://lore.kernel.org/r/20220430085843.62939-2-jiahao.os@bytedance.com