    cpufreq: Avoid leaving stale IRQ work items during CPU offline · 85572c2c
    Rafael J. Wysocki authored
    The scheduler code calling cpufreq_update_util() may run during CPU
    offline on the target CPU after the IRQ work lists have been flushed
    for it, so the target CPU should be prevented from running code that
    may queue up an IRQ work item on it at that point.
    
    Unfortunately, that may not be the case if dvfs_possible_from_any_cpu
    is set for at least one cpufreq policy in the system, because that
    allows the CPU going offline to run the utilization update callback
    of the cpufreq governor on behalf of another (online) CPU in some
    cases.
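    To make the problem concrete, here is a minimal user-space model of the
    check as it stood before this change.  As we read it, the old static
    inline in include/linux/cpufreq.h allowed an update if the local CPU
    was in policy->cpus OR if dvfs_possible_from_any_cpu was set.
    struct policy_model, NR_CPUS_MODEL and old_this_cpu_can_update() are
    hypothetical names, not kernel API, and a plain bitmask stands in for
    struct cpumask.  The model shows that a CPU outside policy->cpus still
    passes the check once the flag is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model, not kernel code. */
#define NR_CPUS_MODEL 4

struct policy_model {
	unsigned int cpus;               /* bitmask standing in for policy->cpus */
	bool dvfs_possible_from_any_cpu; /* mirrors the policy flag */
};

/*
 * Models the pre-fix check: the local CPU may update the policy if it
 * belongs to the policy, or unconditionally if any CPU may do DVFS.
 * Nothing here looks at whether the local CPU is going offline.
 */
static bool old_this_cpu_can_update(const struct policy_model *p, int this_cpu)
{
	assert(this_cpu >= 0 && this_cpu < NR_CPUS_MODEL);
	return (p->cpus & (1u << this_cpu)) ||
	       p->dvfs_possible_from_any_cpu;
}
```

    For example, with only CPU0 left in the policy and the flag set,
    old_this_cpu_can_update(&p, 3) is still true for CPU3, which is exactly
    the window described above.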
    
    If that happens, the cpufreq governor callback may queue up an IRQ
    work item on the CPU running it, which is going offline, and that IRQ
    work item may not be flushed after that point.  Moreover, it cannot
    be flushed until the "offlining" CPU goes back online, so if any
    other CPU calls irq_work_sync() to wait for the completion of that
    IRQ work item, it will have to wait until the "offlining" CPU is back
    online, which may never happen.  In particular, a system-wide
    deadlock may occur during CPU online as a result.
    
    The failing scenario is as follows.  CPU0 is the boot CPU, so it
    creates a cpufreq policy and becomes the "leader" of it
    (policy->cpu).  It cannot go offline, because it is the boot CPU.
    Next, other CPUs join the cpufreq policy as they go online and they
    leave it when they go offline.  The last CPU to go offline, say CPU3,
    may queue up an IRQ work item while running the governor callback on
    behalf of CPU0 after leaving the cpufreq policy because of the
    dvfs_possible_from_any_cpu effect described above.  Then, CPU0 is
    the only online CPU in the system and the stale IRQ work is still
    queued on CPU3.  When, say, CPU1 goes back online, it will run
    irq_work_sync() to wait for that IRQ work to complete and so it
    will wait for CPU3 to go back online (which may never happen even
    in principle), but (worse yet) CPU0 is waiting for CPU1 at that
    point too and a system-wide deadlock occurs.
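    The failing sequence can be replayed with a toy model that tracks, per
    CPU, whether it is online and how many IRQ work items are pending on
    it.  Because a real irq_work_sync() would simply block, the model
    replaces it with a non-blocking predicate that reports whether the wait
    could ever finish.  All names below are hypothetical, and the model
    ignores locking, RCU and the real CPU hotplug state machine.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 4

/* Hypothetical model state: per-CPU online flag and pending IRQ work count. */
static bool cpu_online_model[NR_CPUS_MODEL];
static int pending_irq_work[NR_CPUS_MODEL];

/* Queue an IRQ work item on the CPU running the governor callback. */
static void queue_irq_work_on(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	pending_irq_work[cpu]++;
}

/* Only an online CPU can run, and thereby drain, its IRQ work list. */
static void run_irq_work_on(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	if (cpu_online_model[cpu])
		pending_irq_work[cpu] = 0;
}

/* Stand-in for irq_work_sync(): can waiting on work queued on @cpu finish? */
static bool irq_work_sync_can_finish(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	return pending_irq_work[cpu] == 0 || cpu_online_model[cpu];
}

/* Replays the scenario; returns whether CPU1's wait during online can end. */
static bool replay_failing_scenario(void)
{
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		cpu_online_model[cpu] = true;
		pending_irq_work[cpu] = 0;
	}
	/* CPU1 and CPU2 go offline; CPU0, the boot CPU, stays online. */
	cpu_online_model[1] = false;
	cpu_online_model[2] = false;
	/* CPU3 goes offline last: its IRQ work list is flushed first... */
	run_irq_work_on(3);
	/* ...then it runs the governor callback on behalf of CPU0. */
	queue_irq_work_on(3);
	cpu_online_model[3] = false;
	/* The stale item can no longer be drained. */
	run_irq_work_on(3);
	/* CPU1 comes back online and waits for the work queued on CPU3. */
	cpu_online_model[1] = true;
	return irq_work_sync_can_finish(3);
}
```

    replay_failing_scenario() returns false and leaves one item pending on
    CPU3, which corresponds to CPU1 waiting forever while CPU0 waits for
    CPU1.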
    
    To address this problem, note that CPUs which cannot run cpufreq
    utilization update code for themselves (for example, because they
    have left the cpufreq policies that they belonged to) should also
    be prevented from running that code on behalf of other CPUs that
    belong to a cpufreq policy with dvfs_possible_from_any_cpu set.
    Hence, in that case the cpufreq_update_util_data pointer must be
    non-NULL not only for the CPU that is the target of the cpufreq
    utilization update in progress, but also for the CPU running the
    code.
    
    Accordingly, change cpufreq_this_cpu_can_update() into a regular
    function in kernel/sched/cpufreq.c (instead of a static inline in a
    header file) and make it check the cpufreq_update_util_data pointer
    of the local CPU if dvfs_possible_from_any_cpu is set for the target
    cpufreq policy.
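    A sketch of the new check, again as a hypothetical user-space model.
    As far as we can tell, the upstream function in kernel/sched/cpufreq.c
    reads the local CPU's pointer with
    rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data)); here a
    plain array lookup stands in for that, and the exact upstream code
    should be checked against the tree.  The array is non-NULL for a CPU
    while a governor callback is installed for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS_MODEL 4

struct policy_model {
	unsigned int cpus;               /* bitmask standing in for policy->cpus */
	bool dvfs_possible_from_any_cpu;
};

/* Hypothetical stand-in for the per-CPU cpufreq_update_util_data pointer. */
static const void *update_util_data_model[NR_CPUS_MODEL];

/*
 * Models the fixed check: a CPU in the policy may always update it; a CPU
 * outside the policy may do so on behalf of others only if the flag is set
 * AND its own update_util pointer is still non-NULL (it has not left).
 */
static bool new_this_cpu_can_update(const struct policy_model *p, int this_cpu)
{
	assert(this_cpu >= 0 && this_cpu < NR_CPUS_MODEL);
	return (p->cpus & (1u << this_cpu)) ||
	       (p->dvfs_possible_from_any_cpu &&
		update_util_data_model[this_cpu] != NULL);
}
```

    With the pointer cleared for an offlining CPU, the flag alone no longer
    lets that CPU run the governor callback for someone else.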
    
    Also update the schedutil governor to do the
    cpufreq_this_cpu_can_update() check in the non-fast-switch case
    too, so that stale IRQ work items are not left behind there either.
    
    Fixes: 99d14d0e ("cpufreq: Process remote callbacks from any CPU if the platform permits")
    Link: https://lore.kernel.org/linux-pm/20191121093557.bycvdo4xyinbc5cb@vireshk-i7/
    Reported-by: Anson Huang <anson.huang@nxp.com>
    Tested-by: Anson Huang <anson.huang@nxp.com>
    Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
    Tested-by: Peng Fan <peng.fan@nxp.com> (i.MX8QXP-MEK)
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>