    drm/msm: Avoid unclocked GMU register access in 6xx gpu_busy · 6694482a
    Douglas Anderson authored
    From testing on sc7180-trogdor devices, reading the GMU registers
    needs the GMU clocks to be enabled. Those clocks get turned on in
    a6xx_gmu_resume(). Confusingly enough, that function is called as a
    result of the runtime_pm of the GPU "struct device", not the GMU
    "struct device". Unfortunately the current a6xx_gpu_busy() grabs a
    reference to the GMU's "struct device".
    
    Grabbing the wrong reference is easily shown to cause crashes if we
    change the GPU's pm_runtime usage to not use autosuspend. It's also
    believed to cause some long-tail GPU crashes even with autosuspend.
    
    We could look at changing it so that we do pm_runtime_get_if_in_use()
    on the GPU's "struct device", but then we run into a different
    problem. pm_runtime_get_if_in_use() will return 0 for the GPU's
    "struct device" for the whole time we're in the "autosuspend
    delay". That is, when we've dropped the last reference to the GPU but
    are waiting a period before actually suspending, we'll think the GPU
    is off. One reason that's bad is that if the GPU didn't actually turn
    off, then the cycle counter doesn't lose state, and that throws off
    all of our calculations.
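
    The autosuspend-delay problem can be illustrated with a small
    userspace model. This is only a sketch: struct model_dev and the
    model_*() helpers are hypothetical names invented for illustration,
    not kernel APIs, and the model assumes only the behavior described
    above (a get-if-in-use call returns 0 once the usage count reaches
    zero, even though the hardware has not yet powered down).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; this is NOT the kernel's struct device. */
struct model_dev {
	int usage_count;  /* outstanding runtime PM references */
	bool hw_powered;  /* stays true until the delayed suspend really runs */
};

/*
 * Models the behavior described in the commit message: a reference is
 * only taken when the device is powered AND already in use.
 */
static int model_get_if_in_use(struct model_dev *d)
{
	if (!d->hw_powered || d->usage_count == 0)
		return 0;
	d->usage_count++;
	return 1;
}

/* Drop a reference; with autosuspend the hardware is NOT turned off here. */
static void model_put_autosuspend(struct model_dev *d)
{
	assert(d->usage_count > 0);
	d->usage_count--;
}

/* The autosuspend delay expired: power down only if nobody holds a reference. */
static void model_autosuspend_expired(struct model_dev *d)
{
	if (d->usage_count == 0)
		d->hw_powered = false;
}
```

    Between model_put_autosuspend() dropping the last reference and
    model_autosuspend_expired() running, model_get_if_in_use() returns
    0 while hw_powered is still true. That is the window in which the
    cycle counter keeps its state but the caller believes the GPU is off.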
    
    Let's change the code to keep track of the suspend state of
    devfreq. msm_devfreq_suspend() is always called before we actually
    suspend the GPU and msm_devfreq_resume() after we resume it. This
    means we can use the suspended state to know if we're powered or not.
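
    The ordering argument above can be sketched as a userspace model.
    All names here (struct model_gpu, struct model_devfreq, the
    model_*() helpers) are hypothetical and chosen for illustration;
    this is not the driver code. The sketch assumes only what the text
    states: the devfreq suspend hook runs before the GPU loses power and
    the resume hook runs after power is restored. Any locking a real
    driver would need between the flag and the status callback is
    omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the devfreq bookkeeping. */
struct model_devfreq {
	bool suspended;  /* set before power-down, cleared after power-up */
};

/* Hypothetical model of the GPU and its GMU counter. */
struct model_gpu {
	bool gmu_clocked;     /* GMU registers are only readable when true */
	uint64_t hw_counter;  /* stands in for the GMU busy-cycle counter */
	struct model_devfreq df;
};

/* The assert stands in for the crash caused by an unclocked register read. */
static uint64_t model_read_gmu_counter(const struct model_gpu *gpu)
{
	assert(gpu->gmu_clocked);
	return gpu->hw_counter;
}

static void model_suspend(struct model_gpu *gpu)
{
	gpu->df.suspended = true;   /* devfreq is suspended first ... */
	gpu->gmu_clocked = false;   /* ... then the clocks go away */
}

static void model_resume(struct model_gpu *gpu)
{
	gpu->gmu_clocked = true;    /* clocks come back first ... */
	gpu->df.suspended = false;  /* ... then devfreq is resumed */
}

/* Returns true and stores a sample only when the flag says we are powered. */
static bool model_gpu_busy(struct model_gpu *gpu, uint64_t *cycles)
{
	if (gpu->df.suspended)
		return false;
	*cycles = model_read_gmu_counter(gpu);
	return true;
}
```

    Because the flag changes strictly outside the powered-off window,
    model_gpu_busy() never reaches the register read while the clocks
    are off, and it does not depend on the runtime PM usage count at all.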
    
    NOTE: one might wonder when exactly our status function is called when
    devfreq is supposed to be disabled. The stack crawl I captured was:
      msm_devfreq_get_dev_status
      devfreq_simple_ondemand_func
      devfreq_update_target
      qos_notifier_call
      qos_max_notifier_call
      blocking_notifier_call_chain
      pm_qos_update_target
      freq_qos_apply
      apply_constraint
      __dev_pm_qos_update_request
      dev_pm_qos_update_request
      msm_devfreq_idle_work
    
    Fixes: eadf7928 ("drm/msm: Check for powered down HW in the devfreq callbacks")
    Signed-off-by: Douglas Anderson <dianders@chromium.org>
    Reviewed-by: Rob Clark <robdclark@gmail.com>
    Patchwork: https://patchwork.freedesktop.org/patch/489124/
    Link: https://lore.kernel.org/r/20220610124639.v4.1.Ie846c5352bc307ee4248d7cab998ab3016b85d06@changeid
    Signed-off-by: Rob Clark <robdclark@chromium.org>