Commit 4823e5da authored by Lucas Stach, committed by Alex Deucher

drm/scheduler: fix timeout worker setup for out of order job completions

drm_sched_job_finish() is a work item scheduled for each finished job on
an unbound system workqueue. This means the workers can execute out of
order with regard to the real hardware job completions.

If this happens, queueing a timeout worker for the first job on the ring
mirror list is wrong, as this may be a job which has already finished
executing. Fix this by reorganizing the code to always queue the worker
for the next job on the list, if this job hasn't finished yet. This is
robust against a potential reordering of the finish workers.

Also move the timeout worker cancellation out of the locked section, so
that we don't need to take the job list lock twice. As a small
optimization, list_del is used to remove the job from the ring mirror
list, as there is no need to reinit the list head in the job we are
about to free.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
parent 1e1dbd6f
@@ -552,24 +552,28 @@ static void drm_sched_job_finish(struct work_struct *work)
 						   finish_work);
 	struct drm_gpu_scheduler *sched = s_job->sched;
 
-	/* remove job from ring_mirror_list */
-	spin_lock(&sched->job_list_lock);
-	list_del_init(&s_job->node);
-	if (sched->timeout != MAX_SCHEDULE_TIMEOUT) {
-		struct drm_sched_job *next;
-
-		spin_unlock(&sched->job_list_lock);
-		cancel_delayed_work_sync(&s_job->work_tdr);
-		spin_lock(&sched->job_list_lock);
+	/*
+	 * Canceling the timeout without removing our job from the ring mirror
+	 * list is safe, as we will only end up in this worker if our jobs
+	 * finished fence has been signaled. So even if some another worker
+	 * manages to find this job as the next job in the list, the fence
+	 * signaled check below will prevent the timeout to be restarted.
+	 */
+	cancel_delayed_work_sync(&s_job->work_tdr);
 
-		/* queue TDR for next job */
-		next = list_first_entry_or_null(&sched->ring_mirror_list,
-						struct drm_sched_job, node);
-
-		if (next)
-			schedule_delayed_work(&next->work_tdr, sched->timeout);
+	spin_lock(&sched->job_list_lock);
+	/* queue TDR for next job */
+	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
+	    !list_is_last(&s_job->node, &sched->ring_mirror_list)) {
+		struct drm_sched_job *next = list_next_entry(s_job, node);
+
+		if (!dma_fence_is_signaled(&next->s_fence->finished))
+			schedule_delayed_work(&next->work_tdr, sched->timeout);
 	}
+	/* remove job from ring_mirror_list */
+	list_del(&s_job->node);
 	spin_unlock(&sched->job_list_lock);
+
 	dma_fence_put(&s_job->s_fence->finished);
 	sched->ops->free_job(s_job);
 }