Commit ec6abe83 authored by Alex Sierra's avatar Alex Sierra Committed by Alex Deucher

drm/amdkfd: rm BO resv on validation to avoid deadlock

This fix the deadlock with the BO reservations during SVM_BO evictions
while allocations in VRAM are concurrently performed. More specific,
while the ttm waits for the fence to be signaled (ttm_bo_wait), it
already has the BO reserved. In parallel, the restore worker might be
running, prefetching memory to VRAM. This also requires to reserve the
BO, but blocks the mmap semaphore first. The deadlock happens when the
SVM_BO eviction worker kicks in and waits for the mmap semaphore held
in restore worker. Preventing signal the fence back, causing the
deadlock until the ttm times out.

We don't need to hold the BO reservation anymore during validation
and mapping. Now the physical addresses are taken from hmm_range_fault.
We also take migrate_mutex to prevent range migration while
validate_and_map update GPU page table.
Signed-off-by: default avatarAlex Sierra <alex.sierra@amd.com>
Signed-off-by: default avatarFelix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: default avatarPhilip Yang <philip.yang@amd.com>
Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
parent 097cbf26
......@@ -1307,7 +1307,7 @@ struct svm_validate_context {
struct svm_range *prange;
bool intr;
unsigned long bitmap[MAX_GPU_INSTANCE];
struct ttm_validate_buffer tv[MAX_GPU_INSTANCE+1];
struct ttm_validate_buffer tv[MAX_GPU_INSTANCE];
struct list_head validate_list;
struct ww_acquire_ctx ticket;
};
......@@ -1334,11 +1334,6 @@ static int svm_range_reserve_bos(struct svm_validate_context *ctx)
ctx->tv[gpuidx].num_shared = 4;
list_add(&ctx->tv[gpuidx].head, &ctx->validate_list);
}
if (ctx->prange->svm_bo && ctx->prange->ttm_res) {
ctx->tv[MAX_GPU_INSTANCE].bo = &ctx->prange->svm_bo->bo->tbo;
ctx->tv[MAX_GPU_INSTANCE].num_shared = 1;
list_add(&ctx->tv[MAX_GPU_INSTANCE].head, &ctx->validate_list);
}
r = ttm_eu_reserve_buffers(&ctx->ticket, &ctx->validate_list,
ctx->intr, NULL);
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment