Commit ad765fae authored by Chris Wilson's avatar Chris Wilson Committed by Rodrigo Vivi

drm/i915/gem: Look for waitboosting across the whole object prior to individual waits

We employ a "waitboost" heuristic to detect when userspace is stalled
waiting for results from earlier execution. Under latency sensitive work
mixed between the gpu/cpu, the GPU is typically under-utilised and so
RPS sees that low utilisation as a reason to downclock the frequency,
causing longer stalls and lower throughput. The user left waiting for
the results is not impressed.

On applying commit 047a1b87 ("dma-buf & drm/amdgpu: remove dma_resv
workaround") it was observed that deinterlacing h264 on Haswell
performance dropped by 2-5x. The reason being that the natural workload
was not intense enough to trigger RPS (using HW evaluation intervals) to
upclock, and so it was depending on waitboosting for the throughput.

Commit 047a1b87 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
changes the composition of dma-resv from keeping a single write fence +
multiple read fences, to a single array of multiple write and read
fences (a maximum of one pair of write/read fences per context). The
iteration order was also changed implicitly from all-read fences then
the single write fence, to a mix of write fences followed by read
fences. It is that ordering change that belied the fragility of
waitboosting.

Currently, a waitboost is inspected at the point of waiting on an
outstanding fence. If the GPU is backlogged such that we haven't yet
stated the request we need to wait on, we force the GPU to upclock until
the completion of that request. By changing the order in which we waited
upon requests, we ended up waiting on those requests in sequence and as
such we saw that each request was already started and so not a suitable
candidate for waitboosting.

Instead of asking whether to boost each fence in turn, we can look at
whether boosting is required for the dma-resv ensemble prior to waiting
on any fence, making the heuristic more robust to the order in which
fences are stored in the dma-resv.
Reported-by: default avatarThomas Voegtle <tv@lio96.de>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6284
Fixes: 047a1b87 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: default avatarKarolina Drobnik <karolina.drobnik@intel.com>
Tested-by: default avatarThomas Voegtle <tv@lio96.de>
Reviewed-by: default avatarAndi Shyti <andi.shyti@linux.intel.com>
Acked-by: default avatarRodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: default avatarRodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/07e05518d9f6620d20cc1101ec1849203fe973f9.1657289332.git.karolina.drobnik@intel.com
(cherry picked from commit 394e2b57)
Signed-off-by: default avatarRodrigo Vivi <rodrigo.vivi@intel.com>
parent a1c5a7bf
......@@ -9,6 +9,7 @@
#include <linux/jiffies.h>
#include "gt/intel_engine.h"
#include "gt/intel_rps.h"
#include "i915_gem_ioctls.h"
#include "i915_gem_object.h"
......@@ -31,6 +32,37 @@ i915_gem_object_wait_fence(struct dma_fence *fence,
timeout);
}
static void
i915_gem_object_boost(struct dma_resv *resv, unsigned int flags)
{
struct dma_resv_iter cursor;
struct dma_fence *fence;
/*
* Prescan all fences for potential boosting before we begin waiting.
*
* When we wait, we wait on outstanding fences serially. If the
* dma-resv contains a sequence such as 1:1, 1:2 instead of a reduced
* form 1:2, then as we look at each wait in turn we see that each
* request is currently executing and not worthy of boosting. But if
* we only happen to look at the final fence in the sequence (because
* of request coalescing or splitting between read/write arrays by
* the iterator), then we would boost. As such our decision to boost
* or not is delicately balanced on the order we wait on fences.
*
* So instead of looking for boosts sequentially, look for all boosts
* upfront and then wait on the outstanding fences.
*/
dma_resv_iter_begin(&cursor, resv,
dma_resv_usage_rw(flags & I915_WAIT_ALL));
dma_resv_for_each_fence_unlocked(&cursor, fence)
if (dma_fence_is_i915(fence) &&
!i915_request_started(to_request(fence)))
intel_rps_boost(to_request(fence));
dma_resv_iter_end(&cursor);
}
static long
i915_gem_object_wait_reservation(struct dma_resv *resv,
unsigned int flags,
......@@ -40,6 +72,8 @@ i915_gem_object_wait_reservation(struct dma_resv *resv,
struct dma_fence *fence;
long ret = timeout ?: 1;
i915_gem_object_boost(resv, flags);
dma_resv_iter_begin(&cursor, resv,
dma_resv_usage_rw(flags & I915_WAIT_ALL));
dma_resv_for_each_fence_unlocked(&cursor, fence) {
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment