Commits · f922fbb0f2ad1fd3e3186f39c46673419e6d9281 · Kirill Smelkov / linux

17 Aug, 2022 11 commits

drm/i915/guc: clear stalled request after a reset · f922fbb0

Daniele Ceraolo Spurio authored Aug 11, 2022

If the GuC CTs are full and we need to stall the request submission
while waiting for space, we save the stalled request and where the stall
occurred; when the CTs have space again we pick up the request submission
from where we left off.

If a full GT reset occurs, the state of all contexts is cleared and all
non-guilty requests are unsubmitted, therefore we need to restart the
stalled request submission from scratch. To make sure that we do so,
clear the saved request after a reset.

Fixes note: the patch that introduced the bug is in 5.15, but no
officially supported platform had GuC submission enabled by default
in that kernel, so the backport to that particular version (and only
that one) can potentially be skipped.

Fixes: 925dc1cf ("drm/i915/guc: Implement GuC submission tasklet")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
Cc: <stable@vger.kernel.org> # v5.15+
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220811210812.3239621-1-daniele.ceraolospurio@intel.com

f922fbb0

drm/i915/guc: skip scrub_ctbs selftest if reset is disabled · b0f2eb94

Daniele Ceraolo Spurio authored Jul 08, 2022

The test needs GT reset to trigger the scrubbing logic, so we can only
run it when reset is enabled.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220708224158.929327-1-daniele.ceraolospurio@intel.com

b0f2eb94

drm/i915/guc: Reduce spam from error capture · b092e4a9

John Harrison authored Jul 27, 2022

Some debug code got left in when the GuC based register save for error
capture was added. Remove that.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728022028.2190627-8-John.C.Harrison@Intel.com

b092e4a9

drm/i915/guc: Make GuC log sizes runtime configurable · 8ad0152a

John Harrison authored Jul 27, 2022

The GuC log buffer sizes had to be configured statically at compile
time. This can be quite troublesome when needing to get larger logs
out of a released driver. So re-organise the code to allow a boot time
module parameter override.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728022028.2190627-7-John.C.Harrison@Intel.com

8ad0152a

drm/i915/guc: Use streaming loads to speed up dumping the guc log · 5ece208a

Chris Wilson authored Jul 27, 2022

Use a temporary page and mempy_from_wc to reduce the time it takes to
dump the guc log to debugfs.
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728022028.2190627-6-John.C.Harrison@Intel.com

5ece208a

drm/i915/guc: Record CTB info in error logs · c5de70f6

John Harrison authored Jul 27, 2022

When debugging GuC communication issues, it is useful to have the CTB
info available. So add the state and buffer contents to the error
capture log.

Also, add a sub-structure for the GuC specific error capture info as
it is now becoming numerous.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728022028.2190627-5-John.C.Harrison@Intel.com

c5de70f6

drm/i915/guc: Add GuC <-> kernel time stamp translation information · 368d179a

John Harrison authored Jul 27, 2022

It is useful to be able to match GuC events to kernel events when
looking at the GuC log. That requires being able to convert GuC
timestamps to kernel time. So, when dumping error captures and/or GuC
logs, include a stamp in both time zones plus the clock frequency.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728022028.2190627-4-John.C.Harrison@Intel.com

368d179a

drm/i915/guc: Fix capture size warning and bump the size · 56c7f0e2

John Harrison authored Jul 27, 2022

There was a size check to warn if the GuC error state capture buffer
allocation would be too small to fit a reasonable amount of capture
data for the current platform. Unfortunately, the test was done too
early in the boot sequence and was actually testing 'if(-ENODEV >
size)'.

Move the check to be later. The check is only used to print a warning
message, so it doesn't really matter how early or late it is done.
Note that it is not possible to dynamically size the buffer because
the allocation needs to be done before the engine information is
available (at least, it would be in the intended two-phase GuC init
process).

Now that the check works, it is reporting size too small for newer
platforms. The check includes a 3x oversample multiplier to allow for
multiple error captures to be bufferd by GuC before i915 has a chance
to read them out. This is less important than simply being big enough
to fit the first capture.

So a) bump the default size to be large enough for one capture minimum
and b) make the warning only if one capture won't fit, instead use a
notice for the 3x size.

Note that the size estimate is a worst case scenario. Actual captures
will likely be smaller.

Lastly, use drm_warn istead of DRM_WARN as the former provides more
infmration and the latter is deprecated.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728022028.2190627-3-John.C.Harrison@Intel.com

56c7f0e2

drm/i915/guc: Add a helper for log buffer size · 5ce27d62

Alan Previn authored Jul 27, 2022

Add a helper to get GuC log buffer size.
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728022028.2190627-2-John.C.Harrison@Intel.com

5ce27d62

drm/i915/dg2: Add additional tuning settings · 6dc85721

Matt Roper authored Aug 16, 2022

Some additional MMIO tuning settings have appeared in the bspec's
performance tuning guide section.

One of the tuning settings here is also documented as formal workaround
Wa_22012654132 for some steppings of DG2.  However the tuning setting
applies to all DG2 variants and steppings, making it a superset of the
workaround.

v2:
 - Move DRAW_WATERMARK to engine workaround section.  It only moves into
   the engine context on future platforms.  (Lucas)
 - CHICKEN_RASTER_2 needs to be handled as a masked register.  (Lucas)

Bspec: 68331
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220816210601.2041572-2-matthew.d.roper@intel.com

6dc85721

drm/i915/gt: Add dedicated function for non-ctx register tuning settings · 7e55536c

Matt Roper authored Aug 16, 2022

The bspec performance tuning section gives recommended settings that the
driver should program for various MMIO registers. Although these
settings aren't "workarounds" we use the workaround infrastructure to do
this programming to make sure it is handled at the appropriate places
and doesn't conflict with any real workarounds.

Since more of these are starting to show up on recent platforms, it's a
good time to create a dedicated function to hold them so that there's
less ambiguity about how/where to implement new ones.

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220816210601.2041572-1-matthew.d.roper@intel.com

7e55536c

09 Aug, 2022 2 commits

drm/i915/ttm: fix CCS handling · 8676145e

Matthew Auld authored Aug 05, 2022

Crucible + recent Mesa seems to sometimes hit:

GEM_BUG_ON(num_ccs_blks > NUM_CCS_BLKS_PER_XFER)

And it looks like we can also trigger this with gem_lmem_swapping, if we
modify the test to use slightly larger object sizes.

Looking closer it looks like we have the following issues in
migrate_copy():

  - We are using plain integer in various places, which we can easily
    overflow with a large object.

  - We pass the entire object size (when the src is lmem) into
    emit_pte() and then try to copy it, which doesn't work, since we
    only have a few fixed sized windows in which to map the pages and
    perform the copy. With an object > 8M we therefore aren't properly
    copying the pages. And then with an object > 64M we trigger the
    GEM_BUG_ON(num_ccs_blks > NUM_CCS_BLKS_PER_XFER).

So it looks like our copy handling for any object > 8M (which is our
CHUNK_SZ) is currently broken on DG2.

Fixes: da0595ae ("drm/i915/migrate: Evict and restore the flatccs capable lmem obj")
Testcase: igt@gem_lmem_swapping
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Ramalingam C<ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220805132240.442747-2-matthew.auld@intel.com

8676145e

drm/i915/ttm: remove calc_ctrl_surf_instr_size · dba4d442

Matthew Auld authored Aug 05, 2022

We only ever need to emit one ccs block copy command.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Ramalingam C<ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220805132240.442747-1-matthew.auld@intel.com

dba4d442

08 Aug, 2022 1 commit

drm/i915: pass a pointer for tlb seqno at vma_invalidate_tlb() · 3d037d99

Mauro Carvalho Chehab authored Aug 04, 2022

WRITE_ONCE() should happen at the original var, not on a local
copy of it.

Cc: stable@vger.kernel.org
Fixes: 5d36acb7 ("drm/i915/gt: Batch TLB invalidations")
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[added cc-stable while merging it]
Link: https://patchwork.freedesktop.org/patch/msgid/f9550e6bacea10131ff40dd8981b69eb9251cdcd.1659598090.git.mchehab@kernel.org

3d037d99

03 Aug, 2022 1 commit

drm/i915/gem: Remove shared locking on freeing objects · 7dd5c565

Chris Wilson authored Jul 26, 2022

The obj->base.resv may be shared across many objects, some of which may
still be live and locked, preventing objects from being freed
indefintely. We could individualise the lock during the free, or rely on
a freed object having no contention and being able to immediately free
the pages it owns.

References: https://gitlab.freedesktop.org/drm/intel/-/issues/6469
Fixes: be7612fd ("drm/i915: Require object lock when freeing pages during destruction")
Fixes: 6cb12fbd ("drm/i915: Use trylock instead of blocking lock for __i915_gem_free_objects.")
Cc: <stable@vger.kernel.org> # v5.17+
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Nirmoy Das <nirmoy.das@intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220726144844.18429-1-nirmoy.das@intel.com

7dd5c565

02 Aug, 2022 2 commits

drm/i915/dg2: Add Wa_1509727124 · ae5a3d2c

Harish Chegondi authored Aug 01, 2022

Bspec: 46052
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220801213839.8549-1-harish.chegondi@intel.com

ae5a3d2c

drm/i915/dg2: Update DG2 to GuC v70.4.1 · 2775e201

John Harrison authored Jul 28, 2022

New release of GuC with a bunch of fixes specific to DG2. Some of
these require follow up i915 changes to enable.

Note also that it is not necessary to maintain backwards compatibility
with 70.1.2 for DG2 because DG2 is still under force probe protection.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728230722.2749701-2-John.C.Harrison@Intel.com

2775e201

01 Aug, 2022 1 commit

drm/i915/guc: Don't send policy update for child contexts. · 6c82c752

Daniele Ceraolo Spurio authored Jul 27, 2022

The GuC FW applies the parent context policy to all the children,
so individual updates to the children are not supported and we
should not send them.

Note that sending the message did not have any functional consequences,
because the GuC just drops it and logs an error; since we were trying
to set the child policy to match the parent anyway the message being
dropped was not a problem.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728003339.2361010-1-daniele.ceraolospurio@intel.com

6c82c752

29 Jul, 2022 6 commits

drm/i915/guc: Don't abort on CTB_UNUSED status · dd9d3cbe

John Harrison authored Jul 27, 2022

When the KMD sends a CLIENT_RESET request to GuC (as part of the
suspend sequence), GuC will mark the CTB buffer as 'UNUSED'. If the
KMD then checked the CTB queue, it would see a non-zero status value
and report the buffer as corrupted.

Technically, no G2H messages should be received once the CLIENT_RESET
has been sent. However, if a context was outstanding on an engine then
it would get reset and a reset notification would be sent. So, don't
actually treat UNUSED as a catastrophic error. Just flag it up as
unexpected and keep going.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728024225.2363663-7-John.C.Harrison@Intel.com

dd9d3cbe

drm/i915/guc: Support larger contexts on newer hardware · 52d4cfdc

Matthew Brost authored Jul 27, 2022

The GuC needs a copy of a golden context for implementing watchdog
resets (aka media resets). This context is larger on newer platforms.
So adjust the size being allocated/copied accordingly.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728024225.2363663-6-John.C.Harrison@Intel.com

52d4cfdc

drm/i915/selftest: Cope with not having an RCS engine · a96d8f05

John Harrison authored Jul 27, 2022

It is no longer guaranteed that there will always be an RCS engine.
So, use the helper function for finding the first available engine that
can be used for general purpose selftets.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728024225.2363663-5-John.C.Harrison@Intel.com

a96d8f05

drm/i915/guc: Add selftest for a hung GuC · 69142c0a

Rahul Kumar Singh authored Jul 28, 2022

Add a test to check that the hangcheck will recover from a submission
hang in the GuC.
Signed-off-by: Rahul Kumar Singh <rahul.kumar.singh@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728182616.2417491-1-John.C.Harrison@Intel.com

69142c0a

drm/i915/guc: Fix issues with live_preempt_cancel · 15c5401d

Matthew Brost authored Jul 27, 2022

Having semaphores results in different behavior when a dependent request
is cancelled. In the case of semaphores the request could be on the HW
and complete successfully while without the request is held in the
driver and the error from the dependent request is propagated. Fix
live_preempt_cancel to take this behavior into account.

Also update live_preempt_cancel to use new function intel_context_ban
rather than intel_context_set_banned.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728024225.2363663-3-John.C.Harrison@Intel.com

15c5401d

drm/i915/guc: Route semaphores to GuC for Gen12+ · 9fb34737

Michał Winiarski authored Jul 27, 2022

In GuC submission mode, there is an option to use auto-switch out
semaphores and have GuC auto-switch in a waiting context. This
requires routing the semaphore interrupt to GuC.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220728024225.2363663-2-John.C.Harrison@Intel.com

9fb34737

28 Jul, 2022 10 commits

drm/i915/guc: Check for ct enabled while waiting for response · 22645976

Zhanjun Dong authored Jul 15, 2022

We are seeing error message of "No response for request". Some cases
happened while waiting for response and reset/suspend action was triggered.
In this case, no response is not an error, active requests will be
cancelled.

This patch will handle this condition and change the error message into
debug message.
Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220715211313.143645-1-zhanjun.dong@intel.com

22645976

drm/i915/gt: describe the new tlb parameter at i915_vma_resource · d1051db8

Mauro Carvalho Chehab authored Jul 27, 2022

TLB cache invalidation can happen on two different situations:

1. synchronously, at __vma_put_pages();
2. asynchronously.

On the first case, TLB cache invalidation happens inside
__vma_put_pages(). So, no need to do it later on.

However, on the second case, the pages will keep in memory
until __i915_vma_evict() is called.

So, we need to store the TLB data at struct i915_vma_resource,
in order to do a TLB cache invalidation before allowing
userspace to re-use the same memory.

So, i915_vma_resource_unbind() has gained a new parameter
in order to store the TLB data at the second case.

Document it.
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/aa55eef7e63b8f3d0f69b525db2dd2eb87e9db6b.1658924372.git.mchehab@kernel.org

d1051db8

drm/i915/gt: Batch TLB invalidations · 5d36acb7

Chris Wilson authored Jul 27, 2022

Invalidate TLB in batches, in order to reduce performance regressions.

Currently, every caller performs a full barrier around a TLB
invalidation, ignoring all other invalidations that may have already
removed their PTEs from the cache. As this is a synchronous operation
and can be quite slow, we cause multiple threads to contend on the TLB
invalidate mutex blocking userspace.

We only need to invalidate the TLB once after replacing our PTE to
ensure that there is no possible continued access to the physical
address before releasing our pages. By tracking a seqno for each full
TLB invalidate we can quickly determine if one has been performed since
rewriting the PTE, and only if necessary trigger one for ourselves.

That helps to reduce the performance regression introduced by TLB
invalidate logic.

[mchehab: rebased to not require moving the code to a separate file]

Cc: stable@vger.kernel.org
Fixes: 7938d615 ("drm/i915: Flush TLBs before releasing backing store")
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/4e97ef5deb6739cadaaf40aa45620547e9c4ec06.1658924372.git.mchehab@kernel.org

5d36acb7

drm/i915/gt: Skip TLB invalidations once wedged · be0366f1

Chris Wilson authored Jul 27, 2022

Skip all further TLB invalidations once the device is wedged and
had been reset, as, on such cases, it can no longer process instructions
on the GPU and the user no longer has access to the TLB's in each engine.

So, an attempt to do a TLB cache invalidation will produce a timeout.

That helps to reduce the performance regression introduced by TLB
invalidate logic.

Cc: stable@vger.kernel.org
Fixes: 7938d615 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/5aa86564b9ec5fe7fe605c1dd7de76855401ed73.1658924372.git.mchehab@kernel.org

be0366f1

drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations · dfc83de1

Chris Wilson authored Jul 27, 2022

Ensure that the TLB of the OA unit is also invalidated
on gen12 HW, as just invalidating the TLB of an engine is not
enough.

Cc: stable@vger.kernel.org
Fixes: 7938d615 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/59724d9f5cf1e93b1620d01b8332ac991555283d.1658924372.git.mchehab@kernel.org

dfc83de1

drm/i915/gt: document with_intel_gt_pm_if_awake() · 4d87d362

Mauro Carvalho Chehab authored Jul 27, 2022

Add a kernel-doc markup to document this new macro.
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/b974905bd0f6b5308b91561cc85eeecd94f1452a.1658924372.git.mchehab@kernel.org

4d87d362

drm/i915/gt: Ignore TLB invalidations on idle engines · 4bedceae

Chris Wilson authored Jul 27, 2022

Check if the device is powered down prior to any engine activity,
as, on such cases, all the TLBs were already invalidated, so an
explicit TLB invalidation is not needed, thus reducing the
performance regression impact due to it.

This becomes more significant with GuC, as it can only do so when
the connection to the GuC is awake.

Cc: stable@vger.kernel.org
Fixes: 7938d615 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/278a57a672edac75683f0818b292e95da583a5fe.1658924372.git.mchehab@kernel.org

4bedceae

drm/i915/ttm: don't leak the ccs state · 353819d8

Matthew Auld authored Jul 27, 2022

The kernel only manages the ccs state with lmem-only objects, however
the kernel should still take care not to leak the CCS state from the
previous user.

Fixes: 48760ffe ("drm/i915/gt: Clear compress metadata for Flat-ccs objects")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220727164346.282407-1-matthew.auld@intel.com

353819d8

drm/i915: disable pci resize on 32-bit machine · f5dfbfc0

Nirmoy Das authored Jul 27, 2022

PCI bar resize only works with 64 bit BAR so disable
this on 32-bit machine and resolve below compilation error:

drivers/gpu/drm/i915/gt/intel_region_lmem.c:94:23: error: result of
comparison of constant 4294967296 with expression of type
'resource_size_t' (aka 'unsigned int') is always false
[-Werror,-Wtautological-constant-out-of-range-compare]
                    root_res->start > 0x100000000ull)

Fixes: a91d1a17 ("drm/i915: Add support for LMEM PCIe resizable bar")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Acked-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220727173306.16247-1-nirmoy.das@intel.com

f5dfbfc0

drm/i915: Suppress oom warning for shmemfs object allocation failure · a8c18bec

Chris Wilson authored Jul 27, 2022

We report object allocation failures to userspace with ENOMEM, yet we
still show the memory warning after failing to shrink device allocated
pages. While this warning is similar to other system page allocation
failures, it is superfluous to the ENOMEM provided directly to
userspace.

v2: Add NOWARN in few more places from where we might return
    ENOMEM to userspace.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/4936Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Co-developed-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220727174023.16766-1-nirmoy.das@intel.com

a8c18bec

21 Jul, 2022 2 commits

drm/i915/selftests: Fix comment typo · b25c377a

Jason Wang authored Jul 16, 2022

Fix the double `wait' typo in comment.
Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220716040520.31676-1-wangborong@cdjrlc.com

b25c377a

drm/i915/gt: Remove unneeded semicolon · 2be1959e

Jason Wang authored Jul 17, 2022

The semicolon after the `}' is unneeded.
Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Removed line mention when pushing]
Link: https://patchwork.freedesktop.org/patch/msgid/20220716184439.72056-1-wangborong@cdjrlc.com

2be1959e

20 Jul, 2022 1 commit

drm/i915/guc: Don't use pr_err when not necessary · a4a43070

John Harrison authored Jul 14, 2022

A bunch of code was copy/pasted using pr_err as the default way to
report errors. However, drm_err is significantly more useful in
identifying where the error came from. So update the code to use that
instead.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220715004028.2126239-1-John.C.Harrison@Intel.com

a4a43070

19 Jul, 2022 3 commits

drm/i915/guc: support v69 in parallel to v70 · 774ce151

Daniele Ceraolo Spurio authored Jul 18, 2022

This patch re-introduces support for GuC v69 in parallel to v70. As this
is a quick fix, v69 has been re-introduced as the single "fallback" guc
version in case v70 is not available on disk and only for platforms that
are out of force_probe and require the GuC by default. All v69 specific
code has been labeled as such for easy identification, and the same was
done for all v70 functions for which there is a separate v69 version,
to avoid accidentally calling the wrong version via the unlabeled name.

When the fallback mode kicks in, a drm_notice message is printed in
dmesg to inform the user of the required update. The existing
logging of the fetch function has also been updated so that we no
longer complain immediately if we can't find a fw and we only throw an
error if the fetch of both the base and fallback blobs fails.

The plan is to follow this up with a more complex rework to allow for
multiple different GuC versions to be supported at the same time.

v2: reduce the fallback to platform that require it, switch to
firmware_request_nowarn(), improve logs.

Fixes: 2584b354 ("drm/i915/guc: Update to GuC version 70.1.1")
Link: https://lists.freedesktop.org/archives/intel-gfx/2022-July/301640.htmlSigned-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220718230732.1409641-1-daniele.ceraolospurio@intel.com

774ce151

drm/i915/gt: Expose per-gt RPS defaults in sysfs · fdff0a85

Ashutosh Dixit authored Jul 18, 2022

Add the following sysfs files to gt/gtN/.defaults/:
* rps_min_freq_mhz
* rps_max_freq_mhz

v2: Correct gt/gtN/.defaults/* file names in commit message
v3: Remove rps_boost_freq_mhz since it is not consumed by userspace

Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/cf6e483bf79f871c2c8c74af6005bf6a83a3a1ce.1658192398.git.ashutosh.dixit@intel.com

fdff0a85

drm/i915/gt: Create gt/gtN/.defaults/ for per gt sysfs defaults · 5dca122f

Ashutosh Dixit authored Jul 18, 2022

Create a gt/gtN/.defaults/ directory (similar to
engine/<engine-name>/.defaults/) to expose default parameter values for
each gt in sysfs. This allows userspace to restore default parameter values
after they have changed. The empty 'struct gt_defaults' will be populated
by subsequent patches.

v2: Changed 'struct intel_rps_defaults rps_defaults' to
    'struct gt_defaults defaults' (Andi)

Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/be7c30d0ae58be9d8d5b8242ba00a1b2825e63ad.1658192398.git.ashutosh.dixit@intel.com

5dca122f