Commits · 8e2e9c435e2bdcf3cbd53a0e82877616ae9a513a · Kirill Smelkov / linux

03 Mar, 2022 5 commits

drm/i915/guc: Move lrc desc setup to where it is needed · 8e2e9c43

John Harrison authored Mar 01, 2022

The LRC descriptor was being initialised early on in the context
registration sequence. It could then be determined that the actual
registration needs to be delayed and the descriptor would be wiped
out. This is inefficient, so move the setup to later in the process
after the point of no return.

v2: Move some split changes into the split patch (and do them
correctly).
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220302003357.4188363-6-John.C.Harrison@Intel.com

8e2e9c43

drm/i915/guc: Split guc_lrc_desc_pin apart · 58ea7d62

John Harrison authored Mar 01, 2022

The LRC descriptor pool is going away. Further, the function that was
populating it was also doing a bunch of logic about the context
registration sequence. So, split that code apart into separate state
setup and try to register functions. Note that some of those 'try to
register' code paths actually undo the state setup and leave it to be
redone again later (with potentially different values). This is
inefficient. The next patch will correct this.

Also, move a comment about ignoring return values to the place where
the return values are actually ignored.

v2: Move some more splitting from a later patch (and do it correctly).
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220302003357.4188363-5-John.C.Harrison@Intel.com

58ea7d62

drm/i915/guc: Better name for context id limit · d1249022

John Harrison authored Mar 01, 2022

The LRC descriptor pool is going away. So, stop using it as the limit
for how many context ids are available. Instead, size the pool
according to the number of contexts allowed. Note that this is just a
naming change, the actual limit is identical in value.

While at it, also update a kzalloc(sizeof()*count) to be a
kcalloc(count,size).
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220302003357.4188363-4-John.C.Harrison@Intel.com

d1249022

drm/i915/guc: Add an explicit 'submission_initialized' flag · 09570c50

John Harrison authored Mar 01, 2022

The LRC descriptor pool is going away. So, stop using it as a check
for whether submission has been initialised or not.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220302003357.4188363-3-John.C.Harrison@Intel.com

09570c50

drm/i915/guc: Do not conflate lrc_desc with GuC id for registration · 02942b42

John Harrison authored Mar 01, 2022

The LRC descriptor pool is going away. So, stop using it as a check for
context registration, use the GuC id instead (being the thing that
actually gets registered with the GuC).

Also, rename the set/clear/query helper functions for context id
mappings to better reflect their purpose and to differentiate from
other registration related helper functions.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220302003357.4188363-2-John.C.Harrison@Intel.com

02942b42

02 Mar, 2022 14 commits

drm/i915/xehpsdv: Move render/compute engine reset domains related workarounds · b2006061

Srinivasan Shanmugam authored Mar 01, 2022

Registers that exist in the shared render/compute reset domain need to
be placed on an engine workaround list to ensure that they are properly
re-applied whenever an RCS or CCS engine is reset.  We have a number of
workarounds (updating registers MLTICTXCTL, L3SQCREG1_CCS0,
GEN12_MERT_MOD_CTRL, and GEN12_GAMCNTRL_CTRL) that are incorrectly
implemented on the 'gt' workaround list and need to be moved
accordingly.

Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.s@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220301231549.1817978-14-matthew.d.roper@intel.com

b2006061

drm/i915/xehp: Add compute workarounds · ff6b19d3

Matt Roper authored Mar 01, 2022

Additional workarounds are required once we start exposing CCS engines.

Note that we have a number of workarounds that update registers in the
shared render/compute reset domain. Historically we've just added such
registers to the RCS engine's workaround list. But going forward we
should be more careful to place such workarounds on a wa_list for an
engine that definitely exists and is not fused off (e.g., a platform
with no RCS would never apply the RCS wa_list). We'll keep
rcs_engine_wa_init() focused on RCS-specific workarounds that only need
to be applied if the RCS engine is present. A separate
general_render_compute_wa_init() function will be used to define
workarounds that touch registers in the shared render/compute reset
domain and that we need to apply regardless of what render and/or
compute engines actually exist. Any workarounds defined in this new
function will internally be added to the first present RCS or CCS
engine's workaround list to ensure they get applied (and only get
applied once rather than being needlessly re-applied several times).

Co-author: Srinivasan Shanmugam
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220301231549.1817978-13-matthew.d.roper@intel.com

ff6b19d3

drm/i915/xehp: handle fused off CCS engines · 88ed07cb

Daniele Ceraolo Spurio authored Mar 01, 2022

HW resources are divided across the active CCS engines at the compute
slice level, with each CCS having priority on one of the cslices.
If a compute slice has no enabled DSS, its paired compute engine is not
usable in full parallel execution because the other ones already fully
saturate the HW, so consider it fused off.

v2 (José):
 - moved it to its own function
 - fixed definition of ccs_mask

v3 (Matt):
 - Replace fls() condition with a simple IP version test

v4 (Matt):
 - Don't try to calculate a ccs_mask using
   intel_slicemask_from_dssmask() until we've determined that we're
   running on an Xe_HP platform where the logic makes sense (and won't
   overflow).

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220302052008.1884985-1-matthew.d.roper@intel.com

88ed07cb

drm/i915/xehp: Don't support parallel submission on compute / render · e393e2aa

Matthew Brost authored Mar 01, 2022

A different emit breadcrumbs ring programming is required for compute /
render and we don't have UMD user so just reject parallel submission for
these engine classes.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220301231549.1817978-11-matthew.d.roper@intel.com

e393e2aa

drm/i915/xehp/guc: enable compute engine inside GuC · ea4ca894

Daniele Ceraolo Spurio authored Mar 01, 2022

Tell GuC that CCS is enabled by setting the CCS mask in its ADS.

Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Original-author: Michel Thierry
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220301231549.1817978-10-matthew.d.roper@intel.com

ea4ca894

drm/i915/xehp: Enable ccs/dual-ctx in RCU_MODE · 87cb6d80

Matt Roper authored Mar 01, 2022

We have to specify in the Render Control Unit Mode register
when CCS is enabled.

v2:
 - Move RCU_MODE programming to a helper function.  (Tvrtko)
 - Clean up and clarify comments.  (Tvrtko)
 - Add RCU_MODE to the GuC save/restore list.  (Daniele)
v3:
 - Move this patch before the GuC ADS update to enable compute engines;
   the definition of RCU_MODE and its insertion into the save/restore
   list moves to this patch.  (Daniele)
v4:
 - Call xehp_enable_ccs_engines() directly in guc_resume() and
   execlists_resume() rather than adding an extra layer of wrapping to
   the engine->resume() vfunc.  (Umesh)

Bspec: 46034
Original-author: Michel Thierry
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220302001554.1836066-1-matthew.d.roper@intel.com

87cb6d80

drm/i915/xehp: Define context scheduling attributes in lrc descriptor · adfadb56

Matt Roper authored Mar 01, 2022

In Dual Context mode the EUs are shared between render and compute
command streamers. The hardware provides a field in the lrc descriptor
to indicate the prioritization of the thread dispatch associated to the
corresponding context.

The context priority is set to 'low' at creation time and relies on the
existing context priority to set it to low/normal/high.

Bspec: 46145, 46260
Original-author: Michel Thierry
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Signed-off-by: Prasad Nallani <prasad.nallani@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220301231549.1817978-8-matthew.d.roper@intel.com

adfadb56

drm/i915: Move context descriptor fields to intel_lrc.h · f4c1fdb9

Matt Roper authored Mar 01, 2022

This is a more appropriate header for these definitions.

v2:
 - Cleanup whitespace. (Lucas)
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220301231549.1817978-7-matthew.d.roper@intel.com

f4c1fdb9

drm/i915/xehp: CCS should use RCS setup functions · c674c5b9

Matt Roper authored Mar 01, 2022

The compute engine handles the same commands the render engine can
(except 3D pipeline), so it makes sense that CCS is more similar to RCS
than non-render engines.

The CCS context state (lrc) is also similar to the render one, so reuse
it. Note that the compute engine has its own CTX_R_PWR_CLK_STATE
register.

In order to avoid having multiple RCS && CCS checks, add the following
engine flag:
 - I915_ENGINE_HAS_RCS_REG_STATE - use the render (larger) reg state ctx.

BSpec: 46260
Original-author: Michel Thierry
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220301231549.1817978-6-matthew.d.roper@intel.com

c674c5b9

drm/i915/xehp: compute engine pipe_control · 803efd29

Daniele Ceraolo Spurio authored Mar 01, 2022

CCS will reuse the RCS functions for breadcrumb and flush emission.
However, CCS pipe_control has additional programming restrictions:
 - Command Streamer Stall Enable must be always set
 - Post Sync Operations must not be set to Write PS Depth Count
 - 3D-related bits must not be set

v2:
 - Drop unwanted blank line.  (Lucas)

Bspec: 47112
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220301231549.1817978-5-matthew.d.roper@intel.com

803efd29

drm/i915/xehp: Add Compute CS IRQ handlers · 505c4857

Matt Roper authored Mar 01, 2022

Add execlists and GuC interrupts for compute CS into existing IRQ handlers.

All compute command streamers belong to the same compute class, so the
only change needed to enable their interrupts is to program their GT engine
interrupt mask registers.

CCS0 shares the register with CCS1, while CCS2 and CCS3 are in a new one.

BSpec: 50844, 54029, 54030, 53223, 53224.
Original-author: Michel Thierry
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220301231549.1817978-4-matthew.d.roper@intel.com

505c4857

drm/i915/xehp: CCS shares the render reset domain · 4b88ad50

Matt Roper authored Mar 01, 2022

The reset domain is shared between render and all compute engines,
so resetting one will affect the others.

Note:  Before performing a reset on an RCS or CCS engine, the GuC will
attempt to preempt-to-idle the other non-hung RCS/CCS engines to avoid
impacting other clients (since some shared modules will be reset).  If
other engines are executing non-preemptable workloads, the impact is
unavoidable and some work may be lost.

Bspec: 52549
Original-author: Michel Thierry
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220301231549.1817978-3-matthew.d.roper@intel.com

4b88ad50

drm/i915/xehp: Define compute class and engine · 944823c9

Matt Roper authored Mar 01, 2022

Introduce a Compute Command Streamer (CCS), which has access to
the media and GPGPU pipelines (but not the 3D pipeline).

To begin with, define the compute class/engine common functions, based
on the existing render ones.

v2:
 - Add kerneldoc for drm_i915_gem_engine_class since we're adding a new
   element to it.  (Daniel)
 - Make engine class <-> guc class converters use lookup tables to make
   it more clear/explicit how the IDs map.  (Tvrtko)

v3:
 - Don't update uapi for now; we'll just include the driver-internal
   changes for the time being.

Bspec: 46167, 45544
Original-author: Michel Thierry
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1
Link: https://patchwork.freedesktop.org/patch/msgid/20220301231549.1817978-2-matthew.d.roper@intel.com

944823c9

drm/i915: Depend on !PREEMPT_RT. · a8b2b8b0

Sebastian Andrzej Siewior authored Feb 14, 2022

There are a few sections in the driver which are not compatible with
PREEMPT_RT. They trigger warnings and can lead to deadlocks at runtime.

Disable the i915 driver on a PREEMPT_RT enabled kernel. This way
PREEMPT_RT itself can be enabled without needing to address the i915
issues first. The RT related patches are still in RT queue and will be
handled later.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YgqmfKhwU5spS069@linutronix.de

a8b2b8b0

01 Mar, 2022 7 commits

drm/i915/guc: Do not complain about stale reset notifications · e2a1e7ab

John Harrison authored Feb 24, 2022

It is possible for reset notifications to arrive for a context that is
in the process of being banned. So don't flag these as an error, just
report it as informational (because it is still useful to know that
resets are happening even if they are being ignored).

v2: Better wording for the message (review feedback from Tvrtko).
v3: Fix rebase issue (review feedback from Daniele).
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220225015232.1939497-1-John.C.Harrison@Intel.com

e2a1e7ab

drm/i915/guc: Initialize GuC submission locks and queues early · e068ef3f

Daniele Ceraolo Spurio authored Feb 14, 2022

Move initialization of submission-related spinlock, lists and workers to
init_early. This fixes an issue where if the GuC init fails we might
still try to get the lock in the context cleanup code. Note that it is
safe to call the GuC context cleanup code even if the init failed
because all contexts are initialized with an invalid GuC ID, which will
cause the GuC side of the cleanup to be skipped, so it is easier to just
make sure the variables are initialized than to special case the cleanup
to handle the case when they're not.

References: https://gitlab.freedesktop.org/drm/intel/-/issues/4932Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220215011123.734572-1-daniele.ceraolospurio@intel.com

e068ef3f

drm/i915/guc: Fix flag query helper function to not modify state · eee5215b

John Harrison authored Feb 17, 2022

A flag query helper was actually writing to the flags word rather than
just reading. Fix that. Also update the function's comment as it was
out of date.

NB: No need for a 'Fixes' tag. The test was only ever used inside a
BUG_ON during context registration. Rather than asserting that the
condition was true, it was making the condition true. So, in theory,
there was no consequence because we should never have hit a BUG_ON
anyway. Which means the write should always have been a no-op.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220217212942.629922-1-John.C.Harrison@Intel.com

eee5215b

drm/i915/selftests: exercise mmap migration · fb87550d

Matthew Auld authored Feb 28, 2022

Exercise each of the migration scenarios, verifying that the final
placement and buffer contents match our expectations.

v2(Thomas): Replace for_i915_gem_ww() block with simpler object_lock()

v3:
- For testing purposes allow forcing the io_size such that we can
  exercise the allocation + migration path on devices that don't have the
  small BAR limit.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220228123607.580432-4-matthew.auld@intel.com

fb87550d

drm/i915/selftests: handle allocation failures · 6e0c5bf0

Matthew Auld authored Feb 28, 2022

If we have to contend with non-mappable LMEM, then we need to ensure the
object fits within the mappable portion, like in the selftests, where we
later try to CPU access the pages. However if it can't then we need to
gracefully handle this, without throwing an error.

Also it looks like TTM will return -ENOMEM, in ttm_bo_mem_space() after
exhausting all possible placements.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220228123607.580432-3-matthew.auld@intel.com

6e0c5bf0

drm/i915/ttm: mappable migration on fault · 503725c2

Matthew Auld authored Feb 28, 2022

The end goal is to have userspace tell the kernel what buffers will
require CPU access, however if we ever reach the CPU fault handler, and
the current resource is not mappable, then we should attempt to migrate
the buffer to the mappable portion of LMEM, or even system memory, if the
allowable placements permit it.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220228123607.580432-2-matthew.auld@intel.com

503725c2

drm/i915/ttm: make eviction mappable aware · 93735059

Matthew Auld authored Feb 28, 2022

If we need to make room for some mappable object, then we should
only victimize objects that have one or pages that occupy the visible
portion of LMEM. Let's also create a new priority hint for objects that
are placed in mappable memory, where we know that CPU access was
requested, that way we hopefully victimize these last.

v2(Thomas): s/TTM_PL_PRIV/I915_PL_LMEM0/
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220228123607.580432-1-matthew.auld@intel.com

93735059

28 Feb, 2022 8 commits

drm/i915: Clarify vma lifetime · c03d9826

Thomas Hellström authored Feb 22, 2022

It's unclear what reference the initial vma kref reference refers to.
A vma can have multiple weak references, the object vma list,
the vm's bound list and the GT's closed_list, and the initial vma
reference can be put from lookups of all these lists.

With the current implementation this means
that any holder of yet another vma refcount (currently only
i915_gem_object_unbind()) needs to be holding two of either
*) An object refcount,
*) A vm open count
*) A vma open count

in order for us to not risk leaking a reference by having the
initial vma reference being put twice.

Address this by re-introducing i915_vma_destroy() which removes all
weak references of the vma and *then* puts the initial vma refcount.
This makes a strong vma reference hold on to the vma unconditionally.

Perhaps a better name would be i915_vma_revoke() or i915_vma_zombify(),
since other callers may still hold a refcount, but with the prospect of
being able to replace the vma refcount with the object lock in the near
future, let's stick with i915_vma_destroy().

Finally this commit fixes a race in that previously i915_vma_release() and
now i915_vma_destroy() could destroy a vma without taking the vm->mutex
after an advisory check that the vma mm_node was not allocated.
This would race with the ungrab_vma() function creating a trace similar
to the below one. This was fixed in one of the __i915_vma_put() callsites
in
commit bc1922e5 ("drm/i915: Fix a race between vma / object destruction and unbinding")
but although not seemingly triggered by CI, that
is not sufficient. This patch is needed to fix that properly.

[823.012188] Console: switching to colour dummy device 80x25
[823.012422] [IGT] gem_ppgtt: executing
[823.016667] [IGT] gem_ppgtt: starting subtest blt-vs-render-ctx0
[852.436465] stack segment: 0000 [#1] PREEMPT SMP NOPTI
[852.436480] CPU: 0 PID: 3200 Comm: gem_ppgtt Not tainted 5.16.0-CI-CI_DRM_11115+ #1
[852.436489] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR5 RVP, BIOS ADLPFWI1.R00.2422.A00.2110131104 10/13/2021
[852.436499] RIP: 0010:ungrab_vma+0x9/0x80 [i915]
[852.436711] Code: ef e8 4b 85 cf e0 e8 36 a3 d6 e0 8b 83 f8 9c 00 00 85 c0 75 e1 5b 5d 41 5c 41 5d c3 e9 d6 fd 14 00 55 53 48 8b af c0 00 00 00 <8b> 45 00 85 c0 75 03 5b 5d c3 48 8b 85 a0 02 00 00 48 89 fb 48 8b
[852.436727] RSP: 0018:ffffc90006db7880 EFLAGS: 00010246
[852.436734] RAX: 0000000000000000 RBX: ffffc90006db7598 RCX: 0000000000000000
[852.436742] RDX: ffff88815349e898 RSI: ffff88815349e858 RDI: ffff88810a284140
[852.436748] RBP: 6b6b6b6b6b6b6b6b R08: ffff88815349e898 R09: ffff88815349e8e8
[852.436754] R10: 0000000000000001 R11: 0000000051ef1141 R12: ffff88810a284140
[852.436762] R13: 0000000000000000 R14: ffff88815349e868 R15: ffff88810a284458
[852.436770] FS:  00007f5c04b04e40(0000) GS:ffff88849f000000(0000) knlGS:0000000000000000
[852.436781] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[852.436788] CR2: 00007f5c04b38fe0 CR3: 000000010a6e8001 CR4: 0000000000770ef0
[852.436797] PKRU: 55555554
[852.436801] Call Trace:
[852.436806]  <TASK>
[852.436811]  i915_gem_evict_for_node+0x33c/0x3c0 [i915]
[852.437014]  i915_gem_gtt_reserve+0x106/0x130 [i915]
[852.437211]  i915_vma_pin_ww+0x8f4/0xb60 [i915]
[852.437412]  eb_validate_vmas+0x688/0x860 [i915]
[852.437596]  i915_gem_do_execbuffer+0xc0e/0x25b0 [i915]
[852.437770]  ? deactivate_slab+0x5f2/0x7d0
[852.437778]  ? _raw_spin_unlock_irqrestore+0x50/0x60
[852.437789]  ? i915_gem_execbuffer2_ioctl+0xc6/0x2c0 [i915]
[852.437944]  ? init_object+0x49/0x80
[852.437950]  ? __lock_acquire+0x5e6/0x2580
[852.437963]  i915_gem_execbuffer2_ioctl+0x116/0x2c0 [i915]
[852.438129]  ? i915_gem_do_execbuffer+0x25b0/0x25b0 [i915]
[852.438300]  drm_ioctl_kernel+0xac/0x140
[852.438310]  drm_ioctl+0x201/0x3d0
[852.438316]  ? i915_gem_do_execbuffer+0x25b0/0x25b0 [i915]
[852.438490]  __x64_sys_ioctl+0x6a/0xa0
[852.438498]  do_syscall_64+0x37/0xb0
[852.438507]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[852.438515] RIP: 0033:0x7f5c0415b317
[852.438523] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48
[852.438542] RSP: 002b:00007ffd765039a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[852.438553] RAX: ffffffffffffffda RBX: 000055e4d7829dd0 RCX: 00007f5c0415b317
[852.438562] RDX: 00007ffd76503a00 RSI: 00000000c0406469 RDI: 0000000000000017
[852.438571] RBP: 00007ffd76503a00 R08: 0000000000000000 R09: 0000000000000081
[852.438579] R10: 00000000ffffff7f R11: 0000000000000246 R12: 00000000c0406469
[852.438587] R13: 0000000000000017 R14: 00007ffd76503a00 R15: 0000000000000000
[852.438598]  </TASK>
[852.438602] Modules linked in: snd_hda_codec_hdmi i915 mei_hdcp x86_pkg_temp_thermal snd_hda_intel snd_intel_dspcfg drm_buddy coretemp crct10dif_pclmul crc32_pclmul snd_hda_codec ttm ghash_clmulni_intel snd_hwdep snd_hda_core e1000e drm_dp_helper ptp snd_pcm mei_me drm_kms_helper pps_core mei syscopyarea sysfillrect sysimgblt fb_sys_fops prime_numbers intel_lpss_pci smsc75xx usbnet mii
[852.440310] ---[ end trace e52cdd2fe4fd911c ]---

v2: Fix typos in the commit message.

Fixes: 7e00897b ("drm/i915: Add object locking to i915_gem_evict_for_node and i915_gem_evict_something, v2.")
Fixes: bc1922e5 ("drm/i915: Fix a race between vma / object destruction and unbinding")
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220222133209.587978-1-thomas.hellstrom@linux.intel.com

c03d9826

drm/i915/selftests: mock test io_size · 2d45f668

Matthew Auld authored Feb 25, 2022

Check that mappable vs non-mappable matches our expectations.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220225145502.331818-7-matthew.auld@intel.com

2d45f668

drm/i915/buddy: tweak 2big check · f199bf55

Matthew Auld authored Feb 25, 2022

Otherwise we get -EINVAL, instead of the more useful -E2BIG if the
allocation doesn't fit within the pfn range, like with mappable lmem.
The hugepages selftest, for example, needs this to know if a smaller
size is needed.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220225145502.331818-6-matthew.auld@intel.com

f199bf55

drm/i915/buddy: adjust res->start · f9eb7429

Matthew Auld authored Feb 25, 2022

Differentiate between mappable vs non-mappable resources, also if this
is an actual range allocation ensure we set res->start as the starting
pfn. Later when we need to do non-mappable -> mappable moves then we
want TTM to see that the current placement is not compatible, which
should result in an actual move, instead of being turned into a noop.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220225145502.331818-5-matthew.auld@intel.com

f9eb7429

drm/i915/buddy: track available visible size · 26ffcbbe

Matthew Auld authored Feb 25, 2022

Track the total amount of available visible memory, and also track
per-resource the amount of used visible memory. For now this is useful
for our debug output, and deciding if it is even worth calling into the
buddy allocator. In the future tracking the per-resource visible usage
will be useful for when deciding if we should attempt to evict certain
buffers.

v2:
 - s/place->lpfn/lpfn/, that way we can avoid scanning the list if the
   entire range is already mappable.
 - Move the end declaration inside the if block(Thomas).
 - Make sure to also account for reserved memory.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220225145502.331818-4-matthew.auld@intel.com

26ffcbbe

drm/i915: add I915_BO_ALLOC_GPU_ONLY · 30b9d1b3

Matthew Auld authored Feb 25, 2022

If the user doesn't require CPU access for the buffer, then
ALLOC_GPU_ONLY should be used, in order to prioritise allocating in the
non-mappable portion of LMEM, on devices with small BAR.

v2(Thomas):
  - The BO_ALLOC_TOPDOWN naming here is poor, since this is pure lies on
    systems that don't even have small BAR. A better name is GPU_ONLY,
    which is accurate regardless of the configuration.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220225145502.331818-3-matthew.auld@intel.com

30b9d1b3

drm/i915/ttm: require mappable by default · 3312a4ac

Matthew Auld authored Feb 25, 2022

On devices with non-mappable LMEM ensure we always allocate the pages
within the mappable portion. For now we assume that all LMEM buffers
will require CPU access, which is also inline with pretty much all
current kernel internal users. In the next patch we will introduce a new
flag to override this behaviour.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220225145502.331818-2-matthew.auld@intel.com

3312a4ac

drm/i915: add io_size plumbing · 235582ca

Matthew Auld authored Feb 25, 2022

With small LMEM-BAR we need to be able to differentiate between the
total size of LMEM, and how much of it is CPU mappable. The end goal is
to be able to utilize the entire range, even if part of is it not CPU
accessible.

v2: also update intelfb_create
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220225145502.331818-1-matthew.auld@intel.com

235582ca

26 Feb, 2022 1 commit

drm/i915: apply PM_EARLY for non-GTT mappable objects · d2cc01e1

Matthew Auld authored Feb 25, 2022

On DG2 we allow objects that are smaller than the min_page_size, under
the premise that these are never mapped by the GTT, like with the paging
structures. Currently the suspend-resume path will try to map such
objects through the migration vm, which hits:

[  560.529217] kernel BUG at drivers/gpu/drm/i915/gt/intel_migrate.c:431!
[  560.536081] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  560.541629] CPU: 4 PID: 2062 Comm: rtcwake Tainted: G        W         5.17.0-rc5-demarchi+ #175
[  560.550716] Hardware name: Intel Corporation CoffeeLake Client Platform/CoffeeLake S UDIMM RVP, BIOS CNLSFWR1.R00.X220.B00.2103302221 03/30/2021
[  560.563627] RIP: 0010:emit_pte+0x2e7/0x380 [i915]
[  560.568665] Code: ee 02 48 89 69 04 83 c6 05 83 c0 05 39 f0 0f 4f c6 48 8b 73 08 39 d0 0f 4f c2 44 89 f2 4c 8d 4a ff 49 85 f1 0f 84 62 fe ff ff <0f> 0b 48 c7 03 00 00 00 00 4d 89 c6 8b 01 48 29 ce 48 8d 57 0c 48
[  560.587691] RSP: 0018:ffffc9000104f8a0 EFLAGS: 00010206
[  560.592906] RAX: 0000000000000040 RBX: ffffc9000104f908 RCX: ffffc900025114d0
[  560.600024] RDX: 0000000000010000 RSI: 00000003f9fe2000 RDI: ffffc900025114dc
[  560.607458] RBP: 0000000001840000 R08: ffff88810f335540 R09: 000000000000ffff
[  560.614865] R10: 000000000000081b R11: 0000000000000001 R12: 000000000000081b
[  560.622300] R13: 0000000000000000 R14: 0000000000010000 R15: ffff888107c3e240
[  560.629716] FS:  00007f5b7c086580(0000) GS:ffff88846dc00000(0000) knlGS:0000000000000000
[  560.638090] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  560.644132] CR2: 00007f3ab0a133a8 CR3: 000000010a43e003 CR4: 00000000003706e0
[  560.651590] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  560.659002] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  560.666438] Call Trace:
[  560.668885]  <TASK>
[  560.670983]  intel_context_migrate_copy+0x1b1/0x4c0 [i915]
[  560.676794]  __i915_ttm_move+0x628/0x790 [i915]
[  560.681704]  ? dma_resv_iter_next+0x8f/0xb0
[  560.686223]  ? dma_resv_iter_first+0xe5/0x140
[  560.690894]  ? i915_deps_add_resv+0x4b/0x110 [i915]
[  560.696147]  ? dma_resv_reserve_shared+0x161/0x310
[  560.701228]  i915_gem_obj_copy_ttm+0x10f/0x220 [i915]
[  560.706650]  i915_ttm_backup+0x191/0x2f0 [i915]
[  560.711558]  i915_gem_process_region+0x266/0x3b0 [i915]
[  560.717153]  ? verify_cpu+0xf0/0x100
[  560.721040]  ? pci_pm_resume_early+0x20/0x20
[  560.725603]  i915_ttm_backup_region+0x47/0x70 [i915]
[  560.730927]  i915_gem_backup_suspend+0x141/0x170 [i91

For now let's just force the memcpy path for such objects during
suspend-resume.

Fixes: 00e27ad8 ("drm/i915/migrate: add acceleration support for DG2")
Reported-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220225103443.225228-1-matthew.auld@intel.com

d2cc01e1

25 Feb, 2022 5 commits

drm/i915/guc: Remove plain ads_blob pointer · 0df0c76c

Lucas De Marchi authored Feb 16, 2022

Now we have the access to content of GuC ADS either using iosys_map
API or using a temporary buffer. Remove guc->ads_blob as there shouldn't
be updates using the bare pointer anymore.

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220216174147.3073235-17-lucas.demarchi@intel.com

0df0c76c

drm/i915/guc: Convert __guc_ads_init to iosys_map · 691ebb11

Lucas De Marchi authored Feb 16, 2022

Now that all the called functions from __guc_ads_init() are converted to
use ads_map, stop using ads_blob in __guc_ads_init().

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220216174147.3073235-16-lucas.demarchi@intel.com

691ebb11

drm/i915/guc: Convert guc_mmio_reg_state_init to iosys_map · 5fc83950

Lucas De Marchi authored Feb 16, 2022

Now that the regset list is prepared, convert guc_mmio_reg_state_init()
to use iosys_map to copy the array to the final location and
initialize additional fields in ads.reg_state_list.

v2: Just use an offset instead of temporary iosys_map.

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220216174147.3073235-15-lucas.demarchi@intel.com

5fc83950

drm/i915/guc: Convert capture list to iosys_map · f3d45c9d

Lucas De Marchi authored Feb 16, 2022

Use iosys_map to write the fields ads.capture_*.

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220216174147.3073235-14-lucas.demarchi@intel.com

f3d45c9d

drm/i915/guc: Convert mapping table to iosys_map · c723b8ee

Lucas De Marchi authored Feb 16, 2022

Use iosys_map to write the fields system_info.mapping_table[][].
Since we already have the info_map around where needed, just use it
instead of going through guc->ads_map.

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220216174147.3073235-13-lucas.demarchi@intel.com

c723b8ee