Commits · 0568a4086a6c7386885eb2ac2dae3f7186eb503f · Kirill Smelkov / linux

30 May, 2024 1 commit

drm/xe: Remove unwanted mutex locking · 0568a408

Niranjana Vishwanathapura authored May 29, 2024

Do not hold xef->exec_queue.lock mutex while parsing the xarray
xef->exec_queue.xa in xe_file_close() as it is not needed and
will cause an unwanted dependency between this lock and the vm->lock.

This lock protects the exec queue lookup and reference taking which
doesn't apply to this code path. When FD is closing, IOCTLs presumably
can't be modifying the xarray.

v2: Update commit text (Matt Brost)
v3: Add more code comment (Rodrigo Vivi)
v4: Further expand code comment (Rodirgo Vivi)
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jagmeet Randhawa <jagmeet.randhawa@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240529221639.23117-1-niranjana.vishwanathapura@intel.com

0568a408

29 May, 2024 5 commits

drm/xe: Drop undesired prefix from the platform name · 37ea1aee

Michal Wajdeczko authored May 21, 2024

We don't have to use exact names of the enumerators as the
potentially user-facing platform names. When constructing
platform descriptor fields, use the unique platform tag and
add the XE_ prefix only to the generated enum field.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240521142257.756-3-michal.wajdeczko@intel.comSigned-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

37ea1aee

drm/xe/hwmon: Expose card power and energy attributes of BMG · 7e433356

Karthik Poosa authored May 29, 2024

In BMG there are separate registers for card/platform power and
energy.
These are exposed through channel 0 i.e power_1/energy1_xxx.
Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://lore.kernel.org/r/20240523144351.4040131-3-balasubramani.vivekanandan@intel.comSigned-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240529050758.442056-3-balasubramani.vivekanandan@intel.comSigned-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

7e433356

drm/xe/hwmon: Add HWMON support for BMG · e90f7a58

Karthik Poosa authored May 29, 2024

Add HWMON support for BMG. Exposing the pkg power, current,
energy info.
Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20240523144351.4040131-2-balasubramani.vivekanandan@intel.comSigned-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240529050758.442056-2-balasubramani.vivekanandan@intel.comSigned-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

e90f7a58

drm/xe: Add engine name to the engine reset and cat-err log · dac81a9a

Nirmoy Das authored May 28, 2024

Add engine name to the engine reset and cat error log
which should be useful while debugging.

v2: Add logical mask and engine class(Matt)
    Use xe_gt_{info|dbg} (Michal)

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240528101445.27688-1-nirmoy.das@intel.comSigned-off-by: Nirmoy Das <nirmoy.das@intel.com>

dac81a9a

drm/xe: Check empty pinned BO list with lock held. · a17aceb3

Nirmoy Das authored May 28, 2024

Use lock that is meant to use for accessing the BO pin list.
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240528115408.22094-1-nirmoy.das@intel.comSigned-off-by: Nirmoy Das <nirmoy.das@intel.com>

a17aceb3

28 May, 2024 9 commits

drm/xe/guc: Fix uninitialised count in GuC load debug prints · fa171d49

John Harrison authored May 24, 2024

The debug prints about how long the GuC load takes have a loop
counter. However that was neither initialised nor incremented! Plus,
counting loops is no longer meaningful given the wait function returns
early for any change in the status value. So fix it to only count
loops due to actual timeouts.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202405250151.IbH0l8FG-lkp@intel.com/
Fixes: b0ac1b42 ("drm/xe/guc: Port over the slow GuC loading support from i915")
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Oded Gabbay <ogabbay@kernel.org>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: intel-xe@lists.freedesktop.org
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240524202603.4011656-1-John.C.Harrison@Intel.com

fa171d49

MAINTAINERS: update Xe driver maintainers · 8de6625d

Oded Gabbay authored May 15, 2024

Because I left Intel, I'm removing myself from the list
of Xe driver maintainers.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240515162222.12958-3-ogabbay@kernel.orgSigned-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

8de6625d

drm/xe: Enable Coarse Power Gating · 38e8c418

Riana Tauro authored May 24, 2024

Enable power gating for all units and sub-pipes that
are disabled by default.

v2: change the init function name
    use symmetric calls for enable/disable pg
    re-pharase commit message (Rodrigo)
    modify the sub-pipe power gating condition

v3: set hysteresis value for render and media
    when GuC PC is disabled
    skip CPG for PVC (Vinay)

v4: rebase
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> #v2
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240524070916.143022-3-riana.tauro@intel.comSigned-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

38e8c418

drm/xe: Standardize power gate registers · 9276bcc2

Riana Tauro authored May 24, 2024

Standardize power gate registers

No functional changes

v2: change commit message (Rodrigo)
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240524070916.143022-2-riana.tauro@intel.comSigned-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

9276bcc2

drm/xe: Don't refer to general LRC initialization as a "wa" · 5c9464e2

Matt Roper authored May 24, 2024

During engine LRC initialization a number of registers need to be
programmed as general setup. This programming is not a "workaround" so
naming the RTP table as "lrc_was" is misleading; switch to the name
"lrc_setup" to more accurately describe what the table is actually for.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240524230444.1447797-2-matthew.d.roper@intel.com

5c9464e2

drm/xe: Use platform name in xe_assert() · 0aa25625

Michal Wajdeczko authored May 21, 2024

We can now use more user-friendly platform name instead of
previosly used magic platform enumerator value:

  [ ] xe 0000:00:02.0: [drm] Assertion `false` failed!
      platform: ALDERLAKE_S ...
  [ ] xe 0000:03:00.0: [drm] Assertion `false` failed!
      platform: DG2 ...

vs

  [ ] xe 0000:00:02.0: [drm] Assertion `false` failed!
      platform: 3 ...
  [ ] xe 0000:03:00.0: [drm] Assertion `false` failed!
      platform: 7 ...
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240521142257.756-4-michal.wajdeczko@intel.com

0aa25625

drm/xe: Store platform name in xe_device.info · 6ca72897

Michal Wajdeczko authored May 21, 2024

We already maintain the platform name as part of the device
descriptor, but in xe_device.info we only store platform enum,
which is not the best for use in any user-facing messages.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240521142257.756-2-michal.wajdeczko@intel.com

6ca72897

drm/xe: allow unaligned start and size xe_res_cursor parameters · 82e0b129

Andrzej Hajda authored Apr 18, 2024

xe_res_cursor code does not depend on the alignment. On the other side
unaligned accesses are useful from pread/pwrite point of view.
Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240418-xe_res_cursor-no-align-v1-1-8df7834266c9@intel.comSigned-off-by: Nirmoy Das <nirmoy.das@intel.com>

82e0b129

drm/xe: flush gtt before signalling user fence on all engines · 38007fa9

Andrzej Hajda authored May 22, 2024

Tests show that user fence signalling requires kind of write barrier,
otherwise not all writes performed by the workload will be available
to userspace. It is already done for render and compute, we need it
also for the rest: video, gsc, copy.

v2: added gsc and copy engines, added fixes and r-b tags

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1488
Fixes: dd08ebf6 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522-xu_flush_vcs_before_ufence-v2-1-9ac3e9af0323@intel.comSigned-off-by: Nirmoy Das <nirmoy.das@intel.com>

38007fa9

27 May, 2024 9 commits

drm/xe: Do not access xe file when updating exec queue run_ticks · ce62827b

Umesh Nerlige Ramappa authored May 24, 2024

The current code is running into a use after free case where xe file is
closed before the exec queue run_ticks can be updated. This is occurring
in the xe_file_close path. To fix that, do not access xe file when
updating the exec queue run_ticks. Instead store the exec queue run_ticks
locally in the exec queue object and accumulate it when the user dumps
the drm client stats. We know that the xe file is valid when user is
dumping the run_ticks for the drm client, so this effectively
removes the dependency on xe file object in
xe_exec_queue_update_run_ticks().

v2:
- Fix the accumulation of q->run_ticks delta into xe file run_ticks
- s/runtime/run_ticks/ (Rodrigo)

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1908
Fixes: 6109f24f ("drm/xe: Add helper to accumulate exec queue runtime")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240524234744.1352543-2-umesh.nerlige.ramappa@intel.com

ce62827b

drm/xe: Use run_ticks instead of runtime for client stats · 45bb564d

Umesh Nerlige Ramappa authored May 24, 2024

Note that runtime is also used in the pm context, so it is confusing to
use the same name to denote run time of the drm client. Use a more
appropriate name for the client utilization.

While at it, drop the incorrect multi-lrc comment in the helper
description

v2: s/show_runtime/show_run_ticks/ (Rodrigo)
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240524234744.1352543-1-umesh.nerlige.ramappa@intel.com

45bb564d

drm/xe: Move job creation out of the struct xe_migrate::job_mutex · 50e52592

Thomas Hellström authored May 27, 2024

In order to be able to run gpu jobs from reclaim context,
move job creation (where allocation takes place) out of the
struct xe_migrate::job_mutex, and prime that mutex as reclaim
tainted.

Jobs that may need to run from reclaim context include
CCS metadata extraction at shrinking time.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240527135912.152156-6-thomas.hellstrom@linux.intel.com

50e52592

drm/xe: Remove xe_lrc_create_seqno_fence() · 577b83b0

Thomas Hellström authored May 27, 2024

It's not used anymore.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240527135912.152156-5-thomas.hellstrom@linux.intel.com

577b83b0

drm/xe: Don't initialize fences at xe_sched_job_create() · 0ac7a2c7

Thomas Hellström authored May 27, 2024

Pre-allocate but don't initialize fences at xe_sched_job_create(),
and initialize / arm them instead at xe_sched_job_arm(). This
makes it possible to move xe_sched_job_create() with its memory
allocation out of any lock that is required for fence
initialization, and that may not allow memory allocation under it.

Replaces the struct dma_fence_array for parallell jobs with a
struct dma_fence_chain, since the former doesn't allow
a split-up between allocation and initialization.

v2:
- Rebase.
- Don't always use the first lrc when initializing parallel
  lrc fences.
- Use dma_fence_chain_contained() to access the lrc fences.

v4:
- Add an assert that job->lrc_seqno == fence->seqno.
  (Matthew Brost)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> #v2
Link: https://patchwork.freedesktop.org/patch/msgid/20240527135912.152156-4-thomas.hellstrom@linux.intel.com

0ac7a2c7

drm/xe: Split lrc seqno fence creation up · e183910a

Thomas Hellström authored May 27, 2024

Since sometimes a lock is required to initialize a seqno fence,
and it might be desirable not to hold that lock while performing
memory allocations, split the lrc seqno fence creation up into an
allocation phase and an initialization phase.

Since lrc seqno fences under the hood are hw_fences, do the same
for these and remove the xe_hw_fence_create() function since it
is not used anymore.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240527135912.152156-3-thomas.hellstrom@linux.intel.com

e183910a

drm/xe: Decouple job seqno and lrc seqno · 08f72008

Matthew Brost authored May 27, 2024

Tightly coupling these seqno presents problems if alternative fences for
jobs are used. Decouple these for correctness.

v2:
- Slightly reword commit message (Thomas)
- Make sure the lrc fence ops are used in comparison (Thomas)
- Assume seqno is unsigned rather than signed in format string (Thomas)

Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240527135912.152156-2-thomas.hellstrom@linux.intel.com

08f72008

drm/xe/vf: Use only assigned GGTT region · d79e8cab

Michal Wajdeczko authored May 27, 2024

Each VF is assigned a limited range of the GGTT address space.
To ensure that the VF driver does not use GGTT allocations outside
of the assigned region, explicitly reserve GGTT space below and
above this region when initializing GGTT.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240527112015.1020-1-michal.wajdeczko@intel.com

d79e8cab

drm/xe/vf: Read VF configuration prior to GGTT initialization · ea797cf4

Michal Wajdeczko authored May 24, 2024

Each VF will be assigned with only a limited range of the GGTT
address space. Make sure that VF driver will read its own GGTT
configuration before starting any GGTT initialization.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240524113714.932-2-michal.wajdeczko@intel.com

ea797cf4

24 May, 2024 6 commits

drm/xe/vf: Treat GMDID as another runtime register · 5cef8493

Michal Wajdeczko authored May 23, 2024

While the GMDID registers are not part of the runtime register list
shared by the PF driver, we may still return cached values from our
VF specific read32() helper function.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240523192240.844-7-michal.wajdeczko@intel.com

5cef8493

drm/xe/vf: Cache value of the GMDID register · 9081f8ca

Michal Wajdeczko authored May 23, 2024

Read and cache value of the GMDID register as part of the config
query that VF driver is doing over MMIO.

While the VF driver likely already obtained the value of the GMDID
register once during the early driver probe, we couldn't cache it
then as the GT structures were not ready yet.

Cache it now, in case the driver needs it later when the GuC MMIO
communication, required to query GMDID from GuC, could be no longer
desired as it will be replaced by the CTB communication.

While around, assert that we will query GMDID only when applicable.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240523192240.844-6-michal.wajdeczko@intel.com

9081f8ca

drm/xe/vf: Provide early access to GMDID register · fcc6b719

Michal Wajdeczko authored May 24, 2024

VFs do not have direct access to the GMDID register and must obtain
its value from the GuC. Since we need GMDID value very early in the
driver probe flow, before we even start the full setup of GT and GuC
data structures, we must do some early initializations ourselves.

Additionally, since we also need GMDID for the media GT, which isn't
created yet, temporarly tweak the root GT type into MEDIA to allow
communication with the correct GuC, as only it can provide the value
of the media GMDID register.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240523223042.888-1-michal.wajdeczko@intel.com

fcc6b719

drm/xe/vf: Obtain value of GMDID register from GuC · 2948b242

Michal Wajdeczko authored May 23, 2024

VFs don't have access to the GMDID register and must obtain it
value using GuC VF ABI KLV query. Add function for doing that.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240523192240.844-4-michal.wajdeczko@intel.com

2948b242

drm/xe/guc: Add GLOBAL_CFG_GMD_ID KLV definition · e70aa101

Michal Wajdeczko authored May 23, 2024

VF drivers can't access GMD_ID register over MMIO.
The value of the GMD_ID register must be queried from GuC.
It is available as GLOBAL_CFG_GMD_ID KLV.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240523192240.844-3-michal.wajdeczko@intel.com

e70aa101

drm/xe/vf: Use register values obtained from the PF · 4edadc41

Michal Wajdeczko authored May 23, 2024

As part of the its initialization, the VF driver has already
obtained a list of the runtime (fuse) register values from the
PF driver. When VF driver is attempting to read register that is
inaccessible to the VF, use the values from this list instead.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240523192240.844-2-michal.wajdeczko@intel.com

4edadc41

23 May, 2024 10 commits

drm/xe/guc: Port over the slow GuC loading support from i915 · b0ac1b42

John Harrison authored May 17, 2024

GuC loading can take longer than it is supposed to for various
reasons. So add in the code to cope with that and to report it when it
happens. There are also many different reasons why GuC loading can
fail, so add in the code for checking for those and for reporting
issues in a meaningful manner rather than just hitting a timeout and
saying 'fail: status = %x'.

Also, remove the 'FIXME' comment about an i915 bug that has never been
applicable to Xe!

v2: Actually report the requested and granted frequencies rather than
showing granted twice (review feedback from Badal).
v3: Locally code all the timeout and end condition handling because a
helper function is not allowed (review feedback from Lucas/Rodrigo).
v4: Add more documentation comments and rename a define to add units
(review feedback from Lucas).
v5: Fix copy/paste error in xe_mmio_wait32_not (review feedback from
Lucas) and rebase (no more return value from guc_wait_ucode).
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240518043700.3264362-3-John.C.Harrison@Intel.com

b0ac1b42

drm/xe: Make read_perf_limit_reasons globally accessible · fcc8f805

John Harrison authored May 17, 2024

Other driver code beyond the sysfs interface wants to know about
throttling. So make the query function globally accessible.

v2: Revert include order change (review feedback from Lucas)
v3: Remove '_sysfs' from throttle file names and keep limit query in
the same file rather than moving elsewhere (review feedback from
Rodrigo).
v4: Correct #include while renaming header file (review feedback
from Lucas).
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240518043700.3264362-2-John.C.Harrison@Intel.com

fcc8f805

drm/xe: Nuke simple error capture · 83ee002d

José Roberto de Souza authored May 22, 2024

This error capture prints into dmesg HW state when a gpu hang happens.
It was useful when we did not had devcoredump, now it is a incompleted
version of devcoredump that has potential to flood dmesg.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522203431.191594-1-jose.souza@intel.comSigned-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

83ee002d

drm/xe: Add process name to devcoredump · b10d0c5e

José Roberto de Souza authored May 22, 2024

Process name help us track what application caused the gpug hang, this
is crucial when running several applications at the same time.

v2:
- handle Xe KMD exec_queues without VM

v3:
- use get_pid_task() (suggested by Nirmoy)

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522201203.145403-1-jose.souza@intel.comSigned-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

b10d0c5e

drm/xe: remove unused struct 'xe_gt_desc' · e8ac8048

Dr. David Alan Gilbert authored May 22, 2024

'xe_gt_desc' is unused since
commit 1e6c20be ("drm/xe: Drop extra_gts[] declarations and
XE_GT_TYPE_REMOTE").

Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link:
https://patchwork.freedesktop.org/patch/msgid/20240522175840.382107-1-linux@treblig.orgReviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

e8ac8048

drm/xe: Enable D3Cold on 'low' VRAM utilization · f9180603

Rodrigo Vivi authored May 22, 2024

Now that we eliminated all the mem_access get/put with its
locking issues from the inner calls of migration, we can
allow D3Cold.

Enable it when VRAM utilization is lower then 300Mb. On
higher utilization we only allow D3hot so we don't increase
so much the latency on runtime resume due to the memory
restoration.
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-7-rodrigo.vivi@intel.comSigned-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

f9180603

drm/xe: Stop checking for power_lost on D3Cold · 8d490e01

Rodrigo Vivi authored May 22, 2024

GuC reset status is not reliable for this purpose and it is
once in a while ending up in a situation of D3Cold, where
power_reset is false and without the proper memory restoration
the GuC reload and Display will fail to come back from D3Cold.

So, let's do a full restoration of everything if we have a risk
of losing power, without further optimizations.

v2: also remove the gut_in_reset function (Anshuman)

Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-6-rodrigo.vivi@intel.comSigned-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

8d490e01

drm/xe: Prepare display for D3Cold · e7b180b2

Rodrigo Vivi authored May 22, 2024

Prepare power-well and DC handling for a full power
lost during D3Cold, then sanitize it upon D3->D0.
Otherwise we get a bunch of state mismatch.

Ideally we could leave DC9 enabled and wouldn't need
to move DC9->DC0 on every runtime resume, however,
the disable_DC is part of the power-well checks and
intrinsic to the dc_off power well. In the future that
can be detangled so we can have even bigger power savings.
But for now, let's focus on getting a D3Cold, which saves
much more power by itself.

v2: create new functions to avoid full-suspend-resume path,
which would result in a deadlock between xe_gem_fault and the
modeset-ioctl.

v3: Only avoid the full modeset to avoid the race, for a more
robust suspend-resume.

Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Cc: Uma Shankar <uma.shankar@intel.com>
Tested-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-5-rodrigo.vivi@intel.comSigned-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

e7b180b2

drm/xe: Relax runtime pm protection around VM · 73ba282e

Rodrigo Vivi authored May 22, 2024

In the regular use case scenario, user space will create a
VM, and keep it alive for the entire duration of its workload.

For the regular desktop cases, it means that the VM
is alive even on idle scenarios where display goes off. This
is unacceptable since this would entirely block runtime PM
indefinitely, blocking deeper Package-C state. This would be
a waste drainage of power.

Limit the VM protection solely for long-running workloads that
are not protected by the scheduler references.
By design, run_job for long-running workloads returns NULL and
the scheduler drops all the references of it, hence protecting
the VM for this case is necessary.

v2: Update commit message to a more imperative language and to
    reflect why the VM protection is really needed.
    Also add a comment in the code to let the reason visbible.

v3: Remove vma_access case and the mentions to mmap. Mmap cases
    are already protected by the gem page fault.

Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Tested-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-4-rodrigo.vivi@intel.comSigned-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

73ba282e

drm/xe: Relax runtime pm protection during execution · ad1e331f

Rodrigo Vivi authored May 22, 2024

Limit the protection only during moments of actual job execution,
and introduce protection for guc submit fini, which is currently
unprotected due to the absence of exec_queue life protection.

In the regular use case scenario, user space will create an
exec queue, and keep it alive to reuse that until it is done
with that kind of workload.

For the regular desktop cases, it means that the exec_queue
is alive even on idle scenarios where display goes off. This
is unacceptable since this would entirely block runtime PM
indefinitely, blocking deeper Package-C state. This would be
a waste drainage of power.

Cc: Matthew Brost <matthew.brost@intel.com>
Tested-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-3-rodrigo.vivi@intel.comSigned-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

ad1e331f