Commits · f12af4c461fb6cd5ed7b48f8b4d09b22eb19fcc5 · Kirill Smelkov / linux

05 Nov, 2023 5 commits

drm/sched: Drop suffix from drm_sched_wakeup_if_can_queue · f12af4c4

Tvrtko Ursulin authored Nov 02, 2023

Because a) helper is exported to other parts of the scheduler and
b) there isn't a plain drm_sched_wakeup to begin with, I think we can
drop the suffix and by doing so separate the intimiate knowledge
between the scheduler components a bit better.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Luben Tuikov <ltuikov89@gmail.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231102105538.391648-6-tvrtko.ursulin@linux.intel.comReviewed-by: Luben Tuikov <ltuikov89@gmail.com>
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>

f12af4c4

drm/sched: Rename drm_sched_run_job_queue_if_ready and clarify kerneldoc · 35a4279d

Tvrtko Ursulin authored Nov 02, 2023

"If ready" is not immediately clear what it means - is the scheduler
ready or something else? Drop the suffix, clarify kerneldoc, and employ
the same naming scheme as in drm_sched_run_free_queue:

 - drm_sched_run_job_queue   - enqueues if there is something to enqueue
                               *and* scheduler is ready (can queue)
 - __drm_sched_run_job_queue - low-level helper to simply queue the job
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Luben Tuikov <ltuikov89@gmail.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231102105538.391648-5-tvrtko.ursulin@linux.intel.comReviewed-by: Luben Tuikov <ltuikov89@gmail.com>
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>

35a4279d

drm/sched: Rename drm_sched_free_job_queue to be more descriptive · 67dd1d8c

Tvrtko Ursulin authored Nov 02, 2023

The current name makes it sound like helper will free a queue, while what
it does is it enqueues the free job worker.

Rename it to drm_sched_run_free_queue to align with existing
drm_sched_run_job_queue.

Despite that creating an illusion there are two queues, while in reality
there is only one, at least it creates a consistent naming for the two
enqueuing helpers.

At the same time simplify the "if done" helper by dropping the suffix and
adding a double underscore prefix to the one which just enqueues.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Luben Tuikov <ltuikov89@gmail.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231102105538.391648-4-tvrtko.ursulin@linux.intel.comReviewed-by: Luben Tuikov <ltuikov89@gmail.com>
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>

67dd1d8c

drm/sched: Move free worker re-queuing out of the if block · e608d9f7

Tvrtko Ursulin authored Nov 02, 2023

Whether or not there are more jobs to clean up does not depend on the
existance of the current job, given both drm_sched_get_finished_job and
drm_sched_free_job_queue_if_done take and drop the job list lock.
Therefore it is confusing to make it read like there is a dependency.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Luben Tuikov <ltuikov89@gmail.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231102105538.391648-3-tvrtko.ursulin@linux.intel.comReviewed-by: Luben Tuikov <ltuikov89@gmail.com>
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>

e608d9f7

drm/sched: Rename drm_sched_get_cleanup_job to be more descriptive · 7abbbe26

Tvrtko Ursulin authored Nov 02, 2023

"Get cleanup job" makes it sound like helper is returning a job which will
execute some cleanup, or something, while the kerneldoc itself accurately
says "fetch the next _finished_ job". So lets rename the helper to be self
documenting.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Luben Tuikov <ltuikov89@gmail.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231102105538.391648-2-tvrtko.ursulin@linux.intel.comReviewed-by: Luben Tuikov <ltuikov89@gmail.com>
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>

7abbbe26

03 Nov, 2023 2 commits

accel/qaic: Support for 0 resize slice execution in BO · 3b511278

Pranjal Ramajor Asha Kanojiya authored Oct 27, 2023

Add support to partially execute a slice which is resized to zero.
Executing a zero size slice in a BO should mean that there is no DMA
transfers involved but you should still configure doorbell and semaphores.

For example consider a BO of size 18K and it is sliced into 3 6K slices
and user calls partial execute ioctl with resize as 10K.
slice 0 - size is 6k and offset is 0, so resize of 10K will not cut short
this slice hence we send the entire slice for execution.
slice 1 - size is 6k and offset is 6k, so resize of 10K will cut short this
slice and only the first 4k should be DMA along with configuring
doorbell and semaphores.
slice 2 - size is 6k and offset is 12k, so resize of 10k will cut short
this slice and no DMA transfer would be involved but we should
would configure doorbell and semaphores.

This change begs to change the behavior of 0 resize. Currently, 0 resize
partial execute ioctl behaves exactly like execute ioctl i.e. no resize.
After this patch all the slice in BO should behave exactly like slice 2 in
above example.

Refactor copy_partial_exec_reqs() to make it more readable and less
complex.
Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231027164330.11978-1-quic_jhugo@quicinc.com

3b511278

accel/qaic: Quiet array bounds check on DMA abort message · 44793c6a

Carl Vanderlip authored Oct 27, 2023

Current wrapper is right-sized to the message being transferred;
however, this is smaller than the structure defining message wrappers
since the trailing element is a union of message/transfer headers of
various sizes (8 and 32 bytes on 32-bit system where issue was
reported). Using the smaller header with a small message
(wire_trans_dma_xfer is 24 bytes including header) ends up being smaller
than a wrapper with the larger header. There are no accesses outside of
the defined size, however they are possible if the larger union member
is referenced.

Abort messages are outside of hot-path and changing the wrapper struct
would require a larger rewrite, so having the memory allocated to the
message be 8 bytes too big is acceptable.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202310182253.bcb9JcyJ-lkp@intel.com/Signed-off-by: Carl Vanderlip <quic_carlv@quicinc.com>
Reviewed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231027180810.4873-1-quic_jhugo@quicinc.com

44793c6a

02 Nov, 2023 6 commits

drm/v3d: add brcm,2712-v3d as a compatible V3D device · 6fd94871

Iago Toral Quiroga authored Oct 31, 2023

This is required to get the V3D module to load with Raspberry Pi 5.
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Maíra Canal <mcanal@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231031073859.25298-5-itoral@igalia.com

6fd94871

dt-bindings: gpu: v3d: Add BCM2712's compatible · ebb2f6ee

Iago Toral Quiroga authored Oct 31, 2023

BCM2712, Raspberry Pi 5's SoC, contains a V3D core. So add its specific
compatible to the bindings.
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Maíra Canal <mcanal@igalia.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231031073859.25298-4-itoral@igalia.com

ebb2f6ee

drm/v3d: fix up register addresses for V3D 7.x · 0ad5bc1c

Iago Toral Quiroga authored Oct 31, 2023

This patch updates a number of register addresses that have
been changed in Raspberry Pi 5 (V3D 7.1) and updates the
code to use the corresponding registers and addresses based
on the actual V3D version.
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Maíra Canal <mcanal@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231031073859.25298-3-itoral@igalia.com

0ad5bc1c

drm/v3d: update UAPI to match user-space for V3D 7.x · 1118d10f

Iago Toral Quiroga authored Oct 31, 2023

V3D 7.x takes a new parameter to configure TFU jobs that needs
to be provided by user space.
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Maíra Canal <mcanal@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231031073859.25298-2-itoral@igalia.com

1118d10f

fbdev/simplefb: Add support for generic power-domains · 92a511a5

Thierry Reding authored Nov 01, 2023

The simple-framebuffer device tree bindings document the power-domains
property, so make sure that simplefb supports it. This ensures that the
power domains remain enabled as long as simplefb is active.

v2: - remove unnecessary call to simplefb_detach_genpds() since that's
      already done automatically by devres
    - fix crash if power-domains property is missing in DT
Signed-off-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231101172017.3872242-3-thierry.reding@gmail.com

92a511a5

fbdev/simplefb: Support memory-region property · 8ddfc01a

Thierry Reding authored Nov 01, 2023

The simple-framebuffer bindings specify that the "memory-region"
property can be used as an alternative to the "reg" property to define
the framebuffer memory used by the display hardware. Implement support
for this in the simplefb driver.
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231101172017.3872242-2-thierry.reding@gmail.com

8ddfc01a

01 Nov, 2023 6 commits

drm/sched: Add a helper to queue TDR immediately · 3c6c7ca4

Matthew Brost authored Oct 30, 2023

Add a helper whereby a driver can invoke TDR immediately.

v2:
 - Drop timeout args, rename function, use mod delayed work (Luben)
v3:
 - s/XE/Xe (Luben)
 - present tense in commit message (Luben)
 - Adjust comment for drm_sched_tdr_queue_imm (Luben)
v4:
 - Adjust commit message (Luben)

Cc: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://lore.kernel.org/r/20231031032439.1558703-6-matthew.brost@intel.comSigned-off-by: Luben Tuikov <ltuikov89@gmail.com>

3c6c7ca4

drm/sched: Add drm_sched_start_timeout_unlocked helper · 7a36dcfa

Matthew Brost authored Oct 30, 2023

Also add a lockdep assert to drm_sched_start_timeout.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://lore.kernel.org/r/20231031032439.1558703-5-matthew.brost@intel.comSigned-off-by: Luben Tuikov <ltuikov89@gmail.com>

7a36dcfa

drm/sched: Split free_job into own work item · f7fe64ad

Matthew Brost authored Oct 30, 2023

Rather than call free_job and run_job in same work item have a dedicated
work item for each. This aligns with the design and intended use of work
queues.

v2:
   - Test for DMA_FENCE_FLAG_TIMESTAMP_BIT before setting
     timestamp in free_job() work item (Danilo)
v3:
  - Drop forward dec of drm_sched_select_entity (Boris)
  - Return in drm_sched_run_job_work if entity NULL (Boris)
v4:
  - Replace dequeue with peek and invert logic (Luben)
  - Wrap to 100 lines (Luben)
  - Update comments for *_queue / *_queue_if_ready functions (Luben)
v5:
  - Drop peek argument, blindly reinit idle (Luben)
  - s/drm_sched_free_job_queue_if_ready/drm_sched_free_job_queue_if_done (Luben)
  - Update work_run_job & work_free_job kernel doc (Luben)
v6:
  - Do not move drm_sched_select_entity in file (Luben)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20231031032439.1558703-4-matthew.brost@intel.comReviewed-by: Luben Tuikov <ltuikov89@gmail.com>
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>

f7fe64ad

drm/sched: Convert drm scheduler to use a work queue rather than kthread · a6149f03

Matthew Brost authored Oct 30, 2023

In Xe, the new Intel GPU driver, a choice has made to have a 1 to 1
mapping between a drm_gpu_scheduler and drm_sched_entity. At first this
seems a bit odd but let us explain the reasoning below.

1. In Xe the submission order from multiple drm_sched_entity is not
guaranteed to be the same completion even if targeting the same hardware
engine. This is because in Xe we have a firmware scheduler, the GuC,
which allowed to reorder, timeslice, and preempt submissions. If a using
shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR falls
apart as the TDR expects submission order == completion order. Using a
dedicated drm_gpu_scheduler per drm_sched_entity solve this problem.

2. In Xe submissions are done via programming a ring buffer (circular
buffer), a drm_gpu_scheduler provides a limit on number of jobs, if the
limit of number jobs is set to RING_SIZE / MAX_SIZE_PER_JOB we get flow
control on the ring for free.

A problem with this design is currently a drm_gpu_scheduler uses a
kthread for submission / job cleanup. This doesn't scale if a large
number of drm_gpu_scheduler are used. To work around the scaling issue,
use a worker rather than kthread for submission / job cleanup.

v2:
  - (Rob Clark) Fix msm build
  - Pass in run work queue
v3:
  - (Boris) don't have loop in worker
v4:
  - (Tvrtko) break out submit ready, stop, start helpers into own patch
v5:
  - (Boris) default to ordered work queue
v6:
  - (Luben / checkpatch) fix alignment in msm_ringbuffer.c
  - (Luben) s/drm_sched_submit_queue/drm_sched_wqueue_enqueue
  - (Luben) Update comment for drm_sched_wqueue_enqueue
  - (Luben) Positive check for submit_wq in drm_sched_init
  - (Luben) s/alloc_submit_wq/own_submit_wq
v7:
  - (Luben) s/drm_sched_wqueue_enqueue/drm_sched_run_job_queue
v8:
  - (Luben) Adjust var names / comments
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://lore.kernel.org/r/20231031032439.1558703-3-matthew.brost@intel.comSigned-off-by: Luben Tuikov <ltuikov89@gmail.com>

a6149f03

drm/sched: Add drm_sched_wqueue_* helpers · 35963cf2

Matthew Brost authored Oct 30, 2023

Add scheduler wqueue ready, stop, and start helpers to hide the
implementation details of the scheduler from the drivers.

v2:
  - s/sched_wqueue/sched_wqueue (Luben)
  - Remove the extra white line after the return-statement (Luben)
  - update drm_sched_wqueue_ready comment (Luben)

Cc: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://lore.kernel.org/r/20231031032439.1558703-2-matthew.brost@intel.comSigned-off-by: Luben Tuikov <ltuikov89@gmail.com>

35963cf2

dma-buf: add dma_fence_timestamp helper · 0da611a8

Christian König authored Sep 08, 2023

When a fence signals there is a very small race window where the timestamp
isn't updated yet. sync_file solves this by busy waiting for the
timestamp to appear, but on other ocassions didn't handled this
correctly.

Provide a dma_fence_timestamp() helper function for this and use it in
all appropriate cases.

Another alternative would be to grab the spinlock when that happens.

v2 by teddy: add a wait parameter to wait for the timestamp to show up, in case
   the accurate timestamp is needed and/or the timestamp is not based on
   ktime (e.g. hw timestamp)
v3 chk: drop the parameter again for unified handling
Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: 1774baa6 ("drm/scheduler: Change scheduled fence track v2")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
CC: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20230929104725.2358-1-christian.koenig@amd.com

0da611a8

31 Oct, 2023 10 commits

accel/ivpu: Rename VPU to NPU in product strings · 2fc1a50f

Jacek Lawrynowicz authored Oct 28, 2023

VPU was rebranded as NPU (Neural Processing Unit) so user facing
strings have to be updated but the code remains as is and the module
is still called intel_vpu.ko.
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231028155936.1183342-9-stanislaw.gruszka@linux.intel.com

2fc1a50f

accel/ivpu: Simplify MMU SYNC command · 0c287c27

Jacek Lawrynowicz authored Oct 28, 2023

CMD_SYNC does not need any args as we poll for completion anyway.
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231028155936.1183342-8-stanislaw.gruszka@linux.intel.com

0c287c27

accel/ivpu: Make DMA allocations for MMU600 write combined · 3bcc5209

Karol Wachowski authored Oct 28, 2023

Previously using dma_alloc_wc() API we created cache coherent
(mapped as write-back) mappings.

Because we disable MMU600 snooping it was required to do costly
page walk and cache flushes after each page table modification.

With write-combined buffers it's possible to do a single write memory
barrier to flush write-combined buffer to memory which simplifies the
driver and significantly reduce time of map/unmap operations.

Mapping time of 255 MB is reduced from 2.5 ms to 500 us.
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231028155936.1183342-7-stanislaw.gruszka@linux.intel.com

3bcc5209

accel/ivpu: Print CMDQ errors after consumer timeout · e013aa9a

Karol Wachowski authored Oct 28, 2023

Add checking of error reason bits in IVPU_MMU_CMDQ_CONS
register when waiting for consumer timeout occurred.
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231028155936.1183342-6-stanislaw.gruszka@linux.intel.com

e013aa9a

accel/ivpu: Abort pending rx ipc on reset · ba6b035d

Stanislaw Gruszka authored Oct 28, 2023

Waking up process, which wait for particular condition, will go to
sleep again on wake_up() if the condition is not met. Add abort flag
to wake up IPC receivers, which will finish with -ECANCELED error.

This is only needed for reset, run time power management prevent to
suspend VPU when there is pending IPC processing or pending job.
Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231028155936.1183342-5-stanislaw.gruszka@linux.intel.com

ba6b035d

accel/ivpu: Stop job_done_thread on suspend · 57c7e3e4

Stanislaw Gruszka authored Oct 28, 2023

Stop job_done thread when going to suspend. Use kthread_park() instead
of kthread_stop() to avoid memory allocation and potential failure
on resume.

Use separate function as thread wake up condition. Use spin lock to assure
rx_msg_list is properly protected against concurrent access. This avoid
race condition when the rx_msg_list list is modified and read in
ivpu_ipc_recive() at the same time.
Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231028155936.1183342-4-stanislaw.gruszka@linux.intel.com

57c7e3e4

accel/ivpu: Assure device is off if power up sequence fail · a06eb9be

Stanislaw Gruszka authored Oct 28, 2023

We should not leave device half enabled if there is failure somewhere
it power up sequence. Fix device init and resume paths.
Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231028155936.1183342-3-stanislaw.gruszka@linux.intel.com

a06eb9be

accel/ivpu/40xx: Allow to change profiling frequency · bfc87f90

Krystian Pradzynski authored Oct 28, 2023

Profiling freq is a debug firmware feature. It switches default clock
to higher resolution for fine-grained and more accurate firmware task
profiling. We already configure it during boot up of VPU4.

Add debugfs knob and helpers per HW generation that allow to change it.
For vpu37xx the implementation is empty as profiling frequency can only
be changed on VPU4 or newer.
Signed-off-by: Krystian Pradzynski <krystian.pradzynski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231028155936.1183342-2-stanislaw.gruszka@linux.intel.com

bfc87f90

Revert "drm/omapdrm: Annotate dma-fence critical section in commit path" · 9d7c8c06

Tomi Valkeinen authored Sep 20, 2023

This reverts commit 250aa229.

The DMA-fence annotations cause a lockdep warning (see below). As per
https://patchwork.freedesktop.org/patch/462170/ it sounds like the
annotations don't work correctly.

======================================================
WARNING: possible circular locking dependency detected
6.5.0-rc2+ #2 Not tainted
------------------------------------------------------
kmstest/219 is trying to acquire lock:
c4705838 (&hdmi->lock){+.+.}-{3:3}, at: hdmi5_bridge_mode_set+0x1c/0x50

but task is already holding lock:
c11e1128 (dma_fence_map){++++}-{0:0}, at: omap_atomic_commit_tail+0x14/0xbc

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (dma_fence_map){++++}-{0:0}:
       __dma_fence_might_wait+0x48/0xb4
       dma_resv_lockdep+0x1b8/0x2bc
       do_one_initcall+0x68/0x3b0
       kernel_init_freeable+0x260/0x34c
       kernel_init+0x14/0x140
       ret_from_fork+0x14/0x28

-> #1 (fs_reclaim){+.+.}-{0:0}:
       fs_reclaim_acquire+0x70/0xa8
       __kmem_cache_alloc_node+0x3c/0x368
       kmalloc_trace+0x28/0x58
       _drm_do_get_edid+0x7c/0x35c
       hdmi5_bridge_get_edid+0xc8/0x1ac
       drm_bridge_connector_get_modes+0x64/0xc0
       drm_helper_probe_single_connector_modes+0x170/0x528
       drm_client_modeset_probe+0x208/0x1334
       __drm_fb_helper_initial_config_and_unlock+0x30/0x548
       omap_fbdev_client_hotplug+0x3c/0x6c
       drm_client_register+0x58/0x94
       pdev_probe+0x544/0x6b0
       platform_probe+0x58/0xbc
       really_probe+0xd8/0x3fc
       __driver_probe_device+0x94/0x1f4
       driver_probe_device+0x2c/0xc4
       __device_attach_driver+0xa4/0x11c
       bus_for_each_drv+0x84/0xdc
       __device_attach+0xac/0x20c
       bus_probe_device+0x8c/0x90
       device_add+0x588/0x7e0
       platform_device_add+0x110/0x24c
       platform_device_register_full+0x108/0x15c
       dss_bind+0x90/0xc0
       try_to_bring_up_aggregate_device+0x1e0/0x2c8
       __component_add+0xa4/0x174
       hdmi5_probe+0x1c8/0x270
       platform_probe+0x58/0xbc
       really_probe+0xd8/0x3fc
       __driver_probe_device+0x94/0x1f4
       driver_probe_device+0x2c/0xc4
       __device_attach_driver+0xa4/0x11c
       bus_for_each_drv+0x84/0xdc
       __device_attach+0xac/0x20c
       bus_probe_device+0x8c/0x90
       deferred_probe_work_func+0x8c/0xd8
       process_one_work+0x2ac/0x6e4
       worker_thread+0x30/0x4ec
       kthread+0x100/0x124
       ret_from_fork+0x14/0x28

-> #0 (&hdmi->lock){+.+.}-{3:3}:
       __lock_acquire+0x145c/0x29cc
       lock_acquire.part.0+0xb4/0x258
       __mutex_lock+0x90/0x950
       mutex_lock_nested+0x1c/0x24
       hdmi5_bridge_mode_set+0x1c/0x50
       drm_bridge_chain_mode_set+0x48/0x5c
       crtc_set_mode+0x188/0x1d0
       omap_atomic_commit_tail+0x2c/0xbc
       commit_tail+0x9c/0x188
       drm_atomic_helper_commit+0x158/0x18c
       drm_atomic_commit+0xa4/0xe8
       drm_mode_atomic_ioctl+0x9a4/0xc38
       drm_ioctl+0x210/0x4a8
       sys_ioctl+0x138/0xf00
       ret_fast_syscall+0x0/0x1c

other info that might help us debug this:

Chain exists of:
  &hdmi->lock --> fs_reclaim --> dma_fence_map

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  rlock(dma_fence_map);
                               lock(fs_reclaim);
                               lock(dma_fence_map);
  lock(&hdmi->lock);

 *** DEADLOCK ***

3 locks held by kmstest/219:
 #0: f1011de4 (crtc_ww_class_acquire){+.+.}-{0:0}, at: drm_mode_atomic_ioctl+0xf0/0xc38
 #1: c47059c8 (crtc_ww_class_mutex){+.+.}-{3:3}, at: modeset_lock+0xf8/0x230
 #2: c11e1128 (dma_fence_map){++++}-{0:0}, at: omap_atomic_commit_tail+0x14/0xbc

stack backtrace:
CPU: 1 PID: 219 Comm: kmstest Not tainted 6.5.0-rc2+ #2
Hardware name: Generic DRA74X (Flattened Device Tree)
 unwind_backtrace from show_stack+0x10/0x14
 show_stack from dump_stack_lvl+0x58/0x70
 dump_stack_lvl from check_noncircular+0x164/0x198
 check_noncircular from __lock_acquire+0x145c/0x29cc
 __lock_acquire from lock_acquire.part.0+0xb4/0x258
 lock_acquire.part.0 from __mutex_lock+0x90/0x950
 __mutex_lock from mutex_lock_nested+0x1c/0x24
 mutex_lock_nested from hdmi5_bridge_mode_set+0x1c/0x50
 hdmi5_bridge_mode_set from drm_bridge_chain_mode_set+0x48/0x5c
 drm_bridge_chain_mode_set from crtc_set_mode+0x188/0x1d0
 crtc_set_mode from omap_atomic_commit_tail+0x2c/0xbc
 omap_atomic_commit_tail from commit_tail+0x9c/0x188
 commit_tail from drm_atomic_helper_commit+0x158/0x18c
 drm_atomic_helper_commit from drm_atomic_commit+0xa4/0xe8
 drm_atomic_commit from drm_mode_atomic_ioctl+0x9a4/0xc38
 drm_mode_atomic_ioctl from drm_ioctl+0x210/0x4a8
 drm_ioctl from sys_ioctl+0x138/0xf00
 sys_ioctl from ret_fast_syscall+0x0/0x1c
Exception stack(0xf1011fa8 to 0xf1011ff0)
1fa0:                   00466d58 be9ab510 00000003 c03864bc be9ab510 be9ab4e0
1fc0: 00466d58 be9ab510 c03864bc 00000036 00466ef0 00466fc0 00467020 00466f20
1fe0: b6bc7ef4 be9ab4d0 b6bbbb00 b6cb2cc0

Fixes: 250aa229 ("drm/omapdrm: Annotate dma-fence critical section in commit path")
Reviewed-by: Aradhya Bhatia <a-bhatia1@ti.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230920-dma-fence-annotation-revert-v1-2-7ebf6f7f5bf6@ideasonboard.com

9d7c8c06

Revert "drm/tidss: Annotate dma-fence critical section in commit path" · ca34d816

Tomi Valkeinen authored Sep 20, 2023

This reverts commit 4d56a4f0.

The DMA-fence annotations cause a lockdep warning (see below). As per
https://patchwork.freedesktop.org/patch/462170/ it sounds like the
annotations don't work correctly.

======================================================
WARNING: possible circular locking dependency detected
6.6.0-rc2+ #1 Not tainted
------------------------------------------------------
kmstest/733 is trying to acquire lock:
ffff8000819377f0 (fs_reclaim){+.+.}-{0:0}, at: __kmem_cache_alloc_node+0x58/0x2d4

but task is already holding lock:
ffff800081a06aa0 (dma_fence_map){++++}-{0:0}, at: tidss_atomic_commit_tail+0x20/0xc0 [tidss]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (dma_fence_map){++++}-{0:0}:
       __dma_fence_might_wait+0x5c/0xd0
       dma_resv_lockdep+0x1a4/0x32c
       do_one_initcall+0x84/0x2fc
       kernel_init_freeable+0x28c/0x4c4
       kernel_init+0x24/0x1dc
       ret_from_fork+0x10/0x20

-> #1 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
       fs_reclaim_acquire+0x70/0xe4
       __kmem_cache_alloc_node+0x58/0x2d4
       kmalloc_trace+0x38/0x78
       __kthread_create_worker+0x3c/0x150
       kthread_create_worker+0x64/0x8c
       workqueue_init+0x1e8/0x2f0
       kernel_init_freeable+0x11c/0x4c4
       kernel_init+0x24/0x1dc
       ret_from_fork+0x10/0x20

-> #0 (fs_reclaim){+.+.}-{0:0}:
       __lock_acquire+0x1370/0x20d8
       lock_acquire+0x1e8/0x308
       fs_reclaim_acquire+0xd0/0xe4
       __kmem_cache_alloc_node+0x58/0x2d4
       __kmalloc_node_track_caller+0x58/0xf0
       kmemdup+0x34/0x60
       regmap_bulk_write+0x64/0x2c0
       tc358768_bridge_pre_enable+0x8c/0x12d0 [tc358768]
       drm_atomic_bridge_call_pre_enable+0x68/0x80 [drm]
       drm_atomic_bridge_chain_pre_enable+0x50/0x158 [drm]
       drm_atomic_helper_commit_modeset_enables+0x164/0x264 [drm_kms_helper]
       tidss_atomic_commit_tail+0x58/0xc0 [tidss]
       commit_tail+0xa0/0x188 [drm_kms_helper]
       drm_atomic_helper_commit+0x1a8/0x1c0 [drm_kms_helper]
       drm_atomic_commit+0xa8/0xe0 [drm]
       drm_mode_atomic_ioctl+0x9ec/0xc80 [drm]
       drm_ioctl_kernel+0xc4/0x170 [drm]
       drm_ioctl+0x234/0x4b0 [drm]
       drm_compat_ioctl+0x110/0x12c [drm]
       __arm64_compat_sys_ioctl+0x128/0x150
       invoke_syscall+0x48/0x110
       el0_svc_common.constprop.0+0x40/0xe0
       do_el0_svc_compat+0x1c/0x38
       el0_svc_compat+0x48/0xb4
       el0t_32_sync_handler+0xb0/0x138
       el0t_32_sync+0x194/0x198

other info that might help us debug this:

Chain exists of:
  fs_reclaim --> mmu_notifier_invalidate_range_start --> dma_fence_map

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  rlock(dma_fence_map);
                               lock(mmu_notifier_invalidate_range_start);
                               lock(dma_fence_map);
  lock(fs_reclaim);

 *** DEADLOCK ***

3 locks held by kmstest/733:
 #0: ffff800082e5bba0 (crtc_ww_class_acquire){+.+.}-{0:0}, at: drm_mode_atomic_ioctl+0x118/0xc80 [drm]
 #1: ffff000004224c88 (crtc_ww_class_mutex){+.+.}-{3:3}, at: modeset_lock+0xdc/0x1a0 [drm]
 #2: ffff800081a06aa0 (dma_fence_map){++++}-{0:0}, at: tidss_atomic_commit_tail+0x20/0xc0 [tidss]

stack backtrace:
CPU: 0 PID: 733 Comm: kmstest Not tainted 6.6.0-rc2+ #1
Hardware name: Toradex Verdin AM62 on Verdin Development Board (DT)
Call trace:
 dump_backtrace+0x98/0x118
 show_stack+0x18/0x24
 dump_stack_lvl+0x60/0xac
 dump_stack+0x18/0x24
 print_circular_bug+0x288/0x368
 check_noncircular+0x168/0x17c
 __lock_acquire+0x1370/0x20d8
 lock_acquire+0x1e8/0x308
 fs_reclaim_acquire+0xd0/0xe4
 __kmem_cache_alloc_node+0x58/0x2d4
 __kmalloc_node_track_caller+0x58/0xf0
 kmemdup+0x34/0x60
 regmap_bulk_write+0x64/0x2c0
 tc358768_bridge_pre_enable+0x8c/0x12d0 [tc358768]
 drm_atomic_bridge_call_pre_enable+0x68/0x80 [drm]
 drm_atomic_bridge_chain_pre_enable+0x50/0x158 [drm]
 drm_atomic_helper_commit_modeset_enables+0x164/0x264 [drm_kms_helper]
 tidss_atomic_commit_tail+0x58/0xc0 [tidss]
 commit_tail+0xa0/0x188 [drm_kms_helper]
 drm_atomic_helper_commit+0x1a8/0x1c0 [drm_kms_helper]
 drm_atomic_commit+0xa8/0xe0 [drm]
 drm_mode_atomic_ioctl+0x9ec/0xc80 [drm]
 drm_ioctl_kernel+0xc4/0x170 [drm]
 drm_ioctl+0x234/0x4b0 [drm]
 drm_compat_ioctl+0x110/0x12c [drm]
 __arm64_compat_sys_ioctl+0x128/0x150
 invoke_syscall+0x48/0x110
 el0_svc_common.constprop.0+0x40/0xe0
 do_el0_svc_compat+0x1c/0x38
 el0_svc_compat+0x48/0xb4
 el0t_32_sync_handler+0xb0/0x138
 el0t_32_sync+0x194/0x198

Fixes: 4d56a4f0 ("drm/tidss: Annotate dma-fence critical section in commit path")
Reviewed-by: Aradhya Bhatia <a-bhatia1@ti.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230920-dma-fence-annotation-revert-v1-1-7ebf6f7f5bf6@ideasonboard.com

ca34d816

30 Oct, 2023 11 commits

drm/panfrost: Remove incorrect IS_ERR() check · b2139fb5

Steven Price authored Oct 20, 2023

sg_page_iter_page() doesn't return an error code, so the IS_ERR() check
is wrong and the error path will never be executed. This also allows
simplifying the code to remove the local variable 'page'.

CC: Adrián Larumbe <adrian.larumbe@collabora.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/376713ff-9a4f-4ea3-b097-fb5efb685d95@moroto.mountainSigned-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Tested-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231020104405.53992-1-steven.price@arm.com

b2139fb5

drm/v3d: wait for all jobs to finish before unregistering · 79d94360

Maíra Canal authored Oct 23, 2023

Currently, we are only warning the user if the BIN or RENDER jobs don't
finish before we unregister V3D. We must wait for all jobs to finish
before unregistering. Therefore, warn the user if TFU or CSD jobs
are not done by the time the driver is unregistered.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Maíra Canal <mairacanal@riseup.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20231023105927.101502-1-mcanal@igalia.com

79d94360

accel/ivpu: Add support for delayed D0i3 entry message · 3198a62e

Andrzej Kacprowski authored Oct 28, 2023

Currently the VPU firmware prepares for D0i3 every time the VPU
is entering D0i2 Idle state. This is not optimal as we might not
enter D0i3 every time we enter D0i2 Idle and this preparation
is quite costly.

This optimization moves D0i3 preparation to a dedicated
message sent from the host driver only when the driver is about
to enter D0i3 - this reduces power consumption and latency for
certain workloads, for example audio workloads that submit
inference every 10 ms.

The VPU needs non zero time to enter IDLE state after responding to
D0i3 entry message. If the driver does not wait for the VPU to enter
IDLE state it could cause warm boot failures.
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231028133415.1169975-12-stanislaw.gruszka@linux.intel.com

3198a62e

accel/ivpu/37xx: Print warning when VPUIP is not idle during power down · cc19feda

Stanislaw Gruszka authored Oct 28, 2023

Print warning if VPUIP is not idle during power down.

Use warn log level also when we fail to enter reset state
as this is not really an error but unexpected behavior.
Reviewed-by: Krystian Pradzynski <krystian.pradzynski@linux.intel.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231028133415.1169975-11-stanislaw.gruszka@linux.intel.com

cc19feda

accel/ivpu: Introduce ivpu_ipc_send_receive_active() · 45e45362

Karol Wachowski authored Oct 28, 2023

Split ivpu_ipc_send_receive() implementation to have a version
that does not call pm_runtime_resume_and_get(). That implementation
can be invoked when device is up and runtime resume is prohibited
(for example at the end of boot sequence).

The new function will be used for D0i3 entry IPC message addition
in the separate change.
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231028133415.1169975-10-stanislaw.gruszka@linux.intel.com

45e45362

accel/ivpu: Pass D0i3 residency time to the VPU firmware · 3de6d959

Andrzej Kacprowski authored Oct 28, 2023

The firmware needs to know the time spent in D0i3/D3 to
calculate telemetry data. The D0i3/D3 residency time is
calculated by the driver and passed to the firmware
in the boot parameters.

The driver also passes VPU perf counter value captured
right before entering D0i3 - this allows the VPU firmware
to generate monotonic timestamps for the logs.
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231028133415.1169975-9-stanislaw.gruszka@linux.intel.com

3de6d959

accel/ivpu/40xx: Capture D0i3 entry host and device timestamps · db37a5bf

Andrzej Kacprowski authored Oct 28, 2023

The driver needs to capture the D0i3 entry timestamp to
calculate D0i3 residency time.

The D0i3 residency time and the VPU timestamp are passed
to the firmware at D0i3 exit (warm boot).
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231028133415.1169975-8-stanislaw.gruszka@linux.intel.com

db37a5bf

accel/ivpu: Change test_mode module param to bitmask · 8b5cec3c

Karol Wachowski authored Oct 28, 2023

Change meaning of test_mode module parameter from integer value
to bitmask allowing setting different test features with corresponding
bits.
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231028133415.1169975-7-stanislaw.gruszka@linux.intel.com

8b5cec3c

accel/ivpu: Add support for VPU_JOB_FLAGS_NULL_SUBMISSION_MASK · 61ab485f

Andrzej Kacprowski authored Oct 28, 2023

Add test_mode = 3 that add VPU_JOB_FLAGS_NULL_SUBMISSION_MASK
flag to the job send to the VPU device. Then the VPU will process
the job but won't execute commands (except the command to signal
the fence).

This can be used to estimate job processing overhead in the host
software and VPU firmware.

Unlike the null hardware mode, the null submission mode will
still work even if UMD uses VPU fences to track job completion.
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231028133415.1169975-6-stanislaw.gruszka@linux.intel.com

61ab485f

accel/ivpu: Remove reset from power up sequence · bacc130d

Karol Wachowski authored Oct 28, 2023

Setting a non-zero work point resets the IP hence IP_RESET
trigger is redundant.
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231028133415.1169975-5-stanislaw.gruszka@linux.intel.com

bacc130d

accel/ivpu: Add dvfs_mode file to debugfs · f13108fc

Tomasz Rusinowicz authored Oct 28, 2023

Add new debugfs file to set dvfs_mode FW boot parameter and restart
the FW to allow experimenting with DVFS (dynamic voltage & frequency
scaling).
Signed-off-by: Tomasz Rusinowicz <tomasz.rusinowicz@intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231028133415.1169975-4-stanislaw.gruszka@linux.intel.com

f13108fc