Commits · bcbede6fbeb0e1eb85ccbb532faf06d3b31f0e73 · Kirill Smelkov / linux

19 Jun, 2023 2 commits

Merge tag 'amd-drm-next-6.5-2023-06-16' of https://gitlab.freedesktop.org/agd5f/linux into drm-next · bcbede6f

Dave Airlie authored Jun 19, 2023

amd-drm-next-6.5-2023-06-16:

amdgpu:
- Misc display fixes
- W=1 fixes
- Improve scheduler naming
- DCN 3.1.4 fixes
- kdoc fixes
- Enable W=1
- VCN 4.0 fix
- xgmi fixes
- TOPDOWN fix for large BAR systems
- eDP fix
- PSR fixes
- SubVP fixes
- Freesync fix
- DPIA fix
- SMU 13.0.5 fixes
- vblflash fix
- RAS fixes
- SDMA 4 fix
- BO locking fix
- BO backing store fix
- NBIO 7.9 fixes
- GC 9.4.3 fixes
- GPU reset recovery fixes
- HMM fix

amdkfd:
- Fix NULL check
- Trap fixes
- Queue count fix
- Add event age tracking

radeon:
- fbdev client fix

scheduler:
- Avoid an infinite loop

UAPI:
- Add KFD event age tracking:
  Proposed ROCT-Thunk-Interface:
  https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/commit/efdbf6cfbc026bd68ac3c35d00dacf84370eb81e
  https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/commit/1820ae0a2db85b6f584611dc0cde1a00e7c22915
  Proposed ROCR-Runtime:
  https://github.com/RadeonOpenCompute/ROCR-Runtime/compare/master...zhums:ROCR-Runtime:new_event_wait_review
  https://github.com/RadeonOpenCompute/ROCR-Runtime/commit/e1f5bdb88eb882ac798aeca2c00ea3fbb2dba459
  https://github.com/RadeonOpenCompute/ROCR-Runtime/commit/7d26afd14107b5c2a754c1a3f415d89f3aabb503

drm:
- DP MST fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230616163548.7706-1-alexander.deucher@amd.com

bcbede6f

Merge tag 'drm-misc-next-fixes-2023-06-15' of... · 4e237d84

Dave Airlie authored Jun 19, 2023

Merge tag 'drm-misc-next-fixes-2023-06-15' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

Short summary of fixes pull:

 * Fix fbdev initializer macros
Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230615114009.GA27261@linux-uq9g

4e237d84

15 Jun, 2023 38 commits

Merge tag 'mediatek-drm-next-6.5' of... · e245db7b

Dave Airlie authored Jun 16, 2023

Merge tag 'mediatek-drm-next-6.5' of https://git.kernel.org/pub/scm/linux/kernel/git/chunkuang.hu/linux into drm-next

Mediatek DRM Next for Linux 6.5

1. Add display binding document for MT6795
Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Chun-Kuang Hu <chunkuang.hu@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230614225803.2547-1-chunkuang.hu@kernel.org

e245db7b

Merge tag 'drm-intel-next-2023-06-10' of git://anongit.freedesktop.org/drm/drm-intel into drm-next · 8e04cddf

Dave Airlie authored Jun 16, 2023

drm/i915 feature pull #2 for v6.5:

Features and functionality:
- Meteorlake PM demand (Vinod, Mika)
- Switch to dedicated workqueues to stop using flush_scheduled_work() (Luca)

Refactoring and cleanups:
- Move display runtime init under display/ (Matt)
- Async flip error message clarifications (Arun)

Fixes:
- Remove 10bit gamma on desktop gen3 parts, they don't support it (Ville)
- Fix driver probe error handling if driver creation fails (Matt)
- Fix all -Wunused-but-set-variable warnings, and enable it for i915 (Jani)
- Stop using edid_blob_ptr (Jani)
- Fix log level for "CDS interlane align done" (Khaled)
- Fix an unnecessary include prefix (Matt)

Merges:
- Backmerge drm-next to sync with drm-intel-gt-next (Jani)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87o7lnpxz2.fsf@intel.com

8e04cddf

drm/dp_mst: Clear MSG_RDY flag before sending new message · 72f1de49

Wayne Lin authored Apr 17, 2023

[Why]
The sequence for collecting down_reply from source perspective should
be:

Request_n->repeat (get partial reply of Request_n->clear message ready
flag to ack DPRX that the message is received) till all partial
replies for Request_n are received->new Request_n+1.

Now there is a chance that drm_dp_mst_hpd_irq() will fire a new down
request from the tx queue while the down reply is still incomplete. The
source is not allowed to generate interleaved message transactions, so
we should avoid this.

Also, while assembling partial reply packets, reading out DPCD DOWN_REP
Sideband MSG buffer + clearing DOWN_REP_MSG_RDY flag should be
wrapped up as a complete operation for reading out a reply packet.
Kicking off a new request before clearing the DOWN_REP_MSG_RDY flag is
risky: if the reply to the new request overwrites the DPRX DOWN_REP
Sideband MSG buffer before the source writes 1 to clear
DOWN_REP_MSG_RDY, the source unintentionally flushes the reply for the
new request. Up requests should be handled in the same way.

[How]
Separate drm_dp_mst_hpd_irq() into 2 steps. After acking the MST IRQ
event, the driver calls drm_dp_mst_hpd_irq_send_new_request(), which
triggers drm_dp_mst_kick_tx() only when there is no ongoing message
transaction.

Changes since v1:
* Reworked on review comments received
-> Adjust the fix to let driver explicitly kick off new down request
when mst irq event is handled and acked
-> Adjust the commit message

Changes since v2:
* Adjust the commit message
* Adjust the naming of the divided 2 functions and add a new input
  parameter "ack".
* Adjust code flow as per review comments.

Changes since v3:
* Update the function description of drm_dp_mst_hpd_irq_handle_event

Changes since v4:
* Change ack of drm_dp_mst_hpd_irq_handle_event() to be an array aligned
  with the size of esi[]
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

72f1de49
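
The ordering described in [Why] and [How] above can be sketched as a small user-space model. This is only an illustration, not the kernel code: the struct and the model_* helpers are hypothetical names, and the real entry points named in the commit are drm_dp_mst_hpd_irq_handle_event() and drm_dp_mst_hpd_irq_send_new_request(). The model only shows that a new request is held back until the reply is complete and the message-ready flag has been acked.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of a sideband message manager. */
struct mst_model {
    bool reply_in_progress; /* partial down replies still outstanding */
    bool msg_rdy_pending;   /* message-ready flag not yet cleared at the DPRX */
    int  tx_queue_len;      /* down requests waiting in the tx queue */
    int  sent;              /* requests actually started */
};

/* Step 1: read one reply chunk; the caller must ack it afterwards. */
static void model_handle_event(struct mst_model *m, bool last_chunk)
{
    m->msg_rdy_pending = true;
    m->reply_in_progress = !last_chunk;
}

/* Driver-side ack: models writing 1 to clear the message-ready flag. */
static void model_ack(struct mst_model *m)
{
    m->msg_rdy_pending = false;
}

/* Step 2: start a queued request only when no transaction is ongoing. */
static bool model_send_new_request(struct mst_model *m)
{
    if (m->msg_rdy_pending || m->reply_in_progress || m->tx_queue_len == 0)
        return false;
    m->tx_queue_len--;
    m->sent++;
    return true;
}
```

In this model, calling model_send_new_request() between two partial replies, or before model_ack(), returns false and leaves the queue untouched.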

drm/amdgpu: Increase hmm range get pages timeout · 5d1c70bb

Philip Yang authored Jun 06, 2023

If hmm_range_fault returns -EBUSY, we should call hmm_range_fault again
to validate the remaining pages. On one system with NUMA auto balancing
enabled, hmm_range_fault takes 6 seconds for a 1GB range because the CPU
migrates the range one page at a time. To be safe, increase the timeout
value to 1 second for a 128MB range.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

5d1c70bb
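
As a worked example of the timeout figure above, the following sketch scales a timeout at 1 second per 128MB. The function name, the constants, the 4 KiB page size, and the rounding-up behaviour are all assumptions made for illustration; the actual driver computation may differ.

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT     12   /* assumption: 4 KiB CPU pages */
#define MODEL_CHUNK_SHIFT    27   /* 128 MiB = 2^27 bytes */
#define MODEL_MS_PER_CHUNK 1000   /* 1 second per 128 MiB chunk */

/* Hypothetical helper: timeout in milliseconds for a range of npages. */
static uint64_t model_hmm_timeout_ms(uint64_t npages)
{
    uint64_t bytes = npages << MODEL_PAGE_SHIFT;
    /* Round up to whole 128 MiB chunks. */
    uint64_t chunks = (bytes + (1ULL << MODEL_CHUNK_SHIFT) - 1)
                      >> MODEL_CHUNK_SHIFT;

    if (chunks == 0)
        chunks = 1; /* never return a zero timeout */
    return chunks * MODEL_MS_PER_CHUNK;
}
```

Under these assumptions a 1GB range (262144 pages) gets 8 seconds, which leaves headroom over the 6 seconds observed in the commit message.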

drm/amdgpu: Enable translate further for GC v9.4.3 · d728eda3

Philip Yang authored May 03, 2023

To extend UTCL2 reach.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

d728eda3

drm/amdgpu: Remove unused NBIO interface · 0e41639d

Lijo Lazar authored Jun 13, 2023

The "set compute partition mode" interface in NBIO is no longer used.
Remove its only implementation, in NBIO v7.9.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

0e41639d

drm/amdkfd: bump kfd ioctl minor version for event age availability · d297eedf

James Zhu authored Jun 07, 2023

Bump the minor version to declare that the event age tracking feature is
now available.

In the kernel amdgpu driver, kfd_wait_on_events is used to support the
user space signal event wait function. When multiple threads wait on the
same event, a race condition can occur: after a thread checks the signal
condition but before it calls kfd_wait_on_events, the event interrupt
can fire and wake up the other threads that are sleeping on this event.
The first thread can then fall asleep without ever being woken up.
Adding event age tracking in both kernel and user mode helps avoid this
race condition.

Proposed ROCT-Thunk-Interface:
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/commit/efdbf6cfbc026bd68ac3c35d00dacf84370eb81e
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/commit/1820ae0a2db85b6f584611dc0cde1a00e7c22915

Proposed ROCR-Runtime:
https://github.com/RadeonOpenCompute/ROCR-Runtime/compare/master...zhums:ROCR-Runtime:new_event_wait_review
https://github.com/RadeonOpenCompute/ROCR-Runtime/commit/e1f5bdb88eb882ac798aeca2c00ea3fbb2dba459
https://github.com/RadeonOpenCompute/ROCR-Runtime/commit/7d26afd14107b5c2a754c1a3f415d89f3aabb503
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

d297eedf
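
The idea behind event age tracking can be sketched in a few lines of user-space C. This is a simplified model under stated assumptions, not the KFD implementation: the struct, the model_* functions, and the convention that a last_event_age of 0 means "tracking disabled" are hypothetical. A waiter remembers the age it last saw, and a mismatch on entry means a signal arrived in the meantime, so the waiter must not sleep.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a signal event with an age counter. */
struct model_event {
    uint64_t age; /* incremented every time the event is signaled */
};

static void model_signal(struct model_event *ev)
{
    ev->age++;
}

/*
 * Returns true if the waiter should be treated as already activated.
 * *last_event_age is the age the waiter saw on its previous check;
 * 0 is assumed to mean that age tracking is not in use.
 */
static bool model_waiter_activated(const struct model_event *ev,
                                   uint64_t *last_event_age)
{
    if (*last_event_age != 0 && ev->age != *last_event_age) {
        *last_event_age = ev->age; /* report the current age back */
        return true;
    }
    return false;
}
```

A thread that checked the event at age 1 and is then overtaken by a signal sees age 2 on entry and returns immediately instead of sleeping through the wakeup it missed.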

drm/amdkfd: update user space last_event_age · 973fddea

James Zhu authored Jun 08, 2023

Update user space last_event_age when event age is enabled.
It is only for KFD_EVENT_TYPE_SIGNAL which is checked by user space.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

973fddea

drm/amdkfd: set activated flag true when event age unmatchs · 96cdb538

James Zhu authored May 17, 2023

Set the waiter's activated flag to true when the event age does not match last_event_age.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

96cdb538

drm/amdkfd: add event_age tracking when receiving interrupt · 4057e6ce

James Zhu authored May 17, 2023

Add event_age tracking when receiving interrupt.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

4057e6ce

drm/amdkfd: add event age tracking · 6f582513

James Zhu authored May 17, 2023

Add event age tracking
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

6f582513

drm/scheduler: avoid infinite loop if entity's dependency is a scheduled error fence · 4f9b94d8

ZhenGuo Yin authored May 09, 2023

[Why]
drm_sched_entity_add_dependency_cb ignores the scheduled fence and returns false.
If an entity's dependency is a scheduled error fence and drm_sched_stop is called
due to TDR, drm_sched_entity_pop_job will wait for the dependency forever.

[How]
Neither wait on nor ignore the scheduled error fence; instead, add the
drm_sched_entity_wakeup callback for a dependency with a scheduled error fence.
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

4f9b94d8

drm/amdgpu: add entity error check in amdgpu_ctx_get_entity · 71eaac36

ZhenGuo Yin authored May 11, 2023

[Why]
UMD is not aware of entity error, and will keep submitting jobs
into the error entity.

[How]
Add entity error check when getting entity from ctx.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

71eaac36

drm/amdgpu: add VM generation token · f88e295e

Christian König authored Apr 19, 2023

Instead of using the VRAM lost counter add a 64bit token which indicates
if a context or job is still valid to use.

Should the VRAM be lost or the page tables need re-creation, the token will
change, indicating that userspace needs to act, re-create the contexts,
and re-submit the work.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f88e295e
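
One way to picture such a token is sketched below. The layout is an assumption made purely for illustration (a VRAM-lost counter in the upper 32 bits, a per-VM generation counter in the lower 32 bits), and all names are hypothetical; the commit message only states that the token is 64 bits wide and changes when VRAM is lost or the page tables need re-creation.

```c
#include <stdint.h>

/* Hypothetical model state; field names are not taken from the driver. */
struct model_vm {
    uint32_t vram_lost_counter; /* bumped when VRAM content is lost */
    uint32_t generation;        /* bumped when page tables are re-created */
};

/* Build a 64-bit token that changes if either counter changes. */
static uint64_t model_vm_token(const struct model_vm *vm)
{
    return ((uint64_t)vm->vram_lost_counter << 32) | vm->generation;
}

/* A context or job stays valid only while the token is unchanged. */
static int model_job_still_valid(const struct model_vm *vm,
                                 uint64_t token_at_submit)
{
    return model_vm_token(vm) == token_at_submit;
}
```

Userspace would record the token when it creates a context and compare it later; a mismatch tells it to re-create the context and re-submit the work.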

drm/amdgpu: reset VM when an error is detected · 55bf196f

Christian König authored Apr 18, 2023

When some problem with the updates of page tables is detected reset the
state machine of the VM and re-create all page tables from scratch.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

55bf196f

drm/amdgpu: abort submissions during prepare on error · e84e697d

Christian König authored Apr 17, 2023

Forward errors from previous submissions to this one.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

e84e697d

drm/amdgpu: mark soft recovered fences with -ENODATA · 89fae8dc

Christian König authored Apr 17, 2023

Set the fence error code before trying to soft-recover it.

It gets overwritten when a hard recovery is required.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

89fae8dc

drm/amdgpu: mark force completed fences with -ECANCELED · 0a33b11d

Christian König authored Apr 17, 2023

When we force complete fences we should mark them as canceled.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

0a33b11d

drm/amdgpu: add amdgpu_error_* debugfs file · b13eb02b

Christian König authored Apr 19, 2023

This allows us to insert some error codes into the bottom of the pipeline
on an engine.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b13eb02b

drm/amdgpu: mark GC 9.4.3 experimental for now · 2eb841bd

Alex Deucher authored Jun 15, 2023

Mark as experimental for now until we get closer to production
to avoid possible undesirable behavior when mixing newer
boards with older kernels.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

2eb841bd

drm/amdgpu: Use PSP FW API for partition switch · b00f5537

Lijo Lazar authored Jun 13, 2023

Use PSP firmware interface for switching compute partitions.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b00f5537

drm/amdgpu: Change nbio v7.9 xcp status definition · fe381726

Lijo Lazar authored Jun 13, 2023

PARTITION_MODE field in PARTITION_COMPUTE_STATUS register is defined as
below by firmware.

	SPX = 0, DPX = 1, TPX = 2, QPX = 3, CPX = 4

Change driver definition accordingly.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

fe381726
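
The field encoding listed above maps naturally onto an enum. The sketch below uses the values from the commit message; the MODEL_ prefixed identifiers and the decoding helper are hypothetical and not the driver's own names.

```c
/* PARTITION_MODE values as listed in the commit message. */
enum model_partition_mode {
    MODEL_MODE_SPX = 0,
    MODEL_MODE_DPX = 1,
    MODEL_MODE_TPX = 2,
    MODEL_MODE_QPX = 3,
    MODEL_MODE_CPX = 4,
};

/* Hypothetical helper: decode a raw PARTITION_MODE field value. */
static const char *model_partition_mode_name(unsigned int field)
{
    switch (field) {
    case MODEL_MODE_SPX: return "SPX";
    case MODEL_MODE_DPX: return "DPX";
    case MODEL_MODE_TPX: return "TPX";
    case MODEL_MODE_QPX: return "QPX";
    case MODEL_MODE_CPX: return "CPX";
    default:             return "UNKNOWN";
    }
}
```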

drm/amdgpu: make sure that BOs have a backing store · ca0b954a

Christian König authored Jun 05, 2023

It's perfectly possible that the BO is about to be destroyed and doesn't
have a backing store associated with it.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Guchun Chen <guchun.chen@amd.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ca0b954a

drm/amdgpu: make sure BOs are locked in amdgpu_vm_get_memory · e2ad8e2d

Christian König authored Jun 05, 2023

We need to grab the lock of the BO or otherwise can run into a crash
when we try to inspect the current location.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Guchun Chen <guchun.chen@amd.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

e2ad8e2d

drm/amdgpu: Add checking mc_vram_size · 43aedbf4

Stanley.Yang authored Jun 12, 2023

Do not compare injection address with mc_vram_size
if mc_vram_size is zero.
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

43aedbf4

drm/amdgpu: Optimize checking ras supported · 38298ce6

Stanley.Yang authored Jun 12, 2023

Use "is_app_apu" to identify whether the device is in the native
APU mode or carveout mode.
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

38298ce6

drm/amdgpu: Add channel_dis_num to ras init flags · 6fac3964

Candice Li authored Jun 12, 2023

Add disabled channel number to ras init flags.
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

6fac3964

drm/amdgpu: Update total channel number for umc v8_10 · bcd9a5f8

Candice Li authored Jun 10, 2023

Update total channel number for umc v8_10.
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

bcd9a5f8

drm/amd/pm: Align eccinfo table structure with smu v13_0_0 interface · 4506f0bc

Candice Li authored Jun 09, 2023

Update eccinfo table structure according to smu v13_0_0 interface.

v2: Calculate array size instead of using macro definition.
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

4506f0bc

drm/amd/display: Convert to kdoc formats in dc/core/dc.c · c39ca69b

Srinivasan Shanmugam authored Jun 13, 2023

Fixes the following gcc with W=1:

drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:3483: warning: Cannot understand * *******************************************************************************
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:4204: warning: Cannot understand * *******************************************************************************

Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c39ca69b

drm/amdkfd: decrement queue count on mes queue destroy · 80a780ab

Jonathan Kim authored Jun 13, 2023

Queue count should decrement on queue destruction regardless of HWS
support type.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

80a780ab

drm/amd/pm: enable more Pstates profile levels for SMU v13.0.5 · 121f17ac

Tim Huang authored Jun 09, 2023

This patch enables following UMD stable Pstates profile
levels for power_dpm_force_performance_level interface.

- profile_peak
- profile_min_sclk
- profile_standard
Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

121f17ac

drm/radeon: Fix missing prototypes in radeon_atpx_handler.c · fdc95df9

Srinivasan Shanmugam authored Jun 12, 2023

Fixes the following gcc with W=1:

drivers/gpu/drm/radeon/radeon_atpx_handler.c:64:6: warning: no previous prototype for ‘radeon_has_atpx’ [-Wmissing-prototypes]
   64 | bool radeon_has_atpx(void) {
      |      ^~~~~~~~~~~~~~~
drivers/gpu/drm/radeon/radeon_atpx_handler.c:68:6: warning: no previous prototype for ‘radeon_has_atpx_dgpu_power_cntl’ [-Wmissing-prototypes]
   68 | bool radeon_has_atpx_dgpu_power_cntl(void) {
      |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/radeon/radeon_atpx_handler.c:72:6: warning: no previous prototype for ‘radeon_is_atpx_hybrid’ [-Wmissing-prototypes]
   72 | bool radeon_is_atpx_hybrid(void) {
      |      ^~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/radeon/radeon_atpx_handler.c:77:6: warning: no previous prototype for ‘radeon_atpx_dgpu_req_power_for_displays’ [-Wmissing-prototypes]
   77 | bool radeon_atpx_dgpu_req_power_for_displays(void) {
      |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/radeon/radeon_atpx_handler.c:596:6: warning: no previous prototype for ‘radeon_register_atpx_handler’ [-Wmissing-prototypes]
  596 | void radeon_register_atpx_handler(void)
      |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/radeon/radeon_atpx_handler.c:614:6: warning: no previous prototype for ‘radeon_unregister_atpx_handler’ [-Wmissing-prototypes]
  614 | void radeon_unregister_atpx_handler(void)
      |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/radeon/radeon_atpx_handler.c:159: warning: expecting prototype for radeon_atpx_validate_functions(). Prototype was for radeon_atpx_validate() instead

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

fdc95df9

drm/amdgpu: Fix usage of UMC fill record in RAS · 71344a71

Luben Tuikov authored Jun 10, 2023

The commit listed in the Fixes tag below introduced a bug in
amdgpu_ras.c::amdgpu_reserve_page_direct(). The new
amdgpu_umc_fill_error_record() internally right-shifts the physical address
(argument "uint64_t retired_page", a misleading name) by
AMDGPU_GPU_PAGE_SHIFT. Thus, when amdgpu_reserve_page_direct() passes
"address" to that new function, it should NOT right-shift it first, since the
double shift erroneously yields a page address of 0 for the first
2^(2*AMDGPU_GPU_PAGE_SHIFT) memory addresses.

This commit fixes this bug.

Cc: Tao Zhou <tao.zhou1@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Fixes: 400013b2 ("drm/amdgpu: add umc_fill_error_record to make code more simple")
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://lore.kernel.org/r/20230610113536.10621-1-luben.tuikov@amd.com
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

71344a71
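
The effect of the double shift can be reproduced with a short sketch. The function names are hypothetical stand-ins for the caller and the fill helper, and the page shift of 12 (4 KiB GPU pages) is stated here as an assumption of the example.

```c
#include <stdint.h>

#define MODEL_GPU_PAGE_SHIFT 12 /* assumption: 4 KiB GPU pages */

/* Stand-in for the fill helper: it shifts the address internally. */
static uint64_t model_fill_record_pfn(uint64_t address)
{
    return address >> MODEL_GPU_PAGE_SHIFT;
}

/* Buggy caller: shifts before calling, so the address is shifted twice. */
static uint64_t model_buggy_caller(uint64_t address)
{
    return model_fill_record_pfn(address >> MODEL_GPU_PAGE_SHIFT);
}

/* Fixed caller: passes the byte address through unchanged. */
static uint64_t model_fixed_caller(uint64_t address)
{
    return model_fill_record_pfn(address);
}
```

With a page shift of 12, every address below 2^24 collapses to page 0 in the buggy path, while the fixed path returns the expected page frame number.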

drm/amdgpu/sdma4: set align mask to 255 · e5df16d9

Alex Deucher authored Jun 07, 2023

The wptr needs to be incremented in intervals of at least 64 dwords;
use 256 to align with Windows.  This should fix potential hangs
with unaligned updates.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

e5df16d9
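
An align mask of 255 means write-pointer updates land on 256-dword boundaries. The sketch below shows the usual round-up arithmetic for a power-of-two mask; the helper names are hypothetical, and how the ring code actually pads a submission is not shown in the commit message.

```c
#include <stdint.h>

#define MODEL_ALIGN_MASK 255u /* 256-dword alignment, as in the commit title */

/* Round a dword write pointer up to the next 256-dword boundary. */
static uint32_t model_align_wptr(uint32_t wptr_dw)
{
    return (wptr_dw + MODEL_ALIGN_MASK) & ~MODEL_ALIGN_MASK;
}

/* Number of padding dwords needed to reach that boundary. */
static uint32_t model_pad_dw(uint32_t wptr_dw)
{
    return model_align_wptr(wptr_dw) - wptr_dw;
}
```

For example, a write pointer at dword 250 needs 6 padding dwords to reach 256, and one already at 256 needs none.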

drm/amdgpu: Report ras_num_recs in debugfs · 740f42a2

Luben Tuikov authored Jun 03, 2023

Report the number of records stored in the RAS EEPROM table in debugfs.

This can be used by user-space to calculate the capacity of the RAS EEPROM
table since "bad_page_cnt_threshold" is also reported in the same place in
debugfs.

See commit 7fb64071 ("drm/amdgpu: Add bad_page_cnt_threshold to debugfs").

ras_num_recs can already be inferred by dumping the RAS EEPROM table, also in
the same debugfs location, see commit reference c65b0805 (drm/amdgpu:
RAS EEPROM table is now in debugfs, 2021-04-08). This commit makes it an
integer value easily shown in a single file.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Tao Zhou <tao.zhou1@amd.com>
Cc: Stanley Yang <Stanley.Yang@amd.com>
Cc: John Clements <john.clements@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://lore.kernel.org/r/20230603051043.211548-1-luben.tuikov@amd.com
Acked-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

740f42a2

drm/amdkfd: Remove DUMMY_VRAM_SIZE · 765663b7

Mukul Joshi authored Jun 12, 2023

Remove DUMMY_VRAM_SIZE as it is not needed and can result
in reporting incorrect memory size.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

765663b7

drm/amdgpu: Release SDMAv4.4.2 ecc irq properly · 82a1f42f

Lijo Lazar authored Jun 13, 2023

Release the ECC irq only if it is enabled; the ECC irq gets enabled only
when the RAS feature is enabled.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

82a1f42f