Commits · bcca6298069435fa7f23ab98f4cb635fcbd3bc20 · Kirill Smelkov / linux

10 Aug, 2020 9 commits

drm/amdgpu: fix reload KMD hang on GFX10 KIQ · bcca6298

Monk Liu authored Aug 10, 2020

GFX10 KIQ will hang if we try below steps:
modprobe amdgpu
rmmod amdgpu
modprobe amdgpu sched_hw_submission=4

Due to KIQ is always living there even after KMD unloaded
thus when doing the realod KIQ will crash upon its register
being programed by different values with the previous loading
(the config like HQD addr, ring size, is easily changed if we alter
the sched_hw_submission)

the fix is we must inactive KIQ first before touching any
of its registgers
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

bcca6298

drm/amdgpu: update gc golden register for arcturus · 5a58abf5

shiwu.zhang authored Aug 07, 2020

Update golden setting to improve performance on HPC
and ML apps
Signed-off-by: shiwu.zhang <shiwu.zhang@amd.com>
Tested-by: gang.long <gang.long@amd.com>
Reviewed-by: guchun.chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

5a58abf5

drm/amd/powerplay: correct UVD/VCE PG state on custom pptable uploading · 8d0717f4

Evan Quan authored Aug 07, 2020

The UVD/VCE PG state is managed by UVD and VCE IP. It's error-prone to
assume the bootup state in SMU based on the dpm status.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

8d0717f4

drm/amd/powerplay: correct Vega20 cached smu feature state · 7358462f

Evan Quan authored Aug 07, 2020

Correct the cached smu feature state on pp_features sysfs
setting.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

7358462f

drm/amdgpu: Skip some registers config for SRIOV · 1d447326

Liu ChengZhe authored Aug 06, 2020

Some registers are not accessible to virtual function setup, so
skip their initialization when in VF-SRIOV mode.

v2: move SRIOV VF check into specify functions;
modify commit description and comment.
Signed-off-by: Liu ChengZhe <ChengZhe.Liu@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1d447326

drm/amdkfd: Fix spurious debug exception on gfx10 · b60646a2

Jay Cornwall authored Jul 24, 2020

s_barrier triggers a debug exception when issued with PRIV=1,
DEBUG_EN=1. This causes spurious notifications to rocm-gdb.

Clear MODE before issuing s_barrier and restore MODE afterwards
in the context restore handler.
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Tested-by: Laurent Morichetti <laurent.morichetti@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b60646a2

Revert "drm/amdkfd: Unify gfx9/gfx10 context save area layouts" · c342d7c5

Felix Kuehling authored Aug 07, 2020

This reverts commit 0a5baee4.

The change introduced a regression on some chips. Reverting until
a proper solution can be found.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c342d7c5

Revert "drm/amdkfd: Fix spurious debug exception on gfx10" · 52189922

Felix Kuehling authored Aug 07, 2020

This reverts commit ea368183.

Needed due to conflicts when reverting "drm/amdkfd: Unify gfx9/gfx10
context save area layouts".
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

52189922

drm: amdgpu: Use the correct size when allocating memory · 5068ed57

Christophe JAILLET authored Aug 09, 2020

When '*sgt' is allocated, we must allocated 'sizeof(**sgt)' bytes instead
of 'sizeof(*sg)'.

The sizeof(*sg) is bigger than sizeof(**sgt) so this wastes memory but
it won't lead to corruption.

Fixes: f44ffd67 ("drm/amdgpu: add support for exporting VRAM using DMA-buf v3")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

5068ed57

07 Aug, 2020 5 commits

drm/amdgpu: Fix bug where DPM is not enabled after hibernate and resume · 82c24547

Sandeep Raghuraman authored Aug 06, 2020

Reproducing bug report here:
After hibernating and resuming, DPM is not enabled. This remains the case
even if you test hibernate using the steps here:
https://www.kernel.org/doc/html/latest/power/basic-pm-debugging.html

I debugged the problem, and figured out that in the file hardwaremanager.c,
in the function, phm_enable_dynamic_state_management(), the check
'if (!hwmgr->pp_one_vf && smum_is_dpm_running(hwmgr) && !amdgpu_passthrough(adev) && adev->in_suspend)'
returns true for the hibernate case, and false for the suspend case.

This means that for the hibernate case, the AMDGPU driver doesn't enable DPM
(even though it should) and simply returns from that function.
In the suspend case, it goes ahead and enables DPM, even though it doesn't need to.

I debugged further, and found out that in the case of suspend, for the
CIK/Hawaii GPUs, smum_is_dpm_running(hwmgr) returns false, while in the case of
hibernate, smum_is_dpm_running(hwmgr) returns true.

For CIK, the ci_is_dpm_running() function calls the ci_is_smc_ram_running() function,
which is ultimately used to determine if DPM is currently enabled or not,
and this seems to provide the wrong answer.

I've changed the ci_is_dpm_running() function to instead use the same method that
some other AMD GPU chips do (e.g Fiji), which seems to read the voltage controller.
I've tested on my R9 390 and it seems to work correctly for both suspend and
hibernate use cases, and has been stable so far.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=208839Signed-off-by: Sandeep Raghuraman <sandy.8925@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

82c24547

drm/amdgpu: unlock mutex on error · 94561899

Dennis Li authored Aug 04, 2020

Make sure to unlock the mutex when error happen

v2:
1. correct syntax error in the commit comments
2. remove change-Id
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

94561899

drm/amd/powerplay: put VCN/JPEG into PG ungate state before dpm table setup(V3) · 520f5e42

Evan Quan authored Aug 05, 2020

As VCN related dpm table setup needs VCN be in PG ungate state. Same logics
applies to JPEG.

V2: fix paste typo
V3: code cosmetic
Signed-off-by: Evan Quan <evan.quan@amd.com>
Tested-by: Matt Coffin <mcoffin13@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

520f5e42

drm/amd/powerplay: update swSMU VCN/JPEG PG logics · ad1cac26

Evan Quan authored Aug 03, 2020

Add lock protections and avoid unnecessary actions
if the PG state is already the same as required.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Tested-by: Matt Coffin <mcoffin13@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ad1cac26

drm/amdgpu: use mode1 reset by default for sienna_cichlid · ca6fd7a6

Likun Gao authored Aug 06, 2020

Swith default gpu reset method for sienna_cichlid to MODE1 reset.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ca6fd7a6

06 Aug, 2020 26 commits

drm/amd/display: Drop dm_determine_update_type_for_commit · f6d7c7fa

Nicholas Kazlauskas authored Jul 28, 2020

[Why]
This was added in the past to solve the issue of not knowing when
to stall for medium and full updates in DM.

Since DC is ultimately decides what requires bandwidth changes we
wanted to make use of it directly to determine this.

The problem is that we can't actually pass any of the stream or surface
updates into DC global validation, so we don't actually check if the new
configuration is valid - we just validate the old existing config
instead and stall for outstanding commits to finish.

There's also the problem of grabbing the DRM private object for
pageflips which can lead to page faults in the case where commits
execute out of order and free a DRM private object state that was
still required for commit tail.

[How]
Now that we reset the plane in DM with the same conditions DC checks
we can have planes go through DC validation and we know when we need
to check and stall based on whether the stream or planes changed.

We mark lock_and_validation_needed whenever we've done this, so just
go back to using that instead of dm_determine_update_type_for_commit.

Since we'll skip resetting the plane for a pageflip we will no longer
grab the DRM private object for pageflips as well, avoiding the
page fault issued caused by pageflipping under load with commits
executing out of order.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f6d7c7fa

drm/amd/display: Reset plane for anything that's not a FAST update · dc4cb30d

Nicholas Kazlauskas authored Jul 28, 2020

[Why]
MEDIUM or FULL updates can require global validation or affect
bandwidth. By treating these all simply as surface updates we aren't
actually passing this through DC global validation.

[How]
There's currently no way to pass surface updates through DC global
validation, nor do I think it's a good idea to change the interface
to accept these.

DC global validation itself is currently stateless, and we can move
our update type checking to be stateless as well by duplicating DC
surface checks in DM based on DRM properties.

We wanted to rely on DC automatically determining this since DC knows
best, but DM is ultimately what fills in everything into DC plane
state so it does need to know as well.

There are basically only three paths that we exercise in DM today:

1) Cursor (async update)
2) Pageflip (fast update)
3) Full pipe programming (medium/full updates)

Which means that anything that's more than a pageflip really needs to
go down path #3.

So this change duplicates all the surface update checks based on DRM
state instead inside of should_reset_plane().

Next step is dropping dm_determine_update_type_for_commit and we no
longer require the old DC state at all for global validation.

Optimization can come later so we don't reset DC planes at all for
MEDIUM udpates and avoid validation, but we might require some extra
checks in DM to achieve this.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Hersen Wu <hersenxs.wu@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

dc4cb30d

drm/amd/display: Use validated tiling_flags and tmz_surface in commit_tail · 8ce5d842

Nicholas Kazlauskas authored Aug 06, 2020

[Why]
So we're not racing with userspace or deadlocking DM.

[How]
These flags are now stored on dm_plane_state itself and acquried and
validated during commit_check, so just use those instead.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

8ce5d842

drm/amd/display: Avoid using unvalidated tiling_flags and tmz_surface in prepare_planes · cf322b49

Nicholas Kazlauskas authored Jul 28, 2020

[Why]
We're racing with userspace as the flags could potentially change
from when we acquired and validated them in commit_check.

[How]
We unfortunately can't drop this function in its entirety from
prepare_planes since we don't know the afb->address at commit_check
time yet.

So instead of querying new tiling_flags and tmz_surface use the ones
from the plane_state directly.

While we're at it, also update the force_disable_dcc option based
on the state from atomic check.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

cf322b49

drm/amd/display: Reset plane when tiling flags change · 9a81cc60

Nicholas Kazlauskas authored Jul 28, 2020

[Why]
Enabling or disable DCC or switching between tiled and linear formats
can require bandwidth updates.

They're currently skipping all DC validation by being treated as purely
surface updates.

[How]
Treat tiling_flag changes (which encode DCC state) as a condition for
resetting the plane.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Hersen Wu <hersenxs.wu@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

9a81cc60

drm/amd/display: Store tiling_flags and tmz_surface on dm_plane_state · 707477b0

Nicholas Kazlauskas authored Jul 28, 2020

[Why]
Store these in advance so we can reuse them later in commit_tail without
having to reserve the fbo again.

These will also be used for checking for tiling changes when deciding
to reset the plane or not.

[How]
This change should mostly be a refactor. Only commit check is affected
for now and I'll drop the get_fb_info calls in prepare_planes and
commit_tail after.

This runs a prepass loop once we think that all planes have been added
to the context and replaces the get_fb_info calls with accessing the
dm_plane_state instead.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

707477b0

drm/amd/powerplay: update driver if file for sienna_cichlid · efa85f3a

Likun Gao authored Aug 06, 2020

Update drive if file for sienna_cichlid.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

efa85f3a

drm/amdgpu: new ids flag for tmz (v2) · 16c642ec

Pierre-Eric Pelloux-Prayer authored Jul 30, 2020

Allows UMD to know if TMZ is supported and enabled.

This commit also bumps KMS_DRIVER_MINOR because if we don't
UMD can't tell if "ids_flags & AMDGPU_IDS_FLAGS_TMZ == 0" means
"tmz is not enabled" or "tmz may be enabled but the kernel doesn't
report it".

v2: use amdgpu_is_tmz() and reworded commit message.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>