Commits · e17e27f9bdba274b404454072302cf5ea2282e5d · Kirill Smelkov / linux

05 Oct, 2021 5 commits

drm/amdgpu: handle the case of pci_channel_io_frozen only in amdgpu_pci_resume · e17e27f9

Guchun Chen authored Oct 01, 2021

In the current code, when the PCI error state pci_channel_io_normal is detected,
it reports PCI_ERS_RESULT_CAN_RECOVER status to the PCI driver, and the PCI
driver continues with the PCI resume callback report_resume via
pci_walk_bridge. The callback finally reaches amdgpu_pci_resume,
where the write lock is released unconditionally without having been
acquired first. In this case, a deadlock happens when other threads
start to acquire the read lock.

To fix this, add a member to the amdgpu_device structure to cache
pci_channel_state, and only continue execution in amdgpu_pci_resume
when it is pci_channel_io_frozen.

Fixes: c9a6b82f ("drm/amdgpu: Implement DPC recovery")
Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

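The guard described in this commit can be sketched in user-space C. This is a minimal model, not the driver code: the enum values, struct dev_model, and the functions error_detected() and pci_resume() are hypothetical stand-ins, and the write lock is modeled as a plain counter so that an unbalanced release would be visible. It also assumes the lock is taken only on the frozen path.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's pci_channel_state values. */
enum channel_state {
    CHANNEL_IO_NORMAL,
    CHANNEL_IO_FROZEN,
};

/* Toy device model: lock_depth counts how often the write lock is held. */
struct dev_model {
    enum channel_state cached_state; /* state cached at error detection */
    int lock_depth;                  /* > 0 means the write lock is held */
};

/* Error-detected callback: cache the state; lock only when frozen. */
void error_detected(struct dev_model *dev, enum channel_state state)
{
    dev->cached_state = state;
    if (state == CHANNEL_IO_FROZEN)
        dev->lock_depth++;   /* models acquiring the write lock */
}

/* Resume callback: returns true if it released the lock. */
bool pci_resume(struct dev_model *dev)
{
    /* Only the frozen path acquired the lock, so only it may release it. */
    if (dev->cached_state != CHANNEL_IO_FROZEN)
        return false;
    dev->lock_depth--;       /* models releasing the write lock */
    return true;
}
```

With the check in place, a normal-state error followed by resume leaves lock_depth at 0 instead of driving it negative.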

drm/amdgpu: print warning and taint kernel if lockup timeout is disabled · 127aedf9

Christian König authored Sep 30, 2021

Make sure that we notice this in error reports.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: revert "Add autodump debugfs node for gpu reset v8" · c8365dbd

Christian König authored Sep 30, 2021

This reverts commit 728e7e0c.

Further discussion reveals that this feature is severely broken
and needs to be reverted ASAP.

GPU reset can never be delayed by userspace, even for debugging;
otherwise we can run into in-kernel deadlocks.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: init iommu after amdkfd device init · 286826d7

Yifan Zhang authored Sep 28, 2021

This patch is to fix clinfo failure in Raven/Picasso:

Number of platforms: 1
  Platform Profile: FULL_PROFILE
  Platform Version: OpenCL 2.2 AMD-APP (3364.0)
  Platform Name: AMD Accelerated Parallel Processing
  Platform Vendor: Advanced Micro Devices, Inc.
  Platform Extensions: cl_khr_icd cl_amd_event_callback

  Platform Name: AMD Accelerated Parallel Processing
Number of devices: 0
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Tested-by: James Zhu <James.Zhu@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdkfd: remove redundant iommu cleanup code · 499f4d38

Yifan Zhang authored Sep 24, 2021

kfd_resume doesn't involve any iommu operation, so remove the
redundant iommu cleanup code.
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Tested-by: James Zhu <James.Zhu@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


04 Oct, 2021 35 commits

drm/amdgpu/display: fix dependencies for DRM_AMD_DC_SI · c2c15410

Alex Deucher authored Oct 01, 2021

Depends on DRM_AMDGPU_SI and DRM_AMD_DC
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu/gmc9: convert to IP version checking · 630e959f

Alex Deucher authored Oct 01, 2021

Use IP versions rather than asic_type to differentiate
IP version specific features.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amd/display: Prevent using DMUB rptr that is out-of-bounds · 64df665f

Wyatt Wood authored Sep 21, 2021

[Why]
Running into bugchecks during stress test where rptr is 0xFFFFFFFF.
Typically this is caused by a hard hang, and can come from HW outside
of DCN.

[How]
To prevent bugchecks when writing the DMUB rptr, first check that the
rptr is valid.
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Solomon Chiu <solomon.chiu@amd.com>
Signed-off-by: Wyatt Wood <wyatt.wood@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

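The validity check described in this commit can be sketched as a bounds test before the read pointer is stored. This is a minimal illustration, not the DMUB code: struct ring, its fields, and ring_set_rptr() are hypothetical names, and the exact bound (rejecting values at or beyond the capacity) is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ring buffer descriptor; field names are illustrative. */
struct ring {
    uint32_t rptr;      /* current read offset, in bytes */
    uint32_t capacity;  /* ring size, in bytes */
};

/* Accept a new read pointer only if it lies within the ring. */
bool ring_set_rptr(struct ring *rb, uint32_t new_rptr)
{
    if (new_rptr >= rb->capacity)
        return false;   /* e.g. 0xFFFFFFFF read back from hung hardware */
    rb->rptr = new_rptr;
    return true;
}
```

A caller would treat a false return as a hardware failure and leave the previous rptr untouched.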

drm/amdgpu/display: fold DRM_AMD_DC_DCN201 into DRM_AMD_DC_DCN · 519607a2

Alex Deucher authored Oct 01, 2021

No need for a separate kconfig option at this point.
Reviewed-by: Zhan Liu <zhan.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: remove some repeated includings · 8001ba85

Guo Zhengkui authored Oct 01, 2021

Remove two repeated includes on lines 46 and 47.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: During s0ix don't wait to signal GFXOFF · d04287d0

Lijo Lazar authored Oct 01, 2021

In the rare event when GFX IP suspend coincides with a s0ix entry, don't
schedule a delayed work, instead signal PMFW immediately to allow GFXOFF
entry. GFXOFF is a prerequisite for s0ix entry. PMFW needs to be
signaled about GFXOFF status before amd-pmc module passes OS HINT
to PMFW telling that everything is ready for a safe s0ix entry.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1712
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


Documentation/gpu: remove spurious "+" in amdgpu.rst · aa877970

Alex Deucher authored Sep 29, 2021

Not sure why that was there.  Remove it.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: consolidate case statements · 4b3a624c

Alex Deucher authored Sep 29, 2021

IP_VERSION(11, 0, 13) does the exact same thing as
IP_VERSION(11, 0, 12) so squash them together.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

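Squashing two identical switch arms, as this commit describes, looks roughly like the sketch below. The IP_VERSION packing shown is an assumption about how major, minor, and revision are combined, and the handler enum, pick_handler(), and the 11.0.7 arm are hypothetical and exist only to make the example complete.

```c
#include <stdint.h>

/* Assumed packing of major/minor/revision into one integer; the macro in
 * the driver may differ in detail. */
#define IP_VERSION(mj, mn, rv) (((mj) << 16) | ((mn) << 8) | (rv))

/* Hypothetical handler IDs, for illustration only. */
enum handler { HANDLER_UNKNOWN, HANDLER_A, HANDLER_B };

enum handler pick_handler(uint32_t ip_version)
{
    switch (ip_version) {
    case IP_VERSION(11, 0, 12):
    case IP_VERSION(11, 0, 13):   /* same behavior, so share one arm */
        return HANDLER_A;
    case IP_VERSION(11, 0, 7):
        return HANDLER_B;
    default:
        return HANDLER_UNKNOWN;
    }
}
```

Stacking the case labels keeps a single copy of the shared body, so a later change cannot update one version and miss the other.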

drm/amdgpu/jpeg: add jpeg2.6 start/end · c6051149

James Zhu authored Sep 29, 2021

Add jpeg2.6 start/end with updated PCTL0_MMHUB_DEEPSLEEP_IB address.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.lilu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu/jpeg2: move jpeg2 shared macro to header file · d4b0ee65

James Zhu authored Sep 29, 2021

Move jpeg2 shared macro to header file
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.lilu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdkfd: fix a potential ttm->sg memory leak · 546dc20f

Lang Yu authored Sep 29, 2021

Memory is allocated for ttm->sg by kmalloc in kfd_mem_dmamap_userptr,
but isn't freed by kfree in kfd_mem_dmaunmap_userptr. Free it!

Fixes: 264fb4d3 ("drm/amdgpu: Add multi-GPU DMA mapping helpers")
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

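The leak pattern fixed by this commit is an allocation in the map path with no matching free in the unmap path. A minimal user-space sketch of the balanced pair follows; struct tt_model, dmamap(), and dmaunmap() are hypothetical names, and malloc/free stand in for kmalloc/kfree.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the TTM object; only the sg pointer matters. */
struct tt_model {
    void *sg;   /* scatter-gather table, allocated on map */
};

/* Map: allocate the table (malloc stands in for kmalloc). */
int dmamap(struct tt_model *tt, size_t size)
{
    tt->sg = malloc(size);
    return tt->sg ? 0 : -1;
}

/* Unmap: free the table and clear the pointer so it cannot be reused. */
void dmaunmap(struct tt_model *tt)
{
    free(tt->sg);
    tt->sg = NULL;
}
```

Clearing the pointer after the free is a defensive choice in this sketch; it makes a repeated unmap harmless because free(NULL) is a no-op.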

drm/amdgpu: add an option to override IP discovery table from a file · a79d3709

Alex Deucher authored Sep 17, 2021

If you set amdgpu.discovery=2 you can force the driver to
fetch the IP discovery table from a file rather than from the
table shipped on the device.  This is useful for debugging and
for device bring up and emulation when the tables may be in flux.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdkfd: convert kfd_device.c to use GC IP version · c868d584

Alex Deucher authored Aug 12, 2021

rather than asic type.

v2: fix up CZ case
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdkfd: clean up parameters in kgd2kfd_probe · 5b983db8

Alex Deucher authored Aug 12, 2021

We can get the pdev and asic type from the adev.  No need
to pass them explicitly.

v2: squash in build fix for !CONFIG_HSA_AMD from Anson
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: add support for SRIOV in IP discovery path · 6d46d419

Alex Deucher authored Aug 10, 2021

Handle SRIOV requirements when adding IP blocks.

v2: add comment about UVD/VCE support on vega20 SR-IOV
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: clean up set IP function · b05b9c59

Alex Deucher authored Aug 10, 2021

Split into several smaller per IP functions to make it
easier to handle ordering issues for things like
SR-IOV in a follow up patch.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: convert IP version array to include instances · 1d789535

Alex Deucher authored Oct 04, 2021

Allow us to query instance versions more cleanly.

Instancing support is not consistent unfortunately. SDMA is a
good example.  Sienna cichlid has 4 total SDMA instances, each
enumerated separately (HWIDs 42, 43, 68, 69).  Arcturus has 8
total SDMA instances, but they are enumerated as multiple
instances of the same HWIDs (4x HWID 42, 4x HWID 43).  UMC
is another example.  On most chips there are multiple
instances with the same HWID.  This allows us to support both
forms.

v2: rebase
v3: clarify instancing support
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

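One way to picture a per-instance version array is a two-dimensional table indexed by IP block and instance. The sketch below is illustrative only: the enum entries, MAX_INSTANCES, struct ip_table, and both helper functions are hypothetical, and the IP_VERSION packing is an assumption.

```c
#include <stdint.h>

/* Assumed packing of major/minor/revision into one integer. */
#define IP_VERSION(mj, mn, rv) (((mj) << 16) | ((mn) << 8) | (rv))

/* Hypothetical sizes and IP indices, for illustration only. */
enum hw_ip { HWIP_SDMA0, HWIP_SDMA1, HWIP_MAX };
#define MAX_INSTANCES 8

/* One version slot per (IP block, instance) pair; 0 means "not present". */
struct ip_table {
    uint32_t version[HWIP_MAX][MAX_INSTANCES];
};

/* Record a discovered instance; returns -1 if the indices are out of range. */
int ip_table_set(struct ip_table *t, int ip, int inst, uint32_t ver)
{
    if (ip < 0 || ip >= HWIP_MAX || inst < 0 || inst >= MAX_INSTANCES)
        return -1;
    t->version[ip][inst] = ver;
    return 0;
}

/* Count the populated instances of one IP block. */
int ip_table_count(const struct ip_table *t, int ip)
{
    int n = 0;
    for (int i = 0; i < MAX_INSTANCES; i++)
        if (t->version[ip][i] != 0)
            n++;
    return n;
}
```

Under this layout, a chip that enumerates each SDMA engine under its own HWID fills instance 0 of several rows, while a chip that reports several instances of the same HWID fills several columns of one row, so both forms fit the same table.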

drm/amdgpu: set CHIP_IP_DISCOVERY as the asic type by default · d0761fd2

Alex Deucher authored Aug 09, 2021

For new chips with no explicit entry in the PCI ID list.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: add new asic_type for IP discovery · 3ae695d6

Alex Deucher authored Aug 09, 2021

Add a new asic type for asics where we don't have an
explicit entry in the PCI ID list.  We don't need
an asic type for these asics, other than something higher
than the existing ones, so just use this for all new
asics.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

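The "something higher than the existing ones" requirement can be illustrated with an enum whose catch-all entry sits after every named chip. The enum below is a hypothetical, heavily abbreviated stand-in (CHIP_OLDER and CHIP_NEWER are invented names), and supports_newer_feature() is an invented example of an ordering check.

```c
/* Hypothetical, abbreviated enum; the real list is much longer. */
enum asic_type {
    CHIP_OLDER,
    CHIP_NEWER,
    CHIP_IP_DISCOVERY,  /* placed last: compares greater than every named chip */
    CHIP_LAST,
};

/* Existing "this chip or newer" checks keep working for the catch-all type. */
int supports_newer_feature(enum asic_type t)
{
    return t >= CHIP_NEWER;
}
```

Because the catch-all value sorts above all explicit entries, ordering comparisons written for older chips still evaluate as "new enough" for devices identified only through IP discovery.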

drm/amdgpu/ucode: add default behavior · aa9f8cc3

Alex Deucher authored Aug 09, 2021

Default to PSP ucode loading unless the user specifies
direct.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: get VCN harvest information from IP discovery table · f1741615

Alex Deucher authored Aug 09, 2021

Use the table rather than asic specific harvest registers.

v2: remove harvesting register checking
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu/vcn: remove manual instance setting · 1b592d00

Alex Deucher authored Aug 09, 2021

Handled by IP discovery now.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu/sdma: remove manual instance setting · fe323f03

Alex Deucher authored Aug 09, 2021

Handled by IP discovery now.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: get VCN and SDMA instances from IP discovery table · 5c3720be

Alex Deucher authored Aug 09, 2021

Rather than hardcoding it.  We already have the number of VCN
instances from a previous patch, so just update the VCN
instances for chips with static tables.

v2: squash in checks for SDMA3,4 (Guchun)
v3: clarify VCN changes
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: add HWID of SDMA instance 2 and 3 · de309ab3

Guchun Chen authored Sep 03, 2021

They were missing.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: add VCN1 hardware IP · 5eceb201

Alex Deucher authored Aug 09, 2021

So we can store the VCN IP revision for each instance of VCN.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amd/display: fix error case handling · 2cbc6f42

Guchun Chen authored Aug 09, 2021

Otherwise, we will run into the error case path.

v2: fix build when CONFIG_DRM_AMD_DC_DCN is not set
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu/soc15: convert to IP version checking · 75a07bcd

Alex Deucher authored Aug 04, 2021

Use IP versions rather than asic_type to differentiate
IP version specific features.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu/vcn2.5: convert to IP version checking · 0b64a5a8

Alex Deucher authored Aug 04, 2021

Use IP versions rather than asic_type to differentiate
IP version specific features.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu/amdgpu_vcn: convert to IP version checking · 96b8dd44

Alex Deucher authored Aug 04, 2021

Use IP versions rather than asic_type to differentiate
IP version specific features.

v2: squash in fix for navy flounder and sienna cichlid
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu/pm/amdgpu_smu: convert more IP version checking · 50638f7d

Alex Deucher authored Aug 04, 2021

Use IP versions rather than asic_type to differentiate
IP version specific features.

v2: switch if statement to a switch statement

Acked-by: Christian König <christian.koenig@amd.com> (v1)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu/pm/smu_v13.0: convert IP version checking · 61b396b9

Alex Deucher authored Aug 20, 2021

Use IP versions rather than asic_type to differentiate
IP version specific features.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu/pm/smu_v11.0: update IP version checking · 6b726a0a

Alex Deucher authored Aug 04, 2021

Use IP versions rather than asic_type to differentiate
IP version specific features.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu/psp_v13.0: convert to IP version checking · 1fcc208c

Alex Deucher authored Aug 04, 2021

Use IP versions rather than asic_type to differentiate
IP version specific features.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu/psp_v11.0: convert to IP version checking · e47868ea

Alex Deucher authored Aug 04, 2021

Use IP versions rather than asic_type to differentiate
IP version specific features.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
