Commits · 2913b567cecb1e354d321a91ce744735448795f0 · Kirill Smelkov / linux

04 May, 2022 8 commits

drm/amd/smu: Increace dpm level count only for smu v13.0.2 · 2913b567

Likun Gao authored Apr 29, 2022

Only V13.0.2 on SMU v13 will get 0 based max level from fw and
increment by one, other ASIC will not need for this.
V2: replace the asic_type check with ip versioning check.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

2913b567

drm/amdgpu: add soc21 ih clientid definition · db56aebd

Stanley Yang authored Feb 22, 2022

Define soc21 ih clientid
Signed-off-by: Stanley Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

db56aebd

drm/amdgpu: add osssys v6_0_0 ip headers v4 · d71093aa

Hawking Zhang authored Dec 30, 2021

Add osssys v6_0_0 register offset and shift masks
header files (Hawking)
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

d71093aa

drm/amdgpu/discovery: add NBIO 4.3 Support · 2c0e7ddd

Likun Gao authored Apr 04, 2022

Enable NBIO 4.3 on asics where it is present.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

2c0e7ddd

drm/amdgpu: add nbio v4_3_0 ip block v2 · 0d09a60e

Stanley.Yang authored Apr 04, 2022

This adds nbio v4_3_0 ip block support

Changed from v1:
use WREG32_SOC15/RREG32_SOC15 instead of
WREG32_PCIE/RREG32_PCIE
remove the programming of PCIE_CONFIG_CNTL
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

0d09a60e

drm/amdgpu: add nbio v4_3_0 ip headers v6 · e19920c6

Hawking Zhang authored Dec 30, 2021

Add nbio v4_3_0 register offset and shift masks
header files (Hawking)
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

e19920c6

drm/amdgpu/discovery: add soc21 common Support · 759693ac

Likun Gao authored Apr 04, 2022

Enable soc21 common support on asics where it is present.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

759693ac

drm/amd/display: Avoid reading audio pattern past AUDIO_CHANNELS_COUNT · 0ee42ab7

Harry Wentland authored Apr 19, 2022

A faulty receiver might report an erroneous channel count. We
should guard against reading beyond AUDIO_CHANNELS_COUNT as
that would overflow the dpcd_pattern_period array.
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

0ee42ab7

28 Apr, 2022 32 commits

drm/amdgpu: Free user pages if amdgpu_cs_parser_bos failed · 3da2c382

Philip Yang authored Apr 27, 2022

Otherwise userspace resubmit the BOs again will trigger kernel WARNING
and fail the command submission.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Tested-by: Robert Święcki <robert@swiecki.net>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

3da2c382

drm/amdgpu: Fix build warning for TA debugfs interface · 86e18ac3

Candice Li authored Apr 27, 2022

Remove the redundant codes to fix build warning
when CONFIG_DEBUG_FS is disabled.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

86e18ac3

drm/amdgpu: add soc21 common ip block v2 · 71199aa4

Stanley.Yang authored Apr 01, 2022

This adds soc21 common ip block support

Changed from v1:
Switch WREG32/RREG32_PCIE to use indirect reg access
helper for sco15 and onwards
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

71199aa4

drm/amdgpu: add new write field for soc21 · ba9e7a4a

Stanley.Yang authored Aug 04, 2021

add new write field macro to handle soc21
registers with reg prefix
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ba9e7a4a

drm/amdgpu: add nbio callback to query rom offset · fb1d6835

Hawking Zhang authored Jan 08, 2022

Add nbio callback func used to query rom offset.
Used to query the rom offset for fetching the vbios.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

fb1d6835

drm/amdgpu: add gc v11_0_0 ip headers v11 · f33ac92f

Hawking Zhang authored Dec 30, 2021

Add gc v11_0_0 register offset and shift masks
header files (Hawking)
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f33ac92f

drm/amdgpu: add mp v13_0_0 ip headers v7 · 85a41b42

Hawking Zhang authored Dec 30, 2021

Add mp v13_0_0 register offset and shift masks
header files (Hawking)
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

85a41b42

drm/amdgpu: update query ref clk from bios · a8d59943

Hawking Zhang authored Jan 23, 2022

Handle atom_gfx_info_v3_0 structure.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a8d59943

drm/amdgpu: update gc info from bios table · f5fb30b6

Hawking Zhang authored Apr 07, 2022

Handle newer gc info tables.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f5fb30b6

drm/amdgpu: add atom_gfx_info_v3_0 structure · 083e5ff6

Hawking Zhang authored Jan 23, 2022

atomfirmware table used for newer gfx IPs.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

083e5ff6

drm/amdgpu: support query vram_info v3_0 · 7089dd3c

Hawking Zhang authored Feb 05, 2022

vram_info table provides various vram information
including vram_vendor, vram_type, vram_width, etc.

v2: correct the calculation of vram_width
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

7089dd3c

drm/amdgpu: add vram_info v3_0 structure · 1a482448

Hawking Zhang authored Feb 05, 2022

To support query vram_width, vram_type, vram_vendor
information
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1a482448

drm/amdgpu: switch to atomfirmware_asic_init · 85d1bcc6

Hawking Zhang authored Feb 28, 2022

Some initial settings now are not available from
the atom data table. The assumption that !ps[0]
|| !ps[1] in amdgpu_atom_asic_init is not valid.
In addition, driver needs to strictly follow
atomfirmware structure (asic_init_parameters) to
initialize parameters used to execute asic_init
function, otherwise, the execution of asic_init
would fail.

This shall be applicable to all soc15 adapters,but
let make the transition on soc21 first.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

85d1bcc6

drm/amdgpu: add helper to execute atomfirmware asic_init · ba75f6eb

Hawking Zhang authored Mar 01, 2022

Add helper function to execute atomfirmware asic_init
from the cmd table
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ba75f6eb

drm/amdgpu/discovery: move all table parsing into amdgpu_discovery.c · e24d0e91

Alex Deucher authored Mar 30, 2022

This data has no dependencies, so encapsulate it all within
amdgpu_discovery.c.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

e24d0e91

drm/amdgpu/discovery: add a function to parse the vcn info table · 622469c8

Alex Deucher authored Mar 30, 2022

To get the codec disable fuse mask.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

622469c8

drm/amdgpu/discovery: add additional validation · f716113a

Alex Deucher authored Mar 30, 2022

Check the table signatures and checksums and verify that
the tables exist before accessing them.

v2: disable MALL table for now
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f716113a

drm/amdgpu/discovery: add a function to get the mall_size · 24681cb5

Alex Deucher authored Mar 29, 2022

Add a function to fetch the mall size from the IP discovery
table. Properly handle harvest configurations where more
or less cache may be available.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

24681cb5

drm/amdgpu/discovery: handle UMC harvesting in IP discovery · 478d338b

Alex Deucher authored Mar 29, 2022

Check the harvesting table to determing if any UMC blocks have
been harvested.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

478d338b

drm/amdgpu/discovery: store the number of UMC IPs on the asic · a2efebf1

Alex Deucher authored Mar 31, 2022

For chips with IP discovery get this from the table,
hardcode it for older asics.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a2efebf1

drm/amdgpu: store the mall size in the gmc structure · 053d35de

Alex Deucher authored Mar 31, 2022

This will be useful in future patches.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

053d35de

drm/amdgpu/discovery: fix byteswapping in gc info parsing · 8eece29c

Alex Deucher authored Mar 30, 2022

The table is in little endian format.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

8eece29c

drm/amdgpu: disable runtime pm on several sienna cichlid cards(v2) · d1acd68b

Guchun Chen authored Apr 27, 2022

Disable runtime power management on several sienna cichlid
cards, otherwise SMU will possibly fail to be resumed from
runtime suspend. Will drop this after a clean solution between
kernel driver and SMU FW is available.

amdgpu 0000:63:00.0: amdgpu: GECC is enabled
amdgpu 0000:63:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
amdgpu 0000:63:00.0: amdgpu: SMU is resuming...
amdgpu 0000:63:00.0: amdgpu: SMU: I'm not done with your command: SMN_C2PMSG_66:0x0000000E SMN_C2PMSG_82:0x00000080
amdgpu 0000:63:00.0: amdgpu: Failed to SetDriverDramAddr!
amdgpu 0000:63:00.0: amdgpu: Failed to setup smc hw!
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
amdgpu 0000:63:00.0: amdgpu: amdgpu_device_ip_resume failed (-62)

v2: seperate to a function.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

d1acd68b

drm/amdgpu/discovery: populate additional GC info · 5cb1cfd5

Alex Deucher authored Mar 29, 2022

From the GC info table to the gfx config structure in the
driver.  The driver will use this data to configure the
card correctly.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

5cb1cfd5

drm/amdgpu: update latest IP discovery table structures · 00583523

Alex Deucher authored Mar 29, 2022

Add new tables.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

00583523

drm/amdgpu: add function to decode ip version · 1d5eee7d

Likun Gao authored Dec 10, 2021

Add function to decode IP version.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1d5eee7d

drm/amdgpu: increase HWIP MAX INSTANCE · 3202c7e7

Likun Gao authored Nov 07, 2019

Extend HWIP MAX INSTANCE to 11.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

3202c7e7

drm/amdgpu: do not use passthrough mode in Xen dom0 · 78b12008

Marek Marczykowski-Górecki authored Apr 27, 2022

While technically Xen dom0 is a virtual machine too, it does have
access to most of the hardware so it doesn't need to be considered a
"passthrough". Commit b818a5d3 ("drm/amdgpu/gmc: use PCI BARs for
APUs in passthrough") changed how FB is accessed based on passthrough
mode. This breaks amdgpu in Xen dom0 with message like this:

    [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3

While the reason for this failure is unclear, the passthrough mode is
not really necessary in Xen dom0 anyway. So, to unbreak booting affected
kernels, disable passthrough mode in this case.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1985
Fixes: b818a5d3 ("drm/amdgpu/gmc: use PCI BARs for APUs in passthrough")
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

78b12008

drm/amd/pm: fix the compile warning · 555238d9

Evan Quan authored Apr 25, 2022

Fix the compile warning below:
drivers/gpu/drm/amd/amdgpu/../pm/legacy-dpm/kv_dpm.c:1641
kv_get_acp_boot_level() warn: always true condition '(table->entries[i]->clk >= 0) => (0-u32max >= 0)'
Reported-by: kernel test robot <lkp@intel.com>
CC: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

555238d9

drm/amdkfd: Fix circular lock dependency warning · b179fc28

Mukul Joshi authored Apr 22, 2022

[  168.544078] ======================================================
[  168.550309] WARNING: possible circular locking dependency detected
[  168.556523] 5.16.0-kfd-fkuehlin #148 Tainted: G            E
[  168.562558] ------------------------------------------------------
[  168.568764] kfdtest/3479 is trying to acquire lock:
[  168.573672] ffffffffc0927a70 (&topology_lock){++++}-{3:3}, at:
		kfd_topology_device_by_id+0x16/0x60 [amdgpu] [  168.583663]
                but task is already holding lock:
[  168.589529] ffff97d303dee668 (&mm->mmap_lock#2){++++}-{3:3}, at:
		vm_mmap_pgoff+0xa9/0x180 [  168.597755]
                which lock already depends on the new lock.

[  168.605970]
                the existing dependency chain (in reverse order) is:
[  168.613487]
                -> #3 (&mm->mmap_lock#2){++++}-{3:3}:
[  168.619700]        lock_acquire+0xca/0x2e0
[  168.623814]        down_read+0x3e/0x140
[  168.627676]        do_user_addr_fault+0x40d/0x690
[  168.632399]        exc_page_fault+0x6f/0x270
[  168.636692]        asm_exc_page_fault+0x1e/0x30
[  168.641249]        filldir64+0xc8/0x1e0
[  168.645115]        call_filldir+0x7c/0x110
[  168.649238]        ext4_readdir+0x58e/0x940
[  168.653442]        iterate_dir+0x16a/0x1b0
[  168.657558]        __x64_sys_getdents64+0x83/0x140
[  168.662375]        do_syscall_64+0x35/0x80
[  168.666492]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[  168.672095]
                -> #2 (&type->i_mutex_dir_key#6){++++}-{3:3}:
[  168.679008]        lock_acquire+0xca/0x2e0
[  168.683122]        down_read+0x3e/0x140
[  168.686982]        path_openat+0x5b2/0xa50
[  168.691095]        do_file_open_root+0xfc/0x190
[  168.695652]        file_open_root+0xd8/0x1b0
[  168.702010]        kernel_read_file_from_path_initns+0xc4/0x140
[  168.709542]        _request_firmware+0x2e9/0x5e0
[  168.715741]        request_firmware+0x32/0x50
[  168.721667]        amdgpu_cgs_get_firmware_info+0x370/0xdd0 [amdgpu]
[  168.730060]        smu7_upload_smu_firmware_image+0x53/0x190 [amdgpu]
[  168.738414]        fiji_start_smu+0xcf/0x4e0 [amdgpu]
[  168.745539]        pp_dpm_load_fw+0x21/0x30 [amdgpu]
[  168.752503]        amdgpu_pm_load_smu_firmware+0x4b/0x80 [amdgpu]
[  168.760698]        amdgpu_device_fw_loading+0xb8/0x140 [amdgpu]
[  168.768412]        amdgpu_device_init.cold+0xdf6/0x1716 [amdgpu]
[  168.776285]        amdgpu_driver_load_kms+0x15/0x120 [amdgpu]
[  168.784034]        amdgpu_pci_probe+0x19b/0x3a0 [amdgpu]
[  168.791161]        local_pci_probe+0x40/0x80
[  168.797027]        work_for_cpu_fn+0x10/0x20
[  168.802839]        process_one_work+0x273/0x5b0
[  168.808903]        worker_thread+0x20f/0x3d0
[  168.814700]        kthread+0x176/0x1a0
[  168.819968]        ret_from_fork+0x1f/0x30
[  168.825563]
                -> #1 (&adev->pm.mutex){+.+.}-{3:3}:
[  168.834721]        lock_acquire+0xca/0x2e0
[  168.840364]        __mutex_lock+0xa2/0x930
[  168.846020]        amdgpu_dpm_get_mclk+0x37/0x60 [amdgpu]
[  168.853257]        amdgpu_amdkfd_get_local_mem_info+0xba/0xe0 [amdgpu]
[  168.861547]        kfd_create_vcrat_image_gpu+0x1b1/0xbb0 [amdgpu]
[  168.869478]        kfd_create_crat_image_virtual+0x447/0x510 [amdgpu]
[  168.877884]        kfd_topology_add_device+0x5c8/0x6f0 [amdgpu]
[  168.885556]        kgd2kfd_device_init.cold+0x385/0x4c5 [amdgpu]
[  168.893347]        amdgpu_amdkfd_device_init+0x138/0x180 [amdgpu]
[  168.901177]        amdgpu_device_init.cold+0x141b/0x1716 [amdgpu]
[  168.909025]        amdgpu_driver_load_kms+0x15/0x120 [amdgpu]
[  168.916458]        amdgpu_pci_probe+0x19b/0x3a0 [amdgpu]
[  168.923442]        local_pci_probe+0x40/0x80
[  168.929249]        work_for_cpu_fn+0x10/0x20
[  168.935008]        process_one_work+0x273/0x5b0
[  168.940944]        worker_thread+0x20f/0x3d0
[  168.946623]        kthread+0x176/0x1a0
[  168.951765]        ret_from_fork+0x1f/0x30
[  168.957277]
                -> #0 (&topology_lock){++++}-{3:3}:
[  168.965993]        check_prev_add+0x8f/0xbf0
[  168.971613]        __lock_acquire+0x1299/0x1ca0
[  168.977485]        lock_acquire+0xca/0x2e0
[  168.982877]        down_read+0x3e/0x140
[  168.987975]        kfd_topology_device_by_id+0x16/0x60 [amdgpu]
[  168.995583]        kfd_device_by_id+0xa/0x20 [amdgpu]
[  169.002180]        kfd_mmap+0x95/0x200 [amdgpu]
[  169.008293]        mmap_region+0x337/0x5a0
[  169.013679]        do_mmap+0x3aa/0x540
[  169.018678]        vm_mmap_pgoff+0xdc/0x180
[  169.024095]        ksys_mmap_pgoff+0x186/0x1f0
[  169.029734]        do_syscall_64+0x35/0x80
[  169.035005]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[  169.041754]
                other info that might help us debug this:

[  169.053276] Chain exists of:
                  &topology_lock --> &type->i_mutex_dir_key#6 --> &mm->mmap_lock#2

[  169.068389]  Possible unsafe locking scenario:

[  169.076661]        CPU0                    CPU1
[  169.082383]        ----                    ----
[  169.088087]   lock(&mm->mmap_lock#2);
[  169.092922]                                lock(&type->i_mutex_dir_key#6);
[  169.100975]                                lock(&mm->mmap_lock#2);
[  169.108320]   lock(&topology_lock);
[  169.112957]
                 *** DEADLOCK ***

This commit fixes the deadlock warning by ensuring pm.mutex is not
held while holding the topology lock. For this, kfd_local_mem_info
is moved into the KFD dev struct and filled during device init.
This cached value can then be used instead of querying the value
again and again.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b179fc28

drm/amdkfd: Fix updating IO links during device removal · 98447635

Mukul Joshi authored Apr 20, 2022

The logic to update the IO links when a KFD device
is removed was not correct as it would miss updating
the proximity domain values for some nodes where the
node_from and node_to both were greater values than the
proximity domain value of the KFD device being removed
from topology.

Fixes: 46d18d51 ("drm/amdkfd: Cleanup IO links during KFD device removal")
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

98447635

drm/amdkfd: Use non-atomic bitmap functions when possible · b8b9ba58

Christophe JAILLET authored Nov 28, 2021

All uses of the 'kfd->gtt_sa_bitmap' bitmap are protected with the
'kfd->gtt_sa_lock' mutex.

So:
   - prefer the non-atomic '__set_bit()' function
   - use the non-atomic 'bitmap_[set|clear]()' functions instead of
     equivalent 'for' loops. These functions can work on several bits at a
     time
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b8b9ba58