- 18 May, 2020 1 commit
-
-
Jiange Zhao authored
When the GPU times out, amdgpu notifies an interested party so that it has an opportunity to dump info before the actual GPU reset.

A usermode app opens the 'autodump' node under debugfs and poll()s it for readable/writable. When a GPU reset is due, amdgpu notifies the usermode app through wait_queue_head and gives it 10 minutes to dump info. After the usermode app has done its work, it closes the 'autodump' node. On node closure, amdgpu learns that the dump is done through the completion that is triggered in release().

There is no write or read callback because the necessary info can be obtained through dmesg and umr. Messages back and forth between the usermode app and amdgpu are unnecessary.

v2: (1) changed 'registered' to 'app_listening' (2) add a mutex in open() to prevent race condition
v3 (chk): grab the reset lock to avoid race in autodump_open, rename debugfs file to amdgpu_autodump, provide autodump_read as well, style and code cleanups
v4: add 'bool app_listening' to differentiate situations, so that the node can be reopened; also, there is no need to wait for completion when no app is waiting for a dump.
v5: change 'bool app_listening' to 'enum amdgpu_autodump_state'; add 'app_state_mutex' for race conditions: (1) only 1 user can open this file node (2) wait_dump() can only take effect after poll() executed (3) eliminated the race condition between release() and wait_dump()
v6: removed 'enum amdgpu_autodump_state' and 'app_state_mutex'; removed state checking in amdgpu_debugfs_wait_dump. Improve on top of version 3 so that the node can be reopened.
v7: move reinit_completion into open() so that only one user can open it.
v8: remove complete_all() from amdgpu_debugfs_wait_dump().
Signed-off-by:
Jiange Zhao <Jiange.Zhao@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
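
The usermode side of the flow described in this commit could look roughly like the following minimal sketch. Only the file name amdgpu_autodump comes from the commit message; the helper name wait_for_reset_notice is hypothetical, and polling for readability alone is a simplifying assumption.

```c
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper: block until the autodump node (already opened by
 * the caller) becomes readable, which is taken here to mean "a GPU reset
 * is due". Returns 1 when notified, 0 on timeout, -1 on error.
 */
int wait_for_reset_notice(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret;

    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);    /* retry if interrupted by a signal */

    if (ret < 0)
        return -1;
    if (ret == 0)
        return 0;                            /* timed out, nothing pending */
    return (pfd.revents & POLLIN) ? 1 : -1;  /* POLLERR/POLLHUP treated as errors */
}
```

A caller would open() the node, call this helper with a timeout of -1, collect its dump (for example via dmesg and umr), and then close() the descriptor. According to the commit message, that close is what tells amdgpu the dump is done.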
-
- 08 May, 2020 1 commit
-
-
Nirmoy Das authored
Create sysfs file using attributes. Signed-off-by:
Nirmoy Das <nirmoy.das@amd.com> Reviewed-by:
Alex Deucher <alexander.deucher@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 06 May, 2020 1 commit
-
-
Christian König authored
This should speed up debugging VRAM access a lot.
v2: add HDP flush/invalidate
Unrevert: RAS issue at root of the issue has been addressed
Signed-off-by:
Christian König <christian.koenig@amd.com> Reviewed-by:
Felix Kuehling <Felix.Kuehling@amd.com> Acked-by:
Jonathan Kim <Jonathan.Kim@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Kent Russell <kent.russell@amd.com> Acked-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 05 May, 2020 1 commit
-
-
Nathan Chancellor authored
When building with Clang:

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4160:53: warning: overflow in expression; result is -294967296 with type 'long' [-Winteger-overflow]
        expires = ktime_get_mono_fast_ns() + NSEC_PER_SEC * 4L;
                                                          ^
1 warning generated.

Multiplication happens first due to order of operations and both NSEC_PER_SEC and 4 are long literals so the expression overflows. To avoid this, make 4 an unsigned long long literal, which matches the type of expires (u64).
Fixes: 3f12acc8 ("drm/amdgpu: put the audio codec into suspend state before gpu reset V3")
Link: https://github.com/ClangBuiltLinux/linux/issues/1017
Signed-off-by:
Nathan Chancellor <natechancellor@gmail.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
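
The arithmetic behind this warning can be reproduced in user space. The sketch below is illustrative only: NSEC_PER_SEC_DEMO is a stand-in for the kernel macro, and the first function emulates a 32-bit two's-complement long with fixed-width types (an assumption about the target), since signed overflow itself is undefined behavior in C.

```c
#include <stdint.h>

#define NSEC_PER_SEC_DEMO 1000000000u   /* stand-in for the kernel's NSEC_PER_SEC */

/* Value a 32-bit two's-complement 'long' ends up holding for NSEC_PER_SEC * 4L. */
int64_t four_seconds_ns_as_32bit_long(void)
{
    /* 4,000,000,000 still fits in 32 unsigned bits ... */
    uint32_t bits = (uint32_t)(NSEC_PER_SEC_DEMO * 4u);

    /* ... but exceeds INT32_MAX, so read as a signed value it wraps negative. */
    return bits > (uint32_t)INT32_MAX ? (int64_t)bits - INT64_C(4294967296)
                                      : (int64_t)bits;
}

/* The fixed form: an unsigned long long literal forces 64-bit arithmetic. */
uint64_t four_seconds_ns_fixed(void)
{
    return NSEC_PER_SEC_DEMO * 4ULL;
}
```

The first function returns the -294967296 quoted in the Clang warning, while the second returns the intended 4000000000.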
-
- 30 Apr, 2020 1 commit
-
-
Evan Quan authored
By default, the autosuspend delay of the audio controller is 3 s. If a GPU reset is triggered within 3 s of the audio controller going idle, the audio controller may not yet have entered the suspended state, and the sudden GPU reset will then cause some audio errors. This change is meant to resolve that.

However, if the audio controller is in use when the GPU reset is triggered, this change may still not be enough to put the audio controller into the suspended state. In that case the GPU reset still proceeds, but a warning message is printed ("failed to suspend display audio").
V2: limit this for BACO and mode1 reset only
V3: try 1st to use pm_runtime_autosuspend_expiration() to query how much time is left. Use default setting on failure
Signed-off-by:
Evan Quan <evan.quan@amd.com> Reviewed-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 28 Apr, 2020 2 commits
-
-
Luben Tuikov authored
Implement an accessor of adev->tmz.enabled. Code should not access it directly as "if (adev->tmz.enabled)", because the organization may change. Instead...

Recruit "bool amdgpu_is_tmz(adev)" to return exactly this Boolean value. That is, this function is now an accessor of an already initialized and set adev and adev->tmz.

Add "void amdgpu_gmc_tmz_set(adev)" to check and set adev->gmc.tmz_enabled at initialization time. After that, one uses "bool amdgpu_is_tmz(adev)" to query whether adev supports TMZ.

Also, remove circular header file include.
v2: Remove amdgpu_tmz.[ch] as requested.
v3: Move TMZ into GMC.
Signed-off-by:
Luben Tuikov <luben.tuikov@amd.com> Acked-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
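
The accessor pattern this commit describes can be sketched as follows. Everything here is a reduced, hypothetical stand-in: the struct layouts, the field names tmz_param and asic_supports_tmz, and the _sketch function names are assumptions for illustration, not the driver's definitions.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-ins for the driver structures. */
struct gmc_sketch {
    bool tmz_enabled;
};

struct adev_sketch {
    int tmz_param;            /* assumed module-parameter value, nonzero = requested */
    bool asic_supports_tmz;   /* assumed result of an ASIC-type check */
    struct gmc_sketch gmc;
};

/* Called once at initialization: decide and record whether TMZ is enabled. */
void gmc_tmz_set_sketch(struct adev_sketch *adev)
{
    adev->gmc.tmz_enabled = adev->tmz_param != 0 && adev->asic_supports_tmz;
}

/* Accessor: the only place that knows where the flag is stored. */
bool is_tmz_sketch(const struct adev_sketch *adev)
{
    return adev->gmc.tmz_enabled;
}
```

Callers that go through the accessor keep working if the flag later moves to another sub-structure, which is the motivation the commit message gives.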
-
Huang Rui authored
Add a function to check tmz capability with kernel parameter and ASIC type.
v2: use a per device tmz variable instead of global amdgpu_tmz.
v3: refine the comments for the function. (Luben)
v4: add amdgpu_tmz.c/h for future use.
Signed-off-by:
Huang Rui <ray.huang@amd.com> Reviewed-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 27 Apr, 2020 3 commits
-
-
Jason Yan authored
The '>' expression itself is bool, no need to convert it to bool again. This fixes the following coccicheck warning:

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3004:68-73: WARNING: conversion to bool not needed here

Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Jason Yan <yanaijie@huawei.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
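
The kind of redundancy this coccicheck warning targets can be illustrated with a small sketch. The function names and parameters are hypothetical and unrelated to the actual expression at line 3004. In C the comparison yields an int that is 0 or 1, which converts to bool unchanged.

```c
#include <stdbool.h>

/* Redundant form of the kind coccicheck flags. */
bool exceeds_redundant(int count, int limit)
{
    return (count > limit) ? true : false;
}

/* Equivalent form: the comparison already yields the boolean result. */
bool exceeds_direct(int count, int limit)
{
    return count > limit;
}
```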
-
Evan Quan authored
CG/PG ungate is already performed in ip_suspend_phase1. Without this change, the CG/PG ungate is performed twice. That causes gfxoff disablement to be performed twice on runpm enter while gfxoff enablement is performed only once on runpm exit, which leaves gfxoff in the disabled state. Signed-off-by:
Evan Quan <evan.quan@amd.com> Acked-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Evan Quan authored
This sequence change should be safe, as all that ip_suspend_phase1 does is suspend DCE. It is also a prerequisite for the upcoming change that drops the redundant cg/pg ungate. Signed-off-by:
Evan Quan <evan.quan@amd.com> Acked-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 22 Apr, 2020 10 commits
-
-
Dennis Li authored
If the error query is set ready in amdgpu_ras_late_init, some IP blocks will have their error query marked ready even though they are not initialized yet. Signed-off-by:
Dennis Li <Dennis.Li@amd.com> Reviewed-by:
Guchun Chen <guchun.chen@amd.com> Reviewed-by:
Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Evan Quan authored
Make code more readable. Signed-off-by:
Evan Quan <evan.quan@amd.com> Reviewed-by:
Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Evan Quan authored
This is basically just a cosmetic code change. The current design for gpu reset on an XGMI setup is to operate on the current device (adev) first and then on the other devices from the hive (in another 'for' loop). But we can instead sort the device list (to put the current device in 1st position) and handle all the devices in a single 'for' loop.
V2: added missing hive->hive_lock protection
Signed-off-by:
Evan Quan <evan.quan@amd.com> Reviewed-by:
Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
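
The reordering idea can be sketched in user space as follows. This is an assumption-laden illustration: it uses a plain array of integer device IDs and a hypothetical move_to_front helper, whereas the driver works on its own device list under hive->hive_lock.

```c
#include <stddef.h>

/*
 * Hypothetical array-based sketch: move 'current' to index 0 while keeping
 * the relative order of the other entries. Returns 0 on success, -1 if
 * 'current' is not in the list.
 */
int move_to_front(int *devs, size_t n, int current)
{
    size_t i, pos = n;

    for (i = 0; i < n; i++) {
        if (devs[i] == current) {
            pos = i;
            break;
        }
    }
    if (pos == n)
        return -1;

    /* Shift the entries before 'current' one slot to the right. */
    for (i = pos; i > 0; i--)
        devs[i] = devs[i - 1];
    devs[0] = current;
    return 0;
}
```

After the call, a single loop over the array visits the current device first and the remaining hive members afterwards, which replaces the two separate passes the commit message describes.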
-
Evan Quan authored
As for XGMI setup, it should be performed on other devices from the hive also. Signed-off-by:
Evan Quan <evan.quan@amd.com> Reviewed-by:
Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Evan Quan authored
As for XGMI setup, it needs to be performed on all the devices from the same hive. Signed-off-by:
Evan Quan <evan.quan@amd.com> Acked-by:
Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Kevin Wang authored
clean up unused variables:
1. ring_lru_list
2. ring_lru_list_lock
related-commit: drm/amdgpu: remove ring lru handling
Signed-off-by:
Kevin Wang <kevin1.wang@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Yong Zhao authored
This is convenient for multiple teams to obtain the information. Also, add device info by using dev_info(). Signed-off-by:
Yong Zhao <Yong.Zhao@amd.com> Reviewed-by:
Dennis Li <Dennis.Li@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Jonathan Kim authored
Vega20 arbitrates pstate at hive level and not device level. The last peer to unmap a remote buffer could drop the P-State while another process still has a remote buffer mapped. With this fix, P-States still need to be disabled for now, as an SMU bug was discovered on synchronous P2P transfers. This should be fixed in the next FW update. Signed-off-by:
Jonathan Kim <Jonathan.Kim@amd.com> Reviewed-by:
Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Kent Russell authored
This reverts commit c12b84d6. The original patch causes a RAS event and subsequent kernel hard-hang when running the KFDMemoryTest.PtraceAccessInvisibleVram on VG20 and Arcturus.

dmesg output at hang time:
[drm] RAS event of type ERREVENT_ATHUB_INTERRUPT detected!
amdgpu 0000:67:00.0: GPU reset begin!
Evicting PASID 0x8000 queues
Started evicting pasid 0x8000
qcm fence wait loop timeout expired
The cp might be in an unrecoverable state due to an unsuccessful queues preemption
Failed to evict process queues
Failed to suspend process 0x8000
Finished evicting pasid 0x8000
Started restoring pasid 0x8000
Finished restoring pasid 0x8000
[drm] UVD VCPU state may lost due to RAS ERREVENT_ATHUB_INTERRUPT
amdgpu: [powerplay] Failed to send message 0x26, response 0x0
amdgpu: [powerplay] Failed to set soft min gfxclk !
amdgpu: [powerplay] Failed to upload DPM Bootup Levels!
amdgpu: [powerplay] Failed to send message 0x7, response 0x0
amdgpu: [powerplay] [DisableAllSMUFeatures] Failed to disable all smu features!
amdgpu: [powerplay] [DisableDpmTasks] Failed to disable all smu features!
amdgpu: [powerplay] [PowerOffAsic] Failed to disable DPM!
[drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <powerplay> failed -5

Signed-off-by:
Kent Russell <kent.russell@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Prike Liang authored
The system reboot failed because some IP blocks enter power gating before the hw resource destroy is performed. In addition, using a unified interface to set the device CGPG to the ungated state simplifies the ungate guard for amdgpu poweroff or reset.
Fixes: 487eca11 ("drm/amdgpu: fix gfx hang during suspend with video playback (v2)")
Signed-off-by:
Prike Liang <Prike.Liang@amd.com> Tested-by:
Mengbing Wang <Mengbing.Wang@amd.com> Tested-by:
Paul Menzel <pmenzel@molgen.mpg.de> Acked-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 14 Apr, 2020 1 commit
-
-
Prike Liang authored
The system reboot failed because some IP blocks enter power gating before the hw resource destroy is performed. In addition, using a unified interface to set the device CGPG to the ungated state simplifies the ungate guard for amdgpu poweroff or reset.
Fixes: 487eca11 ("drm/amdgpu: fix gfx hang during suspend with video playback (v2)")
Signed-off-by:
Prike Liang <Prike.Liang@amd.com> Tested-by:
Mengbing Wang <Mengbing.Wang@amd.com> Tested-by:
Paul Menzel <pmenzel@molgen.mpg.de> Acked-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
-
- 13 Apr, 2020 3 commits
-
-
Evan Quan authored
Vram lost counter is wrongly increased by two during baco reset.
V2: assumed vram lost for mode1 reset on all ASICs
Signed-off-by:
Evan Quan <evan.quan@amd.com> Acked-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Aurabindo Pillai authored
Let format prefixes take care of printing the module name through pr_fmt and dev_fmt definitions. Signed-off-by:
Aurabindo Pillai <mail@aurabindo.in> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Evan Quan authored
Vram lost counter is wrongly increased by two during baco reset.
V2: assumed vram lost for mode1 reset on all ASICs
Signed-off-by:
Evan Quan <evan.quan@amd.com> Acked-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 09 Apr, 2020 9 commits
-
-
Hawking Zhang authored
add indirect access support to registers outside of mmio bar. Signed-off-by:
Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Hawking Zhang authored
all the register access through kiq is redirected to amdgpu_kiq_rreg/amdgpu_kiq_wreg Signed-off-by:
Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Hawking Zhang authored
those are not needed anymore Signed-off-by:
Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Hawking Zhang authored
The workaround is not needed for soc15 ASICs except for vega10. It is not even needed with the latest vega10 vbios. Signed-off-by:
Hawking Zhang <Hawking.Zhang@amd.com> Acked-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Prike Liang authored
The system hangs during S3 suspend because the SMU is left pending when the GC does not respond to the CP_HQD_ACTIVE register access request. The root cause is accessing the GC register while GFX CGPG is entered; this can be fixed by disabling GFX CGPG before performing suspend.
v2: Disable GFX CGPG instead of using the RLC safe mode guard.
Signed-off-by:
Prike Liang <Prike.Liang@amd.com> Tested-by:
Mengbing Wang <Mengbing.Wang@amd.com> Reviewed-by:
Huang Rui <ray.huang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Jack Zhang authored
[PATCH 2/2] kfd_pre_reset will free mem_objs allocated by kfd_gtt_sa_allocate. Without this change, the sriov tdr code path will never free those allocated memories, causing a memory leak. Signed-off-by:
Jack Zhang <Jack.Zhang1@amd.com> Reviewed-by:
Monk Liu <monk.liu@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Jack Zhang authored
This reverts commit 5161bba4311f in order to split it into two different patches, and this will make it easier to understand.
[PATCH 1/2] porting to gfx10 from commit 1b0bfcff ("drm/amdgpu: Avoid destroy hqd when GPU is on reset")
Originally, MEC is touched without GPU initialized first.
Signed-off-by:
Jack Zhang <Jack.Zhang1@amd.com> Reviewed-by:
Monk Liu <monk.liu@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Nirmoy Das authored
Generate HW IP's sched_list in amdgpu_ring_init() instead of amdgpu_ctx.c. This makes amdgpu_ctx_init_compute_sched(), ring.has_high_prio and amdgpu_ctx_init_sched() unnecessary. This patch also stores sched_list for all HW IPs in one big array in struct amdgpu_device, which makes amdgpu_ctx_init_entity() much leaner.
v2: fix a coding style issue; do not use drm hw_ip const to populate amdgpu_ring_type enum
v3: remove ctx reference and move sched array and num_sched to a struct; use num_scheds to detect uninitialized scheduler list
v4: use array_index_nospec for user space controlled variables; fix possible checkpatch.pl warnings
Signed-off-by:
Nirmoy Das <nirmoy.das@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Jack Zhang authored
kfd_pre_reset will free mem_objs allocated by kfd_gtt_sa_allocate. Without this change, the sriov tdr code path will never free those allocated memories, causing a memory leak.
v2: add a bugfix for kiq ring test fail
Signed-off-by:
Jack Zhang <Jack.Zhang1@amd.com> Reviewed-by:
Monk Liu <monk.liu@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 08 Apr, 2020 1 commit
-
-
Prike Liang authored
The system hangs during S3 suspend because the SMU is left pending when the GC does not respond to the CP_HQD_ACTIVE register access request. The root cause is accessing the GC register while GFX CGPG is entered; this can be fixed by disabling GFX CGPG before performing suspend.
v2: Disable GFX CGPG instead of using the RLC safe mode guard.
Signed-off-by:
Prike Liang <Prike.Liang@amd.com> Tested-by:
Mengbing Wang <Mengbing.Wang@amd.com> Reviewed-by:
Huang Rui <ray.huang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
-
- 01 Apr, 2020 6 commits
-
-
Jiawei authored
extend compute lockup timeout to 60000 for SR-IOV. Reviewed-by:
Emily Deng <Emily.Deng@amd.com> Signed-off-by:
Jiawei <Jiawei.Gu@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Monk Liu authored
If the host supports the new handshake, we only need to enter fullaccess_mode in the ip_init() part; otherwise we need to do it before reading the vbios (because the host prepares the vbios for the VF only after receiving the REQ_GPU_INIT event under the legacy handshake). Signed-off-by:
Monk Liu <Monk.Liu@amd.com> Reviewed-by:
Emily Deng <Emily.Deng@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Monk Liu authored
what:
1) move timeout setting before ip_early_init to reduce exclusive mode cost for SRIOV
2) move ip_discovery_init() to inside of amdgpu_discovery_reg_base_init()
This prepares for the upcoming patches.

why:
In upcoming patches we will use a new mailbox event, "req_gpu_init_data", which is a callback hooked in adev->virt.ops. This callback sends a new event "REQ_GPU_INIT_DATA" to the host to notify it to do some preparation like "IP discovery/vbios on the VF FB". This callback must be:
A) invoked after set_ip_block(), because virt.ops is configured during set_ip_block()
B) invoked before ip_discovery_init(), because ip_discovery_init() needs the host side to prepare everything in the VF FB first.

The current place of ip_discovery_init() is before the point where we can invoke the callback of adev->virt.ops, so we must move ip_discovery_init() to a place after adev->virt.ops is fully set up, and the right place is amdgpu_discovery_reg_base_init().
Signed-off-by:
Monk Liu <Monk.Liu@amd.com> Reviewed-by:
Emily Deng <Emily.Deng@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Monk Liu authored
With this new handshake, the host side can prepare vbios/ip-discovery and the pf&vf exchange data upon receiving this request without stopping the world switch. This way the world switch is less impacted by the VF's exclusive mode request. Signed-off-by:
Monk Liu <Monk.Liu@amd.com> Reviewed-by:
Emily Deng <Emily.Deng@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
John Clements authored
added flag to ras context to indicate if ras query functionality is ready Reviewed-by:
Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by:
John Clements <john.clements@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Monk Liu authored
We need to move virt detection much earlier because:
1) The HW team confirmed that RCC_IOV_FUNC_IDENTIFIER will always be at mmio offset DE5 (dw) from vega10 on, so there is no need to implement a detect_hw_virt() routine in each nbio/chip file. For VI SRIOV chips (tonga & fiji), BIF_IOV_FUNC_IDENTIFIER is at 0x1503.
2) We need to know that we are an SRIOV VF before we do IP discovery, because the IP discovery content is updated by the host every time it receives a new "REQ_GPU_INIT_DATA" request from the guest (there will be patches for this new handshake soon).
Signed-off-by:
Monk Liu <Monk.Liu@amd.com> Reviewed-by:
Emily Deng <Emily.Deng@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-