An error occurred fetching the project authors.
- 30 Jul, 2020 2 commits
-
-
Alex Deucher authored
This regressed some working configurations so revert it. Will fix this properly for 5.9 and backport then. This reverts commit 38e0c89a. Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
-
Huang Rui authored
It doesn't expose PPTable descriptor on APU platform. So max/min temperature values cannot be got from APU platform. v2: Stoney needs to skip crit temperature as well. Signed-off-by:
Huang Rui <ray.huang@amd.com> Reviewed-by:
Kevin Wang <kevin1.wang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 27 Jul, 2020 2 commits
-
-
Evan Quan authored
The current outputs of amdgpu_pm_info debugfs come with clock gating status and followed by current clock/power information. However the clock gating status retrieving may pull GFX out of CG status. That will make the succeeding clock/power information retrieving inaccurate. To overcome this and be with minimum impact, the outputs are updated to show current clock/power information first. Signed-off-by:
Evan Quan <evan.quan@amd.com> Reviewed-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Dennis Li authored
when GPU hang, driver has multi-paths to enter amdgpu_device_gpu_recover, the atomic adev->in_gpu_reset and hive->in_reset are used to avoid re-entering GPU recovery. During GPU reset and resume, it is unsafe that other threads access GPU, which maybe cause GPU reset failed. Therefore the new rw_semaphore adev->reset_sem is introduced, which protect GPU from being accessed by external threads during recovery. v2: 1. add rwlock for some ioctls, debugfs and file-close function. 2. change to use dqm->is_resetting and dqm_lock for protection in kfd driver. 3. remove try_lock and change adev->in_gpu_reset as atomic, to avoid re-enter GPU recovery for the same GPU hang. v3: 1. change back to use adev->reset_sem to protect kfd callback functions, because dqm_lock couldn't protect all codes, for example: free_mqd must be called outside of dqm_lock; [ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019 [ 1230.177221] Call Trace: [ 1230.178249] dump_stack+0x98/0xd5 [ 1230.179443] amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu] [ 1230.180673] gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu] [ 1230.181882] amdgpu_gart_unbind+0xa9/0xe0 [amdgpu] [ 1230.183098] amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu] [ 1230.184239] ? ttm_bo_put+0x171/0x5f0 [ttm] [ 1230.185394] ttm_tt_unbind+0x21/0x40 [ttm] [ 1230.186558] ttm_tt_destroy.part.12+0x12/0x60 [ttm] [ 1230.187707] ttm_tt_destroy+0x13/0x20 [ttm] [ 1230.188832] ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm] [ 1230.189979] ttm_bo_put+0x1be/0x5f0 [ttm] [ 1230.191230] amdgpu_bo_unref+0x1e/0x30 [amdgpu] [ 1230.192522] amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu] [ 1230.193833] free_mqd+0x25/0x40 [amdgpu] [ 1230.195143] destroy_queue_cpsch+0x1a7/0x270 [amdgpu] [ 1230.196475] pqm_destroy_queue+0x105/0x260 [amdgpu] [ 1230.197819] kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu] [ 1230.199154] kfd_ioctl+0x277/0x500 [amdgpu] [ 1230.200458] ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu] [ 1230.201656] ? tomoyo_file_ioctl+0x19/0x20 [ 1230.202831] ksys_ioctl+0x98/0xb0 [ 1230.204004] __x64_sys_ioctl+0x1a/0x20 [ 1230.205174] do_syscall_64+0x5f/0x250 [ 1230.206339] entry_SYSCALL_64_after_hwframe+0x49/0xbe 2. remove try_lock and introduce atomic hive->in_reset, to avoid re-enter GPU recovery. v4: 1. remove an unnecessary whitespace change in kfd_chardev.c 2. remove comment codes in amdgpu_device.c 3. add more detailed comment in commit message 4. define a wrap function amdgpu_in_reset v5: 1. Fix some style issues. Reviewed-by:
Hawking Zhang <Hawking.Zhang@amd.com> Suggested-by:
Andrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by:
Christian König <christian.koenig@amd.com> Suggested-by:
Felix Kuehling <Felix.Kuehling@amd.com> Suggested-by:
Lijo Lazar <Lijo.Lazar@amd.com> Suggested-by:
Luben Tukov <luben.tuikov@amd.com> Signed-off-by:
Dennis Li <Dennis.Li@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 23 Jul, 2020 1 commit
-
-
Alex Deucher authored
We expose the actual memory controller clock rate in Linux, not the effective memory clock of the DRAMs. To translate it, it follows the following formula: Clock conversion (Mhz): HBM: effective_memory_clock = memory_controller_clock * 1 G5: effective_memory_clock = memory_controller_clock * 1 G6: effective_memory_clock = memory_controller_clock * 2 DRAM data rate (MT/s): HBM: effective_memory_clock * 2 = data_rate G5: effective_memory_clock * 4 = data_rate G6: effective_memory_clock * 8 = data_rate Bandwidth (MB/s): data_rate * vram_bit_width / 8 = memory_bandwidth Some examples: G5 on RX460: memory_controller_clock = 1750 Mhz effective_memory_clock = 1750 Mhz * 1 = 1750 Mhz data rate = 1750 * 4 = 7000 MT/s memory_bandwidth = 7000 * 128 bits / 8 = 112000 MB/s G6 on RX5600: memory_controller_clock = 900 Mhz effective_memory_clock = 900 Mhz * 2 = 1800 Mhz data rate = 1800 * 8 = 14400 MT/s memory_bandwidth = 14400 * 192 bits / 8 = 345600 MB/s Acked-by:
Evan Quan <evan.quan@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 21 Jul, 2020 1 commit
-
-
Paweł Gronowski authored
NULL dereference occurs when string that is not ended with space or newline is written to some dpm sysfs interface (for example pp_dpm_sclk). This happens because strsep replaces the tmp with NULL if the delimiter is not present in string, which is then dereferenced by tmp[0]. Reproduction example: sudo sh -c 'echo -n 1 > /sys/class/drm/card0/device/pp_dpm_sclk' Signed-off-by:
Paweł Gronowski <me@woland.xyz> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 15 Jul, 2020 1 commit
-
-
Evan Quan authored
Leftover of previous performance level setting cleanups. Signed-off-by:
Evan Quan <evan.quan@amd.com> Reviewed-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 08 Jul, 2020 1 commit
-
-
Alex Jivin authored
Move the mutext lock/unlock outside of the if(), as the mutex is always taken: either in the if() branch or in the else branch. Signed-off-by:
Alex Jivin <alex.jivin@amd.com> Suggested-By:
Luben Tukov <luben.tuikov@amd.com> Reviewed-by:
Luben Tuikov <luben.tuikov@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 02 Jul, 2020 2 commits
-
-
Alex Deucher authored
Large clock values may overflow and show up as negative. Reported by prOMiNd on IRC. Acked-by:
Nirmoy Das <nirmoy.das@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Alex Jivin authored
Port functionality from the Radeon driver to support UVD and VCE power management. Signed-off-by:
Alex Jivin <alex.jivin@amd.com> Reviewed-by:
Alex Deucher <alexander.deucher@amd.com> Acked-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 01 Jul, 2020 6 commits
-
-
Wenhui Sheng authored
On Navi12 platform, node power_dpm_force_performance_level doesn't work correctly in one-VF mode with at least three smu messages not supported: SMU_MSG_SetSoftMaxByFreq SMU_MSG_SetSoftMinByFreq SMU_MSG_TransferTableDram2Smu Reviewed-by:
Kevin Wang <kevin1.wang@amd.com> Signed-off-by:
Wenhui Sheng <Wenhui.Sheng@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Colin Ian King authored
The variable ret is being initialized with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by:
Colin Ian King <colin.king@canonical.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Alex Deucher authored
The call to pm_runtime_get_sync increments the counter even in case of failure, leading to incorrect ref count. In case of failure, decrement the ref count before returning. Reviewed-by:
Evan Quan <evan.quan@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Alex Deucher authored
Add rename the gpu busy percentage for consistency and add the mem busy percentage documentation. Reviewed-by:
Evan Quan <evan.quan@amd.com> Reviewed-by:
Nirmoy Das <nirmoy.das@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Alex Deucher authored
Vega10 and previous asics use one interface, vega20 and newer use another. Reviewed-by:
Evan Quan <evan.quan@amd.com> Acked-by:
Nirmoy Das <nirmoy.das@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Evan Quan authored
Drop unused APIs, variables and argument. Signed-off-by:
Evan Quan <evan.quan@amd.com> Reviewed-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 17 Jun, 2020 2 commits
-
-
Alex Deucher authored
Add rename the gpu busy percentage for consistency and add the mem busy percentage documentation. Reviewed-by:
Evan Quan <evan.quan@amd.com> Reviewed-by:
Nirmoy Das <nirmoy.das@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Alex Deucher authored
Vega10 and previous asics use one interface, vega20 and newer use another. Reviewed-by:
Evan Quan <evan.quan@amd.com> Acked-by:
Nirmoy Das <nirmoy.das@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 02 Jun, 2020 1 commit
-
-
Kent Russell authored
Add support for unique_id and serial_number, as these are now the same value, and will be for future ASICs as well. v2: Explicitly create unique_id only for VG10/20/ARC v3: Change set_unique_id to get_unique_id for clarity Signed-off-by:
Kent Russell <kent.russell@amd.com> Reviewed-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 29 May, 2020 3 commits
-
-
Evan Quan authored
User can check and set the enablement of throttling logging and the interval between each logging. V2: simplify the sysfs interface(no string parsing) V3: add proper lock protection on updating throttling_logging_rs.interval V4: documentation cosmetic per Luben's suggestion Signed-off-by:
Evan Quan <evan.quan@amd.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Reviewed-by:
Luben Tuikov <luben.tuikov@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Alex Deucher authored
Return an error for sysfs and debugfs power interfaces during gpu reset and suspend. Prevents access to the hw while it may be in an unusable state. v2: squash in fix to drop suspend check Acked-by:
Nirmoy Das <nirmoy.das@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Alex Deucher authored
Return an error for sysfs and debugfs power interfaces during gpu reset and suspend. Prevents access to the hw while it may be in an unusable state. v2: squash in fix to drop suspend check Acked-by:
Nirmoy Das <nirmoy.das@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 26 May, 2020 1 commit
-
-
Kevin Wang authored
the origin design will use varible of "attr->states" to save node supported states on current gpu device, but for multi gpu device, when probe second gpu device, the driver will check attribute node states from previous gpu device wthether to create attribute node. it will cause other gpu device create attribute node faild. 1. add member attr_list into amdgpu_device to link supported device attribute node. 2. add new structure "struct amdgpu_device_attr_entry{}" to track device attribute state. 3. drop member "states" from amdgpu_device_attr. v2: 1. move "attr_list" into amdgpu_pm and rename to "pm_attr_list". 2. refine create & remove device node functions parameter. fix: drm/amdgpu: optimize amdgpu device attribute code Signed-off-by:
Kevin Wang <kevin1.wang@amd.com> Reviewed-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 22 May, 2020 3 commits
-
-
Alex Deucher authored
Add some APU flags to simplify handling of different APU variants. It's easier to understand the special cases if we use names flags rather than checking device ids and silicon revisions. v2: rebase on latest code Acked-by:
Evan Quan <evan.quan@amd.com> Acked-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
chen gong authored
[Problem description] 1. Boot up picasso platform, launches desktop, Don't do anything (APU enter into "gfxoff" state) 2. Remote login to platform using SSH, then type the command line: sudo su -c "echo manual > /sys/class/drm/card0/device/power_dpm_force_performance_level" sudo su -c "echo 2 > /sys/class/drm/card0/device/pp_dpm_sclk" (fix SCLK to 1400MHz) 3. Move the mouse around in Window 4. Phenomenon : The screen frozen Tester will switch sclk level during glmark2 run time. APU will enter "gfxoff" state intermittently during glmark2 run time. The system got hanged if fix GFXCLK to 1400MHz when APU is in "gfxoff" state. [Debug] 1. Fix SCLK to X MHz 1400: screen frozen, screen black, then OS will reboot. 1300: screen frozen. 1200: screen frozen, screen black. 1100: screen frozen, screen black, then OS will reboot. 1000: screen frozen, screen black. 900: screen frozen, screen black, then OS will reboot. 800: Situation Nomal, issue disappear. 700: Situation Nomal, issue disappear. 2. SBIOS setting: AMD CBS --> SMU Debug Options -->SMU Debug --> "GFX DLDO Psm Margin Control": 50 : Situation Nomal, issue disappear. 45 : Situation Nomal, issue disappear. 40 : Situation Nomal, issue disappear. 35 : Situation Nomal, issue disappear. 30 : screen black. 25 : screen frozen, then blurred screen. 20 : screen frozen. 15 : screen black. 10 : screen frozen. 5 : screen frozen, then blurred screen. 3. Disable GFXOFF feature Situation Nomal, issue disappear. [Why] Through a period of time debugging with Sys Eng team and SMU team, Sys Eng team said this is voltage/frequency marginal issue not a F/W or H/W bug. This experiment proves that default targetPsm [for f=1400MHz] is not sufficient when GFXOFF is enabled on Picasso. SMU team think it is an odd test conditions to force sclk="1400MHz" when GPU is in "gfxoff" state,then wake up the GFX. SCLK should be in the "lowest frequency" when gfxoff. [How] Disable gfxoff when setting manual mode. Enable gfxoff when setting other mode(exiting manual mode) again. By the way, from the user point of view, now that user switch to manual mode and force SCLK Frequency, he don't want SCLK be controlled by workload.It becomes meaningless to "switch to manual mode" if APU enter "gfxoff" due to lack of workload at this point. Tips: Same issue observed on Raven. Signed-off-by:
chen gong <curry.gong@amd.com> Reviewed-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Alex Deucher authored
Fix typos that prevented them from showing up. v2: switch other files in addition to pp_clk_voltage Fixes: 4e01847c ("drm/amdgpu: optimize amdgpu device attribute code") Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1150Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> Acked-by:
Evan Quan <evan.quan@amd.com>
-
- 21 May, 2020 3 commits
-
-
Alex Deucher authored
1. Initialize the counters to 0 in case the callback fails to initialize them. 2. The counters don't exist on APUs so return an error for them. 3. Return an error if the callback doesn't exist. Reviewed-by:
Yong Zhao <Yong.Zhao@amd.com> Reviewed-By:
Kent Russell <kent.russell@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Dan Carpenter authored
This loop in the error handling code should start a "i - 1" and end at "i == 0". Currently it starts a "i" and ends at "i == 1". The result is that it removes one attribute that wasn't created yet, and leaks the zeroeth attribute. Fixes: 4e01847c ("drm/amdgpu: optimize amdgpu device attribute code") Acked-by:
Michael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by:
Christian König <christian.koenig@amd.com> Reviewed-by:
Kevin Wang <kevin1.wang@amd.com> Signed-off-by:
Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Kevin Wang authored
the amdgpu device attribute node will be created accordding to sriov vf mode at runtime. cleanup unnecessary sriov check in attribute operation function. Signed-off-by:
Kevin Wang <kevin1.wang@amd.com> Reviewed-by:
Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 18 May, 2020 1 commit
-
-
Kevin Wang authored
unified amdgpu device attribute node functions: 1. add some helper functions to create amdgpu device attribute node. 2. create device node according to device attr flags on different VF mode. 3. rename some functions name to adapt a new interface. v2: 1. remove ATTR_STATE_DEAD, ATTR_STATE_ALIVE enum. 2. rename callback function perform to attr_update. 3. modify some variable names Signed-off-by:
Kevin Wang <kevin1.wang@amd.com> Reviewed-by:
Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 24 Apr, 2020 1 commit
-
-
Monk Liu authored
Signed-off-by:
Monk Liu <Monk.Liu@amd.com> Acked-by:
Yintian Tao <yttao@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 23 Apr, 2020 1 commit
-
-
limingyu authored
For chip like CHIP_OLAND with si enabled(amdgpu.si_support=1), the amdgpu will expose pp_num_states to the /sys directory. In this moment, read the pp_num_states file will excute the amdgpu_get_pp_num_states func. In our case, the data hasn't been initialized, so the kernel will access some ilegal address, trigger the segmentfault and system will reboot soon: uos@uos-PC:~$ cat /sys/devices/pci0000\:00/0000\:00\:00.0/0000\:01\:00 .0/pp_num_states Message from syslogd@uos-PC at Apr 22 09:26:20 ... kernel:[ 82.154129] Internal error: Oops: 96000004 [#1] SMP This patch aims to fix this problem, avoid that reading file triggers the kernel sementfault. Signed-off-by:
limingyu <limingyu@uniontech.com> Signed-off-by:
zhoubinbin <zhoubinbin@uniontech.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 09 Apr, 2020 1 commit
-
-
Aaron Ma authored
On ARCTURUS and RENOIR, powerplay is not supported yet. When plug in or unplug power jack, ACPI event will issue. Then kernel NULL pointer BUG will be triggered. Check for NULL pointers before calling. Signed-off-by:
Aaron Ma <aaron.ma@canonical.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 03 Apr, 2020 1 commit
-
-
Aaron Ma authored
On ARCTURUS and RENOIR, powerplay is not supported yet. When plug in or unplug power jack, ACPI event will issue. Then kernel NULL pointer BUG will be triggered. Check for NULL pointers before calling. Signed-off-by:
Aaron Ma <aaron.ma@canonical.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
-
- 25 Mar, 2020 1 commit
-
-
Alex Deucher authored
For boards that do not support automatic AC/DC transitions in firmware, manually tell the firmware when the status changes. Bug: https://gitlab.freedesktop.org/drm/amd/issues/1043Reviewed-by:
Evan Quan <evan.quan@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 26 Feb, 2020 1 commit
-
-
Alex Deucher authored
In order to remove the load and unload drm callbacks, we need to reorder the init sequence to move all the drm debugfs file handling. Do this for pm. Tested-by:
Thomas Zimmermann <tzimmermann@suse.de> Acked-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 16 Jan, 2020 1 commit
-
-
Alex Deucher authored
count is size_t so don't use negative values. Reviewed-by:
Evan Quan <evan.quan@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 14 Jan, 2020 1 commit
-
-
Alex Deucher authored
If power management sysfs or debugfs files are accessed, power up the GPU when necessary. Reviewed-by:
Evan Quan <evan.quan@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 07 Jan, 2020 1 commit
-
-
Evan Quan authored
Provided an unified entry point. And fixed the confusing that the API usage is conflict with what the naming implies. Signed-off-by:
Evan Quan <evan.quan@amd.com> Reviewed-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
- 11 Dec, 2019 1 commit
-
-
Yintian Tao authored
Originally, due to the restriction from PSP and SMU, VF has to send message to hypervisor driver to handle powerplay change which is complicated and redundant. Currently, SMU and PSP can support VF to directly handle powerplay change by itself. Therefore, the old code about the handshake between VF and PF to handle powerplay will be removed and VF will use new the registers below to handshake with SMU. mmMP1_SMN_C2PMSG_101: register to handle SMU message mmMP1_SMN_C2PMSG_102: register to handle SMU parameter mmMP1_SMN_C2PMSG_103: register to handle SMU response v2: remove module parameter pp_one_vf v3: fix the parens v4: forbid vf to change smu feature v5: use hwmon_attributes_visible to skip sepicified hwmon atrribute v6: change skip condition at vega10_copy_table_to_smc Signed-off-by:
Yintian Tao <yttao@amd.com> Acked-by:
Evan Quan <evan.quan@amd.com> Reviewed-by:
Kenneth Feng <kenneth.feng@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-