- 13 Dec, 2021 40 commits
-
-
Mario Limonciello authored
There are a few places that this isn't checked that could potentially be a NULL pointer access. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Philip Yang authored
For userptr bo, if adev is not in IOMMU isolation mode, RAM direct map to GPU, multiple GPUs use same system memory dma mapping address, they can share the original mem->bo in attachment to reduce dma address array memory usage. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Philip Yang authored
If host and amdgpu IOMMU is not enabled or IOMMU is pass through mode, set adev->ram_is_direct_mapped flag which will be used to optimize memory usage for multi GPU mappings. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Rodrigo Siqueira authored
In the DC driver, we have multiple acronyms that are not obvious most of the time; the same idea is valid for amdgpu. This commit introduces a DC and amdgpu glossary in order to make it easier to navigate through our driver. Changes since V3: - Yann: Add new acronyms to amdgpu glossary - Daniel: Add link between dc and amdgpu glossary Changes since V2: - Add MMHUB Changes since V1: - Yann: Divide glossary based on driver context. - Alex: Make terms more consistent and update CPLIB - Add new acronyms to the glossary Reviewed-by: Yann Dirson <ydirson@free.fr> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Rodrigo Siqueira authored
This commit describes how DCN works by providing high-level diagrams with an explanation of each component. In particular, it details the Global Sync signals. Change since V2: - Add a comment about MMHUBBUB. Reviewed-by: Yann Dirson <ydirson@free.fr> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Rodrigo Siqueira authored
Introduce how to collect DTN log from debugfs. Reviewed-by: Yann Dirson <ydirson@free.fr> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Rodrigo Siqueira authored
Display core provides a feature that makes it easy for users to debug Pipe Split. This commit introduces how to use such a debug option. Reviewed-by: Yann Dirson <ydirson@free.fr> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Rodrigo Siqueira authored
Display core provides a feature that makes it easy for users to debug Multiple planes by enabling a visual notification at the bottom of each plane. This commit introduces how to use such a feature. Reviewed-by: Yann Dirson <ydirson@free.fr> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Rodrigo Siqueira authored
Display core documentation is not well organized, and it is hard to find information due to the lack of sections. This commit reorganizes the documentation layout, and it is preparation work for future changes. Changes since V1: - Christian: Group amdgpu documentation together. - Daniel: Drop redundant amdgpu prefix. - Jani: Create index pages. - Yann: Mirror display folder in the documentation. Reviewed-by: Yann Dirson <ydirson@free.fr> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Lang Yu authored
SMU firmware expects the driver maintains error context and doesn't interact with SMU any more when SMU errors occurred. That will aid in debugging SMU firmware issues. Add SMU debug option support for this request, it can be enabled or disabled via amdgpu_smu_debug debugfs file. Use a 32-bit mask to indicate corresponding debug modes. Currently, only one mode(HALT_ON_ERROR) is supported. When enabled, it brings hardware to a kind of halt state so that no one can touch it any more in the envent of SMU errors. The dirver interacts with SMU via sending messages. And threre are three ways to sending messages to SMU in current implementation. Handle them respectively as following: 1, smu_cmn_send_smc_msg_with_param() for normal timeout cases Halt on any error. 2, smu_cmn_send_msg_without_waiting()/smu_cmn_wait_for_response() for longer timeout cases Halt on errors apart from ETIME. Otherwise this way won't work. Let the user handle ETIME error in such a case. 3, smu_cmn_send_msg_without_waiting() for no waiting cases Halt on errors apart from ETIME. Otherwise second way won't work. == Command Guide == 1, enable HALT_ON_ERROR mode # echo 0x1 > /sys/kernel/debug/dri/0/amdgpu_smu_debug 2, disable HALT_ON_ERROR mode # echo 0x0 > /sys/kernel/debug/dri/0/amdgpu_smu_debug v5: - Use bit mask to allow more debug features.(Evan) - Use WRAN() instead of BUG().(Evan) v4: - Set to halt state instead of a simple hang.(Christian) v3: - Use debugfs_create_bool().(Christian) - Put variable into smu_context struct. - Don't resend command when timeout. v2: - Resend command when timeout.(Lijo) - Use debugfs file instead of module parameter. Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Lang Yu authored
It is useful to maintain error context when debugging SW/FW issues. Introduce amdgpu_device_halt() for this purpose. It will bring hardware to a kind of halt state, so that no one can touch it any more. Compare to a simple hang, the system will keep stable at least for SSH access. Then it should be trivial to inspect the hardware state and see what's going on. v2: - Set adev->no_hw_access earlier to avoid potential crashes.(Christian) Suggested-by: Christian Koenig <christian.koenig@amd.com> Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Christian Koenig <christian.koenig@amd.co> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Hawking Zhang authored
in case they are not avaiable in early phase Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Hawking Zhang authored
Leave this bit as hardware default setting Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Le Ma authored
should count on GC IP base address Signed-off-by: Le Ma <le.ma@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Hawking Zhang authored
read and authenticate ip discovery binary getting from vram first, if it is not valid, read and authenticate the one getting from file Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Hawking Zhang authored
To be used to check ip discovery binary signature Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Hawking Zhang authored
add _from_vram in the funciton name to diffrentiate the one used to read from file Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Hawking Zhang authored
To be used when ip_discovery binary is not carried by vbios Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Leslie Shi authored
Guest OS will setup VCN instance 1 which is disabled as an enabled instance and execute initialization work on it, but this causes VCN ib ring test failure on the disabled VCN instance during modprobe: amdgpu 0000:00:08.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 5 on hub 1 amdgpu 0000:00:08.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vcn_dec_0 (-110). amdgpu 0000:00:08.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vcn_enc_0.0 (-110). [drm:amdgpu_device_delayed_init_work_handler [amdgpu]] *ERROR* ib ring test failed (-110). v2: drop amdgpu_discovery_get_vcn_version and rename sriov_config to vcn_config v3: modify VCN's revision in SR-IOV and bare-metal Fixes: baf3f8f3 ("drm/amdgpu: handle SRIOV VCN revision parsing") Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Leslie Shi authored
Fix following warning in SRIOV during modprobe: amdgpu 0000:00:08.0: GFX9+ requires FB check based on format modifier WARNING: CPU: 0 PID: 1023 at drivers/gpu/drm/amd/amdgpu/amdgpu_display.c:1150 amdgpu_display_framebuffer_init+0x8e7/0xb40 [amdgpu] Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Jonathan Kim authored
This patch reverts the following: commit 48733b22 ("drm/amdkfd: add Navi2x to GWS init conditions") Disable GWS usage in default settings for now due to FW bugs. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Graham Sider authored
Initalize GWS on Navi2x with mec2_fw_version >= 0x42. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-and-tested-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Lang Yu authored
We found some headaches on ASICs don't need that, so remove that for them. Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Kevin Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Lang Yu authored
Currently, we don't find some neccesities to power on/off SDMA in SMU hw_init/fini(). It makes more sense in SDMA hw_init/fini(). Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Kevin Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Felix Kuehling authored
Hawaii support is mostly untested these days. ROCm user mode also depends on custom firmware for AQL packet processing, that was never pushed upstream due to quality regressions in graphics driver testing. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Felix Kuehling authored
If an existing SVM range overlaps an svm_range_set_attr call, we would normally split it in order to update only the overlapping part. However, if the attributes of the existing range would not be changed splitting it is unnecessary. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Felix Kuehling authored
The existing function doesn't compare the access bitmaps and flags. This can result in failure to update those attributes in existing ranges when all other attributes remained unchanged. Because the access and flags attributes modify only some bits in the respective bitmaps, we cannot compare them directly. Instead we need to check whether applying the attributes to a particular range would change the bitmaps. A PREFETCH_LOC attribute must always trigger a migration, even if the attribute value remains unchanged. E.g. if some pages were migrated due to a CPU page fault, a prefetch must still be executed to migrate pages back to VRAM. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Felix Kuehling authored
Add null-pointer check after the last svm_range_new call. This was originally reported by Zhou Qingyang <zhou1615@umn.edu> based on a static analyzer. To avoid duplicating the unwinding code from svm_range_handle_overlap, I merged the two functions into one. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Zhou Qingyang <zhou1615@umn.edu> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Philip Yang authored
Remove not unique timestamp WARNING as same timestamp interrupt happens on some chips, Drain fault need to wait for the processed_timestamp to be truly greater than the checkpoint or the ring to be empty to be sure no stale faults are handled. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1818Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Isabella Basso authored
This fixes the warning below by changing the prototype to a location that's actually included by the .c files that call amdgpu_kms_compat_ioctl: warning: no previous prototype for ‘amdgpu_kms_compat_ioctl’ [-Wmissing-prototypes] 37 | long amdgpu_kms_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) | ^~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Isabella Basso <isabbasso@riseup.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Isabella Basso authored
This fixes warnings caused by global functions lacking prototypes:, such as: warning: no previous prototype for 'dcn303_hw_sequencer_construct' [-Wmissing-prototypes] 12 | void dcn303_hw_sequencer_construct(struct dc *dc) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ... warning: no previous prototype for ‘amdgpu_has_atpx’ [-Wmissing-prototypes] 76 | bool amdgpu_has_atpx(void) { | ^~~~~~~~~~~~~~~ Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Isabella Basso <isabbasso@riseup.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Isabella Basso authored
This turns previously global functions into static, thus removing compile-time warnings such as: warning: no previous prototype for 'pm_set_resources_vi' [-Wmissing-prototypes] 113 | int pm_set_resources_vi(struct packet_manager *pm, uint32_t *buffer, | ^~~~~~~~~~~~~~~~~~~ Signed-off-by: Isabella Basso <isabbasso@riseup.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Isabella Basso authored
This turns previously global functions into static, thus removing compile-time warnings such as: warning: no previous prototype for 'amdgpu_vkms_output_init' [-Wmissing-prototypes] 399 | int amdgpu_vkms_output_init(struct drm_device *dev, | ^~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Isabella Basso <isabbasso@riseup.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Isabella Basso authored
This fixes various warnings relating to erroneous docstring syntax, of which some are listed below: warning: Function parameter or member 'adev' not described in 'amdgpu_atomfirmware_ras_rom_addr' ... warning: expecting prototype for amdgpu_atpx_validate_functions(). Prototype was for amdgpu_atpx_validate() instead ... warning: Excess function parameter 'mem' description in 'amdgpu_preempt_mgr_new' ... warning: Cannot understand * @kfd_get_cu_occupancy - Collect number of waves in-flight on this device ... warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst Signed-off-by: Isabella Basso <isabbasso@riseup.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Isabella Basso authored
Silences 166 compile-time warnings like: warning: 'UVD0_BASE' defined but not used [-Wunused-const-variable=] 129 | static const struct IP_BASE UVD0_BASE ={ { { { 0x00007800, 0x00007E00, 0, 0, 0 } }, | ^~~~~~~~~ warning: 'UMC0_BASE' defined but not used [-Wunused-const-variable=] 123 | static const struct IP_BASE UMC0_BASE ={ { { { 0x00014000, 0, 0, 0, 0 } }, | ^~~~~~~~~ Signed-off-by: Isabella Basso <isabbasso@riseup.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Zhigang Luo authored
For the ASIC has big FB, it need more time to clear FB during reset. This change extended SRIOV VF waiting reset completion timeout from 5s to 10s. Signed-off-by: Zhigang Luo <zhigang.luo@amd.com> Acked-by: Shaoyun Liu <shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Zhigang Luo authored
For SRIOV VF, the XGMI topology was not recovered after reset. This change added code to SRIOV VF reset function to update XGMI topology for SRIOV VF after reset. Signed-off-by: Zhigang Luo <zhigang.luo@amd.com> Reviewed-by: Shaoyun Liu <shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Zhigang Luo authored
For SRIOV VF, XGMI was not initialized in PSP during recover. This change added PSP XGMI initialization for SRIOV VF during recover. Signed-off-by: Zhigang Luo <zhigang.luo@amd.com> Reviewed-by: Shaoyun Liu <shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Zhigang Luo authored
On SRIOV, host driver can support FLR(function level reset) on individual VF within the hive which might bring the individual device back to normal without the necessary to execute the hive reset. If the FLR failed , host driver will trigger the hive reset, each guest VF will get reset notification before the real hive reset been executed. The VF device can handle the reset request individually in it's reset work handler. This change updated gpu recover sequence to skip reset other device in the same hive for SRIOV VF. Signed-off-by: Zhigang Luo <zhigang.luo@amd.com> Reviewed-by: Shaoyun Liu <shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Aurabindo Pillai authored
[Why] Allow for disabling non transparent mode of LTTPR for running tests. [How] Add a feature flag and set them during init sequence. The flags are already being used in DC. Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-