1. 20 Mar, 2024 11 commits
    • drm/amdgpu: Skip access PF-only registers on gfx10/gfxhub2_1 under SRIOV · 56b30ac8
      ZhenGuo Yin authored
      [Why]
      The RLCG interface returns an "out-of-range" error under an SRIOV VF when
      accessing PF-only registers.
      
      [How]
      Skip access to PF-only registers on gfx10/gfxhub2_1 under SRIOV.
      Acked-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
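The "skip on VF" idea in the commit above can be illustrated with a small userspace model. This is only a sketch: `struct fake_dev`, `is_sriov_vf`, and `program_pf_only_reg` are hypothetical names invented for the illustration, not amdgpu symbols, and the real driver goes through its own register accessors.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a device; this is NOT the amdgpu device struct. */
struct fake_dev {
	bool is_sriov_vf;     /* true when running as an SR-IOV virtual function */
	uint32_t pf_only_reg; /* models a register that only the PF may access */
	unsigned int writes;  /* number of register writes actually issued */
};

/*
 * Program a PF-only register, skipping the access entirely on a VF.
 * Returns true if the register was written, false if the access was skipped.
 */
static bool program_pf_only_reg(struct fake_dev *dev, uint32_t value)
{
	if (dev->is_sriov_vf)
		return false; /* on a VF the access would be rejected as out-of-range */

	dev->pf_only_reg = value;
	dev->writes++;
	return true;
}
```

Calling `program_pf_only_reg()` on a device with `is_sriov_vf` set leaves `writes` at zero, which is the behaviour the commit describes for PF-only registers under SRIOV.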
    • drm/amdgpu: Init zone device and drm client after mode-1 reset on reload · f679fd60
      Ahmad Rehman authored
      In a passthrough environment, when amdgpu is reloaded after unload, a mode-1
      reset is triggered after the necessary IPs are initialized. That init does not
      include KFD, and KFD init waits until the reset is completed. KFD init
      is called in the reset handler, but in this case the zone device and
      drm client are not initialized, so an app can trigger a kernel panic.
      
      v2: Remove the init KFD condition from amdgpu_amdkfd_drm_client_create,
      as the previous version could create the DRM client twice.
      
      v3: The v2 patch results in an SDMA engine hang, because DRM open causes a VM
      clear to be submitted to SDMA before SDMA init. Add the condition in drm client
      creation, on top of v1, to guard against the drm client being created multiple times.
      Signed-off-by: Ahmad Rehman <Ahmad.Rehman@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
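The v3 note above describes guarding client creation so that a second call (for example from the reset handler) does nothing. A minimal sketch of that kind of guard follows; `struct fake_adev`, `client_created`, and `fake_client_create` are hypothetical names for illustration, and the actual condition used in the patch may differ.

```c
#include <stdbool.h>

/* Hypothetical device state; not the real amdgpu structure. */
struct fake_adev {
	bool client_created;  /* set once the client exists */
	unsigned int creates; /* how many times creation really ran */
};

/*
 * Create the client at most once. A repeated call is treated as success
 * and leaves the existing client untouched. Always returns 0 in this model.
 */
static int fake_client_create(struct fake_adev *adev)
{
	if (adev->client_created)
		return 0; /* already created, nothing to do */

	adev->client_created = true;
	adev->creates++;
	return 0;
}
```

With this shape, both the normal init path and the reset path can call the function without creating a second client.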
    • drm/amdgpu: amdgpu_ttm_gart_bind set gtt bound flag · 6c6064cb
      Philip Yang authored
      Otherwise, after the GTT bo is released, the GTT and gart space is freed
      but amdgpu_ttm_backend_unbind will not clear the gart page table entry,
      leaving a valid mapping entry pointing to the stale system page. Then
      if the GPU accesses the gart address by mistake, it will read an undefined
      value instead of page faulting, which makes the real issue harder to debug
      and reproduce.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Philip Yang <Philip.Yang@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
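The failure mode described above (unbind skipping the page table cleanup because the bound flag was never set) can be modelled in a few lines. This is a sketch under stated assumptions: `struct fake_gtt`, `fake_bind`, `fake_unbind`, and the 4-entry table are all hypothetical, and the `set_bound` parameter exists only to compare the behaviour with and without the flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_PTE_VALID 1ull
#define FAKE_NUM_PTES  4

/* Hypothetical model of a small GART range plus its bookkeeping flag. */
struct fake_gtt {
	uint64_t pte[FAKE_NUM_PTES];
	bool bound;
};

/* Write valid entries; set_bound models whether the bind path records the flag. */
static void fake_bind(struct fake_gtt *gtt, uint64_t page_addr, bool set_bound)
{
	for (size_t i = 0; i < FAKE_NUM_PTES; i++)
		gtt->pte[i] = (page_addr + i * 4096) | FAKE_PTE_VALID;
	if (set_bound)
		gtt->bound = true;
}

/* Clear the entries only if the range was recorded as bound. */
static void fake_unbind(struct fake_gtt *gtt)
{
	if (!gtt->bound)
		return; /* nothing recorded, so stale entries stay valid */

	for (size_t i = 0; i < FAKE_NUM_PTES; i++)
		gtt->pte[i] = 0;
	gtt->bound = false;
}

/* Count entries that still look valid to the (modelled) GPU. */
static size_t fake_count_valid(const struct fake_gtt *gtt)
{
	size_t n = 0;

	for (size_t i = 0; i < FAKE_NUM_PTES; i++)
		if (gtt->pte[i] & FAKE_PTE_VALID)
			n++;
	return n;
}
```

Binding with `set_bound` false and then unbinding leaves all four entries valid, which corresponds to the stale mapping the commit message warns about; binding with the flag set lets unbind clear them.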
    • drm/amdgpu/vcn: enable vcn1 fw load for VCN 4_0_6 · 6a7cbbc2
      Saleemkhan Jamadar authored
      v1 - update the fw header for each vcn instance (Veera)
      
      VCN1 has a different FW binary in VCN v4_0_6.
      Add changes to load the VCN1 fw binary.
      Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
      Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
      Reviewed-by: Leo Liu <leo.liu@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amd/display: Enable DML2 debug flags · a568c494
      Aurabindo Pillai authored
      [WHY & HOW]
      Enable DML2 related debug config options in DM for testing purposes.
      Reviewed-by: Chaitanya Dhere <chaitanya.dhere@amd.com>
      Acked-by: Alex Hung <alex.hung@amd.com>
      Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
      Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amd/display: Change default size for dummy plane in DML2 · 75eb8f7d
      Swapnil Patel authored
      [WHY & HOW]
      Currently, to map dc states into dml_display_cfg,
      we create a dummy plane if the stream doesn't have any planes
      attached to it. This dummy plane uses the max addressable width and height.
      This results in certain mode validations failing when they shouldn't.
      
      Cc: Mario Limonciello <mario.limonciello@amd.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Chaitanya Dhere <chaitanya.dhere@amd.com>
      Acked-by: Alex Hung <alex.hung@amd.com>
      Signed-off-by: Swapnil Patel <swapnil.patel@amd.com>
      Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu: Reset IH OVERFLOW_EN bit for IH 7.0 · c6ba60af
      Friedrich Vock authored
      IH 7.0 support landed shortly after the original patch for resetting the
      bit on all other generations, but without that patch applied.
      
      Fixes: 12443fc5 ("drm/amdgpu: Add ih v7_0 ip block support")
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu: fix mmhub client id out-of-bounds access · 6540ff64
      Lang Yu authored
      Properly handle cid 0x140.
      
      Fixes: aba2be41 ("drm/amdgpu: add mmhub 3.3.0 support")
      Signed-off-by: Lang Yu <Lang.Yu@amd.com>
      Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
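The commit above gives few details, so the following is only a generic sketch of a bounds-checked client-id lookup, the usual way to avoid an out-of-bounds read when hardware reports an id larger than the name table. The table contents and the names `fake_client_names` and `fake_client_name` are hypothetical; the real patch may handle cid 0x140 differently.

```c
#include <stddef.h>

#define FAKE_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical client-name table; real tables are larger and hardware-specific. */
static const char *const fake_client_names[] = {
	"CLIENT_A",
	"CLIENT_B",
	"CLIENT_C",
};

/* Return the name for a client id, or "unknown" when the id is outside the table. */
static const char *fake_client_name(unsigned int cid)
{
	if (cid < FAKE_ARRAY_SIZE(fake_client_names) && fake_client_names[cid])
		return fake_client_names[cid];
	return "unknown";
}
```

An id such as 0x140 falls outside this three-entry table and yields "unknown" rather than reading past the end of the array.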
    • drm/amdgpu: fix use-after-free bug · 22207fd5
      Vitaly Prosyak authored
      The bug can be triggered by sending a single amdgpu_gem_userptr_ioctl
      to the AMDGPU DRM driver on any ASICs with an invalid address and size.
      The bug was reported by Joonkyo Jung <joonkyoj@yonsei.ac.kr>.
      For example the following code:
      
      static void Syzkaller1(int fd)
      {
      	struct drm_amdgpu_gem_userptr arg;
      	int ret;
      
      	arg.addr = 0xffffffffffff0000;
      	arg.size = 0x80000000; /* 2 GiB */
      	arg.flags = 0x7;
      	ret = drmIoctl(fd, 0xc1186451/*amdgpu_gem_userptr_ioctl*/, &arg);
      }
      
      Because the address and size are not valid, there is a failure in
      amdgpu_hmm_register->mmu_interval_notifier_insert->__mmu_interval_notifier_insert->
      check_shl_overflow, but even after the amdgpu_hmm_register failure we still call
      amdgpu_hmm_unregister in amdgpu_gem_object_free, which causes access to a bad address.
      The following stack trace is produced when the issue is reproduced with KASAN enabled:
      
      [  +0.000014] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020
      [  +0.000009] RIP: 0010:mmu_interval_notifier_remove+0x327/0x340
      [  +0.000017] Code: ff ff 49 89 44 24 08 48 b8 00 01 00 00 00 00 ad de 4c 89 f7 49 89 47 40 48 83 c0 22 49 89 47 48 e8 ce d1 2d 01 e9 32 ff ff ff <0f> 0b e9 16 ff ff ff 4c 89 ef e8 fa 14 b3 ff e9 36 ff ff ff e8 80
      [  +0.000014] RSP: 0018:ffffc90002657988 EFLAGS: 00010246
      [  +0.000013] RAX: 0000000000000000 RBX: 1ffff920004caf35 RCX: ffffffff8160565b
      [  +0.000011] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: ffff8881a9f78260
      [  +0.000010] RBP: ffffc90002657a70 R08: 0000000000000001 R09: fffff520004caf25
      [  +0.000010] R10: 0000000000000003 R11: ffffffff8161d1d6 R12: ffff88810e988c00
      [  +0.000010] R13: ffff888126fb5a00 R14: ffff88810e988c0c R15: ffff8881a9f78260
      [  +0.000011] FS:  00007ff9ec848540(0000) GS:ffff8883cc880000(0000) knlGS:0000000000000000
      [  +0.000012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  +0.000010] CR2: 000055b3f7e14328 CR3: 00000001b5770000 CR4: 0000000000350ef0
      [  +0.000010] Call Trace:
      [  +0.000006]  <TASK>
      [  +0.000007]  ? show_regs+0x6a/0x80
      [  +0.000018]  ? __warn+0xa5/0x1b0
      [  +0.000019]  ? mmu_interval_notifier_remove+0x327/0x340
      [  +0.000018]  ? report_bug+0x24a/0x290
      [  +0.000022]  ? handle_bug+0x46/0x90
      [  +0.000015]  ? exc_invalid_op+0x19/0x50
      [  +0.000016]  ? asm_exc_invalid_op+0x1b/0x20
      [  +0.000017]  ? kasan_save_stack+0x26/0x50
      [  +0.000017]  ? mmu_interval_notifier_remove+0x23b/0x340
      [  +0.000019]  ? mmu_interval_notifier_remove+0x327/0x340
      [  +0.000019]  ? mmu_interval_notifier_remove+0x23b/0x340
      [  +0.000020]  ? __pfx_mmu_interval_notifier_remove+0x10/0x10
      [  +0.000017]  ? kasan_save_alloc_info+0x1e/0x30
      [  +0.000018]  ? srso_return_thunk+0x5/0x5f
      [  +0.000014]  ? __kasan_kmalloc+0xb1/0xc0
      [  +0.000018]  ? srso_return_thunk+0x5/0x5f
      [  +0.000013]  ? __kasan_check_read+0x11/0x20
      [  +0.000020]  amdgpu_hmm_unregister+0x34/0x50 [amdgpu]
      [  +0.004695]  amdgpu_gem_object_free+0x66/0xa0 [amdgpu]
      [  +0.004534]  ? __pfx_amdgpu_gem_object_free+0x10/0x10 [amdgpu]
      [  +0.004291]  ? do_syscall_64+0x5f/0xe0
      [  +0.000023]  ? srso_return_thunk+0x5/0x5f
      [  +0.000017]  drm_gem_object_free+0x3b/0x50 [drm]
      [  +0.000489]  amdgpu_gem_userptr_ioctl+0x306/0x500 [amdgpu]
      [  +0.004295]  ? __pfx_amdgpu_gem_userptr_ioctl+0x10/0x10 [amdgpu]
      [  +0.004270]  ? srso_return_thunk+0x5/0x5f
      [  +0.000014]  ? __this_cpu_preempt_check+0x13/0x20
      [  +0.000015]  ? srso_return_thunk+0x5/0x5f
      [  +0.000013]  ? sysvec_apic_timer_interrupt+0x57/0xc0
      [  +0.000020]  ? srso_return_thunk+0x5/0x5f
      [  +0.000014]  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
      [  +0.000022]  ? drm_ioctl_kernel+0x17b/0x1f0 [drm]
      [  +0.000496]  ? __pfx_amdgpu_gem_userptr_ioctl+0x10/0x10 [amdgpu]
      [  +0.004272]  ? drm_ioctl_kernel+0x190/0x1f0 [drm]
      [  +0.000492]  drm_ioctl_kernel+0x140/0x1f0 [drm]
      [  +0.000497]  ? __pfx_amdgpu_gem_userptr_ioctl+0x10/0x10 [amdgpu]
      [  +0.004297]  ? __pfx_drm_ioctl_kernel+0x10/0x10 [drm]
      [  +0.000489]  ? srso_return_thunk+0x5/0x5f
      [  +0.000011]  ? __kasan_check_write+0x14/0x20
      [  +0.000016]  drm_ioctl+0x3da/0x730 [drm]
      [  +0.000475]  ? __pfx_amdgpu_gem_userptr_ioctl+0x10/0x10 [amdgpu]
      [  +0.004293]  ? __pfx_drm_ioctl+0x10/0x10 [drm]
      [  +0.000506]  ? __pfx_rpm_resume+0x10/0x10
      [  +0.000016]  ? srso_return_thunk+0x5/0x5f
      [  +0.000011]  ? __kasan_check_write+0x14/0x20
      [  +0.000010]  ? srso_return_thunk+0x5/0x5f
      [  +0.000011]  ? _raw_spin_lock_irqsave+0x99/0x100
      [  +0.000015]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
      [  +0.000014]  ? srso_return_thunk+0x5/0x5f
      [  +0.000013]  ? srso_return_thunk+0x5/0x5f
      [  +0.000011]  ? srso_return_thunk+0x5/0x5f
      [  +0.000011]  ? preempt_count_sub+0x18/0xc0
      [  +0.000013]  ? srso_return_thunk+0x5/0x5f
      [  +0.000010]  ? _raw_spin_unlock_irqrestore+0x27/0x50
      [  +0.000019]  amdgpu_drm_ioctl+0x7e/0xe0 [amdgpu]
      [  +0.004272]  __x64_sys_ioctl+0xcd/0x110
      [  +0.000020]  do_syscall_64+0x5f/0xe0
      [  +0.000021]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
      [  +0.000015] RIP: 0033:0x7ff9ed31a94f
      [  +0.000012] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
      [  +0.000013] RSP: 002b:00007fff25f66790 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [  +0.000016] RAX: ffffffffffffffda RBX: 000055b3f7e133e0 RCX: 00007ff9ed31a94f
      [  +0.000012] RDX: 000055b3f7e133e0 RSI: 00000000c1186451 RDI: 0000000000000003
      [  +0.000010] RBP: 00000000c1186451 R08: 0000000000000000 R09: 0000000000000000
      [  +0.000009] R10: 0000000000000008 R11: 0000000000000246 R12: 00007fff25f66ca8
      [  +0.000009] R13: 0000000000000003 R14: 000055b3f7021ba8 R15: 00007ff9ed7af040
      [  +0.000024]  </TASK>
      [  +0.000007] ---[ end trace 0000000000000000 ]---
      
      v2: Consolidate any error handling into amdgpu_hmm_register,
          which applies to kfd_bo as well. (Christian)
      v3: Improve syntax and comment (Christian)
      
      Cc: Christian Koenig <christian.koenig@amd.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Felix Kuehling <felix.kuehling@amd.com>
      Cc: Joonkyo Jung <joonkyoj@yonsei.ac.kr>
      Cc: Dokyung Song <dokyungs@yonsei.ac.kr>
      Cc: <jisoo.jang@yonsei.ac.kr>
      Cc: <yw9865@yonsei.ac.kr>
      Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
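The v2 note above says the error handling was consolidated into the register function. One way to picture that is a register routine that, on failure, leaves the object in a state the unregister path can recognise and skip. The sketch below is a userspace model only: `struct fake_bo`, `fake_register`, and `fake_unregister` are hypothetical names, and the overflow test is a simplified stand-in for the kernel's range check.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical notifier state embedded in a buffer object. */
struct fake_notifier {
	bool registered;
	uint64_t start;
	uint64_t last;
};

struct fake_bo {
	struct fake_notifier notifier;
};

/*
 * Register the range [addr, addr + size - 1]. On failure the notifier is
 * cleared so that a later unregister call is a harmless no-op.
 * Returns 0 on success, -1 if the range is empty or wraps around.
 */
static int fake_register(struct fake_bo *bo, uint64_t addr, uint64_t size)
{
	if (size == 0 || addr > UINT64_MAX - (size - 1)) {
		memset(&bo->notifier, 0, sizeof(bo->notifier));
		return -1;
	}

	bo->notifier.start = addr;
	bo->notifier.last = addr + size - 1;
	bo->notifier.registered = true;
	return 0;
}

/* Tear down the notifier; returns true only if something was really removed. */
static bool fake_unregister(struct fake_bo *bo)
{
	if (!bo->notifier.registered)
		return false; /* never registered, do not touch it */

	memset(&bo->notifier, 0, sizeof(bo->notifier));
	return true;
}
```

With the address and size from the reproducer (`0xffffffffffff0000` and `0x80000000`), `fake_register()` fails and the subsequent `fake_unregister()` in the free path does nothing, instead of operating on a notifier that was never inserted.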
    • drm/amdgpu: Handle duplicate BOs during process restore · 71b9d192
      Mukul Joshi authored
      In certain situations, some apps can import a BO multiple times
      (through IPC for example). To restore such processes successfully,
      we need to tell drm to ignore duplicate BOs.
      While at it, also add additional logging to prevent silent failures
      when process restore fails.
      Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
      Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
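Why duplicates matter during restore can be shown with a toy locking context: if the same BO appears twice in the list, a strict context reports an error on the second attempt, while a context told to ignore duplicates treats it as already handled. Everything here (`struct fake_lock_ctx`, `fake_lock_obj`, the fixed-size array) is a hypothetical model, not the DRM locking API.

```c
#include <stdbool.h>
#include <stddef.h>

#define FAKE_MAX_LOCKED 16

/* Hypothetical locking context that remembers which objects it holds. */
struct fake_lock_ctx {
	const void *locked[FAKE_MAX_LOCKED];
	size_t count;
	bool ignore_duplicates; /* tolerate the same object being locked twice */
};

/*
 * Lock one object. Returns 0 on success, -1 if the object is a duplicate
 * and duplicates are not ignored, or if the context is full.
 */
static int fake_lock_obj(struct fake_lock_ctx *ctx, const void *obj)
{
	for (size_t i = 0; i < ctx->count; i++) {
		if (ctx->locked[i] == obj)
			return ctx->ignore_duplicates ? 0 : -1;
	}

	if (ctx->count == FAKE_MAX_LOCKED)
		return -1;

	ctx->locked[ctx->count++] = obj;
	return 0;
}
```

A process that imported the same BO twice would fail restore with the strict context and succeed with `ignore_duplicates` set, with the object still recorded only once.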
    • drm/amd/display: Use freesync when `DRM_EDID_FEATURE_CONTINUOUS_FREQ` found · 2f14c0c8
      Mario Limonciello authored
      The monitor shipped with the Framework 16 supports VRR [1], but it's not
      being advertised.
      
      This is because the detailed timing block doesn't contain
      `EDID_DETAIL_MONITOR_RANGE` which amdgpu looks for to find min and max
      frequencies.  This check however is superfluous for this case because
      update_display_info() calls drm_get_monitor_range() to get these ranges
      already.
      
      So if the `DRM_EDID_FEATURE_CONTINUOUS_FREQ` EDID feature is found then
      turn on freesync without extra checks.
      
      v2: squash in fix from Harry
      
      Closes: https://www.reddit.com/r/framework/comments/1b4y2i5/no_variable_refresh_rate_on_the_framework_16_on/
      Closes: https://www.reddit.com/r/framework/comments/1b6vzcy/framework_16_variable_refresh_rate/
      Closes: https://community.frame.work/t/resolved-no-vrr-freesync-with-amd-version/42338
      Link: https://gist.github.com/superm1/e8fbacfa4d0f53150231d3a3e0a13faf
      Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
      Reviewed-by: Harry Wentland <harry.wentland@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
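The feature test the commit relies on can be sketched against a raw EDID base block. The layout used here is an assumption based on the EDID 1.4 base block (feature support byte at offset 24, continuous frequency in bit 0); the macro and function names are hypothetical and this is not the amdgpu code path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_EDID_BLOCK_SIZE           128
#define FAKE_EDID_FEATURES_OFFSET      24        /* assumed: feature support byte */
#define FAKE_EDID_FEATURE_CONT_FREQ    (1u << 0) /* assumed: continuous frequency bit */

/*
 * Report whether the base block advertises continuous frequency support.
 * Returns false for a missing or truncated block.
 */
static bool fake_edid_continuous_freq(const uint8_t *edid, size_t len)
{
	if (!edid || len < FAKE_EDID_BLOCK_SIZE)
		return false;

	return (edid[FAKE_EDID_FEATURES_OFFSET] & FAKE_EDID_FEATURE_CONT_FREQ) != 0;
}
```

A caller following the commit's logic would enable freesync when this returns true and take the min and max refresh rates from the range the DRM core has already parsed, rather than requiring a monitor range descriptor in the detailed timing block.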
  2. 11 Mar, 2024 1 commit
  3. 08 Mar, 2024 5 commits
  4. 07 Mar, 2024 19 commits
  5. 06 Mar, 2024 4 commits