1. 19 Apr, 2022 4 commits
    • amdgpu/pm: Clarify documentation of error handling in send_smc_mesg · f24044bd
      Darren Powell authored
      Clarify the smu_cmn_send_smc_msg_with_param documentation to mention that
      there are two cases in which messages are silently dropped with no error returned.
      These cases occur in unusual situations where either:
       1. the message type is not allowed to a virtual GPU, or
       2. a PCI recovery is underway and the HW is not yet in sync with the SW
      
      For more details see
       commit 4ea5081c ("drm/amd/powerplay: enable SMC message filter")
       commit bf36b52e ("drm/amdgpu: Avoid accessing HW when suspending SW state")
      
      (v2)
        Reworked with suggestions from Luben & Paul
      
      (v3)
        Updated wording as per Luben's feedback
        Corrected error stating that all messages are denied on a virtual GPU
        (each GPU has a mask of which messages are allowed)
      Signed-off-by: Darren Powell <darren.powell@amd.com>
      Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu/pm: fix the null pointer while the smu is disabled · eea5c7b3
      Huang Rui authored
      It needs to check whether pp_funcs is initialized when releasing the
      context; otherwise a NULL pointer dereference panic is triggered when the
      software SMU is not enabled.
      
      [ 1109.404555] BUG: kernel NULL pointer dereference, address: 0000000000000078
      [ 1109.404609] #PF: supervisor read access in kernel mode
      [ 1109.404638] #PF: error_code(0x0000) - not-present page
      [ 1109.404657] PGD 0 P4D 0
      [ 1109.404672] Oops: 0000 [#1] PREEMPT SMP NOPTI
      [ 1109.404701] CPU: 7 PID: 9150 Comm: amdgpu_test Tainted: G           OEL    5.16.0-custom #1
      [ 1109.404732] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [ 1109.404765] RIP: 0010:amdgpu_dpm_force_performance_level+0x1d/0x170 [amdgpu]
      [ 1109.405109] Code: 5d c3 44 8b a3 f0 80 00 00 eb e5 66 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 08 4c 8b b7 f0 7d 00 00 <49> 83 7e 78 00 0f 84 f2 00 00 00 80 bf 87 80 00 00 00 48 89 fb 0f
      [ 1109.405176] RSP: 0018:ffffaf3083ad7c20 EFLAGS: 00010282
      [ 1109.405203] RAX: 0000000000000000 RBX: ffff9796b1c14600 RCX: 0000000002862007
      [ 1109.405229] RDX: ffff97968591c8c0 RSI: 0000000000000001 RDI: ffff9796a3700000
      [ 1109.405260] RBP: ffffaf3083ad7c50 R08: ffffffff9897de00 R09: ffff979688d9db60
      [ 1109.405286] R10: 0000000000000000 R11: ffff979688d9db90 R12: 0000000000000001
      [ 1109.405316] R13: ffff9796a3700000 R14: 0000000000000000 R15: ffff9796a3708fc0
      [ 1109.405345] FS:  00007ff055cff180(0000) GS:ffff9796bfdc0000(0000) knlGS:0000000000000000
      [ 1109.405378] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1109.405400] CR2: 0000000000000078 CR3: 000000000a394000 CR4: 00000000000506e0
      [ 1109.405434] Call Trace:
      [ 1109.405445]  <TASK>
      [ 1109.405456]  ? delete_object_full+0x1d/0x20
      [ 1109.405480]  amdgpu_ctx_set_stable_pstate+0x7c/0xa0 [amdgpu]
      [ 1109.405698]  amdgpu_ctx_fini.part.0+0xcb/0x100 [amdgpu]
      [ 1109.405911]  amdgpu_ctx_do_release+0x71/0x80 [amdgpu]
      [ 1109.406121]  amdgpu_ctx_ioctl+0x52d/0x550 [amdgpu]
      [ 1109.406327]  ? _raw_spin_unlock+0x1a/0x30
      [ 1109.406354]  ? drm_gem_handle_delete+0x81/0xb0 [drm]
      [ 1109.406400]  ? amdgpu_ctx_get_entity+0x2c0/0x2c0 [amdgpu]
      [ 1109.406609]  drm_ioctl_kernel+0xb6/0x140 [drm]
      Signed-off-by: Huang Rui <ray.huang@amd.com>
      Reviewed-by: Aaron Liu <aaron.liu@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdkfd: only allow heavy-weight TLB flush on some ASICs for SVM too · 36bf9321
      Lang Yu authored
      The idea is from
      commit a50fe707 ("drm/amdkfd: Only apply heavy-weight TLB flush on Aldebaran")
      and
      commit f61c40c0 ("drm/amdkfd: enable heavy-weight TLB flush on Arcturus").
      
      At the moment, a heavy-weight TLB flush could cause problems on ASICs other
      than Aldebaran and Arcturus.
      
      A simple hipMallocManaged/hipFree program could trigger this issue.
      
      [   97.787657] amdgpu 0000:01:00.0: amdgpu: wait for kiq fence error: 0.
      [  106.868758] amdgpu: qcm fence wait loop timeout expired
      [  106.868966] amdgpu: The cp might be in an unrecoverable state due to an unsuccessful queues preemption
      [  106.869203] amdgpu: Failed to evict process queues
      [  106.869261] amdgpu: Failed to quiesce KFD
      Signed-off-by: Lang Yu <Lang.Yu@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdkfd: move kfd_flush_tlb_after_unmap into kfd_priv.h · 459ccca5
      Lang Yu authored
      To make kfd_flush_tlb_after_unmap visible in kfd_svm.c, move it into
      kfd_priv.h and change it to an inline function.
      Signed-off-by: Lang Yu <Lang.Yu@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  2. 14 Apr, 2022 5 commits
  3. 13 Apr, 2022 6 commits
  4. 12 Apr, 2022 15 commits
  5. 11 Apr, 2022 10 commits