1. 19 Apr, 2022 6 commits
    • drm/amd/pm: fix double free in si_parse_power_table() · f3fa2bec
      Keita Suzuki authored
      In si_parse_power_table(), the array adev->pm.dpm.ps and its members
      are allocated. If the allocation of any member fails, the array itself
      is freed and the function returns an error code. However, the array is
      later freed again in si_dpm_fini(), which is called when
      si_parse_power_table() returns an error.
      
      This leads to a potential double free of the array adev->pm.dpm.ps, as
      well as a leak of its members, since the members are not freed in
      the allocation function and the array pointer is not set to NULL when
      freed. In addition, adev->pm.dpm.num_ps, which keeps track of the
      allocated array members, is not updated until all member allocations
      have finished successfully; this could also lead to either a use after
      free or an uninitialized variable access in si_dpm_fini().
      
      Fix this by postponing the free of the array until si_dpm_fini() and
      incrementing adev->pm.dpm.num_ps every time an array member is allocated.
      Signed-off-by: Keita Suzuki <keitasuzuki.park@sslab.ics.keio.ac.jp>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
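The ownership rule described in the commit above (only the teardown function frees, and the member count grows as each member is allocated) can be illustrated with a small userspace sketch. This is not the amdgpu code: `power_table`, `power_state`, `table_parse()`, `table_fini()` and the `fail_at` parameter are hypothetical stand-ins, and resetting the pointer to NULL in the teardown is an extra precaution of this sketch, not something the commit message states.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's power-state table. */
struct power_state {
    void *ps_priv;              /* per-state private data, allocated separately */
};

struct power_table {
    struct power_state *ps;     /* array of states */
    unsigned int num_ps;        /* number of entries whose ps_priv is allocated */
};

/* Single owner of all frees; safe to call after a partial or failed parse. */
void table_fini(struct power_table *t)
{
    assert(t != NULL);
    for (unsigned int i = 0; i < t->num_ps; i++)
        free(t->ps[i].ps_priv);
    free(t->ps);                /* free(NULL) is a no-op */
    t->ps = NULL;               /* sketch-only precaution against reuse */
    t->num_ps = 0;
}

/*
 * Allocate the array and its members. fail_at simulates an allocation
 * failure at that member index (pass a value >= count for no failure).
 * On error nothing is freed here; the caller is expected to call table_fini().
 */
int table_parse(struct power_table *t, unsigned int count, unsigned int fail_at)
{
    assert(t != NULL);
    t->num_ps = 0;
    t->ps = calloc(count, sizeof(*t->ps));
    if (!t->ps)
        return -1;

    for (unsigned int i = 0; i < count; i++) {
        void *priv = (i == fail_at) ? NULL : malloc(16);
        if (!priv)
            return -1;          /* do not free t->ps here */
        t->ps[i].ps_priv = priv;
        t->num_ps++;            /* count each member as soon as it exists */
    }
    return 0;
}
```

A caller that sees `table_parse()` fail simply calls `table_fini()` once. Because `num_ps` always matches the number of allocated members, the teardown neither skips a member nor touches an unallocated one.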
    • drm/amd/display: make hubp1_wait_pipe_read_start() static · a26b9e0b
      Tales Lelo da Aparecida authored
      It's a local function, let's make it static.
      
      AGD: remove prototype in dcn10_hubp.h
      Signed-off-by: Tales Lelo da Aparecida <tales.aparecida@gmail.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • amdgpu/pm: Clarify documentation of error handling in send_smc_mesg · f24044bd
      Darren Powell authored
      Clarify the smu_cmn_send_smc_msg_with_param documentation to mention that
      two cases exist in which messages are silently dropped with no error
      returned. These cases occur in unusual situations where either:
       1. the message type is not allowed on a virtual GPU, or
       2. a PCI recovery is underway and the HW is not yet in sync with the SW
      
      For more details see
       commit 4ea5081c ("drm/amd/powerplay: enable SMC message filter")
       commit bf36b52e ("drm/amdgpu: Avoid accessing HW when suspending SW state")
      
      (v2)
        Reworked with suggestions from Luben & Paul
      
      (v3)
        Updated wording as per Luben's feedback
        Corrected error stating all messages denied on virtual GPU
        (each GPU has mask of which messages are allowed)
      Signed-off-by: Darren Powell <darren.powell@amd.com>
      Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu/pm: fix the null pointer while the smu is disabled · eea5c7b3
      Huang Rui authored
      Check whether pp_funcs is initialized when releasing the context;
      otherwise a NULL pointer dereference panic is triggered when the
      software SMU is not enabled.
      
      [ 1109.404555] BUG: kernel NULL pointer dereference, address: 0000000000000078
      [ 1109.404609] #PF: supervisor read access in kernel mode
      [ 1109.404638] #PF: error_code(0x0000) - not-present page
      [ 1109.404657] PGD 0 P4D 0
      [ 1109.404672] Oops: 0000 [#1] PREEMPT SMP NOPTI
      [ 1109.404701] CPU: 7 PID: 9150 Comm: amdgpu_test Tainted: G           OEL    5.16.0-custom #1
      [ 1109.404732] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [ 1109.404765] RIP: 0010:amdgpu_dpm_force_performance_level+0x1d/0x170 [amdgpu]
      [ 1109.405109] Code: 5d c3 44 8b a3 f0 80 00 00 eb e5 66 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 08 4c 8b b7 f0 7d 00 00 <49> 83 7e 78 00 0f 84 f2 00 00 00 80 bf 87 80 00 00 00 48 89 fb 0f
      [ 1109.405176] RSP: 0018:ffffaf3083ad7c20 EFLAGS: 00010282
      [ 1109.405203] RAX: 0000000000000000 RBX: ffff9796b1c14600 RCX: 0000000002862007
      [ 1109.405229] RDX: ffff97968591c8c0 RSI: 0000000000000001 RDI: ffff9796a3700000
      [ 1109.405260] RBP: ffffaf3083ad7c50 R08: ffffffff9897de00 R09: ffff979688d9db60
      [ 1109.405286] R10: 0000000000000000 R11: ffff979688d9db90 R12: 0000000000000001
      [ 1109.405316] R13: ffff9796a3700000 R14: 0000000000000000 R15: ffff9796a3708fc0
      [ 1109.405345] FS:  00007ff055cff180(0000) GS:ffff9796bfdc0000(0000) knlGS:0000000000000000
      [ 1109.405378] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1109.405400] CR2: 0000000000000078 CR3: 000000000a394000 CR4: 00000000000506e0
      [ 1109.405434] Call Trace:
      [ 1109.405445]  <TASK>
      [ 1109.405456]  ? delete_object_full+0x1d/0x20
      [ 1109.405480]  amdgpu_ctx_set_stable_pstate+0x7c/0xa0 [amdgpu]
      [ 1109.405698]  amdgpu_ctx_fini.part.0+0xcb/0x100 [amdgpu]
      [ 1109.405911]  amdgpu_ctx_do_release+0x71/0x80 [amdgpu]
      [ 1109.406121]  amdgpu_ctx_ioctl+0x52d/0x550 [amdgpu]
      [ 1109.406327]  ? _raw_spin_unlock+0x1a/0x30
      [ 1109.406354]  ? drm_gem_handle_delete+0x81/0xb0 [drm]
      [ 1109.406400]  ? amdgpu_ctx_get_entity+0x2c0/0x2c0 [amdgpu]
      [ 1109.406609]  drm_ioctl_kernel+0xb6/0x140 [drm]
      Signed-off-by: Huang Rui <ray.huang@amd.com>
      Reviewed-by: Aaron Liu <aaron.liu@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
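The guard described in the commit above amounts to checking a function table for NULL before calling through it on the release path. Here is a minimal userspace sketch of that idea, assuming a hypothetical `fake_dev` with an optional `pp_funcs` table. None of these names or signatures are taken from the driver, and `record_level()` exists only so the sketch has a callback to invoke.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function table, present only when power management is set up. */
struct fake_pm_funcs {
    int (*force_level)(void *handle, int level);
};

struct fake_dev {
    const struct fake_pm_funcs *pp_funcs;   /* NULL when the SW SMU is not enabled */
    void *handle;
};

/* Example callback: records the requested level in the int behind handle. */
int record_level(void *handle, int level)
{
    *(int *)handle = level;
    return 0;
}

/* Release-path helper: restore a performance level only if it can be done. */
int restore_level_on_release(struct fake_dev *dev, int level)
{
    assert(dev != NULL);

    /* Without this check, a NULL pp_funcs would be dereferenced below. */
    if (!dev->pp_funcs || !dev->pp_funcs->force_level)
        return 0;           /* nothing to restore */

    return dev->pp_funcs->force_level(dev->handle, level);
}
```

Returning 0 when the table is absent is a design choice of this sketch: on a teardown path there is usually nothing useful the caller could do with an error.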
    • drm/amdkfd: only allow heavy-weight TLB flush on some ASICs for SVM too · 36bf9321
      Lang Yu authored
      The idea is from
      commit a50fe707 ("drm/amdkfd: Only apply heavy-weight TLB flush on Aldebaran")
      and
      commit f61c40c0 ("drm/amdkfd: enable heavy-weight TLB flush on Arcturus").
      
      At the moment, a heavy-weight TLB flush could cause problems on ASICs
      other than Aldebaran and Arcturus.
      
      A simple hipMallocManaged/hipFree program could trigger this issue.
      
      [   97.787657] amdgpu 0000:01:00.0: amdgpu: wait for kiq fence error: 0.
      [  106.868758] amdgpu: qcm fence wait loop timeout expired
      [  106.868966] amdgpu: The cp might be in an unrecoverable state due to an unsuccessful queues preemption
      [  106.869203] amdgpu: Failed to evict process queues
      [  106.869261] amdgpu: Failed to quiesce KFD
      Signed-off-by: Lang Yu <Lang.Yu@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
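The policy in the commit above, heavy-weight flushes only on the ASICs named in the message, can be pictured as a small predicate that the SVM unmap path consults. The sketch below is an illustration under assumptions: the enums, `flush_tlb_after_unmap()` and `svm_unmap_flush_type()` are hypothetical names, and any further conditions the real driver may apply are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ASIC identifiers; only the two named in the commit are special. */
enum fake_asic { FAKE_ASIC_OTHER, FAKE_ASIC_ARCTURUS, FAKE_ASIC_ALDEBARAN };

enum flush_type { FLUSH_LEGACY, FLUSH_HEAVYWEIGHT };

/* Small predicate of the kind that fits in a shared header as static inline. */
static inline bool flush_tlb_after_unmap(enum fake_asic asic)
{
    return asic == FAKE_ASIC_ALDEBARAN || asic == FAKE_ASIC_ARCTURUS;
}

/* Pick the flush type for an unmap, falling back to the lighter flush. */
enum flush_type svm_unmap_flush_type(enum fake_asic asic)
{
    bool heavy = flush_tlb_after_unmap(asic);

    assert(!heavy || asic != FAKE_ASIC_OTHER);   /* sanity check on the policy */
    return heavy ? FLUSH_HEAVYWEIGHT : FLUSH_LEGACY;
}
```

Keeping the decision in one predicate means every unmap path that consults it applies the same rule.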
    • drm/amdkfd: move kfd_flush_tlb_after_unmap into kfd_priv.h · 459ccca5
      Lang Yu authored
      To make kfd_flush_tlb_after_unmap visible in kfd_svm.c,
      move it into kfd_priv.h and change it to an inline function.
      Signed-off-by: Lang Yu <Lang.Yu@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  2. 14 Apr, 2022 5 commits
  3. 13 Apr, 2022 6 commits
  4. 12 Apr, 2022 15 commits
  5. 11 Apr, 2022 8 commits