1. 03 Jan, 2023 2 commits
    • drm/amdgpu: adjust the sequence to check soft reset · 360cd081
      Likun Gao authored
      1. Drop the soft reset check from the "should recover GPU" check.
         (Otherwise the GPU reset operation is skipped when some IP is
          hung but does not support soft reset.)
      2. Check the soft reset status before doing a soft reset in the pre-ASIC-reset step.
         a. If the soft reset check returns true, some IP is hung and
            also supports soft reset, so soft reset is tried first.
         b. If the soft reset check returns false, either:
              I.  No IP is hung, so the GPU reset is skipped.
              II. Some IP is hung but does not support soft reset, so soft
                  reset is skipped and a full reset is retried later.
      Signed-off-by: Likun Gao <Likun.Gao@amd.com>
      Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
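The decision order described in the commit message above can be sketched as a small userspace model. This is only an illustration under stated assumptions: `struct ip_block`, `enum reset_action`, `any_ip_hung`, and `choose_reset` are hypothetical names invented here, not the amdgpu driver's structures or functions, and the real driver does not necessarily separate cases 2b.I and 2b.II in one place the way this model does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of one IP block (not an amdgpu struct). */
struct ip_block {
    bool hung;                /* the IP reports a hang */
    bool supports_soft_reset; /* the IP implements a soft reset path */
};

/* Hypothetical outcome of the reset decision. */
enum reset_action { RESET_NONE, RESET_SOFT, RESET_FULL };

/* Case 2a: true only if some IP is hung AND supports soft reset. */
bool check_soft_reset(const struct ip_block *ips, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (ips[i].hung && ips[i].supports_soft_reset)
            return true;
    return false;
}

/* True if any IP is hung, whether or not it supports soft reset. */
bool any_ip_hung(const struct ip_block *ips, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (ips[i].hung)
            return true;
    return false;
}

/* Soft reset is checked first; a hang without soft reset support
 * falls through to a full reset instead of being skipped. */
enum reset_action choose_reset(const struct ip_block *ips, size_t n)
{
    if (check_soft_reset(ips, n))
        return RESET_SOFT;  /* 2a: try soft reset first */
    if (any_ip_hung(ips, n))
        return RESET_FULL;  /* 2b.II: skip soft reset, use full reset */
    return RESET_NONE;      /* 2b.I: nothing is hung */
}
```

In this model, an IP that is hung but lacks soft reset support yields `RESET_FULL` rather than `RESET_NONE`, which is the behavior the first point of the commit message aims to restore.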
    • drm/amdkfd: Fix kernel warning during topology setup · 1f9d1ff1
      Mukul Joshi authored
      This patch fixes the following kernel warning seen during
      driver load by correctly initializing the p2plink attr before
      creating the sysfs file:
      
      [  +0.002865] ------------[ cut here ]------------
      [  +0.002327] kobject: '(null)' (0000000056260cfb): is not initialized, yet kobject_put() is being called.
      [  +0.004780] WARNING: CPU: 32 PID: 1006 at lib/kobject.c:718 kobject_put+0xaa/0x1c0
      [  +0.001361] Call Trace:
      [  +0.001234]  <TASK>
      [  +0.001067]  kfd_remove_sysfs_node_entry+0x24a/0x2d0 [amdgpu]
      [  +0.003147]  kfd_topology_update_sysfs+0x3d/0x750 [amdgpu]
      [  +0.002890]  kfd_topology_add_device+0xbd7/0xc70 [amdgpu]
      [  +0.002844]  ? lock_release+0x13c/0x2e0
      [  +0.001936]  ? smu_cmn_send_smc_msg_with_param+0x1e8/0x2d0 [amdgpu]
      [  +0.003313]  ? amdgpu_dpm_get_mclk+0x54/0x60 [amdgpu]
      [  +0.002703]  kgd2kfd_device_init.cold+0x39f/0x4ed [amdgpu]
      [  +0.002930]  amdgpu_amdkfd_device_init+0x13d/0x1f0 [amdgpu]
      [  +0.002944]  amdgpu_device_init.cold+0x1464/0x17b4 [amdgpu]
      [  +0.002970]  ? pci_bus_read_config_word+0x43/0x80
      [  +0.002380]  amdgpu_driver_load_kms+0x15/0x100 [amdgpu]
      [  +0.002744]  amdgpu_pci_probe+0x147/0x370 [amdgpu]
      [  +0.002522]  local_pci_probe+0x40/0x80
      [  +0.001896]  work_for_cpu_fn+0x10/0x20
      [  +0.001892]  process_one_work+0x26e/0x5a0
      [  +0.002029]  worker_thread+0x1fd/0x3e0
      [  +0.001890]  ? process_one_work+0x5a0/0x5a0
      [  +0.002115]  kthread+0xea/0x110
      [  +0.001618]  ? kthread_complete_and_exit+0x20/0x20
      [  +0.002422]  ret_from_fork+0x1f/0x30
      [  +0.001808]  </TASK>
      [  +0.001103] irq event stamp: 59837
      [  +0.001718] hardirqs last  enabled at (59849): [<ffffffffb30fab12>] __up_console_sem+0x52/0x60
      [  +0.004414] hardirqs last disabled at (59860): [<ffffffffb30faaf7>] __up_console_sem+0x37/0x60
      [  +0.004414] softirqs last  enabled at (59654): [<ffffffffb307d9c7>] irq_exit_rcu+0xd7/0x130
      [  +0.004205] softirqs last disabled at (59649): [<ffffffffb307d9c7>] irq_exit_rcu+0xd7/0x130
      [  +0.004203] ---[ end trace 0000000000000000 ]---
      
      Fixes: 0f28cca8 ("drm/amdkfd: Extend KFD device topology to surface peer-to-peer links")
      Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  2. 20 Dec, 2022 18 commits
  3. 15 Dec, 2022 20 commits