1. 20 Jun, 2019 9 commits
  2. 17 Jun, 2019 11 commits
    • Alex Deucher's avatar
      drm/amdgpu: wait to fetch the vbios until after common init · 21a249ca
      Alex Deucher authored
      We need the asic_funcs set for the get rom callbacks in some
      cases.
      Tested-by: default avatarKent Russell <kent.russell@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      21a249ca
    • Markus Elfring's avatar
      drm/amd/powerplay: Delete a redundant memory setting in vega20_set_default_od8_setttings() · b9341521
      Markus Elfring authored
      The memory was set to zero already by a call of the function “kzalloc”.
      Thus remove an extra call of the function “memset” for this purpose.
      
      This issue was detected by using the Coccinelle software.
      Signed-off-by: default avatarMarkus Elfring <elfring@users.sourceforge.net>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      b9341521
    • Markus Elfring's avatar
      drm/amd/display: Delete a redundant memory setting in amdgpu_dm_irq_register_interrupt() · 4fe7d1a8
      Markus Elfring authored
      The memory was set to zero already by a call of the function “kzalloc”.
      Thus remove an extra call of the function “memset” for this purpose.
      
      This issue was detected by using the Coccinelle software.
      Signed-off-by: default avatarMarkus Elfring <elfring@users.sourceforge.net>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      4fe7d1a8
    • Arnd Bergmann's avatar
      drm/amdgpu: fix error handling in df_v3_6_pmc_start · e1a2f2d2
      Arnd Bergmann authored
      When df_v3_6_pmc_get_ctrl_settings() fails for some reason, we
      store uninitialized data in a register, as gcc points out:
      
      drivers/gpu/drm/amd/amdgpu/df_v3_6.c: In function 'df_v3_6_pmc_start':
      drivers/gpu/drm/amd/amdgpu/amdgpu.h:1012:29: error: 'lo_val' may be used uninitialized in this function [-Werror=maybe-uninitialized]
       #define WREG32_PCIE(reg, v) adev->pcie_wreg(adev, (reg), (v))
                                   ^~~~
      drivers/gpu/drm/amd/amdgpu/df_v3_6.c:334:39: note: 'lo_val' was declared here
        uint32_t lo_base_addr, hi_base_addr, lo_val, hi_val;
                                             ^~~~~~
      
      Make it return a proper error code that we can catch in the caller.
      
      Fixes: 992af942 ("drm/amdgpu: add df perfmon regs and funcs for xgmi")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      e1a2f2d2
    • Geert Uytterhoeven's avatar
      drm/amd/display: Add missing newline at end of file · b6bb56ac
      Geert Uytterhoeven authored
      "git diff" says:
      
          \ No newline at end of file
      
      after modifying the file.
      Signed-off-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      b6bb56ac
    • Prike Liang's avatar
      drm/amd/powerplay: detect version of smu backend (v2) · 82973e07
      Prike Liang authored
      Print the backend type.
      
      v2: whitespace fixes (Alex)
      Signed-off-by: default avatarPrike Liang <Prike.Liang@amd.com>
      Reviewed-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      82973e07
    • Oak Zeng's avatar
      drm/amdkfd: Fix sdma queue allocate race condition · 38bb4226
      Oak Zeng authored
      SDMA queue allocation requires the dqm lock as it modify
      the global dqm members. Enclose it in the dqm_lock.
      Signed-off-by: default avatarOak Zeng <Oak.Zeng@amd.com>
      Reviewed-by: default avatarPhilip Yang <philip.yang@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      38bb4226
    • Oak Zeng's avatar
      drm/amdkfd: Fix a circular lock dependency · 6a6ef5ee
      Oak Zeng authored
      The idea to break the circular lock dependency is to temporarily drop
      dqm lock before calling allocate_mqd. See callstack #1 below.
      
      [   59.510149] [drm] Initialized amdgpu 3.30.0 20150101 for 0000:04:00.0 on minor 0
      
      [  513.604034] ======================================================
      [  513.604205] WARNING: possible circular locking dependency detected
      [  513.604375] 4.18.0-kfd-root #2 Tainted: G        W
      [  513.604530] ------------------------------------------------------
      [  513.604699] kswapd0/611 is trying to acquire lock:
      [  513.604840] 00000000d254022e (&dqm->lock_hidden){+.+.}, at: evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
      [  513.605150]
                     but task is already holding lock:
      [  513.605307] 00000000961547fc (&anon_vma->rwsem){++++}, at: page_lock_anon_vma_read+0xe4/0x250
      [  513.605540]
                     which lock already depends on the new lock.
      
      [  513.605747]
                     the existing dependency chain (in reverse order) is:
      [  513.605944]
                     -> #4 (&anon_vma->rwsem){++++}:
      [  513.606106]        __vma_adjust+0x147/0x7f0
      [  513.606231]        __split_vma+0x179/0x190
      [  513.606353]        mprotect_fixup+0x217/0x260
      [  513.606553]        do_mprotect_pkey+0x211/0x380
      [  513.606752]        __x64_sys_mprotect+0x1b/0x20
      [  513.606954]        do_syscall_64+0x50/0x1a0
      [  513.607149]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  513.607380]
                     -> #3 (&mapping->i_mmap_rwsem){++++}:
      [  513.607678]        rmap_walk_file+0x1f0/0x280
      [  513.607887]        page_referenced+0xdd/0x180
      [  513.608081]        shrink_page_list+0x853/0xcb0
      [  513.608279]        shrink_inactive_list+0x33b/0x700
      [  513.608483]        shrink_node_memcg+0x37a/0x7f0
      [  513.608682]        shrink_node+0xd8/0x490
      [  513.608869]        balance_pgdat+0x18b/0x3b0
      [  513.609062]        kswapd+0x203/0x5c0
      [  513.609241]        kthread+0x100/0x140
      [  513.609420]        ret_from_fork+0x24/0x30
      [  513.609607]
                     -> #2 (fs_reclaim){+.+.}:
      [  513.609883]        kmem_cache_alloc_trace+0x34/0x2e0
      [  513.610093]        reservation_object_reserve_shared+0x139/0x300
      [  513.610326]        ttm_bo_init_reserved+0x291/0x480 [ttm]
      [  513.610567]        amdgpu_bo_do_create+0x1d2/0x650 [amdgpu]
      [  513.610811]        amdgpu_bo_create+0x40/0x1f0 [amdgpu]
      [  513.611041]        amdgpu_bo_create_reserved+0x249/0x2d0 [amdgpu]
      [  513.611290]        amdgpu_bo_create_kernel+0x12/0x70 [amdgpu]
      [  513.611584]        amdgpu_ttm_init+0x2cb/0x560 [amdgpu]
      [  513.611823]        gmc_v9_0_sw_init+0x400/0x750 [amdgpu]
      [  513.612491]        amdgpu_device_init+0x14eb/0x1990 [amdgpu]
      [  513.612730]        amdgpu_driver_load_kms+0x78/0x290 [amdgpu]
      [  513.612958]        drm_dev_register+0x111/0x1a0
      [  513.613171]        amdgpu_pci_probe+0x11c/0x1e0 [amdgpu]
      [  513.613389]        local_pci_probe+0x3f/0x90
      [  513.613581]        pci_device_probe+0x102/0x1c0
      [  513.613779]        driver_probe_device+0x2a7/0x480
      [  513.613984]        __driver_attach+0x10a/0x110
      [  513.614179]        bus_for_each_dev+0x67/0xc0
      [  513.614372]        bus_add_driver+0x1eb/0x260
      [  513.614565]        driver_register+0x5b/0xe0
      [  513.614756]        do_one_initcall+0xac/0x357
      [  513.614952]        do_init_module+0x5b/0x213
      [  513.615145]        load_module+0x2542/0x2d30
      [  513.615337]        __do_sys_finit_module+0xd2/0x100
      [  513.615541]        do_syscall_64+0x50/0x1a0
      [  513.615731]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  513.615963]
                     -> #1 (reservation_ww_class_mutex){+.+.}:
      [  513.616293]        amdgpu_amdkfd_alloc_gtt_mem+0xcf/0x2c0 [amdgpu]
      [  513.616554]        init_mqd+0x223/0x260 [amdgpu]
      [  513.616779]        create_queue_nocpsch+0x4d9/0x600 [amdgpu]
      [  513.617031]        pqm_create_queue+0x37c/0x520 [amdgpu]
      [  513.617270]        kfd_ioctl_create_queue+0x2f9/0x650 [amdgpu]
      [  513.617522]        kfd_ioctl+0x202/0x350 [amdgpu]
      [  513.617724]        do_vfs_ioctl+0x9f/0x6c0
      [  513.617914]        ksys_ioctl+0x66/0x70
      [  513.618095]        __x64_sys_ioctl+0x16/0x20
      [  513.618286]        do_syscall_64+0x50/0x1a0
      [  513.618476]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  513.618695]
                     -> #0 (&dqm->lock_hidden){+.+.}:
      [  513.618984]        __mutex_lock+0x98/0x970
      [  513.619197]        evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
      [  513.619459]        kfd_process_evict_queues+0x3b/0xb0 [amdgpu]
      [  513.619710]        kgd2kfd_quiesce_mm+0x1c/0x40 [amdgpu]
      [  513.620103]        amdgpu_amdkfd_evict_userptr+0x38/0x70 [amdgpu]
      [  513.620363]        amdgpu_mn_invalidate_range_start_hsa+0xa6/0xc0 [amdgpu]
      [  513.620614]        __mmu_notifier_invalidate_range_start+0x70/0xb0
      [  513.620851]        try_to_unmap_one+0x7fc/0x8f0
      [  513.621049]        rmap_walk_anon+0x121/0x290
      [  513.621242]        try_to_unmap+0x93/0xf0
      [  513.621428]        shrink_page_list+0x606/0xcb0
      [  513.621625]        shrink_inactive_list+0x33b/0x700
      [  513.621835]        shrink_node_memcg+0x37a/0x7f0
      [  513.622034]        shrink_node+0xd8/0x490
      [  513.622219]        balance_pgdat+0x18b/0x3b0
      [  513.622410]        kswapd+0x203/0x5c0
      [  513.622589]        kthread+0x100/0x140
      [  513.622769]        ret_from_fork+0x24/0x30
      [  513.622957]
                     other info that might help us debug this:
      
      [  513.623354] Chain exists of:
                       &dqm->lock_hidden --> &mapping->i_mmap_rwsem --> &anon_vma->rwsem
      
      [  513.623900]  Possible unsafe locking scenario:
      
      [  513.624189]        CPU0                    CPU1
      [  513.624397]        ----                    ----
      [  513.624594]   lock(&anon_vma->rwsem);
      [  513.624771]                                lock(&mapping->i_mmap_rwsem);
      [  513.625020]                                lock(&anon_vma->rwsem);
      [  513.625253]   lock(&dqm->lock_hidden);
      [  513.625433]
                      *** DEADLOCK ***
      
      [  513.625783] 3 locks held by kswapd0/611:
      [  513.625967]  #0: 00000000f14edf84 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
      [  513.626309]  #1: 00000000961547fc (&anon_vma->rwsem){++++}, at: page_lock_anon_vma_read+0xe4/0x250
      [  513.626671]  #2: 0000000067b5cd12 (srcu){....}, at: __mmu_notifier_invalidate_range_start+0x5/0xb0
      [  513.627037]
                     stack backtrace:
      [  513.627292] CPU: 0 PID: 611 Comm: kswapd0 Tainted: G        W         4.18.0-kfd-root #2
      [  513.627632] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [  513.627990] Call Trace:
      [  513.628143]  dump_stack+0x7c/0xbb
      [  513.628315]  print_circular_bug.isra.37+0x21b/0x228
      [  513.628581]  __lock_acquire+0xf7d/0x1470
      [  513.628782]  ? unwind_next_frame+0x6c/0x4f0
      [  513.628974]  ? lock_acquire+0xec/0x1e0
      [  513.629154]  lock_acquire+0xec/0x1e0
      [  513.629357]  ? evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
      [  513.629587]  __mutex_lock+0x98/0x970
      [  513.629790]  ? evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
      [  513.630047]  ? evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
      [  513.630309]  ? evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
      [  513.630562]  evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
      [  513.630816]  kfd_process_evict_queues+0x3b/0xb0 [amdgpu]
      [  513.631057]  kgd2kfd_quiesce_mm+0x1c/0x40 [amdgpu]
      [  513.631288]  amdgpu_amdkfd_evict_userptr+0x38/0x70 [amdgpu]
      [  513.631536]  amdgpu_mn_invalidate_range_start_hsa+0xa6/0xc0 [amdgpu]
      [  513.632076]  __mmu_notifier_invalidate_range_start+0x70/0xb0
      [  513.632299]  try_to_unmap_one+0x7fc/0x8f0
      [  513.632487]  ? page_lock_anon_vma_read+0x68/0x250
      [  513.632690]  rmap_walk_anon+0x121/0x290
      [  513.632875]  try_to_unmap+0x93/0xf0
      [  513.633050]  ? page_remove_rmap+0x330/0x330
      [  513.633239]  ? rcu_read_unlock+0x60/0x60
      [  513.633422]  ? page_get_anon_vma+0x160/0x160
      [  513.633613]  shrink_page_list+0x606/0xcb0
      [  513.633800]  shrink_inactive_list+0x33b/0x700
      [  513.633997]  shrink_node_memcg+0x37a/0x7f0
      [  513.634186]  ? shrink_node+0xd8/0x490
      [  513.634363]  shrink_node+0xd8/0x490
      [  513.634537]  balance_pgdat+0x18b/0x3b0
      [  513.634718]  kswapd+0x203/0x5c0
      [  513.634887]  ? wait_woken+0xb0/0xb0
      [  513.635062]  kthread+0x100/0x140
      [  513.635231]  ? balance_pgdat+0x3b0/0x3b0
      [  513.635414]  ? kthread_delayed_work_timer_fn+0x80/0x80
      [  513.635626]  ret_from_fork+0x24/0x30
      [  513.636042] Evicting PASID 32768 queues
      [  513.936236] Restoring PASID 32768 queues
      [  524.708912] Evicting PASID 32768 queues
      [  524.999875] Restoring PASID 32768 queues
      Signed-off-by: default avatarOak Zeng <Oak.Zeng@amd.com>
      Reviewed-by: default avatarPhilip Yang <philip.yang@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      6a6ef5ee
    • Oak Zeng's avatar
      Revert "drm/amdkfd: Fix a circular lock dependency" · d091bc0a
      Oak Zeng authored
      This reverts commit 06b89b38.
      This fix is not proper. allocate_mqd can't be moved before
      allocate_sdma_queue as it depends on q->properties->sdma_id
      set in later.
      Signed-off-by: default avatarOak Zeng <Oak.Zeng@amd.com>
      Reviewed-by: default avatarPhilip Yang <philip.yang@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      d091bc0a
    • Oak Zeng's avatar
      Revert "drm/amdkfd: Fix sdma queue allocate race condition" · 70d488fb
      Oak Zeng authored
      This reverts commit f77dac6c.
      This fix is not proper. allocate_mqd can't be moved before
      allocate_sdma_queue as it depends on q->properties->sdma_id
      set in later.
      Signed-off-by: default avatarOak Zeng <Oak.Zeng@amd.com>
      Reviewed-by: default avatarPhilip Yang <philip.yang@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      70d488fb
    • James Zhu's avatar
      drm/amdgpu: explicitly set mmGDS_VMID0_BASE to 0 · eb03e795
      James Zhu authored
      Explicitly set mmGDS_VMID0_BASE to 0. Also update
      GDS_VMID0_BASE/_SIZE with direct register writes.
      Signed-off-by: default avatarJames Zhu <James.Zhu@amd.com>
      Reviewed-by: default avatarChristian König <christian.koenig@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      eb03e795
  3. 13 Jun, 2019 10 commits
  4. 11 Jun, 2019 10 commits
    • Oak Zeng's avatar
      drm/amdkfd: Add device to topology after it is completely inited · 465ab9e0
      Oak Zeng authored
      We can't have devices that are not completely initialized in kfd topology.
      Otherwise it is a race condition when user access not completely
      initialized device. This also addresses a kfd_topology_add_device accessing
      NULL dqm pointer issue.
      Signed-off-by: default avatarOak Zeng <Oak.Zeng@amd.com>
      Reviewed-by: default avatarFelix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      465ab9e0
    • Oak Zeng's avatar
      drm/amdkfd: Initialize HSA_CAP_ATS_PRESENT capability in topology codes · 1ae99eab
      Oak Zeng authored
      Move HSA_CAP_ATS_PRESENT initialization logic from kfd iommu codes to
      kfd topology codes. This removes kfd_iommu_device_init's dependency
      on kfd_topology_add_device. Also remove duplicate code setting the
      same.
      Signed-off-by: default avatarOak Zeng <Oak.Zeng@amd.com>
      Reviewed-by: default avatarFelix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      1ae99eab
    • Oak Zeng's avatar
      drm/amdkfd: Fix sdma queue allocate race condition · f77dac6c
      Oak Zeng authored
      SDMA queue allocation requires the dqm lock at it modify
      the global dqm members. Move up the dqm_lock so sdma
      queue allocation is enclosed in the critical section. Move
      mqd allocation out of critical section to avoid circular
      lock dependency.
      Signed-off-by: default avatarOak Zeng <Oak.Zeng@amd.com>
      Reviewed-by: default avatarFelix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      f77dac6c
    • Oak Zeng's avatar
      drm/amdkfd: Fix a circular lock dependency · 06b89b38
      Oak Zeng authored
      The idea to break the circular lock dependency is to move allocate_mqd
      out of dqm lock protection. See callstack #1 below.
      
      [   59.510149] [drm] Initialized amdgpu 3.30.0 20150101 for 0000:04:00.0 on minor 0
      
      [  513.604034] ======================================================
      [  513.604205] WARNING: possible circular locking dependency detected
      [  513.604375] 4.18.0-kfd-root #2 Tainted: G        W
      [  513.604530] ------------------------------------------------------
      [  513.604699] kswapd0/611 is trying to acquire lock:
      [  513.604840] 00000000d254022e (&dqm->lock_hidden){+.+.}, at: evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
      [  513.605150]
                     but task is already holding lock:
      [  513.605307] 00000000961547fc (&anon_vma->rwsem){++++}, at: page_lock_anon_vma_read+0xe4/0x250
      [  513.605540]
                     which lock already depends on the new lock.
      
      [  513.605747]
                     the existing dependency chain (in reverse order) is:
      [  513.605944]
                     -> #4 (&anon_vma->rwsem){++++}:
      [  513.606106]        __vma_adjust+0x147/0x7f0
      [  513.606231]        __split_vma+0x179/0x190
      [  513.606353]        mprotect_fixup+0x217/0x260
      [  513.606553]        do_mprotect_pkey+0x211/0x380
      [  513.606752]        __x64_sys_mprotect+0x1b/0x20
      [  513.606954]        do_syscall_64+0x50/0x1a0
      [  513.607149]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  513.607380]
                     -> #3 (&mapping->i_mmap_rwsem){++++}:
      [  513.607678]        rmap_walk_file+0x1f0/0x280
      [  513.607887]        page_referenced+0xdd/0x180
      [  513.608081]        shrink_page_list+0x853/0xcb0
      [  513.608279]        shrink_inactive_list+0x33b/0x700
      [  513.608483]        shrink_node_memcg+0x37a/0x7f0
      [  513.608682]        shrink_node+0xd8/0x490
      [  513.608869]        balance_pgdat+0x18b/0x3b0
      [  513.609062]        kswapd+0x203/0x5c0
      [  513.609241]        kthread+0x100/0x140
      [  513.609420]        ret_from_fork+0x24/0x30
      [  513.609607]
                     -> #2 (fs_reclaim){+.+.}:
      [  513.609883]        kmem_cache_alloc_trace+0x34/0x2e0
      [  513.610093]        reservation_object_reserve_shared+0x139/0x300
      [  513.610326]        ttm_bo_init_reserved+0x291/0x480 [ttm]
      [  513.610567]        amdgpu_bo_do_create+0x1d2/0x650 [amdgpu]
      [  513.610811]        amdgpu_bo_create+0x40/0x1f0 [amdgpu]
      [  513.611041]        amdgpu_bo_create_reserved+0x249/0x2d0 [amdgpu]
      [  513.611290]        amdgpu_bo_create_kernel+0x12/0x70 [amdgpu]
      [  513.611584]        amdgpu_ttm_init+0x2cb/0x560 [amdgpu]
      [  513.611823]        gmc_v9_0_sw_init+0x400/0x750 [amdgpu]
      [  513.612491]        amdgpu_device_init+0x14eb/0x1990 [amdgpu]
      [  513.612730]        amdgpu_driver_load_kms+0x78/0x290 [amdgpu]
      [  513.612958]        drm_dev_register+0x111/0x1a0
      [  513.613171]        amdgpu_pci_probe+0x11c/0x1e0 [amdgpu]
      [  513.613389]        local_pci_probe+0x3f/0x90
      [  513.613581]        pci_device_probe+0x102/0x1c0
      [  513.613779]        driver_probe_device+0x2a7/0x480
      [  513.613984]        __driver_attach+0x10a/0x110
      [  513.614179]        bus_for_each_dev+0x67/0xc0
      [  513.614372]        bus_add_driver+0x1eb/0x260
      [  513.614565]        driver_register+0x5b/0xe0
      [  513.614756]        do_one_initcall+0xac/0x357
      [  513.614952]        do_init_module+0x5b/0x213
      [  513.615145]        load_module+0x2542/0x2d30
      [  513.615337]        __do_sys_finit_module+0xd2/0x100
      [  513.615541]        do_syscall_64+0x50/0x1a0
      [  513.615731]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  513.615963]
                     -> #1 (reservation_ww_class_mutex){+.+.}:
      [  513.616293]        amdgpu_amdkfd_alloc_gtt_mem+0xcf/0x2c0 [amdgpu]
      [  513.616554]        init_mqd+0x223/0x260 [amdgpu]
      [  513.616779]        create_queue_nocpsch+0x4d9/0x600 [amdgpu]
      [  513.617031]        pqm_create_queue+0x37c/0x520 [amdgpu]
      [  513.617270]        kfd_ioctl_create_queue+0x2f9/0x650 [amdgpu]
      [  513.617522]        kfd_ioctl+0x202/0x350 [amdgpu]
      [  513.617724]        do_vfs_ioctl+0x9f/0x6c0
      [  513.617914]        ksys_ioctl+0x66/0x70
      [  513.618095]        __x64_sys_ioctl+0x16/0x20
      [  513.618286]        do_syscall_64+0x50/0x1a0
      [  513.618476]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  513.618695]
                     -> #0 (&dqm->lock_hidden){+.+.}:
      [  513.618984]        __mutex_lock+0x98/0x970
      [  513.619197]        evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
      [  513.619459]        kfd_process_evict_queues+0x3b/0xb0 [amdgpu]
      [  513.619710]        kgd2kfd_quiesce_mm+0x1c/0x40 [amdgpu]
      [  513.620103]        amdgpu_amdkfd_evict_userptr+0x38/0x70 [amdgpu]
      [  513.620363]        amdgpu_mn_invalidate_range_start_hsa+0xa6/0xc0 [amdgpu]
      [  513.620614]        __mmu_notifier_invalidate_range_start+0x70/0xb0
      [  513.620851]        try_to_unmap_one+0x7fc/0x8f0
      [  513.621049]        rmap_walk_anon+0x121/0x290
      [  513.621242]        try_to_unmap+0x93/0xf0
      [  513.621428]        shrink_page_list+0x606/0xcb0
      [  513.621625]        shrink_inactive_list+0x33b/0x700
      [  513.621835]        shrink_node_memcg+0x37a/0x7f0
      [  513.622034]        shrink_node+0xd8/0x490
      [  513.622219]        balance_pgdat+0x18b/0x3b0
      [  513.622410]        kswapd+0x203/0x5c0
      [  513.622589]        kthread+0x100/0x140
      [  513.622769]        ret_from_fork+0x24/0x30
      [  513.622957]
                     other info that might help us debug this:
      
      [  513.623354] Chain exists of:
                       &dqm->lock_hidden --> &mapping->i_mmap_rwsem --> &anon_vma->rwsem
      
      [  513.623900]  Possible unsafe locking scenario:
      
      [  513.624189]        CPU0                    CPU1
      [  513.624397]        ----                    ----
      [  513.624594]   lock(&anon_vma->rwsem);
      [  513.624771]                                lock(&mapping->i_mmap_rwsem);
      [  513.625020]                                lock(&anon_vma->rwsem);
      [  513.625253]   lock(&dqm->lock_hidden);
      [  513.625433]
                      *** DEADLOCK ***
      
      [  513.625783] 3 locks held by kswapd0/611:
      [  513.625967]  #0: 00000000f14edf84 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
      [  513.626309]  #1: 00000000961547fc (&anon_vma->rwsem){++++}, at: page_lock_anon_vma_read+0xe4/0x250
      [  513.626671]  #2: 0000000067b5cd12 (srcu){....}, at: __mmu_notifier_invalidate_range_start+0x5/0xb0
      [  513.627037]
                     stack backtrace:
      [  513.627292] CPU: 0 PID: 611 Comm: kswapd0 Tainted: G        W         4.18.0-kfd-root #2
      [  513.627632] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [  513.627990] Call Trace:
      [  513.628143]  dump_stack+0x7c/0xbb
      [  513.628315]  print_circular_bug.isra.37+0x21b/0x228
      [  513.628581]  __lock_acquire+0xf7d/0x1470
      [  513.628782]  ? unwind_next_frame+0x6c/0x4f0
      [  513.628974]  ? lock_acquire+0xec/0x1e0
      [  513.629154]  lock_acquire+0xec/0x1e0
      [  513.629357]  ? evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
      [  513.629587]  __mutex_lock+0x98/0x970
      [  513.629790]  ? evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
      [  513.630047]  ? evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
      [  513.630309]  ? evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
      [  513.630562]  evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
      [  513.630816]  kfd_process_evict_queues+0x3b/0xb0 [amdgpu]
      [  513.631057]  kgd2kfd_quiesce_mm+0x1c/0x40 [amdgpu]
      [  513.631288]  amdgpu_amdkfd_evict_userptr+0x38/0x70 [amdgpu]
      [  513.631536]  amdgpu_mn_invalidate_range_start_hsa+0xa6/0xc0 [amdgpu]
      [  513.632076]  __mmu_notifier_invalidate_range_start+0x70/0xb0
      [  513.632299]  try_to_unmap_one+0x7fc/0x8f0
      [  513.632487]  ? page_lock_anon_vma_read+0x68/0x250
      [  513.632690]  rmap_walk_anon+0x121/0x290
      [  513.632875]  try_to_unmap+0x93/0xf0
      [  513.633050]  ? page_remove_rmap+0x330/0x330
      [  513.633239]  ? rcu_read_unlock+0x60/0x60
      [  513.633422]  ? page_get_anon_vma+0x160/0x160
      [  513.633613]  shrink_page_list+0x606/0xcb0
      [  513.633800]  shrink_inactive_list+0x33b/0x700
      [  513.633997]  shrink_node_memcg+0x37a/0x7f0
      [  513.634186]  ? shrink_node+0xd8/0x490
      [  513.634363]  shrink_node+0xd8/0x490
      [  513.634537]  balance_pgdat+0x18b/0x3b0
      [  513.634718]  kswapd+0x203/0x5c0
      [  513.634887]  ? wait_woken+0xb0/0xb0
      [  513.635062]  kthread+0x100/0x140
      [  513.635231]  ? balance_pgdat+0x3b0/0x3b0
      [  513.635414]  ? kthread_delayed_work_timer_fn+0x80/0x80
      [  513.635626]  ret_from_fork+0x24/0x30
      [  513.636042] Evicting PASID 32768 queues
      [  513.936236] Restoring PASID 32768 queues
      [  524.708912] Evicting PASID 32768 queues
      [  524.999875] Restoring PASID 32768 queues
      Signed-off-by: default avatarOak Zeng <Oak.Zeng@amd.com>
      Reviewed-by: default avatarFelix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      06b89b38
    • Oak Zeng's avatar
      drm/amdkfd: Separate mqd allocation and initialization · 8636e53c
      Oak Zeng authored
      Introduce a new mqd allocation interface and split the original
      init_mqd function into two functions: allocate_mqd and init_mqd.
      Also renamed uninit_mqd to free_mqd. This is preparation work to
      fix a circular lock dependency.
      Signed-off-by: default avatarOak Zeng <Oak.Zeng@amd.com>
      Reviewed-by: default avatarFelix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      8636e53c
    • Oak Zeng's avatar
      drm/amdkfd: Refactor create_queue_nocpsch · d39b7737
      Oak Zeng authored
      This is prepare work to fix a circular lock dependency.
      No logic change
      Signed-off-by: default avatarOak Zeng <Oak.Zeng@amd.com>
      Reviewed-by: default avatarFelix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      d39b7737
    • Oak Zeng's avatar
      drm/amdkfd: Only load sdma mqd when queue is active · 2ff52819
      Oak Zeng authored
      Also calls load_mqd with current->mm struct. The mm
      struct is used to read back user wptr of the queue.
      Signed-off-by: default avatarOak Zeng <Oak.Zeng@amd.com>
      Reviewed-by: default avatarFelix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      2ff52819
    • Oak Zeng's avatar
      drm/amdkfd: Only initialize sdma vm for sdma queues · eec0b4cf
      Oak Zeng authored
      Don't do the same for compute queues
      Signed-off-by: default avatarOak Zeng <Oak.Zeng@amd.com>
      Reviewed-by: default avatarFelix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      eec0b4cf
    • Philip Yang's avatar
      drm/amdgpu: use new HMM APIs and helpers · 66c45500
      Philip Yang authored
      HMM provides new APIs and helps in kernel 5.2-rc1 to simplify driver
      path. The old hmm APIs are deprecated and will be removed in future.
      
      Below are changes in driver:
      
      1. Change hmm_vma_fault to hmm_range_register and hmm_range_fault which
      supports range with multiple vmas, remove the multiple vmas handle path
      and data structure.
      2. Change hmm_vma_range_done to hmm_range_unregister.
      3. Use default flags to avoid pre-fill pfn arrays.
      4. Use new hmm_device_ helpers.
      Signed-off-by: default avatarPhilip Yang <Philip.Yang@amd.com>
      Reviewed-by: default avatarFelix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      66c45500
    • Dan Carpenter's avatar
      drm/amdgpu: Fix bounds checking in amdgpu_ras_is_supported() · 8252562d
      Dan Carpenter authored
      The "block" variable can be set by the user through debugfs, so it can
      be quite large which leads to shift wrapping here.  This means we report
      a "block" as supported when it's not, and that leads to array overflows
      later on.
      
      This bug is not really a security issue in real life, because debugfs is
      generally root only.
      
      Fixes: 36ea1bd2 ("drm/amdgpu: add debugfs ctrl node")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      8252562d