An error occurred fetching the project authors.
  1. 12 Jul, 2021 1 commit
  2. 29 Apr, 2021 1 commit
    • Maarten Lankhorst's avatar
      drm/i915: Use trylock in shrinker for ggtt on bsw vt-d and bxt, v2. · bc6f80cc
      Maarten Lankhorst authored
      The stop_machine() lock may allocate memory, but is called inside
      vm->mutex, which is taken in the shrinker. This will cause a lockdep
      splat, as can be seen below:
      
      <4>[  462.585762] ======================================================
      <4>[  462.585768] WARNING: possible circular locking dependency detected
      <4>[  462.585773] 5.12.0-rc5-CI-Trybot_7644+ #1 Tainted: G     U
      <4>[  462.585779] ------------------------------------------------------
      <4>[  462.585783] i915_selftest/5540 is trying to acquire lock:
      <4>[  462.585788] ffffffff826440b0 (cpu_hotplug_lock){++++}-{0:0}, at: stop_machine+0x12/0x30
      <4>[  462.585814]
                        but task is already holding lock:
      <4>[  462.585818] ffff888125369c70 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x38e/0xb40 [i915]
      <4>[  462.586301]
                        which lock already depends on the new lock.
      
      <4>[  462.586305]
                        the existing dependency chain (in reverse order) is:
      <4>[  462.586309]
                        -> #2 (&vm->mutex/1){+.+.}-{3:3}:
      <4>[  462.586323]        i915_gem_shrinker_taints_mutex+0x2d/0x50 [i915]
      <4>[  462.586719]        i915_address_space_init+0x12d/0x130 [i915]
      <4>[  462.587092]        ppgtt_init+0x4e/0x80 [i915]
      <4>[  462.587467]        gen8_ppgtt_create+0x3e/0x5c0 [i915]
      <4>[  462.587828]        i915_ppgtt_create+0x28/0xf0 [i915]
      <4>[  462.588203]        intel_gt_init+0x123/0x370 [i915]
      <4>[  462.588572]        i915_gem_init+0x129/0x1f0 [i915]
      <4>[  462.588971]        i915_driver_probe+0x753/0xd80 [i915]
      <4>[  462.589320]        i915_pci_probe+0x43/0x1d0 [i915]
      <4>[  462.589671]        pci_device_probe+0x9e/0x110
      <4>[  462.589680]        really_probe+0xea/0x410
      <4>[  462.589690]        driver_probe_device+0xd9/0x140
      <4>[  462.589697]        device_driver_attach+0x4a/0x50
      <4>[  462.589704]        __driver_attach+0x83/0x140
      <4>[  462.589711]        bus_for_each_dev+0x75/0xc0
      <4>[  462.589718]        bus_add_driver+0x14b/0x1f0
      <4>[  462.589724]        driver_register+0x66/0xb0
      <4>[  462.589731]        i915_init+0x70/0x87 [i915]
      <4>[  462.590053]        do_one_initcall+0x56/0x2e0
      <4>[  462.590061]        do_init_module+0x55/0x200
      <4>[  462.590068]        load_module+0x2703/0x2990
      <4>[  462.590074]        __do_sys_finit_module+0xad/0x110
      <4>[  462.590080]        do_syscall_64+0x33/0x80
      <4>[  462.590089]        entry_SYSCALL_64_after_hwframe+0x44/0xae
      <4>[  462.590096]
                        -> #1 (fs_reclaim){+.+.}-{0:0}:
      <4>[  462.590109]        fs_reclaim_acquire+0x9f/0xd0
      <4>[  462.590118]        kmem_cache_alloc_trace+0x3d/0x430
      <4>[  462.590126]        intel_cpuc_prepare+0x3b/0x1b0
      <4>[  462.590133]        cpuhp_invoke_callback+0x9e/0x890
      <4>[  462.590141]        _cpu_up+0xa4/0x130
      <4>[  462.590147]        cpu_up+0x82/0x90
      <4>[  462.590153]        bringup_nonboot_cpus+0x4a/0x60
      <4>[  462.590159]        smp_init+0x21/0x5c
      <4>[  462.590167]        kernel_init_freeable+0x8a/0x1b7
      <4>[  462.590175]        kernel_init+0x5/0xff
      <4>[  462.590181]        ret_from_fork+0x22/0x30
      <4>[  462.590187]
                        -> #0 (cpu_hotplug_lock){++++}-{0:0}:
      <4>[  462.590199]        __lock_acquire+0x1520/0x2590
      <4>[  462.590207]        lock_acquire+0xd1/0x3d0
      <4>[  462.590213]        cpus_read_lock+0x39/0xc0
      <4>[  462.590219]        stop_machine+0x12/0x30
      <4>[  462.590226]        bxt_vtd_ggtt_insert_entries__BKL+0x36/0x50 [i915]
      <4>[  462.590601]        ggtt_bind_vma+0x5d/0x80 [i915]
      <4>[  462.590970]        i915_vma_bind+0xdc/0x1c0 [i915]
      <4>[  462.591374]        i915_vma_pin_ww+0x435/0xb40 [i915]
      <4>[  462.591779]        make_obj_busy+0xcb/0x330 [i915]
      <4>[  462.592170]        igt_mmap_offset_exhaustion+0x45f/0x4c0 [i915]
      <4>[  462.592562]        __i915_subtests.cold.7+0x42/0x92 [i915]
      <4>[  462.592995]        __run_selftests.part.3+0x10d/0x172 [i915]
      <4>[  462.593428]        i915_live_selftests.cold.5+0x1f/0x47 [i915]
      <4>[  462.593860]        i915_pci_probe+0x93/0x1d0 [i915]
      <4>[  462.594210]        pci_device_probe+0x9e/0x110
      <4>[  462.594217]        really_probe+0xea/0x410
      <4>[  462.594226]        driver_probe_device+0xd9/0x140
      <4>[  462.594233]        device_driver_attach+0x4a/0x50
      <4>[  462.594240]        __driver_attach+0x83/0x140
      <4>[  462.594247]        bus_for_each_dev+0x75/0xc0
      <4>[  462.594254]        bus_add_driver+0x14b/0x1f0
      <4>[  462.594260]        driver_register+0x66/0xb0
      <4>[  462.594267]        i915_init+0x70/0x87 [i915]
      <4>[  462.594586]        do_one_initcall+0x56/0x2e0
      <4>[  462.594592]        do_init_module+0x55/0x200
      <4>[  462.594599]        load_module+0x2703/0x2990
      <4>[  462.594605]        __do_sys_finit_module+0xad/0x110
      <4>[  462.594612]        do_syscall_64+0x33/0x80
      <4>[  462.594618]        entry_SYSCALL_64_after_hwframe+0x44/0xae
      <4>[  462.594625]
                        other info that might help us debug this:
      
      <4>[  462.594629] Chain exists of:
                          cpu_hotplug_lock --> fs_reclaim --> &vm->mutex/1
      
      <4>[  462.594645]  Possible unsafe locking scenario:
      
      <4>[  462.594648]        CPU0                    CPU1
      <4>[  462.594652]        ----                    ----
      <4>[  462.594655]   lock(&vm->mutex/1);
      <4>[  462.594664]                                lock(fs_reclaim);
      <4>[  462.594671]                                lock(&vm->mutex/1);
      <4>[  462.594679]   lock(cpu_hotplug_lock);
      <4>[  462.594686]
                         *** DEADLOCK ***
      
      <4>[  462.594690] 4 locks held by i915_selftest/5540:
      <4>[  462.594696]  #0: ffff888100fbc240 (&dev->mutex){....}-{3:3}, at: device_driver_attach+0x18/0x50
      <4>[  462.594715]  #1: ffffc900006cb9a0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: make_obj_busy+0x81/0x330 [i915]
      <4>[  462.595118]  #2: ffff88812a6081e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: make_obj_busy+0x21f/0x330 [i915]
      <4>[  462.595519]  #3: ffff888125369c70 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x38e/0xb40 [i915]
      <4>[  462.595934]
                        stack backtrace:
      <4>[  462.595939] CPU: 0 PID: 5540 Comm: i915_selftest Tainted: G     U            5.12.0-rc5-CI-Trybot_7644+ #1
      <4>[  462.595947] Hardware name: GOOGLE Kefka/Kefka, BIOS MrChromebox 02/04/2018
      <4>[  462.595952] Call Trace:
      <4>[  462.595961]  dump_stack+0x7f/0xad
      <4>[  462.595974]  check_noncircular+0x12e/0x150
      <4>[  462.595982]  ? save_stack.isra.17+0x3f/0x70
      <4>[  462.595991]  ? drm_mm_insert_node_in_range+0x34a/0x5b0
      <4>[  462.596000]  ? i915_vma_pin_ww+0x9ec/0xb40 [i915]
      <4>[  462.596410]  __lock_acquire+0x1520/0x2590
      <4>[  462.596419]  ? do_init_module+0x55/0x200
      <4>[  462.596429]  lock_acquire+0xd1/0x3d0
      <4>[  462.596435]  ? stop_machine+0x12/0x30
      <4>[  462.596445]  ? gen8_ggtt_insert_entries+0xf0/0xf0 [i915]
      <4>[  462.596816]  cpus_read_lock+0x39/0xc0
      <4>[  462.596824]  ? stop_machine+0x12/0x30
      <4>[  462.596831]  stop_machine+0x12/0x30
      <4>[  462.596839]  bxt_vtd_ggtt_insert_entries__BKL+0x36/0x50 [i915]
      <4>[  462.597210]  ggtt_bind_vma+0x5d/0x80 [i915]
      <4>[  462.597580]  i915_vma_bind+0xdc/0x1c0 [i915]
      <4>[  462.597986]  i915_vma_pin_ww+0x435/0xb40 [i915]
      <4>[  462.598395]  ? make_obj_busy+0xcb/0x330 [i915]
      <4>[  462.598786]  make_obj_busy+0xcb/0x330 [i915]
      <4>[  462.599180]  ? 0xffffffff81000000
      <4>[  462.599187]  ? debug_mutex_unlock+0x50/0xa0
      <4>[  462.599198]  igt_mmap_offset_exhaustion+0x45f/0x4c0 [i915]
      <4>[  462.599592]  __i915_subtests.cold.7+0x42/0x92 [i915]
      <4>[  462.600026]  ? i915_perf_selftests+0x20/0x20 [i915]
      <4>[  462.600422]  ? __i915_nop_setup+0x10/0x10 [i915]
      <4>[  462.600820]  __run_selftests.part.3+0x10d/0x172 [i915]
      <4>[  462.601253]  i915_live_selftests.cold.5+0x1f/0x47 [i915]
      <4>[  462.601686]  i915_pci_probe+0x93/0x1d0 [i915]
      <4>[  462.602037]  ? _raw_spin_unlock_irqrestore+0x3d/0x60
      <4>[  462.602047]  pci_device_probe+0x9e/0x110
      <4>[  462.602057]  really_probe+0xea/0x410
      <4>[  462.602067]  driver_probe_device+0xd9/0x140
      <4>[  462.602075]  device_driver_attach+0x4a/0x50
      <4>[  462.602084]  __driver_attach+0x83/0x140
      <4>[  462.602091]  ? device_driver_attach+0x50/0x50
      <4>[  462.602099]  ? device_driver_attach+0x50/0x50
      <4>[  462.602107]  bus_for_each_dev+0x75/0xc0
      <4>[  462.602116]  bus_add_driver+0x14b/0x1f0
      <4>[  462.602124]  driver_register+0x66/0xb0
      <4>[  462.602133]  i915_init+0x70/0x87 [i915]
      <4>[  462.602453]  ? 0xffffffffa0606000
      <4>[  462.602458]  do_one_initcall+0x56/0x2e0
      <4>[  462.602466]  ? kmem_cache_alloc_trace+0x374/0x430
      <4>[  462.602476]  do_init_module+0x55/0x200
      <4>[  462.602484]  load_module+0x2703/0x2990
      <4>[  462.602500]  ? __do_sys_finit_module+0xad/0x110
      <4>[  462.602507]  __do_sys_finit_module+0xad/0x110
      <4>[  462.602519]  do_syscall_64+0x33/0x80
      <4>[  462.602527]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      <4>[  462.602535] RIP: 0033:0x7fab69d8d89d
      
      Changes since v1:
      - Add lockdep annotations during init, to ensure that lockdep is primed.
        This also fixes a false positive when reading /proc/lockdep_stats
        during module reload.
      Signed-off-by: default avatarMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210426102351.921874-1-maarten.lankhorst@linux.intel.comReviewed-by: default avatarThomas Hellström <thomas.hellstrom@linux.intel.com>
      bc6f80cc
  3. 26 Apr, 2021 1 commit
  4. 23 Apr, 2021 1 commit
  5. 24 Mar, 2021 2 commits
  6. 23 Dec, 2020 1 commit
  7. 08 Jul, 2020 1 commit
    • Chris Wilson's avatar
      drm/i915/gem: Unpin idle contexts from kswapd reclaim · 09137e94
      Chris Wilson authored
      We removed retiring requests from the shrinker in order to decouple the
      mutexes from reclaim in preparation for unravelling the struct_mutex.
      The impact of not retiring is that we are much less agressive in making
      global objects available for shrinking, as such objects remain pinned
      until they are flushed by a heartbeat pulse following the last retired
      request along their timeline. In order to ensure that pulse occurs in
      time for memory reclamation, we should kick it from kswapd.
      
      The catch is that we have added some flush_work() into the retirement
      phase (to ensure that we reach a global idle in a timely manner), but
      these flush_work() are not eligible (i.e do not belong to WQ_MEM_RELCAIM)
      for use from inside kswapd. To avoid flushing those workqueues, we teach
      the retirer not to do so unless we are actually waiting, and only do the
      plain retire from inside the shrinker.
      
      Note that for execlists, we already retire completed contexts as they
      are scheduled out, so it should not be keeping global state
      unnecessarily pinned. The legacy ringbuffer however...
      
      References: 9e953980 ("drm/i915: Remove waiting & retiring from shrinker paths")
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: default avatarMatthew Auld <matthew.auld@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200708173748.32734-1-chris@chris-wilson.co.uk
      09137e94
  8. 02 Jul, 2020 1 commit
  9. 02 Apr, 2020 1 commit
  10. 27 Feb, 2020 1 commit
  11. 26 Feb, 2020 1 commit
    • Chris Wilson's avatar
      drm/i915: Avoid recursing onto active vma from the shrinker · 23873426
      Chris Wilson authored
      We mark the vma as active while binding it in order to protect outselves
      from being shrunk under mempressure. This only works if we are strict in
      not attempting to shrink active objects.
      
      <6> [472.618968] Workqueue: events_unbound fence_work [i915]
      <4> [472.618970] Call Trace:
      <4> [472.618974]  ? __schedule+0x2e5/0x810
      <4> [472.618978]  schedule+0x37/0xe0
      <4> [472.618982]  schedule_preempt_disabled+0xf/0x20
      <4> [472.618984]  __mutex_lock+0x281/0x9c0
      <4> [472.618987]  ? mark_held_locks+0x49/0x70
      <4> [472.618989]  ? _raw_spin_unlock_irqrestore+0x47/0x60
      <4> [472.619038]  ? i915_vma_unbind+0xae/0x110 [i915]
      <4> [472.619084]  ? i915_vma_unbind+0xae/0x110 [i915]
      <4> [472.619122]  i915_vma_unbind+0xae/0x110 [i915]
      <4> [472.619165]  i915_gem_object_unbind+0x1dc/0x400 [i915]
      <4> [472.619208]  i915_gem_shrink+0x328/0x660 [i915]
      <4> [472.619250]  ? i915_gem_shrink_all+0x38/0x60 [i915]
      <4> [472.619282]  i915_gem_shrink_all+0x38/0x60 [i915]
      <4> [472.619325]  vm_alloc_page.constprop.25+0x1aa/0x240 [i915]
      <4> [472.619330]  ? rcu_read_lock_sched_held+0x4d/0x80
      <4> [472.619363]  ? __alloc_pd+0xb/0x30 [i915]
      <4> [472.619366]  ? module_assert_mutex_or_preempt+0xf/0x30
      <4> [472.619368]  ? __module_address+0x23/0xe0
      <4> [472.619371]  ? is_module_address+0x26/0x40
      <4> [472.619374]  ? static_obj+0x34/0x50
      <4> [472.619376]  ? lockdep_init_map+0x4d/0x1e0
      <4> [472.619407]  setup_page_dma+0xd/0x90 [i915]
      <4> [472.619437]  alloc_pd+0x29/0x50 [i915]
      <4> [472.619470]  __gen8_ppgtt_alloc+0x443/0x6b0 [i915]
      <4> [472.619503]  gen8_ppgtt_alloc+0xd7/0x300 [i915]
      <4> [472.619535]  ppgtt_bind_vma+0x2a/0xe0 [i915]
      <4> [472.619577]  __vma_bind+0x26/0x40 [i915]
      <4> [472.619611]  fence_work+0x1c/0x90 [i915]
      <4> [472.619617]  process_one_work+0x26a/0x620
      
      Fixes: 2850748e ("drm/i915: Pull i915_vma_pin under the vm->mutex")
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200221221818.2861432-1-chris@chris-wilson.co.uk
      (cherry picked from commit 6f24e410)
      Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      23873426
  12. 22 Feb, 2020 1 commit
    • Chris Wilson's avatar
      drm/i915: Avoid recursing onto active vma from the shrinker · 6f24e410
      Chris Wilson authored
      We mark the vma as active while binding it in order to protect outselves
      from being shrunk under mempressure. This only works if we are strict in
      not attempting to shrink active objects.
      
      <6> [472.618968] Workqueue: events_unbound fence_work [i915]
      <4> [472.618970] Call Trace:
      <4> [472.618974]  ? __schedule+0x2e5/0x810
      <4> [472.618978]  schedule+0x37/0xe0
      <4> [472.618982]  schedule_preempt_disabled+0xf/0x20
      <4> [472.618984]  __mutex_lock+0x281/0x9c0
      <4> [472.618987]  ? mark_held_locks+0x49/0x70
      <4> [472.618989]  ? _raw_spin_unlock_irqrestore+0x47/0x60
      <4> [472.619038]  ? i915_vma_unbind+0xae/0x110 [i915]
      <4> [472.619084]  ? i915_vma_unbind+0xae/0x110 [i915]
      <4> [472.619122]  i915_vma_unbind+0xae/0x110 [i915]
      <4> [472.619165]  i915_gem_object_unbind+0x1dc/0x400 [i915]
      <4> [472.619208]  i915_gem_shrink+0x328/0x660 [i915]
      <4> [472.619250]  ? i915_gem_shrink_all+0x38/0x60 [i915]
      <4> [472.619282]  i915_gem_shrink_all+0x38/0x60 [i915]
      <4> [472.619325]  vm_alloc_page.constprop.25+0x1aa/0x240 [i915]
      <4> [472.619330]  ? rcu_read_lock_sched_held+0x4d/0x80
      <4> [472.619363]  ? __alloc_pd+0xb/0x30 [i915]
      <4> [472.619366]  ? module_assert_mutex_or_preempt+0xf/0x30
      <4> [472.619368]  ? __module_address+0x23/0xe0
      <4> [472.619371]  ? is_module_address+0x26/0x40
      <4> [472.619374]  ? static_obj+0x34/0x50
      <4> [472.619376]  ? lockdep_init_map+0x4d/0x1e0
      <4> [472.619407]  setup_page_dma+0xd/0x90 [i915]
      <4> [472.619437]  alloc_pd+0x29/0x50 [i915]
      <4> [472.619470]  __gen8_ppgtt_alloc+0x443/0x6b0 [i915]
      <4> [472.619503]  gen8_ppgtt_alloc+0xd7/0x300 [i915]
      <4> [472.619535]  ppgtt_bind_vma+0x2a/0xe0 [i915]
      <4> [472.619577]  __vma_bind+0x26/0x40 [i915]
      <4> [472.619611]  fence_work+0x1c/0x90 [i915]
      <4> [472.619617]  process_one_work+0x26a/0x620
      
      Fixes: 2850748e ("drm/i915: Pull i915_vma_pin under the vm->mutex")
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200221221818.2861432-1-chris@chris-wilson.co.uk
      6f24e410
  13. 22 Jan, 2020 1 commit
  14. 07 Nov, 2019 1 commit
    • Daniel Vetter's avatar
      drm/i915: Switch obj->mm.lock lockdep annotations on its head · f86dbacb
      Daniel Vetter authored
      The trouble with having a plain nesting flag for locks which do not
      naturally nest (unlike block devices and their partitions, which is
      the original motivation for nesting levels) is that lockdep will
      never spot a true deadlock if you screw up.
      
      This patch is an attempt at trying better, by highlighting a bit more
      of the actual nature of the nesting that's going on. Essentially we
      have two kinds of objects:
      
      - objects without pages allocated, which cannot be on any lru and are
        hence inaccessible to the shrinker.
      
      - objects which have pages allocated, which are on an lru, and which
        the shrinker can decide to throw out.
      
      For the former type of object, memory allocations while holding
      obj->mm.lock are permissible. For the latter they are not. And
      get/put_pages transitions between the two types of objects.
      
      This is still not entirely fool-proof since the rules might change.
      But as long as we run such a code ever at runtime lockdep should be
      able to observe the inconsistency and complain (like with any other
      lockdep class that we've split up in multiple classes). But there are
      a few clear benefits:
      
      - We can drop the nesting flag parameter from
        __i915_gem_object_put_pages, because that function by definition is
        never going allocate memory, and calling it on an object which
        doesn't have its pages allocated would be a bug.
      
      - We strictly catch more bugs, since there's not only one place in the
        entire tree which is annotated with the special class. All the
        other places that had explicit lockdep nesting annotations we're now
        going to leave up to lockdep again.
      
      - Specifically this catches stuff like calling get_pages from
        put_pages (which isn't really a good idea, if we can call get_pages
        so could the shrinker). I've seen patches do exactly that.
      
      Of course I fully expect CI will show me for the fool I am with this
      one here :-)
      
      v2: There can only be one (lockdep only has a cache for the first
      subclass, not for deeper ones, and we don't want to make these locks
      even slower). Still separate enums for better documentation.
      
      Real fix: don't forget about phys objs and pin_map(), and fix the
      shrinker to have the right annotations ... silly me.
      
      v3: Forgot usertptr too ...
      
      v4: Improve comment for pages_pin_count, drop the IMPORTANT comment
      and instead prime lockdep (Chris).
      
      v5: Appease checkpatch, no double empty lines (Chris)
      
      v6: More rebasing over selftest changes. Also somehow I forgot to
      push this patch :-/
      
      Also format comments consistently while at it.
      
      v7: Fix typo in commit message (Joonas)
      
      Also drop the priming, with the lmem merge we now have allocations
      while holding the lmem lock, which wreaks the generic priming I've
      done in earlier patches. Should probably be resurrected when lmem is
      fixed. See
      
      commit 232a6eba
      Author: Matthew Auld <matthew.auld@intel.com>
      Date:   Tue Oct 8 17:01:14 2019 +0100
      
          drm/i915: introduce intel_memory_region
      
      I'm keeping the priming patch locally so it wont get lost.
      
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: "Tang, CQ" <cq.tang@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v5)
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v6)
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@intel.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20191105090148.30269-1-daniel.vetter@ffwll.ch
      [mlankhorst: Fix commit typos pointed out by Michael Ruhl]
      f86dbacb
  15. 09 Oct, 2019 1 commit
    • Qian Cai's avatar
      locking/lockdep: Remove unused @nested argument from lock_release() · 5facae4f
      Qian Cai authored
      Since the following commit:
      
        b4adfe8e ("locking/lockdep: Remove unused argument in __lock_release")
      
      @nested is no longer used in lock_release(), so remove it from all
      lock_release() calls and friends.
      Signed-off-by: default avatarQian Cai <cai@lca.pw>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarWill Deacon <will@kernel.org>
      Acked-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: airlied@linux.ie
      Cc: akpm@linux-foundation.org
      Cc: alexander.levin@microsoft.com
      Cc: daniel@iogearbox.net
      Cc: davem@davemloft.net
      Cc: dri-devel@lists.freedesktop.org
      Cc: duyuyang@gmail.com
      Cc: gregkh@linuxfoundation.org
      Cc: hannes@cmpxchg.org
      Cc: intel-gfx@lists.freedesktop.org
      Cc: jack@suse.com
      Cc: jlbec@evilplan.or
      Cc: joonas.lahtinen@linux.intel.com
      Cc: joseph.qi@linux.alibaba.com
      Cc: jslaby@suse.com
      Cc: juri.lelli@redhat.com
      Cc: maarten.lankhorst@linux.intel.com
      Cc: mark@fasheh.com
      Cc: mhocko@kernel.org
      Cc: mripard@kernel.org
      Cc: ocfs2-devel@oss.oracle.com
      Cc: rodrigo.vivi@intel.com
      Cc: sean@poorly.run
      Cc: st@kernel.org
      Cc: tj@kernel.org
      Cc: tytso@mit.edu
      Cc: vdavydov.dev@gmail.com
      Cc: vincent.guittot@linaro.org
      Cc: viro@zeniv.linux.org.uk
      Link: https://lkml.kernel.org/r/1568909380-32199-1-git-send-email-cai@lca.pwSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5facae4f
  16. 04 Oct, 2019 1 commit
    • Chris Wilson's avatar
      drm/i915: Pull i915_vma_pin under the vm->mutex · 2850748e
      Chris Wilson authored
      Replace the struct_mutex requirement for pinning the i915_vma with the
      local vm->mutex instead. Note that the vm->mutex is tainted by the
      shrinker (we require unbinding from inside fs-reclaim) and so we cannot
      allocate while holding that mutex. Instead we have to preallocate
      workers to do allocate and apply the PTE updates after we have we
      reserved their slot in the drm_mm (using fences to order the PTE writes
      with the GPU work and with later unbind).
      
      In adding the asynchronous vma binding, one subtle requirement is to
      avoid coupling the binding fence into the backing object->resv. That is
      the asynchronous binding only applies to the vma timeline itself and not
      to the pages as that is a more global timeline (the binding of one vma
      does not need to be ordered with another vma, nor does the implicit GEM
      fencing depend on a vma, only on writes to the backing store). Keeping
      the vma binding distinct from the backing store timelines is verified by
      a number of async gem_exec_fence and gem_exec_schedule tests. The way we
      do this is quite simple, we keep the fence for the vma binding separate
      and only wait on it as required, and never add it to the obj->resv
      itself.
      
      Another consequence in reducing the locking around the vma is the
      destruction of the vma is no longer globally serialised by struct_mutex.
      A natural solution would be to add a kref to i915_vma, but that requires
      decoupling the reference cycles, possibly by introducing a new
      i915_mm_pages object that is own by both obj->mm and vma->pages.
      However, we have not taken that route due to the overshadowing lmem/ttm
      discussions, and instead play a series of complicated games with
      trylocks to (hopefully) ensure that only one destruction path is called!
      
      v2: Add some commentary, and some helpers to reduce patch churn.
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-4-chris@chris-wilson.co.uk
      2850748e
  17. 11 Sep, 2019 1 commit
  18. 03 Sep, 2019 1 commit
    • Chris Wilson's avatar
      drm/i915: Replace obj->pin_global with obj->frontbuffer · 5a90606d
      Chris Wilson authored
      obj->pin_global was originally used as a means to keep the shrinker off
      the active scanout, but we use the vma->pin_count itself for that and
      the obj->frontbuffer to delay shrinking active framebuffers. The other
      role that obj->pin_global gained was for spotting display objects inside
      GEM and working harder to keep those coherent; for which we can again
      simply inspect obj->frontbuffer directly.
      
      Coming up next, we will want to manipulate the pin_global counter
      outside of the principle locks, so would need to make pin_global atomic.
      However, since obj->frontbuffer is already managed atomically, it makes
      sense to use that the primary key for display objects instead of having
      pin_global.
      
      Ville pointed out the principle difference is that obj->frontbuffer is
      set for as long as an intel_framebuffer is attached to an object, but
      obj->pin_global was only raised for as long as the object was active. In
      practice, this means that we consider the object as being on the scanout
      for longer than is strictly required, causing us to be more proactive in
      flushing -- though it should be true that we would have flushed
      eventually when the back became the front, except that on the flip path
      that flush is async but when hit from another ioctl it will be
      synchronous.
      
      v2: i915_gem_object_is_framebuffer()
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Reviewed-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190902040303.14195-5-chris@chris-wilson.co.uk
      5a90606d
  19. 06 Aug, 2019 1 commit
  20. 02 Aug, 2019 1 commit
    • Chris Wilson's avatar
      drm/i915: Hide unshrinkable context objects from the shrinker · 1aff1903
      Chris Wilson authored
      The shrinker cannot touch objects used by the contexts (logical state
      and ring). Currently we mark those as "pin_global" to let the shrinker
      skip over them, however, if we remove them from the shrinker lists
      entirely, we don't event have to include them in our shrink accounting.
      
      By keeping the unshrinkable objects in our shrinker tracking, we report
      a large number of objects available to be shrunk, and leave the shrinker
      deeply unsatisfied when we fail to reclaim those. The shrinker will
      persist in trying to reclaim the unavailable objects, forcing the system
      into a livelock (not even hitting the dread oomkiller).
      
      v2: Extend unshrinkable protection for perma-pinned scratch and guc
      allocations (Tvrtko)
      v3: Notice that we should be pinned when marking unshrinkable and so the
      link cannot be empty; merge duplicate paths.
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: default avatarMatthew Auld <matthew.auld@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190802212137.22207-1-chris@chris-wilson.co.uk
      1aff1903
  21. 03 Jul, 2019 1 commit
  22. 21 Jun, 2019 2 commits
  23. 18 Jun, 2019 1 commit
  24. 14 Jun, 2019 3 commits
  25. 12 Jun, 2019 2 commits
  26. 10 Jun, 2019 1 commit
  27. 31 May, 2019 2 commits
    • Chris Wilson's avatar
      drm/i915: Report all objects with allocated pages to the shrinker · d82b4b26
      Chris Wilson authored
      Currently, we try to report to the shrinker the precise number of
      objects (pages) that are available to be reaped at this moment. This
      requires searching all objects with allocated pages to see if they
      fulfill the search criteria, and this count is performed quite
      frequently. (The shrinker tries to free ~128 pages on each invocation,
      before which we count all the objects; counting takes longer than
      unbinding the objects!) If we take the pragmatic view that with
      sufficient desire, all objects are eventually reapable (they become
      inactive, or no longer used as framebuffer etc), we can simply return
      the count of pinned pages maintained during get_pages/put_pages rather
      than walk the lists every time.
      
      The downside is that we may (slightly) over-report the number of
      objects/pages we could shrink and so penalize ourselves by shrinking
      more than required. This is mitigated by keeping the order in which we
      shrink objects such that we avoid penalizing active and frequently used
      objects, and if memory is so tight that we need to free them we would
      need to anyway.
      
      v2: Only expose shrinkable objects to the shrinker; a small reduction in
      not considering stolen and foreign objects.
      v3: Restore the tracking from a "backup" copy from before the gem/ split
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Reviewed-by: default avatarMatthew Auld <matthew.auld@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190530203500.26272-2-chris@chris-wilson.co.uk
      d82b4b26
    • Chris Wilson's avatar
      drm/i915: Track the purgeable objects on a separate eviction list · 3b4fa964
      Chris Wilson authored
      Currently the purgeable objects, I915_MADV_DONTNEED, are mixed in the
      normal bound/unbound lists. Every shrinker pass starts with an attempt
      to purge from this set of unneeded objects, which entails us doing a
      walk over both lists looking for any candidates. If there are none, and
      since we are shrinking we can reasonably assume that the lists are
      full!, this becomes a very slow futile walk.
      
      If we separate out the purgeable objects into own list, this search then
      becomes its own phase that is preferentially handled during shrinking.
      Instead the cost becomes that we then need to filter the purgeable list
      if we want to distinguish between bound and unbound objects.
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Matthew Auld <matthew.william.auld@gmail.com>
      Reviewed-by: default avatarMatthew Auld <matthew.william.auld@gmail.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190530203500.26272-1-chris@chris-wilson.co.uk
      3b4fa964
  28. 28 May, 2019 2 commits
  29. 20 Apr, 2019 1 commit
    • Chris Wilson's avatar
      drm/i915: Start writeback from the shrinker · 2d6692e6
      Chris Wilson authored
      When we are called to relieve mempressue via the shrinker, the only way
      we can make progress is either by discarding unwanted pages (those
      objects that userspace has marked MADV_DONTNEED) or by reclaiming the
      dirty objects via swap. As we know that is the only way to make further
      progress, we can initiate the writeback as we invalidate the objects.
      This means the objects we put onto the inactive anon lru list are
      already marked for reclaim+writeback and so will trigger a wait upon the
      writeback inside direct reclaim, greatly improving the success rate of
      direct reclaim on i915 objects.
      
      The corollary is that we may start a slow swap on opportunistic
      mempressure from the likes of the compaction + migration kthreads. This
      is limited by those threads only being allowed to shrink idle pages, but
      also that if we reactivate the page before it is swapped out by gpu
      activity, we only page the cost of repinning the page. The cost is most
      felt when an object is reused after mempressure, which hopefully
      excludes the latency sensitive tasks (as we are just extending the
      impact of swap thrashing to them).
      
      Apparently this is not the first time we've had this idea. Back in
      commit 5537252b ("drm/i915: Invalidate our pages under memory
      pressure") we wanted to start writeback but settled on invalidate after
      Hugh Dickins warned us about a possibility of a deadlock within shmemfs
      if we started writeback from shrink_slab. Looking at the callchain,
      using writeback from i915_gem_shrink should be equivalent to the pageout
      also employed by shrink_slab, i.e. it should not be any riskier afaict.
      
      v2: Leave mmapings intact. At this point, the only mmapings of our
      objects will be via CPU mmaps on the shmemfs filp, which are
      out-of-scope for our LRU tracking. Instead leave those pages to the
      inactive anon LRU page list for aging and pageout as normal.
      
      v3: Be selective on which paths trigger writeback, in particular
      excluding paths shrinking just to reclaim vm space (e.g. mmap, vmap
      reapers) and avoid starting writeback on the entire process space from
      within the pm freezer.
      
      References: https://bugs.freedesktop.org/show_bug.cgi?id=108686Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> #v1
      Link: https://patchwork.freedesktop.org/patch/msgid/20190420115539.29081-1-chris@chris-wilson.co.uk
      2d6692e6
  30. 28 Jan, 2019 2 commits
    • Chris Wilson's avatar
      drm/i915: Pull VM lists under the VM mutex. · 09d7e46b
      Chris Wilson authored
      A starting point to counter the pervasive struct_mutex. For the goal of
      avoiding (or at least blocking under them!) global locks during user
      request submission, a simple but important step is being able to manage
      each clients GTT separately. For which, we want to replace using the
      struct_mutex as the guard for all things GTT/VM and switch instead to a
      specific mutex inside i915_address_space.
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190128102356.15037-2-chris@chris-wilson.co.uk
      09d7e46b
    • Chris Wilson's avatar
      drm/i915: Stop tracking MRU activity on VMA · 499197dc
      Chris Wilson authored
      Our goal is to remove struct_mutex and replace it with fine grained
      locking. One of the thorny issues is our eviction logic for reclaiming
      space for an execbuffer (or GTT mmaping, among a few other examples).
      While eviction itself is easy to move under a per-VM mutex, performing
      the activity tracking is less agreeable. One solution is not to do any
      MRU tracking and do a simple coarse evaluation during eviction of
      active/inactive, with a loose temporal ordering of last
      insertion/evaluation. That keeps all the locking constrained to when we
      are manipulating the VM itself, neatly avoiding the tricky handling of
      possible recursive locking during execbuf and elsewhere.
      
      Note that discarding the MRU (currently implemented as a pair of lists,
      to avoid scanning the active list for a NONBLOCKING search) is unlikely
      to impact upon our efficiency to reclaim VM space (where we think a LRU
      model is best) as our current strategy is to use random idle replacement
      first before doing a search, and over time the use of softpinned 48b
      per-ppGTT is growing (thereby eliminating any need to perform any eviction
      searches, in theory at least) with the remaining users being found on
      much older devices (gen2-gen6).
      
      v2: Changelog and commentary rewritten to elaborate on the duality of a
      single list being both an inactive and active list.
      v3: Consolidate bool parameters into a single set of flags; don't
      comment on the duality of a single variable being a multiplicity of
      bits.
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190128102356.15037-1-chris@chris-wilson.co.uk
      499197dc
  31. 14 Jan, 2019 2 commits