1. 11 Nov, 2021 3 commits
  2. 10 Nov, 2021 2 commits
  3. 09 Nov, 2021 2 commits
  4. 05 Nov, 2021 4 commits
  5. 04 Nov, 2021 4 commits
  6. 02 Nov, 2021 8 commits
  7. 01 Nov, 2021 2 commits
    • Thomas Hellström's avatar
      drm/i915: Introduce refcounted sg-tables · cad7109a
      Thomas Hellström authored
      As we start to introduce asynchronous failsafe object migration,
      where we update the object state and then submit asynchronous
      commands we need to record what memory resources are actually used
      by various part of the command stream. Initially for three purposes:
      
      1) Error capture.
      2) Asynchronous migration error recovery.
      3) Asynchronous vma bind.
      
      At the time where these happens, the object state may have been updated
      to be several migrations ahead and object sg-tables discarded.
      
      In order to make it possible to keep sg-tables with memory resource
      information for these operations, introduce refcounted sg-tables that
      aren't freed until the last user is done with them.
      
      The alternative would be to reference information sitting on the
      corresponding ttm_resources which typically have the same lifetime as
      these refcountes sg_tables, but that leads to other awkward constructs:
      Due to the design direction chosen for ttm resource managers that would
      lead to diamond-style inheritance, the LMEM resources may sometimes be
      prematurely freed, and finally the subclassed struct ttm_resource would
      have to bleed into the asynchronous vma bind code.
      
      v3:
      - Address a number of style issues (Matthew Auld)
      v4:
      - Dont check for st->sgl being NULL in i915_ttm_tt__shmem_unpopulate(),
        that should never happen. (Matthew Auld)
      v5:
      - Fix a Potential double-free (Matthew Auld)
      Signed-off-by: default avatarThomas Hellström <thomas.hellstrom@linux.intel.com>
      Reviewed-by: default avatarMatthew Auld <matthew.auld@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211101122444.114607-1-thomas.hellstrom@linux.intel.com
      cad7109a
    • Cooper Chiou's avatar
      drm/i915: Enable WaProgramMgsrForCorrectSliceSpecificMmioReads for Gen9 · c7d561cf
      Cooper Chiou authored
      This implements WaProgramMgsrForCorrectSliceSpecificMmioReads which
      was omitted by mistake from Gen9 documentation, while it is actually
      applicable to fused off parts.
      
      Workaround consists of making sure MCR packet control register is
      programmed to point to enabled slice/subslice pair before doing any
      MMIO reads from the affected registers.
      
      Failure do to this can result in complete system hangs when running
      certain workloads. Two known cases which can cause system hangs are:
      
      1. "test_basic progvar_prog_scope_uninit" test which is part of
          Khronos OpenCL conformance suite
          (https://github.com/KhronosGroup/OpenCL-CTS) with the Intel
          OpenCL driver (https://github.com/intel/compute-runtime).
      
      2. VP8 media hardware encoding using the full-feature build of the
          Intel media-driver (https://github.com/intel/media-driver) and
          ffmpeg.
      
      For the former case patch was verified to fix the hard system hang
      when executing the OCL test on Intel Pentium CPU 6405U which contains
      fused off GT1 graphics.
      
      Reference: HSD#1508045018,1405586840, BSID#0575
      
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Cc: William Tseng <william.tseng@intel.com>
      Cc: Shawn C Lee <shawn.c.lee@intel.com>
      Cc: Pawel Wilma <pawel.wilma@intel.com>
      Signed-off-by: default avatarCooper Chiou <cooper.chiou@intel.com>
      Reviewed-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Acked-by: default avatarJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Signed-off-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211025042623.3876-1-cooper.chiou@intel.com
      c7d561cf
  8. 29 Oct, 2021 3 commits
  9. 28 Oct, 2021 2 commits
    • Umesh Nerlige Ramappa's avatar
      drm/i915/pmu: Connect engine busyness stats from GuC to pmu · 77cdd054
      Umesh Nerlige Ramappa authored
      With GuC handling scheduling, i915 is not aware of the time that a
      context is scheduled in and out of the engine. Since i915 pmu relies on
      this info to provide engine busyness to the user, GuC shares this info
      with i915 for all engines using shared memory. For each engine, this
      info contains:
      
      - total busyness: total time that the context was running (total)
      - id: id of the running context (id)
      - start timestamp: timestamp when the context started running (start)
      
      At the time (now) of sampling the engine busyness, if the id is valid
      (!= ~0), and start is non-zero, then the context is considered to be
      active and the engine busyness is calculated using the below equation
      
      	engine busyness = total + (now - start)
      
      All times are obtained from the gt clock base. For inactive contexts,
      engine busyness is just equal to the total.
      
      The start and total values provided by GuC are 32 bits and wrap around
      in a few minutes. Since perf pmu provides busyness as 64 bit
      monotonically increasing values, there is a need for this implementation
      to account for overflows and extend the time to 64 bits before returning
      busyness to the user. In order to do that, a worker runs periodically at
      frequency = 1/8th the time it takes for the timestamp to wrap. As an
      example, that would be once in 27 seconds for a gt clock frequency of
      19.2 MHz.
      
      Note:
      There might be an over-accounting of busyness due to the fact that GuC
      may be updating the total and start values while kmd is reading them.
      (i.e kmd may read the updated total and the stale start). In such a
      case, user may see higher busyness value followed by smaller ones which
      would eventually catch up to the higher value.
      
      v2: (Tvrtko)
      - Include details in commit message
      - Move intel engine busyness function into execlist code
      - Use union inside engine->stats
      - Use natural type for ping delay jiffies
      - Drop active_work condition checks
      - Use for_each_engine if iterating all engines
      - Drop seq locking, use spinlock at GuC level to update engine stats
      - Document worker specific details
      
      v3: (Tvrtko/Umesh)
      - Demarcate GuC and execlist stat objects with comments
      - Document known over-accounting issue in commit
      - Provide a consistent view of GuC state
      - Add hooks to gt park/unpark for GuC busyness
      - Stop/start worker in gt park/unpark path
      - Drop inline
      - Move spinlock and worker inits to GuC initialization
      - Drop helpers that are called only once
      
      v4: (Tvrtko/Matt/Umesh)
      - Drop addressed opens from commit message
      - Get runtime pm in ping, remove from the park path
      - Use cancel_delayed_work_sync in disable_submission path
      - Update stats during reset prepare
      - Skip ping if reset in progress
      - Explicitly name execlists and GuC stats objects
      - Since disable_submission is called from many places, move resetting
        stats to intel_guc_submission_reset_prepare
      
      v5: (Tvrtko)
      - Add a trylock helper that does not sleep and synchronize PMU event
        callbacks and worker with gt reset
      
      v6: (CI BAT failures)
      - DUTs using execlist submission failed to boot since __gt_unpark is
        called during i915 load. This ends up calling the GuC busyness unpark
        hook and results in kick-starting an uninitialized worker. Let
        park/unpark hooks check if GuC submission has been initialized.
      - drop cant_sleep() from trylock helper since rcu_read_lock takes care
        of that.
      
      v7: (CI) Fix igt@i915_selftest@live@gt_engines
      - For GuC mode of submission the engine busyness is derived from gt time
        domain. Use gt time elapsed as reference in the selftest.
      - Increase busyness calculation to 10ms duration to ensure batch runs
        longer and falls within the busyness tolerances in selftest.
      
      v8:
      - Use ktime_get in selftest as before
      - intel_reset_trylock_no_wait results in a lockdep splat that is not
        trivial to fix since the PMU callback runs in irq context and the
        reset paths are tightly knit into the driver. The test that uncovers
        this is igt@perf_pmu@faulting-read. Drop intel_reset_trylock_no_wait,
        instead use the reset_count to synchronize with gt reset during pmu
        callback. For the ping, continue to use intel_reset_trylock since ping
        is not run in irq context.
      
      - GuC PM timestamp does not tick when GuC is idle. This can potentially
        result in wrong busyness values when a context is active on the
        engine, but GuC is idle. Use the RING TIMESTAMP as GPU timestamp to
        process the GuC busyness stats. This works since both GuC timestamp and
        RING timestamp are synced with the same clock.
      
      - The busyness stats may get updated after the batch starts running.
        This delay causes the busyness reported for 100us duration to fall
        below 95% in the selftest. The only option at this time is to wait for
        GuC busyness to change from idle to active before we sample busyness
        over a 100us period.
      Signed-off-by: default avatarJohn Harrison <John.C.Harrison@Intel.com>
      Signed-off-by: default avatarUmesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
      Acked-by: default avatarTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: default avatarMatthew Brost <matthew.brost@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211027004821.66097-2-umesh.nerlige.ramappa@intel.com
      77cdd054
    • Umesh Nerlige Ramappa's avatar
      drm/i915/pmu: Add a name to the execlists stats · 344e6947
      Umesh Nerlige Ramappa authored
      In preparation for GuC pmu stats, add a name to the execlists stats
      structure so that it can be differentiated from the GuC stats.
      Signed-off-by: default avatarUmesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
      Reviewed-by: default avatarMatthew Brost <matthew.brost@intel.com>
      Signed-off-by: default avatarJohn Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211027004821.66097-1-umesh.nerlige.ramappa@intel.com
      344e6947
  10. 27 Oct, 2021 2 commits
  11. 25 Oct, 2021 2 commits
  12. 22 Oct, 2021 6 commits
    • Matthew Brost's avatar
      drm/i915/guc: Fix recursive lock in GuC submission · 12a9917e
      Matthew Brost authored
      Use __release_guc_id (lock held) rather than release_guc_id (acquires
      lock), add lockdep annotations.
      
      213.280129] i915: Running i915_perf_live_selftests/live_noa_gpr
      [ 213.283459] ============================================
      [ 213.283462] WARNING: possible recursive locking detected
      {{[ 213.283466] 5.15.0-rc6+ #18 Tainted: G U W }}
      [ 213.283470] --------------------------------------------
      [ 213.283472] kworker/u24:0/8 is trying to acquire lock:
      [ 213.283475] ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x2df/0x350 [i915]
      {{[ 213.283618] }}
      {{ but task is already holding lock:}}
      [ 213.283621] ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x4f/0x350 [i915]
      {{[ 213.283720] }}
      {{ other info that might help us debug this:}}
      [ 213.283724] Possible unsafe locking scenario:[ 213.283727] CPU0
      [ 213.283728] ----
      [ 213.283730] lock(&guc->submission_state.lock);
      [ 213.283734] lock(&guc->submission_state.lock);
      {{[ 213.283737] }}
      {{ *** DEADLOCK ***}}[ 213.283740] May be due to missing lock nesting notation[ 213.283744] 3 locks held by kworker/u24:0/8:
      [ 213.283747] #0: ffff8ffb80059d38 ((wq_completion)events_unbound){..}-{0:0}, at: process_one_work+0x1f3/0x550
      [ 213.283757] #1: ffffb509000e3e78 ((work_completion)(&guc->submission_state.destroyed_worker)){..}-{0:0}, at: process_one_work+0x1f3/0x550
      [ 213.283766] #2: ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x4f/0x350 [i915]
      {{[ 213.283860] }}
      {{ stack backtrace:}}
      [ 213.283863] CPU: 8 PID: 8 Comm: kworker/u24:0 Tainted: G U W 5.15.0-rc6+ #18
      [ 213.283868] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 0403 01/26/2021
      [ 213.283873] Workqueue: events_unbound destroyed_worker_func [i915]
      [ 213.283957] Call Trace:
      [ 213.283960] dump_stack_lvl+0x57/0x72
      [ 213.283966] __lock_acquire.cold+0x191/0x2d3
      [ 213.283972] lock_acquire+0xb5/0x2b0
      [ 213.283978] ? destroyed_worker_func+0x2df/0x350 [i915]
      [ 213.284059] ? destroyed_worker_func+0x2d7/0x350 [i915]
      [ 213.284139] ? lock_release+0xb9/0x280
      [ 213.284143] _raw_spin_lock_irqsave+0x48/0x60
      [ 213.284148] ? destroyed_worker_func+0x2df/0x350 [i915]
      [ 213.284226] destroyed_worker_func+0x2df/0x350 [i915]
      [ 213.284310] process_one_work+0x270/0x550
      [ 213.284315] worker_thread+0x52/0x3b0
      [ 213.284319] ? process_one_work+0x550/0x550
      [ 213.284322] kthread+0x135/0x160
      [ 213.284326] ? set_kthread_struct+0x40/0x40
      [ 213.284331] ret_from_fork+0x1f/0x30
      
      and a bit later in the trace:
      
      {{ 227.499864] do_raw_spin_lock+0x94/0xa0}}
      [ 227.499868] _raw_spin_lock_irqsave+0x50/0x60
      [ 227.499871] ? guc_flush_destroyed_contexts+0x4f/0xf0 [i915]
      [ 227.499995] guc_flush_destroyed_contexts+0x4f/0xf0 [i915]
      [ 227.500104] intel_guc_submission_reset_prepare+0x99/0x4b0 [i915]
      [ 227.500209] ? mark_held_locks+0x49/0x70
      [ 227.500212] intel_uc_reset_prepare+0x46/0x50 [i915]
      [ 227.500320] reset_prepare+0x78/0x90 [i915]
      [ 227.500412] __intel_gt_set_wedged.part.0+0x13/0xe0 [i915]
      [ 227.500485] intel_gt_set_wedged.part.0+0x54/0x100 [i915]
      [ 227.500556] intel_gt_set_wedged_on_fini+0x1a/0x30 [i915]
      [ 227.500622] intel_gt_driver_unregister+0x1e/0x60 [i915]
      [ 227.500694] i915_driver_remove+0x4a/0xf0 [i915]
      [ 227.500767] i915_pci_probe+0x84/0x170 [i915]
      [ 227.500838] local_pci_probe+0x42/0x80
      [ 227.500842] pci_device_probe+0xd9/0x190
      [ 227.500844] really_probe+0x1f2/0x3f0
      [ 227.500847] __driver_probe_device+0xfe/0x180
      [ 227.500848] driver_probe_device+0x1e/0x90
      [ 227.500850] __driver_attach+0xc4/0x1d0
      [ 227.500851] ? __device_attach_driver+0xe0/0xe0
      [ 227.500853] ? __device_attach_driver+0xe0/0xe0
      [ 227.500854] bus_for_each_dev+0x64/0x90
      [ 227.500856] bus_add_driver+0x12e/0x1f0
      [ 227.500857] driver_register+0x8f/0xe0
      [ 227.500859] i915_init+0x1d/0x8f [i915]
      [ 227.500934] ? 0xffffffffc144a000
      [ 227.500936] do_one_initcall+0x58/0x2d0
      [ 227.500938] ? rcu_read_lock_sched_held+0x3f/0x80
      [ 227.500940] ? kmem_cache_alloc_trace+0x238/0x2d0
      [ 227.500944] do_init_module+0x5c/0x270
      [ 227.500946] __do_sys_finit_module+0x95/0xe0
      [ 227.500949] do_syscall_64+0x38/0x90
      [ 227.500951] entry_SYSCALL_64_after_hwframe+0x44/0xae
      [ 227.500953] RIP: 0033:0x7ffa59d2ae0d
      [ 227.500954] Code: c8 0c 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3b 80 0c 00 f7 d8 64 89 01 48
      [ 227.500955] RSP: 002b:00007fff320bbf48 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      [ 227.500956] RAX: ffffffffffffffda RBX: 00000000022ea710 RCX: 00007ffa59d2ae0d
      [ 227.500957] RDX: 0000000000000000 RSI: 00000000022e1d90 RDI: 0000000000000004
      [ 227.500958] RBP: 0000000000000020 R08: 00007ffa59df3a60 R09: 0000000000000070
      [ 227.500958] R10: 00000000022e1d90 R11: 0000000000000246 R12: 00000000022e1d90
      [ 227.500959] R13: 00000000022e58e0 R14: 0000000000000043 R15: 00000000022e42c0
      
      v2:
       (CI build)
        - Fix build error
      
      Fixes: 1a52faed ("drm/i915/guc: Take GT PM ref when deregistering context")
      Signed-off-by: default avatarMatthew Brost <matthew.brost@intel.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarThomas Hellström <thomas.hellstrom@linux.intel.com>
      Signed-off-by: default avatarJohn Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211020192147.8048-1-matthew.brost@intel.com
      12a9917e
    • Matthew Brost's avatar
      drm/i915/selftests: Update live.evict to wait on requests / idle GPU after each loop · 393211e1
      Matthew Brost authored
      Update live.evict to wait on last request and idle GPU after each loop.
      This not only enhances the test to fill the GGTT on each engine class
      but also avoid timeouts from igt_flush_test when using GuC submission.
      igt_flush_test (idle GPU) can take a long time with GuC submission if
      losts of contexts are created due to H2G / G2H required to destroy
      contexts.
      Signed-off-by: default avatarMatthew Brost <matthew.brost@intel.com>
      Reviewed-by: default avatarThomas Hellström <thomas.hellstrom@linux.intel.com>
      Signed-off-by: default avatarJohn Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211021214040.33292-1-matthew.brost@intel.com
      393211e1
    • Matthew Brost's avatar
      drm/i915/selftests: Increase timeout in requests perf selftest · 7c287113
      Matthew Brost authored
      perf_parallel_engines is micro benchmark to test i915 request
      scheduling. The test creates a thread per physical engine and submits
      NOP requests and waits the requests to complete in a loop. In execlists
      mode this works perfectly fine as powerful CPU has enough cores to feed
      each engine and process the CSBs. With GuC submission the uC gets
      overwhelmed as all threads feed into a single CTB channel and the GuC
      gets bombarded with CSBs as contexts are immediately switched in and out
      on the engines due to the zero runtime of the requests. When the GuC is
      overwhelmed scheduling of contexts is unfair due to the nature of the
      GuC scheduling algorithm. This behavior is understood and deemed
      acceptable as this micro benchmark isn't close to real world use case.
      Increasing the timeout of wait period for requests to complete. This
      makes the test understand that is ok for contexts to get starved in this
      scenario.
      
      A future patch / cleanup may just delete these micro benchmark tests as
      they basically mean nothing. We care about real workloads not made up
      ones.
      Signed-off-by: default avatarMatthew Brost <matthew.brost@intel.com>
      Reviewed-by: default avatarJohn Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: default avatarThomas Hellström <thomas.hellstrom@linux.intel.com>
      Signed-off-by: default avatarJohn Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211011175704.28509-1-matthew.brost@intel.com
      7c287113
    • Matthew Auld's avatar
      drm/i915/ttm: enable shmem tt backend · 5d12ffe6
      Matthew Auld authored
      Turn on the shmem tt backend, and enable shrinking.
      Signed-off-by: default avatarMatthew Auld <matthew.auld@intel.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Reviewed-by: default avatarThomas Hellström <thomas.hellstrom@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-8-matthew.auld@intel.com
      5d12ffe6
    • Matthew Auld's avatar
      drm/i915/ttm: use cached system pages when evicting lmem · 2eda4fc6
      Matthew Auld authored
      This should let us do an accelerated copy directly to the shmem pages
      when temporarily moving lmem-only objects, where the i915-gem shrinker
      can later kick in to swap out the pages, if needed.
      Signed-off-by: default avatarMatthew Auld <matthew.auld@intel.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Reviewed-by: default avatarThomas Hellström <thomas.hellstrom@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-7-matthew.auld@intel.com
      2eda4fc6
    • Matthew Auld's avatar
      drm/i915/ttm: move shrinker management into adjust_lru · ebd4a8ec
      Matthew Auld authored
      We currently just evict lmem objects to system memory when under memory
      pressure. For this case we might lack the usual object mm.pages, which
      effectively hides the pages from the i915-gem shrinker, until we
      actually "attach" the TT to the object, or in the case of lmem-only
      objects it just gets migrated back to lmem when touched again.
      
      For all cases we can just adjust the i915 shrinker LRU each time we also
      adjust the TTM LRU. The two cases we care about are:
      
        1) When something is moved by TTM, including when initially populating
           an object. Importantly this covers the case where TTM moves something from
           lmem <-> smem, outside of the normal get_pages() interface, which
           should still ensure the shmem pages underneath are reclaimable.
      
        2) When calling into i915_gem_object_unlock(). The unlock should
           ensure the object is removed from the shinker LRU, if it was indeed
           swapped out, or just purged, when the shrinker drops the object lock.
      
      v2(Thomas):
        - Handle managing the shrinker LRU in adjust_lru, where it is always
          safe to touch the object.
      v3(Thomas):
        - Pretty much a re-write. This time piggy back off the shrink_pin
          stuff, which actually seems to fit quite well for what we want here.
      v4(Thomas):
        - Just use a simple boolean for tracking ttm_shrinkable.
      v5:
        - Ensure we call adjust_lru when faulting the object, to ensure the
          pages are visible to the shrinker, if needed.
        - Add back the adjust_lru when in i915_ttm_move (Thomas)
      v6(Reported-by: kernel test robot <lkp@intel.com>):
        - Remove unused i915_tt
      Signed-off-by: default avatarMatthew Auld <matthew.auld@intel.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #v4
      Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-6-matthew.auld@intel.com
      ebd4a8ec