1. 02 Mar, 2020 27 commits
  2. 28 Feb, 2020 13 commits
    • drm/i915/gt: Expose heartbeat interval via sysfs · 9a40bddd
      Chris Wilson authored
      We monitor the health of the system via periodic heartbeat pulses. The
      pulses also provide the opportunity to perform garbage collection.
      However, we interpret an incomplete pulse (a missed heartbeat) as an
      indication that the system is no longer responsive, i.e. hung, and
      perform an engine or full GPU reset. Given that the preemption
      granularity can be very coarse on a system, we let the sysadmin override
      our legacy timeouts which were "optimised" for desktop applications.
      
      The heartbeat interval can be adjusted per-engine using,
      
      	/sys/class/drm/card?/engine/*/heartbeat_interval_ms
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Steve Carbonari <steven.carbonari@intel.com>
      Tested-by: Steve Carbonari <steven.carbonari@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200228131716.3243616-7-chris@chris-wilson.co.uk
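      As a rough illustration of how a sysadmin script might survey the
      per-engine heartbeat interval described in this commit, the sketch
      below globs the engine directories under one card and parses each
      value as an integer. The helper name read_engine_tunable is
      hypothetical, and the assumption that each file holds a single decimal
      integer is not confirmed by the commit message.

```python
from pathlib import Path


def read_engine_tunable(card_dir, name):
    """Return {engine_name: int_value} for one per-engine sysfs attribute.

    card_dir is a directory such as /sys/class/drm/card0 (which card is the
    i915 device is machine specific). Assumes each attribute file holds one
    decimal integer.
    """
    values = {}
    for attr in sorted(Path(card_dir).glob(f"engine/*/{name}")):
        values[attr.parent.name] = int(attr.read_text().strip())
    return values
```

      For example, read_engine_tunable("/sys/class/drm/card0",
      "heartbeat_interval_ms") would return one entry per engine on a system
      where card0 exposes these files.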
    • drm/i915/gt: Expose preempt reset timeout via sysfs · db3d8338
      Chris Wilson authored
      After initialising a preemption request, we give the current resident a
      small amount of time to vacate the GPU. The preemption request is for a
      higher priority context and should be immediate to maintain high
      quality of service (and avoid priority inversion). However, the
      preemption granularity of the GPU can be quite coarse and so we need a
      compromise.
      
      The preempt timeout can be adjusted per-engine using,
      
      	/sys/class/drm/card?/engine/*/preempt_timeout_ms
      
      and can be disabled by setting it to 0.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Steve Carbonari <steven.carbonari@intel.com>
      Tested-by: Steve Carbonari <steven.carbonari@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200228131716.3243616-6-chris@chris-wilson.co.uk
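      Writing the preempt timeout could look roughly like the sketch below.
      The helper name set_preempt_timeout is hypothetical; the only facts
      taken from the commit message are the file name and that 0 disables
      the timeout. Writing to sysfs normally requires root, and the
      read-back assumes the file returns a plain decimal integer.

```python
from pathlib import Path


def set_preempt_timeout(engine_dir, timeout_ms):
    """Write preempt_timeout_ms for one engine; 0 disables the timeout.

    engine_dir is a directory such as /sys/class/drm/card0/engine/rcs0.
    Returns the value read back, so the caller sees what was stored.
    """
    if timeout_ms < 0:
        raise ValueError("timeout must be >= 0 ms (0 disables it)")
    attr = Path(engine_dir) / "preempt_timeout_ms"
    attr.write_text(f"{timeout_ms}\n")
    return int(attr.read_text().strip())
```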
    • drm/i915/gt: Expose reset stop timeout via sysfs · 72338a1f
      Chris Wilson authored
      Allowing ourselves to sleep before a GPU reset after disabling
      submission, even for a few milliseconds, gives an innocent context the
      opportunity to clear the GPU before the reset occurs. However, how long
      to sleep depends on the typical non-preemptible duration (a similar
      problem to determining the ideal preempt-reset timeout or even the
      heartbeat interval). As this seems to be a hard policy decision, punt
      it to userspace.
      
      The timeout can be adjusted using
      
      	/sys/class/drm/card?/engine/*/stop_timeout_ms
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Reviewed-by: Steve Carbonari <steven.carbonari@intel.com>
      Tested-by: Steve Carbonari <steven.carbonari@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200228131716.3243616-5-chris@chris-wilson.co.uk
    • drm/i915/gt: Expose busywait duration to sysfs · 062444bb
      Chris Wilson authored
      We busywait on an inflight request (one that is currently executing on
      HW, and so might complete quickly) prior to setting up an interrupt and
      sleeping. The trade off is that we keep an expensive CPU core busy in
      order to avoid wake up latency: where that trade off should lie is best
      left to the sysadmin.
      
      The busywait mechanism can be compiled out with
      
      	./scripts/config --set-val DRM_I915_SPIN_REQUEST 0
      
      The maximum busywait duration can be adjusted per-engine using,
      
      	/sys/class/drm/card?/engine/*/max_busywait_duration_ns
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: Steve Carbonari <steven.carbonari@intel.com>
      Tested-by: Steve Carbonari <steven.carbonari@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200228131716.3243616-4-chris@chris-wilson.co.uk
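      The busywait trade-off described in this commit can be sketched in
      userspace terms: poll for completion for a bounded time, then fall
      back to a blocking wait. This is only an analogy, not the driver's
      code; threading.Event stands in for the request, and the function
      name and return strings are hypothetical.

```python
import threading
import time


def wait_for_completion(done, max_busywait_ns, timeout_s=None):
    """Spin on `done` for up to max_busywait_ns, then block until it is set.

    Returns "spin" if the busywait observed the completion, "sleep" if we
    had to block for it, or "timeout" if the event never fired.
    """
    deadline = time.monotonic_ns() + max_busywait_ns
    while time.monotonic_ns() < deadline:
        if done.is_set():
            return "spin"
    # Busywait budget exhausted: fall back to a blocking wait, the analogue
    # of setting up an interrupt and sleeping.
    return "sleep" if done.wait(timeout_s) else "timeout"
```

      A larger budget keeps a CPU core busy for longer but avoids the
      wake-up latency of the blocking path; a budget of 0 skips the spin
      entirely.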
    • drm/i915/gt: Expose timeslice duration to sysfs · 1a2695a7
      Chris Wilson authored
      Execlists uses a scheduling quantum (a timeslice) to alternate execution
      between ready-to-run contexts of equal priority. This ensures that all
      users (though only if they are of equal importance) have the opportunity to
      run and prevents livelocks where contexts may have implicit ordering due
      to userspace semaphores.
      
      The timeslicing mechanism can be compiled out with
      
      	./scripts/config --set-val DRM_I915_TIMESLICE_DURATION 0
      
      The timeslice duration can be adjusted per-engine using,
      
      	/sys/class/drm/card?/engine/*/timeslice_duration_ms
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: Steve Carbonari <steven.carbonari@intel.com>
      Tested-by: Steve Carbonari <steven.carbonari@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200228131716.3243616-3-chris@chris-wilson.co.uk
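      To make the effect of the timeslice concrete, here is a toy
      round-robin model of equal-priority contexts sharing one engine. It is
      a simplified sketch, not the execlists scheduler: the function name is
      hypothetical, priorities and preemption are ignored, and treating a
      duration of 0 as "timeslicing disabled" is an assumption carried over
      from the Kconfig setting above.

```python
from collections import deque


def run_timesliced(work_ms, timeslice_ms):
    """Round-robin equal-priority contexts; return the order of execution.

    work_ms maps a context name to its outstanding work in milliseconds.
    Each entry in the result is (context, ms_run_in_this_slice).
    """
    if timeslice_ms <= 0:
        # Timeslicing disabled: each context runs to completion in turn.
        return [(name, ms) for name, ms in work_ms.items() if ms > 0]
    queue = deque((name, ms) for name, ms in work_ms.items() if ms > 0)
    schedule = []
    while queue:
        name, remaining = queue.popleft()
        ran = min(remaining, timeslice_ms)
        schedule.append((name, ran))
        if remaining > ran:
            queue.append((name, remaining - ran))  # back of the line
    return schedule
```

      With a short timeslice, a context with little work gets to run before
      a long-running peer has finished, which is the fairness property the
      commit message describes.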
    • drm/i915/gt: Expose engine->mmio_base via sysfs · 6e57cc39
      Chris Wilson authored
      Use the per-engine sysfs directory to let userspace discover the
      mmio_base of each engine. Prior to recent generations, the user
      accessible registers on each engine are at a fixed offset relative to
      each engine -- but require absolute addressing. As the absolute address
      depends on the actual physical engine, this is not always possible to
      determine from userspace (for example icl may expose vcs1 or vcs2 as the
      second vcs engine). Make this easy for userspace to discover by
      providing the mmio_base in sysfs.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Reviewed-by: Steve Carbonari <steven.carbonari@intel.com>
      Tested-by: Steve Carbonari <steven.carbonari@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200228131716.3243616-2-chris@chris-wilson.co.uk
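      Userspace that needs an absolute register address could combine the
      discovered base with a known engine-relative offset, roughly as
      sketched below. The helper name absolute_register is hypothetical, and
      the text format of the mmio_base file is an assumption, so the parser
      accepts both hexadecimal and decimal.

```python
from pathlib import Path


def absolute_register(engine_dir, relative_offset):
    """Return mmio_base + relative_offset for one engine.

    engine_dir is a directory such as /sys/class/drm/card0/engine/vcs1.
    int(text, 0) accepts both "0x2000" and "8192", because the exact
    format of the sysfs file is an assumption here.
    """
    base = int((Path(engine_dir) / "mmio_base").read_text().strip(), 0)
    return base + relative_offset
```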
    • drm/i915/gt: Expose engine properties via sysfs · 4ec76dbe
      Chris Wilson authored
      Preliminary stub to add engines underneath /sys/class/drm/cardN/, so
      that we can expose properties on each engine to the sysadmin.
      
      To start with we have basic analogues of the i915_query ioctl so that we
      can pretty print engine discovery from the shell, and flesh out the
      directory structure. Later we will add writeable sysadmin properties such
      as per-engine timeout controls.
      
      An example tree of the engine properties on Braswell:
          /sys/class/drm/card0
          └── engine
              ├── bcs0
              │   ├── capabilities
              │   ├── class
              │   ├── instance
              │   ├── known_capabilities
              │   └── name
              ├── rcs0
              │   ├── capabilities
              │   ├── class
              │   ├── instance
              │   ├── known_capabilities
              │   └── name
              ├── vcs0
              │   ├── capabilities
              │   ├── class
              │   ├── instance
              │   ├── known_capabilities
              │   └── name
              └── vecs0
                  ├── capabilities
                  ├── class
                  ├── instance
                  ├── known_capabilities
                  └── name
      
      v2: Include stringified capabilities
      v3: Include all known capabilities for futureproofing.
      v4: Combine the two caps loops into one
      
      v5: Hide underneath Kconfig.unstable for wider discussion
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Tested-by: Steve Carbonari <steven.carbonari@intel.com>
      Reviewed-by: Steve Carbonari <steven.carbonari@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200228131716.3243616-1-chris@chris-wilson.co.uk
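      The example tree in this commit message could be gathered from a
      script along the following lines. This is a minimal sketch: the
      function name describe_engines is hypothetical, and it assumes every
      regular file in an engine directory is readable as short text.

```python
from pathlib import Path


def describe_engines(card_dir):
    """Return {engine: {attribute: text}} for every engine under card_dir.

    card_dir is a directory such as /sys/class/drm/card0 that contains an
    "engine" subdirectory with one directory per engine.
    """
    engines = {}
    for engine in sorted((Path(card_dir) / "engine").iterdir()):
        if engine.is_dir():
            engines[engine.name] = {
                attr.name: attr.read_text().strip()
                for attr in sorted(engine.iterdir())
                if attr.is_file()
            }
    return engines
```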
    • drm/i915/selftests: Fix return in assert_mmap_offset() · efbf9288
      Dan Carpenter authored
      assert_mmap_offset() returns type bool, so returning an error pointer
      is equivalent to "return true;", i.e. success. If we have an error,
      then we should return false.
      
      Fixes: 3d81d589 ("drm/i915: Test exhaustion of the mmap space")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200228141413.qfjf4abr323drlo4@kili.mountain
    • Ville Syrjälä's avatar
      3c75050e
    • Ville Syrjälä's avatar
      1c5fad61
    • Ville Syrjälä's avatar
    • drm/i915/gt: Reset queue_priority_hint after wedging · 3fc28d3e
      Chris Wilson authored
      An odd and highly unlikely path caught us out. On delayed submission
      (due to an asynchronous reset handler), we poked the priority_hint and
      kicked the tasklet. However, we had already marked the device as wedged
      and swapped out the tasklet for a no-op. The result was that we never
      cleared the priority hint and became upset when we later checked.
      
      <0> [574.303565] i915_sel-6278    2.... 481822445us : __i915_subtests: Running intel_execlists_live_selftests/live_error_interrupt
      <0> [574.303565] i915_sel-6278    2.... 481822472us : __engine_unpark: 0000:00:02.0 rcs0:
      <0> [574.303565] i915_sel-6278    2.... 481822491us : __gt_unpark: 0000:00:02.0
      <0> [574.303565] i915_sel-6278    2.... 481823220us : execlists_context_reset: 0000:00:02.0 rcs0: context:f4ee reset
      <0> [574.303565] i915_sel-6278    2.... 481824830us : __intel_context_active: 0000:00:02.0 rcs0: context:f51b active
      <0> [574.303565] i915_sel-6278    2.... 481825258us : __intel_context_do_pin: 0000:00:02.0 rcs0: context:f51b pin ring:{start:00006000, head:0000, tail:0000}
      <0> [574.303565] i915_sel-6278    2.... 481825311us : __i915_request_commit: 0000:00:02.0 rcs0: fence f51b:2, current 0
      <0> [574.303565] i915_sel-6278    2d..1 481825347us : __i915_request_submit: 0000:00:02.0 rcs0: fence f51b:2, current 0
      <0> [574.303565] i915_sel-6278    2d..1 481825363us : trace_ports: 0000:00:02.0 rcs0: submit { f51b:2, 0:0 }
      <0> [574.303565] i915_sel-6278    2.... 481826809us : __intel_context_active: 0000:00:02.0 rcs0: context:f51c active
      <0> [574.303565]   <idle>-0       7d.h2 481827326us : cs_irq_handler: 0000:00:02.0 rcs0: CS error: 1
      <0> [574.303565]   <idle>-0       7..s1 481827377us : process_csb: 0000:00:02.0 rcs0: cs-irq head=3, tail=4
      <0> [574.303565]   <idle>-0       7..s1 481827379us : process_csb: 0000:00:02.0 rcs0: csb[4]: status=0x10000001:0x00000000
      <0> [574.305593]   <idle>-0       7..s1 481827385us : trace_ports: 0000:00:02.0 rcs0: promote { f51b:2*, 0:0 }
      <0> [574.305611]   <idle>-0       7..s1 481828179us : execlists_reset: 0000:00:02.0 rcs0: reset for CS error
      <0> [574.305611] i915_sel-6278    2.... 481828284us : __intel_context_do_pin: 0000:00:02.0 rcs0: context:f51c pin ring:{start:00007000, head:0000, tail:0000}
      <0> [574.305611] i915_sel-6278    2.... 481828345us : __i915_request_commit: 0000:00:02.0 rcs0: fence f51c:2, current 0
      <0> [574.305611]   <idle>-0       7dNs2 481847823us : __i915_request_unsubmit: 0000:00:02.0 rcs0: fence f51b:2, current 1
      <0> [574.305611]   <idle>-0       7dNs2 481847857us : execlists_hold: 0000:00:02.0 rcs0: fence f51b:2, current 1 on hold
      <0> [574.305611]   <idle>-0       7.Ns1 481847863us : intel_engine_reset: 0000:00:02.0 rcs0: flags=4
      <0> [574.305611]   <idle>-0       7.Ns1 481847945us : execlists_reset_prepare: 0000:00:02.0 rcs0: depth<-1
      <0> [574.305611]   <idle>-0       7.Ns1 481847946us : intel_engine_stop_cs: 0000:00:02.0 rcs0:
      <0> [574.305611]   <idle>-0       7.Ns1 538584284us : intel_engine_stop_cs: 0000:00:02.0 rcs0: timed out on STOP_RING -> IDLE
      <0> [574.305611]   <idle>-0       7.Ns1 538584347us : __intel_gt_reset: 0000:00:02.0 engine_mask=1
      <0> [574.305611]   <idle>-0       7.Ns1 538584406us : execlists_reset_rewind: 0000:00:02.0 rcs0:
      <0> [574.305611]   <idle>-0       7dNs2 538585050us : __i915_request_reset: 0000:00:02.0 rcs0: fence f51b:2, current 1 guilty? yes
      <0> [574.305611]   <idle>-0       7dNs2 538585063us : __execlists_reset: 0000:00:02.0 rcs0: replay {head:0000, tail:0068}
      <0> [574.306565]   <idle>-0       7.Ns1 538588457us : intel_engine_cancel_stop_cs: 0000:00:02.0 rcs0:
      <0> [574.306565]   <idle>-0       7dNs2 538588462us : __i915_request_submit: 0000:00:02.0 rcs0: fence f51c:2, current 0
      <0> [574.306565]   <idle>-0       7dNs2 538588471us : trace_ports: 0000:00:02.0 rcs0: submit { f51c:2, 0:0 }
      <0> [574.306565]   <idle>-0       7.Ns1 538588474us : execlists_reset_finish: 0000:00:02.0 rcs0: depth->1
      <0> [574.306565] kworker/-202     2.... 538588755us : i915_request_retire: 0000:00:02.0 rcs0: fence f51c:2, current 2
      <0> [574.306565] ksoftirq-46      7..s. 538588773us : process_csb: 0000:00:02.0 rcs0: cs-irq head=11, tail=1
      <0> [574.306565] ksoftirq-46      7..s. 538588774us : process_csb: 0000:00:02.0 rcs0: csb[0]: status=0x10000001:0x00000000
      <0> [574.306565] ksoftirq-46      7..s. 538588776us : trace_ports: 0000:00:02.0 rcs0: promote { f51c:2!, 0:0 }
      <0> [574.306565] ksoftirq-46      7..s. 538588778us : process_csb: 0000:00:02.0 rcs0: csb[1]: status=0x10000018:0x00000020
      <0> [574.306565] ksoftirq-46      7..s. 538588779us : trace_ports: 0000:00:02.0 rcs0: completed { f51c:2!, 0:0 }
      <0> [574.306565] kworker/-202     2.... 538588826us : intel_context_unpin: 0000:00:02.0 rcs0: context:f51c unpin
      <0> [574.306565] i915_sel-6278    6.... 538589663us : __intel_gt_set_wedged.part.32: 0000:00:02.0 start
      <0> [574.306565] i915_sel-6278    6.... 538589667us : execlists_reset_prepare: 0000:00:02.0 rcs0: depth<-0
      <0> [574.306565] i915_sel-6278    6.... 538589710us : intel_engine_stop_cs: 0000:00:02.0 rcs0:
      <0> [574.306565] i915_sel-6278    6.... 538589732us : execlists_reset_prepare: 0000:00:02.0 bcs0: depth<-0
      <0> [574.307591] i915_sel-6278    6.... 538589733us : intel_engine_stop_cs: 0000:00:02.0 bcs0:
      <0> [574.307591] i915_sel-6278    6.... 538589757us : execlists_reset_prepare: 0000:00:02.0 vcs0: depth<-0
      <0> [574.307591] i915_sel-6278    6.... 538589758us : intel_engine_stop_cs: 0000:00:02.0 vcs0:
      <0> [574.307591] i915_sel-6278    6.... 538589771us : execlists_reset_prepare: 0000:00:02.0 vcs1: depth<-0
      <0> [574.307591] i915_sel-6278    6.... 538589772us : intel_engine_stop_cs: 0000:00:02.0 vcs1:
      <0> [574.307591] i915_sel-6278    6.... 538589778us : execlists_reset_prepare: 0000:00:02.0 vecs0: depth<-0
      <0> [574.307591] i915_sel-6278    6.... 538589780us : intel_engine_stop_cs: 0000:00:02.0 vecs0:
      <0> [574.307591] i915_sel-6278    6.... 538589786us : __intel_gt_reset: 0000:00:02.0 engine_mask=ff
      <0> [574.307591] i915_sel-6278    6.... 538591175us : execlists_reset_cancel: 0000:00:02.0 rcs0:
      <0> [574.307591] i915_sel-6278    6.... 538591970us : execlists_reset_cancel: 0000:00:02.0 bcs0:
      <0> [574.307591] i915_sel-6278    6.... 538591982us : execlists_reset_cancel: 0000:00:02.0 vcs0:
      <0> [574.307591] i915_sel-6278    6.... 538591996us : execlists_reset_cancel: 0000:00:02.0 vcs1:
      <0> [574.307591] i915_sel-6278    6.... 538592759us : execlists_reset_cancel: 0000:00:02.0 vecs0:
      <0> [574.307591] i915_sel-6278    6.... 538592977us : execlists_reset_finish: 0000:00:02.0 rcs0: depth->0
      <0> [574.307591] i915_sel-6278    6.N.. 538592996us : execlists_reset_finish: 0000:00:02.0 bcs0: depth->0
      <0> [574.307591] i915_sel-6278    6.N.. 538593023us : execlists_reset_finish: 0000:00:02.0 vcs0: depth->0
      <0> [574.307591] i915_sel-6278    6.N.. 538593037us : execlists_reset_finish: 0000:00:02.0 vcs1: depth->0
      <0> [574.307591] i915_sel-6278    6.N.. 538593051us : execlists_reset_finish: 0000:00:02.0 vecs0: depth->0
      <0> [574.307591] i915_sel-6278    6.... 538593407us : __intel_gt_set_wedged.part.32: 0000:00:02.0 end
      <0> [574.307591] kworker/-210     7d..1 551958381us : execlists_unhold: 0000:00:02.0 rcs0: fence f51b:2, current 2 hold release
      <0> [574.307591] i915_sel-6278    0.... 559490788us : i915_request_retire: 0000:00:02.0 rcs0: fence f51b:2, current 2
      <0> [574.307591] i915_sel-6278    0.... 559490793us : intel_context_unpin: 0000:00:02.0 rcs0: context:f51b unpin
      <0> [574.307591] i915_sel-6278    0.... 559490798us : __engine_park: 0000:00:02.0 rcs0: parked
      <0> [574.307591] i915_sel-6278    0.... 559490982us : __intel_context_retire: 0000:00:02.0 rcs0: context:f51c retire runtime: { total:30004ns, avg:30004ns }
      <0> [574.307591] i915_sel-6278    0.... 559491372us : __engine_park: __engine_park:261 GEM_BUG_ON(engine->execlists.queue_priority_hint != (-((int)(~0U >> 1)) - 1))
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200227085723.1961649-9-chris@chris-wilson.co.uk
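      The condition in the final GEM_BUG_ON line of the trace is the
      preprocessor-expanded form of INT_MIN: the assertion fires because
      queue_priority_hint was not back at INT_MIN when the engine parked.
      The arithmetic can be checked with a short sketch (the helper name is
      hypothetical, and a 32-bit int is assumed):

```python
def int_min_from_macro(bits=32):
    """Evaluate -((int)(~0U >> 1)) - 1 for an unsigned type of `bits` width."""
    all_ones = (1 << bits) - 1   # ~0U: every bit set
    int_max = all_ones >> 1      # (int)(~0U >> 1): the largest signed value
    return -int_max - 1          # one below -INT_MAX, i.e. INT_MIN


assert int_min_from_macro() == -(2 ** 31)
```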
    • drm/i915/selftests: Be a little more lenient for reset workers · 280e285d
      Chris Wilson authored
      Give the reset worker a kick before losing hope when waiting for hang
      recovery, as the CPU scheduler is a little unreliable.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200227085723.1961649-15-chris@chris-wilson.co.uk