1. 24 Nov, 2016 2 commits
  2. 23 Nov, 2016 18 commits
  3. 22 Nov, 2016 15 commits
  4. 21 Nov, 2016 5 commits
    • drm/i915: Wipe hang stats as an embedded struct · bc1d53c6
      Mika Kuoppala authored
      Bannable property, banned status, guilty and active counts are
      properties of i915_gem_context. Make them so.
      
      v2: rebase
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1479309634-28574-1-git-send-email-mika.kuoppala@intel.com
    • drm/i915: Add per client max context ban limit · b083a087
      Mika Kuoppala authored
      If a bad client submits unfavourably across different contexts,
      creating new ones as it goes, the per-context scoring of badness
      does not remove the root cause: the offending client.
      To counter this, keep track of per-client context bans. Deny
      access if a client is responsible for more than 3 context bans
      in its lifetime.
      
      v2: move ban check to context create ioctl (Chris)
      v3: add commentary about hangs needed to reach client ban (Chris)
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
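The per-client limit described above can be sketched roughly as follows. This is a minimal illustration, not the driver's code: `struct client`, `MAX_CLIENT_CONTEXT_BANS`, and both function names are hypothetical, and only the "more than 3 bans" rule and the placement of the check at context creation come from the commit text.

```c
#include <stdbool.h>

/* From the commit text: more than 3 context bans denies access. */
#define MAX_CLIENT_CONTEXT_BANS 3

/* Hypothetical per-client bookkeeping. */
struct client {
    unsigned int context_bans; /* contexts banned over the client's lifetime */
};

/* Record that one of the client's contexts has just been banned. */
void client_note_context_ban(struct client *c)
{
    c->context_bans++;
}

/*
 * Checked when the client asks for a new context (the v2 note moves
 * the check to the context create ioctl).
 */
bool client_may_create_context(const struct client *c)
{
    return c->context_bans <= MAX_CLIENT_CONTEXT_BANS;
}
```

Counting bans per client rather than per context means that abandoning a banned context and creating a fresh one does not reset the tally.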
    • drm/i915: Add bannable context parameter · 84102171
      Mika Kuoppala authored
      Now that the driver has per-context scoring of 'hanging badness',
      and subsequent hangs during short windows are allowed as long as
      progress is made in between, it no longer makes sense to expose
      a ban timing window as a context parameter.
      
      Let the scoring be the sole indicator for ban policy, and replace
      the ban period context parameter with a boolean that gets/sets the
      context's bannable property.
      
      v2: allow non root to opt into being banned (Chris)
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
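A boolean get/set pair for the bannable property might look like the sketch below. The v2 note says non-root users may opt into being banned. The sketch assumes the converse, that opting out requires privilege, which is an inference and not something the commit text states. `struct ctx_params`, the function names, and the `is_privileged` flag are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical holder for the per-context flag. */
struct ctx_params {
    bool bannable;
};

/*
 * Set the bannable property. Anyone may opt in to being banned;
 * clearing the flag is assumed here to need privilege.
 */
int ctx_set_bannable(struct ctx_params *p, bool value, bool is_privileged)
{
    if (!value && !is_privileged)
        return -EPERM;

    p->bannable = value;
    return 0;
}

/* Read back the current setting. */
bool ctx_get_bannable(const struct ctx_params *p)
{
    return p->bannable;
}
```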
    • drm/i915: Use request retirement as context progress · e5e1fc47
      Mika Kuoppala authored
      When the hangcheck score was removed, the active decay of the
      score was removed with it. This took away hangcheck's ability to
      detect whether a GPU client was accidentally or maliciously
      causing intermittent hangs. Reinstate the scoring as a per-context
      property, so that if one context starts to act unfavourably, it
      gets banned.
      
      v2: ban_period_secs as a gate to score check (Chris)
      v3: decay in proper spot. scores as tunables (Chris)
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
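One way to picture per-context scoring with decay on progress is the sketch below. It is an illustration under assumptions: the commit text only says that scores are tunables and that request retirement counts as progress, so the two constants, `struct ctx_score`, and the function names are hypothetical.

```c
#include <stdbool.h>

/* Illustrative tunables; the commit text does not give the values. */
#define CTX_SCORE_GUILTY        10
#define CTX_SCORE_BAN_THRESHOLD 40

/* Hypothetical per-context hang bookkeeping. */
struct ctx_score {
    unsigned int ban_score;
    bool banned;
};

/* Called when a hang is blamed on this context. */
void ctx_mark_guilty(struct ctx_score *s)
{
    s->ban_score += CTX_SCORE_GUILTY;
    if (s->ban_score >= CTX_SCORE_BAN_THRESHOLD)
        s->banned = true;
}

/* Called when one of the context's requests retires: progress decays the score. */
void ctx_request_retired(struct ctx_score *s)
{
    if (s->ban_score > 0)
        s->ban_score--;
}
```

With these example values, a context that retires ten requests between hangs cancels out one hang's worth of score, so only contexts that hang without making progress in between reach the threshold.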
    • drm/i915: Decouple hang detection from hangcheck period · 3fe3b030
      Mika Kuoppala authored
      Hangcheck state accumulation has gained more steps
      over the years, like head movement and more recently the
      subunit inactivity check. As the subunit sampling is only
      done if the previous state check showed inactivity, we
      have added more stages (and time) to reach a hang verdict.
      
      Asymmetric engine states led to a different actual weight of
      'one hangcheck unit', and some hangs demonstrated that, due to
      the difference in stages, simpler engines were falsely accused
      of a hang because their score accumulated above the hang
      threshold much more quickly.
      
      To completely decouple the hangcheck guilty score
      from the hangcheck period, convert the hangcheck score to a
      rough measurement of the period of inactivity. As these are
      tracked as jiffies, they are also meaningful across
      reset boundaries. This makes finding a guilty engine
      more accurate across multi-engine activity scenarios,
      especially across asymmetric engines.
      
      We lose the ability to detect cross-batch malicious attempts
      to hinder progress. The plan is to move this functionality
      into context banning, which is a more natural fit,
      later in the series.
      
      v2: use time_before macros (Chris)
          reinstate the pardoning of moving engine after hc (Chris)
      v3: avoid global state for per engine stall detection (Chris)
      v4: take timeline last retirement into account (Chris)
      v5: do debug print on pardoning, split out retirement timestamp (Chris)
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
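The idea of measuring a period of inactivity per engine, instead of accumulating a score per hangcheck tick, can be sketched in userspace C as below. Everything here is a hypothetical stand-in: `ticks_t` plays the role of jiffies, `ticks_after()` mirrors the wraparound-safe comparison that the `time_before`/`time_after` family provides, and the timeout value and struct layout are invented for illustration.

```c
#include <stdbool.h>

/* Stand-in for a jiffies-like, wrapping tick counter. */
typedef unsigned long ticks_t;

/* Hypothetical stall timeout, in ticks. */
#define STALL_TIMEOUT_TICKS 300UL

/* Wraparound-safe "a is later than b", in the style of time_after(). */
bool ticks_after(ticks_t a, ticks_t b)
{
    return (long)(b - a) < 0;
}

/* Per-engine state, so no global state is needed for stall detection. */
struct engine_hangcheck {
    unsigned int last_seqno; /* last sequence number seen completing */
    ticks_t last_progress;   /* tick at which progress was last observed */
};

/*
 * Sample one engine. Any progress resets the inactivity window;
 * the engine counts as stalled only once the window is exceeded.
 */
bool engine_is_stalled(struct engine_hangcheck *hc, unsigned int seqno,
                       ticks_t now)
{
    if (seqno != hc->last_seqno) {
        hc->last_seqno = seqno;
        hc->last_progress = now;
        return false;
    }

    return ticks_after(now, hc->last_progress + STALL_TIMEOUT_TICKS);
}
```

Because the verdict depends on elapsed ticks and not on how many sampling stages an engine passes through, engines with fewer stages are not pushed over the limit sooner than complex ones.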