  1. 28 Jan, 2016 1 commit
  2. 21 Jan, 2016 1 commit
  3. 10 Dec, 2015 1 commit
  4. 09 Dec, 2015 1 commit
    • drm/i915: Add soft-pinning API for execbuffer · 506a8e87
      Chris Wilson authored
      Userspace can pass in an offset that it presumes the object is located
      at. The kernel will then do its utmost to fit the object into that
      location. The assumption is that userspace is handling its own object
      locations (for example along with full-ppgtt) and that the kernel will
      rarely have to make space for the user's requests.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      
      v2: Fixed incorrect eviction found by Michal Winiarski - fix suggested by Chris
      Wilson.  Fixed incorrect error paths causing crash found by Michal Winiarski.
      (Not published externally)
      
      v3: Rebased because of trivial conflict in object_bind_to_vm.  Fixed eviction
      to allow eviction of soft-pinned objects when another soft-pinned object used
      by a subsequent execbuffer overlaps reported by Michal Winiarski.
      (Not published externally)
      
      v4: Moved soft-pinned objects to the front of ordered_vmas so that they are
      pinned first after an address conflict happens to avoid repeated conflicts in
      rare cases (Suggested by Chris Wilson).  Expanded comment on
      drm_i915_gem_exec_object2.offset to cover this new API.
      
      v5: Added I915_PARAM_HAS_EXEC_SOFTPIN parameter for detecting this capability
      (Kristian). Added check for multiple pinnings on eviction (Akash). Made sure
      buffers are not considered misplaced without the user specifying
      EXEC_OBJECT_SUPPORTS_48B_ADDRESS.  User must assume responsibility for any
      addressing workarounds.  Updated object2.offset field comment again to clarify
      NO_RELOC case (Chris).  checkpatch cleanup.
      
      v6: Trivial rebase on latest drm-intel-nightly
      
      v7: Catch attempts to pin above the max virtual address size and return
      EINVAL (Tvrtko). Decouple EXEC_OBJECT_SUPPORTS_48B_ADDRESS and
      EXEC_OBJECT_PINNED flags, user must pass both flags in any attempt to pin
      something at an offset above 4GB (Chris, Daniel Vetter).
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Akash Goel <akash.goel@intel.com>
      Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
      Cc: Michal Winiarski <michal.winiarski@intel.com>
      Cc: Zou Nanhai <nanhai.zou@intel.com>
      Cc: Kristian Høgsberg <hoegsberg@gmail.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Reviewed-by: Michel Thierry <michel.thierry@intel.com>
      Acked-by: PDT
      Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1449575707-20933-1-git-send-email-thomas.daniel@intel.com
      506a8e87
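
The v7 rules of the soft-pinning commit can be sketched as a small validation routine. This is an illustration only, not the kernel code: the function name `check_softpin`, the numeric flag values and the choice of EINVAL for the missing-48-bit-flag case are assumptions made for this sketch.

```python
import errno

# Flag bit values are assumptions for this sketch; check i915_drm.h for the real ones.
EXEC_OBJECT_SUPPORTS_48B_ADDRESS = 1 << 3
EXEC_OBJECT_PINNED = 1 << 4

FOUR_GIB = 1 << 32


def check_softpin(offset, size, flags, vm_total):
    """Return 0 if the requested placement is acceptable, else a negative errno.

    `vm_total` is the size of the address space the object is bound into.
    """
    if not flags & EXEC_OBJECT_PINNED:
        # Without the pinned flag the offset is only a presumed location.
        return 0
    # v7: attempts to pin above the maximum virtual address return EINVAL.
    if offset + size > vm_total:
        return -errno.EINVAL
    # v7: the two flags are decoupled, so anything above 4 GiB needs both.
    if offset + size > FOUR_GIB and not flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS:
        return -errno.EINVAL
    return 0
```

A caller that manages its own address space would pass both flags whenever it places an object above 4 GiB, and only `EXEC_OBJECT_PINNED` below that.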
  5. 18 Nov, 2015 1 commit
  6. 19 Oct, 2015 1 commit
    • drm/i915: Report context GTT size · fa8848f2
      Chris Wilson authored
      Since the beginning we have conflated the size of the global GTT with
      that of the per-process context sizes. In recent times (gen8+), those
      are no longer the same where the global GTT is limited to 2/4GiB but the
      per-process GTT may be anything up to 256TiB. Userspace knows nothing of
      this discrepancy and outside of one or two hacks, uses the getaperture
      ioctl to determine the maximum size it can use. Let's leave that as
      reporting the global GTT and use the context reporting method to
      describe the per-process value (which naturally falls back to reporting
      the aliasing or global on older platforms, so userspace can always use
      this method where available).
      
      Testcase: igt/gem_userptr_blits/minor-normal-sync
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90065
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      fa8848f2
  7. 01 Oct, 2015 1 commit
    • drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset · 101b506a
      Michel Thierry authored
      There are some allocations that must be only referenced by 32-bit
      offsets. To limit the chances of having the first 4GB already full,
      objects not requiring this workaround use DRM_MM_SEARCH_BELOW/
      DRM_MM_CREATE_TOP flags.
      
      Specifically, any resource used with flat/heapless (0x00000000-0xfffff000)
      General State Heap (GSH) or Instruction State Heap (ISH) must be in a
      32-bit range, because the General State Offset and Instruction State
      Offset are limited to 32-bits.
      
      Objects must set the EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag to indicate that
      they can be allocated above the 32-bit address range. To limit the
      chances of having the first 4GB already full, objects will use
      DRM_MM_SEARCH_BELOW + DRM_MM_CREATE_TOP flags when possible.
      
      The libdrm user of the EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag is here:
      http://lists.freedesktop.org/archives/intel-gfx/2015-September/075836.html
      
      v2: Changed flag logic from needs_32b, to supports_48b.
      v3: Moved 48-bit support flag back to exec_object. (Chris, Daniel)
      v4: Split pin flags into PIN_ZONE_4G and PIN_HIGH; update PIN_OFFSET_MASK
      to use last PIN_ defined instead of hard-coded value; use correct limit
      check in eb_vma_misplaced. (Chris)
      v5: Don't touch PIN_OFFSET_MASK and update workaround comment (Chris)
      v6: Apply pin-high for ggtt too (Chris)
      v7: Handle simultaneous pin-high and pin-mappable end correctly (Akash)
          Fix check for entries currently using +4GB addresses, use min_t and
          other polish in object_bind_to_vm (Chris)
      v8: Commit message updated to point to libdrm patch.
      v9: vmas are allocated in the correct zone, so only check flag when the
          vma has not been allocated. (Chris)
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v4)
      Signed-off-by: Michel Thierry <michel.thierry@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      101b506a
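
The placement policy described in this commit can be illustrated with a toy first-fit allocator. It is a sketch under stated assumptions: `pick_offset`, the `(start, end)` range list and the flag value are hypothetical, and the real driver relies on the drm_mm range manager rather than anything like this.

```python
FOUR_GIB = 1 << 32
EXEC_OBJECT_SUPPORTS_48B_ADDRESS = 1 << 3  # assumed value, for illustration only


def pick_offset(size, flags, vm_total, used):
    """Pick a start offset for an object of `size` bytes, or None if nothing fits.

    `used` is a list of (start, end) ranges that are already occupied.
    Objects without the 48-bit flag must end below 4 GiB and are placed
    bottom-up; objects with it are placed top-down so that the first 4 GiB
    stays free for the objects that need it.
    """
    supports_48b = bool(flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS)
    limit = vm_total if supports_48b else min(vm_total, FOUR_GIB)

    # Collect the free gaps below `limit`, in ascending order.
    gaps = []
    cursor = 0
    for start, end in sorted(used):
        if start >= limit:
            break
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < limit:
        gaps.append((cursor, limit))

    if supports_48b:
        # Search from the top and place at the top of the gap.
        for low, high in reversed(gaps):
            if high - low >= size:
                return high - size
    else:
        # Restricted to the 4 GiB zone: take the lowest gap that fits.
        for low, high in gaps:
            if high - low >= size:
                return low
    return None
```

With a full low 4 GiB, an object lacking the flag cannot be placed at all, while a flagged object still lands at the top of the address space, which is the pressure the top-down search is meant to relieve.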
  8. 02 Sep, 2015 1 commit
  9. 21 Jul, 2015 1 commit
  10. 15 Jul, 2015 1 commit
  11. 06 Jul, 2015 1 commit
    • drm/i915: Expose I915_EXEC_RESOURCE_STREAMER flag and getparam · a9ed33ca
      Abdiel Janulgue authored
      Ensures that the batch buffer is executed by the resource streamer,
      and lets userspace know whether the Resource Streamer is supported in
      the kernel.
      
      v2: Don't skip 1<<15 for the exec flags (Jani Nikula)
      v3: Use HAS_RESOURCE_STREAMER macro for execbuf validation (Chris Wilson)
      
      (from getparam patch)
      
      v2: Update I915_PARAM_HAS_RESOURCE_STREAMER so it's after
          I915_PARAM_HAS_GPU_RESET.
      v3: Only advertise RS support for hardware that supports it.
      v4: Add HAS_RESOURCE_STREAMER() macro (Chris)
      
      Testcase: igt/gem_exec_params
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: Kenneth Graunke <kenneth@whitecape.org>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
      [danvet: squash in getparam patch since it'd break bisect, suggested
      by Chris.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      a9ed33ca
  12. 15 Jun, 2015 1 commit
  13. 29 May, 2015 1 commit
  14. 26 May, 2015 1 commit
  15. 10 Apr, 2015 1 commit
  16. 27 Mar, 2015 1 commit
  17. 17 Mar, 2015 2 commits
    • drm/i915: Export total subslice and EU counts · a1559ffe
      Jeff McGee authored
      Set up new I915_GETPARAM ioctl entries for subslice total and
      EU total. Userspace drivers need these values when constructing
      GPGPU commands. This kernel query method is intended to replace
      the PCI ID-based tables that userspace drivers currently maintain.
      The kernel driver can employ fuse register reads as needed to
      ensure the most accurate determination of GT config attributes.
      This first became important with Cherryview in which the config
      could differ between devices with the same PCI ID.
      
      The kernel detection of these values is device-specific and not
      included in this patch. Because zero is not a valid value for any of
      these parameters, a value of zero is interpreted as unknown for the
      device. Userspace drivers should continue to maintain ID-based tables
      for older devices not supported by the new query method.
      
      v2: Increment our I915_GETPARAM indices to fit after REVISION
          which was merged ahead of us.
      
      For: VIZ-4636
      Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
      Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
      Acked-by: Zhigang Gong <zhigang.gong@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      a1559ffe
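
The "zero means unknown" convention from this commit suggests a userspace pattern along the following lines. It is a hypothetical sketch: `query_param` stands in for whatever wrapper issues the GETPARAM ioctl, the parameter names are placeholders, and the fallback table contents are invented for the example.

```python
def query_or_zero(query_param, name):
    """Ask the kernel for a parameter; treat an unsupported query as unknown (0)."""
    try:
        return query_param(name)
    except OSError:
        # An older kernel that does not know the parameter rejects the ioctl.
        return 0


def resolve_gt_config(query_param, pci_id, fallback_table):
    """Return (subslice_total, eu_total), preferring the kernel's answer.

    `fallback_table` maps a PCI ID to the pair a userspace driver would
    otherwise hard-code for older devices.
    """
    subslices = query_or_zero(query_param, "SUBSLICE_TOTAL")
    eus = query_or_zero(query_param, "EU_TOTAL")
    # Zero is never a valid count, so it signals "unknown for this device".
    if subslices > 0 and eus > 0:
        return subslices, eus
    return fallback_table[pci_id]
```

Keeping the ID-based table as the fallback path matches the commit's advice that userspace should retain it for devices the kernel query does not cover.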
    • drm/i915: Add I915_PARAM_REVISION · 27cd4461
      Neil Roberts authored
      Adds a parameter which can be used with DRM_I915_GETPARAM to query the
      GPU revision. The intention is to use this in Mesa to implement the
      WaDisableSIMD16On3SrcInstr workaround on Skylake but only for
      revision 2.
      Signed-off-by: Neil Roberts <neil@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      27cd4461
  18. 27 Jan, 2015 2 commits
    • drm/i915: add I915_PARAM_HAS_BSD2 to i915_getparam · 08e16dc8
      Zhipeng Gong authored
      This will let userland only try to use the new ring
      when the appropriate kernel is present
      
      v2: change the number to be consistent with upstream (Zhipeng)
      Signed-off-by: Zhipeng Gong <zhipeng.gong@intel.com>
      Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      08e16dc8
    • drm/i915: Specify bsd rings through exec flag · 8d360dff
      Zhipeng Gong authored
      On Skylake GT3 we have 2 Video Command Streamers (VCS), which are asymmetrical.
      For example, HEVC GPU commands can only be dispatched to the VCS1 ring.
      But userspace has no control over whether VCS1 or VCS2 is used. This patch introduces
      a mechanism to avoid the default ping-pong mode and use one specific ring
      through execution flag. This mechanism is usable for all the platforms
      with 2 VCS rings.
      
      The open source usage is from these two commits in vaapi/intel:
      	commit 702050f04131a44ef8ac16651708ce8a8d98e4b8
      	Author: Zhao, Yakui <yakui.zhao@intel.com>
      	Date:   Mon Nov 17 12:44:19 2014 +0800
      
      	    Allow the batchbuffer to be submitted with override flag
      
      	commit a56efcdf27d11ad9b21664b4a2cda72d7f90f5a8
      	Author: Zhao Yakui <yakui.zhao@intel.com>
      	Date:   Mon Nov 17 12:44:22 2014 +0800
      
      	    Add the override flag to assure that HEVC video command
      		always uses BSD ring0 for SKL GT3 machine
      
      v2: fix whitespace (Rodrigo)
      v3: remove incorrect chunk that came on -collector rebase. (Rodrigo)
      v4: change the comment (Zhipeng)
      v5: address Daniel's comment (Zhipeng)
      Signed-off-by: Zhipeng Gong <zhipeng.gong@intel.com>
      Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      8d360dff
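
How an execution flag can override the default ping-pong between the two VCS rings is sketched below. The constant values reflect one reading of the i915 uapi header and should be treated as assumptions; `select_bsd_ring` and its `ping_pong_choice` argument are hypothetical names for this illustration.

```python
# Assumed values, modelled on the i915 uapi header; verify before relying on them.
I915_EXEC_RING_MASK = 7 << 0
I915_EXEC_BSD = 2 << 0
I915_EXEC_BSD_SHIFT = 13
I915_EXEC_BSD_MASK = 3 << I915_EXEC_BSD_SHIFT
I915_EXEC_BSD_DEFAULT = 0 << I915_EXEC_BSD_SHIFT
I915_EXEC_BSD_RING1 = 1 << I915_EXEC_BSD_SHIFT
I915_EXEC_BSD_RING2 = 2 << I915_EXEC_BSD_SHIFT


def select_bsd_ring(exec_flags, has_bsd2, ping_pong_choice):
    """Return the 0-based VCS index for a submission, or None if it is not a BSD one.

    `ping_pong_choice` is the ring the default balancing would have picked.
    Raises ValueError for an invalid selector.
    """
    selector = exec_flags & I915_EXEC_BSD_MASK
    if (exec_flags & I915_EXEC_RING_MASK) != I915_EXEC_BSD:
        if selector:
            raise ValueError("BSD ring selector set on a non-BSD submission")
        return None
    if not has_bsd2:
        return 0  # only one VCS ring exists, nothing to choose
    if selector == I915_EXEC_BSD_DEFAULT:
        return ping_pong_choice
    if selector == I915_EXEC_BSD_RING1:
        return 0
    if selector == I915_EXEC_BSD_RING2:
        return 1
    raise ValueError("unknown BSD ring selector")
```

A media driver that needs HEVC commands on one specific ring would OR the matching selector into its flags instead of leaving the default in place.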
  19. 07 Jan, 2015 1 commit
  20. 06 Jan, 2015 1 commit
    • drm/i915: Support creation of unbound wc user mappings for objects · 1816f923
      Akash Goel authored
      This patch provides support to create write-combining virtual mappings of
      a GEM object. It intends to provide the same functionality as the 'mmap_gtt'
      interface without the constraints and contention of a limited aperture
      space, but requires clients to handle the linear to tile conversion on their
      own. This is for improving the CPU write operation performance, as with such
      mapping, writes and reads are almost 50% faster than with mmap_gtt. Similar
      to the GTT mmapping, unlike the regular CPU mmapping, it avoids the cache
      flush after update from CPU side, when object is passed onto GPU.  This
      type of mapping is especially useful in case of sub-region update,
      i.e. when only a portion of the object is to be updated. Using a CPU mmap
      in such cases would normally incur a clflush of the whole object, and
      using a GTT mmapping would likely require eviction of an active object or
      fence and thus stall. The write-combining CPU mmap avoids both.
      
      To ensure the cache coherency, before using this mapping, the GTT domain
      has been reused here. This provides the required cache flush if the object
      is in CPU domain or synchronization against the concurrent rendering.
      Although the access through an uncached mmap should automatically
      invalidate the cache lines, this may not be true for non-temporal write
      instructions and also not all pages of the object may be updated at any
      given point of time through this mapping.  Having a call to get_pages in
      set_to_gtt_domain function, as added in the earlier patch 'drm/i915:
      Broaden application of set-domain(GTT)', would guarantee the clflush and
      so there will be no cachelines holding the data for the object before it
      is accessed through this map.
      
      The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been
      extended with a new flags field (defaulting to 0 for existing users). In
      order for userspace to detect the extended ioctl, a new parameter
      I915_PARAM_MMAP_VERSION has been added for versioning the ioctl interface.
      
      v2: Fix error handling, invalid flag detection, renaming (ickle)
      
      v3: Rebase to latest drm-intel-nightly codebase
      
      The new mmapping is exercised by igt/gem_mmap_wc,
      igt/gem_concurrent_blit and igt/gem_gtt_speed.
      
      Change-Id: Ie883942f9e689525f72fe9a8d3780c3a9faa769a
      Signed-off-by: Akash Goel <akash.goel@intel.com>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      1816f923
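
The versioning scheme in this commit implies a detection step in userspace before the new flags field is used. The sketch below assumes that a reported version of 1 or higher means the extended ioctl is available and that the write-combining flag has the value 0x1; `choose_mapping` and its return shape are hypothetical.

```python
I915_MMAP_WC = 0x1  # assumed flag value for this sketch


def choose_mapping(mmap_version, want_wc):
    """Return (mmap_flags, domain) for a CPU mapping request.

    `mmap_version` is the value read from I915_PARAM_MMAP_VERSION, or None
    when the kernel does not know the parameter. `domain` names the
    set-domain call to issue before touching the mapping.
    """
    if want_wc and mmap_version is not None and mmap_version >= 1:
        # The commit reuses the GTT domain for coherency of the WC mapping.
        return I915_MMAP_WC, "GTT"
    # Fall back to a regular cached CPU mapping with flags left at 0.
    return 0, "CPU"
```

Existing callers that never set the flags field keep getting the cached mapping, which is why the field defaults to 0.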
  21. 14 Nov, 2014 1 commit
    • drm/i915: Make the physical object coherent with GTT · 6a2c4232
      Chris Wilson authored
      Currently objects for which the hardware needs a contiguous physical
      address are allocated a shadow backing storage to satisfy the constraint.
      This shadow buffer is not wired into the normal obj->pages and so the
      physical object is incoherent with accesses via the GPU, GTT and CPU. By
      setting up the appropriate scatter-gather table, we can allow userspace
      to access the physical object via either a GTT mmapping or by rendering
      into the GEM bo. However, keeping the CPU mmap of the shmemfs backing
      storage coherent with the contiguous shadow is not yet possible.
      Fortuitously, CPU mmaps of objects requiring physical addresses are not
      expected to be coherent anyway.
      
      This allows the physical constraint of the GEM object to be transparent
      to userspace and allow it to efficiently render into or update them via
      the GTT and GPU.
      
      v2: Fix leak of pci handle spotted by Ville
      v3: Remove the now duplicate call to detach_phys_object during free.
      v4: Wait for rendering before pwrite. As this patch makes it possible to
      render into the phys object, we should make it correct as well!
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      6a2c4232
  22. 07 Nov, 2014 1 commit
  23. 16 May, 2014 1 commit
    • drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl · 5cc9ed4b
      Chris Wilson authored
      By exporting the ability to map user addresses and insert PTEs
      representing their backing pages into the GTT, we can exploit UMA in order
      to utilize normal application data as a texture source or even as a
      render target (depending upon the capabilities of the chipset). This has
      a number of uses, with zero-copy downloads to the GPU and efficient
      readback making the intermixed streaming of CPU and GPU operations
      fairly efficient. This ability has many widespread implications from
      faster rendering of client-side software rasterisers (chromium),
      mitigation of stalls due to read back (firefox) and to faster pipelining
      of texture data (such as pixel buffer objects in GL or data blobs in CL).
      
      v2: Compile with CONFIG_MMU_NOTIFIER
      v3: We can sleep while performing invalidate-range, which we can utilise
      to drop our page references prior to the kernel manipulating the vma
      (for either discard or cloning) and so protect normal users.
      v4: Only run the invalidate notifier if the range intercepts the bo.
      v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers
      v6: Recheck after reacquire mutex for lost mmu.
      v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary.
      v8: Fix rebasing error after forward porting the back port.
      v9: Limit the userptr to page aligned entries. We now expect userspace
          to handle all the offset-in-page adjustments itself.
      v10: Prevent vma from being copied across fork to avoid issues with cow.
      v11: Drop vma behaviour changes -- locking is nigh on impossible.
           Use a worker to load user pages to avoid lock inversions.
      v12: Use get_task_mm()/mmput() for correct refcounting of mm.
      v13: Use a worker to release the mmu_notifier to avoid lock inversion
      v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifier
           with its own locking and tree of objects for each mm/mmu_notifier.
      v15: Prevent overlapping userptr objects, and invalidate all objects
           within the mmu_notifier range
      v16: Fix a typo for iterating over multiple objects in the range and
           rearrange error path to destroy the mmu_notifier locklessly.
           Also close a race between invalidate_range and the get_pages_worker.
      v17: Close a race between get_pages_worker/invalidate_range and fresh
           allocations of the same userptr range - and notice that
           struct_mutex was presumed to be held when during creation it wasn't.
      v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory
           for the struct sg_table and to clear it before reporting an error.
      v19: Always error out on read-only userptr requests as we don't have the
           hardware infrastructure to support them at the moment.
      v20: Refuse to implement read-only support until we have the required
           infrastructure - but reserve the bit in flags for future use.
      v21: use_mm() is not required for get_user_pages(). It is only meant to
           be used to fix up the kernel thread's current->mm for use with
           copy_user().
      v22: Use sg_alloc_table_from_pages for that chunky feeling
      v23: Export a function for sanity checking dma-buf rather than encode
           userptr details elsewhere, and clean up comments based on
           suggestions by Bradley.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
      Cc: Akash Goel <akash.goel@intel.com>
      Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
      [danvet: Frob ioctl allocation to pick the next one - will cause a bit
      of fuss with create2 apparently, but such are the rules.]
      [danvet2: oops, forgot to git add after manual patch application]
      [danvet3: Appease sparse.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      5cc9ed4b
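
Several of the version notes for the userptr ioctl (v9, v15, v19/v20) describe checks on the request itself. They can be summarised as the sketch below, where the function name, the page size, the flag value and the specific error codes are all assumptions made for illustration.

```python
import errno

PAGE_SIZE = 4096              # assumed page size
I915_USERPTR_READ_ONLY = 0x1  # assumed value of the reserved read-only bit


def check_userptr(user_ptr, user_size, flags, existing):
    """Return 0 if a userptr request is acceptable, else a negative errno.

    `existing` is a list of (start, size) ranges already wrapped by other
    userptr objects in the same address space.
    """
    # v9: only page-aligned pointers and sizes are accepted.
    if (user_ptr | user_size) & (PAGE_SIZE - 1):
        return -errno.EINVAL
    # v19/v20: read-only mappings are refused, though the bit is reserved.
    if flags & I915_USERPTR_READ_ONLY:
        return -errno.ENODEV
    # v15: overlapping userptr objects are prevented.
    end = user_ptr + user_size
    for start, size in existing:
        if user_ptr < start + size and start < end:
            return -errno.EAGAIN
    return 0
```

Because of the alignment rule, a caller wrapping an arbitrary buffer has to round the range out to page boundaries and track the in-page offset itself.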
  24. 01 Apr, 2014 1 commit
  25. 22 Jan, 2014 1 commit
  26. 18 Dec, 2013 2 commits
    • drm/i915: Drop I915_PARAM_HAS_FULL_PPGTT again · 7d9c4779
      Daniel Vetter authored
      At least for now userspace has no business at all to know that we
      switch address spaces around. For any need it has to know whether hw
      ppgtt is enabled (e.g. to set bits in MI commands correctly) it can
      inquire the existing ppgtt param.
      
      v2: Avoid ternary operator precedence fail (Chris).
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      7d9c4779
    • drm/i915: Use multiple VMs -- the point of no return · 7e0d96bc
      Ben Widawsky authored
      As with processes which run on the CPU, the goal of multiple VMs is to
      provide process isolation. Specific to GEN, there is also the ability to
      map more objects per process (2GB each instead of 2GB-2k total).
      
      For the most part, all the pipes have been laid, and all we need to do
      is remove asserts and actually start changing address spaces with the
      context switch. Since prior to this we've converted the setting of the
      page tables to a streamed version, this is quite easy.
      
      One important thing to point out (since it'd been hotly contested) is
      that with this patch, every context created will have its own address
      space (provided the HW can do it).
      
      v2: Disable BDW on rebase
      
      NOTE: I tried to make this commit as small as possible. I needed one
      place where I could "turn everything on" and that is here. It could be
      split into finer commits, but I didn't really see much point.
      
      Cc: Eric Anholt <eric@anholt.net>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      7e0d96bc
  27. 12 Nov, 2013 1 commit
    • drm/i915: add i915_get_reset_stats_ioctl · b6359918
      Mika Kuoppala authored
      This ioctl returns reset stats for specified context.
      
      The struct returned contains context loss counters.
      
      reset_count:    all resets across all contexts
      batch_active:   active batches lost on resets
      batch_pending:  pending batches lost on resets
      
      v2: get rid of state tracking completely and deliver only counts. Idea
          from Chris Wilson.
      
      v3: fix commit message
      
      v4: default context handled inside i915_gem_context_get_hang_stats
      
      v5: reset_count only for privileged process
      
      v6: ctx=0 needs CAP_SYS_ADMIN for batch_* counters (Chris Wilson)
      
      v7: context hang stats never returns NULL
      
      v8: rebased on top of reworked context hang stats
          DRM_RENDER_ALLOW for ioctl
      
      v9: use DEFAULT_CONTEXT_ID. Improve comments for ioctl struct members
      Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Cc: Ian Romanick <idr@freedesktop.org>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      b6359918
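
The privilege rules from v5 and v6 of the reset-stats ioctl can be modelled roughly as follows. This is a sketch: `get_reset_stats`, the dictionary of per-context counters and the error codes are assumptions, and only the three counter names come from the commit message.

```python
import errno

DEFAULT_CONTEXT_ID = 0


def get_reset_stats(ctx_id, is_admin, global_reset_count, hang_stats):
    """Return the stats dict for a context, or a negative errno.

    `hang_stats` maps a context id to (batch_active, batch_pending);
    `is_admin` models a caller holding CAP_SYS_ADMIN.
    """
    # v6: the default context's batch_* counters need CAP_SYS_ADMIN.
    if ctx_id == DEFAULT_CONTEXT_ID and not is_admin:
        return -errno.EPERM
    if ctx_id not in hang_stats:
        return -errno.ENOENT
    batch_active, batch_pending = hang_stats[ctx_id]
    return {
        # v5: the global reset count is only reported to privileged callers.
        "reset_count": global_reset_count if is_admin else 0,
        "batch_active": batch_active,
        "batch_pending": batch_pending,
    }
```

An unprivileged client therefore learns only about batches lost from its own contexts, never about system-wide reset activity.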
  28. 19 Sep, 2013 1 commit
    • drm/i915: Add second slice l3 remapping · 35a85ac6
      Ben Widawsky authored
      Certain HSW SKUs have a second bank of L3. This L3 remapping has a
      separate register set, and interrupt from the first "slice". A slice is
      simply a term to define some subset of the GPU's l3 cache. This patch
      implements both the interrupt handler, and ability to communicate with
      userspace about this second slice.
      
      v2:  Remove redundant check about non-existent slice.
      Change warning about interrupts of unknown slices to WARN_ON_ONCE
      Handle the case where we get 2 slice interrupts concurrently, and switch
      the tracking of interrupts to be non-destructive (all Ville)
      Don't enable/mask the second slice parity interrupt for ivb/vlv (even
      though all docs I can find claim it's rsvd) (Ville + Bryan)
      Keep BYT excluded from L3 parity
      
      v3: Fix the slice = ffs to be decremented by one (found by Ville). When
      I initially did my testing on the series, I was using 1-based slice
      counting, so this code was correct. Not sure why my simpler tests that
      I've been running since then didn't pick it up sooner.
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      35a85ac6
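
The off-by-one fixed in v3 comes from `ffs()` returning a 1-based bit position while slices are counted from zero. A small sketch of walking a pending-slice bitmask with that correction follows; `pending_slices` is a hypothetical helper and `ffs` is reimplemented here to mirror the C function.

```python
def ffs(value):
    """1-based index of the lowest set bit, or 0 if no bit is set (like C ffs())."""
    return (value & -value).bit_length()


def pending_slices(which_slice):
    """Return the 0-based slice numbers whose bits are set in `which_slice`."""
    slices = []
    remaining = which_slice
    while remaining:
        # ffs() is 1-based, slice numbering is 0-based: decrement by one.
        slice_id = ffs(remaining) - 1
        slices.append(slice_id)
        remaining &= ~(1 << slice_id)
    return slices
```

Handling every set bit in one pass also covers the case from v2 where both slices raise an interrupt concurrently.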
  29. 22 Aug, 2013 2 commits
  30. 19 Jul, 2013 1 commit
    • drm/i915: Make i915 events part of uapi · cce723ed
      Ben Widawsky authored
      Make the uevent strings part of the user API for people who wish to
      write their own listeners.
      
      v2: Make a space in the string concatenation. (Chad)
      Use the "UEVENT" suffix instead of "EVENT" (Chad)
      Make kernel-doc parseable Docbook comments (Daniel)
      
      v3: Undid reset change introduced in last submission (Daniel)
      Fixed up comments to address removal changes.
      
      Thanks to Daniel Vetter for a majority of the parity error comments.
      
      CC: Chad Versace <chad.versace@linux.intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      cce723ed
  31. 31 May, 2013 2 commits
  32. 17 Jan, 2013 2 commits
    • drm/i915: Use the reloc.handle as an index into the execbuffer array · eef90ccb
      Chris Wilson authored
      Using copywinwin10 as an example that is dependent upon emitting a lot
      of relocations (2 per operation), we see improvements of:
      
      c2d/gm45: 618000.0/sec to 623000.0/sec.
      i3-330m: 748000.0/sec to 789000.0/sec.
      
      (measured relative to a baseline with neither optimisation applied).
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      eef90ccb
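
The idea in this commit's title, treating reloc.handle as an index rather than a handle to look up, can be shown with a toy resolver. The names `resolve_target` and `handle_lut` are hypothetical, and the dictionary stands in for whatever handle lookup structure the kernel actually uses.

```python
def resolve_target(exec_handles, reloc_handle, handle_lut):
    """Return the index in the execbuffer array that a relocation targets.

    `exec_handles` lists the GEM handles in execbuffer order. Returns None
    when the target cannot be resolved.
    """
    if handle_lut:
        # The relocation already carries the array index: a bounds check suffices.
        if 0 <= reloc_handle < len(exec_handles):
            return reloc_handle
        return None
    # Otherwise the handle must be searched for among the submitted objects.
    table = {handle: index for index, handle in enumerate(exec_handles)}
    return table.get(reloc_handle)
```

Skipping the per-relocation lookup is where the saving comes from in workloads that emit many relocations per operation.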
    • drm/i915: Allow userspace to hint that the relocations were known · ed5982e6
      Daniel Vetter authored
      Userspace is able to hint to the kernel that its command stream and
      auxiliary state buffers already hold the correct presumed addresses and
      so the relocation process may be skipped if the kernel does not need to
      move any buffers in preparation for the execbuffer. Thus for the common
      case where the allotment of buffers is static between batches, we can
      avoid the overhead of individually checking the relocation entries.
      
      Note that this requires userspace to supply the domain tracking and
      requests for workarounds itself that would otherwise be computed based
      upon the relocation entries.
      
      Using copywinwin10 as an example that is dependent upon emitting a lot
      of relocations (2 per operation), we see improvements of:
      
      c2d/gm45: 618000.0/sec to 632000.0/sec.
      i3-330m: 748000.0/sec to 830000.0/sec.
      
      (measured relative to a baseline with neither optimisation applied).
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      [danvet: Fixup merge conflict in userspace header due to different
      baseline trees.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      ed5982e6
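
The decision this hint enables can be reduced to a single predicate. The sketch below is an assumption-laden simplification: `must_walk_relocations` is a hypothetical name, and each object is modelled only by the offset userspace presumed and the offset the kernel actually gave it.

```python
def must_walk_relocations(no_reloc_hint, objects):
    """Decide whether the relocation entries have to be processed.

    `objects` is a list of (presumed_offset, actual_offset) pairs, one per
    buffer in the execbuffer.
    """
    moved = any(presumed != actual for presumed, actual in objects)
    if no_reloc_hint:
        # Userspace vouches for its presumed addresses, so relocations are
        # only needed when the kernel had to move a buffer.
        return moved
    # Without the hint every relocation entry is checked individually.
    return True
```

In the common case of a static buffer layout between batches, the hinted path never touches the relocation entries at all.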
  33. 17 Dec, 2012 1 commit
    • drm/i915: Implement workaround for broken CS tlb on i830/845 · b45305fc
      Daniel Vetter authored
      Now that Chris Wilson demonstrated that the key for stability on early
      gen 2 is to simply _never_ exchange the physical backing storage of
      batch buffers I've tried a stab at a kernel solution. Doesn't look too
      nefarious imho, now that I don't try to be too clever for my own good
      any more.
      
      v2: After discussing the various techniques, we've decided to always blit
      batches on the suspect devices, but allow userspace to opt out of the
      kernel workaround and assume full responsibility for providing coherent
      batches. The principal reason is that avoiding the blit does improve
      performance in a few key microbenchmarks and also in cairo-trace
      replays.
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      [danvet:
      - Drop the hunk which uses HAS_BROKEN_CS_TLB to implement the ring
        wrap w/a. Suggested by Chris Wilson.
      - Also add the ACTHD check from Chris Wilson for the error state
        dumping, so that we still catch batches when userspace opts out of
        the w/a.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      b45305fc
  34. 04 Oct, 2012 1 commit