1. 08 Feb, 2012 1 commit
    • Daniel Vetter's avatar
      drm/i915: swizzling support for snb/ivb · f691e2f4
      Daniel Vetter authored
      We have to do this manually. Somebody had a Great Idea.
      
      I've measured speed-ups just a few percent above the noise level
      (below 5% for the best case), but no slowdows. Chris Wilson measured
      quite a bit more (10-20% above the usual snb variance) on a more
      recent and better tuned version of sna, but also recorded a few
      slow-downs on benchmarks know for uglier amounts of snb-induced
      variance.
      
      v2: Incorporate Ben Widawsky's preliminary review comments and
      elaborate a bit about the performance impact in the changelog.
      
      v3: Add a comment as to why we don't need to check the 3rd memory
      channel.
      
      v4: Fixup whitespace.
      Acked-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: default avatarBen Widawsky <ben@bwidawsk.net>
      Reviewed-by: default avatarEric Anholt <eric@anholt.net>
      Signed-Off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      f691e2f4
  2. 31 Jan, 2012 1 commit
  3. 30 Jan, 2012 6 commits
    • Daniel Vetter's avatar
      drm/i915: rewrite shmem_pread_slow to use copy_to_user · 8461d226
      Daniel Vetter authored
      Like for shmem_pwrite_slow. The only difference is that because we
      read data, we can leave the fetched cachelines in the cpu: In the case
      that the object isn't in the cpu read domain anymore, the clflush for
      the next cpu read domain invalidation will simply drop these
      cachelines.
      
      slow_shmem_bit17_copy is now ununsed, so kill it.
      
      With this patch tests/gem_mmap_gtt now actually works.
      
      v2: add __ to copy_to_user_swizzled as suggested by Chris Wilson.
      
      v3: Fixup the swizzling logic, it swizzled the wrong pages.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38115Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      8461d226
    • Daniel Vetter's avatar
      drm/i915: rewrite shmem_pwrite_slow to use copy_from_user · 8c59967c
      Daniel Vetter authored
      ... instead of get_user_pages, because that fails on non page-backed
      user addresses like e.g. a gtt mapping of a bo.
      
      To get there essentially copy the vfs read path into pagecache. We
      can't call that right away because we have to take care of bit17
      swizzling. To not deadlock with our own pagefault handler we need
      to completely drop struct_mutex, reducing the atomicty-guarantees
      of our userspace abi. Implications for racing with other gem ioctl:
      
      - execbuf, pwrite, pread: Due to -EFAULT fallback to slow paths there's
        already the risk of the pwrite call not being atomic, no degration.
      - read/write access to mmaps: already fully racy, no degration.
      - set_tiling: Calling set_tiling while reading/writing is already
        pretty much undefined, now it just got a bit worse. set_tiling is
        only called by libdrm on unused/new bos, so no problem.
      - set_domain: When changing to the gtt domain while copying (without any
        read/write access, e.g. for synchronization), we might leave unflushed
        data in the cpu caches. The clflush_object at the end of pwrite_slow
        takes care of this problem.
      - truncating of purgeable objects: the shmem_read_mapping_page call could
        reinstate backing storage for truncated objects. The check at the end
        of pwrite_slow takes care of this.
      
      v2:
      - add missing intel_gtt_chipset_flush
      - add __ to copy_from_user_swizzled as suggest by Chris Wilson.
      
      v3: Fixup bit17 swizzling, it swizzled the wrong pages.
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      8c59967c
    • Daniel Vetter's avatar
      drm/i915: fall through pwrite_gtt_slow to the shmem slow path · 5c0480f2
      Daniel Vetter authored
      The gtt_pwrite slowpath grabs the userspace memory with
      get_user_pages. This will not work for non-page backed memory, like a
      gtt mmapped gem object. Hence fall throuh to the shmem paths if we hit
      -EFAULT in the gtt paths.
      
      Now the shmem paths have exactly the same problem, but this way we
      only need to rearrange the code in one write path.
      
      v2: v1 accidentaly falls back to shmem pwrite for phys objects. Fixed.
      
      v3: Make the codeflow around phys_pwrite cleara as suggested by Chris
      Wilson.
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-Off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      5c0480f2
    • Daniel Vetter's avatar
      drm/i915: add debugfs file for swizzling information · ea16a3cd
      Daniel Vetter authored
      This will also come handy for the gen6+ swizzling support, where the
      driver is supposed to control swizzling depending upon dram
      configuration.
      
      v2: CxDRB3 are 16 bit regs! Noticed by Chris Wilson.
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-Off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      ea16a3cd
    • Daniel Vetter's avatar
      drm/i915: fix swizzle detection for gen3 · c9c4b6f6
      Daniel Vetter authored
      It looks like the desktop variants of i915 and i945 also have the DCC
      register to control dram channel interleave and cpu side bit6
      swizzling.
      
      Unfortunately internal Cspec/ConfigDB documentation for these ancient chips
      have already been dropped and there seem to be no archives. Also
      somebody thought the swizzling behaviour is surely a worthy secret to
      keep and redacted any mention of these fields from the published Intel
      datasheets.
      
      I suspect the hw engineers were really proud of the page coloring
      they've achieved in their first dual channel dram controller with
      bit17 - after all Bspec explains in great length the optimal layout of
      page frame numbers modulo 4 for the color and depth buffers, too.
      Later on when they've started to work on VT-d they shamefully
      discoverd their stupidity and tried to cover the tracks ...
      
      Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch> (i915g)
      Tested-by: Pavel Ondračka <pavel.ondracka@email.cz> (i945g)
      Tested-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42625Signed-Off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      c9c4b6f6
    • Chris Wilson's avatar
      drm/i915: Remove the upper limit on the bo size for mapping into the CPU domain · 068c6ff1
      Chris Wilson authored
      The original intention of comparing the bo against the mappable GTT
      limits was to prevent a subsequent faulting of the bo into the GTT from
      clearing the entire GTT in vain. However, that was clearly a cut'n'paste
      mistake as a CPU mapping never binds the bo into the aperture. Whilst
      there may be some merit to limiting the maximum size of the bo to
      something that can be utilized by the GPU, that limit itself does not
      belong as a safeguard to mmapping the bo, so remove the check entirely.
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: default avatarEric Anholt <eric@anholt.net>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      068c6ff1
  4. 29 Jan, 2012 13 commits
  5. 26 Jan, 2012 3 commits
  6. 25 Jan, 2012 2 commits
  7. 24 Jan, 2012 1 commit
  8. 21 Jan, 2012 3 commits
  9. 17 Jan, 2012 10 commits