1. 17 May, 2017 10 commits
    • drm/i915: Create a kmem_cache to allocate struct i915_priolist from · c5cf9a91
      Chris Wilson authored
      The i915_priolist are allocated within an atomic context on a path where
      we wish to minimise latency. If we use a dedicated kmem_cache, we have
      the advantage of a local freelist from which to service new requests
      that should keep the latency impact of an allocation small. Though
      currently we expect the majority of requests to be at default priority
      (and so hit the preallocated priolist), once userspace starts using
      priorities they are likely to use many fine-grained policies improving
      the utilisation of a private slab.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-9-chris@chris-wilson.co.uk
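The "local freelist" argument in the kmem_cache commit above can be illustrated with a userspace sketch. This is not the kernel slab allocator and not the driver's code: `struct priolist`, `struct priolist_cache` and the `cache_*` helpers are hypothetical names, and `malloc()` stands in for the slow allocation path. The point shown is only that a freed object is handed straight back to the next caller without going through the general allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct i915_priolist; the fields are illustrative. */
struct priolist {
    int priority;
    struct priolist *next_free; /* links objects while they sit on the freelist */
};

/* A toy single-type cache: freed objects are parked on a local freelist. */
struct priolist_cache {
    struct priolist *free; /* head of the freelist */
    size_t nr_free;        /* number of parked objects */
};

static struct priolist *cache_alloc(struct priolist_cache *c)
{
    struct priolist *p = c->free;

    if (p) {
        /* Fast path: reuse a previously freed object. */
        c->free = p->next_free;
        c->nr_free--;
    } else {
        /* Slow path: fall back to the general-purpose allocator. */
        p = malloc(sizeof(*p));
        if (!p)
            return NULL;
    }

    p->priority = 0;
    p->next_free = NULL;
    return p;
}

static void cache_free(struct priolist_cache *c, struct priolist *p)
{
    assert(p != NULL);

    /* Park the object instead of returning it to the system. */
    p->next_free = c->free;
    c->free = p;
    c->nr_free++;
}

static void cache_destroy(struct priolist_cache *c)
{
    /* Release everything still parked on the freelist. */
    while (c->free) {
        struct priolist *p = c->free;

        c->free = p->next_free;
        free(p);
    }
    c->nr_free = 0;
}
```

A caller would allocate with `cache_alloc()`, return objects with `cache_free()`, and call `cache_destroy()` once at teardown. In the kernel, the dedicated cache additionally gives per-CPU locality and exact-size slabs, which this sketch does not model.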
    • drm/i915: Split execlist priority queue into rbtree + linked list · 6c067579
      Chris Wilson authored
      All the requests at the same priority are executed in FIFO order. They
      do not need to be stored in the rbtree themselves, as they are a simple
      list within a level. If we move the requests at one priority into a list,
      we can then reduce the rbtree to the set of priorities. This should keep
      the height of the rbtree small, as the number of active priorities can not
      exceed the number of active requests and should be typically only a few.
      
      Currently, we have ~2k possible priority levels, a number that may
      increase to allow even finer-grained selection. Allocating those in
      advance seems a waste (and may be impossible), so we opt for allocating
      upon first use, and freeing after its requests are depleted. To avoid
      the possibility of an allocation failure causing us to lose a request,
      we preallocate the default priority (0) and bump any request to that
      priority if we fail to allocate the appropriate priolist for it. Having a
      request (that is ready to run, so not leading to corruption) execute
      out-of-order is better than leaking the request (and its dependency
      tree) entirely.
      
      There should be a benefit to reducing execlists_dequeue() to principally
      using a simple list (and reducing the frequency of both rbtree iteration
      and balancing on erase) but for typical workloads, request coalescing
      should be small enough that we don't notice any change. The main gain is
      from improving PI calls to schedule, and the explicit list within a
      level should make request unwinding simpler (we just need to insert at
      the head of the list rather than the tail and not have to make the
      rbtree search more complicated).
      
      v2: Avoid use-after-free when deleting a depleted priolist
      
      v3: Michał found the solution to handling the allocation failure
      gracefully. If we disable all priority scheduling following the
      allocation failure, those requests will be executed in FIFO and we will
      ensure that this request and its dependencies are in strict FIFO (even
      when it doesn't realise it is only a single list). Normal scheduling is
      restored once we know the device is idle, until the next failure!
      Suggested-by: Michał Wajdeczko <michal.wajdeczko@intel.com>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Michał Winiarski <michal.winiarski@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-8-chris@chris-wilson.co.uk
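The two-level structure described in the commit above (a small set of priority levels, each holding a FIFO list of requests) can be sketched in userspace C. To keep the example short, a sorted singly linked list of levels replaces the rbtree, so this is a named substitution and not the driver's data structure. All identifiers (`struct queue`, `lookup_level`, `failing_alloc`, and so on) are hypothetical. The sketch follows the v3 note: the default level is preallocated, and after an allocation failure every request is queued at the default priority until the queue drains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define PRIORITY_NORMAL 0

struct request { int id; struct request *next; };

struct level {
    int priority;
    struct request *head, *tail; /* FIFO of requests at this priority */
    struct level *next;          /* next lower priority level */
};

struct queue {
    struct level *levels;        /* non-empty levels, highest priority first */
    struct level default_level;  /* preallocated, never freed */
    bool no_priolist;            /* set after an allocation failure */
    void *(*alloc)(size_t);      /* injectable; must be malloc-compatible */
};

/* Demonstration hook: an allocator that always fails. */
static void *failing_alloc(size_t size) { (void)size; return NULL; }

static void queue_init(struct queue *q, void *(*alloc)(size_t))
{
    q->levels = NULL;
    q->default_level = (struct level){ .priority = PRIORITY_NORMAL };
    q->no_priolist = false;
    q->alloc = alloc ? alloc : malloc;
}

static struct level *lookup_level(struct queue *q, int prio)
{
    struct level **pos, *lv;

    if (q->no_priolist)
        prio = PRIORITY_NORMAL;
retry:
    /* Walk to the first level whose priority is not above prio. */
    for (pos = &q->levels; *pos && (*pos)->priority > prio; pos = &(*pos)->next)
        ;
    if (*pos && (*pos)->priority == prio)
        return *pos;

    if (prio == PRIORITY_NORMAL) {
        lv = &q->default_level;  /* cannot fail */
    } else {
        lv = q->alloc(sizeof(*lv));
        if (!lv) {
            /* Fall back to plain FIFO at the default priority. */
            q->no_priolist = true;
            prio = PRIORITY_NORMAL;
            goto retry;
        }
    }
    lv->priority = prio;
    lv->head = lv->tail = NULL;
    lv->next = *pos;
    *pos = lv;
    return lv;
}

static void queue_submit(struct queue *q, struct request *rq, int prio)
{
    struct level *lv = lookup_level(q, prio);

    rq->next = NULL;
    if (lv->tail)
        lv->tail->next = rq;
    else
        lv->head = rq;
    lv->tail = rq;
}

static struct request *queue_dequeue(struct queue *q)
{
    struct level *lv = q->levels;
    struct request *rq;

    if (!lv)
        return NULL;
    rq = lv->head;               /* linked levels are never empty */
    lv->head = rq->next;
    if (!lv->head) {
        /* Level depleted: unlink it, freeing it unless preallocated. */
        q->levels = lv->next;
        if (lv != &q->default_level)
            free(lv);
        if (!q->levels)
            q->no_priolist = false; /* idle: restore normal scheduling */
    }
    rq->next = NULL;
    return rq;
}
```

Passing `failing_alloc` to `queue_init()` forces the fallback path, so a request submitted at a non-default priority lands behind earlier default-priority requests instead of being lost.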
    • drm/i915: Use a define for the default priority [0] · e4f815f6
      Chris Wilson authored
      Explicitly assign the default priority, and give it a name. After much
      discussion, we have chosen to call it I915_PRIORITY_NORMAL!
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-7-chris@chris-wilson.co.uk
    • drm/i915: Don't mark an execlists context-switch when idle · a4b2b015
      Chris Wilson authored
      If we *know* that the engine is idle, i.e. we have no more contexts in
      flight, we can skip any spurious CSB idle interrupts. These spurious
      interrupts seem to arrive long after we assert that the engines are
      completely idle, triggering later assertions:
      
      [  178.896646] intel_engine_is_idle(bcs): interrupt not handled, irq_posted=2
      [  178.896655] ------------[ cut here ]------------
      [  178.896658] kernel BUG at drivers/gpu/drm/i915/intel_engine_cs.c:226!
      [  178.896661] invalid opcode: 0000 [#1] SMP
      [  178.896663] Modules linked in: i915(E) x86_pkg_temp_thermal(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) nls_ascii(E) nls_cp437(E) vfat(E) fat(E) intel_gtt(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) aesni_intel(E) prime_numbers(E) evdev(E) aes_x86_64(E) drm(E) crypto_simd(E) cryptd(E) glue_helper(E) mei_me(E) mei(E) lpc_ich(E) efivars(E) mfd_core(E) battery(E) video(E) acpi_pad(E) button(E) tpm_tis(E) tpm_tis_core(E) tpm(E) autofs4(E) i2c_i801(E) fan(E) thermal(E) i2c_designware_platform(E) i2c_designware_core(E)
      [  178.896694] CPU: 1 PID: 522 Comm: gem_exec_whispe Tainted: G            E   4.11.0-rc5+ #14
      [  178.896702] task: ffff88040aba8d40 task.stack: ffffc900003f0000
      [  178.896722] RIP: 0010:intel_engine_init_global_seqno+0x1db/0x1f0 [i915]
      [  178.896725] RSP: 0018:ffffc900003f3ab0 EFLAGS: 00010246
      [  178.896728] RAX: 0000000000000000 RBX: ffff88040af54000 RCX: 0000000000000000
      [  178.896731] RDX: ffff88041ec933e0 RSI: ffff88041ec8cc48 RDI: ffff88041ec8cc48
      [  178.896734] RBP: ffffc900003f3ac8 R08: 0000000000000000 R09: 000000000000047d
      [  178.896736] R10: 0000000000000040 R11: ffff88040b344f80 R12: 0000000000000000
      [  178.896739] R13: ffff88040bce0000 R14: ffff88040bce52d8 R15: ffff88040bce0000
      [  178.896742] FS:  00007f2cccc2d8c0(0000) GS:ffff88041ec80000(0000) knlGS:0000000000000000
      [  178.896746] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  178.896749] CR2: 00007f41ddd8f000 CR3: 000000040bb03000 CR4: 00000000001406e0
      [  178.896752] Call Trace:
      [  178.896768]  reset_all_global_seqno.part.33+0x4e/0xd0 [i915]
      [  178.896782]  i915_gem_request_alloc+0x304/0x330 [i915]
      [  178.896795]  i915_gem_do_execbuffer+0x8a1/0x17d0 [i915]
      [  178.896799]  ? remove_wait_queue+0x48/0x50
      [  178.896812]  ? i915_wait_request+0x300/0x590 [i915]
      [  178.896816]  ? wake_up_q+0x70/0x70
      [  178.896819]  ? refcount_dec_and_test+0x11/0x20
      [  178.896823]  ? reservation_object_add_excl_fence+0xa5/0x100
      [  178.896835]  i915_gem_execbuffer2+0xab/0x1f0 [i915]
      [  178.896844]  drm_ioctl+0x1e6/0x460 [drm]
      [  178.896858]  ? i915_gem_execbuffer+0x260/0x260 [i915]
      [  178.896862]  ? dput+0xcf/0x250
      [  178.896866]  ? full_proxy_release+0x66/0x80
      [  178.896869]  ? mntput+0x1f/0x30
      [  178.896872]  do_vfs_ioctl+0x8f/0x5b0
      [  178.896875]  ? ____fput+0x9/0x10
      [  178.896878]  ? task_work_run+0x80/0xa0
      [  178.896881]  SyS_ioctl+0x3c/0x70
      [  178.896885]  entry_SYSCALL_64_fastpath+0x17/0x98
      [  178.896888] RIP: 0033:0x7f2ccb455ca7
      [  178.896890] RSP: 002b:00007ffcabec72d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [  178.896894] RAX: ffffffffffffffda RBX: 000055f897a44b90 RCX: 00007f2ccb455ca7
      [  178.896897] RDX: 00007ffcabec74a0 RSI: 0000000040406469 RDI: 0000000000000003
      [  178.896900] RBP: 00007f2ccb70a440 R08: 00007f2ccb70d0a4 R09: 0000000000000000
      [  178.896903] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      [  178.896905] R13: 000055f89782d71a R14: 00007ffcabecf838 R15: 0000000000000003
      [  178.896908] Code: 00 31 d2 4c 89 ef 8d 70 48 41 ff 95 f8 06 00 00 e9 68 fe ff ff be 0f 00 00 00 48 c7 c7 48 dc 37 a0 e8 fa 33 d6 e0 e9 0b ff ff ff <0f> 0b 0f 0b 0f 0b 0f 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00
      
      On the other hand, by ignoring the interrupt do we risk running out of
      space in the CSB ring? Testing for a few hours suggests not, i.e. that we
      only seem to get the odd delayed CSB idle notification.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-6-chris@chris-wilson.co.uk
    • drm/i915/execlists: Pack the count into the low bits of the port.request · 77f0d0e9
      Chris Wilson authored
      add/remove: 1/1 grow/shrink: 5/4 up/down: 391/-578 (-187)
      function                                     old     new   delta
      execlists_submit_ports                       262     471    +209
      port_assign.isra                               -     136    +136
      capture                                     6344    6359     +15
      reset_common_ring                            438     452     +14
      execlists_submit_request                     228     238     +10
      gen8_init_common_ring                        334     341      +7
      intel_engine_is_idle                         106     105      -1
      i915_engine_info                            2314    2290     -24
      __i915_gem_set_wedged_BKL                    485     411     -74
      intel_lrc_irq_handler                       1789    1604    -185
      execlists_update_context                     294       -    -294
      
      The most important change there is the improvement to
      intel_lrc_irq_handler and execlists_submit_ports (a net improvement, since
      execlists_update_context is now inlined).
      
      v2: Use the port_api() for guc as well (even though currently we do not
      pack any counters in there, yet) and hide all port->request_count inside
      the helpers.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-5-chris@chris-wilson.co.uk
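Packing a small counter into the low bits of a request pointer, as the commit title above describes, relies on the pointer's alignment leaving those bits zero. A minimal userspace sketch follows, assuming a 2-bit counter and a request type that is at least 4-byte aligned. The helper names and `PORT_COUNT_BITS` are illustrative and are not taken from the driver source.

```c
#include <assert.h>
#include <stdint.h>

#define PORT_COUNT_BITS 2  /* assumed width of the packed counter */
#define PORT_COUNT_MASK ((uintptr_t)((1u << PORT_COUNT_BITS) - 1))

/* Hypothetical request; the int member gives it at least 4-byte alignment. */
struct request { int seqno; };

struct port {
    uintptr_t request_count; /* (request pointer | submission count) */
};

/* Recover the request pointer by clearing the counter bits. */
static struct request *port_request(const struct port *p)
{
    return (struct request *)(p->request_count & ~PORT_COUNT_MASK);
}

/* Recover the counter from the low bits. */
static unsigned int port_count(const struct port *p)
{
    return (unsigned int)(p->request_count & PORT_COUNT_MASK);
}

/* Store pointer and counter together in a single word. */
static void port_set(struct port *p, struct request *rq, unsigned int count)
{
    assert(((uintptr_t)rq & PORT_COUNT_MASK) == 0); /* low bits must be free */
    assert(count <= PORT_COUNT_MASK);               /* counter must fit */
    p->request_count = (uintptr_t)rq | count;
}
```

Keeping all accesses behind helpers like these means callers never touch the raw packed word, which matches the v2 note about hiding `port->request_count` inside the helpers.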
    • drm/i915: Redefine ptr_pack_bits() and friends · 0ce81788
      Chris Wilson authored
      Rebrand the current (pointer | bits) pack/unpack utility macros as
      explicit bit twiddling for PAGE_SIZE so that we can use the more
      flexible underlying macros for different bits.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-4-chris@chris-wilson.co.uk
    • drm/i915: Make ptr_unpack_bits() more function-like · 991bfc64
      Chris Wilson authored
      ptr_unpack_bits() is a function-like macro, as such it is meant to be
      replaceable by a function. In this case, we should be passing in the
      out-param as a pointer.
      
      Bizarrely this does affect code generation:
      
      function                                     old     new   delta
      i915_gem_object_pin_map                      409     389     -20
      
      An improvement(?) in this case, but one can't help but wonder what
      strict-aliasing optimisations we are preventing.
      
      The generated code looks identical in its use of ptr_unpack_bits (no extra
      motions to stack; the pointer and bits appear to be kept in registers).
      The difference appears to be code ordering: with the reorder, the compiler
      is able to use smaller forward jumps.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-3-chris@chris-wilson.co.uk
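The "function-like" calling convention from the commit above, where the unpacked bits are returned through a pointer out-parameter, can be sketched with plain inline functions. These are not the driver's macros: the function signatures, and in particular the explicit bit-width argument `n`, are assumptions chosen to also illustrate the idea of a pack/unpack pair that works for any number of low bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Combine a pointer with a small value stored in its n low bits.
 * The pointer must be aligned so that those bits are zero.
 */
static inline void *ptr_pack_bits(void *ptr, unsigned long bits, unsigned int n)
{
    uintptr_t mask = ((uintptr_t)1 << n) - 1;

    assert(((uintptr_t)ptr & mask) == 0); /* alignment leaves the bits free */
    assert((bits & ~mask) == 0);          /* value fits in n bits */
    return (void *)((uintptr_t)ptr | bits);
}

/*
 * Split a packed value: the low n bits are written through the
 * out-parameter and the clean pointer is returned.
 */
static inline void *ptr_unpack_bits(void *packed, unsigned long *bits,
                                    unsigned int n)
{
    uintptr_t mask = ((uintptr_t)1 << n) - 1;

    *bits = (unsigned long)((uintptr_t)packed & mask);
    return (void *)((uintptr_t)packed & ~mask);
}
```

Because the out-parameter is passed by address, the call site reads like an ordinary function call (`ptr = ptr_unpack_bits(packed, &bits, 2);`), which is what would allow a macro with this shape to be swapped for a real function.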
    • drm/i915: Import the kfence selftests for i915_sw_fence · 47624cc3
      Chris Wilson authored
      A long time ago, I wrote some selftests for the struct kfence idea. Now
      that we have infrastructure in i915/igt for running kselftests, include
      some for i915_sw_fence.
      
      v2: INIT_WORK_ONSTACK/destroy_work_on_stack (Mika)
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-2-chris@chris-wilson.co.uk
    • drm/i915: Remove kref from i915_sw_fence · 9310cb7f
      Chris Wilson authored
      My original intention was for i915_sw_fence to be the base class and
      provide the reference count for the container. This was from starting
      with a design to handle async_work. In practice, for i915 we embed
      fences into structs which have their own independent reference counting,
      making the i915_sw_fence.kref redundant. If we remove the kref, we
      remove the i915_sw_fence's ability to free itself and its independence;
      it can only exist within a container and must be supplied with a
      callback to handle its release.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-1-chris@chris-wilson.co.uk
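The ownership model described in the commit above, a fence with no reference count of its own that lives inside a reference-counted container and is given a release callback, might look roughly like the following userspace sketch. Every name here (`struct sw_fence`, `struct job`, `fence_complete`, and so on) is hypothetical, the refcount is not atomic, and the real i915_sw_fence notification interface is richer than a single callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sw_fence;
typedef void (*fence_release_fn)(struct sw_fence *fence);

/* The fence carries no refcount; it only knows whom to notify. */
struct sw_fence {
    int pending;              /* outstanding signals before completion */
    fence_release_fn release; /* supplied by the embedding container */
};

/* Hypothetical container that owns the lifetime of the embedded fence. */
struct job {
    int refcount;
    struct sw_fence submit;
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static int jobs_freed; /* bookkeeping for the demonstration only */

static void job_put(struct job *job)
{
    if (--job->refcount == 0) {
        jobs_freed++;
        free(job);
    }
}

/* Release callback: map the fence back to its container and drop a ref. */
static void job_fence_release(struct sw_fence *fence)
{
    job_put(container_of(fence, struct job, submit));
}

static void fence_init(struct sw_fence *fence, fence_release_fn release)
{
    fence->pending = 1;
    fence->release = release;
}

/* Signal the fence once; the last signal hands control to the container. */
static void fence_complete(struct sw_fence *fence)
{
    assert(fence->pending > 0);
    if (--fence->pending == 0)
        fence->release(fence);
}

static struct job *job_create(void)
{
    struct job *job = malloc(sizeof(*job));

    if (!job)
        return NULL;
    job->refcount = 2; /* one for the caller, one held for the fence */
    fence_init(&job->submit, job_fence_release);
    return job;
}
```

The design consequence is visible in `job_fence_release()`: the fence cannot free itself, so it must be able to reach its container, and the container decides when the memory actually goes away.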
    • drm/i915/gen9: Reintroduce WaEnableYV12BugFixInHalfSliceChicken7 · 0b71cea2
      Arkadiusz Hiler authored
      This basically reverts commit 465418c6
      ("drm/i915/gen9: Remove WaEnableYV12BugFixInHalfSliceChicken7")
      with a small addition: marking it as affecting GLK as well.
      
      It was incorrectly considered fixed in production steppings.
      
      References: HSD#2126385, HSD#2131381, HSDES#1504433555, BSID#0764
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Cc: Jeff McGee <jeff.mcgee@intel.com>
      Signed-off-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      [Mika: s/KBL/GLK on commit message]
      Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170512112015.19082-1-arkadiusz.hiler@intel.com
  2. 16 May, 2017 4 commits
  3. 15 May, 2017 3 commits
  4. 13 May, 2017 8 commits
  5. 12 May, 2017 6 commits
  6. 11 May, 2017 6 commits
  7. 10 May, 2017 3 commits