1. 23 May, 2019 9 commits
    • drm/i915/gtt: Always acquire struct_mutex for gen6_ppgtt_cleanup · d3622099
      Chris Wilson authored
      We rearranged the vm_destroy_ioctl to avoid taking struct_mutex, little
      realising that buried underneath the gen6 ppgtt release path was a
      struct_mutex requirement (to remove its GGTT vma). Until that
      struct_mutex is vanquished, take a detour in gen6_ppgtt_cleanup to do
      the i915_vma_destroy from inside a worker under the struct_mutex.
      
      <4> [257.740160] WARN_ON(debug_locks && !lock_is_held(&(&vma->vm->i915->drm.struct_mutex)->dep_map))
      <4> [257.740213] WARNING: CPU: 3 PID: 1507 at drivers/gpu/drm/i915/i915_vma.c:841 i915_vma_destroy+0x1ae/0x3a0 [i915]
      <4> [257.740214] Modules linked in: snd_hda_codec_hdmi i915 x86_pkg_temp_thermal mei_hdcp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core r8169 realtek snd_pcm mei_me mei prime_numbers lpc_ich
      <4> [257.740224] CPU: 3 PID: 1507 Comm: gem_vm_create Tainted: G     U            5.2.0-rc1-CI-CI_DRM_6118+ #1
      <4> [257.740225] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
      <4> [257.740249] RIP: 0010:i915_vma_destroy+0x1ae/0x3a0 [i915]
      <4> [257.740250] Code: 00 00 00 48 81 c7 c8 00 00 00 e8 ed 08 f0 e0 85 c0 0f 85 78 fe ff ff 48 c7 c6 e8 ec 30 a0 48 c7 c7 da 55 33 a0 e8 42 8c e9 e0 <0f> 0b 8b 83 40 01 00 00 85 c0 0f 84 63 fe ff ff 48 c7 c1 c1 58 33
      <4> [257.740251] RSP: 0018:ffffc90000aafc68 EFLAGS: 00010282
      <4> [257.740252] RAX: 0000000000000000 RBX: ffff8883f7957840 RCX: 0000000000000003
      <4> [257.740253] RDX: 0000000000000046 RSI: 0000000000000006 RDI: ffffffff8212d1b9
      <4> [257.740254] RBP: ffffc90000aafcc8 R08: 0000000000000000 R09: 0000000000000000
      <4> [257.740255] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8883f4d5c2a8
      <4> [257.740256] R13: ffff8883f4d5d680 R14: ffff8883f4d5c668 R15: ffff8883f4d5c2f0
      <4> [257.740257] FS:  00007f777fa8fe40(0000) GS:ffff88840f780000(0000) knlGS:0000000000000000
      <4> [257.740258] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      <4> [257.740259] CR2: 00007f777f6522b0 CR3: 00000003c612a006 CR4: 00000000001606e0
      <4> [257.740260] Call Trace:
      <4> [257.740283]  gen6_ppgtt_cleanup+0x25/0x60 [i915]
      <4> [257.740306]  i915_ppgtt_release+0x102/0x290 [i915]
      <4> [257.740330]  i915_gem_vm_destroy_ioctl+0x7c/0xa0 [i915]
      <4> [257.740376]  ? i915_gem_vm_create_ioctl+0x160/0x160 [i915]
      <4> [257.740379]  drm_ioctl_kernel+0x83/0xf0
      <4> [257.740382]  drm_ioctl+0x2f3/0x3b0
      <4> [257.740422]  ? i915_gem_vm_create_ioctl+0x160/0x160 [i915]
      <4> [257.740426]  ? _raw_spin_unlock_irqrestore+0x39/0x60
      <4> [257.740430]  do_vfs_ioctl+0xa0/0x6e0
      <4> [257.740433]  ? lock_acquire+0xa6/0x1c0
      <4> [257.740436]  ? __task_pid_nr_ns+0xb9/0x1f0
      <4> [257.740439]  ksys_ioctl+0x35/0x60
      <4> [257.740441]  __x64_sys_ioctl+0x11/0x20
      <4> [257.740443]  do_syscall_64+0x55/0x1c0
      <4> [257.740445]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      References: e0695db7 ("drm/i915: Create/destroy VM (ppGTT) for use with contexts")
      Fixes: 7f3f317a ("drm/i915: Restore control over ppgtt for context creation ABI")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190523064933.23604-1-chris@chris-wilson.co.uk
    • drm/i915: remove duplicate typedef for intel_wakeref_t · 09a93ef3
      Jani Nikula authored
      Fix the duplicate typedef for intel_wakeref_t leading to Clang build
      issues. While at it, actually make the intel_runtime_pm.h header
      self-contained, which was claimed in the commit being fixed.
      Reported-by: Nathan Chancellor <natechancellor@gmail.com>
      Cc: Nathan Chancellor <natechancellor@gmail.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      References: http://mid.mail-archive.com/20190521183850.GA9157@archlinux-epyc
      References: https://travis-ci.com/ClangBuiltLinux/continuous-integration/jobs/201754420#L2435
      Fixes: 0d5adc5f ("drm/i915: extract intel_runtime_pm.h from intel_drv.h")
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
      Tested-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190522103505.2082-1-jani.nikula@intel.com
    • drm/i915: Update DRIVER_DATE to 20190523 · cfc0e7bb
      Jani Nikula authored
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
    • drm/i915/dp: Support DP ports YUV 4:2:0 output to GEN11 · 47d0ccec
      Gwan-gyeong Mun authored
      Bspec describes that GEN10 only supports capability of YUV 4:2:0 output to
      HDMI port and GEN11 supports capability of YUV 4:2:0 output to both DP and
      HDMI ports.
      
      v2: Minor style fix.
      Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
      Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521121721.32010-7-gwan-gyeong.mun@intel.com
    • drm/i915/dp: Change a link bandwidth computation for DP · 16668f48
      Gwan-gyeong Mun authored
      Data M/N calculations assumed an RGB bpp. When YCbCr 4:2:0 output is
      used on DP, the bpp must be calculated for YCbCr 4:2:0 instead. The
      pipe_bpp value assumes RGB, i.e. bpc multiplied by 3, but YCbCr 4:2:0
      requires a multiplier of 1.5.
      Therefore pipe_bpp must be divided by 2 when the DP output uses the
      YCbCr 4:2:0 format.
       - RGB format bpp = bpc x 3
       - YCbCr 4:2:0 format bpp = bpc x 1.5
      
      But Link M/N values are calculated and applied based on the Full Clock for
      YCbCr 4:2:0, and DP YCbCr 4:2:0 does not need the pixel clock doubled for
      the dotclock calculation. Only HDMI YCbCr 4:2:0 needs the pixel clock
      doubled for the dotclock calculation.
      
      It only affects dp and edp port which use YCbCr 4:2:0 output format.
      And for now, it does not consider a use case of DSC + YCbCr 4:2:0.
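The bpp rule described above can be sketched in a few lines of C. This is only an illustration of the arithmetic (RGB bpp = bpc x 3, YCbCr 4:2:0 bpp = bpc x 1.5, i.e. half of the RGB pipe_bpp); the function name `dp_output_bpp` and its signature are assumptions made for this sketch and are not taken from the driver, whose own helper intel_dp_output_bpp() is only named, not shown, in this message.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: bits per pixel carried on the DP link.
 * pipe_bpp is the RGB-style value (bpc * 3) that the pipe keeps using.
 */
int dp_output_bpp(int pipe_bpp, bool ycbcr420)
{
	/* The sketch assumes pipe_bpp was derived as bpc * 3. */
	assert(pipe_bpp > 0 && pipe_bpp % 3 == 0);

	/* 4:2:0 subsampling carries bpc * 1.5, which is pipe_bpp / 2. */
	return ycbcr420 ? pipe_bpp / 2 : pipe_bpp;
}
```

For an 8 bpc mode, pipe_bpp stays at 24 while the value fed into the Data M/N calculation would be 12 under this sketch.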
      
      v2:
        Addressed review comments from Ville.
        Remove a changing of pipe_bpp on intel_ddi_set_pipe_settings().
        Because the pipe is running at the full bpp, keep pipe_bpp as RGB
        even though YCbCr 4:2:0 output format is used.
        Add a link bandwidth computation for YCbCr4:2:0 output format.
      
      v3:
        Addressed review comments from Ville.
        In order to make codes simple, it adds and uses intel_dp_output_bpp()
        function.
      
      v6:
        Link M/N values are calculated and applied based on the Full Clock for
        YCbCr420. The Bit per Pixel needs to be adjusted for YUV420 mode as it
        requires only half of the RGB case.
          - Link M/N values are calculated and applied based on the Full Clock
          - Data M/N values needs to be calculated considering the data is half
            due to subsampling
        Remove a doubling of pixel clock on a dot clock calculator for
        DP YCbCr 4:2:0.
        Rebase and remove a duplicate setting of vsc_sdp.DB17.
        Add a setting of dynamic range bit to  vsc_sdp.DB17.
        Change Content Type bit to "Graphics" from "Not defined".
        Change the division of pipe_bpp to a multiplication by constant values
        in a switch-case statement.
      
      v7:
        Addressed review comments from Ville.
        Move a setting of dynamic range bit and a setting of bpc which is based
        on pipe_bpp to a "drm/i915/dp: Program VSC Header and DB for Pixel
        Encoding/Colorimetry Format" commit.
        Change Content Type bit to "Not defined" from "Graphics".
      
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
      Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521121721.32010-6-gwan-gyeong.mun@intel.com
    • drm/i915/dp: Add a support of YCBCR 4:2:0 to DP MSA · ec4401d3
      Gwan-gyeong Mun authored
      When YCBCR 4:2:0 output is used for DP, we should program YCBCR 4:2:0 to
      MSA and VSC SDP.
      
      As per DP 1.4a spec section 2.2.4.3 [MSA Field for Indication of Color
      Encoding Format and Content Color Gamut] while sending YCBCR 420 signals
      we should program MSA MISC1 fields which indicate VSC SDP for the Pixel
      Encoding/Colorimetry Format.
      
      v2: Block comment style fix.
      
      v6:
        Fix a wrong setting of the MSA MISC1 fields for Pixel Encoding/Colorimetry
        Format indication. As per DP 1.4a spec Table 2-96 [MSA MISC1 and MISC0
        Fields for Pixel Encoding/Colorimetry Format Indication],
        when MISC1 bit 6 is set to 1, a Source device uses a VSC SDP to
        indicate the Pixel Encoding/Colorimetry Format. The previous version
        set bit 5 of MISC1; it now sets bit 6 of MISC1.
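The v6 fix boils down to which MISC1 bit is set. A minimal sketch follows, assuming MISC1 is handled as a plain 8-bit value; the macro `MSA_MISC1_VSC_SDP` and the function `msa_misc1_use_vsc_sdp` are hypothetical names for this illustration and are not the driver's register definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: MISC1 bit 6 tells the sink to take the pixel
 * encoding/colorimetry format from the VSC SDP. */
#define MSA_MISC1_VSC_SDP (1u << 6)

/* Return misc1 with the "use VSC SDP" indication set, other bits untouched. */
uint8_t msa_misc1_use_vsc_sdp(uint8_t misc1)
{
	return (uint8_t)(misc1 | MSA_MISC1_VSC_SDP);
}
```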
      Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
      Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521121721.32010-5-gwan-gyeong.mun@intel.com
    • drm/i915/dp: Program VSC Header and DB for Pixel Encoding/Colorimetry Format · 3c053a96
      Gwan-gyeong Mun authored
      Function intel_pixel_encoding_setup_vsc handles vsc header and data block
      setup for pixel encoding / colorimetry format.
      
      Setup VSC header and data block in function intel_pixel_encoding_setup_vsc
      for pixel encoding / colorimetry format as per dp 1.4a spec,
      section 2.2.5.7.1, table 2-119: VSC SDP Header Bytes, section 2.2.5.7.5,
      table 2-120:VSC SDP Payload for DB16 through DB18.
      
      v2:
        Minor style fix. [Maarten]
        Refer to commit ids instead of patchwork. [Maarten]
      
      v6: Rebase
      
      v7:
        Rebase and addressed review comments from Ville.
        Use a structure initializer instead of memset().
        Fix non-standard comment format.
        Remove a referring to specific commit.
        Add a setting of dynamic range bit to  vsc_sdp.DB17.
        Add a setting of bpc which is based on pipe_bpp.
        Remove duplicated checking of connector's ycbcr_420_allowed from
        intel_pixel_encoding_setup_vsc(). It is already checked from
        intel_dp_ycbcr420_config().
        Remove comments for VSC_SDP_EXTENSION_FOR_COLORIMETRY_SUPPORTED. It is
        already implemented on intel_dp_get_colorimetry_status().
      
      v8:
        Failing to set bpc in the VSC setup is a fairly fatal case, so
        replace DRM_DEBUG_KMS() with MISSING_CASE(). [Maarten]
      
      v9: Use a changed member name of struct dp_sdp. it renamed to db from DB.
      
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
      Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521121721.32010-4-gwan-gyeong.mun@intel.com
    • drm: Rename struct edp_vsc_psr to struct dp_sdp · 4d432f95
      Gwan-gyeong Mun authored
      VSC SDP Payload for PSR is one data block type of SDP (Secondary Data
      Packet). In order to generalize the SDP packet structure name, rename
      struct edp_vsc_psr to struct dp_sdp. Each SDP type uses its data blocks
      differently and has different reserved data blocks, and the
      Video_Stream_Configuration Extension VESA SDP might use all of the Data
      Blocks as Extended INFOFRAME Data Bytes, so make the Data Block variables
      an array. Also add comments detailing the DBs of the VSC SDP Payload
      for Pixel Encoding/Colorimetry Format. These comments follow the DP 1.4a
      spec, section 2.2.5.7.5, chapter "VSC SDP Payload for Pixel
      Encoding/Colorimetry Format".
      
      v7: Addressed review comments from Ville.
      
      v9: Rename a member value name DB to db on struct dp_sdp [Laurent]
      
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
      Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
      Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521121721.32010-3-gwan-gyeong.mun@intel.com
    • drm/i915/dp: Add a config function for YCBCR420 outputs · 8e9d645c
      Gwan-gyeong Mun authored
      This patch checks for YCBCR420 output support at the encoder level.
      If the input mode is a YCBCR420-only mode then it prepares DP for YCBCR420
      output, else it continues with RGB output mode.
      It sets output_format to INTEL_OUTPUT_FORMAT_YCBCR420 in order to use
      a pipe scaler for RGB to YCbCr 4:4:4.
      
      v2:
        Addressed review comments from Ville.
        Style fixed with few naming.
        %s/config/crtc_state/
        %s/intel_crtc/crtc/
        If lspcon is active, do not call intel_dp_ycbcr420_config(), to
        avoid clobbering the lspcon_ycbcr420_config() routine.
        And move the 420_only check into intel_dp_ycbcr420_config().
      
      v3: Fix uninitialized return value, reported by Dan Carpenter.
      
      v4:
        Addressed review comments from Ville.
        In order to avoid the extra indentation, it inverts if-clause on
        intel_dp_ycbcr420_config().
        Remove the error print where no errors print are allowed.
      
      v6: Rebase
      
      v7:
        Move intel_dp_get_colorimetry_status() to intel_dp from intel_psr.
        intel_dp_get_colorimetry_status() checks
        VSC_SDP_EXTENSION_FOR_COLORIMETRY_SUPPORTED bit in the
        DPRX_FEATURE_ENUMERATION_LIST register.
        And intel_dp_ycbcr420_config() uses intel_dp_get_colorimetry_status().
      
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
      Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521121721.32010-2-gwan-gyeong.mun@intel.com
  2. 22 May, 2019 13 commits
    • drm/i915: Engine discovery query · c5d3e39c
      Tvrtko Ursulin authored
      Engine discovery query allows userspace to enumerate engines, probe their
      configuration features, all without needing to maintain the internal PCI
      ID based database.
      
      A new query for the generic i915 query ioctl is added named
      DRM_I915_QUERY_ENGINE_INFO, together with accompanying structure
      drm_i915_query_engine_info. The address of latter should be passed to the
      kernel in the query.data_ptr field, and should be large enough for the
      kernel to fill out all known engines as struct drm_i915_engine_info
      elements trailing the query.
      
      As with other queries, setting the item query length to zero allows
      userspace to query minimum required buffer size.
      
      Enumerated engines have common type mask which can be used to query all
      hardware engines, versus engines userspace can submit to using the execbuf
      uAPI.
      
      Engines also have capabilities which are per engine class namespace of
      bits describing features not present on all engine instances.
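The two-pass calling convention described above (first ask for the required size with a zero length, then pass a buffer of that size) can be sketched against a mock. Everything below is an assumption made for illustration: the structures are simplified stand-ins and not the real uAPI layouts, `mock_query_engine_info` plays the role of the kernel instead of a real ioctl, and the engine table is invented sample data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the uAPI structures (not the real layouts). */
struct engine_info {
	uint16_t engine_class;
	uint16_t engine_instance;
	uint64_t capabilities;
};

struct query_engine_info {
	uint32_t num_engines;
	struct engine_info engines[]; /* trailing array filled by the "kernel" */
};

struct query_item {
	int32_t length;  /* 0 on input means "tell me the size" */
	void *data_ptr;
};

/* Mock of the kernel side, with invented sample engines. */
int mock_query_engine_info(struct query_item *item)
{
	static const struct engine_info known[] = {
		{ 0, 0, 0 }, { 1, 0, 0 }, { 2, 0, 0 }, { 2, 1, 0 },
	};
	const size_t need = sizeof(struct query_engine_info) + sizeof(known);
	struct query_engine_info *out = item->data_ptr;

	if (item->length == 0) {
		item->length = (int32_t)need; /* first pass: report size only */
		return 0;
	}
	if (item->length < 0 || (size_t)item->length < need || !out)
		return -1;

	out->num_engines = (uint32_t)(sizeof(known) / sizeof(known[0]));
	memcpy(out->engines, known, sizeof(known));
	return 0;
}

/* Userspace side: size query, allocate, then the real query. */
struct query_engine_info *discover_engines(void)
{
	struct query_item item = { .length = 0, .data_ptr = NULL };

	if (mock_query_engine_info(&item) || item.length <= 0)
		return NULL;

	item.data_ptr = calloc(1, (size_t)item.length);
	if (!item.data_ptr)
		return NULL;

	if (mock_query_engine_info(&item)) {
		free(item.data_ptr);
		return NULL;
	}
	return item.data_ptr; /* caller frees */
}
```

The caller never hard-codes the engine count; it relies on the first pass to size the buffer, which is the point of the zero-length convention.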
      
      v2:
       * Fixed HEVC assignment.
       * Reorder some fields, rename type to flags, increase width. (Lionel)
       * No need to allocate temporary storage if we do it engine by engine.
         (Lionel)
      
      v3:
       * Describe engine flags and mark mbz fields. (Lionel)
       * HEVC only applies to VCS.
      
      v4:
       * Squash SFC flag into main patch.
       * Tidy some comments.
      
      v5:
       * Add uabi_ prefix to engine capabilities. (Chris Wilson)
       * Report exact size of engine info array. (Chris Wilson)
       * Drop the engine flags. (Joonas Lahtinen)
       * Added some more reserved fields.
       * Move flags after class/instance.
      
      v6:
       * Do not check engine info array was zeroed by userspace but zero the
         unused fields for them instead.
      
      v7:
       * Simplify length calculation loop. (Lionel Landwerlin)
      
      v8:
       * Remove MBZ comments where not applicable.
       * Rename ABI flags to match engine class define naming.
       * Rename SFC ABI flag to reflect it applies to VCS and VECS.
       * SFC is wired to even _logical_ engine instances.
       * SFC applies to VCS and VECS.
       * HEVC is present on all instances on Gen11. (Tony)
       * Simplify length calculation even more. (Chris Wilson)
       * Move info_ptr assignment closer to loop for clarity. (Chris Wilson)
       * Use vdbox_sfc_access from runtime info.
       * Rebase for RUNTIME_INFO.
       * Refactor for lower indentation.
       * Rename uAPI class/instance to engine_class/instance to avoid C++
         keyword.
      
      v9:
       * Rebase for s/num_rings/num_engines/ in RUNTIME_INFO.
      
      v10:
       * Use new copy_query_item.
      
      v11:
       * Consolidate with struct i915_engine_class_instance.
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Tony Ye <tony.ye@intel.com>
      Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> # v7
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190522090054.6007-1-tvrtko.ursulin@linux.intel.com
    • drm/i915/icl: Add WaDisableBankHangMode · cbe3e1d1
      Tvrtko Ursulin authored
      Disable GPU hang by default on unrecoverable ECC cache errors.
      
      v2:
       * Rebase.
      
      v3:
       * Use intel_uncore_read. (Chris)
      
      Fixes: cc38cae7 ("drm/i915/icl: Introduce initial Icelake Workarounds")
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190520110442.403-2-tvrtko.ursulin@linux.intel.com
    • drm/i915/selftests: Verify context workarounds · fde93886
      Tvrtko Ursulin authored
      Test context workarounds have been correctly applied in newly created
      contexts.
      
      To accomplish this the existing engine_wa_list_verify helper is extended
      to take in a context from which reading of the workaround list will be
      done.
      
      Context workaround verification is done from the existing subtests, which
      have been renamed to reflect they are no longer only about GT and engine
      workarounds.
      
      v2:
       * Test after resets and refactor to use intel_context more. (Chris)
      
      v3:
       * Use ce->engine->i915 instead of ce->gem_context->i915. (Chris)
       * gem_engine_iter.idx is engine->id + 1. (Chris)
      
      v4:
       * Make local function static.
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190520142546.12493-1-tvrtko.ursulin@linux.intel.com
    • drm/i915: Allow specification of parallel execbuf · a88b6e4c
      Chris Wilson authored
      There is a desire to split a task onto two engines and have them run at
      the same time, e.g. scanline interleaving to spread the workload evenly.
      Through the use of the out-fence from the first execbuf, we can
      coordinate secondary execbuf to only become ready simultaneously with
      the first, so that with all things idle the second execbufs are executed
      in parallel with the first. The key difference here between the new
      EXEC_FENCE_SUBMIT and the existing EXEC_FENCE_IN is that the in-fence
      waits for the completion of the first request (so that all of its
      rendering results are visible to the second execbuf, the more common
      userspace fence requirement).
      
      Since we only have a single input fence slot, userspace cannot mix an
      in-fence and a submit-fence. It has to use one or the other! This is not
      such a harsh requirement, since by virtue of the submit-fence, the
      secondary execbuf inherits all of the dependencies from the first
      request, and for the application the dependencies should be common
      between the primary and secondary execbuf.
      Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Testcase: igt/gem_exec_fence/parallel
      Link: https://github.com/intel/media-driver/pull/546
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-10-chris@chris-wilson.co.uk
    • drm/i915/execlists: Virtual engine bonding · ee113690
      Chris Wilson authored
      Some users require that when a master batch is executed on one particular
      engine, a companion batch is run simultaneously on a specific slave
      engine. For this purpose, we introduce virtual engine bonding, allowing
      maps of master:slaves to be constructed to constrain which physical
      engines a virtual engine may select given a fence on a master engine.
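One way to picture the master:slaves maps is as a small table that narrows a sibling bitmask once the master's engine is known. The sketch below is an assumption-laden illustration, not the driver's data structures: `struct bond`, `bonded_siblings` and the choice to leave the mask unconstrained when no bond matches are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bond entry: when the master ran on engine `master`,
 * the companion may only run on the engines set in `sibling_mask`. */
struct bond {
	unsigned int master;
	uint32_t sibling_mask;
};

/* Narrow the set of physical engines a virtual engine may select.
 * Assumption of this sketch: no matching bond means no constraint. */
uint32_t bonded_siblings(const struct bond *bonds, size_t num_bonds,
			 unsigned int master, uint32_t all_siblings)
{
	size_t i;

	assert(bonds != NULL || num_bonds == 0);

	for (i = 0; i < num_bonds; i++) {
		if (bonds[i].master == master)
			return all_siblings & bonds[i].sibling_mask;
	}
	return all_siblings;
}
```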
      
      For the moment, we continue to ignore the issue of preemption deferring
      the master request for later. Ideally, we would like to then also remove
      the slave and run something else rather than have it stall the pipeline.
      With load balancing, we should be able to move workload around it, but
      there is a similar stall on the master pipeline while it may wait for
      the slave to be executed. At the cost of more latency for the bonded
      request, it may be interesting to launch both on their engines in
      lockstep. (Bubbles abound.)
      
      Opens: Also what about bonding an engine as its own master? It doesn't
      break anything internally, so allow the silliness.
      
      v2: Emancipate the bonds
      v3: Couple in delayed scheduling for the selftests
      v4: Handle invalid mutually exclusive bonding
      v5: Mention what the uapi does
      v6: s/nbond/num_bonds/
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-9-chris@chris-wilson.co.uk
    • drm/i915: Extend execution fence to support a callback · f71e01a7
      Chris Wilson authored
      In the next patch, we will want to configure the slave request
      depending on which physical engine the master request is executed on.
      For this, we introduce a callback from the execute fence to convey this
      information.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-8-chris@chris-wilson.co.uk
    • drm/i915: Apply an execution_mask to the virtual_engine · 78e41ddd
      Chris Wilson authored
      Allow the user to direct which physical engines of the virtual engine
      they wish to execute on, as sometimes it is necessary to override the
      load balancing algorithm.
      
      v2: Only kick the virtual engines on context-out if required
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-7-chris@chris-wilson.co.uk
    • drm/i915: Load balancing across a virtual engine · 6d06779e
      Chris Wilson authored
      Having allowed the user to define a set of engines that they will want
      to only use, we go one step further and allow them to bind those engines
      into a single virtual instance. Submitting a batch to the virtual engine
      will then forward it to any one of the set in a manner as best to
      distribute load.  The virtual engine has a single timeline across all
      engines (it operates as a single queue), so it is not able to concurrently
      run batches across multiple engines by itself; that is left up to the user
      to submit multiple concurrent batches to multiple queues. Multiple users
      will be load balanced across the system.
      
      The mechanism used for load balancing in this patch is a late greedy
      balancer. When a request is ready for execution, it is added to each
      engine's queue, and when an engine is ready for its next request it
      claims it from the virtual engine. The first engine to do so, wins, i.e.
      the request is executed at the earliest opportunity (idle moment) in the
      system.
      
      As not all HW is created equal, the user is still able to skip the
      virtual engine and execute the batch on a specific engine, all within the
      same queue. It will then be executed in order on the correct engine,
      with execution on other virtual engines being moved away due to the load
      detection.
      
      A couple of areas for potential improvement left!
      
      - The virtual engine always takes priority over equal-priority tasks.
      Mostly broken up by applying FQ_CODEL rules for prioritising new clients,
      and hopefully the virtual and real engines are not then congested (i.e.
      all work is via virtual engines, or all work is to the real engine).
      
      - We require the breadcrumb irq around every virtual engine request. For
      normal engines, we eliminate the need for the slow round trip via
      interrupt by using the submit fence and queueing in order. For virtual
      engines, we have to allow any job to transfer to a new ring, and cannot
      coalesce the submissions, so require the completion fence instead,
      forcing the persistent use of interrupts.
      
      - We only drip feed single requests through each virtual engine and onto
      the physical engines, even if there was enough work to fill all ELSP,
      leaving small stalls with an idle CS event at the end of every request.
      Could we be greedy and fill both slots? Being lazy is virtuous for load
      distribution on less-than-full workloads though.
      
      Other areas of improvement are more general, such as reducing lock
      contention, reducing dispatch overhead, looking at direct submission
      rather than bouncing around tasklets etc.
      
      sseu: Lift the restriction to allow sseu to be reconfigured on virtual
      engines composed of RENDER_CLASS (rcs).
      
      v2: macroize check_user_mbz()
      v3: Cancel virtual engines on wedging
      v4: Commence commenting
      v5: Replace 64b sibling_mask with a list of class:instance
      v6: Drop the one-element array in the uabi
      v7: Assert it is a virtual engine in to_virtual_engine()
      v8: Skip over holes in [class][inst] so we can selftest with (vcs0, vcs2)
      
      Link: https://github.com/intel/media-driver/pull/283
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-6-chris@chris-wilson.co.uk
    • drm/i915: Allow userspace to clone contexts on creation · b81dde71
      Chris Wilson authored
      A usecase arose out of handling context recovery in mesa, whereby they
      wish to recreate a context with fresh logical state but preserving all
      other details of the original. Currently, they create a new context and
      iterate over which bits they want to copy across, but it would be much more
      convenient if they were able to just pass in a target context to clone
      during creation. This essentially extends the setparam during creation
      to pull the details from a target context instead of the user supplied
      parameters.
      
      The ideal here is that we don't expose control over anything more than
      can be obtained via CONTEXT_PARAM. That is userspace retains explicit
      control over all features, and this api is just convenience.
      
      For example, you could replace
      
      	struct context_param p = { .param = CONTEXT_PARAM_VM };
      
      	p.ctx_id = old_id;
      	gem_context_get_param(&p);
      
      	new_id = gem_context_create();
      
      	p.ctx_id = new_id;
      	gem_context_set_param(&p);
      
      	gem_vm_destroy(p.value); /* drop the ref to VM_ID handle */
      
      with
      
      	struct create_ext_param p = {
      	  { .name = CONTEXT_CREATE_CLONE },
      	  .clone_id = old_id,
      	  .flags = CLONE_FLAGS_VM
      	};
      	new_id = gem_context_create_ext(&p);
      
      and not have to worry about stray namespace pollution etc.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-5-chris@chris-wilson.co.uk
    • drm/i915: Re-expose SINGLE_TIMELINE flags for context creation · 8319f44c
      Chris Wilson authored
      The SINGLE_TIMELINE flag can be used to create a context such that all
      engine instances within that context share a common timeline. This can
      be useful for mixing operations between real and virtual engines, or
      when using a composite context for a single client API context.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-4-chris@chris-wilson.co.uk
      8319f44c
    • Chris Wilson's avatar
      drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[] · e620f7b3
      Chris Wilson authored
      Allow the user to specify a local engine index (as opposed to
      class:index) that they can use to refer to a preset engine inside the
      ctx->engine[] array defined by an earlier I915_CONTEXT_PARAM_ENGINES.
      This will be useful for setting SSEU parameters on virtual engines that
      are local to the context and do not have a valid global class:instance
      lookup.
      
      Note that due to the ambiguity in using class:instance with
      ctx->engines[], if a user supplied engine map is active the user must
      specify the engine to alter by its index into the ctx->engines[].
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-3-chris@chris-wilson.co.uk
      e620f7b3
    • Chris Wilson's avatar
      drm/i915: Allow a context to define its set of engines · 976b55f0
      Chris Wilson authored
      Over the last few years, we have debated how to extend the user API to
      support an increase in the number of engines, that may be sparse and
      even be heterogeneous within a class (not all video decoders created
      equal). We settled on using (class, instance) tuples to identify a
      specific engine, with an API for the user to construct a map of engines
      to capabilities. Into this picture, we then add a challenge of virtual
      engines; one user engine that maps behind the scenes to any number of
      physical engines. To keep it general, we want the user to have full
      control over that mapping. To that end, we allow the user to constrain a
      context to define the set of engines that it can access, order fully
      controlled by the user via (class, instance). With such precise control
      in context setup, we can continue to use the existing execbuf uABI of
      specifying a single index; only now it doesn't automagically map onto
      the engines, it uses the user defined engine map from the context.
      
      v2: Fixup freeing of local on success of get_engines()
      v3: Allow empty engines[]
      v4: s/nengine/num_engines/
      v5: Replace 64 limit on num_engines with a note that execbuf is
      currently limited to only using the first 64 engines.
      v6: Actually use the engines_mutex to guard the ctx->engines.
      
      Testcase: igt/gem_ctx_engines
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-2-chris@chris-wilson.co.uk
      976b55f0
    • Chris Wilson's avatar
      drm/i915: Restore control over ppgtt for context creation ABI · 7f3f317a
      Chris Wilson authored
      Having hidden the partially exposed new ABI from the PR, put it back again
      for completion of context recovery. A significant part of context
      recovery is the ability to reuse as much of the old context as is
      feasible (to avoid expensive reconstruction). The biggest chunk kept
      hidden at the moment is fine-control over the ctx->ppgtt (the GPU page
      tables and associated translation tables and kernel maps), so make
      control over the ctx->ppgtt explicit.
      
      This allows userspace to create and share virtual memory address spaces
      (within the limits of a single fd) between contexts they own, along with
      the ability to query the contexts for the vm state.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-1-chris@chris-wilson.co.uk
      7f3f317a
  3. 20 May, 2019 8 commits
  4. 17 May, 2019 7 commits
    • Chris Wilson's avatar
      drm/i915/execlists: Drop promotion on unsubmit · 4cc79cbb
      Chris Wilson authored
      With the disappearance of NEWCLIENT, we no longer need to provide the
      priority boost on preemption in order to prevent repeated gazumping,
      and we can remove the dead code.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-5-chris@chris-wilson.co.uk
      4cc79cbb
    • Chris Wilson's avatar
      drm/i915: Downgrade NEWCLIENT to non-preemptive · 68fc728b
      Chris Wilson authored
      Commit 1413b2bc ("drm/i915: Trim NEWCLIENT boosting") had the
      intended consequence of not allowing a sequence of work that merely
      crossed into a new engine the privilege to be promoted to NEWCLIENT
      status. It also had the unintended consequence of actually making
      NEWCLIENT effective on heavily oversubscribed transcode machines and
      impacting upon their throughput.
      
      If we consider a client packet composed of (rcsA, rcsB, vcs) and 30 of
      those clients, using the NEWCLIENT boost that will be scheduled as
      
      	rcsA x 30, (rcsB, vcs) x 30
      
      whereas before it would have been
      
      	(rcsA, rcsB, vcs) x 30
      
      That is, with NEWCLIENT only boosting the first request of each client,
      we would execute all rcsA requests prior to running on the vcs engines,
      accruing a lot of dead time as compared to the previous case where the
      vcs engine would be started in parallel to processing the second client.
      
      The previous patch has the effect of delaying submission until it is
      required by a third party (either the user with an explicit wait, or by
      another client/engine). We reduce the NEWCLIENT bump to a mere WAIT,
      which has the effect of removing its preemptive grant and reducing it to
      the same level as any other user interaction -- that it will not be
      promoted above the interengine dependencies, and so preventing NEWCLIENTS
      from starving other engines. This is a large nerf to the rrul properties of
      the current NEWCLIENT, but it still does give prioritised submission to
      new requests from light workloads.
      
      References: b16c7651 ("drm/i915: Priority boost for new clients")
      Fixes: 1413b2bc ("drm/i915: Trim NEWCLIENT boosting") # customer impact
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
      Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-4-chris@chris-wilson.co.uk
      68fc728b
    • Chris Wilson's avatar
      drm/i915: Bump signaler priority on adding a waiter · 6e7eb7a8
      Chris Wilson authored
      The handling of the no-preemption priority level imposes the restriction
      that we need to maintain the implied ordering even though preemption is
      disabled. Otherwise we may end up with an AB-BA deadlock across multiple
      engines due to a real preemption event reordering the no-preemption
      WAITs. To resolve this issue we currently promote all requests to WAIT
      on unsubmission, however this interferes with the timeslicing
      requirement that we do not apply any implicit promotion that will defeat
      the round-robin timeslice list. (If we automatically promote the active
      request it will go back to the head of the queue and not the tail!)
      
      So we need implicit promotion to prevent reordering around semaphores
      where we are not allowed to preempt, and we must avoid implicit
      promotion on unsubmission. So instead of at unsubmit, if we apply that
      implicit promotion on adding the dependency, we avoid the semaphore
      deadlock and we also reduce the gains made by the promotion for user
      space waiting. Furthermore, by keeping the earlier dependencies at a
      higher level, we reduce the search space for timeslicing without
      altering runtime scheduling too badly (no dependencies at all will be
      assigned a higher priority for rrul).
      
      v2: Limit the bump to external edges (as originally intended) i.e.
      between contexts and out to the user.
      
      Testcase: igt/gem_concurrent_blit
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-3-chris@chris-wilson.co.uk
      6e7eb7a8
    • Chris Wilson's avatar
      drm/i915/hdcp: Use both bits for device_count · af461ff3
      Chris Wilson authored
      Smatch spotted:
      drivers/gpu/drm/i915//intel_hdcp.c:1406 hdcp2_authenticate_repeater_topology() warn: should this be a bitwise op?
      
      and indeed it looks suspect: we do need to use a bitwise or to
      combine the two register fields into one counter.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Ramalingam C <ramalingam.c@intel.com>
      Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190517102225.3069-3-chris@chris-wilson.co.uk
      af461ff3
    • Chris Wilson's avatar
      drm/i915/dp: Initialise locals for static analysis · 96ac0813
      Chris Wilson authored
      Just to squelch an smatch warning that doesn't see the with_() being
      taken unconditionally:
      drivers/gpu/drm/i915//intel_dp.c:230 intel_dp_get_fia_supported_lane_count() error: uninitialized symbol 'lane_info'.
      drivers/gpu/drm/i915//intel_dp.c:5338 intel_digital_port_connected() error: uninitialized symbol 'is_connected'.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Imre Deak <imre.deak@intel.com>
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190517102225.3069-2-chris@chris-wilson.co.uk
      96ac0813
    • Chris Wilson's avatar
      drm/i915: Truly bump ready tasks ahead of busywaits · 17db337f
      Chris Wilson authored
      In commit b7404c7e ("drm/i915: Bump ready tasks ahead of
      busywaits"), I tried cutting a corner in order to not install a signal
      for each of our dependencies, and only listened to requests on which we
      were intending to busywait. The compromise that was made was that
      instead of then being able to promote the request with a full
      NOSEMAPHORE like its non-busywaiting brethren, as we had not ensured we
      had cleared the semaphore chain, we settled for only using the NEWCLIENT
      boost. With an oversaturated system with multiple NEWCLIENTS in flight
      at any time, this was found to be an inadequate promotion and left us
      with a much poorer scheduling order than prior to using semaphores.
      
      The outcome of this patch is that all requests have NOSEMAPHORE
      priority when they have no dependencies and are ready to run and not
      busywait, restoring the pre-semaphore ordering on saturated systems.
      
      We can demonstrate the effect of poor scheduling order by oversaturating
      the system using gem_wsim on a system with multiple vcs engines
      (i.e running the same workloads across more clients than required for
      peak throughput, e.g. media_load_balance_17i7.wsim -c4 -b context):
      
      x v5.1 (normalized)
      + tip
      % fix
      +------------------------------------------------------------------------+
      |                                                                    x   |
      |                                                                    x   |
      |                                                                    x   |
      |                                                                    x   |
      |                                                                   %x   |
      |                                                                  %%x   |
      |                                                                  %%x   |
      |                                                                  %%x   |
      |                                                                  %%x   |
      |                                                                  %%x   |
      |                                                                  %%x   |
      |                                                                  %%x   |
      |                                                                  %%x   |
      |                                                                  %%x   |
      |                                                                  %%x   |
      |                                                                  %#x   |
      |                                                                  %#x   |
      |                                                                  %#x   |
      |                                                                  %#x   |
      |                                                                  %#x   |
      |         +                                                        %#xx  |
      |         +                                                        %#xx  |
      |         +                                                       %%#xx  |
      |         +                                                       %%#xx  |
      |         +                                                       %%#xx  |
      |         +                                                       %%#xx  |
      |         +                                                       %%##x  |
      |         +++                                                     %%##x  |
      |         +++                                                     %%##x  |
      |         +++                                                     %%##x  |
      |        ++++                                                     %%##x  |
      |        ++++                                                     %%##x  |
      |        ++++                                                     %%##xx |
      |        ++++                                                     %###xx |
      |        ++++                                                     %###xx |
      |        ++++                                                     %###xx |
      |        ++++                                                     %###xx |
      |        ++++ +                                                   %#O#xx |
      |        ++++ +                                                   %#O#xx |
      |        ++++++ +                                                 %#O#xx |
      |       ++++++++++                                                %OOOxxx|
      |       ++++++++++       +                                       %#OOO#xx|
      |     + ++++++++++++ ++ +++++    +                        ++    @@OOOO#xx|
      |                                                                   |A_| |
      ||__________M_______A____________________|                               |
      |                                                                 |A_|   |
      +------------------------------------------------------------------------+
          N           Min           Max        Median           Avg        Stddev
      x 120       0.99456       1.00628      0.999985     1.0001545  0.0024387139
      + 120      0.873021       1.00037      0.884134    0.90148752   0.039190862
      Difference at 99.5% confidence
      	-0.098667 +/- 0.0110762
      	-9.86517% +/- 1.10745%
      	(Student's t, pooled s = 0.0277657)
      % 120      0.990207       1.00165     0.9970265    0.99699748     0.0021024
      Difference at 99.5% confidence
      	-0.003157 +/- 0.000908245
      	-0.315651% +/- 0.0908105%
      	(Student's t, pooled s = 0.00227678)
      
      Fixes: b7404c7e ("drm/i915: Bump ready tasks ahead of busywaits")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
      Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-2-chris@chris-wilson.co.uk
      17db337f
    • Chris Wilson's avatar
      drm/i915: Mark semaphores as complete on unsubmit out if payload was started · dba5a7f3
      Chris Wilson authored
      Avoid charging us for the presumed busywait if the request was preempted
      after successfully using semaphores to reduce inter-engine latency.
      
      v2: Bump the priority to reflect the lack of semaphores now required.
      
      References: ca6e56f6 ("drm/i915: Disable semaphore busywaits on saturated systems")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-1-chris@chris-wilson.co.uk
      dba5a7f3
  5. 14 May, 2019 3 commits