- 06 Dec, 2022 4 commits
-
-
Chris Wilson authored
VT-d may cause overfetch of the scanout PTE, both before and after the vma (depending on the scanout orientation). bspec recommends that we provide a tile-row in either directions, and suggests using 168 PTE, warning that the accesses will wrap around the ends of the GGTT. Currently, we fill the entire GGTT with scratch pages when using VT-d to always ensure there are valid entries around every vma, including scanout. However, writing every PTE is slow as on recent devices we perform 8MiB of uncached writes, incurring an extra 100ms during resume. If instead we focus on only putting guard pages around scanout, we can avoid touching the whole GGTT. To avoid having to introduce extra nodes around each scanout vma, we adjust the scanout drm_mm_node to be smaller than the allocated space, and fixup the extra PTE during dma binding. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221130235805.221010-5-andi.shyti@linux.intel.com
-
Chris Wilson authored
Introduce the concept of padding the i915_vma with guard pages before and after. The major consequence is that all ordinary uses of i915_vma must use i915_vma_offset/i915_vma_size and not i915_vma.node.start/size directly, as the drm_mm_node will include the guard pages that surround our object. The biggest connundrum is how exactly to mix requesting a fixed address with guard pages, particularly through the existing uABI. The user does not know about guard pages, so such must be transparent to the user, and so the execobj.offset must be that of the object itself excluding the guard. So a PIN_OFFSET_FIXED must then be exclusive of the guard pages. The caveat is that some placements will be impossible with guard pages, as wrap arounds need to be avoided, and the vma itself will require a larger node. We must not report EINVAL but ENOSPC as these are unavailable locations within the GTT rather than conflicting user requirements. In the next patch, we start using guard pages for scanout objects. While these are limited to GGTT vma, on a few platforms these vma (or at least an alias of the vma) is shared with userspace, so we may leak the existence of such guards if we are not careful to ensure that the execobj.offset is transparent and excludes the guards. (On such platforms like ivb, without full-ppgtt, userspace has to use relocations so the presence of more untouchable regions within its GTT such be of no further issue.) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221201203912.346110-1-andi.shyti@linux.intel.com
-
Chris Wilson authored
We already wrap i915_vma.node.start for use with the GGTT, as there we can perform additional sanity checks that the node belongs to the GGTT and fits within the 32b registers. In the next couple of patches, we will introduce guard pages around the objects _inside_ the drm_mm_node allocation. That is we will offset the vma->pages so that the first page is at drm_mm_node.start + vma->guard (not 0 as is currently the case). All users must then not use i915_vma.node.start directly, but compute the guard offset, thus all users are converted to use a i915_vma_offset() wrapper. The notable exceptions are the selftests that are testing exact behaviour of i915_vma_pin/i915_vma_insert. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com> Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221130235805.221010-3-andi.shyti@linux.intel.com
-
Andi Shyti authored
The coming commit "drm/i915: Introduce guard pages to i915_vma" from Chris, was originally changing display_alignment to u32 from u64. The reason is that the display GGTT is and will be limited o 4GB. Put it in a separate patch and use "max(...)" instead of "max_t(64, ...)" when asigning the value. We can safely use max as we know beforehand that the comparison is between two u32 variables. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221130235805.221010-2-andi.shyti@linux.intel.com
-
- 05 Dec, 2022 3 commits
-
-
Daniele Ceraolo Spurio authored
Invalidating the GuC TLBs while GuC is not loaded does not have negative consequences, so if we're starting the driver with GuC enabled we can use the GGTT invalidation function from the get-go, instead of switching to it when we initialize the GuC objects. In MTL, this fixes and issue where we try to overwrite the invalidation function twice (once for each GuC), due to the GGTT being shared between the primary and media GTs Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Aravind Iddamsetty <aravind.iddamsetty@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221110175823.3867135-1-daniele.ceraolospurio@intel.com
-
Matt Roper authored
The TGL/RKL/DG1/ADL performance tuning guide suggests programming a literal value of 0x2FC0100F for this register. The register's hardware default value is 0x2FC0108F, so this translates to just clearing one bit. Take this opportunity to also clean up the register definition and re-write its existing bits/fields in the preferred notation. Bspec: 31870 Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221201222210.344152-1-matthew.d.roper@intel.com
-
Matt Roper authored
When determining whether the platform has a hardware-level steering semaphore (i.e., MTL and beyond), we need to use GRAPHICS_VER_FULL() to compare the full version rather than just the major version number returned by GRAPHICS_VER(). Reported-by: kernel test robot <lkp@intel.com> Fixes: 3100240b ("drm/i915/mtl: Add hardware-level lock for steering") Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221202223528.714491-1-matthew.d.roper@intel.com
-
- 02 Dec, 2022 1 commit
-
-
Matt Roper authored
Starting with MTL, the driver needs to not only protect the steering control register from simultaneous software accesses, but also protect against races with hardware/firmware agents. The hardware provides a dedicated locking mechanism to support this via the MTL_STEER_SEMAPHORE register. Reading the register acts as a 'trylock' operation; the read will return 0x1 if the lock is acquired or 0x0 if something else is already holding the lock; once acquired, writing 0x1 to the register will release the lock. We'll continue to grab the software lock as well, just so lockdep can track our locking; assuming the hardware lock is behaving properly, there should never be any contention on the software lock in this case. v2: - Extend hardware semaphore timeout and add a taint for CI if it ever happens (this would imply misbehaving hardware/firmware). (Mika) - Add "MTL_" prefix to new steering semaphore register. (Mika) Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221128233014.4000136-5-matthew.d.roper@intel.com
-
- 01 Dec, 2022 2 commits
-
-
Matt Roper authored
PPAT setup involves a series of multicast writes. This can be optimized slightly be acquiring forcewake and the steering lock just once for the entire sequence. v2: - We should use FW_REG_WRITE instead of FW_REG_READ. (Bala) Suggested-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221130155852.19601-1-matthew.d.roper@intel.com
-
Wayne Boyer authored
As per the performance tuning guide, set the HOSTCACHEEN bit to implement the recommended caching policy on PVC. Signed-off-by: Wayne Boyer <wayne.boyer@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221130170723.2460014-1-wayne.boyer@intel.com
-
- 30 Nov, 2022 6 commits
-
-
John Harrison authored
The GuC firmware includes an extra version number to specify the submission API level. So use that rather than the main firmware version number for submission related checks. Also, while it is guaranteed that GuC version number components are only 8-bits in size, other firmwares do not have that restriction. So stop making assumptions about them generically fitting in a u16 individually, or in a u32 as a combined 8.8.8. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221129232031.3401386-4-John.C.Harrison@Intel.com
-
John Harrison authored
As a precursor to a coming change (for adding a GuC submission API version), abstract the UC version number into its own private structure separate to the firmware filename. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221129232031.3401386-3-John.C.Harrison@Intel.com
-
John Harrison authored
The way delimiters (underscores and dots) were added to the UC filenames was different for different types of delimiter. Rationalise them to all be done the same way - implicitly in the concatenation macro rather than explicitly in the file name prefix. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221129232031.3401386-2-John.C.Harrison@Intel.com
-
Matt Roper authored
We've been overloading uncore->lock to protect access to the MCR steering register. That's not really what uncore->lock is intended for, and it would be better if we didn't need to hold such a high-traffic spinlock for the whole sequence of (apply steering, access MCR register, restore steering). Let's create a dedicated MCR lock to protect the steering control register over this critical section and stop relying on the high-traffic uncore->lock. For now the new lock is a software lock. However some platforms (MTL and beyond) have a hardware-provided locking mechanism that can be used to serialize not only software accesses, but also hardware/firmware accesses as well; support for that hardware level lock will be added in a future patch. v2: - Use irqsave/irqrestore spinlock calls; platforms using execlist submission rather than GuC submission can perform MCR accesses in interrupt context because reset -> errordump happens in a tasklet. Cc: Chris Wilson <chris.p.wilson@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221128233014.4000136-4-matthew.d.roper@intel.com
-
Matt Roper authored
Passing the GT rather than uncore to the lowest level MCR read and write functions will make it easier to introduce dedicated MCR locking in a following patch. Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221128233014.4000136-3-matthew.d.roper@intel.com
-
Matt Roper authored
The kerneldoc function name was not updated when this function was converted to a non-fw form. Fixes: 192bb40f ("drm/i915/gt: Manage uncore->lock while waiting on MCR register") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221128233014.4000136-2-matthew.d.roper@intel.com
-
- 29 Nov, 2022 2 commits
-
-
Daniele Ceraolo Spurio authored
The fence is only tracking if the HuC load is in progress or not and doesn't distinguish between already loaded, not supported or disabled, so we can always initialize it to completed, no matter the actual support. We already do that for most platforms, but we skip it on GTs that lack VCS engines (e.g. MTL root GT), so fix that. Note that the cleanup is already unconditional. While at it, move the init/fini to helper functions. Fixes: 02224691 ("drm/i915/huc: fix leak of debug object in huc load fence on driver unload") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221123235417.1475709-1-daniele.ceraolospurio@intel.com
-
Aravind Iddamsetty authored
On XE_LPM+ platforms the media engines are carved out into a separate GT but have a common GGTMMADR address range which essentially makes the GGTT address space to be shared between media and render GT. As a result any updates in GGTT shall invalidate TLB of GTs sharing it and similarly any operation on GGTT requiring an action on a GT will have to involve all GTs sharing it. setup_private_pat was being done on a per GGTT based as that doesn't touch any GGTT structures moved it to per GT based. BSPEC: 63834 v2: 1. Add details to commit msg 2. includes fix for failure to add item to ggtt->gt_list, as suggested by Lucas 3. as ggtt_flush() is used only for ggtt drop i915_is_ggtt check within it. 4. setup_private_pat moved out of intel_gt_tiles_init v3: 1. Move out for_each_gt from i915_driver.c (Jani Nikula) v4: drop using RCU primitives on ggtt->gt_list as it is not an RCU list (Matt Roper) Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221122070126.4813-1-aravind.iddamsetty@intel.com
-
- 28 Nov, 2022 2 commits
-
-
Matt Atwood authored
Wa_18019271663 applies to all DG2 steppings and skus. Bspec: 66622 Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221123183648.407058-2-matthew.s.atwood@intel.com
-
Matt Atwood authored
Wa_18018764978 applies to specific steppings of DG2 (G10 C0+, G11 and G12 A0+). Clean up style in function at the same time. Bspec: 66622 Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221123183648.407058-1-matthew.s.atwood@intel.com
-
- 24 Nov, 2022 2 commits
-
-
Janusz Krzysztofik authored
Users of intel_gt_retire_requests_timeout() expect 0 return value on success. However, we have no protection from passing back 0 potentially returned by a call to dma_fence_wait_timeout() when it succedes right after its timeout has expired. Replace 0 with -ETIME before potentially using the timeout value as return code, so -ETIME is returned if there are still some requests not retired after timeout, 0 otherwise. v3: Use conditional expression, more compact but also better reflecting intention standing behind the change. v2: Move the added lines down so flush_submission() is not affected. Fixes: f33a8a51 ("drm/i915: Merge wait_for_timelines with retire_request") Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Cc: stable@vger.kernel.org # v5.5+ Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221121145655.75141-3-janusz.krzysztofik@linux.intel.com
-
Janusz Krzysztofik authored
Commit b97060a9 ("drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC") extended the API of intel_gt_retire_requests_timeout() with an extra argument 'remaining_timeout', intended for passing back unconsumed portion of requested timeout when 0 (success) is returned. However, when request retirement happens to succeed despite an error returned by a call to dma_fence_wait_timeout(), that error code (a negative value) is passed back instead of remaining time. If we then pass that negative value forward as requested timeout to intel_uc_wait_for_idle(), an explicit BUG will be triggered. If request retirement succeeds but an error code is passed back via remaininig_timeout, we may have no clue on how much of the initial timeout might have been left for spending it on waiting for GuC to become idle. OTOH, since all pending requests have been successfully retired, that error code has been already ignored by intel_gt_retire_requests_timeout(), then we shouldn't fail. Assume no more time has been left on error and pass 0 timeout value to intel_uc_wait_for_idle() to give it a chance to return success if GuC is already idle. v3: Don't fail on any error passed back via remaining_timeout. v2: Fix the issue on the caller side, not the provider. Fixes: b97060a9 ("drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC") Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Cc: stable@vger.kernel.org # v5.15+ Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221121145655.75141-2-janusz.krzysztofik@linux.intel.com
-
- 23 Nov, 2022 3 commits
-
-
John Harrison authored
It was noticed that the table order verification step was only being run once rather than once per firmware type. Fix that. Note that the long term plan is to convert this code to be a mock selftest. It is already only compiled in when selftests are enabled. And the work involved in the conversion was estimated to be non-trivial. So that conversion is currently low on the priority list. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221122233328.854217-1-John.C.Harrison@Intel.com
-
Daniele Ceraolo Spurio authored
The fence is always initialized in huc_init_early, but the cleanup in huc_fini is only being run if HuC is enabled. This causes a leaking of the debug object when HuC is disabled/not supported, which can in turn trigger a warning if we try to register a new debug offset at the same address on driver reload. To fix the issue, make sure to always run the cleanup code. Reported-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reported-by: Brian Norris <briannorris@chromium.org> Fixes: 27536e03 ("drm/i915/huc: track delayed HuC load with a fence") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Brian Norris <briannorris@chromium.org> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Tested-by: Brian Norris <briannorris@chromium.org> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221111005651.4160369-1-daniele.ceraolospurio@intel.com
-
Jani Nikula authored
The default_lists array should be in rodata. Fixes: dce2bd54 ("drm/i915/guc: Add Gen9 registers for GuC error state capture.") Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221122141616.3469214-1-jani.nikula@intel.com
-
- 22 Nov, 2022 1 commit
-
-
José Roberto de Souza authored
For multi-tile setups the GSC operational only on the tile 0. Skip GSC auxiliary device creation for all other tiles in GSC device init code. Initialize basic GSC fields and use the same path as HECI1 (HECI_PXP) device disable. Cc: Tomas Winkler <tomas.winkler@intel.com> Cc: Vitaly Lubart <vitaly.lubart@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Acked-by: Tomas Winkler <tomas.winkler@intel.com> Reviewed-by: Tomas Winkler <tomas.winkler@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221121092449.328674-1-alexander.usyskin@intel.com
-
- 21 Nov, 2022 4 commits
-
-
Umesh Nerlige Ramappa authored
Engine busyness samples around a 10ms period is failing with busyness ranging approx. from 87% to 115% as shown below. The expected range is +/- 5% of the sample period. Fail 10% of the time. rcs0: reported 11716042ns [91%] busyness while spinning [for 12805719ns] When determining busyness of active engine, the GuC based engine busyness implementation relies on a 64 bit timestamp register read. The latency incurred by this register read causes the failure. On DG1, when the test fails, the observed latencies range from 900us - 1.5ms. Optimizing the 2x32 read by acquiring the lock and forcewake prior to all reg reads reduces the rate of failure to around 2%, but does not eliminate it. In order to make the selftest more robust and always account for such latencies, increase the sample period to 100 ms. This eliminates the issue as seen in a 1000 runs. v2: (Ashutosh) - Add error to commit msg - Include gitlab bug - Update commit for inclusion of 2x32 optimized read Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/4418Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221110171913.670286-3-umesh.nerlige.ramappa@intel.com
-
Umesh Nerlige Ramappa authored
PMU reads the GT timestamp as a 2x32 mmio read and since upper and lower 32 bit registers are read in a loop, there is a latency involved between getting the GT timestamp and the CPU timestamp. As part of the resolution, refactor intel_uncore_read64_2x32 to acquire forcewake and uncore lock prior to reading upper and lower regs. Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221110171913.670286-2-umesh.nerlige.ramappa@intel.com
-
Vinay Belgaumkar authored
By defaut idle messaging is disabled for GSC CS so to unblock RC6 entry on media tile idle messaging need to be enabled. v2: - Fix review comments (Vinay) - Set GSC idle hysteresis as per spec (Badal) v3: - Fix review comments (Rodrigo) Bspec: 71496 Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221118183354.1047829-1-badal.nilawar@intel.com
-
Tvrtko Ursulin authored
In 36537275 ("drm/i915: Simplify internal helper function signature") I broke the old platforms by not noticing engine workaround init does not initialize the list on old platforms. Fix it by always initializing which already does the right thing by mostly not doing anything if there aren't any workarounds on the list. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: 36537275 ("drm/i915: Simplify internal helper function signature") Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221118115249.2683946-1-tvrtko.ursulin@linux.intel.com
-
- 18 Nov, 2022 1 commit
-
-
Matt Roper authored
The GT MCR code currently relies on uncore->lock to avoid race conditions on the steering control register during MCR operations. The *_fw() versions of MCR operations expect the caller to already hold uncore->lock, while the non-fw variants manage the lock internally. However the sole callsite of intel_gt_mcr_wait_for_reg_fw() does not currently obtain the forcewake lock, allowing a potential race condition (and triggering an assertion on lockdep builds). Furthermore, since 'wait for register value' requests may not return immediately, it is undesirable to hold a fundamental lock like uncore->lock for the entire wait and block all other MMIO for the duration; rather the lock is only needed around the MCR read operations and can be released during the delays. Convert intel_gt_mcr_wait_for_reg_fw() to a non-fw variant that will manage uncore->lock internally. This does have the side effect of causing an unnecessary lookup in the forcewake table on each read operation, but since the caller is still holding the relevant forcewake domain, this will ultimately just incremenent the reference count and won't actually cause any additional MMIO traffic. In the future we plan to switch to a dedicated MCR lock to protect the steering critical section rather than using the overloaded and high-traffic uncore->lock; on MTL and beyond the new lock can be implemented on top of the hardware-provided synchonization mechanism for steering. Fixes: 3068bec8 ("drm/i915/gt: Add intel_gt_mcr_wait_for_reg_fw()") Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221117173358.1980230-1-matthew.d.roper@intel.com
-
- 17 Nov, 2022 5 commits
-
-
Badal Nilawar authored
Add support for C6 residency and C state type for MTL SAMedia. Also add mtl_drpc. v2: Fixed review comments (Ashutosh) v3: Sort registers and fix whitespace errors in intel_gt_regs.h (Matt R) Remove MTL_CC_SHIFT (Ashutosh) Adapt to RC6 residency register code refactor (Jani N) v4: Move MTL branch to top in drpc_show v5: Use FORCEWAKE_MT identical to gen6_drpc (Ashutosh) v6: Add MISSING_CASE for gt_core_status switch statement (Rodrigo) Change state name for MTL_CC0 to C0 (from "on") (Rodrigo) v7: Change state name for MTL_CC0 to RC0 (Rodrigo) Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221114123348.3474216-6-badal.nilawar@intel.com
-
Ashutosh Dixit authored
Previously RC6 residency functions directly accepted RC6 residency register MMIO offsets (there are four RC6 residency registers). This worked but required an assumption on the residency register layout so was not future proof. Therefore change RC6 residency functions to accept RC6 residency types instead of register MMIO offsets. The knowledge of register offsets as well as ID to offset mapping is now maintained solely in intel_rc6 and can be tailored for different platforms and different register layouts as need arises. v2: Address review comments by Jani N - Change residency functions to accept RC6 residency types instead of register ID's - s/intel_rc6_print_rc5_res/intel_rc6_print_residency/ - Remove "const enum" in function arguments - Naming: intel_rc6_* for enum - Use INTEL_RC6_RES_MAX and other minor changes v3: Don't include intel_rc6_types.h in intel_rc6.h (Jani) Suggested-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Suggested-by: Jani Nikula <jani.nikula@linux.intel.com> Reported-by: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221114123348.3474216-5-badal.nilawar@intel.com
-
Badal Nilawar authored
Update CAGF functions for MTL to get actual resolved frequency of 3D and SAMedia. v2: Update MTL_MIRROR_TARGET_WP1 position/formatting (MattR) Move MTL branches in cagf functions to top (MattR) Fix commit message (Andi) v3: Added comment about registers not needing forcewake for Gen12+ and returning 0 freq in RC6 v4: Use REG_FIELD_GET and uncore (Rodrigo) Bspec: 66300 Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221114123348.3474216-4-badal.nilawar@intel.com
-
Don Hiatt authored
On GEN12+ use GEN12_RPSTAT register to get actual resolved GT freq. GEN12_RPSTAT does not require a forcewake and will return 0 freq if GT is in RC6. v2: - Fixed review comments(Ashutosh) - Added function intel_rps_read_rpstat_fw to read RPSTAT without forcewake, required especially for GEN6_RPSTAT1 (Ashutosh, Tvrtko) v3: - Updated commit title and message for more clarity (Ashutosh) - Replaced intel_rps_read_rpstat with direct read to GEN12_RPSTAT1 in read_cagf (Ashutosh) v4: Remove GEN12_CAGF_SHIFT and use REG_FIELD_GET (Rodrigo) Cc: Don Hiatt <donhiatt@gmail.com> Cc: Andi Shyti <andi.shyti@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221114123348.3474216-3-badal.nilawar@intel.com
-
Ashutosh Dixit authored
Instead of masks/shifts settle on REG_FIELD_GET as the standard way to extract reg fields. This allows future patches touching this code to also consistently use REG_FIELD_GET and friends. Suggested-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221114123348.3474216-2-badal.nilawar@intel.com
-
- 16 Nov, 2022 4 commits
-
-
Daniele Ceraolo Spurio authored
For the GSC engine we only want to capture the instance regs. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221111001832.4144910-1-daniele.ceraolospurio@intel.com
-
Alan Previn authored
Previously, we only used PXP FW interface version-42 structures for PXP arbitration session on ADL/TGL products and version-43 for HuC authentication on DG2. That worked fine despite not differentiating such versioning of the PXP firmware interaction structures. This was okay back then because the only commands used via version 42 was not used via version 43 and vice versa. With MTL, we'll need both these versions side by side for the same commands (PXP-session) with the older platform feature support. That said, let's create separate files to define the structures and definitions for both version-42 and 43 of PXP FW interfaces. Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221108045628.4187260-2-alan.previn.teres.alexis@intel.com
-
Matthew Auld authored
In i915_gem_madvise_ioctl() we immediately purge the object is not currently used, like when the mm.pages are NULL. With shmem the pages might still be hanging around or are perhaps swapped out. Similarly with ttm we might still have the pages hanging around on the ttm resource, like with lmem or shmem, but here we need to be extra careful since async unbinds are possible as well as in-progress kernel moves. In i915_ttm_purge() we expect the pipeline-gutting to nuke the ttm resource for us, however if it's busy the memory is only moved to a ghost object, which then leads to broken behaviour when for example clearing the i915_tt->filp, since the actual ttm_tt is still alive and populated, even though it's been moved to the ghost object. When we later destroy the ghost object we hit the following, since the filp is now NULL: [ +0.006982] #PF: supervisor read access in kernel mode [ +0.005149] #PF: error_code(0x0000) - not-present page [ +0.005147] PGD 11631d067 P4D 11631d067 PUD 115972067 PMD 0 [ +0.005676] Oops: 0000 [#1] PREEMPT SMP NOPTI [ +0.012962] Workqueue: events ttm_device_delayed_workqueue [ttm] [ +0.006022] RIP: 0010:i915_ttm_tt_unpopulate+0x3a/0x70 [i915] [ +0.005879] Code: 89 fb 48 85 f6 74 11 8b 55 4c 48 8b 7d 30 45 31 c0 31 c9 e8 18 6a e5 e0 80 7d 60 00 74 20 48 8b 45 68 8b 55 08 4c 89 e7 5b 5d <48> 8b 40 20 83 e2 01 41 5c 89 d1 48 8b 70 30 e9 42 b2 ff ff 4c 89 [ +0.018782] RSP: 0000:ffffc9000bf6fd70 EFLAGS: 00010202 [ +0.005244] RAX: 0000000000000000 RBX: ffff8883e12ae380 RCX: 0000000000000000 [ +0.007150] RDX: 000000008000000e RSI: ffffffff823559b4 RDI: ffff8883e12ae3c0 [ +0.007142] RBP: ffff888103b65d48 R08: 0000000000000001 R09: 0000000000000001 [ +0.007144] R10: 0000000000000001 R11: ffff88829c2c8040 R12: ffff8883e12ae3c0 [ +0.007148] R13: 0000000000000001 R14: ffff888115184140 R15: ffff888115184248 [ +0.007154] FS: 0000000000000000(0000) GS:ffff88844db00000(0000) knlGS:0000000000000000 [ +0.008108] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.005763] CR2: 0000000000000020 CR3: 000000013fdb4004 CR4: 00000000003706e0 [ +0.007152] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ +0.007145] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ +0.007154] Call Trace: [ +0.002459] <TASK> [ +0.002126] ttm_tt_unpopulate.part.0+0x17/0x70 [ttm] [ +0.005068] ttm_bo_tt_destroy+0x1c/0x50 [ttm] [ +0.004464] ttm_bo_cleanup_memtype_use+0x25/0x40 [ttm] [ +0.005244] ttm_bo_cleanup_refs+0x90/0x2c0 [ttm] [ +0.004721] ttm_bo_delayed_delete+0x235/0x250 [ttm] [ +0.004981] ttm_device_delayed_workqueue+0x13/0x40 [ttm] [ +0.005422] process_one_work+0x248/0x560 [ +0.004028] worker_thread+0x4b/0x390 [ +0.003682] ? process_one_work+0x560/0x560 [ +0.004199] kthread+0xeb/0x120 [ +0.003163] ? kthread_complete_and_exit+0x20/0x20 [ +0.004815] ret_from_fork+0x1f/0x30 v2: - Just use ttm_bo_wait() directly (Niranjana) - Add testcase reference Testcase: igt@gem_madvise@dontneed-evict-race Fixes: 213d5092 ("drm/i915/ttm: Introduce a TTM i915 gem object backend") Reported-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Cc: <stable@vger.kernel.org> # v5.15+ Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Acked-by: Nirmoy Das <Nirmoy.Das@intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221115104620.120432-1-matthew.auld@intel.com
-
Nirmoy Das authored
vm_fault_ttm() should not expect ttm ghost obj so remove that check. Suggested-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221024144558.27747-1-nirmoy.das@intel.com
-