- 04 Jan, 2018 3 commits
-
-
https://git.pengutronix.de/git/lst/linuxDave Airlie authored
Highlights this time: 1. Fix for a nasty Kconfig dependency chain issue from Philipp. 2. Occlusion query buffer address added to the cmdstream validator by Christian. 3. Fixes and cleanups to the job handling from me. This allows us to turn on the GPU performance profiling added in the last cycle. It is also prep work for hooking in the DRM GPU scheduler, which I hope to land for the next cycle. * 'etnaviv/next' of https://git.pengutronix.de/git/lst/linux: (32 commits) drm/etnaviv: use memset32 to init pagetable drm/etnaviv: move submit free out of critical section drm/etnaviv: re-enable perfmon support drm/etnaviv: couple runtime PM management to submit object lifetime drm/etnaviv: move GPU active handling to bo pin/unpin drm/etnaviv: move cmdbuf into submit object drm/etnaviv: use submit exec_state for perfmon sampling drm/etnaviv: move exec_state to submit object drm/etnaviv: move PMRs to submit object drm/etnaviv: refcount the submit object drm/etnaviv: move ww_acquire_ctx out of submit object drm/etnaviv: move object unpinning to submit cleanup drm/etnaviv: attach in fence to submit and move fence wait to fence_sync drm/etnaviv: rename submit fence to out_fence drm/etnaviv: move object fence attachment to gem_submit path drm/etnaviv: simplify submit_create drm/etnaviv: add lockdep annotations to buffer manipulation functions drm/etnaviv: hold GPU lock while inserting END command drm/etnaviv: move workqueue to be per GPU drm/etnaviv: remove switch_context member from etnaviv_gpu ...
-
Dave Airlie authored
Merge tag 'exynos-drm-next-for-v4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-next Remove lagacy IPP driver - This driver isn't used anymore so remove it. Marek is preparing new one which includes completely rewritten API so this driver will be replaced with the new version[1] later. And cleanups. [1] https://patches.linaro.org/cover/118386/ * tag 'exynos-drm-next-for-v4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos: drm/exynos: ipp: Remove Exynos DRM IPP subsystem drm/exynos/decon: Add include guard to the Exynos7 header drm/exynos/decon: Move headers from global to local place drm/exynos: decon5433: Remove unnecessary platform_get_resource() error check
-
git://people.freedesktop.org/~gabbayo/linuxDave Airlie authored
- Add CWSR (compute wave save restore) support for GFX8 (Carrizo) - Fix SDMA user-mode queues support for GFX7 (Kaveri) - Add SDMA user-mode queues support for GFX8 (Carrizo) - Allow HWS (hardware scheduling) to schedule multiple processes concurrently - Add debugfs support - Simplify process locking and lock dependencies - Refactoring topology code to prepare for dGPU support + fixes to that code - Add option to generate dummy/virtual CRAT table when its missing or deformed - Recognize CPUs other then APUs as compute entities - Various clean ups and bug fixes I have not yet sent the dGPU topology code because it depends on a patch for the PCI subsystem that adds PCIe atomics support. Once that patch is upstreamed we can continue with the rest of the dGPU code. * tag 'drm-amdkfd-next-2017-12-24' of git://people.freedesktop.org/~gabbayo/linux: (53 commits) drm/amdgpu: Add support for reporting VRAM usage drm/amdkfd: Ignore ACPI CRAT for non-APU systems drm/amdkfd: Module option to disable CRAT table drm/amdkfd: Add AQL Queue Memory flag on topology drm/amdkfd: Fixup incorrect info in the CZ CRAT table drm/amdkfd: Add perf counters to topology drm/amdkfd: Add topology support for dGPUs drm/amdkfd: Add topology support for CPUs drm/amdkfd: Fix sibling_map[] size drm/amdkfd: Simplify counting of memory banks drm/amdkfd: Turn verbose topology messages into pr_debug drm/amdkfd: sync IOLINK defines to thunk spec drm/amdkfd: Support enumerating non-GPU devices drm/amdkfd: Decouple CRAT parsing from device list update drm/amdkfd: Reorganize CRAT fetching from ACPI drm/amdkfd: Group up CRAT related functions drm/amdkfd: Fix memory leaks in kfd topology drm/amdkfd: Topology: Fix location_id drm/amdkfd: Update number of compute unit from KGD drm/amd: Remove get_vmem_size from KGD-KFD interface ...
-
- 02 Jan, 2018 29 commits
-
-
Lucas Stach authored
Now that memset32 is available, the open-coded pagetable initialization loop can be replaced. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
Lucas Stach authored
There is no need to hold the GPU lock while freeing the submit object. Only move the retired submits from the GPU active list to a temporary retire list under the GPU lock. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
Lucas Stach authored
Now that the PMR lifetime issues are solved we can safely re-enable performance counter profiling support. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
Lucas Stach authored
As long as there is an active submit, we want the GPU to stay awake. This is slightly complicated by the fact that we really want to wake the GPU at the last possible moment to achieve maximum power savings. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
Lucas Stach authored
The active count is used to check if the BO is idle, where idle is defined as not active on the GPU and all VM mappings and reference counts dropped to the initial state. As the idling of the mappings and references now only happens in the submit cleanup, the active state handling must be moved to the same location in order to keep the userspace semantics. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
Lucas Stach authored
Less dynamic allocations and slims down the cmdbuf object to only the required information, as everything else is already available in the submit object. This also simplifies buffer and mappings lifetime management, as they are now exlusively attached to the submit object and not additionally to the cmdbuf. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
Lucas Stach authored
The GPU exec state may have changed at the time when the perfmon sampling is done, as it reflects the state of the last submission, not the current GPU execution state. So for proper sampling we must use the submit exec_state. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
Lucas Stach authored
We'll need this in some places where only the submit is available. Also this is a first step at slimming down the cmdbuf object. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
Lucas Stach authored
To make them available to the event worker even after the actual command stream execution has finished. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
Lucas Stach authored
The submit object lifetime will get extended to the actual GPU execution. As multiple users will depend on this, add a kref to properly control destruction of the object. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
-
Lucas Stach authored
The acquire_ctx is special in that it needs to be released from the same thread as has been used to initialize it. This collides with the intention to extend the submit lifetime beyond the gem_submit function with potentially other threads doing the final cleanup. Move the ww_acquire_ctx to the function local stack as suggested in the documentation. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
-
Lucas Stach authored
This is safe to call in all paths, as the BO_PINNED flag tells us if the BO needs unpinning. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
-
Lucas Stach authored
Simplifies the cleanup path and moves fence waiting to a central location. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
-
Lucas Stach authored
This is the fence passed out on a sucessful GPU submit. Make the name more clear. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
Lucas Stach authored
The object fencing has nothing to do with the actual GPU buffer submit, so move it to the gem submit path to have a cleaner split. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
-
Lucas Stach authored
Use kzalloc so other code doesn't need to worry about uninitialized members. Drop the non-standard GFP flags, as we really don't want to fail the submit when under slight memory pressure. Remove one level of indentation by using an early return if the allocation failed. Also remove the unused drm device member. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
-
Lucas Stach authored
When manipulating the kernel command buffer the GPU mutex must be held, as otherwise different callers might try to replace the same part of the buffer, wreacking havok in the GPU execution. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
-
Lucas Stach authored
Inserting the END command when suspending the GPU is changing the command buffer state, which requires the GPU to be held. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
-
Lucas Stach authored
While the etnaviv workqueue needs to be ordered, as we rely on work items being executed in queuing order, this is only true for a single GPU. Having a shared workqueue for all GPUs in the system limits concurrency artificially. Getting each GPU its own ordered workqueue still meets our ordering expectations and enables retire workers to run concurrently. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
-
Lucas Stach authored
There is no need to store this in the gpu struct. MMU flushes are triggered correctly in reaction to MMU maps and unmaps, independent of the current ctx. Any required pipe switches can be infered from the current and the desired GPU exec state. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
-
Lucas Stach authored
There is no need to synchronize with oustanding retire jobs if the object has gone idle. Retire jobs only ever change the object state from active to idle, not the other way around. The IOVA put race is uncritical, as the GEM_WAIT ioctl itself is holding a reference to the GEM object, so the retire worker will not pull the object into the CPU domain, which is the thing we are trying to guard against with etnaviv_gpu_wait_obj_inactive. The ordering of the various counts and waits may change a bit, but the userspace visible behavior at the bounds of the syscall are unchanged. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
-
Lucas Stach authored
Flush and prefetch are properly handled in the buffer code, data endianess would need much wider changes than adding something to this single function. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
-
Lucas Stach authored
Now that the userptr BO handling doesn't rely on the userspace restarting the submit after object population, there is no need to special case the -EAGAIN return value anymore. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
-
Lucas Stach authored
All code paths which populate userptr BOs are fine with the get_pages function taking the mmap_sem lock. This allows to get rid of the pretty involved architecture with a worker being scheduled if the mmap_sem needs to be taken, but instead call GUP directly and allow it to take the lock if necessary. This simplifies the code a lot and removes the possibility of this function returning -EAGAIN, which complicates object population handling at the callers. A notable change in behavior is that we don't allow a process to populate objects with user pages from a foreign MM anymore. This would have been an invalid use before, as it breaks the assumptions made in the etnaviv kernel driver to enfore cache coherence. We now disallow this by rejecting the request to populate those objects. Well behaving userspace is unaffected by this change. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
Lucas Stach authored
This function never fails, as it does nothing more than adding the GEM object to the global device list. Making this explicit through the void return type allows to drop some unnecessary error handling. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
-
Lucas Stach authored
This function has only one caller and it isn't expected that there will be any more in the future. Folding this function into the caller is helping the readability. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
-
Lucas Stach authored
The current userptr page population will defer work to a work item if needed to avoid ever taking the mmap_sem in the direct call path. With the more fine-grained locking in etnaviv this isn't needed anymore, so a future commit will simplify this code. Add a lockdep annotation to validate the assumption that the mmap_sem can be taken in the direct call path. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
-
Lucas Stach authored
Userptr, prime and shmem buffer objects have different lock ordering requirements. This is mostly due to the fact that we don't allow to mmap userptr buffers, so we won't ever end up in our fault handler for those, so some of the code paths are never called with the mmap_sem held. To avoid lockdep false positives, split them up into different lock classes. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
-
Lucas Stach authored
If the FE is restarted before the sync point event is cleared, the GPU might trigger a completion IRQ for the next sync point, corrupting the state of the currently running worker. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
-
- 01 Jan, 2018 4 commits
-
-
Marek Szyprowski authored
Exynos DRM IPP subsystem is in fact non-functional and frankly speaking dead-code. This patch clearly marks that Exynos DRM IPP subsystem is broken and never really functional. It will be replaced by a completely rewritten API. Exynos DRM IPP user-space API can be obsoleted for the following reasons: 1. Exynos DRM IPP user-space API can be optional in Exynos DRM, so userspace should not rely that it is always available and should have a software fallback in case it is not there. 2. The only mode which was initially semi-working was memory-to-memory image processing. The remaining modes (LCD-"writeback" and "output") were never operational due to missing code (both in mainline and even vendor kernels). 3. Exynos DRM IPP mainline user-space API compatibility for memory-to-memory got broken very early by commit 083500ba ("drm: remove DRM_FORMAT_NV12MT", which removed the support for tiled formats, the main feature which made this API somehow useful on Exynos platforms (video codec that time produced only tiled frames, to implement xvideo or any other video overlay, one has to de-tile them for proper display). 4. Broken drivers. Especially once support for IOMMU has been added, it revealed that drivers don't configure DMA operations properly and in many cases operate outside the provided buffers trashing memory around. 5. Need for external patches. Although IPP user-space API has been used in some vendor kernels, but in such cases there were additional patches applied (like reverting mentioned 083500ba patch) what means that those userspace apps which might use it, still won't work with the mainline kernel version. We don't have time machines, so we cannot change it, but Exynos DRM IPP extension should never have been merged to mainline in that form. Exynos IPP subsystem and user-space API will be rewritten, so remove current IPP core code and mark existing drivers as BROKEN. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Daniel Stone <daniels@collabora.com> Acked-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Inki Dae <inki.dae@samsung.com>
-
Krzysztof Kozlowski authored
Although header is included only once but still having an include guard is a good practice. To avoid confusion, add SoC prefix to existing Exynos5433 header include guard. Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Inki Dae <inki.dae@samsung.com>
-
Krzysztof Kozlowski authored
The DECON headers contain only defines for registers. There are no other drivers using them so this should be put locally to the Exynos DRM driver. Keeping headers local helps managing the code. Suggested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Inki Dae <inki.dae@samsung.com>
-
Fabio Estevam authored
devm_ioremap_resource() already checks if the resource is NULL, so remove the unnecessary platform_get_resource() error check. Cc: Inki Dae <inki.dae@samsung.com> Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com> Reviewed-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
-
- 27 Dec, 2017 1 commit
-
-
git://anongit.freedesktop.org/drm/drm-intelDave Airlie authored
- Allow internal page allocation to fail (Chris) - More improvements on logs, dumps, and trace (Chris, Michal) - Coffee Lake important fix for stolen memory (Lucas) - Continue to make GPU reset more robust as well improving selftest coverage for it (Chris) - Unifying debugfs return codes (Michal) - Using existing helper for testing obj pages (Matthew) - Organize and improve gem_request tracepoints (Lionel) - Protect DDI port to DPLL map from theoretical race (Rodrigo) - ... and consequently fixing the indentation on this DDI clk selection function (Chris) - ... and consequently properly serializing non-blocking modesets (Ville) - Add support for horizontal plane flipping on Cannonlake (Joonas) - Two Cannonlake Workarounds for better stability (Rafael) - Fix mess around PSR registers (DK) - More Coffee Lake PCI IDs (Rodrigo) - Remove CSS modifiers on pipe C of Geminilake (Krisman) - Disable all planes for load detection (Ville) - Reorg on i915 display headers (Michal) - Avoid enabling movntdqa optimization on hypervisor guest (Changbin) GVT: - more mmio switch optimization (Weinan) - cleanup i915_reg_t vs. offset usage (Zhenyu) - move write protect handler out of mmio handler (Zhenyu) * tag 'drm-intel-next-2017-12-22' of git://anongit.freedesktop.org/drm/drm-intel: (55 commits) drm/i915: Update DRIVER_DATE to 20171222 drm/i915: Show HWSP in intel_engine_dump() drm/i915: Assert that the request is on the execution queue before being removed drm/i915/execlists: Show preemption progress in GEM_TRACE drm/i915: Put all non-blocking modesets onto an ordered wq drm/i915: Disable GMBUS clock gating around GMBUS transfers on gen9+ drm/i915: Clean up the PNV bit banging vs. GMBUS clock gating w/a drm/i915: No need to power up PG2 for GMBUS on BXT drm/i915: Disable DC states around GMBUS on GLK drm/i915: Do not enable movntdqa optimization in hypervisor guest drm/i915: Dump device info at once drm/i915: Add pretty printer for runtime part of intel_device_info drm/i915: Update intel_device_info_runtime_init() parameter drm/i915: Move intel_device_info definitions to its own header drm/i915: Move opregion definitions to dedicated intel_opregion.h drm/i915: Move display related definitions to dedicated header drm/i915: Move some utility functions to i915_util.h drm/i915/gvt: move write protect handler out of mmio emulation function drm/i915/gvt: cleanup usage for typed mmio reg vs. offset drm/i915/gvt: Fix pipe A enable as default for vgpu ...
-
- 22 Dec, 2017 3 commits
-
-
Rodrigo Vivi authored
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
-
Chris Wilson authored
Looking at a CI failure with an ominous line of [ 362.550715] hangcheck current seqno ffffff6b, last ffffff8c, hangcheck ffffff6b [6016 ms], inflight 118 with no apparent cause for the seqno to be negative, left me wondering if someone had scribbled over the HWSP. So include the HWSP in the engine dump to see if there are more signs of random scribbling. v2: Fix row pointer, i is now incremented by 8 so doesn't need scaling by 8, and we don't need to keep volatile here as the status_page isn't marked up as volatile itself. v3: Use hexdump, with suppression of identical lines. (Tvrtko) Which results in HWSP: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 * 00000040 00000001 00000000 00000018 00000002 00000001 00000000 00000018 00000000 00000060 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000003 00000080 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 * 000000c0 00000002 00000000 00000000 00000000 00000000 00000000 00000000 00000000 000000e0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 * instead of 128 lines of mostly 0s. v4: Tidy up the locals Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171222182521.18106-1-chris@chris-wilson.co.ukReviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
-
Chris Wilson authored
We should only attempt to remove requests from the execution queue that are on the execution queue. These are the requests that have been assigned a global_seqno, so we can assert that we only attempt to remove requests with a nonzero global_seqno. Afterwards we assert that we remove them in order, i.e. the global_seqno matches the engine's seqno, but that leaves a small loophole for an unattached request on an unused engine. We can then make the same assertion on queuing the request to the execution engine, it must have a zero global_seqno or else we are queuing the same request twice. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171222141959.3006-1-chris@chris-wilson.co.ukReviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
-