- 18 Dec, 2017 3 commits
-
-
Chris Wilson authored
The error registers, IPEIR and IPEHR, are useful for inspecting GPU stalls from intel_engine_dump(). v2: Fixup gen changes in register offsets (Tvrtko) v3: Old FADDR location as well v4: Use I915_READ64_2x32 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171218123914.19027-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
-
Matthew Auld authored
We have an existing helper for testing obj->mm.pages, so use it. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171218103855.25274-1-matthew.auld@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
-
Chris Wilson authored
Inside i915_gem_reset(), we start touching the HW and so require the low-level HW to be re-enabled, in particular the PCI BARs. Fixes: 7b6da818 ("drm/i915: Restore the kernel context after a GPU reset on an idle engine") References: 0db8c961 ("drm/i915: Re-enable GTT following a device reset") Testcase: igt/drv_hangman #i915g/i915gm Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171217132852.30642-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
-
- 17 Dec, 2017 1 commit
-
-
Michal Wajdeczko authored
Instead of trying different seq_puts messages, let's use the common -ENODEV error code to indicate a missing/unsupported feature. v2: don't forget about guc_log_control fops (Sagar) Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com> Cc: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171215143635.17884-1-michal.wajdeczko@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
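The idea of reporting a missing feature through the return code rather than through message text can be sketched in user-space C. This is only a model: `struct device_model` and `guc_info_show()` are hypothetical names, not the driver's real debugfs handlers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical device description; not the driver's real structure. */
struct device_model {
	bool has_guc;
};

/*
 * Model of a debugfs "show" handler. Returns 0 and fills buf on success,
 * or -ENODEV when the feature is absent, leaving buf untouched.
 */
static int guc_info_show(const struct device_model *dev, char *buf, size_t len)
{
	assert(dev && buf);

	if (!dev->has_guc)
		return -ENODEV;	/* one common error code, no ad-hoc message */

	snprintf(buf, len, "GuC present\n");
	return 0;
}
```

A caller can then test for -ENODEV uniformly instead of parsing the text that each file happens to print.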
-
- 16 Dec, 2017 2 commits
-
-
Chris Wilson authored
Part of the system requirement for powersaving is that we always have a context loaded. Upon boot and resume, we load the kernel_context to ensure that some valid state is set before powersaving kicks in; we should do so after a full GPU reset as well. We only need to do so for an idle engine, as any active engines will restart by executing the stuck request, loading its context. For the idle engine, we create a new request to load the kernel_context instead. For whatever reason, performing a dummy execute on the idle engine after reset papers over a subsequent GPU hang in rare circumstances, even on machines not using contexts (e.g. Pineview). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104259 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104261 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171216000334.8197-1-chris@chris-wilson.co.uk
-
Chris Wilson authored
At the beginning of a reset, we disable the submission method and find the stuck request. We expect to find a stuck request for we have declared the engine stalled. However, if we find no active request, the engine must have recovered from its stall before we could issue a reset, so let the engine continue on without a reset. If the engine is truly stuck, we will be back soon enough with the next reset attempt. v2: Remove the stale debug message. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171216002206.31737-1-chris@chris-wilson.co.uk
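The control flow described here (prepare, look for a stuck request, skip the reset if there is none) might look roughly like the following user-space model. All types and function names are hypothetical stand-ins, not the driver's actual reset path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's request and engine types. */
struct request {
	int seqno;
};

struct engine {
	struct request *active;		/* request the engine is stuck on, or NULL */
	bool submission_enabled;
	int resets;			/* number of resets actually performed */
};

/* Disable submission and report the stuck request, if any. */
static struct request *reset_prepare(struct engine *e)
{
	e->submission_enabled = false;
	return e->active;
}

/* Re-enable submission once the reset attempt is over. */
static void reset_finish(struct engine *e)
{
	e->submission_enabled = true;
}

/* Returns true if a reset was performed, false if it was skipped. */
static bool engine_reset(struct engine *e)
{
	struct request *stuck;
	bool did_reset = false;

	assert(e);
	stuck = reset_prepare(e);
	if (stuck) {
		e->resets++;	/* the model only counts the reset */
		did_reset = true;
	}
	/* No stuck request: the engine recovered on its own, carry on. */
	reset_finish(e);
	return did_reset;
}
```

In both branches submission is re-enabled, so an engine that recovered before the reset simply continues.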
-
- 15 Dec, 2017 3 commits
-
-
Lucas De Marchi authored
CFL was missing from intel_early_ids[]. The PCI ID needs to be there to allow the memory region to be stolen, otherwise we could have RAM being arbitrarily overwritten if for example we keep using the UEFI framebuffer, depending on how BIOS has set up the e820 map. Fixes: b056f8f3 ("drm/i915/cfl: Add Coffee Lake PCI IDs for S Skus.") Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Anusha Srivatsa <anusha.srivatsa@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: David Airlie <airlied@linux.ie> Cc: intel-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Cc: <stable@vger.kernel.org> # v4.13+ 0890540e drm/i915: add GT number to intel_device_info Cc: <stable@vger.kernel.org> # v4.13+ 41693fd5 drm/i915/kbl: Change a KBL pci id to GT2 from GT1.5 Cc: <stable@vger.kernel.org> # v4.13+ Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171213200425.2954-1-lucas.demarchi@intel.com
-
Chris Wilson authored
Just printk the string, or at least do not double up on the newlines! Fixes: eef57324 ("drm/i915: setup bridge for HDMI LPE audio driver") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Cc: Jerome Anand <jerome.anand@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Takashi Iwai <tiwai@suse.de> Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171213182858.2159-1-chris@chris-wilson.co.uk
-
Chris Wilson authored
Internal objects consist of scratch pages not subject to the persistence guarantees of user facing objects. They are used, for example, in ring buffers where they are only required for temporary storage of commands that will be rewritten every time. As they are temporary constructs, quietly report -ENOMEM back along the callchain rather than subject the system to oomkiller if an allocation fails. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171215101753.1519-1-chris@chris-wilson.co.uk
-
- 14 Dec, 2017 12 commits
-
-
Rodrigo Vivi authored
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
-
https://github.com/intel/gvt-linux
Rodrigo Vivi authored
gvt-next-2017-12-14: - fixes for two coverity scan errors (Colin) - mmio switch code refine (Changbin) - more virtual display dmabuf fixes (Tina/Gustavo) - misc cleanups (Pei) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171214033434.jlppjlyal5d67ya7@zhen-hp.sh.intel.com
-
Sebastian Andrzej Siewior authored
The code has an ifdef and uses two functions to either init the bare spinlock or init it and set a lock-class. It is possible to do the same thing without an ifdef. With this patch (in debug case) we first use the "default" lock class which is later overwritten to the supplied one. Without lockdep the set name/class function vanishes. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171214131009.7479-1-joonas.lahtinen@linux.intel.com Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
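The pattern of hiding the configuration check inside a macro, so that the caller needs no ifdef, can be illustrated with a small user-space model. `MODEL_LOCKDEP`, `struct spinlock_model` and the helper names are assumptions made for this sketch; in the kernel the corresponding helper is lockdep_set_class_and_name(), which compiles away when lockdep is disabled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's lock class key and spinlock. */
struct lock_class_key {
	char dummy;
};

struct spinlock_model {
	int locked;
	const struct lock_class_key *key;
	const char *name;
};

#ifdef MODEL_LOCKDEP
/* Debug build: record the supplied class and name on the lock. */
#define set_class_and_name(l, k, n) \
	do { (l)->key = (k); (l)->name = (n); } while (0)
#else
/* Non-debug build: the helper vanishes, arguments are only evaluated. */
#define set_class_and_name(l, k, n) \
	do { (void)(l); (void)(k); (void)(n); } while (0)
#endif

/* Plain init: always uses the "default" class. */
static void spin_lock_init_model(struct spinlock_model *l)
{
	l->locked = 0;
	l->key = NULL;
	l->name = "default";
}

/* One code path for both configurations, no ifdef at the call site. */
static void timeline_lock_init(struct spinlock_model *l,
			       const struct lock_class_key *key,
			       const char *name)
{
	assert(l);
	spin_lock_init_model(l);
	set_class_and_name(l, key, name);
}
```

Built without `MODEL_LOCKDEP`, the lock keeps its default class; built with it, the supplied class overwrites the default right after initialisation.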
-
Chris Wilson authored
Knowing the state of the engine when hangcheck thinks it is stalling is useful for both debugging hangcheck itself and the potential cause of an unwanted stall. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171214122613.26134-1-chris@chris-wilson.co.uk
-
Lionel Landwerlin authored
As suggested by Chris, we should make this more obvious for people working with newer generations. Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171213171154.6201-1-lionel.g.landwerlin@intel.com
-
Michał Winiarski authored
We have the selftest that's checking doorbell create/destroy, so there's no need to check all doorbells delaying the reset every time. We do want to have that extra sanity check at module load/unload though. Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171213221352.7173-7-michal.winiarski@intel.com
-
Michał Winiarski authored
We can now move the clients allocation to submission_init path, rather than keeping the condition inside submission_enable called on every reset. Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171213221352.7173-6-michal.winiarski@intel.com
-
Michał Winiarski authored
Full GPU reset causes GuC to be reset. This means that every time we're doing a reset, we need to talk to GuC and tell it about doorbells. Let's separate the communication part (create_doorbell) from our internal bookkeeping (reserve_doorbell) so that we can cleanly separate the initialization done at module load from reinitialization done at reset in the following patch. While I'm here, let's also add a proper (although slightly asymmetric) cleanup that doesn't try to communicate with GuC after it's already gone, getting rid of "expected" warnings caused by GuC action failures on module unload. Note that I've also removed one of the tests (bitmap out of sync), since it doesn't make much sense anymore - bitmaps are now not expected to change during the lifetime of a client. Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171213221352.7173-5-michal.winiarski@intel.com
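The split between bookkeeping and communication can be modelled in a few lines of user-space C. The function names reserve_doorbell and create_doorbell come from the commit message; everything else (the bitmask layout, `struct guc_model`, `gpu_reset()`, `reinit_after_reset()`) is an assumption made for illustration and does not reflect the driver's data structures.

```c
#include <assert.h>
#include <errno.h>

#define NUM_DOORBELLS 8	/* arbitrary size for this model */

struct guc_model {
	unsigned int reserved;	/* driver bookkeeping, survives a reset */
	unsigned int fw_known;	/* what the firmware knows, lost on reset */
};

/* Bookkeeping only: pick a free doorbell id, no firmware communication. */
static int reserve_doorbell(struct guc_model *guc)
{
	assert(guc);
	for (int id = 0; id < NUM_DOORBELLS; id++) {
		if (!(guc->reserved & (1u << id))) {
			guc->reserved |= 1u << id;
			return id;
		}
	}
	return -EBUSY;
}

/* Communication only: tell the (modelled) firmware about a reserved id. */
static void create_doorbell(struct guc_model *guc, int id)
{
	assert(guc && (guc->reserved & (1u << id)));
	guc->fw_known |= 1u << id;
}

/* A full reset wipes the firmware state but not the bookkeeping. */
static void gpu_reset(struct guc_model *guc)
{
	guc->fw_known = 0;
}

/* After a reset, only the communication step has to be repeated. */
static void reinit_after_reset(struct guc_model *guc)
{
	for (int id = 0; id < NUM_DOORBELLS; id++)
		if (guc->reserved & (1u << id))
			create_doorbell(guc, id);
}
```

Because the reservation bitmap never changes across a reset in this model, the reset path only replays create_doorbell for ids that are already reserved.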
-
Michał Winiarski authored
To make this operation a bit cleaner, we should make sure that the HW can catch up by calling the new implementation right away. Note that currently we're only touching the vfunc at module load time (before GuC is even loaded), so this shouldn't cause any functional changes. Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171213221352.7173-4-michal.winiarski@intel.com
-
Michał Winiarski authored
After GPU reset, GuC HW needs to be reinitialized (with FW reload). Unfortunately, we're doing some extra work there (mostly allocating stuff), work that can be moved to guc_init and called once at driver load time. As a side effect we're no longer hitting an assert in i915_ggtt_enable_guc on suspend/resume. v2: Do not duplicate disable_communication / reset_guc_interrupts v3: Add proper teardown after rebase References: 04f7b24e ("drm/i915/guc: Assert that we switch between known ggtt->invalidate functions") Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171213221352.7173-3-michal.winiarski@intel.com
-
Michał Winiarski authored
This gets rid of the following lockdep splat: ====================================================== WARNING: possible circular locking dependency detected 4.15.0-rc2-CI-Patchwork_7428+ #1 Not tainted ------------------------------------------------------ debugfs_test/1351 is trying to acquire lock: (&dev->struct_mutex){+.+.}, at: [<000000009d90d1a3>] i915_mutex_lock_interruptible+0x47/0x130 [i915] but task is already holding lock: (&mm->mmap_sem){++++}, at: [<000000005df01c1e>] __do_page_fault+0x106/0x560 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #6 (&mm->mmap_sem){++++}: __might_fault+0x63/0x90 _copy_to_user+0x1e/0x70 filldir+0x8c/0xf0 dcache_readdir+0xeb/0x160 iterate_dir+0xe6/0x150 SyS_getdents+0xa0/0x130 entry_SYSCALL_64_fastpath+0x1c/0x89 -> #5 (&sb->s_type->i_mutex_key#5){++++}: lockref_get+0x9/0x20 -> #4 ((completion)&req.done){+.+.}: wait_for_common+0x54/0x210 devtmpfs_create_node+0x130/0x150 device_add+0x5ad/0x5e0 device_create_groups_vargs+0xd4/0xe0 device_create+0x35/0x40 msr_device_create+0x22/0x40 cpuhp_invoke_callback+0xc5/0xbf0 cpuhp_thread_fun+0x167/0x210 smpboot_thread_fn+0x17f/0x270 kthread+0x173/0x1b0 ret_from_fork+0x24/0x30 -> #3 (cpuhp_state-up){+.+.}: cpuhp_issue_call+0x132/0x1c0 __cpuhp_setup_state_cpuslocked+0x12f/0x2a0 __cpuhp_setup_state+0x3a/0x50 page_writeback_init+0x3a/0x5c start_kernel+0x393/0x3e2 secondary_startup_64+0xa5/0xb0 -> #2 (cpuhp_state_mutex){+.+.}: __mutex_lock+0x81/0x9b0 __cpuhp_setup_state_cpuslocked+0x4b/0x2a0 __cpuhp_setup_state+0x3a/0x50 page_alloc_init+0x1f/0x26 start_kernel+0x139/0x3e2 secondary_startup_64+0xa5/0xb0 -> #1 (cpu_hotplug_lock.rw_sem){++++}: cpus_read_lock+0x34/0xa0 apply_workqueue_attrs+0xd/0x40 __alloc_workqueue_key+0x2c7/0x4e1 intel_guc_submission_init+0x10c/0x650 [i915] intel_uc_init_hw+0x29e/0x460 [i915] i915_gem_init_hw+0xca/0x290 [i915] i915_gem_init+0x115/0x3a0 [i915] i915_driver_load+0x9a8/0x16c0 [i915] i915_pci_probe+0x2e/0x90 [i915] 
pci_device_probe+0x9c/0x120 driver_probe_device+0x2a3/0x480 __driver_attach+0xd9/0xe0 bus_for_each_dev+0x57/0x90 bus_add_driver+0x168/0x260 driver_register+0x52/0xc0 do_one_initcall+0x39/0x150 do_init_module+0x56/0x1ef load_module+0x231c/0x2d70 SyS_finit_module+0xa5/0xe0 entry_SYSCALL_64_fastpath+0x1c/0x89 -> #0 (&dev->struct_mutex){+.+.}: lock_acquire+0xaf/0x200 __mutex_lock+0x81/0x9b0 i915_mutex_lock_interruptible+0x47/0x130 [i915] i915_gem_fault+0x201/0x760 [i915] __do_fault+0x15/0x70 __handle_mm_fault+0x85b/0xe40 handle_mm_fault+0x14f/0x2f0 __do_page_fault+0x2d1/0x560 page_fault+0x22/0x30 other info that might help us debug this: Chain exists of: &dev->struct_mutex --> &sb->s_type->i_mutex_key#5 --> &mm->mmap_sem Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&mm->mmap_sem); lock(&sb->s_type->i_mutex_key#5); lock(&mm->mmap_sem); lock(&dev->struct_mutex); *** DEADLOCK *** 1 lock held by debugfs_test/1351: #0: (&mm->mmap_sem){++++}, at: [<000000005df01c1e>] __do_page_fault+0x106/0x560 stack backtrace: CPU: 2 PID: 1351 Comm: debugfs_test Not tainted 4.15.0-rc2-CI-Patchwork_7428+ #1 Hardware name: /NUC6i5SYB, BIOS SYSKLi35.86A.0057.2017.0119.1758 01/19/2017 Call Trace: dump_stack+0x5f/0x86 print_circular_bug+0x230/0x3b0 check_prev_add+0x439/0x7b0 ? lockdep_init_map_crosslock+0x20/0x20 ? unwind_get_return_address+0x16/0x30 ? __lock_acquire+0x1385/0x15a0 __lock_acquire+0x1385/0x15a0 lock_acquire+0xaf/0x200 ? i915_mutex_lock_interruptible+0x47/0x130 [i915] __mutex_lock+0x81/0x9b0 ? i915_mutex_lock_interruptible+0x47/0x130 [i915] ? i915_mutex_lock_interruptible+0x47/0x130 [i915] ? i915_mutex_lock_interruptible+0x47/0x130 [i915] i915_mutex_lock_interruptible+0x47/0x130 [i915] ? 
__pm_runtime_resume+0x4f/0x80 i915_gem_fault+0x201/0x760 [i915] __do_fault+0x15/0x70 __handle_mm_fault+0x85b/0xe40 handle_mm_fault+0x14f/0x2f0 __do_page_fault+0x2d1/0x560 page_fault+0x22/0x30 RIP: 0033:0x7f98d6f49116 RSP: 002b:00007ffd6ffc3278 EFLAGS: 00010283 RAX: 00007f98d39a2bc0 RBX: 0000000000000000 RCX: 0000000000001680 RDX: 0000000000001680 RSI: 00007ffd6ffc3400 RDI: 00007f98d39a2bc0 RBP: 00007ffd6ffc33a0 R08: 0000000000000000 R09: 00000000000005a0 R10: 000055e847c2a830 R11: 0000000000000002 R12: 0000000000000001 R13: 000055e847c1d040 R14: 00007ffd6ffc3400 R15: 00007f98d6752ba0 v2: Init preempt_work unconditionally (Chris) v3: Mention that we need the enable_guc=1 for lockdep splat (Chris) Testcase: igt/debugfs_test/read_all_entries # with i915.enable_guc=1 Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171213221352.7173-2-michal.winiarski@intel.com
-
Michał Winiarski authored
We need shared data for actions (e.g. guc suspend/resume), and we're using those with GuC submission disabled. Let's introduce intel_guc_init and move shared data alloc there. This fixes GPF during module unload with HuC, but without GuC submission: BUG: unable to handle kernel NULL pointer dereference at 000000005aee7809 IP: intel_guc_suspend+0x34/0x140 [i915] PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP Modules linked in: i915(O-) netconsole x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mei_me i2c_i801 mei prime_numbers [last unloaded: i915] CPU: 2 PID: 2794 Comm: rmmod Tainted: G U W O 4.15.0-rc2+ #297 Hardware name: /NUC6i5SYB, BIOS SYSKLi35.86A.0054.2016.0930.1102 09/30/2016 task: 0000000055945c61 task.stack: 00000000264ccb43 RIP: 0010:intel_guc_suspend+0x34/0x140 [i915] RSP: 0018:ffffc90000483df8 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff880829180000 RCX: 0000000000000000 RDX: 0000000000000006 RSI: ffff880844c2c938 RDI: ffff880844c2c000 RBP: ffff880829180000 R08: 00000000a29c58c1 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa040ba40 R13: ffffffffa040bab0 R14: ffff88084a195060 R15: 000055df3ef357a0 FS: 00007ff43c043740(0000) GS:ffff88084e200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000f9 CR3: 000000083f179005 CR4: 00000000003606e0 Call Trace: i915_gem_suspend+0x9d/0x130 [i915] ? 
i915_driver_unload+0x68/0x180 [i915] i915_driver_unload+0x70/0x180 [i915] i915_pci_remove+0x15/0x20 [i915] pci_device_remove+0x36/0xb0 device_release_driver_internal+0x15f/0x220 driver_detach+0x3a/0x80 bus_remove_driver+0x58/0xd0 pci_unregister_driver+0x29/0x90 SyS_delete_module+0x150/0x1e0 entry_SYSCALL_64_fastpath+0x23/0x9a RIP: 0033:0x7ff43b51b5c7 RSP: 002b:00007ffe6825a758 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007ff43b51b5c7 RDX: 000000000000000a RSI: 0000000000000800 RDI: 000055df3ef35808 RBP: 0000000000000000 R08: 00007ffe682596d1 R09: 0000000000000000 R10: 00007ff43b594880 R11: 0000000000000206 R12: 000055df3ef357a0 R13: 00007ffe68259740 R14: 000055df3ef35260 R15: 000055df3ef357a0 Code: 00 00 02 74 03 31 c0 c3 53 48 89 fb 48 83 ec 10 e8 52 0f f8 ff 48 b8 01 05 00 00 02 00 00 00 48 89 44 24 04 48 8b 83 00 12 00 00 <f6> 80 f9 00 00 00 01 0f 84 a7 00 00 00 f6 80 98 00 00 00 01 0f RIP: intel_guc_suspend+0x34/0x140 [i915] RSP: ffffc90000483df8 CR2: 00000000000000f9 ---[ end trace 23a192a61d937a3e ]--- Fixes: b8e5eb96 ("drm/i915/guc: Allocate separate shared data object for GuC communication") Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20171213221352.7173-1-michal.winiarski@intel.com
-
- 13 Dec, 2017 6 commits
-
-
Chris Wilson authored
Since Michal introduced new user controllable errors other than -EIO during i915_gem_init(), we need to actually unwind on the error path as we have to abort the module load (and we expect to do so cleanly!). As we now teardown key state and then mark the driver as wedged (on EIO), we have to be careful to not allow ourselves to resume and unwedge, thus attempting to use the uninitialised driver. v2: Try not to free driver state for the suppressed EIO v3: Use load-fault-injection to test both error/recovery paths. References: 8620eb1d ("drm/i915/uc: Don't use -EIO to report missing firmware") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171213134347.4608-1-chris@chris-wilson.co.uk
-
Chris Wilson authored
If we fail to allocate a request, we can reap the outstanding requests and push them to the request's slab's freelist before trying again. This forces us to ratelimit malicious clients that tie up all of the system resources in requests, instead of causing a system-wide oom. Testcase: igt/gem_shrink/execbuf1 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171212180652.22061-3-chris@chris-wilson.co.uk
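A minimal sketch of the retry-after-reaping idea, written as a user-space model: the pool structure and the names `try_alloc()`, `retire_requests()` and `request_alloc()` are hypothetical and only mirror the steps described above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the request slab. */
struct request_pool {
	int free_slots;	/* slots available for new requests */
	int retirable;	/* completed requests not yet returned to the pool */
};

/* Take one slot, or fail with -ENOMEM if none is free. */
static int try_alloc(struct request_pool *p)
{
	if (p->free_slots > 0) {
		p->free_slots--;
		return 0;
	}
	return -ENOMEM;
}

/* Reap completed requests, returning their slots to the freelist. */
static void retire_requests(struct request_pool *p)
{
	p->free_slots += p->retirable;
	p->retirable = 0;
}

/* Allocate a request; on failure, reap once and try again. */
static int request_alloc(struct request_pool *p)
{
	int err;

	assert(p);
	err = try_alloc(p);
	if (err == -ENOMEM) {
		retire_requests(p);
		err = try_alloc(p);
	}
	return err;
}
```

A client that has exhausted the pool is throttled by its own outstanding work: it gets a slot back only once one of its earlier requests can be retired.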
-
Chris Wilson authored
If a fence allocation fails in a blocking context, we will sleep on the fence as a last resort. We can therefore allow ourselves to fail and sleep on the fence instead of triggering a system-wide oom. This allows us to throttle malicious clients that are consuming lots of system resources by capping the amount of memory used by fences. Testcase: igt/gem_shrink/execbufX Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171212180652.22061-2-chris@chris-wilson.co.uk
-
Chris Wilson authored
As kmalloc is allowed to block (if given the right flags), mark up the two i915_sw_fence routines that may call kmalloc as potential sleeping routines. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171212180652.22061-1-chris@chris-wilson.co.uk
-
Chris Wilson authored
i915_gem_wait_for_idle() is called from inside the shrinker, to ensure that we drain the last resources from the GPU in dire circumstances (OOM). As we may allocate whilst building a request, it is then possible to hit the shrinker with a request under construction, and so we must account for the incomplete request whilst waiting. In particular, we preincrement (in reserve_engine) the i915->gt.active_requests counter and mark the GPU as busy, therefore we can not use that counter for shortcircuiting the wait-for-idle. [ 950.859024] GEM_BUG_ON(i915->gt.active_requests) [ 950.859041] WARNING: CPU: 2 PID: 2178 at drivers/gpu/drm/i915/i915_gem.c:3615 i915_gem_wait_for_idle.part.56+0x166/0x4e0 [ 950.859041] Modules linked in: ccm tun fuse nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_security iptable_raw arc4 iwldvm mac80211 snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_codec_generic snd_hda_intel snd_hda_codec btusb snd_hda_core btrtl btbcm iwlwifi snd_hwdep btintel bluetooth snd_seq snd_seq_device snd_pcm ecdh_generic x86_pkg_temp_thermal tpm_infineon coretemp tpm_tis crc32_pclmul wmi_bmof crc32c_intel iTCO_wdt hp_wmi snd_timer iTCO_vendor_support sparse_keymap tpm_tis_core mei_me cfg80211 [ 950.859082] snd joydev tpm mei rfkill pcspkr wmi soundcore lpc_ich hp_accel lis3lv02d input_polldev binfmt_misc e1000e ptp serio_raw pps_core [ 950.859094] CPU: 2 PID: 2178 Comm: gem_exec_nop Tainted: G U 4.15.0-rc2+ #900 [ 950.859102] Hardware name: Hewlett-Packard HP ProBook 6360b/1620, BIOS 68SCF Ver. 
B.42 12/29/2010 [ 950.859107] task: c5119cb4 task.stack: f3ccb8d8 [ 950.859112] EIP: i915_gem_wait_for_idle.part.56+0x166/0x4e0 [ 950.859113] EFLAGS: 00010296 CPU: 2 [ 950.859114] EAX: 00000024 EBX: f36c1888 ECX: f777a044 EDX: 00000007 [ 950.859115] ESI: f36c1888 EDI: edd53958 EBP: edd53970 ESP: edd53938 [ 950.859116] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 [ 950.859117] CR0: 80050033 CR2: b7f39000 CR3: 2f2b3000 CR4: 000406d0 [ 950.859118] Call Trace: [ 950.859125] ? drm_printk+0x70/0x70 [ 950.859129] i915_gem_wait_for_idle+0x18/0x30 [ 950.859133] i915_gem_shrink+0x360/0x410 [ 950.859138] ? vmpressure+0xa8/0xf0 [ 950.859142] ? ktime_get+0x4a/0x100 [ 950.859147] i915_gem_shrink_all+0x21/0x40 [ 950.859151] i915_gem_shrinker_oom+0x23/0x130 [ 950.859156] notifier_call_chain+0x4e/0x70 [ 950.859160] __blocking_notifier_call_chain+0x2f/0x60 [ 950.859164] blocking_notifier_call_chain+0x11/0x20 [ 950.859169] out_of_memory+0x207/0x280 [ 950.859174] __alloc_pages_nodemask+0xd47/0xe60 [ 950.859179] new_slab+0x32d/0x450 [ 950.859183] ___slab_alloc.constprop.81+0x358/0x4e0 [ 950.859189] ? i915_sw_fence_await_dma_fence+0x53/0x160 [ 950.859193] ? __slab_free+0x1fe/0x310 [ 950.859197] ? native_sched_clock+0x1e/0xc0 [ 950.859201] ? i915_gem_request_alloc+0xcf/0x510 [ 950.859205] ? sched_clock+0x9/0x10 [ 950.859209] __slab_alloc.constprop.80+0x29/0x40 [ 950.859212] ? __slab_alloc.constprop.80+0x29/0x40 [ 950.859216] kmem_cache_alloc_trace+0x160/0x1a0 [ 950.859220] ? i915_sw_fence_await_dma_fence+0x53/0x160 [ 950.859224] i915_sw_fence_await_dma_fence+0x53/0x160 [ 950.859229] i915_gem_request_await_dma_fence+0x1eb/0x390 [ 950.859233] i915_gem_request_await_object+0xee/0x230 [ 950.859239] i915_gem_do_execbuffer+0xc16/0x1200 [ 950.859246] ? irqtime_account_irq+0x3e/0xc0 [ 950.859251] ? irq_exit+0x4f/0xb0 [ 950.859257] ? smp_apic_timer_interrupt+0x5f/0x110 [ 950.859261] ? apic_timer_interrupt+0x35/0x3c [ 950.859266] i915_gem_execbuffer2_ioctl+0x212/0x440 [ 950.859270] ? 
apic_timer_interrupt+0x35/0x3c [ 950.859274] ? i915_gem_do_execbuffer+0x1200/0x1200 [ 950.859279] ? insn_get_seg_base+0x1b/0x50 [ 950.859283] ? i915_gem_do_execbuffer+0x1200/0x1200 [ 950.859287] drm_ioctl_kernel+0x51/0xa0 [ 950.859291] drm_ioctl+0x2a3/0x350 [ 950.859294] ? i915_gem_do_execbuffer+0x1200/0x1200 [ 950.859300] ? sched_clock+0x9/0x10 [ 950.859303] ? drm_getunique+0x70/0x70 [ 950.859308] do_vfs_ioctl+0x7d/0x640 [ 950.859311] ? native_sched_clock+0x1e/0xc0 [ 950.859315] ? sched_clock+0x9/0x10 [ 950.859319] ? sched_clock_cpu+0x13/0x120 [ 950.859323] SyS_ioctl+0x4e/0x80 [ 950.859326] do_fast_syscall_32+0x75/0x250 [ 950.859331] ? irq_exit+0x4f/0xb0 [ 950.859334] entry_SYSENTER_32+0x47/0x71 [ 950.859338] EIP: 0xb7f81d11 [ 950.859339] EFLAGS: 00000296 CPU: 2 [ 950.859340] EAX: ffffffda EBX: 00000003 ECX: 40406469 EDX: bfde4c20 [ 950.859340] ESI: 00000003 EDI: 40406469 EBP: 00000003 ESP: bfde4b38 [ 950.859341] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b [ 950.859343] Code: e8 30 60 01 00 83 c4 10 83 c3 04 39 f3 75 e0 8b 45 d8 8b 80 14 37 00 00 85 c0 74 13 68 dd 33 e4 c0 68 49 6f e3 c0 e8 4a 55 be ff <0f> ff 5e 5f b8 fe ff ff 3f bb 0a 00 00 00 e8 b7 14 c4 ff 8b 15 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171212132148.8124-1-chris@chris-wilson.co.ukReviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
-
Chris Wilson authored
It is illegal to perform an immediate free of the struct irq_work from inside the irq_work callback (as irq_work_run_list modifies work->flags after execution of the work->func()). As we use the irq_work to coordinate the freeing of the callback from two different softirq paths, we need to defer the kfree from inside our irq_work callback, for which we can use kfree_rcu. Fixes: 81c0ed21 ("drm/i915/fence: Avoid del_timer_sync() from inside a timer") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171213094802.28243-1-chris@chris-wilson.co.uk
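Why the free has to be deferred can be shown with a user-space model in which the runner writes to the work item after the callback returns. The deferred list below is only a stand-in for kfree_rcu; `struct work`, `run_work()`, `defer_free()` and `drain_deferred()` are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

#define WORK_BUSY 1u

/* Hypothetical model of a work item with an embedded callback. */
struct work {
	unsigned int flags;
	void (*func)(struct work *);
	struct work *next_free;	/* link on the deferred-free list */
};

static struct work *deferred;	/* stand-in for the RCU callback queue */

/* Queue the item for freeing later instead of freeing it now. */
static void defer_free(struct work *w)
{
	w->next_free = deferred;
	deferred = w;
}

/* Free everything queued so far; returns the number of items freed. */
static int drain_deferred(void)
{
	int n = 0;

	while (deferred) {
		struct work *w = deferred;

		deferred = w->next_free;
		free(w);
		n++;
	}
	return n;
}

/* The runner still touches the item after the callback has returned. */
static void run_work(struct work *w)
{
	assert(w && w->func);
	w->func(w);
	w->flags &= ~WORK_BUSY;	/* would be a use-after-free after free(w) */
}

/* Callback that wants to release its own work item. */
static void timer_cb(struct work *w)
{
	defer_free(w);	/* safe: memory stays valid until drain_deferred() */
}
```

Calling free() directly from `timer_cb()` would make the flag update in `run_work()` a write to freed memory, which is the situation the commit avoids.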
-
- 12 Dec, 2017 13 commits
-
-
Chris Wilson authored
If wait_for_engines() fails and we resort to declaring the HW wedged, dump the engine state for debugging. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171211194135.27095-2-chris@chris-wilson.co.uk
-
Chris Wilson authored
Extract the timeout we use in i915_gem_idle_work_handler() and reuse it for wait_for_engines() in i915_gem_wait_for_idle(). It too has the same problem in sometimes having to wait for an extended period before the HW settles, so make use of the same timeout. References: 5427f207 ("drm/i915: Bump wait-times for the final CS interrupt before parking") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171211194135.27095-1-chris@chris-wilson.co.uk
-
Chris Wilson authored
It never meant what it said, as it was always the total size of the Global GTT and not a limit upon memory usage. Originally it served as a quick guide to the largest batch that could be submitted by userspace, an approximation to its maximum RSS, but was phrased badly. Today with the 48b ppgtt, it is even more meaningless. Replace with a more specific debug message; those wanting to know how much "video ram" they have should consult the userspace libraries for the relevant approximation. v2: Rebase Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171212113532.22574-1-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
-
Chris Wilson authored
Since on gen2, we do not universally have a GPU reset implementation, we fail i915_reset() at intel_has_gpu_reset(). However, this is also intentionally disabled for CI testing and so it only has a debug message. Promote that debug message to a user-facing error message that should explain why their machine became unusable following the GPU hang. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171211204040.22858-1-chris@chris-wilson.co.uk
-
Markus Heiser authored
With commit d9e2e014 the 'GuC-specific firmware loader' doc section was removed from intel_guc_loader.c without a replacement. So let's remove it from the Kernel-doc:: .. kernel-doc:: drivers/gpu/drm/i915/intel_guc_loader.c :doc: GuC-specific firmware loader With commit e8668bbc intel_guc_loader.c was renamed to intel_guc_fw.c and, to name just one, intel_guc_init_hw() was renamed to intel_guc_fw_upload(). Since we get errors in the Sphinx build like: - Error: Cannot open file ./drivers/gpu/drm/i915/intel_guc_loader.c Change the kernel-doc directive from intel_guc_loader.c to intel_guc_fw.c Signed-off-by: Markus Heiser <markus.heiser@darmarit.de> [danvet: Rebase onto the partial fix 006c2332 ("documentation/gpu/i915: fix docs build error after file rename")] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/1513078717-12373-1-git-send-email-markus.heiser@darmarit.de
-
Matthew Auld authored
Keeps things consistent now that we make use of struct resource. This should keep us covered in case we ever get huge amounts of stolen memory. v2: bunch of missing conversions (Chris) Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171211151822.20953-10-matthew.auld@intel.com
-
Matthew Auld authored
Kick it out of i915_ggtt and keep it grouped with dsm and dsm_reserved, where it makes the most sense. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171211151822.20953-9-matthew.auld@intel.com
-
Matthew Auld authored
Now that we are using struct resource to track the stolen region, it is more convenient if we track the mappable region in a resource as well. v2: prefer iomap and gmadr naming scheme prefer DEFINE_RES_MEM Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171211151822.20953-8-matthew.auld@intel.com
-
Matthew Auld authored
Now that we are using struct resource to track the stolen region, it is more convenient if we track the reserved portion of that region in a resource as well. v2: s/<= end + 1/< end/ (Chris) v3: prefer DEFINE_RES_MEM Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171211151822.20953-7-matthew.auld@intel.com
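
The off-by-one noted in the v2 changelog comes from how a resource stores its bounds: the kernel's struct resource keeps an inclusive end address. The sketch below is a userspace stand-in, not the patch itself; the type `struct res` and the helpers `res_from_size()`, `res_size()`, and `res_contains()` are hypothetical names used only to illustrate the inclusive-end arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for a kernel resource: 'end' is the last valid byte. */
struct res {
	uint64_t start;
	uint64_t end;
};

/* Build a region from a base and a size, as a start/size initialiser would. */
static struct res res_from_size(uint64_t start, uint64_t size)
{
	assert(size > 0); /* an inclusive end cannot express an empty region */
	struct res r = { start, start + size - 1 };
	return r;
}

/* Size of a region with an inclusive end: note the "+ 1". */
static uint64_t res_size(const struct res *r)
{
	return r->end - r->start + 1;
}

/* Non-zero when 'inner' lies entirely within 'outer'; both ends are inclusive. */
static int res_contains(const struct res *outer, const struct res *inner)
{
	return inner->start >= outer->start && inner->end <= outer->end;
}
```

Because both ends are inclusive, containment is a plain `<=` comparison on the end addresses, with no `+ 1` adjustment on either side.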
-
Matthew Auld authored
Now that we are using struct resource to track the stolen region, it is more convenient if we track dsm in a resource as well. v2: check range_overflow when writing to 32b registers (Chris) pepper in some comments (Chris) v3: refit i915_stolen_to_dma() v4: kill ggtt->stolen_size v5: some more polish Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171211151822.20953-6-matthew.auld@intel.com
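
Once the stolen region is tracked with 64-bit resource bounds, a value can exceed what a 32-bit register holds, which is why the v2 note adds an overflow check. A minimal sketch of such a check follows; `range_overflows_u32()` is a hypothetical helper written for this example and is not claimed to match the driver's own macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true when the half-open range [start, start + size)
 * does not fit in a 32-bit address space. The comparison is arranged so that
 * 'start + size' is never computed and therefore cannot wrap.
 */
static bool range_overflows_u32(uint64_t start, uint64_t size)
{
	const uint64_t limit = (uint64_t)1 << 32;

	return start > limit || size > limit - start;
}

/* Truncate to 32 bits only after the range has been validated. */
static uint32_t checked_reg_value(uint64_t start, uint64_t size)
{
	assert(!range_overflows_u32(start, size));
	return (uint32_t)start;
}
```

Comparing `size` against `limit - start` rather than testing `start + size > limit` keeps the check correct even when the sum would wrap around 64 bits.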
-
Matthew Auld authored
We duplicate the stolen discovery code in early-quirks and in i915, however now that the stolen region is exported as a resource from early-quirks we can nuke the duplication. v2: check overflows_type Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171211151822.20953-5-matthew.auld@intel.com
-
Matthew Auld authored
Replace the magical +2, +9 etc. with +MB, which is far easier to read. Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171211151822.20953-4-matthew.auld@intel.com
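
To illustrate why the explicit form is easier to read, the sketch below compares a magic-increment computation with one that spells out the starting value in megabytes. The function names, the 4 MiB step, and the 8 MiB starting value are assumptions chosen for the example, and the `KB()`/`MB()` macros are defined locally rather than taken from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Local unit helpers for this example. */
#define KB(x) ((uint64_t)(x) * 1024)
#define MB(x) (KB(x) * 1024)

/* Magic-increment style: "+ 2" only makes sense once multiplied by the step. */
static uint64_t size_magic(unsigned int steps)
{
	return (uint64_t)(steps + 2) * MB(4);
}

/* Explicit style: the starting value is written directly in megabytes. */
static uint64_t size_explicit(unsigned int steps)
{
	return (uint64_t)steps * MB(4) + MB(8);
}

/* Both forms compute the same size; only the readability differs. */
static void check_equivalence(void)
{
	for (unsigned int steps = 0; steps < 16; steps++)
		assert(size_magic(steps) == size_explicit(steps));
}
```

The two functions are algebraically identical, so a rewrite of this kind changes no behaviour; it only moves the unit from the reader's head into the source.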
-
Matthew Auld authored
We duplicate the stolen discovery code in early-quirks and in i915, however if we just export the region as a resource from early-quirks we can nuke the duplication. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171211151822.20953-3-matthew.auld@intel.com
-