- 29 Jan, 2023 40 commits
-
Breno Leitao authored
Every io_uring request is represented by struct io_kiocb, which is cached locally by io_uring (not SLAB/SLUB) in a list called submit_state.freelist. This patch enables KASAN for that free list. The list is initially created by KMEM_CACHE but is later managed by io_uring. The patch poisons the objects that are not in use (i.e., those sitting in the free list) and unpoisons each object when it is allocated/removed from the list. Touching a poisoned object while it is in the freelist will cause a KASAN warning. Suggested-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Breno Leitao <leitao@debian.org> Reviewed-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
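The poison-on-free, unpoison-on-allocate scheme described in the commit above can be modelled in userspace. This is only a minimal sketch: `demo_req`, `demo_cache` and the `poisoned` boolean are hypothetical stand-ins for `struct io_kiocb`, the io_uring free list and the KASAN shadow state, not the kernel's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct io_kiocb; not the kernel layout. */
struct demo_req {
    struct demo_req *next;  /* free list link */
    bool poisoned;          /* stand-in for KASAN shadow state */
    int payload;
};

/* Hypothetical stand-in for the locally managed free list. */
struct demo_cache {
    struct demo_req *freelist;
};

/* Return an object to the free list and mark it as off-limits. */
static void demo_cache_put(struct demo_cache *cache, struct demo_req *req)
{
    assert(!req->poisoned);  /* a poisoned object is already on the list */
    req->next = cache->freelist;
    cache->freelist = req;
    req->poisoned = true;    /* analogue of poisoning the object */
}

/* Take an object off the free list and make it usable again. */
static struct demo_req *demo_cache_get(struct demo_cache *cache)
{
    struct demo_req *req = cache->freelist;

    if (!req)
        return NULL;
    cache->freelist = req->next;
    req->next = NULL;
    req->poisoned = false;   /* analogue of unpoisoning the object */
    return req;
}

/* True if touching the object would be legal, false if KASAN would warn. */
static bool demo_access_ok(const struct demo_req *req)
{
    return !req->poisoned;
}
```

In this model, any access checked with `demo_access_ok()` between `demo_cache_put()` and the next `demo_cache_get()` reports a violation, which is the class of bug the patch is meant to surface.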
-
Jens Axboe authored
If TIF_NOTIFY_RESUME is set, then we need to call resume_user_mode_work() for PF_IO_WORKER threads. They never return to usermode, hence never get a chance to process any items that are marked by this flag. Most notably this includes the final put of files, but also any throttling markers set by block cgroups. Cc: stable@vger.kernel.org # 5.10+ Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
If the target ring is using IORING_SETUP_SINGLE_ISSUER and we're posting a message from a different thread, then we need to ensure that the fallback task_work that posts the CQE knows about the flags passing as well. If not, we'll always be posting 0 as the flags. Fixes: 3563d7ed58a5 ("io_uring/msg_ring: Pass custom flags to the cqe") Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Breno Leitao authored
This patch removes some "cold" fields from `struct io_issue_def`. The plan is to keep only heavily used fields in `struct io_issue_def`, so that it is more likely to stay hot in the cache. The hot fields are basically all the bitfields and the callback functions for .issue and .prep. The other, less frequently used fields are now located in a secondary, cold struct called `io_cold_def`. These are the struct sizes: Before: io_issue_def = 56 bytes After: io_issue_def = 24 bytes; io_cold_def = 40 bytes Signed-off-by:
Breno Leitao <leitao@debian.org> Reviewed-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20230112144411.2624698-2-leitao@debian.org Signed-off-by:
Jens Axboe <axboe@kernel.dk>
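The hot/cold split described in the commit above can be sketched with two parallel tables indexed by opcode. All names here (`demo_issue_def`, `demo_cold_def`, the opcodes and fields) are hypothetical and much smaller than the kernel structures; the point is only that the issue path touches the compact table while names and slow-path callbacks live elsewhere.

```c
#include <assert.h>
#include <stddef.h>

struct demo_req;  /* opaque, hypothetical request type */

enum demo_opcode { DEMO_OP_NOP, DEMO_OP_READ, DEMO_OP_LAST };

/* Hot part: bitfields plus the callbacks used on every submission. */
struct demo_issue_def {
    unsigned needs_file : 1;
    unsigned pollin : 1;
    int (*prep)(struct demo_req *req);
    int (*issue)(struct demo_req *req, unsigned int issue_flags);
};

/* Cold part: rarely touched metadata and slow-path callbacks. */
struct demo_cold_def {
    const char *name;
    unsigned short async_size;
    void (*cleanup)(struct demo_req *req);
};

static int demo_prep(struct demo_req *req)
{
    (void)req;
    return 0;
}

static int demo_issue(struct demo_req *req, unsigned int issue_flags)
{
    (void)req;
    (void)issue_flags;
    return 0;
}

static const struct demo_issue_def demo_issue_defs[DEMO_OP_LAST] = {
    [DEMO_OP_NOP]  = { .prep = demo_prep, .issue = demo_issue },
    [DEMO_OP_READ] = { .needs_file = 1, .pollin = 1,
                       .prep = demo_prep, .issue = demo_issue },
};

static const struct demo_cold_def demo_cold_defs[DEMO_OP_LAST] = {
    [DEMO_OP_NOP]  = { .name = "NOP" },
    [DEMO_OP_READ] = { .name = "READ" },
};

/* Fast path: only the hot table is read. */
static int demo_issue_op(unsigned int opcode, struct demo_req *req)
{
    assert(opcode < DEMO_OP_LAST);
    return demo_issue_defs[opcode].issue(req, 0);
}

/* Slow path (e.g. diagnostics): only the cold table is read. */
static const char *demo_op_name(unsigned int opcode)
{
    assert(opcode < DEMO_OP_LAST);
    return demo_cold_defs[opcode].name;
}
```

Keeping the two tables separate means more hot entries fit per cache line; the exact sizes quoted in the commit message apply to the kernel structs, not to this sketch.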
-
Breno Leitao authored
The current io_op_def struct is becoming huge and the name is a bit generic. The goal of this patch is to rename this struct to `io_issue_def`. This struct will contain the hot functions associated with the issue code path. For now, this patch only renames the structure, and an upcoming patch will break up the structure in two, moving the non-issue fields to a secondary struct. Signed-off-by:
Breno Leitao <leitao@debian.org> Reviewed-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20230112144411.2624698-1-leitao@debian.org Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Keep parts of __io_req_complete_post() relying on req->flags together so the value can be cached. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2b4fbb42f404a0e75c4d9f0a5b16f314a839d0a9.1673887636.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
There may be a different cost for reading just one byte or more, so it's beneficial to keep ctx flag bits that we access together in a single byte. That affected code generation of __io_cq_unlock_post_flush() and removed one memory load. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/bbe8ca4705704690319d65e45845f9fc9d35f420.1673887636.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
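The idea in the commit above, keeping flags that are tested together in one byte so a single load and mask covers all of them, can be illustrated as follows. The flag names and the `demo_ctx` layout are assumptions chosen for illustration; they are not the actual `struct io_ring_ctx` fields or bit positions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits; the first three are checked together. */
enum {
    DEMO_OFF_TIMEOUT_USED = 1u << 0,
    DEMO_DRAIN_ACTIVE     = 1u << 1,
    DEMO_HAS_EVFD         = 1u << 2,
    DEMO_TASK_COMPLETE    = 1u << 3,
};

#define DEMO_FLUSH_MASK \
    (DEMO_OFF_TIMEOUT_USED | DEMO_DRAIN_ACTIVE | DEMO_HAS_EVFD)

struct demo_ctx {
    uint8_t flags;      /* every bit tested on the flush path lives here */
    uint32_t cq_extra;  /* unrelated data, kept out of the flag byte */
};

static void demo_set_flag(struct demo_ctx *ctx, unsigned int flag)
{
    assert(flag != 0 && flag <= 0xffu);  /* must fit in the flag byte */
    ctx->flags |= (uint8_t)flag;
}

/* One byte load plus one mask decides whether slow-path work is needed. */
static int demo_needs_flush(const struct demo_ctx *ctx)
{
    return (ctx->flags & DEMO_FLUSH_MASK) != 0;
}
```

Had the three flags been spread over separate bytes or words, the same check could need several loads; whether the compiler actually merges them depends on the layout, which is what the commit adjusts.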
-
Pavel Begunkov authored
Lock the ring with uring_lock in io_fallback_req_func(), which should make it a bit safer and easier. With that we also don't need refs pinning as io_ring_exit_work() will wait until uring_lock is freed. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/56170e6a0cbfc8edee2794c6613e8f6f1d76d276.1673887636.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
io_put_task() is only used in uring.c so enclose it there together with __io_put_task(). Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/43c7f9227e2ab215f1a6069dadbc5382bed346fe.1673887636.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
io_submit_flush_completions() may queue new requests for tw execution, especially true for linked requests. Recheck the tw list for emptiness after flushing completions. Note that this doesn't really fix the commit referenced below, but it does reinstate an optimization that existed before that got merged. Fixes: f88262e6 ("io_uring: lockless task list") Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6328acdbb5e60efc762b18003382de077e6e1367.1673887636.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Quanfa Fu authored
Change the return type to void since it always returns 0, so there is no need to check the result in the io_uring_enter syscall. Signed-off-by:
Quanfa Fu <quanfafu@gmail.com> Link: https://lore.kernel.org/r/20230115071519.554282-1-quanfafu@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
We needed fake nodes in __io_run_local_work() to avoid unnecessary wake ups while the task is already running task_works, but we don't need them anymore since wake ups are protected by cq_waiting, which is always cleared by the time we're executing deferred task_work items. Note that because of loose sync around cq_waiting clearing, io_req_local_work_add() may wake the task more than once, but that's fine and should be rare enough not to hurt perf. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8839534891f0a2f1076e78554a31ea7e099f7de5.1673274244.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Don't wake the master task after queueing a deferred tw unless it's currently waiting in io_cqring_wait. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/717702d772825a6647e6c315b4690277ba84c3fc.1673274244.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
With DEFER_TASKRUN only ctx->submitter_task might be waiting for CQEs, we can use this to optimise io_cqring_wait(). Replace ->cq_wait waitqueue with waking the task directly. It works but misses an important optimisation covered by the following patch, so this patch without follow ups might hurt performance. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/103d174d35d919d4cb0922d8a9c93a8f0c35f74a.1673274244.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Flushing completions is done either from the submit syscall or by task_work; both run in the context of the submitter task. For single-threaded rings, as implied by ->task_complete, there won't be any waiters on ->cq_wait other than the master task. That means that there can be no tasks sleeping on cq_wait while we run __io_submit_flush_completions() and so waking up can be skipped. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/60ad9768ec74435a0ddaa6eec0ffa7729474f69f.1673274244.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Even though io_poll_wq_wake()'s waitqueue_active reuses a barrier we do for another waitqueue, it's not going to be the case in the future and so we want to have a fast path for it when the ring has never been polled. Move poll_wq wake ups into __io_commit_cqring_flush() using a new flag called ->poll_activated. The idea behind the flag is to set it when the ring was polled for the first time. This requires additional sync to not miss events, which is done here by using task_work for ->task_complete rings, and by default enabling the flag for all other types of rings. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/060785e8e9137a920b232c0c7f575b131af19cac.1673274244.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Don't use ->cq_wait for ring polling but add a separate wait queue for it. We need it for following patches. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/dea0be0bf990503443c5c6c337fc66824af7d590.1673274244.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
io_run_local_work_locked() is only used in io_uring.c, move it there. With that we can also make __io_run_local_work() static. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/91757bcb33e5774e49fed6f2b6e058630608119b.1673274244.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
io_run_local_work is enclosed in io_uring.c, we don't need to export it. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b477fb81f5e77044f724a06fe245d5c078659364.1673274244.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
The CQ waiting loop sets TASK_RUNNING before trying to execute task_work, no need to repeat it in io_run_local_work(). Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9d9422c429ef3f9457b4f4b8288bf4789564f33b.1673274244.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Remove a local variable ctx in io_wake_function(), we don't need it if io_should_wake() triggers it to wake up. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e60eb1008aebe286aab7d34c772ed01c447bddb1.1673274244.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
->submitter_task is now used somewhat more frequently than before, i.e. for local tw enqueue and run, so let's move it from the end of ctx, which is full of cold data, to the first cacheline with mostly constants. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/415ca91dc5ad1dec612b892e489cda98e1069542.1673274244.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Dmitrii Bundin authored
The IS_ERR function uses the IS_ERR_VALUE macro under the hood which already wraps the condition into unlikely. Signed-off-by:
Dmitrii Bundin <dmitrii.bundin.a@gmail.com> Link: https://lore.kernel.org/r/20230109185854.25698-1-dmitrii.bundin.a@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
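Why an extra unlikely() around IS_ERR() is redundant can be seen from a userspace replica of the error-pointer helpers. This is a sketch modelled on the kernel's err.h; the `demo_` names are hypothetical, and `__builtin_expect` assumes GCC or Clang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_ERRNO 4095

/* Branch hint: tells the compiler the condition is expected to be false. */
#define demo_unlikely(x) __builtin_expect(!!(x), 0)

/* The hint is already applied inside the value check itself. */
#define DEMO_IS_ERR_VALUE(x) \
    demo_unlikely((unsigned long)(void *)(x) >= (unsigned long)-DEMO_MAX_ERRNO)

/* Encode a negative errno as a pointer in the top 4095 addresses. */
static inline void *demo_err_ptr(long error)
{
    assert(error < 0 && error >= -DEMO_MAX_ERRNO);
    return (void *)error;
}

/* Recover the errno from an error pointer. */
static inline long demo_ptr_err(const void *ptr)
{
    return (long)ptr;
}

/* Callers need no extra unlikely(): the macro above already carries it. */
static inline bool demo_is_err(const void *ptr)
{
    return DEMO_IS_ERR_VALUE((unsigned long)ptr);
}
```

With this shape, writing `if (demo_unlikely(demo_is_err(p)))` adds nothing over `if (demo_is_err(p))`, which is the cleanup the commit performs.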
-
Breno Leitao authored
This patch adds a new flag (IORING_MSG_RING_FLAGS_PASS) in the message ring operations (IORING_OP_MSG_RING). This new flag enables the sender to specify custom flags, which will be copied over to cqe->flags in the receiving ring. These custom flags should be specified using the sqe->file_index field. This mechanism provides additional flexibility when sending messages between rings. Signed-off-by:
Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20230103160507.617416-1-leitao@debian.org Signed-off-by:
Jens Axboe <axboe@kernel.dk>
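The flag-passing behaviour described in the commit above can be modelled with plain structs. Everything here is a simplified assumption: `demo_msg_sqe` and `demo_cqe` are not the uapi layouts, the bit value of `DEMO_MSG_RING_FLAGS_PASS` is chosen for illustration, and `demo_msg_post()` only mimics the decision of what ends up in the target CQE's flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit value standing in for IORING_MSG_RING_FLAGS_PASS. */
#define DEMO_MSG_RING_FLAGS_PASS (1u << 1)

/* Simplified sender-side request; not the real struct io_uring_sqe. */
struct demo_msg_sqe {
    uint32_t msg_ring_flags;  /* operation flags, may include FLAGS_PASS */
    uint32_t file_index;      /* carries the custom CQE flags when passing */
    uint64_t data;            /* value delivered as user_data */
    uint32_t len;             /* value delivered as res */
};

/* Simplified completion posted to the receiving ring. */
struct demo_cqe {
    uint64_t user_data;
    int32_t res;
    uint32_t flags;
};

/* Build the CQE the target ring would see for this message. */
static struct demo_cqe demo_msg_post(const struct demo_msg_sqe *sqe)
{
    struct demo_cqe cqe;

    assert(sqe);
    cqe.user_data = sqe->data;
    cqe.res = (int32_t)sqe->len;
    cqe.flags = 0;  /* default when the sender did not ask to pass flags */
    if (sqe->msg_ring_flags & DEMO_MSG_RING_FLAGS_PASS)
        cqe.flags = sqe->file_index;  /* custom flags copied over */
    return cqe;
}
```

Without the pass flag the receiver always sees zero in the flags field, which is also the symptom the later SINGLE_ISSUER fallback fix in this log addresses.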
-
Pavel Begunkov authored
Move waiting timeout into io_wait_queue Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e4b48a9e26a3b1cf97c80121e62d4b5ab873d28d.1672916894.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Unlike the jiffy scheduling version, schedule_hrtimeout() jumps a few functions before getting into schedule() even if there is no actual timeout needed. Some tests showed that it takes up to 1% of CPU cycles. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/89f880574eceee6f4899783377ead234df7b3d04.1672916894.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Instead of constantly watching that the state of the task is running before executing tw or taking locks in io_cqring_wait(), switch it back to TASK_RUNNING immediately. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/246dddee247d89fd52023f785ed17cc34962a008.1672916894.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
->work_llist should never be non-empty for a non DEFER_TASKRUN ring, so we can safely skip checking the flag. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/26af9f73c09a56c9a035f94db56127358688f3aa.1672916894.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
io_cqring_wait_schedule() is called after we started waiting on the cq wq and set the state to TASK_INTERRUPTIBLE; for that reason we have to constantly worry whether we have returned the state back to running or not. Leave only quick checks in io_cqring_wait_schedule() and move the rest, including running task work, to the callers. Note, we run tw in the loop after the sched checks because of the fast path in the beginning of the function. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2814fabe75e2e019e7ca43ea07daa94564349805.1672916894.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
We already avoid flushing overflows in io_cqring_wait_schedule() but only return an error for the outer loop to handle it. Minimise it even further by moving all ->check_cq parsing there. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9dfcec3121013f98208dbf79368d636d74e1231a.1672916894.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Most places that want to run local tw explicitly and in advance check if they are allowed to do so. Don't rely on a similar check in __io_run_local_work(); leave it as a just-in-case warning and make sure callers check capabilities themselves. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/990fe0e8e70fd4d57e43625e5ce8fba584821d1a.1672916894.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
There is only one user of io_run_task_work_ctx(), inline it. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/40953c65f7c88fb00cdc4d870ca5d5319fb3d7ea.1672916894.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Task work runners keep running until all queued tw items are exhausted. It's also rare for defer tw to queue normal tw and vice versa. Taking that into account, there is only a dim chance that further iterating the io_cqring_wait() fast path will get us anything, and so we can remove the loop there. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1f9565726661266abaa5d921e97433c831759ecf.1672916894.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
There should be nothing in the ->work_llist for non DEFER_TASKRUN rings, so we can skip flag checks and test the list emptiness directly. Also move it out of io_run_local_work() for inlining. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/331d63fd15ca79b35b95c82a82d9246110686392.1672916894.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull irq fix from Borislav Petkov: - Cleanup the firmware node for the new IRQ MSI domain properly, to avoid leaking memory * tag 'irq_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/msi: Free the fwnode created by msi_create_device_irq_domain()
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull x86 fixes from Borislav Petkov: - Start checking for -mindirect-branch-cs-prefix clang support too now that LLVM 16 will support it - Fix a NULL ptr deref when suspending with Xen PV - Have a SEV-SNP guest check explicitly for features enabled by the hypervisor and fail gracefully if some are unsupported by the guest instead of failing in a non-obvious and hard-to-debug way - Fix a MSI descriptor leakage under Xen - Mark Xen's MSI domain as supporting MSI-X - Prevent legacy PIC interrupts from being resent in software by marking them level triggered, as they should be, which led to a NULL ptr deref * tag 'x86_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/build: Move '-mindirect-branch-cs-prefix' out of GCC-only block acpi: Fix suspend with Xen PV x86/sev: Add SEV-SNP guest feature negotiation support x86/pci/xen: Fixup fallout from the PCI/MSI overhaul x86/pci/xen: Set MSI_FLAG_PCI_MSIX support in Xen MSI domain x86/i8259: Mark legacy PIC interrupts with IRQ_LEVEL
-
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds authored
Pull input fixes from Dmitry Torokhov: - touchpads on HP 15-* laptops switched back to PS/2 emulation mode - a quirk for Clevo PCX0DX/TUXEDO XP1511 to make sure keyboard is responding after resume * tag 'input-for-v6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: i8042 - add Clevo PCX0DX to i8042 quirk table Revert "Input: synaptics - switch touchpad on HP Laptop 15-da3001TU to RMI mode"
-
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Linus Torvalds authored
Pull cxl fixes from Dan Williams: "A couple of fixes for bugs introduced during the merge window. One is a regression, the other was a bug in the CXL AER handler: - Fix a crash regression due to module load order of cxl_pmem.ko - Fix wrong register offset read in CXL AER handling path" * tag 'cxl-fixes-for-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/pmem: Fix nvdimm unregistration when cxl_pmem driver is absent cxl: fix cxl_report_and_clear() RAS UE addr mis-assignment
-
Vlastimil Babka authored
This reverts commit 7efc3b72. We have got openSUSE reports (Link 1) for 6.1 kernel with khugepaged stalling CPU for long periods of time. Investigation of tracepoint data shows that compaction is stuck in repeating fast_find_migrateblock() based migrate page isolation, and then fails to migrate all isolated pages. Commit 7efc3b72 ("mm/compaction: fix set skip in fast_find_migrateblock") was suspected as it was merged in 6.1 and in theory can indeed remove a termination condition for fast_find_migrateblock() under certain conditions, as it removes a place that always marks a scanned pageblock from being re-scanned. There are other such places, but those can be skipped under certain conditions, which seems to match the tracepoint data. Testing of revert also appears to have resolved the issue, thus revert the commit until a more robust solution for the original problem is developed. It's also likely this will fix qemu stalls with 6.1 kernel reported in Link 2, but that is not yet confirmed. Link: https://bugzilla.suse.com/show_bug.cgi?id=1206848 Link: https://lore.kernel.org/kvm/b8017e09-f336-3035-8344-c549086c2340@kernel.org/ Link: https://lore.kernel.org/lkml/20230125134434.18017-1-mgorman@techsingularity.net/ Fixes: 7efc3b72 ("mm/compaction: fix set skip in fast_find_migrateblock") Cc: <stable@vger.kernel.org> Tested-by:
Pedro Falcato <pedro.falcato@gmail.com> Acked-by:
Mel Gorman <mgorman@techsingularity.net> Signed-off-by:
Vlastimil Babka <vbabka@suse.cz> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-