- 30 Jun, 2021 16 commits
-
-
Hao Xu authored
We currently spin in iopoll() when the requests to be iopolled are for the same file (device), while one device may have multiple hardware queues. Given an example:

    hw_queue_0 | hw_queue_1
    req(30us)    req(10us)

If we first spin on iopolling for hw_queue_0, the avg latency would be (30us + 30us) / 2 = 30us. If we instead do round robin, the avg latency would be (30us + 10us) / 2 = 20us, since we reap the request in hw_queue_1 in time. So it's better to spin only when the requests are in the same hardware queue. Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
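A rough sketch of the idea, not the kernel's actual diff: when a request joins the iopoll list, compare its hardware queue with the requests already there and only keep spinning while they all match. The poll_multi_queue field and io_hw_queue() helper are illustrative placeholders.

    /* Sketch only: disable spinning once requests span multiple hw queues. */
    static void io_iopoll_req_queued(struct io_ring_ctx *ctx, struct io_kiocb *req)
    {
        if (!list_empty(&ctx->iopoll_list)) {
            struct io_kiocb *first = list_first_entry(&ctx->iopoll_list,
                                struct io_kiocb, inflight_entry);

            /* different hw queue: reap round robin instead of spinning */
            if (io_hw_queue(first) != io_hw_queue(req))
                ctx->poll_multi_queue = true;
        }
        list_add_tail(&req->inflight_entry, &ctx->iopoll_list);
    }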
-
Pavel Begunkov authored
Most requests are allocated from an internal cache, so it's a waste of time to fully initialise them every time. Instead, let's pre-init the fields we can during initial allocation (e.g. kmalloc(), see io_alloc_req()) and keep them valid on request recycling. There are four of them in this patch:

->ctx always stays the same
->link is NULL on free, it's an invariant
->result doesn't even need init, just a precaution
->async_data we now clean in io_dismantle_req() as it's likely to never be allocated.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/892ba0e71309bba9fe9e0142472330bbf9d8f05d.1624739600.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
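A minimal sketch of the split, assuming an illustrative helper name (io_preinit_req() below is not necessarily what the patch calls it): stable fields are set once at kmalloc() time and stay valid across recycling.

    /* Sketch: done only when the request is first allocated, not per use. */
    static void io_preinit_req(struct io_kiocb *req, struct io_ring_ctx *ctx)
    {
        req->ctx = ctx;          /* never changes for this request */
        req->link = NULL;        /* invariant: NULL whenever the req is free */
        req->async_data = NULL;  /* kept NULL; cleared in io_dismantle_req() */
        req->result = 0;         /* precaution only */
    }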
-
Pavel Begunkov authored
Don't init req_batch before we actually need it. Also, add a small clean-up for the req declaration. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ad85512e12bd3a20d521e9782750300970e5afc8.1624739600.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Move likely/unlikely from io_check_restriction() specifically to the ctx->restricted check, because the current placement doesn't do what it's supposed to and makes the common path take an extra jump. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/22bf70d0a543dfc935d7276bdc73081784e30698.1624739600.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Since cancellation got moved before exit_signals(), there is no one left who can call io_run_task_work() with PF_EXITING set, so remove the check. Note that __io_req_task_submit() still needs a similar check. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f7f305ececb1e6044ea649fb983ca754805bb884.1624739600.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
task_works are widely used, so place io_run_task_work() directly into the main path of io_sq_thread(), and remove it from other places where it's not needed anymore. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/24eb5e35d519c590d3dffbd694b4c61a5fe49029.1624739600.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
gcc 11 goes a weird path and duplicates most of io_arm_poll_handler() for READ and WRITE cases. Help it and move all pollin vs pollout specific bits under a single if-else, so there is no temptation for this kind of unfolding.

before vs after:
   text    data     bss     dec     hex filename
  85362   12650       8   98020   17ee4 ./fs/io_uring.o
  85186   12650       8   97844   17e34 ./fs/io_uring.o

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1deea0037293a922a0358e2958384b2e42437885.1624739600.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
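Roughly the consolidated shape, as a sketch; the mask constants and field names are recalled from the surrounding code rather than quoted from the patch.

    /* Sketch: everything direction-specific under one if/else, rest shared. */
    if (def->pollin) {
        rw = READ;
        mask = EPOLLIN | EPOLLRDNORM;
        /* recvmsg draining the error queue must not wait for EPOLLIN */
        if (req->opcode == IORING_OP_RECVMSG &&
            (req->sr_msg.msg_flags & MSG_ERRQUEUE))
            mask &= ~EPOLLIN;
    } else {
        rw = WRITE;
        mask = EPOLLOUT | EPOLLWRNORM;
    }
    mask |= EPOLLERR | EPOLLPRI;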
-
Olivier Langlois authored
It is quite frequent that when an operation fails and returns EAGAIN, the data becomes available between that failure and the call to vfs_poll() done by io_arm_poll_handler(). Detecting the situation and reissuing the operation is much faster than going ahead and pushing the operation to the io-wq.

Performance improvement testing has been performed with: a single thread, 1 TCP connection receiving a 5 Mbps stream, no sqpoll. 4 measurements have been taken:

1. The time it takes to process a read request when data is already available
2. The time it takes to process it by calling io_issue_sqe() twice, after vfs_poll() indicated that data was available
3. The time it takes to execute io_queue_async_work()
4. The time it takes to complete a read request asynchronously

2.25% of all the read operations did use the new path.

ready data (baseline)
avg     3657.94182918628
min     580
max     20098
stddev  1213.15975908162

reissue completion
average 7882.67567567568
min     2316
max     28811
stddev  1982.79172973284

insert io-wq time
average 8983.82276995305
min     3324
max     87816
stddev  2551.60056552038

async time completion
average 24670.4758861127
min     10758
max     102612
stddev  3483.92416873804

Conclusion: on average, reissuing the sqe with the patch code is 1.1uSec faster, and in the worst-case scenario 59uSec faster, than placing the request on io-wq. On average, completion time by reissuing the sqe with the patch code is 16.79uSec faster, and in the worst-case scenario 73.8uSec faster, than async completion.

Signed-off-by: Olivier Langlois <olivier@trillion01.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9e8441419bb1b8f3c3fcc607b2713efecdef2136.1624364038.git.olivier@trillion01.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
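A hedged sketch of the fast-path idea; the return-value handling and IO_APOLL_* names are illustrative, not the exact diff.

    /* Sketch: vfs_poll() may already report the events we armed for, i.e.
     * data arrived between the -EAGAIN and now; re-issue from task context
     * instead of queueing the request to io-wq. */
    mask = vfs_poll(req->file, &ipt.pt);
    if (mask & poll->events) {
        io_req_task_queue(req);     /* re-runs io_issue_sqe() inline */
        return IO_APOLL_READY;
    }
    return IO_APOLL_OK;             /* poll armed, wait for the wakeup */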
-
Jens Axboe authored
We can't support IOPOLL with non-pollable request types, and we should check for unused/reserved fields like we do for other request types. Fixes: 14a1143b ("io_uring: add support for IORING_OP_UNLINKAT") Cc: stable@vger.kernel.org Reported-by: Dmitry Kadashev <dkadashev@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We can't support IOPOLL with non-pollable request types, and we should check for unused/reserved fields like we do for other request types. Fixes: 80a261fd ("io_uring: add support for IORING_OP_RENAMEAT") Cc: stable@vger.kernel.org Reported-by: Dmitry Kadashev <dkadashev@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Put the do_filp_open() failure path of io_openat2() under a single if, deduplicating put_unused_fd(), making it look better and helping the hot path. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f4c84d25c049d0af2adc19c703bbfef607200209.1624543113.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Flatten struct io_uring_sqe: the last union is exactly 64B, so move those fields out of the union { struct { ... } } wrapper and decrease __pad2 size. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2e21ef7aed136293d654450bc3088973a8adc730.1624543113.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Add missing BUILD_BUG_SQE_ELEM() for ->buf_group verifying that SQE layout doesn't change. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1f9d21bd74599b856b3a632be4c23ffa184a3ef0.1624543113.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Fix a bunch of problems mostly found by checkpatch.pl Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/cfaf9a2f27b43934144fe9422a916bd327099f44.1624543113.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Move needs_sched declaration into the block where it's used, so it's harder to misuse/wrongfully reuse. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e4a07db1353ee38b924dd1b45394cf8e746130b4.1624543113.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
SQPOLL doesn't need to change creds if it's not submitting requests. Move creds overriding into __io_sq_thread() after checking if there are SQEs pending. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c54368da2357ac539e0a333f7cfff70d5fb045b2.1624543113.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 24 Jun, 2021 2 commits
-
-
Olivier Langlois authored
The magic number used to cap the number of entries extracted from an io_uring instance's SQ before moving to the other instances is an interesting parameter to experiment with. A define has been created to make it easy to change its value from a single location. Signed-off-by: Olivier Langlois <olivier@trillion01.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b401640063e77ad3e9f921e09c9b3ac10a8bb923.1624473200.git.olivier@trillion01.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
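A sketch of what such a define looks like in use; treat the macro name and value below as illustrative rather than a quote of the patch.

    /* Cap how many SQEs one ring may submit per sqpoll pass so a busy ring
     * cannot starve the other rings sharing the same sqpoll thread. */
    #define IORING_SQPOLL_CAP_ENTRIES_VALUE 8

        unsigned int to_submit = io_sqring_entries(ctx);

        /* if the thread handles several rings, cap the batch for fairness */
        if (cap_entries && to_submit > IORING_SQPOLL_CAP_ENTRIES_VALUE)
            to_submit = IORING_SQPOLL_CAP_ENTRIES_VALUE;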
-
Olivier Langlois authored
If an asynchronous completion happens before the task has prepared itself to wait and set its state to TASK_INTERRUPTIBLE, the completion will not wake up the sqp thread. Cc: stable@vger.kernel.org Signed-off-by: Olivier Langlois <olivier@trillion01.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d1419dc32ec6a97b453bee34dc03fa6a02797142.1624473200.git.olivier@trillion01.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
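The waiter-side pattern at play, as a simplified sketch (the pending-work helper name is illustrative): the task state must be published before the final check, so a completion arriving in between still prevents the sleep.

    prepare_to_wait(&sqd->wait, &wait, TASK_INTERRUPTIBLE);
    if (!io_sqd_has_pending_work(sqd))   /* illustrative helper */
        schedule();     /* a wakeup after prepare_to_wait() is not lost */
    finish_wait(&sqd->wait, &wait);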
-
- 18 Jun, 2021 12 commits
-
-
Pavel Begunkov authored
If task_state is cleared, io_req_task_work_add() will take the slow path: adding a task_work, setting the task_state, waking up the task and so on. Not to mention it's expensive. tctx_task_work() first clears the state and then executes all the work items queued, so if any of them resubmits or adds new task_work items, it would unnecessarily go through the slow path of io_req_task_work_add(). Let's clear the ->task_state at the end. We still have to check ->task_list for emptiness afterwards to synchronise with io_req_task_work_add(); do that, and set the state back if we're going to retry, because clearing a task_state that isn't ours on the next iteration would be buggy. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1ef72cdac7022adf0cd7ce4bfe3bb5c82a62eb93.1623949695.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
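A sketch of the resulting ordering with simplified names; the real function walks a lockless list and runs the callbacks, only the state handling is shown here.

    /* Sketch: run everything first, clear the state last, then re-check the
     * list so a concurrent io_req_task_work_add() is never missed. */
    for (;;) {
        io_run_queued_task_work(tctx);       /* illustrative helper */

        clear_bit(0, &tctx->task_state);
        if (wq_list_empty(&tctx->task_list))
            break;
        /* new work raced in: reclaim the state and go around again */
        if (test_and_set_bit(0, &tctx->task_state))
            break;                           /* someone else owns it now */
    }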
-
Pavel Begunkov authored
Entering tctx_task_work() with an empty task_list is a strange scenario that can happen only on rare occasions during task exit, so let's not check for task_list emptiness in advance and do it do-while style. The code is still correct for the empty case, it just does extra work that we don't care about. Take the extra step and do the check before cond_resched(), so we don't resched if we have nothing to execute. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c4173e288e69793d03c7d7ce826f9d28afba718a.1623949695.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
We don't need a full copy of tctx->task_list in tctx_task_work(), only the first node, so just assign node directly. Taking into account that task_works run in the context of a task, it's very unlikely to first see a non-empty tctx->task_list and then splice it empty; that can only happen with task_work cancellations, which is a non-normal slow path anyway. Hence, get rid of the check at the end; it's there not for validity but for "performance" purposes. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d076c83fedb8253baf43acb23b8fafd7c5da1714.1623949695.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
tctx_task_work() tries to fetch the next batch of requests, but before that it would flush completions from the previous batch, which may be sub-optimal. E.g. io_req_task_queue() executes the head of a link, where all the linked requests may be enqueued through the same io_req_task_queue(). And there are more cases like that. Instead, cache completions across several waves of a single tctx_task_work() and do the flush once at the very end. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3cac83934e4fbce520ff8025c3524398b3ae0270.1623949695.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Inline __tctx_task_work() into tctx_task_work() in preparation for further optimisations. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f9c05c4bc9763af7bd8e25ebc3c5f7b6f69148f8.1623949695.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Clean up io_get_sequence() and add a comment describing the magic around sequence correction. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f55dc409936b8afa4698d24b8677a34d31077ccb.1623949695.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Clean all flags in io_clean_op() at the end in one operation; it will save us a couple of operations and some binary size. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b8efe1f022a037f74e7fe497c69fb554d59bfeaf.1623949695.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
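A minimal sketch of the pattern, assuming an illustrative mask name; the exact set of REQ_F_* bits is not quoted from the patch.

    /* Illustrative mask: every flag io_clean_op() may have to drop. */
    #define IO_REQ_CLEAN_FLAGS (REQ_F_BUFFER_SELECTED | REQ_F_NEED_CLEANUP | \
                                REQ_F_POLLED | REQ_F_INFLIGHT)

    static void io_clean_op(struct io_kiocb *req)
    {
        /* ... release the per-opcode resources as before ... */

        /* one AND at the end instead of clearing each bit as we go */
        req->flags &= ~IO_REQ_CLEAN_FLAGS;
    }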
-
Pavel Begunkov authored
We don't get REQ_F_NEED_CLEANUP for rw unless ->free_iovec is set, so remove the optimisation of NULL-checking it inline; kfree() will take care of it if that is ever the case. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a233dc655d3d45bd4f69b73d55a61de46d914415.1623949695.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Currently, if req->creds is not NULL, then there are creds assigned. Track the invariant with a new flag in req->flags. No need to clear the field at init, and also cleanup can be efficiently moved into io_clean_op(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/5f8baeb8d3b909487f555542350e2eac97005556.1623949695.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
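A short sketch of the invariant; the REQ_F_CREDS flag name is from the commit message, the surrounding lines are illustrative.

    /* when creds are attached, e.g. for a registered personality */
    req->creds = get_cred(creds);
    req->flags |= REQ_F_CREDS;

    /* in io_clean_op(): the flag, not a NULL check, decides the put */
    if (req->flags & REQ_F_CREDS)
        put_cred(req->creds);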
-
Pavel Begunkov authored
io-wq doesn't have anything to do with creds now, so move ->creds from struct io_wq_work into the request (aka struct io_kiocb). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8520c72ab8b8f4b96db12a228a2ab4c094ae64e1.1623949695.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
struct io_comp_state is always contained in struct io_ring_ctx, so don't pass them into io_submit_flush_completions() separately; it makes the interface cleaner and simplifies it for the compiler. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/44d6ca57003a82484338e95197024dbd65a1b376.1623949695.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
WARNING: CPU: 1 PID: 11749 at fs/io-wq.c:244 io_wqe_wake_worker fs/io-wq.c:244 [inline]
WARNING: CPU: 1 PID: 11749 at fs/io-wq.c:244 io_wqe_enqueue+0x7f6/0x910 fs/io-wq.c:751

A WARN_ON_ONCE() in io_wqe_wake_worker() can be triggered by a valid userspace setup. Replace it with pr_warn. Reported-by: syzbot+ea2f1484cffe5109dc10@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f7ede342c3342c4c26668f5168e2993e38bbd99c.1623949695.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 17 Jun, 2021 2 commits
-
-
Jens Axboe authored
io-wq defaults to per-node masks for IO workers. This works fine by default, but isn't particularly handy for workloads that prefer more specific affinities, for either performance or isolation reasons. This adds IORING_REGISTER_IOWQ_AFF that allows the user to pass in a CPU mask that is then applied to IO thread workers, and an IORING_UNREGISTER_IOWQ_AFF that simply resets the masks back to the default of per-node. Note that no care is given to existing IO threads, they will need to go through a reschedule before the affinity is correct if they are already running or sleeping. Signed-off-by: Jens Axboe <axboe@kernel.dk>
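A hedged userspace usage sketch, assuming a kernel and headers that expose IORING_REGISTER_IOWQ_AFF / IORING_UNREGISTER_IOWQ_AFF (error handling trimmed):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/io_uring.h>

    /* raw syscall wrapper; liburing also offers io_uring_register_iowq_aff() */
    static int io_uring_register_sys(int fd, unsigned op, void *arg, unsigned nr)
    {
        return (int)syscall(__NR_io_uring_register, fd, op, arg, nr);
    }

    int pin_iowq_workers(int ring_fd)
    {
        cpu_set_t mask;

        CPU_ZERO(&mask);
        CPU_SET(0, &mask);          /* restrict io-wq workers to CPUs 0-1 */
        CPU_SET(1, &mask);
        if (io_uring_register_sys(ring_fd, IORING_REGISTER_IOWQ_AFF,
                                  &mask, sizeof(mask)) < 0)
            return -1;
        /* later: revert to the default per-node mask */
        return io_uring_register_sys(ring_fd, IORING_UNREGISTER_IOWQ_AFF,
                                     NULL, 0);
    }

As the commit message notes, workers that are already running or sleeping only pick the new mask up after a reschedule.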
-
Jens Axboe authored
In preparation for allowing user specific CPU masks for IO thread creation, switch to using a mask embedded in the per-node wqe structure. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 16 Jun, 2021 3 commits
-
-
Olivier Langlois authored
mm-related header files are not needed for the io-wq module. Remove them for a small clean-up. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Olivier Langlois <olivier@trillion01.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Olivier Langlois authored
Fix tabulation to make nice columns Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Olivier Langlois <olivier@trillion01.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Olivier Langlois authored
The req pointer uniquely identifies a specific request. Having it in traces can provide valuable insights that are not possible to have if the calling process is reusing the same user_data value. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Olivier Langlois <olivier@trillion01.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 15 Jun, 2021 5 commits
-
-
Pavel Begunkov authored
In most cases io_commit_cqring() is just an smp_store_release(), and it's hot enough, especially for IRQ rw, that we want to save on a function call. Mark it inline and extract a non-inlined slow path that does the drain and timeout flushing. The inlined part is pretty slim, so it doesn't cause binary bloat. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7350f8b6b92caa50a48a80be39909f0d83eddd93.1623772051.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
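Approximately the resulting shape, as a sketch; field and helper names are recalled from the surrounding code, not quoted from the patch.

    /* Sketch: the hot path is one store-release; drain/timeout flushing is
     * pushed into an out-of-line, rarely taken slow path. */
    static inline void io_commit_cqring(struct io_ring_ctx *ctx)
    {
        if (unlikely(ctx->off_timeout_used || ctx->drain_active))
            __io_commit_cqring_flush(ctx);   /* not inlined, cold */
        /* order CQE stores before the tail update userspace reads */
        smp_store_release(&ctx->rings->cq.tail, ctx->cached_cq_tail);
    }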
-
Pavel Begunkov authored
Place all drain_next logic into io_drain_req(), so it's never executed if there were no drained requests before. The only thing we need is to set ->drain_active if we see a request with IOSQE_IO_DRAIN; do that in io_init_req() where the flags are definitely in registers. Also, all drain-related code is now encapsulated in io_drain_req(), which makes it cleaner. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/68bf4f7395ddaafbf1a26bd97b57d57d45a9f900.1623772051.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
->drain_used is a one-way flag, which is not optimal if users use DRAIN only very rarely. However, we can just clear it in io_drain_req() when all previously drained requests are gone. Also rename the flag to reflect the change and be clearer about it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7f37a240857546a94df6348507edddacab150460.1623772051.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
fs/io_uring.c: In function 'io_alloc_page_table':
include/linux/minmax.h:20:28: warning: comparison of distinct pointer types lacks a cast

Cast everything to size_t using min_t. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: 9123c8ff ("io_uring: add helpers for 2 level table alloc") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/50f420a956bca070a43810d4a805293ed54f39d8.1623759527.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
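An illustration of the class of fix; the variable names are made up rather than taken from io_alloc_page_table().

    /* min() warns when its operands have distinct types; min_t() casts both
     * to the named type first, silencing the warning. */
    size_t bytes_this_chunk = min_t(size_t, remaining, PAGE_SIZE);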
-
Fam Zheng authored
The sqe_ptr argument has been gone since 709b302f (io_uring: simplify io_get_sqring, 2020-04-08); it was made the return value of the function. Update the comment accordingly. Signed-off-by: Fam Zheng <fam.zheng@bytedance.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20210604164256.12242-1-fam.zheng@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-