Commits · d60aa65ba221f038404b98d8484f562f72bb807b · Kirill Smelkov / linux

19 Oct, 2021 40 commits

io_uring: merge CQ and poll waitqueues · d60aa65b

Pavel Begunkov authored Oct 04, 2021

->cq_wait and ->poll_wait are woken up in the same manner, so use a single
waitqueue for both of them. CQ waiters are queued exclusively, so a wake-up
first goes over all pollers, which is what we need.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/00fe603e50000365774cf8435ef5fe03f049c1c9.1633373302.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: don't wake sqpoll in io_cqring_ev_posted · aede728a

Pavel Begunkov authored Oct 04, 2021

io_cqring_ev_posted() doesn't need to wake SQPOLL, it's either done by
userspace or with task_work, but no action is required on request
completion. Rip off bits waking it up in io_cqring_ev_posted().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b49dab27b64cf11f4c50f2f90dcaac123430e05d.1633373302.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: optimise INIT_WQ_LIST · 765ff496

Pavel Begunkov authored Oct 04, 2021

The invariant of io_wq_work_list is that it's empty IFF ->first is NULL,
so no need to initially set ->last. With now having more users of the
list it may play a role, i.e. used in each tw iteration and on every
completion flushing.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c464ab5cab6e46a858c6d39c107e92b3b5291f13.1633373302.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
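
The invariant described in this commit can be illustrated with a minimal userspace sketch. The names `work_node`, `work_list`, and the helpers below are hypothetical stand-ins, not the kernel's definitions; the point is only that init can skip `->last` as long as no code reads it while `->first` is NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for io_wq_work_node / io_wq_work_list. */
struct work_node {
	struct work_node *next;
};

struct work_list {
	struct work_node *first;
	struct work_node *last;	/* only meaningful while first != NULL */
};

/* Init touches ->first only; ->last is deliberately left unset. */
static inline void work_list_init(struct work_list *list)
{
	list->first = NULL;
}

/* Emptiness is decided by ->first alone. */
static inline int work_list_empty(const struct work_list *list)
{
	return list->first == NULL;
}

/* Tail insertion never reads ->last of an empty list. */
static inline void work_list_add_tail(struct work_node *node,
				      struct work_list *list)
{
	assert(node != NULL);
	node->next = NULL;
	if (!list->first) {
		list->first = node;
		list->last = node;
	} else {
		list->last->next = node;
		list->last = node;
	}
}
```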

io_uring: optimise request allocation · a33ae9ce

Pavel Begunkov authored Oct 04, 2021

Even after fully inlining io_alloc_req() my compiler does a NULL check
in the path of successful allocation, and no hacks like an empty dereference
help it. Restructure io_alloc_req() by splitting out the refilling part, so
the compiler generates a slightly better binary.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/eda17571bdc7248d8e617b23e7132a5416e4680b.1633373302.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
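
A rough userspace sketch of the restructuring idea: the refill is an out-of-line slow path, and the fast path pops from a cache already known to be non-empty, so no NULL check is needed after a successful allocation. All names here (`req_cache`, `req_cache_refill()`, `REFILL_BATCH`) are made up for illustration, and `__builtin_expect` / `noinline` assume GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define unlikely(x) __builtin_expect(!!(x), 0)
#define REFILL_BATCH 8	/* hypothetical batch size */

struct req {
	struct req *next;
};

struct req_cache {
	struct req *free_list;
};

/* Slow path, kept out of line. Returns non-zero if the cache has entries. */
static __attribute__((noinline)) int req_cache_refill(struct req_cache *c)
{
	for (int i = 0; i < REFILL_BATCH; i++) {
		struct req *r = malloc(sizeof(*r));

		if (!r)
			break;
		r->next = c->free_list;
		c->free_list = r;
	}
	return c->free_list != NULL;
}

/* Fast path: the caller guarantees the cache is non-empty. */
static inline struct req *req_cache_pop(struct req_cache *c)
{
	struct req *r = c->free_list;

	assert(r != NULL);
	c->free_list = r->next;
	return r;
}

static inline struct req *req_alloc(struct req_cache *c)
{
	/* Only an empty cache takes the refill call; failure is rarer still. */
	if (unlikely(!c->free_list) && unlikely(!req_cache_refill(c)))
		return NULL;
	return req_cache_pop(c);
}
```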

io_uring: delay req queueing into compl-batch list · fff4e40e

Pavel Begunkov authored Oct 04, 2021

io_req_complete_state() is inlined and used in lots of places, so we
want to keep it concise. Move adding a request into a completion batch
list from io_req_complete_state() into the consumer, i.e.
__io_queue_sqe().

before vs after
   text    data     bss     dec     hex filename
  91894   14002       8  105904   19db0 ./fs/io_uring.o
  91046   14002       8  105056   19a60 ./fs/io_uring.o
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4afca4e11abfd4cc8e99777fdcaf4d34cf4d022d.1633373302.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: add more likely/unlikely() annotations · 51d48dab

Pavel Begunkov authored Oct 04, 2021

Add two extra unlikely() in io_submit_sqes() and one around
io_req_needs_clean() to help the compiler to avoid extra jumps
in hot paths.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/88e087afe657e7660194353aada9b00f11d480f9.1633373302.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: optimise kiocb layout · 7e3709d5

Pavel Begunkov authored Oct 04, 2021

We want ->comp_list in the second cacheline, which is hotter compared
to the 3rd. Swap the field with ->link, which is not as hot and is
controlled by flags, and so is not accessed unless there is a link.

While at it, add a couple of comments for io_kiocb fields.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9d9dde31f8f62279a5f48c575bbc27b8290edc0c.1633373302.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: add flag to not fail link after timeout · 6224590d

Pavel Begunkov authored Oct 02, 2021

For some reason non-off IORING_OP_TIMEOUT always fails links. That is
pretty inconvenient and unnecessarily limits chaining after it to hard
linking, which is far from ideal, e.g. it doesn't pair well with timeout
cancellation. Add a flag forcing it to not fail links on -ETIME.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/17c7ec0fb7a6113cc6be8cdaedcada0ba836ac0e.1633199723.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: clean up buffer select · 30d51dd4

Pavel Begunkov authored Oct 01, 2021

Hiding a pointer to a struct io_buffer in rw.addr is error prone. We
have some space in io_kiocb, so keep kbufs in a separate field
without aliasing and the risk of it being misused.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3e63a6a953b04cad81d9ea827b12344dd57b37b4.1633107393.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: init opcode in io_init_req() · fc0ae024

Pavel Begunkov authored Oct 01, 2021

Move the io_req_prep() call inside io_init_req(); it simplifies error
handling for callers a bit.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a0f59291fd52da4672c323542fd56fd899e23f8f.1633107393.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: don't return from io_drain_req() · e0eb71dc

Pavel Begunkov authored Oct 01, 2021

Never return from io_drain_req() but punt to tw if we've got there but
it's a false positive and we shouldn't actually drain.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/93583cee51b8783706b76c73196c155b28d9e762.1633107393.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: extra a helper for drain init · 22b2ca31

Pavel Begunkov authored Oct 01, 2021

Add a helper io_init_req_drain() for initialising requests with
IOSQE_DRAIN set. Also move bits from the preamble of io_drain_req() in
there, because we already modify all the bits needed inside the helper.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dcb412825b35b1cb8891245a387d7d69f8d14cef.1633107393.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: disable draining earlier · 5e371265

Pavel Begunkov authored Sep 24, 2021

Clear ->drain_active in two more cases where we check for a need of
draining. It's not a bug, but still may lead to some extra requests
being punted to io-wq, and that may not be desirable.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d20b265f77bb4e8860b15b9987252c7c711dfcba.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: comment why inline complete calls io_clean_op() · a1cdbb4c

Pavel Begunkov authored Sep 24, 2021

io_req_complete_state() calls io_clean_op(), which may not be entirely
obvious, so leave a comment.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/21806f862151e223fdf439e5e8ed7178a8d66979.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: kill off ->inflight_entry field · ef05d9eb

Pavel Begunkov authored Sep 24, 2021

->inflight_entry is not used anymore after converting everything to
singly linked lists, so remove it. Also adjust the io_kiocb layout so all
hot bits are in the first 3 cachelines.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/fd8d68087ede26c4e1707ce6b175aa1eb2381f2b.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: restructure submit sqes to_submit checks · 69629809

Pavel Begunkov authored Sep 24, 2021

Put an explicit check for number of requests to submit. First,
we can turn while into do-while and it generates better code, and second
that if can be cheaper, e.g. by using CPU flags after sub in
io_sqring_entries().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5926baadd20c28feab7a5e1725fedf32e4553ff7.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
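
As a small illustration of the loop shape this commit describes, here is a sketch with an explicit up-front check followed by a do-while. `submit_one()` and `submit_all()` are hypothetical placeholders, not functions from fs/io_uring.c, and whether this generates better code depends on the compiler.

```c
#include <assert.h>

/* Hypothetical per-entry work; here it only counts calls. */
static inline void submit_one(unsigned int *submitted)
{
	(*submitted)++;
}

static inline unsigned int submit_all(unsigned int nr)
{
	unsigned int submitted = 0;

	/* Explicit check: the empty case is handled once, before the loop. */
	if (!nr)
		return 0;

	/* The body is known to run at least once, so do-while is safe. */
	do {
		submit_one(&submitted);
	} while (submitted < nr);

	assert(submitted == nr);
	return submitted;
}
```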

io_uring: reshuffle queue_sqe completion handling · d9f9d284

Pavel Begunkov authored Sep 24, 2021

If a request completed inline the result should only be zero, it's a
grave error otherwise. So, when we see REQ_F_COMPLETE_INLINE it's not
even necessary to check the return code, and the flag check can be moved
earlier.

It's one "if" less for inline completions, and same two checks for it
normally completing (ret == 0). Those are two cases we care about the
most.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ebd4e397a9c26d96c99b24447acc309741041a83.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: inline hot path of __io_queue_sqe() · d475a9a6

Pavel Begunkov authored Sep 24, 2021

Extract slow paths from __io_queue_sqe() into a function and inline the
hot path. With that we have everything completely inlined on the
submission path up until io_issue_sqe().

-> io_submit_sqes()
  -> io_submit_sqe() (inlined)
    -> io_queue_sqe() (inlined)
       -> __io_queue_sqe() (inlined)
         -> io_issue_sqe()
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f1606864d95d7f26dc28c7eec3dc6ed6ec32618a.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: split slow path from io_queue_sqe · 4652fe3f

Pavel Begunkov authored Sep 24, 2021

We don't want the slow path of io_queue_sqe to be inlined, so extract a
function from it.

   text    data     bss     dec     hex filename
  91950   13986       8  105944   19dd8 ./fs/io_uring.o
  91758   13986       8  105752   19d18 ./fs/io_uring.o
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/fb01253911f8fb374268f65b1ba939b54ca6583f.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: remove drain_active check from hot path · 2a56a9bd

Pavel Begunkov authored Sep 24, 2021

req->ctx->drain_active is a bit too expensive to check, partially because of two
dereferences. Do a trick: if we see it set in io_init_req(), set
REQ_F_FORCE_ASYNC so the request automatically goes through a slower path where
we can catch it. It's nearly free to do in io_init_req() because there
is already a ->restricted check and it's in the same byte of a bitmask.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d7e7ddc63c15e8a300833132abb3eb8fd3918aef.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: deduplicate io_queue_sqe() call sites · f15a3431

Pavel Begunkov authored Sep 24, 2021

There are two call sites of io_queue_sqe() in io_submit_sqe(). Combine
them into one, because io_queue_sqe() is inline and we don't want to
bloat the binary, and it will become even bigger.

   text    data     bss     dec     hex filename
  92126   13986       8  106120   19e88 ./fs/io_uring.o
  91966   13986       8  105960   19de8 ./fs/io_uring.o
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/506124b8e767f0a4576f7a459f6aea3d13fb4dda.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: don't pass state to io_submit_state_end · 553deffd

Pavel Begunkov authored Sep 24, 2021

Submission state and ctx are coupled together, so there is no need to pass
the state separately.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e22d77a5786ef77e0c49b933ad74bae55cfb6ca6.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: don't pass tail into io_free_batch_list · 1cce17ac

Pavel Begunkov authored Sep 24, 2021

io_free_batch_list() iterates over all requests in the passed-in list,
so we don't really need to know the tail but can keep iterating until
we meet NULL. Just pass the first node into it and it will be enough.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4a12c84b6d887d980e05f417ba4172d04c64acae.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
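
The idea of walking a NULL-terminated singly linked list from its first node, with no tail pointer, can be sketched in userspace as follows. `struct node`, `push_new()`, and `free_batch_list()` are illustrative names, not the kernel code, and `free()` stands in for putting a request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct node {
	struct node *next;
};

/* Test helper: allocate a node and link it in front of head. */
static inline struct node *push_new(struct node *head)
{
	struct node *n = malloc(sizeof(*n));

	assert(n != NULL);
	n->next = head;
	return n;
}

/* Free every node reachable from the first one; returns the count. */
static inline unsigned int free_batch_list(struct node *node)
{
	unsigned int freed = 0;

	while (node) {
		struct node *next = node->next;	/* read before freeing */

		free(node);
		node = next;
		freed++;
	}
	return freed;
}
```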

io_uring: inline completion batching helpers · d4b7a5ef

Pavel Begunkov authored Sep 24, 2021

We now have a single function for batched put of requests, just inline
struct req_batch and all related helpers into it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/595a2917f80dd94288cd7203052c7934f5446580.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: optimise batch completion · f5ed3bcd

Pavel Begunkov authored Sep 24, 2021

First, convert rest of iopoll bits to single linked lists, and also
replace per-request list_add_tail() with splicing a part of slist.

With that, use io_free_batch_list() to put/free requests. The main
advantage of it is that it's now the only user of struct req_batch and
friends, and so they can be inlined. The main overhead there was
per-request call to not-inlined io_req_free_batch(), which is expensive
enough.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b37fc6d5954b241e025eead7ab92c6f44a42f229.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: convert iopoll_completed to store_release · b3fa03fd

Pavel Begunkov authored Sep 24, 2021

Convert the explicit barriers around iopoll_completed to smp_load_acquire()
and smp_store_release(). It is similar on the callback side, but it replaces a
single smp_rmb() with a per-request smp_load_acquire(); neither implies any
extra CPU ordering on x86. Use READ_ONCE as usual where it doesn't
matter.

Use it to move filling CQEs by iopoll earlier, which will be necessary
to avoid traversing the list one extra time in the future.
Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8bd663cb15efdc72d6247c38ee810964e744a450.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
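
The release/acquire pairing can be approximated in userspace with C11 atomics. This is only an analogy: the kernel's smp_store_release() and smp_load_acquire() are not the C11 API, and `struct poll_req` with its helpers is a hypothetical stand-in for the real request structure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct poll_req {
	long result;		/* plain data, written before publishing */
	atomic_int completed;	/* publication flag */
};

/* Completion side: write the result, then publish it with release. */
static inline void req_complete(struct poll_req *req, long res)
{
	req->result = res;
	atomic_store_explicit(&req->completed, 1, memory_order_release);
}

/* Reaping side: acquire-load the flag; read the result only if it is set. */
static inline int req_reap(struct poll_req *req, long *res)
{
	assert(res != NULL);
	if (!atomic_load_explicit(&req->completed, memory_order_acquire))
		return 0;
	*res = req->result;
	return 1;
}
```

The acquire load pairs with the release store, so a reader that observes the flag set also observes the result written before it.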

io_uring: add a helper for batch free · 3aa83bfb

Pavel Begunkov authored Sep 24, 2021

Add a helper io_free_batch_list(), which takes a single linked list and
puts/frees all requests from it in an efficient manner. Will be reused
later.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4fc8306b542c6b1dd1d08e8021ef3bdb0ad15010.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: use single linked list for iopoll · 5eef4e87

Pavel Begunkov authored Sep 24, 2021

Use single linked lists for keeping iopoll requests, takes less space,
may be faster, but mostly will be of benefit for further patches.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/314033676b100cd485518c3bc55e1b95a0dcd71f.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: split iopoll loop · e3f721e6

Pavel Begunkov authored Sep 24, 2021

The main loop of io_do_iopoll() iterates and does ->iopoll() until it
meets a first completed request, then it continues from that position
and splices requests to pass them through io_iopoll_complete().

Split the loop in two for clarity: iopolling, and reaping completed
requests from the list.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a7f6fd27a94845e5dc925a47a4a9765a92e514fb.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: replace list with stack for req caches · c2b6c6bc

Pavel Begunkov authored Sep 24, 2021

Replace struct list_head free_list serving for caching requests with
singly linked stack, which is faster.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1bc942b82422fb2624b8353bd93aca183a022846.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io-wq: add io_wq_work_node based stack · 0d9521b9

Pavel Begunkov authored Sep 24, 2021

Apart from just using lists (i.e. io_wq_work_list), we also want to have
stacks, which are a bit faster, and have some interoperability between
them. Add a stack implementation based on io_wq_work_node and some
helpers.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5d3a412a5ac0d47e0f0499d70d2207d70a68925e.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
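
A stack built on the same node type as the list might look like the sketch below. The helper names `stack_push()` and `stack_pop()` are hypothetical, and the use of a dummy head node whose `->next` is the top of the stack is an assumption made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

struct work_node {
	struct work_node *next;
};

/* The stack head is a dummy node; head->next is the top of the stack. */
static inline void stack_push(struct work_node *node, struct work_node *stack)
{
	assert(node != NULL && stack != NULL);
	node->next = stack->next;
	stack->next = node;
}

/* Pop the top node, or return NULL if the stack is empty. */
static inline struct work_node *stack_pop(struct work_node *stack)
{
	struct work_node *node = stack->next;

	if (node)
		stack->next = node->next;
	return node;
}
```

Push and pop touch only the head, with no tail pointer to maintain, which is one plausible reason a stack is cheaper than a first/last list.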

io_uring: remove allocation cache array · 3ab665b7

Pavel Begunkov authored Sep 24, 2021

We have several request allocation layers. Remove the last one, which
is the submit->reqs array, and always use submit->free_reqs instead.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8547095c35f7a87bab14f6447ecd30a273ed7500.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: use slist for completion batching · 6f33b0bc

Pavel Begunkov authored Sep 24, 2021

Currently we collect requests for completion batching in an array.
Replace it with a singly linked list. It's as fast as arrays but
doesn't take so much space in ctx, and will be used in future patches.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a666826f2854d17e9fb9417fb302edfeb750f425.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: make io_do_iopoll return number of reqs · 5ba3c874

Pavel Begunkov authored Sep 24, 2021

Don't pass nr_events pointer around but return directly, it's less
expensive than pointer increments.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f771a8153a86f16f12ff4272524e9e549c5de40b.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
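
The change in calling convention can be illustrated with a toy function that counts completed items in a local variable and returns the count, instead of incrementing through an `nr_events` pointer. `struct poll_item` and `reap_completed()` are hypothetical names for this sketch only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request carrying only a completion flag. */
struct poll_item {
	int completed;
};

/* Count in a local (likely a register) and return it directly. */
static inline int reap_completed(const struct poll_item *items, size_t n)
{
	int nr_events = 0;

	assert(items != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (items[i].completed)
			nr_events++;
	return nr_events;
}
```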

io_uring: force_nonspin · 87a115fb

Pavel Begunkov authored Sep 24, 2021

We don't really need to pass the number of requests to complete into
io_do_iopoll(), a flag whether to enforce non-spin mode is enough.

Should be straightforward, except maybe for io_iopoll_check(). We pass !min
there, because we never enter with the number of already reaped
requests larger than the specified @min, apart from the first
iteration, where nr_events is 0, and so the final check should be
identical.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/782b39d1d8ec584eae15bca0a1feb6f0571fe5b8.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: mark having different creds unlikely · 6878b40e

Pavel Begunkov authored Sep 24, 2021

Hint to the compiler that it's not as likely to have creds different from
current attached to a request. The current code generation is far from
ideal; hopefully this helps some compilers remove duplicated jump
tables and so on.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e7815251ac4bf5a4a23d298c752f029ae19f3837.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: return boolean value for io_alloc_async_data · 8d4af685

Hao Xu authored Sep 22, 2021

A boolean return value is good enough for io_alloc_async_data().
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210922101522.9179-1-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: optimise io_req_init() sqe flags checks · 68fe256a

Pavel Begunkov authored Sep 15, 2021

IOSQE_IO_DRAIN is quite marginal and we don't care too much about
IOSQE_BUFFER_SELECT. Save two ifs and hide both of them under the
SQE_VALID_FLAGS check. Now we first check whether it uses a "safe"
subset, i.e. without DRAIN and BUFFER_SELECT, and only if it's not
true we test the rest of the flags.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dccfb9ab2ab0969a2d8dc59af88fa0ce44eeb1d5.1631703764.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
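
The two-level flag check can be sketched as below. The flag bit values, `SAFE_FLAGS`, `VALID_FLAGS`, and `check_flags()` are all hypothetical and chosen for illustration; they are not the kernel's definitions. The common case costs a single mask test, and the rarer DRAIN and BUFFER_SELECT handling sits behind it. `__builtin_expect` assumes GCC or Clang.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical flag bits, for illustration only. */
#define F_FIXED_FILE	(1U << 0)
#define F_DRAIN		(1U << 1)
#define F_LINK		(1U << 2)
#define F_HARDLINK	(1U << 3)
#define F_ASYNC		(1U << 4)
#define F_BUFFER_SELECT	(1U << 5)

#define SAFE_FLAGS	(F_FIXED_FILE | F_LINK | F_HARDLINK | F_ASYNC)
#define VALID_FLAGS	(SAFE_FLAGS | F_DRAIN | F_BUFFER_SELECT)

/* Returns 0 or a negative errno; *drain reports whether DRAIN was set. */
static inline int check_flags(unsigned int flags, int op_has_buffer_select,
			      int *drain)
{
	assert(drain != NULL);
	*drain = 0;

	/* Fast path: anything within the safe subset needs no more tests. */
	if (unlikely(flags & ~SAFE_FLAGS)) {
		if (flags & ~VALID_FLAGS)
			return -EINVAL;
		if ((flags & F_BUFFER_SELECT) && !op_has_buffer_select)
			return -EOPNOTSUPP;
		if (flags & F_DRAIN)
			*drain = 1;
	}
	return 0;
}
```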

io_uring: remove ctx referencing from complete_post · a3f34907

Pavel Begunkov authored Sep 15, 2021

Now completions are done from task context, that means that it's either
the task itself, task_work or io-wq worker. In all those cases the ctx
will be staying alive by mutexing, explicit referencing or req references
by iowq. Remove extra ctx pinning from io_req_complete_post().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/60a0e96434c16ab4fe587651448290d61ec9a113.1631703756.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: add more uring info to fdinfo for debug · 83f84356

Hao Xu authored Sep 13, 2021

Developers may need some uring info to help themselves debug and address
issues in production. This includes sqring/cqring head/tail and the
detailed sqe/cqe info, which is very useful when an application is hung
on a ring.
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210913130854.38542-1-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>