- 12 Apr, 2023 9 commits
-
-
Pavel Begunkov authored
SCM file accounting is a slow path and is only used for UNIX files. Extract a helper out of io_rsrc_file_put() that does the SCM unaccounting. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/58cc7bffc2ee96bec8c2b89274a51febcbfa5556.1681210788.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
We use io_rsrc_node_switch() coupled with io_rsrc_node_switch_start() for a bunch of cases including initialising ctx->rsrc_node, i.e. by passing NULL instead of rsrc_data. Leave it to only deal with actual node changing. For that, first remove it from io_uring_create() and add a function allocating the first node. Then also remove all calls to io_rsrc_node_switch() from files/buffers registration as we already have a node installed and it does essentially nothing. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d146fe306ff98b1a5a60c997c252534f03d423d7.1681210788.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
struct io_rsrc_node::rsrc_data field is initialised on rsrc removal and shouldn't be used before that; still, let's play it safe and zero the field on alloc. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/09bd03cedc8da8a7974c5e6e4bf0489fd16593ab.1681210788.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
We store one pre-allocated rsrc node in ->rsrc_backup_node; merge it into ->rsrc_node_cache. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6d5410e51ccd29be7a716be045b51d6b371baef6.1681210788.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Add a lockdep check to make sure that file and buffer updates hold ->uring_lock. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/961bbe6e433ec9bc0375127f23468b37b729df99.1681210788.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
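A minimal sketch of what such an annotation typically looks like; the surrounding function name is illustrative, not the actual io_uring helper:

```c
/*
 * Hedged sketch: assert that the caller holds ->uring_lock when updating
 * registered files/buffers. lockdep_assert_held() is a no-op when lockdep
 * is not enabled; io_rsrc_update_locked() is a hypothetical name.
 */
static void io_rsrc_update_locked(struct io_ring_ctx *ctx)
{
	lockdep_assert_held(&ctx->uring_lock);
	/* ... perform the file/buffer table update ... */
}
```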
-
Pavel Begunkov authored
We don't post CQEs from the IRQ context, add a check catching that. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f23f7a24dbe8027b3d37873fece2b6488f878b31.1681210788.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
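A hedged sketch of such a check; where exactly it sits in the CQE-posting path is illustrative:

```c
/*
 * Hedged sketch: catch CQE posting from hard IRQ context. in_hardirq()
 * and lockdep_assert() are standard kernel helpers; the wrapper name
 * below is hypothetical.
 */
static inline void io_cqe_post_ctx_check(void)
{
	lockdep_assert(!in_hardirq());
}
```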
-
Pavel Begunkov authored
The kernel test robot complains about __io_remove_buffers(). io_uring/kbuf.c:221 __io_remove_buffers() warn: variable dereferenced before check 'bl->buf_ring' (see line 219) That check is not needed as ->buf_ring will always be set, so we can remove it and silence the warning. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9a632bbf749d9d911e605255652ce08d18e7d2c6.1681210788.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
io_uring/io_uring.c:432 io_prep_async_work() error: we previously assumed 'req->file' could be null (see line 425). Even though it's a false positive, as there will not be REQ_F_ISREG set without a file, let's add a simple check to make the kernel test robot happy. We don't care about performance here, but presumably it'll be optimised out by the compiler. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a6cfbe92c74b789c0b4f046f7f98d19b1ca2e5b7.1681210788.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We know now what the completion context is for the uring_cmd completion handling, so use that to have io_req_task_complete() decide what the best way to complete the request is. This allows batching of the posted completions if we have multiple pending, rather than always doing them one-by-one. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 06 Apr, 2023 8 commits
-
-
Pavel Begunkov authored
Chains of memory accesses are never good for performance. The req->task->io_uring->in_cancel in io_req_local_work_add() is there so that when a task is exiting via io_uring_try_cancel_requests() and starts waiting for completions, it gets woken up by every new task_work item queued. Do a little trick by announcing waiting in io_uring_try_cancel_requests(), making io_req_local_work_add() wake us up. We also need to check for deferred tw items after prepare_to_wait(TASK_INTERRUPTIBLE). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/fb11597e9bbcb365901824f8c5c2cf0d6ee100d0.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
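Roughly, the trick works like the sketch below; the waiter marker field and the call details are assumptions, not the actual io_uring code:

```c
/*
 * Hedged sketch of the idea: the cancelling task announces that it is
 * waiting, so the task_work add path can wake it without chasing the
 * req->task->io_uring->in_cancel pointer chain. After prepare_to_wait()
 * the deferred list is re-checked to avoid a lost wakeup.
 * cq_wait_nr as a "waiter present" marker is an assumption here.
 */
static void cancel_requests_and_wait(struct io_ring_ctx *ctx)
{
	DEFINE_WAIT(wait);

	atomic_set(&ctx->cq_wait_nr, 1);	/* assumed waiter marker */
	smp_mb();				/* pairs with the check in the add path */

	while (io_uring_try_cancel_requests(ctx, NULL, true)) {
		prepare_to_wait(&ctx->cq_wait, &wait, TASK_INTERRUPTIBLE);
		/* re-check deferred tw items after arming the wait */
		if (llist_empty(&ctx->work_llist))
			schedule();
	}
	finish_wait(&ctx->cq_wait, &wait);
	atomic_set(&ctx->cq_wait_nr, 0);
}
```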
-
Pavel Begunkov authored
Separate ->task_complete path in __io_cq_unlock_post_flush(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/baa9b8d822f024e4ee01c40209dbbe38d9c8c11d.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Every task_work will try to wake the task to be executed, which causes excessive scheduling and additional overhead. For some tw it's justified, but others won't do much but post a single CQE. When a task waits for multiple cqes, every such task_work will wake it up. Instead, the task may give a hint about how many cqes it waits for; io_req_local_work_add() will compare against it and skip wakeups if #cqes + #tw is not enough to satisfy the waiting condition. Task_work that uses the optimisation should be simple enough and never post more than one CQE. It's also ignored for non-DEFER_TASKRUN rings. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d2b77e99d1e86624d8a69f7037d764b739dcd225.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
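A self-contained, hedged sketch of the lazy-wake comparison; names are illustrative and not the io_uring internals:

```c
/*
 * Hedged sketch: the waiter publishes how many completions it still
 * needs, and a producer only issues a wakeup once the queued work can
 * satisfy that number.
 */
#include <stdatomic.h>
#include <stdbool.h>

struct cq_waiter {
	atomic_int wait_nr;	/* 0 means "not waiting" */
};

static bool should_wake(struct cq_waiter *w, int queued_cqes, int queued_tw)
{
	int need = atomic_load(&w->wait_nr);

	/* skip the wakeup if the task would just go back to sleep */
	return need > 0 && queued_cqes + queued_tw >= need;
}
```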
-
Pavel Begunkov authored
We'll need to grab some information from the previous request in the tw list, so inline llist_add(); it'll be used in the following patch. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f0165493af7b379943c792114b972f331e7d7d10.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
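The point of open-coding llist_add() is that the cmpxchg loop already has the previous head in hand. A hedged, self-contained sketch using C11 atomics (the kernel version uses try_cmpxchg() on struct llist_head):

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
	struct node *next;
};

struct list {
	_Atomic(struct node *) first;
};

/* push node to the front and return the old head (NULL if list was empty) */
static struct node *list_add_return_prev(struct list *l, struct node *n)
{
	struct node *old = atomic_load(&l->first);

	do {
		n->next = old;
	} while (!atomic_compare_exchange_weak(&l->first, &old, n));

	return old;
}
```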
-
Pavel Begunkov authored
We pass 'allow_local' into io_req_task_work_add() but will need more flags. Replace it with a flags bit field and name this allow_local flag. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4c0f01e7ef4e6feebfb199093cc995af7a19befa.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Instead of smp_mb() + __io_cqring_wake() in __io_cq_unlock_post_flush() use equivalent io_cqring_wake(). With that we can clean it up further and remove __io_cqring_wake(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/662ee5d898168ac206be06038525e97b64072a46.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
We currently pin the ctx for io_req_local_work_add() with percpu_ref_get/put, which implies two rcu_read_lock/unlock pairs and some extra overhead on top in the fast path. Replace it with a pure rcu read and let io_ring_exit_work() synchronise against it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/cbdfcb6b232627f30e9e50ef91f13c4f05910247.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
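A hedged sketch of that pinning scheme; function names and the exact dying-check are illustrative, not the real io_uring code:

```c
/*
 * Hedged sketch: a plain RCU read section protects the ctx while
 * queueing, and the exit path waits for readers with synchronize_rcu()
 * before tearing the ring down.
 */
static bool queue_local_tw(struct io_ring_ctx *ctx, struct llist_node *item)
{
	bool queued = false;

	rcu_read_lock();		/* ctx cannot be freed inside this section */
	if (!percpu_ref_is_dying(&ctx->refs)) {
		llist_add(item, &ctx->work_llist);
		queued = true;
	}
	rcu_read_unlock();
	return queued;
}

static void ring_exit_work(struct io_ring_ctx *ctx)
{
	percpu_ref_kill(&ctx->refs);
	synchronize_rcu();		/* wait out concurrent queue_local_tw() readers */
	/* safe to free the ctx from here on */
}
```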
-
Pavel Begunkov authored
Move ctx pinning from io_req_local_work_add() to the caller, looks better and makes working with the code a bit easier. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/49c0dbed390b0d6d04cb942dd3592879fd5bfb1b.1680782017.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 05 Apr, 2023 1 commit
-
-
Jens Axboe authored
Rather than check this in the issue fast path, it makes more sense to just assign the copy of the data when we're setting it up anyway. This makes the code a bit cleaner, and removes the need for this check in the issue path. Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 04 Apr, 2023 13 commits
-
-
Pavel Begunkov authored
The number of entries in the rsrc node cache is limited to 512, which still seems unnecessarily large. Add per-cache thresholds and set it to 32 for the rsrc node cache. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d0cd538b944dac0bf878e276fc0199f21e6bccea.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
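A hedged sketch of a per-cache element limit; the real io_alloc_cache fields and helpers may differ in names and signatures:

```c
#include <stdbool.h>

struct alloc_cache {
	void		**entries;
	unsigned int	nr;
	unsigned int	max_cached;	/* e.g. 32 for the rsrc node cache */
};

/* return false when the cache is full so the caller frees the object */
static bool cache_put(struct alloc_cache *cache, void *obj)
{
	if (cache->nr >= cache->max_cached)
		return false;
	cache->entries[cache->nr++] = obj;
	return true;
}
```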
-
Pavel Begunkov authored
Every struct io_rsrc_node takes a struct io_rsrc_data reference, which means all rsrc updates do 2 extra atomics. Replace the atomic refcounting with a plain int as it's all done under ->uring_lock. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e73c3d6820cf679532696d790b5b8fae23537213.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
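The pattern, as a hedged sketch with illustrative names:

```c
/*
 * Hedged sketch: once every get/put happens under ->uring_lock, a plain
 * int can replace the atomic refcount.
 */
struct rsrc_data {
	int refs;	/* protected by ->uring_lock */
};

void rsrc_data_free(struct rsrc_data *data);	/* hypothetical teardown */

static void rsrc_data_get(struct rsrc_data *data)
{
	data->refs++;
}

static void rsrc_data_put(struct rsrc_data *data)
{
	if (!--data->refs)
		rsrc_data_free(data);
}
```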
-
Pavel Begunkov authored
We should hold ->uring_lock while putting nodes with io_put_rsrc_node(), add a lockdep check for that. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b50d5f156ac41450029796738c1dfd22a521df7a.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Add allocation cache for struct io_rsrc_node, it's always allocated and put under ->uring_lock, so it doesn't need any extra synchronisation around caches. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/252a9d9ef9654e6467af30fdc02f57c0118fb76e.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
struct delayed_work rsrc_put_work was previously used to offload node freeing because io_rsrc_node_ref_zero() used to be called by RCU in the IRQ context. Now, as percpu refcounting is gone, we can do it eagerly on the spot without pushing it to a worker. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/13fb1aac1e8d068ad8fd4a0c6d0d157ab61b90c0.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Every io_rsrc_node keeps a list of items to put, and all entries are kmalloc()'ed. However, it's quite common to queue up only one entry per node, so let's add an inline entry there to avoid extra allocations. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c482c1c652c45c85ac52e67c974bc758a50fed5f.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
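A hedged sketch of the inline-entry idea; the layout and names below are illustrative, not the actual io_uring structures:

```c
#include <linux/list.h>
#include <linux/slab.h>

struct rsrc_put {
	struct list_head	list;
	/* file/buffer payload would live here */
};

struct rsrc_node {
	struct list_head	item_list;
	struct rsrc_put		inline_item;	/* used for the common single-item case */
	bool			inline_used;
};

static struct rsrc_put *rsrc_node_get_put_entry(struct rsrc_node *node)
{
	if (!node->inline_used) {
		node->inline_used = true;
		return &node->inline_item;
	}
	/* only additional items fall back to kmalloc() */
	return kmalloc(sizeof(struct rsrc_put), GFP_KERNEL);
}
```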
-
Pavel Begunkov authored
We have too many "rsrc" around which makes the name of struct io_rsrc_node::rsrc_list confusing. The field is responsible for keeping a list of files or buffers, so call it item_list and add comments around. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3e34d4dfc1fdbb6b520f904ee6187c2ccf680efe.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
We use ->rsrc_ref_lock spinlock to protect ->rsrc_ref_list in io_rsrc_node_ref_zero(). Now we removed pcpu refcounting, which means io_rsrc_node_ref_zero() is not executed from the irq context as an RCU callback anymore, and we also put it under ->uring_lock. io_rsrc_node_switch(), which queues up nodes into the list, is also protected by ->uring_lock, so we can safely get rid of ->rsrc_ref_lock. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6b60af883c263551190b526a55ff2c9d5ae07141.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Currently, for nodes we have an atomic counter and some cached (non-atomic) refs protected by uring_lock. Let's put all ref manipulations under uring_lock and get rid of the atomic part. It's free since, in all cases we care about, we already hold the lock. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/25b142feed7d831008257d90c8b17c0115d4fc15.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
io_free_req() is not often used but nevertheless problematic as there is no way to know the current context; it may be used from the submission path or even by an irq handler. Push it to a fresh context using task_work. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3a92fe80bb068757e51aaa0b105cfbe8f5dfee9e.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
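In outline, the bounce looks like the hedged sketch below; the callback signature and helper names are assumptions about the io_uring of that period:

```c
/*
 * Hedged sketch: instead of freeing inline from an unknown context
 * (submission path, irq handler, ...), the free is deferred to task
 * context via task_work, where the locking rules are known.
 */
static void io_free_req_tw(struct io_kiocb *req, bool *locked)
{
	/* runs in task context; the actual freeing happens here */
}

static void io_free_req(struct io_kiocb *req)
{
	req->io_task_work.func = io_free_req_tw;
	io_req_task_work_add(req);
}
```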
-
Pavel Begunkov authored
io_req_put_rsrc() doesn't need any locking, so move it out of a spinlock section in __io_req_complete_post() and adjust helpers. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d5b87a5f31270dade6805f7acafc4cc34b84b241.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
We cache refs of the current node (i.e. ctx->rsrc_node) in ctx->rsrc_cached_refs. We'll be moving away from atomics, so move the cached refs into struct io_rsrc_node for now. It's a prep patch and shouldn't change anything in practice. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9edc3669c1d71b06c2dca78b2b2b8bb9292738b9.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
One problem with the current rsrc infra is that updates will often generate lots of rsrc nodes, each carrying pcpu refs. That takes quite a lot of memory, especially if there is a stall, and takes lots of CPU cycles. Pcpu allocations alone take >50% of CPU with a naive benchmark updating files in a loop. Replace pcpu refs with normal refcounting. There is already a hot path avoiding atomics / refs, but following patches will further improve it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e9ed8a9457b331a26555ff9443afc64cdaab7247.1680576071.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 03 Apr, 2023 9 commits
-
-
Jens Axboe authored
We already do this manually for the !SQPOLL case, do it in general and we can also dump the ugly min3() in io_submit_sqes(). Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
It has nothing to do with the SQE at this point, it's a request submission. While in there, get rid of the 'force_nonblock' argument which is also dead, as we only pass in true. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
For task works we're passing around a bool pointer for whether the current ring is locked or not; let's wrap it in a structure. That will make it more opaque, preventing abuse, and will also help us pass more info in the future if needed. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1ecec9483d58696e248d1bfd52cf62b04442df1d.1679931367.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
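A hedged sketch of the wrapper; the typedef name is illustrative:

```c
/*
 * Wrapping the bare bool in a small struct means callers can't hand in
 * an arbitrary bool, and extra state can be added later without
 * touching every callback signature.
 */
struct io_kiocb;

struct io_tw_state {
	bool locked;	/* is ->uring_lock held by the task_work runner? */
};

typedef void (*io_req_tw_func_t)(struct io_kiocb *req, struct io_tw_state *ts);
```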
-
Pavel Begunkov authored
Before cond_resched()'ing in handle_tw_list() we also drop the current ring context, and so the next loop iteration will need to pick/pin a new context and do trylock. The chunk removed by this patch was intended to be an optimisation covering exactly this case, i.e. retaking the lock after reschedule, but in reality it's skipped for the first iteration after resched as described and will keep hammering the lock if it's contended. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1ecec9483d58696e248d1bfd52cf62b04442df1d.1679931367.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Since the move to PF_IO_WORKER, we don't juggle memory context manually anymore. Remove that outdated part of the comment for __io_worker_idle(). Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
There are two leftover structures from the notification registration mechanism that has never been released, kill them. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f05f65aebaf8b1b5bf28519a8fdb350e3e7c9ad0.1679924536.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Gabriel Krisman Bertazi authored
Since commit 0654b05e7e65 ("io_uring: One wqe per wq"), we have just a single io_wqe instance embedded per io_wq. Drop the extra structure in favor of accessing struct io_wq directly, cleaning up quite a bit of dereferences and backpointers. No functional changes intended. Tested with liburing's testsuite and mmtests performance microbenchmarks. I didn't observe any performance regressions. Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20230322011628.23359-2-krisman@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Gabriel Krisman Bertazi authored
Since we now have a single io_wqe per io_wq instead of per-node, and in preparation for its removal, move the accounting into the parent structure. Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20230322011628.23359-2-krisman@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
On at least parisc, we have strict requirements on how we virtually map an address that is shared between the application and the kernel. On these platforms, IOU_PBUF_RING_MMAP should be used when setting up a shared ring buffer for provided buffers. If the application is mapping these pages and asking the kernel to pin+map them as well, then we have no control over what virtual address we get in the kernel. For that case, do a sanity check if SHM_COLOUR is defined, and disallow the mapping request. The application must fall back to using IOU_PBUF_RING_MMAP for this case, and liburing will do that transparently with the set of helpers that it has. Signed-off-by: Jens Axboe <axboe@kernel.dk>
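A hedged sketch of the kind of guard this describes; SHM_COLOUR is real (parisc), but the exact rejection condition and function name below are illustrative only:

```c
/*
 * Hedged sketch: on architectures that define SHM_COLOUR, user/kernel
 * shared mappings must agree on cache colour, which the kernel can't
 * guarantee when it pins application-chosen addresses. Reject such
 * registrations and make the application use IOU_PBUF_RING_MMAP.
 */
#ifdef SHM_COLOUR
static int io_check_pbuf_ring_addr(unsigned long uaddr)
{
	if (uaddr & (SHM_COLOUR - 1))
		return -EINVAL;	/* fall back to IOU_PBUF_RING_MMAP */
	return 0;
}
#endif
```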
-