- 15 Apr, 2024 35 commits
-
-
Jens Axboe authored
A previous consolidation cleanup missed handling the case where the ring is dying, and __io_cqring_overflow_flush() doesn't flush entries if the CQ ring is already full. This is fine for the normal CQE overflow flushing, but if the ring is going away, we need to flush everything, even if it means simply freeing the overflown entries. Fixes: 6c948ec44b29 ("io_uring: consolidate overflow flushing") Signed-off-by:
Jens Axboe <axboe@kernel.dk>
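The behaviour described in this fix can be sketched in userspace C. This is only an illustration under stated assumptions: `struct ring`, `overflow_add()` and `overflow_flush()` are hypothetical stand-ins, not the kernel's structures or functions. A normal flush stops once the CQ is full, while a flush on a dying ring drains the whole overflow list and frees whatever does not fit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define CQ_SIZE 4

/* Hypothetical, simplified stand-ins; not the kernel's data structures. */
struct overflow_entry {
    struct overflow_entry *next;
    int res;
};

struct ring {
    int cq[CQ_SIZE];                 /* completion queue slots */
    size_t cq_used;                  /* number of occupied slots */
    struct overflow_entry *overflow; /* completions that did not fit */
};

/* Append a completion to the overflow list (FIFO order). */
static bool overflow_add(struct ring *r, int res)
{
    struct overflow_entry *e = malloc(sizeof(*e));
    struct overflow_entry **pp = &r->overflow;

    if (!e)
        return false;
    e->res = res;
    e->next = NULL;
    while (*pp)
        pp = &(*pp)->next;
    *pp = e;
    return true;
}

/*
 * Move overflown entries into the CQ. If the CQ is full, a normal flush
 * stops and keeps the rest queued; a flush on a dying ring frees them.
 * Returns the number of entries freed without being posted.
 */
static size_t overflow_flush(struct ring *r, bool dying)
{
    size_t dropped = 0;

    while (r->overflow) {
        struct overflow_entry *e = r->overflow;

        if (r->cq_used < CQ_SIZE)
            r->cq[r->cq_used++] = e->res;
        else if (dying)
            dropped++;
        else
            break; /* CQ full: leave the rest for a later flush */

        r->overflow = e->next;
        free(e);
    }
    return dropped;
}
```

With one free CQ slot and three overflown entries, `overflow_flush(&r, false)` posts one entry and leaves two queued; a later `overflow_flush(&r, true)` frees both and leaves the list empty.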
-
Pavel Begunkov authored
Consolidate __io_cqring_overflow_flush and io_cqring_overflow_kill() into a single function as it once was, it's easier to work with it this way. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/986b42c35e76a6be7aa0cdcda0a236a2222da3a7.1712708261.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Conditional locking is never great, in case of __io_cqring_overflow_flush(), which is a slow path, it's not justified. Don't handle IOPOLL separately, always grab uring_lock for overflow flushing. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/162947df299aa12693ac4b305dacedab32ec7976.1712708261.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
There is only one caller of io_cqring_overflow_flush(), so open-code it. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a1fecd56d9dba923ed8d4d159727fa939d3baa2a.1712708261.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
c1edbf5f ("io_uring: flag SQPOLL busy condition to userspace") added an extra overflowed CQE flush in the SQPOLL submission path due to backpressure, which was later removed. Remove the flush and let io_cqring_wait() / iopoll handle it. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2a83b0724ca6ca9d16c7d79a51b77c81876b2e39.1712708261.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
There are no users of io_req_cqe_overflow() apart from io_uring.c, make it static. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f4295eb2f9eb98d5db38c0578f57f0b86bfe0d8c.1712708261.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
The only caller doesn't handle the return value of io_put_kbuf_comp(), so change its return type into void. Also follow Jens's suggestion to rename it as io_put_kbuf_drop(). Signed-off-by:
Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20240407132759.4056167-1-ming.lei@redhat.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
io_req_put_rsrc_locked() is a weird shim function around io_req_put_rsrc(). All calls to io_req_put_rsrc() require holding ->uring_lock, so we can just use it directly. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a195bc78ac3d2c6fbaea72976e982fe51e50ecdd.1712331455.git.asml.silence@gmail.com Reviewed-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
io_req_complete_post() was the sole user of ->locked_free_list, but since we just gutted the function, the cache is not used anymore and can be removed. ->locked_free_list served as an asynchronous counterpart of the main request (i.e. struct io_kiocb) cache for all unlocked cases like io-wq. Now they're all forced to be completed into the main cache directly, off of the normal completion path or via io_free_req(). Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7bffccd213e370abd4de480e739d8b08ab6c1326.1712331455.git.asml.silence@gmail.com Reviewed-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
io_req_complete_post() is now io-wq only and shouldn't be used outside of it, i.e. it relies that io-wq holds a ref for the request as explained in a comment below. Let's add a warning to enforce the assumption and make sure nobody would try to do anything weird. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1013b60c35d431d0698cafbc53c06f5917348c20.1712331455.git.asml.silence@gmail.com Reviewed-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
Since commit 8f6c829491fe ("io_uring: remove struct io_tw_state::locked"), io_req_complete_post() is only called from io-wq submit work, where the request reference is guaranteed to be grabbed and won't drop to zero in io_req_complete_post(). Kill the dead code, meantime add req_ref_put() to put the reference. Cc: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Ming Lei <ming.lei@redhat.com> Reviewed-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-by:
Pavel Begunkov <asml.silence@gmail.com> Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1d8297e2046553153e763a52574f0e0f4d512f86.1712331455.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Move the related code from io_uring.c into memmap.c. No functional changes in this patch, just cleaning it up a bit now that the full transition is done. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
There are a few cases of open-rolled loops around unpin_user_page(), use the generic helper instead. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Rather than use remap_pfn_range() for this and manually free later, switch to using vm_insert_page() and have it Just Work. This requires a bit of effort on the mmap lookup side, as the ctx uring_lock isn't held, which otherwise protects buffer_lists from being torn down, and it's not safe to grab from mmap context, as that would introduce an ABBA deadlock between the mmap lock and the ctx uring_lock. Instead, look up the buffer_list under RCU, as the list is RCU freed already. Use the existing reference count to determine whether it's possible to safely grab a reference to it (e.g. if it's not zero already), and drop that reference when done with the mapping. If the mmap reference is the last one, the buffer_list and the associated memory can go away, since the vma insertion has references to the inserted pages at that point. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
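The "grab a reference only if the count is not already zero" step can be sketched with C11 atomics. This is a userspace illustration, not the kernel code: `struct buf_list` and both helpers are hypothetical names, and the RCU read-side protection that keeps the object's memory valid during the lookup is not modelled here. The kernel has its own helpers for this pattern, such as atomic_inc_not_zero().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted buffer list. */
struct buf_list {
    atomic_int refs;
};

/* Take a reference only if the count has not already dropped to zero. */
static bool buf_list_get_unless_zero(struct buf_list *bl)
{
    int old = atomic_load(&bl->refs);

    while (old != 0) {
        /* On failure, 'old' is updated to the current value and we retry. */
        if (atomic_compare_exchange_weak(&bl->refs, &old, old + 1))
            return true;
    }
    return false; /* object is already on its way out */
}

/* Drop a reference; returns true if this was the last one. */
static bool buf_list_put(struct buf_list *bl)
{
    return atomic_fetch_sub(&bl->refs, 1) == 1;
}
```

A caller that gets `true` from `buf_list_put()` holds the last reference and is the one responsible for releasing the object.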
-
Jens Axboe authored
Move it into io_uring.c where it belongs, and use it in there as well rather than have two implementations of this. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
This is the last holdout which does odd page checking, convert it to vmap just like what is done for the non-mmap path. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Rather than use remap_pfn_range() for this and manually free later, switch to using vm_insert_pages() and have it Just Work. If possible, allocate a single compound page that covers the range that is needed. If that works, then we can just use page_address() on that page. If we fail to get a compound page, allocate single pages and use vmap() to map them into the kernel virtual address space. This just covers the rings/sqes; the other remaining mmap user of remap_pfn_range() will be converted separately. Once that is done, we can kill the old alloc/free code. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Joel Granados authored
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/) Remove sentinel element from kernel_io_uring_disabled_table Signed-off-by:
Joel Granados <j.granados@samsung.com> Link: https://lore.kernel.org/r/20240328-jag-sysctl_remset_misc-v1-6-47c1463b3af2@samsung.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jiapeng Chong authored
The function is defined in the io_uring.c file but is not called anywhere, so delete the unused function. io_uring/io_uring.c:646:20: warning: unused function '__io_cq_unlock'. Reported-by:
Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=8660 Signed-off-by:
Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Link: https://lore.kernel.org/r/20240328022324.78029-1-jiapeng.chong@linux.alibaba.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
The allocator will generally return memory in order, but __io_alloc_req_refill() then adds them to a stack and we'll extract them in the opposite order. This obviously isn't a huge deal, but: 1) it makes debugging easier when they are in order 2) keeping them in-order is the right thing to do 3) reduces the code for adding them to the stack Just add them in reverse to the stack. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
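The ordering effect described above can be shown with a small sketch. The names here (`struct req`, `struct req_stack`, `refill_in_order()`) are hypothetical and only illustrate the idea: pushing a batch onto a LIFO stack from last to first means later pops hand the entries back in their original allocation order.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH 4

/* Hypothetical request with an intrusive stack link. */
struct req {
    struct req *next;
    int id;
};

struct req_stack {
    struct req *top;
};

static void stack_push(struct req_stack *s, struct req *r)
{
    r->next = s->top;
    s->top = r;
}

static struct req *stack_pop(struct req_stack *s)
{
    struct req *r = s->top;

    if (r)
        s->top = r->next;
    return r;
}

/* Push the batch last-to-first so that pops return batch[0] first. */
static void refill_in_order(struct req_stack *s, struct req *batch[], size_t n)
{
    while (n--)
        stack_push(s, batch[n]);
}
```

Pushing first-to-last instead would make the first pop return the most recently allocated entry, which is the reversed order the commit message describes.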
-
Jens Axboe authored
This should be plenty, rather than the default of 128, and matches what we have on the rsrc and futex side as well. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Currently lists are being used to manage this, but best practice is usually to have these in an array instead, as that is cheaper to manage. Outside of that detail, games are also played with KASAN as the list is inside the cached entry itself. Finally, all users of this need a struct io_cache_entry embedded in their struct, which is union'ized with something else in there that isn't used across the free -> realloc cycle. Get rid of all of that, and simply have it be an array. This will not change the memory used, as we're just trading an 8-byte member entry for the per-elem array size. This reduces the overhead of the recycled allocations, and it reduces the amount of code needed to support recycling to about half of what it currently is. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
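A rough userspace sketch of an array-backed recycling cache follows. It is an assumption-laden illustration rather than the kernel's implementation: `struct alloc_cache` and the `cache_*()` helpers are hypothetical names chosen for this example. The point it shows is that the cached objects need no embedded list node, since the cache only stores pointers to them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical cache of recycled allocations, kept as an array of pointers. */
struct alloc_cache {
    void **entries;
    unsigned int nr_cached;
    unsigned int max_cached;
};

/* Allocate the pointer array; 'max' is assumed to be non-zero. */
static bool cache_init(struct alloc_cache *c, unsigned int max)
{
    c->entries = malloc(max * sizeof(*c->entries));
    c->nr_cached = 0;
    c->max_cached = max;
    return c->entries != NULL;
}

/* Recycle an object; returns false if the cache is full and the caller must free it. */
static bool cache_put(struct alloc_cache *c, void *obj)
{
    if (c->nr_cached >= c->max_cached)
        return false;
    c->entries[c->nr_cached++] = obj;
    return true;
}

/* Reuse the most recently cached object, or NULL if the cache is empty. */
static void *cache_get(struct alloc_cache *c)
{
    if (!c->nr_cached)
        return NULL;
    return c->entries[--c->nr_cached];
}

/* Free everything still cached, then the array itself. */
static void cache_free(struct alloc_cache *c)
{
    void *obj;

    while ((obj = cache_get(c)) != NULL)
        free(obj);
    free(c->entries);
    c->entries = NULL;
}
```

Both put and get are a bounds check plus one array access, and the cached object's own memory is never touched while it sits in the cache.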
-
Jens Axboe authored
It's now unused, drop the code related to it. This includes the io_issue_defs->manual alloc field. While in there, and since ->async_size is now being used a bit more frequently and in the issue path, move it to io_issue_defs[]. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Basic conversion ensuring async_data is allocated off the prep path. Adds a basic alloc cache as well, as passthrough IO can be quite high in rate. Tested-by:
Anuj Gupta <anuj20.g@samsung.com> Reviewed-by:
Anuj Gupta <anuj20.g@samsung.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
read/write requests try to put everything on the stack, and then alloc and copy if a retry is needed. This necessitates a bunch of nasty code that deals with intermediate state. Get rid of this, and have the prep side setup everything that is needed upfront, which greatly simplifies the opcode handlers. This includes adding an alloc cache for io_async_rw, to make it cheap to handle. In terms of cost, this should be basically free and transparent. For the worst case of {READ,WRITE}_FIXED which didn't need it before, performance is unaffected in the normal peak workload that is being used to test that. Still runs at 122M IOPS. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
For historical reasons these were special cased, as they were the only ones that needed cancelation. But now we handle cancelations generally, and hence there's no need to check for these in io_ring_ctx_wait_and_kill() when io_uring_try_cancel_requests() handles both these and the rest as well. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Just like we run the inline task_work, ensure we also factor in and run the fallback task_work. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Make io_req_complete_post() push all IORING_SETUP_IOPOLL requests to task_work; it's much cleaner and should normally happen. We couldn't do it before because there was a possibility of looping in complete_post() -> tw -> complete_post() -> ... Also, unexport the function and inline __io_req_complete_post(). Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Tested-by:
Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/ea19c032ace3e0dd96ac4d991a063b0188037014.1710799188.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
task_work execution is now always locked, and we shouldn't get into io_req_complete_post() from them. That means that complete_post() is always called out of the original task context and we don't even need to check current. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Tested-by:
Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/24ec27f27db0d8f58c974d8118dca1d345314ddc.1710799188.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
io_post_aux_cqe(), which is used for multishot requests, delays completions by putting CQEs into a temporary array for the purpose of completion lock/flush batching. DEFER_TASKRUN doesn't need any locking, so for it we can put completions directly into the CQ and defer post completion handling with a flag. That leaves !DEFER_TASKRUN, which is not that interesting / hot for multishot requests, so have conditional locking with deferred flush for them. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Tested-by:
Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/b1d05a81fd27aaa2a07f9860af13059e7ad7a890.1710799188.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
The restriction on multishot execution context disallowing io-wq is driven by rules of io_fill_cqe_req_aux(), it should only be called in the master task context, either from the syscall path or in task_work. Since task_work now always takes the ctx lock implying IO_URING_F_COMPLETE_DEFER, we can just assume that the function is always called with its defer argument set to true. Kill the argument. Also rename the function for more consistency as "fill" in CQE related functions was usually meant for raw interfaces only copying data into the CQ without any locking, waking the user and other accounting "post" functions take care of. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Tested-by:
Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/93423d106c33116c7d06bf277f651aa68b427328.1710799188.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
ctx is always locked for task_work now, so get rid of struct io_tw_state::locked. Note I'm stopping one step before removing io_tw_state altogether, which is not empty, because it still serves the purpose of indicating which function is a tw callback and forcing users not to invoke them carelessly out of a wrong context. The removal can always be done later. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Tested-by:
Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/e95e1ea116d0bfa54b656076e6a977bc221392a4.1710799188.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
We can run normal task_work without locking the ctx, however we try to lock anyway and most handlers prefer or require it locked. It might have been interesting for a multi-submitter ring with high contention completing async read/write requests via task_work, however that will still need to go through io_req_complete_post() and potentially take the lock for rsrc node putting or some other case. In other words, it's hard to care about it, so always force the locking. The case described would also because of various io_uring caches. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Tested-by:
Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/6ae858f2ef562e6ed9f13c60978c0d48926954ba.1710799188.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
kiocb_done() shouldn't care about specifically redirecting requests to io-wq. Remove the hopping to tw to then queue an io-wq, return -EAGAIN and let the core io_uring code handle offloading. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Tested-by:
Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/413564e550fe23744a970e1783dfa566291b0e6f.1710799188.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
io_uring_try_cancel_uring_cmd() is a part of the cmd handling so let's move it closer to all cmd bits into uring_cmd.c Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Tested-by:
Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/43a3937af4933655f0fd9362c381802f804f43de.1710799188.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 06 Apr, 2024 1 commit
-
-
Alexey Izbyshev authored
This bug was introduced in commit 950e79dd ("io_uring: minor io_cqring_wait() optimization"), which was made in preparation for adc8682e ("io_uring: Add support for napi_busy_poll"). The latter got reverted in cb318216 ("Revert "io_uring: Add support for napi_busy_poll""), so simply undo the former as well. Cc: stable@vger.kernel.org Fixes: 950e79dd ("io_uring: minor io_cqring_wait() optimization") Signed-off-by:
Alexey Izbyshev <izbyshev@ispras.ru> Link: https://lore.kernel.org/r/20240405125551.237142-1-izbyshev@ispras.ru Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 03 Apr, 2024 2 commits
-
-
Jens Axboe authored
If we look up the kbuf, ensure that it doesn't get unregistered until after we're done with it. Since we're inside mmap, we cannot safely use the io_uring lock. Rely on the fact that we can lookup the buffer list under RCU now and grab a reference to it, preventing it from being unregistered until we're done with it. The lookup returns the io_buffer_list directly with it referenced. Cc: stable@vger.kernel.org # v6.4+ Fixes: 5cf4f52e ("io_uring: free io_buffer_list entries via RCU") Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Just rely on the xarray for any kind of bgid. This simplifies things, and it really doesn't bring us much, if anything. Cc: stable@vger.kernel.org # v6.4+ Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 02 Apr, 2024 1 commit
-
-
Jens Axboe authored
Rather than use the system unbound event workqueue, use an io_uring specific one. This avoids dependencies with the tty, which also uses the system_unbound_wq, and issues flushes of said workqueue from inside its poll handling. Cc: stable@vger.kernel.org Reported-by:
Rasmus Karlsson <rasmus.karlsson@pajlada.com> Tested-by:
Rasmus Karlsson <rasmus.karlsson@pajlada.com> Tested-by:
Iskren Chernev <me@iskren.info> Link: https://github.com/axboe/liburing/issues/1113 Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 01 Apr, 2024 1 commit
-
-
Jens Axboe authored
Do the same check for direct io-wq execution for multishot requests that commit 2a975d42 did for the inline execution, and disable multishot mode (and revert to single shot) if the file type doesn't support NOWAIT, and isn't opened in O_NONBLOCK mode. For multishot to work properly, it's a requirement that nonblocking read attempts can be done. Cc: stable@vger.kernel.org Signed-off-by:
Jens Axboe <axboe@kernel.dk>
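The condition stated above reduces to a small predicate, sketched here in userspace C. `struct file_info` and `can_multishot()` are hypothetical names for illustration only; the kernel derives this information from the file itself rather than from a struct like this. Multishot stays enabled only when nonblocking attempts are possible, either because the file type supports NOWAIT or because the file was opened with O_NONBLOCK.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical summary of what a request knows about its file. */
struct file_info {
    bool supports_nowait; /* file type can do nonblocking attempts */
    int open_flags;       /* flags the file was opened with */
};

/*
 * Multishot needs nonblocking read attempts. If neither condition holds,
 * the request should fall back to single-shot mode.
 */
static bool can_multishot(const struct file_info *f)
{
    return f->supports_nowait || (f->open_flags & O_NONBLOCK) != 0;
}
```

A file with neither property yields `false`, which corresponds to reverting the request to single shot.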
-