- 23 Aug, 2021 40 commits
-
-
Pavel Begunkov authored
WARNING: CPU: 1 PID: 5870 at fs/io_uring.c:5975 io_try_cancel_userdata+0x30f/0x540 fs/io_uring.c:5975 CPU: 0 PID: 5870 Comm: iou-wrk-5860 Not tainted 5.14.0-rc6-next-20210820-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:io_try_cancel_userdata+0x30f/0x540 fs/io_uring.c:5975 Call Trace: io_async_cancel fs/io_uring.c:6014 [inline] io_issue_sqe+0x22d5/0x65a0 fs/io_uring.c:6407 io_wq_submit_work+0x1dc/0x300 fs/io_uring.c:6511 io_worker_handle_work+0xa45/0x1840 fs/io-wq.c:533 io_wqe_worker+0x2cc/0xbb0 fs/io-wq.c:582 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 io_try_cancel_userdata() can be called from io_async_cancel() executing in the io-wq context, so the warning fires, which is there to alert anyone accessing task->io_uring->io_wq in a racy way. However, io_wq_put_and_exit() always first waits for all threads to complete, so the only detail left is to zero tctx->io_wq after the context is removed. note: one little assumption is that when IO_WQ_WORK_CANCEL, the executor won't touch ->io_wq, because io_wq_destroy() might cancel left pending requests in such a way. Cc: stable@vger.kernel.org Reported-by: syzbot+b0c9d1588ae92866515f@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/dfdd37a80cfa9ffd3e59538929c99cdd55d8699e.1629721757.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Employ inline completion logic for read/write completions done via io_req_task_complete(). If ->uring_lock is contended, just do normal request completion, but if not, make tctx_task_work() to grab the lock and do batched inline completions in io_req_task_complete(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/94589c3ce69eaed86a21bb1ec696407a54fab1aa.1629286357.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Many task_work handlers either grab ->uring_lock, or may benefit from having it. Move locking logic out of individual handlers to a lazy approach controlled by tctx_task_work(), so we don't keep doing tons of mutex lock/unlock. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d6a34e147f2507a2f3e2fa1e38a9c541dcad3929.1629286357.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
io_fallback_req_func() doesn't expect anyone creating inline completions, and no one currently does that. Teach the function to flush completions preparing for further changes. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8b941516921f72e1a64d58932d671736892d7fff.1629286357.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
->splice_fd_in is used only by splice/tee, but no other request checks it for validity. Add the check for most of request types excluding reads/writes/sends/recvs, we don't want overhead for them and can leave them be as is until the field is actually used. Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f44bc2acd6777d932de3d71a5692235b5b2b7397.1629451684.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We've previously had an issue where overflow flush unconditionally calls io_cqring_ev_posted() even if it didn't flush any events to the ring, causing wake and eventfd increment where no new events are available. Some applications don't like that, see commit b18032bb for details. This came up in discussion for another patch recently, hence add a comment detailing what the relationship between calling the events posted helper and CQ ring entries is. Link: https://lore.kernel.org/io-uring/77a44fce-c831-16a6-8e80-9aee77f496a2@kernel.dk/Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Fixed tables may be large enough, place all of them together with allocated tags under memcg limits. Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b3ac9f5da9821bb59837b5fe25e8ef4be982218c.1629451684.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Limit the number of files in io_uring fixed tables by RLIMIT_NOFILE, that's the first and the simpliest restriction that we should impose. Cc: stable@vger.kernel.org Suggested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b2756c340aed7d6c0b302c26dab50c6c5907f4ce.1629451684.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Hao Xu authored
coml_nr in ctx_flush_and_put() is not protected by uring_lock, this may cause problems when accessing in parallel: say coml_nr > 0 ctx_flush_and put other context if (compl_nr) get mutex coml_nr > 0 do flush coml_nr = 0 release mutex get mutex do flush (*) release mutex in (*) place, we call io_cqring_ev_posted() and users likely get no events there. To avoid spurious events, re-check the value when under the lock. Fixes: 2c32395d ("io_uring: fix __tctx_task_work() ctx race") Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Link: https://lore.kernel.org/r/20210820221954.61815-1-haoxu@linux.alibaba.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
wangyangbo authored
Now allocated rsrc table uses PAGE_SIZE as the size of 2nd-level, and accessing this table relies on each level index from fixed TABLE_SHIFT (12 - 3) in 4k page case. In order to correctly work in non-4k page, define TABLE_SHIFT as non-fixed (PAGE_SHIFT - shift of data) for 2nd-level table entry number. Signed-off-by: wangyangbo <wangyangbo@uniontech.com> Link: https://lore.kernel.org/r/20210819055657.27327-1-wangyangbo@uniontech.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Now with IRQ completions done via IRQ, almost all requests freeing are done from the context of submitter task, so it makes sense to extend task_put optimisation from io_req_free_batch_finish() to cover all the cases including task_work by moving it into io_put_task(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/824a7cbd745ddeee4a0f3ff85c558a24fd005872.1629302453.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We have two checks of task->flags & PF_EXITING left: 1) In io_req_task_submit(), which is called in task_work and hence always in the context of the original task. That means that req->task == current, and hence checking ->flags is totally fine. 2) In io_poll_rewait(), where we need to stop re-arming poll to prevent it interfering with cancelation. This is only run from task_work as well, and hence for this case too req->task == current. Add a comment to both spots detailing that. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Hao Xu authored
We don't need to protect nr_running and worker_refs by wqe->lock, so narrow the range of raw_spin_lock_irq - raw_spin_unlock_irq Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Link: https://lore.kernel.org/r/20210810125554.99229-1-haoxu@linux.alibaba.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
io_timeout_cancel() posts CQEs so needs ->completion_lock to be held, so grab it in io_timeout_remove(). Fixes: 48ecb6369f1f2 ("io_uring: run timeouts from task_work") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d6f03d653a4d7bf693ef6f39b6a426b6d97fd96f.1629280204.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Move earlier the check for whether __io_queue_proc() tries to poll already polled waitqueue, and do the same for the second poll entry, if any. Shouldn't really matter, but at least it would have a more predictable behaviour. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8cb428cfe8ade0fd055859fabb878db8777d4c2f.1629228203.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
We have io_req_complete_post() to post a CQE and put the request. It takes care of all synchronisation and is more concise and efficent, so replace all hancoded occurrences of "lock; post CQE; unlock; + put_req()" with io_req_complete_post(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2c83463458a613f9d870e5147eb134da2aa70779.1629228203.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Make io_put_rw_kbuf() to do the REQ_F_BUFFER_SELECTED check, so all the callers don't need to hand code it. The number of places where we call io_put_rw_kbuf() is growing, so saves some pain. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3df3919e5e7efe03420c44ab4d9317a81a9cf398.1629228203.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Linked timeout handling during issuing is heavy, it adds extra instructions and forces to save the next linked timeout before io_issue_sqe(). Follwing the same reasoning as in refcounting patches, a request can't be freed by the time it returns from io_issue_sqe(), so now we don't need to do io_prep_linked_timeout() in advance, and it can be delayed to colder paths optimising the generic path. Also, it should also save quite a lot for requests with linked timeouts and completed inline on timeout spinlocking + hrtimer_start() + hrtimer_try_to_cancel() and so on. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/19bfc9a0d26c5c5f1e359f7650afe807ca8ef879.1628981736.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Adjust io_disarm_next(), so it can detect if there is a linked but not-yet-armed timeout and complete/cancel it separately. Will be used in the following patch. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ae228cde2c0df3d92d29d5e4852ed9fa8a2a97db.1628981736.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
The link test in io_prep_linked_timeout() is pretty bulky, replace it with a flag. It's better for normal path and linked requests, and also will be used further for request failing. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3703770bfae8bc1ff370e43ef5767940202cab42.1628981736.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Instead of handling double consecutive linked timeouts through tricky flag combinations, just check the submit_state.link during timeout_prep and fail that case in advance. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/04150760b0dc739522264b8abd309409f7421a06.1628981736.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
IORING_OP_ASYNC_CANCEL and IORING_OP_LINK_TIMEOUT have enough of overlap, so extract a helper for request cancellation and use in both. Also, removes some amount of ugliness because of success_ret. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/900122b588e65b637e71bfec80a260726c6a54d6.1628981736.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
773af691 ("io_uring: always reissue from task_work context") makes all resubmission to be made from task_work, so we don't need that hack with resubmit/not-resubmit switch anymore. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/47fa177cca04e5ffd308a35227966c8e15d8525b.1628981736.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Linked timeouts are never refcounted when it comes to the first call to __io_prep_linked_timeout(), so save an io_ref_get() and set the desired value directly. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/177b24cc62ffbb42d915d6eb9e8876266e4c0d5a.1628981736.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Tracking linked timeouts as infligh was needed to make sure that io-wq is not destroyed by io_uring_cancel_generic() racing with io_async_cancel_one() accessing it. Now, cancellations issued by linked timeouts are done in the task context, so it's already synchronised. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e1b05cf47cb69df2305efdbee8cf7ba36f46c1a3.1628981736.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
If a requests is forwarded into io-wq, there is a good chance it hasn't been refcounted yet and we can save one req_ref_get() by setting the refcount number to the right value directly. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2d53f4449faaf73b4a4c5de667fc3c176d974860.1628981736.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
io_req_free_batch() has a __must_hold annotation referencing a request being passed in, but we're passing in the context. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Hao Xu authored
We can merge two spin_unlock() operations to one since we removed some code not long ago. Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Hao Xu authored
When doing cancellation, we use a parameter to indicate where it's from do_exit or exec. So a boolean value is good enough for this, remove the struct files* as it is not necessary. Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> [axboe: fixup io_uring_files_cancel for !CONFIG_IO_URING] Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Hao Xu authored
Extract io_uring_files_cancel() call in io_uring_task_cancel() to make io_uring_files_cancel() and io_uring_task_cancel() coherent and easy to read. Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
io_prep_linked_timeout() grew too heavy and compiler now refuse to inline the function. Help it by splitting in two and annotating with inline. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/560636717a32e9513724f09b9ecaace942dde4d4.1628705069.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
As submission references are gone, there is only one initial reference left. Instead of actually doing atomic refcounting, add a flag indicating whether we're going to take more refs or doing any other sync magic. The flag should be set before the request may get used in parallel. Together with the previous patch it saves 2 refcount atomics per request for IOPOLL and IRQ completions, and 1 atomic per req for inline completions, with some exceptions. In particular, currently, there are three cases, when the refcounting have to be enabled: - Polling, including apoll. Because double poll entries takes a ref. Might get relaxed in the near future. - Link timeouts, enabled for both, the timeout and the request it's bound to, because they work in-parallel and we need to synchronise to cancel one of them on completion. - When a request gets in io-wq, because it doesn't hold uring_lock and we need guarantees of submission references. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8b204b6c5f6643062270a1913d6d3a7f8f795fd9.1628705069.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Requests are by default given with two references, submission and completion. Completion references are straightforward, they represent request ownership and are put when a request is completed or so. Submission references are a bit more trickier. They're needed when io_issue_sqe() followed deep into the submission stack (e.g. in fs, block, drivers, etc.), request may have given away for concurrent execution or already completed, and the code unwinding back to io_issue_sqe() may be accessing some pieces of our requests, e.g. file or iov. Now, we prevent such async/in-depth completions by pushing requests through task_work. Punting to io-wq is also done through task_works, apart from a couple of cases with a pretty well known context. So, there're two cases: 1) io_issue_sqe() from the task context and protected by ->uring_lock. Either requests return back to io_uring or handed to task_work, which won't be executed because we're currently controlling that task. So, we can be sure that requests are staying alive all the time and we don't need submission references to pin them. 2) io_issue_sqe() from io-wq, which doesn't hold the mutex. The role of submission reference is played by io-wq reference, which is put by io_wq_submit_work(). Hence, it should be fine. Considering that, we can carefully kill the submission reference. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6b68f1c763229a590f2a27148aee77767a8d7750.1628705069.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Soon, we won't need to put several references at once, remove req_ref_sub_and_test() and @nr argument from io_put_req_deferred(), and put the rest of the references by hand. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1868c7554108bff9194fb5757e77be23fadf7fc0.1628705069.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Move all request refcount helpers to avoid forward declarations in the future. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/89fd36f6f3fe5b733dfe4546c24725eee40df605.1628705069.git.asml.silence@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We have no hard/soft IRQ users of this lock left, remove any IRQ disabling/saving and restoring when grabbing this lock. This is straight forward with no users entering with IRQs disabled anymore, the only thing to look out for is the waitqueue poll head lock which nests inside the completion lock. That needs IRQs disabled, and hence we have to do that now instead of relying on the outer lock doing so. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
This is in preparation to making the completion lock work outside of hard/soft IRQ context. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
This is in preparation to making the completion lock work outside of hard/soft IRQ context. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
This is in preparation to making the completion lock work outside of hard/soft IRQ context. Add a timeout_lock to handle the ordering of timeout completions or cancelations with the timeouts actually triggering. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
For requests with non-fixed files, instead of grabbing just one reference, we get by the number of left requests, so the following requests using the same file can take it without atomics. However, it's not all win. If there is one request in the middle not using files or having a fixed file, we'll need to put back the left references. Even worse if an application submits requests dealing with different files, it will do a put for each new request, so doubling the number of atomics needed. Also, even if not used, it's still takes some cycles in the submission path. If a file used many times, it rather makes sense to pre-register it, if not, we may fall in the described pitfall. So, this optimisation is a matter of use case. Go with the simpliest code-wise way, remove it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-