- 28 Sep, 2020 1 commit
-
-
Jens Axboe authored
syzbot reports a crash with tty polling, which is using the double poll handling: general protection fault, probably for non-canonical address 0xdffffc0000000009: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000048-0x000000000000004f] CPU: 0 PID: 6874 Comm: syz-executor749 Not tainted 5.9.0-rc6-next-20200924-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:io_poll_get_single fs/io_uring.c:4778 [inline] RIP: 0010:io_poll_double_wake+0x51/0x510 fs/io_uring.c:4845 Code: fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 9e 03 00 00 48 b8 00 00 00 00 00 fc ff df 49 8b 5d 08 48 8d 7b 48 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 06 0f 8e 63 03 00 00 0f b6 6b 48 bf 06 00 00 RSP: 0018:ffffc90001c1fb70 EFLAGS: 00010006 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000004 RDX: 0000000000000009 RSI: ffffffff81d9b3ad RDI: 0000000000000048 RBP: dffffc0000000000 R08: ffff8880a3cac798 R09: ffffc90001c1fc60 R10: fffff52000383f73 R11: 0000000000000000 R12: 0000000000000004 R13: ffff8880a3cac798 R14: ffff8880a3cac7a0 R15: 0000000000000004 FS: 0000000001f98880(0000) GS:ffff8880ae400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f18886916c0 CR3: 0000000094c5a000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __wake_up_common+0x147/0x650 kernel/sched/wait.c:93 __wake_up_common_lock+0xd0/0x130 kernel/sched/wait.c:123 tty_ldisc_hangup+0x1cf/0x680 drivers/tty/tty_ldisc.c:735 __tty_hangup.part.0+0x403/0x870 drivers/tty/tty_io.c:625 __tty_hangup drivers/tty/tty_io.c:575 [inline] tty_vhangup+0x1d/0x30 drivers/tty/tty_io.c:698 pty_close+0x3f5/0x550 drivers/tty/pty.c:79 tty_release+0x455/0xf60 drivers/tty/tty_io.c:1679 __fput+0x285/0x920 fs/file_table.c:281 task_work_run+0xdd/0x190 kernel/task_work.c:141 tracehook_notify_resume include/linux/tracehook.h:188 [inline] exit_to_user_mode_loop kernel/entry/common.c:165 [inline] exit_to_user_mode_prepare+0x1e2/0x1f0 kernel/entry/common.c:192 syscall_exit_to_user_mode+0x7a/0x2c0 kernel/entry/common.c:267 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x401210 which is due to a failure in removing the double poll wait entry if we hit a wakeup match. This can cause multiple invocations of the wakeup, which isn't safe. Cc: stable@vger.kernel.org # v5.8 Reported-by: syzbot+81b3883093f772addf6d@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
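A rough sketch of the kind of fix described above: detach the double-poll wait entry inside the wake handler so a second wakeup can never walk it again. Names follow the trace (io_poll_double_wake, io_poll_get_single), but the body is illustrative, not the exact patch:

    static int io_poll_double_wake(struct wait_queue_entry *wait, unsigned mode,
                                   int sync, void *key)
    {
            struct io_kiocb *req = wait->private;
            struct io_poll_iocb *poll = io_poll_get_single(req);

            if (poll && poll->head) {
                    bool done;

                    spin_lock(&poll->head->lock);
                    done = list_empty(&poll->wait.entry);
                    if (!done)
                            /* Remove the entry so a later wakeup (e.g. the tty
                             * hangup above) cannot hit it again once the poll
                             * request has already completed. */
                            list_del_init(&poll->wait.entry);
                    spin_unlock(&poll->head->lock);
                    if (!done) {
                            /* ...normal poll completion/queueing follows... */
                    }
            }
            return 1;
    }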
-
- 25 Sep, 2020 3 commits
-
-
Jens Axboe authored
A previous commit for fixing up short reads botched the async retry path, so we ended up going to worker threads more often than we should. Fix this up so that retries work the way they were originally intended to. Fixes: 227c0c96 ("io_uring: internally retry short reads") Reported-by: Hao_Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
This causes all the bios to be submitted with REQ_NOWAIT, which can be problematic on btrfs, or on file systems that otherwise use a mix of block devices where only some of them support it. For now, just remove the setting of plug->nowait = true. Reported-by: Dan Melnic <dmm@fb.com> Reported-by: Brian Foster <bfoster@redhat.com> Fixes: b63534c4 ("io_uring: re-issue block requests that failed because of resources") Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
If we cancel these requests, we'll leak the memory associated with the filename. Add them to the table of ops that need cleaning, if REQ_F_NEED_CLEANUP is set. Cc: stable@vger.kernel.org Fixes: e62753e4 ("io_uring: call statx directly") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
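As a rough illustration of the described change (the cleanup hook and the req->statx.filename field are taken from the commit text; the body is a sketch, not the actual patch):

    static void io_clean_op(struct io_kiocb *req)
    {
            if (!(req->flags & REQ_F_NEED_CLEANUP))
                    return;

            switch (req->opcode) {
            case IORING_OP_STATX:
                    /* The filename was getname()'d at prep time; if the request
                     * is cancelled before issue it must be put here, or the
                     * memory is leaked. */
                    if (req->statx.filename)
                            putname(req->statx.filename);
                    break;
            }
            req->flags &= ~REQ_F_NEED_CLEANUP;
    }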
-
- 21 Sep, 2020 5 commits
-
-
Jens Axboe authored
A previous commit unified how we handle prep for these two functions, but this means that we check the allowed context (SQPOLL, specifically) later than we should. Move the ring type checking into the two parent functions, instead of doing it after we've done some setup work. Fixes: ec65fea5 ("io_uring: deduplicate io_openat{,2}_prep()") Reported-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
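A minimal sketch of moving the ring-type check into the parent prep function, before any setup work; __io_openat_prep stands in for the shared helper mentioned in the Fixes tag and is an assumption:

    static int io_openat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
    {
            /* Reject up front, before getname() or any other prep work has
             * been done, so there is nothing to undo on failure. */
            if (unlikely(req->ctx->flags & IORING_SETUP_SQPOLL))
                    return -EINVAL;
            return __io_openat_prep(req, sqe);
    }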
-
Jens Axboe authored
These will naturally fail when attempted through SQPOLL, but with either -EFAULT or -EBADF. Make it explicit that these are not workable through SQPOLL and return -EINVAL, just like other ops that need to use ->files. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Douglas Gilbert authored
It would seem none of the kernel continuous integration does this: $ cd tools/io_uring $ make Otherwise it may have noticed: cc -Wall -Wextra -g -D_GNU_SOURCE -c -o io_uring-bench.o io_uring-bench.c io_uring-bench.c:133:12: error: static declaration of ‘gettid’ follows non-static declaration 133 | static int gettid(void) | ^~~~~~ In file included from /usr/include/unistd.h:1170, from io_uring-bench.c:27: /usr/include/x86_64-linux-gnu/bits/unistd_ext.h:34:16: note: previous declaration of ‘gettid’ was here 34 | extern __pid_t gettid (void) __THROW; | ^~~~~~ make: *** [<builtin>: io_uring-bench.o] Error 1 The problem on Ubuntu 20.04 (with lk 5.9.0-rc5) is that unistd.h already defines gettid(). So prefix the local definition with "lk_". Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
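A minimal sketch of the prefixed helper in tools/io_uring/io_uring-bench.c; the wrapper body is an assumption, the "lk_" rename comes from the commit text:

    #include <unistd.h>
    #include <sys/syscall.h>

    /* glibc 2.30+ declares gettid() in unistd.h, so a local definition with
     * the same name no longer compiles; a prefixed wrapper avoids the clash. */
    static int lk_gettid(void)
    {
            return syscall(__NR_gettid);
    }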
-
Jens Axboe authored
Some block devices, like dm, bubble back -EAGAIN through the completion handler. We check for this in io_read(), but don't honor it when we have copied the iov. Return -EAGAIN for this case before retrying, to force punt to io-wq. Fixes: bcf5a063 ("io_uring: support true async buffered reads, if file provides it") Reported-by: Zorro Lang <zlang@redhat.com> Tested-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
If we already have mapped the necessary data for retry, then don't set it up again. It's a pointless operation, and we leak the iovec if it's a large (non-stack) vec. Fixes: b63534c4 ("io_uring: re-issue block requests that failed because of resources") Signed-off-by: Jens Axboe <axboe@kernel.dk>
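A hedged sketch of the guard this describes, assuming the imported iovec state lives in req->async_data; io_setup_async_state is a hypothetical helper standing in for the normal import path:

    static int io_resubmit_prep(struct io_kiocb *req)
    {
            /* If the first attempt already imported and stored the iovec,
             * reuse it; importing again would leak the earlier allocation
             * for large (non-stack) vectors. */
            if (req->async_data)
                    return 0;

            return io_setup_async_state(req);   /* hypothetical helper */
    }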
-
- 14 Sep, 2020 2 commits
-
-
Jens Axboe authored
This isn't safe, and isn't needed either. We are guaranteed that any work we queue is on a live task (and will be run), or it goes to our backup io-wq threads if the task is exiting. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
If task_work ends up being marked for cancelation, we go through a cancelation helper instead of the queue path. In converting task_work to always hold a ctx reference, this path was missed. Make sure that io_req_task_cancel() puts the reference that is being held against the ctx. Fixes: 6d816e08 ("io_uring: hold 'ctx' reference around task_work queue + execute") Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 13 Sep, 2020 1 commit
-
-
Jens Axboe authored
Always grab work environment for deferred links. The assumption that we will be running it always from the task in question is false, as exiting tasks may mean that we're deferring this one to a thread helper. And at that point it's too late to grab the work environment. Fixes: debb85f4 ("io_uring: factor out grab_env() from defer_prep()") Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 05 Sep, 2020 3 commits
-
-
Pavel Begunkov authored
While looking for ->files in ->defer_list, consider that requests there may actually be links. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
While trying to cancel requests with ->files, it also should look for requests in ->defer_list, otherwise it might end up hanging a thread. Cancel all requests in ->defer_list up to the last request there with matching ->files, that's needed to follow drain ordering semantics. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
If we exceed UIO_FASTIOV, we don't handle the transition correctly between an allocated vec for requests that are queued with IOSQE_ASYNC. Store the iovec appropriately and re-set it in the iter iov in case it changed. Fixes: ff6165b2 ("io_uring: retain iov_iter state over io_read/io_write calls") Reported-by: Nick Hill <nick@nickhill.org> Tested-by: Norman Maurer <norman.maurer@googlemail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 02 Sep, 2020 2 commits
-
-
Jens Axboe authored
Actually two things that need fixing up here: - The io_rw_reissue() -EAGAIN retry is specific to block devices and regular files, so don't ever attempt to do that on other types of files. - If we hit -EAGAIN on a nonblock-marked file, don't arm the poll handler for it. It should just complete with -EAGAIN. Cc: stable@vger.kernel.org Reported-by: Norman Maurer <norman.maurer@googlemail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
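A minimal sketch of the file-type gate for the first point (illustrative, assuming a helper of this shape rather than the exact patch):

    static bool io_rw_should_reissue(struct io_kiocb *req)
    {
            umode_t mode = file_inode(req->file)->i_mode;

            /* Re-issuing a blocking attempt only makes sense for block
             * devices and regular files; sockets, pipes etc. should just
             * see the -EAGAIN. */
            return S_ISBLK(mode) || S_ISREG(mode);
    }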
-
Jiufei Xue authored
When io_sqe_file_register() fails in __io_sqe_files_update(), table->files[i] still points to the original file, which may be freed soon, and that will trigger use-after-free problems. Cc: stable@vger.kernel.org Fixes: f3bd9dae ("io_uring: fix memleak in __io_sqe_files_update()") Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
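A rough fragment of the update loop showing the idea (a sketch under the assumption that the slot is simply cleared on registration failure, not the verbatim patch):

    ret = io_sqe_file_register(ctx, file, i);
    if (ret) {
            /* Don't leave table->files[i] pointing at a file we are about
             * to drop; a later unregister would otherwise touch freed
             * memory. */
            table->files[i] = NULL;
            fput(file);
            break;
    }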
-
- 01 Sep, 2020 1 commit
-
-
Jiufei Xue authored
Index here is already the position of the file in the fixed_file_table, so we should not use io_file_from_index() again to get it. Otherwise, the wrong file, which is still in use, may be released unexpectedly. Cc: stable@vger.kernel.org # v5.6 Fixes: 05f3fb3c ("io_uring: avoid ring quiesce for fixed file set unregister and update") Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 27 Aug, 2020 3 commits
-
-
Jens Axboe authored
These events happen inline from submission, so there's no need to bounce them through the original task. Just set them up for retry and issue retry directly instead of going over task_work. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
This normally isn't hit, as polling is mostly done on NVMe with deep queue depths. But if we do run into request starvation, we need to ensure that retries are properly serialized. Reported-by: Andres Freund <andres@anarazel.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Make sure we clear req->result, which was set to -EAGAIN for retry purposes, when moving it to the reissue list. Otherwise we can end up retrying a request more than once, which leads to weird results in the io-wq handling (and other spots). Cc: stable@vger.kernel.org Reported-by: Andres Freund <andres@anarazel.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 26 Aug, 2020 1 commit
-
-
Jens Axboe authored
The man page for io_uring generally claims we're consistent with what preadv2 and pwritev2 accept, but it turns out there's a slight discrepancy in how offset == -1 is handled for pipes/streams. preadv doesn't allow it, but preadv2 does. This currently causes io_uring to return -EINVAL if that is attempted, but we should allow that as documented. This change makes us consistent with preadv2/pwritev2 by just passing in a NULL ppos for streams if the offset is -1. Cc: stable@vger.kernel.org # v5.7+ Reported-by: Benedikt Ames <wisp3rwind@posteo.eu> Signed-off-by: Jens Axboe <axboe@kernel.dk>
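One way to express that rule, as a hedged sketch (the helper name and placement are assumptions; only the -1 to NULL-ppos mapping comes from the commit text):

    static loff_t *io_kiocb_ppos(struct kiocb *kiocb)
    {
            /* Mirror preadv2()/pwritev2(): an offset of -1 means "use the
             * current position", which for a stream is a NULL ppos. */
            return kiocb->ki_pos == -1 ? NULL : &kiocb->ki_pos;
    }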
-
- 25 Aug, 2020 4 commits
-
-
Jens Axboe authored
We need to call kiocb_done() for any ret < 0 to ensure that we always get the proper -ERESTARTSYS (and friends) transformation done. At some point this should be tied into general error handling, so we can get rid of the various (mostly network) related commands that check and perform this substitution. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
There's no point in using the poll handler if we can't do a nonblocking IO attempt of the operation, since we'll need to go async anyway. In fact this is actively harmful, as reading from e.g. pipes won't return 0 to indicate EOF. Cc: stable@vger.kernel.org # v5.7+ Reported-by: Benedikt Ames <wisp3rwind@posteo.eu> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We do the initial accounting of locked_vm and pinned_vm before we have setup ctx->sqo_mm, which means we can end up having not accounted the memory at setup time, but still decrement it when we exit. This causes an imbalance in the accounting. Setup ctx->sqo_mm earlier in io_uring_create(), before we do the first accounting of mm->{locked,pinned}_vm. This also unifies the state grabbing for the ctx, and eliminates a failure case in io_sq_offload_start(). Fixes: f74441e6 ("io_uring: account locked memory before potential error case") Reported-by: Robert M. Muncrief <rmuncrief@humanavance.com> Reported-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Robert M. Muncrief <rmuncrief@humanavance.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Some consumers of the iov_iter will return an error, but still have bytes consumed in the iterator. This is an issue for -EAGAIN, since we rely on a sane iov_iter state across retries. Fix this by reverting any bytes the file operations have consumed from the iterator. This is similar to what generic_file_read_iter() does, and is always safe as we have the previous bytes count handy already. Fixes: ff6165b2 ("io_uring: retain iov_iter state over io_read/io_write calls") Reported-by: Dmitry Shulyak <yashulyak@gmail.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
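A minimal sketch of the revert pattern around a read attempt (the wrapper and the req->rw.kiocb field are illustrative assumptions; iov_iter_revert() is the real API):

    static ssize_t io_read_iter(struct io_kiocb *req, struct iov_iter *iter)
    {
            size_t orig_count = iov_iter_count(iter);
            ssize_t ret;

            ret = call_read_iter(req->file, &req->rw.kiocb, iter);
            if (ret < 0)
                    /* Some ->read_iter() implementations consume iterator bytes
                     * even when failing (e.g. with -EAGAIN); roll those back so
                     * a retry starts from the correct position. */
                    iov_iter_revert(iter, orig_count - iov_iter_count(iter));
            return ret;
    }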
-
- 23 Aug, 2020 2 commits
-
-
Pavel Begunkov authored
Don't forget to update wqe->hash_tail after cancelling a pending work item, if it was hashed. Cc: stable@vger.kernel.org # 5.7+ Reported-by: Dmitry Shulyak <yashulyak@gmail.com> Fixes: 86f3cd1b ("io-wq: handle hashed writes in chains") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
If an application is doing reads on signalfd, and we arm the poll handler because there's no data available, then the wakeup can recurse on the task's sighand->siglock, as the signal delivery from task_work_add() will use TWA_SIGNAL and that attempts to lock it again. We can detect the signalfd case pretty easily by comparing the poll->head wait_queue_head_t with the target task's signalfd wait queue. Just use normal task wakeup for this case. Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Jens Axboe <axboe@kernel.dk>
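A hedged sketch of that comparison (the surrounding function is illustrative; only the signalfd_wqh comparison and the TWA_SIGNAL/TWA_RESUME distinction come from the commit text):

    static void io_req_task_work_add(struct io_kiocb *req)
    {
            struct task_struct *tsk = req->task;
            int notify = TWA_SIGNAL;

            /* TWA_SIGNAL takes sighand->siglock, but a signalfd poll wakeup
             * is already running under that lock, so recursing would
             * deadlock. Detect it via the wait queue head and fall back to
             * a plain resume notification. */
            if (req->poll.head == &tsk->sighand->signalfd_wqh)
                    notify = TWA_RESUME;

            if (task_work_add(tsk, &req->task_work, notify))
                    return; /* task is exiting; caller falls back elsewhere */
    }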
-
- 20 Aug, 2020 3 commits
-
-
Pavel Begunkov authored
If io_import_iovec() returns an error, the returned iovec is undefined and must not be used, so don't set it to NULL when failing. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
kfree() handles NULL pointers well, but io_{read,write}() check it for performance reasons. Leave a comment there for those who are tempted to patch it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
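Roughly how such a comment might read at the call site (a sketch, not the committed wording):

    /* kfree(NULL) is perfectly fine, but the explicit check avoids an
     * unnecessary function call in the common case where the inline
     * (on-stack) iovec was used and nothing was allocated. */
    if (iovec)
            kfree(iovec);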
-
Pavel Begunkov authored
Setting and clearing REQ_F_OVERFLOW in io_uring_cancel_files() and io_cqring_overflow_flush() are racy, because they might be called asynchronously. The REQ_F_OVERFLOW flag is only needed for files cancellation, so if it can be guaranteed that requests _currently_ marked inflight can't be overflown, the problem will be solved by removing the flag altogether. That's how the patch works: it removes the inflight status of a request in io_cqring_fill_event() whenever it should be thrown into the CQ-overflow list. That's OK to do, because no opcode-specific handling can be done after io_cqring_fill_event(), the same assumption as with the "struct io_completion" patches. And there is already a good place for such cleanups, which is io_clean_op(). A nice side effect of this is removing the inflight check from the hot path. Note on synchronisation: now __io_cqring_fill_event() may be taking two spinlocks simultaneously, completion_lock and inflight_lock. That's fine, because we never take them in reverse order, and CQ-overflow of inflight requests shouldn't happen often. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 19 Aug, 2020 1 commit
-
-
Jens Axboe authored
We currently use system_wq, which is unbounded in terms of number of workers. This means that if we're exiting tons of rings at the same time, then we'll briefly spawn tons of event kworkers just for a very short blocking time as the rings exit. Use system_unbound_wq instead, which has a sane cap on the concurrency level. Signed-off-by: Jens Axboe <axboe@kernel.dk>
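A minimal sketch of the workqueue switch (ctx->exit_work and io_ring_exit_work are taken as read from the surrounding io_uring code; the function shown is illustrative):

    static void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
    {
            INIT_WORK(&ctx->exit_work, io_ring_exit_work);
            /* system_unbound_wq has a sane concurrency cap, so tearing down
             * many rings at once no longer spawns a kworker per ring just
             * for a brief blocking wait. */
            queue_work(system_unbound_wq, &ctx->exit_work);
    }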
-
- 18 Aug, 2020 1 commit
-
-
Jens Axboe authored
io_rw_prep_async() goes through a dance of clearing req->io, calling the iovec import, then re-setting req->io. Provide an internal helper that does the right thing without needing state tweaked to get there. This enables further cleanups in io_read, io_write, and io_resubmit_prep(), but that's left for another time. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 16 Aug, 2020 7 commits
-
-
Jens Axboe authored
The 5.9 merge moved this function into io_uring, which means that we don't need to retain the generic nature of it. Clean up this part by removing redundant checks, and just inlining the small remainder in io_rw_should_retry(). No functional changes in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Commit f254ac04 ("io_uring: enable lookup of links holding inflight files") only handled two of the three head link cases we have; we also need to look up and cancel work that is blocked in io-wq if that work has a link that's holding a reference to the files structure. Put the "cancel head links that hold this request pending" logic into io_attempt_cancel(), which will go through the motions of finding and canceling head links that hold the current inflight-files request pending. Cc: stable@vger.kernel.org Reported-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
git://git.kernel.dk/linux-block
Linus Torvalds authored
Pull io_uring fixes from Jens Axboe: "A few different things in here. Seems like syzbot got some more io_uring bits wired up, and we got a handful of reports and the associated fixes are in here. General fixes too, and a lot of them marked for stable. Lastly, a bit of fallout from the async buffered reads, where we now more easily trigger short reads. Some applications don't really like that, so the io_read() code now handles short reads internally, and got a cleanup along the way so that it's now easier to read (and documented). We're now passing tests that failed before" * tag 'io_uring-5.9-2020-08-15' of git://git.kernel.dk/linux-block: io_uring: short circuit -EAGAIN for blocking read attempt io_uring: sanitize double poll handling io_uring: internally retry short reads io_uring: retain iov_iter state over io_read/io_write calls task_work: only grab task signal lock when needed io_uring: enable lookup of links holding inflight files io_uring: fail poll arm on queue proc failure io_uring: hold 'ctx' reference around task_work queue + execute fs: RWF_NOWAIT should imply IOCB_NOIO io_uring: defer file table grabbing request cleanup for locked requests io_uring: add missing REQ_F_COMP_LOCKED for nested requests io_uring: fix recursive completion locking on oveflow flush io_uring: use TWA_SIGNAL for task_work uncondtionally io_uring: account locked memory before potential error case io_uring: set ctx sq/cq entry count earlier io_uring: Fix NULL pointer dereference in loop_rw_iter() io_uring: add comments on how the async buffered read retry works io_uring: io_async_buf_func() need not test page bit
-
Mike Rapoport authored
Commit 1355c31e ("asm-generic: pgalloc: provide generic pmd_alloc_one() and pmd_free_one()") converted parisc to use generic version of pmd_alloc_one() but it missed the fact that parisc uses order-1 pages for PMD. Restore the original version of pmd_alloc_one() for parisc, just use GFP_PGTABLE_KERNEL that implies __GFP_ZERO instead of GFP_KERNEL and memset. Fixes: 1355c31e ("asm-generic: pgalloc: provide generic pmd_alloc_one() and pmd_free_one()") Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Tested-by: Meelis Roos <mroos@linux.ee> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lkml.kernel.org/r/9f2b5ebd-e4a4-0fa1-6cd3-4b9f6892d1ad@linux.ee Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
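A hedged sketch of the restored parisc helper, assuming the arch's PMD_ORDER definition; GFP_PGTABLE_KERNEL already includes __GFP_ZERO, which is the point of the change:

    static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
    {
            /* parisc PMDs span order-1 (two contiguous pages), which the
             * generic helper does not handle; no memset() is needed since
             * GFP_PGTABLE_KERNEL zeroes the pages. */
            return (pmd_t *)__get_free_pages(GFP_PGTABLE_KERNEL, PMD_ORDER);
    }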
-
git://git.kernel.dk/linux-block
Linus Torvalds authored
Pull block fixes from Jens Axboe: "A few fixes on the block side of things: - Discard granularity fix (Coly) - rnbd cleanups (Guoqing) - md error handling fix (Dan) - md sysfs fix (Junxiao) - Fix flush request accounting, which caused an IO slowdown for some configurations (Ming) - Properly propagate loop flag for partition scanning (Lennart)" * tag 'block-5.9-2020-08-14' of git://git.kernel.dk/linux-block: block: fix double account of flush request's driver tag loop: unset GENHD_FL_NO_PART_SCAN on LOOP_CONFIGURE rnbd: no need to set bi_end_io in rnbd_bio_map_kern rnbd: remove rnbd_dev_submit_io md-cluster: Fix potential error pointer dereference in resize_bitmaps() block: check queue's limits.discard_granularity in __blkdev_issue_discard() md: get sysfs entry after redundancy attr group create
-
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Linus Torvalds authored
Pull RISC-V fix from Palmer Dabbelt: "I collected a single fix during the merge window: we managed to break the early trap setup on !MMU, this fixes it" * tag 'riscv-for-linus-5.9-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Setup exception vector for nommu platform
-