1. 18 Feb, 2021 10 commits
  2. 17 Feb, 2021 1 commit
  3. 16 Feb, 2021 1 commit
  4. 15 Feb, 2021 1 commit
  5. 13 Feb, 2021 3 commits
  6. 12 Feb, 2021 10 commits
  7. 11 Feb, 2021 6 commits
  8. 10 Feb, 2021 8 commits
    • Colin Ian King's avatar
      io_uring: remove redundant initialization of variable ret · 4a245479
      Colin Ian King authored
      The variable ret is being initialized with a value that is never read
      and it is being updated later with a new value.  The initialization is
      redundant and can be removed.
      
      Addresses-Coverity: ("Unused value")
      Fixes: b63534c4 ("io_uring: re-issue block requests that failed because of resources")
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Reviewed-by: default avatarChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      4a245479
    • Pavel Begunkov's avatar
      io_uring: unpark SQPOLL thread for cancelation · 34343786
      Pavel Begunkov authored
      We park SQPOLL task before going into io_uring_cancel_files(), so the
      task won't run task_works including those that might be important for
      the cancellation passes. In this case it's io_poll_remove_one(), which
      frees requests via io_put_req_deferred().
      
      Unpark it for while waiting, it's ok as we disable submissions
      beforehand, so no new requests will be generated.
      
      INFO: task syz-executor893:8493 blocked for more than 143 seconds.
      Call Trace:
       context_switch kernel/sched/core.c:4327 [inline]
       __schedule+0x90c/0x21a0 kernel/sched/core.c:5078
       schedule+0xcf/0x270 kernel/sched/core.c:5157
       io_uring_cancel_files fs/io_uring.c:8912 [inline]
       io_uring_cancel_task_requests+0xe70/0x11a0 fs/io_uring.c:8979
       __io_uring_files_cancel+0x110/0x1b0 fs/io_uring.c:9067
       io_uring_files_cancel include/linux/io_uring.h:51 [inline]
       do_exit+0x2fe/0x2ae0 kernel/exit.c:780
       do_group_exit+0x125/0x310 kernel/exit.c:922
       __do_sys_exit_group kernel/exit.c:933 [inline]
       __se_sys_exit_group kernel/exit.c:931 [inline]
       __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:931
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Cc: stable@vger.kernel.org # 5.5+
      Reported-by: syzbot+695b03d82fa8e4901b06@syzkaller.appspotmail.com
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      34343786
    • Jens Axboe's avatar
      io_uring: place ring SQ/CQ arrays under memcg memory limits · 26bfa89e
      Jens Axboe authored
      Instead of imposing rlimit memlock limits for the rings themselves,
      ensure that we account them properly under memcg with __GFP_ACCOUNT.
      We retain rlimit memlock for registered buffers, this is just for the
      ring arrays themselves.
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      26bfa89e
    • Jens Axboe's avatar
      io_uring: enable kmemcg account for io_uring requests · 91f245d5
      Jens Axboe authored
      This puts io_uring under the memory cgroups accounting and limits for
      requests.
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      91f245d5
    • Jens Axboe's avatar
      io_uring: enable req cache for IRQ driven IO · c7dae4ba
      Jens Axboe authored
      This is the last class of requests that cannot utilize the req alloc
      cache. Add a per-ctx req cache that is protected by the completion_lock,
      and refill our submit side cache when it gets over our batch count.
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      c7dae4ba
    • Hao Xu's avatar
      io_uring: fix possible deadlock in io_uring_poll · ed670c3f
      Hao Xu authored
      Abaci reported follow issue:
      
      [   30.615891] ======================================================
      [   30.616648] WARNING: possible circular locking dependency detected
      [   30.617423] 5.11.0-rc3-next-20210115 #1 Not tainted
      [   30.618035] ------------------------------------------------------
      [   30.618914] a.out/1128 is trying to acquire lock:
      [   30.619520] ffff88810b063868 (&ep->mtx){+.+.}-{3:3}, at: __ep_eventpoll_poll+0x9f/0x220
      [   30.620505]
      [   30.620505] but task is already holding lock:
      [   30.621218] ffff88810e952be8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_enter+0x3f0/0x5b0
      [   30.622349]
      [   30.622349] which lock already depends on the new lock.
      [   30.622349]
      [   30.623289]
      [   30.623289] the existing dependency chain (in reverse order) is:
      [   30.624243]
      [   30.624243] -> #1 (&ctx->uring_lock){+.+.}-{3:3}:
      [   30.625263]        lock_acquire+0x2c7/0x390
      [   30.625868]        __mutex_lock+0xae/0x9f0
      [   30.626451]        io_cqring_overflow_flush.part.95+0x6d/0x70
      [   30.627278]        io_uring_poll+0xcb/0xd0
      [   30.627890]        ep_item_poll.isra.14+0x4e/0x90
      [   30.628531]        do_epoll_ctl+0xb7e/0x1120
      [   30.629122]        __x64_sys_epoll_ctl+0x70/0xb0
      [   30.629770]        do_syscall_64+0x2d/0x40
      [   30.630332]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   30.631187]
      [   30.631187] -> #0 (&ep->mtx){+.+.}-{3:3}:
      [   30.631985]        check_prevs_add+0x226/0xb00
      [   30.632584]        __lock_acquire+0x1237/0x13a0
      [   30.633207]        lock_acquire+0x2c7/0x390
      [   30.633740]        __mutex_lock+0xae/0x9f0
      [   30.634258]        __ep_eventpoll_poll+0x9f/0x220
      [   30.634879]        __io_arm_poll_handler+0xbf/0x220
      [   30.635462]        io_issue_sqe+0xa6b/0x13e0
      [   30.635982]        __io_queue_sqe+0x10b/0x550
      [   30.636648]        io_queue_sqe+0x235/0x470
      [   30.637281]        io_submit_sqes+0xcce/0xf10
      [   30.637839]        __x64_sys_io_uring_enter+0x3fb/0x5b0
      [   30.638465]        do_syscall_64+0x2d/0x40
      [   30.638999]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   30.639643]
      [   30.639643] other info that might help us debug this:
      [   30.639643]
      [   30.640618]  Possible unsafe locking scenario:
      [   30.640618]
      [   30.641402]        CPU0                    CPU1
      [   30.641938]        ----                    ----
      [   30.642664]   lock(&ctx->uring_lock);
      [   30.643425]                                lock(&ep->mtx);
      [   30.644498]                                lock(&ctx->uring_lock);
      [   30.645668]   lock(&ep->mtx);
      [   30.646321]
      [   30.646321]  *** DEADLOCK ***
      [   30.646321]
      [   30.647642] 1 lock held by a.out/1128:
      [   30.648424]  #0: ffff88810e952be8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_enter+0x3f0/0x5b0
      [   30.649954]
      [   30.649954] stack backtrace:
      [   30.650592] CPU: 1 PID: 1128 Comm: a.out Not tainted 5.11.0-rc3-next-20210115 #1
      [   30.651554] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [   30.652290] Call Trace:
      [   30.652688]  dump_stack+0xac/0xe3
      [   30.653164]  check_noncircular+0x11e/0x130
      [   30.653747]  ? check_prevs_add+0x226/0xb00
      [   30.654303]  check_prevs_add+0x226/0xb00
      [   30.654845]  ? add_lock_to_list.constprop.49+0xac/0x1d0
      [   30.655564]  __lock_acquire+0x1237/0x13a0
      [   30.656262]  lock_acquire+0x2c7/0x390
      [   30.656788]  ? __ep_eventpoll_poll+0x9f/0x220
      [   30.657379]  ? __io_queue_proc.isra.88+0x180/0x180
      [   30.658014]  __mutex_lock+0xae/0x9f0
      [   30.658524]  ? __ep_eventpoll_poll+0x9f/0x220
      [   30.659112]  ? mark_held_locks+0x5a/0x80
      [   30.659648]  ? __ep_eventpoll_poll+0x9f/0x220
      [   30.660229]  ? _raw_spin_unlock_irqrestore+0x2d/0x40
      [   30.660885]  ? trace_hardirqs_on+0x46/0x110
      [   30.661471]  ? __io_queue_proc.isra.88+0x180/0x180
      [   30.662102]  ? __ep_eventpoll_poll+0x9f/0x220
      [   30.662696]  __ep_eventpoll_poll+0x9f/0x220
      [   30.663273]  ? __ep_eventpoll_poll+0x220/0x220
      [   30.663875]  __io_arm_poll_handler+0xbf/0x220
      [   30.664463]  io_issue_sqe+0xa6b/0x13e0
      [   30.664984]  ? __lock_acquire+0x782/0x13a0
      [   30.665544]  ? __io_queue_proc.isra.88+0x180/0x180
      [   30.666170]  ? __io_queue_sqe+0x10b/0x550
      [   30.666725]  __io_queue_sqe+0x10b/0x550
      [   30.667252]  ? __fget_files+0x131/0x260
      [   30.667791]  ? io_req_prep+0xd8/0x1090
      [   30.668316]  ? io_queue_sqe+0x235/0x470
      [   30.668868]  io_queue_sqe+0x235/0x470
      [   30.669398]  io_submit_sqes+0xcce/0xf10
      [   30.669931]  ? xa_load+0xe4/0x1c0
      [   30.670425]  __x64_sys_io_uring_enter+0x3fb/0x5b0
      [   30.671051]  ? lockdep_hardirqs_on_prepare+0xde/0x180
      [   30.671719]  ? syscall_enter_from_user_mode+0x2b/0x80
      [   30.672380]  do_syscall_64+0x2d/0x40
      [   30.672901]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   30.673503] RIP: 0033:0x7fd89c813239
      [   30.673962] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05  3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 ec 2c 00 f7 d8 64 89 01 48
      [   30.675920] RSP: 002b:00007ffc65a7c628 EFLAGS: 00000217 ORIG_RAX: 00000000000001aa
      [   30.676791] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd89c813239
      [   30.677594] RDX: 0000000000000000 RSI: 0000000000000014 RDI: 0000000000000003
      [   30.678678] RBP: 00007ffc65a7c720 R08: 0000000000000000 R09: 0000000003000000
      [   30.679492] R10: 0000000000000000 R11: 0000000000000217 R12: 0000000000400ff0
      [   30.680282] R13: 00007ffc65a7c840 R14: 0000000000000000 R15: 0000000000000000
      
      This might happen if we do epoll_wait on a uring fd while reading/writing
      the former epoll fd in a sqe in the former uring instance.
      So let's don't flush cqring overflow list, just do a simple check.
      Reported-by: default avatarAbaci <abaci@linux.alibaba.com>
      Fixes: 6c503150 ("io_uring: patch up IOPOLL overflow_flush sync")
      Signed-off-by: default avatarHao Xu <haoxu@linux.alibaba.com>
      Reviewed-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      ed670c3f
    • Pavel Begunkov's avatar
      io_uring: defer flushing cached reqs · e5d1bc0a
      Pavel Begunkov authored
      Awhile there are requests in the allocation cache -- use them, only if
      those ended go for the stashed memory in comp.free_list. As list
      manipulation are generally heavy and are not good for caches, flush them
      all or as much as can in one go.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      [axboe: return success/failure from io_flush_cached_reqs()]
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      e5d1bc0a
    • Pavel Begunkov's avatar
      io_uring: take comp_state from ctx · c5eef2b9
      Pavel Begunkov authored
      __io_queue_sqe() is always called with a non-NULL comp_state, which is
      taken directly from context. Don't pass it around but infer from ctx.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      c5eef2b9