    io_uring: fix SQPOLL IORING_OP_CLOSE cancelation state · 607ec89e
    Jens Axboe authored
    IORING_OP_CLOSE is special in terms of cancelation, since it has an
    intermediate state where we've removed the file descriptor but haven't
    closed the file yet. For that reason, it's currently marked with
    IO_WQ_WORK_NO_CANCEL to prevent cancelation. This ensures that the op
    is always run even if canceled, so we are never left with a live file
    but an fd that is gone. However, with SQPOLL, a cancel request doesn't
    carry any resources on behalf of the request being canceled. If we
    cancel before any part of the close op has run, we can end up with
    io-wq not having ->files assigned. This can result in the following
    oops reported by Joseph:
    
    BUG: kernel NULL pointer dereference, address: 00000000000000d8
    PGD 800000010b76f067 P4D 800000010b76f067 PUD 10b462067 PMD 0
    Oops: 0000 [#1] SMP PTI
    CPU: 1 PID: 1788 Comm: io_uring-sq Not tainted 5.11.0-rc4 #1
    Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
    RIP: 0010:__lock_acquire+0x19d/0x18c0
    Code: 00 00 8b 1d fd 56 dd 08 85 db 0f 85 43 05 00 00 48 c7 c6 98 7b 95 82 48 c7 c7 57 96 93 82 e8 9a bc f5 ff 0f 0b e9 2b 05 00 00 <48> 81 3f c0 ca 67 8a b8 00 00 00 00 41 0f 45 c0 89 04 24 e9 81 fe
    RSP: 0018:ffffc90001933828 EFLAGS: 00010002
    RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000d8
    RBP: 0000000000000246 R08: 0000000000000001 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    R13: 0000000000000000 R14: ffff888106e8a140 R15: 00000000000000d8
    FS:  0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000000000d8 CR3: 0000000106efa004 CR4: 00000000003706e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     lock_acquire+0x31a/0x440
     ? close_fd_get_file+0x39/0x160
     ? __lock_acquire+0x647/0x18c0
     _raw_spin_lock+0x2c/0x40
     ? close_fd_get_file+0x39/0x160
     close_fd_get_file+0x39/0x160
     io_issue_sqe+0x1334/0x14e0
     ? lock_acquire+0x31a/0x440
     ? __io_free_req+0xcf/0x2e0
     ? __io_free_req+0x175/0x2e0
     ? find_held_lock+0x28/0xb0
     ? io_wq_submit_work+0x7f/0x240
     io_wq_submit_work+0x7f/0x240
     io_wq_cancel_cb+0x161/0x580
     ? io_wqe_wake_worker+0x114/0x360
     ? io_uring_get_socket+0x40/0x40
     io_async_find_and_cancel+0x3b/0x140
     io_issue_sqe+0xbe1/0x14e0
     ? __lock_acquire+0x647/0x18c0
     ? __io_queue_sqe+0x10b/0x5f0
     __io_queue_sqe+0x10b/0x5f0
     ? io_req_prep+0xdb/0x1150
     ? mark_held_locks+0x6d/0xb0
     ? mark_held_locks+0x6d/0xb0
     ? io_queue_sqe+0x235/0x4b0
     io_queue_sqe+0x235/0x4b0
     io_submit_sqes+0xd7e/0x12a0
     ? _raw_spin_unlock_irq+0x24/0x30
     ? io_sq_thread+0x3ae/0x940
     io_sq_thread+0x207/0x940
     ? do_wait_intr_irq+0xc0/0xc0
     ? __ia32_sys_io_uring_enter+0x650/0x650
     kthread+0x134/0x180
     ? kthread_create_worker_on_cpu+0x90/0x90
     ret_from_fork+0x1f/0x30
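The intermediate state that IO_WQ_WORK_NO_CANCEL guards against can be sketched as a toy userspace model in C. This is an illustration under stated assumptions, not the kernel's code: struct toy_close_state and the toy_* helpers are hypothetical, and the file's lifetime is reduced to a single reference count. The close is split into removing the fd and dropping the file reference. A cancel that lands between the two steps leaves an fd that is gone but a file that is still live.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, NOT kernel code: one fd slot and the file it refers to. */
struct toy_close_state {
    bool fd_present;   /* is the descriptor still in the fd table? */
    int file_refs;     /* references keeping the file open */
};

/* Step 1: remove the descriptor from the table. The table's reference
 * now belongs to the close request, so the count does not change yet. */
static void toy_remove_fd(struct toy_close_state *s)
{
    assert(s->fd_present);
    s->fd_present = false;
}

/* Step 2: drop that reference, which actually closes the file. */
static void toy_drop_file(struct toy_close_state *s)
{
    assert(s->file_refs > 0);
    s->file_refs--;
}

/* The state the no-cancel flag protects against: fd gone, file still live. */
static bool toy_leaked(const struct toy_close_state *s)
{
    return !s->fd_present && s->file_refs > 0;
}
```

For example, after toy_remove_fd() alone, toy_leaked() returns true; only once toy_drop_file() has also run does it return false again. Skipping step 2 because of a cancel is exactly the leak the flag exists to prevent.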
    
    Fix this by deferring the setting of IO_WQ_WORK_NO_CANCEL until _after_
    we've modified the fdtable. Canceling before this point is totally fine,
    as the fd is still intact, and running the rest of the op in the io-wq
    context _after_ that point is also fine.
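A similarly hedged toy model can illustrate why moving the flag avoids the oops. Everything here (struct toy_req, TOY_NO_CANCEL, the toy_* functions, the toy_outcome values) is hypothetical and only mirrors the ordering described in this commit, not the actual io_uring code. Phase 1 touches the fd table through ->files, phase 2 only drops the file reference, and the cancel path runs no-cancel work to completion in a context where ->files may be NULL. Returning TOY_NULL_FILES stands in for the NULL pointer dereference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: names are hypothetical, not the kernel's. */
#define TOY_NO_CANCEL 0x1u

struct toy_files { bool fd_present; };

struct toy_req {
    unsigned flags;
    struct toy_files *files;  /* may be NULL in the canceling context */
    bool fd_removed;          /* phase 1 (fdtable update) has run */
    int file_refs;            /* references keeping the file open */
};

enum toy_outcome { TOY_DROPPED, TOY_COMPLETED, TOY_NULL_FILES };

/* Phase 1 needs ->files: remove the fd from the table. */
static bool toy_close_phase1(struct toy_req *r, bool set_flag_after)
{
    if (r->files == NULL)
        return false;              /* the kernel dereferenced NULL here */
    r->files->fd_present = false;
    r->fd_removed = true;
    if (set_flag_after)
        r->flags |= TOY_NO_CANCEL; /* new ordering: mark after fdtable change */
    return true;
}

/* Phase 2 does not touch ->files: drop the reference, closing the file. */
static void toy_close_phase2(struct toy_req *r)
{
    assert(r->file_refs > 0);
    r->file_refs--;
}

/* Cancel path: no-cancel work is run to completion, anything else is dropped. */
static enum toy_outcome toy_cancel(struct toy_req *r)
{
    if (!(r->flags & TOY_NO_CANCEL))
        return TOY_DROPPED;
    if (!r->fd_removed && !toy_close_phase1(r, false))
        return TOY_NULL_FILES;
    toy_close_phase2(r);
    return TOY_COMPLETED;
}
```

With the old ordering, a request marked TOY_NO_CANCEL at prep time and canceled with ->files == NULL yields TOY_NULL_FILES. With the new ordering, the same early cancel yields TOY_DROPPED and leaves the fd alone, while a cancel after phase 1 yields TOY_COMPLETED with the fd removed and the reference dropped.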
    
    For 5.12, we'll handle this internally and get rid of the no-cancel
    flag, as IORING_OP_CLOSE is the only user of it.
    
    Cc: stable@vger.kernel.org
    Fixes: b5dba59e ("io_uring: add support for IORING_OP_CLOSE")
    Reported-by: "Abaci <abaci@linux.alibaba.com>"
    Reviewed-and-tested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>