• Jens Axboe's avatar
    io_uring: drop ctx->uring_lock before flushing work item · c018db4a
    Jens Axboe authored
    Ammar reports that he's seeing a lockdep splat on running test/rsrc_tags
    from the regression suite:
    
    ======================================================
    WARNING: possible circular locking dependency detected
    5.14.0-rc3-bluetea-test-00249-gc7d10223 #5 Tainted: G           OE
    ------------------------------------------------------
    kworker/2:4/2684 is trying to acquire lock:
    ffff88814bb1c0a8 (&ctx->uring_lock){+.+.}-{3:3}, at: io_rsrc_put_work+0x13d/0x1a0
    
    but task is already holding lock:
    ffffc90001c6be70 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530
    
    which lock already depends on the new lock.
    
    the existing dependency chain (in reverse order) is:
    
    -> #1 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}:
           __flush_work+0x31b/0x490
           io_rsrc_ref_quiesce.part.0.constprop.0+0x35/0xb0
           __do_sys_io_uring_register+0x45b/0x1060
           do_syscall_64+0x35/0xb0
           entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    -> #0 (&ctx->uring_lock){+.+.}-{3:3}:
           __lock_acquire+0x119a/0x1e10
           lock_acquire+0xc8/0x2f0
           __mutex_lock+0x86/0x740
           io_rsrc_put_work+0x13d/0x1a0
           process_one_work+0x236/0x530
           worker_thread+0x52/0x3b0
           kthread+0x135/0x160
           ret_from_fork+0x1f/0x30
    
    other info that might help us debug this:
    
     Possible unsafe locking scenario:
    
           CPU0                    CPU1
           ----                    ----
      lock((work_completion)(&(&ctx->rsrc_put_work)->work));
                                   lock(&ctx->uring_lock);
                                   lock((work_completion)(&(&ctx->rsrc_put_work)->work));
      lock(&ctx->uring_lock);
    
     *** DEADLOCK ***
    
    2 locks held by kworker/2:4/2684:
     #0: ffff88810004d938 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530
     #1: ffffc90001c6be70 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530
    
    stack backtrace:
    CPU: 2 PID: 2684 Comm: kworker/2:4 Tainted: G           OE     5.14.0-rc3-bluetea-test-00249-gc7d10223 #5
    Hardware name: Acer Aspire ES1-421/OLVIA_BE, BIOS V1.05 07/02/2015
    Workqueue: events io_rsrc_put_work
    Call Trace:
     dump_stack_lvl+0x6a/0x9a
     check_noncircular+0xfe/0x110
     __lock_acquire+0x119a/0x1e10
     lock_acquire+0xc8/0x2f0
     ? io_rsrc_put_work+0x13d/0x1a0
     __mutex_lock+0x86/0x740
     ? io_rsrc_put_work+0x13d/0x1a0
     ? io_rsrc_put_work+0x13d/0x1a0
     ? io_rsrc_put_work+0x13d/0x1a0
     ? process_one_work+0x1ce/0x530
     io_rsrc_put_work+0x13d/0x1a0
     process_one_work+0x236/0x530
     worker_thread+0x52/0x3b0
     ? process_one_work+0x530/0x530
     kthread+0x135/0x160
     ? set_kthread_struct+0x40/0x40
     ret_from_fork+0x1f/0x30
    
    which is due to holding the ctx->uring_lock when flushing existing
    pending work, while the pending work flushing may need to grab the uring
    lock if we're using IOPOLL.
    
    Fix this by dropping the uring_lock a bit earlier as part of the flush.
    
    Cc: stable@vger.kernel.org
    Link: https://github.com/axboe/liburing/issues/404Tested-by: default avatarAmmar Faizi <ammarfaizi2@gmail.com>
    Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
    c018db4a
io_uring.c 253 KB