• Su Yue's avatar
    bcachefs: grab s_umount only if snapshotting · 2acc59dd
    Su Yue authored
    When I was testing mongodb over bcachefs with compression,
    there is a lockdep warning when snapshotting mongodb data volume.
    
    $ cat test.sh
    prog=bcachefs
    
    $prog subvolume create /mnt/data
    $prog subvolume create /mnt/data/snapshots
    
    while true;do
        $prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s)
        sleep 1s
    done
    
    $ cat /etc/mongodb.conf
    systemLog:
      destination: file
      logAppend: true
      path: /mnt/data/mongod.log
    
    storage:
      dbPath: /mnt/data/
    
    lockdep reports:
    [ 3437.452330] ======================================================
    [ 3437.452750] WARNING: possible circular locking dependency detected
    [ 3437.453168] 6.7.0-rc7-custom+ #85 Tainted: G            E
    [ 3437.453562] ------------------------------------------------------
    [ 3437.453981] bcachefs/35533 is trying to acquire lock:
    [ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190
    [ 3437.454875]
                   but task is already holding lock:
    [ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
    [ 3437.456009]
                   which lock already depends on the new lock.
    
    [ 3437.456553]
                   the existing dependency chain (in reverse order) is:
    [ 3437.457054]
                   -> #3 (&type->s_umount_key#48){.+.+}-{3:3}:
    [ 3437.457507]        down_read+0x3e/0x170
    [ 3437.457772]        bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
    [ 3437.458206]        __x64_sys_ioctl+0x93/0xd0
    [ 3437.458498]        do_syscall_64+0x42/0xf0
    [ 3437.458779]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
    [ 3437.459155]
                   -> #2 (&c->snapshot_create_lock){++++}-{3:3}:
    [ 3437.459615]        down_read+0x3e/0x170
    [ 3437.459878]        bch2_truncate+0x82/0x110 [bcachefs]
    [ 3437.460276]        bchfs_truncate+0x254/0x3c0 [bcachefs]
    [ 3437.460686]        notify_change+0x1f1/0x4a0
    [ 3437.461283]        do_truncate+0x7f/0xd0
    [ 3437.461555]        path_openat+0xa57/0xce0
    [ 3437.461836]        do_filp_open+0xb4/0x160
    [ 3437.462116]        do_sys_openat2+0x91/0xc0
    [ 3437.462402]        __x64_sys_openat+0x53/0xa0
    [ 3437.462701]        do_syscall_64+0x42/0xf0
    [ 3437.462982]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
    [ 3437.463359]
                   -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}:
    [ 3437.463843]        down_write+0x3b/0xc0
    [ 3437.464223]        bch2_write_iter+0x5b/0xcc0 [bcachefs]
    [ 3437.464493]        vfs_write+0x21b/0x4c0
    [ 3437.464653]        ksys_write+0x69/0xf0
    [ 3437.464839]        do_syscall_64+0x42/0xf0
    [ 3437.465009]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
    [ 3437.465231]
                   -> #0 (sb_writers#10){.+.+}-{0:0}:
    [ 3437.465471]        __lock_acquire+0x1455/0x21b0
    [ 3437.465656]        lock_acquire+0xc6/0x2b0
    [ 3437.465822]        mnt_want_write+0x46/0x1a0
    [ 3437.465996]        filename_create+0x62/0x190
    [ 3437.466175]        user_path_create+0x2d/0x50
    [ 3437.466352]        bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
    [ 3437.466617]        __x64_sys_ioctl+0x93/0xd0
    [ 3437.466791]        do_syscall_64+0x42/0xf0
    [ 3437.466957]        entry_SYSCALL_64_after_hwframe+0x6e/0x76
    [ 3437.467180]
                   other info that might help us debug this:
    
    [ 3437.469670] 2 locks held by bcachefs/35533:
                   other info that might help us debug this:
    
    [ 3437.467507] Chain exists of:
                     sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48
    
    [ 3437.467979]  Possible unsafe locking scenario:
    
    [ 3437.468223]        CPU0                    CPU1
    [ 3437.468405]        ----                    ----
    [ 3437.468585]   rlock(&type->s_umount_key#48);
    [ 3437.468758]                                lock(&c->snapshot_create_lock);
    [ 3437.469030]                                lock(&type->s_umount_key#48);
    [ 3437.469291]   rlock(sb_writers#10);
    [ 3437.469434]
                    *** DEADLOCK ***
    
    [ 3437.469670] 2 locks held by bcachefs/35533:
    [ 3437.469838]  #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs]
    [ 3437.470294]  #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
    [ 3437.470744]
                   stack backtrace:
    [ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G            E      6.7.0-rc7-custom+ #85
    [ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
    [ 3437.471694] Call Trace:
    [ 3437.471795]  <TASK>
    [ 3437.471884]  dump_stack_lvl+0x57/0x90
    [ 3437.472035]  check_noncircular+0x132/0x150
    [ 3437.472202]  __lock_acquire+0x1455/0x21b0
    [ 3437.472369]  lock_acquire+0xc6/0x2b0
    [ 3437.472518]  ? filename_create+0x62/0x190
    [ 3437.472683]  ? lock_is_held_type+0x97/0x110
    [ 3437.472856]  mnt_want_write+0x46/0x1a0
    [ 3437.473025]  ? filename_create+0x62/0x190
    [ 3437.473204]  filename_create+0x62/0x190
    [ 3437.473380]  user_path_create+0x2d/0x50
    [ 3437.473555]  bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
    [ 3437.473819]  ? lock_acquire+0xc6/0x2b0
    [ 3437.474002]  ? __fget_files+0x2a/0x190
    [ 3437.474195]  ? __fget_files+0xbc/0x190
    [ 3437.474380]  ? lock_release+0xc5/0x270
    [ 3437.474567]  ? __x64_sys_ioctl+0x93/0xd0
    [ 3437.474764]  ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs]
    [ 3437.475090]  __x64_sys_ioctl+0x93/0xd0
    [ 3437.475277]  do_syscall_64+0x42/0xf0
    [ 3437.475454]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
    [ 3437.475691] RIP: 0033:0x7f2743c313af
    ======================================================
    
    In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally
    and unlock it at the end of the function. There is a comment
    "why do we need this lock?" about the lock coming from
    commit 42d23732 ("bcachefs: Snapshot creation, deletion")
    The reason is that __bch2_ioctl_subvolume_create() calls
    sync_inodes_sb() which enforce locked s_umount to writeback all dirty
    nodes before doing snapshot works.
    
    Fix it by read locking s_umount for snapshotting only and unlocking
    s_umount after sync_inodes_sb().
    Signed-off-by: default avatarSu Yue <glass.su@suse.com>
    Signed-off-by: default avatarKent Overstreet <kent.overstreet@linux.dev>
    2acc59dd
fs-ioctl.c 12.4 KB