    btrfs: qgroup: don't commit transaction when we already hold the handle · 6f23277a
    Qu Wenruo authored
    [BUG]
    When running the following script, btrfs will trigger an ASSERT():
    
      #!/bin/bash
      mkfs.btrfs -f $dev
      mount $dev $mnt
      xfs_io -f -c "pwrite 0 1G" $mnt/file
      sync
      btrfs quota enable $mnt
      btrfs quota rescan -w $mnt
    
      # Manually set the limit below current usage
      btrfs qgroup limit 512M $mnt $mnt
    
      # Crash happens
      touch $mnt/file
    
    The dmesg looks like this:
    
      assertion failed: refcount_read(&trans->use_count) == 1, in fs/btrfs/transaction.c:2022
      ------------[ cut here ]------------
      kernel BUG at fs/btrfs/ctree.h:3230!
      invalid opcode: 0000 [#1] SMP PTI
      RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs]
       btrfs_commit_transaction.cold+0x11/0x5d [btrfs]
       try_flush_qgroup+0x67/0x100 [btrfs]
       __btrfs_qgroup_reserve_meta+0x3a/0x60 [btrfs]
       btrfs_delayed_update_inode+0xaa/0x350 [btrfs]
       btrfs_update_inode+0x9d/0x110 [btrfs]
       btrfs_dirty_inode+0x5d/0xd0 [btrfs]
       touch_atime+0xb5/0x100
       iterate_dir+0xf1/0x1b0
       __x64_sys_getdents64+0x78/0x110
       do_syscall_64+0x33/0x80
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x7fb5afe588db
    
    [CAUSE]
    In try_flush_qgroup(), we assume that we don't hold a transaction
    handle at all.  This is true for data reservation and mostly true for
    metadata: data space reservation always happens before we start a
    transaction, and most metadata operations reserve space in
    start_transaction().
    
    But there is an exception: btrfs_delayed_inode_reserve_metadata().
    It holds a transaction handle while still trying to reserve extra
    metadata space.
    
    When we hit EDQUOT inside btrfs_delayed_inode_reserve_metadata(),
    try_flush_qgroup() joins the current transaction and commits it while
    the caller still holds its transaction handle, which trips the
    use_count ASSERT() in btrfs_commit_transaction().
    
    [FIX]
    Let's check current->journal_info before we join the transaction.
    
    If current->journal_info is unset or BTRFS_SEND_TRANS_STUB, it means
    we are not holding a transaction handle, and thus are able to join and
    then commit the transaction.
    
    If current->journal_info is a valid transaction handle, we avoid
    committing the transaction and just end it.
    
    This is less effective than committing the current transaction, as it
    won't free reserved metadata space, but we may still free some data
    space before new data writes.
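    
    The decision described above can be sketched as a small userspace
    model.  This is not the kernel code: struct model_trans, struct
    model_task, MODEL_SEND_TRANS_STUB, holds_trans_handle() and
    model_flush() are hypothetical stand-ins, and use_count is folded into
    one object to keep the sketch short.  It only illustrates why a nested
    join must end the handle instead of committing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a transaction handle; not the kernel struct. */
struct model_trans {
	int use_count;   /* models the reference count checked by the ASSERT() */
	bool committed;  /* set once the modelled commit has run */
};

/* Placeholder sentinel playing the role of BTRFS_SEND_TRANS_STUB. */
#define MODEL_SEND_TRANS_STUB ((void *)1)

/* Hypothetical stand-in for the per-task state (current->journal_info). */
struct model_task {
	void *journal_info;
	struct model_trans *running;
};

enum flush_action { FLUSH_COMMIT, FLUSH_END_ONLY };

/* True only if the task already holds a real transaction handle. */
static bool holds_trans_handle(const void *journal_info)
{
	return journal_info != NULL && journal_info != MODEL_SEND_TRANS_STUB;
}

/* Join the transaction, then commit only when no handle was held before. */
static enum flush_action model_flush(struct model_task *task)
{
	struct model_trans *trans = task->running;
	bool nested = holds_trans_handle(task->journal_info);

	trans->use_count++;              /* join */
	if (nested) {
		trans->use_count--;      /* end only; the caller's handle stays live */
		return FLUSH_END_ONLY;
	}

	/* Same condition as the ASSERT() in the report: we are the only user. */
	assert(trans->use_count == 1);
	trans->committed = true;
	trans->use_count--;
	return FLUSH_COMMIT;
}
```

    In this model a task with no handle (or only the send stub) commits,
    while a task that already holds a handle merely drops the extra
    reference, so the use_count check is never reached with a count of 2.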
    
    Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1178634
    Fixes: c53e9653 ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT")
    Reviewed-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>