    btrfs: fix race between quota rescan and disable leading to NULL pointer deref · b7adbf9a
    Filipe Manana authored
    If we have one task trying to start the quota rescan worker while another
    one is trying to disable quotas, we can end up hitting a race that results
    in the quota rescan worker doing a NULL pointer dereference. The steps for
    this are the following:
    
    1) Quotas are enabled;
    
    2) Task A calls the quota rescan ioctl and enters btrfs_qgroup_rescan().
       It calls qgroup_rescan_init() which returns 0 (success) and then joins a
       transaction and commits it;
    
    3) Task B calls the quota disable ioctl and enters btrfs_quota_disable().
       It clears the bit BTRFS_FS_QUOTA_ENABLED from fs_info->flags and calls
       btrfs_qgroup_wait_for_completion(), which returns immediately since the
       rescan worker is not yet running.
       Then it starts a transaction and locks fs_info->qgroup_ioctl_lock;
    
    4) Task A queues the rescan worker, by calling btrfs_queue_work();
    
    5) The rescan worker starts, and calls rescan_should_stop() at the start
       of its while loop, which results in 0 iterations of the loop, since
       the flag BTRFS_FS_QUOTA_ENABLED was cleared from fs_info->flags by
       task B at step 3);
    
    6) Task B sets fs_info->quota_root to NULL;
    
    7) The rescan worker tries to start a transaction and uses
       fs_info->quota_root as the root argument for btrfs_start_transaction().
       This results in a NULL pointer dereference down the call chain of
       btrfs_start_transaction(). The stack trace is something like the one
       reported in the Link tag below:
    
       general protection fault, probably for non-canonical address 0xdffffc0000000041: 0000 [#1] PREEMPT SMP KASAN
       KASAN: null-ptr-deref in range [0x0000000000000208-0x000000000000020f]
       CPU: 1 PID: 34 Comm: kworker/u4:2 Not tainted 6.1.0-syzkaller-13872-gb6bb9676 #0
       Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
       Workqueue: btrfs-qgroup-rescan btrfs_work_helper
       RIP: 0010:start_transaction+0x48/0x10f0 fs/btrfs/transaction.c:564
       Code: 48 89 fb 48 (...)
       RSP: 0018:ffffc90000ab7ab0 EFLAGS: 00010206
       RAX: 0000000000000041 RBX: 0000000000000208 RCX: ffff88801779ba80
       RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
       RBP: dffffc0000000000 R08: 0000000000000001 R09: fffff52000156f5d
       R10: fffff52000156f5d R11: 1ffff92000156f5c R12: 0000000000000000
       R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000003
       FS:  0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007f2bea75b718 CR3: 000000001d0cc000 CR4: 00000000003506e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        <TASK>
        btrfs_qgroup_rescan_worker+0x3bb/0x6a0 fs/btrfs/qgroup.c:3402
        btrfs_work_helper+0x312/0x850 fs/btrfs/async-thread.c:280
        process_one_work+0x877/0xdb0 kernel/workqueue.c:2289
        worker_thread+0xb14/0x1330 kernel/workqueue.c:2436
        kthread+0x266/0x300 kernel/kthread.c:376
        ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
        </TASK>
       Modules linked in:
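
    The interleaving in steps 1) to 7) can be replayed deterministically on a
    small userspace model. This is only a sketch under stated assumptions: the
    struct and every model_* function below are hypothetical stand-ins, not
    kernel code, and the loop bound of 3 is arbitrary. It shows that the
    disable path does not wait, the worker loop runs zero times, and the
    worker still reaches the point where it would start a transaction on a
    NULL quota root.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fs_info fields involved in the race. */
struct model_fs_info {
	bool quota_enabled;     /* models BTRFS_FS_QUOTA_ENABLED in fs_info->flags */
	bool rescan_running;    /* models "the rescan worker has been queued" */
	const void *quota_root; /* models fs_info->quota_root */
};

/* Step 3: clear the flag; report whether disable would have to wait. */
static bool model_disable_begin(struct model_fs_info *fs)
{
	fs->quota_enabled = false;
	return fs->rescan_running;
}

/* Step 4: task A queues the rescan worker. */
static void model_queue_worker(struct model_fs_info *fs)
{
	fs->rescan_running = true;
}

/* Step 5: count the iterations of the worker loop (bounded for the model). */
static int model_worker_loop(const struct model_fs_info *fs)
{
	int iterations = 0;

	while (fs->quota_enabled && iterations < 3)
		iterations++;
	return iterations;
}

/* Step 6: task B sets the quota root to NULL. */
static void model_disable_finish(struct model_fs_info *fs)
{
	fs->quota_root = NULL;
}

/*
 * Replays the steps in order. Returns true when the unfixed worker would
 * reach its unconditional transaction start with a NULL quota root.
 */
static bool model_replay_race(void)
{
	static const int root_placeholder = 0;
	struct model_fs_info fs = {
		.quota_enabled = true,          /* step 1 */
		.rescan_running = false,        /* step 2: worker not queued yet */
		.quota_root = &root_placeholder,
	};
	bool waited;
	int iterations;

	assert(fs.quota_root != NULL);
	waited = model_disable_begin(&fs);      /* step 3 */
	model_queue_worker(&fs);                /* step 4 */
	iterations = model_worker_loop(&fs);    /* step 5 */
	model_disable_finish(&fs);              /* step 6 */

	/* step 7: nothing stops the worker from using fs.quota_root here */
	return !waited && iterations == 0 && fs.quota_root == NULL;
}
```

    Calling model_replay_race() returns true for this ordering of the steps.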
    
    So fix this by having the rescan worker function not attempt to start a
    transaction if it didn't do any rescan work.
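
    A minimal userspace sketch of that idea follows. It is not the kernel
    patch itself: the types, the model_* helpers and the flag name
    did_leaf_rescans are assumptions made for illustration, and the assert()
    stands in for the NULL pointer dereference the kernel would hit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the quota root and the relevant fs_info state. */
struct model_root {
	int transactions_started;
};

struct model_fs_info {
	bool quota_enabled;            /* models BTRFS_FS_QUOTA_ENABLED */
	struct model_root *quota_root; /* models fs_info->quota_root */
	int leaves_left;               /* leaves the rescan still has to visit */
};

/* Models a transaction start, which dereferences the root it is given. */
static void model_start_transaction(struct model_root *root)
{
	assert(root != NULL); /* the kernel would crash here instead */
	root->transactions_started++;
}

/* Models the check done at the top of the worker loop. */
static bool model_rescan_should_stop(const struct model_fs_info *fs)
{
	return !fs->quota_enabled;
}

/*
 * Models the rescan worker with the guard applied. Returns true if the
 * status-update transaction was started.
 */
static bool model_rescan_worker(struct model_fs_info *fs)
{
	bool did_leaf_rescans = false; /* illustrative flag name */

	while (fs->leaves_left > 0 && !model_rescan_should_stop(fs)) {
		fs->leaves_left--; /* stands in for rescanning one leaf */
		did_leaf_rescans = true;
	}

	/* No rescan work was done, so there is no status to update. */
	if (!did_leaf_rescans)
		return false;

	model_start_transaction(fs->quota_root);
	return true;
}
```

    With quota_enabled false and quota_root NULL, as in the race above,
    model_rescan_worker() returns false without touching the root. With quotas
    enabled and a valid root, it visits every leaf and starts one transaction.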
    
    Reported-by: syzbot+96977faa68092ad382c4@syzkaller.appspotmail.com
    Link: https://lore.kernel.org/linux-btrfs/000000000000e5454b05f065a803@google.com/
    Fixes: e804861b ("btrfs: fix deadlock between quota disable and qgroup rescan worker")
    CC: stable@vger.kernel.org # 5.4+
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>