• Baokun Li's avatar
    writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs · 1ba1199e
    Baokun Li authored
    KASAN report null-ptr-deref:
    ==================================================================
    BUG: KASAN: null-ptr-deref in bdi_split_work_to_wbs+0x5c5/0x7b0
    Write of size 8 at addr 0000000000000000 by task sync/943
    CPU: 5 PID: 943 Comm: sync Tainted: 6.3.0-rc5-next-20230406-dirty #461
    Call Trace:
     <TASK>
     dump_stack_lvl+0x7f/0xc0
     print_report+0x2ba/0x340
     kasan_report+0xc4/0x120
     kasan_check_range+0x1b7/0x2e0
     __kasan_check_write+0x24/0x40
     bdi_split_work_to_wbs+0x5c5/0x7b0
     sync_inodes_sb+0x195/0x630
     sync_inodes_one_sb+0x3a/0x50
     iterate_supers+0x106/0x1b0
     ksys_sync+0x98/0x160
    [...]
    ==================================================================
    
    The race that causes the above issue is as follows:
    
               cpu1                     cpu2
    -------------------------|-------------------------
    inode_switch_wbs
     INIT_WORK(&isw->work, inode_switch_wbs_work_fn)
     queue_rcu_work(isw_wq, &isw->work)
     // queue_work async
      inode_switch_wbs_work_fn
       wb_put_many(old_wb, nr_switched)
        percpu_ref_put_many
         ref->data->release(ref)
         cgwb_release
          queue_work(cgwb_release_wq, &wb->release_work)
          // queue_work async
           &wb->release_work
           cgwb_release_workfn
                                ksys_sync
                                 iterate_supers
                                  sync_inodes_one_sb
                                   sync_inodes_sb
                                    bdi_split_work_to_wbs
                                     kmalloc(sizeof(*work), GFP_ATOMIC)
                                     // alloc memory failed
            percpu_ref_exit
             ref->data = NULL
             kfree(data)
                                     wb_get(wb)
                                      percpu_ref_get(&wb->refcnt)
                                       percpu_ref_get_many(ref, 1)
                                        atomic_long_add(nr, &ref->data->count)
                                         atomic64_add(i, v)
                                         // trigger null-ptr-deref
    
    bdi_split_work_to_wbs() traverses &bdi->wb_list to split work into all
    wbs.  If the allocation of new work fails, the on-stack fallback will be
    used and the reference count of the current wb is increased afterwards. 
    If cgroup writeback membership switches occur before getting the reference
    count and the current wb is released as old_wd, then calling wb_get() or
    wb_put() will trigger the null pointer dereference above.
    
    This issue was introduced in v4.3-rc7 (see fix tag1).  Both
    sync_inodes_sb() and __writeback_inodes_sb_nr() calls to
    bdi_split_work_to_wbs() can trigger this issue.  For scenarios called via
    sync_inodes_sb(), originally commit 7fc5854f ("writeback: synchronize
    sync(2) against cgroup writeback membership switches") reduced the
    possibility of the issue by adding wb_switch_rwsem, but in v5.14-rc1 (see
    fix tag2) removed the "inode_io_list_del_locked(inode, old_wb)" from
    inode_switch_wbs_work_fn() so that wb->state contains WB_has_dirty_io,
    thus old_wb is not skipped when traversing wbs in bdi_split_work_to_wbs(),
    and the issue becomes easily reproducible again.
    
    To solve this problem, percpu_ref_exit() is called under RCU protection to
    avoid race between cgwb_release_workfn() and bdi_split_work_to_wbs(). 
    Moreover, replace wb_get() with wb_tryget() in bdi_split_work_to_wbs(),
    and skip the current wb if wb_tryget() fails because the wb has already
    been shutdown.
    
    Link: https://lkml.kernel.org/r/20230410130826.1492525-1-libaokun1@huawei.com
    Fixes: b817525a ("writeback: bdi_writeback iteration must not skip dying ones")
    Signed-off-by: default avatarBaokun Li <libaokun1@huawei.com>
    Reviewed-by: default avatarJan Kara <jack@suse.cz>
    Acked-by: default avatarTejun Heo <tj@kernel.org>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Andreas Dilger <adilger.kernel@dilger.ca>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Dennis Zhou <dennis@kernel.org>
    Cc: Hou Tao <houtao1@huawei.com>
    Cc: yangerkun <yangerkun@huawei.com>
    Cc: Zhang Yi <yi.zhang@huawei.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    1ba1199e
backing-dev.c 27.6 KB