    btrfs: Fix delalloc inodes invalidation during transaction abort · fe816d0f
    Nikolay Borisov authored
    When a transaction is aborted, btrfs_cleanup_transaction is called to
    clean up all the various in-flight bits and pieces which might be
    active. One of those is delalloc inodes - inodes which have dirty
    pages which haven't been persisted yet. Currently the process of
    freeing such delalloc inodes in exceptional circumstances such as
    transaction abort boils down to calling btrfs_invalidate_inodes, whose
    sole job is to invalidate the dentries for all inodes related to a
    root. This is in fact wrong and insufficient since such delalloc inodes
    will likely have pending pages or ordered-extents and will be linked to
    the sb->s_inode_list. This means that unmounting a btrfs instance with
    an aborted transaction could leave inodes and their pages visible to
    the system long after their superblock has been freed. This in turn
    leads to a use-after-free once page shrinking is triggered. This
    situation can be reproduced by running generic/019, which causes such
    inodes to be left hanging, followed by generic/176, which causes
    memory pressure and page eviction, which in turn touches the freed
    super block instance. This situation is
    additionally detected by the unmount code of VFS with the following
    message:
    
    "VFS: Busy inodes after unmount of Self-destruct in 5 seconds.  Have a nice day..."
    
    Additionally btrfs hits WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree));
    in free_fs_root for the same reason.
    
    This patch aims to rectify the situation by doing the following:
    
    1. Change btrfs_destroy_delalloc_inodes so that it calls
    invalidate_inode_pages2 for every inode on the delalloc list. This
    ensures that all the pages of the inode are released. This function
    boils down to calling btrfs_releasepage. During testing I observed
    cases where inodes on the delalloc list had an i_count of 0, so this
    necessitates using igrab to be sure we are working on a non-freed
    inode.
    
    2. Since calling btrfs_releasepage might queue delayed iputs, move the
    call to btrfs_cleanup_transaction in btrfs_error_commit_super so that
    it happens before run_delayed_iputs is called for the last time. This
    is necessary to ensure that delayed iputs are run.
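
    The two steps above can be sketched as a small userspace model. This
    is not the kernel code: every toy_* name, the array standing in for
    the delalloc list, and the rule that dropping pages queues exactly
    one delayed iput are assumptions made for illustration only. The
    sketch shows the grab/invalidate/put pattern and the ordering of
    cleanup before the final delayed iput run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct inode: only what the sketch needs. */
struct toy_inode {
    int i_count;       /* reference count; may be 0 while still on the list */
    int nr_pages;      /* cached pages still attached to the inode */
    bool freeing;      /* models the I_FREEING/I_WILL_FREE state */
    bool on_delalloc;  /* still linked on the delalloc list */
};

/* Hypothetical stand-in for the filesystem-wide state. */
struct toy_fs {
    struct toy_inode *inodes;  /* the "delalloc list", an array here */
    size_t nr_inodes;
    int delayed_iputs;         /* queued, not yet run */
};

/* Like igrab(): refuse inodes that are being freed, else take a reference. */
static struct toy_inode *toy_igrab(struct toy_inode *inode)
{
    if (inode->freeing)
        return NULL;
    inode->i_count++;
    return inode;
}

/* Like iput(): drop the reference taken by toy_igrab(). */
static void toy_iput(struct toy_inode *inode)
{
    assert(inode->i_count > 0);
    inode->i_count--;
}

/*
 * Like invalidate_inode_pages2(): release every page of the inode.
 * Assumption: releasing pages queues one delayed iput, to model the
 * "might queue delayed iputs" side effect described in step 2.
 */
static void toy_invalidate_pages(struct toy_fs *fs, struct toy_inode *inode)
{
    if (inode->nr_pages > 0)
        fs->delayed_iputs++;
    inode->nr_pages = 0;
}

/* Step 1: unlink each inode, then invalidate it only if it is still live. */
static size_t toy_destroy_delalloc_inodes(struct toy_fs *fs)
{
    size_t invalidated = 0;

    for (size_t i = 0; i < fs->nr_inodes; i++) {
        struct toy_inode *inode;

        fs->inodes[i].on_delalloc = false;
        inode = toy_igrab(&fs->inodes[i]);
        if (!inode)
            continue;  /* already being freed, must not be touched */
        toy_invalidate_pages(fs, inode);
        toy_iput(inode);
        invalidated++;
    }
    return invalidated;
}

/* Drain everything that was queued so far. */
static void toy_run_delayed_iputs(struct toy_fs *fs)
{
    fs->delayed_iputs = 0;
}

/* Step 2: clean up first, so the final delayed iput run sees its work. */
static size_t toy_error_commit_super(struct toy_fs *fs)
{
    size_t invalidated = toy_destroy_delalloc_inodes(fs);

    toy_run_delayed_iputs(fs);
    return invalidated;
}
```

    In this model an inode with i_count == 0 that is not being freed is
    still grabbed and has its pages released, while an inode marked as
    freeing is only unlinked. Running the cleanup after the last delayed
    iput run instead would leave queued iputs behind, which is the
    ordering problem step 2 addresses.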
    
    Note: this patch is tagged for 4.14 stable. The fix applies to older
    versions too, but needs to be backported manually due to conflicts.
    
    CC: stable@vger.kernel.org # 4.14.x: 2b877331: btrfs: Split btrfs_del_delalloc_inode into 2 functions
    CC: stable@vger.kernel.org # 4.14.x
    Signed-off-by: Nikolay Borisov <nborisov@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    [ add comment to igrab ]
    Signed-off-by: David Sterba <dsterba@suse.com>
disk-io.c 123 KB