    xfs: add RENAME_WHITEOUT support · 7dcf5c3e
    Dave Chinner authored
    Whiteouts are used by overlayfs, which has a crazy convention that a
    whiteout is a character device inode with a major:minor of 0:0.
    Because it's not documented anywhere, here's an example of what
    RENAME_WHITEOUT does on ext4:
    
    # echo foo > /mnt/scratch/foo
    # echo bar > /mnt/scratch/bar
    # ls -l /mnt/scratch
    total 24
    -rw-r--r-- 1 root root     4 Feb 11 20:22 bar
    -rw-r--r-- 1 root root     4 Feb 11 20:22 foo
    drwx------ 2 root root 16384 Feb 11 20:18 lost+found
    # src/renameat2 -w /mnt/scratch/foo /mnt/scratch/bar
    # ls -l /mnt/scratch
    total 20
    -rw-r--r-- 1 root root     4 Feb 11 20:22 bar
    c--------- 1 root root  0, 0 Feb 11 20:23 foo
    drwx------ 2 root root 16384 Feb 11 20:18 lost+found
    # cat /mnt/scratch/bar
    foo
    #
    
    In XFS rename terms, the operation that has been done is that source
    (foo) has been moved to the target (bar), which is like a normal
    rename operation, but rather than the source being removed, it has
    been replaced with a whiteout.
    
    We can't allocate whiteout inodes within the rename transaction due
    to allocation being a multi-commit transaction: rename needs to
    be a single, atomic commit. Hence we have several options here, from
    most efficient to least efficient:
    
        - use DT_WHT in the target dirent and do no whiteout inode
          allocation.  The main issue with this approach is that we need
          hooks in lookup to create a virtual chardev inode to present
          to userspace and in places where we might need to modify the
          dirent e.g. unlink.  Overlayfs also needs to be taught about
          DT_WHT. Most invasive change, lowest overhead.
    
        - create a special whiteout inode in the root directory (e.g. a
          ".wino" dirent) and then hardlink every new whiteout to it.
          This means we only need to create a single whiteout inode, and
          rename simply creates a hardlink to it. We can use DT_WHT for
          these, though using DT_CHR means we won't have to modify
          overlayfs, nor anything in userspace. Downside is we have to
          look up the whiteout inode on every operation and create it if
          it doesn't exist.
    
        - copy ext4: create a special whiteout chardev inode for every
          whiteout.  This is more complex than the above options because
          of the lack of atomicity between inode creation and the rename
          operation, requiring us to create a tmpfile inode and then
          link it into the directory structure during the rename. At
          least with a tmpfile inode, a crash between the create and
          rename doesn't leave unreferenced inodes or directory
          pollution around.
    
    By far the simplest thing to do in the short term is to copy ext4.
    While it is the most inefficient way of supporting whiteouts, as
    an initial implementation we can simply reuse existing functions and
    add a small amount of extra code to the rename operation.
    
    When we get full whiteout support in the VFS (via the dentry cache)
    we can then look at supporting the DT_WHT method outlined above as
    the first option. But until then, we'll stick with
    what overlayfs expects us to be: dumb and stupid.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>