Commit 2eedfa9e authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'v6.5/vfs.rename.locking' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs rename locking updates from Christian Brauner:
 "This contains the work from Jan to fix problems with cross-directory
  renames originally reported in [1].

  To quickly sum it up some filesystems (so far we know at least about
  ext4, udf, f2fs, ocfs2, likely also reiserfs, gfs2 and others) need to
  lock the directory when it is being renamed into another directory.

  This is because we need to update the parent pointer in the directory
  in that case and if that races with other operations on the directory,
  in particular a conversion from one directory format into another, bad
  things can happen.

  So far we've done the locking in the filesystem code but recently
  Darrick pointed out in [2] that the RENAME_EXCHANGE case was missing.
  That one is particularly nasty because RENAME_EXCHANGE can arbitrarily
  mix regular files and directories and proper lock ordering is not
  achievable in the filesystems alone.

  This patch set adds locking into vfs_rename() so that not only parent
  directories but also moved inodes, regardless of whether they are
  directories or not, are locked when calling into the filesystem.

  This means establishing a locking order for unrelated directories. New
  helpers are added for this purpose and our documentation is updated to
  cover this in detail.

  The locking is now actually easier to follow as we now always lock
  source and target. We've always locked the target independent of
  whether it was a directory or file and we've always locked source if
  it was a regular file. The exact details for why this came about can
  be found in [3] and [4]"

Link: https://lore.kernel.org/all/20230117123735.un7wbamlbdihninm@quack3 [1]
Link: https://lore.kernel.org/all/20230517045836.GA11594@frogsfrogsfrogs [2]
Link: https://lore.kernel.org/all/20230526-schrebergarten-vortag-9cd89694517e@brauner [3]
Link: https://lore.kernel.org/all/20230530-seenotrettung-allrad-44f4b00139d4@brauner [4]

* tag 'v6.5/vfs.rename.locking' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: Restrict lock_two_nondirectories() to non-directory inodes
  fs: Lock moved directories
  fs: Establish locking order for unrelated directories
  Revert "f2fs: fix potential corruption when moving a directory"
  Revert "udf: Protect rename against modification of moved directory"
  ext4: Remove ext4 locking of moved directory
parents 64bf6ae9 2454ad83
......@@ -22,12 +22,11 @@ exclusive.
3) object removal. Locking rules: caller locks parent, finds victim,
locks victim and calls the method. Locks are exclusive.
4) rename() that is _not_ cross-directory. Locking rules: caller locks
the parent and finds source and target. In case of exchange (with
RENAME_EXCHANGE in flags argument) lock both. In any case,
if the target already exists, lock it. If the source is a non-directory,
lock it. If we need to lock both, lock them in inode pointer order.
Then call the method. All locks are exclusive.
4) rename() that is _not_ cross-directory. Locking rules: caller locks the
parent and finds source and target. We lock both (provided they exist). If we
need to lock two inodes of different type (dir vs non-dir), we lock directory
first. If we need to lock two inodes of the same type, lock them in inode
pointer order. Then call the method. All locks are exclusive.
NB: we might get away with locking the source (and target in exchange
case) shared.
......@@ -44,15 +43,17 @@ All locks are exclusive.
rules:
* lock the filesystem
* lock parents in "ancestors first" order.
* lock parents in "ancestors first" order. If one is not ancestor of
the other, lock them in inode pointer order.
* find source and target.
* if old parent is equal to or is a descendent of target
fail with -ENOTEMPTY
* if new parent is equal to or is a descendent of source
fail with -ELOOP
* If it's an exchange, lock both the source and the target.
* If the target exists, lock it. If the source is a non-directory,
lock it. If we need to lock both, do so in inode pointer order.
* Lock both the source and the target provided they exist. If we
need to lock two inodes of different type (dir vs non-dir), we lock
the directory first. If we need to lock two inodes of the same type,
lock them in inode pointer order.
* call the method.
All ->i_rwsem are taken exclusive. Again, we might get away with locking
......@@ -66,8 +67,9 @@ If no directory is its own ancestor, the scheme above is deadlock-free.
Proof:
First of all, at any moment we have a partial ordering of the
objects - A < B iff A is an ancestor of B.
First of all, at any moment we have a linear ordering of the
objects - A < B iff (A is an ancestor of B) or (B is not an ancestor
of A and ptr(A) < ptr(B)).
That ordering can change. However, the following is true:
......
......@@ -3834,19 +3834,10 @@ static int ext4_rename(struct mnt_idmap *idmap, struct inode *old_dir,
return retval;
}
/*
* We need to protect against old.inode directory getting converted
* from inline directory format into a normal one.
*/
if (S_ISDIR(old.inode->i_mode))
inode_lock_nested(old.inode, I_MUTEX_NONDIR2);
old.bh = ext4_find_entry(old.dir, &old.dentry->d_name, &old.de,
&old.inlined);
if (IS_ERR(old.bh)) {
retval = PTR_ERR(old.bh);
goto unlock_moved_dir;
}
if (IS_ERR(old.bh))
return PTR_ERR(old.bh);
/*
* Check for inode number is _not_ due to possible IO errors.
......@@ -4043,10 +4034,6 @@ static int ext4_rename(struct mnt_idmap *idmap, struct inode *old_dir,
brelse(old.bh);
brelse(new.bh);
unlock_moved_dir:
if (S_ISDIR(old.inode->i_mode))
inode_unlock(old.inode);
return retval;
}
......
......@@ -995,20 +995,12 @@ static int f2fs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
goto out;
}
/*
* Copied from ext4_rename: we need to protect against old.inode
* directory getting converted from inline directory format into
* a normal one.
*/
if (S_ISDIR(old_inode->i_mode))
inode_lock_nested(old_inode, I_MUTEX_NONDIR2);
err = -ENOENT;
old_entry = f2fs_find_entry(old_dir, &old_dentry->d_name, &old_page);
if (!old_entry) {
if (IS_ERR(old_page))
err = PTR_ERR(old_page);
goto out_unlock_old;
goto out;
}
if (S_ISDIR(old_inode->i_mode)) {
......@@ -1116,9 +1108,6 @@ static int f2fs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
f2fs_unlock_op(sbi);
if (S_ISDIR(old_inode->i_mode))
inode_unlock(old_inode);
if (IS_DIRSYNC(old_dir) || IS_DIRSYNC(new_dir))
f2fs_sync_fs(sbi->sb, 1);
......@@ -1133,9 +1122,6 @@ static int f2fs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
f2fs_put_page(old_dir_page, 0);
out_old:
f2fs_put_page(old_page, 0);
out_unlock_old:
if (S_ISDIR(old_inode->i_mode))
inode_unlock(old_inode);
out:
iput(whiteout);
return err;
......
......@@ -1103,10 +1103,52 @@ void discard_new_inode(struct inode *inode)
}
EXPORT_SYMBOL(discard_new_inode);
/**
* lock_two_inodes - lock two inodes (may be regular files but also dirs)
*
* Lock any non-NULL argument. The caller must make sure that if he is passing
* in two directories, one is not ancestor of the other. Zero, one or two
* objects may be locked by this function.
*
* @inode1: first inode to lock
* @inode2: second inode to lock
* @subclass1: inode lock subclass for the first lock obtained
* @subclass2: inode lock subclass for the second lock obtained
*/
void lock_two_inodes(struct inode *inode1, struct inode *inode2,
unsigned subclass1, unsigned subclass2)
{
if (!inode1 || !inode2) {
/*
* Make sure @subclass1 will be used for the acquired lock.
* This is not strictly necessary (no current caller cares) but
* let's keep things consistent.
*/
if (!inode1)
swap(inode1, inode2);
goto lock;
}
/*
* If one object is directory and the other is not, we must make sure
* to lock directory first as the other object may be its child.
*/
if (S_ISDIR(inode2->i_mode) == S_ISDIR(inode1->i_mode)) {
if (inode1 > inode2)
swap(inode1, inode2);
} else if (!S_ISDIR(inode1->i_mode))
swap(inode1, inode2);
lock:
if (inode1)
inode_lock_nested(inode1, subclass1);
if (inode2 && inode2 != inode1)
inode_lock_nested(inode2, subclass2);
}
/**
* lock_two_nondirectories - take two i_mutexes on non-directory objects
*
* Lock any non-NULL argument that is not a directory.
* Lock any non-NULL argument. Passed objects must not be directories.
* Zero, one or two objects may be locked by this function.
*
* @inode1: first inode to lock
......@@ -1114,13 +1156,9 @@ EXPORT_SYMBOL(discard_new_inode);
*/
void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
{
if (inode1 > inode2)
swap(inode1, inode2);
if (inode1 && !S_ISDIR(inode1->i_mode))
inode_lock(inode1);
if (inode2 && !S_ISDIR(inode2->i_mode) && inode2 != inode1)
inode_lock_nested(inode2, I_MUTEX_NONDIR2);
WARN_ON_ONCE(S_ISDIR(inode1->i_mode));
WARN_ON_ONCE(S_ISDIR(inode2->i_mode));
lock_two_inodes(inode1, inode2, I_MUTEX_NORMAL, I_MUTEX_NONDIR2);
}
EXPORT_SYMBOL(lock_two_nondirectories);
......@@ -1131,10 +1169,14 @@ EXPORT_SYMBOL(lock_two_nondirectories);
*/
void unlock_two_nondirectories(struct inode *inode1, struct inode *inode2)
{
if (inode1 && !S_ISDIR(inode1->i_mode))
if (inode1) {
WARN_ON_ONCE(S_ISDIR(inode1->i_mode));
inode_unlock(inode1);
if (inode2 && !S_ISDIR(inode2->i_mode) && inode2 != inode1)
}
if (inode2 && inode2 != inode1) {
WARN_ON_ONCE(S_ISDIR(inode2->i_mode));
inode_unlock(inode2);
}
}
EXPORT_SYMBOL(unlock_two_nondirectories);
......
......@@ -193,6 +193,8 @@ extern long prune_icache_sb(struct super_block *sb, struct shrink_control *sc);
int dentry_needs_remove_privs(struct mnt_idmap *, struct dentry *dentry);
bool in_group_or_capable(struct mnt_idmap *idmap,
const struct inode *inode, vfsgid_t vfsgid);
void lock_two_inodes(struct inode *inode1, struct inode *inode2,
unsigned subclass1, unsigned subclass2);
/*
* fs-writeback.c
......
......@@ -3028,8 +3028,8 @@ static struct dentry *lock_two_directories(struct dentry *p1, struct dentry *p2)
return p;
}
inode_lock_nested(p1->d_inode, I_MUTEX_PARENT);
inode_lock_nested(p2->d_inode, I_MUTEX_PARENT2);
lock_two_inodes(p1->d_inode, p2->d_inode,
I_MUTEX_PARENT, I_MUTEX_PARENT2);
return NULL;
}
......@@ -4731,7 +4731,7 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
* sb->s_vfs_rename_mutex. We might be more accurate, but that's another
* story.
* c) we have to lock _four_ objects - parents and victim (if it exists),
* and source (if it is not a directory).
* and source.
* And that - after we got ->i_mutex on parents (until then we don't know
* whether the target exists). Solution: try to be smart with locking
* order for inodes. We rely on the fact that tree topology may change
......@@ -4815,10 +4815,16 @@ int vfs_rename(struct renamedata *rd)
take_dentry_name_snapshot(&old_name, old_dentry);
dget(new_dentry);
if (!is_dir || (flags & RENAME_EXCHANGE))
lock_two_nondirectories(source, target);
else if (target)
inode_lock(target);
/*
* Lock all moved children. Moved directories may need to change parent
* pointer so they need the lock to prevent against concurrent
* directory changes moving parent pointer. For regular files we've
* historically always done this. The lockdep locking subclasses are
* somewhat arbitrary but RENAME_EXCHANGE in particular can swap
* regular files and directories so it's difficult to tell which
* subclasses to use.
*/
lock_two_inodes(source, target, I_MUTEX_NORMAL, I_MUTEX_NONDIR2);
error = -EPERM;
if (IS_SWAPFILE(source) || (target && IS_SWAPFILE(target)))
......@@ -4866,9 +4872,9 @@ int vfs_rename(struct renamedata *rd)
d_exchange(old_dentry, new_dentry);
}
out:
if (!is_dir || (flags & RENAME_EXCHANGE))
unlock_two_nondirectories(source, target);
else if (target)
if (source)
inode_unlock(source);
if (target)
inode_unlock(target);
dput(new_dentry);
if (!error) {
......
......@@ -793,11 +793,6 @@ static int udf_rename(struct mnt_idmap *idmap, struct inode *old_dir,
if (!empty_dir(new_inode))
goto out_oiter;
}
/*
* We need to protect against old_inode getting converted from
* ICB to normal directory.
*/
inode_lock_nested(old_inode, I_MUTEX_NONDIR2);
retval = udf_fiiter_find_entry(old_inode, &dotdot_name,
&diriter);
if (retval == -ENOENT) {
......@@ -806,10 +801,8 @@ static int udf_rename(struct mnt_idmap *idmap, struct inode *old_dir,
old_inode->i_ino);
retval = -EFSCORRUPTED;
}
if (retval) {
inode_unlock(old_inode);
if (retval)
goto out_oiter;
}
has_diriter = true;
tloc = lelb_to_cpu(diriter.fi.icb.extLocation);
if (udf_get_lb_pblock(old_inode->i_sb, &tloc, 0) !=
......@@ -889,7 +882,6 @@ static int udf_rename(struct mnt_idmap *idmap, struct inode *old_dir,
udf_dir_entry_len(&diriter.fi));
udf_fiiter_write_fi(&diriter, NULL);
udf_fiiter_release(&diriter);
inode_unlock(old_inode);
inode_dec_link_count(old_dir);
if (new_inode)
......@@ -901,10 +893,8 @@ static int udf_rename(struct mnt_idmap *idmap, struct inode *old_dir,
}
return 0;
out_oiter:
if (has_diriter) {
if (has_diriter)
udf_fiiter_release(&diriter);
inode_unlock(old_inode);
}
udf_fiiter_release(&oiter);
return retval;
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment