Commits · 6b03f7edf43e3b284c023ab03aba6e5e7c5a7559 · nexedi / linux

14 Mar, 2020 31 commits

path_connected(): pass mount and dentry separately · 6b03f7ed

Al Viro authored Feb 24, 2020

eventually we'll want to do that check *before* mangling
nd->path.dentry...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

split the lookup-related parts of do_last() into a separate helper · c981a482

Al Viro authored Jan 26, 2020

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

do_last(): rejoin the common path even earlier in FMODE_{OPENED,CREATED} case · 973d4b73

Al Viro authored Jan 26, 2020

... getting may_create_in_sticky() checks in FMODE_OPENED case as well.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

do_last(): simplify the liveness analysis past finish_open_created · 8795e7d4

Al Viro authored Jan 26, 2020

Don't mess with got_write there - it is guaranteed to be false on
entry and it will be set true if and only if we decide to go for
truncation and manage to get write access for that.

Don't carry acc_mode through the entire thing - it's only used
in that part.  And don't bother with gotos in there - compiler is
quite capable of optimizing that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

do_last(): rejoin the common path earlier in FMODE_{OPENED,CREATED} case · 5a2d3edd

Al Viro authored Jan 26, 2020

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

do_last(): don't bother with keeping got_write in FMODE_OPENED case · 59e96e65

Al Viro authored Jan 26, 2020

it's easier to drop it right after lookup_open() and regain if
needed (i.e. if we will need to truncate).  On the non-FMODE_OPENED
path we do that anyway.  In case of FMODE_CREATED we won't be
needing it.  And it's easier to prove correctness that way,
especially since the initial failure to get write access is not
always fatal; proving that we'll never end up truncating in that
case is rather convoluted.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

do_last(): merge the may_open() calls · 3ad5615a

Al Viro authored Jan 26, 2020

have FMODE_OPENED case rejoin the main path at earlier point
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

atomic_open(): lift the call of may_open() into do_last() · 7be219b4

Al Viro authored Jan 26, 2020

there we'll be able to merge it with its counterparts in other
cases, and there's no reason to do it before the parent has
been unlocked
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

atomic_open(): return the right dentry in FMODE_OPENED case · 6fb968cd

Al Viro authored Jan 26, 2020

->atomic_open() might have used a different alias than the one we'd
passed to it; in "not opened" case we take care of that, in "opened"
one we don't.  Currently we don't care downstream of "opened" case
which alias to return; however, that will change shortly when we
get to unifying may_open() calls.

It's not hard to get right in all cases, anyway.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

new helper: traverse_mounts() · 9deed3eb

Al Viro authored Jan 17, 2020

common guts of follow_down() and follow_managed() taken to a new
helper - traverse_mounts().  The remnants of follow_managed()
are folded into its sole remaining caller (handle_mounts()).
Calling conventions of handle_mounts() slightly sanitized -
instead of the weird "1 for success, -E... for failure" that used
to be imposed by the calling conventions of walk_component() et al.
we can use the normal "0 for success, -E... for failure".
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

massage __follow_mount_rcu() a bit · ea936aeb

Al Viro authored Jan 16, 2020

make the loop more similar to that in follow_managed(), with
explicit tracking of flags, etc.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

namei: have link_path_walk() maintain LOOKUP_PARENT · c108837e

Al Viro authored Mar 05, 2020

set on entry, clear when we get to the last component.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

link_path_walk(): simplify stack handling · d8d4611a

Al Viro authored Feb 23, 2020

We use nd->stack to store two things: pinning down the symlinks
we are resolving and resuming the name traversal when a nested
symlink is finished.

Currently, nd->depth is used to keep track of both.  It's 0 when
we call link_path_walk() for the first time (for the pathname
itself) and 1 on all subsequent calls (for trailing symlinks,
if any).  That's fine, as far as pinning symlinks goes - when
handling a trailing symlink, the string we are interpreting
is the body of symlink pinned down in nd->stack[0].  It's
rather inconvenient with respect to handling nested symlinks,
though - when we run out of a string we are currently interpreting,
we need to decide whether it's a nested symlink (in which case
we need to pick the string saved back when we started to interpret
that nested symlink and resume its traversal) or not (in which
case we are done with link_path_walk()).

The current solution is a bit of a kludge - when handling a trailing symlink
(in lookup_last() and open_last_lookups()) we clear nd->stack[0].name.
That allows link_path_walk() to use the following rules when
running out of a string to interpret:
	* if nd->depth is zero, we are at the end of pathname itself.
	* if nd->depth is positive, check the saved string; for
nested symlink it will be non-NULL, for trailing symlink - NULL.

It works, but it's rather non-obvious.  Note that we have two sets:
the set of symlinks currently being traversed and the set of postponed
pathname tails.  The former is stored in nd->stack[0..nd->depth-1].link
and it's valid throughout the pathname resolution; the latter is valid only
during an individual call of link_path_walk() and it occupies
nd->stack[0..nd->depth-1].name for the first call of link_path_walk() and
nd->stack[1..nd->depth-1].name for subsequent ones.  The kludge is basically
a way to recognize the second set becoming empty.

Things get simpler if we keep track of the second set's size
explicitly and always store it in nd->stack[0..depth-1].name.
We access the second set only inside link_path_walk(), so its
size can live in a local variable; that way the check becomes
trivial without the need of that kludge.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

pick_link(): check for WALK_TRAILING, not LOOKUP_PARENT · b1a81972

Al Viro authored Jan 19, 2020

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

namei: invert the meaning of WALK_FOLLOW · 8c4efe22

Al Viro authored Jan 19, 2020

old flags & WALK_FOLLOW <=> new !(flags & WALK_TRAILING)
That's what that flag had really been used for.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

sanitize handling of nd->last_type, kill LAST_BIND · b4c03536

Al Viro authored Jan 19, 2020

->last_type values are set in 3 places: path_init() (sets to LAST_ROOT),
link_path_walk() (LAST_NORM/DOT/DOTDOT) and pick_link() (LAST_BIND).

They are checked in walk_component(), lookup_last() and do_last().
They also get copied to the caller by filename_parentat(). In the last
3 cases the value is what we had at the return from link_path_walk().
In case of walk_component() it's either directly downstream from
assignment in link_path_walk() or, when called by lookup_last(), the
value we have at the return from link_path_walk().

The value at the entry into link_path_walk() can survive to return only
if the pathname contains nothing but slashes. Note that pick_link()
never returns such - pure jumps are handled directly. So for the calls
of link_path_walk() for trailing symlinks it does not matter what value
had been there at the entry; the value at the return won't depend upon it.

There are 3 call chains that might have pick_link() storing LAST_BIND:

1) pick_link() from step_into() from walk_component() from
link_path_walk(). In that case we will either be parsing the next
component immediately after return into link_path_walk(), which will
overwrite the ->last_type before anyone has a chance to look at it,
or we'll fail, in which case nobody will be looking at ->last_type at all.

2) pick_link() from step_into() from walk_component() from lookup_last().
The value is never looked at due to the above; it won't affect the value
seen at return from any link_path_walk().

3) pick_link() from step_into() from do_last(). Ditto.

In other words, the assignment in pick_link() is pointless, and so is
LAST_BIND itself; nothing ever looks at that value. Kill it off.
And make link_path_walk() _always_ assign ->last_type - in the only
case when the value at the entry might survive to the return that value
is always LAST_ROOT, inherited from path_init(). Move that assignment
from path_init() into the beginning of link_path_walk(), to consolidate
the things.

Historical note: LAST_BIND used to be used for the kludge with trailing
pure jump symlinks (extra iteration through the top-level loop).
No point keeping it anymore...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

finally fold get_link() into pick_link() · ad6cc4c3

Al Viro authored Jan 14, 2020

kill nd->link_inode, while we are at it
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

merging pick_link() with get_link(), part 6 · 06708adb

Al Viro authored Jan 14, 2020

move the only remaining call of get_link() into pick_link()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

merging pick_link() with get_link(), part 5 · b0417d2c

Al Viro authored Jan 14, 2020

move get_link() call into step_into().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

merging pick_link() with get_link(), part 4 · 92d27016

Al Viro authored Jan 14, 2020

Move the call of get_link() into walk_component().  Change the
calling conventions for walk_component() to returning the link
body to follow (if any).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

merging pick_link() with get_link(), part 3 · 40fcf5a9

Al Viro authored Jan 14, 2020

After a pure jump ("/" or procfs-style symlink) we don't need to
hold the link anymore.  link_path_walk() dropped it if such case
had been detected, lookup_last/do_last() (i.e. old trailing_symlink())
left it on the stack - it ended up calling terminate_walk() shortly
anyway, which would've purged the entire stack.

Do it in get_link() itself instead.  Simpler logic that way...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

merging pick_link() with get_link(), part 2 · 1ccac622

Al Viro authored Jan 14, 2020

Fold trailing_symlink() into lookup_last() and do_last(), change
the calling conventions of those two.  Rules change:
	success, we are done => NULL instead of 0
	error	=> ERR_PTR(-E...) instead of -E...
	got a symlink to follow => return the path to be followed instead of 1

The loops calling those (in path_lookupat() and path_openat()) adjusted.

A subtle change of control flow here: originally a pure-jump trailing
symlink ("/" or procfs one) would've passed through the upper level
loop once more, with "" for path to traverse.  That would've brought
us back to the lookup_last/do_last entry and we would've hit LAST_BIND
case (LAST_BIND left from get_link() called by trailing_symlink())
and pretty much skip to the point right after where we'd left the
sucker back when we picked that trailing symlink.

Now we don't bother with that extra pass through the upper level
loop - if get_link() says "I've just done a pure jump, nothing
else to do", we just treat that as non-symlink case.

Boilerplate added on that step will go away shortly - it'll migrate
into walk_component() and then to step_into(), collapsing into the
change of calling conventions for those.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

merging pick_link() with get_link(), part 1 · 43679723

Al Viro authored Jan 14, 2020

Move restoring LOOKUP_PARENT and zeroing nd->stack.name[0] past
the call of get_link() (nothing _currently_ uses them in there).
That allows us to move the call of may_follow_link() into get_link()
as well, since now the presence of LOOKUP_PARENT distinguishes
the callers from each other (link_path_walk() has it, trailing_symlink()
doesn't).

Preparations for folding trailing_symlink() into callers (lookup_last()
and do_last()) and changing the calling conventions of those.  Next
stage after that will have get_link() call migrate into walk_component(),
then - into step_into().  It's tricky enough to warrant doing that
in stages, unfortunately...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

expand the only remaining call of path_lookup_conditional() · a9dc1494

Al Viro authored Jan 12, 2020

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat() · 161aff1d

Al Viro authored Jan 11, 2020

New LOOKUP flag, telling path_lookupat() to act as path_mountpointat().
IOW, traverse mounts at the final point and skip revalidation of the
location where it ends up.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fold handle_mounts() into step_into() · cbae4d12

Al Viro authored Jan 12, 2020

The following is true:
	* calls of handle_mounts() and step_into() are always
paired in sequences like
	err = handle_mounts(nd, dentry, &path, &inode, &seq);
	if (unlikely(err < 0))
		return err;
	err = step_into(nd, &path, flags, inode, seq);
	* in all such sequences path is uninitialized before and
unused after this pair of calls
	* in all such sequences inode and seq are unused afterwards.

So the call of handle_mounts() can be shifted inside step_into(),
turning 'path' into a local variable in the combined function.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

new step_into() flag: WALK_NOFOLLOW · aca2903e

Al Viro authored Jan 09, 2020

Tells step_into() not to follow symlinks, regardless of LOOKUP_FOLLOW.
Allows switching handle_lookup_down() to step_into(), getting
all follow_managed() and step_into() calls paired.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

step_into() callers: dismiss the symlink earlier · 56676ec3

Al Viro authored Mar 10, 2020

We need to dismiss a symlink when we are done traversing it;
currently that's done when we call step_into() for its last
component.  For the cases when we do not call step_into()
for that component (i.e. when it's . or ..) we do the same
symlink dismissal after the call of handle_dots().

What we need to guarantee is that the symlink won't be dismissed
while we are still using nd->last.name - it's pointing into the
body of said symlink.  step_into() is sufficiently late - by
the time it's called we'd already obtained the dentry, so the
name we'd been looking up is no longer needed.  However, it
turns out to be cleaner to have that ("we are done with that
component now, can dismiss the link") done explicitly - in the
callers of step_into().

In handle_dots() case we won't be using the component string
at all, so for . and .. the corresponding point is actually
_before_ the call of handle_dots(), not after it.

Fix a minor irregularity in do_last(), while we are at it -
if trailing symlink ended with . or .. we forgot to dismiss
it.  Not a problem, since nameidata is about to be done with
(neither . nor .. can be a trailing symlink, so this is the
last iteration through the loop) and terminate_walk() will
clean the stack anyway, but let's keep it more regular.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

lookup_fast(): take mount traversal into callers · 20e34357

Al Viro authored Jan 09, 2020

Current calling conventions: -E... on error, 0 on cache miss,
result of handle_mounts(nd, dentry, path, inode, seqp) on
success.  Turn that into returning ERR_PTR(-E...), NULL and dentry
resp.; deal with handle_mounts() in the callers.  The thing
is, they already do that in cache miss handling case, so we
just need to supply dentry to them and unify the mount traversal
in those cases.  Fewer arguments that way, and we get closer
to merging handle_mounts() and step_into().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

teach handle_mounts() to handle RCU mode · c153007b

Al Viro authored Jan 09, 2020

... and make the callers of __follow_mount_rcu() use handle_mounts().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

lookup_fast(): consolidate the RCU success case · b023e172

Al Viro authored Jan 17, 2020

1) in case of __follow_mount_rcu() failure, lookup_fast() proceeds
to call unlazy_child() and, should it succeed, handle_mounts().
Note that we have status > 0 (or we wouldn't be calling
__follow_mount_rcu() at all), so all stuff conditional upon
non-positive status won't even be touched.

Consolidate just that sequence after the call of __follow_mount_rcu().

2) calling d_is_negative() and keeping its result is pointless -
we either don't get past checking ->d_seq (and don't use the results of
d_is_negative() at all), or we are guaranteed that ->d_inode and
type bits of ->d_flags had been consistent at the time of d_is_negative()
call.  IOW, we could only get to the use of its result if it's
equal to !inode.  The same ->d_seq check guarantees that after that point
this CPU won't observe ->d_flags values older than ->d_inode update.
So 'negative' variable is completely pointless these days.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

12 Mar, 2020 3 commits

handle_mounts(): pass dentry in, turn path into a pure out argument · db3c9ade

Al Viro authored Jan 09, 2020

All callers are equivalent to
	path->dentry = dentry;
	path->mnt = nd->path.mnt;
	err = handle_mounts(path, ...)
Pass dentry as an explicit argument, fill *path in handle_mounts()
itself.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

do_last(): collapse the call of path_to_nameidata() · e73cabff

Al Viro authored Jan 09, 2020

... and shift filling struct path to just before the call of
handle_mounts().  All callers of handle_mounts() are
immediately preceded by path->mnt = nd->path.mnt now.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

lookup_open(): saner calling conventions (return dentry on success) · da5ebf5a

Al Viro authored Jan 09, 2020

same story as for atomic_open() in the previous commit.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

27 Feb, 2020 6 commits

atomic_open(): saner calling conventions (return dentry on success) · 239eb983

Al Viro authored Jan 09, 2020

Currently it either returns -E... or puts (nd->path.mnt,dentry)
into *path and returns 0.  Make it return ERR_PTR(-E...) or
dentry; adjust the caller.  Fewer arguments and it's easier
to keep track of *path contents that way.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

handle_mounts(): start building a sane wrapper for follow_managed() · bd7c4b50

Al Viro authored Jan 08, 2020

All callers of follow_managed() follow it on success with the same steps -
d_backing_inode(path->dentry) is calculated and stored into some struct inode *
variable and, in all but one case, an unsigned variable (nd->seq to be) is
zeroed.  The single exception is lookup_fast() and there zeroing is correct
thing to do - not doing it is a pointless microoptimization.

	Add a wrapper for follow_managed() that would do that combination.
It's mostly a vehicle for code massage - it will be changing quite a bit,
and the current calling conventions are by no means final.  Right now it
takes path, nameidata and (as out params) inode and seq, similar to
__follow_mount_rcu().  Which will soon get folded into it...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

make build_open_flags() treat O_CREAT | O_EXCL as implying O_NOFOLLOW · 31d1726d

Al Viro authored Jan 08, 2020

O_CREAT | O_EXCL means "-EEXIST if we run into a trailing symlink".
As it is, we might or might not have LOOKUP_FOLLOW in op->intent
in that case - that depends upon having O_NOFOLLOW in open flags.
It doesn't matter, since we won't be checking it in that case -
do_last() bails out earlier.

However, making sure it's not set (i.e. acting as if we had an explicit
O_NOFOLLOW) makes the behaviour more explicit and allows reordering the
check for O_CREAT | O_EXCL in do_last() with the call of step_into()
immediately following it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

follow_automount() doesn't need the entire nameidata · 1c9f5e06

Al Viro authored Jan 16, 2020

Only the address of ->total_link_count and the flags.
And fix an off-by-one in ELOOP detection - make it
consistent with symlink following, where we check if
the pre-increment value has reached 40, rather than
check the post-increment one.

[kudos to Christian Brauner for spotting the braino]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

follow_automount(): get rid of dead^Wstillborn code · 25e195aa

Al Viro authored Jan 11, 2020

1) no instances of ->d_automount() have ever made use of the "return
ERR_PTR(-EISDIR) if you don't feel like mounting anything" - that's
a rudiment of plans that got superseded before the thing went into
the tree.  Despite the comment in follow_automount(), autofs has
never done that.

2) if there's no ->d_automount() in dentry_operations, filesystems
should not set DCACHE_NEED_AUTOMOUNT in the first place.  None have
ever done so...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fix automount/automount race properly · 26df6034

Al Viro authored Jan 11, 2020

Protection against automount/automount races (two threads hitting the same
referral point at the same time) is based upon do_add_mount() prevention of
identical overmounts - trying to overmount the root of mounted tree with
the same tree fails with -EBUSY.  It's unreliable (the other thread might've
mounted something on top of the automount it has triggered) *and* causes
no end of headache for follow_automount() and its caller, since
finish_automount() behaves like do_new_mount() - if the mountpoint to be is
overmounted, it mounts on top what's overmounting it.  It's not only wrong
(we want to go into what's overmounting the automount point and quietly
discard what we planned to mount there), it introduces the possibility of
original parent mount getting dropped.  That's what 8aef1884 (VFS: Fix
vfsmount overput on simultaneous automount) deals with, but it can't do
anything about the reliability of conflict detection - if something had
overmounted the other thread's automount (e.g. that other thread
having stepped into automount in mount(2)), we don't get that -EBUSY and
the result is
	 referral point under automounted NFS under explicit overmount
under another copy of automounted NFS

What we need is finish_automount() *NOT* digging into overmounts - if it
finds one, it should just quietly discard the thing it was asked to mount.
And don't bother with actually crossing into the results of finish_automount() -
the same loop that calls follow_automount() will do that just fine on the
next iteration.

IOW, instead of calling lock_mount() have finish_automount() do it manually,
_without_ the "move into overmount and retry" part.  And leave crossing into
the results to the caller of follow_automount(), which simplifies it a lot.

Moral: if you end up with a lot of glue working around the calling conventions
of something, perhaps these calling conventions are simply wrong...

Fixes: 8aef1884 (VFS: Fix vfsmount overput on simultaneous automount)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
