Commits · 9f5418010940236b2c39ea53b99055ca26ff1279 · Kirill Smelkov / linux

18 May, 2016 2 commits

xfs: concurrent readdir hangs on data buffer locks · 9f541801

Dave Chinner authored May 19, 2016

There's a three-process deadlock involving shared/exclusive barriers
and inverted lock orders in the directory readdir implementation.
It's a pre-existing problem with lock ordering, exposed by the
VFS parallelisation code.

process 1               process 2               process 3
---------               ---------               ---------
readdir
iolock(shared)
  get_leaf_dents
    iterate entries
       ilock(shared)
       map, lock and read buffer
       iunlock(shared)
       process entries in buffer
       .....
                                                readdir
                                                iolock(shared)
                                                  get_leaf_dents
                                                    iterate entries
                                                      ilock(shared)
                                                      map, lock buffer
                                                      <blocks>
                        finish ->iterate_shared
                        file_accessed()
                          ->update_time
                            start transaction
                            ilock(excl)
                            <blocks>
       .....
       finishes processing buffer
       get next buffer
         ilock(shared)
         <blocks>

And that's the deadlock.

Fix this by dropping the current buffer lock in process 1 before
trying to map the next buffer. This means we keep the lock order of
ilock -> buffer lock intact and hence will allow process 3 to make
progress and drop its ilock(shared) once it is done.
Reported-by: Xiong Zhou <xzhou@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Revert "btrfs: switch to ->iterate_shared()" · fe742fd4

Al Viro authored May 18, 2016

This reverts commit 972b241f.
Quoth Chris:
	didn't take the delayed inode stuff into account
	it got an rbtree of items and it pulls things out
	so in shared mode, its hugely racey
	sorry, lets revert and fix it for real inside of btrfs
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

13 May, 2016 3 commits

ext4: switch to ->iterate_shared() · ae05327a

Al Viro authored May 12, 2016

Note that we need relax_dir() equivalent for directories
locked shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

hfs: switch to ->iterate_shared() · 9717a91b

Al Viro authored May 12, 2016

exact parallel of hfsplus analogue
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

hfsplus: switch to ->iterate_shared() · 323ee8fc

Al Viro authored May 12, 2016

We need to protect the list of hfsplus_readdir_data against parallel
insertions (in readdir) and removals (in release).  Add a spinlock
for that.  Note that it has nothing to do with protection of
hfsplus_readdir_data->key - we have an exclusion between hfsplus_readdir()
and hfsplus_delete_cat() on directory lock and between several
hfsplus_readdir() for the same struct file on ->f_pos_lock.  The spinlock
is strictly for list changes.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
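
A minimal userspace sketch of the idea, under stated assumptions: all names here (`struct dir_inode`, `open_dir_lock`, `rd_add()`, `rd_del()`, `rd_count()`) are illustrative, and a pthread mutex stands in for the kernel spinlock. The lock covers list linkage only; the `key` field is deliberately left to other exclusion, as the message explains.

```c
#include <pthread.h>
#include <stddef.h>

struct readdir_data {
	struct readdir_data *prev, *next;
	int key;	/* not covered by open_dir_lock in this sketch */
};

struct dir_inode {
	pthread_mutex_t open_dir_lock;	/* stand-in for the spinlock */
	struct readdir_data head;	/* sentinel of a circular list */
};

static void dir_init(struct dir_inode *d)
{
	pthread_mutex_init(&d->open_dir_lock, NULL);
	d->head.prev = d->head.next = &d->head;
}

/* Insertion, as done from readdir. */
static void rd_add(struct dir_inode *d, struct readdir_data *rd)
{
	pthread_mutex_lock(&d->open_dir_lock);
	rd->next = d->head.next;
	rd->prev = &d->head;
	d->head.next->prev = rd;
	d->head.next = rd;
	pthread_mutex_unlock(&d->open_dir_lock);
}

/* Removal, as done from release. */
static void rd_del(struct dir_inode *d, struct readdir_data *rd)
{
	pthread_mutex_lock(&d->open_dir_lock);
	rd->prev->next = rd->next;
	rd->next->prev = rd->prev;
	rd->prev = rd->next = rd;
	pthread_mutex_unlock(&d->open_dir_lock);
}

/* Count the list members; takes the lock because it walks the links. */
static int rd_count(struct dir_inode *d)
{
	int n = 0;

	pthread_mutex_lock(&d->open_dir_lock);
	for (struct readdir_data *p = d->head.next; p != &d->head; p = p->next)
		n++;
	pthread_mutex_unlock(&d->open_dir_lock);
	return n;
}
```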

12 May, 2016 4 commits

hostfs: switch to ->iterate_shared() · 552a9d48
Al Viro authored May 12, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

hpfs: switch to ->iterate_shared() · 7d674b31

Al Viro authored May 12, 2016

NOTE: the only reason we can do that without ->i_rdir_offs races
is that hpfs_lock() serializes everything in there anyway.  It's
not that hard to get rid of, but not as part of this series...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

hpfs: handle allocation failures in hpfs_add_pos() · e82c3147

Al Viro authored May 12, 2016

pr_err() is nice, but we'd better propagate the error
to caller and not proceed to violate the invariants
(namely, "every file with f_pos tied to directory block
should have its address visible in per-inode array").
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
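
The pattern can be sketched in userspace as follows. This is an illustration, not the hpfs code: `struct pos_table`, `pos_table_add()` and the injectable `alloc` hook are hypothetical and exist only so the failure path can be exercised. On allocation failure the function returns `-ENOMEM` and leaves the table untouched, rather than logging and carrying on with a position that was never recorded.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical per-inode table of tracked file positions. */
struct pos_table {
	long **slots;				/* addresses of tracked positions */
	size_t used, cap;
	void *(*alloc)(void *, size_t);		/* realloc-like hook */
};

/*
 * Record pos in the table.  Returns 0 on success or -ENOMEM if the table
 * could not be grown; in the failure case nothing is modified, so the
 * caller can back out before the invariant is broken.
 */
static int pos_table_add(struct pos_table *t, long *pos)
{
	if (t->used == t->cap) {
		size_t ncap = t->cap + 16;
		long **n = t->alloc(t->slots, ncap * sizeof(*n));

		if (!n)
			return -ENOMEM;
		t->slots = n;
		t->cap = ncap;
	}
	t->slots[t->used++] = pos;
	return 0;
}

/* Helper that simulates an allocation failure. */
static void *failing_alloc(void *p, size_t n)
{
	(void)p;
	(void)n;
	return NULL;
}
```

A caller would check the return value and fail the open (or equivalent operation) instead of proceeding.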

gfs2: switch to ->iterate_shared() · 1d1bb236

Al Viro authored May 12, 2016

protected by glock and already used without locking the directory
by gfs2_get_name()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

10 May, 2016 4 commits

- f2fs: switch to ->iterate_shared() · e77d0c63
  Al Viro authored May 10, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
- afs: switch to ->iterate_shared() · 29884eff
  Al Viro authored May 10, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
- befs: switch to ->iterate_shared() · e23e9aa7
  Al Viro authored May 10, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
- befs: constify stuff a bit · 22341d8f
  Al Viro authored May 10, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

09 May, 2016 14 commits

isofs: switch to ->iterate_shared() · e8991089
Al Viro authored May 09, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

get_acorn_filename(): deobfuscate a bit · e17a21d3

Al Viro authored May 05, 2016

Lots of Idiotic Silly Parentheses is -> that way...  What that
condition checks is that there's exactly 32 bytes between the
end of name and the end of entire directory record.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
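
Written without the nested parentheses, the check might look like the sketch below. The helper name and constants are illustrative, not the kernel's; the sketch assumes the fixed part of an ISO 9660 directory record is 33 bytes and that the name is followed by a pad byte when needed to reach an even offset.

```c
#include <stdbool.h>

#define ISO_DIR_REC_FIXED 33	/* bytes before the name (assumed) */
#define ACORN_EXT_LEN     32	/* size of the trailing Acorn data */

/*
 * True if exactly 32 bytes lie between the (padded) end of the name and
 * the end of the whole directory record.
 */
static bool has_acorn_extension(unsigned char rec_len, unsigned char name_len)
{
	int std = ISO_DIR_REC_FIXED + name_len;	/* end of the name */

	if (std & 1)
		std++;				/* pad to an even offset */
	return rec_len - std == ACORN_EXT_LEN;
}
```

For a 10-byte name the name ends at offset 43, padded to 44, so only a record of length 76 qualifies.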

btrfs: switch to ->iterate_shared() · 972b241f
Al Viro authored May 04, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

logfs: no need to lock directory in lseek · 5e261246
Al Viro authored May 04, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

switch ecryptfs to ->iterate_shared · 51a16a9c
Al Viro authored May 04, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

Merge branch 'for-linus' into work.lookups · a063ff1e
Al Viro authored May 09, 2016

9p: switch to ->iterate_shared() · 5963ded8
Al Viro authored May 01, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

fat: switch to ->iterate_shared() · 98d4b8d8

Al Viro authored Apr 30, 2016

... and make that weird ioctl lock directory only shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

romfs, squashfs: switch to ->iterate_shared() · d375570f

Al Viro authored Apr 30, 2016

don't need to lock directory in ->llseek(), either
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

more trivial ->iterate_shared conversions · c51da20c
Al Viro authored Apr 30, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

lustre: don't need to lock inode in directory lseek · 060ff688

Al Viro authored Apr 20, 2016

Note that lustre has its private mutex protecting directory pagecache;
if they ever remove it, they'll need to be careful with PageChecked()
use.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

kernfs: no point locking directory around that generic_file_llseek() · 8cb0d2c1
Al Viro authored Apr 20, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

configfs_readdir(): make safe under shared lock · a01b3007
Al Viro authored Apr 20, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

nfs: per-name sillyunlink exclusion · 884be175

Al Viro authored Apr 28, 2016

use d_alloc_parallel() for sillyunlink/lookup exclusion and
explicit rwsem (nfs_rmdir() being a writer and nfs_call_unlink() -
a reader) for rmdir/sillyunlink one.

That ought to make lookup/readdir/!O_CREAT atomic_open really
parallel on NFS.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

08 May, 2016 1 commit

get_rock_ridge_filename(): handle malformed NM entries · 99d82582

Al Viro authored May 05, 2016

Payloads of NM entries are not supposed to contain NUL.  When we run
into such, only the part prior to the first NUL goes into the
concatenation (i.e. the directory entry name being encoded by a bunch
of NM entries).  We do stop when the amount collected so far + the
claimed amount in the current NM entry exceed 254.  So far, so good,
but what we return as the total length is the sum of *claimed*
sizes, not the actual amount collected.  And that can grow pretty
large - not unlimited, since you'd need to put CE entries in
between to be able to get more than the maximum that could be
contained in one isofs directory entry / continuation chunk and
we stop once we've encountered 32 CEs, but you can get about 8Kb
easily.  And that's what will be passed to readdir callback as the
name length.  8Kb __copy_to_user() from a buffer allocated by
__get_free_page()

Cc: stable@vger.kernel.org # 0.98pl6+ (yes, really)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
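
The difference between the claimed and the collected length can be shown with a short sketch. It is not the isofs code: `struct nm_entry` and `collect_nm()` are hypothetical, and the 254-byte limit is taken from the description above. Each payload is cut at its first NUL, and the return value is the number of bytes actually stored rather than the sum of the claimed sizes.

```c
#include <stddef.h>
#include <string.h>

#define NM_NAME_LIMIT 254

/* Hypothetical view of one NM entry. */
struct nm_entry {
	const char *payload;
	size_t claimed_len;	/* length field from the entry */
};

/*
 * Concatenate NM payloads into out, which must hold at least
 * NM_NAME_LIMIT + 1 bytes.  Returns the number of bytes collected.
 */
static size_t collect_nm(const struct nm_entry *e, size_t n, char *out)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++) {
		size_t len = e[i].claimed_len;
		const char *nul;

		if (total + len > NM_NAME_LIMIT)
			break;		/* stop collecting: name too long */

		/* A malformed payload contributes only up to its first NUL. */
		nul = memchr(e[i].payload, '\0', len);
		if (nul)
			len = (size_t)(nul - e[i].payload);

		memcpy(out + total, e[i].payload, len);
		total += len;		/* count what was copied */
	}
	out[total] = '\0';
	return total;
}
```

With two entries claiming 5 and 2 bytes, where the first payload contains a NUL after 2 bytes, the function returns 4. Summing the claimed sizes would have reported 7.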

04 May, 2016 1 commit

ecryptfs: fix handling of directory opening · 6a480a78

Al Viro authored May 04, 2016

First of all, trying to open them r/w is idiocy; it's guaranteed to fail.
Moreover, assigning ->f_pos and assuming that everything will work is
blatantly broken - try that with e.g. tmpfs as underlying layer and watch
the fireworks. There may be a non-trivial amount of state associated with
current IO position, well beyond the numeric offset. Using the single
struct file associated with underlying inode is really not a good idea;
we ought to open one for each ecryptfs directory struct file.

Additionally, file_operations both for directories and non-directories are
full of pointless methods; non-directories should *not* have ->iterate(),
directories should not have ->flush(), ->fasync() and ->splice_read().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

02 May, 2016 11 commits

- nfs: switch to ->iterate_shared() · 9ac3d3e8
  Al Viro authored Apr 28, 2016
```
aside of the usual care about seeding dcache from readdir, we need
to be careful about the pagecache evictions here.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
- lookup_open(): lock the parent shared unless O_CREAT is given · 9cf843e3
  Al Viro authored Apr 28, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
- lookup_open(): put the dentry fed to ->lookup() or ->atomic_open() into in-lookup hash · 6fbd0714
  Al Viro authored Apr 28, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
- lookup_open(): expand the call of real_lookup() · 12fa5e24
  Al Viro authored Apr 28, 2016
```
... and lose the duplicate IS_DEADDIR() - we'd already checked that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
- atomic_open(): reorder and clean up a bit · 384f26e2
  Al Viro authored Apr 28, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
- lookup_open(): lift the "fallback to !O_CREAT" logics from atomic_open() · 1643b43f
  Al Viro authored Apr 27, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
- atomic_open(): be paranoid about may_open() return value · b3d58eaf
  Al Viro authored Apr 27, 2016
```
It should never return positives; however, with Linux S&M crowd
involved, no bogosity is impossible.  Results would be unpleasant...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
- atomic_open(): delay open_to_namei_flags() until the method call · 0fb1ea09
  Al Viro authored Apr 27, 2016
```
nobody else needs that transformation.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
- do_last(): take fput() on error after opening to out: · fe9ec829
  Al Viro authored Apr 27, 2016
```
make it conditional on *opened & FILE_OPENED; in addition to getting
rid of exit_fput: thing, it simplifies atomic_open() cleanup on
may_open() failure.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
- do_last(): get rid of duplicate ELOOP check · 47f9dbd3
  Al Viro authored Apr 27, 2016
```
may_open() will catch it
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
- atomic_open(): massage the create_error logics a bit · 55db2fd9
  Al Viro authored Apr 27, 2016
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```