Commits · a41b05edfedb939440e83666f23de3ef9af33acf · Kirill Smelkov / linux

13 Mar, 2022 9 commits

SUNRPC/auth: async tasks mustn't block waiting for memory · a41b05ed

NeilBrown authored Mar 07, 2022

When memory is short, new worker threads cannot be created, so we depend
on the single guaranteed rpciod thread to handle everything.  That thread
must therefore never block waiting for memory.

Mempools are a particular problem, as memory can only be released back
to the mempool by a running async rpc task.  If all available workqueue
threads are waiting on the mempool, no thread is available to return
anything.

lookup_cred() can block on a mempool or kmalloc - and this can cause
deadlocks.  So add a new RPCAUTH_LOOKUP flag for async lookups and don't
block on memory.  If the -ENOMEM gets back to call_refreshresult(), wait
a short while and try again.  HZ>>4 is chosen as it is used elsewhere
for -ENOMEM retries.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
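
The fail-fast-and-retry behaviour described in this commit can be pictured with a small user-space sketch. Everything in it (`struct pool`, `pool_try_get()`, `async_lookup_step()`) is hypothetical and is not the SUNRPC code; it only illustrates returning -ENOMEM instead of sleeping, then asking the caller to retry after HZ>>4 ticks.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a fixed-size mempool. */
struct pool {
    int free_slots;
};

/*
 * Try to take one slot without ever sleeping.
 * Returns 0 on success or -ENOMEM if the pool is empty.
 */
int pool_try_get(struct pool *p)
{
    if (p->free_slots == 0)
        return -ENOMEM;
    p->free_slots--;
    return 0;
}

/* Give a slot back to the pool. */
void pool_put(struct pool *p)
{
    p->free_slots++;
}

/* Outcome of one step of a hypothetical async state machine. */
struct step_result {
    bool done;                /* true if the lookup got its memory */
    unsigned int retry_delay; /* ticks to wait before retrying, else 0 */
};

/*
 * Instead of blocking on an empty pool, report how long the caller
 * should wait before re-running the step. The delay is hz >> 4,
 * mirroring the HZ>>4 value named in the commit message.
 */
struct step_result async_lookup_step(struct pool *p, unsigned int hz)
{
    struct step_result r = { .done = false, .retry_delay = 0 };

    if (pool_try_get(p) == -ENOMEM) {
        r.retry_delay = hz >> 4;
        return r;
    }
    r.done = true;
    return r;
}
```

Because the step returns instead of sleeping, the one worker thread stays free to run whichever task will eventually call `pool_put()`.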

SUNRPC/call_alloc: async tasks mustn't block waiting for memory · c487216b

NeilBrown authored Mar 07, 2022

When memory is short, new worker threads cannot be created, so we depend
on the single guaranteed rpciod thread to handle everything.
That thread must therefore never block waiting for memory.

Mempools are a particular problem, as memory can only be released back
to the mempool by a running async rpc task.  If all available
workqueue threads are waiting on the mempool, no thread is available to
return anything.

rpc_malloc() can block, and this might cause deadlocks.
So check RPC_IS_ASYNC() rather than RPC_IS_SWAPPER() to determine
whether blocking is acceptable.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
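
As a rough illustration of the policy change, the sketch below contrasts the two tests. The `struct task` fields and the `alloc_mode` enum are hypothetical stand-ins for the RPC task flags and GFP flags; this is not the body of rpc_malloc().

```c
#include <stdbool.h>

/* Hypothetical allocation modes, loosely modelled on GFP flags. */
enum alloc_mode {
    ALLOC_MAY_BLOCK, /* caller may sleep until memory is available */
    ALLOC_NO_WAIT    /* fail immediately instead of sleeping */
};

/* Hypothetical view of the two task properties the commit mentions. */
struct task {
    bool is_async;   /* runs on the shared workqueue thread */
    bool is_swapper; /* services swap-over-NFS I/O */
};

/* Old test, per the commit text: only swapper tasks avoided blocking. */
enum alloc_mode alloc_mode_old(const struct task *t)
{
    return t->is_swapper ? ALLOC_NO_WAIT : ALLOC_MAY_BLOCK;
}

/* New test: any async task must not block. */
enum alloc_mode alloc_mode_new(const struct task *t)
{
    return t->is_async ? ALLOC_NO_WAIT : ALLOC_MAY_BLOCK;
}
```

An async task that is not a swapper is the case that changes: it was allowed to sleep under the old test and is not under the new one.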

NFS: remove IS_SWAPFILE hack · 944d95f7

NeilBrown authored Mar 07, 2022

This code is pointless as IS_SWAPFILE is always defined.
So remove it.
Suggested-by: Mark Hemment <markhemm@googlemail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Remove remaining dfprintks related to fscache and remove NFSDBG_FSCACHE · b5fdf66f

Dave Wysochanski authored Mar 01, 2022

The fscache cookie APIs including fscache_acquire_cookie() and
fscache_relinquish_cookie() now have very good tracing.  Thus,
there is no real need for dfprintks in the NFS fscache interface.

All dfprintks have now been removed from the NFS fscache interface, so
remove the NFSDBG_FSCACHE defines as well.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Replace dfprintks with tracepoints in fscache read and write page functions · e3f0a7fe

Dave Wysochanski authored Mar 01, 2022

Most of fscache and the other NFS IO paths now use tracepoints.
Remove the dfprintks in the NFS fscache read/write page functions
and replace them with tracepoints at the beginning and end of the functions.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Rename fscache read and write pages functions · fc1c5abf

Dave Wysochanski authored Mar 01, 2022

Rename NFS fscache functions in a more consistent fashion
to better reflect when we read from and write to fscache.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Cleanup usage of nfs_inode in fscache interface · 45f3a70b

Dave Wysochanski authored Mar 01, 2022

A number of places in the fscache interface used nfs_inode where inode
would do.  Use inode in those places to simplify the code.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFSv4.1 restrict GETATTR fs_location query to the main transport · b4be2c59

Olga Kornievskaia authored Feb 15, 2022

In the presence of trunking transports, it's helpful to make sure
that during a migration event, the GETATTR for the fs_location attribute
happens on the main transport.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: remove unneeded check in decode_devicenotify_args() · cb8fac6d

Alexey Khoroshilov authored Feb 15, 2022

The overflow check is no longer needed now that we have switched to kmalloc_array().
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Fixes: a4f743a6 ("NFSv4.1: Convert open-coded array allocation calls to kmalloc_array()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
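
To see why a caller-side check becomes redundant, consider a user-space analogue of an overflow-checked array allocator. `alloc_array()` is a hypothetical helper written for this sketch, not the kernel's kmalloc_array() implementation.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate room for n elements of the given size.
 * Returns NULL if n * size would overflow size_t, so callers do not
 * need their own "n > MAX / size" test before calling it.
 */
void *alloc_array(size_t n, size_t size)
{
    if (size != 0 && n > SIZE_MAX / size)
        return NULL;
    return malloc(n * size);
}
```

A caller only has to test the returned pointer; an oversized count is reported the same way as an ordinary allocation failure.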

02 Mar, 2022 20 commits

NFS: Cache all entries in the readdirplus reply · 612896ec

Trond Myklebust authored Feb 24, 2022

Even if we're not able to cache all the entries in the readdir buffer,
let's ensure that we do prime the dcache.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Optimise away the previous cookie field · 0adf85b4

Trond Myklebust authored Feb 27, 2022

Replace the 'previous cookie' field in struct nfs_entry with the
array->last_cookie.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Fix up forced readdirplus · b0365ccb

Trond Myklebust authored Feb 23, 2022

Avoid clearing the entire readdir page cache if we're just doing forced
readdirplus for the 'ls -l' heuristic.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Convert readdir page cache to use a cookie based index · f648022f

Trond Myklebust authored Feb 23, 2022

Instead of using a linear index to address the pages, use the cookie of
the first entry, since that is what we use to match the page anyway.

This allows us to avoid re-reading the entire cache on a seekdir() type
of operation. The latter is very common when re-exporting NFS, and is a
major performance drain.

The change does affect our duplicate cookie detection, since we can no
longer rely on the page index as a linear offset for detecting whether
we looped backwards. However since we no longer do a linear search
through all the pages on each call to nfs_readdir(), this is less of a
concern than it was previously.
The other downside is that invalidate_mapping_pages() no longer can use
the page index to avoid clearing pages that have been read. A subsequent
patch will restore the functionality this provides to the 'ls -l'
heuristic.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
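
One way to derive a page index from a cookie is to hash it, as in the sketch below. The use of a multiplicative hash, the `INDEX_BITS` width, and the special case for cookie 0 are all assumptions made for illustration; the commit message only says that the cookie of the first entry is used.

```c
#include <stdint.h>

#define INDEX_BITS 18 /* hypothetical width of the page index space */

/*
 * Map a 64-bit readdir cookie to a page index with a multiplicative
 * hash. The multiplier is a 64-bit golden-ratio constant often used
 * for this kind of hashing; the real kernel code may differ.
 */
uint64_t cookie_to_index(uint64_t cookie)
{
    if (cookie == 0)
        return 0; /* start of directory maps to the first page */
    return (cookie * UINT64_C(0x61C8864680B583EB)) >> (64 - INDEX_BITS);
}
```

With such a mapping, a seekdir() to an arbitrary cookie can go straight to the candidate page rather than walking the pages in order, which is the benefit the commit describes.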

NFS: Clean up page array initialisation/free · 9332cf14

Trond Myklebust authored Feb 26, 2022

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Trace effects of the readdirplus heuristic · 11d03d0a

Trond Myklebust authored Feb 19, 2022

Enable tracking of when the readdirplus heuristic causes a page cache
invalidation.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Trace effects of readdirplus on the dcache · eace45a1

Trond Myklebust authored Feb 19, 2022

Trace the effects of readdirplus on attribute and dentry revalidation.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Add basic readdir tracing · 310e3187

Trond Myklebust authored Feb 19, 2022

Add tracing to track how often the client goes to the server for updated
readdir information.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Don't request readdirplus when revalidation was forced · 0b3cc71b

Trond Myklebust authored Feb 19, 2022

If the revalidation was forced, due to the presence of a LOOKUP_EXCL or
a LOOKUP_REVAL flag, then readdirplus won't help. It also can't help
when we're doing a path component lookup.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Readdirplus can't help lookup for case insensitive filesystems · 2c2c3365

Trond Myklebust authored Feb 19, 2022

If the filesystem is case insensitive, then readdirplus can't help with
cache misses, since it won't return case folded variants of the filename.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFSv4: Ask for a full XDR buffer of readdir goodness · c49c6894

Trond Myklebust authored Feb 18, 2022

Instead of pretending that we know the ratio of directory info vs
readdirplus attribute info, just set the 'dircount' field to the same
value as the 'maxcount' field.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Don't ask for readdirplus unless it can help nfs_getattr() · ad1e109a

Trond Myklebust authored Feb 17, 2022

If attribute caching is turned off, then use of readdirplus is not going
to help stat() performance.
Readdirplus also doesn't help if a file is being written to, since we
will have to flush those writes in order to sync the mtime/ctime.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Improve heuristic for readdirplus · 230bc98f

Trond Myklebust authored Feb 17, 2022

The heuristic for readdirplus is designed to try to detect 'ls -l' and
similar patterns. It does so by looking for cache hit/miss patterns in
both the attribute cache and in the dcache of the files in a given
directory, and then sets a flag for the readdirplus code to interpret.

The problem with this approach is that a single attribute or dcache miss
can cause the NFS code to force a refresh of the attributes for the
entire set of files contained in the directory.

To be able to make a more nuanced decision, let's sample the number of
hits and misses in the set of open directory descriptors. That allows us
to set thresholds at which we start preferring READDIRPLUS over regular
READDIR, or at which we start to force a re-read of the remaining
readdir cache using READDIRPLUS.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
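
The threshold idea can be sketched as a small decision function. The commit message does not give the thresholds or say exactly how hits and misses are weighed, so the constants, the miss-percentage rule, and all names below are hypothetical.

```c
/* All thresholds are hypothetical; the commit does not state them. */
#define MIN_SAMPLES      8  /* ignore very small samples */
#define PREFER_PLUS_PCT  25 /* miss % at which READDIRPLUS is preferred */
#define FORCE_REREAD_PCT 75 /* miss % at which the cache is re-read */

enum readdir_choice {
    USE_READDIR,
    USE_READDIRPLUS,
    FORCE_READDIRPLUS_REREAD
};

/* Counters sampled across the open descriptors of one directory. */
struct dir_stats {
    unsigned int hits;   /* attribute or dcache hits */
    unsigned int misses; /* attribute or dcache misses */
};

/* Pick a strategy from the sampled miss percentage. */
enum readdir_choice choose_readdir(const struct dir_stats *s)
{
    unsigned int total = s->hits + s->misses;
    unsigned int miss_pct;

    if (total < MIN_SAMPLES)
        return USE_READDIR;
    miss_pct = s->misses * 100 / total;
    if (miss_pct >= FORCE_REREAD_PCT)
        return FORCE_READDIRPLUS_REREAD;
    if (miss_pct >= PREFER_PLUS_PCT)
        return USE_READDIRPLUS;
    return USE_READDIR;
}
```

The point of sampling is visible in the `MIN_SAMPLES` guard: a single miss no longer decides the outcome for the whole directory.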

NFS: Reduce use of uncached readdir · 9c3f4d98

Trond Myklebust authored Feb 17, 2022

When reading a very large directory, we want to try to keep the page
cache up to date if doing so is inexpensive. With the change to allow
readdir to continue reading even when the cache is incomplete, we no
longer need to fall back to uncached readdir in order to scale to large
directories.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Simplify nfs_readdir_xdr_to_array() · 9ff89c25

Trond Myklebust authored Feb 07, 2022

Recent changes to readdir mean that we can cope with partially filled
page cache entries, so we no longer need to rely on looping in
nfs_readdir_xdr_to_array().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: If the cookie verifier changes, we must invalidate the page cache · 6c34f05b

Trond Myklebust authored Feb 22, 2022

Ensure that if the cookie verifier changes when we use the zero-valued
cookie, then we invalidate any cached pages.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Adjust the amount of readahead performed by NFS readdir · 580f2367

Trond Myklebust authored Feb 07, 2022

The current NFS readdir code will always try to maximise the amount of
readahead it performs on the assumption that we can cache anything that
isn't immediately read by the process.
There are several cases where this assumption breaks down, including
when the 'ls -l' heuristic kicks in to try to force use of readdirplus
as a batch replacement for lookup/getattr.

This patch therefore tries to tone down the amount of readahead we
perform, and adjust it to try to match the amount of data being
requested by user space.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Don't advance the page pointer unless the page is full · c8f0523b

Trond Myklebust authored Feb 26, 2022

When we hit the end of the data in the readdir page, we don't want to
start filling a new page, unless this one is full.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Don't re-read the entire page cache to find the next cookie · 728dd0ab

Trond Myklebust authored Feb 22, 2022

If the page cache entry that was last read gets invalidated for some
reason, then make sure we can re-create it on the next call to readdir.
This, combined with the cache page validation, allows us to reuse the
cached value of page-index on successive calls to nfs_readdir.

Credit is due to Benjamin Coddington for showing that the concept works,
and that it allows for improved cache sharing between processes even in
the case where pages are lost due to LRU or active invalidation.
Suggested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Store the change attribute in the directory page cache · d09e673f

Trond Myklebust authored Feb 22, 2022

Use the change attribute and the first cookie in a directory page cache
entry to validate that the page is up to date.
Suggested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
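
A minimal sketch of that validation, assuming each cached page carries a small header. `struct dir_page_hdr` and `dir_page_is_valid()` are hypothetical names, not the kernel's data structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical header stored at the front of each cached directory page. */
struct dir_page_hdr {
    uint64_t change_attr;  /* directory change attribute when filled */
    uint64_t first_cookie; /* cookie of the first entry on the page */
};

/*
 * A cached page is usable only if it was filled while the directory had
 * the same change attribute and it starts at the cookie being sought.
 */
bool dir_page_is_valid(const struct dir_page_hdr *hdr,
                       uint64_t dir_change_attr, uint64_t wanted_cookie)
{
    return hdr->change_attr == dir_change_attr &&
           hdr->first_cookie == wanted_cookie;
}
```

Since each page can be checked on its own, a stale or mismatched page can be refilled without touching the rest of the cache.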

28 Feb, 2022 7 commits

NFS: Calculate page offsets algorithmically · 0b2662b7

Trond Myklebust authored Feb 22, 2022

Instead of relying on counting the page offsets as we walk through the
page cache, switch to calculating them algorithmically.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Use kzalloc() to avoid initialising the nfs_open_dir_context · 281f31b2

Trond Myklebust authored Feb 22, 2022

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Initialise the readdir verifier as best we can in nfs_opendir() · d1e32ea3

Trond Myklebust authored Feb 25, 2022

To ensure that opendir() followed by seekdir() works as correctly as
possible, try to initialise the readdir verifier in nfs_opendir().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Trace lookup revalidation failure · 2eef8a31

Trond Myklebust authored Feb 19, 2022

Enable tracing of lookup revalidation failures.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: constify nfs_server_capable() and nfs_have_writebacks() · 1a93b82c

Trond Myklebust authored Feb 18, 2022

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Return valid errors from nfs2/3_decode_dirent() · 64cfca85

Trond Myklebust authored Feb 24, 2022

Valid return values for decode_dirent() callback functions are:
 0: Success
 -EBADCOOKIE: End of directory
 -EAGAIN: End of xdr_stream

All errors need to map into one of those three values.

Fixes: 573c4e1e ("NFS: Simplify ->decode_dirent() calling sequence")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
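
The three-value contract can be illustrated with a small normalising helper. `normalise_dirent_status()` is hypothetical, and mapping every unexpected error to -EAGAIN is an assumption of this sketch; the commit only states that all errors must map to one of the three values.

```c
#include <errno.h>

/* EBADCOOKIE is kernel-internal, so it is defined here for the sketch. */
#ifndef EBADCOOKIE
#define EBADCOOKIE 523
#endif

/*
 * Collapse an arbitrary decoder status into one of the three values
 * the callback contract allows: 0, -EBADCOOKIE or -EAGAIN.
 */
int normalise_dirent_status(int status)
{
    switch (status) {
    case 0:
    case -EBADCOOKIE:
    case -EAGAIN:
        return status;
    default:
        /* Assumption: treat anything else as "ran out of stream". */
        return -EAGAIN;
    }
}
```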

Revert "NFSv4: use unique client identifiers in network namespaces" · b38e09b9

Trond Myklebust authored Feb 28, 2022

This reverts commit 50c790a0.

The functionality is believed to be capable of causing regressions in
existing setups, so the author has requested that it be reverted.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

25 Feb, 2022 4 commits

NFS: Use of mapping_set_error() results in spurious errors · 6c984083

Trond Myklebust authored Feb 15, 2022

The use of mapping_set_error() in conjunction with calls to
filemap_check_errors() is problematic because every error gets reported
as either an EIO or an ENOSPC by filemap_check_errors() in functions
such as filemap_write_and_wait() or filemap_write_and_wait_range().
In almost all cases, we prefer to use the more nuanced wb errors.

Fixes: b8946d7b ("NFS: Revalidate the file mapping on all fatal writeback errors")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
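
The loss of detail described above can be shown with a simplified model of two sticky error flags. `struct mapping_flags`, `set_error()` and `check_errors()` are hypothetical stand-ins that only approximate mapping_set_error() and filemap_check_errors(); the kernel functions do more than this.

```c
#include <errno.h>

/* Two sticky flags, modelled on the AS_EIO / AS_ENOSPC mapping bits. */
struct mapping_flags {
    int eio;
    int enospc;
};

/* Record an error: anything other than -ENOSPC is remembered as "EIO". */
void set_error(struct mapping_flags *m, int err)
{
    if (err == 0)
        return;
    if (err == -ENOSPC)
        m->enospc = 1;
    else
        m->eio = 1;
}

/* Report and clear: the original errno value is no longer recoverable. */
int check_errors(struct mapping_flags *m)
{
    int ret = 0;

    if (m->enospc) {
        m->enospc = 0;
        ret = -ENOSPC;
    }
    if (m->eio) {
        m->eio = 0;
        ret = -EIO;
    }
    return ret;
}
```

In this model a quota error such as -EDQUOT comes back as -EIO, which is the kind of spurious result the commit is avoiding by preferring the wb error tracking.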

NFS: Clean up NFSv4.2 xattrs · 84631f84

Trond Myklebust authored Feb 23, 2022

Add a helper for the xattr mask so that we can get rid of the inlined
ifdefs.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Remove unnecessary XATTR cache invalidation in nfs_fhget() · f1ec501d

Trond Myklebust authored Feb 23, 2022

We should never expect the 'xattr_cache' to be non-null in that case,
hence nfs_set_cache_invalid() is just going to optimise it away.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: NFSv2/v3 clients should never be setting NFS_CAP_XATTR · b622ffe1

Trond Myklebust authored Feb 22, 2022

Ensure that we always initialise the 'xattr_support' field in struct
nfs_fsinfo, so that nfs_server_set_fsinfo() doesn't declare our NFSv2/v3
client to be capable of supporting the NFSv4.2 xattr protocol by setting
the NFS_CAP_XATTR capability.

This configuration can cause nfs_do_access() to set access mode bits
that are unsupported by the NFSv3 ACCESS call, which may confuse
spec-compliant servers.
Reported-by: Olga Kornievskaia <kolga@netapp.com>
Fixes: b78ef845 ("NFSv4.2: query the server for extended attribute support")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
