Commits · 3c59366c207e4c6c6569524af606baf017a55c61 · Kirill Smelkov / linux

08 Aug, 2022 1 commit

NFS: don't unhash dentry during unlink/rename · 3c59366c

NeilBrown authored Aug 01, 2022

NFS unlink() (and rename over existing target) must determine if the
file is open, and must perform a "silly rename" instead of an unlink (or
before rename) if it is.  Otherwise the client might hold a file open
which has been removed on the server.

Consequently if it determines that the file isn't open, it must block
any subsequent opens until the unlink/rename has been completed on the
server.

This is currently achieved by unhashing the dentry.  This forces any
open attempt to the slow-path for lookup which will block on i_rwsem on
the directory until the unlink/rename completes.  A future patch will
change the VFS to only get a shared lock on i_rwsem for unlink, so this
will no longer work.

Instead we introduce an explicit interlock.  A special value is stored
in dentry->d_fsdata while the unlink/rename is running and
->d_revalidate blocks while that value is present.  When ->d_revalidate
unblocks, the dentry will be invalid.  This closes the race
without requiring exclusion on i_rwsem.

d_fsdata is already used in two different ways.
1/ an IS_ROOT directory dentry might have a "devname" stored in
   d_fsdata.  Such a dentry doesn't have a name and so cannot be the
   target of unlink or rename.  For safety we check if an old devname
   is still stored, and remove it if it is.
2/ a dentry with DCACHE_NFSFS_RENAMED set will have a 'struct
   nfs_unlinkdata' stored in d_fsdata.  While this is set maydelete()
   will fail, so an unlink or rename will never proceed on such
   a dentry.

Neither of these can be in effect when a dentry is the target of unlink
or rename.  So we can expect d_fsdata to be NULL, and store a special
value ((void*)1) which is given the name NFS_FSDATA_BLOCKED to indicate
that any lookup will be blocked.

The d_count() is incremented under d_lock() when a lookup finds the
dentry, so we check d_count() is low, and set NFS_FSDATA_BLOCKED under
the same lock to avoid any races.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

3c59366c

02 Aug, 2022 2 commits

NFSv4/pnfs: Fix a use-after-free bug in open · 2135e5d5

Trond Myklebust authored Aug 02, 2022

If someone cancels the open RPC call, then we must not try to free
either the open slot or the layoutget operation arguments, since they
are likely still in use by the hung RPC call.

Fixes: 69494938 ("NFSv4: Don't hold the layoutget locks across multiple RPC calls")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2135e5d5

NFS: nfs_async_write_reschedule_io must not recurse into the writeback code · b1a28f2e

Trond Myklebust authored Aug 01, 2022

It is not safe to call filemap_fdatawrite_range() from
nfs_async_write_reschedule_io(), since we're often calling from a page
reclaim context. Just let fsync() redrive the writeback for us.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

b1a28f2e

27 Jul, 2022 3 commits

SUNRPC: Don't reuse bvec on retransmission of the request · 72691a26

Trond Myklebust authored Jul 27, 2022

If a request is re-encoded and then retransmitted, we need to make sure
that we also re-encode the bvec, in case the page lists have changed.

Fixes: ff053dbb ("SUNRPC: Move the call to xprt_send_pagedata() out of xprt_sock_sendmsg()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

72691a26

SUNRPC: Reinitialise the backchannel request buffers before reuse · 6622e3a7

Trond Myklebust authored Jul 27, 2022

When we're reusing the backchannel requests instead of freeing them,
then we should reinitialise any values of the send/receive xdr_bufs so
that they reflect the available space.

Fixes: 0d2a970d ("SUNRPC: Fix a backchannel race")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

6622e3a7

NFSv4.1: RECLAIM_COMPLETE must handle EACCES · e35a5e78

Zhang Xianwei authored Jul 27, 2022

A client should be able to handle getting an EACCES error while doing
a mount operation to reclaim state due to NFS4CLNT_RECLAIM_REBOOT
being set. If the server returns RPC_AUTH_BADCRED because authentication
failed when we execute "exportfs -au", then RECLAIM_COMPLETE will go a
wrong way. After mount succeeds, all OPEN call will fail due to an
NFS4ERR_GRACE error being returned. This patch is to fix it by resending
a RPC request.
Signed-off-by: Zhang Xianwei <zhang.xianwei8@zte.com.cn>
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Fixes: aa5190d0 ("NFSv4: Kill nfs4_async_handle_error() abuses by NFSv4.1")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

e35a5e78

25 Jul, 2022 11 commits

NFSv4.1 probe offline transports for trunking on session creation · f201bdfd

Olga Kornievskaia authored Jul 25, 2022

Once the session is established call into the SUNRPC layer to check
if any offlined trunking connections should be re-enabled.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

f201bdfd

SUNRPC create a function that probes only offline transports · 92cc04f6

Olga Kornievskaia authored Jul 25, 2022

For only offline transports, attempt to check connectivity via
a NULL call and, if that succeeds, call a provided session trunking
detection function.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

92cc04f6

SUNRPC export xprt_iter_rewind function · 273d6aed

Olga Kornievskaia authored Jul 25, 2022

Make xprt_iter_rewind callable outside of xprtmultipath.c
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

273d6aed

SUNRPC restructure rpc_clnt_setup_test_and_add_xprt · 7960aa9e

Olga Kornievskaia authored Jul 25, 2022

In preparation for code re-use, pull out the part of the
rpc_clnt_setup_test_and_add_xprt() portion that sends a NULL rpc
and then calls a session trunking function into a helper function.

Re-organize the end of the function for code re-use.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

7960aa9e

NFSv4.1 remove xprt from xprt_switch if session trunking test fails · e818bd08

Olga Kornievskaia authored Jul 25, 2022

If we are doing a session trunking test and it fails for the transport,
then remove this transport from the xprt_switch group.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

e818bd08

SUNRPC create an rpc function that allows xprt removal from rpc_clnt · 497e6464

Olga Kornievskaia authored Jul 25, 2022

Expose a function that allows a removal of xprt from the rpc_clnt.

When called from NFS that's running a trunked transport then don't
decrement the active transport counter.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

497e6464

SUNRPC enable back offline transports in trunking discovery · 9368fd6c

Olga Kornievskaia authored Jul 25, 2022

When we are adding a transport to a xprt_switch that's already on
the list but has been marked OFFLINE, then make the state ONLINE
since it's been tested now.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

9368fd6c

SUNRPC create an iterator to list only OFFLINE xprts · 95d0d30c

Olga Kornievskaia authored Jul 25, 2022

Create a new iterator helper that will go thru the all the transports
in the switch and return transports that are marked OFFLINE.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

95d0d30c

NFSv4.1 offline trunkable transports on DESTROY_SESSION · 88363d3e

Olga Kornievskaia authored Jul 25, 2022

When session is destroy, some of the transports might no longer be
valid trunks for the new session. Offline existing transports.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

88363d3e

SUNRPC add function to offline remove trunkable transports · 895245cc

Olga Kornievskaia authored Jul 25, 2022

Iterate thru available transports in the xprt_switch for all
trunkable transports offline and possibly remote them as well.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

895245cc

SUNRPC expose functions for offline remote xprt functionality · 7ffcdaa6

Olga Kornievskaia authored Jul 25, 2022

Re-arrange the code that make offline transport and delete transport
callable functions.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

7ffcdaa6

23 Jul, 2022 12 commits

SUNRPC: Remove xdr_align_data() and xdr_expand_hole() · 29946fbc

Anna Schumaker authored Jul 21, 2022

These functions are no longer needed now that the NFS client places data
and hole segments directly.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

29946fbc

NFS: Replace the READ_PLUS decoding code · d3b00a80

Anna Schumaker authored Jul 21, 2022

We now take a 2-step process that allows us to place data and hole
segments directly at their final position in the xdr_stream without
needing to do a bunch of redundant copies to expand holes. Due to the
variable lengths of each segment, the xdr metadata might cross page
boundaries which I account for by setting a small scratch buffer so
xdr_inline_decode() won't fail.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

d3b00a80

SUNRPC: Add a function for zeroing out a portion of an xdr_stream · e1bd8760

Anna Schumaker authored Jul 21, 2022

This will be used during READ_PLUS decoding for handling HOLE segments.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

e1bd8760

SUNRPC: Add a function for directly setting the xdr page len · 7c4cd5f4

Anna Schumaker authored Jul 21, 2022

We need to do this step during READ_PLUS decoding so that we know pages
are the right length and any extra data has been preserved in the tail.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

7c4cd5f4

SUNRPC: Introduce xdr_stream_move_subsegment() · 4f5f3b60

Anna Schumaker authored Jul 21, 2022

I do this by creating an xdr subsegment for the range we will be
operating over. This lets me shift data to the correct place without
potentially overwriting anything already there.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

4f5f3b60

NFS: Replace fs_context-related dprintk() call sites with tracepoints · 33ce83ef

Chuck Lever authored Jul 22, 2022

Contributed as part of the long patch series that converts NFS from
using dprintk to tracepoints for observability.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

33ce83ef

SUNRPC: Replace dprintk() call site in xs_data_ready · f67939e4

Chuck Lever authored Jul 22, 2022

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

f67939e4

SUNRPC: Fail faster on bad verifier · 0701214c

Chuck Lever authored Jul 22, 2022

A bad verifier is not a garbage argument, it's an authentication
failure. Retrying it doesn't make the problem go away, and delays
upper layer recovery steps.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

0701214c

nfs: only issue commit in DIO codepath if we have uncommitted data · 69d96651

Jeff Layton authored Jul 22, 2022

Currently, we try to determine whether to issue a commit based on
nfs_write_need_commit which looks at the current verifier. In the case
where we got a short write and then tried to follow it up with one that
failed, the verifier can't be trusted.

What we really want to know is whether the pgio request had any
successful writes that came back as UNSTABLE. Add a new flag to the pgio
request, and use that to indicate that we've had a successful unstable
write. Only issue a commit if that flag is set.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

69d96651

nfs: always check dreq->error after a commit · 55051c0c

Jeff Layton authored Jul 22, 2022

When the client gets back a short DIO write, it will then attempt to
issue another write to finish the DIO request. If that write then fails
(as is often the case in an -ENOSPC situation), then we still may need
to issue a COMMIT if the earlier short write was unstable. If that COMMIT
then succeeds, then we don't want the client to reschedule the write
requests, and to instead just return a short write. Otherwise, we can
end up looping over the same DIO write forever.

Always consult dreq->error after a successful RPC, even when the flag
state is not NFS_ODIRECT_DONE.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=2028370Reported-by: Boyang Xue <bxue@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

55051c0c

nfs: add new nfs_direct_req tracepoint events · 8efc4bbe

Jeff Layton authored Jul 22, 2022

Add some new tracepoints to the DIO write code.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

8efc4bbe

SUNRPC: Shrink size of struct rpc_task · ba8ec7a6

Trond Myklebust authored Jul 23, 2022

Move the field 'tk_rpc_status' so that we eliminate a 4 byte hole in the
structure.
For x86_64, this shrinks the size of the struct by 8 bytes.

'pahole' output before the change:
        /* size: 232, cachelines: 4, members: 27 */
        /* sum members: 222, holes: 1, sum holes: 4 */
        /* sum bitfield members: 8 bits (1 bytes) */
        /* padding: 5 */
        /* last cacheline: 40 bytes */

'pahole' output after the change:
        /* size: 224, cachelines: 4, members: 27 */
        /* padding: 1 */
        /* last cacheline: 32 bytes */
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

ba8ec7a6

13 Jul, 2022 1 commit

NFSv4: Fix races in the legacy idmapper upcall · 51fd2eb5

Trond Myklebust authored Jul 13, 2022

nfs_idmap_instantiate() will cause the process that is waiting in
request_key_with_auxdata() to wake up and exit. If there is a second
process waiting for the idmap->idmap_mutex, then it may wake up and
start a new call to request_key_with_auxdata(). If the call to
idmap_pipe_downcall() from the first process has not yet finished
calling nfs_idmap_complete_pipe_upcall_locked(), then we may end up
triggering the WARN_ON_ONCE() in nfs_idmap_prepare_pipe_upcall().

The fix is to ensure that we clear idmap->idmap_upcall_data before
calling nfs_idmap_instantiate().

Fixes: e9ab41b6 ("NFSv4: Clean up the legacy idmapper upcall")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

51fd2eb5

12 Jul, 2022 8 commits

NFS: Allow setting rsize / wsize to a multiple of PAGE_SIZE · 940261a1

Anna Schumaker authored Jun 17, 2022

Previously, we required this to value to be a power of 2 for UDP related
reasons. This patch keeps the power of 2 rule for UDP but allows more
flexibility for TCP and RDMA.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

940261a1

sunrpc: fix expiry of auth creds · f1bafa73

Dan Aloni authored Jul 04, 2022

Before this commit, with a large enough LRU of expired items (100), the
loop skipped all the expired items and was entirely ineffectual in
trimming the LRU list.

Fixes: 95cd6232 ('SUNRPC: Clean up the AUTH cache code')
Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

f1bafa73

nfs: fix port value parsing · 8b4e87a1

Ian Kent authored Jul 02, 2022

The valid values of nfs options port and mountport are 0 to USHRT_MAX.

The fs parser will return a fail for port values that are negative
and the sloppy option handling then returns success.

But the sloppy option handling is meant to return success for invalid
options not valid options with invalid values.

Restricting the sloppy option override to handle failure returns for
invalid options only is sufficient to resolve this problem.

Changes:

v2: utilize the return value from fs_parse() to resolve this problem
    instead of changing the parameter definitions.
Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

8b4e87a1

nfs: Replace kmap() with kmap_local_page() · c77c738c

Fabio M. De Francesco authored Jun 28, 2022

The use of kmap() is being deprecated in favor of kmap_local_page().

With kmap_local_page(), the mapping is per thread, CPU local and not
globally visible. Furthermore, the mapping can be acquired from any context
(including interrupts).

Therefore, use kmap_local_page() in nfs_do_filldir() because this mapping
is per thread, CPU local, and not globally visible.
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

c77c738c

NFS: remove redundant code in nfs_file_write() · 064109db

ChenXiaoSong authored Jun 23, 2022

filemap_fdatawait_range() will always return 0, after patch 6c984083
("NFS: Use of mapping_set_error() results in spurious errors"), it will not
save the wb err in struct address_space->flags:

  result = filemap_fdatawait_range(file->f_mapping, ...) = 0
    filemap_check_errors(mapping) = 0
      test_bit(..., &mapping->flags) // flags is 0
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

064109db

nfs/blocklayout: refactor block device opening · f931d837

Christoph Hellwig authored Jun 22, 2022

Deduplicate the helpers to open a device node by passing a name
prefix argument and using the same helper for both kinds of paths.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

f931d837

NFSv4.1: Handle NFS4ERR_DELAY replies to OP_SEQUENCE correctly · 7ccafd4b

Trond Myklebust authored Jul 12, 2022

Don't assume that the NFS4ERR_DELAY means that the server is processing
this slot id.

Fixes: 3453d570 ("NFSv4.1: Avoid false retries when RPC calls are interrupted")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

7ccafd4b

NFSv4.1: Don't decrease the value of seq_nr_highest_sent · f07a5d24

Trond Myklebust authored Jul 12, 2022

When we're trying to figure out what the server may or may not have seen
in terms of request numbers, do not assume that requests with a larger
number were missed, just because we saw a reply to a request with a
smaller number.

Fixes: 3453d570 ("NFSv4.1: Avoid false retries when RPC calls are interrupted")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

f07a5d24

10 Jul, 2022 2 commits

NFS: Fix case insensitive renames · 6ca0a6f8

Trond Myklebust authored Jun 27, 2022

For filesystems that are case insensitive and case preserving, we need
to be able to rename from one case folded variant of the filename to
another.
Currently, if we have looked up the target filename before the call to
rename, then we may have a hashed dentry with that target name in the
dcache, causing the vfs to optimise away the rename.
To avoid that, let's drop the target dentry, and leave it to the server
to optimise away the rename if that is the correct thing to do.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

6ca0a6f8

pNFS/files: Handle RDMA connection errors correctly · 431794e6

Trond Myklebust authored May 18, 2022

The RPC/RDMA driver will return -EPROTO and -ENODEV as connection errors
under certain circumstances. Make sure that we handle them correctly and
avoid cycling forever in a LAYOUTGET/LAYOUTRETURN loop.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

431794e6