Commits · b95239ca4954a0d48b19c09ce7e8f31b453b4216 · Kirill Smelkov / linux

26 Sep, 2022 40 commits

nfsd: make nfsd4_run_cb a bool return function · b95239ca

Jeff Layton authored Sep 26, 2022

queue_work can return false and not queue anything, if the work is
already queued. If that happens in the case of a CB_RECALL, we'll have
taken an extra reference to the stid that will never be put. Ensure we
throw a warning in that case.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

b95239ca

nfsd: fix comments about spinlock handling with delegations · 25fbe1fc
Jeff Layton authored Sep 26, 2022
```
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
```
25fbe1fc

nfsd: only fill out return pointer on success in nfsd4_lookup_stateid · 4d01416a

Jeff Layton authored Sep 26, 2022

In the case of a revoked delegation, we still fill out the pointer even
when returning an error, which is bad form. Only overwrite the pointer
on success.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

4d01416a

NFSD: fix use-after-free on source server when doing inter-server copy · 019805fe

Dai Ngo authored Sep 26, 2022

Use-after-free occurred when the laundromat tried to free expired
cpntf_state entry on the s2s_cp_stateids list after inter-server
copy completed. The sc_cp_list that the expired copy state was
inserted on was already freed.

When COPY completes, the Linux client normally sends LOCKU(lock_state x),
FREE_STATEID(lock_state x) and CLOSE(open_state y) to the source server.
The nfs4_put_stid call from nfsd4_free_stateid cleans up the copy state
from the s2s_cp_stateids list before freeing the lock state's stid.

However, sometimes the CLOSE was sent before the FREE_STATEID request.
When this happens, the nfsd4_close_open_stateid call from nfsd4_close
frees all lock states on its st_locks list without cleaning up the copy
state on the sc_cp_list list. When the time the FREE_STATEID arrives the
server returns BAD_STATEID since the lock state was freed. This causes
the use-after-free error to occur when the laundromat tries to free
the expired cpntf_state.

This patch adds a call to nfs4_free_cpntf_statelist in
nfsd4_close_open_stateid to clean up the copy state before calling
free_ol_stateid_reaplist to free the lock state's stid on the reaplist.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

019805fe

NFSD: Cap rsize_bop result based on send buffer size · 76ce4dce

Chuck Lever authored Sep 01, 2022

Since before the git era, NFSD has conserved the number of pages
held by each nfsd thread by combining the RPC receive and send
buffers into a single array of pages. This works because there are
no cases where an operation needs a large RPC Call message and a
large RPC Reply at the same time.

Once an RPC Call has been received, svc_process() updates
svc_rqst::rq_res to describe the part of rq_pages that can be
used for constructing the Reply. This means that the send buffer
(rq_res) shrinks when the received RPC record containing the RPC
Call is large.

Add an NFSv4 helper that computes the size of the send buffer. It
replaces svc_max_payload() in spots where svc_max_payload() returns
a value that might be larger than the remaining send buffer space.
Callers who need to know the transport's actual maximum payload size
will continue to use svc_max_payload().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

76ce4dce

NFSD: Rename the fields in copy_stateid_t · 781fde1a

Chuck Lever authored Sep 22, 2022

Code maintenance: The name of the copy_stateid_t::sc_count field
collides with the sc_count field in struct nfs4_stid, making the
latter difficult to grep for when auditing stateid reference
counting.

No behavior change expected.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

781fde1a

nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_file_cache_stats_fops · 1342f9dd

ChenXiaoSong authored Sep 23, 2022

Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code.
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

1342f9dd

nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_reply_cache_stats_fops · 64776611

ChenXiaoSong authored Sep 23, 2022

Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code.

nfsd_net is converted from seq_file->file instead of seq_file->private in
nfsd_reply_cache_stats_show().
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
[ cel: reduce line length ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

64776611

nfsd: use DEFINE_SHOW_ATTRIBUTE to define client_info_fops · 1d7f6b30

ChenXiaoSong authored Sep 23, 2022

Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code.

inode is converted from seq_file->file instead of seq_file->private in
client_info_show().
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

1d7f6b30

nfsd: use DEFINE_SHOW_ATTRIBUTE to define export_features_fops and supported_enctypes_fops · 9beeaab8

ChenXiaoSong authored Sep 23, 2022

Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code.
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
[ cel: reduce line length ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

9beeaab8

nfsd: use DEFINE_PROC_SHOW_ATTRIBUTE to define nfsd_proc_ops · 0cfb0c42

ChenXiaoSong authored Sep 23, 2022

Use DEFINE_PROC_SHOW_ATTRIBUTE helper macro to simplify the code.
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

0cfb0c42

NFSD: Pack struct nfsd4_compoundres · 9f553e61

Chuck Lever authored Sep 12, 2022

Remove a couple of 4-byte holes on platforms with 64-bit pointers.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

9f553e61

NFSD: Remove unused nfsd4_compoundargs::cachetype field · 77e378cf

Chuck Lever authored Sep 12, 2022

This field was added by commit 1091006c ("nfsd: turn on reply
cache for NFSv4") but was never put to use.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

77e378cf

NFSD: Remove "inline" directives on op_rsize_bop helpers · 6604148c

Chuck Lever authored Sep 12, 2022

These helpers are always invoked indirectly, so the compiler can't
inline these anyway. While we're updating the synopses of these
helpers, defensively convert their parameters to const pointers.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

6604148c

NFSD: Clean up nfs4svc_encode_compoundres() · 9993a663

Chuck Lever authored Sep 12, 2022

In today's Linux NFS server implementation, the NFS dispatcher
initializes each XDR result stream, and the NFSv4 .pc_func and
.pc_encode methods all use xdr_stream-based encoding. This keeps
rq_res.len automatically updated. There is no longer a need for
the WARN_ON_ONCE() check in nfs4svc_encode_compoundres().
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

9993a663

SUNRPC: Fix typo in xdr_buf_subsegment's kdoc comment · b8ab2a6f

Chuck Lever authored Sep 12, 2022

Fix a typo.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

b8ab2a6f

NFSD: Clean up WRITE arg decoders · d4da5baa

Chuck Lever authored Sep 12, 2022

xdr_stream_subsegment() already returns a boolean value.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

d4da5baa

NFSD: Use xdr_inline_decode() to decode NFSv3 symlinks · c3d2a04f

Chuck Lever authored Sep 12, 2022

Replace the check for buffer over/underflow with a helper that is
commonly used for this purpose. The helper also sets xdr->nwords
correctly after successfully linearizing the symlink argument into
the stream's scratch buffer.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

c3d2a04f

NFSD: Refactor common code out of dirlist helpers · 98124f5b

Chuck Lever authored Sep 12, 2022

The dust has settled a bit and it's become obvious what code is
totally common between nfsd_init_dirlist_pages() and
nfsd3_init_dirlist_pages(). Move that common code to SUNRPC.

The new helper brackets the existing xdr_init_decode_pages() API.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

98124f5b

SUNRPC: Clarify comment that documents svc_max_payload() · f18d8afb

Chuck Lever authored Sep 12, 2022

Note the function returns a per-transport value, not a per-request
value (eg, one that is related to the size of the available send or
receive buffer space).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

f18d8afb

NFSD: Reduce amount of struct nfsd4_compoundargs that needs clearing · 3fdc5464

Chuck Lever authored Sep 12, 2022

Have SunRPC clear everything except for the iops array. Then have
each NFSv4 XDR decoder clear it's own argument before decoding.

Now individual operations may have a large argument struct while not
penalizing the vast majority of operations with a small struct.

And, clearing the argument structure occurs as the argument fields
are initialized, enabling the CPU to do write combining on that
memory. In some cases, clearing is not even necessary because all
of the fields in the argument structure are initialized by the
decoder.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

3fdc5464

SUNRPC: Parametrize how much of argsize should be zeroed · 103cc1fa

Chuck Lever authored Sep 12, 2022

Currently, SUNRPC clears the whole of .pc_argsize before processing
each incoming RPC transaction. Add an extra parameter to struct
svc_procedure to enable upper layers to reduce the amount of each
operation's argument structure that is zeroed by SUNRPC.

The size of struct nfsd4_compoundargs, in particular, is a lot to
clear on each incoming RPC Call. A subsequent patch will cut this
down to something closer to what NFSv2 and NFSv3 uses.

This patch should cause no behavior changes.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

103cc1fa

SUNRPC: Optimize svc_process() · 81593c4d

Chuck Lever authored Sep 12, 2022

Move exception handling code out of the hot path, and avoid the need
for a bswap of a non-constant.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

81593c4d

NFSD: add shrinker to reap courtesy clients on low memory condition · 7746b32f

Dai Ngo authored Sep 14, 2022

Add courtesy_client_reaper to react to low memory condition triggered
by the system memory shrinker.

The delayed_work for the courtesy_client_reaper is scheduled on
the shrinker's count callback using the laundry_wq.

The shrinker's scan callback is not used for expiring the courtesy
clients due to potential deadlocks.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

7746b32f

NFSD: keep track of the number of courtesy clients in the system · 3a4ea23d

Dai Ngo authored Sep 14, 2022

Add counter nfs4_courtesy_client_count to nfsd_net to keep track
of the number of courtesy clients in the system.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

3a4ea23d

NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data · 06981d56

Anna Schumaker authored Sep 13, 2022

This was discussed with Chuck as part of this patch set. Returning
nfserr_resource was decided to not be the best error message here, and
he suggested changing to nfserr_serverfault instead.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Link: https://lore.kernel.org/linux-nfs/20220907195259.926736-1-anna@kernel.org/T/#tSigned-off-by: Chuck Lever <chuck.lever@oracle.com>

06981d56

NFSD: Make nfsd4_remove() wait before returning NFS4ERR_DELAY · 5f5f8b6d

Chuck Lever authored Sep 08, 2022

nfsd_unlink() can kick off a CB_RECALL (via
vfs_unlink() -> leases_conflict()) if a delegation is present.
Before returning NFS4ERR_DELAY, give the client holding that
delegation a chance to return it and then retry the nfsd_unlink()
again, once.

Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354Tested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>

5f5f8b6d

NFSD: Make nfsd4_rename() wait before returning NFS4ERR_DELAY · 68c522af

Chuck Lever authored Sep 08, 2022

nfsd_rename() can kick off a CB_RECALL (via
vfs_rename() -> leases_conflict()) if a delegation is present.
Before returning NFS4ERR_DELAY, give the client holding that
delegation a chance to return it and then retry the nfsd_rename()
again, once.

This version of the patch handles renaming an existing file,
but does not deal with renaming onto an existing file. That
case will still always trigger an NFS4ERR_DELAY.

Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354Tested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>

68c522af

NFSD: Make nfsd4_setattr() wait before returning NFS4ERR_DELAY · 34b91dda

Chuck Lever authored Sep 08, 2022

nfsd_setattr() can kick off a CB_RECALL (via
notify_change() -> break_lease()) if a delegation is present. Before
returning NFS4ERR_DELAY, give the client holding that delegation a
chance to return it and then retry the nfsd_setattr() again, once.

Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354Tested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>

34b91dda

NFSD: Refactor nfsd_setattr() · c0aa1913

Chuck Lever authored Sep 08, 2022

Move code that will be retried (in a subsequent patch) into a helper
function.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>

c0aa1913

NFSD: Add a mechanism to wait for a DELEGRETURN · c035362e

Chuck Lever authored Sep 08, 2022

Subsequent patches will use this mechanism to wake up an operation
that is waiting for a client to return a delegation.

The new tracepoint records whether the wait timed out or was
properly awoken by the expected DELEGRETURN:

            nfsd-1155  [002] 83799.493199: nfsd_delegret_wakeup: xid=0x14b7d6ef fh_hash=0xf6826792 (timed out)
Suggested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>

c035362e

NFSD: Add tracepoints to report NFSv4 callback completions · 1035d654

Chuck Lever authored Sep 08, 2022

Wireshark has always been lousy about dissecting NFSv4 callbacks,
especially NFSv4.0 backchannel requests. Add tracepoints so we
can surgically capture these events in the trace log.

Tracepoints are time-stamped and ordered so that we can now observe
the timing relationship between a CB_RECALL Reply and the client's
DELEGRETURN Call. Example:

nfsd-1153 [002] 211.986391: nfsd_cb_recall: addr=192.168.1.67:45767 client 62ea82e4:fee7492a stateid 00000003:00000001

nfsd-1153 [002] 212.095634: nfsd_compound: xid=0x0000002c opcnt=2
nfsd-1153 [002] 212.095647: nfsd_compound_status: op=1/2 OP_PUTFH status=0
nfsd-1153 [002] 212.095658: nfsd_file_put: hash=0xf72 inode=0xffff9291148c7410 ref=3 flags=HASHED|REFERENCED may=READ file=0xffff929103b3ea00
nfsd-1153 [002] 212.095661: nfsd_compound_status: op=2/2 OP_DELEGRETURN status=0
kworker/u25:8-148 [002] 212.096713: nfsd_cb_recall_done: client 62ea82e4:fee7492a stateid 00000003:00000001 status=0
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>

1035d654

NFSD: Trace NFSv4 COMPOUND tags · de29cf7e

Chuck Lever authored Sep 08, 2022

The Linux NFSv4 client implementation does not use COMPOUND tags,
but the Solaris and MacOS implementations do, and so does pynfs.
Record these eye-catchers in the server's trace buffer to annotate
client requests while troubleshooting.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>

de29cf7e

NFSD: Replace dprintk() call site in fh_verify() · 948755ef

Chuck Lever authored Sep 08, 2022

Record permission errors in the trace log. Note that the new trace
event is conditional, so it will only record non-zero return values
from nfsd_permission().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>

948755ef

nfsd: remove nfsd4_prepare_cb_recall() declaration · 18224dc5

Gaosheng Cui authored Sep 09, 2022

nfsd4_prepare_cb_recall() has been removed since
commit 0162ac2b ("nfsd: introduce nfsd4_callback_ops"),
so remove it.
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

18224dc5

nfsd: clean up mounted_on_fileid handling · 6106d911

Jeff Layton authored Sep 08, 2022

We only need the inode number for this, not a full rack of attributes.
Rename this function make it take a pointer to a u64 instead of
struct kstat, and change it to just request STATX_INO.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
[ cel: renamed get_mounted_on_ino() ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

6106d911

NFSD: Fix handling of oversized NFSv4 COMPOUND requests · 7518a3dc

Chuck Lever authored Sep 05, 2022

If an NFS server returns NFS4ERR_RESOURCE on the first operation in
an NFSv4 COMPOUND, there's no way for a client to know where the
problem is and then simplify the compound to make forward progress.

So instead, make NFSD process as many operations in an oversized
COMPOUND as it can and then return NFS4ERR_RESOURCE on the first
operation it did not process.

pynfs NFSv4.0 COMP6 exercises this case, but checks only for the
COMPOUND status code, not whether the server has processed any
of the operations.

pynfs NFSv4.1 SEQ6 and SEQ7 exercise the NFSv4.1 case, which detects
too many operations per COMPOUND by checking against the limits
negotiated when the session was created.
Suggested-by: Bruce Fields <bfields@fieldses.org>
Fixes: 0078117c ("nfsd: return RESOURCE not GARBAGE_ARGS on too many ops")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

7518a3dc

NFSD: drop fname and flen args from nfsd_create_locked() · 9558f930

NeilBrown authored Sep 06, 2022

nfsd_create_locked() does not use the "fname" and "flen" arguments, so
drop them from declaration and all callers.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

9558f930

NFSD: Protect against send buffer overflow in NFSv3 READ · fa6be9cc

Chuck Lever authored Sep 01, 2022

Since before the git era, NFSD has conserved the number of pages
held by each nfsd thread by combining the RPC receive and send
buffers into a single array of pages. This works because there are
no cases where an operation needs a large RPC Call message and a
large RPC Reply at the same time.

Once an RPC Call has been received, svc_process() updates
svc_rqst::rq_res to describe the part of rq_pages that can be
used for constructing the Reply. This means that the send buffer
(rq_res) shrinks when the received RPC record containing the RPC
Call is large.

A client can force this shrinkage on TCP by sending a correctly-
formed RPC Call header contained in an RPC record that is
excessively large. The full maximum payload size cannot be
constructed in that case.

Cc: <stable@vger.kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

fa6be9cc

NFSD: Protect against send buffer overflow in NFSv2 READ · 401bc1f9

Chuck Lever authored Sep 01, 2022

Since before the git era, NFSD has conserved the number of pages
held by each nfsd thread by combining the RPC receive and send
buffers into a single array of pages. This works because there are
no cases where an operation needs a large RPC Call message and a
large RPC Reply at the same time.

Once an RPC Call has been received, svc_process() updates
svc_rqst::rq_res to describe the part of rq_pages that can be
used for constructing the Reply. This means that the send buffer
(rq_res) shrinks when the received RPC record containing the RPC
Call is large.

A client can force this shrinkage on TCP by sending a correctly-
formed RPC Call header contained in an RPC record that is
excessively large. The full maximum payload size cannot be
constructed in that case.

Cc: <stable@vger.kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

401bc1f9