Commits · 64776611a06322b99386f8dfe3b3ba1aa0347a38 · Kirill Smelkov / linux

26 Sep, 2022 40 commits

nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_reply_cache_stats_fops · 64776611

ChenXiaoSong authored Sep 23, 2022

Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code.

nfsd_net is converted from seq_file->file instead of seq_file->private in
nfsd_reply_cache_stats_show().
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
[ cel: reduce line length ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

64776611

nfsd: use DEFINE_SHOW_ATTRIBUTE to define client_info_fops · 1d7f6b30

ChenXiaoSong authored Sep 23, 2022

Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code.

inode is converted from seq_file->file instead of seq_file->private in
client_info_show().
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

1d7f6b30

nfsd: use DEFINE_SHOW_ATTRIBUTE to define export_features_fops and supported_enctypes_fops · 9beeaab8

ChenXiaoSong authored Sep 23, 2022

Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code.
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
[ cel: reduce line length ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

9beeaab8

nfsd: use DEFINE_PROC_SHOW_ATTRIBUTE to define nfsd_proc_ops · 0cfb0c42

ChenXiaoSong authored Sep 23, 2022

Use DEFINE_PROC_SHOW_ATTRIBUTE helper macro to simplify the code.
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

0cfb0c42

NFSD: Pack struct nfsd4_compoundres · 9f553e61

Chuck Lever authored Sep 12, 2022

Remove a couple of 4-byte holes on platforms with 64-bit pointers.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

9f553e61

NFSD: Remove unused nfsd4_compoundargs::cachetype field · 77e378cf

Chuck Lever authored Sep 12, 2022

This field was added by commit 1091006c ("nfsd: turn on reply
cache for NFSv4") but was never put to use.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

77e378cf

NFSD: Remove "inline" directives on op_rsize_bop helpers · 6604148c

Chuck Lever authored Sep 12, 2022

These helpers are always invoked indirectly, so the compiler can't
inline these anyway. While we're updating the synopses of these
helpers, defensively convert their parameters to const pointers.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

6604148c

NFSD: Clean up nfs4svc_encode_compoundres() · 9993a663

Chuck Lever authored Sep 12, 2022

In today's Linux NFS server implementation, the NFS dispatcher
initializes each XDR result stream, and the NFSv4 .pc_func and
.pc_encode methods all use xdr_stream-based encoding. This keeps
rq_res.len automatically updated. There is no longer a need for
the WARN_ON_ONCE() check in nfs4svc_encode_compoundres().
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

9993a663

SUNRPC: Fix typo in xdr_buf_subsegment's kdoc comment · b8ab2a6f

Chuck Lever authored Sep 12, 2022

Fix a typo.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

b8ab2a6f

NFSD: Clean up WRITE arg decoders · d4da5baa

Chuck Lever authored Sep 12, 2022

xdr_stream_subsegment() already returns a boolean value.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

d4da5baa

NFSD: Use xdr_inline_decode() to decode NFSv3 symlinks · c3d2a04f

Chuck Lever authored Sep 12, 2022

Replace the check for buffer over/underflow with a helper that is
commonly used for this purpose. The helper also sets xdr->nwords
correctly after successfully linearizing the symlink argument into
the stream's scratch buffer.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

c3d2a04f

NFSD: Refactor common code out of dirlist helpers · 98124f5b

Chuck Lever authored Sep 12, 2022

The dust has settled a bit and it's become obvious what code is
totally common between nfsd_init_dirlist_pages() and
nfsd3_init_dirlist_pages(). Move that common code to SUNRPC.

The new helper brackets the existing xdr_init_decode_pages() API.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

98124f5b

SUNRPC: Clarify comment that documents svc_max_payload() · f18d8afb

Chuck Lever authored Sep 12, 2022

Note the function returns a per-transport value, not a per-request
value (eg, one that is related to the size of the available send or
receive buffer space).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

f18d8afb

NFSD: Reduce amount of struct nfsd4_compoundargs that needs clearing · 3fdc5464

Chuck Lever authored Sep 12, 2022

Have SunRPC clear everything except for the iops array. Then have
each NFSv4 XDR decoder clear it's own argument before decoding.

Now individual operations may have a large argument struct while not
penalizing the vast majority of operations with a small struct.

And, clearing the argument structure occurs as the argument fields
are initialized, enabling the CPU to do write combining on that
memory. In some cases, clearing is not even necessary because all
of the fields in the argument structure are initialized by the
decoder.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

3fdc5464

SUNRPC: Parametrize how much of argsize should be zeroed · 103cc1fa

Chuck Lever authored Sep 12, 2022

Currently, SUNRPC clears the whole of .pc_argsize before processing
each incoming RPC transaction. Add an extra parameter to struct
svc_procedure to enable upper layers to reduce the amount of each
operation's argument structure that is zeroed by SUNRPC.

The size of struct nfsd4_compoundargs, in particular, is a lot to
clear on each incoming RPC Call. A subsequent patch will cut this
down to something closer to what NFSv2 and NFSv3 uses.

This patch should cause no behavior changes.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

103cc1fa

SUNRPC: Optimize svc_process() · 81593c4d

Chuck Lever authored Sep 12, 2022

Move exception handling code out of the hot path, and avoid the need
for a bswap of a non-constant.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

81593c4d

NFSD: add shrinker to reap courtesy clients on low memory condition · 7746b32f

Dai Ngo authored Sep 14, 2022

Add courtesy_client_reaper to react to low memory condition triggered
by the system memory shrinker.

The delayed_work for the courtesy_client_reaper is scheduled on
the shrinker's count callback using the laundry_wq.

The shrinker's scan callback is not used for expiring the courtesy
clients due to potential deadlocks.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

7746b32f

NFSD: keep track of the number of courtesy clients in the system · 3a4ea23d

Dai Ngo authored Sep 14, 2022

Add counter nfs4_courtesy_client_count to nfsd_net to keep track
of the number of courtesy clients in the system.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

3a4ea23d

NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data · 06981d56

Anna Schumaker authored Sep 13, 2022

This was discussed with Chuck as part of this patch set. Returning
nfserr_resource was decided to not be the best error message here, and
he suggested changing to nfserr_serverfault instead.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Link: https://lore.kernel.org/linux-nfs/20220907195259.926736-1-anna@kernel.org/T/#tSigned-off-by: Chuck Lever <chuck.lever@oracle.com>

06981d56

NFSD: Make nfsd4_remove() wait before returning NFS4ERR_DELAY · 5f5f8b6d

Chuck Lever authored Sep 08, 2022

nfsd_unlink() can kick off a CB_RECALL (via
vfs_unlink() -> leases_conflict()) if a delegation is present.
Before returning NFS4ERR_DELAY, give the client holding that
delegation a chance to return it and then retry the nfsd_unlink()
again, once.

Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354Tested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>

5f5f8b6d

NFSD: Make nfsd4_rename() wait before returning NFS4ERR_DELAY · 68c522af

Chuck Lever authored Sep 08, 2022

nfsd_rename() can kick off a CB_RECALL (via
vfs_rename() -> leases_conflict()) if a delegation is present.
Before returning NFS4ERR_DELAY, give the client holding that
delegation a chance to return it and then retry the nfsd_rename()
again, once.

This version of the patch handles renaming an existing file,
but does not deal with renaming onto an existing file. That
case will still always trigger an NFS4ERR_DELAY.

Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354Tested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>

68c522af

NFSD: Make nfsd4_setattr() wait before returning NFS4ERR_DELAY · 34b91dda

Chuck Lever authored Sep 08, 2022

nfsd_setattr() can kick off a CB_RECALL (via
notify_change() -> break_lease()) if a delegation is present. Before
returning NFS4ERR_DELAY, give the client holding that delegation a
chance to return it and then retry the nfsd_setattr() again, once.

Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354Tested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>

34b91dda

NFSD: Refactor nfsd_setattr() · c0aa1913

Chuck Lever authored Sep 08, 2022

Move code that will be retried (in a subsequent patch) into a helper
function.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>

c0aa1913

NFSD: Add a mechanism to wait for a DELEGRETURN · c035362e

Chuck Lever authored Sep 08, 2022

Subsequent patches will use this mechanism to wake up an operation
that is waiting for a client to return a delegation.

The new tracepoint records whether the wait timed out or was
properly awoken by the expected DELEGRETURN:

            nfsd-1155  [002] 83799.493199: nfsd_delegret_wakeup: xid=0x14b7d6ef fh_hash=0xf6826792 (timed out)
Suggested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>

c035362e

NFSD: Add tracepoints to report NFSv4 callback completions · 1035d654

Chuck Lever authored Sep 08, 2022

Wireshark has always been lousy about dissecting NFSv4 callbacks,
especially NFSv4.0 backchannel requests. Add tracepoints so we
can surgically capture these events in the trace log.

Tracepoints are time-stamped and ordered so that we can now observe
the timing relationship between a CB_RECALL Reply and the client's
DELEGRETURN Call. Example:

nfsd-1153 [002] 211.986391: nfsd_cb_recall: addr=192.168.1.67:45767 client 62ea82e4:fee7492a stateid 00000003:00000001

nfsd-1153 [002] 212.095634: nfsd_compound: xid=0x0000002c opcnt=2
nfsd-1153 [002] 212.095647: nfsd_compound_status: op=1/2 OP_PUTFH status=0
nfsd-1153 [002] 212.095658: nfsd_file_put: hash=0xf72 inode=0xffff9291148c7410 ref=3 flags=HASHED|REFERENCED may=READ file=0xffff929103b3ea00
nfsd-1153 [002] 212.095661: nfsd_compound_status: op=2/2 OP_DELEGRETURN status=0
kworker/u25:8-148 [002] 212.096713: nfsd_cb_recall_done: client 62ea82e4:fee7492a stateid 00000003:00000001 status=0
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>

1035d654

NFSD: Trace NFSv4 COMPOUND tags · de29cf7e

Chuck Lever authored Sep 08, 2022

The Linux NFSv4 client implementation does not use COMPOUND tags,
but the Solaris and MacOS implementations do, and so does pynfs.
Record these eye-catchers in the server's trace buffer to annotate
client requests while troubleshooting.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>

de29cf7e

NFSD: Replace dprintk() call site in fh_verify() · 948755ef

Chuck Lever authored Sep 08, 2022

Record permission errors in the trace log. Note that the new trace
event is conditional, so it will only record non-zero return values
from nfsd_permission().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>

948755ef

nfsd: remove nfsd4_prepare_cb_recall() declaration · 18224dc5

Gaosheng Cui authored Sep 09, 2022

nfsd4_prepare_cb_recall() has been removed since
commit 0162ac2b ("nfsd: introduce nfsd4_callback_ops"),
so remove it.
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

18224dc5

nfsd: clean up mounted_on_fileid handling · 6106d911

Jeff Layton authored Sep 08, 2022

We only need the inode number for this, not a full rack of attributes.
Rename this function make it take a pointer to a u64 instead of
struct kstat, and change it to just request STATX_INO.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
[ cel: renamed get_mounted_on_ino() ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

6106d911

NFSD: Fix handling of oversized NFSv4 COMPOUND requests · 7518a3dc

Chuck Lever authored Sep 05, 2022

If an NFS server returns NFS4ERR_RESOURCE on the first operation in
an NFSv4 COMPOUND, there's no way for a client to know where the
problem is and then simplify the compound to make forward progress.

So instead, make NFSD process as many operations in an oversized
COMPOUND as it can and then return NFS4ERR_RESOURCE on the first
operation it did not process.

pynfs NFSv4.0 COMP6 exercises this case, but checks only for the
COMPOUND status code, not whether the server has processed any
of the operations.

pynfs NFSv4.1 SEQ6 and SEQ7 exercise the NFSv4.1 case, which detects
too many operations per COMPOUND by checking against the limits
negotiated when the session was created.
Suggested-by: Bruce Fields <bfields@fieldses.org>
Fixes: 0078117c ("nfsd: return RESOURCE not GARBAGE_ARGS on too many ops")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

7518a3dc

NFSD: drop fname and flen args from nfsd_create_locked() · 9558f930

NeilBrown authored Sep 06, 2022

nfsd_create_locked() does not use the "fname" and "flen" arguments, so
drop them from declaration and all callers.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

9558f930

NFSD: Protect against send buffer overflow in NFSv3 READ · fa6be9cc

Chuck Lever authored Sep 01, 2022

Since before the git era, NFSD has conserved the number of pages
held by each nfsd thread by combining the RPC receive and send
buffers into a single array of pages. This works because there are
no cases where an operation needs a large RPC Call message and a
large RPC Reply at the same time.

Once an RPC Call has been received, svc_process() updates
svc_rqst::rq_res to describe the part of rq_pages that can be
used for constructing the Reply. This means that the send buffer
(rq_res) shrinks when the received RPC record containing the RPC
Call is large.

A client can force this shrinkage on TCP by sending a correctly-
formed RPC Call header contained in an RPC record that is
excessively large. The full maximum payload size cannot be
constructed in that case.

Cc: <stable@vger.kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

fa6be9cc

NFSD: Protect against send buffer overflow in NFSv2 READ · 401bc1f9

Chuck Lever authored Sep 01, 2022

Since before the git era, NFSD has conserved the number of pages
held by each nfsd thread by combining the RPC receive and send
buffers into a single array of pages. This works because there are
no cases where an operation needs a large RPC Call message and a
large RPC Reply at the same time.

Once an RPC Call has been received, svc_process() updates
svc_rqst::rq_res to describe the part of rq_pages that can be
used for constructing the Reply. This means that the send buffer
(rq_res) shrinks when the received RPC record containing the RPC
Call is large.

A client can force this shrinkage on TCP by sending a correctly-
formed RPC Call header contained in an RPC record that is
excessively large. The full maximum payload size cannot be
constructed in that case.

Cc: <stable@vger.kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

401bc1f9

NFSD: Protect against send buffer overflow in NFSv3 READDIR · 640f87c1

Chuck Lever authored Sep 01, 2022

Since before the git era, NFSD has conserved the number of pages
held by each nfsd thread by combining the RPC receive and send
buffers into a single array of pages. This works because there are
no cases where an operation needs a large RPC Call message and a
large RPC Reply message at the same time.

Once an RPC Call has been received, svc_process() updates
svc_rqst::rq_res to describe the part of rq_pages that can be
used for constructing the Reply. This means that the send buffer
(rq_res) shrinks when the received RPC record containing the RPC
Call is large.

A client can force this shrinkage on TCP by sending a correctly-
formed RPC Call header contained in an RPC record that is
excessively large. The full maximum payload size cannot be
constructed in that case.

Thanks to Aleksi Illikainen and Kari Hulkko for uncovering this
issue.
Reported-by: Ben Ronallo <Benjamin.Ronallo@synopsys.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

640f87c1

NFSD: Protect against send buffer overflow in NFSv2 READDIR · 00b44926

Chuck Lever authored Sep 01, 2022

Restore the previous limit on the @count argument to prevent a
buffer overflow attack.

Fixes: 53b1119a ("NFSD: Fix READDIR buffer overflow")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

00b44926

SUNRPC: Fix svcxdr_init_encode's buflen calculation · 1242a87d

Chuck Lever authored Sep 01, 2022

Commit 2825a7f9 ("nfsd4: allow encoding across page boundaries")
added an explicit computation of the remaining length in the rq_res
XDR buffer.

The computation appears to suffer from an "off-by-one" bug. Because
buflen is too large by one page, XDR encoding can run off the end of
the send buffer by eventually trying to use the struct page address
in rq_page_end, which always contains NULL.

Fixes: bddfdbcd ("NFSD: Extract the svcxdr_init_encode() helper")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

1242a87d

SUNRPC: Fix svcxdr_init_decode's end-of-buffer calculation · 90bfc37b

Chuck Lever authored Sep 01, 2022

Ensure that stream-based argument decoding can't go past the actual
end of the receive buffer. xdr_init_decode's calculation of the
value of xdr->end over-estimates the end of the buffer because the
Linux kernel RPC server code does not remove the size of the RPC
header from rqstp->rq_arg before calling the upper layer's
dispatcher.

The server-side still uses the svc_getnl() macros to decode the
RPC call header. These macros reduce the length of the head iov
but do not update the total length of the message in the buffer
(buf->len).

A proper fix for this would be to replace the use of svc_getnl() and
friends in the RPC header decoder, but that would be a large and
invasive change that would be difficult to backport.

Fixes: 5191955d ("SUNRPC: Prepare for xdr_stream-style decoding on the server-side")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

90bfc37b

NFSD: Increase NFSD_MAX_OPS_PER_COMPOUND · 80e591ce

Chuck Lever authored Sep 02, 2022

When attempting an NFSv4 mount, a Solaris NFSv4 client builds a
single large COMPOUND that chains a series of LOOKUPs to get to the
pseudo filesystem root directory that is to be mounted. The Linux
NFS server's current maximum of 16 operations per NFSv4 COMPOUND is
not large enough to ensure that this works for paths that are more
than a few components deep.

Since NFSD_MAX_OPS_PER_COMPOUND is mostly a sanity check, and most
NFSv4 COMPOUNDS are between 3 and 6 operations (thus they do not
trigger any re-allocation of the operation array on the server),
increasing this maximum should result in little to no impact.

The ops array can get large now, so allocate it via vmalloc() to
help ensure memory fragmentation won't cause an allocation failure.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216383Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

80e591ce

nfsd: Propagate some error code returned by memdup_user() · 30a30fcc

Christophe JAILLET authored Sep 01, 2022

Propagate the error code returned by memdup_user() instead of a hard coded
-EFAULT.
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

30a30fcc

nfsd: Avoid some useless tests · d44899b8

Christophe JAILLET authored Sep 01, 2022

memdup_user() can't return NULL, so there is no point for checking for it.

Simplify some tests accordingly.
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

d44899b8