Commits · aba2072f452346d56a462718bcde93d697383148 · Kirill Smelkov / linux

19 Apr, 2021 4 commits

nfsd: grant read delegations to clients holding writes · aba2072f

J. Bruce Fields authored Apr 16, 2021

It's OK to grant a read delegation to a client that holds a write,
as long as it's the only client holding the write.

We originally tried to do this in commit 94415b06 ("nfsd4: a
client's own opens needn't prevent delegations"), which had to be
reverted in commit 6ee65a77 ("Revert "nfsd4: a client's own
opens needn't prevent delegations"").
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

aba2072f
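
The rule in the message above can be sketched in user-space C. This is only an illustration under stated assumptions, not the nfsd implementation: `sketch_write_open` and `sketch_may_grant_read_deleg` are hypothetical names, and real nfsd tracks opens through its own state structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one write open: which client holds it. */
struct sketch_write_open {
	int client_id;
};

/*
 * A read delegation may go to `client_id` only if every existing write
 * open on the file belongs to that same client.
 */
static bool sketch_may_grant_read_deleg(const struct sketch_write_open *writers,
					size_t nwriters, int client_id)
{
	assert(writers != NULL || nwriters == 0);

	for (size_t i = 0; i < nwriters; i++) {
		if (writers[i].client_id != client_id)
			return false;	/* another client holds a write */
	}
	return true;	/* no writers, or only the requesting client's own */
}
```

With writers `{7, 7}` the check passes for client 7; with writers `{7, 9}` it fails, because client 9 also holds a write.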

nfsd: reshuffle some code · ebd9d2c2

J. Bruce Fields authored Apr 16, 2021

No change in behavior, I'm just moving some code around to avoid forward
references in a following patch.

(To do someday: figure out how to split up nfs4state.c.  It's big and
disorganized.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

ebd9d2c2

nfsd: track filehandle aliasing in nfs4_files · a0ce4837

J. Bruce Fields authored Apr 16, 2021

It's unusual but possible for multiple filehandles to point to the same
file.  In that case, we may end up with multiple nfs4_files referencing
the same inode.

For delegation purposes it will turn out to be useful to flag those
cases.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

a0ce4837

nfsd: hash nfs4_files by inode number · f9b60e22

J. Bruce Fields authored Apr 16, 2021

The nfs4_file structure is per-filehandle, not per-inode, because the
spec requires open and other state to be per filehandle.

But it will turn out to be convenient for nfs4_files associated with the
same inode to be hashed to the same bucket, so let's hash on the inode
instead of the filehandle.

Filehandle aliasing is rare, so that shouldn't have much performance
impact.

(If you have a ton of exported filesystems, though, and all of them have
a root with inode number 2, could that get you an overlong hash chain?
Perhaps this (and the v4 open file cache) should be hashed on the inode
pointer instead.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

f9b60e22
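
A minimal user-space sketch of why hashing on the inode number puts aliased filehandles in the same bucket. Everything here is an assumption for illustration: the `sketch_` names, the table size, and the multiplicative hash are not taken from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table size for this sketch. */
#define SKETCH_HASH_BITS 8
#define SKETCH_HASH_SIZE (1u << SKETCH_HASH_BITS)

/* Illustrative stand-in for per-filehandle state. */
struct sketch_file {
	uint64_t ino;		/* inode number the filehandle resolves to */
	unsigned char fh[16];	/* opaque filehandle bytes */
};

/* Multiplicative hash on the inode number only; filehandle bytes are ignored. */
static unsigned int sketch_hash_ino(uint64_t ino)
{
	/* Odd 64-bit constant commonly used for multiplicative hashing. */
	uint64_t h = ino * 0x9E3779B97F4A7C15ull;

	return (unsigned int)(h >> (64 - SKETCH_HASH_BITS));
}

static unsigned int sketch_bucket(const struct sketch_file *f)
{
	unsigned int b = sketch_hash_ino(f->ino);

	assert(b < SKETCH_HASH_SIZE);
	return b;
}
```

Two `sketch_file` entries with different `fh` bytes but the same `ino` always share a bucket. The same property produces the concern raised in the message: many exports whose roots all have inode number 2 would also collide.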

16 Apr, 2021 1 commit

nfsd: ensure new clients break delegations · 217fd6f6

J. Bruce Fields authored Apr 16, 2021

If nfsd already has an open file that it plans to use for IO from
another, it may not need to do another vfs open, but it still may need
to break any delegations in case the existing opens are for another
client.

Symptoms are that we may incorrectly fail to break a delegation on a
write open from a different client, when the delegation-holding client
already has a write open.

Fixes: 28df3d15 ("nfsd: clients don't need to break their own delegations")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

217fd6f6

15 Apr, 2021 2 commits

nfsd: removed unused argument in nfsd_startup_generic() · 70c53075

Vasily Averin authored Apr 15, 2021

Since commit 501cb184 ("nfsd: rip out the raparms cache")
nrservs is not used in nfsd_startup_generic()
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

70c53075

nfsd: remove unused function · 363f8dd5

Jiapeng Chong authored Apr 15, 2021

Fix the following clang warning:

fs/nfsd/nfs4state.c:6276:1: warning: unused function 'end_offset'
[-Wunused-function].
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

363f8dd5

14 Apr, 2021 3 commits

svcrdma: Pass a useful error code to the send_err tracepoint · 8727f788

Chuck Lever authored Apr 11, 2021

Capture error codes in @ret, which is passed to the send_err
tracepoint, so that they can be logged when something goes awry.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

8727f788

svcrdma: Rename goto labels in svc_rdma_sendto() · c7731d5e

Chuck Lever authored Apr 13, 2021

Clean up: Make the goto labels consistent with other similar
functions.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

c7731d5e

svcrdma: Don't leak send_ctxt on Send errors · 351461f3

Chuck Lever authored Apr 13, 2021

Address a rare send_ctxt leak in the svc_rdma_sendto() error paths.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

351461f3

06 Apr, 2021 2 commits

NFSD: Use DEFINE_SPINLOCK() for spinlock · b73ac680

Guobin Huang authored Apr 06, 2021

spinlock can be initialized automatically with DEFINE_SPINLOCK()
rather than explicitly calling spin_lock_init().
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Guobin Huang <huangguobin4@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

b73ac680

sunrpc: Remove unused function ip_map_lookup · dee9f6ad

Jiapeng Chong authored Apr 06, 2021

Fix the following clang warnings:

net/sunrpc/svcauth_unix.c:306:30: warning: unused function
'ip_map_lookup' [-Wunused-function].
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

dee9f6ad

01 Apr, 2021 1 commit

NFSv4.2: fix copy stateid copying for the async copy · e739b120

Olga Kornievskaia authored Mar 30, 2021

This patch fixes Dan Carpenter's report that the static checker
found a problem where memcpy() was copying into too small of a buffer.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: e0639dc5 ("NFSD introduce async copy feature")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Dai Ngo <dai.ngo@oracle.com>

e739b120

31 Mar, 2021 5 commits

UAPI: nfsfh.h: Replace one-element array with flexible-array member · c0a744dc

Gustavo A. R. Silva authored Mar 23, 2021

There is a regular need in the kernel to provide a way to declare having
a dynamically sized set of trailing elements in a structure. Kernel code
should always use “flexible array members”[1] for these cases. The older
style of one-element or zero-length arrays should no longer be used[2].

Use an anonymous union with a couple of anonymous structs in order to
keep userspace unchanged:

$ pahole -C nfs_fhbase_new fs/nfsd/nfsfh.o
struct nfs_fhbase_new {
        union {
                struct {
                        __u8       fb_version_aux;       /*     0     1 */
                        __u8       fb_auth_type_aux;     /*     1     1 */
                        __u8       fb_fsid_type_aux;     /*     2     1 */
                        __u8       fb_fileid_type_aux;   /*     3     1 */
                        __u32      fb_auth[1];           /*     4     4 */
                };                                       /*     0     8 */
                struct {
                        __u8       fb_version;           /*     0     1 */
                        __u8       fb_auth_type;         /*     1     1 */
                        __u8       fb_fsid_type;         /*     2     1 */
                        __u8       fb_fileid_type;       /*     3     1 */
                        __u32      fb_auth_flex[0];      /*     4     0 */
                };                                       /*     0     4 */
        };                                               /*     0     8 */

        /* size: 8, cachelines: 1, members: 1 */
        /* last cacheline: 8 bytes */
};

Also, this helps with the ongoing efforts to enable -Warray-bounds by
fixing the following warnings:

fs/nfsd/nfsfh.c: In function ‘nfsd_set_fh_dentry’:
fs/nfsd/nfsfh.c:191:41: warning: array subscript 1 is above array bounds of ‘__u32[1]’ {aka ‘unsigned int[1]’} [-Warray-bounds]
  191 |        ntohl((__force __be32)fh->fh_fsid[1])));
      |                              ~~~~~~~~~~~^~~
./include/linux/kdev_t.h:12:46: note: in definition of macro ‘MKDEV’
   12 | #define MKDEV(ma,mi) (((ma) << MINORBITS) | (mi))
      |                                              ^~
./include/uapi/linux/byteorder/little_endian.h:40:26: note: in expansion of macro ‘__swab32’
   40 | #define __be32_to_cpu(x) __swab32((__force __u32)(__be32)(x))
      |                          ^~~~~~~~
./include/linux/byteorder/generic.h:136:21: note: in expansion of macro ‘__be32_to_cpu’
  136 | #define ___ntohl(x) __be32_to_cpu(x)
      |                     ^~~~~~~~~~~~~
./include/linux/byteorder/generic.h:140:18: note: in expansion of macro ‘___ntohl’
  140 | #define ntohl(x) ___ntohl(x)
      |                  ^~~~~~~~
fs/nfsd/nfsfh.c:191:8: note: in expansion of macro ‘ntohl’
  191 |        ntohl((__force __be32)fh->fh_fsid[1])));
      |        ^~~~~
fs/nfsd/nfsfh.c:192:32: warning: array subscript 2 is above array bounds of ‘__u32[1]’ {aka ‘unsigned int[1]’} [-Warray-bounds]
  192 |    fh->fh_fsid[1] = fh->fh_fsid[2];
      |                     ~~~~~~~~~~~^~~
fs/nfsd/nfsfh.c:192:15: warning: array subscript 1 is above array bounds of ‘__u32[1]’ {aka ‘unsigned int[1]’} [-Warray-bounds]
  192 |    fh->fh_fsid[1] = fh->fh_fsid[2];
      |    ~~~~~~~~~~~^~~

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.10/process/deprecated.html#zero-length-and-one-element-arrays

Link: https://github.com/KSPP/linux/issues/79
Link: https://github.com/KSPP/linux/issues/109
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

c0a744dc
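
For readers unfamiliar with the construct, here is a small user-space sketch of a C99 flexible array member. The struct and helper names are hypothetical and only loosely modeled on the pahole output above; this is not the UAPI definition.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header with a flexible array member as its last field. */
struct sketch_fh {
	uint8_t  version;
	uint8_t  auth_type;
	uint8_t  fsid_type;
	uint8_t  fileid_type;
	uint32_t words[];	/* adds nothing to sizeof(struct sketch_fh) */
};

/* Allocate a zeroed header followed by room for `nwords` trailing elements. */
static struct sketch_fh *sketch_fh_alloc(size_t nwords)
{
	struct sketch_fh *fh;

	/* Guard against overflow in the size computation below. */
	assert(nwords <= (SIZE_MAX - sizeof(*fh)) / sizeof(fh->words[0]));
	fh = calloc(1, sizeof(*fh) + nwords * sizeof(fh->words[0]));
	return fh;	/* NULL on allocation failure; release with free() */
}
```

Because `words[]` has no declared bound, indexing `words[1]` or `words[2]` no longer trips -Warray-bounds the way a `[1]` array does, provided the allocation really covers those elements.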

svcrdma: Clean up dto_q critical section in svc_rdma_recvfrom() · e3eded5e

Chuck Lever authored Mar 01, 2021

This, to me, seems less cluttered and less redundant. I was hoping
it could help reduce lock contention on the dto_q lock by reducing
the size of the critical section, but alas, the only improvement is
readability.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

e3eded5e

svcrdma: Remove svc_rdma_recv_ctxt::rc_pages and ::rc_arg · 5533c4f4

Chuck Lever authored Jan 13, 2021

These fields are no longer used.

The size of struct svc_rdma_recv_ctxt is now less than 300 bytes on
x86_64, down from 2440 bytes.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

5533c4f4

svcrdma: Remove sc_read_complete_q · 9af723be

Chuck Lever authored Dec 30, 2020

Now that svc_rdma_recvfrom() waits for Read completion,
sc_read_complete_q is no longer used.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

9af723be

svcrdma: Single-stage RDMA Read · 7d81ee87

Chuck Lever authored Dec 22, 2020

Currently the generic RPC server layer calls svc_rdma_recvfrom()
twice to retrieve an RPC message that uses Read chunks. I'm not
exactly sure why this design was chosen originally.

Instead, let's wait for the Read chunk completion inline in the
first call to svc_rdma_recvfrom().

The goal is to eliminate some page allocator churn.
rdma_read_complete() replaces pages in the second svc_rqst by
calling put_page() repeatedly while the upper layer waits for the
request to be constructed, which adds unnecessary NFS WRITE round-
trip latency.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Tom Talpey <tom@talpey.com>

7d81ee87

22 Mar, 2021 22 commits

SUNRPC: Move svc_xprt_received() call sites · 82011c80

Chuck Lever authored Jan 05, 2021

Currently, XPT_BUSY is not cleared until xpo_recvfrom returns.
That effectively blocks the receipt and handling of the next RPC
message until the current one has been taken off the transport.
This strict ordering is a requirement for socket transports.

For our kernel RPC/RDMA transport implementation, however, dequeuing
an ingress message is nothing more than a list_del(). The transport
can safely be marked un-busy as soon as that is done.

To keep the changes simpler, this patch just moves the
svc_xprt_received() call site from svc_handle_xprt() into the
transports, so that the actual optimization can be done in a
subsequent patch.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

82011c80

SUNRPC: Export svc_xprt_received() · 7dcfbd86

Chuck Lever authored Jan 29, 2021

Prepare svc_xprt_received() to be called from transport code instead
of from generic RPC server code.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

7dcfbd86

svcrdma: Retain the page backing rq_res.head[0].iov_base · cc93ce95

Chuck Lever authored Feb 01, 2021

svc_rdma_sendto() now waits for the NIC hardware to finish with
the pages backing rq_res. We still have to release the page array
in some cases, but now it's always safe to immediately re-use the
page backing rq_res's head buffer.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

cc93ce95

svcrdma: Remove unused sc_pages field · 57990067

Chuck Lever authored Jan 28, 2021

Clean up. This significantly reduces the size of struct
svc_rdma_send_ctxt.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

57990067

svcrdma: Normalize Send page handling · 2a1e4f21

Chuck Lever authored Jan 13, 2021

Currently svc_rdma_sendto() migrates xdr_buf pages into a separate
page list and NULLs out a bunch of entries in rq_pages while the
pages are under I/O. The Send completion handler then frees those
pages later.

Instead, let's wait for the Send completion, then handle page
releasing in the nfsd thread. I'd like to avoid the cost of 250+
put_page() calls in the Send completion handler, which is single-
threaded.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

2a1e4f21

svcrdma: Add a "deferred close" helper · e844d307

Chuck Lever authored Feb 20, 2021

Refactor a bit of commonly used logic so that every site that wants
a close deferred to an nfsd thread does all the right things
(set_bit(XPT_CLOSE) then enqueue).

Also, once XPT_CLOSE is set on a transport, it is never cleared. If
XPT_CLOSE is already set, then the close is already being handled
and the enqueue can be skipped.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

e844d307

svcrdma: Maintain a Receive water mark · c558d475

Chuck Lever authored Mar 11, 2021

Post more Receives when the number of pending Receives drops below
a water mark. The batch mechanism is disabled if the underlying
device cannot support a reasonably-sized Receive Queue.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

c558d475

svcrdma: Use svc_rdma_refresh_recvs() in wc_receive · 7b748c30

Chuck Lever authored Mar 11, 2021

Replace svc_rdma_post_recv() with the new batch receive mechanism.
For the moment it is posting just a single Receive WR at a time,
so no change in behavior is expected.

Since svc_rdma_wc_receive() was the last call site for
svc_rdma_post_recv(), it is removed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

7b748c30

svcrdma: Add a batch Receive posting mechanism · 77f0a2aa

Chuck Lever authored Mar 11, 2021

Introduce a server-side mechanism similar to commit e340c2d6
("xprtrdma: Reduce the doorbell rate (Receive)") to post Receive
WRs in batch. Its first consumer is svc_rdma_post_recvs(), which
posts the initial set of Receive WRs.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

77f0a2aa

svcrdma: Remove stale comment for svc_rdma_wc_receive() · c6b7ed8f

Chuck Lever authored Mar 11, 2021

xprt pinning was removed in commit 365e9992 ("svcrdma: Remove
transport reference counting"), but this comment was not updated
to reflect that change.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

c6b7ed8f

svcrdma: Provide an explanatory comment in CMA event handler · 270f25ed

Chuck Lever authored Mar 01, 2021

Clean up: explain why svc_xprt_enqueue() is invoked in the event
handler even though no xpt_flags bits are toggled here.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

270f25ed

svcrdma: RPCDBG_FACILITY is no longer used · 072db263

Chuck Lever authored Feb 20, 2021

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

072db263

nfsd: report client confirmation status in "info" file · 472d155a

NeilBrown authored Mar 20, 2021

mountd can now monitor clients appearing and disappearing in
/proc/fs/nfsd/clients, and will log these events, in lieu of the logging
of mount/unmount events for NFSv3.

Currently it cannot distinguish between unconfirmed clients (which might
be transient and totally uninteresting) and confirmed clients.

So add a "status: " line which reports either "confirmed" or
"unconfirmed", and use fsnotify to report that the info file
has been modified.

This requires a bit of infrastructure to keep the dentry for the "info"
file.  There is no need to take a counted reference as the dentry must
remain around until the client is removed.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

472d155a

nfsd: don't ignore high bits of copy count · e7a833e9

J. Bruce Fields authored Mar 18, 2021

Note size_t is 32-bit on a 32-bit architecture, but cp_count is defined
by the protocol to be 64 bit, so we could be turning a large copy into a
0-length copy here.

Reported-by: <radchenkoy@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

e7a833e9
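
The truncation described above is easy to reproduce in user space. In this sketch `sketch_size32_t` stands in for `size_t` on a 32-bit architecture, and the clamp is just one possible remedy shown for contrast; it is an assumption, not a description of what the commit changed.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for size_t on a 32-bit architecture. */
typedef uint32_t sketch_size32_t;
static_assert(sizeof(sketch_size32_t) == 4, "sketch assumes a 32-bit stand-in");

/* Buggy pattern: the high 32 bits of the 64-bit count are silently dropped. */
static sketch_size32_t sketch_truncate_count(uint64_t count)
{
	return (sketch_size32_t)count;
}

/* One way to avoid it: clamp to the largest representable value first. */
static sketch_size32_t sketch_clamp_count(uint64_t count)
{
	return count > UINT32_MAX ? UINT32_MAX : (sketch_size32_t)count;
}
```

A count of exactly 2^32 truncates to 0, which is how a large copy can turn into a 0-length copy; the clamped version yields `UINT32_MAX` instead.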

nfsd: COPY with length 0 should copy to end of file · 792a5112

J. Bruce Fields authored Mar 18, 2021

From https://tools.ietf.org/html/rfc7862#page-65

	A count of 0 (zero) requests that all bytes from ca_src_offset
	through EOF be copied to the destination.

Reported-by: <radchenkoy@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

792a5112
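
The quoted rule amounts to a small length computation. The helper below is a hypothetical user-space sketch, not the nfsd code; it deliberately leaves out validation of a nonzero count against the file size.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of bytes a COPY should transfer under the RFC 7862 rule quoted
 * above: a count of 0 means "from src_offset through end of file".
 */
static uint64_t sketch_copy_length(uint64_t count, uint64_t src_offset,
				   uint64_t file_size)
{
	uint64_t to_eof = src_offset < file_size ? file_size - src_offset : 0;

	if (count == 0)
		return to_eof;
	return count;	/* range checking is out of scope for this sketch */
}

/* Usage example: exercises the zero and nonzero cases. */
void sketch_copy_length_selfcheck(void)
{
	assert(sketch_copy_length(0, 100, 1000) == 900);
	assert(sketch_copy_length(50, 100, 1000) == 50);
}
```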

nfsd: Fix typo "accesible" · 34a62493

Ricardo Ribalda authored Mar 18, 2021

Trivial fix.

Cc: linux-nfs@vger.kernel.org
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

34a62493

nfsd: Ensure knfsd shuts down when the "nfsd" pseudofs is unmounted · c6c7f2a8

Trond Myklebust authored Mar 13, 2021

In order to ensure that knfsd threads don't linger once the nfsd
pseudofs is unmounted (e.g. when the container is killed) we let
nfsd_umount() shut down those threads and wait for them to exit.

This also should ensure that we don't need to do a kernel mount of
the pseudofs, since the thread lifetime is now limited by the
lifetime of the filesystem.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

c6c7f2a8

nfsd: Log client tracking type log message as info instead of warning · f988a7b7

Paul Menzel authored Mar 12, 2021

`printk()`, by default, uses the log level warning, which leaves the
user reading

    NFSD: Using UMH upcall client tracking operations.

wondering what to do about it (`dmesg --level=warn`).

Several client tracking methods are tried, and expected to fail. That’s
why a message is printed only on success. It might be interesting for
users to know the chosen method, so use info-level instead of
debug-level.

Cc: linux-nfs@vger.kernel.org
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

f988a7b7

nfsd: helper for laundromat expiry calculations · 7f7e7a40

J. Bruce Fields authored Mar 02, 2021

We do this same logic repeatedly, and it's easy to get the sense of the
comparison wrong.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

7f7e7a40

NFSD: Clean up NFSDDBG_FACILITY macro · 219a1705

Chuck Lever authored Mar 05, 2021

These are no longer needed because there are no dprintk() call sites
in these files.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

219a1705

NFSD: Add a tracepoint to record directory entry encoding · 6019ce07

Chuck Lever authored Mar 05, 2021

Enable watching the progress of directory encoding to capture the
timing of any issues with reading or encoding a directory. The
new tracepoint captures dirent encoding for all NFS versions.

For example, here's what a few NFSv4 directory entries might look
like:

nfsd-989 [002] 468.596265: nfsd_dirent: fh_hash=0x5d162594 ino=2 name=.
nfsd-989 [002] 468.596267: nfsd_dirent: fh_hash=0x5d162594 ino=1 name=..
nfsd-989 [002] 468.596299: nfsd_dirent: fh_hash=0x5d162594 ino=3827 name=zlib.c
nfsd-989 [002] 468.596325: nfsd_dirent: fh_hash=0x5d162594 ino=3811 name=xdiff
nfsd-989 [002] 468.596351: nfsd_dirent: fh_hash=0x5d162594 ino=3810 name=xdiff-interface.h
nfsd-989 [002] 468.596377: nfsd_dirent: fh_hash=0x5d162594 ino=3809 name=xdiff-interface.c
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

6019ce07

NFSD: Clean up after updating NFSv3 ACL encoders · 1416f435

Chuck Lever authored Nov 15, 2020

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

1416f435