Commits · 07d02a67b7faae56e184f6c35f78de47f06da37f · nexedi / linux

23 Oct, 2018 3 commits

Trond Myklebust authored Oct 12, 2018

We no longer need to worry about whether or not the entry is hashed in
order to figure out if the contents are valid. We only care whether or
not the refcount is non-zero.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

07d02a67

SUNRPC: Clean up the AUTH cache code · 95cd6232

Trond Myklebust authored Oct 11, 2018

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

95cd6232

NFS: change sign of nfs_fh length · 86bbd742

Frank Sorenson authored Oct 23, 2018

The filehandle has a length which is defined as a 32-bit
"unsigned integer".  Change sign of the length appropriately.
Signed-off-by: Frank Sorenson <sorenson@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

86bbd742

18 Oct, 2018 5 commits

Merge tag 'nfs-rdma-for-4.20-1' of git://git.linux-nfs.org/projects/anna/linux-nfs · 93bdcf9f

Trond Myklebust authored Oct 18, 2018

NFS RDMA client updates for Linux 4.20

Stable bugfixes:
- Reset credit grant properly after a disconnect

Other bugfixes and cleanups:
- xprt_release_rqst_cong is called outside of transport_lock
- Create more MRs at a time and toss out old ones during recovery
- Various improvements to the RDMA connection and disconnection code:
  - Improve naming of trace events, functions, and variables
  - Add documenting comments
  - Fix metrics and stats reporting
- Fix a tracepoint sparse warning
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

93bdcf9f

sunrpc: safely reallow resvport min/max inversion · 826799e6

J. Bruce Fields authored Oct 18, 2018

Commits ffb6ca33 and e08ea3a9 prevent setting xprt_min_resvport
greater than xprt_max_resvport, but may also break simple code that sets
one parameter then the other, if the new range does not overlap the old.

Also it looks racy to me, unless there's some serialization I'm not
seeing.  Granted it would probably require malicious privileged processes
(unless there's a chance these might eventually be settable in unprivileged
containers), but still it seems better not to let userspace panic the
kernel.

Simpler seems to be to allow setting the parameters to whatever you want
but interpret xprt_min_resvport > xprt_max_resvport as the empty range.

Fixes: ffb6ca33 "sunrpc: Prevent resvport min/max inversion..."
Fixes: e08ea3a9 "sunrpc: Prevent resvport min/max inversion..."
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

826799e6
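
The "empty range" interpretation described in 826799e6 can be illustrated with a small userspace sketch. This is not the kernel code: the structure and function names below are hypothetical, and the port-walking policy is an assumption made only to show that an inverted range can be handled without rejecting the setting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two tunables; names are illustrative only. */
struct resvport_range {
	unsigned int min;
	unsigned int max;
};

/* An inverted range (min > max) is treated as empty rather than rejected. */
static bool resvport_range_is_empty(const struct resvport_range *r)
{
	assert(r);
	return r->min > r->max;
}

/*
 * Pick the port below "prev", wrapping back to max when prev falls
 * outside (min, max]. Returns 0 when the range is empty, so a caller
 * never works with inconsistent bounds.
 */
static unsigned int resvport_next(const struct resvport_range *r,
				  unsigned int prev)
{
	if (resvport_range_is_empty(r))
		return 0;
	if (prev <= r->min || prev > r->max)
		return r->max;
	return prev - 1;
}
```

With this shape, code that sets one bound and then the other passes through a temporarily empty range instead of failing on the first write.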

nfs: remove redundant call to nfs_context_set_write_error() · fc187514

Benjamin Coddington authored Oct 18, 2018

We don't need to call this in the direct, read, or pnfs resend paths and
the only other caller is the write path in nfs_page_async_flush() which
already checks and sets the pg_error on the context.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

fc187514

nfs: Fix a missed page unlock after pg_doio() · fdbd1a2e

Benjamin Coddington authored Oct 18, 2018

We must check pg_error and call error_cleanup after any call to pg_doio.
Currently, we are skipping the unlock of a page if we encounter an error in
nfs_pageio_complete() before handing off the work to the RPC layer.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

fdbd1a2e

SUNRPC: Fix a compile warning for cmpxchg64() · e732f448

Trond Myklebust authored Oct 18, 2018

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

e732f448

05 Oct, 2018 2 commits

NFSv4.x: fix lock recovery during delegation recall · 44f411c3

Olga Kornievskaia authored Oct 04, 2018

Running "./nfstest_delegation --runtest recall26" uncovers that the
client doesn't recover the lock when we have an appending open,
where the initial open got a write delegation.

Instead of checking the passed-in open context against the file
lock's open context, check that the state is the same.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

44f411c3

SUNRPC: use cmpxchg64() in gss_seq_send64_fetch_and_inc() · 21924765

Arnd Bergmann authored Oct 02, 2018

The newly introduced gss_seq_send64_fetch_and_inc() fails to build on
32-bit architectures:

net/sunrpc/auth_gss/gss_krb5_seal.c:144:14: note: in expansion of macro 'cmpxchg'
   seq_send = cmpxchg(&ctx->seq_send64, old, old + 1);
              ^~~~~~~
arch/x86/include/asm/cmpxchg.h:128:3: error: call to '__cmpxchg_wrong_size' declared with attribute error: Bad argument size for cmpxchg
   __cmpxchg_wrong_size();     \

As the message tells us, cmpxchg() cannot be used on 64-bit arguments;
that is what cmpxchg64() is for.

Fixes: 571ed1fd ("SUNRPC: Replace krb5_seq_lock with a lockless scheme")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

21924765
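
For readers unfamiliar with the pattern, a lockless fetch-and-increment on a 64-bit counter can be sketched in userspace with C11 atomics. This is only an analogue of the compare-and-exchange loop named in 21924765, not the kernel implementation: `struct seq_ctx` and `seq_fetch_and_inc()` are hypothetical stand-ins, and `atomic_compare_exchange_weak()` replaces the kernel's cmpxchg64().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Userspace stand-in for a 64-bit sequence counter; not the kernel type. */
struct seq_ctx {
	_Atomic uint64_t seq_send64;
};

/* Return the current sequence number and atomically advance it by one. */
static uint64_t seq_fetch_and_inc(struct seq_ctx *ctx)
{
	uint64_t old;

	assert(ctx);
	old = atomic_load(&ctx->seq_send64);

	/* On failure "old" is refreshed with the current value, so retry. */
	while (!atomic_compare_exchange_weak(&ctx->seq_send64, &old, old + 1))
		;
	return old;
}
```

The operand width matters here in the same way as in the build failure above: the primitive has to operate on the full 64 bits on every architecture.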

03 Oct, 2018 10 commits

xprtrdma: Squelch a sparse warning · 470443e0

Chuck Lever authored Oct 01, 2018

linux/include/trace/events/rpcrdma.h:501:1: warning: expression using sizeof bool
linux/include/trace/events/rpcrdma.h:501:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)

Fixes: ab03eff5 ("xprtrdma: Add trace points in RPC Call ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

470443e0

xprtrdma: Clean up xprt_rdma_disconnect_inject · ad091180

Chuck Lever authored Oct 01, 2018

Clean up: Use the appropriate C macro instead of open-coding
container_of().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

ad091180
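
As background for ad091180, the idea of wrapping container_of() in a dedicated accessor can be sketched as follows. The macro is a simplified userspace version, and the two structures and the `to_outer_xprt()` helper are hypothetical names chosen for illustration; they are not the xprtrdma definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace version of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical types: an outer transport embedding a generic one. */
struct generic_xprt {
	int id;
};

struct outer_xprt {
	int private_state;
	struct generic_xprt rx_xprt;
};

/* Recover the enclosing structure from a pointer to its embedded member. */
static struct outer_xprt *to_outer_xprt(struct generic_xprt *xprt)
{
	assert(xprt);
	return container_of(xprt, struct outer_xprt, rx_xprt);
}
```

Callers then write `to_outer_xprt(xprt)` instead of repeating the type and member names at every call site.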

xprtrdma: Add documenting comments · f26c32fa

Chuck Lever authored Oct 01, 2018

Clean up: fill in or update documenting comments for transport
switch entry points.

For xprt_rdma_allocate:

The first paragraph is no longer true since commit 5a6d1db4
("SUNRPC: Add a transport-specific private field in rpc_rqst").

The second paragraph is no longer true since commit 54cbd6b0
("xprtrdma: Delay DMA mapping Send and Receive buffers").
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

f26c32fa

xprtrdma: Report when there were zero posted Receives · 61c208a5

Chuck Lever authored Oct 01, 2018

To show that a caller did attempt to allocate and post more Receive
buffers, the trace point in rpcrdma_post_recvs() should report when
rpcrdma_post_recvs() was invoked but no new Receive buffers were
posted.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

61c208a5

xprtrdma: Move rb_flags initialization · 512ccfb6

Chuck Lever authored Oct 01, 2018

Clean up: rb_flags might be used for other things besides
RPCRDMA_BUF_F_EMPTY_SCQ, so initialize it in a generic spot
instead of in a send-completion-queue-related helper.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

512ccfb6

xprtrdma: Don't disable BH's in backchannel server · f7d46681

Chuck Lever authored Oct 01, 2018

Clean up: This code was copied from xprtsock.c and
backchannel_rqst.c. For rpcrdma, the backchannel server runs
exclusively in process context, thus disabling bottom-halves is
unnecessary.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

f7d46681

xprtrdma: Remove memory address of "ep" from an error message · 83e301dd

Chuck Lever authored Oct 01, 2018

Clean up: Replace the hashed memory address of the target rpcrdma_ep
with the server's IP address and port. The server address is more
useful in an administrative error message.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

83e301dd

xprtrdma: Rename rpcrdma_qp_async_error_upcall · f9521d53

Chuck Lever authored Oct 01, 2018

Clean up: Use a function name that is consistent with the RDMA core
API and with other consumers. Because this is a function that is
invoked from outside the rpcrdma.ko module, add an appropriate
documenting comment.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

f9521d53

xprtrdma: Simplify RPC wake-ups on connect · 31e62d25

Chuck Lever authored Oct 01, 2018

Currently, when a connection is established, rpcrdma_conn_upcall
invokes rpcrdma_conn_func and then
wake_up_all(&ep->rep_connect_wait). The former wakes waiting RPCs,
but the connect worker is not done yet, and that leads to races,
double wakes, and difficulty understanding how this logic is
supposed to work.

Instead, collect all the "connection established" logic in the
connect worker (xprt_rdma_connect_worker). A disconnect worker is
retained to handle provider upcalls safely.

Fixes: 254f91e2 ("xprtrdma: RPC/RDMA must invoke ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

31e62d25

xprtrdma: Re-organize the switch() in rpcrdma_conn_upcall · 316a616e

Chuck Lever authored Oct 01, 2018

Clean up: Eliminate the FALLTHROUGH into the default arm to make the
switch easier to understand.

Also, as long as I'm here, do not display the memory address of the
target rpcrdma_ep. A hashed memory address is of marginal use here.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

316a616e

02 Oct, 2018 10 commits

xprtrdma: Eliminate "connstate" variable from rpcrdma_conn_upcall() · aadc5a94

Chuck Lever authored Oct 01, 2018

Clean up.

Since commit 173b8f49 ("xprtrdma: Demote "connect" log messages")
there has been no need to initialize connstate to zero. In fact, in
this code path there's now no reason not to set rep_connected
directly.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

aadc5a94

xprtrdma: Conventional variable names in rpcrdma_conn_upcall · ed97f1f7

Chuck Lever authored Oct 01, 2018

Clean up: The convention throughout other parts of xprtrdma is to
name variables of type struct rpcrdma_xprt "r_xprt", not "xprt".
This convention enables the use of the name "xprt" for a "struct
rpc_xprt" type variable, as in other parts of the RPC client.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

ed97f1f7

xprtrdma: Rename rpcrdma_conn_upcall · ae38288e

Chuck Lever authored Oct 01, 2018

Clean up: Use a function name that is consistent with the RDMA core
API and with other consumers. Because this is a function that is
invoked from outside the rpcrdma.ko module, add an appropriate
documenting comment.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

ae38288e

sunrpc: Report connect_time in seconds · 8440a886

Chuck Lever authored Oct 01, 2018

The way connection-oriented transports report connect_time is wrong:
it's supposed to be in seconds, not in jiffies.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

8440a886
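
The unit mismatch fixed by 8440a886 comes down to dividing a tick count by the tick rate. A minimal sketch, assuming a caller-supplied tick rate (the kernel's rate is a build-time setting, and the function names here are hypothetical):

```c
#include <assert.h>

/* Convert an elapsed tick count to whole seconds for a given tick rate. */
static unsigned long ticks_to_seconds(unsigned long ticks, unsigned long hz)
{
	assert(hz > 0);
	return ticks / hz;
}

/*
 * Elapsed seconds between two tick stamps. Unsigned subtraction keeps
 * the result correct across a single wrap of the tick counter.
 */
static unsigned long connect_seconds(unsigned long start, unsigned long now,
				     unsigned long hz)
{
	return ticks_to_seconds(now - start, hz);
}
```

Reporting the raw difference instead of the divided value would overstate the time by a factor equal to the tick rate.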

sunrpc: Fix connect metrics · 3968a8a5

Chuck Lever authored Oct 01, 2018

For TCP, the logic in xprt_connect_status is currently never invoked
to record a successful connection. Commit 2a491991 ("SUNRPC:
Return EAGAIN instead of ENOTCONN when waking up xprt->pending")
changed the way TCP xprt's are awoken after a connect succeeds.

Instead, change connection-oriented transports to bump connect_count
and compute connect_time the moment that XPRT_CONNECTED is set.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

3968a8a5

xprtrdma: Name MR trace events consistently · d379eaa8

Chuck Lever authored Oct 01, 2018

Clean up the names of trace events related to MRs so that it's
easy to enable these with a glob.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

d379eaa8

xprtrdma: Explicitly resetting MRs is no longer necessary · 61da886b

Chuck Lever authored Oct 01, 2018

When a memory operation fails, the MR's driver state might not match
its hardware state. The only reliable recourse is to dereg the MR.
This is done in ->ro_recover_mr, which then attempts to allocate a
fresh MR to replace the released MR.

Since commit e2ac236c ("xprtrdma: Allocate MRs on demand"),
xprtrdma dynamically allocates MRs. It can add more MRs whenever
they are needed.

That makes it possible to simply release an MR when a memory
operation fails, instead of "recovering" it. It will automatically
be replaced by the on-demand MR allocator.

This commit is a little larger than I wanted, but it replaces
->ro_recover_mr, rb_recovery_lock, rb_recovery_worker, and the
rb_stale_mrs list with a generic work queue.

Since MRs are no longer orphaned, the mrs_orphaned metric is no
longer used.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

61da886b

xprtrdma: Create more MRs at a time · c421ece6

Chuck Lever authored Oct 01, 2018

Some devices require more than 3 MRs to build a single 1MB I/O.
Ensure that rpcrdma_mrs_create() will add enough MRs to build that
I/O.

In a subsequent patch I'm changing the MR recovery logic to just
toss out the MRs. In that case it's possible for ->send_request to
loop acquiring some MRs, not getting enough, getting called again,
recycling the previous MRs, then not getting enough, lather rinse
repeat. Thus first we need to ensure enough MRs are created to
prevent that loop.

I'm "reusing" ia->ri_max_segs. All of its accessors seem to want the
maximum number of data segments plus two, so I'm going to bake that
into the initial calculation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

c421ece6
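
The arithmetic behind "more than 3 MRs" in c421ece6 can be made concrete with a sketch. The formula below is an assumption drawn from the commit text (data segments needed for the I/O, plus two), and the function names, page size and per-MR depth are illustrative values rather than anything taken from the driver.

```c
#include <assert.h>

/* Round-up integer division. */
static unsigned int div_round_up(unsigned int n, unsigned int d)
{
	assert(d > 0);
	return (n + d - 1) / d;
}

/*
 * Hypothetical estimate of the MRs needed for one I/O: enough MRs to
 * cover every page of the payload, plus two extra segments.
 */
static unsigned int sketch_max_segs(unsigned int io_bytes,
				    unsigned int page_size,
				    unsigned int pages_per_mr)
{
	unsigned int pages = div_round_up(io_bytes, page_size);
	unsigned int data_segs = div_round_up(pages, pages_per_mr);

	if (data_segs < 1)
		data_segs = 1;
	return data_segs + 2;
}
```

Under these assumptions, a 1MB I/O with 4KB pages and a device that maps 64 pages per MR needs 4 + 2 = 6 MRs, while a device that maps 256 pages per MR needs only 1 + 2 = 3.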

xprtrdma: Reset credit grant properly after a disconnect · ef739b21

Chuck Lever authored Oct 01, 2018

On a fresh connection, an RPC/RDMA client is supposed to send only
one RPC Call until it gets a credit grant in the first RPC Reply
from the server [RFC 8166, Section 3.3.3].

There is a bug in the Linux client's credit accounting mechanism
introduced by commit e7ce710a ("xprtrdma: Avoid deadlock when
credit window is reset"). On connect, it simply dumps all pending
RPC Calls onto the new connection.

Servers have been tolerant of this bad behavior. Currently no server
implementation ever changes its credit grant over reconnects, and
servers always repost enough Receives before connections are fully
established.

To correct this issue, ensure that the client resets both the credit
grant _and_ the congestion window when handling a reconnect.

Fixes: e7ce710a ("xprtrdma: Avoid deadlock when credit ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

ef739b21
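
The behaviour required by ef739b21 (reset both the credit grant and the congestion window on reconnect) can be modelled in a few lines. This is a sketch under stated assumptions: the structure, the helper names and the shift-based window scaling are hypothetical and only mirror the idea that the window is derived from the credit count.

```c
#include <assert.h>

/* Assumed scaling between credits and the congestion window. */
#define SKETCH_CWNDSHIFT 8U

struct sketch_xprt {
	unsigned int credits;	/* current credit grant from the server */
	unsigned long cwnd;	/* congestion window, scaled by the shift */
};

/* Apply a credit grant received in an RPC Reply. */
static void sketch_update_credits(struct sketch_xprt *x, unsigned int grant)
{
	assert(grant >= 1);
	x->credits = grant;
	x->cwnd = (unsigned long)grant << SKETCH_CWNDSHIFT;
}

/* On reconnect, fall back to one credit until the first Reply arrives. */
static void sketch_reset_on_reconnect(struct sketch_xprt *x)
{
	sketch_update_credits(x, 1);
}

/* Number of RPC Calls that may be outstanding right now. */
static unsigned int sketch_max_in_flight(const struct sketch_xprt *x)
{
	return (unsigned int)(x->cwnd >> SKETCH_CWNDSHIFT);
}
```

Resetting only one of the two values would leave the window and the grant disagreeing about how many Calls may be sent on the new connection.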

xprtrdma: xprt_release_rqst_cong is called outside of transport_lock · 91ca1866

Chuck Lever authored Oct 01, 2018

Since commit ce7c252a ("SUNRPC: Add a separate spinlock to
protect the RPC request receive list") the RPC/RDMA reply handler
has been calling xprt_release_rqst_cong without holding
xprt->transport_lock.

I think the only way this call is ever made is if the credit grant
increases and there are RPCs pending. Current server implementations
do not change their credit grant during operation (except at
connect time).

Commit e7ce710a ("xprtrdma: Avoid deadlock when credit window is
reset") added the ->release_rqst call because UDP invokes
xprt_adjust_cwnd(), which calls __xprt_put_cong() after adjusting
xprt->cwnd. Both xprt_release() and ->xprt_release_xprt already wake
another task in this case, so it is safe to remove this call from
the reply handler.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

91ca1866

30 Sep, 2018 10 commits

NFSv4: Fix lookup revalidate of regular files · c7944ebb

Trond Myklebust authored Sep 28, 2018

If we're revalidating an existing dentry in order to open a file, we need
to ensure that we check the directory has not changed before we optimise
away the lookup.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

c7944ebb

NFS: Refactor nfs_lookup_revalidate() · 5ceb9d7f

Trond Myklebust authored Sep 28, 2018

Refactor the code in nfs_lookup_revalidate() as a stepping stone towards
optimising and fixing nfs4_lookup_revalidate().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

5ceb9d7f

NFS: Fix dentry revalidation on NFSv4 lookup · be189f7e

Trond Myklebust authored Sep 27, 2018

We need to ensure that inode and dentry revalidation occurs correctly
on reopen of a file that is already open. Currently, we can end up
not revalidating either in the case of NFSv4.0, due to the 'cached open'
path.
Let's fix that by ensuring that we only do cached open for the special
cases of open recovery and delegation return.
Reported-by: Stan Hu <stanhu@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

be189f7e

SUNRPC: Replace krb5_seq_lock with a lockless scheme · 571ed1fd

Trond Myklebust authored Sep 29, 2018

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

571ed1fd

SUNRPC: Lockless lookup of RPCSEC_GSS mechanisms · 0c1c19f4

Trond Myklebust authored Sep 29, 2018

Use RCU protected lookups for discovering the supported mechanisms.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

0c1c19f4

SUNRPC: Remove rpc_authflavor_lock in favour of RCU locking · 4e4c3bef

Trond Myklebust authored Sep 27, 2018

Module removal is RCU safe by design, so we really have no need to
lock the auth_flavors[] array. Substitute a lockless scheme to
add/remove entries in the array, and then use RCU.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

4e4c3bef

NFS: Remove private spinlock in struct nfs_pgio_header · 1c6c4b74

Trond Myklebust authored Sep 25, 2018

Now that each struct nfs_pgio_header corresponds to one RPC call, we
only have one writer to the struct nfs_pgio_header.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

1c6c4b74

NFSv4: Save a few bytes in the nfs_pgio_args/res · 28d52235

Trond Myklebust authored Sep 24, 2018

Save a few bytes by allowing the read/write specific fields of the
structures to share storage.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

28d52235
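
The space saving described in 28d52235 relies on the fact that a request is either a read or a write, so direction-specific fields never need to be live at the same time. A minimal sketch with hypothetical field names (these are not the nfs_pgio_args members):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: read-only and write-only fields side by side. */
struct sketch_args_flat {
	uint64_t offset;
	uint32_t count;
	uint64_t read_hint;	/* used only by reads */
	uint64_t write_verf;	/* used only by writes */
};

/* Same fields, but the two direction-specific members overlap. */
struct sketch_args_shared {
	uint64_t offset;
	uint32_t count;
	union {
		uint64_t read_hint;
		uint64_t write_verf;
	} u;
};

/* Compile-time check that sharing storage shrinks the structure. */
static_assert(sizeof(struct sketch_args_shared) <
	      sizeof(struct sketch_args_flat),
	      "sharing storage should shrink the structure");
```

The trade-off is that code must know which direction a request has before touching the union, since writing one member overwrites the other.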

NFSv3: Improve NFSv3 performance when server returns no post-op attributes · 8d8928d8

Trond Myklebust authored Mar 05, 2018

When the server fails to return post-op attributes, the client's
attempt to place read data directly in the page cache fails, and
so we have to do an extra copy in order to realign the data with
page borders.
This patch attempts to detect servers that don't return post-op
attributes on read (e.g. for pNFS) and adjusts the placement
calculation accordingly.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

8d8928d8

NFSv4: Split out NFS v4.2 copy completion functions · 80f42368

Anna Schumaker authored Sep 20, 2018

The convention in the rest of the code is to have a separate function
for anything that might be ifdef-ed out.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

80f42368