Commits · 27adc785928ae6b34cdda96f472735b77c91e247 · nexedi / linux

15 Mar, 2019 3 commits

SUNRPC: Use the ENOTCONN error on socket disconnect · 27adc785

Trond Myklebust authored Mar 15, 2019

When the socket is closed, we currently send an EAGAIN error to all
pending requests in order to ask them to retransmit. Use ENOTCONN
instead, to ensure that they try to reconnect before attempting to
transmit.
This also helps SOFTCONN tasks to behave correctly in this
situation.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
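
The effect of the changed error code can be pictured with a toy dispatcher. This is only a sketch: the enum, the function names and the three-way split are hypothetical stand-ins, not the SUNRPC state machine itself.

```c
#include <errno.h>

/* Hypothetical, simplified stand-ins for the states a request can move to. */
enum next_step { STEP_CONNECT, STEP_TRANSMIT, STEP_FAIL };

/* Status handed to every pending request when the socket is closed.
 * Per the commit message this used to be -EAGAIN. */
int status_on_socket_close(void)
{
	return -ENOTCONN;
}

/* Decide what a woken request does next from the status it was given. */
enum next_step next_step_for_status(int status)
{
	switch (status) {
	case -ENOTCONN:
		return STEP_CONNECT;	/* reconnect before transmitting */
	case -EAGAIN:
		return STEP_TRANSMIT;	/* plain retransmit */
	default:
		return STEP_FAIL;
	}
}
```

With -ENOTCONN the request is routed through the connect step first, which is the behaviour the message asks for.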

SUNRPC: Fix the minimal size for reply buffer allocation · 51314960

Trond Myklebust authored Mar 15, 2019

We must at minimum allocate enough memory to be able to see any auth
errors in the reply from the server.

Fixes: 2c94b8ec ("SUNRPC: Use au_rslack when computing reply...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
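
A minimal sketch of the idea, assuming the smallest useful reply is the five 4-byte XDR words of a rejected reply carrying an auth error (xid, message type, reply status, reject status, auth status). The constant and the function name are illustrative assumptions, not the values the patch uses.

```c
#include <stddef.h>

/* Assumed lower bound, in 4-byte XDR words: xid, msg type, reply status,
 * reject status, auth status. Illustrative only. */
#define MIN_REPLY_WORDS 5

/* Never size the reply buffer below the assumed minimum. */
size_t reply_buffer_words(size_t requested_words)
{
	if (requested_words < MIN_REPLY_WORDS)
		return MIN_REPLY_WORDS;
	return requested_words;
}
```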

SUNRPC: Fix a client regression when handling oversized replies · 9734ad57

Trond Myklebust authored Mar 15, 2019

If the server sends a reply that is larger than the pre-allocated
buffer, then the current code may fail to register how much of
the stream it has finished reading. This, in turn, can lead to
hangs.

Fixes: e92053a5 ("SUNRPC: Handle zero length fragments correctly")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
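
The bookkeeping problem can be sketched as follows. The struct and helpers are hypothetical; the point is only that bytes which do not fit in the buffer must still be counted as consumed from the stream, otherwise the reader never sees the end of the record.

```c
#include <stddef.h>

/* Hypothetical per-record receive state. */
struct stream_state {
	size_t offset;	/* bytes of the record consumed from the stream */
	size_t copied;	/* bytes actually stored in the reply buffer */
};

/* Account for `chunk` bytes read from the stream. Only what fits in a
 * buffer of `buf_len` bytes is kept; the overflow is discarded, but the
 * stream offset advances by the full chunk either way. */
void consume_chunk(struct stream_state *st, size_t buf_len, size_t chunk)
{
	size_t room = st->copied < buf_len ? buf_len - st->copied : 0;
	size_t keep = chunk < room ? chunk : room;

	st->copied += keep;
	st->offset += chunk;
}

/* The record is finished once the whole length has been consumed. */
int record_done(const struct stream_state *st, size_t record_len)
{
	return st->offset >= record_len;
}
```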

12 Mar, 2019 2 commits

pNFS: Fix a typo in pnfs_update_layout · 400417b0

Trond Myklebust authored Mar 12, 2019

We're supposed to wait for the outstanding layout count to go to zero,
but that got lost somehow.

Fixes: d03360aa ("pNFS: Ensure we return the error if someone...")
Reported-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

fix null pointer deref in tracepoints in back channel · f87b543a

Olga Kornievskaia authored Mar 12, 2019

The backchannel doesn't have the rq_task->tk_clientid pointer set.

Dereferencing it in the tracepoints can lead to the following oops:
localhost login: [  111.385319] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[  111.388073] #PF error: [normal kernel read fault]
[  111.389452] PGD 80000000290d8067 P4D 80000000290d8067 PUD 75f25067 PMD 0
[  111.391224] Oops: 0000 [#1] SMP PTI
[  111.392151] CPU: 0 PID: 3533 Comm: NFSv4 callback Not tainted 5.0.0-rc7+ #1
[  111.393787] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[  111.396340] RIP: 0010:trace_event_raw_event_xprt_enq_xmit+0x6f/0xf0 [sunrpc]
[  111.397974] Code: 00 00 00 48 89 ee 48 89 e7 e8 bd 0a 85 d7 48 85 c0 74 4a 41 0f b7 94 24 e0 00 00 00 48 89 e7 89 50 08 49 8b 94 24 a8 00 00 00 <8b> 52 04 89 50 0c 49 8b 94 24 c0 00 00 00 8b 92 a8 00 00 00 0f ca
[  111.402215] RSP: 0018:ffffb98743263cf8 EFLAGS: 00010286
[  111.403406] RAX: ffffa0890fc3bc88 RBX: 0000000000000003 RCX: 0000000000000000
[  111.405057] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffb98743263cf8
[  111.406656] RBP: ffffa0896f5368f0 R08: 0000000000000246 R09: 0000000000000000
[  111.408437] R10: ffffe19b01c01500 R11: 0000000000000000 R12: ffffa08977d28a00
[  111.410210] R13: 0000000000000004 R14: ffffa089315303f0 R15: ffffa08931530000
[  111.411856] FS:  0000000000000000(0000) GS:ffffa0897bc00000(0000) knlGS:0000000000000000
[  111.413699] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  111.415068] CR2: 0000000000000004 CR3: 000000002ac90004 CR4: 00000000001606f0
[  111.416745] Call Trace:
[  111.417339]  xprt_request_enqueue_transmit+0x2b6/0x4a0 [sunrpc]
[  111.418709]  ? rpc_task_need_encode+0x40/0x40 [sunrpc]
[  111.419957]  call_bc_transmit+0xd5/0x170 [sunrpc]
[  111.421067]  __rpc_execute+0x7e/0x3f0 [sunrpc]
[  111.422177]  rpc_run_bc_task+0x78/0xd0 [sunrpc]
[  111.423212]  bc_svc_process+0x281/0x340 [sunrpc]
[  111.424325]  nfs41_callback_svc+0x130/0x1c0 [nfsv4]
[  111.425430]  ? remove_wait_queue+0x60/0x60
[  111.426398]  kthread+0xf5/0x130
[  111.427155]  ? nfs_callback_authenticate+0x50/0x50 [nfsv4]
[  111.428388]  ? kthread_bind+0x10/0x10
[  111.429270]  ret_from_fork+0x1f/0x30

localhost login: [  467.462259] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[  467.464411] #PF error: [normal kernel read fault]
[  467.465445] PGD 80000000728c1067 P4D 80000000728c1067 PUD 728c0067 PMD 0
[  467.466980] Oops: 0000 [#1] SMP PTI
[  467.467759] CPU: 0 PID: 3517 Comm: NFSv4 callback Not tainted 5.0.0-rc7+ #1
[  467.469393] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[  467.471840] RIP: 0010:trace_event_raw_event_xprt_transmit+0x7c/0xf0 [sunrpc]
[  467.473392] Code: f6 48 85 c0 74 4b 49 8b 94 24 98 00 00 00 48 89 e7 0f b7 92 e0 00 00 00 89 50 08 49 8b 94 24 98 00 00 00 48 8b 92 a8 00 00 00 <8b> 52 04 89 50 0c 41 8b 94 24 a8 00 00 00 0f ca 89 50 10 41 8b 94
[  467.477605] RSP: 0018:ffffabe7434fbcd0 EFLAGS: 00010282
[  467.478793] RAX: ffff99720fc3bce0 RBX: 0000000000000003 RCX: 0000000000000000
[  467.480409] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffabe7434fbcd0
[  467.482011] RBP: ffff99726f631948 R08: 0000000000000246 R09: 0000000000000000
[  467.483591] R10: 0000000070000000 R11: 0000000000000000 R12: ffff997277dfcc00
[  467.485226] R13: 0000000000000000 R14: 0000000000000000 R15: ffff99722fecdca8
[  467.486830] FS:  0000000000000000(0000) GS:ffff99727bc00000(0000) knlGS:0000000000000000
[  467.488596] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  467.489931] CR2: 0000000000000004 CR3: 00000000270e6006 CR4: 00000000001606f0
[  467.491559] Call Trace:
[  467.492128]  xprt_transmit+0x303/0x3f0 [sunrpc]
[  467.493143]  ? rpc_task_need_encode+0x40/0x40 [sunrpc]
[  467.494328]  call_bc_transmit+0x49/0x170 [sunrpc]
[  467.495379]  __rpc_execute+0x7e/0x3f0 [sunrpc]
[  467.496451]  rpc_run_bc_task+0x78/0xd0 [sunrpc]
[  467.497467]  bc_svc_process+0x281/0x340 [sunrpc]
[  467.498507]  nfs41_callback_svc+0x130/0x1c0 [nfsv4]
[  467.499751]  ? remove_wait_queue+0x60/0x60
[  467.500686]  kthread+0xf5/0x130
[  467.501438]  ? nfs_callback_authenticate+0x50/0x50 [nfsv4]
[  467.502640]  ? kthread_bind+0x10/0x10
[  467.503454]  ret_from_fork+0x1f/0x30
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

10 Mar, 2019 3 commits

SUNRPC: Take the transport send lock before binding+connecting · 4d6c671a

Trond Myklebust authored Mar 10, 2019

Before trying to bind a port, grab the send lock so that we
don't change the port while another task is busy
transmitting requests.
The connect code already takes the send lock in xprt_connect(),
but it is harmless to take it before that.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Micro-optimise when the task is known not to be sleeping · 009a82f6

Trond Myklebust authored Mar 09, 2019

In cases where we know the task is not sleeping, try to optimise
away the indirect call to task->tk_action() by replacing it with
a direct call.
Only change tail calls, to allow gcc to perform tail call
elimination.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
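
A toy state machine illustrates the difference between returning to the scheduler loop and calling the next state directly. The struct and the two-state flow are simplified assumptions; the function names only echo the kernel's naming style.

```c
#include <stddef.h>

struct task;
typedef void (*action_fn)(struct task *);

/* Toy task: a next-action pointer plus a step counter. */
struct task {
	action_fn tk_action;
	int steps;
};

/* Final state of this toy machine. */
void call_transmit(struct task *t)
{
	t->steps++;
	t->tk_action = NULL;
}

/* Indirect style: record the next action and return to the run loop,
 * which then makes an indirect call through tk_action. */
void call_encode_indirect(struct task *t)
{
	t->steps++;
	t->tk_action = call_transmit;
}

/* Direct style: the next state is invoked as the last statement, so the
 * compiler is free to turn it into a jump (tail call elimination). */
void call_encode_direct(struct task *t)
{
	t->steps++;
	t->tk_action = call_transmit;
	call_transmit(t);
}

/* Scheduler loop: keep calling the current action until none is left. */
void run(struct task *t)
{
	while (t->tk_action)
		t->tk_action(t);
}
```

Both variants execute the same two states; the direct one simply saves a trip through the loop and the indirect branch.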

SUNRPC: Check whether the task was transmitted before rebind/reconnect · 03e51d32

Trond Myklebust authored Mar 10, 2019

Before initiating transport actions that require putting the task to sleep,
such as rebinding or reconnecting, we should check whether or not the task
was already transmitted.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

09 Mar, 2019 2 commits

SUNRPC: Remove redundant calls to RPC_IS_QUEUED() · 6b5f5900

Trond Myklebust authored Mar 09, 2019

The RPC task wakeup calls all check for RPC_IS_QUEUED() before taking any
locks. In addition, rpc_exit() already calls rpc_wake_up_queued_task().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
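
Why the caller-side check is redundant can be shown with a small sketch. The struct and function names are hypothetical; the real helpers also take locks, which is omitted here.

```c
#include <stdbool.h>

struct task {
	bool queued;
	int wakeups;
};

/* The wakeup helper does its own cheap check before doing any work
 * (the real code would take the queue lock after this point). */
void wake_up_queued_task(struct task *t)
{
	if (!t->queued)
		return;
	t->queued = false;
	t->wakeups++;
}

/* Before: the caller repeats the check the helper already makes. */
void caller_before(struct task *t)
{
	if (t->queued)
		wake_up_queued_task(t);
}

/* After: the caller relies on the helper's own check. */
void caller_after(struct task *t)
{
	wake_up_queued_task(t);
}
```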

SUNRPC: Clean up · cea57789

Trond Myklebust authored Mar 09, 2019

Replace remaining callers of call_timeout() with rpc_check_timeout().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

07 Mar, 2019 3 commits

SUNRPC: Respect RPC call timeouts when retrying transmission · 7b3fef8e

Trond Myklebust authored Mar 07, 2019

Fix a regression where soft and softconn requests are not timing out
as expected.

Fixes: 89f90fe1 ("SUNRPC: Allow calls to xprt_transmit() to drain...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
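
The expected behaviour can be sketched as a check made before each retransmission. Everything here is a simplified assumption (a single deadline, one "soft" flag, -ETIMEDOUT as the error); it is not the logic of rpc_check_timeout().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical task: soft/softconn tasks are allowed to time out. */
struct task {
	bool soft;
	long deadline;
};

/* Called before retrying a transmission. Returns 0 to keep retrying,
 * or a negative errno once a soft task has run past its deadline. */
int check_retry(const struct task *t, long now)
{
	if (now < t->deadline)
		return 0;
	return t->soft ? -ETIMEDOUT : 0;
}
```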

SUNRPC: Fix up RPC back channel transmission · 477687e1

Trond Myklebust authored Mar 05, 2019

Now that transmissions happen through a queue, we require the RPC tasks
to handle error conditions that may have been set while they were
sleeping. The back channel does not currently do this, but assumes
that any error condition happens during its own call to xprt_transmit().

The solution is to ensure that the back channel splits out the
error handling just like the forward channel does.

Fixes: 89f90fe1 ("SUNRPC: Allow calls to xprt_transmit() to drain...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Prevent thundering herd when the socket is not connected · ed7dc973

Trond Myklebust authored Mar 04, 2019

If the socket is not connected, then we want to initiate a reconnect
rather than trying to transmit requests. If there is a large number
of requests queued and waiting for the lock in call_transmit(),
then it can take a while for one of them to loop back and retake
the lock in call_connect.

Fixes: 89f90fe1 ("SUNRPC: Allow calls to xprt_transmit() to drain...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

02 Mar, 2019 17 commits

SUNRPC: Allow dynamic allocation of back channel slots · 0d1bf340

Trond Myklebust authored Mar 02, 2019

Now that the reads happen in a process context rather than a softirq,
it is safe to allocate back channel slots using a reclaiming
allocation.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFSv4.1: Bump the default callback session slot count to 16 · 067c4696

Trond Myklebust authored Mar 02, 2019

Users can still control this value explicitly using the
max_session_cb_slots module parameter, but let's bump the default
up to 16 for now.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Convert remaining GFP_NOIO, and GFP_NOWAIT sites in sunrpc · 12a3ad61

Trond Myklebust authored Mar 02, 2019

Convert the remaining gfp_flags arguments in sunrpc to standard reclaiming
allocations, now that we set memalloc_nofs_save() as appropriate.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS/flexfiles: Clean up mirror DS initialisation · cefa587a

Trond Myklebust authored Feb 28, 2019

Get rid of the redundant parameter and rename the function
ff_layout_mirror_valid() to ff_layout_init_mirror_ds() for clarity.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS/flexfiles: Remove dead code in ff_layout_mirror_valid() · 29a23909

Trond Myklebust authored Feb 28, 2019

nfs4_ff_alloc_deviceid_node() guarantees that if mirror->mirror_ds is
a valid pointer, then so is mirror->mirror_ds->ds.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS/flexfile: Simplify nfs4_ff_layout_select_ds_stateid() · 4cbc8a57

Trond Myklebust authored Feb 28, 2019

Pass in a pointer to the mirror rather than forcing another
array access.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS/flexfile: Simplify nfs4_ff_layout_ds_version() · 626d48b1

Trond Myklebust authored Feb 28, 2019

Pass in a pointer to the mirror rather than forcing another
array access.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS/flexfiles: Simplify ff_layout_get_ds_cred() · 312cd4cb

Trond Myklebust authored Feb 28, 2019

Pass in a pointer to the mirror rather than forcing another
array access.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS/flexfiles: Simplify nfs4_ff_find_or_create_ds_client() · 561d6f8a

Trond Myklebust authored Feb 28, 2019

Pass in a pointer to the mirror rather than forcing another
array access.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS/flexfiles: Simplify nfs4_ff_layout_select_ds_fh() · 749da527

Trond Myklebust authored Feb 28, 2019

Pass in a pointer to the mirror rather than having to retrieve it from
the array and then verify the resulting pointer.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS/flexfiles: Speed up read failover when DSes are down · 76c66905

Trond Myklebust authored Feb 14, 2019

If we notice that a DS may be down, we should attempt to read from the
other mirrors first before we go back to retry the dead DS.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
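
One way to picture the selection order is a two-pass search. The struct, the flag and the function are hypothetical; the flexfiles driver's own selection code is not shown on this page.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror descriptor: only tracks whether its DS looks down. */
struct mirror {
	bool ds_unavailable;
};

/* Pick the mirror to read from, starting the search at `start`.
 * Returns the chosen index, or -1 if there are no mirrors at all. */
int pick_read_mirror(const struct mirror *m, size_t n, size_t start)
{
	/* First pass: only consider mirrors not marked as possibly down. */
	for (size_t i = 0; i < n; i++) {
		size_t idx = (start + i) % n;

		if (!m[idx].ds_unavailable)
			return (int)idx;
	}
	/* Every mirror looked down: fall back to retrying from `start`. */
	return n ? (int)(start % n) : -1;
}
```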

NFS/flexfiles: Don't invalidate DS deviceids for being unresponsive · 17aaec81

Trond Myklebust authored Feb 26, 2019

If the DS is unresponsive, we want to just mark it as such, while
reporting the errors. If the server later returns the same deviceid
in a new layout, then we don't want to have to look it up again.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS/flexfiles: Remove bogus checks for invalid deviceids · d082d4b5

Trond Myklebust authored Feb 26, 2019

We already check the deviceids before we start the RPC call.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS/flexfiles: Avoid unnecessary layout invalidations · 0a156dd5

Trond Myklebust authored Feb 27, 2019

In ff_layout_mirror_valid() we may not want to invalidate the layout
segment despite the call to GETDEVICEINFO failing. The reason is that
a read may still be able to make progress on another mirror.

So instead we let the caller (in this case nfs4_ff_layout_prepare_ds())
decide whether or not it needs to invalidate.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS/flexfiles: refactor calls to nfs4_ff_layout_prepare_ds() · 2444ff27

Trond Myklebust authored Feb 14, 2019

While we may want to skip attempting to connect to a downed mirror
when we're deciding which mirror to select for a read, we do not
want to do so once we've committed to attempting the I/O in
ff_layout_read/write_pagelist(), or ff_layout_initiate_commit().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFSv4: Handle early exit in layoutget by returning an error · 18c0778a

Trond Myklebust authored Feb 13, 2019

If the LAYOUTGET rpc call exits early without an error, convert it to
EAGAIN.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS/flexfiles: Send LAYOUTERROR when failing over mirrored reads · f0922a6c

Trond Myklebust authored Feb 10, 2019

When a read to the preferred mirror returns an error, the flexfiles
driver records the error in the inode list and currently marks the
layout for return before failing over the attempted read to the next
mirror.
What we actually want to do is fire off a LAYOUTERROR to notify the
MDS that there is an issue with the preferred mirror, then we fail
over. Only once we've failed to read from all mirrors should we
return the layout.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
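
The intended ordering (report, fail over, and only return the layout when every mirror has failed) can be sketched as a simple loop. The struct and function are hypothetical and each mirror's read is reduced to a boolean outcome.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one mirrored read attempt. */
struct outcome {
	int layouterrors_sent;	/* LAYOUTERROR notifications to the MDS */
	bool layout_returned;	/* true only if every mirror failed */
	int served_by;		/* index of the mirror that served the read, or -1 */
};

/* Try each mirror in order; mirror_ok[i] says whether its read succeeds. */
struct outcome read_with_failover(const bool *mirror_ok, size_t n)
{
	struct outcome o = { 0, false, -1 };

	for (size_t i = 0; i < n; i++) {
		if (mirror_ok[i]) {
			o.served_by = (int)i;
			return o;
		}
		/* Report the failing mirror, then fail over to the next. */
		o.layouterrors_sent++;
	}
	/* All mirrors failed: only now is the layout returned. */
	o.layout_returned = true;
	return o;
}
```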

01 Mar, 2019 8 commits

NFSv4.2: Add client support for the generic 'layouterror' RPC call · 3eb86093

Trond Myklebust authored Feb 08, 2019

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated · a79f194a

Trond Myklebust authored Feb 27, 2019

If a layout segment gets invalidated while a pNFS I/O operation
is queued for transmission, then we ideally want to abort
immediately. This is particularly the case when there is a large
number of I/O related RPCs queued in the RPC layer, and the layout
segment gets invalidated due to an ENOSPC error, or an EACCES (because
the client was fenced). Otherwise, we may end up forced to spam the MDS with
a lot of unnecessary LAYOUTERRORs after that I/O fails.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFSv4/pnfs: Fix barriers in nfs4_mark_deviceid_unavailable() · 39a5201a

Trond Myklebust authored Feb 26, 2019

Fix the memory barriers in nfs4_mark_deviceid_unavailable() and
nfs4_test_deviceid_unavailable().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
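
The commit message does not say which barriers were wrong, so the following is only a userspace analogue of the usual pattern, written with C11 atomics instead of kernel primitives. It assumes the node stores a timestamp next to an "unavailable" flag and that the flag expires after a timeout; the struct, field and function names are illustrative.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device ID node: a timestamp published together with a flag. */
struct deviceid_node {
	long timestamp_unavailable;
	atomic_bool unavailable;
};

void deviceid_node_init(struct deviceid_node *n)
{
	n->timestamp_unavailable = 0;
	atomic_init(&n->unavailable, false);
}

/* Writer: store the timestamp first, then publish the flag with release
 * semantics so a reader that sees the flag also sees the timestamp. */
void mark_unavailable(struct deviceid_node *n, long now)
{
	n->timestamp_unavailable = now;
	atomic_store_explicit(&n->unavailable, true, memory_order_release);
}

/* Reader: load the flag with acquire semantics before reading the
 * timestamp. The flag is cleared once the timeout has elapsed. */
bool test_unavailable(struct deviceid_node *n, long now, long timeout)
{
	if (!atomic_load_explicit(&n->unavailable, memory_order_acquire))
		return false;
	if (now - n->timestamp_unavailable < timeout)
		return true;
	atomic_store_explicit(&n->unavailable, false, memory_order_relaxed);
	return false;
}
```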

NFS/flexfiles: Fix up sparse RCU annotations · 762bb7e9

Trond Myklebust authored Feb 26, 2019

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFSv4/flexfiles: Fix invalid deref in FF_LAYOUT_DEVID_NODE() · 108bb4af

Trond Myklebust authored Feb 26, 2019

If the attempt to instantiate the mirror's layout DS pointer failed,
then that pointer may hold a value of type ERR_PTR(), so we need
to check that before we dereference it.

Fixes: 65990d1a ("pNFS/flexfiles: Fix a deadlock on LAYOUTGET")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
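
A userspace sketch of the check, using local re-implementations of the error-pointer idiom (the kernel's own ERR_PTR() and IS_ERR() helpers are not available outside the kernel). The struct layout and the accessor function are hypothetical simplifications of the macro named in the title.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace analogue of encoding a negative errno in a pointer value. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True for NULL and for pointers in the top MAX_ERRNO bytes of the
 * address space, which is where encoded errors land. */
static inline int is_err_or_null(const void *p)
{
	return p == NULL || (uintptr_t)p >= (uintptr_t)(-MAX_ERRNO);
}

struct mirror_ds {
	int id_node;
};

struct mirror {
	struct mirror_ds *mirror_ds;	/* may hold an encoded error */
};

/* Only dereference mirror_ds after ruling out NULL and error values. */
int *devid_node(struct mirror *m)
{
	if (is_err_or_null(m->mirror_ds))
		return NULL;
	return &m->mirror_ds->id_node;
}
```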

NFS: Add missing encode / decode sequence_maxsz to v4.2 operations · 1a3466ae

Anna Schumaker authored Mar 01, 2019

These really should have been there from the beginning, but we never
noticed because there was enough slack in the RPC request for the extra
bytes. Chuck's recent patch to use au_cslack and au_rslack to compute
buffer size shrank the buffer enough that this became a problem for
SEEK operations on my test client.

Fixes: f4ac1674 ("nfs: Add ALLOCATE support")
Fixes: 2e72448b ("NFS: Add COPY nfs operation")
Fixes: cb95deea ("NFS OFFLOAD_CANCEL xdr")
Fixes: 624bd5b7 ("nfs: Add DEALLOCATE support")
Fixes: 1c6dcbe5 ("NFS: Implement SEEK")
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFSv4.1: Don't process the sequence op more than once. · c71c46f0

Trond Myklebust authored Mar 01, 2019

Ensure that if we call nfs41_sequence_process() a second time for the
same rpc_task, then we only process the results once.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
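
One possible way to get "process once" behaviour is a guard flag on the results structure, sketched below. The flag, the structs and the function are assumptions made for illustration; the commit may use a different mechanism.

```c
#include <stdbool.h>

/* Hypothetical sequence results and session state. */
struct sequence_res {
	bool processed;
};

struct session {
	int slot_updates;	/* how many times results were applied */
};

/* Apply the sequence results to the session, but only the first time
 * this is called for a given results structure. */
void sequence_process(struct session *s, struct sequence_res *res)
{
	if (res->processed)
		return;
	res->processed = true;
	s->slot_updates++;
}
```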

NFSv4.1: Reinitialise sequence results before retransmitting a request · c1dffe0b

Trond Myklebust authored Mar 01, 2019

If we have to retransmit a request, we should ensure that we reinitialise
the sequence results structure, since in the event of a signal
we need to treat the request as if it had not been sent.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org

26 Feb, 2019 1 commit

SUNRPC: Fix an Oops in udp_poll() · a73881c9

Trond Myklebust authored Feb 26, 2019

udp_poll() checks the struct file for the O_NONBLOCK flag, so we must not
call it with a NULL file pointer.

Fixes: 0ffe86f4 ("SUNRPC: Use poll() to fix up the socket requeue races")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

25 Feb, 2019 1 commit

Merge tag 'nfs-rdma-for-5.1-1' of git://git.linux-nfs.org/projects/anna/linux-nfs · 06b5fc3a

Trond Myklebust authored Feb 25, 2019

NFSoRDMA client updates for 5.1

New features:
- Convert rpc auth layer to use xdr_streams
- Config option to disable insecure enctypes
- Reduce size of RPC receive buffers

Bugfixes and cleanups:
- Fix sparse warnings
- Check inline size before providing a write chunk
- Reduce the receive doorbell rate
- Various tracepoint improvements

[Trond: Fix up merge conflicts]
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
