Commits · b70ae915e4282854fb7864519e5ec559ab2de7c3 · Kirill Smelkov / linux

09 Feb, 2015 12 commits

SUNRPC: Handle connection reset more efficiently. · b70ae915

Trond Myklebust authored Feb 09, 2015

If the connection reset is due to an active call on our side, then
the state change is sometimes not reported. Catch those instances
using xs_error_report() instead.
Also remove the xs_tcp_shutdown() call in xs_tcp_send_request() as
the change in behaviour makes it redundant.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

b70ae915

SUNRPC: Remove the redundant XPRT_CONNECTION_CLOSE flag · 9e2b9f37

Trond Myklebust authored Feb 08, 2015

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

9e2b9f37

SUNRPC: Make xs_tcp_close() do a socket shutdown rather than a sock_release · caf4ccd4

Trond Myklebust authored Feb 09, 2015

Use of socket shutdown() means that we monitor the shutdown process
through the xs_tcp_state_change() callback, so it is preferable to
a full close in all cases unless we're destroying the transport.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

caf4ccd4

SUNRPC: Ensure xs_tcp_shutdown() requests a full close of the connection · 0efeac26

Trond Myklebust authored Feb 09, 2015

The previous behaviour left the connection half-open in order to try
to scrape the last replies from the socket. Now that we have more reliable
reconnection, change the behaviour to close down the socket faster.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

0efeac26
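As a rough userspace analogy for the two commits above (the kernel code works on the transport's socket directly, not through these calls), the sketch below performs a full shutdown of a stream socket while keeping the descriptor open, so the owner can still observe the shutdown completing. The helper names `full_shutdown` and `peer_sees_eof` are hypothetical and exist only for illustration.

```c
#include <sys/socket.h>
#include <unistd.h>

/*
 * Shut down both directions of the connection. Unlike close(), the
 * descriptor stays valid afterwards, so the caller can keep watching
 * the socket until teardown has finished.
 */
int full_shutdown(int fd)
{
    return shutdown(fd, SHUT_RDWR);
}

/* Return 1 if the peer end reads end-of-file, 0 otherwise. */
int peer_sees_eof(int peer_fd)
{
    char c;

    return read(peer_fd, &c, 1) == 0;
}
```

Passing `SHUT_WR` instead of `SHUT_RDWR` would give the half-open behaviour that the commit message describes as the previous approach: the sender stops writing but can still read late replies.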

SUNRPC: Cleanup to remove remaining uses of XPRT_CONNECTION_ABORT · 505936f5

Trond Myklebust authored Feb 08, 2015

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

505936f5

SUNRPC: Remove TCP socket linger code · 9cbc94fb

Trond Myklebust authored Feb 08, 2015

Now that we no longer use the partial shutdown code when closing the
socket, we no longer need to worry about the TCP linger2 state.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

9cbc94fb

SUNRPC: Remove TCP client connection reset hack · 4efdd92c

Trond Myklebust authored Feb 08, 2015

Instead we rely on SO_REUSEPORT to provide the reconnection semantics
that we need for NFSv2/v3.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

4efdd92c

SUNRPC: TCP/UDP always close the old socket before reconnecting · de84d890

Trond Myklebust authored Feb 08, 2015

It is not safe to call xs_reset_transport() from inside xs_udp_setup_socket()
or xs_tcp_setup_socket(), since they do not own the correct locks. Instead,
do it in xs_connect().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

de84d890

SUNRPC: Add helpers to prevent socket create from racing · 718ba5b8

Trond Myklebust authored Feb 08, 2015

The socket lock is currently held by the task that is requesting the
connection be established. While that is efficient in the case where
the connection happens quickly, it is racy in the case where it doesn't.
What we really want is for the connect helper to be able to block access
to the socket while it is being set up.

This patch does so by arranging to transfer the socket lock from the
task that is requesting the connect attempt, and then releasing that
lock once everything is done.
This scheme also gives us automatic protection against collisions with
the RPC close code, so we can kill the cancel_delayed_work_sync()
call in xs_close().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

718ba5b8

SUNRPC: Ensure xs_reset_transport() resets the close connection flags · 6cc7e908

Trond Myklebust authored Feb 08, 2015

Otherwise, we may end up looping.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

6cc7e908

SUNRPC: Do not clear the source port in xs_reset_transport · 76698b23

Trond Myklebust authored Feb 08, 2015

Now that we can reuse bound ports after a close, we never really want to
clear the transport's source port after it has been set. Doing so really
messes up the NFSv3 DRC on the server.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

76698b23

SUNRPC: Handle EADDRINUSE on connect · 3913c78c

Trond Myklebust authored Feb 08, 2015

Now that we're setting SO_REUSEPORT, we still need to handle the
case where a connect() is attempted, but the old socket is still
lingering.
Essentially, all we want to do here is handle the error by waiting
a few seconds and then retrying.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

3913c78c
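The "wait a few seconds and then retry" idea can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the kernel implementation: `connect_with_retry`, the `connect_fn` callback type and the `flaky_connect` stub are all hypothetical names, and the callback stands in for whatever actually attempts the connection.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical callback: returns 0 on success, -1 with errno set on failure. */
typedef int (*connect_fn)(void *ctx);

/*
 * Call try_connect() up to max_tries times, sleeping delay_sec seconds
 * between attempts, but only while the failure is EADDRINUSE. Any other
 * error is returned to the caller immediately.
 */
int connect_with_retry(connect_fn try_connect, void *ctx,
                       int max_tries, unsigned int delay_sec)
{
    int i;

    if (max_tries <= 0) {
        errno = EINVAL;
        return -1;
    }
    for (i = 0; i < max_tries; i++) {
        if (try_connect(ctx) == 0)
            return 0;
        if (errno != EADDRINUSE)
            return -1;
        if (i + 1 < max_tries)
            sleep(delay_sec);
    }
    return -1; /* errno is still EADDRINUSE here */
}

/*
 * Demonstration stub only: *ctx holds the number of EADDRINUSE failures
 * to simulate before reporting success.
 */
int flaky_connect(void *ctx)
{
    int *failures_left = ctx;

    if (*failures_left > 0) {
        (*failures_left)--;
        errno = EADDRINUSE;
        return -1;
    }
    return 0;
}
```

A real caller would pass a callback that wraps `connect()` on its own socket; the stub merely makes the retry path easy to exercise without a network.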

08 Feb, 2015 2 commits

SUNRPC: Set SO_REUSEPORT socket option for TCP connections · 4dda9c8a

Trond Myklebust authored Feb 08, 2015

When using TCP, we need the ability to reuse port numbers after
a disconnection, so that the NFSv3 server knows that we're the same
client. Currently we use a hack to work around the TCP socket's
TIME_WAIT: we send an RST instead of closing, which doesn't
always work...
The SO_REUSEPORT option added in Linux 3.9 allows us to bind multiple
TCP connections to the same source address+port combination, and thus
to use ordinary TCP close() instead of the current hack.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

4dda9c8a

Merge tag 'nfs-rdma-for-3.20-part-2' of git://git.linux-nfs.org/projects/anna/nfs-rdma · bc3203cd

Trond Myklebust authored Feb 08, 2015

NFS: RDMA Client Sparse Fixes

This patch fixes a sparse warning in the initial submission.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

* tag 'nfs-rdma-for-3.20-part-2' of git://git.linux-nfs.org/projects/anna/nfs-rdma:
  xprtrdma: Address sparse complaint in rpcr_to_rdmar()

bc3203cd

06 Feb, 2015 5 commits

NFSv4.1: Fix pnfs_put_lseg races · 4ef2e4f8

Trond Myklebust authored Feb 05, 2015

pnfs_layoutreturn_free_lseg_async() can also race with inode put in
the general case. We can now fix this, and also simplify the code.

Cc: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

4ef2e4f8

NFSv4.1: pnfs_send_layoutreturn should use GFP_NOFS · e4af440a

Trond Myklebust authored Feb 05, 2015

If we want to be able to call pnfs_send_layoutreturn() from within the
writeback path, we really want it to use GFP_NOFS in order to prevent
recursion.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

e4af440a

NFSv4.1: Pin the inode and super block in asynchronous layoutreturns · 5a0ec8ac

Trond Myklebust authored Feb 05, 2015

If we're sending an asynchronous layoutreturn, then we need to ensure
that the inode and the super block remain pinned.

Cc: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Peng Tao <tao.peng@primarydata.com>

5a0ec8ac

NFSv4.1: Pin the inode and super block in asynchronous layoutcommit · 472e2594

Trond Myklebust authored Feb 05, 2015

If we're sending an asynchronous layoutcommit, then we need to ensure
that the inode and the super block remain pinned.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Peng Tao <tao.peng@primarydata.com>

472e2594

NFSv4: Ensure we reference the inode for return-on-close in delegreturn · ea7c38fe

Trond Myklebust authored Feb 05, 2015

If we have to do a return-on-close in the delegreturn code, then
we must ensure that the inode and super block remain referenced.

Cc: Peng Tao <tao.peng@primarydata.com>
Cc: stable@vger.kernel.org # 3.17.x
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Peng Tao <tao.peng@primarydata.com>

ea7c38fe

05 Feb, 2015 1 commit

xprtrdma: Address sparse complaint in rpcr_to_rdmar() · b625a616

Chuck Lever authored Feb 04, 2015

With "make ARCH=x86_64 allmodconfig make C=1 CF=-D__CHECK_ENDIAN__":

linux-2.6/net/sunrpc/xprtrdma/xprt_rdma.h:273:30: warning: incorrect
  type in initializer (different base types)
linux-2.6/net/sunrpc/xprtrdma/xprt_rdma.h:273:30: expected restricted
  __be32 [usertype] *buffer
linux-2.6/net/sunrpc/xprtrdma/xprt_rdma.h:273:30:    got unsigned int
  [usertype] *rq_buffer

As far as I can tell this is a false positive.

Reported-by: kbuild-all@01.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

b625a616

04 Feb, 2015 2 commits

NFSv4.1: Ask for no delegation on OPEN if using O_DIRECT · 6ae37339

Trond Myklebust authored Jan 30, 2015

If we're using NFSv4.1, then we have the ability to let the server know
whether or not we believe that returning a delegation as part of our OPEN
request would be useful.
The feature needs to be used with care, since the client sending the request
doesn't necessarily know how other clients are using that file, and how
they may be affected by the delegation.
For this reason, our initial use of the feature will be to let the server
know when the client believes that handing out a delegation would not be
useful.
The first application for this function is when opening the file using
O_DIRECT.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

6ae37339

NFS: Add Anna Schumaker as co-maintainer for the NFS client · 0e3b137f

Trond Myklebust authored Feb 03, 2015

Anna has essentially been performing the duties of co-maintainer for
the past several years. In recognition of those efforts, I'd like to
add her to the maintainers file.

Cc: Anna Schumaker <anna.schumaker@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

0e3b137f

03 Feb, 2015 18 commits

SUNRPC: NULL utsname dereference on NFS umount during namespace cleanup · 03a9a42a

Trond Myklebust authored Jan 30, 2015

Fix an Oopsable condition when nsm_mon_unmon is called as part of the
namespace cleanup, which now apparently happens after the utsname
has been freed.

Link: http://lkml.kernel.org/r/20150125220604.090121ae@neptune.home
Reported-by: Bruno Prémont <bonbons@linux-vserver.org>
Cc: stable@vger.kernel.org # 3.18
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

03a9a42a

Merge branch 'flexfiles' · e2c63e09

Trond Myklebust authored Feb 03, 2015

* flexfiles: (53 commits)
  pnfs: lookup new lseg at lseg boundary
  nfs41: .init_read and .init_write can be called with valid pg_lseg
  pnfs: Update documentation on the Layout Drivers
  pnfs/flexfiles: Add the FlexFile Layout Driver
  nfs: count DIO good bytes correctly with mirroring
  nfs41: wait for LAYOUTRETURN before retrying LAYOUTGET
  nfs: add a helper to set NFS_ODIRECT_RESCHED_WRITES to direct writes
  nfs41: add NFS_LAYOUT_RETRY_LAYOUTGET to layout header flags
  nfs/flexfiles: send layoutreturn before freeing lseg
  nfs41: introduce NFS_LAYOUT_RETURN_BEFORE_CLOSE
  nfs41: allow async version layoutreturn
  nfs41: add range to layoutreturn args
  pnfs: allow LD to ask to resend read through pnfs
  nfs: add nfs_pgio_current_mirror helper
  nfs: only reset desc->pg_mirror_idx when mirroring is supported
  nfs41: add a debug warning if we destroy an unempty layout
  pnfs: fail comparison when bucket verifier not set
  nfs: mirroring support for direct io
  nfs: add mirroring support to pgio layer
  pnfs: pass ds_commit_idx through the commit path
  ...

Conflicts:
	fs/nfs/pnfs.c
	fs/nfs/pnfs.h

e2c63e09

pnfs: lookup new lseg at lseg boundary · 7c13789e

Weston Andros Adamson authored Jan 30, 2015

Before mirroring support was added, the pageio descriptor's pg_lseg was
set to null when an RPC was sent. Because of this, pg_init was called
at lseg boundaries with pg_lseg = NULL, and it could be set to the new
lseg.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>

7c13789e

nfs41: .init_read and .init_write can be called with valid pg_lseg · cb5d04bc

Peng Tao authored Jan 24, 2015

With pgio refactoring in v3.15, .init_read and .init_write can be
called with valid pgio->pg_lseg. file layout was fixed at that time
by commit c6194271 (pnfs: filelayout: support non page aligned
layouts). But the generic helper still needs to be fixed.

Cc: stable@vger.kernel.org # 3.15+
Signed-off-by: Peng Tao <tao.peng@primarydata.com>

cb5d04bc

pnfs: Update documentation on the Layout Drivers · 8f9cdcb2

Tom Haynes authored Jan 12, 2015

Signed-off-by: Tom Haynes <loghyr@primarydata.com>

8f9cdcb2

pnfs/flexfiles: Add the FlexFile Layout Driver · d67ae825

Tom Haynes authored Dec 11, 2014

The flexfile layout is a new layout that extends the
file layout. It is currently being drafted as a specification at
https://datatracker.ietf.org/doc/draft-ietf-nfsv4-layout-types/
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Signed-off-by: Tao Peng <bergwolf@primarydata.com>

d67ae825

nfs: count DIO good bytes correctly with mirroring · 5fadeb47

Peng Tao authored Jan 19, 2015

When resending to MDS, we might resend multiple mirroring
requests to MDS. As a result, nfs_direct_good_bytes() ends
up counting bytes multiple times, causing the application to
get wrong return values from read/write syscalls.

Fix it by tracking start of a dreq and checking the range of
pgio header.

Cc: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>

5fadeb47

nfs41: wait for LAYOUTRETURN before retrying LAYOUTGET · aa8a45ee

Peng Tao authored Dec 01, 2014

Also take care to stop waiting if someone clears the retry bit.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>

aa8a45ee

nfs: add a helper to set NFS_ODIRECT_RESCHED_WRITES to direct writes · 012fa16d

Peng Tao authored Dec 01, 2014

To allow pnfs LD to ask direct writes to be resent.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>

012fa16d

nfs41: add NFS_LAYOUT_RETRY_LAYOUTGET to layout header flags · c829013d

Peng Tao authored Dec 01, 2014

Use it to indicate that LD wants to retry layoutget. LD can set
it whenever it wants the common pnfs code to return and retry
pnfs path through a new layout.

The bit gets cleared when client does a new layoutget, when client
closes the file (ROC case), or when kernel needs to evict the inode
(non-ROC case).
Signed-off-by: Peng Tao <tao.peng@primarydata.com>

c829013d

nfs/flexfiles: send layoutreturn before freeing lseg · 27b6f539

Peng Tao authored Oct 20, 2014

Otherwise we'll lose error tracking information when
encoding layoutreturn.

pnfs_put_lseg may be called from rpc callbacks. So we should not
call pnfs_send_layoutreturn directly because it can deadlock in
the rpc layer.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>

27b6f539

nfs41: introduce NFS_LAYOUT_RETURN_BEFORE_CLOSE · 193e3aa2

Peng Tao authored Nov 17, 2014

When it is set, generic pnfs will try to send layoutreturn right
before the last close/delegation_return, regardless of whether
NFS_LAYOUT_ROC is set. LD can then make sure layoutreturn is always
sent rather than being omitted.

The difference against NFS_LAYOUT_RETURN is that
NFS_LAYOUT_RETURN_BEFORE_CLOSE does not block usage of the layout so
LD can set it and expect generic layer to try pnfs path at the
same time.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>

193e3aa2

nfs41: allow async version layoutreturn · 6c16605d

Peng Tao authored Nov 17, 2014

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>

6c16605d

nfs41: add range to layoutreturn args · 15eb67c1

Peng Tao authored Nov 17, 2014

So that callers can specify which range to return.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>

15eb67c1

pnfs: allow LD to ask to resend read through pnfs · ceb11e13

Peng Tao authored Nov 10, 2014

If current IO cannot be completed due to some transient errors,
LD may want to ask generic layer to resend the request through
pnfs again.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>

ceb11e13

nfs: add nfs_pgio_current_mirror helper · 48d635f1

Peng Tao authored Nov 10, 2014

Let it return current nfs_pgio_mirror in use depending on pg_mirror_count.
For read, we always use pg_mirrors[0], so this effectively gives us freedom
to use pg_mirror_idx to track the actual mirror to read from throughout the
IO stack.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>

48d635f1

nfs: only reset desc->pg_mirror_idx when mirroring is supported · 47af81f2

Peng Tao authored Nov 10, 2014

so that we don't reset desc->pg_mirror_idx for read unnecessarily.
Remove WARN_ON_ONCE from __nfs_pageio_add_request to allow LD to
set pg_mirror_idx for read where pg_mirror_count is always 1.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Tom Haynes <loghyr@primarydata.com>

47af81f2

nfs41: add a debug warning if we destroy an unempty layout · 566f8737

Peng Tao authored Oct 10, 2014

So that we can detect the case where some layout segments are still
pinned, which is surely a bug that we need to fix.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>

566f8737