Commits · da2e8127510525eb4bce0fe34aff06192e042c8f · nexedi / linux

27 Jun, 2015 1 commit

NFSv4.2: Fix up a decoding error in layoutstats · da2e8127

Trond Myklebust authored Jun 27, 2015

According to the spec, the server is only returning the status,
which we decode in the op header.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

da2e8127

26 Jun, 2015 5 commits

pNFS/flexfiles: Fix the reset of struct pgio_header when resending · d6208769

Trond Myklebust authored Jun 26, 2015

hdr->good_bytes needs to be set to the length of the request, not
zero.

Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

d6208769

pNFS/flexfiles: Turn off layoutcommit for servers that don't need it · c0f5f505

Trond Myklebust authored Jun 26, 2015

This patch ensures that we record the value of 'ffl_flags' from
the layout, and then checks for the presence of the
FF_FLAGS_NO_LAYOUTCOMMIT flag before deciding whether or not to
call pnfs_set_layoutcommit().

The effect is that servers now can decide whether or not they want
the client to call layoutcommit before returning a writeable layout.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

c0f5f505

Merge branch 'layoutstats' · 122d328d

Trond Myklebust authored Jun 26, 2015

* layoutstats:
  pnfs/flexfiles: protect ktime manipulation with mirror lock
  nfs: provide pnfs_report_layoutstat when NFS42 is disabled
  pnfs/flexfiles: report layoutstat regularly
  nfs42: serialize LAYOUTSTATS calls of the same file
  pnfs/flexfiles: encode LAYOUTSTATS flexfiles specific data
  pnfs/flexfiles: add ff_layout_prepare_layoutstats
  pNFS/flexfiles: track when layout is first used
  pNFS/flexfiles: add layoutstats tracking
  pNFS/flexfiles: Remove unused struct members user_name, group_name
  pnfs: add pnfs_report_layoutstat helper function
  pNFS: fill in nfs42_layoutstat_ops
  NFSv.2/pnfs Add a LAYOUTSTATS rpc function

122d328d

pnfs/flexfiles: protect ktime manipulation with mirror lock · 9bbd9bb4

Peng Tao authored Jun 26, 2015

It looks as if xchg() and cmpxchg() are not available for 64-bit integers on sparc32:

> New breakage seen in linux-next today:
>
> ERROR: "__xchg_called_with_bad_pointer" [fs/nfs/flexfilelayout/nfs_layout_flexfiles.ko] undefined!
> ERROR: "__cmpxchg_called_with_bad_pointer" [fs/nfs/flexfilelayout/nfs_layout_flexfiles.ko] undefined!
> make[2]: *** [__modpost] Error 1
> make[1]: *** [modules] Error 2

Given that mirror ktime manipulation is already under mirror->lock, let's make use of the fact.
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

9bbd9bb4
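
As a rough userspace illustration of the approach described in this commit, the sketch below swaps a 64-bit timestamp under a lock instead of relying on a 64-bit xchg(). The struct, field, and function names are hypothetical, and a pthread mutex stands in for mirror->lock. In the kernel the caller is said to hold the lock already, so only the plain load and store in the middle would remain.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-mirror state; not the kernel struct. */
struct mirror_stats {
    pthread_mutex_t lock;      /* plays the role of mirror->lock */
    int64_t start_time_ns;     /* 64-bit value that xchg() cannot handle on some 32-bit targets */
};

/*
 * Store a new timestamp and return the previous one.
 * The lock makes the read-modify-write atomic with respect to other
 * users of the same lock, so no 64-bit atomic instruction is needed.
 */
static int64_t stats_swap_start_time(struct mirror_stats *s, int64_t now_ns)
{
    int64_t old;

    pthread_mutex_lock(&s->lock);
    old = s->start_time_ns;
    s->start_time_ns = now_ns;
    pthread_mutex_unlock(&s->lock);

    return old;
}
```

A caller would initialise the lock with PTHREAD_MUTEX_INITIALIZER and call stats_swap_start_time(&stats, now) wherever the old code would have used xchg().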

nfs: provide pnfs_report_layoutstat when NFS42 is disabled · 865a7ecb

Peng Tao authored Jun 25, 2015

kbuild test robot reported:
   fs/built-in.o: In function `pnfs_report_layoutstat':
>> (.text+0x151a1c): undefined reference to `nfs42_proc_layoutstats_generic'
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

865a7ecb

25 Jun, 2015 3 commits

nfs: verify open flags before allowing open · 18a60089

Benjamin Coddington authored Jun 25, 2015

Commit 9597c13b forbade opens with O_APPEND|O_DIRECT for NFSv4:

nfs: verify open flags before allowing an atomic open

Currently, you can open a NFSv4 file with O_APPEND|O_DIRECT, but cannot
fcntl(F_SETFL,...) with those flags. This flag combination is explicitly
forbidden on NFSv3 opens, and it seems like it should also be on NFSv4.

However, you can still open a file with O_DIRECT|O_APPEND if there exists a
cached dentry for the file because nfs4_file_open() is used instead of
nfs_atomic_open() and the check is bypassed. Add the check in
nfs4_file_open() as well.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

18a60089
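
The flag check described in this commit can be sketched in userspace C as follows. This is a minimal illustration, not the kernel's code: check_open_flags() is a hypothetical helper, and the only rule it encodes is the one stated above, that O_APPEND and O_DIRECT must not be combined.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes O_DIRECT on Linux/glibc */
#endif
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical helper: return -EINVAL when both O_APPEND and O_DIRECT
 * are requested, 0 otherwise. Each flag on its own is accepted.
 */
static int check_open_flags(int flags)
{
    const int forbidden = O_APPEND | O_DIRECT;

    if ((flags & forbidden) == forbidden)
        return -EINVAL;
    return 0;
}
```

The point of the fix is that such a check has to run on every open path; a check that lives in only one of two entry points is bypassed whenever the other one is taken.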

nfs: always update creds in mirror, even when we have an already connected ds · 0c8315dd

Jeff Layton authored Jun 24, 2015

A ds can be associated with more than one mirror, but we currently skip
setting a mirror's credentials if we find that it's already set up with
a connected client.

The upshot is that we can end up sending DS writes with MDS credentials
instead of properly setting them up. Fix nfs4_ff_layout_prepare_ds to
always verify that the mirror's credentials are set up, even when we
have a DS that's already connected.
Reported-by: Tom Haynes <thomas.haynes@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

0c8315dd

nfs: fix potential credential leak in ff_layout_update_mirror_cred · a24221dc

Jeff Layton authored Jun 24, 2015

If we have two tasks racing to update a mirror's credentials, then they
can end up leaking one (or more) sets of credentials. The first task
will set mirror->cred and then the second task will just overwrite it.

Use a cmpxchg to ensure that the creds are only set once. If we get to
the point where we would set mirror->cred and find that they're already
set, then we just release the creds that were just found.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

a24221dc
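
The "set once with cmpxchg" idea from this commit can be sketched with C11 atomics. All names here are hypothetical: struct cred is a dummy type, and setting the released field stands in for dropping a reference to the credential that lost the race.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the credential and mirror objects. */
struct cred {
    int released;                  /* set when the losing task drops its copy */
};

struct mirror {
    _Atomic(struct cred *) cred;   /* NULL until the first task installs one */
};

/*
 * Install newc only if no credential is set yet.
 * Returns whichever credential ends up installed in the mirror.
 */
static struct cred *mirror_set_cred_once(struct mirror *m, struct cred *newc)
{
    struct cred *expected = NULL;

    /* Succeeds only when m->cred is still NULL. */
    if (atomic_compare_exchange_strong(&m->cred, &expected, newc))
        return newc;

    /* Lost the race: release our copy instead of overwriting the winner's. */
    newc->released = 1;
    return expected;               /* updated by the failed exchange to the current value */
}
```

With a plain assignment in place of the compare-and-exchange, the second task would overwrite the first credential without releasing it, which is the leak described above.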

24 Jun, 2015 10 commits

pnfs/flexfiles: report layoutstat regularly · 97ba375b

Peng Tao authored Jun 23, 2015

As a simple scheme, report every minute if IO is still going on.
Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

97ba375b

nfs42: serialize LAYOUTSTATS calls of the same file · 1bfe3b25

Peng Tao authored Jun 23, 2015

There is no need to report concurrently.
Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

1bfe3b25

pnfs/flexfiles: encode LAYOUTSTATS flexfiles specific data · 27c43064

Peng Tao authored Jun 23, 2015

Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

27c43064

pnfs/flexfiles: add ff_layout_prepare_layoutstats · ad4dc53e

Peng Tao authored Jun 23, 2015

This fills in the generic part of the LAYOUTSTATS call. One thing to note
is that we don't really track whether IO is continuous or not, so we just
fake it by using the completed bytes.

The flexfiles-specific part is still missing; it will be included in the next patch.
Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

ad4dc53e

pNFS/flexfiles: track when layout is first used · d983803d

Peng Tao authored Jun 23, 2015

So that we can report cumulative time since the beginning
of statistics collection of the layout.
Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

d983803d

pNFS/flexfiles: add layoutstats tracking · abcb7bfc

Trond Myklebust authored Jun 23, 2015

Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

abcb7bfc

pNFS/flexfiles: Remove unused struct members user_name, group_name · 27797d1b

Trond Myklebust authored Jun 23, 2015

Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

27797d1b

pnfs: add pnfs_report_layoutstat helper function · 8733408d

Peng Tao authored Jun 23, 2015

Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

8733408d

pNFS: fill in nfs42_layoutstat_ops · 1b4a4bd8

Peng Tao authored Jun 23, 2015

Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

1b4a4bd8

NFSv.2/pnfs Add a LAYOUTSTATS rpc function · be3a5d23

Trond Myklebust authored Jun 23, 2015

Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

be3a5d23

22 Jun, 2015 1 commit

Merge branch 'bugfixes' · 1372a313

Trond Myklebust authored Jun 22, 2015

* bugfixes:
  NFS: Ensure we set NFS_CONTEXT_RESEND_WRITES when requeuing writes
  pNFS: Fix a memory leak when attempted pnfs fails
  NFS: Ensure that we update the sequence id under the slot table lock
  nfs: Initialize cb_sequenceres information before validate_seqid()
  nfs: Only update callback sequnce id when CB_SEQUENCE success
  NFSv4: nfs4_handle_delegation_recall_error should ignore EAGAIN

1372a313

20 Jun, 2015 1 commit

SUNRPC: Set the TCP user timeout option on client sockets · 775f06ab

Trond Myklebust authored Jun 20, 2015

Use the TCP_USER_TIMEOUT socket option to advertise to the server
how long we will keep the connection open if there is unacknowledged
data. See RFC5482.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

775f06ab
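
For readers unfamiliar with the option, this is roughly how TCP_USER_TIMEOUT is set from a userspace program on Linux. It is a sketch only: set_tcp_user_timeout() is a hypothetical helper, the kernel RPC client sets the option through in-kernel interfaces rather than setsockopt(2), and the timeout value is up to the caller.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>   /* TCP_USER_TIMEOUT (Linux-specific) */
#include <sys/socket.h>

/*
 * Hypothetical helper: limit how long transmitted data may remain
 * unacknowledged before the connection is dropped.
 * timeout_ms is in milliseconds. Returns 0 on success, -1 with errno set.
 */
static int set_tcp_user_timeout(int fd, unsigned int timeout_ms)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                      &timeout_ms, sizeof(timeout_ms));
}
```

The value can be read back with getsockopt() using the same level and option name.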

19 Jun, 2015 2 commits

SUNRPC: Ensure we release the TCP socket once it has been closed · 4876cc77

Trond Myklebust authored Jun 19, 2015

This fixes a regression introduced by commit caf4ccd4 ("SUNRPC:
Make xs_tcp_close() do a socket shutdown rather than a sock_release").
Prior to that commit, the autoclose feature would ensure that an
idle connection would result in the socket being both disconnected and
released, whereas now it only gets disconnected.

While the current behaviour is harmless, it does leave the port bound
until either RPC traffic resumes or the RPC client is shut down.
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

4876cc77

SUNRPC: Handle connection issues correctly on the back channel · 3832591e

Trond Myklebust authored Jun 19, 2015

If the back channel is disconnected, we can and should just fail the
transmission. The expectation is that the NFSv4.1 server will always
retransmit any outstanding callbacks once the connection is
re-established.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

3832591e

18 Jun, 2015 4 commits

nfs: Fix comment for nfs_pageio_init() and nfs_pageio_complete_mirror() · dfad7000

Yijing Wang authored Jun 18, 2015

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

dfad7000

sunrpc: use sg_init_one() in krb5_rc4_setup_enc/seq_key() · d1381929

Fabian Frederick authored Jun 16, 2015

Don't opencode sg_init_one()
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

d1381929

NFS: Ensure we set NFS_CONTEXT_RESEND_WRITES when requeuing writes · c7070113

Trond Myklebust authored Jun 17, 2015

If a write attempt fails, and the write is queued up for resending to
the server, as opposed to being dropped, then we need to set the
appropriate flag so that nfs_file_fsync() does the right thing.

Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

c7070113

pNFS: Fix a memory leak when attempted pnfs fails · 1ca018d2

Trond Myklebust authored Jun 17, 2015

pnfs_do_write() expects the call to pnfs_write_through_mds() to free the
pgio header and to release the layout segment before exiting. The problem
is that nfs_pgio_data_destroy() doesn't actually do this; it only frees
the memory allocated by nfs_generic_pgio().

Ditto for pnfs_do_read()...

The fix in both cases is to add a call to hdr->release(hdr).

Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

1ca018d2

16 Jun, 2015 11 commits

Merge tag 'nfs-rdma-for-4.2' of git://git.linux-nfs.org/projects/anna/nfs-rdma · 3438995b

Trond Myklebust authored Jun 16, 2015

NFS: NFSoRDMA Client Changes

These patches continue the work toward improving the rsize and wsize that the
NFS client uses when talking over RDMA.  In addition, they add scalability
enhancements and other bugfixes.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

* tag 'nfs-rdma-for-4.2' of git://git.linux-nfs.org/projects/anna/nfs-rdma: (142 commits)
  xprtrdma: Reduce per-transport MR allocation
  xprtrdma: Stack relief in fmr_op_map()
  xprtrdma: Split rb_lock
  xprtrdma: Remove rpcrdma_ia::ri_memreg_strategy
  xprtrdma: Remove ->ro_reset
  xprtrdma: Remove unused LOCAL_INV recovery logic
  xprtrdma: Acquire MRs in rpcrdma_register_external()
  xprtrdma: Introduce an FRMR recovery workqueue
  xprtrdma: Acquire FMRs in rpcrdma_fmr_register_external()
  xprtrdma: Introduce helpers for allocating MWs
  xprtrdma: Use ib_device pointer safely
  xprtrdma: Remove rr_func
  xprtrdma: Replace rpcrdma_rep::rr_buffer with rr_rxprt
  xprtrdma: Warn when there are orphaned IB objects
  ...

3438995b

NFSv4: Fix stateid recovery on revoked delegations · 5ba12443

Trond Myklebust authored Jun 16, 2015

Ensure that we fix the non-NULL stateid case as well.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

5ba12443

Recover from stateid-type error on SETATTR · ae2ffef3

Olga Kornievskaia authored Jun 12, 2015

The client can receive a stateid-type error (e.g., BAD_STATEID) on SETATTR when
a delegation stateid was used. When no open state exists, as when an application
calls truncate() on the file, the client has no state to recover and fails with
EIO.

Instead, upon such error, return the bad delegation and then resend the
SETATTR with a zero stateid.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

ae2ffef3

nfs: Fix showing truncated fsid/dev in, /proc/net/nfsfs/volumes · df05a49f

Kinglong Mee authored Jun 13, 2015

A truncated fsid is shown in /proc/fs/nfsfs/volumes:
NV SERVER   PORT DEV     FSID              FSC
v4 c0a80881  801 0:43    34931f044c2a439b  no

It should be:
NV SERVER   PORT DEV          FSID                              FSC
v4 c0a80881  801 0:43         34931f044c2a439b:954c5d830fa4be8c no

The max buffer length for storing "%llx:%llx" format should be
 16 + 1 + 16 + 1 = 34 (16 for %llx, 1 for ':', 1 for '\0').

Also, the buffer for storing the "%u:%u" format of MAJOR() and MINOR() should be
 8 + 1 + 3 + 1 = 13 (8 for 2^24, 1 for ':', 3 for 2^8, 1 for '\0').

v2, add comments for dev/fsid buffer and use sizeof in snprintf.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

df05a49f
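
The buffer arithmetic in this commit can be checked with a small userspace sketch. FSID_BUF_LEN and format_fsid() are hypothetical names; the size follows the 16 + 1 + 16 + 1 calculation above, and the caller passes sizeof() of its buffer as the commit recommends.

```c
#include <stdio.h>

/* 16 hex digits for each %llx, 1 for ':', 1 for the terminating '\0'. */
#define FSID_BUF_LEN (16 + 1 + 16 + 1)

/*
 * Hypothetical helper: format a two-part fsid as "major:minor" in hex.
 * Returns what snprintf() returns, i.e. the length the full string
 * needs, even if the output had to be truncated to fit in len bytes.
 */
static int format_fsid(char *buf, size_t len,
                       unsigned long long major, unsigned long long minor)
{
    return snprintf(buf, len, "%llx:%llx", major, minor);
}
```

Called as format_fsid(fsid, sizeof(fsid), ...) with a char fsid[FSID_BUF_LEN] array, the whole string fits. With a buffer of only 17 bytes the same call keeps just the first 16 hex digits, which matches the truncated output shown in the commit message.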

nfs: make nfs4_init_uniform_client_string use a dynamically allocated buffer · 873e3851

Jeff Layton authored Jun 09, 2015

Change the uniform client string generator to dynamically allocate the
NFSv4 client name string buffer. With this patch, we can eliminate the
buffers that are embedded within the "args" structs and simply use the
name string that is hanging off the client.

This uniform string case is a little simpler than the nonuniform since
we don't need to deal with RCU, but we do have two different cases,
depending on whether there is a uniquifier or not.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

873e3851

nfs: make nfs4_init_nonuniform_client_string use a dynamically allocated buffer · a3192688

Jeff Layton authored Jun 09, 2015

The way the *_client_string functions work is a little goofy. They build
the string in an on-stack buffer and then use kstrdup to copy it. This
is not only stack-heavy but artificially limits the size of the client
name string. Change it so that we determine the length of the string,
allocate it and then scnprintf into it.

Since the contents of the nonuniform string depend on rcu-managed data
structures, it's possible that they'll change between when we allocate
the string and when we go to fill it. If that happens, free the string,
recalculate the length and try again. If the mismatch isn't resolved
on the second try then just give up and return -EINVAL.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

a3192688
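
The size-then-fill pattern with a single retry, as described in this commit, might look like this in userspace C. The names and the string format are hypothetical, and snprintf() is swapped in for the kernel's scnprintf(), so the mismatch test compares snprintf()'s return value against the length measured in the first pass.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the client; ip_addr may change between passes. */
struct client {
    const char *ip_addr;
};

/*
 * Build the client string in a buffer sized to fit.
 * On success returns 0 and stores a malloc()ed string in *out.
 * Gives up with -EINVAL if the source data changed on both attempts.
 */
static int build_client_string(const struct client *clp, char **out)
{
    int tries;

    for (tries = 0; tries < 2; tries++) {
        /* Pass 1: measure the string without writing anything. */
        int need = snprintf(NULL, 0, "Linux NFS %s", clp->ip_addr);
        char *buf;
        int wrote;

        if (need < 0)
            return -EINVAL;

        buf = malloc((size_t)need + 1);
        if (buf == NULL)
            return -ENOMEM;

        /* Pass 2: fill the buffer and check the length still matches. */
        wrote = snprintf(buf, (size_t)need + 1, "Linux NFS %s", clp->ip_addr);
        if (wrote == need) {
            *out = buf;
            return 0;
        }
        free(buf);   /* source changed between sizing and filling: retry */
    }
    return -EINVAL;
}
```

In a single-threaded program the first attempt always succeeds; the retry only matters when another context can change the source data between the two passes, as with the RCU-managed data mentioned above.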

nfs: update maxsz values for SETCLIENTID and EXCHANGE_ID · b8fb2f59

Jeff Layton authored Jun 09, 2015

The spec allows for up to NFS4_OPAQUE_LIMIT (1k). While we'll almost
certainly never use that much, these ops are generally the only ones
in the compound so we might as well allow for them to be that large.

Also, the existing code didn't add in a word for the opaque length
field for either name string. Fix that while we're in there.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

b8fb2f59
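
The extra word mentioned in this commit comes from how XDR encodes variable-length opaque data: a 4-byte length field followed by the data padded to a multiple of 4 bytes. The sketch below works that out in 4-byte words. OPAQUE_LIMIT takes the "1k" figure from the message above, the rounding macro is defined locally for illustration, and opaque_maxsz_words() is a hypothetical helper.

```c
/* Limit on the opaque name string, per the commit message ("1k"). */
#define OPAQUE_LIMIT 1024u

/* Bytes to 4-byte XDR words, rounding up (local illustration macro). */
#define QUADLEN(len) (((len) + 3u) >> 2)

/*
 * Maximum encoded size, in 4-byte words, of an XDR variable-length
 * opaque of up to max_bytes: one word for the length field plus the
 * padded data.
 */
static unsigned int opaque_maxsz_words(unsigned int max_bytes)
{
    return 1u + QUADLEN(max_bytes);
}
```

For the 1024-byte limit this gives 257 words; leaving out the length word would undercount each name string by one word.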

nfs: convert setclientid and exchange_id encoders to use clp->cl_owner_id · 3a6bb738

Jeff Layton authored Jun 09, 2015

...instead of buffers that are part of their arg structs. We already
hold a reference to the client, so we might as well use the allocated
buffer. In the event that we can't allocate the clp->cl_owner_id, then
just return -ENOMEM.

Note too that we switch from a GFP_KERNEL allocation here to GFP_NOFS.
It's possible we could end up trying to do a SETCLIENTID or EXCHANGE_ID
in order to reclaim some memory, and the GFP_KERNEL allocations in the
existing code could cause recursion back into NFS reclaim.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

3a6bb738

nfs: increase size of EXCHANGE_ID name string buffer · 764ad8ba

Jeff Layton authored Jun 09, 2015

The current buffer is much too small if you have a relatively long
hostname. Bring it up to the size of the one that SETCLIENTID has.

Cc: <stable@vger.kernel.org>
Reported-by: Michael Skralivetsky <michael.skralivetsky@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

764ad8ba

pnfs/flexfiles: use swap() in ff_layout_sort_mirrors() · 455b6ee6

Fabian Frederick authored Jun 12, 2015

Use kernel.h macro definition.

Thanks to Julia Lawall for Coccinelle scripting support.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

455b6ee6

SUNRPC: never enqueue a ->rq_cong request on ->sending · 29807318

Neil Brown authored Jun 15, 2015

If the sending queue has a task without ->rq_cong set at the front,
and then a number of tasks with ->rq_cong set such that they use
the entire congestion window, then the queue deadlocks.  The first
entry cannot be processed until later entries complete.

This scenario has been seen with a client using UDP to access a server,
and the network connection breaking for a period of time - it doesn't
recover.

It never really makes sense for an ->rq_cong request to be on the ->sending
queue, but it can happen when a request is being retried, and finds
the transport is locked (XPRT_LOCKED).  In this case we simply call
__xprt_put_cong() and the deadlock goes away.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

29807318

12 Jun, 2015 2 commits

xprtrdma: Reduce per-transport MR allocation · 40c6ed0c

Chuck Lever authored May 26, 2015

Reduce resource consumption per-transport to make way for increasing
the credit limit and maximum r/wsize. Pre-allocate fewer MRs.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

40c6ed0c

xprtrdma: Stack relief in fmr_op_map() · acb9da7a

Chuck Lever authored May 26, 2015

fmr_op_map() declares a 64 element array of u64 in automatic
storage. This is 512 bytes (8 * 64) on the stack.

Instead, when FMR memory registration is in use, pre-allocate a
physaddr array for each rpcrdma_mw.

This is a pre-requisite for increasing the r/wsize maximum for
FMR on platforms with 4KB pages.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

acb9da7a