- 04 Aug, 2014 2 commits
-
-
NeilBrown authored
current_cred() can only be changed by 'current', and cred->group_info is never changed. If a new group_info is needed, a new 'cred' is created. Consequently it is always safe to access current_cred()->group_info without taking any further references. So drop the refcounting and the incorrect rcu_dereference(). Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
NeilBrown authored
1/ rcu_dereference isn't correct: that field isn't RCU protected. It could potentially change at any time so ACCESS_ONCE might be justified. changes to ->d_parent are protected by ->d_seq. However that isn't always checked after ->d_revalidate is called, so it is safest to keep the double-check that ->d_parent hasn't changed at the end of these functions. 2/ in nfs4_lookup_revalidate, "->d_parent" was forgotten. So 'parent' was not the parent of 'dentry'. This fails safe is the context is that dentry->d_inode is NULL, and the result of parent->d_inode being NULL is that ECHILD is returned, which is always safe. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 03 Aug, 2014 23 commits
-
-
NeilBrown authored
The access cache is used during RCU-walk path lookups, so it is best to avoid locking if possible as taking a lock kills concurrency. The rbtree is not rcu-safe and cannot easily be made so. Instead we simply check the last (i.e. most recent) entry on the LRU list. If this doesn't match, then we return -ECHILD and retry in lock/refcount mode. This requires freeing the nfs_access_entry struct with rcu, and requires using rcu access primatives when adding entries to the lru, and when examining the last entry. Calling put_rpccred before kfree_rcu looks a bit odd, but as put_rpccred already provides rcu protection, we know that the cred will not actually be freed until the next grace period, so any concurrent access will be safe. This patch provides about 5% performance improvement on a stat-heavy synthetic work load with 4 threads on a 2-core CPU. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
NeilBrown authored
It fails with -ECHILD rather than make an RPC call. This allows nfs_lookup_revalidate to call it in RCU-walk mode. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
NeilBrown authored
This requires nfs_check_verifier to take an rcu_walk flag, and requires an rcu version of nfs_revalidate_inode which returns -ECHILD rather than making an RPC call. With this, nfs_lookup_revalidate can call nfs_neg_need_reval in RCU-walk mode. We can also move the LOOKUP_RCU check past the nfs_check_verifier() call in nfs_lookup_revalidate. If RCU_WALK prevents nfs_check_verifier or nfs_neg_need_reval from doing a full check, they return a status indicating that a revalidation is required. As this revalidation will not be possible in RCU_WALK mode, -ECHILD will ultimately be returned, which is the desired result. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
NeilBrown authored
nfs_permission makes two calls which are not always safe in RCU_WALK, rpc_lookup_cred and nfs_do_access. The second can easily be made rcu-safe by aborting with -ECHILD before making the RPC call. The former can be made rcu-safe by calling rpc_lookup_cred_nonblock() instead. As this will almost always succeed, we use it even when RCU_WALK isn't being used as it still saves some spinlocks in a common case. We only fall back to rpc_lookup_cred() if rpc_lookup_cred_nonblock() fails and MAY_NOT_BLOCK isn't set. This optimisation (always trying rpc_lookup_cred_nonblock()) is particularly important when a security module is active. In that case inode_permission() may return -ECHILD from security_inode_permission() even though ->permission() succeeded in RCU_WALK mode. This leads to may_lookup() retrying inode_permission after performing unlazy_walk(). The spinlock that rpc_lookup_cred() takes is often more expensive than anything security_inode_permission() does, so that spinlock becomes the main bottleneck. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
NeilBrown authored
The new flag RPCAUTH_LOOKUP_RCU to credential lookup avoids locking, does not take a reference on the returned credential, and returns -ECHILD if a simple lookup was not possible. The returned value can only be used within an rcu_read_lock protected region. The main user of this is the new rpc_lookup_cred_nonblock() which returns a pointer to the current credential which is only rcu-safe (no ref-count held), and might return -ECHILD if allocation was required. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
NeilBrown authored
nfs_lookup_revalidate, nfs4_lookup_revalidate, and nfs_permission all need to understand and handle RCU-walk for NFS to gain the benefits of RCU-walk for cached information. Currently these functions all immediately return -ECHILD if the relevant flag (LOOKUP_RCU or MAY_NOT_BLOCK) is set. This patch pushes those tests later in the code so that we only abort immediately before we enter rcu-unsafe code. As subsequent patches make that rcu-unsafe code rcu-safe, several of these new tests will disappear. With this patch there are several paths through the code which will no longer return -ECHILD during an RCU-walk. However these are mostly error paths or other uninteresting cases. A noteworthy change in nfs_lookup_revalidate is that we don't take (or put) the reference to ->d_parent when LOOKUP_RCU is set. Rather we rcu_dereference ->d_parent, and check that ->d_inode is not NULL. We also check that ->d_parent hasn't changed after all the tests. In nfs4_lookup_revalidate we simply avoid testing LOOKUP_RCU on the path that only calls nfs_lookup_revalidate() as that function already performs the required test. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
NeilBrown authored
nfs4_lookup_revalidate only uses 'parent' to get 'dir', and only uses 'dir' if 'inode == NULL'. So we don't need to find out what 'parent' or 'dir' is until we know that 'inode' is NULL. By moving 'dget_parent' inside the 'if', we can reduce the number of call sites for 'dput(parent)'. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Alexey Khoroshilov authored
There is a couple of places in client code where returned value of try_module_get() is ignored. As a result there is a small chance to premature unload module because of unbalanced refcounting. The patch adds error handling in that places. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Weston Andros Adamson authored
Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Weston Andros Adamson authored
This is useful when lsegs need to be released while holding locks. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Weston Andros Adamson authored
nfs_page_find_head_request_locked looks through the regular nfs commit lists when the page is swapped out, but doesn't look through the pnfs commit lists. I'm not sure if anyone has hit any issues caused by this. Suggested-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Weston Andros Adamson authored
Fix the comment in nfs_page.h for PG_INODE_REF to reflect that it's no longer set only on head requests. Also add a WARN_ON_ONCE in nfs_inode_remove_request as PG_INODE_REF should always be set. Suggested-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Weston Andros Adamson authored
Return errors from wait_on_bit_lock from nfs_page_group_lock. Add a bool argument @wait to nfs_page_group_lock. If true, loop over wait_on_bit_lock until it returns cleanly. If false, return the error from wait_on_bit_lock. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Jeff Layton authored
It's always 0. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Jeff Layton authored
Fix the endianness handling in gss_wrap_kerberos_v1 and drop the memset call there in favor of setting the filler bytes directly. In gss_wrap_kerberos_v2, get rid of the "ec" variable which is always zero, and drop the endianness conversion of 0. Sparse handles 0 as a special case, so it's not necessary. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Jeff Layton authored
Use u16 pointer in setup_token and setup_token_v2. None of the fields are actually handled as __be16, so this simplifies the code a bit. Also get rid of some unneeded pointer increments. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Jeff Layton authored
The handling of the gc_ctx pointer only seems to be partially RCU-safe. The assignment and freeing are done using RCU, but many places in the code seem to dereference that pointer without proper RCU safeguards. Fix them to use rcu_dereference and to rcu_read_lock/unlock, and to properly handle the case where the pointer is NULL. Cc: Arnd Bergmann <arnd@arndb.de> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Jeff Layton authored
Commit 5b22216e (nfs: __rcu annotations) added a __rcu annotation to the gc_gss_ctx field. I see no rationale for adding that though, as that field does not seem to be managed via RCU at all. Cc: Arnd Bergmann <arnd@arndb.de> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
NeilBrown authored
If you have an NFSv4 mounted directory which does not container 'foo' and: ls -l foo ssh $server touch foo cat foo then the 'cat' will fail (usually, depending a bit on the various cache ages). This is correct as negative looks are cached by default. However with the same initial conditions: cat foo ssh $server touch foo cat foo will usually succeed. This is because an "open" does not add a negative dentry to the dcache, while a "lookup" does. This can have negative performance effects. When "gcc" searches for an include file, it will try to "open" the file in every director in the search path. Without caching of negative "open" results, this generates much more traffic to the server than it should (or than NFSv3 does). The root of the problem is that _nfs4_open_and_get_state() will call d_add_unique() on a positive result, but not on a negative result. Compare with nfs_lookup() which calls d_materialise_unique on both a positive result and on ENOENT. This patch adds a call d_add() in the ENOENT case for _nfs4_open_and_get_state() and also calls nfs_set_verifier(). With it, many fewer "open" requests for known-non-existent files are sent to the server. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Andrey Utkin authored
There was a check for result being not NULL. But get_acl() may return NULL, or ERR_PTR, or actual pointer. The purpose of the function where current change is done is to "list ACLs only when they are available", so any error condition of get_acl() mustn't be elevated, and returning 0 there is still valid. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=81111Signed-off-by: Andrey Utkin <andrey.krieger.utkin@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Fixes: 74adf83f (nfs: only show Posix ACLs in listxattr if actually...) Cc: stable@vger.kernel.org # 3.14+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
git://git.linux-nfs.org/projects/anna/nfs-rdmaTrond Myklebust authored
* 'nfs-rdma' of git://git.linux-nfs.org/projects/anna/nfs-rdma: (916 commits) xprtrdma: Handle additional connection events xprtrdma: Remove RPCRDMA_PERSISTENT_REGISTRATION macro xprtrdma: Make rpcrdma_ep_disconnect() return void xprtrdma: Schedule reply tasklet once per upcall xprtrdma: Allocate each struct rpcrdma_mw separately xprtrdma: Rename frmr_wr xprtrdma: Disable completions for LOCAL_INV Work Requests xprtrdma: Disable completions for FAST_REG_MR Work Requests xprtrdma: Don't post a LOCAL_INV in rpcrdma_register_frmr_external() xprtrdma: Reset FRMRs after a flushed LOCAL_INV Work Request xprtrdma: Reset FRMRs when FAST_REG_MR is flushed by a disconnect xprtrdma: Properly handle exhaustion of the rb_mws list xprtrdma: Chain together all MWs in same buffer pool xprtrdma: Back off rkey when FAST_REG_MR fails xprtrdma: Unclutter struct rpcrdma_mr_seg xprtrdma: Don't invalidate FRMRs if registration fails xprtrdma: On disconnect, don't ignore pending CQEs xprtrdma: Update rkeys after transport reconnect xprtrdma: Limit data payload size for ALLPHYSICAL xprtrdma: Protect ia->ri_id when unmapping/invalidating MRs ...
-
Trond Myklebust authored
This may be used to limit the number of cached credentials building up inside the access cache. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Trond Myklebust authored
In some cases where the credentials are not often reused, we may want to limit their total number just in order to make the negative lookups in the hash table more manageable. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 31 Jul, 2014 15 commits
-
-
Chuck Lever authored
Commit 38ca83a5 added RDMA_CM_EVENT_TIMEWAIT_EXIT. But that status is relevant only for consumers that re-use their QPs on new connections. xprtrdma creates a fresh QP on reconnection, so that event should be explicitly ignored. Squelch the alarming "unexpected CM event" message. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Clean up. RPCRDMA_PERSISTENT_REGISTRATION was a compile-time switch between RPCRDMA_REGISTER mode and RPCRDMA_ALLPHYSICAL mode. Since RPCRDMA_REGISTER has been removed, there's no need for the extra conditional compilation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Clean up: The return code is used only for dprintk's that are already redundant. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Minor optimization: grab rpcrdma_tk_lock_g and disable hard IRQs just once after clearing the receive completion queue. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Currently rpcrdma_buffer_create() allocates struct rpcrdma_mw's as a single contiguous area of memory. It amounts to quite a bit of memory, and there's no requirement for these to be carved from a single piece of contiguous memory. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Clean up: Name frmr_wr after the opcode of the Work Request, consistent with the send and local invalidation paths. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Instead of relying on a completion to change the state of an FRMR to FRMR_IS_INVALID, set it in advance. If an error occurs, a completion will fire anyway and mark the FRMR FRMR_IS_STALE. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Instead of relying on a completion to change the state of an FRMR to FRMR_IS_VALID, set it in advance. If an error occurs, a completion will fire anyway and mark the FRMR FRMR_IS_STALE. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Any FRMR arriving in rpcrdma_register_frmr_external() is now guaranteed to be either invalid, or to be targeted by a queued LOCAL_INV that will invalidate it before the adapter processes the FAST_REG_MR being built here. The problem with current arrangement of chaining a LOCAL_INV to the FAST_REG_MR is that if the transport is not connected, the LOCAL_INV is flushed and the FAST_REG_MR is flushed. This leaves the FRMR valid with the old rkey. But rpcrdma_register_frmr_external() has already bumped the in-memory rkey. Next time through rpcrdma_register_frmr_external(), a LOCAL_INV and FAST_REG_MR is attempted again because the FRMR is still valid. But the rkey no longer matches the hardware's rkey, and a memory management operation error occurs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
When a LOCAL_INV Work Request is flushed, it leaves an FRMR in the VALID state. This FRMR can be returned by rpcrdma_buffer_get(), and must be knocked down in rpcrdma_register_frmr_external() before it can be re-used. Instead, capture these in rpcrdma_buffer_get(), and reset them. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
FAST_REG_MR Work Requests update a Memory Region's rkey. Rkey's are used to block unwanted access to the memory controlled by an MR. The rkey is passed to the receiver (the NFS server, in our case), and is also used by xprtrdma to invalidate the MR when the RPC is complete. When a FAST_REG_MR Work Request is flushed after a transport disconnect, xprtrdma cannot tell whether the WR actually hit the adapter or not. So it is indeterminant at that point whether the existing rkey is still valid. After the transport connection is re-established, the next FAST_REG_MR or LOCAL_INV Work Request against that MR can sometimes fail because the rkey value does not match what xprtrdma expects. The only reliable way to recover in this case is to deregister and register the MR before it is used again. These operations can be done only in a process context, so handle it in the transport connect worker. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
If the rb_mws list is exhausted, clean up and return NULL so that call_allocate() will delay and try again. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
During connection loss recovery, need to visit every MW in a buffer pool. Any MW that is in use by an RPC will not be on the rb_mws list. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
If posting a FAST_REG_MR Work Reqeust fails, revert the rkey update to avoid subsequent IB_WC_MW_BIND_ERR completions. Suggested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Clean ups: - make it obvious that the rl_mw field is a pointer -- allocated separately, not as part of struct rpcrdma_mr_seg - promote "struct {} frmr;" to a named type - promote the state enum to a named type - name the MW state field the same way other fields in rpcrdma_mw are named Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-