- 20 Oct, 2021 17 commits
-
-
Anna Schumaker authored
This lets us update the server's attributes when the user does a "mount -o remount" on the filesystem. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Anna Schumaker authored
Clean up. There are a few places where we want to probe the server, but don't actually care about the fsinfo result. Change these to use nfs_probe_server(), which handles the fattr allocation for us. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Anna Schumaker authored
And rename it to nfs_probe_server(). I also change it to take the nfs_fh as an argument so callers can choose what filehandle to probe. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Anna Schumaker authored
And call it before doing an FSINFO probe to reset to the baseline capabilities before probing. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Chuck Lever authored
dprintk call sites that display no other information than the function name can be replaced with use of the trace "function" or "function_graph" plug-ins. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Chuck Lever authored
Introduce a single tracepoint that can replace simple dprintk call sites in upper layer "rpc_call_done" callbacks. Example: kworker/u24:2-1254 [001] 771.026677: rpc_stats_latency: task:00000001@00000002 xid=0x16a6f3c0 rpcbindv2 GETPORT backlog=446 rtt=101 execute=555 kworker/u24:2-1254 [001] 771.026677: rpc_task_call_done: task:00000001@00000002 flags=ASYNC|DYNAMIC|SOFT|SOFTCONN|SENT runstate=RUNNING|ACTIVE status=0 action=rpcb_getport_done kworker/u24:2-1254 [001] 771.026678: rpcb_setport: task:00000001@00000002 status=0 port=20048 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Chuck Lever authored
These new events report slightly different information for readpage and readpages/readahead. For readpage: fsx-1387 [006] 380.761896: nfs_aop_readpage: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355910932437 offset=131072 fsx-1387 [006] 380.761900: nfs_aop_readpage_done: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355910932437 offset=131072 ret=0 The index of a synchronous single-page read is reported. For readpages: fsx-1387 [006] 380.760847: nfs_aop_readahead: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355909932456 nr_pages=3 fsx-1387 [006] 380.760853: nfs_aop_readahead_done: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355909932456 nr_pages=3 ret=0 The count of pages requested is reported. nfs_readpages does not wait for the READ requests to complete. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Chuck Lever authored
Clean up: BIT() is preferred over open-coding the shift. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Chuck Lever authored
For certain special cases, RPC-related tracepoints record a -1 as the task ID or the client ID. It's ugly for a trace event to display 4 billion in these cases. To help keep SUNRPC tracepoints consistent, create a macro that defines the print format specifiers for tk_pid and cl_clid. At some point in the future we might try tk_pid with a wider range of values than 0..64K so this makes it easier to make that change. RPC tracepoints now look like this: <...>-1276 [009] 149.720358: rpc_clnt_new: client=00000005 peer=[192.168.2.55]:20049 program=nfs server=klimt.ib <...>-1342 [004] 149.921234: rpc_xdr_recvfrom: task:0000001a@00000005 head=[0xff1242d9ab6dc01c,144] page=0 tail=[(nil),0] len=144 <...>-1342 [004] 149.921235: xprt_release_cong: task:0000001a@00000005 snd_task:ffffffff cong=256 cwnd=16384 <...>-1342 [004] 149.921235: xprt_put_cong: task:0000001a@00000005 snd_task:ffffffff cong=0 cwnd=16384 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Chuck Lever authored
Clean up: this field is no longer used. xprt_rdma_pad_optimize is also no longer used, but is left in place because it is part of the kernel/userspace API. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Chuck Lever authored
This is a buffer to be left persistently registered while a connection is up. Connection tear-down will automatically DMA-unmap, invalidate, and dereg the MR. A persistently registered buffer is lower in cost to provide, and it can never be coalesced into the RDMA segment that carries the data payload. An RPC that provisions a Write chunk with a non-aligned length now uses this MR rather than the tail buffer of the RPC's rq_rcv_buf. Reviewed-By: Tom Talpey <tom@talpey.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Alexey Gladkov authored
Fixes: 61ca2c4a ("NFS: Only reference user namespace from nfs4idmap struct instead of cred") Signed-off-by: Alexey Gladkov <legion@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
Save some space in the nfs_inode by setting up an anonymous union with the fields that are peculiar to a specific type of filesystem object. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Dave Wysochanski authored
Fixes the following WARN_ON WARNING: CPU: 2 PID: 18678 at fs/nfs/inode.c:123 nfs_clear_inode+0x3b/0x50 [nfs] ... Call Trace: nfs4_evict_inode+0x57/0x70 [nfsv4] evict+0xd1/0x180 dispose_list+0x48/0x60 evict_inodes+0x156/0x190 generic_shutdown_super+0x37/0x110 nfs_kill_super+0x1d/0x40 [nfs] deactivate_locked_super+0x36/0xa0 Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
We mustn't call nfs_wb_all() on anything other than a regular file. Furthermore, we can exit early when we don't hold a delegation. Reported-by: David Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
Olga reports seeing the following Oops when doing O_DIRECT writes to a pNFS flexfiles server: Oops: 0000 [#1] SMP PTI CPU: 1 PID: 234186 Comm: kworker/u8:1 Not tainted 5.15.0-rc4+ #4 Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014 Workqueue: nfsiod rpc_async_release [sunrpc] RIP: 0010:nfs_mark_request_commit+0x12/0x30 [nfs] Code: ff ff be 03 00 00 00 e8 ac 34 83 eb e9 29 ff ff ff e8 22 bc d7 eb 66 90 0f 1f 44 00 00 48 85 f6 74 16 48 8b 42 10 48 8b 40 18 <48> 8b 40 18 48 85 c0 74 05 e9 70 fc 15 ec 48 89 d6 e9 68 ed ff ff RSP: 0018:ffffa82f0159fe00 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff8f3393141880 RCX: 0000000000000000 RDX: ffffa82f0159fe08 RSI: ffff8f3381252500 RDI: ffff8f3393141880 RBP: ffff8f33ac317c00 R08: 0000000000000000 R09: ffff8f3487724cb0 R10: 0000000000000008 R11: 0000000000000001 R12: 0000000000000001 R13: ffff8f3485bccee0 R14: ffff8f33ac317c10 R15: ffff8f33ac317cd8 FS: 0000000000000000(0000) GS:ffff8f34fbc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000018 CR3: 0000000122120006 CR4: 0000000000770ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: nfs_direct_write_completion+0x13b/0x250 [nfs] rpc_free_task+0x39/0x60 [sunrpc] rpc_async_release+0x29/0x40 [sunrpc] process_one_work+0x1ce/0x370 worker_thread+0x30/0x380 ? process_one_work+0x370/0x370 kthread+0x11a/0x140 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x22/0x30 Reported-by: Olga Kornievskaia <aglo@umich.edu> Fixes: 9c455a8c ("NFS/pNFS: Clean up pNFS commit operations") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
If O_DIRECT bumps the commit_info rpcs_out field, then that could lead to fsync() hangs. The fix is to ensure that O_DIRECT calls nfs_commit_end(). Fixes: 723c921e ("sched/wait, fs/nfs: Convert wait_on_atomic_t() usage to the new wait_var_event() API") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
- 10 Oct, 2021 4 commits
-
-
Trond Myklebust authored
Partially revert commit 2ce209c4 ("NFS: Wait for requests that are locked on the commit list"), since it can lead to deadlocks between commit requests and nfs_join_page_group(). For now we should assume that any locked requests on the commit list are either about to be removed and committed by another task, or the writes they describe are about to be retransmitted. In either case, we should not need to worry. Fixes: 2ce209c4 ("NFS: Wait for requests that are locked on the commit list") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Chuck Lever authored
Generate a trace event whenever the NFS client modifies the size of a file. These new events aid troubleshooting workloads that trigger races around size updates. There are four new trace points, all named nfs_size_something so they are easy to grep for or enable as a group with a single glob. Size updated on the server: kworker/u24:10-194 [010] 369.939174: nfs_size_update: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899344277980615 cursize=250471 newsize=172083 Server-side size update reported via NFSv3 WCC attributes: fsx-1387 [006] 380.760686: nfs_size_wcc: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899355909932456 cursize=146792 newsize=171216 File has been truncated locally: fsx-1387 [007] 369.437421: nfs_size_truncate: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899231200117272 cursize=215244 newsize=0 File has been extended locally: fsx-1387 [007] 369.439213: nfs_size_grow: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899343704248410 cursize=258048 newsize=262144 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Chuck Lever authored
The current range of RPC task PIDs is 0..65535. That's not adequate for distinguishing tasks across multiple rpc_clnts running high throughput workloads. To help relieve this situation and to reduce the bottleneck of having a single atomic for assigning all RPC task PIDs, assign task PIDs per rpc_clnt. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Chuck Lever authored
Clean up: TRACE_DEFINE_ENUM is unnecessary because the target symbols are all C macros, not enums. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
- 04 Oct, 2021 19 commits
-
-
Baptiste Lepers authored
_nfs4_pnfs_v3/v4_ds_connect do some work smp_wmb ds->ds_clp = clp; And nfs4_ff_layout_prepare_ds currently does smp_rmb if(ds->ds_clp) ... This patch places the smp_rmb after the if. This ensures that following reads only happen once nfs4_ff_layout_prepare_ds has checked that data has been properly initialized. Fixes: d67ae825 ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Baptiste Lepers <baptiste.lepers@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
Remove cache invalidations that are already covered by change attribute updates. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
The original premise in commit 83672d39 ("NFS: Fix directory caching problem - with test case and patch.") was that readdirplus was caching attribute information and replaying it later. This is no longer the case. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
If the directory changed while we were revalidating the dentry, then don't update the dentry verifier. There is no value in setting the verifier to an older value, and we could end up overwriting a more up to date verifier from a parallel revalidation. Fixes: efeda80d ("NFSv4: Fix revalidation of dentries with delegations") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
-
Trond Myklebust authored
If a user is doing 'ls -l', we have a heuristic in GETATTR that tells the readdir code to try to use READDIRPLUS in order to refresh the inode attributes. In certain cirumstances, we also try to invalidate the remaining directory entries in order to ensure this refresh. If there are multiple readers of the directory, we probably should avoid invalidating the page cache, since the heuristic breaks down in that situation anyway. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
-
Trond Myklebust authored
The check for duplicate readdir cookies should only care if the change attribute is invalid or the data cache is invalid. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
-
Trond Myklebust authored
If we want to revalidate the directory, then just mark the change attribute as invalid. Fixes: 13c0b082 ("NFS: Replace use of NFS_INO_REVAL_PAGECACHE when checking cache validity") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
-
Trond Myklebust authored
NFS_INO_DATA_INVAL_DEFER and NFS_INO_INVALID_DATA should be considered mutually exclusive. Fixes: 1c341b77 ("NFS: Add deferred cache invalidation for close-to-open consistency violations") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
-
Trond Myklebust authored
Both NFSv3 and NFSv2 generate their change attribute from the ctime value that was supplied by the server. However the problem is that there are plenty of servers out there with ctime resolutions of 1ms or worse. In a modern performance system, this is insufficient when trying to decide which is the most recent set of attributes when, for instance, a READ or GETATTR call races with a WRITE or SETATTR. For this reason, let's revert to labelling the NFSv2/v3 change attributes as NFS4_CHANGE_TYPE_IS_UNDEFINED. This will ensure we protect against such races. Fixes: 7b24dacf ("NFS: Another inode revalidation improvement") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
-
Trond Myklebust authored
NFS4_CREATE_EXCLUSIVE does not allow the caller to set an access mode, so for most Linux filesystems, the access call ends up returning no permissions. However both NFS4_CREATE_EXCLUSIVE4_1 and NFS4_CREATE_GUARDED allow the client to set the access mode. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
All these bits are being used as bit locks. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
The clearing of the XPRT_LOCKED bit has to happen after we clear xprt->snd_task, but we don't require any extra memory barriers after that. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
The only check for RPC_TASK_RUNNING is the one in rpc_make_runnable(), which happens under the same spin lock held when we call rpc_clear_running(). Ditto, the last check for RPC_TASK_QUEUED in rpc_execute() is performed under the same lock as the one held when we call rpc_clear_queued(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
Don't let xprtiod pre-empt softirq. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
Allow tasks that need to pre-empt rpciod/xprtiod to do so when it is safe. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
The premise of commit 6f9f1728 ("SUNRPC: Mitigate cond_resched() in xprt_transmit()") was that cond_resched() is expensive and unnecessary when there has been just a single send. The point of cond_resched() is to ensure that tasks that should pre-empt this one get a chance to do so when it is safe to do so. The code prior to commit 6f9f1728 failed to take into account that it was keeping a rpc_task pinned for longer than it needed to, and so rather than doing a full revert, let's just move the cond_resched. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
If the cached credential exists but doesn't have any expiration callback then exit early. Fix up atomicity issues when replacing the credential with a new one since the existing code could lead to refcount leaks. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
After the success of an operation such as rmdir() or unlink(), we expect to add the dentry back to the dcache as an ordinary negative dentry. However in NFS, unless it is labelled with the appropriate verifier for the parent directory state, then nfs_lookup_revalidate will end up discarding that dentry and forcing a new lookup. The fix is to ensure that we relabel the dentry appropriately on success. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Trond Myklebust authored
After the success of an operation such as link(), or symlink(), we expect to add the dentry back to the dcache as an ordinary positive dentry. However in NFS, unless it is labelled with the appropriate verifier for the parent directory state, then nfs_lookup_revalidate will end up discarding that dentry and forcing a new lookup. The fix is to ensure that we relabel the dentry appropriately on success. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-