- 27 Nov, 2014 2 commits
-
-
Jeff Layton authored
Add a new directory hierarchy under the debugfs sunrpc/ directory: sunrpc/ rpc_xprt/ <xprt id>/ Within that directory, we can put files that give info about the xprts. We do have the (minor) problem that there is no succinct, unique identifier for rpc_xprts. So we generate them synthetically with a static atomic_t counter. For now, this directory just holds an "info" file, but we may add other files to it in the future. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
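The synthetic identifier idea can be sketched in userspace C11. This is only an illustration: the commit uses a kernel atomic_t, while the sketch below substitutes `atomic_uint` from `<stdatomic.h>`, and the names `next_xprt_id` and `xprt_alloc_id` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for the static counter described above. */
static atomic_uint next_xprt_id;

/* Return a fresh id. Ids start at 1 and each caller gets a distinct value,
 * even when several threads allocate at the same time. */
static unsigned int xprt_alloc_id(void)
{
	unsigned int id = atomic_fetch_add(&next_xprt_id, 1) + 1;

	assert(id != 0);	/* counter wraparound is out of scope for this sketch */
	return id;
}
```

Each id could then be formatted into the per-transport directory name.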
-
Jeff Layton authored
It's possible to get a dump of the RPC task queue by writing a value to /proc/sys/sunrpc/rpc_debug. If you write any value to that file, you get a dump of the RPC client task list into the log buffer. This is a rather inconvenient interface however, and makes it hard to get immediate info about the task queue. Add a new directory hierarchy under debugfs: sunrpc/ rpc_clnt/ <clientid>/ Within each clientid directory we create a new "tasks" file that will dump info similar to what shows up in the log buffer, but with a few small differences -- we avoid printing raw kernel addresses in favor of symbolic names and the XID is also displayed. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 26 Nov, 2014 2 commits
-
-
git://git.linux-nfs.org/projects/anna/nfs-rdma
Trond Myklebust authored
Pull NFS client RDMA changes for 3.19 from Anna Schumaker: "NFS: Client side changes for RDMA These patches include various bugfixes and cleanups for using NFS over RDMA, including better error handling and performance improvements by using pad optimization. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>" * tag 'nfs-rdma-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma: xprtrdma: Display async errors xprtrdma: Enable pad optimization xprtrdma: Re-write rpcrdma_flush_cqs() xprtrdma: Refactor tasklet scheduling xprtrdma: unmap all FMRs during transport disconnect xprtrdma: Cap req_cqinit xprtrdma: Return an errno from rpcrdma_register_external()
-
git://git.linux-nfs.org/projects/anna/nfs-rdma
Trond Myklebust authored
Pull additional NFS client changes for 3.19 from Anna Schumaker: "NFS: Generic client side changes from Chuck These patches include fixes for iostats and SETCLIENTID in addition to cleaning up the nfs4_init_callback() function. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>" * tag 'nfs-cel-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma: NFS: Clean up nfs4_init_callback() NFS: SETCLIENTID XDR buffer sizes are incorrect SUNRPC: serialize iostats updates
-
- 25 Nov, 2014 15 commits
-
-
Anna Schumaker authored
This patch adds support for using the NFS v4.2 operation DEALLOCATE to punch holes in a file. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Anna Schumaker authored
This patch adds support for using the NFS v4.2 operation ALLOCATE to preallocate data in a file. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Chuck Lever authored
nfs4_init_callback() is never invoked for NFS versions other than 4. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Use the correct calculation of the maximum size of a clientaddr4 when encoding and decoding SETCLIENTID operations. clientaddr4 is defined in section 2.2.10 of RFC3530bis-31. The usage in encode_setclientid_maxsz is missing the 4-byte length in both strings, but is otherwise correct. decode_setclientid_maxsz simply asks for a page of receive buffer space, which is unnecessarily large (more than 4KB). Note that a SETCLIENTID reply is either clientid+verifier, or clientaddr4, depending on the returned NFS status. It doesn't hurt to allocate enough space for both. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
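The size calculation can be illustrated with a small sketch of XDR string sizing, where every variable-length string costs one 4-byte length word plus its data padded to a 4-byte boundary. The helper names are hypothetical, and the maximum lengths passed in are up to the caller; the commit message does not state them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of 4-byte XDR words needed to hold len bytes, rounded up. */
static size_t xdr_quadlen(size_t len)
{
	assert(len <= SIZE_MAX - 3);	/* guard the rounding against overflow */
	return (len + 3) / 4;
}

/* A variable-length XDR string is a 4-byte length word plus padded data. */
static size_t xdr_string_maxwords(size_t max_len)
{
	return 1 + xdr_quadlen(max_len);
}

/* clientaddr4 carries two strings: a netid and a universal address. */
static size_t clientaddr4_maxwords(size_t max_netid, size_t max_uaddr)
{
	return xdr_string_maxwords(max_netid) + xdr_string_maxwords(max_uaddr);
}
```

Leaving out the two length words, as the message says the old encode macro did, would undercount the buffer by 8 bytes.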
-
Chuck Lever authored
Occasionally mountstats reports a negative retransmission rate. Ensure that two RPCs completing concurrently don't confuse the sums in the transport's op_metrics array. Since pNFS filelayout can invoke rpc_count_iostats() on another transport from xprt_release(), we can't rely on simply holding the transport_lock in xprt_release(). There's nothing for it but hard serialization. One spin lock per RPC operation should make this as painless as it can be. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
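A minimal userspace sketch of the "one spin lock per RPC operation" approach follows. It assumes a C11 `atomic_flag` as the lock instead of a kernel spinlock, and the structure and function names are hypothetical. The point is that both sums change under the same lock, so a reader never sees one updated without the other.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical per-operation counters, each guarded by its own spin lock. */
struct op_metrics {
	atomic_flag lock;
	unsigned long ops;	/* completed RPCs */
	unsigned long ntrans;	/* transmissions, including retransmits */
};

static void op_metrics_lock(struct op_metrics *m)
{
	while (atomic_flag_test_and_set_explicit(&m->lock, memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void op_metrics_unlock(struct op_metrics *m)
{
	atomic_flag_clear_explicit(&m->lock, memory_order_release);
}

/* Fold one completed RPC into the sums while holding the lock. */
static void count_iostats(struct op_metrics *m, unsigned long ntrans)
{
	assert(ntrans >= 1);	/* a completed RPC was sent at least once */
	op_metrics_lock(m);
	m->ops++;
	m->ntrans += ntrans;
	op_metrics_unlock(m);
}

/* Retransmissions are the transmissions beyond one per completed RPC. */
static long retransmissions(struct op_metrics *m)
{
	long n;

	op_metrics_lock(m);
	n = (long)(m->ntrans - m->ops);
	op_metrics_unlock(m);
	return n;
}
```

Without the lock, an `ops` increment that lands between another thread's two updates is one way the difference could momentarily go negative.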
-
Chuck Lever authored
An async error upcall is a hard error, and should be reported in the system log. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
The Linux NFS/RDMA server used to reject NFSv3 WRITE requests when pad optimization was enabled. That bug was fixed by commit e560e3b5 ("svcrdma: Add zero padding if the client doesn't send it"). We can now enable pad optimization on the client, which helps performance and is supported now by both Linux and Solaris servers. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Currently rpcrdma_flush_cqs() attempts to avoid code duplication, and simply invokes rpcrdma_recvcq_upcall and rpcrdma_sendcq_upcall. 1. rpcrdma_flush_cqs() can run concurrently with provider upcalls. Both flush_cqs() and the upcalls were invoking ib_poll_cq() in different threads using the same wc buffers (ep->rep_recv_wcs and ep->rep_send_wcs), added by commit 1c00dd07 ("xprtrmda: Reduce calls to ib_poll_cq() in completion handlers"). During transport disconnect processing, this sometimes resulted in the same reply getting added to the rpcrdma_tasklets_g list more than once, which corrupted the list. 2. The upcall functions drain only a limited number of CQEs, thanks to the poll budget added by commit 8301a2c0 ("xprtrdma: Limit work done by completion handler"). Fixes: a7bc211a ("xprtrdma: On disconnect, don't ignore ... ") BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=276 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Restore the separate function that schedules the reply handling tasklet. I need to call it from two different paths. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
When using RPCRDMA_MTHCAFMR memory registration, after a few transport disconnect / reconnect cycles, ib_map_phys_fmr() starts to return EINVAL because the provider has exhausted its map pool. Make sure that all FMRs are unmapped during transport disconnect, and that ->send_request remarshals them during an RPC retransmit. This resets the transport's MRs to ensure that none are leaked during a disconnect. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Chuck Lever authored
Recent work made FRMR registration and invalidation completions unsignaled. This greatly reduces the adapter interrupt rate. Every so often, however, a posted send Work Request is allowed to signal. Otherwise, the provider's Work Queue will wrap and the workload will hang. The number of Work Requests that are allowed to remain unsignaled is determined by the value of req_cqinit. Currently, this is set to the size of the send Work Queue divided by two, minus 1. For FRMR, the send Work Queue is the maximum number of concurrent RPCs (currently 32) times the maximum number of Work Requests an RPC might use (currently 7, though some adapters may need more). For mlx4, this is 224 entries. This leaves completion signaling disabled for 111 send Work Requests. Some providers hold back dispatching Work Requests until a CQE is generated. If completions are disabled, then no CQEs are generated for quite some time, and that can stall the Work Queue. I've seen this occur running xfstests generic/113 over NFSv4, where eventually, posting a FAST_REG_MR Work Request fails with -ENOMEM because the Work Queue has overflowed. The connection is dropped and re-established. Cap the rep_cqinit setting so completions are not left turned off for too long. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=269 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
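The arithmetic in this message (32 RPCs times 7 Work Requests gives 224 entries; 224 divided by two, minus 1, gives 111) can be checked with a short sketch. The function names are hypothetical, and the cap is left as a caller-supplied parameter because the message does not say what value the patch uses.

```c
#include <assert.h>

/* Send Work Queue depth: concurrent RPCs times Work Requests per RPC. */
static unsigned int send_wq_depth(unsigned int max_rpcs,
				  unsigned int wrs_per_rpc)
{
	return max_rpcs * wrs_per_rpc;
}

/* Unsignaled sends allowed between signaled ones: half the queue depth,
 * minus 1, then clamped to cap. The cap is a parameter because the
 * commit message does not state the value chosen. */
static unsigned int unsignaled_limit(unsigned int depth, unsigned int cap)
{
	unsigned int limit;

	assert(depth >= 2);	/* keeps depth / 2 - 1 from underflowing */
	limit = depth / 2 - 1;
	return limit > cap ? cap : limit;
}
```

With a cap far below 111, a signaled completion is requested much more often, which is what keeps providers that wait for a CQE from stalling.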
-
Chuck Lever authored
The RPC/RDMA send_request method and the chunk registration code expect an errno from the registration function. This allows the upper layers to distinguish between a recoverable failure (for example, temporary memory exhaustion) and a hard failure (for example, a bug in the registration logic). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
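Why an errno helps the caller can be shown with a hedged sketch. The function names, the boolean fault-injection parameters, and the specific choice of -ENOMEM and -EIO are assumptions for illustration; they are not taken from rpcrdma_register_external().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical registration step: 0 on success, negative errno on failure. */
static int register_chunk(bool out_of_memory, bool logic_bug)
{
	if (out_of_memory)
		return -ENOMEM;	/* temporary: the caller may try again later */
	if (logic_bug)
		return -EIO;	/* hard failure: retrying will not help */
	return 0;
}

/* Caller-side policy: only transient errors are worth a retry. */
static bool should_retry(int rc)
{
	assert(rc <= 0);	/* convention: never a positive value */
	return rc == -ENOMEM || rc == -EAGAIN;
}
```

A plain success/failure flag would force the caller to treat both cases the same way.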
-
Li RongQing authored
Define and use nfs_inc_fscache_stats when incrementing by one, which saves passing one parameter. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Li RongQing authored
Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Markus Elfring authored
The nfs_put_client() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
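The pattern behind this cleanup, sketched with hypothetical names (`struct client`, `client_put`, `teardown`): when the callee already returns early on NULL, a guard at the call site adds nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-counted object. */
struct client {
	int refcount;
};

/* Drop one reference. A NULL argument is accepted and ignored, as the
 * commit message describes for nfs_put_client(). */
static void client_put(struct client *clp)
{
	if (clp == NULL)
		return;
	assert(clp->refcount > 0);
	clp->refcount--;
}

/* Caller: no "if (clp)" guard is needed around the call. */
static void teardown(struct client *clp)
{
	client_put(clp);
}
```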
-
- 24 Nov, 2014 11 commits
-
-
Jeff Layton authored
It's always set to the same value as CONFIG_TRACEPOINTS, so we can just use that instead. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Jeff Layton authored
It's always set to whatever CONFIG_SUNRPC_DEBUG is, so just use that. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Jeff Layton authored
LOCKD_DEBUG is always the same value as CONFIG_SUNRPC_DEBUG, so we can just use it instead. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Peng Tao authored
nfs4_layoutget_release() drops layout hdr refcnt. Grab the refcnt early so that it is safe to call .release in case nfs4_alloc_pages fails. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Fixes: a47970ff ("NFSv4.1: Hold reference to layout hdr in layoutget") Cc: stable@vger.kernel.org # 3.9+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
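The ordering issue can be sketched as follows, under the assumption that the release callback always drops exactly one reference. All names and the fault-injection flag are hypothetical stand-ins for the layoutget path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical layout header with a plain reference count. */
struct layout_hdr {
	int refcount;
};

/* Release callback: always drops one reference. */
static void layoutget_release(struct layout_hdr *hdr)
{
	assert(hdr->refcount > 0);
	hdr->refcount--;
}

/* Take the reference before anything that can fail, so the error path
 * can call the release callback without unbalancing the count. */
static int layoutget(struct layout_hdr *hdr, bool alloc_fails)
{
	hdr->refcount++;		/* grabbed early */
	if (alloc_fails) {
		layoutget_release(hdr);	/* balanced by the grab above */
		return -ENOMEM;
	}
	/* ... the request would be issued here ... */
	layoutget_release(hdr);
	return 0;
}
```

If the reference were taken only after the allocation, the failure path would drop a reference that was never acquired.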
-
Weston Andros Adamson authored
Recent work in the pgio layer made it possible for there to be more than one request per page. This caused a subtle change in commit behavior, because write.c:nfs_commit_unstable_pages compares the number of *pages* waiting for writeback against the number of requests on a commit list to choose when to send a COMMIT in a non-blocking flush. This is probably hard to hit in normal operation - you have to be using rsize/wsize < PAGE_SIZE, or pnfs with lots of boundaries that are not page aligned to have a noticeable change in behavior. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Christoph Hellwig authored
Use the number of pages in the pagecache mapping instead of the number of pnfs requests which is only slightly related. Reported-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Jeff Layton authored
Add tracepoints inside the main loop on xs_tcp_data_recv that allow us to keep an eye on what's happening during each phase of it. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Jeff Layton authored
...so we can keep track of when calls are sent and replies received. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Jeff Layton authored
...just around svc_send, svc_recv and svc_process for now. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Anna Schumaker authored
This should make the code easier to maintain in the future. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Jan Kara authored
NFS4ERR_ACCESS has number 13 and thus is matched and returned immediately at the beginning of nfs4_map_errors() and there's no point in checking it later. Coverity-id: 733891 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
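Why a later check for this error is dead code can be seen in a small sketch. Only the value 13 comes from the commit message; the pass-through cutoff, the fallback mapping to -EIO, and the function name are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>

#define NFS4ERR_ACCESS		13	/* value stated in the commit message */
#define PASS_THROUGH_MIN	(-1000)	/* assumed cutoff for this sketch */

/* Map a negative protocol status to a negative errno. Small values are
 * returned unchanged before any per-error handling is reached. */
static int map_errors(int err)
{
	assert(err <= 0);
	if (err >= PASS_THROUGH_MIN)
		return err;	/* -NFS4ERR_ACCESS (-13) leaves here */

	/* A "case -NFS4ERR_ACCESS:" placed below this point could never
	 * match, which is the redundancy the commit removes. */
	return -EIO;		/* assumed fallback for larger codes */
}
```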
-
- 23 Nov, 2014 10 commits
-
-
Linus Torvalds authored
-
Andy Lutomirski authored
x86 calls do_notify_resume on paranoid returns if TIF_UPROBE is set but not on non-paranoid returns. I suspect that this is a mistake and that the code only works because int3 is paranoid. Setting _TIF_NOTIFY_RESUME in the uprobe code was probably a workaround for the x86 bug. With that bug fixed, we can remove _TIF_NOTIFY_RESUME from the uprobes code. Reported-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Thomas Gleixner authored
Chris bisected a NULL pointer dereference in task_sched_runtime() to commit 6e998916 'sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency'. Chris observed crashes in atop or other /proc walking programs when he started fork bombs on his machine. He assumed that this is a new exit race, but that does not make any sense when looking at that commit. What's interesting is that the commit provides update_curr callbacks for all scheduling classes except stop_task and idle_task. While nothing can ever hit that via the clock_nanosleep() and clock_gettime() interfaces, which have been the target of the commit in question, the author obviously forgot that there are other code paths which invoke task_sched_runtime():

    do_task_stat()
      thread_group_cputime_adjusted()
        thread_group_cputime()
          task_cputime()
            task_sched_runtime()
              if (task_current(rq, p) && task_on_rq_queued(p)) {
                update_rq_clock(rq);
                p->sched_class->update_curr(rq);
              }

If the stats are read for a stomp machine task, aka 'migration/N' and that task is current on its cpu, this will happily call the NULL pointer of stop_task->update_curr. Ooops. Chris's observation that this happens faster when he runs the fork bomb makes sense as the fork bomb will kick migration threads more often so the probability to hit the issue will increase. Add the missing update_curr callbacks to the scheduler classes stop_task and idle_task. While idle tasks cannot be monitored via /proc we have other means to hit the idle case. Fixes: 6e998916 'sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency' Reported-by: Chris Mason <clm@fb.com> Reported-and-tested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
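The shape of the fix can be sketched in plain C: a table of callbacks is only safe to call unconditionally if every entry fills in the slot, even with a no-op. The types and names below are simplified, hypothetical stand-ins and not the kernel's scheduler structures.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a run queue. */
struct rq {
	unsigned long updates;
};

/* Simplified stand-in for a scheduling class. */
struct sched_class {
	const char *name;
	void (*update_curr)(struct rq *rq);	/* must never be NULL */
};

static void update_curr_fair(struct rq *rq)
{
	rq->updates++;		/* pretend to account runtime */
}

/* No-op callback: the class has nothing to account, but the slot is
 * filled so that generic callers can invoke it unconditionally. */
static void update_curr_stop(struct rq *rq)
{
	(void)rq;
}

static const struct sched_class fair_class = { "fair", update_curr_fair };
static const struct sched_class stop_class = { "stop", update_curr_stop };

/* Generic path that calls through the pointer for any class. */
static void task_runtime_refresh(const struct sched_class *cls, struct rq *rq)
{
	assert(cls->update_curr != NULL);
	cls->update_curr(rq);
}
```

The alternative would be a NULL check at every call site, which is easy to miss when a new caller is added.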
-
Linus Torvalds authored
Merge x86-64 iret fixes from Andy Lutomirski: "This addresses the following issues: - an unrecoverable double-fault triggerable with modify_ldt. - invalid stack usage in espfix64 failed IRET recovery from IST context. - invalid stack usage in non-espfix64 failed IRET recovery from IST context. It also makes a good but IMO scary change: non-espfix64 failed IRET will now report the correct error. Hopefully nothing depended on the old incorrect behavior, but maybe Wine will get confused in some obscure corner case" * emailed patches from Andy Lutomirski <luto@amacapital.net>: x86_64, traps: Rework bad_iret x86_64, traps: Stop using IST for #SS x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C
-
Andy Lutomirski authored
It's possible for iretq to userspace to fail. This can happen because of a bad CS, SS, or RIP. Historically, we've handled it by fixing up an exception from iretq to land at bad_iret, which pretends that the failed iret frame was really the hardware part of #GP(0) from userspace. To make this work, there's an extra fixup to fudge the gs base into a usable state. This is suboptimal because it loses the original exception. It's also buggy because there's no guarantee that we were on the kernel stack to begin with. For example, if the failing iret happened on return from an NMI, then we'll end up executing general_protection on the NMI stack. This is bad for several reasons, the most immediate of which is that general_protection, as a non-paranoid idtentry, will try to deliver signals and/or schedule from the wrong stack. This patch throws out bad_iret entirely. As a replacement, it augments the existing swapgs fudge into a full-blown iret fixup, mostly written in C. It should be clearer and more correct. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Andy Lutomirski authored
On a 32-bit kernel, this has no effect, since there are no IST stacks. On a 64-bit kernel, #SS can only happen in user code, on a failed iret to user space, a canonical violation on access via RSP or RBP, or a genuine stack segment violation in 32-bit kernel code. The first two cases don't need IST, and the latter two cases are unlikely fatal bugs, and promoting them to double faults would be fine. This fixes a bug in which the espfix64 code mishandles a stack segment violation. This saves 4k of memory per CPU and a tiny bit of code. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Andy Lutomirski authored
There's nothing special enough about the espfix64 double fault fixup to justify writing it in assembly. Move it to C. This also fixes a bug: if the double fault came from an IST stack, the old asm code would return to a partially uninitialized stack frame. Fixes: 3891a04a Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Linus Torvalds authored
Pull ARM SoC fixes from Olof Johansson: "A collection of fixes this week: - A set of clock fixes for shmobile platforms - A fix for tegra that moves serial port labels to be per board. We're choosing to merge this for 3.18 because the labels will start being parsed in 3.19, and without this change serial port numbers that used to be stable since the dawn of time will change numbers. - A few other DT tweaks for Tegra. - A fix for multi_v7_defconfig that makes it stop spewing cpufreq errors on Arndale (Exynos)" * tag 'armsoc-for-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: multi_v7_defconfig: fix failure setting CPU voltage by enabling dependent I2C controller ARM: tegra: roth: Fix SD card VDD_IO regulator ARM: tegra: Remove eMMC vmmc property for roth/tn7 ARM: dts: tegra: move serial aliases to per-board ARM: tegra: Add serial port labels to Tegra124 DT ARM: shmobile: kzm9g legacy: Set i2c clks_per_count to 2 ARM: shmobile: r8a7740 dtsi: Correct IIC0 parent clock ARM: shmobile: r8a7790: Fix SD3CKCR address to device tree ARM: shmobile: r8a7740 legacy: Correct IIC0 parent clock ARM: shmobile: r8a7740 legacy: Add missing INTCA clock for irqpin module ARM: shmobile: r8a7790: Fix SD3CKCR address ARM: dts: sun6i: Re-parent ahb1_mux to pll6 as required by dma controller
-
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Linus Torvalds authored
Pull percpu fix from Tejun Heo: "This contains one patch to fix a race condition which can lead to percpu_ref using a percpu pointer which is corrupted with a set DEAD bit. The bug was introduced while separating out the ATOMIC mode flag from the DEAD flag. The fix is pretty straightforward. I just committed the patch to the percpu tree but am sending out the pull request early as I'll be on vacation for a week. The patch should be fairly safe and while the latency will be higher I'll be checking emails" * 'for-3.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu-ref: fix DEAD flag contamination of percpu pointer
-
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Linus Torvalds authored
Pull btrfs deadlock fix from Chris Mason: "This has a fix for a long standing deadlock that we've been trying to nail down for a while. It ended up being a bad interaction with the fair reader/writer locks and the order btrfs reacquires locks in the btree" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: fix lockups from btrfs_clear_path_blocking
-