- 16 Jun, 2015 7 commits
-
-
Jeff Layton authored
Change the uniform client string generator to dynamically allocate the NFSv4 client name string buffer. With this patch, we can eliminate the buffers that are embedded within the "args" structs and simply use the name string that is hanging off the client. This uniform string case is a little simpler than the nonuniform since we don't need to deal with RCU, but we do have two different cases, depending on whether there is a uniquifier or not. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Jeff Layton authored
The way the *_client_string functions work is a little goofy. They build the string in an on-stack buffer and then use kstrdup to copy it. This is not only stack-heavy but artificially limits the size of the client name string. Change it so that we determine the length of the string, allocate it and then scnprintf into it. Since the contents of the nonuniform string depend on rcu-managed data structures, it's possible that they'll change between when we allocate the string and when we go to fill it. If that happens, free the string, recalculate the length and try again. If it the mismatch isn't resolved on the second try then just give up and return -EINVAL. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Jeff Layton authored
The spec allows for up to NFS4_OPAQUE_LIMIT (1k). While we'll almost certainly never use that much, these ops are generally the only ones in the compound so we might as well allow for them to be that large. Also, the existing code didn't add in a word for the opaque length field for either name string. Fix that while we're in there. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Jeff Layton authored
...instead of buffers that are part of their arg structs. We already hold a reference to the client, so we might as well use the allocated buffer. In the event that we can't allocate the clp->cl_owner_id, then just return -ENOMEM. Note too that we switch from a GFP_KERNEL allocation here to GFP_NOFS. It's possible we could end up trying to do a SETCLIENTID or EXCHANGE_ID in order to reclaim some memory, and the GFP_KERNEL allocations in the existing code could cause recursion back into NFS reclaim. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Jeff Layton authored
The current buffer is much too small if you have a relatively long hostname. Bring it up to the size of the one that SETCLIENTID has. Cc: <stable@vger.kernel.org> Reported-by: Michael Skralivetsky <michael.skralivetsky@primarydata.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Fabian Frederick authored
Use kernel.h macro definition. Thanks to Julia Lawall for Coccinelle scripting support. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Neil Brown authored
If the sending queue has a task without ->rq_cong set at the front, and then a number of tasks with ->rq_cong set such that they use the entire congestion window, then the queue deadlocks. The first entry cannot be processed until later entries complete. This scenario has been seen with a client using UDP to access a server, and the network connection breaking for a period of time - it doesn't recover. It never really makes sense for an ->rq_cong request to be on the ->sending queue, but it can happen when a request is being retried, and finds the transport if locked (XPRT_LOCKED). In this case we simple call __xprt_put_cong() and the deadlock goes away. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 11 Jun, 2015 2 commits
-
-
Jeff Layton authored
A drop should really only be done when the frame is malformed or we have reason to think that there is some sort of DoS going on. When we get an RPC with bad auth, we should send back an error instead. Cc: Andy Adamson <William.Adamson@netapp.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Chuck Lever authored
Cross-compile test on ARCH=mn10300: In file included from include/linux/list.h:8:0, from include/linux/wait.h:6, from include/linux/fs.h:6, from include/linux/debugfs.h:18, from net/sunrpc/debugfs.c:7: net/sunrpc/debugfs.c: In function 'fault_disconnect_write': include/linux/kernel.h:723:17: warning: comparison of distinct pointer types lacks a cast (void) (&_min1 == &_min2); \ ^ >> net/sunrpc/debugfs.c:307:8: note: in expansion of macro 'min' len = min(len, sizeof(buffer) - 1); Fixes: ('SUNRPC: Transport fault injection') Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 10 Jun, 2015 9 commits
-
-
Vaishali Thakkar authored
In little endian cases, the macro htonl unfolds to __swab32 which provides special case for constants. In big endian cases, __constant_htonl and htonl expand directly to the same expression. So, replace __constant_htonl with htonl with the goal of getting rid of the definition of __constant_htonl completely. The semantic patch that performs this transformation is as follows: @@expression x;@@ - __constant_htonl(x) + htonl(x) Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Chuck Lever authored
It has been exceptionally useful to exercise the logic that handles local immediate errors and RDMA connection loss. To enable developers to test this regularly and repeatably, add logic to simulate connection loss every so often. Fault injection is disabled by default. It is enabled with $ sudo echo xxx > /sys/kernel/debug/sunrpc/inject_fault/disconnect where "xxx" is a large positive number of transport method calls before a disconnect. A value of several thousand is usually a good number that allows reasonable forward progress while still causing a lot of connection drops. These hooks are disabled when SUNRPC_DEBUG is turned off. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Anna Schumaker authored
This was only ever set to nfs_writeback_release_common(), a function which is completely empty. Let's just drop this function pointer and simplify the code a bit. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Dominique Martinet authored
nfs4_proc_lookup_common is supposed to return a posix error, we have to handle any error returned that isn't errno Reported-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Frank S. Filz <ffilzlnx@mindspring.com> Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Jeff Layton authored
RDMA xprts don't have a sock_xprt, but an rdma_xprt, so the xs_swapper_enable/disable functions will likely oops when fed an RDMA xprt. Turn these functions into rpc_xprt_ops so that that doesn't occur. For now the RDMA versions are no-ops that just return -EINVAL on an attempt to swapon. Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Jeff Layton authored
It's possible that we could race with a call to xs_reset_transport, in which case the xprt->inet pointer could be zeroed out while we're accessing it. Lock the xprt before we try to set memalloc on it. Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Jeff Layton authored
We currently increment the memalloc_socks counter if we have a xprt that is associated with a swapfile. That socket can be replaced however during a reconnect event, and the memalloc_socks counter is never decremented if that occurs. When tearing down a xprt socket, check to see if the xprt is set up for swapping and sk_clear_memalloc before releasing the socket if so. Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Jeff Layton authored
Split xs_swapper into enable/disable functions and eliminate the "enable" flag. Currently, it's racy if you have multiple swapon/swapoff operations running in parallel over the same xprt. Also fix it so that we only set it to a memalloc socket on a 0->1 transition and only clear it on a 1->0 transition. Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Jeff Layton authored
Jerome reported seeing a warning pop when working with a swapfile on NFS. The nfs_swap_activate can end up calling sk_set_memalloc while holding the rcu_read_lock and that function can sleep. To fix that, we need to take a reference to the xprt while holding the rcu_read_lock, set the socket up for swapping and then drop that reference. But, xprt_put is not exported and having NFS deal with the underlying xprt is a bit of layering violation anyway. Fix this by adding a set of activate/deactivate functions that take a rpc_clnt pointer instead of an rpc_xprt, and have nfs_swap_activate and nfs_swap_deactivate call those. Also, add a per-rpc_clnt atomic counter to keep track of the number of active swapfiles associated with it. When the counter does a 0->1 transition, we enable swapping on the xprt, when we do a 1->0 transition we disable swapping on it. This also allows us to be a bit more selective with the RPC_TASK_SWAPPER flag. If non-swapper and swapper clnts are sharing a xprt, then we only need to flag the tasks from the swapper clnt with that flag. Acked-by: Mel Gorman <mgorman@suse.de> Reported-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 05 Jun, 2015 3 commits
-
-
Trond Myklebust authored
We need to allow the server to send a new request immediately after we've replied to the previous one. Right now, there is a window between the send and the release of the old request in rpc_put_task(), where the server could send us a new backchannel RPC call, and we have no request to service it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Trond Myklebust authored
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Trond Myklebust authored
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 02 Jun, 2015 8 commits
-
-
Chuck Lever authored
Clean up: Merge bc_send() into bc_svc_process(). Note: even thought this touches svc.c, it is a client-side change. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Trond Myklebust authored
If the socket was busy due to a socket nospace error, then we should retry the send. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Trond Myklebust authored
req->rq_private_buf isn't initialised when xprt_setup_backchannel calls xprt_free_allocation. Fixes: fb7a0b9a ("nfs41: New backchannel helper routines") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Julia Lawall authored
Delete jump to a label on the next line, when that label is not used elsewhere. A simplified version of the semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r@ identifier l; @@ -if (...) goto l; -l: // </smpl> Also drop the unnecessary ret variable. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Stefan Hajnoczi authored
Several functions have outdated arguments listed in the doc comments. Drop documentation for arguments that no longer exist. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Chuck Lever authored
When encoding the NFSACL SETACL operation, reserve just the estimated size of the ACL rather than a fixed maximum. This eliminates needless zero padding on the wire that the server ignores. Fixes: ee5dc773 ('NFS: Fix "kernel BUG at fs/nfs/nfs3xdr.c:1338!"') Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
NeilBrown authored
In glibc 2.21 (and several previous), a call to opendir() will result in a 32K (BUFSIZ*4) buffer being allocated and passed to getdents. However a call to fdopendir() results in an 'fstat' request to determine block size and a matching buffer allocated for subsequent use with getdents. This will typically be 1M. The first getdents call on an NFS directory will always use READDIR_PLUS (or NFSv4 equivalent) if available. Subsequent getdents calls only use this more expensive version if some 'stat' requests are made between the getdents calls. For this reason it is good to keep at least that first getdents call relatively short. When fdopendir() and readdir() is used on a large directory, it takes approximately 32 times as long to complete as using "opendir". Current versions of 'find' use fdopendir() and demonstrate this slowness. 'stat' on a directory currently returns the 'wsize'. This number has no meaning on directories. Actual READDIR requests are limited to ->dtsize, which itself is capped at 4 pages, coincidently the same as BUFSIZ*4. So this is a meaningful number to use as the blocksize on directories, and has the effect of making 'find' on large directories go a lot faster. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Trond Myklebust authored
While the NFSv4.1 code has always drained the slot tables in order to stop non-recovery related RPC calls when doing lease recovery, the NFSv4 code did not. The reason for the difference in behaviour is that NFSv4 does not have session state, and so RPC calls can in theory proceed while recovery is happening. In practice, however, anything I/O or state related needs to wait until recovery is over. This patch changes the behaviour of NFSv4 to match that of NFSv4.1 so that we can simplify the state recovery code by assuming that we do not have to deal with races between recovery and ordinary I/O. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 01 Jun, 2015 2 commits
-
-
Olga Kornievskaia authored
Problem: When an operation like WRITE receives a BAD_STATEID, even though recovery code clears the RECLAIM_NOGRACE recovery flag before recovering the open state, because of clearing delegation state for the associated inode, nfs_inode_find_state_and_recover() gets called and it makes the same state with RECLAIM_NOGRACE flag again. As a results, when we restart looking over the open states, we end up in the infinite loop instead of breaking out in the next test of state flags. Solution: unset the RECLAIM_NOGRACE set because of calling of nfs_inode_find_state_and_recover() after returning from calling recover_open() function. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Linus Torvalds authored
-
- 31 May, 2015 9 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds authored
Pull vfs fix from Al Viro: "Off-by-one in d_walk()/__dentry_kill() race fix. It's very hard to hit; possible in the same conditions as the original bug, except that you need the skipped branch to contain all the remaining evictables, so that the d_walk()-calling loop in d_invalidate() decides there's nothing more to do and doesn't go for another pass - otherwise that next pass will sweep the sucker. So it's not too urgent, but seeing that the fix is obvious and the original commit has spread into all -stable branches..." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: d_walk() might skip too much
-
git://ftp.arm.linux.org.uk/~rmk/linux-armLinus Torvalds authored
Pull ARM fixes from Russell King: "Three fixes this time around: - fix a memory leak which occurs when probing performance monitoring unit interrupts - fix handling of non-PMD aligned end of RAM causing boot failures - fix missing syscall trace exit path with syscall tracing enabled causing a kernel oops in the audit code" * 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: ARM: 8357/1: perf: fix memory leak when probing PMU PPIs ARM: fix missing syscall trace exit ARM: 8356/1: mm: handle non-pmd-aligned end of RAM
-
git://git.kernel.org/pub/scm/linux/kernel/git/ralf/linuxLinus Torvalds authored
Pull MIPS fixes from Ralf Baechle: "MIPS fixes for 4.1 all across the tree" * 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/ralf/linux: MIPS: strnlen_user.S: Fix a CPU_DADDI_WORKAROUNDS regression MIPS: BMIPS: Fix bmips_wr_vec() MIPS: ath79: fix build problem if CONFIG_BLK_DEV_INITRD is not set MIPS: Fuloong 2E: Replace CONFIG_USB_ISP1760_HCD by CONFIG_USB_ISP1760 MIPS: irq: Use DECLARE_BITMAP ttyFDC: Fix to use native endian MMIO reads MIPS: Fix CDMM to use native endian MMIO reads
-
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linuxLinus Torvalds authored
Pull turbostat tool fixes from Len Brown: "Just one minor kernel dependency in this batch -- added a #define to msr-index.h" * 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: update version number to 4.7 tools/power turbostat: allow running without cpu0 tools/power turbostat: correctly decode of ENERGY_PERFORMANCE_BIAS tools/power turbostat: enable turbostat to support Knights Landing (KNL) tools/power turbostat: correctly display more than 2 threads/core
-
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pendingLinus Torvalds authored
Pull SCSI target fixes from Nicholas Bellinger: "These are mostly minor fixes, with the exception of the following that address fall-out from recent v4.1-rc1 changes: - regression fix related to the big fabric API registration changes and configfs_depend_item() usage, that required cherry-picking one of HCH's patches from for-next to address the issue for v4.1 code. - remaining TCM-USER -v2 related changes to enforce full CDB passthrough from Andy + Ilias. Also included is a target_core_pscsi driver fix from Andy that addresses a long standing issue with a Scsi_Host reference being leaked on PSCSI device shutdown" * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: iser-target: Fix error path in isert_create_pi_ctx() target: Use a PASSTHROUGH flag instead of transport_types target: Move passthrough CDB parsing into a common function target/user: Only support full command pass-through target/user: Update example code for new ABI requirements target/pscsi: Don't leak scsi_host if hba is VIRTUAL_HOST target: Fix se_tpg_tfo->tf_subsys regression + remove tf_subsystem target: Drop signal_pending checks after interruptible lock acquire target: Add missing parentheses target: Fix bidi command handling target/user: Disallow full passthrough (pass_level=0) ISCSI: fix minor memory leak
-
Linus Torvalds authored
Merge tag 'hwmon-for-linus-v4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: "Some late hwmon patches, all headed for -stable - fix sysfs attribute initialization in nct6775 and nct6683 drivers - do not attempt to auto-detect tmp435 on I2C address 0x37 - ensure iio channel is of type IIO_VOLTAGE in ntc_thermistor driver" * tag 'hwmon-for-linus-v4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (nct6683) Add missing sysfs attribute initialization hwmon: (nct6775) Add missing sysfs attribute initialization hwmon: (tmp401) Do not auto-detect chip on I2C address 0x37 hwmon: (ntc_thermistor) Ensure iio channel is of type IIO_VOLTAGE
-
Roland Dreier authored
We don't assign pi_ctx to desc->pi_ctx until we're certain to succeed in the function. That means the cleanup path should use the local pi_ctx variable, not desc->pi_ctx. This was detected by Coverity (CID 1260062). Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
-
Andy Grover authored
It seems like we only care if a transport is passthrough or not. Convert transport_type to a flags field and replace TRANSPORT_PLUGIN_* with a flag, TRANSPORT_FLAG_PASSTHROUGH. Signed-off-by: Andy Grover <agrover@redhat.com> Reviewed-by: Ilias Tsitsimpis <iliastsi@arrikto.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
-
Andy Grover authored
Aside from whether they handle BIDI ops or not, parsing of the CDB by kernel and user SCSI passthrough modules should be identical. Move this into a new passthrough_parse_cdb() and call it from tcm-pscsi and tcm-user. Reported-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ilias Tsitsimpis <iliastsi@arrikto.com> Signed-off-by: Andy Grover <agrover@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
-