Commits · 35f5a422ce1af836007f811b613c440d0e348e06 · nexedi / linux

06 Jan, 2006 40 commits

SUNRPC: new interface to force an RPC rebind · 35f5a422

Chuck Lever authored Jan 03, 2006

 We'd like to hide fields in rpc_xprt and rpc_clnt from upper layer protocols.
 Start by creating an API to force RPC rebind, replacing logic that simply
 sets cl_port to zero.

 Test-plan:
 Destructive testing (unplugging the network temporarily).  Connectathon
 with UDP and TCP.  NFSv2/3 and NFSv4 mounting should be carefully checked.
 Probably need to rig a server where certain services aren't running, or
 that returns an error for some typical operation.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

35f5a422
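
The idea in the "force an RPC rebind" entry above can be sketched in user space as follows. This is a minimal model, not the kernel code: `struct demo_rpc_clnt`, `demo_force_rebind()` and `demo_needs_bind()` are hypothetical names, and the only behaviour taken from the commit message is that a rebind was previously forced by setting `cl_port` to zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for the kernel's RPC client struct.
 * Only the one field mentioned in the commit message is modeled. */
struct demo_rpc_clnt {
    unsigned short cl_port; /* 0 means "port unknown, must bind again" */
};

/* Before: upper-layer code wrote "clnt->cl_port = 0;" directly.
 * After: it calls one function, so the field can be hidden or moved
 * later without touching the callers. */
void demo_force_rebind(struct demo_rpc_clnt *clnt)
{
    assert(clnt != NULL);
    clnt->cl_port = 0;
}

/* Hypothetical accessor so callers never read cl_port themselves. */
bool demo_needs_bind(const struct demo_rpc_clnt *clnt)
{
    assert(clnt != NULL);
    return clnt->cl_port == 0;
}
```

A caller that detects a stale port would call `demo_force_rebind(&clnt)` and let the next request look the port up again.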

SUNRPC: switchable buffer allocation · 02107148

Chuck Lever authored Jan 03, 2006

 Add RPC client transport switch support for replacing buffer management
 on a per-transport basis.

 In the current IPv4 socket transport implementation, RPC buffers are
 allocated as needed for each RPC message that is sent.  Some transport
 implementations may choose to use pre-allocated buffers for encoding,
 sending, receiving, and unmarshalling RPC messages, however.  For
 transports capable of direct data placement, the buffers can be carved
 out of a pre-registered area of memory rather than from a slab cache.

 Test-plan:
 Millions of fsx operations.  Performance characterization with "sio" and
 "iozone".  Use oprofile and other tools to look for significant regression
 in CPU utilization.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

02107148
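
The "switchable buffer allocation" entry above describes choosing a buffer strategy per transport. One way to picture this is a table of function pointers, sketched below in user-space C. All names (`demo_xprt_ops`, `demo_pool_alloc`, the slot sizes) are assumptions made for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-transport operations table. */
struct demo_xprt_ops {
    void *(*buf_alloc)(size_t size);
    void  (*buf_free)(void *buf);
};

/* Strategy 1: allocate each buffer on demand. */
void *demo_heap_alloc(size_t size) { return malloc(size); }
void  demo_heap_free(void *buf)    { free(buf); }

/* Strategy 2: carve fixed-size slots out of a region set up in advance,
 * standing in for a pre-registered memory area. */
#define DEMO_SLOT_SIZE 1024
#define DEMO_NUM_SLOTS 4
static unsigned char demo_region[DEMO_NUM_SLOTS][DEMO_SLOT_SIZE];
static int demo_slot_used[DEMO_NUM_SLOTS];

void *demo_pool_alloc(size_t size)
{
    if (size > DEMO_SLOT_SIZE)
        return NULL;                 /* request does not fit in a slot */
    for (int i = 0; i < DEMO_NUM_SLOTS; i++) {
        if (!demo_slot_used[i]) {
            demo_slot_used[i] = 1;
            return demo_region[i];
        }
    }
    return NULL;                     /* pool exhausted */
}

void demo_pool_free(void *buf)
{
    for (int i = 0; i < DEMO_NUM_SLOTS; i++) {
        if (buf == (void *)demo_region[i]) {
            demo_slot_used[i] = 0;
            return;
        }
    }
}

const struct demo_xprt_ops demo_socket_ops = { demo_heap_alloc, demo_heap_free };
const struct demo_xprt_ops demo_prereg_ops = { demo_pool_alloc, demo_pool_free };

/* Generic code only sees the ops table, never the allocator behind it. */
int demo_send(const struct demo_xprt_ops *ops, size_t len)
{
    assert(ops != NULL);
    void *buf = ops->buf_alloc(len);
    if (buf == NULL)
        return -1;
    /* ... encoding and sending would happen here ... */
    ops->buf_free(buf);
    return 0;
}
```

The generic path in `demo_send()` is identical for both transports; only the table passed in differs.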

NFSv3: try get_root user-supplied security_flavor · 03c21733

J. Bruce Fields authored Jan 03, 2006

Thanks to Ed Keizer for reporting the bug and its root cause. He says: "... we could only mount
the top-level Solaris share. We could not mount deeper into the tree.
Investigation showed that Solaris allows UNIX authenticated FSINFO only on the
top level of the share. This is a problem because we share/export our home
directories one level higher than we mount them. I.e. we share the partition
and not the individual home directories. This prevented access to home
directories."

We still may need to try auth_sys for the case where the client doesn't have
appropriate credentials.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

03c21733

NLM: fix parsing of sm notify procedure · a659753e

J. Bruce Fields authored Jan 03, 2006

 The procedure that decodes statd sm_notify call seems to be skipping a
 few arguments.  How did this ever work?

 From folks at Polyserve.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

a659753e

NLM: Further cancel fixes · 64a318ee

J. Bruce Fields authored Jan 03, 2006

 If the server receives an NLM cancel call and finds no waiting lock to
 cancel, then chances are the lock has already been applied, and the client
 just hadn't yet processed the NLM granted callback before it sent the
 cancel.

 The Open Group text, for example, permits a server to return either success
 (LCK_GRANTED) or failure (LCK_DENIED) in this case.  But returning an error
 seems more helpful; the client may be able to use it to recognize that a
 race has occurred and to recover from the race.

 So, modify the relevant functions to return an error in this case.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

64a318ee
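
The behaviour described in the "Further cancel fixes" entry above can be modeled roughly as below. This is a sketch under stated assumptions: the queue, the `demo_` names and the exact matching rule are hypothetical, and only the decision to report an error when no waiting lock is found comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Demo status codes, named after the NLM results quoted in the message. */
enum demo_nlm_status { DEMO_LCK_GRANTED = 0, DEMO_LCK_DENIED = 1 };

#define DEMO_MAX_BLOCKED 8

/* One blocked (waiting) lock request; hypothetical layout. */
struct demo_blocked {
    int  in_use;
    int  owner;
    long start;
    long len;
};

static struct demo_blocked demo_queue[DEMO_MAX_BLOCKED];

/* Queue a blocked request; returns 0 on success, -1 if the queue is full. */
int demo_block_add(int owner, long start, long len)
{
    for (int i = 0; i < DEMO_MAX_BLOCKED; i++) {
        if (!demo_queue[i].in_use) {
            demo_queue[i] = (struct demo_blocked){ 1, owner, start, len };
            return 0;
        }
    }
    return -1;
}

/* Cancel a blocked request. If nothing matching is waiting, the lock was
 * probably granted already, so report an error the client can use to
 * notice the race. */
enum demo_nlm_status demo_cancel(int owner, long start, long len)
{
    for (int i = 0; i < DEMO_MAX_BLOCKED; i++) {
        struct demo_blocked *b = &demo_queue[i];
        if (b->in_use && b->owner == owner &&
            b->start == start && b->len == len) {
            b->in_use = 0;
            return DEMO_LCK_GRANTED;   /* cancel succeeded */
        }
    }
    return DEMO_LCK_DENIED;            /* nothing was waiting */
}
```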

NLM: clean up nlmsvc_delete_block · 2c5acd2e

J. Bruce Fields authored Jan 03, 2006

The fl_next check here is superfluous (and possibly a layering violation).
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

2c5acd2e

NLM: don't unlock on cancel requests · 5996a298

J. Bruce Fields authored Jan 03, 2006

 Currently when lockd gets an NLM_CANCEL request, it also does an unlock for
 the same range.  This is incorrect.

 The Open Group documentation says that "This procedure cancels an
 *outstanding* blocked lock request."  (Emphasis mine.)

 Also, consider a client that holds a lock on the first byte of a file, and
 requests a lock on the entire file.  If the client cancels that request
 (perhaps because the requesting process is signalled), the server shouldn't
 perform an unlock on the entire file, since that will also remove the
 previous lock that the client was already granted.

 Or consider a lock request that actually *downgraded* an exclusive lock to
 a shared lock.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

5996a298
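
The first-byte example in the "don't unlock on cancel requests" entry above can be replayed with a toy model. The sketch below assumes a tiny eight-byte file and a single pending request; `demo_cancel_old()` mimics the behaviour the message calls incorrect, and `demo_cancel_new()` only withdraws the blocked request. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_FILE_BYTES 8

/* Toy lock state for one client on one file. */
struct demo_file_locks {
    bool held[DEMO_FILE_BYTES]; /* bytes covered by a granted lock */
    bool pending;               /* is a blocked request waiting? */
    int  pend_start;
    int  pend_len;
};

/* Drop granted locks on [start, start + len). */
void demo_unlock(struct demo_file_locks *f, int start, int len)
{
    assert(f != NULL && start >= 0 && len >= 0);
    assert(start + len <= DEMO_FILE_BYTES);
    for (int i = start; i < start + len; i++)
        f->held[i] = false;
}

/* Old behaviour: cancelling also unlocked the same range. */
void demo_cancel_old(struct demo_file_locks *f, int start, int len)
{
    assert(f != NULL);
    f->pending = false;
    demo_unlock(f, start, len);
}

/* New behaviour: only the outstanding blocked request is withdrawn.
 * Returns true if a matching request was found. */
bool demo_cancel_new(struct demo_file_locks *f, int start, int len)
{
    assert(f != NULL);
    if (f->pending && f->pend_start == start && f->pend_len == len) {
        f->pending = false;
        return true;
    }
    return false;
}
```

With a lock held on byte 0 and a pending request for the whole file, `demo_cancel_new()` leaves byte 0 locked, whereas `demo_cancel_old()` would release it.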

NLM: Clean up nlmsvc_grant_reply locking · f232142c

J. Bruce Fields authored Jan 03, 2006

 Slightly simpler logic here makes it easier to verify that the up's and
 down's are balanced.  Break out an assignment from a conditional while
 we're at it.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

f232142c

SUNRPC: net/sunrpc/xdr.c: remove xdr_decode_string() · fb459f45

Adrian Bunk authored Jan 03, 2006

 This patch removes the unused function xdr_decode_string().
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Neil Brown <neilb@suse.de>
Acked-by: Charles Lever <Charles.Lever@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

fb459f45

NFSv4: Allow user to set the port used by the NFSv4 callback channel · a72b4422

Trond Myklebust authored Jan 03, 2006

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

a72b4422

NFS: Clean up weak cache consistency code · a895b4a1

Trond Myklebust authored Jan 03, 2006

 ...and ensure that nfs_update_inode() respects wcc
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

a895b4a1

NFSv4: Ensure DELEGRETURN returns attributes · fa178f29

Trond Myklebust authored Jan 03, 2006

Upon return of a write delegation, the server will almost always bump the
change attribute. Ensure that we pick up that change so that we don't
invalidate our data cache unnecessarily.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

fa178f29

NFSv4: Ensure change attribute returned by GETATTR callback conforms to spec · beb2a5ec

Trond Myklebust authored Jan 03, 2006

 According to RFC3530 we're supposed to cache the change attribute
 at the time the client receives a write delegation.
 If the inode is clean, a CB_GETATTR callback by the server to the
 client is supposed to return the cached change attribute.
 If, OTOH, the inode is dirty, the client should bump the cached
 change attribute by 1.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

beb2a5ec
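
The CB_GETATTR rule summarized in the entry above reduces to a small function. The following is a sketch only, with a hypothetical `struct demo_delegation`; it encodes just the two cases stated in the message (clean inode: cached value, dirty inode: cached value plus one).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-delegation state: the change attribute as it was when
 * the client received the write delegation. */
struct demo_delegation {
    uint64_t change_attr;
};

/* Value to report for the change attribute in a CB_GETATTR reply. */
uint64_t demo_cb_getattr_change(const struct demo_delegation *d,
                                bool inode_dirty)
{
    assert(d != NULL);
    return inode_dirty ? d->change_attr + 1 : d->change_attr;
}
```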

SUNRPC: Fix a potential race in rpc_pipefs. · 969b7f25

Trond Myklebust authored Jan 03, 2006

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

969b7f25

NFS: Make directIO aware of compound pages... · 566dd606

Trond Myklebust authored Jan 03, 2006

 ...and avoid calling set_page_dirty on them
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

566dd606

NFS: Make stat() return updated mtimes after a write() · 70b9ecbd

Trond Myklebust authored Jan 03, 2006

 The SuS states that a call to write() will cause mtime to be updated on
 the file. In order to satisfy that requirement, we need to flush out
 any cached writes in nfs_getattr().
 Speed things up slightly by not committing the writes.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

70b9ecbd

NFSv4: Ensure that we return the delegation on the target of a rename too. · 24174119

Trond Myklebust authored Jan 03, 2006

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

24174119

NFS: support large reads and writes on the wire · 40859d7e

Chuck Lever authored Nov 30, 2005

 Most NFS server implementations allow up to 64KB reads and writes on the
 wire.  The Solaris NFS server allows up to a megabyte, for instance.

 Now the Linux NFS client supports transfer sizes up to 1MB, too.  This will
 help reduce protocol and context switch overhead on read/write intensive NFS
 workloads, and support larger atomic read and write operations on servers
 that support them.

 Test-plan:
 Connectathon and iozone on mount point with wsize=rsize>32768 over TCP.
 Tests with NFS over UDP to verify the maximum RPC payload size cap.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

40859d7e
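
To see why larger transfer sizes reduce protocol overhead, as the "large reads and writes" entry above claims, it helps to count the requests needed for a given amount of I/O. The helper below is an illustrative calculation only; `demo_rpcs_needed()` is a hypothetical name, and the 32768-byte figure is used for comparison because it appears in the test plan.

```c
#include <assert.h>
#include <stddef.h>

/* Number of READ or WRITE requests needed to move io_bytes when each
 * request carries at most xfer_size bytes (rounded up). */
size_t demo_rpcs_needed(size_t io_bytes, size_t xfer_size)
{
    assert(xfer_size > 0);
    return (io_bytes + xfer_size - 1) / xfer_size;
}
```

For an 8 MB transfer, `demo_rpcs_needed(8388608, 32768)` gives 256 requests, while `demo_rpcs_needed(8388608, 1048576)` gives 8.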

NFS: make "inode number mismatch" message more useful · 325cfed9

Chuck Lever authored Nov 30, 2005

 To help NFS users and server developers, make the "inode number mismatch"
 message display more useful information.

 Test-plan:
 None.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

325cfed9

NFS: get rid of useless kernel log message · dc20f803

Chuck Lever authored Nov 30, 2005

 nfs_statfs() generates a log message when GETATTR returns an error.  This
 is usually a useless message.  Make it a dprintk.

 Test plan:
 None
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

dc20f803

NFS: simplify inlined bit ops in nfs_page.h · a911fd9a

Chuck Lever authored Nov 30, 2005

 Minor cleanup:  inlined bit ops in nfs_page.h can be simpler.

 Test plan:
 Write-intensive workload against a server that requires COMMITs.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

a911fd9a

NFS: Fix error recovery code in fs/nfs/inode.c:__init_nfs() · 6b59a754

Chuck Lever authored Nov 30, 2005

 Red Hat found a problem in the error recovery logic in __init_nfs.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

6b59a754

NFS: use generic_write_checks() to sanity check direct writes · ce1a8e67

Chuck Lever authored Nov 30, 2005

 Replace ad hoc write parameter sanity checking in nfs_file_direct_write()
 with a call to generic_write_checks().  This should make the proper checks
 modulo the O_LARGEFILE flag, and should catch NFSv2-specific limitations by
 virtue of i_sb->s_maxbytes.

 Test plan:
 Posix compliance testing with both NFSv2 and NFSv3.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

ce1a8e67

NFSv4: Remove requirement for machine creds for the "setclientid" operation · 286d7d6a

Trond Myklebust authored Jan 03, 2006

 Use a cred from the nfs4_client->cl_state_owners list.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

286d7d6a

NFSv4: Remove requirement for machine creds for the "renew" operation · b4454fe1

Trond Myklebust authored Jan 03, 2006

 In RFC3530, the RENEW operation is allowed to use either

 the same principal, RPC security flavour and (if RPCSEC_GSS), the same
  mechanism and service that was used for SETCLIENTID_CONFIRM

 OR

 Any principal, RPC security flavour and service combination that
 currently has an OPEN file on the server.

 Choose the latter since that doesn't require us to keep credentials for
 the same principal for the entire duration of the mount.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

b4454fe1

NFSv4: Send RENEW requests to the server only when we're holding state · 58d9714a

Trond Myklebust authored Jan 03, 2006

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

58d9714a

NFS: Convert instances of kernel_thread() to kthread() · 5043e900

Trond Myklebust authored Jan 03, 2006

 Convert private implementations in NFSv4 state recovery and delegation
 code to use kthreads.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

5043e900

NFSv4: State recovery cleanup · 433fbe4c

Trond Myklebust authored Jan 03, 2006

 Use wait_on_bit() when waiting for state recovery to complete.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

433fbe4c

NFSv4: OPEN/LOCK/LOCKU/CLOSE will automatically renew the NFSv4 lease · 26e976a8

Trond Myklebust authored Jan 03, 2006

 Cut down on the number of unnecessary RENEW requests on the wire.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

26e976a8

SUNRPC: Ensure that SIGKILL will always terminate a synchronous RPC call. · 2bd61579

Trond Myklebust authored Jan 03, 2006

 ...and make sure that the "intr" flag also enables SIGHUP and SIGTERM to
 interrupt RPC calls too (as per the Solaris implementation).
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

2bd61579

NFSv4: Make DELEGRETURN an interruptible operation. · fe650407

Trond Myklebust authored Jan 03, 2006

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

fe650407

NFSv4: Convert LOCK rpc call into an asynchronous RPC call · a5d16a4d

Trond Myklebust authored Jan 03, 2006

 In order to allow users to interrupt/cancel it.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

a5d16a4d

NFSv4: locking XDR cleanup · 911d1aaf

Trond Myklebust authored Jan 03, 2006

 Get rid of some unnecessary intermediate structures
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

911d1aaf

NFSv4: Make open recovery track O_RDWR, O_RDONLY and O_WRONLY correctly · 864472e9

Trond Myklebust authored Jan 03, 2006

When recovering from a delegation recall or a network partition, we need
to replay open(O_RDWR), open(O_RDONLY) and open(O_WRONLY) separately.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

864472e9

NFSv4: Make nfs4_state track O_RDWR, O_RDONLY and O_WRONLY separately · e7616923

Trond Myklebust authored Jan 03, 2006

 A closer reading of RFC3530 reveals that OPEN_DOWNGRADE must always
 specify access modes that have been the argument of a previous OPEN
 operation.
 IOW: doing OPEN(O_RDWR) and then OPEN_DOWNGRADE(O_WRONLY) is forbidden
 unless the user called OPEN(O_WRONLY)

 In order to fix that, we really need to track the three possible open
 states separately.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

e7616923
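
The OPEN_DOWNGRADE constraint in the entry above can be illustrated by keeping one counter per access mode. This is a minimal sketch, assuming hypothetical names (`demo_open_state`, `demo_record_open`, `demo_downgrade_allowed`); it is not the layout used by the kernel's nfs4_state.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state: how many OPENs were issued with each access mode. */
struct demo_open_state {
    unsigned n_rdonly;
    unsigned n_wronly;
    unsigned n_rdwr;
};

/* Record an OPEN according to the access mode in its flags. */
void demo_record_open(struct demo_open_state *st, int flags)
{
    assert(st != NULL);
    switch (flags & O_ACCMODE) {
    case O_RDONLY: st->n_rdonly++; break;
    case O_WRONLY: st->n_wronly++; break;
    case O_RDWR:   st->n_rdwr++;   break;
    }
}

/* A downgrade target is only valid if that exact mode was the argument
 * of an earlier OPEN. */
bool demo_downgrade_allowed(const struct demo_open_state *st, int flags)
{
    assert(st != NULL);
    switch (flags & O_ACCMODE) {
    case O_RDONLY: return st->n_rdonly > 0;
    case O_WRONLY: return st->n_wronly > 0;
    case O_RDWR:   return st->n_rdwr > 0;
    }
    return false;
}
```

After only an `O_RDWR` open, a downgrade to `O_WRONLY` is refused; once an `O_WRONLY` open has also been recorded, it is allowed. A single combined "read and write" flag could not tell these two situations apart.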

NFSv4: Make open_confirm() asynchronous too · cdd4e68b

Trond Myklebust authored Jan 03, 2006

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

cdd4e68b

NFSv4: Convert open() into an asynchronous RPC call · 24ac23ab

Trond Myklebust authored Jan 03, 2006

 OPEN is a stateful operation, so we must ensure that it always
 completes. In order to allow users to interrupt the operation,
 we need to make the RPC call asynchronous, and then wait on
 completion (or cancel).
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

24ac23ab

SUNRPC: rpc_execute should not return task->tk_status; · e60859ac

Trond Myklebust authored Jan 03, 2006

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

e60859ac

SUNRPC: Get rid of some unused exports · 89991c24

Trond Myklebust authored Jan 03, 2006

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

89991c24

NFSv4: Allocate OPEN call RPC arguments using kmalloc() · e56e0b78

Trond Myklebust authored Jan 03, 2006

 Cleanup in preparation for making OPEN calls interruptible by the user.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

e56e0b78