- 03 Oct, 2008 9 commits
-
-
Chuck Lever authored
lockd accepts SM_NOTIFY calls only from a privileged process on the local system. If lockd uses an AF_INET6 listener, the sender's address (i.e., the local rpc.statd) will be the IPv6 loopback address, not the IPv4 loopback address. Make sure the privilege test in nlmsvc_proc_sm_notify() and nlm4svc_proc_sm_notify() works for both AF_INET and AF_INET6 family addresses by refactoring the test into a helper and adding support for IPv6 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
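A user-space sketch of such a family-aware privilege test follows. It is not the kernel helper: the function name is hypothetical, and treating "privileged" as "source port below 1024" and "local" as "any 127.0.0.0/8 or ::1 address" are assumptions made for illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical helper: true only when the sender is a loopback address
 * using a reserved port (assumed here to mean port < 1024). */
static bool sender_is_local_privileged(const struct sockaddr *sap)
{
    switch (sap->sa_family) {
    case AF_INET: {
        const struct sockaddr_in *sin = (const struct sockaddr_in *)sap;
        /* Accept any address in 127.0.0.0/8. */
        return (ntohl(sin->sin_addr.s_addr) >> 24) == 127 &&
               ntohs(sin->sin_port) < 1024;
    }
    case AF_INET6: {
        const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)sap;
        /* Accept only ::1. */
        return IN6_IS_ADDR_LOOPBACK(&sin6->sin6_addr) &&
               ntohs(sin6->sin6_port) < 1024;
    }
    default:
        /* Unknown address families are never trusted. */
        return false;
    }
}
```

Putting both families behind one helper means the two SM_NOTIFY procedures can share a single check instead of each casting to struct sockaddr_in.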
-
Chuck Lever authored
Adjust the signature and callers of nlmclnt_grant() to pass a "struct sockaddr *" instead of a "struct sockaddr_in *" in order to support IPv6 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
Fix up nlmsvc_lookup_host() to pass AF_INET6 source addresses to nlm_lookup_host(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
Pass a struct sockaddr * and a length to nlmclnt_lookup_host() to accommodate non-AF_INET family addresses. As a side benefit, eliminate the hostname_len argument, as the hostname is always NUL-terminated. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
Use struct sockaddr * and length in nlm_lookup_host_info to allow callers to pass in either AF_INET or AF_INET6 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
The nlm_lookup_host() function already has a large number of arguments, and I'm about to add a few more. As a clean up, convert the function to use a single data structure argument. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
J. Bruce Fields authored
The current lockd does not reject reclaims that arrive outside of the grace period. Accepting a reclaim means promising to the client that no conflicting locks were granted since last it held the lock. We can meet that promise if we assume the only lockers are nfs clients, and that they are sufficiently well-behaved to reclaim only locks that they held before, and that only reclaim locks have been permitted so far. Once we leave the grace period (and start permitting non-reclaims), we can no longer keep that promise. So we must start rejecting reclaims at that point. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
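The rule argued for above can be sketched as a small decision function. The enum and function names are hypothetical, and the in-grace branch (denying non-reclaim requests) is inferred from "only reclaim locks have been permitted so far" rather than stated outright.

```c
#include <stdbool.h>

/* Hypothetical status codes, for illustration only. */
enum lock_status { LOCK_PROCEED, LOCK_DENIED_GRACE_PERIOD };

/* Decide whether a lock request may proceed, given whether the server
 * is in its grace period and whether the request is a reclaim. */
static enum lock_status check_grace(bool in_grace, bool is_reclaim)
{
    /* During grace, only reclaims are allowed. */
    if (in_grace && !is_reclaim)
        return LOCK_DENIED_GRACE_PERIOD;
    /* After grace, a conflicting lock may already have been granted,
     * so a late reclaim can no longer be honored. */
    if (!in_grace && is_reclaim)
        return LOCK_DENIED_GRACE_PERIOD;
    return LOCK_PROCEED;
}
```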
-
J. Bruce Fields authored
Do all the grace period checks in svclock.c. This simplifies the code a bit, and will ease some later changes. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
J. Bruce Fields authored
Rewrite grace period code to unify management of grace period across lockd and nfsd. The current code has lockd and nfsd cooperate to compute a grace period which is satisfactory to them both, and then individually enforce it. This creates a slight race condition, since the enforcement is not coordinated. It's also more complicated than necessary. Here instead we have lockd and nfsd each inform common code when they enter the grace period, and when they're ready to leave the grace period, and allow normal locking only after both of them are ready to leave. We also expect the locks_start_grace()/locks_end_grace() interface here to be simpler to build on for future cluster/high-availability work, which may require (for example) putting individual filesystems into grace, or enforcing grace periods across multiple cluster nodes. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
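A minimal user-space sketch of the shared bookkeeping described above: each lock manager reports when it enters and leaves grace, and normal locking resumes only once none remain. Only the names locks_start_grace() and locks_end_grace() come from the message; the struct layout, the signatures, the counter, and locks_in_grace() are assumptions for illustration, and the sketch ignores locking.

```c
#include <stdbool.h>

/* Hypothetical per-manager state (one instance for lockd, one for nfsd). */
struct lock_manager {
    bool in_grace;
};

/* Number of managers currently in their grace period. */
static int managers_in_grace;

static void locks_start_grace(struct lock_manager *lm)
{
    if (!lm->in_grace) {
        lm->in_grace = true;
        managers_in_grace++;
    }
}

static void locks_end_grace(struct lock_manager *lm)
{
    if (lm->in_grace) {
        lm->in_grace = false;
        managers_in_grace--;
    }
}

/* Normal (non-reclaim) locking is allowed only when this returns false. */
static bool locks_in_grace(void)
{
    return managers_in_grace > 0;
}
```

Because both daemons consult the same state, there is no window in which one has left grace while the other still enforces it.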
-
- 29 Sep, 2008 31 commits
-
-
Benny Halevy authored
Since commit ff7d9756 ("nfsd: use static memory for callback program and stats"), do_probe_callback uses a static callback program (NFS4_CALLBACK) rather than the one set in clp->cl_callback.cb_prog as passed in by the client in setclientid (4.0) or create_session (4.1). This patch introduces rpc_create_args.prognumber, which allows overriding program->number when creating an rpc_clnt. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Benny Halevy authored
Now that cb_stats are static (since commit ff7d9756) there's no need to clear them. Initially I thought it might make sense to do that on every callback probe, but since the stats are per-program and may be shared between several client callback instances, zeroing them out seems like the wrong thing to do. Note that that commit also introduced a bug: stats.program is cleared in the process and is not restored after the memset as it used to be. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
The RPCB XDR functions are used for multiple procedures. For instance, rpcb_encode_getaddr() is used for RPCB_GETADDR, RPCB_SET, and RPCB_UNSET. Make the XDR debug messages more generic so they are less confusing. And, unlike in other RPC consumers in the kernel, a single debug flag enables all levels of debug messages in the RPC bind client, including XDR debug messages. Since the XDR decoders already report success or failure in this case, remove redundant debug messages in the mid-level rpcb_register_call() function. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
With the new rpcbind code, a PMAP_UNSET will not have any effect on services registered via rpcbind v3 or v4. Implement a version of svc_unregister() that uses an RPCB_UNSET with an empty netid string to make sure we have cleared *all* entries for a kernel RPC service when shutting down, or before starting a fresh instance of the service. Use the new version only when CONFIG_SUNRPC_REGISTER_V4 is enabled; otherwise, the legacy PMAP version is used to ensure complete backwards-compatibility with the Linux portmapper daemon. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
Clean up: When doing an RPCB_SET, make the kernel's rpcb client use the shorthand "::" for the universal form of the IPv6 ANY address. Without this patch, rpcbind will advertise: 0000:0000:0000:0000:0000:0000:0000:0000.x.y This is cosmetic only. It cleans up the display of information from /sbin/rpcinfo. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
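For illustration, a universal address is the presentation-format host address followed by the port's high and low bytes as two decimal fields (the ".x.y" above). The user-space sketch below uses inet_ntop(), which already emits the "::" shorthand; format_uaddr6() is a hypothetical name, not the kernel function.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>

/* Hypothetical helper: write "<addr>.<port high byte>.<port low byte>"
 * into buf. The port is in host byte order. Returns 0 on success and
 * -1 on a conversion error or if buf is too small. */
static int format_uaddr6(const struct in6_addr *addr, unsigned int port,
                         char *buf, size_t len)
{
    char host[INET6_ADDRSTRLEN];
    int n;

    if (inet_ntop(AF_INET6, addr, host, sizeof(host)) == NULL)
        return -1;
    n = snprintf(buf, len, "%s.%u.%u", host,
                 (port >> 8) & 0xffu, port & 0xffu);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With the IPv6 ANY address and port 2049 (8 * 256 + 1) this yields "::.8.1".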
-
Chuck Lever authored
TI-RPC is a user-space library of RPC functions that replaces ONC RPC and allows RPC to operate in the new world of IPv6. TI-RPC combines the concept of a transport protocol (UDP and TCP) and a protocol family (PF_INET and PF_INET6) into a single identifier called a "netid." For example, "udp" means UDP over IPv4, and "udp6" means UDP over IPv6.
For rpcbind, then, the RPC service tuple that is registered and advertised is:
[RPC program, RPC version, service address and port, netid]
instead of
[RPC program, RPC version, port, protocol]
Service address is typically ANYADDR, but can be a specific address of one of the interfaces on a multi-homed host. The third item in the new tuple is expressed as a universal address.
The current Linux rpcbind implementation registers a netid for both protocol families when RPCB_SET is done for just the PF_INET6 version of the netid (i.e., udp6 or tcp6). So registering "udp6" causes a registration for "udp" to appear automatically as well. We've recently determined that this is incorrect behavior. In the TI-RPC world, "udp6" is not meant to imply that the registered RPC service handles requests from AF_INET as well, even if the listener socket does address mapping. "udp" and "udp6" are entirely separate capabilities, and must be registered separately.
The Linux kernel, unlike TI-RPC, leverages address mapping to allow a single listener socket to handle requests for both AF_INET and AF_INET6. This is still OK, but the kernel currently assumes registering "udp6" will cover "udp" as well. It registers only "udp6" for its AF_INET6 services, even though they handle both AF_INET and AF_INET6 on the same port. So svc_register() actually needs to register both "udp" and "udp6" explicitly (and likewise for TCP). Until rpcbind is fixed, the kernel can ignore the return code for the second RPCB_SET call.
Please merge this with commit 15231312: SUNRPC: Support IPv6 when registering kernel RPC services Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Olaf Kirch <okir@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
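As a rough sketch of the netid idea, the helpers below map a (family, protocol) pair to its netid and list the netids a listener should register, with an AF_INET6 listener registering both. The function names and the ordering of the two entries are assumptions for illustration, not the kernel's svc_register() code.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Map an address family and transport protocol to a TI-RPC netid.
 * Returns NULL for combinations this sketch does not know. */
static const char *netid_for(int family, int protocol)
{
    if (family != AF_INET && family != AF_INET6)
        return NULL;
    if (protocol == IPPROTO_UDP)
        return family == AF_INET6 ? "udp6" : "udp";
    if (protocol == IPPROTO_TCP)
        return family == AF_INET6 ? "tcp6" : "tcp";
    return NULL;
}

/* Hypothetical helper: fill out[] with the netids to register for one
 * listener and return how many there are. An AF_INET6 listener that
 * relies on address mapping serves IPv4 clients too, so it needs both. */
static size_t netids_to_register(int family, int protocol,
                                 const char *out[2])
{
    const char *primary = netid_for(family, protocol);

    if (primary == NULL)
        return 0;
    out[0] = primary;
    if (family == AF_INET6) {
        out[1] = netid_for(AF_INET, protocol);
        return 2;
    }
    return 1;
}
```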
-
Chuck Lever authored
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
Clean up: Having two separate functions doesn't add clarity, so eliminate one of them. Use contemporary kernel coding conventions where appropriate. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
Adopt an approach similar to the RPC server's auth cache (from Aurelien Charbon and Brian Haley). Note nlm_lookup_host()'s existing IP address hash function has the same issue with correctness on little-endian systems as the original IPv4 auth cache hash function, so I've also updated it with a hash function similar to the new auth cache hash function. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
Update the nlm_cmp_addr() helper to support AF_INET6 as well as AF_INET addresses. New version takes two "struct sockaddr *" arguments instead of "struct sockaddr_in *" arguments. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
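A user-space sketch of a family-aware comparison taking two "struct sockaddr *" arguments is shown below. The name addr_equal() is hypothetical, and comparing only the host address (ignoring port and scope ID) is an assumption about the intended semantics.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: true when both sockaddrs carry the same family
 * and the same host address. Ports are deliberately not compared. */
static bool addr_equal(const struct sockaddr *a, const struct sockaddr *b)
{
    if (a->sa_family != b->sa_family)
        return false;

    switch (a->sa_family) {
    case AF_INET: {
        const struct sockaddr_in *a4 = (const struct sockaddr_in *)a;
        const struct sockaddr_in *b4 = (const struct sockaddr_in *)b;
        return a4->sin_addr.s_addr == b4->sin_addr.s_addr;
    }
    case AF_INET6: {
        const struct sockaddr_in6 *a6 = (const struct sockaddr_in6 *)a;
        const struct sockaddr_in6 *b6 = (const struct sockaddr_in6 *)b;
        return memcmp(&a6->sin6_addr, &b6->sin6_addr,
                      sizeof(a6->sin6_addr)) == 0;
    }
    default:
        /* Unsupported families never match. */
        return false;
    }
}
```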
-
Chuck Lever authored
To store larger addresses in the nsm_handle structure, make sm_addr a sockaddr_storage. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
To store larger addresses in the nlm_host structure, make h_saddr a sockaddr_storage. And let's call it something more self-explanatory: "saddr" could easily be mistaken for "server address". Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
To store larger addresses in the nlm_host structure, make h_addr a sockaddr_storage, and add an address length field. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
Clean up: Add extra type safety and squelch a few compiler complaints in upcoming patches. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
Make sure an address family is specified for source addresses passed to nlm_lookup_host(). nlm_lookup_host() will need this when it becomes capable of dealing with AF_INET6 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
Knowing which source address is used for communicating with remote NLM services can be helpful for debugging configuration problems on hosts with multiple addresses. Keep the dprintk debugging here, but adapt it so it displays AF_INET6 addresses properly. There are also a couple of dprintk clean-ups. At some point we will aggregate the helpers that display presentation format addresses into a single set of shared helpers. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
We're about to introduce some extra debugging messages in nlm_lookup_host(). Bring the coding style up to date first so we can cleanly introduce the new debugging messages. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
In order to advertise NFS-related services on IPv6 interfaces via rpcbind, the kernel RPC server implementation must use rpcb_v4_register() instead of rpcb_register(). A new kernel build option allows distributions to use the legacy v2 call until they integrate an appropriate user-space rpcbind daemon that can support IPv6 RPC services. I tried adding some automatic logic to fall back if registering with a v4 protocol request failed, but there are too many corner cases. So I just made it a compile-time switch that distributions can throw when they've replaced portmapper with rpcbind. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
Create a separate server-level interface for unregistering RPC services. The mechanics of, and the API for, registering and unregistering RPC services will diverge further as support for IPv6 is added. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
Bruce suggested there's no need to expose the difference between an error sending the PMAP_SET request and an error reply from the portmapper to rpcb_register's callers. The user space equivalent of rpcb_register() is pmap_set(3), which returns a bool_t : either the PMAP set worked, or it didn't. Simple. So let's remove the "*okay" argument from rpcb_register() and rpcb_v4_register(), and simply return an error if any part of the call didn't work. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Chuck Lever authored
My plan is to use an AF_INET listener on systems that support only IPv4, and an AF_INET6 listener on systems that can support IPv6. Incoming IPv4 packets will be posted to an AF_INET6 listener with a mapped IPv4 address. Max Matveev <makc@sgi.com> says: Creating a single listener can be dangerous - if net.ipv6.bindv6only is enabled then it's possible to create another listener in v4 namespace on the same port and steal the traffic from the "unified" listener. You need to disable V6ONLY explicitly via a sockopt to stop that. Set appropriate socket option on RPC server listener sockets to prevent this. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
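In user space, the equivalent of what Max describes is clearing IPV6_V6ONLY on the AF_INET6 socket before binding, so the result does not depend on the net.ipv6.bindv6only sysctl. The sketch below follows the wording of the message; the function name is hypothetical and the in-kernel code path differs.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create an AF_INET6 socket of the given type
 * (SOCK_STREAM or SOCK_DGRAM) that also accepts IPv4 traffic as mapped
 * addresses, regardless of the system-wide bindv6only default.
 * Returns the descriptor, or -1 on failure. */
static int open_dual_stack_socket(int type)
{
    int off = 0;
    int fd = socket(AF_INET6, type, 0);

    if (fd < 0)
        return -1;
    /* Explicitly disable V6ONLY rather than trusting the default. */
    if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &off, sizeof(off)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The caller would then bind() to the IPv6 ANY address and listen() as usual.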
-
J. Bruce Fields authored
End lockd's grace period using schedule_delayed_work() instead of a check on every pass through the main loop. After a later patch, we'll depend on lockd to end its grace period even if it's not currently handling requests; so it shouldn't depend on being woken up from the main loop to do so. Also, Nakano Hiroaki (who independently produced a similar patch) noticed that the current behavior is buggy in the face of jiffies wraparound: "lockd uses time_before() to determine whether the grace period has expired. This would seem to be enough to avoid timer wrap-around issues, but, unfortunately, that is not the case. The time_* family of comparison functions can be safely used to compare jiffies relatively close in time, but they stop working after approximately LONG_MAX/2 ticks. nfsd can suffer this problem because the time_before() comparison in lockd() is not performed until the first request comes in, which means that if there is no lockd traffic for more than LONG_MAX/2 ticks we are screwed. "The implication of this is that once time_before() starts misbehaving any attempt from a NFS client to execute fcntl() will be received with a NLM_LCK_DENIED_GRACE_PERIOD message for 25 days (assuming HZ=1000). In other words, the 50 seconds grace period could turn into a grace period of 50 days or more. "Note: This bug was analyzed independently by Oda-san <oda@valinux.co.jp> and myself." Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Nakano Hiroaki <nakano.hiroaki@oss.ntt.co.jp> Cc: Itsuro Oda <oda@valinux.co.jp>
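The wraparound problem in the quoted analysis can be reproduced in user space. The sketch below mirrors the signed-difference comparison that time_before() performs; the function names are hypothetical, and the conversion of a large unsigned difference to long is assumed to wrap, as it does with GCC and Clang. With a 32-bit long and HZ=1000, LONG_MAX/1000/86400 is roughly 24.9, which is where the "25 days" figure comes from.

```c
#include <limits.h>
#include <stdbool.h>

/* Same idea as the kernel's time_before(a, b): compare two tick counts
 * through a signed difference so that small wraparounds are harmless. */
static bool tick_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/* A lazy grace-period check of the kind the message calls buggy: it is
 * only correct if it is evaluated within LONG_MAX ticks of grace_end. */
static bool in_grace_period(unsigned long now, unsigned long grace_end)
{
    return tick_before(now, grace_end);
}
```

Once `now` is more than LONG_MAX ticks past `grace_end`, the signed difference turns negative and the check reports that the grace period is still running. Ending the grace period from a timer, as this patch does, avoids the late comparison entirely.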
-
J. Bruce Fields authored
The check here is currently harmless but unnecessary, since, as the comment notes, there aren't any blocked-lock callbacks to process during the grace period anyway. And eventually we want to allow multiple grace periods that come and go for different filesystems over the course of the lifetime of lockd, at which point this check is just going to get in the way. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Jeff Layton authored
I had a report from someone building a large NFS server that they were unable to start more than 585 nfsd threads. It was reported against an older kernel using the slab allocator, and I tracked it down to the large allocation in nfsd_racache_init failing. It appears that the slub allocator handles large allocations better, but large contiguous allocations can often be problematic. There doesn't seem to be any reason that the racache has to be allocated as a single large chunk. This patch breaks this up so that the racache is built up from separate allocations. (Thanks also to Takashi Iwai for a bugfix.) Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Takashi Iwai <tiwai@suse.de>
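The general technique (many small allocations instead of one large contiguous one) can be sketched in user space as follows. The node type and function names are hypothetical, and the real racache layout is not reproduced here.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cache entry; each one is allocated on its own. */
struct ra_node {
    struct ra_node *next;
    int payload;
};

/* Free every node in a list built by ra_list_alloc(). */
static void ra_list_free(struct ra_node *head)
{
    while (head != NULL) {
        struct ra_node *next = head->next;
        free(head);
        head = next;
    }
}

/* Build a list of `count` zeroed nodes with one small allocation per
 * node, so no single large contiguous block is ever required.
 * Returns NULL if count is 0 or if any allocation fails. */
static struct ra_node *ra_list_alloc(size_t count)
{
    struct ra_node *head = NULL;

    for (size_t i = 0; i < count; i++) {
        struct ra_node *node = calloc(1, sizeof(*node));
        if (node == NULL) {
            /* Undo the partial build rather than leak it. */
            ra_list_free(head);
            return NULL;
        }
        node->next = head;
        head = node;
    }
    return head;
}
```

The cost is one pointer per entry and some loss of locality; the benefit is that allocation no longer depends on finding a large run of contiguous free memory.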
-
Benny Halevy authored
Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Benny Halevy authored
Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Benny Halevy authored
After switching to the encode_stateid helper, the "p" pointer declared by ENCODE_SEQID_OP_HEAD triggers an unused-variable warning. In the single site where it is still needed it can be declared separately using the ENCODE_HEAD macro. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Benny Halevy authored
Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Benny Halevy authored
The first reservation in nfsd4_encode_open is currently for 36 + sizeof(stateid_t) bytes, but after the stateid it writes a cinfo (20 bytes) and five more 4-byte words (20 bytes), for a total of 40 + sizeof(stateid_t). Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Benny Halevy authored
Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-