Commits · e005e8041c132af9f70862e1387a222198f95e7f · nexedi / linux

23 Dec, 2008 32 commits

NFSv4: Rename the state reclaimer thread · e005e804

Trond Myklebust authored Dec 23, 2008

It is really a more general purpose state management thread at this point.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: Clean up NFS4ERR_CB_PATH_DOWN error management... · 707fb4b3

Trond Myklebust authored Dec 23, 2008

Add a delegation cleanup phase to the state management loop, and do the
NFS4ERR_CB_PATH_DOWN recovery there.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: Clean up the support for returning multiple delegations · 515d8611

Trond Myklebust authored Dec 23, 2008

Add a flag to mark delegations as requiring return, then run a garbage
collector. In the future, this will allow for more flexible delegation
management, where delegations may be marked for return if it turns out
that they are not being referenced.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: Add recovery for individual stateids · 9e33bed5

Trond Myklebust authored Dec 23, 2008

NFSv4 defines a number of state errors which the client does not currently
handle. Among those we should worry about are:
  NFS4ERR_ADMIN_REVOKED - the server's administrator revoked our locks
                          and/or delegations.
  NFS4ERR_BAD_STATEID   - the client and server are out of sync, possibly
                          due to a delegation return racing with an OPEN
                          request.
  NFS4ERR_OPENMODE      - the client attempted to do something not sanctioned
                          by the open mode of the stateid. This should normally
                          occur only as a result of a delegation return race.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: Remove nfs_client->cl_sem · 95d35cb4

Trond Myklebust authored Dec 23, 2008

Now that we're using the flags to indicate state that needs to be
recovered, as well as having implemented proper refcounting and spinlocking
on the state and open_owners, we can get rid of nfs_client->cl_sem. The
only remaining case that was dubious was the file locking, and that case is
now covered by the nfsi->rwsem.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: Ensure that file unlock requests don't conflict with state recovery · 19e03c57

Trond Myklebust authored Dec 23, 2008

The unlock path is currently failing to take the nfs_client->cl_sem read
lock, and hence the recovery path may see locks disappear from underneath
it.
Also ensure that it takes the nfs_inode->rwsem read lock so that there
is no conflict with delegation recalls.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: Remove the unnecessary argument to nfs4_wait_clnt_recover() · 65de872e

Trond Myklebust authored Dec 23, 2008

...and move some code around in order to clear out an unnecessary
forward declaration.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: Ensure that nfs4_reclaim_open_state() doesn't depend on cl_sem · fe1d8195

Trond Myklebust authored Dec 23, 2008

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: Add a recovery marking scheme for state owners · 7eff03ae

Trond Myklebust authored Dec 23, 2008

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: Don't tell server we rebooted when not necessary · 0f605b56

Trond Myklebust authored Dec 23, 2008

Instead of doing a full setclientid, try doing a RENEW call first.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: Remove redundant RENEW calls if we know the lease has expired · e598d843

Trond Myklebust authored Dec 23, 2008

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: Fix state recovery when the client runs over the grace period · b79a4a1b

Trond Myklebust authored Dec 23, 2008

If the client for some reason is not able to recover all its state within
the time allotted for the grace period, and the server reboots again, the
client is not allowed to recover the state that was 'lost' using reboot
recovery.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: Callers to nfs4_get_renew_cred() need to hold nfs_client->cl_lock · 6dc9d57a

Trond Myklebust authored Dec 23, 2008

Ditto for nfs4_get_setclientid_cred().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: Clean up for the state loss reclaimer · 02860014

Trond Myklebust authored Dec 23, 2008

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: Use atomic bitops when changing struct nfs_delegation->flags · 15c831bf

Trond Myklebust authored Dec 23, 2008

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: Fix up the dereferencing of delegation->inode · 86e89489

Trond Myklebust authored Dec 23, 2008

Without an extra lock, we cannot just assume that the delegation->inode is
valid when we're traversing the rcu-protected nfs_client lists. Use the
delegation->lock to ensure that it is truly valid.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: Fix up another delegation related race · 34310430

Trond Myklebust authored Dec 23, 2008

When we call update_open_stateid(), we need to be certain that we don't
race with a delegation return. While we could do this by grabbing the
nfs_client->cl_lock, a dedicated spin lock in the delegation structure
will scale better.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NLM: allow lockd requests from an unprivileged port · 0cb2659b

Chuck Lever authored Dec 23, 2008

If the admin has specified the "noresvport" option for an NFS mount
point, the kernel's NFS client uses an unprivileged source port for
the main NFS transport.  The kernel's lockd client should use an
unprivileged port in this case as well.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: "[no]resvport" mount option changes mountd client too · 50a737f8

Chuck Lever authored Dec 23, 2008

If the admin has specified the "noresvport" option for an NFS mount
point, the kernel's NFS client uses an unprivileged source port for
the main NFS transport.  The kernel's mountd client should use an
unprivileged port in this case as well.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: add "[no]resvport" mount option · d740351b

Chuck Lever authored Dec 23, 2008

The standard default security setting for NFS is AUTH_SYS.  An NFS
client connects to NFS servers via a privileged source port and a
fixed standard destination port (2049).  The client sends raw uid and
gid numbers to identify users making NFS requests, and the server
assumes an appropriate authority on the client has vetted these
values because the source port is privileged.

On Linux, by default in-kernel RPC services use a privileged port in
the range between 650 and 1023 to avoid using source ports of well-
known IP services.  Using such a small range limits the number of NFS
mount points and the number of unique NFS servers to which a client
can connect concurrently.

An NFS client can use unprivileged source ports to expand the range of
source port numbers, allowing more concurrent server connections and
more NFS mount points.  Servers must explicitly allow NFS connections
from unprivileged ports for this to work.

In the past, bumping the value of the sunrpc.max_resvport sysctl on
the client would permit the NFS client to use unprivileged ports.
Bumping this setting also changes the maximum port number used by
other in-kernel RPC services, some of which still required a port
number less than 1024.

This is exacerbated by the way source port numbers are chosen by the
Linux RPC client, which starts at the top of the range and works
downwards.  As a result, once the maximum is bumped, all RPC services
requesting a source port will likely get an unprivileged port instead
of a privileged one.

Changing this setting affects all NFS mount points on a client.  A
sysadmin could not selectively choose which mount points would use
non-privileged ports and which would not.

Lastly, this mechanism of expanding the limit on the number of NFS
mount points was entirely undocumented.

To address the need for the NFS client to use a large range of source
ports without interfering with the activity of other in-kernel RPC
services, we introduce a new NFS mount option.  This option explicitly
tells only the NFS client to use a non-privileged source port when
communicating with the NFS server for one specific mount point.

This new mount option is called "resvport," like the similar NFS mount
option on FreeBSD and Mac OS X.  A sister patch for nfs-utils will be
submitted that documents this new option in nfs(5).

The default setting for this new mount option requires the NFS client
to use a privileged port, as before.  Explicitly specifying the
"noresvport" mount option allows the NFS client to use an unprivileged
source port for this mount point when connecting to the NFS server
port.

This mount option is supported only for text-based NFS mounts.

[ Sidebar: it is widely known that security mechanisms based on the
  use of privileged source ports are ineffective.  However, the NFS
  client can combine the use of unprivileged ports with the use of
  secure authentication mechanisms, such as Kerberos.  This allows a
  large number of connections and mount points while ensuring a useful
  level of security.

  Eventually we may change the default setting for this option
  depending on the security flavor used for the mount.  For example,
  if the mount is using only AUTH_SYS, then the default setting will
  be "resvport"; if the mount is using a strong security flavor such
  as krb5, the default setting will be "noresvport". ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[Trond.Myklebust@netapp.com: Fixed a bug whereby nfs4_init_client()
was being called with incorrect arguments.]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: move nfs_server flag initialization · 542fcc33

Chuck Lever authored Dec 23, 2008

Make it possible for the NFSv4 mount set up logic to pass mount option
flags down the stack to nfs_create_rpc_client().

This is immediately useful if we want NFS mount options to modulate
settings of the underlying RPC transport, but it may be useful at some
later point if other parts of the NFSv4 mount initialization logic
want to know what the mount options are.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: expand flags passed to nfs_create_rpc_client() · 4a01b8a4

Chuck Lever authored Dec 23, 2008

The nfs_create_rpc_client() function sets up an RPC client for an NFS
mount point.  Add an option that allows it to set up an RPC transport
from an unprivileged port.

Instead of having nfs_create_rpc_client()'s callers retain local
knowledge about how to set up an RPC client, create a couple of flag
arguments to control the use of RPC_CLNT_CREATE flags.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: introduce nfs_mount_info struct for calling nfs_mount() · c5d120f8

Chuck Lever authored Dec 23, 2008

Clean up: convert nfs_mount() to take a single data structure argument to make
it simpler to add more arguments.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: Move declaration of nfs_mount() to fs/nfs/internal.h · 146ec944

Chuck Lever authored Dec 23, 2008

Clean up:  The nfs_mount() function is not to be used outside of the
NFS client.  Move its public declaration to fs/nfs/internal.h.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: rename nfs_path variable · 7b5d2b98

Chuck Lever authored Dec 23, 2008

Clean up: I'm about to move the declaration of nfs_mount into
fs/nfs/internal.h and include it in fs/nfs/nfsroot.c.  There's a
conflicting definition of nfs_path in fs/nfs/internal.h and
fs/nfs/nfsroot.c, so rename the private one.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

lockd: convert reclaimer thread to kthread interface · df94f000

Jeff Layton authored Dec 23, 2008

My understanding is that there is a push to turn the kernel_thread
interface into a non-exported symbol and move all kernel threads to use
the kthread API. This patch changes lockd to use kthread_run to spawn
the reclaimer thread.

I've made the assumption here that the extra module references taken
when we spawn this thread are unnecessary and removed them. I've also
added a KERN_ERR printk that pops if the thread can't be spawned to warn
the admin that the locks won't be reclaimed.

In the future, it would be nice to be able to notify userspace that
locks have been lost (probably by implementing SIGLOST), and to add some
good policies about how long we should keep retrying to reclaim the locks.

Finally, I removed a comment about memory leaks that I believe is
obsolete and added a new one to clarify the result of sending a SIGKILL
to the reclaimer thread. As best I can tell, doing so doesn't actually
cause a memory leak.

I consider this patch 2.6.29 material.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

LOCKD: Make lockd_up() and lockd_down() exported GPL-only · 2de59872

Trond Myklebust authored Dec 23, 2008

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: nfsacl_encode/nfsacl_decode should be exported as GPL-only · d716f0b8

Trond Myklebust authored Dec 23, 2008

Again, this has never been intended as a public abi for out-of-tree
modules.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: rpcsec_gss modules should not be used by out-of-tree code · 7bd88269

Trond Myklebust authored Dec 23, 2008

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Convert the xdr helpers and rpc_pipefs to EXPORT_SYMBOL_GPL · 468039ee

Trond Myklebust authored Dec 23, 2008

We've never considered the sunrpc code as part of any ABI to be used by
out-of-tree modules.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Remove the last remnant of the BKL... · 88a9fe8c

Trond Myklebust authored Dec 23, 2008

Somehow, this escaped the previous purge. There should be no need to keep
any extra locks in the XDR callbacks.

The NFS client XDR code only writes into private objects, whereas all reads
of shared objects are confined to fields that do not change, such as
filehandles...

Ditto for lockd, the NFSv2/v3 client mount code, and rpcbind.

The nfsd XDR code may require the BKL, but since it does a synchronous RPC
call from a thread that already holds the lock, that issue is moot.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

nfs: remove redundant tests on reading new pages · 136221fc

Wu Fengguang authored Dec 23, 2008

aops->readpages() and its NFS helper readpage_async_filler() will only
be called to do readahead I/O for newly allocated pages. So it's not
necessary to test for the always 0 dirty/uptodate page flags.

The removal of nfs_wb_page() call also fixes a readahead bug: the NFS
readahead has been synchronous since 2.6.23, because that call will
clear PG_readahead, which is the reminder for asynchronous readahead.

More background: the PG_readahead page flag is shared with PG_reclaim,
one for the read path and the other for the write path.
clear_page_dirty_for_io() unconditionally clears PG_readahead to prevent
possible readahead residuals, assuming that it is always called in the
write path. However, NFS is the one and only exception in that it _always_
calls clear_page_dirty_for_io() in the read path, i.e. for readpages()/readpage().

Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Wu Fengguang <wfg@linux.intel.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

20 Dec, 2008 4 commits

Null pointer deref with hrtimer_try_to_cancel() · 3d44cc3e

Thomas Gleixner authored Dec 20, 2008

Impact: Prevent kernel crash with posix timer clockid CLOCK_MONOTONIC_RAW

commit 2d42244a (clocksource: introduce CLOCK_MONOTONIC_RAW) introduced a
new clockid, which is only available to read out the raw, non-NTP-adjusted
system time.

The above commit did not prevent a posix timer from being created with
that clockid. The timer_create() syscall succeeds and initializes the
timer to a nonexistent hrtimer base. When the timer is deleted, either
by timer_delete() or by the exit() cleanup, the kernel crashes.

Prevent the creation of timers for CLOCK_MONOTONIC_RAW by setting the
posix clock function to no_timer_create which returns an error code.
Reported-and-tested-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs · ab653872

Linus Torvalds authored Dec 20, 2008

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  fs/9p: change simple_strtol to simple_strtoul
  9p: convert d_iname references to d_name.name
  9p: Remove potentially bad parameter from function entry debug print.

Merge branch 'x86-fixes-for-linus' of... · e6a997ed

Linus Torvalds authored Dec 20, 2008

Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: fix resume (S2R) broken by Intel microcode module, on A110L
  x86 gart: don't complain if no AMD GART found
  AMD IOMMU: panic if completion wait loop fails
  AMD IOMMU: set cmd buffer pointers to zero manually
  x86: re-enable MCE on secondary CPUS after suspend/resume
  AMD IOMMU: allocate rlookup_table with __GFP_ZERO

x86: fix resume (S2R) broken by Intel microcode module, on A110L · 280a9ca5

Dmitry Adamushko authored Dec 20, 2008

Impact: fix deadlock

This is in response to the following bug report:

Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=12100
Subject         : resume (S2R) broken by Intel microcode module, on A110L
Submitter       : Andreas Mohr <andi@lisas.de>
Date            : 2008-11-25 08:48 (19 days old)
Handled-By      : Dmitry Adamushko <dmitry.adamushko@gmail.com>

[ The deadlock scenario has been discovered by Andreas Mohr ]

I think I might have a logical explanation why the system:

  (http://bugzilla.kernel.org/show_bug.cgi?id=12100)

might hang upon resuming, although in that case it would likely have hung each and every time.

(1) possible deadlock in microcode_resume_cpu() if either 'if' section is
taken;

(2) now, I don't see it in the spec and can't verify it experimentally (newer
ucodes don't seem to be available for my Core2duo)... but logically, I'd
think that when read upon resuming, the 'microcode revision' (MSR 0x8B) should
be back to its original value (we need to reload the ucode anyway, so it
wouldn't seem logical for a cpu to keep the new version)... if so, the
comparison with memcmp() for the full 'struct cpu_signature' is wrong... and
that's how one of the aforementioned 'if' sections might have been triggered,
leading to a deadlock.

Obviously, in my tests I simulated loading/resuming with the ucode of the same
version (just to see that the file is loaded/re-loaded upon resuming) so this
issue has never popped up.

I'd appreciate it if someone with an appropriate system could give the
2nd patch (titled "fix a comparison && deadlock...") a try.

In any case, the deadlock situation is a must-have fix.
Reported-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Tested-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

19 Dec, 2008 4 commits

fs/9p: change simple_strtol to simple_strtoul · f1d9e458

Julia Lawall authored Dec 19, 2008

Since v9ses->uid is unsigned, it would seem better to use simple_strtoul than
simple_strtol.

A simplified version of the semantic patch that makes this change is as
follows: (http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@r2@
long e;
position p;
@@

e = simple_strtol@p(...)

@@
position p != r2.p;
type T;
T e;
@@

e =
- simple_strtol@p
+ simple_strtoul
  (...)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Eric Van Hensbergen <ericvh@gmail.com>

9p: convert d_iname references to d_name.name · 7dd0cdc5

Wu Fengguang authored Dec 19, 2008

d_iname is rubbish for long file names.
Use d_name.name in printks instead.
Signed-off-by: Wu Fengguang <wfg@linux.intel.com>
Acked-by: Eric Van Hensbergen <ericvh@gmail.com>

9p: Remove potentially bad parameter from function entry debug print. · 6ff23207

Duane Griffin authored Dec 19, 2008

Signed-off-by: Duane Griffin <duaneg@dghda.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6 · 9a1d1035

Linus Torvalds authored Dec 19, 2008

* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] mpt fusion: clear list of outstanding commands on host reset
  [SCSI] scsi_lib: only call scsi_unprep_request() under queue lock
  [SCSI] ibmvstgt: move crq_queue_create to the end of initialization
  [SCSI] libiscsi REGRESSION: fix passthrough support with older iscsi tools
  [SCSI] aacraid: disable Dell Percraid quirk on Adaptec 2200S and 2120S
