Commits · fdd1e74c89fe39259a29c494209abad63ff76f82 · Kirill Smelkov / linux

19 Apr, 2008 5 commits

NFS: Ensure that the read code cleans up properly when rpc_run_task() fails · fdd1e74c

Trond Myklebust authored Apr 15, 2008

In the case of readpage() we need to ensure that the pages get unlocked,
and that the error is flagged.

In the case of O_DIRECT, we need to ensure that the pages are all released.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

fdd1e74c

NFS: Fix nfs_wb_page() to always exit with an error or a clean page · 73e3302f

Trond Myklebust authored Apr 11, 2008

It is possible for nfs_wb_page() to sometimes exit with 0 return value, yet
the page is left in a dirty state.
For instance, if the server rebooted and the COMMIT request failed, then
all the previously "clean" pages that were cached by the server, but were
not guaranteed to have been written out to disk, have to be redirtied and
resent to the server.
The fix is to have nfs_wb_page_priority() check that the page is clean
before it exits...

This fixes a condition that triggers the BUG_ON(PagePrivate(page)) in
nfs_create_request() when we're in the nfs_readpage() path.

Also eliminate a redundant BUG_ON(!PageLocked(page)) while we're at it. It
turns out that clear_page_dirty_for_io() has the exact same test.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

73e3302f
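
The "exit with an error or a clean page" rule from the commit above can be pictured as a loop that keeps flushing until the page is clean. This is a minimal user-space sketch, not the kernel code: `struct page_model`, `flush_once()` and `wb_page()` are hypothetical names, and the "redirty after a failed COMMIT" behaviour is simulated with a counter.

```c
#include <stdbool.h>

/* Hypothetical model of a page; not a kernel structure. */
struct page_model {
    bool dirty;          /* page still has data to write back */
    int  redirties_left; /* simulated: flushes that end with the page redirtied */
    int  fail_with;      /* simulated: negative errno returned by the flush, or 0 */
};

/* One flush attempt. It can succeed and still leave the page dirty. */
static int flush_once(struct page_model *p)
{
    if (p->fail_with < 0)
        return p->fail_with;
    if (p->redirties_left > 0) {
        p->redirties_left--;
        p->dirty = true;   /* e.g. COMMIT failed, page was redirtied */
    } else {
        p->dirty = false;
    }
    return 0;
}

/* Returns 0 only when the page is clean; otherwise returns the error. */
static int wb_page(struct page_model *p)
{
    while (p->dirty) {
        int ret = flush_once(p);
        if (ret < 0)
            return ret;
    }
    return 0;
}
```

With this shape, a return value of 0 implies the page is clean, which is the invariant the commit message asks for.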

SUNRPC: Don't attempt to destroy expired RPCSEC_GSS credentials.. · 080a1f14

Trond Myklebust authored Apr 19, 2008

..and always destroy using a 'soft' RPC call. Destroying GSS credentials
isn't mandatory; the server can always cope with a few credentials not
getting destroyed in a timely fashion.

This actually fixes a hang situation. Basically, some servers will decide
that the client is crazy if it tries to destroy an RPC context for which
they have sent an RPCSEC_GSS_CREDPROBLEM, and so will refuse to talk to it
for a while.
The regression was therefore probably introduced by commit
0df7fb74.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

080a1f14

SUNRPC: Fix up xprt_write_space() · b6ddf64f

Trond Myklebust authored Apr 17, 2008

The rest of the networking layer uses SOCK_ASYNC_NOSPACE to signal whether
or not we have someone waiting for buffer memory. Convert the SUNRPC layer
to use the same idiom.
Remove the unlikely()s in xs_udp_write_space and xs_tcp_write_space. In
fact, the most common case will be that there is nobody waiting for buffer
space.

SOCK_NOSPACE is there to tell the TCP layer whether or not the cwnd was
limited by the application window. Ensure that we follow the same idiom as
the rest of the networking layer here too.

Finally, ensure that we clear SOCK_ASYNC_NOSPACE once we wake up, so that
write_space() doesn't keep waking things up on xprt->pending.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

b6ddf64f
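
The flag handling described in the commit above (mark that someone is waiting, then clear the mark on wake-up so later callbacks do nothing) can be sketched in user space as follows. The struct, the flag value and the function names are hypothetical stand-ins, not the kernel's socket API.

```c
#include <stdbool.h>

#define ASYNC_NOSPACE 0x1u  /* hypothetical bit: "a sender is waiting for buffer space" */

/* Hypothetical socket model; not struct sock. */
struct sock_model {
    unsigned int flags;
    int wakeups;            /* how many times a waiter was woken */
};

/* Called by a sender that ran out of buffer space. */
static void wait_for_space(struct sock_model *s)
{
    s->flags |= ASYNC_NOSPACE;
}

/* Called when the socket may have become writeable again. */
static void write_space(struct sock_model *s, bool writeable)
{
    if (!writeable)
        return;
    if (!(s->flags & ASYNC_NOSPACE))
        return;                   /* common case: nobody is waiting */
    s->flags &= ~ASYNC_NOSPACE;   /* clear first, so repeat calls do not wake again */
    s->wakeups++;
}
```

Calling `write_space()` twice after a single `wait_for_space()` produces one wake-up, not two.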

SUNRPC: Fix a bug in call_decode() · 24b74bf0

Trond Myklebust authored Apr 19, 2008

call_verify() can, under certain circumstances, free the RPC slot. In that
case, our cached pointer 'req = task->tk_rqstp' is invalid. Bug was
introduced in commit 220bcc2a (SUNRPC:
Don't call xprt_release in call refresh).
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

24b74bf0
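
The hazard in the commit above is a cached pointer that outlives the object it refers to. One way to avoid it, shown here as a user-space sketch under assumed names (`struct task`, `struct slot`, `verify()` and `decode()` are hypothetical, not the SUNRPC code), is to read the pointer only after the call that may free the slot has returned.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the RPC task and its request slot. */
struct slot { int reply_len; };
struct task { struct slot *rqstp; };

/* Like call_verify() in the description: some paths release the slot. */
static int verify(struct task *t, int release)
{
    if (release) {
        free(t->rqstp);
        t->rqstp = NULL;   /* any copy of the old pointer is now stale */
        return -1;
    }
    return 0;
}

/* Safe pattern: load the slot pointer only after verify() has returned. */
static int decode(struct task *t, int release)
{
    if (verify(t, release) < 0)
        return -1;         /* the slot may be gone; do not touch it */

    struct slot *req = t->rqstp;
    return req->reply_len;
}
```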

19 Mar, 2008 35 commits

lockd: introduce new function to encode private argument in SM_MON requests · 0490a54a

Chuck Lever authored Mar 14, 2008

Clean up: refactor the encoding of the opaque 16-byte private argument in
xdr_encode_mon(). This will be updated later to support IPv6 addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

0490a54a

lockd: Fix up incorrect RPC buffer size calculations. · 2ca7754d

Chuck Lever authored Mar 14, 2008

Switch to using the new mon_id encoder function.

Now that we've refactored the encoding of SM_MON requests, we've
discovered that the pre-computed buffer length maximums are
incorrect!
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

2ca7754d

lockd: document use of mon_id argument in SM_MON requests · ea72a7f1

Chuck Lever authored Mar 14, 2008

Clean up: document the argument type that xdr_encode_common() is
marshalling by introducing a new function.  The new function will replace
xdr_encode_common() in just a sec.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

ea72a7f1

lockd: refactor SM_MON my_id argument encoder · 850c95fd

Chuck Lever authored Mar 14, 2008

Clean up: introduce a new XDR encoder specifically for the my_id
argument of SM_MON requests.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

850c95fd

lockd: refactor SM_MON mon_name argument encoder · 49695174

Chuck Lever authored Mar 14, 2008

Clean up: introduce a new XDR encoder specifically for the mon_name
argument of SM_MON requests.  This will be updated later to support IPv6
addresses in addition to IPv4 addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

49695174

lockd: Ensure NSM strings aren't longer than protocol allows · 099bd05f

Chuck Lever authored Mar 14, 2008

Introduce a special helper function to check the length of NSM strings
before they are placed on the wire.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

099bd05f
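
A length check before encoding, as described in the commit above, might look like the sketch below. It is a user-space illustration only: `encode_nsm_string()` is a hypothetical helper, the limit of 1024 is assumed from the NSM protocol's `SM_MAXSTRLEN`, and the wire layout follows the usual XDR string form (4-byte big-endian length, then data zero-padded to a multiple of 4).

```c
#include <stddef.h>
#include <string.h>

#define SM_MAXSTRLEN 1024  /* assumed protocol maximum for NSM strings */

/*
 * Encode s as an XDR string into buf, which holds cap bytes.
 * Returns the number of bytes written, or -1 if the string exceeds the
 * protocol limit or does not fit in the buffer.
 */
static long encode_nsm_string(unsigned char *buf, size_t cap, const char *s)
{
    size_t len = strlen(s);
    size_t padded = (len + 3) & ~(size_t)3;  /* round up to 4-byte boundary */

    if (len > SM_MAXSTRLEN)
        return -1;            /* refuse before anything reaches the wire */
    if (cap < 4 + padded)
        return -1;

    buf[0] = (unsigned char)(len >> 24);
    buf[1] = (unsigned char)(len >> 16);
    buf[2] = (unsigned char)(len >> 8);
    buf[3] = (unsigned char)len;
    memcpy(buf + 4, s, len);
    memset(buf + 4 + len, 0, padded - len);  /* XDR padding bytes */
    return (long)(4 + padded);
}
```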

lockd: bring a few function declarations up to date · f34ec991

Chuck Lever authored Mar 14, 2008

Clean-up: replace  __inline__ and use up-to-date function declaration
conventions.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

f34ec991

NLM: NLM protocol version numbers are u32 · eb18860e

Chuck Lever authored Mar 14, 2008

Clean up: RPC protocol version numbers are u32. Make sure we use an
appropriate type for NLM version numbers when calling nlm_lookup_host().

Eliminates a harmless mixed sign comparison in nlm_host_lookup().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

eb18860e

NLM: LOCKD fails to load if CONFIG_SYSCTL is not set · 90d5b180

Chuck Lever authored Mar 14, 2008

Bruce Fields says:
"By the way, we've got another config-related nit here:

	http://bugzilla.linux-nfs.org/show_bug.cgi?id=156

You can build lockd without CONFIG_SYSCTL set, but then the module will
fail to load."

For now, disable the sysctl registration calls in lockd if CONFIG_SYSCTL
is not enabled.  This allows the kernel to build properly if PROC_FS or
SYSCTL is not enabled, but an NFS client is desired.

In the long run, we would like to be able to build the kernel with an
NFS client but without lockd.  This makes sense, for example, if you want
an NFSv4-only NFS client, as NFSv4 doesn't use NLM at all.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

90d5b180

SUNRPC: Add a default setting for CONFIG_SUNRPC_BIND34 · 1e40316b

Chuck Lever authored Mar 14, 2008

Most distros will want support for rpcbind protocols 3 and 4 to default off
until they have integrated user-space support for the new rpcbind daemon
which supports IPv6 RPC services.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

1e40316b

SUNRPC: Update help Kconfig text · 327a299d

Chuck Lever authored Mar 14, 2008

Clean up: refresh the help text for Kconfig items related to the sunrpc
module.  Remove obsolete URLs, and make the language consistent among
the options.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

327a299d

NFS: Always enable NFS direct I/O · ecfc555a

Chuck Lever authored Mar 14, 2008

Since O_DIRECT is a standard feature that is enabled in most distros,
eliminate the CONFIG_NFS_DIRECTIO build option, and change the
fs/nfs/Makefile to always build in the NFS direct I/O engine.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

ecfc555a

NFS: Show most mount options via nfs_show_options() · 82d101d5

Chuck Lever authored Mar 14, 2008

Display all mount options in /proc/mounts which may be needed to reconstruct
a previous mount.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

82d101d5

NFS: Save the values of the "mount*=" mount options · 3f8400d1

Chuck Lever authored Mar 14, 2008

Save the value of the mountproto= mountport= mountvers= and mountaddr=
options so that these values can be displayed later via
nfs_show_options().

This preserves the intent of the original mount options, should the file
system need to be remounted based on what's displayed in /proc/mounts.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

3f8400d1

NFS: Save the value of the "port=" mount option · f22d6d79

Chuck Lever authored Mar 14, 2008

During a remount based on the mount options displayed in /proc/mounts, we
want to preserve the original behavior of the mount request.  Let's save
the original setting of the "port=" mount option in the mount's nfs_server
structure.

This allows us to simplify the default behavior of port setting for NFSv4
mounts: by default, NFSv2/3 mounts first try an RPC bind to determine the
NFS server's port, unless the user specified the "port=" mount option.
Users can force the client to skip the RPC bind by explicitly specifying
"port=<value>".

NFSv4, by contrast, assumes the NFS server port is 2049 and skips the RPC
bind, unless the user specifies "port=".  Users can force an RPC bind for
NFSv4 by explicitly specifying "port=0".

I added a couple of extra comments to clarify this behavior.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

f22d6d79
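
The port rules described in the commit above reduce to a small decision function. The sketch below is illustrative, not the kernel's mount code: `struct port_choice` and `choose_port()` are hypothetical, and treating "port=0" as "use rpcbind" for NFSv2/3 as well as NFSv4 is an assumption (the message states it only for NFSv4).

```c
#include <stdbool.h>

#define NFS_PORT 2049  /* default NFS server port named in the commit message */

struct port_choice {
    bool use_rpcbind;     /* ask rpcbind for the server's port */
    unsigned short port;  /* meaningful only when use_rpcbind is false */
};

/*
 * nfs_version: 2, 3 or 4
 * port_given:  the user passed "port=" on the mount command line
 * port:        the value of "port=", ignored when port_given is false
 */
static struct port_choice choose_port(int nfs_version, bool port_given,
                                      unsigned short port)
{
    struct port_choice c = { false, 0 };

    if (port_given && port != 0) {
        c.port = port;            /* explicit port: skip the RPC bind */
    } else if (port_given || nfs_version < 4) {
        c.use_rpcbind = true;     /* "port=0", or the NFSv2/3 default */
    } else {
        c.port = NFS_PORT;        /* NFSv4 default: 2049, no RPC bind */
    }
    return c;
}
```

Saving whether "port=" was given, and not only its value, is what lets a remount reproduce the same choice.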

NFS: Fix up data types of fields in nfs_parsed_mount_options · 78fa701f

Chuck Lever authored Mar 14, 2008

Clean up: make data types of fields in nfs_parsed_mount_options more
consistent with other uses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

78fa701f

NFS: numeric mount parameters are unsigned · 2d767432

Chuck Lever authored Mar 14, 2008

Clean up: use %u instead of %d when displaying NFS mount options.

Nit: Fix reporting of "namlen=" option in nfs_show_mount_stats.  The mount
option is called "namlen" without the "e".
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

2d767432
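
As a small illustration of the %u point in the commit above: formatting an unsigned parameter with an unsigned conversion keeps large values intact. `show_uint_option()` is a hypothetical helper, not a function from the NFS client.

```c
#include <stdio.h>

/* Hypothetical helper: write ",name=value" for an unsigned mount parameter. */
static int show_uint_option(char *buf, size_t size, const char *name,
                            unsigned int value)
{
    /* %u matches the unsigned type; %d would misreport values above INT_MAX. */
    return snprintf(buf, size, ",%s=%u", name, value);
}
```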

NFS: clean up short packet handling for NFSv4 readdir · 7bda2cdf

Jeff Layton authored Feb 22, 2008

Currently, the NFS readdir decoders have a workaround for buggy servers
that send an empty readdir response with the EOF bit unset. If the
server sends a malformed response in some cases, this workaround kicks
in and just returns an empty response rather than returning a proper
error to the caller.

This patch does 3 things:

1) have malformed responses with no entries return error (-EIO)

2) preserve existing workaround for servers that send empty
   responses with the EOF marker unset.

3) Add some comments to clarify the logic in decode_readdir().
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

7bda2cdf
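
The three readdir commits (NFSv4 above, NFSv3 and NFSv2 below) describe the same decision: a malformed reply with no entries is an error, while an empty reply without EOF still gets the old workaround. A user-space sketch of that decision follows. `struct readdir_reply` and `check_readdir_reply()` are hypothetical, and the handling of a truncated reply that still contains some complete entries (keep them and clear EOF) is an assumption not stated in the messages.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a decoded READDIR reply. */
struct readdir_reply {
    size_t complete_entries;  /* entries that decoded fully */
    bool   truncated;         /* data ended in the middle of an entry */
    bool   eof;               /* EOF flag as sent by the server */
};

/* Returns the number of usable entries or -EIO, and sets *eof for the caller. */
static long check_readdir_reply(const struct readdir_reply *r, bool *eof)
{
    *eof = r->eof;

    if (r->truncated) {
        if (r->complete_entries == 0)
            return -EIO;      /* malformed, and nothing usable was decoded */
        *eof = false;         /* assumed: keep what decoded, ask for the rest */
        return (long)r->complete_entries;
    }

    if (r->complete_entries == 0 && !r->eof)
        *eof = true;          /* workaround: buggy server sent empty reply without EOF */

    return (long)r->complete_entries;
}
```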

NFS: clean up short packet handling for NFSv3 readdir · 643f8111

Jeff Layton authored Feb 22, 2008

Currently, the NFS readdir decoders have a workaround for buggy servers
that send an empty readdir response with the EOF bit unset. If the
server sends a malformed response in some cases, this workaround kicks
in and just returns an empty response rather than returning a proper
error to the caller.

This patch does 3 things:

1) have malformed responses with no entries return error (-EIO)

2) preserve existing workaround for servers that send empty
   responses with the EOF marker unset.

3) Add some comments to clarify the logic in nfs3_xdr_readdirres().
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

643f8111

NFS: clean up short packet handling for NFSv2 readdir · caa02bd5

Jeff Layton authored Feb 22, 2008

Currently, the NFS readdir decoders have a workaround for buggy servers
that send an empty readdir response with the EOF bit unset. If the
server sends a malformed response in some cases, this workaround kicks
in and just returns an empty response rather than returning a proper
error to the caller.

This patch does 3 things:

1) have malformed responses with no entries return error (-EIO)

2) preserve existing workaround for servers that send empty
   responses with the EOF marker unset.

3) Add some comments to clarify the logic in nfs_xdr_readdirres().
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

caa02bd5

nfs: remove duplicate initializations of nfs_read_data field · 4af68bff

Fred Isaman authored Mar 19, 2008

Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

4af68bff

nfs: nfs_redirty_request · 6d884e8f

Fred authored Mar 19, 2008

Both flush functions have the same error handling routine.  Pull
it out as a function.
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

6d884e8f

Merge branch 'hotfixes' into devel · c7c350e9

Trond Myklebust authored Mar 19, 2008

c7c350e9

nfs: don't ignore return value from nfs_pageio_add_request · f8512ad0

Fred Isaman authored Mar 19, 2008

Ignoring the return value from nfs_pageio_add_request can cause deadlocks.

In read path:
  call nfs_pageio_add_request from readpage_async_filler
  assume at this point that there are requests already in desc, that
    can't be merged with the current request.
  so nfs_pageio_doio is fired up to clear out desc.
  assume something goes wrong in setting up the io, so desc->pg_error is set.
  This causes nfs_pageio_add_request to return 0, *WITHOUT* adding the original
    request.
  BUT, since return code is ignored, readpage_async_filler assumes it has
    been added, and does nothing further, leaving page locked.
  do_generic_mapping_read will eventually call lock_page, resulting in deadlock

In write path:
  page is marked dirty by generic_perform_write
  nfs_writepages is called
  call nfs_pageio_add_request from nfs_page_async_flush
  assume at this point that there are requests already in desc, that
    can't be merged with the current request.
  so nfs_pageio_doio is fired up to clear out desc.
  assume something goes wrong in setting up the io, so desc->pg_error is set.
  This causes nfs_page_async_flush to return 0, *WITHOUT* adding the original
    request, yet marking the request as locked (PG_BUSY) and in writeback,
    clearing dirty marks.
  The next time a write is done to the page, deadlock will result as
    nfs_write_end calls nfs_update_request
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

f8512ad0
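
The read-path deadlock described in the commit above comes down to a caller that locks a page, hands it to a queueing function, and never learns that the hand-off failed. A minimal user-space sketch of the corrected pattern follows. All names (`struct page_model`, `struct desc_model`, `add_request()`, `read_filler()`) are hypothetical stand-ins, not the NFS pageio API.

```c
#include <stdbool.h>

/* Hypothetical models; not kernel structures. */
struct page_model { bool locked; };
struct desc_model { int pg_error; int queued; };

/* Returns 1 if the request was queued, 0 if not (pg_error holds the reason). */
static int add_request(struct desc_model *d, struct page_model *p)
{
    (void)p;
    if (d->pg_error < 0)
        return 0;         /* request was NOT added */
    d->queued++;
    return 1;
}

/* Caller that honours the return value instead of assuming success. */
static int read_filler(struct desc_model *d, struct page_model *p)
{
    p->locked = true;
    if (!add_request(d, p)) {
        p->locked = false;   /* undo, or a later lock attempt blocks forever */
        return d->pg_error;
    }
    return 0;                /* I/O completion unlocks the page later */
}
```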

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx · 264e3e88

Linus Torvalds authored Mar 18, 2008

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx:
  async_tx: avoid the async xor_zero_sum path when src_cnt > device->max_xor
  fsldma: Fix the DMA halt when using DMA_INTERRUPT async_tx transfer.

264e3e88

Revert "ACPI: EC: Handle IRQ storm on Acer laptops" · d7a0e1f5

Alexey Starikovskiy authored Mar 19, 2008

This reverts commit 2c81ce4c.

It caused several new problems (e.g. a suspend slowdown that Pavel Machek
bisected down to this patch), so just revert it for now.
Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
Cc: Pavel Machek <pavel@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d7a0e1f5

Merge branch 'for-linus' of... · 2caf4703

Linus Torvalds authored Mar 18, 2008

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel:
  sched: tune multi-core idle balancing
  sched: retune wake granularity
  sched: wakeup-buddy tasks are cache-hot
  sched: improve affine wakeups
  sched, net: socket wakeups are sync
  sched: clean up wakeup balancing, code flow
  sched: clean up wakeup balancing, rename variables
  sched: clean up wakeup balancing, move wake_affine()

2caf4703

IDE: Make taskfile interface more robust wrt unexpected end-of-command · 6c3c3158

Linus Torvalds authored Mar 18, 2008

Now that we handle all the special commands using REQ_TYPE_ATA_TASKFILE
rather than using the old REQ_TYPE_ATA_CMD model, we need to also
emulate the lack of full taskfile data that comes with the old command
model (ie when commands are generated with the HDIO_DRIVE_CMD ioctl
rather than using the HDIO_DRIVE_TASK[FILE] ioctls).

In particular, this means that we should handle command completion the
more relaxed way that the old drive_cmd_intr() code did.  It allows
commands to finish early even if they don't use up all the data that we
thought we had for them.

This fixes a regression seen by Anders Eriksson where some SMART
commands sent by smartd would cause a boot-time system hang on his
machine because the IDE command handling code didn't realize that the
command had completed.
Tested-by: Anders Eriksson <aeriksson@fastmail.fm>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6c3c3158

Merge branch 'slab-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/christoph/vm · d5eee405

Linus Torvalds authored Mar 18, 2008

```
* 'slab-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/christoph/vm:
  slub page alloc fallback: Enable interrupts for GFP_WAIT.
```

d5eee405

sched: tune multi-core idle balancing · 33b0c421

Ingo Molnar authored Mar 16, 2008

WAKE_IDLE is too aggressive on multi-core CPUs with the new
wake-affine code, keep it on for SMT/HT balancing alone
(where there's no cache affinity at all between logical CPUs).
Signed-off-by: Ingo Molnar <mingo@elte.hu>

33b0c421

sched: retune wake granularity · 74e3cd7f

Ingo Molnar authored Mar 18, 2008

reduce wake-up granularity for better interactivity.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

74e3cd7f

sched: wakeup-buddy tasks are cache-hot · f540a608

Ingo Molnar authored Mar 15, 2008

Wakeup-buddy tasks are cache-hot - this makes it a bit harder
for the load-balancer to tear them apart. (but it's still possible,
if the load is sufficiently asymmetric)
Signed-off-by: Ingo Molnar <mingo@elte.hu>

f540a608

sched: improve affine wakeups · 4ae7d5ce

Ingo Molnar authored Mar 19, 2008

improve affine wakeups. Maintain the 'overlap' metric based on CFS's
sum_exec_runtime - which means the amount of time a task executes
after it wakes up some other task.

Use the 'overlap' for the wakeup decisions: if the 'overlap' is short,
it means there's strong workload coupling between this task and the
woken up task. If the 'overlap' is large then the workload is decoupled
and the scheduler will move them to separate CPUs more easily.

( Also slightly move the preempt_check within try_to_wake_up() - this has
  no effect on functionality but allows 'early wakeups' (for still-on-rq
  tasks) to be correctly accounted as well.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>

4ae7d5ce
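
The 'overlap' idea in the commit above can be sketched as: record the waker's accumulated runtime when it wakes another task, measure how much longer it ran when it next stops, and fold that sample into a running average. The code below is a user-space illustration under stated assumptions: the struct layout, the 1/8 averaging weight, the threshold parameter and all function names are hypothetical, not taken from the scheduler source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-task statistics. */
struct task_stats {
    uint64_t sum_exec_runtime;  /* total time executed, in ns */
    uint64_t last_wakeup;       /* sum_exec_runtime when it last woke another task */
    uint64_t avg_overlap;       /* running average of the overlap samples */
};

/* Move the average 1/8 of the way towards the new sample (assumed weight). */
static void update_avg(uint64_t *avg, uint64_t sample)
{
    int64_t diff = (int64_t)(sample - *avg);
    *avg = (uint64_t)((int64_t)*avg + diff / 8);
}

/* The task has just woken some other task. */
static void note_wakeup(struct task_stats *t)
{
    t->last_wakeup = t->sum_exec_runtime;
}

/* The task stops running: overlap is how long it ran after the wakeup. */
static void note_sleep(struct task_stats *t)
{
    update_avg(&t->avg_overlap, t->sum_exec_runtime - t->last_wakeup);
}

/* Short overlap on both sides suggests coupled tasks: prefer keeping them close. */
static bool tightly_coupled(const struct task_stats *waker,
                            const struct task_stats *wakee,
                            uint64_t threshold_ns)
{
    return waker->avg_overlap < threshold_ns &&
           wakee->avg_overlap < threshold_ns;
}
```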

sched, net: socket wakeups are sync · 6f3d0929

Ingo Molnar authored Mar 19, 2008

'sync' wakeups are a hint towards the scheduler that (certain)
networking related wakeups likely create coupling between tasks.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

6f3d0929

sched: clean up wakeup balancing, code flow · f4827386

Ingo Molnar authored Mar 16, 2008

Clean up the code flow. No code changed:

kernel/sched.o:

   text	   data	    bss	    dec	    hex	filename
  42521	   2858	    232	  45611	   b22b	sched.o.before
  42521	   2858	    232	  45611	   b22b	sched.o.after

md5:
   09b31c44e9aff8666f72773dc433e2df  sched.o.before.asm
   09b31c44e9aff8666f72773dc433e2df  sched.o.after.asm
Signed-off-by: Ingo Molnar <mingo@elte.hu>

f4827386