Commits · 3e6053d76dcbd92b2f9f4ad5ece9bce83149523e · nexedi / linux

09 Oct, 2008 40 commits

block: adjust blkdev_issue_discard for swap · 3e6053d7

Hugh Dickins authored Sep 11, 2008

Two mods to blkdev_issue_discard(), thinking ahead to its use on swap:

1. Add gfp_mask argument, so swap allocation can use it where GFP_KERNEL
might deadlock but GFP_NOIO is safe.

2. Enlarge nr_sects argument from unsigned to sector_t: unsigned long is
enough to cover a whole swap area, but sector_t suits any partition.

Change sb_issue_discard()'s nr_blocks to sector_t too; but no need seen
for a gfp_mask there, just pass GFP_KERNEL down to blkdev_issue_discard().
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

3e6053d7

sg: remove unnecessary blk_rq_unmap_user · 4677735f

FUJITA Tomonori authored Sep 02, 2008

blk_rq_unmap_user in sg_finish_rem_req can take care of all the cases.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

4677735f

sg: remove sg_read_xfer · 0b6cb26c

FUJITA Tomonori authored Sep 02, 2008

sg_read_xfer was used to copy data to user space for READ
commands. blk_rq_unmap_user does the job so sg_read_xfer does nothing
useful.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

0b6cb26c

sg: remove sg_write_xfer · c3919af2

FUJITA Tomonori authored Sep 02, 2008

sg_write_xfer was used to copy data from user space for WRITE
commands. blk_rq_map_user_iov and blk_rq_map_user do the job so
sg_write_xfer does nothing useful.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

c3919af2

sg: incorporate sg_build_direct into sg_start_req · 626710c9

FUJITA Tomonori authored Sep 02, 2008

Calling blk_rq_map_user() at a single place is better than at
different two places. It makes the code more understandable.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

626710c9

sg: remove __sg_start_req · 44c7b0ea

FUJITA Tomonori authored Sep 02, 2008

__sg_start_req() was used temporarily to call blk_get_request() during
converting sg to use the block layer.

Now sg always calls blk_get_request() so we can move blk_get_request()
to sg_start_req(). We don't need __sg_start_req anymore.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

44c7b0ea

sg: remove b_malloc_len in sg_scatter_hold struct · fd1c1de0

FUJITA Tomonori authored Sep 02, 2008

It's not used for anything useful after the block layer conversion.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

fd1c1de0

sg: remove SG_ALLOW_DIO_CODE define · 7e56cb0f

FUJITA Tomonori authored Sep 02, 2008

sg had lots of the own functions for the direct IO but now sg uses the
block layer functions for it. There are only five lines for the direct
IO. SG_ALLOW_DIO_CODE define was used to compile out the direct IO
code but we don't need the define. If someone wants to remove the
direct IO code, he can do easily without the define.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

7e56cb0f

sg: rename sg_cmd_done sg_rq_end_io · a91a3a20

FUJITA Tomonori authored Sep 02, 2008

old sg_rq_end_io() was used to wrap sg_cmd_done during converting sg
to use the block layer (in order to cover the difference
scsi_execute_async and blk_execute_rq_nowait). Now we don't need it so
let's remove it.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

a91a3a20

dm: Call blk_abort_queue on failed paths · 224cb3e9

Mike Anderson authored Aug 29, 2008

Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

224cb3e9

block: Add interface to abort queued requests · 11914a53

Mike Anderson authored Sep 13, 2008

Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

11914a53

block: unify request timeout handling · 242f9dcb

Jens Axboe authored Sep 14, 2008

Right now SCSI and others do their own command timeout handling.
Move those bits to the block layer.

Instead of having a timer per command, we try to be a bit more clever
and simply have one per-queue. This avoids the overhead of having to
tear down and setup a timer for each command, so it will result in a lot
less timer fiddling.
Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

242f9dcb

Call flush_disk() after detecting an online resize. · 608aeef1

Andrew Patterson authored Sep 04, 2008

We call flush_disk() to make sure the buffer cache for the disk is
flushed after a disk resize. There are two resize cases, growing and
shrinking. Given that users can shrink/then grow a disk before
revalidate_disk() is called, we treat the grow case identically to
shrinking. We need to flush the buffer cache after an online shrink
because, as James Bottomley puts it,

The two use cases for shrinking I can see are

1. planned: the fs is already shrunk to within the new boundaries
and all data is relocated, so invalidate is fine (any dirty
buffers that might exist in the shrunk region are there only
because they were relocated but not yet written to their
original location).
2. unplanned: In this case, the fs is probably toast, so whether
we invalidate or not isn't going to make a whole lot of
difference; it's still going to try to read or write from
sectors beyond the new size and get I/O errors.

Immediately invalidating shrunk disks will cause errors for outstanding
I/Os for reads/write beyond the new end of the disk to be generated
earlier then if we waited for the normal buffer cache operation. It also
removes a potential security hole where we might keep old data around
from beyond the end of the shrunk disk if the disk was not invalidated.
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

608aeef1

Added flush_disk to factor out common buffer cache flushing code. · 56ade44b

Andrew Patterson authored Sep 04, 2008

We need to be able to flush the buffer cache for for more than
just when a disk is changed, so we factor out common cache flush code
in check_disk_change() to an internal flush_disk() routine.  This
routine will then be used for both disk changes and disk resizes (in a
later patch).

Include the disk name in the text indicating that there are busy
inodes on the device and increase the KERN severity of the message.
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

56ade44b

SCSI sd driver calls revalidate_disk wrapper. · f98a8cae

Andrew Patterson authored Sep 04, 2008

Modify the SCSI disk driver to call the revalidate_disk()
wrapper. This allows us to do some housekeeping such as accounting for
a disk being resized online. The wrapper will call
sd_revalidate_disk() at the appropriate time.
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

f98a8cae

Check for device resize when rescanning partitions · 9bc3ffbf

Andrew Patterson authored Sep 04, 2008

Check for device resize in the rescan_partitions() routine. If the device
has been resized, the bdev size is set to match. The rescan_partitions()
routine is called when opening the device and when calling the
BLKRRPART ioctl.
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

9bc3ffbf

Adjust block device size after an online resize of a disk. · c3279d14

Andrew Patterson authored Sep 04, 2008

The revalidate_disk routine now checks if a disk has been resized by
comparing the gendisk capacity to the bdev inode size. If they are
different (usually because the disk has been resized underneath the kernel)
the bdev inode size is adjusted to match the capacity.
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

c3279d14

Wrapper for lower-level revalidate_disk routines. · 0c002c2f

Andrew Patterson authored Sep 04, 2008

This is a wrapper for the lower-level revalidate_disk call-backs such
as sd_revalidate_disk(). It allows us to perform pre and post
operations when calling them.

We will use this wrapper in a later patch to adjust block device sizes
after an online resize (a _post_ operation).
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

0c002c2f

block: fix duplicate headers for /proc/partitions · 243294da

Tejun Heo authored Sep 04, 2008

seqf can be started multiple times for a read and the header should be
printed only for the initial one.  Fix it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

243294da

sg: set dxferp to NULL for READ with the older SG interface · fad7f01e

FUJITA Tomonori authored Sep 02, 2008

With the older SG interface, we don't know a user-space address to
trasfer data when executing a SCSI command. So we can't pass a
user-space address to blk_rq_map_user.

This patch fixes sg to pass a NULL user-space address to
blk_rq_map_user so that it just sets up a request and bios with page
frames propely without data transfer.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

fad7f01e

block: make blk_rq_map_user take a NULL user-space buffer · 81882766

FUJITA Tomonori authored Sep 02, 2008

This patch changes blk_rq_map_user to accept a NULL user-space buffer
with a READ command if rq_map_data is not NULL. Thus a caller can pass
page frames to lk_rq_map_user to just set up a request and bios with
page frames propely. bio_uncopy_user (called via blk_rq_unmap_user)
doesn't copy data to user space with such request.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

81882766

block: update comment on end_request() · 839e96af

Jens Axboe authored Sep 02, 2008

It refers to functions that no longer exist after the IO completion
changes.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

839e96af

init: DEBUG_BLOCK_EXT_DEVT requires explicit root= param · 55dc7db7

Tejun Heo authored Sep 01, 2008

DEBUG_BLOCK_EXT_DEVT shuffles SCSI and IDE device numbers and root
device number set using rdev become meaningless.  Root devices should
be explicitly specified using textual names.  Warn about it if root
can't be found and DEBUG_BLOCK_EXT_DEVT is enabled.  Also, add warning
to the help text.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

55dc7db7

block: don't test for partition size in bdget_disk() and blk_lookup_devt() · 2bbedcb4

Tejun Heo authored Aug 29, 2008

bdget_disk() and blk_lookup_devt() never cared whether the specified
partition (or disk) is zero sized or not.  I got confused while
converting those not to depend on consecutive minor numbers in commit
5a6411b1178baf534aa9138052864dfa89d3eada and later when dev0 was added
it broke callers which expected to get valid return for zero sized
disk devices.

So, they never needed nr_sects checks in the first place.  Kill them.

This problem was spotted and debugged by Bartlmoiej Zolnierkiewicz.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

2bbedcb4

Change default value of CONFIG_DEBUG_BLOCK_EXT_DEVT to 'n' · 759f8ca3

Jens Axboe authored Aug 29, 2008

It's a debug option that you would explicitly enable to test this
feature, we should default it to 'n' to prevent accidental surprises
for now.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

759f8ca3

block: kmalloc args reversed, small function definition fixes · aeb3d3a8

Harvey Harrison authored Aug 28, 2008

Noticed by sparse:
block/blk-softirq.c:156:12: warning: symbol 'blk_softirq_init' was not declared. Should it be static?
block/genhd.c:583:28: warning: function 'bdget_disk' with external linkage has definition
block/genhd.c:659:17: warning: incorrect type in argument 1 (different base types)
block/genhd.c:659:17: expected unsigned int [unsigned] [usertype] size
block/genhd.c:659:17: got restricted gfp_t
block/genhd.c:659:29: warning: incorrect type in argument 2 (different base types)
block/genhd.c:659:29: expected restricted gfp_t [usertype] flags
block/genhd.c:659:29: got unsigned int
block: kmalloc args reversed
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

aeb3d3a8

sg: use blk_rq_aligned helper function · 01cfcddd

FUJITA Tomonori authored Aug 28, 2008

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Douglas Gilbert <dougg@torque.net>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

01cfcddd

block: add blk_rq_aligned helper function · 87904074

FUJITA Tomonori authored Aug 28, 2008

This adds blk_rq_aligned helper function to see if alignment and
padding requirement is satisfied for DMA transfer. This also converts
blk_rq_map_kern and __blk_rq_map_user to use the helper function.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

87904074

bio: convert bio_copy_kern to use bio_copy_user · 4d8ab62e

FUJITA Tomonori authored Aug 28, 2008

bio_copy_kern and bio_copy_user are very similar. This converts
bio_copy_kern to use bio_copy_user.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

4d8ab62e

sg: convert the indirect IO path to use the block layer · 10db10d1

FUJITA Tomonori authored Aug 29, 2008

This patch converts the indirect IO path (including mmap IO and old
struct sg_header) to use the block layer functions (blk_get_request,
blk_execute_rq_nowait, blk_rq_map_user, etc) instead of
scsi_execute_async().

[Jens: fixed compile error with SCSI logging enabled]
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

10db10d1

sg: convert the direct IO path to use the block layer · 6e5a30cb

FUJITA Tomonori authored Aug 28, 2008

This patch converts the direct IO path (SG_FLAG_DIRECT_IO) to use the
block layer functions (blk_get_request, blk_execute_rq_nowait,
blk_rq_map_user, etc) instead of scsi_execute_async().
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

6e5a30cb

sg: convert the non-data path to use the block layer · 10865dfa

FUJITA Tomonori authored Aug 28, 2008

This patch converts the non data path to use the block layer functions
(blk_get_request, blk_execute_rq_nowait, etc) instead of uses
scsi_execute_async().
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

10865dfa

block: introduce struct rq_map_data to use reserved pages · 152e283f

FUJITA Tomonori authored Aug 28, 2008

This patch introduces struct rq_map_data to enable bio_copy_use_iov()
use reserved pages.

Currently, bio_copy_user_iov allocates bounce pages but
drivers/scsi/sg.c wants to allocate pages by itself and use
them. struct rq_map_data can be used to pass allocated pages to
bio_copy_user_iov.

The current users of bio_copy_user_iov simply passes NULL (they don't
want to use pre-allocated pages).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Douglas Gilbert <dougg@torque.net>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

152e283f

block: add gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov · a3bce90e

FUJITA Tomonori authored Aug 28, 2008

Currently, blk_rq_map_user and blk_rq_map_user_iov always do
GFP_KERNEL allocation.

This adds gfp_mask argument to blk_rq_map_user and blk_rq_map_user_iov
so sg can use it (sg always does GFP_ATOMIC allocation).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

a3bce90e

cfq-iosched: fix queue depth detection · 45333d5a

Aaron Carroll authored Aug 26, 2008

CFQ's detection of queueing devices assumes a non-queuing device and detects
if the queue depth reaches a certain threshold. Under some workloads (e.g.
synchronous reads), CFQ effectively forces a unit queue depth, thus defeating
the detection logic. This leads to poor performance on queuing hardware,
since the idle window remains enabled.

This patch inverts the sense of the logic: assume a queuing-capable device,
and detect if the depth does not exceed the threshold.
Signed-off-by: Aaron Carroll <aaronc@gelato.unsw.edu.au>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

45333d5a

block: don't use bio_has_data() in the completion path · 60540161

Jens Axboe authored Aug 26, 2008

We should just check for rq->bio, as that is really the information
we are looking for. Even if the bio attached doesn't carry data,
we still need to do IO post processing on it.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

60540161

block: inherit CPU completion on bio->rq and rq->rq merges · ab780f1e

Jens Axboe authored Aug 26, 2008

Somewhat incomplete, as we do allow merges of requests and bios
that have different completion CPUs given. This is done on the
assumption that a larger IO is still more beneficial than CPU
locality.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

ab780f1e

block: add support for IO CPU affinity · c7c22e4d

Jens Axboe authored Sep 13, 2008

This patch adds support for controlling the IO completion CPU of
either all requests on a queue, or on a per-request basis. We export
a sysfs variable (rq_affinity) which, if set, migrates completions
of requests to the CPU that originally submitted it. A bio helper
(bio_set_completion_cpu()) is also added, so that queuers can ask
for completion on that specific CPU.

In testing, this has been show to cut the system time by as much
as 20-40% on synthetic workloads where CPU affinity is desired.

This requires a little help from the architecture, so it'll only
work as designed for archs that are using the new generic smp
helper infrastructure.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

c7c22e4d

block: make kblockd_schedule_work() take the queue as parameter · 18887ad9
Jens Axboe authored Jul 28, 2008
```
Preparatory patch for checking queuing affinity.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
```
18887ad9
block: split softirq handling into blk-softirq.c · b646fc59
Jens Axboe authored Jul 28, 2008
```
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
```
b646fc59