Commit 6e80e8ed authored by Linus Torvalds

Merge branch 'for-2.6.35' of git://git.kernel.dk/linux-2.6-block

* 'for-2.6.35' of git://git.kernel.dk/linux-2.6-block: (86 commits)
  pipe: set lower and upper limit on max pages in the pipe page array
  pipe: add support for shrinking and growing pipes
  drbd: This is now equivalent to drbd release 8.3.8rc1
  drbd: Do not free p_uuid early, this is done in the exit code of the receiver
  drbd: Null pointer deref fix to the large "multi bio rewrite"
  drbd: Fix: Do not detach, if a bio with a barrier fails
  drbd: Ensure to not trigger late-new-UUID creation multiple times
  drbd: Do not Oops when C_STANDALONE when uuid gets generated
  writeback: fix mixed up arguments to bdi_start_writeback()
  writeback: fix problem with !CONFIG_BLOCK compilation
  block: improve automatic native capacity unlocking
  block: use struct parsed_partitions *state universally in partition check code
  block,ide: simplify bdops->set_capacity() to ->unlock_native_capacity()
  block: restart partition scan after resizing a device
  buffer: make invalidate_bdev() drain all percpu LRU add caches
  block: remove all rcu head initializations
  writeback: fixups for !dirty_writeback_centisecs
  writeback: bdi_writeback_task() must set task state before calling schedule()
  writeback: ensure that WB_SYNC_NONE writeback with sb pinned is sync
  drivers/block/drbd: Use kzalloc
  ...
parents 6969a434 ee9a3607
...@@ -17,6 +17,9 @@ HOWTO ...@@ -17,6 +17,9 @@ HOWTO
You can do a very simple testing of running two dd threads in two different You can do a very simple testing of running two dd threads in two different
cgroups. Here is what you can do. cgroups. Here is what you can do.
- Enable Block IO controller
CONFIG_BLK_CGROUP=y
- Enable group scheduling in CFQ - Enable group scheduling in CFQ
CONFIG_CFQ_GROUP_IOSCHED=y CONFIG_CFQ_GROUP_IOSCHED=y
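A minimal sketch of such a test, assuming the blkio controller is mounted at /cgroup and the disk under test is /dev/sdb (mount point, device and cgroup names are illustrative assumptions, not part of the original HOWTO):

# mount -t cgroup -o blkio none /cgroup
# mkdir /cgroup/test1 /cgroup/test2
# echo 1000 > /cgroup/test1/blkio.weight
# echo 500 > /cgroup/test2/blkio.weight
# dd if=/dev/sdb of=/dev/null bs=1M count=1024 &
# echo $! > /cgroup/test1/tasks
# dd if=/dev/sdb of=/dev/null bs=1M count=1024 &
# echo $! > /cgroup/test2/tasks
# cat /cgroup/test1/blkio.time /cgroup/test2/blkio.time

With group scheduling enabled, the cgroup with the higher weight should be seen getting proportionally more disk time.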
...@@ -54,32 +57,52 @@ cgroups. Here is what you can do. ...@@ -54,32 +57,52 @@ cgroups. Here is what you can do.
Various user visible config options Various user visible config options
=================================== ===================================
CONFIG_CFQ_GROUP_IOSCHED
- Enables group scheduling in CFQ. Currently only 1 level of group
creation is allowed.
CONFIG_DEBUG_CFQ_IOSCHED
- Enables some debugging messages in blktrace. Also creates extra
cgroup file blkio.dequeue.
Config options selected automatically
=====================================
These config options are not user visible and are selected/deselected
automatically based on IO scheduler configuration.
CONFIG_BLK_CGROUP CONFIG_BLK_CGROUP
- Block IO controller. Selected by CONFIG_CFQ_GROUP_IOSCHED. - Block IO controller.
CONFIG_DEBUG_BLK_CGROUP CONFIG_DEBUG_BLK_CGROUP
- Debug help. Selected by CONFIG_DEBUG_CFQ_IOSCHED. - Debug help. Right now some additional stats files show up in the cgroup
if this option is enabled.
CONFIG_CFQ_GROUP_IOSCHED
- Enables group scheduling in CFQ. Currently only 1 level of group
creation is allowed.
Details of cgroup files Details of cgroup files
======================= =======================
- blkio.weight - blkio.weight
- Specifies per cgroup weight. - Specifies per cgroup weight. This is default weight of the group
on all the devices until and unless overridden by per device rule.
(See blkio.weight_device).
Currently allowed range of weights is from 100 to 1000. Currently allowed range of weights is from 100 to 1000.
- blkio.weight_device
- One can specify per cgroup per device rules using this interface.
These rules override the default value of group weight as specified
by blkio.weight.
Following is the format.
# echo dev_maj:dev_minor weight > /path/to/cgroup/blkio.weight_device
Configure weight=300 on /dev/sdb (8:16) in this cgroup
# echo 8:16 300 > blkio.weight_device
# cat blkio.weight_device
dev weight
8:16 300
Configure weight=500 on /dev/sda (8:0) in this cgroup
# echo 8:0 500 > blkio.weight_device
# cat blkio.weight_device
dev weight
8:0 500
8:16 300
Remove specific weight for /dev/sda in this cgroup
# echo 8:0 0 > blkio.weight_device
# cat blkio.weight_device
dev weight
8:16 300
- blkio.time - blkio.time
- disk time allocated to cgroup per device in milliseconds. First - disk time allocated to cgroup per device in milliseconds. First
two fields specify the major and minor number of the device and two fields specify the major and minor number of the device and
...@@ -92,13 +115,105 @@ Details of cgroup files ...@@ -92,13 +115,105 @@ Details of cgroup files
third field specifies the number of sectors transferred by the third field specifies the number of sectors transferred by the
group to/from the device. group to/from the device.
- blkio.io_service_bytes
- Number of bytes transferred to/from the disk by the group. These
are further divided by the type of operation - read or write, sync
or async. First two fields specify the major and minor number of the
device, third field specifies the operation type and the fourth field
specifies the number of bytes.
- blkio.io_serviced
- Number of IOs completed to/from the disk by the group. These
are further divided by the type of operation - read or write, sync
or async. First two fields specify the major and minor number of the
device, third field specifies the operation type and the fourth field
specifies the number of IOs.
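As an illustrative sketch of the layout described above (the values are made up), the output of these files looks like:

# cat blkio.io_serviced
8:16 Read 642
8:16 Write 515
8:16 Sync 631
8:16 Async 526
8:16 Total 1157

blkio.io_service_bytes uses the same layout, with the fourth field giving bytes instead of IOs.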
- blkio.io_service_time
- Total amount of time between request dispatch and request completion
for the IOs done by this cgroup. This is in nanoseconds to make it
meaningful for flash devices too. For devices with queue depth of 1,
this time represents the actual service time. When queue_depth > 1,
that is no longer true as requests may be served out of order. This
may cause the service time for a given IO to include the service time
of multiple IOs when served out of order which may result in total
io_service_time > actual time elapsed. This time is further divided by
the type of operation - read or write, sync or async. First two fields
specify the major and minor number of the device, third field
specifies the operation type and the fourth field specifies the
io_service_time in ns.
- blkio.io_wait_time
- Total amount of time the IOs for this cgroup spent waiting in the
scheduler queues for service. This can be greater than the total time
elapsed since it is cumulative io_wait_time for all IOs. It is not a
measure of total time the cgroup spent waiting but rather a measure of
the wait_time for its individual IOs. For devices with queue_depth > 1
this metric does not include the time spent between the IO being
dispatched to the device and the time it actually gets serviced
(there might be a time lag here due to re-ordering of requests by the
device). This is in nanoseconds to make it meaningful for flash
devices too. This time is further divided by the type of operation -
read or write, sync or async. First two fields specify the major and
minor number of the device, third field specifies the operation type
and the fourth field specifies the io_wait_time in ns.
- blkio.io_merged
- Total number of bios/requests merged into requests belonging to this
cgroup. This is further divided by the type of operation - read or
write, sync or async.
- blkio.io_queued
- Total number of requests queued up at any given instant for this
cgroup. This is further divided by the type of operation - read or
write, sync or async.
- blkio.avg_queue_size
- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y.
The average queue size for this cgroup over the entire time of this
cgroup's existence. Queue size samples are taken each time one of the
queues of this cgroup gets a timeslice.
- blkio.group_wait_time
- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y.
This is the amount of time the cgroup had to wait since it became busy
(i.e., went from 0 to 1 request queued) to get a timeslice for one of
its queues. This is different from the io_wait_time which is the
cumulative total of the amount of time spent by each IO in that cgroup
waiting in the scheduler queue. This is in nanoseconds. If this is
read when the cgroup is in a waiting (for timeslice) state, the stat
will only report the group_wait_time accumulated till the last time it
got a timeslice and will not include the current delta.
- blkio.empty_time
- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y.
This is the amount of time a cgroup spends without any pending
requests when not being served, i.e., it does not include any time
spent idling for one of the queues of the cgroup. This is in
nanoseconds. If this is read when the cgroup is in an empty state,
the stat will only report the empty_time accumulated till the last
time it had a pending request and will not include the current delta.
- blkio.idle_time
- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y.
This is the amount of time spent by the IO scheduler idling for a
given cgroup in anticipation of a better request than the existing ones
from other queues/cgroups. This is in nanoseconds. If this is read
when the cgroup is in an idling state, the stat will only report the
idle_time accumulated till the last idle period and will not include
the current delta.
- blkio.dequeue - blkio.dequeue
- Debugging aid only enabled if CONFIG_DEBUG_CFQ_IOSCHED=y. This - Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y. This
gives the statistics about how many times a group was dequeued gives the statistics about how many times a group was dequeued
from service tree of the device. First two fields specify the major from service tree of the device. First two fields specify the major
and minor number of the device and third field specifies the number and minor number of the device and third field specifies the number
of times a group was dequeued from a particular device. of times a group was dequeued from a particular device.
- blkio.reset_stats
- Writing an int to this file will result in resetting all the stats
for that cgroup.
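For example, writing any integer from inside the cgroup directory clears the counters:

# echo 1 > blkio.reset_stats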
CFQ sysfs tunable CFQ sysfs tunable
================= =================
/sys/block/<disk>/queue/iosched/group_isolation /sys/block/<disk>/queue/iosched/group_isolation
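As a usage sketch (the tunable holds a 0/1 value; the disk name is an assumption):

# cat /sys/block/sdb/queue/iosched/group_isolation
# echo 1 > /sys/block/sdb/queue/iosched/group_isolation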
......
...@@ -77,29 +77,6 @@ config BLK_DEV_INTEGRITY ...@@ -77,29 +77,6 @@ config BLK_DEV_INTEGRITY
T10/SCSI Data Integrity Field or the T13/ATA External Path T10/SCSI Data Integrity Field or the T13/ATA External Path
Protection. If in doubt, say N. Protection. If in doubt, say N.
config BLK_CGROUP
tristate "Block cgroup support"
depends on CGROUPS
depends on CFQ_GROUP_IOSCHED
default n
---help---
Generic block IO controller cgroup interface. This is the common
cgroup interface which should be used by various IO controlling
policies.
Currently, CFQ IO scheduler uses it to recognize task groups and
control disk bandwidth allocation (proportional time slice allocation)
to such task groups.
config DEBUG_BLK_CGROUP
bool
depends on BLK_CGROUP
default n
---help---
Enable some debugging help. Currently it stores the cgroup path
in the blk group which can be used by cfq for tracing various
group related activity.
endif # BLOCK endif # BLOCK
config BLOCK_COMPAT config BLOCK_COMPAT
......
...@@ -23,7 +23,8 @@ config IOSCHED_DEADLINE ...@@ -23,7 +23,8 @@ config IOSCHED_DEADLINE
config IOSCHED_CFQ config IOSCHED_CFQ
tristate "CFQ I/O scheduler" tristate "CFQ I/O scheduler"
select BLK_CGROUP if CFQ_GROUP_IOSCHED # If BLK_CGROUP is a module, CFQ has to be built as module.
depends on (BLK_CGROUP=m && m) || !BLK_CGROUP || BLK_CGROUP=y
default y default y
---help--- ---help---
The CFQ I/O scheduler tries to distribute bandwidth equally The CFQ I/O scheduler tries to distribute bandwidth equally
...@@ -33,22 +34,15 @@ config IOSCHED_CFQ ...@@ -33,22 +34,15 @@ config IOSCHED_CFQ
This is the default I/O scheduler. This is the default I/O scheduler.
Note: If BLK_CGROUP=m, then CFQ can be built only as module.
config CFQ_GROUP_IOSCHED config CFQ_GROUP_IOSCHED
bool "CFQ Group Scheduling support" bool "CFQ Group Scheduling support"
depends on IOSCHED_CFQ && CGROUPS depends on IOSCHED_CFQ && BLK_CGROUP
default n default n
---help--- ---help---
Enable group IO scheduling in CFQ. Enable group IO scheduling in CFQ.
config DEBUG_CFQ_IOSCHED
bool "Debug CFQ Scheduling"
depends on CFQ_GROUP_IOSCHED
select DEBUG_BLK_CGROUP
default n
---help---
Enable CFQ IO scheduling debugging in CFQ. Currently it makes
blktrace output more verbose.
choice choice
prompt "Default I/O scheduler" prompt "Default I/O scheduler"
default DEFAULT_CFQ default DEFAULT_CFQ
......
...@@ -5,7 +5,7 @@ ...@@ -5,7 +5,7 @@
obj-$(CONFIG_BLOCK) := elevator.o blk-core.o blk-tag.o blk-sysfs.o \ obj-$(CONFIG_BLOCK) := elevator.o blk-core.o blk-tag.o blk-sysfs.o \
blk-barrier.o blk-settings.o blk-ioc.o blk-map.o \ blk-barrier.o blk-settings.o blk-ioc.o blk-map.o \
blk-exec.o blk-merge.o blk-softirq.o blk-timeout.o \ blk-exec.o blk-merge.o blk-softirq.o blk-timeout.o \
blk-iopoll.o ioctl.o genhd.o scsi_ioctl.o blk-iopoll.o blk-lib.o ioctl.o genhd.o scsi_ioctl.o
obj-$(CONFIG_BLK_DEV_BSG) += bsg.o obj-$(CONFIG_BLK_DEV_BSG) += bsg.o
obj-$(CONFIG_BLK_CGROUP) += blk-cgroup.o obj-$(CONFIG_BLK_CGROUP) += blk-cgroup.o
......
...@@ -286,26 +286,31 @@ static void bio_end_empty_barrier(struct bio *bio, int err) ...@@ -286,26 +286,31 @@ static void bio_end_empty_barrier(struct bio *bio, int err)
set_bit(BIO_EOPNOTSUPP, &bio->bi_flags); set_bit(BIO_EOPNOTSUPP, &bio->bi_flags);
clear_bit(BIO_UPTODATE, &bio->bi_flags); clear_bit(BIO_UPTODATE, &bio->bi_flags);
} }
if (bio->bi_private)
complete(bio->bi_private); complete(bio->bi_private);
bio_put(bio);
} }
/** /**
* blkdev_issue_flush - queue a flush * blkdev_issue_flush - queue a flush
* @bdev: blockdev to issue flush for * @bdev: blockdev to issue flush for
* @gfp_mask: memory allocation flags (for bio_alloc)
* @error_sector: error sector * @error_sector: error sector
* @flags: BLKDEV_IFL_* flags to control behaviour
* *
* Description: * Description:
* Issue a flush for the block device in question. Caller can supply * Issue a flush for the block device in question. Caller can supply
* room for storing the error offset in case of a flush error, if they * room for storing the error offset in case of a flush error, if they
* wish to. * wish to. If the WAIT flag is not passed, the caller may only check whether the
* request was pushed to some internal queue for later handling.
*/ */
int blkdev_issue_flush(struct block_device *bdev, sector_t *error_sector) int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask,
sector_t *error_sector, unsigned long flags)
{ {
DECLARE_COMPLETION_ONSTACK(wait); DECLARE_COMPLETION_ONSTACK(wait);
struct request_queue *q; struct request_queue *q;
struct bio *bio; struct bio *bio;
int ret; int ret = 0;
if (bdev->bd_disk == NULL) if (bdev->bd_disk == NULL)
return -ENXIO; return -ENXIO;
...@@ -314,23 +319,25 @@ int blkdev_issue_flush(struct block_device *bdev, sector_t *error_sector) ...@@ -314,23 +319,25 @@ int blkdev_issue_flush(struct block_device *bdev, sector_t *error_sector)
if (!q) if (!q)
return -ENXIO; return -ENXIO;
bio = bio_alloc(GFP_KERNEL, 0); bio = bio_alloc(gfp_mask, 0);
bio->bi_end_io = bio_end_empty_barrier; bio->bi_end_io = bio_end_empty_barrier;
bio->bi_private = &wait;
bio->bi_bdev = bdev; bio->bi_bdev = bdev;
submit_bio(WRITE_BARRIER, bio); if (test_bit(BLKDEV_WAIT, &flags))
bio->bi_private = &wait;
wait_for_completion(&wait);
/* bio_get(bio);
* The driver must store the error location in ->bi_sector, if submit_bio(WRITE_BARRIER, bio);
* it supports it. For non-stacked drivers, this should be copied if (test_bit(BLKDEV_WAIT, &flags)) {
* from blk_rq_pos(rq). wait_for_completion(&wait);
*/ /*
if (error_sector) * The driver must store the error location in ->bi_sector, if
*error_sector = bio->bi_sector; * it supports it. For non-stacked drivers, this should be
* copied from blk_rq_pos(rq).
*/
if (error_sector)
*error_sector = bio->bi_sector;
}
ret = 0;
if (bio_flagged(bio, BIO_EOPNOTSUPP)) if (bio_flagged(bio, BIO_EOPNOTSUPP))
ret = -EOPNOTSUPP; ret = -EOPNOTSUPP;
else if (!bio_flagged(bio, BIO_UPTODATE)) else if (!bio_flagged(bio, BIO_UPTODATE))
...@@ -340,107 +347,3 @@ int blkdev_issue_flush(struct block_device *bdev, sector_t *error_sector) ...@@ -340,107 +347,3 @@ int blkdev_issue_flush(struct block_device *bdev, sector_t *error_sector)
return ret; return ret;
} }
EXPORT_SYMBOL(blkdev_issue_flush); EXPORT_SYMBOL(blkdev_issue_flush);
static void blkdev_discard_end_io(struct bio *bio, int err)
{
if (err) {
if (err == -EOPNOTSUPP)
set_bit(BIO_EOPNOTSUPP, &bio->bi_flags);
clear_bit(BIO_UPTODATE, &bio->bi_flags);
}
if (bio->bi_private)
complete(bio->bi_private);
__free_page(bio_page(bio));
bio_put(bio);
}
/**
* blkdev_issue_discard - queue a discard
* @bdev: blockdev to issue discard for
* @sector: start sector
* @nr_sects: number of sectors to discard
* @gfp_mask: memory allocation flags (for bio_alloc)
* @flags: DISCARD_FL_* flags to control behaviour
*
* Description:
* Issue a discard request for the sectors in question.
*/
int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, int flags)
{
DECLARE_COMPLETION_ONSTACK(wait);
struct request_queue *q = bdev_get_queue(bdev);
int type = flags & DISCARD_FL_BARRIER ?
DISCARD_BARRIER : DISCARD_NOBARRIER;
struct bio *bio;
struct page *page;
int ret = 0;
if (!q)
return -ENXIO;
if (!blk_queue_discard(q))
return -EOPNOTSUPP;
while (nr_sects && !ret) {
unsigned int sector_size = q->limits.logical_block_size;
unsigned int max_discard_sectors =
min(q->limits.max_discard_sectors, UINT_MAX >> 9);
bio = bio_alloc(gfp_mask, 1);
if (!bio)
goto out;
bio->bi_sector = sector;
bio->bi_end_io = blkdev_discard_end_io;
bio->bi_bdev = bdev;
if (flags & DISCARD_FL_WAIT)
bio->bi_private = &wait;
/*
* Add a zeroed one-sector payload as that's what
* our current implementations need. If we'll ever need
* more the interface will need revisiting.
*/
page = alloc_page(gfp_mask | __GFP_ZERO);
if (!page)
goto out_free_bio;
if (bio_add_pc_page(q, bio, page, sector_size, 0) < sector_size)
goto out_free_page;
/*
* And override the bio size - the way discard works we
* touch many more blocks on disk than the actual payload
* length.
*/
if (nr_sects > max_discard_sectors) {
bio->bi_size = max_discard_sectors << 9;
nr_sects -= max_discard_sectors;
sector += max_discard_sectors;
} else {
bio->bi_size = nr_sects << 9;
nr_sects = 0;
}
bio_get(bio);
submit_bio(type, bio);
if (flags & DISCARD_FL_WAIT)
wait_for_completion(&wait);
if (bio_flagged(bio, BIO_EOPNOTSUPP))
ret = -EOPNOTSUPP;
else if (!bio_flagged(bio, BIO_UPTODATE))
ret = -EIO;
bio_put(bio);
}
return ret;
out_free_page:
__free_page(page);
out_free_bio:
bio_put(bio);
out:
return -ENOMEM;
}
EXPORT_SYMBOL(blkdev_issue_discard);
...@@ -23,11 +23,84 @@ extern struct cgroup_subsys blkio_subsys; ...@@ -23,11 +23,84 @@ extern struct cgroup_subsys blkio_subsys;
#define blkio_subsys_id blkio_subsys.subsys_id #define blkio_subsys_id blkio_subsys.subsys_id
#endif #endif
enum stat_type {
/* Total time spent (in ns) between request dispatch to the driver and
* request completion for IOs done by this cgroup. This may not be
* accurate when NCQ is turned on. */
BLKIO_STAT_SERVICE_TIME = 0,
/* Total bytes transferred */
BLKIO_STAT_SERVICE_BYTES,
/* Total IOs serviced, post merge */
BLKIO_STAT_SERVICED,
/* Total time spent waiting in scheduler queue in ns */
BLKIO_STAT_WAIT_TIME,
/* Number of IOs merged */
BLKIO_STAT_MERGED,
/* Number of IOs queued up */
BLKIO_STAT_QUEUED,
/* All the single valued stats go below this */
BLKIO_STAT_TIME,
BLKIO_STAT_SECTORS,
#ifdef CONFIG_DEBUG_BLK_CGROUP
BLKIO_STAT_AVG_QUEUE_SIZE,
BLKIO_STAT_IDLE_TIME,
BLKIO_STAT_EMPTY_TIME,
BLKIO_STAT_GROUP_WAIT_TIME,
BLKIO_STAT_DEQUEUE
#endif
};
enum stat_sub_type {
BLKIO_STAT_READ = 0,
BLKIO_STAT_WRITE,
BLKIO_STAT_SYNC,
BLKIO_STAT_ASYNC,
BLKIO_STAT_TOTAL
};
/* blkg state flags */
enum blkg_state_flags {
BLKG_waiting = 0,
BLKG_idling,
BLKG_empty,
};
struct blkio_cgroup { struct blkio_cgroup {
struct cgroup_subsys_state css; struct cgroup_subsys_state css;
unsigned int weight; unsigned int weight;
spinlock_t lock; spinlock_t lock;
struct hlist_head blkg_list; struct hlist_head blkg_list;
struct list_head policy_list; /* list of blkio_policy_node */
};
struct blkio_group_stats {
/* total disk time and nr sectors dispatched by this group */
uint64_t time;
uint64_t sectors;
uint64_t stat_arr[BLKIO_STAT_QUEUED + 1][BLKIO_STAT_TOTAL];
#ifdef CONFIG_DEBUG_BLK_CGROUP
/* Sum of number of IOs queued across all samples */
uint64_t avg_queue_size_sum;
/* Count of samples taken for average */
uint64_t avg_queue_size_samples;
/* How many times this group has been removed from service tree */
unsigned long dequeue;
/* Total time spent waiting for it to be assigned a timeslice. */
uint64_t group_wait_time;
uint64_t start_group_wait_time;
/* Time spent idling for this blkio_group */
uint64_t idle_time;
uint64_t start_idle_time;
/*
* Total time when we have requests queued and do not contain the
* current active queue.
*/
uint64_t empty_time;
uint64_t start_empty_time;
uint16_t flags;
#endif
}; };
struct blkio_group { struct blkio_group {
...@@ -35,20 +108,25 @@ struct blkio_group { ...@@ -35,20 +108,25 @@ struct blkio_group {
void *key; void *key;
struct hlist_node blkcg_node; struct hlist_node blkcg_node;
unsigned short blkcg_id; unsigned short blkcg_id;
#ifdef CONFIG_DEBUG_BLK_CGROUP
/* Store cgroup path */ /* Store cgroup path */
char path[128]; char path[128];
/* How many times this group has been removed from service tree */
unsigned long dequeue;
#endif
/* The device MKDEV(major, minor), this group has been created for */ /* The device MKDEV(major, minor), this group has been created for */
dev_t dev; dev_t dev;
/* total disk time and nr sectors dispatched by this group */ /* Need to serialize the stats in the case of reset/update */
unsigned long time; spinlock_t stats_lock;
unsigned long sectors; struct blkio_group_stats stats;
}; };
struct blkio_policy_node {
struct list_head node;
dev_t dev;
unsigned int weight;
};
extern unsigned int blkcg_get_weight(struct blkio_cgroup *blkcg,
dev_t dev);
typedef void (blkio_unlink_group_fn) (void *key, struct blkio_group *blkg); typedef void (blkio_unlink_group_fn) (void *key, struct blkio_group *blkg);
typedef void (blkio_update_group_weight_fn) (struct blkio_group *blkg, typedef void (blkio_update_group_weight_fn) (struct blkio_group *blkg,
unsigned int weight); unsigned int weight);
...@@ -67,6 +145,11 @@ struct blkio_policy_type { ...@@ -67,6 +145,11 @@ struct blkio_policy_type {
extern void blkio_policy_register(struct blkio_policy_type *); extern void blkio_policy_register(struct blkio_policy_type *);
extern void blkio_policy_unregister(struct blkio_policy_type *); extern void blkio_policy_unregister(struct blkio_policy_type *);
static inline char *blkg_path(struct blkio_group *blkg)
{
return blkg->path;
}
#else #else
struct blkio_group { struct blkio_group {
...@@ -78,6 +161,8 @@ struct blkio_policy_type { ...@@ -78,6 +161,8 @@ struct blkio_policy_type {
static inline void blkio_policy_register(struct blkio_policy_type *blkiop) { } static inline void blkio_policy_register(struct blkio_policy_type *blkiop) { }
static inline void blkio_policy_unregister(struct blkio_policy_type *blkiop) { } static inline void blkio_policy_unregister(struct blkio_policy_type *blkiop) { }
static inline char *blkg_path(struct blkio_group *blkg) { return NULL; }
#endif #endif
#define BLKIO_WEIGHT_MIN 100 #define BLKIO_WEIGHT_MIN 100
...@@ -85,16 +170,42 @@ static inline void blkio_policy_unregister(struct blkio_policy_type *blkiop) { } ...@@ -85,16 +170,42 @@ static inline void blkio_policy_unregister(struct blkio_policy_type *blkiop) { }
#define BLKIO_WEIGHT_DEFAULT 500 #define BLKIO_WEIGHT_DEFAULT 500
#ifdef CONFIG_DEBUG_BLK_CGROUP #ifdef CONFIG_DEBUG_BLK_CGROUP
static inline char *blkg_path(struct blkio_group *blkg) void blkiocg_update_avg_queue_size_stats(struct blkio_group *blkg);
{ void blkiocg_update_dequeue_stats(struct blkio_group *blkg,
return blkg->path;
}
void blkiocg_update_blkio_group_dequeue_stats(struct blkio_group *blkg,
unsigned long dequeue); unsigned long dequeue);
void blkiocg_update_set_idle_time_stats(struct blkio_group *blkg);
void blkiocg_update_idle_time_stats(struct blkio_group *blkg);
void blkiocg_set_start_empty_time(struct blkio_group *blkg);
#define BLKG_FLAG_FNS(name) \
static inline void blkio_mark_blkg_##name( \
struct blkio_group_stats *stats) \
{ \
stats->flags |= (1 << BLKG_##name); \
} \
static inline void blkio_clear_blkg_##name( \
struct blkio_group_stats *stats) \
{ \
stats->flags &= ~(1 << BLKG_##name); \
} \
static inline int blkio_blkg_##name(struct blkio_group_stats *stats) \
{ \
return (stats->flags & (1 << BLKG_##name)) != 0; \
} \
BLKG_FLAG_FNS(waiting)
BLKG_FLAG_FNS(idling)
BLKG_FLAG_FNS(empty)
#undef BLKG_FLAG_FNS
#else #else
static inline char *blkg_path(struct blkio_group *blkg) { return NULL; } static inline void blkiocg_update_avg_queue_size_stats(
static inline void blkiocg_update_blkio_group_dequeue_stats( struct blkio_group *blkg) {}
struct blkio_group *blkg, unsigned long dequeue) {} static inline void blkiocg_update_dequeue_stats(struct blkio_group *blkg,
unsigned long dequeue) {}
static inline void blkiocg_update_set_idle_time_stats(struct blkio_group *blkg)
{}
static inline void blkiocg_update_idle_time_stats(struct blkio_group *blkg) {}
static inline void blkiocg_set_start_empty_time(struct blkio_group *blkg) {}
#endif #endif
#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE) #if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
...@@ -105,26 +216,43 @@ extern void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg, ...@@ -105,26 +216,43 @@ extern void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
extern int blkiocg_del_blkio_group(struct blkio_group *blkg); extern int blkiocg_del_blkio_group(struct blkio_group *blkg);
extern struct blkio_group *blkiocg_lookup_group(struct blkio_cgroup *blkcg, extern struct blkio_group *blkiocg_lookup_group(struct blkio_cgroup *blkcg,
void *key); void *key);
void blkiocg_update_blkio_group_stats(struct blkio_group *blkg, void blkiocg_update_timeslice_used(struct blkio_group *blkg,
unsigned long time, unsigned long sectors); unsigned long time);
void blkiocg_update_dispatch_stats(struct blkio_group *blkg, uint64_t bytes,
bool direction, bool sync);
void blkiocg_update_completion_stats(struct blkio_group *blkg,
uint64_t start_time, uint64_t io_start_time, bool direction, bool sync);
void blkiocg_update_io_merged_stats(struct blkio_group *blkg, bool direction,
bool sync);
void blkiocg_update_io_add_stats(struct blkio_group *blkg,
struct blkio_group *curr_blkg, bool direction, bool sync);
void blkiocg_update_io_remove_stats(struct blkio_group *blkg,
bool direction, bool sync);
#else #else
struct cgroup; struct cgroup;
static inline struct blkio_cgroup * static inline struct blkio_cgroup *
cgroup_to_blkio_cgroup(struct cgroup *cgroup) { return NULL; } cgroup_to_blkio_cgroup(struct cgroup *cgroup) { return NULL; }
static inline void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg, static inline void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
struct blkio_group *blkg, void *key, dev_t dev) struct blkio_group *blkg, void *key, dev_t dev) {}
{
}
static inline int static inline int
blkiocg_del_blkio_group(struct blkio_group *blkg) { return 0; } blkiocg_del_blkio_group(struct blkio_group *blkg) { return 0; }
static inline struct blkio_group * static inline struct blkio_group *
blkiocg_lookup_group(struct blkio_cgroup *blkcg, void *key) { return NULL; } blkiocg_lookup_group(struct blkio_cgroup *blkcg, void *key) { return NULL; }
static inline void blkiocg_update_blkio_group_stats(struct blkio_group *blkg, static inline void blkiocg_update_timeslice_used(struct blkio_group *blkg,
unsigned long time, unsigned long sectors) unsigned long time) {}
{ static inline void blkiocg_update_dispatch_stats(struct blkio_group *blkg,
} uint64_t bytes, bool direction, bool sync) {}
static inline void blkiocg_update_completion_stats(struct blkio_group *blkg,
uint64_t start_time, uint64_t io_start_time, bool direction,
bool sync) {}
static inline void blkiocg_update_io_merged_stats(struct blkio_group *blkg,
bool direction, bool sync) {}
static inline void blkiocg_update_io_add_stats(struct blkio_group *blkg,
struct blkio_group *curr_blkg, bool direction, bool sync) {}
static inline void blkiocg_update_io_remove_stats(struct blkio_group *blkg,
bool direction, bool sync) {}
#endif #endif
#endif /* _BLK_CGROUP_H */ #endif /* _BLK_CGROUP_H */
...@@ -127,6 +127,7 @@ void blk_rq_init(struct request_queue *q, struct request *rq) ...@@ -127,6 +127,7 @@ void blk_rq_init(struct request_queue *q, struct request *rq)
rq->tag = -1; rq->tag = -1;
rq->ref_count = 1; rq->ref_count = 1;
rq->start_time = jiffies; rq->start_time = jiffies;
set_start_time_ns(rq);
} }
EXPORT_SYMBOL(blk_rq_init); EXPORT_SYMBOL(blk_rq_init);
...@@ -450,6 +451,7 @@ void blk_cleanup_queue(struct request_queue *q) ...@@ -450,6 +451,7 @@ void blk_cleanup_queue(struct request_queue *q)
*/ */
blk_sync_queue(q); blk_sync_queue(q);
del_timer_sync(&q->backing_dev_info.laptop_mode_wb_timer);
mutex_lock(&q->sysfs_lock); mutex_lock(&q->sysfs_lock);
queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q); queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
mutex_unlock(&q->sysfs_lock); mutex_unlock(&q->sysfs_lock);
...@@ -510,6 +512,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) ...@@ -510,6 +512,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
return NULL; return NULL;
} }
setup_timer(&q->backing_dev_info.laptop_mode_wb_timer,
laptop_mode_timer_fn, (unsigned long) q);
init_timer(&q->unplug_timer); init_timer(&q->unplug_timer);
setup_timer(&q->timeout, blk_rq_timed_out_timer, (unsigned long) q); setup_timer(&q->timeout, blk_rq_timed_out_timer, (unsigned long) q);
INIT_LIST_HEAD(&q->timeout_list); INIT_LIST_HEAD(&q->timeout_list);
...@@ -568,6 +572,22 @@ blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id) ...@@ -568,6 +572,22 @@ blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id)
{ {
struct request_queue *q = blk_alloc_queue_node(GFP_KERNEL, node_id); struct request_queue *q = blk_alloc_queue_node(GFP_KERNEL, node_id);
return blk_init_allocated_queue_node(q, rfn, lock, node_id);
}
EXPORT_SYMBOL(blk_init_queue_node);
struct request_queue *
blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
spinlock_t *lock)
{
return blk_init_allocated_queue_node(q, rfn, lock, -1);
}
EXPORT_SYMBOL(blk_init_allocated_queue);
struct request_queue *
blk_init_allocated_queue_node(struct request_queue *q, request_fn_proc *rfn,
spinlock_t *lock, int node_id)
{
if (!q) if (!q)
return NULL; return NULL;
...@@ -601,7 +621,7 @@ blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id) ...@@ -601,7 +621,7 @@ blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id)
blk_put_queue(q); blk_put_queue(q);
return NULL; return NULL;
} }
EXPORT_SYMBOL(blk_init_queue_node); EXPORT_SYMBOL(blk_init_allocated_queue_node);
int blk_get_queue(struct request_queue *q) int blk_get_queue(struct request_queue *q)
{ {
...@@ -1198,6 +1218,7 @@ static int __make_request(struct request_queue *q, struct bio *bio) ...@@ -1198,6 +1218,7 @@ static int __make_request(struct request_queue *q, struct bio *bio)
if (!blk_rq_cpu_valid(req)) if (!blk_rq_cpu_valid(req))
req->cpu = bio->bi_comp_cpu; req->cpu = bio->bi_comp_cpu;
drive_stat_acct(req, 0); drive_stat_acct(req, 0);
elv_bio_merged(q, req, bio);
if (!attempt_back_merge(q, req)) if (!attempt_back_merge(q, req))
elv_merged_request(q, req, el_ret); elv_merged_request(q, req, el_ret);
goto out; goto out;
...@@ -1231,6 +1252,7 @@ static int __make_request(struct request_queue *q, struct bio *bio) ...@@ -1231,6 +1252,7 @@ static int __make_request(struct request_queue *q, struct bio *bio)
if (!blk_rq_cpu_valid(req)) if (!blk_rq_cpu_valid(req))
req->cpu = bio->bi_comp_cpu; req->cpu = bio->bi_comp_cpu;
drive_stat_acct(req, 0); drive_stat_acct(req, 0);
elv_bio_merged(q, req, bio);
if (!attempt_front_merge(q, req)) if (!attempt_front_merge(q, req))
elv_merged_request(q, req, el_ret); elv_merged_request(q, req, el_ret);
goto out; goto out;
...@@ -1855,8 +1877,10 @@ void blk_dequeue_request(struct request *rq) ...@@ -1855,8 +1877,10 @@ void blk_dequeue_request(struct request *rq)
* and to it is freed is accounted as io that is in progress at * and to it is freed is accounted as io that is in progress at
* the driver side. * the driver side.
*/ */
if (blk_account_rq(rq)) if (blk_account_rq(rq)) {
q->in_flight[rq_is_sync(rq)]++; q->in_flight[rq_is_sync(rq)]++;
set_io_start_time_ns(rq);
}
} }
/** /**
...@@ -2098,7 +2122,7 @@ static void blk_finish_request(struct request *req, int error) ...@@ -2098,7 +2122,7 @@ static void blk_finish_request(struct request *req, int error)
BUG_ON(blk_queued_rq(req)); BUG_ON(blk_queued_rq(req));
if (unlikely(laptop_mode) && blk_fs_request(req)) if (unlikely(laptop_mode) && blk_fs_request(req))
laptop_io_completion(); laptop_io_completion(&req->q->backing_dev_info);
blk_delete_timer(req); blk_delete_timer(req);
...@@ -2517,4 +2541,3 @@ int __init blk_dev_init(void) ...@@ -2517,4 +2541,3 @@ int __init blk_dev_init(void)
return 0; return 0;
} }
/*
* Functions related to generic helpers functions
*/
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/scatterlist.h>
#include "blk.h"
static void blkdev_discard_end_io(struct bio *bio, int err)
{
if (err) {
if (err == -EOPNOTSUPP)
set_bit(BIO_EOPNOTSUPP, &bio->bi_flags);
clear_bit(BIO_UPTODATE, &bio->bi_flags);
}
if (bio->bi_private)
complete(bio->bi_private);
__free_page(bio_page(bio));
bio_put(bio);
}
/**
* blkdev_issue_discard - queue a discard
* @bdev: blockdev to issue discard for
* @sector: start sector
* @nr_sects: number of sectors to discard
* @gfp_mask: memory allocation flags (for bio_alloc)
* @flags: BLKDEV_IFL_* flags to control behaviour
*
* Description:
* Issue a discard request for the sectors in question.
*/
int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, unsigned long flags)
{
DECLARE_COMPLETION_ONSTACK(wait);
struct request_queue *q = bdev_get_queue(bdev);
int type = flags & BLKDEV_IFL_BARRIER ?
DISCARD_BARRIER : DISCARD_NOBARRIER;
struct bio *bio;
struct page *page;
int ret = 0;
if (!q)
return -ENXIO;
if (!blk_queue_discard(q))
return -EOPNOTSUPP;
while (nr_sects && !ret) {
unsigned int sector_size = q->limits.logical_block_size;
unsigned int max_discard_sectors =
min(q->limits.max_discard_sectors, UINT_MAX >> 9);
bio = bio_alloc(gfp_mask, 1);
if (!bio)
goto out;
bio->bi_sector = sector;
bio->bi_end_io = blkdev_discard_end_io;
bio->bi_bdev = bdev;
if (flags & BLKDEV_IFL_WAIT)
bio->bi_private = &wait;
/*
* Add a zeroed one-sector payload as that's what
* our current implementations need. If we'll ever need
* more the interface will need revisiting.
*/
page = alloc_page(gfp_mask | __GFP_ZERO);
if (!page)
goto out_free_bio;
if (bio_add_pc_page(q, bio, page, sector_size, 0) < sector_size)
goto out_free_page;
/*
* And override the bio size - the way discard works we
* touch many more blocks on disk than the actual payload
* length.
*/
if (nr_sects > max_discard_sectors) {
bio->bi_size = max_discard_sectors << 9;
nr_sects -= max_discard_sectors;
sector += max_discard_sectors;
} else {
bio->bi_size = nr_sects << 9;
nr_sects = 0;
}
bio_get(bio);
submit_bio(type, bio);
if (flags & BLKDEV_IFL_WAIT)
wait_for_completion(&wait);
if (bio_flagged(bio, BIO_EOPNOTSUPP))
ret = -EOPNOTSUPP;
else if (!bio_flagged(bio, BIO_UPTODATE))
ret = -EIO;
bio_put(bio);
}
return ret;
out_free_page:
__free_page(page);
out_free_bio:
bio_put(bio);
out:
return -ENOMEM;
}
EXPORT_SYMBOL(blkdev_issue_discard);
struct bio_batch
{
atomic_t done;
unsigned long flags;
struct completion *wait;
bio_end_io_t *end_io;
};
static void bio_batch_end_io(struct bio *bio, int err)
{
struct bio_batch *bb = bio->bi_private;
if (err) {
if (err == -EOPNOTSUPP)
set_bit(BIO_EOPNOTSUPP, &bb->flags);
else
clear_bit(BIO_UPTODATE, &bb->flags);
}
if (bb) {
if (bb->end_io)
bb->end_io(bio, err);
atomic_inc(&bb->done);
complete(bb->wait);
}
bio_put(bio);
}
/**
* blkdev_issue_zeroout - generate a number of zero-filled write bios
* @bdev: blockdev to issue
* @sector: start sector
* @nr_sects: number of sectors to write
* @gfp_mask: memory allocation flags (for bio_alloc)
* @flags: BLKDEV_IFL_* flags to control behaviour
*
* Description:
* Generate and issue a number of bios with zero-filled pages.
* Send a barrier at the beginning and at the end if requested. This guarantees
* correct request ordering. An empty barrier allows us to avoid a post-queue flush.
*/
int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, unsigned long flags)
{
int ret = 0;
struct bio *bio;
struct bio_batch bb;
unsigned int sz, issued = 0;
DECLARE_COMPLETION_ONSTACK(wait);
atomic_set(&bb.done, 0);
bb.flags = 1 << BIO_UPTODATE;
bb.wait = &wait;
bb.end_io = NULL;
if (flags & BLKDEV_IFL_BARRIER) {
/* issue async barrier before the data */
ret = blkdev_issue_flush(bdev, gfp_mask, NULL, 0);
if (ret)
return ret;
}
submit:
while (nr_sects != 0) {
bio = bio_alloc(gfp_mask,
min(nr_sects, (sector_t)BIO_MAX_PAGES));
if (!bio)
break;
bio->bi_sector = sector;
bio->bi_bdev = bdev;
bio->bi_end_io = bio_batch_end_io;
if (flags & BLKDEV_IFL_WAIT)
bio->bi_private = &bb;
while (nr_sects != 0) {
sz = min((sector_t) PAGE_SIZE >> 9 , nr_sects);
if (sz == 0)
/* bio has maximum size possible */
break;
ret = bio_add_page(bio, ZERO_PAGE(0), sz << 9, 0);
nr_sects -= ret >> 9;
sector += ret >> 9;
if (ret < (sz << 9))
break;
}
issued++;
submit_bio(WRITE, bio);
}
/*
* When all data bios are in flight, send the final barrier if requested.
*/
if (nr_sects == 0 && flags & BLKDEV_IFL_BARRIER)
ret = blkdev_issue_flush(bdev, gfp_mask, NULL,
flags & BLKDEV_IFL_WAIT);
if (flags & BLKDEV_IFL_WAIT)
/* Wait for bios in-flight */
while ( issued != atomic_read(&bb.done))
wait_for_completion(&wait);
if (!test_bit(BIO_UPTODATE, &bb.flags))
/* One of the bios in the batch completed with an error. */
ret = -EIO;
if (ret)
goto out;
if (test_bit(BIO_EOPNOTSUPP, &bb.flags)) {
ret = -EOPNOTSUPP;
goto out;
}
if (nr_sects != 0)
goto submit;
out:
return ret;
}
EXPORT_SYMBOL(blkdev_issue_zeroout);
...@@ -55,6 +55,7 @@ static const int cfq_hist_divisor = 4; ...@@ -55,6 +55,7 @@ static const int cfq_hist_divisor = 4;
#define RQ_CIC(rq) \ #define RQ_CIC(rq) \
((struct cfq_io_context *) (rq)->elevator_private) ((struct cfq_io_context *) (rq)->elevator_private)
#define RQ_CFQQ(rq) (struct cfq_queue *) ((rq)->elevator_private2) #define RQ_CFQQ(rq) (struct cfq_queue *) ((rq)->elevator_private2)
#define RQ_CFQG(rq) (struct cfq_group *) ((rq)->elevator_private3)
static struct kmem_cache *cfq_pool; static struct kmem_cache *cfq_pool;
static struct kmem_cache *cfq_ioc_pool; static struct kmem_cache *cfq_ioc_pool;
...@@ -143,8 +144,6 @@ struct cfq_queue { ...@@ -143,8 +144,6 @@ struct cfq_queue {
struct cfq_queue *new_cfqq; struct cfq_queue *new_cfqq;
struct cfq_group *cfqg; struct cfq_group *cfqg;
struct cfq_group *orig_cfqg; struct cfq_group *orig_cfqg;
/* Sectors dispatched in current dispatch round */
unsigned long nr_sectors;
}; };
/* /*
...@@ -346,7 +345,7 @@ CFQ_CFQQ_FNS(deep); ...@@ -346,7 +345,7 @@ CFQ_CFQQ_FNS(deep);
CFQ_CFQQ_FNS(wait_busy); CFQ_CFQQ_FNS(wait_busy);
#undef CFQ_CFQQ_FNS #undef CFQ_CFQQ_FNS
#ifdef CONFIG_DEBUG_CFQ_IOSCHED #ifdef CONFIG_CFQ_GROUP_IOSCHED
#define cfq_log_cfqq(cfqd, cfqq, fmt, args...) \ #define cfq_log_cfqq(cfqd, cfqq, fmt, args...) \
blk_add_trace_msg((cfqd)->queue, "cfq%d%c %s " fmt, (cfqq)->pid, \ blk_add_trace_msg((cfqd)->queue, "cfq%d%c %s " fmt, (cfqq)->pid, \
cfq_cfqq_sync((cfqq)) ? 'S' : 'A', \ cfq_cfqq_sync((cfqq)) ? 'S' : 'A', \
...@@ -858,7 +857,7 @@ cfq_group_service_tree_del(struct cfq_data *cfqd, struct cfq_group *cfqg) ...@@ -858,7 +857,7 @@ cfq_group_service_tree_del(struct cfq_data *cfqd, struct cfq_group *cfqg)
if (!RB_EMPTY_NODE(&cfqg->rb_node)) if (!RB_EMPTY_NODE(&cfqg->rb_node))
cfq_rb_erase(&cfqg->rb_node, st); cfq_rb_erase(&cfqg->rb_node, st);
cfqg->saved_workload_slice = 0; cfqg->saved_workload_slice = 0;
blkiocg_update_blkio_group_dequeue_stats(&cfqg->blkg, 1); blkiocg_update_dequeue_stats(&cfqg->blkg, 1);
} }
static inline unsigned int cfq_cfqq_slice_usage(struct cfq_queue *cfqq) static inline unsigned int cfq_cfqq_slice_usage(struct cfq_queue *cfqq)
...@@ -884,8 +883,7 @@ static inline unsigned int cfq_cfqq_slice_usage(struct cfq_queue *cfqq) ...@@ -884,8 +883,7 @@ static inline unsigned int cfq_cfqq_slice_usage(struct cfq_queue *cfqq)
slice_used = cfqq->allocated_slice; slice_used = cfqq->allocated_slice;
} }
cfq_log_cfqq(cfqq->cfqd, cfqq, "sl_used=%u sect=%lu", slice_used, cfq_log_cfqq(cfqq->cfqd, cfqq, "sl_used=%u", slice_used);
cfqq->nr_sectors);
return slice_used; return slice_used;
} }
...@@ -919,8 +917,8 @@ static void cfq_group_served(struct cfq_data *cfqd, struct cfq_group *cfqg, ...@@ -919,8 +917,8 @@ static void cfq_group_served(struct cfq_data *cfqd, struct cfq_group *cfqg,
cfq_log_cfqg(cfqd, cfqg, "served: vt=%llu min_vt=%llu", cfqg->vdisktime, cfq_log_cfqg(cfqd, cfqg, "served: vt=%llu min_vt=%llu", cfqg->vdisktime,
st->min_vdisktime); st->min_vdisktime);
blkiocg_update_blkio_group_stats(&cfqg->blkg, used_sl, blkiocg_update_timeslice_used(&cfqg->blkg, used_sl);
cfqq->nr_sectors); blkiocg_set_start_empty_time(&cfqg->blkg);
} }
#ifdef CONFIG_CFQ_GROUP_IOSCHED #ifdef CONFIG_CFQ_GROUP_IOSCHED
...@@ -961,7 +959,6 @@ cfq_find_alloc_cfqg(struct cfq_data *cfqd, struct cgroup *cgroup, int create) ...@@ -961,7 +959,6 @@ cfq_find_alloc_cfqg(struct cfq_data *cfqd, struct cgroup *cgroup, int create)
if (!cfqg) if (!cfqg)
goto done; goto done;
cfqg->weight = blkcg->weight;
for_each_cfqg_st(cfqg, i, j, st) for_each_cfqg_st(cfqg, i, j, st)
*st = CFQ_RB_ROOT; *st = CFQ_RB_ROOT;
RB_CLEAR_NODE(&cfqg->rb_node); RB_CLEAR_NODE(&cfqg->rb_node);
...@@ -978,6 +975,7 @@ cfq_find_alloc_cfqg(struct cfq_data *cfqd, struct cgroup *cgroup, int create) ...@@ -978,6 +975,7 @@ cfq_find_alloc_cfqg(struct cfq_data *cfqd, struct cgroup *cgroup, int create)
sscanf(dev_name(bdi->dev), "%u:%u", &major, &minor); sscanf(dev_name(bdi->dev), "%u:%u", &major, &minor);
blkiocg_add_blkio_group(blkcg, &cfqg->blkg, (void *)cfqd, blkiocg_add_blkio_group(blkcg, &cfqg->blkg, (void *)cfqd,
MKDEV(major, minor)); MKDEV(major, minor));
cfqg->weight = blkcg_get_weight(blkcg, cfqg->blkg.dev);
/* Add group on cfqd list */ /* Add group on cfqd list */
hlist_add_head(&cfqg->cfqd_node, &cfqd->cfqg_list); hlist_add_head(&cfqg->cfqd_node, &cfqd->cfqg_list);
...@@ -1004,6 +1002,12 @@ static struct cfq_group *cfq_get_cfqg(struct cfq_data *cfqd, int create) ...@@ -1004,6 +1002,12 @@ static struct cfq_group *cfq_get_cfqg(struct cfq_data *cfqd, int create)
return cfqg; return cfqg;
} }
static inline struct cfq_group *cfq_ref_get_cfqg(struct cfq_group *cfqg)
{
atomic_inc(&cfqg->ref);
return cfqg;
}
static void cfq_link_cfqq_cfqg(struct cfq_queue *cfqq, struct cfq_group *cfqg) static void cfq_link_cfqq_cfqg(struct cfq_queue *cfqq, struct cfq_group *cfqg)
{ {
/* Currently, all async queues are mapped to root group */ /* Currently, all async queues are mapped to root group */
...@@ -1087,6 +1091,12 @@ static struct cfq_group *cfq_get_cfqg(struct cfq_data *cfqd, int create) ...@@ -1087,6 +1091,12 @@ static struct cfq_group *cfq_get_cfqg(struct cfq_data *cfqd, int create)
{ {
return &cfqd->root_group; return &cfqd->root_group;
} }
static inline struct cfq_group *cfq_ref_get_cfqg(struct cfq_group *cfqg)
{
return cfqg;
}
static inline void static inline void
cfq_link_cfqq_cfqg(struct cfq_queue *cfqq, struct cfq_group *cfqg) { cfq_link_cfqq_cfqg(struct cfq_queue *cfqq, struct cfq_group *cfqg) {
cfqq->cfqg = cfqg; cfqq->cfqg = cfqg;
...@@ -1389,7 +1399,12 @@ static void cfq_reposition_rq_rb(struct cfq_queue *cfqq, struct request *rq) ...@@ -1389,7 +1399,12 @@ static void cfq_reposition_rq_rb(struct cfq_queue *cfqq, struct request *rq)
{ {
elv_rb_del(&cfqq->sort_list, rq); elv_rb_del(&cfqq->sort_list, rq);
cfqq->queued[rq_is_sync(rq)]--; cfqq->queued[rq_is_sync(rq)]--;
blkiocg_update_io_remove_stats(&(RQ_CFQG(rq))->blkg, rq_data_dir(rq),
rq_is_sync(rq));
cfq_add_rq_rb(rq); cfq_add_rq_rb(rq);
blkiocg_update_io_add_stats(&(RQ_CFQG(rq))->blkg,
&cfqq->cfqd->serving_group->blkg, rq_data_dir(rq),
rq_is_sync(rq));
} }
static struct request * static struct request *
...@@ -1445,6 +1460,8 @@ static void cfq_remove_request(struct request *rq) ...@@ -1445,6 +1460,8 @@ static void cfq_remove_request(struct request *rq)
cfq_del_rq_rb(rq); cfq_del_rq_rb(rq);
cfqq->cfqd->rq_queued--; cfqq->cfqd->rq_queued--;
blkiocg_update_io_remove_stats(&(RQ_CFQG(rq))->blkg, rq_data_dir(rq),
rq_is_sync(rq));
if (rq_is_meta(rq)) { if (rq_is_meta(rq)) {
WARN_ON(!cfqq->meta_pending); WARN_ON(!cfqq->meta_pending);
cfqq->meta_pending--; cfqq->meta_pending--;
...@@ -1476,6 +1493,13 @@ static void cfq_merged_request(struct request_queue *q, struct request *req, ...@@ -1476,6 +1493,13 @@ static void cfq_merged_request(struct request_queue *q, struct request *req,
} }
} }
static void cfq_bio_merged(struct request_queue *q, struct request *req,
struct bio *bio)
{
blkiocg_update_io_merged_stats(&(RQ_CFQG(req))->blkg, bio_data_dir(bio),
cfq_bio_sync(bio));
}
static void static void
cfq_merged_requests(struct request_queue *q, struct request *rq, cfq_merged_requests(struct request_queue *q, struct request *rq,
struct request *next) struct request *next)
...@@ -1493,6 +1517,8 @@ cfq_merged_requests(struct request_queue *q, struct request *rq, ...@@ -1493,6 +1517,8 @@ cfq_merged_requests(struct request_queue *q, struct request *rq,
if (cfqq->next_rq == next) if (cfqq->next_rq == next)
cfqq->next_rq = rq; cfqq->next_rq = rq;
cfq_remove_request(next); cfq_remove_request(next);
blkiocg_update_io_merged_stats(&(RQ_CFQG(rq))->blkg, rq_data_dir(next),
rq_is_sync(next));
} }
static int cfq_allow_merge(struct request_queue *q, struct request *rq, static int cfq_allow_merge(struct request_queue *q, struct request *rq,
...@@ -1520,18 +1546,24 @@ static int cfq_allow_merge(struct request_queue *q, struct request *rq, ...@@ -1520,18 +1546,24 @@ static int cfq_allow_merge(struct request_queue *q, struct request *rq,
return cfqq == RQ_CFQQ(rq); return cfqq == RQ_CFQQ(rq);
} }
static inline void cfq_del_timer(struct cfq_data *cfqd, struct cfq_queue *cfqq)
{
del_timer(&cfqd->idle_slice_timer);
blkiocg_update_idle_time_stats(&cfqq->cfqg->blkg);
}
static void __cfq_set_active_queue(struct cfq_data *cfqd, static void __cfq_set_active_queue(struct cfq_data *cfqd,
struct cfq_queue *cfqq) struct cfq_queue *cfqq)
{ {
if (cfqq) { if (cfqq) {
cfq_log_cfqq(cfqd, cfqq, "set_active wl_prio:%d wl_type:%d", cfq_log_cfqq(cfqd, cfqq, "set_active wl_prio:%d wl_type:%d",
cfqd->serving_prio, cfqd->serving_type); cfqd->serving_prio, cfqd->serving_type);
blkiocg_update_avg_queue_size_stats(&cfqq->cfqg->blkg);
cfqq->slice_start = 0; cfqq->slice_start = 0;
cfqq->dispatch_start = jiffies; cfqq->dispatch_start = jiffies;
cfqq->allocated_slice = 0; cfqq->allocated_slice = 0;
cfqq->slice_end = 0; cfqq->slice_end = 0;
cfqq->slice_dispatch = 0; cfqq->slice_dispatch = 0;
cfqq->nr_sectors = 0;
cfq_clear_cfqq_wait_request(cfqq); cfq_clear_cfqq_wait_request(cfqq);
cfq_clear_cfqq_must_dispatch(cfqq); cfq_clear_cfqq_must_dispatch(cfqq);
...@@ -1539,7 +1571,7 @@ static void __cfq_set_active_queue(struct cfq_data *cfqd, ...@@ -1539,7 +1571,7 @@ static void __cfq_set_active_queue(struct cfq_data *cfqd,
cfq_clear_cfqq_fifo_expire(cfqq); cfq_clear_cfqq_fifo_expire(cfqq);
cfq_mark_cfqq_slice_new(cfqq); cfq_mark_cfqq_slice_new(cfqq);
del_timer(&cfqd->idle_slice_timer); cfq_del_timer(cfqd, cfqq);
} }
cfqd->active_queue = cfqq; cfqd->active_queue = cfqq;
...@@ -1555,7 +1587,7 @@ __cfq_slice_expired(struct cfq_data *cfqd, struct cfq_queue *cfqq, ...@@ -1555,7 +1587,7 @@ __cfq_slice_expired(struct cfq_data *cfqd, struct cfq_queue *cfqq,
cfq_log_cfqq(cfqd, cfqq, "slice expired t=%d", timed_out); cfq_log_cfqq(cfqd, cfqq, "slice expired t=%d", timed_out);
if (cfq_cfqq_wait_request(cfqq)) if (cfq_cfqq_wait_request(cfqq))
del_timer(&cfqd->idle_slice_timer); cfq_del_timer(cfqd, cfqq);
cfq_clear_cfqq_wait_request(cfqq); cfq_clear_cfqq_wait_request(cfqq);
cfq_clear_cfqq_wait_busy(cfqq); cfq_clear_cfqq_wait_busy(cfqq);
...@@ -1857,6 +1889,7 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd) ...@@ -1857,6 +1889,7 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd)
sl = cfqd->cfq_slice_idle; sl = cfqd->cfq_slice_idle;
mod_timer(&cfqd->idle_slice_timer, jiffies + sl); mod_timer(&cfqd->idle_slice_timer, jiffies + sl);
blkiocg_update_set_idle_time_stats(&cfqq->cfqg->blkg);
cfq_log_cfqq(cfqd, cfqq, "arm_idle: %lu", sl); cfq_log_cfqq(cfqd, cfqq, "arm_idle: %lu", sl);
} }
...@@ -1876,7 +1909,8 @@ static void cfq_dispatch_insert(struct request_queue *q, struct request *rq) ...@@ -1876,7 +1909,8 @@ static void cfq_dispatch_insert(struct request_queue *q, struct request *rq)
elv_dispatch_sort(q, rq); elv_dispatch_sort(q, rq);
cfqd->rq_in_flight[cfq_cfqq_sync(cfqq)]++; cfqd->rq_in_flight[cfq_cfqq_sync(cfqq)]++;
cfqq->nr_sectors += blk_rq_sectors(rq); blkiocg_update_dispatch_stats(&cfqq->cfqg->blkg, blk_rq_bytes(rq),
rq_data_dir(rq), rq_is_sync(rq));
} }
/* /*
...@@ -3185,11 +3219,14 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq, ...@@ -3185,11 +3219,14 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq,
if (cfq_cfqq_wait_request(cfqq)) { if (cfq_cfqq_wait_request(cfqq)) {
if (blk_rq_bytes(rq) > PAGE_CACHE_SIZE || if (blk_rq_bytes(rq) > PAGE_CACHE_SIZE ||
cfqd->busy_queues > 1) { cfqd->busy_queues > 1) {
del_timer(&cfqd->idle_slice_timer); cfq_del_timer(cfqd, cfqq);
cfq_clear_cfqq_wait_request(cfqq); cfq_clear_cfqq_wait_request(cfqq);
__blk_run_queue(cfqd->queue); __blk_run_queue(cfqd->queue);
} else } else {
blkiocg_update_idle_time_stats(
&cfqq->cfqg->blkg);
cfq_mark_cfqq_must_dispatch(cfqq); cfq_mark_cfqq_must_dispatch(cfqq);
}
} }
} else if (cfq_should_preempt(cfqd, cfqq, rq)) { } else if (cfq_should_preempt(cfqd, cfqq, rq)) {
/* /*
...@@ -3214,7 +3251,9 @@ static void cfq_insert_request(struct request_queue *q, struct request *rq) ...@@ -3214,7 +3251,9 @@ static void cfq_insert_request(struct request_queue *q, struct request *rq)
rq_set_fifo_time(rq, jiffies + cfqd->cfq_fifo_expire[rq_is_sync(rq)]); rq_set_fifo_time(rq, jiffies + cfqd->cfq_fifo_expire[rq_is_sync(rq)]);
list_add_tail(&rq->queuelist, &cfqq->fifo); list_add_tail(&rq->queuelist, &cfqq->fifo);
cfq_add_rq_rb(rq); cfq_add_rq_rb(rq);
blkiocg_update_io_add_stats(&(RQ_CFQG(rq))->blkg,
&cfqd->serving_group->blkg, rq_data_dir(rq),
rq_is_sync(rq));
cfq_rq_enqueued(cfqd, cfqq, rq); cfq_rq_enqueued(cfqd, cfqq, rq);
} }
...@@ -3300,6 +3339,9 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq) ...@@ -3300,6 +3339,9 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
WARN_ON(!cfqq->dispatched); WARN_ON(!cfqq->dispatched);
cfqd->rq_in_driver--; cfqd->rq_in_driver--;
cfqq->dispatched--; cfqq->dispatched--;
blkiocg_update_completion_stats(&cfqq->cfqg->blkg, rq_start_time_ns(rq),
rq_io_start_time_ns(rq), rq_data_dir(rq),
rq_is_sync(rq));
cfqd->rq_in_flight[cfq_cfqq_sync(cfqq)]--; cfqd->rq_in_flight[cfq_cfqq_sync(cfqq)]--;
...@@ -3440,6 +3482,10 @@ static void cfq_put_request(struct request *rq) ...@@ -3440,6 +3482,10 @@ static void cfq_put_request(struct request *rq)
rq->elevator_private = NULL; rq->elevator_private = NULL;
rq->elevator_private2 = NULL; rq->elevator_private2 = NULL;
/* Put down rq reference on cfqg */
cfq_put_cfqg(RQ_CFQG(rq));
rq->elevator_private3 = NULL;
cfq_put_queue(cfqq); cfq_put_queue(cfqq);
} }
} }
...@@ -3528,6 +3574,7 @@ cfq_set_request(struct request_queue *q, struct request *rq, gfp_t gfp_mask) ...@@ -3528,6 +3574,7 @@ cfq_set_request(struct request_queue *q, struct request *rq, gfp_t gfp_mask)
rq->elevator_private = cic; rq->elevator_private = cic;
rq->elevator_private2 = cfqq; rq->elevator_private2 = cfqq;
rq->elevator_private3 = cfq_ref_get_cfqg(cfqq->cfqg);
return 0; return 0;
queue_fail: queue_fail:
...@@ -3743,7 +3790,6 @@ static void *cfq_init_queue(struct request_queue *q) ...@@ -3743,7 +3790,6 @@ static void *cfq_init_queue(struct request_queue *q)
* second, in order to have larger depth for async operations. * second, in order to have larger depth for async operations.
*/ */
cfqd->last_delayed_sync = jiffies - HZ; cfqd->last_delayed_sync = jiffies - HZ;
INIT_RCU_HEAD(&cfqd->rcu);
return cfqd; return cfqd;
} }
...@@ -3872,6 +3918,7 @@ static struct elevator_type iosched_cfq = { ...@@ -3872,6 +3918,7 @@ static struct elevator_type iosched_cfq = {
.elevator_merged_fn = cfq_merged_request, .elevator_merged_fn = cfq_merged_request,
.elevator_merge_req_fn = cfq_merged_requests, .elevator_merge_req_fn = cfq_merged_requests,
.elevator_allow_merge_fn = cfq_allow_merge, .elevator_allow_merge_fn = cfq_allow_merge,
.elevator_bio_merged_fn = cfq_bio_merged,
.elevator_dispatch_fn = cfq_dispatch_requests, .elevator_dispatch_fn = cfq_dispatch_requests,
.elevator_add_req_fn = cfq_insert_request, .elevator_add_req_fn = cfq_insert_request,
.elevator_activate_req_fn = cfq_activate_request, .elevator_activate_req_fn = cfq_activate_request,
......
...@@ -539,6 +539,15 @@ void elv_merge_requests(struct request_queue *q, struct request *rq, ...@@ -539,6 +539,15 @@ void elv_merge_requests(struct request_queue *q, struct request *rq,
q->last_merge = rq; q->last_merge = rq;
} }
void elv_bio_merged(struct request_queue *q, struct request *rq,
struct bio *bio)
{
struct elevator_queue *e = q->elevator;
if (e->ops->elevator_bio_merged_fn)
e->ops->elevator_bio_merged_fn(q, rq, bio);
}
void elv_requeue_request(struct request_queue *q, struct request *rq) void elv_requeue_request(struct request_queue *q, struct request *rq)
{ {
/* /*
...@@ -921,6 +930,7 @@ int elv_register_queue(struct request_queue *q) ...@@ -921,6 +930,7 @@ int elv_register_queue(struct request_queue *q)
} }
return error; return error;
} }
EXPORT_SYMBOL(elv_register_queue);
static void __elv_unregister_queue(struct elevator_queue *e) static void __elv_unregister_queue(struct elevator_queue *e)
{ {
...@@ -933,6 +943,7 @@ void elv_unregister_queue(struct request_queue *q) ...@@ -933,6 +943,7 @@ void elv_unregister_queue(struct request_queue *q)
if (q) if (q)
__elv_unregister_queue(q->elevator); __elv_unregister_queue(q->elevator);
} }
EXPORT_SYMBOL(elv_unregister_queue);
void elv_register(struct elevator_type *e) void elv_register(struct elevator_type *e)
{ {
......
...@@ -596,6 +596,7 @@ struct gendisk *get_gendisk(dev_t devt, int *partno) ...@@ -596,6 +596,7 @@ struct gendisk *get_gendisk(dev_t devt, int *partno)
return disk; return disk;
} }
EXPORT_SYMBOL(get_gendisk);
/** /**
* bdget_disk - do bdget() by gendisk and partition number * bdget_disk - do bdget() by gendisk and partition number
...@@ -987,7 +988,6 @@ int disk_expand_part_tbl(struct gendisk *disk, int partno) ...@@ -987,7 +988,6 @@ int disk_expand_part_tbl(struct gendisk *disk, int partno)
if (!new_ptbl) if (!new_ptbl)
return -ENOMEM; return -ENOMEM;
INIT_RCU_HEAD(&new_ptbl->rcu_head);
new_ptbl->len = target; new_ptbl->len = target;
for (i = 0; i < len; i++) for (i = 0; i < len; i++)
......
...@@ -126,7 +126,7 @@ static int blk_ioctl_discard(struct block_device *bdev, uint64_t start, ...@@ -126,7 +126,7 @@ static int blk_ioctl_discard(struct block_device *bdev, uint64_t start,
if (start + len > (bdev->bd_inode->i_size >> 9)) if (start + len > (bdev->bd_inode->i_size >> 9))
return -EINVAL; return -EINVAL;
return blkdev_issue_discard(bdev, start, len, GFP_KERNEL, return blkdev_issue_discard(bdev, start, len, GFP_KERNEL,
DISCARD_FL_WAIT); BLKDEV_IFL_WAIT);
} }
static int put_ushort(unsigned long arg, unsigned short val) static int put_ushort(unsigned long arg, unsigned short val)
......
...@@ -76,6 +76,17 @@ config BLK_DEV_XD ...@@ -76,6 +76,17 @@ config BLK_DEV_XD
It's pretty unlikely that you have one of these: say N. It's pretty unlikely that you have one of these: say N.
config GDROM
tristate "SEGA Dreamcast GD-ROM drive"
depends on SH_DREAMCAST
help
A standard SEGA Dreamcast comes with a modified CD ROM drive called a
"GD-ROM" by SEGA to signify it is capable of reading special disks
with up to 1 GB of data. This drive will also read standard CD ROM
disks. Select this option to access any disks in your GD ROM drive.
Most users will want to say "Y" here.
You can also build this as a module which will be called gdrom.
config PARIDE config PARIDE
tristate "Parallel port IDE device support" tristate "Parallel port IDE device support"
depends on PARPORT_PC depends on PARPORT_PC
...@@ -103,17 +114,6 @@ config PARIDE ...@@ -103,17 +114,6 @@ config PARIDE
"MicroSolutions backpack protocol", "DataStor Commuter protocol" "MicroSolutions backpack protocol", "DataStor Commuter protocol"
etc.). etc.).
config GDROM
tristate "SEGA Dreamcast GD-ROM drive"
depends on SH_DREAMCAST
help
A standard SEGA Dreamcast comes with a modified CD ROM drive called a
"GD-ROM" by SEGA to signify it is capable of reading special disks
with up to 1 GB of data. This drive will also read standard CD ROM
disks. Select this option to access any disks in your GD ROM drive.
Most users will want to say "Y" here.
You can also build this as a module which will be called gdrom.
source "drivers/block/paride/Kconfig" source "drivers/block/paride/Kconfig"
config BLK_CPQ_DA config BLK_CPQ_DA
......
...@@ -84,6 +84,9 @@ struct drbd_bitmap { ...@@ -84,6 +84,9 @@ struct drbd_bitmap {
#define BM_MD_IO_ERROR 1 #define BM_MD_IO_ERROR 1
#define BM_P_VMALLOCED 2 #define BM_P_VMALLOCED 2
static int __bm_change_bits_to(struct drbd_conf *mdev, const unsigned long s,
unsigned long e, int val, const enum km_type km);
static int bm_is_locked(struct drbd_bitmap *b) static int bm_is_locked(struct drbd_bitmap *b)
{ {
return test_bit(BM_LOCKED, &b->bm_flags); return test_bit(BM_LOCKED, &b->bm_flags);
...@@ -441,7 +444,7 @@ static void bm_memset(struct drbd_bitmap *b, size_t offset, int c, size_t len) ...@@ -441,7 +444,7 @@ static void bm_memset(struct drbd_bitmap *b, size_t offset, int c, size_t len)
* In case this is actually a resize, we copy the old bitmap into the new one. * In case this is actually a resize, we copy the old bitmap into the new one.
* Otherwise, the bitmap is initialized to all bits set. * Otherwise, the bitmap is initialized to all bits set.
*/ */
int drbd_bm_resize(struct drbd_conf *mdev, sector_t capacity) int drbd_bm_resize(struct drbd_conf *mdev, sector_t capacity, int set_new_bits)
{ {
struct drbd_bitmap *b = mdev->bitmap; struct drbd_bitmap *b = mdev->bitmap;
unsigned long bits, words, owords, obits, *p_addr, *bm; unsigned long bits, words, owords, obits, *p_addr, *bm;
...@@ -516,7 +519,7 @@ int drbd_bm_resize(struct drbd_conf *mdev, sector_t capacity) ...@@ -516,7 +519,7 @@ int drbd_bm_resize(struct drbd_conf *mdev, sector_t capacity)
obits = b->bm_bits; obits = b->bm_bits;
growing = bits > obits; growing = bits > obits;
if (opages) if (opages && growing && set_new_bits)
bm_set_surplus(b); bm_set_surplus(b);
b->bm_pages = npages; b->bm_pages = npages;
...@@ -526,8 +529,12 @@ int drbd_bm_resize(struct drbd_conf *mdev, sector_t capacity) ...@@ -526,8 +529,12 @@ int drbd_bm_resize(struct drbd_conf *mdev, sector_t capacity)
b->bm_dev_capacity = capacity; b->bm_dev_capacity = capacity;
if (growing) { if (growing) {
bm_memset(b, owords, 0xff, words-owords); if (set_new_bits) {
b->bm_set += bits - obits; bm_memset(b, owords, 0xff, words-owords);
b->bm_set += bits - obits;
} else
bm_memset(b, owords, 0x00, words-owords);
} }
if (want < have) { if (want < have) {
...@@ -773,7 +780,7 @@ static void bm_page_io_async(struct drbd_conf *mdev, struct drbd_bitmap *b, int ...@@ -773,7 +780,7 @@ static void bm_page_io_async(struct drbd_conf *mdev, struct drbd_bitmap *b, int
/* nothing to do, on disk == in memory */ /* nothing to do, on disk == in memory */
# define bm_cpu_to_lel(x) ((void)0) # define bm_cpu_to_lel(x) ((void)0)
# else # else
void bm_cpu_to_lel(struct drbd_bitmap *b) static void bm_cpu_to_lel(struct drbd_bitmap *b)
{ {
/* need to cpu_to_lel all the pages ... /* need to cpu_to_lel all the pages ...
* this may be optimized by using * this may be optimized by using
...@@ -1015,7 +1022,7 @@ unsigned long _drbd_bm_find_next_zero(struct drbd_conf *mdev, unsigned long bm_f ...@@ -1015,7 +1022,7 @@ unsigned long _drbd_bm_find_next_zero(struct drbd_conf *mdev, unsigned long bm_f
* wants bitnr, not sector. * wants bitnr, not sector.
* expected to be called for only a few bits (e - s about BITS_PER_LONG). * expected to be called for only a few bits (e - s about BITS_PER_LONG).
* Must hold bitmap lock already. */ * Must hold bitmap lock already. */
int __bm_change_bits_to(struct drbd_conf *mdev, const unsigned long s, static int __bm_change_bits_to(struct drbd_conf *mdev, const unsigned long s,
unsigned long e, int val, const enum km_type km) unsigned long e, int val, const enum km_type km)
{ {
struct drbd_bitmap *b = mdev->bitmap; struct drbd_bitmap *b = mdev->bitmap;
...@@ -1053,7 +1060,7 @@ int __bm_change_bits_to(struct drbd_conf *mdev, const unsigned long s, ...@@ -1053,7 +1060,7 @@ int __bm_change_bits_to(struct drbd_conf *mdev, const unsigned long s,
* for val != 0, we change 0 -> 1, return code positive * for val != 0, we change 0 -> 1, return code positive
* for val == 0, we change 1 -> 0, return code negative * for val == 0, we change 1 -> 0, return code negative
* wants bitnr, not sector */ * wants bitnr, not sector */
int bm_change_bits_to(struct drbd_conf *mdev, const unsigned long s, static int bm_change_bits_to(struct drbd_conf *mdev, const unsigned long s,
const unsigned long e, int val) const unsigned long e, int val)
{ {
unsigned long flags; unsigned long flags;
......
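drbd_bm_resize() gains a set_new_bits argument: when the bitmap grows, the added area is either filled with 0xff (out of sync, so the new space will be resynced) or with 0x00 (treated as already in sync). A tiny sketch of the two call patterns; example_grow_bitmap and new_capacity are hypothetical, and the flag mapping matches drbd_determin_dev_size() later in this merge.

/* resync_new_area != 0: added bits are set, the new space will be resynced
 * resync_new_area == 0: added bits stay clear (the DDSF_NO_RESYNC case) */
static int example_grow_bitmap(struct drbd_conf *mdev, sector_t new_capacity,
			       int resync_new_area)
{
	return drbd_bm_resize(mdev, new_capacity, resync_new_area ? 1 : 0);
}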
This diff is collapsed.
...@@ -684,6 +684,9 @@ static int is_valid_state(struct drbd_conf *mdev, union drbd_state ns) ...@@ -684,6 +684,9 @@ static int is_valid_state(struct drbd_conf *mdev, union drbd_state ns)
else if (ns.conn > C_CONNECTED && ns.pdsk < D_INCONSISTENT) else if (ns.conn > C_CONNECTED && ns.pdsk < D_INCONSISTENT)
rv = SS_NO_REMOTE_DISK; rv = SS_NO_REMOTE_DISK;
else if (ns.conn > C_CONNECTED && ns.disk < D_UP_TO_DATE && ns.pdsk < D_UP_TO_DATE)
rv = SS_NO_UP_TO_DATE_DISK;
else if ((ns.conn == C_CONNECTED || else if ((ns.conn == C_CONNECTED ||
ns.conn == C_WF_BITMAP_S || ns.conn == C_WF_BITMAP_S ||
ns.conn == C_SYNC_SOURCE || ns.conn == C_SYNC_SOURCE ||
...@@ -840,7 +843,12 @@ static union drbd_state sanitize_state(struct drbd_conf *mdev, union drbd_state ...@@ -840,7 +843,12 @@ static union drbd_state sanitize_state(struct drbd_conf *mdev, union drbd_state
break; break;
case C_WF_BITMAP_S: case C_WF_BITMAP_S:
case C_PAUSED_SYNC_S: case C_PAUSED_SYNC_S:
ns.pdsk = D_OUTDATED; /* remap any consistent state to D_OUTDATED,
* but disallow "upgrade" of not even consistent states.
*/
ns.pdsk =
(D_DISKLESS < os.pdsk && os.pdsk < D_OUTDATED)
? os.pdsk : D_OUTDATED;
break; break;
case C_SYNC_SOURCE: case C_SYNC_SOURCE:
ns.pdsk = D_INCONSISTENT; ns.pdsk = D_INCONSISTENT;
...@@ -1205,21 +1213,20 @@ static void after_state_ch(struct drbd_conf *mdev, union drbd_state os, ...@@ -1205,21 +1213,20 @@ static void after_state_ch(struct drbd_conf *mdev, union drbd_state os,
&& (ns.pdsk < D_INCONSISTENT || && (ns.pdsk < D_INCONSISTENT ||
ns.pdsk == D_UNKNOWN || ns.pdsk == D_UNKNOWN ||
ns.pdsk == D_OUTDATED)) { ns.pdsk == D_OUTDATED)) {
kfree(mdev->p_uuid);
mdev->p_uuid = NULL;
if (get_ldev(mdev)) { if (get_ldev(mdev)) {
if ((ns.role == R_PRIMARY || ns.peer == R_PRIMARY) && if ((ns.role == R_PRIMARY || ns.peer == R_PRIMARY) &&
mdev->ldev->md.uuid[UI_BITMAP] == 0 && ns.disk >= D_UP_TO_DATE) { mdev->ldev->md.uuid[UI_BITMAP] == 0 && ns.disk >= D_UP_TO_DATE &&
drbd_uuid_new_current(mdev); !atomic_read(&mdev->new_c_uuid))
drbd_send_uuids(mdev); atomic_set(&mdev->new_c_uuid, 2);
}
put_ldev(mdev); put_ldev(mdev);
} }
} }
if (ns.pdsk < D_INCONSISTENT && get_ldev(mdev)) { if (ns.pdsk < D_INCONSISTENT && get_ldev(mdev)) {
if (ns.peer == R_PRIMARY && mdev->ldev->md.uuid[UI_BITMAP] == 0) /* Diskless peer becomes primary or got connected to diskless, primary peer. */
drbd_uuid_new_current(mdev); if (ns.peer == R_PRIMARY && mdev->ldev->md.uuid[UI_BITMAP] == 0 &&
!atomic_read(&mdev->new_c_uuid))
atomic_set(&mdev->new_c_uuid, 2);
/* D_DISKLESS Peer becomes secondary */ /* D_DISKLESS Peer becomes secondary */
if (os.peer == R_PRIMARY && ns.peer == R_SECONDARY) if (os.peer == R_PRIMARY && ns.peer == R_SECONDARY)
...@@ -1232,7 +1239,7 @@ static void after_state_ch(struct drbd_conf *mdev, union drbd_state os, ...@@ -1232,7 +1239,7 @@ static void after_state_ch(struct drbd_conf *mdev, union drbd_state os,
os.disk == D_ATTACHING && ns.disk == D_NEGOTIATING) { os.disk == D_ATTACHING && ns.disk == D_NEGOTIATING) {
kfree(mdev->p_uuid); /* We expect to receive up-to-date UUIDs soon. */ kfree(mdev->p_uuid); /* We expect to receive up-to-date UUIDs soon. */
mdev->p_uuid = NULL; /* ...to not use the old ones in the mean time */ mdev->p_uuid = NULL; /* ...to not use the old ones in the mean time */
drbd_send_sizes(mdev, 0); /* to start sync... */ drbd_send_sizes(mdev, 0, 0); /* to start sync... */
drbd_send_uuids(mdev); drbd_send_uuids(mdev);
drbd_send_state(mdev); drbd_send_state(mdev);
} }
...@@ -1343,6 +1350,24 @@ static void after_state_ch(struct drbd_conf *mdev, union drbd_state os, ...@@ -1343,6 +1350,24 @@ static void after_state_ch(struct drbd_conf *mdev, union drbd_state os,
drbd_md_sync(mdev); drbd_md_sync(mdev);
} }
static int w_new_current_uuid(struct drbd_conf *mdev, struct drbd_work *w, int cancel)
{
if (get_ldev(mdev)) {
if (mdev->ldev->md.uuid[UI_BITMAP] == 0) {
drbd_uuid_new_current(mdev);
if (get_net_conf(mdev)) {
drbd_send_uuids(mdev);
put_net_conf(mdev);
}
drbd_md_sync(mdev);
}
put_ldev(mdev);
}
atomic_dec(&mdev->new_c_uuid);
wake_up(&mdev->misc_wait);
return 1;
}
static int drbd_thread_setup(void *arg) static int drbd_thread_setup(void *arg)
{ {
...@@ -1755,7 +1780,7 @@ int drbd_send_sync_uuid(struct drbd_conf *mdev, u64 val) ...@@ -1755,7 +1780,7 @@ int drbd_send_sync_uuid(struct drbd_conf *mdev, u64 val)
(struct p_header *)&p, sizeof(p)); (struct p_header *)&p, sizeof(p));
} }
int drbd_send_sizes(struct drbd_conf *mdev, int trigger_reply) int drbd_send_sizes(struct drbd_conf *mdev, int trigger_reply, enum dds_flags flags)
{ {
struct p_sizes p; struct p_sizes p;
sector_t d_size, u_size; sector_t d_size, u_size;
...@@ -1767,7 +1792,6 @@ int drbd_send_sizes(struct drbd_conf *mdev, int trigger_reply) ...@@ -1767,7 +1792,6 @@ int drbd_send_sizes(struct drbd_conf *mdev, int trigger_reply)
d_size = drbd_get_max_capacity(mdev->ldev); d_size = drbd_get_max_capacity(mdev->ldev);
u_size = mdev->ldev->dc.disk_size; u_size = mdev->ldev->dc.disk_size;
q_order_type = drbd_queue_order_type(mdev); q_order_type = drbd_queue_order_type(mdev);
p.queue_order_type = cpu_to_be32(drbd_queue_order_type(mdev));
put_ldev(mdev); put_ldev(mdev);
} else { } else {
d_size = 0; d_size = 0;
...@@ -1779,7 +1803,8 @@ int drbd_send_sizes(struct drbd_conf *mdev, int trigger_reply) ...@@ -1779,7 +1803,8 @@ int drbd_send_sizes(struct drbd_conf *mdev, int trigger_reply)
p.u_size = cpu_to_be64(u_size); p.u_size = cpu_to_be64(u_size);
p.c_size = cpu_to_be64(trigger_reply ? 0 : drbd_get_capacity(mdev->this_bdev)); p.c_size = cpu_to_be64(trigger_reply ? 0 : drbd_get_capacity(mdev->this_bdev));
p.max_segment_size = cpu_to_be32(queue_max_segment_size(mdev->rq_queue)); p.max_segment_size = cpu_to_be32(queue_max_segment_size(mdev->rq_queue));
p.queue_order_type = cpu_to_be32(q_order_type); p.queue_order_type = cpu_to_be16(q_order_type);
p.dds_flags = cpu_to_be16(flags);
ok = drbd_send_cmd(mdev, USE_DATA_SOCKET, P_SIZES, ok = drbd_send_cmd(mdev, USE_DATA_SOCKET, P_SIZES,
(struct p_header *)&p, sizeof(p)); (struct p_header *)&p, sizeof(p));
...@@ -2180,6 +2205,43 @@ int drbd_send_ov_request(struct drbd_conf *mdev, sector_t sector, int size) ...@@ -2180,6 +2205,43 @@ int drbd_send_ov_request(struct drbd_conf *mdev, sector_t sector, int size)
return ok; return ok;
} }
static int drbd_send_delay_probe(struct drbd_conf *mdev, struct drbd_socket *ds)
{
struct p_delay_probe dp;
int offset, ok = 0;
struct timeval now;
mutex_lock(&ds->mutex);
if (likely(ds->socket)) {
do_gettimeofday(&now);
offset = now.tv_usec - mdev->dps_time.tv_usec +
(now.tv_sec - mdev->dps_time.tv_sec) * 1000000;
dp.seq_num = cpu_to_be32(mdev->delay_seq);
dp.offset = cpu_to_be32(offset);
ok = _drbd_send_cmd(mdev, ds->socket, P_DELAY_PROBE,
(struct p_header *)&dp, sizeof(dp), 0);
}
mutex_unlock(&ds->mutex);
return ok;
}
static int drbd_send_delay_probes(struct drbd_conf *mdev)
{
int ok;
mdev->delay_seq++;
do_gettimeofday(&mdev->dps_time);
ok = drbd_send_delay_probe(mdev, &mdev->meta);
ok = ok && drbd_send_delay_probe(mdev, &mdev->data);
mdev->dp_volume_last = mdev->send_cnt;
mod_timer(&mdev->delay_probe_timer, jiffies + mdev->sync_conf.dp_interval * HZ / 10);
return ok;
}
/* called on sndtimeo /* called on sndtimeo
* returns FALSE if we should retry, * returns FALSE if we should retry,
* TRUE if we think connection is dead * TRUE if we think connection is dead
...@@ -2309,6 +2371,44 @@ static int _drbd_send_zc_bio(struct drbd_conf *mdev, struct bio *bio) ...@@ -2309,6 +2371,44 @@ static int _drbd_send_zc_bio(struct drbd_conf *mdev, struct bio *bio)
return 1; return 1;
} }
static int _drbd_send_zc_ee(struct drbd_conf *mdev, struct drbd_epoch_entry *e)
{
struct page *page = e->pages;
unsigned len = e->size;
page_chain_for_each(page) {
unsigned l = min_t(unsigned, len, PAGE_SIZE);
if (!_drbd_send_page(mdev, page, 0, l))
return 0;
len -= l;
}
return 1;
}
static void consider_delay_probes(struct drbd_conf *mdev)
{
if (mdev->state.conn != C_SYNC_SOURCE || mdev->agreed_pro_version < 93)
return;
if (mdev->dp_volume_last + mdev->sync_conf.dp_volume * 2 < mdev->send_cnt)
drbd_send_delay_probes(mdev);
}
static int w_delay_probes(struct drbd_conf *mdev, struct drbd_work *w, int cancel)
{
if (!cancel && mdev->state.conn == C_SYNC_SOURCE)
drbd_send_delay_probes(mdev);
return 1;
}
static void delay_probe_timer_fn(unsigned long data)
{
struct drbd_conf *mdev = (struct drbd_conf *) data;
if (list_empty(&mdev->delay_probe_work.list))
drbd_queue_work(&mdev->data.work, &mdev->delay_probe_work);
}
/* Used to send write requests /* Used to send write requests
* R_PRIMARY -> Peer (P_DATA) * R_PRIMARY -> Peer (P_DATA)
*/ */
...@@ -2360,7 +2460,7 @@ int drbd_send_dblock(struct drbd_conf *mdev, struct drbd_request *req) ...@@ -2360,7 +2460,7 @@ int drbd_send_dblock(struct drbd_conf *mdev, struct drbd_request *req)
drbd_send(mdev, mdev->data.socket, &p, sizeof(p), MSG_MORE)); drbd_send(mdev, mdev->data.socket, &p, sizeof(p), MSG_MORE));
if (ok && dgs) { if (ok && dgs) {
dgb = mdev->int_dig_out; dgb = mdev->int_dig_out;
drbd_csum(mdev, mdev->integrity_w_tfm, req->master_bio, dgb); drbd_csum_bio(mdev, mdev->integrity_w_tfm, req->master_bio, dgb);
ok = drbd_send(mdev, mdev->data.socket, dgb, dgs, MSG_MORE); ok = drbd_send(mdev, mdev->data.socket, dgb, dgs, MSG_MORE);
} }
if (ok) { if (ok) {
...@@ -2371,6 +2471,10 @@ int drbd_send_dblock(struct drbd_conf *mdev, struct drbd_request *req) ...@@ -2371,6 +2471,10 @@ int drbd_send_dblock(struct drbd_conf *mdev, struct drbd_request *req)
} }
drbd_put_data_sock(mdev); drbd_put_data_sock(mdev);
if (ok)
consider_delay_probes(mdev);
return ok; return ok;
} }
...@@ -2409,13 +2513,17 @@ int drbd_send_block(struct drbd_conf *mdev, enum drbd_packets cmd, ...@@ -2409,13 +2513,17 @@ int drbd_send_block(struct drbd_conf *mdev, enum drbd_packets cmd,
sizeof(p), MSG_MORE); sizeof(p), MSG_MORE);
if (ok && dgs) { if (ok && dgs) {
dgb = mdev->int_dig_out; dgb = mdev->int_dig_out;
drbd_csum(mdev, mdev->integrity_w_tfm, e->private_bio, dgb); drbd_csum_ee(mdev, mdev->integrity_w_tfm, e, dgb);
ok = drbd_send(mdev, mdev->data.socket, dgb, dgs, MSG_MORE); ok = drbd_send(mdev, mdev->data.socket, dgb, dgs, MSG_MORE);
} }
if (ok) if (ok)
ok = _drbd_send_zc_bio(mdev, e->private_bio); ok = _drbd_send_zc_ee(mdev, e);
drbd_put_data_sock(mdev); drbd_put_data_sock(mdev);
if (ok)
consider_delay_probes(mdev);
return ok; return ok;
} }
...@@ -2600,6 +2708,7 @@ void drbd_init_set_defaults(struct drbd_conf *mdev) ...@@ -2600,6 +2708,7 @@ void drbd_init_set_defaults(struct drbd_conf *mdev)
atomic_set(&mdev->net_cnt, 0); atomic_set(&mdev->net_cnt, 0);
atomic_set(&mdev->packet_seq, 0); atomic_set(&mdev->packet_seq, 0);
atomic_set(&mdev->pp_in_use, 0); atomic_set(&mdev->pp_in_use, 0);
atomic_set(&mdev->new_c_uuid, 0);
mutex_init(&mdev->md_io_mutex); mutex_init(&mdev->md_io_mutex);
mutex_init(&mdev->data.mutex); mutex_init(&mdev->data.mutex);
...@@ -2628,16 +2737,26 @@ void drbd_init_set_defaults(struct drbd_conf *mdev) ...@@ -2628,16 +2737,26 @@ void drbd_init_set_defaults(struct drbd_conf *mdev)
INIT_LIST_HEAD(&mdev->unplug_work.list); INIT_LIST_HEAD(&mdev->unplug_work.list);
INIT_LIST_HEAD(&mdev->md_sync_work.list); INIT_LIST_HEAD(&mdev->md_sync_work.list);
INIT_LIST_HEAD(&mdev->bm_io_work.w.list); INIT_LIST_HEAD(&mdev->bm_io_work.w.list);
INIT_LIST_HEAD(&mdev->delay_probes);
INIT_LIST_HEAD(&mdev->delay_probe_work.list);
INIT_LIST_HEAD(&mdev->uuid_work.list);
mdev->resync_work.cb = w_resync_inactive; mdev->resync_work.cb = w_resync_inactive;
mdev->unplug_work.cb = w_send_write_hint; mdev->unplug_work.cb = w_send_write_hint;
mdev->md_sync_work.cb = w_md_sync; mdev->md_sync_work.cb = w_md_sync;
mdev->bm_io_work.w.cb = w_bitmap_io; mdev->bm_io_work.w.cb = w_bitmap_io;
mdev->delay_probe_work.cb = w_delay_probes;
mdev->uuid_work.cb = w_new_current_uuid;
init_timer(&mdev->resync_timer); init_timer(&mdev->resync_timer);
init_timer(&mdev->md_sync_timer); init_timer(&mdev->md_sync_timer);
init_timer(&mdev->delay_probe_timer);
mdev->resync_timer.function = resync_timer_fn; mdev->resync_timer.function = resync_timer_fn;
mdev->resync_timer.data = (unsigned long) mdev; mdev->resync_timer.data = (unsigned long) mdev;
mdev->md_sync_timer.function = md_sync_timer_fn; mdev->md_sync_timer.function = md_sync_timer_fn;
mdev->md_sync_timer.data = (unsigned long) mdev; mdev->md_sync_timer.data = (unsigned long) mdev;
mdev->delay_probe_timer.function = delay_probe_timer_fn;
mdev->delay_probe_timer.data = (unsigned long) mdev;
init_waitqueue_head(&mdev->misc_wait); init_waitqueue_head(&mdev->misc_wait);
init_waitqueue_head(&mdev->state_wait); init_waitqueue_head(&mdev->state_wait);
...@@ -2680,7 +2799,7 @@ void drbd_mdev_cleanup(struct drbd_conf *mdev) ...@@ -2680,7 +2799,7 @@ void drbd_mdev_cleanup(struct drbd_conf *mdev)
drbd_set_my_capacity(mdev, 0); drbd_set_my_capacity(mdev, 0);
if (mdev->bitmap) { if (mdev->bitmap) {
/* maybe never allocated. */ /* maybe never allocated. */
drbd_bm_resize(mdev, 0); drbd_bm_resize(mdev, 0, 1);
drbd_bm_cleanup(mdev); drbd_bm_cleanup(mdev);
} }
...@@ -3129,7 +3248,7 @@ int __init drbd_init(void) ...@@ -3129,7 +3248,7 @@ int __init drbd_init(void)
if (err) if (err)
goto Enomem; goto Enomem;
drbd_proc = proc_create("drbd", S_IFREG | S_IRUGO , NULL, &drbd_proc_fops); drbd_proc = proc_create_data("drbd", S_IFREG | S_IRUGO , NULL, &drbd_proc_fops, NULL);
if (!drbd_proc) { if (!drbd_proc) {
printk(KERN_ERR "drbd: unable to register proc file\n"); printk(KERN_ERR "drbd: unable to register proc file\n");
goto Enomem; goto Enomem;
...@@ -3660,7 +3779,8 @@ _drbd_fault_str(unsigned int type) { ...@@ -3660,7 +3779,8 @@ _drbd_fault_str(unsigned int type) {
[DRBD_FAULT_DT_RD] = "Data read", [DRBD_FAULT_DT_RD] = "Data read",
[DRBD_FAULT_DT_RA] = "Data read ahead", [DRBD_FAULT_DT_RA] = "Data read ahead",
[DRBD_FAULT_BM_ALLOC] = "BM allocation", [DRBD_FAULT_BM_ALLOC] = "BM allocation",
[DRBD_FAULT_AL_EE] = "EE allocation" [DRBD_FAULT_AL_EE] = "EE allocation",
[DRBD_FAULT_RECEIVE] = "receive data corruption",
}; };
return (type < DRBD_FAULT_MAX) ? _faults[type] : "**Unknown**"; return (type < DRBD_FAULT_MAX) ? _faults[type] : "**Unknown**";
......
...@@ -510,7 +510,7 @@ void drbd_resume_io(struct drbd_conf *mdev) ...@@ -510,7 +510,7 @@ void drbd_resume_io(struct drbd_conf *mdev)
* Returns 0 on success, negative return values indicate errors. * Returns 0 on success, negative return values indicate errors.
* You should call drbd_md_sync() after calling this function. * You should call drbd_md_sync() after calling this function.
*/ */
enum determine_dev_size drbd_determin_dev_size(struct drbd_conf *mdev, int force) __must_hold(local) enum determine_dev_size drbd_determin_dev_size(struct drbd_conf *mdev, enum dds_flags flags) __must_hold(local)
{ {
sector_t prev_first_sect, prev_size; /* previous meta location */ sector_t prev_first_sect, prev_size; /* previous meta location */
sector_t la_size; sector_t la_size;
...@@ -541,12 +541,12 @@ enum determine_dev_size drbd_determin_dev_size(struct drbd_conf *mdev, int force ...@@ -541,12 +541,12 @@ enum determine_dev_size drbd_determin_dev_size(struct drbd_conf *mdev, int force
/* TODO: should only be some assert here, not (re)init... */ /* TODO: should only be some assert here, not (re)init... */
drbd_md_set_sector_offsets(mdev, mdev->ldev); drbd_md_set_sector_offsets(mdev, mdev->ldev);
size = drbd_new_dev_size(mdev, mdev->ldev, force); size = drbd_new_dev_size(mdev, mdev->ldev, flags & DDSF_FORCED);
if (drbd_get_capacity(mdev->this_bdev) != size || if (drbd_get_capacity(mdev->this_bdev) != size ||
drbd_bm_capacity(mdev) != size) { drbd_bm_capacity(mdev) != size) {
int err; int err;
err = drbd_bm_resize(mdev, size); err = drbd_bm_resize(mdev, size, !(flags & DDSF_NO_RESYNC));
if (unlikely(err)) { if (unlikely(err)) {
/* currently there is only one error: ENOMEM! */ /* currently there is only one error: ENOMEM! */
size = drbd_bm_capacity(mdev)>>1; size = drbd_bm_capacity(mdev)>>1;
...@@ -704,9 +704,6 @@ void drbd_setup_queue_param(struct drbd_conf *mdev, unsigned int max_seg_s) __mu ...@@ -704,9 +704,6 @@ void drbd_setup_queue_param(struct drbd_conf *mdev, unsigned int max_seg_s) __mu
struct request_queue * const b = mdev->ldev->backing_bdev->bd_disk->queue; struct request_queue * const b = mdev->ldev->backing_bdev->bd_disk->queue;
int max_segments = mdev->ldev->dc.max_bio_bvecs; int max_segments = mdev->ldev->dc.max_bio_bvecs;
if (b->merge_bvec_fn && !mdev->ldev->dc.use_bmbv)
max_seg_s = PAGE_SIZE;
max_seg_s = min(queue_max_sectors(b) * queue_logical_block_size(b), max_seg_s); max_seg_s = min(queue_max_sectors(b) * queue_logical_block_size(b), max_seg_s);
blk_queue_max_hw_sectors(q, max_seg_s >> 9); blk_queue_max_hw_sectors(q, max_seg_s >> 9);
...@@ -1199,13 +1196,12 @@ static int drbd_nl_net_conf(struct drbd_conf *mdev, struct drbd_nl_cfg_req *nlp, ...@@ -1199,13 +1196,12 @@ static int drbd_nl_net_conf(struct drbd_conf *mdev, struct drbd_nl_cfg_req *nlp,
} }
/* allocation not in the IO path, cqueue thread context */ /* allocation not in the IO path, cqueue thread context */
new_conf = kmalloc(sizeof(struct net_conf), GFP_KERNEL); new_conf = kzalloc(sizeof(struct net_conf), GFP_KERNEL);
if (!new_conf) { if (!new_conf) {
retcode = ERR_NOMEM; retcode = ERR_NOMEM;
goto fail; goto fail;
} }
memset(new_conf, 0, sizeof(struct net_conf));
new_conf->timeout = DRBD_TIMEOUT_DEF; new_conf->timeout = DRBD_TIMEOUT_DEF;
new_conf->try_connect_int = DRBD_CONNECT_INT_DEF; new_conf->try_connect_int = DRBD_CONNECT_INT_DEF;
new_conf->ping_int = DRBD_PING_INT_DEF; new_conf->ping_int = DRBD_PING_INT_DEF;
...@@ -1477,8 +1473,8 @@ static int drbd_nl_resize(struct drbd_conf *mdev, struct drbd_nl_cfg_req *nlp, ...@@ -1477,8 +1473,8 @@ static int drbd_nl_resize(struct drbd_conf *mdev, struct drbd_nl_cfg_req *nlp,
{ {
struct resize rs; struct resize rs;
int retcode = NO_ERROR; int retcode = NO_ERROR;
int ldsc = 0; /* local disk size changed */
enum determine_dev_size dd; enum determine_dev_size dd;
enum dds_flags ddsf;
memset(&rs, 0, sizeof(struct resize)); memset(&rs, 0, sizeof(struct resize));
if (!resize_from_tags(mdev, nlp->tag_list, &rs)) { if (!resize_from_tags(mdev, nlp->tag_list, &rs)) {
...@@ -1502,13 +1498,17 @@ static int drbd_nl_resize(struct drbd_conf *mdev, struct drbd_nl_cfg_req *nlp, ...@@ -1502,13 +1498,17 @@ static int drbd_nl_resize(struct drbd_conf *mdev, struct drbd_nl_cfg_req *nlp,
goto fail; goto fail;
} }
if (mdev->ldev->known_size != drbd_get_capacity(mdev->ldev->backing_bdev)) { if (rs.no_resync && mdev->agreed_pro_version < 93) {
mdev->ldev->known_size = drbd_get_capacity(mdev->ldev->backing_bdev); retcode = ERR_NEED_APV_93;
ldsc = 1; goto fail;
} }
if (mdev->ldev->known_size != drbd_get_capacity(mdev->ldev->backing_bdev))
mdev->ldev->known_size = drbd_get_capacity(mdev->ldev->backing_bdev);
mdev->ldev->dc.disk_size = (sector_t)rs.resize_size; mdev->ldev->dc.disk_size = (sector_t)rs.resize_size;
dd = drbd_determin_dev_size(mdev, rs.resize_force); ddsf = (rs.resize_force ? DDSF_FORCED : 0) | (rs.no_resync ? DDSF_NO_RESYNC : 0);
dd = drbd_determin_dev_size(mdev, ddsf);
drbd_md_sync(mdev); drbd_md_sync(mdev);
put_ldev(mdev); put_ldev(mdev);
if (dd == dev_size_error) { if (dd == dev_size_error) {
...@@ -1516,12 +1516,12 @@ static int drbd_nl_resize(struct drbd_conf *mdev, struct drbd_nl_cfg_req *nlp, ...@@ -1516,12 +1516,12 @@ static int drbd_nl_resize(struct drbd_conf *mdev, struct drbd_nl_cfg_req *nlp,
goto fail; goto fail;
} }
if (mdev->state.conn == C_CONNECTED && (dd != unchanged || ldsc)) { if (mdev->state.conn == C_CONNECTED) {
if (dd == grew) if (dd == grew)
set_bit(RESIZE_PENDING, &mdev->flags); set_bit(RESIZE_PENDING, &mdev->flags);
drbd_send_uuids(mdev); drbd_send_uuids(mdev);
drbd_send_sizes(mdev, 1); drbd_send_sizes(mdev, 1, ddsf);
} }
fail: fail:
...@@ -1551,6 +1551,10 @@ static int drbd_nl_syncer_conf(struct drbd_conf *mdev, struct drbd_nl_cfg_req *n ...@@ -1551,6 +1551,10 @@ static int drbd_nl_syncer_conf(struct drbd_conf *mdev, struct drbd_nl_cfg_req *n
sc.rate = DRBD_RATE_DEF; sc.rate = DRBD_RATE_DEF;
sc.after = DRBD_AFTER_DEF; sc.after = DRBD_AFTER_DEF;
sc.al_extents = DRBD_AL_EXTENTS_DEF; sc.al_extents = DRBD_AL_EXTENTS_DEF;
sc.dp_volume = DRBD_DP_VOLUME_DEF;
sc.dp_interval = DRBD_DP_INTERVAL_DEF;
sc.throttle_th = DRBD_RS_THROTTLE_TH_DEF;
sc.hold_off_th = DRBD_RS_HOLD_OFF_TH_DEF;
} else } else
memcpy(&sc, &mdev->sync_conf, sizeof(struct syncer_conf)); memcpy(&sc, &mdev->sync_conf, sizeof(struct syncer_conf));
...@@ -2207,9 +2211,9 @@ void drbd_bcast_ee(struct drbd_conf *mdev, ...@@ -2207,9 +2211,9 @@ void drbd_bcast_ee(struct drbd_conf *mdev,
{ {
struct cn_msg *cn_reply; struct cn_msg *cn_reply;
struct drbd_nl_cfg_reply *reply; struct drbd_nl_cfg_reply *reply;
struct bio_vec *bvec;
unsigned short *tl; unsigned short *tl;
int i; struct page *page;
unsigned len;
if (!e) if (!e)
return; return;
...@@ -2247,11 +2251,15 @@ void drbd_bcast_ee(struct drbd_conf *mdev, ...@@ -2247,11 +2251,15 @@ void drbd_bcast_ee(struct drbd_conf *mdev,
put_unaligned(T_ee_data, tl++); put_unaligned(T_ee_data, tl++);
put_unaligned(e->size, tl++); put_unaligned(e->size, tl++);
__bio_for_each_segment(bvec, e->private_bio, i, 0) { len = e->size;
void *d = kmap(bvec->bv_page); page = e->pages;
memcpy(tl, d + bvec->bv_offset, bvec->bv_len); page_chain_for_each(page) {
kunmap(bvec->bv_page); void *d = kmap_atomic(page, KM_USER0);
tl=(unsigned short*)((char*)tl + bvec->bv_len); unsigned l = min_t(unsigned, len, PAGE_SIZE);
memcpy(tl, d, l);
kunmap_atomic(d, KM_USER0);
tl = (unsigned short*)((char*)tl + l);
len -= l;
} }
put_unaligned(TT_END, tl++); /* Close the tag list */ put_unaligned(TT_END, tl++); /* Close the tag list */
......
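drbd_bcast_ee() above now walks the epoch entry's page chain with page_chain_for_each() and kmap_atomic() instead of iterating a private bio. The same pattern, distilled into a standalone sketch for readability; copy_ee_payload is a hypothetical helper, and page_chain_for_each() is drbd's own macro for walking the linked page chain, exactly as used above.

/* Copy the payload of an epoch entry's page chain into a flat buffer. */
static void copy_ee_payload(char *dst, struct drbd_epoch_entry *e)
{
	struct page *page = e->pages;
	unsigned len = e->size;

	page_chain_for_each(page) {
		unsigned l = min_t(unsigned, len, PAGE_SIZE);
		void *src = kmap_atomic(page, KM_USER0);

		memcpy(dst, src, l);
		kunmap_atomic(src, KM_USER0);
		dst += l;
		len -= l;
	}
}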
...@@ -73,14 +73,21 @@ static void drbd_syncer_progress(struct drbd_conf *mdev, struct seq_file *seq) ...@@ -73,14 +73,21 @@ static void drbd_syncer_progress(struct drbd_conf *mdev, struct seq_file *seq)
seq_printf(seq, "sync'ed:%3u.%u%% ", res / 10, res % 10); seq_printf(seq, "sync'ed:%3u.%u%% ", res / 10, res % 10);
/* if more than 1 GB display in MB */ /* if more than 1 GB display in MB */
if (mdev->rs_total > 0x100000L) if (mdev->rs_total > 0x100000L)
seq_printf(seq, "(%lu/%lu)M\n\t", seq_printf(seq, "(%lu/%lu)M",
(unsigned long) Bit2KB(rs_left >> 10), (unsigned long) Bit2KB(rs_left >> 10),
(unsigned long) Bit2KB(mdev->rs_total >> 10)); (unsigned long) Bit2KB(mdev->rs_total >> 10));
else else
seq_printf(seq, "(%lu/%lu)K\n\t", seq_printf(seq, "(%lu/%lu)K",
(unsigned long) Bit2KB(rs_left), (unsigned long) Bit2KB(rs_left),
(unsigned long) Bit2KB(mdev->rs_total)); (unsigned long) Bit2KB(mdev->rs_total));
if (mdev->state.conn == C_SYNC_TARGET)
seq_printf(seq, " queue_delay: %d.%d ms\n\t",
mdev->data_delay / 1000,
(mdev->data_delay % 1000) / 100);
else if (mdev->state.conn == C_SYNC_SOURCE)
seq_printf(seq, " delay_probe: %u\n\t", mdev->delay_seq);
/* see drivers/md/md.c /* see drivers/md/md.c
* We do not want to overflow, so the order of operands and * We do not want to overflow, so the order of operands and
* the * 100 / 100 trick are important. We do a +1 to be * the * 100 / 100 trick are important. We do a +1 to be
...@@ -128,6 +135,14 @@ static void drbd_syncer_progress(struct drbd_conf *mdev, struct seq_file *seq) ...@@ -128,6 +135,14 @@ static void drbd_syncer_progress(struct drbd_conf *mdev, struct seq_file *seq)
else else
seq_printf(seq, " (%ld)", dbdt); seq_printf(seq, " (%ld)", dbdt);
if (mdev->state.conn == C_SYNC_TARGET) {
if (mdev->c_sync_rate > 1000)
seq_printf(seq, " want: %d,%03d",
mdev->c_sync_rate / 1000, mdev->c_sync_rate % 1000);
else
seq_printf(seq, " want: %d", mdev->c_sync_rate);
}
seq_printf(seq, " K/sec\n"); seq_printf(seq, " K/sec\n");
} }
......
This diff is collapsed.
...@@ -722,6 +722,7 @@ static int drbd_make_request_common(struct drbd_conf *mdev, struct bio *bio) ...@@ -722,6 +722,7 @@ static int drbd_make_request_common(struct drbd_conf *mdev, struct bio *bio)
struct drbd_request *req; struct drbd_request *req;
int local, remote; int local, remote;
int err = -EIO; int err = -EIO;
int ret = 0;
/* allocate outside of all locks; */ /* allocate outside of all locks; */
req = drbd_req_new(mdev, bio); req = drbd_req_new(mdev, bio);
...@@ -784,7 +785,7 @@ static int drbd_make_request_common(struct drbd_conf *mdev, struct bio *bio) ...@@ -784,7 +785,7 @@ static int drbd_make_request_common(struct drbd_conf *mdev, struct bio *bio)
(mdev->state.pdsk == D_INCONSISTENT && (mdev->state.pdsk == D_INCONSISTENT &&
mdev->state.conn >= C_CONNECTED)); mdev->state.conn >= C_CONNECTED));
if (!(local || remote)) { if (!(local || remote) && !mdev->state.susp) {
dev_err(DEV, "IO ERROR: neither local nor remote disk\n"); dev_err(DEV, "IO ERROR: neither local nor remote disk\n");
goto fail_free_complete; goto fail_free_complete;
} }
...@@ -810,6 +811,16 @@ static int drbd_make_request_common(struct drbd_conf *mdev, struct bio *bio) ...@@ -810,6 +811,16 @@ static int drbd_make_request_common(struct drbd_conf *mdev, struct bio *bio)
/* GOOD, everything prepared, grab the spin_lock */ /* GOOD, everything prepared, grab the spin_lock */
spin_lock_irq(&mdev->req_lock); spin_lock_irq(&mdev->req_lock);
if (mdev->state.susp) {
/* If we got suspended, use the retry mechanism of
generic_make_request() to restart processing of this
bio. In the next call to drbd_make_request_26
we sleep in inc_ap_bio() */
ret = 1;
spin_unlock_irq(&mdev->req_lock);
goto fail_free_complete;
}
if (remote) { if (remote) {
remote = (mdev->state.pdsk == D_UP_TO_DATE || remote = (mdev->state.pdsk == D_UP_TO_DATE ||
(mdev->state.pdsk == D_INCONSISTENT && (mdev->state.pdsk == D_INCONSISTENT &&
...@@ -947,12 +958,14 @@ static int drbd_make_request_common(struct drbd_conf *mdev, struct bio *bio) ...@@ -947,12 +958,14 @@ static int drbd_make_request_common(struct drbd_conf *mdev, struct bio *bio)
req->private_bio = NULL; req->private_bio = NULL;
put_ldev(mdev); put_ldev(mdev);
} }
bio_endio(bio, err); if (!ret)
bio_endio(bio, err);
drbd_req_free(req); drbd_req_free(req);
dec_ap_bio(mdev); dec_ap_bio(mdev);
kfree(b); kfree(b);
return 0; return ret;
} }
/* helper function for drbd_make_request /* helper function for drbd_make_request
...@@ -962,11 +975,6 @@ static int drbd_make_request_common(struct drbd_conf *mdev, struct bio *bio) ...@@ -962,11 +975,6 @@ static int drbd_make_request_common(struct drbd_conf *mdev, struct bio *bio)
*/ */
static int drbd_fail_request_early(struct drbd_conf *mdev, int is_write) static int drbd_fail_request_early(struct drbd_conf *mdev, int is_write)
{ {
/* Unconfigured */
if (mdev->state.conn == C_DISCONNECTING &&
mdev->state.disk == D_DISKLESS)
return 1;
if (mdev->state.role != R_PRIMARY && if (mdev->state.role != R_PRIMARY &&
(!allow_oos || is_write)) { (!allow_oos || is_write)) {
if (__ratelimit(&drbd_ratelimit_state)) { if (__ratelimit(&drbd_ratelimit_state)) {
...@@ -1070,15 +1078,21 @@ int drbd_make_request_26(struct request_queue *q, struct bio *bio) ...@@ -1070,15 +1078,21 @@ int drbd_make_request_26(struct request_queue *q, struct bio *bio)
/* we need to get a "reference count" (ap_bio_cnt) /* we need to get a "reference count" (ap_bio_cnt)
* to avoid races with the disconnect/reconnect/suspend code. * to avoid races with the disconnect/reconnect/suspend code.
* In case we need to split the bio here, we need to get two references * In case we need to split the bio here, we need to get three references
* atomically, otherwise we might deadlock when trying to submit the * atomically, otherwise we might deadlock when trying to submit the
* second one! */ * second one! */
inc_ap_bio(mdev, 2); inc_ap_bio(mdev, 3);
D_ASSERT(e_enr == s_enr + 1); D_ASSERT(e_enr == s_enr + 1);
drbd_make_request_common(mdev, &bp->bio1); while (drbd_make_request_common(mdev, &bp->bio1))
drbd_make_request_common(mdev, &bp->bio2); inc_ap_bio(mdev, 1);
while (drbd_make_request_common(mdev, &bp->bio2))
inc_ap_bio(mdev, 1);
dec_ap_bio(mdev);
bio_pair_release(bp); bio_pair_release(bp);
} }
return 0; return 0;
...@@ -1115,7 +1129,7 @@ int drbd_merge_bvec(struct request_queue *q, struct bvec_merge_data *bvm, struct ...@@ -1115,7 +1129,7 @@ int drbd_merge_bvec(struct request_queue *q, struct bvec_merge_data *bvm, struct
} else if (limit && get_ldev(mdev)) { } else if (limit && get_ldev(mdev)) {
struct request_queue * const b = struct request_queue * const b =
mdev->ldev->backing_bdev->bd_disk->queue; mdev->ldev->backing_bdev->bd_disk->queue;
if (b->merge_bvec_fn && mdev->ldev->dc.use_bmbv) { if (b->merge_bvec_fn) {
backing_limit = b->merge_bvec_fn(b, bvm, bvec); backing_limit = b->merge_bvec_fn(b, bvm, bvec);
limit = min(limit, backing_limit); limit = min(limit, backing_limit);
} }
......
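The new suspend handling relies on the return-value convention of ->make_request_fn() in this kernel generation: __generic_make_request() resubmits the bio as long as the function returns non-zero, which is the "retry mechanism" the comment above refers to (drbd then sleeps in inc_ap_bio() on the retried call, so it does not busy-loop). A minimal, drbd-independent sketch of that convention; example_make_request and example_suspended are hypothetical.

#include <linux/blkdev.h>
#include <linux/bio.h>

static atomic_t example_suspended = ATOMIC_INIT(0);	/* hypothetical flag */

static int example_make_request(struct request_queue *q, struct bio *bio)
{
	if (atomic_read(&example_suspended))
		return 1;	/* block layer will resubmit this bio */

	bio_endio(bio, 0);	/* a real driver would queue/submit it here */
	return 0;		/* bio consumed */
}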
...@@ -70,7 +70,7 @@ static const char *drbd_disk_s_names[] = { ...@@ -70,7 +70,7 @@ static const char *drbd_disk_s_names[] = {
static const char *drbd_state_sw_errors[] = { static const char *drbd_state_sw_errors[] = {
[-SS_TWO_PRIMARIES] = "Multiple primaries not allowed by config", [-SS_TWO_PRIMARIES] = "Multiple primaries not allowed by config",
[-SS_NO_UP_TO_DATE_DISK] = "Refusing to be Primary without at least one UpToDate disk", [-SS_NO_UP_TO_DATE_DISK] = "Need access to UpToDate data",
[-SS_NO_LOCAL_DISK] = "Can not resync without local disk", [-SS_NO_LOCAL_DISK] = "Can not resync without local disk",
[-SS_NO_REMOTE_DISK] = "Can not resync without remote disk", [-SS_NO_REMOTE_DISK] = "Can not resync without remote disk",
[-SS_CONNECTED_OUTDATES] = "Refusing to be Outdated while Connected", [-SS_CONNECTED_OUTDATES] = "Refusing to be Outdated while Connected",
......
This diff is collapsed.
...@@ -18,23 +18,9 @@ static inline void drbd_set_my_capacity(struct drbd_conf *mdev, ...@@ -18,23 +18,9 @@ static inline void drbd_set_my_capacity(struct drbd_conf *mdev,
#define drbd_bio_uptodate(bio) bio_flagged(bio, BIO_UPTODATE) #define drbd_bio_uptodate(bio) bio_flagged(bio, BIO_UPTODATE)
static inline int drbd_bio_has_active_page(struct bio *bio)
{
struct bio_vec *bvec;
int i;
__bio_for_each_segment(bvec, bio, i, 0) {
if (page_count(bvec->bv_page) > 1)
return 1;
}
return 0;
}
/* bi_end_io handlers */ /* bi_end_io handlers */
extern void drbd_md_io_complete(struct bio *bio, int error); extern void drbd_md_io_complete(struct bio *bio, int error);
extern void drbd_endio_read_sec(struct bio *bio, int error); extern void drbd_endio_sec(struct bio *bio, int error);
extern void drbd_endio_write_sec(struct bio *bio, int error);
extern void drbd_endio_pri(struct bio *bio, int error); extern void drbd_endio_pri(struct bio *bio, int error);
/* /*
......
...@@ -407,32 +407,24 @@ static int ide_disk_get_capacity(ide_drive_t *drive) ...@@ -407,32 +407,24 @@ static int ide_disk_get_capacity(ide_drive_t *drive)
return 0; return 0;
} }
static u64 ide_disk_set_capacity(ide_drive_t *drive, u64 capacity) static void ide_disk_unlock_native_capacity(ide_drive_t *drive)
{ {
u64 set = min(capacity, drive->probed_capacity);
u16 *id = drive->id; u16 *id = drive->id;
int lba48 = ata_id_lba48_enabled(id); int lba48 = ata_id_lba48_enabled(id);
if ((drive->dev_flags & IDE_DFLAG_LBA) == 0 || if ((drive->dev_flags & IDE_DFLAG_LBA) == 0 ||
ata_id_hpa_enabled(id) == 0) ata_id_hpa_enabled(id) == 0)
goto out; return;
/* /*
* according to the spec the SET MAX ADDRESS command shall be * according to the spec the SET MAX ADDRESS command shall be
* immediately preceded by a READ NATIVE MAX ADDRESS command * immediately preceded by a READ NATIVE MAX ADDRESS command
*/ */
capacity = ide_disk_hpa_get_native_capacity(drive, lba48); if (!ide_disk_hpa_get_native_capacity(drive, lba48))
if (capacity == 0) return;
goto out;
if (ide_disk_hpa_set_capacity(drive, drive->probed_capacity, lba48))
set = ide_disk_hpa_set_capacity(drive, set, lba48); drive->dev_flags |= IDE_DFLAG_NOHPA; /* disable HPA on resume */
if (set) {
/* needed for ->resume to disable HPA */
drive->dev_flags |= IDE_DFLAG_NOHPA;
return set;
}
out:
return drive->capacity64;
} }
static void idedisk_prepare_flush(struct request_queue *q, struct request *rq) static void idedisk_prepare_flush(struct request_queue *q, struct request *rq)
...@@ -783,13 +775,13 @@ static int ide_disk_set_doorlock(ide_drive_t *drive, struct gendisk *disk, ...@@ -783,13 +775,13 @@ static int ide_disk_set_doorlock(ide_drive_t *drive, struct gendisk *disk,
} }
const struct ide_disk_ops ide_ata_disk_ops = { const struct ide_disk_ops ide_ata_disk_ops = {
.check = ide_disk_check, .check = ide_disk_check,
.set_capacity = ide_disk_set_capacity, .unlock_native_capacity = ide_disk_unlock_native_capacity,
.get_capacity = ide_disk_get_capacity, .get_capacity = ide_disk_get_capacity,
.setup = ide_disk_setup, .setup = ide_disk_setup,
.flush = ide_disk_flush, .flush = ide_disk_flush,
.init_media = ide_disk_init_media, .init_media = ide_disk_init_media,
.set_doorlock = ide_disk_set_doorlock, .set_doorlock = ide_disk_set_doorlock,
.do_request = ide_do_rw_disk, .do_request = ide_do_rw_disk,
.ioctl = ide_disk_ioctl, .ioctl = ide_disk_ioctl,
}; };
...@@ -288,17 +288,14 @@ static int ide_gd_media_changed(struct gendisk *disk) ...@@ -288,17 +288,14 @@ static int ide_gd_media_changed(struct gendisk *disk)
return ret; return ret;
} }
static unsigned long long ide_gd_set_capacity(struct gendisk *disk, static void ide_gd_unlock_native_capacity(struct gendisk *disk)
unsigned long long capacity)
{ {
struct ide_disk_obj *idkp = ide_drv_g(disk, ide_disk_obj); struct ide_disk_obj *idkp = ide_drv_g(disk, ide_disk_obj);
ide_drive_t *drive = idkp->drive; ide_drive_t *drive = idkp->drive;
const struct ide_disk_ops *disk_ops = drive->disk_ops; const struct ide_disk_ops *disk_ops = drive->disk_ops;
if (disk_ops->set_capacity) if (disk_ops->unlock_native_capacity)
return disk_ops->set_capacity(drive, capacity); disk_ops->unlock_native_capacity(drive);
return drive->capacity64;
} }
static int ide_gd_revalidate_disk(struct gendisk *disk) static int ide_gd_revalidate_disk(struct gendisk *disk)
...@@ -329,7 +326,7 @@ static const struct block_device_operations ide_gd_ops = { ...@@ -329,7 +326,7 @@ static const struct block_device_operations ide_gd_ops = {
.locked_ioctl = ide_gd_ioctl, .locked_ioctl = ide_gd_ioctl,
.getgeo = ide_gd_getgeo, .getgeo = ide_gd_getgeo,
.media_changed = ide_gd_media_changed, .media_changed = ide_gd_media_changed,
.set_capacity = ide_gd_set_capacity, .unlock_native_capacity = ide_gd_unlock_native_capacity,
.revalidate_disk = ide_gd_revalidate_disk .revalidate_disk = ide_gd_revalidate_disk
}; };
......
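->set_capacity() becomes ->unlock_native_capacity() in both struct ide_disk_ops and the block_device_operations: the driver only disables the HPA, and the block layer afterwards re-reads the capacity and rescans the partitions. A sketch of how the generic side is expected to invoke the new op; this is an assumption, since the actual call site lives in a collapsed fs/partitions hunk of this merge, and example_unlock_native_capacity is a hypothetical name.

static void example_unlock_native_capacity(struct gendisk *disk)
{
	const struct block_device_operations *bdops = disk->fops;

	if (bdops->unlock_native_capacity)
		bdops->unlock_native_capacity(disk);
	/* the caller then re-reads the (possibly larger) capacity
	 * and restarts the partition scan */
}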
...@@ -417,7 +417,7 @@ int blkdev_fsync(struct file *filp, struct dentry *dentry, int datasync) ...@@ -417,7 +417,7 @@ int blkdev_fsync(struct file *filp, struct dentry *dentry, int datasync)
*/ */
mutex_unlock(&bd_inode->i_mutex); mutex_unlock(&bd_inode->i_mutex);
error = blkdev_issue_flush(bdev, NULL); error = blkdev_issue_flush(bdev, GFP_KERNEL, NULL, BLKDEV_IFL_WAIT);
if (error == -EOPNOTSUPP) if (error == -EOPNOTSUPP)
error = 0; error = 0;
...@@ -668,41 +668,209 @@ void bd_forget(struct inode *inode) ...@@ -668,41 +668,209 @@ void bd_forget(struct inode *inode)
iput(bdev->bd_inode); iput(bdev->bd_inode);
} }
int bd_claim(struct block_device *bdev, void *holder) /**
* bd_may_claim - test whether a block device can be claimed
* @bdev: block device of interest
* @whole: whole block device containing @bdev, may equal @bdev
* @holder: holder trying to claim @bdev
*
* Test whether @bdev can be claimed by @holder.
*
* CONTEXT:
* spin_lock(&bdev_lock).
*
* RETURNS:
* %true if @bdev can be claimed, %false otherwise.
*/
static bool bd_may_claim(struct block_device *bdev, struct block_device *whole,
void *holder)
{ {
int res;
spin_lock(&bdev_lock);
/* first decide result */
if (bdev->bd_holder == holder) if (bdev->bd_holder == holder)
res = 0; /* already a holder */ return true; /* already a holder */
else if (bdev->bd_holder != NULL) else if (bdev->bd_holder != NULL)
res = -EBUSY; /* held by someone else */ return false; /* held by someone else */
else if (bdev->bd_contains == bdev) else if (bdev->bd_contains == bdev)
res = 0; /* is a whole device which isn't held */ return true; /* is a whole device which isn't held */
else if (bdev->bd_contains->bd_holder == bd_claim) else if (whole->bd_holder == bd_claim)
res = 0; /* is a partition of a device that is being partitioned */ return true; /* is a partition of a device that is being partitioned */
else if (bdev->bd_contains->bd_holder != NULL) else if (whole->bd_holder != NULL)
res = -EBUSY; /* is a partition of a held device */ return false; /* is a partition of a held device */
else else
res = 0; /* is a partition of an un-held device */ return true; /* is a partition of an un-held device */
}
/**
* bd_prepare_to_claim - prepare to claim a block device
* @bdev: block device of interest
* @whole: the whole device containing @bdev, may equal @bdev
* @holder: holder trying to claim @bdev
*
* Prepare to claim @bdev. This function fails if @bdev is already
* claimed by another holder and waits if another claiming is in
* progress. This function doesn't actually claim. On successful
* return, the caller has ownership of bd_claiming and bd_holder[s].
*
* CONTEXT:
* spin_lock(&bdev_lock). Might release bdev_lock, sleep and regrab
* it multiple times.
*
* RETURNS:
* 0 if @bdev can be claimed, -EBUSY otherwise.
*/
static int bd_prepare_to_claim(struct block_device *bdev,
struct block_device *whole, void *holder)
{
retry:
/* if someone else claimed, fail */
if (!bd_may_claim(bdev, whole, holder))
return -EBUSY;
/* if someone else is claiming, wait for it to finish */
if (whole->bd_claiming && whole->bd_claiming != holder) {
wait_queue_head_t *wq = bit_waitqueue(&whole->bd_claiming, 0);
DEFINE_WAIT(wait);
prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
spin_unlock(&bdev_lock);
schedule();
finish_wait(wq, &wait);
spin_lock(&bdev_lock);
goto retry;
}
/* yay, all mine */
return 0;
}
/**
* bd_start_claiming - start claiming a block device
* @bdev: block device of interest
* @holder: holder trying to claim @bdev
*
* @bdev is about to be opened exclusively. Check @bdev can be opened
* exclusively and mark that an exclusive open is in progress. Each
* successful call to this function must be matched with a call to
* either bd_claim() or bd_abort_claiming(). If this function
* succeeds, the matching bd_claim() is guaranteed to succeed.
*
* CONTEXT:
* Might sleep.
*
* RETURNS:
* Pointer to the block device containing @bdev on success, ERR_PTR()
* value on failure.
*/
static struct block_device *bd_start_claiming(struct block_device *bdev,
void *holder)
{
struct gendisk *disk;
struct block_device *whole;
int partno, err;
might_sleep();
/*
* @bdev might not have been initialized properly yet, look up
* and grab the outer block device the hard way.
*/
disk = get_gendisk(bdev->bd_dev, &partno);
if (!disk)
return ERR_PTR(-ENXIO);
whole = bdget_disk(disk, 0);
put_disk(disk);
if (!whole)
return ERR_PTR(-ENOMEM);
/* prepare to claim, if successful, mark claiming in progress */
spin_lock(&bdev_lock);
err = bd_prepare_to_claim(bdev, whole, holder);
if (err == 0) {
whole->bd_claiming = holder;
spin_unlock(&bdev_lock);
return whole;
} else {
spin_unlock(&bdev_lock);
bdput(whole);
return ERR_PTR(err);
}
}
/* now impose change */ /* releases bdev_lock */
if (res==0) { static void __bd_abort_claiming(struct block_device *whole, void *holder)
{
BUG_ON(whole->bd_claiming != holder);
whole->bd_claiming = NULL;
wake_up_bit(&whole->bd_claiming, 0);
spin_unlock(&bdev_lock);
bdput(whole);
}
/**
* bd_abort_claiming - abort claiming a block device
* @whole: whole block device returned by bd_start_claiming()
* @holder: holder trying to claim @bdev
*
* Abort a claiming block started by bd_start_claiming(). Note that
* @whole is not the block device to be claimed but the whole device
* returned by bd_start_claiming().
*
* CONTEXT:
* Grabs and releases bdev_lock.
*/
static void bd_abort_claiming(struct block_device *whole, void *holder)
{
spin_lock(&bdev_lock);
__bd_abort_claiming(whole, holder); /* releases bdev_lock */
}
/**
* bd_claim - claim a block device
* @bdev: block device to claim
* @holder: holder trying to claim @bdev
*
* Try to claim @bdev which must have been opened successfully. This
* function may be called with or without preceding
* bd_start_claiming(). In the former case, this function is always
* successful and terminates the claiming block.
*
* CONTEXT:
* Might sleep.
*
* RETURNS:
* 0 if successful, -EBUSY if @bdev is already claimed.
*/
int bd_claim(struct block_device *bdev, void *holder)
{
struct block_device *whole = bdev->bd_contains;
int res;
might_sleep();
spin_lock(&bdev_lock);
res = bd_prepare_to_claim(bdev, whole, holder);
if (res == 0) {
/* note that for a whole device bd_holders /* note that for a whole device bd_holders
* will be incremented twice, and bd_holder will * will be incremented twice, and bd_holder will
* be set to bd_claim before being set to holder * be set to bd_claim before being set to holder
*/ */
bdev->bd_contains->bd_holders ++; whole->bd_holders++;
bdev->bd_contains->bd_holder = bd_claim; whole->bd_holder = bd_claim;
bdev->bd_holders++; bdev->bd_holders++;
bdev->bd_holder = holder; bdev->bd_holder = holder;
} }
spin_unlock(&bdev_lock);
if (whole->bd_claiming)
__bd_abort_claiming(whole, holder); /* releases bdev_lock */
else
spin_unlock(&bdev_lock);
return res; return res;
} }
EXPORT_SYMBOL(bd_claim); EXPORT_SYMBOL(bd_claim);
void bd_release(struct block_device *bdev) void bd_release(struct block_device *bdev)
...@@ -1316,6 +1484,7 @@ EXPORT_SYMBOL(blkdev_get); ...@@ -1316,6 +1484,7 @@ EXPORT_SYMBOL(blkdev_get);
static int blkdev_open(struct inode * inode, struct file * filp) static int blkdev_open(struct inode * inode, struct file * filp)
{ {
struct block_device *whole = NULL;
struct block_device *bdev; struct block_device *bdev;
int res; int res;
...@@ -1338,22 +1507,25 @@ static int blkdev_open(struct inode * inode, struct file * filp) ...@@ -1338,22 +1507,25 @@ static int blkdev_open(struct inode * inode, struct file * filp)
if (bdev == NULL) if (bdev == NULL)
return -ENOMEM; return -ENOMEM;
if (filp->f_mode & FMODE_EXCL) {
whole = bd_start_claiming(bdev, filp);
if (IS_ERR(whole)) {
bdput(bdev);
return PTR_ERR(whole);
}
}
filp->f_mapping = bdev->bd_inode->i_mapping; filp->f_mapping = bdev->bd_inode->i_mapping;
res = blkdev_get(bdev, filp->f_mode); res = blkdev_get(bdev, filp->f_mode);
if (res)
return res;
if (filp->f_mode & FMODE_EXCL) { if (whole) {
res = bd_claim(bdev, filp); if (res == 0)
if (res) BUG_ON(bd_claim(bdev, filp) != 0);
goto out_blkdev_put; else
bd_abort_claiming(whole, filp);
} }
return 0;
out_blkdev_put:
blkdev_put(bdev, filp->f_mode);
return res; return res;
} }
...@@ -1564,27 +1736,34 @@ EXPORT_SYMBOL(lookup_bdev); ...@@ -1564,27 +1736,34 @@ EXPORT_SYMBOL(lookup_bdev);
*/ */
struct block_device *open_bdev_exclusive(const char *path, fmode_t mode, void *holder) struct block_device *open_bdev_exclusive(const char *path, fmode_t mode, void *holder)
{ {
struct block_device *bdev; struct block_device *bdev, *whole;
int error = 0; int error;
bdev = lookup_bdev(path); bdev = lookup_bdev(path);
if (IS_ERR(bdev)) if (IS_ERR(bdev))
return bdev; return bdev;
whole = bd_start_claiming(bdev, holder);
if (IS_ERR(whole)) {
bdput(bdev);
return whole;
}
error = blkdev_get(bdev, mode); error = blkdev_get(bdev, mode);
if (error) if (error)
return ERR_PTR(error); goto out_abort_claiming;
error = -EACCES; error = -EACCES;
if ((mode & FMODE_WRITE) && bdev_read_only(bdev)) if ((mode & FMODE_WRITE) && bdev_read_only(bdev))
goto blkdev_put; goto out_blkdev_put;
error = bd_claim(bdev, holder);
if (error)
goto blkdev_put;
BUG_ON(bd_claim(bdev, holder) != 0);
return bdev; return bdev;
blkdev_put: out_blkdev_put:
blkdev_put(bdev, mode); blkdev_put(bdev, mode);
out_abort_claiming:
bd_abort_claiming(whole, holder);
return ERR_PTR(error); return ERR_PTR(error);
} }
......
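The claiming rework splits an exclusive open into bd_start_claiming() (reserve the whole device, may sleep or fail), blkdev_get(), and then either bd_claim(), which is guaranteed to succeed after a successful bd_start_claiming(), or bd_abort_claiming() on failure. A sketch of the protocol as blkdev_open() and open_bdev_exclusive() above use it; illustrative only, since bd_start_claiming()/bd_abort_claiming() are static to fs/block_dev.c and outside callers go through open_bdev_exclusive().

static struct block_device *example_open_excl(struct block_device *bdev,
					      fmode_t mode, void *holder)
{
	struct block_device *whole;
	int err;

	whole = bd_start_claiming(bdev, holder);	/* may sleep, may fail */
	if (IS_ERR(whole))
		return whole;

	err = blkdev_get(bdev, mode);	/* drops the bdev ref itself on error */
	if (err) {
		bd_abort_claiming(whole, holder);	/* wakes up waiters */
		return ERR_PTR(err);
	}

	/* cannot fail after a successful bd_start_claiming() */
	BUG_ON(bd_claim(bdev, holder) != 0);
	return bdev;
}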
...@@ -1589,7 +1589,7 @@ static void btrfs_issue_discard(struct block_device *bdev, ...@@ -1589,7 +1589,7 @@ static void btrfs_issue_discard(struct block_device *bdev,
u64 start, u64 len) u64 start, u64 len)
{ {
blkdev_issue_discard(bdev, start >> 9, len >> 9, GFP_KERNEL, blkdev_issue_discard(bdev, start >> 9, len >> 9, GFP_KERNEL,
DISCARD_FL_BARRIER); BLKDEV_IFL_WAIT | BLKDEV_IFL_BARRIER);
} }
static int btrfs_discard_extent(struct btrfs_root *root, u64 bytenr, static int btrfs_discard_extent(struct btrfs_root *root, u64 bytenr,
......
...@@ -275,6 +275,7 @@ void invalidate_bdev(struct block_device *bdev) ...@@ -275,6 +275,7 @@ void invalidate_bdev(struct block_device *bdev)
return; return;
invalidate_bh_lrus(); invalidate_bh_lrus();
lru_add_drain_all(); /* make sure all lru add caches are flushed */
invalidate_mapping_pages(mapping, 0, -1); invalidate_mapping_pages(mapping, 0, -1);
} }
EXPORT_SYMBOL(invalidate_bdev); EXPORT_SYMBOL(invalidate_bdev);
......
...@@ -90,6 +90,7 @@ int ext3_sync_file(struct file * file, struct dentry *dentry, int datasync) ...@@ -90,6 +90,7 @@ int ext3_sync_file(struct file * file, struct dentry *dentry, int datasync)
* storage * storage
*/ */
if (needs_barrier) if (needs_barrier)
blkdev_issue_flush(inode->i_sb->s_bdev, NULL); blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL,
BLKDEV_IFL_WAIT);
return ret; return ret;
} }
...@@ -100,9 +100,11 @@ int ext4_sync_file(struct file *file, struct dentry *dentry, int datasync) ...@@ -100,9 +100,11 @@ int ext4_sync_file(struct file *file, struct dentry *dentry, int datasync)
if (ext4_should_writeback_data(inode) && if (ext4_should_writeback_data(inode) &&
(journal->j_fs_dev != journal->j_dev) && (journal->j_fs_dev != journal->j_dev) &&
(journal->j_flags & JBD2_BARRIER)) (journal->j_flags & JBD2_BARRIER))
blkdev_issue_flush(inode->i_sb->s_bdev, NULL); blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL,
NULL, BLKDEV_IFL_WAIT);
jbd2_log_wait_commit(journal, commit_tid); jbd2_log_wait_commit(journal, commit_tid);
} else if (journal->j_flags & JBD2_BARRIER) } else if (journal->j_flags & JBD2_BARRIER)
blkdev_issue_flush(inode->i_sb->s_bdev, NULL); blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL,
BLKDEV_IFL_WAIT);
return ret; return ret;
} }
...@@ -14,6 +14,7 @@ ...@@ -14,6 +14,7 @@
#include <linux/dnotify.h> #include <linux/dnotify.h>
#include <linux/slab.h> #include <linux/slab.h>
#include <linux/module.h> #include <linux/module.h>
#include <linux/pipe_fs_i.h>
#include <linux/security.h> #include <linux/security.h>
#include <linux/ptrace.h> #include <linux/ptrace.h>
#include <linux/signal.h> #include <linux/signal.h>
...@@ -412,6 +413,10 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg, ...@@ -412,6 +413,10 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
case F_NOTIFY: case F_NOTIFY:
err = fcntl_dirnotify(fd, filp, arg); err = fcntl_dirnotify(fd, filp, arg);
break; break;
case F_SETPIPE_SZ:
case F_GETPIPE_SZ:
err = pipe_fcntl(filp, cmd, arg);
break;
default: default:
break; break;
} }
......
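F_SETPIPE_SZ and F_GETPIPE_SZ are routed to the new pipe_fcntl(), letting user space grow or shrink a pipe's buffer within the limits added elsewhere in this series. A user-space sketch of the new calls; the fallback constants are taken from linux/fcntl.h in case the libc headers do not expose them yet.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#ifndef F_SETPIPE_SZ
#define F_SETPIPE_SZ	1031	/* F_LINUX_SPECIFIC_BASE + 7 */
#define F_GETPIPE_SZ	1032	/* F_LINUX_SPECIFIC_BASE + 8 */
#endif

int main(void)
{
	int fds[2];

	if (pipe(fds))
		return 1;

	if (fcntl(fds[1], F_SETPIPE_SZ, 1 << 20) < 0)	/* request ~1 MiB */
		perror("F_SETPIPE_SZ");

	printf("pipe buffer is now %d bytes\n", fcntl(fds[1], F_GETPIPE_SZ));
	return 0;
}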
This diff is collapsed.
...@@ -854,7 +854,8 @@ static void gfs2_rgrp_send_discards(struct gfs2_sbd *sdp, u64 offset, ...@@ -854,7 +854,8 @@ static void gfs2_rgrp_send_discards(struct gfs2_sbd *sdp, u64 offset,
if ((start + nr_sects) != blk) { if ((start + nr_sects) != blk) {
rv = blkdev_issue_discard(bdev, start, rv = blkdev_issue_discard(bdev, start,
nr_sects, GFP_NOFS, nr_sects, GFP_NOFS,
DISCARD_FL_BARRIER); BLKDEV_IFL_WAIT |
BLKDEV_IFL_BARRIER);
if (rv) if (rv)
goto fail; goto fail;
nr_sects = 0; nr_sects = 0;
...@@ -869,7 +870,7 @@ static void gfs2_rgrp_send_discards(struct gfs2_sbd *sdp, u64 offset, ...@@ -869,7 +870,7 @@ static void gfs2_rgrp_send_discards(struct gfs2_sbd *sdp, u64 offset,
} }
if (nr_sects) { if (nr_sects) {
rv = blkdev_issue_discard(bdev, start, nr_sects, GFP_NOFS, rv = blkdev_issue_discard(bdev, start, nr_sects, GFP_NOFS,
DISCARD_FL_BARRIER); BLKDEV_IFL_WAIT | BLKDEV_IFL_BARRIER);
if (rv) if (rv)
goto fail; goto fail;
} }
......
This diff is collapsed.
...@@ -2,5 +2,5 @@ ...@@ -2,5 +2,5 @@
* fs/partitions/amiga.h * fs/partitions/amiga.h
*/ */
int amiga_partition(struct parsed_partitions *state, struct block_device *bdev); int amiga_partition(struct parsed_partitions *state);
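As the amiga prototype above shows, partition detectors now receive only struct parsed_partitions *state; the block device and the result slots are reached through it. A hedged skeleton of a detector under the new convention; check_example_partition is hypothetical, and read_part_sector() is assumed to be the helper this series introduces for reading a sector of state->bdev.

int check_example_partition(struct parsed_partitions *state)
{
	Sector sect;
	unsigned char *data;

	data = read_part_sector(state, 0, &sect);	/* sector 0 of state->bdev */
	if (!data)
		return -1;				/* I/O error */

	if (data[510] == 0x55 && data[511] == 0xaa)	/* toy validity check */
		put_partition(state, 1, 2048, 4096);	/* slot, start, size */

	put_dev_sector(sect);
	return 1;	/* 1: recognized, 0: not this format, <0: I/O error */
}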
This diff is collapsed.