Commit 3ad11d7a authored by Linus Torvalds

Merge tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:

 - Series of merge handling cleanups (Baolin, Christoph)

 - Series of blk-throttle fixes and cleanups (Baolin)

 - Series cleaning up BDI, separating the block device from the
   backing_dev_info (Christoph)

 - Removal of bdget() as a generic API (Christoph)

 - Removal of blkdev_get() as a generic API (Christoph)

 - Cleanup of is-partition checks (Christoph)

 - Series reworking disk revalidation (Christoph)

 - Series cleaning up bio flags (Christoph)

 - bio crypt fixes (Eric)

 - IO stats inflight tweak (Gabriel)

 - blk-mq tags fixes (Hannes)

 - Buffer invalidation fixes (Jan)

 - Allow soft limits for zone append (Johannes)

 - Shared tag set improvements (John, Kashyap)

 - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)

 - DM no-wait support (Mike, Konstantin)

 - Request allocation improvements (Ming)

 - Allow md/dm/bcache to use IO stat helpers (Song)

 - Series improving blk-iocost (Tejun)

 - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
   Xianting, Yang, Yufen, yangerkun)

* tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
  block: fix uapi blkzoned.h comments
  blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
  blk-mq: get rid of the dead flush handle code path
  block: get rid of unnecessary local variable
  block: fix comment and add lockdep assert
  blk-mq: use helper function to test hw stopped
  block: use helper function to test queue register
  block: remove redundant mq check
  block: invoke blk_mq_exit_sched no matter whether have .exit_sched
  percpu_ref: don't refer to ref->data if it isn't allocated
  block: ratelimit handle_bad_sector() message
  blk-throttle: Re-use the throtl_set_slice_end()
  blk-throttle: Open code __throtl_de/enqueue_tg()
  blk-throttle: Move service tree validation out of the throtl_rb_first()
  blk-throttle: Move the list operation after list validation
  blk-throttle: Fix IO hang for a corner case
  blk-throttle: Avoid tracking latency if low limit is invalid
  blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
  blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
  block: Remove redundant 'return' statement
  ...
parents 857d6448 8858e8d9
......@@ -488,9 +488,6 @@ getgeo: no
swap_slot_free_notify: no (see below)
======================= ===================
unlock_native_capacity and revalidate_disk are called only from
check_disk_change().
swap_slot_free_notify is called with swap_lock and sometimes the page lock
held.
......
......@@ -181,7 +181,7 @@ HDIO_SET_UNMASKINTR
error return:
- EINVAL (bdev != bdev->bd_contains) (not sure what this means)
- EINVAL Called on a partition instead of the whole disk device
- EACCES Access denied: requires CAP_SYS_ADMIN
- EINVAL value out of range [0 1]
- EBUSY Controller busy
......@@ -231,7 +231,7 @@ HDIO_SET_MULTCOUNT
error return:
- EINVAL (bdev != bdev->bd_contains) (not sure what this means)
- EINVAL Called on a partition instead of the whole disk device
- EACCES Access denied: requires CAP_SYS_ADMIN
- EINVAL value out of range supported by disk.
- EBUSY Controller busy or blockmode already set.
......@@ -295,7 +295,7 @@ HDIO_GET_IDENTITY
the ATA specification.
error returns:
- EINVAL (bdev != bdev->bd_contains) (not sure what this means)
- EINVAL Called on a partition instead of the whole disk device
- ENOMSG IDENTIFY DEVICE information not available
notes:
......@@ -355,7 +355,7 @@ HDIO_SET_KEEPSETTINGS
error return:
- EINVAL (bdev != bdev->bd_contains) (not sure what this means)
- EINVAL Called on a partition instead of the whole disk device
- EACCES Access denied: requires CAP_SYS_ADMIN
- EINVAL value out of range [0 1]
- EBUSY Controller busy
......@@ -1055,7 +1055,7 @@ HDIO_SET_32BIT
error return:
- EINVAL (bdev != bdev->bd_contains) (not sure what this means)
- EINVAL Called on a partition instead of the whole disk device
- EACCES Access denied: requires CAP_SYS_ADMIN
- EINVAL value out of range [0 3]
- EBUSY Controller busy
......@@ -1085,7 +1085,7 @@ HDIO_SET_NOWERR
error return:
- EINVAL (bdev != bdev->bd_contains) (not sure what this means)
- EINVAL Called on a partition instead of the whole disk device
- EACCES Access denied: requires CAP_SYS_ADMIN
- EINVAL value out of range [0 1]
- EBUSY Controller busy
......@@ -1113,7 +1113,7 @@ HDIO_SET_DMA
error return:
- EINVAL (bdev != bdev->bd_contains) (not sure what this means)
- EINVAL Called on a partition instead of the whole disk device
- EACCES Access denied: requires CAP_SYS_ADMIN
- EINVAL value out of range [0 1]
- EBUSY Controller busy
......@@ -1141,7 +1141,7 @@ HDIO_SET_PIO_MODE
error return:
- EINVAL (bdev != bdev->bd_contains) (not sure what this means)
- EINVAL Called on a partition instead of the whole disk device
- EACCES Access denied: requires CAP_SYS_ADMIN
- EINVAL value out of range [0 255]
- EBUSY Controller busy
......@@ -1237,7 +1237,7 @@ HDIO_SET_WCACHE
error return:
- EINVAL (bdev != bdev->bd_contains) (not sure what this means)
- EINVAL Called on a partition instead of the whole disk device
- EACCES Access denied: requires CAP_SYS_ADMIN
- EINVAL value out of range [0 1]
- EBUSY Controller busy
......@@ -1265,7 +1265,7 @@ HDIO_SET_ACOUSTIC
error return:
- EINVAL (bdev != bdev->bd_contains) (not sure what this means)
- EINVAL Called on a partition instead of the whole disk device
- EACCES Access denied: requires CAP_SYS_ADMIN
- EINVAL value out of range [0 254]
- EBUSY Controller busy
......@@ -1305,7 +1305,7 @@ HDIO_SET_ADDRESS
error return:
- EINVAL (bdev != bdev->bd_contains) (not sure what this means)
- EINVAL Called on a partition instead of the whole disk device
- EACCES Access denied: requires CAP_SYS_ADMIN
- EINVAL value out of range [0 2]
- EBUSY Controller busy
......@@ -1331,7 +1331,7 @@ HDIO_SET_IDE_SCSI
error return:
- EINVAL (bdev != bdev->bd_contains) (not sure what this means)
- EINVAL Called on a partition instead of the whole disk device
- EACCES Access denied: requires CAP_SYS_ADMIN
- EINVAL value out of range [0 1]
- EBUSY Controller busy
......
......@@ -161,8 +161,6 @@ config BLK_WBT_MQ
depends on BLK_WBT
help
Enable writeback throttling by default on multiqueue devices.
Multiqueue currently doesn't have support for IO scheduling,
enabling this option is recommended.
config BLK_DEBUG_FS
bool "Block layer debugging information in debugfs"
......
......@@ -4640,6 +4640,9 @@ static bool bfq_has_work(struct blk_mq_hw_ctx *hctx)
{
struct bfq_data *bfqd = hctx->queue->elevator->elevator_data;
if (!atomic_read(&hctx->elevator_queued))
return false;
/*
* Avoiding lock: a race on bfqd->busy_queues should cause at
* most a call to dispatch for nothing
......@@ -5554,6 +5557,7 @@ static void bfq_insert_requests(struct blk_mq_hw_ctx *hctx,
rq = list_first_entry(list, struct request, queuelist);
list_del_init(&rq->queuelist);
bfq_insert_request(hctx, rq, at_head);
atomic_inc(&hctx->elevator_queued);
}
}
......@@ -5921,6 +5925,7 @@ static void bfq_finish_requeue_request(struct request *rq)
bfq_completed_request(bfqq, bfqd);
bfq_finish_requeue_request_body(bfqq);
atomic_dec(&rq->mq_hctx->elevator_queued);
spin_unlock_irqrestore(&bfqd->lock, flags);
} else {
......@@ -6360,8 +6365,8 @@ static void bfq_depth_updated(struct blk_mq_hw_ctx *hctx)
struct blk_mq_tags *tags = hctx->sched_tags;
unsigned int min_shallow;
min_shallow = bfq_update_depths(bfqd, &tags->bitmap_tags);
sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, min_shallow);
min_shallow = bfq_update_depths(bfqd, tags->bitmap_tags);
sbitmap_queue_min_shallow_depth(tags->bitmap_tags, min_shallow);
}
static int bfq_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int index)
......
......@@ -713,20 +713,18 @@ struct bio *bio_clone_fast(struct bio *bio, gfp_t gfp_mask, struct bio_set *bs)
__bio_clone_fast(b, bio);
bio_crypt_clone(b, bio, gfp_mask);
if (bio_crypt_clone(b, bio, gfp_mask) < 0)
goto err_put;
if (bio_integrity(bio)) {
int ret;
ret = bio_integrity_clone(b, bio, gfp_mask);
if (ret < 0) {
bio_put(b);
return NULL;
}
}
if (bio_integrity(bio) &&
bio_integrity_clone(b, bio, gfp_mask) < 0)
goto err_put;
return b;
err_put:
bio_put(b);
return NULL;
}
EXPORT_SYMBOL(bio_clone_fast);
......
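The bio_clone_fast() hunk above folds two separate failure paths into a single err_put label. A minimal userspace sketch of that single-exit cleanup pattern follows. The names struct buf, buf_alloc(), buf_put(), clone_step() and buf_clone() are hypothetical stand-ins invented for illustration; they are not kernel APIs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted object such as a bio. */
struct buf {
	int refs;
	int crypt_cloned;
	int integrity_cloned;
};

static int live_bufs;	/* allocations that have not been released yet */

static struct buf *buf_alloc(void)
{
	struct buf *b = calloc(1, sizeof(*b));

	if (b) {
		b->refs = 1;
		live_bufs++;
	}
	return b;
}

static void buf_put(struct buf *b)
{
	assert(b->refs > 0);
	if (--b->refs == 0) {
		live_bufs--;
		free(b);
	}
}

/* One fallible step: returns 0 on success, a negative value on failure. */
static int clone_step(int *done, int fail)
{
	if (fail)
		return -1;
	*done = 1;
	return 0;
}

/*
 * Clone with two fallible steps. Both failures jump to the same label,
 * so the release logic is written exactly once.
 */
static struct buf *buf_clone(int fail_crypt, int has_integrity,
			     int fail_integrity)
{
	struct buf *b = buf_alloc();

	if (!b)
		return NULL;

	if (clone_step(&b->crypt_cloned, fail_crypt) < 0)
		goto err_put;

	if (has_integrity &&
	    clone_step(&b->integrity_cloned, fail_integrity) < 0)
		goto err_put;

	return b;

err_put:
	buf_put(b);
	return NULL;
}
```

Adding a third fallible step in this shape only needs another "goto err_put", which is presumably why the hunk prefers it over repeating bio_put() and return NULL in each branch.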
......@@ -119,6 +119,8 @@ static void blkg_async_bio_workfn(struct work_struct *work)
async_bio_work);
struct bio_list bios = BIO_EMPTY_LIST;
struct bio *bio;
struct blk_plug plug;
bool need_plug = false;
/* as long as there are pending bios, @blkg can't go away */
spin_lock_bh(&blkg->async_bio_lock);
......@@ -126,8 +128,15 @@ static void blkg_async_bio_workfn(struct work_struct *work)
bio_list_init(&blkg->async_bios);
spin_unlock_bh(&blkg->async_bio_lock);
/* start plug only when bio_list contains at least 2 bios */
if (bios.head && bios.head->bi_next) {
need_plug = true;
blk_start_plug(&plug);
}
while ((bio = bio_list_pop(&bios)))
submit_bio(bio);
if (need_plug)
blk_finish_plug(&plug);
}
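The hunk above starts a plug only when the local list holds at least two bios, using the test "bios.head && bios.head->bi_next". A rough userspace sketch of that idea is shown here. All names (struct item, struct item_list, batch_start(), batch_finish(), submit_item(), drain()) are hypothetical, and the batch functions only count calls rather than doing any real plugging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical singly linked work item, standing in for a bio. */
struct item {
	struct item *next;
};

struct item_list {
	struct item *head;
	struct item *tail;
};

/* Counters so the behaviour can be observed from outside. */
static int batches_started;
static int items_submitted;

static void batch_start(void)  { batches_started++; }
static void batch_finish(void) { }
static void submit_item(struct item *it) { (void)it; items_submitted++; }

static void item_list_add(struct item_list *l, struct item *it)
{
	it->next = NULL;
	if (l->tail)
		l->tail->next = it;
	else
		l->head = it;
	l->tail = it;
}

static struct item *item_list_pop(struct item_list *l)
{
	struct item *it = l->head;

	if (it) {
		l->head = it->next;
		if (!l->head)
			l->tail = NULL;
		it->next = NULL;
	}
	return it;
}

/* Drain the list, batching only when it holds two or more items. */
static void drain(struct item_list *l)
{
	bool need_batch;
	struct item *it;

	assert(l != NULL);
	/* head && head->next is a constant-time "at least two" test */
	need_batch = l->head && l->head->next;

	if (need_batch)
		batch_start();
	while ((it = item_list_pop(l)))
		submit_item(it);
	if (need_batch)
		batch_finish();
}
```

The check has to happen before the loop, because popping items changes what head and head->next point to.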
/**
......@@ -1613,16 +1622,24 @@ static void blkcg_scale_delay(struct blkcg_gq *blkg, u64 now)
static void blkcg_maybe_throttle_blkg(struct blkcg_gq *blkg, bool use_memdelay)
{
unsigned long pflags;
bool clamp;
u64 now = ktime_to_ns(ktime_get());
u64 exp;
u64 delay_nsec = 0;
int tok;
while (blkg->parent) {
if (atomic_read(&blkg->use_delay)) {
int use_delay = atomic_read(&blkg->use_delay);
if (use_delay) {
u64 this_delay;
blkcg_scale_delay(blkg, now);
delay_nsec = max_t(u64, delay_nsec,
atomic64_read(&blkg->delay_nsec));
this_delay = atomic64_read(&blkg->delay_nsec);
if (this_delay > delay_nsec) {
delay_nsec = this_delay;
clamp = use_delay > 0;
}
}
blkg = blkg->parent;
}
......@@ -1634,10 +1651,13 @@ static void blkcg_maybe_throttle_blkg(struct blkcg_gq *blkg, bool use_memdelay)
* Let's not sleep for all eternity if we've amassed a huge delay.
* Swapping or metadata IO can accumulate 10's of seconds worth of
* delay, and we want userspace to be able to do _something_ so cap the
* delays at 1 second. If there's 10's of seconds worth of delay then
* the tasks will be delayed for 1 second for every syscall.
* delays at 0.25s. If there's 10's of seconds worth of delay then the
* tasks will be delayed for 0.25 second for every syscall. If
* blkcg_set_delay() was used as indicated by negative use_delay, the
* caller is responsible for regulating the range.
*/
delay_nsec = min_t(u64, delay_nsec, 250 * NSEC_PER_MSEC);
if (clamp)
delay_nsec = min_t(u64, delay_nsec, 250 * NSEC_PER_MSEC);
if (use_memdelay)
psi_memstall_enter(&pflags);
......
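The blkcg_maybe_throttle_blkg() change above caps the delay at 0.25s only when the group that supplied the largest delay has a positive use_delay; a negative use_delay marks a delay set through blkcg_set_delay(), which the caller regulates. A simplified userspace sketch of that selection logic follows. struct group and effective_delay() are hypothetical names, and the blkcg_scale_delay() step is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL
#define MAX_AUTO_DELAY_NS (250 * NSEC_PER_MSEC)

/* Hypothetical, simplified stand-in for struct blkcg_gq. */
struct group {
	struct group *parent;
	int use_delay;		/* > 0 accumulated, < 0 set explicitly, 0 off */
	uint64_t delay_nsec;
};

static uint64_t effective_delay(const struct group *g)
{
	uint64_t delay = 0;
	bool clamp = false;

	assert(g != NULL);

	/* As in the hunk, the root (no parent) is not considered. */
	while (g->parent) {
		if (g->use_delay && g->delay_nsec > delay) {
			delay = g->delay_nsec;
			/* remember who set the winning delay */
			clamp = g->use_delay > 0;
		}
		g = g->parent;
	}

	/* Only accumulated delays are capped. */
	if (clamp && delay > MAX_AUTO_DELAY_NS)
		delay = MAX_AUTO_DELAY_NS;
	return delay;
}
```

In this sketch clamp is initialized to false for clarity; the clamp decision always follows the group that produced the maximum, not the last group visited.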
......@@ -142,13 +142,24 @@ static inline void blk_crypto_free_request(struct request *rq)
__blk_crypto_free_request(rq);
}
void __blk_crypto_rq_bio_prep(struct request *rq, struct bio *bio,
gfp_t gfp_mask);
static inline void blk_crypto_rq_bio_prep(struct request *rq, struct bio *bio,
gfp_t gfp_mask)
int __blk_crypto_rq_bio_prep(struct request *rq, struct bio *bio,
gfp_t gfp_mask);
/**
* blk_crypto_rq_bio_prep - Prepare a request's crypt_ctx when its first bio
* is inserted
* @rq: The request to prepare
* @bio: The first bio being inserted into the request
* @gfp_mask: Memory allocation flags
*
* Return: 0 on success, -ENOMEM if out of memory. -ENOMEM is only possible if
* @gfp_mask doesn't include %__GFP_DIRECT_RECLAIM.
*/
static inline int blk_crypto_rq_bio_prep(struct request *rq, struct bio *bio,
gfp_t gfp_mask)
{
if (bio_has_crypt_ctx(bio))
__blk_crypto_rq_bio_prep(rq, bio, gfp_mask);
return __blk_crypto_rq_bio_prep(rq, bio, gfp_mask);
return 0;
}
/**
......
......@@ -81,7 +81,15 @@ subsys_initcall(bio_crypt_ctx_init);
void bio_crypt_set_ctx(struct bio *bio, const struct blk_crypto_key *key,
const u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE], gfp_t gfp_mask)
{
struct bio_crypt_ctx *bc = mempool_alloc(bio_crypt_ctx_pool, gfp_mask);
struct bio_crypt_ctx *bc;
/*
* The caller must use a gfp_mask that contains __GFP_DIRECT_RECLAIM so
* that the mempool_alloc() can't fail.
*/
WARN_ON_ONCE(!(gfp_mask & __GFP_DIRECT_RECLAIM));
bc = mempool_alloc(bio_crypt_ctx_pool, gfp_mask);
bc->bc_key = key;
memcpy(bc->bc_dun, dun, sizeof(bc->bc_dun));
......@@ -95,10 +103,13 @@ void __bio_crypt_free_ctx(struct bio *bio)
bio->bi_crypt_context = NULL;
}
void __bio_crypt_clone(struct bio *dst, struct bio *src, gfp_t gfp_mask)
int __bio_crypt_clone(struct bio *dst, struct bio *src, gfp_t gfp_mask)
{
dst->bi_crypt_context = mempool_alloc(bio_crypt_ctx_pool, gfp_mask);
if (!dst->bi_crypt_context)
return -ENOMEM;
*dst->bi_crypt_context = *src->bi_crypt_context;
return 0;
}
EXPORT_SYMBOL_GPL(__bio_crypt_clone);
......@@ -280,20 +291,16 @@ bool __blk_crypto_bio_prep(struct bio **bio_ptr)
return false;
}
/**
* __blk_crypto_rq_bio_prep - Prepare a request's crypt_ctx when its first bio
* is inserted
*
* @rq: The request to prepare
* @bio: The first bio being inserted into the request
* @gfp_mask: gfp mask
*/
void __blk_crypto_rq_bio_prep(struct request *rq, struct bio *bio,
gfp_t gfp_mask)
int __blk_crypto_rq_bio_prep(struct request *rq, struct bio *bio,
gfp_t gfp_mask)
{
if (!rq->crypt_ctx)
if (!rq->crypt_ctx) {
rq->crypt_ctx = mempool_alloc(bio_crypt_ctx_pool, gfp_mask);
if (!rq->crypt_ctx)
return -ENOMEM;
}
*rq->crypt_ctx = *bio->bi_crypt_context;
return 0;
}
/**
......
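The blk-crypto hunks above change the prep helpers from void to int so that an allocation failure can be reported as -ENOMEM instead of being dereferenced. A minimal userspace sketch of that calling convention is given below. The types struct crypt_ctx, struct bio_s and struct req_s, the helpers ctx_alloc(), rq_bio_prep_slow() and rq_bio_prep(), and the fail_next_alloc hook are all hypothetical; malloc() stands in for the mempool.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified crypto context and its owners. */
struct crypt_ctx {
	int key_id;
	unsigned long long dun;
};

struct bio_s {
	struct crypt_ctx *ctx;
};

struct req_s {
	struct crypt_ctx *ctx;
};

static bool fail_next_alloc;	/* test hook: force one allocation failure */

static struct crypt_ctx *ctx_alloc(void)
{
	if (fail_next_alloc) {
		fail_next_alloc = false;
		return NULL;
	}
	return malloc(sizeof(struct crypt_ctx));
}

/* Slow path: allocate the request context on first use, then copy. */
static int rq_bio_prep_slow(struct req_s *rq, const struct bio_s *bio)
{
	if (!rq->ctx) {
		rq->ctx = ctx_alloc();
		if (!rq->ctx)
			return -ENOMEM;
	}
	*rq->ctx = *bio->ctx;
	return 0;
}

/* Fast-path wrapper: nothing to do, and no failure, without a context. */
static int rq_bio_prep(struct req_s *rq, const struct bio_s *bio)
{
	assert(rq != NULL && bio != NULL);
	if (bio->ctx)
		return rq_bio_prep_slow(rq, bio);
	return 0;
}
```

A caller that could previously ignore the helper now has to check the return value and unwind, which is what the bio_clone_fast() change does with its err_put label.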
......@@ -183,7 +183,6 @@ bool blk_integrity_merge_rq(struct request_queue *q, struct request *req,
return true;
}
EXPORT_SYMBOL(blk_integrity_merge_rq);
bool blk_integrity_merge_bio(struct request_queue *q, struct request *req,
struct bio *bio)
......@@ -212,7 +211,6 @@ bool blk_integrity_merge_bio(struct request_queue *q, struct request *req,
return true;
}
EXPORT_SYMBOL(blk_integrity_merge_bio);
struct integrity_sysfs_entry {
struct attribute attr;
......@@ -408,7 +406,7 @@ void blk_integrity_register(struct gendisk *disk, struct blk_integrity *template
bi->tuple_size = template->tuple_size;
bi->tag_size = template->tag_size;
disk->queue->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, disk->queue);
#ifdef CONFIG_BLK_INLINE_ENCRYPTION
if (disk->queue->ksm) {
......@@ -428,7 +426,7 @@ EXPORT_SYMBOL(blk_integrity_register);
*/
void blk_integrity_unregister(struct gendisk *disk)
{
disk->queue->backing_dev_info->capabilities &= ~BDI_CAP_STABLE_WRITES;
blk_queue_flag_clear(QUEUE_FLAG_STABLE_WRITES, disk->queue);
memset(&disk->queue->integrity, 0, sizeof(struct blk_integrity));
}
EXPORT_SYMBOL(blk_integrity_unregister);
......
......@@ -1046,7 +1046,7 @@ static int __init iolatency_init(void)
static void __exit iolatency_exit(void)
{
return blkcg_policy_unregister(&blkcg_policy_iolatency);
blkcg_policy_unregister(&blkcg_policy_iolatency);
}
module_init(iolatency_init);
......
......@@ -64,7 +64,7 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
return -EINVAL;
/* In case the discard request is in a partition */
if (bdev->bd_partno)
if (bdev_is_partition(bdev))
part_offset = bdev->bd_part->start_sect;
while (nr_sects) {
......
......@@ -12,7 +12,8 @@
#include "blk.h"
struct bio_map_data {
int is_our_pages;
bool is_our_pages : 1;
bool is_null_mapped : 1;
struct iov_iter iter;
struct iovec iov[];
};
......@@ -108,7 +109,7 @@ static int bio_uncopy_user(struct bio *bio)
struct bio_map_data *bmd = bio->bi_private;
int ret = 0;
if (!bio_flagged(bio, BIO_NULL_MAPPED)) {
if (!bmd->is_null_mapped) {
/*
* if we're in a workqueue, the request is orphaned, so
* don't copy into a random user address space, just free
......@@ -126,24 +127,12 @@ static int bio_uncopy_user(struct bio *bio)
return ret;
}
/**
* bio_copy_user_iov - copy user data to bio
* @q: destination block queue
* @map_data: pointer to the rq_map_data holding pages (if necessary)
* @iter: iovec iterator
* @gfp_mask: memory allocation flags
*
* Prepares and returns a bio for indirect user io, bouncing data
* to/from kernel pages as necessary. Must be paired with
* call bio_uncopy_user() on io completion.
*/
static struct bio *bio_copy_user_iov(struct request_queue *q,
struct rq_map_data *map_data, struct iov_iter *iter,
gfp_t gfp_mask)
static int bio_copy_user_iov(struct request *rq, struct rq_map_data *map_data,
struct iov_iter *iter, gfp_t gfp_mask)
{
struct bio_map_data *bmd;
struct page *page;
struct bio *bio;
struct bio *bio, *bounce_bio;
int i = 0, ret;
int nr_pages;
unsigned int len = iter->count;
......@@ -151,14 +140,15 @@ static struct bio *bio_copy_user_iov(struct request_queue *q,
bmd = bio_alloc_map_data(iter, gfp_mask);
if (!bmd)
return ERR_PTR(-ENOMEM);
return -ENOMEM;
/*
* We need to do a deep copy of the iov_iter including the iovecs.
* The caller provided iov might point to an on-stack or otherwise
* shortlived one.
*/
bmd->is_our_pages = map_data ? 0 : 1;
bmd->is_our_pages = !map_data;
bmd->is_null_mapped = (map_data && map_data->null_mapped);
nr_pages = DIV_ROUND_UP(offset + len, PAGE_SIZE);
if (nr_pages > BIO_MAX_PAGES)
......@@ -168,8 +158,7 @@ static struct bio *bio_copy_user_iov(struct request_queue *q,
bio = bio_kmalloc(gfp_mask, nr_pages);
if (!bio)
goto out_bmd;
ret = 0;
bio->bi_opf |= req_op(rq);
if (map_data) {
nr_pages = 1 << map_data->page_order;
......@@ -186,7 +175,7 @@ static struct bio *bio_copy_user_iov(struct request_queue *q,
if (map_data) {
if (i == map_data->nr_entries * nr_pages) {
ret = -ENOMEM;
break;
goto cleanup;
}
page = map_data->pages[i / nr_pages];
......@@ -194,14 +183,14 @@ static struct bio *bio_copy_user_iov(struct request_queue *q,
i++;
} else {
page = alloc_page(q->bounce_gfp | gfp_mask);
page = alloc_page(rq->q->bounce_gfp | gfp_mask);
if (!page) {
ret = -ENOMEM;
break;
goto cleanup;
}
}
if (bio_add_pc_page(q, bio, page, bytes, offset) < bytes) {
if (bio_add_pc_page(rq->q, bio, page, bytes, offset) < bytes) {
if (!map_data)
__free_page(page);
break;
......@@ -211,9 +200,6 @@ static struct bio *bio_copy_user_iov(struct request_queue *q,
offset = 0;
}
if (ret)
goto cleanup;
if (map_data)
map_data->offset += bio->bi_iter.bi_size;
......@@ -233,41 +219,42 @@ static struct bio *bio_copy_user_iov(struct request_queue *q,
}
bio->bi_private = bmd;
if (map_data && map_data->null_mapped)
bio_set_flag(bio, BIO_NULL_MAPPED);
return bio;
bounce_bio = bio;
ret = blk_rq_append_bio(rq, &bounce_bio);
if (ret)
goto cleanup;
/*
* We link the bounce buffer in and could have to traverse it later, so
* we have to get a ref to prevent it from being freed
*/
bio_get(bounce_bio);
return 0;
cleanup:
if (!map_data)
bio_free_pages(bio);
bio_put(bio);
out_bmd:
kfree(bmd);
return ERR_PTR(ret);
return ret;
}
/**
* bio_map_user_iov - map user iovec into bio
* @q: the struct request_queue for the bio
* @iter: iovec iterator
* @gfp_mask: memory allocation flags
*
* Map the user space address into a bio suitable for io to a block
* device. Returns an error pointer in case of error.
*/
static struct bio *bio_map_user_iov(struct request_queue *q,
struct iov_iter *iter, gfp_t gfp_mask)
static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
gfp_t gfp_mask)
{
unsigned int max_sectors = queue_max_hw_sectors(q);
int j;
struct bio *bio;
unsigned int max_sectors = queue_max_hw_sectors(rq->q);
struct bio *bio, *bounce_bio;
int ret;
int j;
if (!iov_iter_count(iter))
return ERR_PTR(-EINVAL);
return -EINVAL;
bio = bio_kmalloc(gfp_mask, iov_iter_npages(iter, BIO_MAX_PAGES));
if (!bio)
return ERR_PTR(-ENOMEM);
return -ENOMEM;
bio->bi_opf |= req_op(rq);
while (iov_iter_count(iter)) {
struct page **pages;
......@@ -283,7 +270,7 @@ static struct bio *bio_map_user_iov(struct request_queue *q,
npages = DIV_ROUND_UP(offs + bytes, PAGE_SIZE);
if (unlikely(offs & queue_dma_alignment(q))) {
if (unlikely(offs & queue_dma_alignment(rq->q))) {
ret = -EINVAL;
j = 0;
} else {
......@@ -295,7 +282,7 @@ static struct bio *bio_map_user_iov(struct request_queue *q,
if (n > bytes)
n = bytes;
if (!bio_add_hw_page(q, bio, page, n, offs,
if (!bio_add_hw_page(rq->q, bio, page, n, offs,
max_sectors, &same_page)) {
if (same_page)
put_page(page);
......@@ -319,21 +306,31 @@ static struct bio *bio_map_user_iov(struct request_queue *q,
break;
}
bio_set_flag(bio, BIO_USER_MAPPED);
/*
* subtle -- if bio_map_user_iov() ended up bouncing a bio,
* it would normally disappear when its bi_end_io is run.
* however, we need it for the unmap, so grab an extra
* reference to it
* Subtle: if we end up needing to bounce a bio, it would normally
* disappear when its bi_end_io is run. However, we need the original
* bio for the unmap, so grab an extra reference to it
*/
bio_get(bio);
return bio;
bounce_bio = bio;
ret = blk_rq_append_bio(rq, &bounce_bio);
if (ret)
goto out_put_orig;
/*
* We link the bounce buffer in and could have to traverse it
* later, so we have to get a ref to prevent it from being freed
*/
bio_get(bounce_bio);
return 0;
out_put_orig:
bio_put(bio);
out_unmap:
bio_release_pages(bio, false);
bio_put(bio);
return ERR_PTR(ret);
return ret;
}
/**
......@@ -557,55 +554,6 @@ int blk_rq_append_bio(struct request *rq, struct bio **bio)
}
EXPORT_SYMBOL(blk_rq_append_bio);
static int __blk_rq_unmap_user(struct bio *bio)
{
int ret = 0;
if (bio) {
if (bio_flagged(bio, BIO_USER_MAPPED))
bio_unmap_user(bio);
else
ret = bio_uncopy_user(bio);
}
return ret;
}
static int __blk_rq_map_user_iov(struct request *rq,
struct rq_map_data *map_data, struct iov_iter *iter,
gfp_t gfp_mask, bool copy)
{
struct request_queue *q = rq->q;
struct bio *bio, *orig_bio;
int ret;
if (copy)
bio = bio_copy_user_iov(q, map_data, iter, gfp_mask);
else
bio = bio_map_user_iov(q, iter, gfp_mask);
if (IS_ERR(bio))
return PTR_ERR(bio);
bio->bi_opf &= ~REQ_OP_MASK;
bio->bi_opf |= req_op(rq);
orig_bio = bio;
/*
* We link the bounce buffer in and could have to traverse it
* later so we have to get a ref to prevent it from being freed
*/
ret = blk_rq_append_bio(rq, &bio);
if (ret) {
__blk_rq_unmap_user(orig_bio);
return ret;
}
bio_get(bio);
return 0;
}
/**
* blk_rq_map_user_iov - map user data to a request, for passthrough requests
* @q: request queue where request should be inserted
......@@ -649,7 +597,10 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
i = *iter;
do {
ret =__blk_rq_map_user_iov(rq, map_data, &i, gfp_mask, copy);
if (copy)
ret = bio_copy_user_iov(rq, map_data, &i, gfp_mask);
else
ret = bio_map_user_iov(rq, &i, gfp_mask);
if (ret)
goto unmap_rq;
if (!bio)
......@@ -700,9 +651,13 @@ int blk_rq_unmap_user(struct bio *bio)
if (unlikely(bio_flagged(bio, BIO_BOUNCED)))
mapped_bio = bio->bi_private;
ret2 = __blk_rq_unmap_user(mapped_bio);
if (ret2 && !ret)
ret = ret2;
if (bio->bi_private) {
ret2 = bio_uncopy_user(mapped_bio);
if (ret2 && !ret)
ret = ret2;
} else {
bio_unmap_user(mapped_bio);
}
mapped_bio = bio;
bio = bio->bi_next;
......
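The blk-map hunks above convert bio_copy_user_iov() and bio_map_user_iov() from returning a struct bio pointer encoded with ERR_PTR() to returning a plain int, with the helper attaching the bio to the request itself. The two conventions can be compared in a small userspace sketch. The lowercase err_ptr(), ptr_err() and is_err() are illustrative re-implementations of the kernel macros, and struct obj, struct owner, obj_create_old() and obj_attach_new() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace imitations of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *err_ptr(intptr_t error)
{
	return (void *)error;
}

static inline intptr_t ptr_err(const void *ptr)
{
	return (intptr_t)ptr;
}

static inline int is_err(const void *ptr)
{
	/* the top MAX_ERRNO addresses are reserved for error codes */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct obj {
	size_t len;
};

struct owner {
	struct obj *obj;
};

/* Old convention: the error travels inside the returned pointer. */
static struct obj *obj_create_old(size_t len)
{
	struct obj *o;

	if (!len)
		return err_ptr(-EINVAL);
	o = malloc(sizeof(*o));
	if (!o)
		return err_ptr(-ENOMEM);
	o->len = len;
	return o;
}

/* New convention: attach to the owner here, return 0 or -errno. */
static int obj_attach_new(struct owner *owner, size_t len)
{
	struct obj *o;

	assert(owner != NULL);
	if (!len)
		return -EINVAL;
	o = malloc(sizeof(*o));
	if (!o)
		return -ENOMEM;
	o->len = len;
	owner->obj = o;
	return 0;
}
```

With the int convention the caller no longer needs an intermediate pointer or an IS_ERR() check, which matches the removal of __blk_rq_map_user_iov() in the diff.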
......@@ -11,6 +11,7 @@
#include <trace/events/block.h>
#include "blk.h"
#include "blk-rq-qos.h"
static inline bool bio_will_gap(struct request_queue *q,
struct request *prev_rq, struct bio *prev, struct bio *next)
......@@ -579,7 +580,8 @@ int ll_back_merge_fn(struct request *req, struct bio *bio, unsigned int nr_segs)
return ll_new_hw_segment(req, bio, nr_segs);
}
int ll_front_merge_fn(struct request *req, struct bio *bio, unsigned int nr_segs)
static int ll_front_merge_fn(struct request *req, struct bio *bio,
unsigned int nr_segs)
{
if (req_gap_front_merge(req, bio))
return 0;
......@@ -809,7 +811,8 @@ static struct request *attempt_merge(struct request_queue *q,
return next;
}
struct request *attempt_back_merge(struct request_queue *q, struct request *rq)
static struct request *attempt_back_merge(struct request_queue *q,
struct request *rq)
{
struct request *next = elv_latter_request(q, rq);
......@@ -819,7 +822,8 @@ struct request *attempt_back_merge(struct request_queue *q, struct request *rq)
return NULL;
}
struct request *attempt_front_merge(struct request_queue *q, struct request *rq)
static struct request *attempt_front_merge(struct request_queue *q,
struct request *rq)
{
struct request *prev = elv_former_request(q, rq);
......@@ -895,3 +899,238 @@ enum elv_merge blk_try_merge(struct request *rq, struct bio *bio)
return ELEVATOR_FRONT_MERGE;
return ELEVATOR_NO_MERGE;
}
static void blk_account_io_merge_bio(struct request *req)
{
if (!blk_do_io_stat(req))
return;
part_stat_lock();
part_stat_inc(req->part, merges[op_stat_group(req_op(req))]);
part_stat_unlock();
}
enum bio_merge_status {
BIO_MERGE_OK,
BIO_MERGE_NONE,
BIO_MERGE_FAILED,
};
static enum bio_merge_status bio_attempt_back_merge(struct request *req,
struct bio *bio, unsigned int nr_segs)
{
const int ff = bio->bi_opf & REQ_FAILFAST_MASK;
if (!ll_back_merge_fn(req, bio, nr_segs))
return BIO_MERGE_FAILED;
trace_block_bio_backmerge(req->q, req, bio);
rq_qos_merge(req->q, req, bio);
if ((req->cmd_flags & REQ_FAILFAST_MASK) != ff)
blk_rq_set_mixed_merge(req);
req->biotail->bi_next = bio;
req->biotail = bio;
req->__data_len += bio->bi_iter.bi_size;
bio_crypt_free_ctx(bio);
blk_account_io_merge_bio(req);
return BIO_MERGE_OK;
}
static enum bio_merge_status bio_attempt_front_merge(struct request *req,
struct bio *bio, unsigned int nr_segs)
{
const int ff = bio->bi_opf & REQ_FAILFAST_MASK;
if (!ll_front_merge_fn(req, bio, nr_segs))
return BIO_MERGE_FAILED;
trace_block_bio_frontmerge(req->q, req, bio);
rq_qos_merge(req->q, req, bio);
if ((req->cmd_flags & REQ_FAILFAST_MASK) != ff)
blk_rq_set_mixed_merge(req);
bio->bi_next = req->bio;
req->bio = bio;
req->__sector = bio->bi_iter.bi_sector;
req->__data_len += bio->bi_iter.bi_size;
bio_crypt_do_front_merge(req, bio);
blk_account_io_merge_bio(req);
return BIO_MERGE_OK;
}
static enum bio_merge_status bio_attempt_discard_merge(struct request_queue *q,
struct request *req, struct bio *bio)
{
unsigned short segments = blk_rq_nr_discard_segments(req);
if (segments >= queue_max_discard_segments(q))
goto no_merge;
if (blk_rq_sectors(req) + bio_sectors(bio) >
blk_rq_get_max_sectors(req, blk_rq_pos(req)))
goto no_merge;
rq_qos_merge(q, req, bio);
req->biotail->bi_next = bio;
req->biotail = bio;
req->__data_len += bio->bi_iter.bi_size;
req->nr_phys_segments = segments + 1;
blk_account_io_merge_bio(req);
return BIO_MERGE_OK;
no_merge:
req_set_nomerge(q, req);
return BIO_MERGE_FAILED;
}
static enum bio_merge_status blk_attempt_bio_merge(struct request_queue *q,
struct request *rq,
struct bio *bio,
unsigned int nr_segs,
bool sched_allow_merge)
{
if (!blk_rq_merge_ok(rq, bio))
return BIO_MERGE_NONE;
switch (blk_try_merge(rq, bio)) {
case ELEVATOR_BACK_MERGE:
if (!sched_allow_merge || blk_mq_sched_allow_merge(q, rq, bio))
return bio_attempt_back_merge(rq, bio, nr_segs);
break;
case ELEVATOR_FRONT_MERGE:
if (!sched_allow_merge || blk_mq_sched_allow_merge(q, rq, bio))
return bio_attempt_front_merge(rq, bio, nr_segs);
break;
case ELEVATOR_DISCARD_MERGE:
return bio_attempt_discard_merge(q, rq, bio);
default:
return BIO_MERGE_NONE;
}
return BIO_MERGE_FAILED;
}
/**
* blk_attempt_plug_merge - try to merge with %current's plugged list
* @q: request_queue new bio is being queued at
* @bio: new bio being queued
* @nr_segs: number of segments in @bio
* @same_queue_rq: pointer to &struct request that gets filled in when
* another request associated with @q is found on the plug list
* (optional, may be %NULL)
*
* Determine whether @bio being queued on @q can be merged with a request
* on %current's plugged list. Returns %true if merge was successful,
* otherwise %false.
*
* Plugging coalesces IOs from the same issuer for the same purpose without
* going through @q->queue_lock. As such it's more of an issuing mechanism
* than scheduling, and the request, while may have elvpriv data, is not
* added on the elevator at this point. In addition, we don't have
* reliable access to the elevator outside queue lock. Only check basic
* merging parameters without querying the elevator.
*
* Caller must ensure !blk_queue_nomerges(q) beforehand.
*/
bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
unsigned int nr_segs, struct request **same_queue_rq)
{
struct blk_plug *plug;
struct request *rq;
struct list_head *plug_list;
plug = blk_mq_plug(q, bio);
if (!plug)
return false;
plug_list = &plug->mq_list;
list_for_each_entry_reverse(rq, plug_list, queuelist) {
if (rq->q == q && same_queue_rq) {
/*
* Only blk-mq multiple hardware queues case checks the
* rq in the same queue, there should be only one such
* rq in a queue
**/
*same_queue_rq = rq;
}
if (rq->q != q)
continue;
if (blk_attempt_bio_merge(q, rq, bio, nr_segs, false) ==
BIO_MERGE_OK)
return true;
}
return false;
}
/*
* Iterate list of requests and see if we can merge this bio with any
* of them.
*/
bool blk_bio_list_merge(struct request_queue *q, struct list_head *list,
struct bio *bio, unsigned int nr_segs)
{
struct request *rq;
int checked = 8;
list_for_each_entry_reverse(rq, list, queuelist) {
if (!checked--)
break;
switch (blk_attempt_bio_merge(q, rq, bio, nr_segs, true)) {
case BIO_MERGE_NONE:
continue;
case BIO_MERGE_OK:
return true;
case BIO_MERGE_FAILED:
return false;
}
}
return false;
}
EXPORT_SYMBOL_GPL(blk_bio_list_merge);
bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
unsigned int nr_segs, struct request **merged_request)
{
struct request *rq;
switch (elv_merge(q, &rq, bio)) {
case ELEVATOR_BACK_MERGE:
if (!blk_mq_sched_allow_merge(q, rq, bio))
return false;
if (bio_attempt_back_merge(rq, bio, nr_segs) != BIO_MERGE_OK)
return false;
*merged_request = attempt_back_merge(q, rq);
if (!*merged_request)
elv_merged_request(q, rq, ELEVATOR_BACK_MERGE);
return true;
case ELEVATOR_FRONT_MERGE:
if (!blk_mq_sched_allow_merge(q, rq, bio))
return false;
if (bio_attempt_front_merge(rq, bio, nr_segs) != BIO_MERGE_OK)
return false;
*merged_request = attempt_front_merge(q, rq);
if (!*merged_request)
elv_merged_request(q, rq, ELEVATOR_FRONT_MERGE);
return true;
case ELEVATOR_DISCARD_MERGE:
return bio_attempt_discard_merge(q, rq, bio) == BIO_MERGE_OK;
default:
return false;
}
}
EXPORT_SYMBOL_GPL(blk_mq_sched_try_merge);
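The new enum bio_merge_status above replaces a bool with three outcomes, so that blk_bio_list_merge() can keep scanning on BIO_MERGE_NONE but stop on BIO_MERGE_FAILED. A userspace sketch of that control flow is shown here under simplified assumptions: struct range, try_back_merge() and list_merge() are hypothetical, and only adjacency and a size cap decide the outcome.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum merge_status {
	MERGE_OK,	/* merged, stop with success */
	MERGE_NONE,	/* not a candidate, keep looking */
	MERGE_FAILED,	/* candidate, but merge refused: stop */
};

/* Hypothetical request: a contiguous sector range with a size cap. */
struct range {
	unsigned int start;
	unsigned int len;
	unsigned int max_len;
};

static enum merge_status try_back_merge(struct range *r, unsigned int start,
					unsigned int len)
{
	if (r->start + r->len != start)
		return MERGE_NONE;
	if (r->len + len > r->max_len)
		return MERGE_FAILED;
	r->len += len;
	return MERGE_OK;
}

/* Walk the array newest-first, checking at most 8 entries. */
static bool list_merge(struct range *ranges, size_t n, unsigned int start,
		       unsigned int len)
{
	int checked = 8;
	size_t i;

	assert(ranges != NULL || n == 0);

	for (i = n; i-- > 0; ) {
		if (!checked--)
			break;

		switch (try_back_merge(&ranges[i], start, len)) {
		case MERGE_NONE:
			continue;
		case MERGE_OK:
			return true;
		case MERGE_FAILED:
			return false;
		}
	}
	return false;
}
```

With a plain bool, "not a candidate" and "candidate but refused" would look the same to the loop; the third state is what lets the scan stop early in the second case.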
......@@ -116,6 +116,7 @@ static const char *const blk_queue_flag_name[] = {
QUEUE_FLAG_NAME(SAME_FORCE),
QUEUE_FLAG_NAME(DEAD),
QUEUE_FLAG_NAME(INIT_DONE),
QUEUE_FLAG_NAME(STABLE_WRITES),
QUEUE_FLAG_NAME(POLL),
QUEUE_FLAG_NAME(WC),
QUEUE_FLAG_NAME(FUA),
......@@ -240,7 +241,7 @@ static const char *const alloc_policy_name[] = {
#define HCTX_FLAG_NAME(name) [ilog2(BLK_MQ_F_##name)] = #name
static const char *const hctx_flag_name[] = {
HCTX_FLAG_NAME(SHOULD_MERGE),
HCTX_FLAG_NAME(TAG_SHARED),
HCTX_FLAG_NAME(TAG_QUEUE_SHARED),
HCTX_FLAG_NAME(BLOCKING),
HCTX_FLAG_NAME(NO_SCHED),
HCTX_FLAG_NAME(STACKING),
......@@ -452,11 +453,11 @@ static void blk_mq_debugfs_tags_show(struct seq_file *m,
atomic_read(&tags->active_queues));
seq_puts(m, "\nbitmap_tags:\n");
sbitmap_queue_show(&tags->bitmap_tags, m);
sbitmap_queue_show(tags->bitmap_tags, m);
if (tags->nr_reserved_tags) {
seq_puts(m, "\nbreserved_tags:\n");
sbitmap_queue_show(&tags->breserved_tags, m);
sbitmap_queue_show(tags->breserved_tags, m);
}
}
......@@ -487,7 +488,7 @@ static int hctx_tags_bitmap_show(void *data, struct seq_file *m)
if (res)
goto out;
if (hctx->tags)
sbitmap_bitmap_show(&hctx->tags->bitmap_tags.sb, m);
sbitmap_bitmap_show(&hctx->tags->bitmap_tags->sb, m);
mutex_unlock(&q->sysfs_lock);
out:
......@@ -521,7 +522,7 @@ static int hctx_sched_tags_bitmap_show(void *data, struct seq_file *m)
if (res)
goto out;
if (hctx->sched_tags)
sbitmap_bitmap_show(&hctx->sched_tags->bitmap_tags.sb, m);
sbitmap_bitmap_show(&hctx->sched_tags->bitmap_tags->sb, m);
mutex_unlock(&q->sysfs_lock);
out:
......
......@@ -18,21 +18,6 @@
#include "blk-mq-tag.h"
#include "blk-wbt.h"
void blk_mq_sched_free_hctx_data(struct request_queue *q,
void (*exit)(struct blk_mq_hw_ctx *))
{
struct blk_mq_hw_ctx *hctx;
int i;
queue_for_each_hw_ctx(q, hctx, i) {
if (exit && hctx->sched_data)
exit(hctx);
kfree(hctx->sched_data);
hctx->sched_data = NULL;
}
}
EXPORT_SYMBOL_GPL(blk_mq_sched_free_hctx_data);
void blk_mq_sched_assign_ioc(struct request *rq)
{
struct request_queue *q = rq->q;
......@@ -359,104 +344,6 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
}
}
bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
unsigned int nr_segs, struct request **merged_request)
{
struct request *rq;
switch (elv_merge(q, &rq, bio)) {
case ELEVATOR_BACK_MERGE:
if (!blk_mq_sched_allow_merge(q, rq, bio))
return false;
if (!bio_attempt_back_merge(rq, bio, nr_segs))
return false;
*merged_request = attempt_back_merge(q, rq);
if (!*merged_request)
elv_merged_request(q, rq, ELEVATOR_BACK_MERGE);
return true;
case ELEVATOR_FRONT_MERGE:
if (!blk_mq_sched_allow_merge(q, rq, bio))
return false;
if (!bio_attempt_front_merge(rq, bio, nr_segs))
return false;
*merged_request = attempt_front_merge(q, rq);
if (!*merged_request)
elv_merged_request(q, rq, ELEVATOR_FRONT_MERGE);
return true;
case ELEVATOR_DISCARD_MERGE:
return bio_attempt_discard_merge(q, rq, bio);
default:
return false;
}
}
EXPORT_SYMBOL_GPL(blk_mq_sched_try_merge);
/*
* Iterate list of requests and see if we can merge this bio with any
* of them.
*/
bool blk_mq_bio_list_merge(struct request_queue *q, struct list_head *list,
struct bio *bio, unsigned int nr_segs)
{
struct request *rq;
int checked = 8;
list_for_each_entry_reverse(rq, list, queuelist) {
bool merged = false;
if (!checked--)
break;
if (!blk_rq_merge_ok(rq, bio))
continue;
switch (blk_try_merge(rq, bio)) {
case ELEVATOR_BACK_MERGE:
if (blk_mq_sched_allow_merge(q, rq, bio))
merged = bio_attempt_back_merge(rq, bio,
nr_segs);
break;
case ELEVATOR_FRONT_MERGE:
if (blk_mq_sched_allow_merge(q, rq, bio))
merged = bio_attempt_front_merge(rq, bio,
nr_segs);
break;
case ELEVATOR_DISCARD_MERGE:
merged = bio_attempt_discard_merge(q, rq, bio);
break;
default:
continue;
}
return merged;
}
return false;
}
EXPORT_SYMBOL_GPL(blk_mq_bio_list_merge);
/*
* Reverse check our software queue for entries that we could potentially
* merge with. Currently includes a hand-wavy stop count of 8, to not spend
* too much time checking for merges.
*/
static bool blk_mq_attempt_merge(struct request_queue *q,
struct blk_mq_hw_ctx *hctx,
struct blk_mq_ctx *ctx, struct bio *bio,
unsigned int nr_segs)
{
enum hctx_type type = hctx->type;
lockdep_assert_held(&ctx->lock);
if (blk_mq_bio_list_merge(q, &ctx->rq_lists[type], bio, nr_segs)) {
ctx->rq_merged++;
return true;
}
return false;
}
bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
unsigned int nr_segs)
{
......@@ -470,14 +357,24 @@ bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
return e->type->ops.bio_merge(hctx, bio, nr_segs);
type = hctx->type;
if ((hctx->flags & BLK_MQ_F_SHOULD_MERGE) &&
!list_empty_careful(&ctx->rq_lists[type])) {
/* default per sw-queue merge */
spin_lock(&ctx->lock);
ret = blk_mq_attempt_merge(q, hctx, ctx, bio, nr_segs);
spin_unlock(&ctx->lock);
if (!(hctx->flags & BLK_MQ_F_SHOULD_MERGE) ||
list_empty_careful(&ctx->rq_lists[type]))
return false;
/* default per sw-queue merge */
spin_lock(&ctx->lock);
/*
* Reverse check our software queue for entries that we could
* potentially merge with. Currently includes a hand-wavy stop
* count of 8, to not spend too much time checking for merges.
*/
if (blk_bio_list_merge(q, &ctx->rq_lists[type], bio, nr_segs)) {
ctx->rq_merged++;
ret = true;
}
spin_unlock(&ctx->lock);
return ret;
}
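Reviewer note: the hunk above folds blk_mq_attempt_merge() into its caller and keeps the "reverse check with a stop count of 8" behaviour inside the list-merge helper. A minimal standalone sketch of that bounded reverse scan follows. It uses a plain array of sector ranges rather than kernel requests, and struct range, MERGE_SCAN_LIMIT and bounded_reverse_merge() are hypothetical names for illustration, not kernel APIs. Unlike the kernel helper it only models back merges.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued request: sectors [start, start + len). */
struct range {
	unsigned long start;
	unsigned long len;
};

/* Candidates examined before giving up, mirroring the stop count of 8. */
#define MERGE_SCAN_LIMIT 8

/*
 * Walk the queue from newest (highest index) to oldest and back-merge the
 * new range into the first entry that ends exactly where it starts.
 * Returns true if a merge happened, false if no candidate was found
 * within the scan limit.
 */
static bool bounded_reverse_merge(struct range *queue, size_t n,
				  unsigned long start, unsigned long len)
{
	int checked = MERGE_SCAN_LIMIT;
	size_t i;

	for (i = n; i-- > 0; ) {
		if (!checked--)
			break;
		if (queue[i].start + queue[i].len == start) {
			queue[i].len += len;
			return true;
		}
	}
	return false;
}
```

The limit trades merge opportunities for a bounded time under the queue lock: an entry older than the eight most recent ones is never considered, even if it would merge.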
......@@ -525,13 +422,7 @@ void blk_mq_sched_insert_request(struct request *rq, bool at_head,
struct blk_mq_ctx *ctx = rq->mq_ctx;
struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
/* flush rq in flush machinery need to be dispatched directly */
if (!(rq->rq_flags & RQF_FLUSH_SEQ) && op_is_flush(rq->cmd_flags)) {
blk_insert_flush(rq);
goto run;
}
WARN_ON(e && (rq->tag != -1));
WARN_ON(e && (rq->tag != BLK_MQ_NO_TAG));
if (blk_mq_sched_bypass_insert(hctx, !!e, rq)) {
/*
......@@ -616,9 +507,11 @@ static void blk_mq_sched_free_tags(struct blk_mq_tag_set *set,
struct blk_mq_hw_ctx *hctx,
unsigned int hctx_idx)
{
unsigned int flags = set->flags & ~BLK_MQ_F_TAG_HCTX_SHARED;
if (hctx->sched_tags) {
blk_mq_free_rqs(set, hctx->sched_tags, hctx_idx);
blk_mq_free_rq_map(hctx->sched_tags);
blk_mq_free_rq_map(hctx->sched_tags, flags);
hctx->sched_tags = NULL;
}
}
......@@ -628,10 +521,12 @@ static int blk_mq_sched_alloc_tags(struct request_queue *q,
unsigned int hctx_idx)
{
struct blk_mq_tag_set *set = q->tag_set;
/* Clear HCTX_SHARED so tags are init'ed */
unsigned int flags = set->flags & ~BLK_MQ_F_TAG_HCTX_SHARED;
int ret;
hctx->sched_tags = blk_mq_alloc_rq_map(set, hctx_idx, q->nr_requests,
set->reserved_tags);
set->reserved_tags, flags);
if (!hctx->sched_tags)
return -ENOMEM;
......@@ -649,8 +544,11 @@ static void blk_mq_sched_tags_teardown(struct request_queue *q)
int i;
queue_for_each_hw_ctx(q, hctx, i) {
/* Clear HCTX_SHARED so tags are freed */
unsigned int flags = hctx->flags & ~BLK_MQ_F_TAG_HCTX_SHARED;
if (hctx->sched_tags) {
blk_mq_free_rq_map(hctx->sched_tags);
blk_mq_free_rq_map(hctx->sched_tags, flags);
hctx->sched_tags = NULL;
}
}
......
......@@ -5,9 +5,6 @@
#include "blk-mq.h"
#include "blk-mq-tag.h"
void blk_mq_sched_free_hctx_data(struct request_queue *q,
void (*exit)(struct blk_mq_hw_ctx *));
void blk_mq_sched_assign_ioc(struct request *rq);
void blk_mq_sched_request_inserted(struct request *rq);
......
......@@ -36,8 +36,6 @@ static void blk_mq_hw_sysfs_release(struct kobject *kobj)
struct blk_mq_hw_ctx *hctx = container_of(kobj, struct blk_mq_hw_ctx,
kobj);
cancel_delayed_work_sync(&hctx->run_work);
if (hctx->flags & BLK_MQ_F_BLOCKING)
cleanup_srcu_struct(hctx->srcu);
blk_free_flush_queue(hctx->fq);
......
......@@ -23,9 +23,18 @@
*/
bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
{
if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) &&
!test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
atomic_inc(&hctx->tags->active_queues);
if (blk_mq_is_sbitmap_shared(hctx->flags)) {
struct request_queue *q = hctx->queue;
struct blk_mq_tag_set *set = q->tag_set;
if (!test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) &&
!test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
atomic_inc(&set->active_queues_shared_sbitmap);
} else {
if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) &&
!test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
atomic_inc(&hctx->tags->active_queues);
}
return true;
}
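Reviewer note: both branches above use the same idiom, a plain test_bit() followed by test_and_set_bit(), so that the counter is incremented only on the inactive-to-active transition. A rough userspace sketch of that idiom with C11 atomics is below. The names queue_state, TAG_ACTIVE_BIT, mark_active() and mark_idle() are hypothetical, and unlike __blk_mq_tag_busy() these helpers return whether a transition occurred. The stated motivation for the initial plain load (skipping the atomic read-modify-write when the bit is already set) is my reading of the pattern, not something the diff states.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define TAG_ACTIVE_BIT (1u << 0) /* hypothetical flag bit */

struct queue_state {
	atomic_uint flags; /* per-queue state bits */
};

/*
 * Mark the queue active and bump the shared counter only when the bit
 * goes from clear to set. The cheap load handles the common case where
 * the bit is already set; the fetch_or decides the race otherwise.
 */
static bool mark_active(struct queue_state *qs, atomic_int *active_queues)
{
	if (atomic_load(&qs->flags) & TAG_ACTIVE_BIT)
		return false;
	if (atomic_fetch_or(&qs->flags, TAG_ACTIVE_BIT) & TAG_ACTIVE_BIT)
		return false; /* another thread set the bit first */
	atomic_fetch_add(active_queues, 1);
	return true;
}

/* Clear the bit and drop the counter only if the bit was actually set. */
static bool mark_idle(struct queue_state *qs, atomic_int *active_queues)
{
	if (!(atomic_fetch_and(&qs->flags, ~TAG_ACTIVE_BIT) & TAG_ACTIVE_BIT))
		return false;
	atomic_fetch_sub(active_queues, 1);
	return true;
}
```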
......@@ -35,9 +44,9 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
*/
void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
{
sbitmap_queue_wake_all(&tags->bitmap_tags);
sbitmap_queue_wake_all(tags->bitmap_tags);
if (include_reserve)
sbitmap_queue_wake_all(&tags->breserved_tags);
sbitmap_queue_wake_all(tags->breserved_tags);
}
/*
......@@ -47,11 +56,19 @@ void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
{
struct blk_mq_tags *tags = hctx->tags;
if (!test_and_clear_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
return;
atomic_dec(&tags->active_queues);
struct request_queue *q = hctx->queue;
struct blk_mq_tag_set *set = q->tag_set;
if (blk_mq_is_sbitmap_shared(hctx->flags)) {
if (!test_and_clear_bit(QUEUE_FLAG_HCTX_ACTIVE,
&q->queue_flags))
return;
atomic_dec(&set->active_queues_shared_sbitmap);
} else {
if (!test_and_clear_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
return;
atomic_dec(&tags->active_queues);
}
blk_mq_tag_wakeup_all(tags, false);
}
......@@ -59,7 +76,8 @@ void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
static int __blk_mq_get_tag(struct blk_mq_alloc_data *data,
struct sbitmap_queue *bt)
{
if (!data->q->elevator && !hctx_may_queue(data->hctx, bt))
if (!data->q->elevator && !(data->flags & BLK_MQ_REQ_RESERVED) &&
!hctx_may_queue(data->hctx, bt))
return BLK_MQ_NO_TAG;
if (data->shallow_depth)
......@@ -82,10 +100,10 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
WARN_ON_ONCE(1);
return BLK_MQ_NO_TAG;
}
bt = &tags->breserved_tags;
bt = tags->breserved_tags;
tag_offset = 0;
} else {
bt = &tags->bitmap_tags;
bt = tags->bitmap_tags;
tag_offset = tags->nr_reserved_tags;
}
......@@ -131,9 +149,9 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
data->ctx);
tags = blk_mq_tags_from_data(data);
if (data->flags & BLK_MQ_REQ_RESERVED)
bt = &tags->breserved_tags;
bt = tags->breserved_tags;
else
bt = &tags->bitmap_tags;
bt = tags->bitmap_tags;
/*
* If destination hw queue is changed, fake wake up on
......@@ -167,10 +185,10 @@ void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
const int real_tag = tag - tags->nr_reserved_tags;
BUG_ON(real_tag >= tags->nr_tags);
sbitmap_queue_clear(&tags->bitmap_tags, real_tag, ctx->cpu);
sbitmap_queue_clear(tags->bitmap_tags, real_tag, ctx->cpu);
} else {
BUG_ON(tag >= tags->nr_reserved_tags);
sbitmap_queue_clear(&tags->breserved_tags, tag, ctx->cpu);
sbitmap_queue_clear(tags->breserved_tags, tag, ctx->cpu);
}
}
......@@ -197,7 +215,7 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
* We can hit rq == NULL here, because the tagging functions
* test and set the bit before assigning ->rqs[].
*/
if (rq && rq->q == hctx->queue)
if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx)
return iter_data->fn(hctx, rq, iter_data->data, reserved);
return true;
}
......@@ -298,9 +316,9 @@ static void __blk_mq_all_tag_iter(struct blk_mq_tags *tags,
WARN_ON_ONCE(flags & BT_TAG_ITER_RESERVED);
if (tags->nr_reserved_tags)
bt_tags_for_each(tags, &tags->breserved_tags, fn, priv,
bt_tags_for_each(tags, tags->breserved_tags, fn, priv,
flags | BT_TAG_ITER_RESERVED);
bt_tags_for_each(tags, &tags->bitmap_tags, fn, priv, flags);
bt_tags_for_each(tags, tags->bitmap_tags, fn, priv, flags);
}
/**
......@@ -398,9 +416,7 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
/*
* __blk_mq_update_nr_hw_queues() updates nr_hw_queues and queue_hw_ctx
* while the queue is frozen. So we can use q_usage_counter to avoid
* racing with it. __blk_mq_update_nr_hw_queues() uses
* synchronize_rcu() to ensure this function left the critical section
* below.
* racing with it.
*/
if (!percpu_ref_tryget(&q->q_usage_counter))
return;
......@@ -416,8 +432,8 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
continue;
if (tags->nr_reserved_tags)
bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
bt_for_each(hctx, tags->breserved_tags, fn, priv, true);
bt_for_each(hctx, tags->bitmap_tags, fn, priv, false);
}
blk_queue_exit(q);
}
......@@ -429,30 +445,64 @@ static int bt_alloc(struct sbitmap_queue *bt, unsigned int depth,
node);
}
static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
int node, int alloc_policy)
static int blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
int node, int alloc_policy)
{
unsigned int depth = tags->nr_tags - tags->nr_reserved_tags;
bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
if (bt_alloc(&tags->bitmap_tags, depth, round_robin, node))
goto free_tags;
if (bt_alloc(&tags->breserved_tags, tags->nr_reserved_tags, round_robin,
node))
if (bt_alloc(&tags->__bitmap_tags, depth, round_robin, node))
return -ENOMEM;
if (bt_alloc(&tags->__breserved_tags, tags->nr_reserved_tags,
round_robin, node))
goto free_bitmap_tags;
return tags;
tags->bitmap_tags = &tags->__bitmap_tags;
tags->breserved_tags = &tags->__breserved_tags;
return 0;
free_bitmap_tags:
sbitmap_queue_free(&tags->bitmap_tags);
free_tags:
kfree(tags);
return NULL;
sbitmap_queue_free(&tags->__bitmap_tags);
return -ENOMEM;
}
int blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *set, unsigned int flags)
{
unsigned int depth = set->queue_depth - set->reserved_tags;
int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags);
bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
int i, node = set->numa_node;
if (bt_alloc(&set->__bitmap_tags, depth, round_robin, node))
return -ENOMEM;
if (bt_alloc(&set->__breserved_tags, set->reserved_tags,
round_robin, node))
goto free_bitmap_tags;
for (i = 0; i < set->nr_hw_queues; i++) {
struct blk_mq_tags *tags = set->tags[i];
tags->bitmap_tags = &set->__bitmap_tags;
tags->breserved_tags = &set->__breserved_tags;
}
return 0;
free_bitmap_tags:
sbitmap_queue_free(&set->__bitmap_tags);
return -ENOMEM;
}
void blk_mq_exit_shared_sbitmap(struct blk_mq_tag_set *set)
{
sbitmap_queue_free(&set->__bitmap_tags);
sbitmap_queue_free(&set->__breserved_tags);
}
struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
unsigned int reserved_tags,
int node, int alloc_policy)
int node, unsigned int flags)
{
int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(flags);
struct blk_mq_tags *tags;
if (total_tags > BLK_MQ_TAG_MAX) {
......@@ -467,13 +517,22 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
tags->nr_tags = total_tags;
tags->nr_reserved_tags = reserved_tags;
return blk_mq_init_bitmap_tags(tags, node, alloc_policy);
if (flags & BLK_MQ_F_TAG_HCTX_SHARED)
return tags;
if (blk_mq_init_bitmap_tags(tags, node, alloc_policy) < 0) {
kfree(tags);
return NULL;
}
return tags;
}
void blk_mq_free_tags(struct blk_mq_tags *tags)
void blk_mq_free_tags(struct blk_mq_tags *tags, unsigned int flags)
{
sbitmap_queue_free(&tags->bitmap_tags);
sbitmap_queue_free(&tags->breserved_tags);
if (!(flags & BLK_MQ_F_TAG_HCTX_SHARED)) {
sbitmap_queue_free(tags->bitmap_tags);
sbitmap_queue_free(tags->breserved_tags);
}
kfree(tags);
}
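Reviewer note: the core of this file's change is that bitmap_tags and breserved_tags become pointers, which target either storage embedded in the tags structure or storage owned by the tag set when BLK_MQ_F_TAG_HCTX_SHARED is set. The sketch below illustrates only that indirection. struct bitmap, struct tags, struct tag_set, tags_alloc() and tag_set_share() are simplified, hypothetical stand-ins and do not match the kernel definitions.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sbitmap_queue. */
struct bitmap {
	unsigned int depth;
};

/* Per-hardware-queue tags: users only ever follow the pointer. */
struct tags {
	struct bitmap *bitmap;    /* points at own_bitmap or at the set's copy */
	struct bitmap own_bitmap; /* private storage, unused when shared */
};

/* Tag set that can host one bitmap shared by all of its tags. */
struct tag_set {
	struct bitmap shared_bitmap;
	struct tags *tags[2];
	unsigned int nr_tags;
};

/* Allocate tags; in shared mode the pointer stays NULL for the set to fill in. */
static struct tags *tags_alloc(unsigned int depth, bool shared)
{
	struct tags *t = calloc(1, sizeof(*t));

	if (!t)
		return NULL;
	if (!shared) {
		t->own_bitmap.depth = depth;
		t->bitmap = &t->own_bitmap;
	}
	return t;
}

/* Point every tags structure in the set at the single shared bitmap. */
static void tag_set_share(struct tag_set *set, unsigned int depth)
{
	unsigned int i;

	set->shared_bitmap.depth = depth;
	for (i = 0; i < set->nr_tags; i++)
		set->tags[i]->bitmap = &set->shared_bitmap;
}
```

With this layout the hot paths dereference one pointer regardless of mode, which is why most call sites in the diff change only from `&tags->bitmap_tags` to `tags->bitmap_tags`. The free path has to know the mode so that shared storage is released once by its owner.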
......@@ -492,6 +551,8 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
*/
if (tdepth > tags->nr_tags) {
struct blk_mq_tag_set *set = hctx->queue->tag_set;
/* Only sched tags can grow, so clear HCTX_SHARED flag */
unsigned int flags = set->flags & ~BLK_MQ_F_TAG_HCTX_SHARED;
struct blk_mq_tags *new;
bool ret;
......@@ -506,30 +567,35 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
return -EINVAL;
new = blk_mq_alloc_rq_map(set, hctx->queue_num, tdepth,
tags->nr_reserved_tags);
tags->nr_reserved_tags, flags);
if (!new)
return -ENOMEM;
ret = blk_mq_alloc_rqs(set, new, hctx->queue_num, tdepth);
if (ret) {
blk_mq_free_rq_map(new);
blk_mq_free_rq_map(new, flags);
return -ENOMEM;
}
blk_mq_free_rqs(set, *tagsptr, hctx->queue_num);
blk_mq_free_rq_map(*tagsptr);
blk_mq_free_rq_map(*tagsptr, flags);
*tagsptr = new;
} else {
/*
* Don't need (or can't) update reserved tags here, they
* remain static and should never need resizing.
*/
sbitmap_queue_resize(&tags->bitmap_tags,
sbitmap_queue_resize(tags->bitmap_tags,
tdepth - tags->nr_reserved_tags);
}
return 0;
}
void blk_mq_tag_resize_shared_sbitmap(struct blk_mq_tag_set *set, unsigned int size)
{
sbitmap_queue_resize(&set->__bitmap_tags, size - set->reserved_tags);
}
/**
* blk_mq_unique_tag() - return a tag that is unique queue-wide
* @rq: request for which to compute a unique tag
......
......@@ -2,8 +2,6 @@
#ifndef INT_BLK_MQ_TAG_H
#define INT_BLK_MQ_TAG_H
#include "blk-mq.h"
/*
* Tag address space map.
*/
......@@ -13,17 +11,25 @@ struct blk_mq_tags {
atomic_t active_queues;
struct sbitmap_queue bitmap_tags;
struct sbitmap_queue breserved_tags;
struct sbitmap_queue *bitmap_tags;
struct sbitmap_queue *breserved_tags;
struct sbitmap_queue __bitmap_tags;
struct sbitmap_queue __breserved_tags;
struct request **rqs;
struct request **static_rqs;
struct list_head page_list;
};
extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
unsigned int reserved_tags,
int node, unsigned int flags);
extern void blk_mq_free_tags(struct blk_mq_tags *tags, unsigned int flags);
extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int reserved_tags, int node, int alloc_policy);
extern void blk_mq_free_tags(struct blk_mq_tags *tags);
extern int blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *set,
unsigned int flags);
extern void blk_mq_exit_shared_sbitmap(struct blk_mq_tag_set *set);
extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
extern void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
......@@ -31,6 +37,9 @@ extern void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
struct blk_mq_tags **tags,
unsigned int depth, bool can_grow);
extern void blk_mq_tag_resize_shared_sbitmap(struct blk_mq_tag_set *set,
unsigned int size);
extern void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool);
void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
void *priv);
......@@ -56,7 +65,7 @@ extern void __blk_mq_tag_idle(struct blk_mq_hw_ctx *);
static inline bool blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
{
if (!(hctx->flags & BLK_MQ_F_TAG_SHARED))
if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
return false;
return __blk_mq_tag_busy(hctx);
......@@ -64,43 +73,12 @@ static inline bool blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
static inline void blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
{
if (!(hctx->flags & BLK_MQ_F_TAG_SHARED))
if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
return;
__blk_mq_tag_idle(hctx);
}
/*
* For shared tag users, we track the number of currently active users
* and attempt to provide a fair share of the tag depth for each of them.
*/
static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
struct sbitmap_queue *bt)
{
unsigned int depth, users;
if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_SHARED))
return true;
if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
return true;
/*
* Don't try dividing an ant
*/
if (bt->sb.depth == 1)
return true;
users = atomic_read(&hctx->tags->active_queues);
if (!users)
return true;
/*
* Allow at least some tags
*/
depth = max((bt->sb.depth + users - 1) / users, 4U);
return atomic_read(&hctx->nr_active) < depth;
}
static inline bool blk_mq_tag_is_reserved(struct blk_mq_tags *tags,
unsigned int tag)
{
......
......@@ -53,11 +53,12 @@ struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
*/
void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
unsigned int hctx_idx);
void blk_mq_free_rq_map(struct blk_mq_tags *tags);
void blk_mq_free_rq_map(struct blk_mq_tags *tags, unsigned int flags);
struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
unsigned int hctx_idx,
unsigned int nr_tags,
unsigned int reserved_tags);
unsigned int reserved_tags,
unsigned int flags);
int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
unsigned int hctx_idx, unsigned int depth);
......@@ -158,6 +159,11 @@ struct blk_mq_alloc_data {
struct blk_mq_hw_ctx *hctx;
};
static inline bool blk_mq_is_sbitmap_shared(unsigned int flags)
{
return flags & BLK_MQ_F_TAG_HCTX_SHARED;
}
static inline struct blk_mq_tags *blk_mq_tags_from_data(struct blk_mq_alloc_data *data)
{
if (data->q->elevator)
......@@ -193,6 +199,28 @@ static inline bool blk_mq_get_dispatch_budget(struct request_queue *q)
return true;
}
static inline void __blk_mq_inc_active_requests(struct blk_mq_hw_ctx *hctx)
{
if (blk_mq_is_sbitmap_shared(hctx->flags))
atomic_inc(&hctx->queue->nr_active_requests_shared_sbitmap);
else
atomic_inc(&hctx->nr_active);
}
static inline void __blk_mq_dec_active_requests(struct blk_mq_hw_ctx *hctx)
{
if (blk_mq_is_sbitmap_shared(hctx->flags))
atomic_dec(&hctx->queue->nr_active_requests_shared_sbitmap);
else
atomic_dec(&hctx->nr_active);
}
static inline int __blk_mq_active_requests(struct blk_mq_hw_ctx *hctx)
{
if (blk_mq_is_sbitmap_shared(hctx->flags))
return atomic_read(&hctx->queue->nr_active_requests_shared_sbitmap);
return atomic_read(&hctx->nr_active);
}
static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
struct request *rq)
{
......@@ -201,7 +229,7 @@ static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
if (rq->rq_flags & RQF_MQ_INFLIGHT) {
rq->rq_flags &= ~RQF_MQ_INFLIGHT;
atomic_dec(&hctx->nr_active);
__blk_mq_dec_active_requests(hctx);
}
}
......@@ -253,4 +281,46 @@ static inline struct blk_plug *blk_mq_plug(struct request_queue *q,
return NULL;
}
/*
* For shared tag users, we track the number of currently active users
* and attempt to provide a fair share of the tag depth for each of them.
*/
static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
struct sbitmap_queue *bt)
{
unsigned int depth, users;
if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
return true;
/*
* Don't try dividing an ant
*/
if (bt->sb.depth == 1)
return true;
if (blk_mq_is_sbitmap_shared(hctx->flags)) {
struct request_queue *q = hctx->queue;
struct blk_mq_tag_set *set = q->tag_set;
if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &q->queue_flags))
return true;
users = atomic_read(&set->active_queues_shared_sbitmap);
} else {
if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
return true;
users = atomic_read(&hctx->tags->active_queues);
}
if (!users)
return true;
/*
* Allow at least some tags
*/
depth = max((bt->sb.depth + users - 1) / users, 4U);
return __blk_mq_active_requests(hctx) < depth;
}
#endif
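Reviewer note: hctx_may_queue() above gives each active user a ceiling-divided share of the tag depth, with a floor of four tags. The arithmetic can be sketched as a pure function. may_queue() and its parameters are hypothetical names, and the "not active" early return is omitted for brevity.

```c
#include <stdbool.h>

/*
 * Fair-share admission check.
 *   total_depth: number of tags in the bitmap
 *   users:       number of currently active users sharing the tags
 *   active:      requests this user already has in flight
 */
static bool may_queue(unsigned int total_depth, unsigned int users,
		      unsigned int active)
{
	unsigned int depth;

	/* A single tag cannot be divided, and no users means no sharing. */
	if (total_depth == 1 || users == 0)
		return true;

	/* Ceiling division, then allow at least four tags per user. */
	depth = (total_depth + users - 1) / users;
	if (depth < 4)
		depth = 4;

	return active < depth;
}
```

For example, 64 tags split across 4 users gives each a share of 16, so a user with 16 requests in flight is refused. With 10 tags and 8 users the ceiling division yields 2, which the floor raises to 4.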
......@@ -172,15 +172,13 @@ EXPORT_SYMBOL(blk_queue_max_hw_sectors);
*
* Description:
* If a driver doesn't want IOs to cross a given chunk size, it can set
* this limit and prevent merging across chunks. Note that the chunk size
* must currently be a power-of-2 in sectors. Also note that the block
* layer must accept a page worth of data at any offset. So if the
* crossing of chunks is a hard limitation in the driver, it must still be
* prepared to split single page bios.
* this limit and prevent merging across chunks. Note that the block layer
* must accept a page worth of data at any offset. So if the crossing of
* chunks is a hard limitation in the driver, it must still be prepared
* to split single page bios.
**/
void blk_queue_chunk_sectors(struct request_queue *q, unsigned int chunk_sectors)
{
BUG_ON(!is_power_of_2(chunk_sectors));
q->limits.chunk_sectors = chunk_sectors;
}
EXPORT_SYMBOL(blk_queue_chunk_sectors);
......@@ -374,6 +372,19 @@ void blk_queue_alignment_offset(struct request_queue *q, unsigned int offset)
}
EXPORT_SYMBOL(blk_queue_alignment_offset);
void blk_queue_update_readahead(struct request_queue *q)
{
/*
* For read-ahead of large files to be effective, we need to read ahead
* at least twice the optimal I/O size.
*/
q->backing_dev_info->ra_pages =
max(queue_io_opt(q) * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);
q->backing_dev_info->io_pages =
queue_max_sectors(q) >> (PAGE_SHIFT - 9);
}
EXPORT_SYMBOL_GPL(blk_queue_update_readahead);
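Reviewer note: the two assignments in blk_queue_update_readahead() are simple unit conversions. The sketch below reproduces them outside the kernel, assuming 4 KiB pages and a 128 KiB default read-ahead window. The SKETCH_* constants and both function names are hypothetical.

```c
/* Assumptions for this sketch: 4 KiB pages, 128 KiB default read-ahead. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_READAHEAD_PAGES ((128UL * 1024) / SKETCH_PAGE_SIZE)

/* Read-ahead window in pages: at least twice the optimal I/O size. */
static unsigned long sketch_ra_pages(unsigned long io_opt_bytes)
{
	unsigned long pages = io_opt_bytes * 2 / SKETCH_PAGE_SIZE;

	return pages > SKETCH_READAHEAD_PAGES ? pages : SKETCH_READAHEAD_PAGES;
}

/* Largest single I/O in pages: 512-byte sectors converted to pages. */
static unsigned long sketch_io_pages(unsigned long max_sectors)
{
	return max_sectors >> (SKETCH_PAGE_SHIFT - 9);
}
```

Under these assumptions a device reporting a 1 MiB optimal I/O size gets a 512-page window, while a device reporting no optimal size falls back to the 32-page default.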
/**
* blk_limits_io_min - set minimum request size for a device
* @limits: the queue limits
......@@ -452,6 +463,8 @@ EXPORT_SYMBOL(blk_limits_io_opt);
void blk_queue_io_opt(struct request_queue *q, unsigned int opt)
{
blk_limits_io_opt(&q->limits, opt);
q->backing_dev_info->ra_pages =
max(queue_io_opt(q) * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);
}
EXPORT_SYMBOL(blk_queue_io_opt);
......@@ -534,6 +547,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
t->io_min = max(t->io_min, b->io_min);
t->io_opt = lcm_not_zero(t->io_opt, b->io_opt);
t->chunk_sectors = lcm_not_zero(t->chunk_sectors, b->chunk_sectors);
/* Physical block size a multiple of the logical block size? */
if (t->physical_block_size & (t->logical_block_size - 1)) {
......@@ -556,6 +570,13 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
ret = -1;
}
/* chunk_sectors a multiple of the physical block size? */
if ((t->chunk_sectors << 9) & (t->physical_block_size - 1)) {
t->chunk_sectors = 0;
t->misaligned = 1;
ret = -1;
}
t->raid_partial_stripes_expensive =
max(t->raid_partial_stripes_expensive,
b->raid_partial_stripes_expensive);
......@@ -594,10 +615,6 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
t->discard_granularity;
}
if (b->chunk_sectors)
t->chunk_sectors = min_not_zero(t->chunk_sectors,
b->chunk_sectors);
t->zoned = max(t->zoned, b->zoned);
return ret;
}
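Reviewer note: the blk_stack_limits() hunks above replace the min_not_zero() stacking of chunk_sectors with lcm_not_zero() and add an alignment check against the physical block size. A self-contained sketch of that combination follows. gcd_ul(), lcm_not_zero_ul() and stack_chunk_sectors() are hypothetical userspace helpers, and the mask test assumes the physical block size is a power of two.

```c
/* Euclid's algorithm. */
static unsigned long gcd_ul(unsigned long a, unsigned long b)
{
	while (b) {
		unsigned long t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/* LCM that treats 0 as "no constraint": if one operand is 0, return the other. */
static unsigned long lcm_not_zero_ul(unsigned long a, unsigned long b)
{
	if (!a || !b)
		return a ? a : b;
	return a / gcd_ul(a, b) * b;
}

/*
 * Stack a bottom device's chunk size (in 512-byte sectors) onto the top
 * device. Returns 0 on success, or -1 after clearing *top if the result
 * in bytes is not a multiple of physical_block_size.
 */
static int stack_chunk_sectors(unsigned int *top, unsigned int bottom,
			       unsigned int physical_block_size)
{
	*top = (unsigned int)lcm_not_zero_ul(*top, bottom);

	if (((unsigned long long)*top << 9) & (physical_block_size - 1)) {
		*top = 0;
		return -1;
	}
	return 0;
}
```

Using the least common multiple means a boundary of the stacked device is also a boundary of every underlying device, which only works once chunk sizes are no longer required to be powers of two, as the blk_queue_chunk_sectors() hunk earlier in this file arranges.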
......@@ -629,8 +646,7 @@ void disk_stack_limits(struct gendisk *disk, struct block_device *bdev,
top, bottom);
}
t->backing_dev_info->io_pages =
t->limits.max_sectors >> (PAGE_SHIFT - 9);
blk_queue_update_readahead(disk->queue);
}
EXPORT_SYMBOL(disk_stack_limits);
......
......@@ -114,6 +114,11 @@ static inline bool bio_integrity_endio(struct bio *bio)
return true;
}
bool blk_integrity_merge_rq(struct request_queue *, struct request *,
struct request *);
bool blk_integrity_merge_bio(struct request_queue *, struct request *,
struct bio *);
static inline bool integrity_req_gap_back_merge(struct request *req,
struct bio *next)
{
......@@ -137,6 +142,16 @@ static inline bool integrity_req_gap_front_merge(struct request *req,
void blk_integrity_add(struct gendisk *);
void blk_integrity_del(struct gendisk *);
#else /* CONFIG_BLK_DEV_INTEGRITY */
static inline bool blk_integrity_merge_rq(struct request_queue *rq,
struct request *r1, struct request *r2)
{
return true;
}
static inline bool blk_integrity_merge_bio(struct request_queue *rq,
struct request *r, struct bio *b)
{
return true;
}
static inline bool integrity_req_gap_back_merge(struct request *req,
struct bio *next)
{
......@@ -169,14 +184,10 @@ static inline void blk_integrity_del(struct gendisk *disk)
unsigned long blk_rq_timeout(unsigned long timeout);
void blk_add_timer(struct request *req);
bool bio_attempt_front_merge(struct request *req, struct bio *bio,
unsigned int nr_segs);
bool bio_attempt_back_merge(struct request *req, struct bio *bio,
unsigned int nr_segs);
bool bio_attempt_discard_merge(struct request_queue *q, struct request *req,
struct bio *bio);
bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
unsigned int nr_segs, struct request **same_queue_rq);
bool blk_bio_list_merge(struct request_queue *q, struct list_head *list,
struct bio *bio, unsigned int nr_segs);
void blk_account_io_start(struct request *req);
void blk_account_io_done(struct request *req, u64 now);
......@@ -223,10 +234,6 @@ ssize_t part_timeout_store(struct device *, struct device_attribute *,
void __blk_queue_split(struct bio **bio, unsigned int *nr_segs);
int ll_back_merge_fn(struct request *req, struct bio *bio,
unsigned int nr_segs);
int ll_front_merge_fn(struct request *req, struct bio *bio,
unsigned int nr_segs);
struct request *attempt_back_merge(struct request_queue *q, struct request *rq);
struct request *attempt_front_merge(struct request_queue *q, struct request *rq);
int blk_attempt_req_merge(struct request_queue *q, struct request *rq,
struct request *next);
unsigned int blk_recalc_rq_segments(struct request *rq);
......@@ -350,7 +357,7 @@ char *disk_name(struct gendisk *hd, int partno, char *buf);
#define ADDPART_FLAG_NONE 0
#define ADDPART_FLAG_RAID 1
#define ADDPART_FLAG_WHOLEDISK 2
void delete_partition(struct gendisk *disk, struct hd_struct *part);
void delete_partition(struct hd_struct *part);
int bdev_add_partition(struct block_device *bdev, int partno,
sector_t start, sector_t length);
int bdev_del_partition(struct block_device *bdev, int partno);
......
......@@ -267,22 +267,21 @@ static struct bio *bounce_clone_bio(struct bio *bio_src, gfp_t gfp_mask,
break;
}
bio_crypt_clone(bio, bio_src, gfp_mask);
if (bio_crypt_clone(bio, bio_src, gfp_mask) < 0)
goto err_put;
if (bio_integrity(bio_src)) {
int ret;
ret = bio_integrity_clone(bio, bio_src, gfp_mask);
if (ret < 0) {
bio_put(bio);
return NULL;
}
}
if (bio_integrity(bio_src) &&
bio_integrity_clone(bio, bio_src, gfp_mask) < 0)
goto err_put;
bio_clone_blkg_association(bio, bio_src);
blkcg_bio_issue_init(bio);
return bio;
err_put:
bio_put(bio);
return NULL;
}
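Reviewer note: bounce_clone_bio() now routes every failure through a single err_put label instead of repeating the cleanup inline. A minimal sketch of that single-exit style is below. struct clone, attach(), clone_put() and clone_create() are hypothetical and stand in for the bio, the crypt and integrity clone steps, and bio_put().

```c
#include <stdlib.h>

struct clone {
	void *crypt_ctx;
	void *integrity;
};

/* Hypothetical sub-step: allocates one attachment, or fails on request. */
static int attach(void **slot, int should_fail)
{
	if (should_fail)
		return -1;
	*slot = malloc(16);
	return *slot ? 0 : -1;
}

/* Drop the clone and anything attached so far; free(NULL) is a no-op. */
static void clone_put(struct clone *c)
{
	free(c->crypt_ctx);
	free(c->integrity);
	free(c);
}

/* Build a clone step by step, unwinding through one label on any failure. */
static struct clone *clone_create(int fail_crypt, int fail_integrity)
{
	struct clone *c = calloc(1, sizeof(*c));

	if (!c)
		return NULL;
	if (attach(&c->crypt_ctx, fail_crypt) < 0)
		goto err_put;
	if (attach(&c->integrity, fail_integrity) < 0)
		goto err_put;
	return c;

err_put:
	clone_put(c);
	return NULL;
}
```

Adding a further fallible step then costs one extra `goto err_put` rather than another copy of the cleanup code.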
static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
......
......@@ -207,7 +207,7 @@ static int bsg_map_buffer(struct bsg_buffer *buf, struct request *req)
BUG_ON(!req->nr_phys_segments);
buf->sg_list = kzalloc(sz, GFP_KERNEL);
buf->sg_list = kmalloc(sz, GFP_KERNEL);
if (!buf->sg_list)
return -ENOMEM;
sg_init_table(buf->sg_list, req->nr_phys_segments);
......
......@@ -191,8 +191,7 @@ static void elevator_release(struct kobject *kobj)
void __elevator_exit(struct request_queue *q, struct elevator_queue *e)
{
mutex_lock(&e->sysfs_lock);
if (e->type->ops.exit_sched)
blk_mq_exit_sched(q, e);
blk_mq_exit_sched(q, e);
mutex_unlock(&e->sysfs_lock);
kobject_put(&e->kobj);
......@@ -480,16 +479,13 @@ static struct kobj_type elv_ktype = {
.release = elevator_release,
};
/*
* elv_register_queue is called from either blk_register_queue or
* elevator_switch, elevator switch is prevented from being happen
* in the two paths, so it is safe to not hold q->sysfs_lock.
*/
int elv_register_queue(struct request_queue *q, bool uevent)
{
struct elevator_queue *e = q->elevator;
int error;
lockdep_assert_held(&q->sysfs_lock);
error = kobject_add(&e->kobj, &q->kobj, "%s", "iosched");
if (!error) {
struct elv_fs_entry *attr = e->type->elevator_attrs;
......@@ -508,13 +504,10 @@ int elv_register_queue(struct request_queue *q, bool uevent)
return error;
}
/*
* elv_unregister_queue is called from either blk_unregister_queue or
* elevator_switch, elevator switch is prevented from being happen
* in the two paths, so it is safe to not hold q->sysfs_lock.
*/
void elv_unregister_queue(struct request_queue *q)
{
lockdep_assert_held(&q->sysfs_lock);
if (q) {
struct elevator_queue *e = q->elevator;
......@@ -616,7 +609,7 @@ int elevator_switch_mq(struct request_queue *q,
static inline bool elv_support_iosched(struct request_queue *q)
{
if (!q->mq_ops ||
if (!queue_is_mq(q) ||
(q->tag_set && (q->tag_set->flags & BLK_MQ_F_NO_SCHED)))
return false;
return true;
......@@ -673,7 +666,7 @@ void elevator_init_mq(struct request_queue *q)
if (!elv_support_iosched(q))
return;
WARN_ON_ONCE(test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags));
WARN_ON_ONCE(blk_queue_registered(q));
if (unlikely(q->elevator))
return;
......@@ -764,7 +757,7 @@ ssize_t elv_iosched_store(struct request_queue *q, const char *name,
{
int ret;
if (!queue_is_mq(q) || !elv_support_iosched(q))
if (!elv_support_iosched(q))
return count;
ret = __elevator_change(q, name);
......
......@@ -23,7 +23,7 @@ static int blkpg_do_ioctl(struct block_device *bdev,
return -EACCES;
if (copy_from_user(&p, upart, sizeof(struct blkpg_partition)))
return -EFAULT;
if (bdev != bdev->bd_contains)
if (bdev_is_partition(bdev))
return -EINVAL;
if (p.pno <= 0)
......@@ -94,7 +94,7 @@ static int blkdev_reread_part(struct block_device *bdev)
{
int ret;
if (!disk_part_scan_enabled(bdev->bd_disk) || bdev != bdev->bd_contains)
if (!disk_part_scan_enabled(bdev->bd_disk) || bdev_is_partition(bdev))
return -EINVAL;
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
......@@ -112,8 +112,7 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
uint64_t range[2];
uint64_t start, len;
struct request_queue *q = bdev_get_queue(bdev);
struct address_space *mapping = bdev->bd_inode->i_mapping;
int err;
if (!(mode & FMODE_WRITE))
return -EBADF;
......@@ -134,7 +133,11 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
if (start + len > i_size_read(bdev->bd_inode))
return -EINVAL;
truncate_inode_pages_range(mapping, start, start + len - 1);
err = truncate_bdev_range(bdev, mode, start, start + len - 1);
if (err)
return err;
return blkdev_issue_discard(bdev, start >> 9, len >> 9,
GFP_KERNEL, flags);
}
......@@ -143,8 +146,8 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
unsigned long arg)
{
uint64_t range[2];
struct address_space *mapping;
uint64_t start, end, len;
int err;
if (!(mode & FMODE_WRITE))
return -EBADF;
......@@ -166,8 +169,9 @@ static int blk_ioctl_zeroout(struct block_device *bdev, fmode_t mode,
return -EINVAL;
/* Invalidate the page cache, including dirty pages */
mapping = bdev->bd_inode->i_mapping;
truncate_inode_pages_range(mapping, start, end);
err = truncate_bdev_range(bdev, mode, start, end);
if (err)
return err;
return blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL,
BLKDEV_ZERO_NOUNMAP);
......@@ -474,15 +478,14 @@ static int blkdev_bszset(struct block_device *bdev, fmode_t mode,
if (get_user(n, argp))
return -EFAULT;
-if (!(mode & FMODE_EXCL)) {
-bdgrab(bdev);
-if (blkdev_get(bdev, mode | FMODE_EXCL, &bdev) < 0)
-return -EBUSY;
-}
+if (mode & FMODE_EXCL)
+return set_blocksize(bdev, n);
+if (IS_ERR(blkdev_get_by_dev(bdev->bd_dev, mode | FMODE_EXCL, &bdev)))
+return -EBUSY;
ret = set_blocksize(bdev, n);
-if (!(mode & FMODE_EXCL))
-blkdev_put(bdev, mode | FMODE_EXCL);
+blkdev_put(bdev, mode | FMODE_EXCL);
return ret;
}
......
......@@ -69,7 +69,7 @@ int ioprio_check_cap(int ioprio)
switch (class) {
case IOPRIO_CLASS_RT:
-if (!capable(CAP_SYS_ADMIN))
+if (!capable(CAP_SYS_NICE) && !capable(CAP_SYS_ADMIN))
return -EPERM;
fallthrough;
/* rt has prio field too */
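
The ioprio hunk above relaxes the IOPRIO_CLASS_RT check so that either CAP_SYS_NICE or CAP_SYS_ADMIN is sufficient. A minimal userspace sketch of just that predicate follows. The `toy_caps` struct and `toy_check_rt_class()` are hypothetical stand-ins for the kernel's `capable()` calls, and the sketch does not model the fall-through into the priority-field check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the results of the kernel's capable() calls. */
struct toy_caps {
	bool sys_nice;
	bool sys_admin;
};

/* Mirrors the new condition: reject only when both capabilities are missing. */
static int toy_check_rt_class(const struct toy_caps *caps)
{
	assert(caps != NULL);
	if (!caps->sys_nice && !caps->sys_admin)
		return -EPERM;
	return 0;
}
```

Under the old condition a task holding only CAP_SYS_NICE would have been rejected; with the new one it passes.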
......
......@@ -359,7 +359,7 @@ static unsigned int kyber_sched_tags_shift(struct request_queue *q)
* All of the hardware queues have the same depth, so we can just grab
* the shift of the first one.
*/
-return q->queue_hw_ctx[0]->sched_tags->bitmap_tags.sb.shift;
+return q->queue_hw_ctx[0]->sched_tags->bitmap_tags->sb.shift;
}
static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
......@@ -502,7 +502,7 @@ static int kyber_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
khd->batching = 0;
hctx->sched_data = khd;
-sbitmap_queue_min_shallow_depth(&hctx->sched_tags->bitmap_tags,
+sbitmap_queue_min_shallow_depth(hctx->sched_tags->bitmap_tags,
kqd->async_depth);
return 0;
......@@ -573,7 +573,7 @@ static bool kyber_bio_merge(struct blk_mq_hw_ctx *hctx, struct bio *bio,
bool merged;
spin_lock(&kcq->lock);
-merged = blk_mq_bio_list_merge(hctx->queue, rq_list, bio, nr_segs);
+merged = blk_bio_list_merge(hctx->queue, rq_list, bio, nr_segs);
spin_unlock(&kcq->lock);
return merged;
......
......@@ -386,6 +386,8 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
spin_lock(&dd->lock);
rq = __dd_dispatch_request(dd);
spin_unlock(&dd->lock);
+if (rq)
+atomic_dec(&rq->mq_hctx->elevator_queued);
return rq;
}
......@@ -533,6 +535,7 @@ static void dd_insert_requests(struct blk_mq_hw_ctx *hctx,
rq = list_first_entry(list, struct request, queuelist);
list_del_init(&rq->queuelist);
dd_insert_request(hctx, rq, at_head);
+atomic_inc(&hctx->elevator_queued);
}
spin_unlock(&dd->lock);
}
......@@ -579,6 +582,9 @@ static bool dd_has_work(struct blk_mq_hw_ctx *hctx)
{
struct deadline_data *dd = hctx->queue->elevator->elevator_data;
+if (!atomic_read(&hctx->elevator_queued))
+return false;
return !list_empty_careful(&dd->dispatch) ||
!list_empty_careful(&dd->fifo_list[0]) ||
!list_empty_careful(&dd->fifo_list[1]);
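
The mq-deadline hunks above pair an `atomic_inc()` on insert with an `atomic_dec()` on dispatch, so that `dd_has_work()` can return early when nothing is queued on that hardware queue. A minimal userspace sketch of that counter pattern follows. It uses C11 atomics rather than the kernel's `atomic_t`, omits the spinlock, and every `toy_*` name is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PENDING 8

/* Hypothetical toy queue; not the kernel's struct blk_mq_hw_ctx. */
struct toy_hctx {
	atomic_int queued;            /* inserted but not yet dispatched */
	int pending[TOY_MAX_PENDING]; /* tiny stack standing in for the scheduler lists */
	size_t count;
};

/* Insert path: store the request, then bump the counter. */
static void toy_insert(struct toy_hctx *h, int rq)
{
	assert(h->count < TOY_MAX_PENDING);
	h->pending[h->count++] = rq;
	atomic_fetch_add(&h->queued, 1);
}

/* Dispatch path: decrement only when a request was actually handed out. */
static bool toy_dispatch(struct toy_hctx *h, int *rq)
{
	if (h->count == 0)
		return false;
	*rq = h->pending[--h->count];
	atomic_fetch_sub(&h->queued, 1);
	return true;
}

/* Fast path: one atomic load before the more expensive list checks. */
static bool toy_has_work(struct toy_hctx *h)
{
	if (atomic_load(&h->queued) == 0)
		return false;
	return h->count != 0;
}
```

The counter does not replace the list checks; it only lets the common "nothing queued" case skip them.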
......
This diff is collapsed.
......@@ -1670,7 +1670,7 @@ static int floppy_open(struct block_device *bdev, fmode_t mode)
}
if (mode & (FMODE_READ|FMODE_WRITE)) {
-check_disk_change(bdev);
+bdev_check_media_change(bdev);
if (mode & FMODE_WRITE) {
int wrprot;
......
This diff is collapsed.