- 10 Jun, 2016 4 commits
-
-
Mike Snitzer authored
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Mike Snitzer authored
Add "multipath-bio" target that offers a bio-based multipath target as an alternative to the request-based "multipath" target -- but in a following commit "multipath-bio" will immediately be replaced by a new "queue_mode" feature for the "multipath" target which will allow bio-based mode to be selected. When DM multipath was originally converted from bio-based to request-based the motivation for the change was better dynamic load balancing (by leveraging block core's request-based IO schedulers, for merging and sorting, _before_ DM multipath would make the decision on where to steer the IO -- based on path load and/or availability). More background is available in this "Request-based Device-mapper multipath and Dynamic load balancing" paper: https://www.kernel.org/doc/ols/2007/ols2007v2-pages-235-244.pdf But we've now come full circle where significantly faster storage devices no longer need IOs to be made larger to drive optimal IO performance. And even if they do, there have been changes to the block and filesystem layers that help ensure upper layers are constructing larger IOs. In addition, SCSI's differentiated IO errors will propagate through to bio-based IO completion hooks -- so that eliminates another historic justification for request-based DM multipath. Lastly, the block layer's immutable biovec changes have made bio cloning cheaper than it has ever been; whereas request cloning is still relatively expensive (both on a CPU usage and memory footprint level). As such, bio-based DM multipath offers the promise of a more efficient IO path for high IOPS devices that are, or will be, emerging. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Mike Snitzer authored
Add some separation between bio-based and request-based DM core code. 'struct mapped_device' and other DM core only structures and functions have been moved to dm-core.h and all relevant DM core .c files have been updated to include dm-core.h rather than dm.h. DM targets should _never_ include dm-core.h! [block core merge conflict resolution from Stephen Rothwell] Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
-
Ming Lei authored
No one needs this macro now, so remove it. Basically, only the number of bvecs in one bio matters, not the number of bytes in the bio. The motivation is to support multipage bvecs, in which we only know the maximum count of bvecs supported in the bio. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 09 Jun, 2016 10 commits
-
-
Jens Axboe authored
If we're queuing REQ_PRIO IO and the task is running at an idle IO class, then temporarily boost the priority. This prevents livelocks due to priority inversion, when a low priority task is holding file system resources while attempting to do IO. An example of that is shown below. An ioniced idle task is holding the directory mutex, while a normal priority task is trying to do a directory lookup.
[478381.198925] ------------[ cut here ]------------
[478381.200315] INFO: task ionice:1168369 blocked for more than 120 seconds.
[478381.201324] Not tainted 4.0.9-38_fbk5_hotfix1_2936_g85409c6 #1
[478381.202278] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[478381.203462] ionice D ffff8803692736a8 0 1168369 1 0x00000080
[478381.203466] ffff8803692736a8 ffff880399c21300 ffff880276adcc00 ffff880369273698
[478381.204589] ffff880369273fd8 0000000000000000 7fffffffffffffff 0000000000000002
[478381.205752] ffffffff8177d5e0 ffff8803692736c8 ffffffff8177cea7 0000000000000000
[478381.206874] Call Trace:
[478381.207253] [<ffffffff8177d5e0>] ? bit_wait_io_timeout+0x80/0x80
[478381.208175] [<ffffffff8177cea7>] schedule+0x37/0x90
[478381.208932] [<ffffffff8177f5fc>] schedule_timeout+0x1dc/0x250
[478381.209805] [<ffffffff81421c17>] ? __blk_run_queue+0x37/0x50
[478381.210706] [<ffffffff810ca1c5>] ? ktime_get+0x45/0xb0
[478381.211489] [<ffffffff8177c407>] io_schedule_timeout+0xa7/0x110
[478381.212402] [<ffffffff810a8c2b>] ? prepare_to_wait+0x5b/0x90
[478381.213280] [<ffffffff8177d616>] bit_wait_io+0x36/0x50
[478381.214063] [<ffffffff8177d325>] __wait_on_bit+0x65/0x90
[478381.214961] [<ffffffff8177d5e0>] ? bit_wait_io_timeout+0x80/0x80
[478381.215872] [<ffffffff8177d47c>] out_of_line_wait_on_bit+0x7c/0x90
[478381.216806] [<ffffffff810a89f0>] ? wake_atomic_t_function+0x40/0x40
[478381.217773] [<ffffffff811f03aa>] __wait_on_buffer+0x2a/0x30
[478381.218641] [<ffffffff8123c557>] ext4_bread+0x57/0x70
[478381.219425] [<ffffffff8124498c>] __ext4_read_dirblock+0x3c/0x380
[478381.220467] [<ffffffff8124665d>] ext4_dx_find_entry+0x7d/0x170
[478381.221357] [<ffffffff8114c49e>] ? find_get_entry+0x1e/0xa0
[478381.222208] [<ffffffff81246bd4>] ext4_find_entry+0x484/0x510
[478381.223090] [<ffffffff812471a2>] ext4_lookup+0x52/0x160
[478381.223882] [<ffffffff811c401d>] lookup_real+0x1d/0x60
[478381.224675] [<ffffffff811c4698>] __lookup_hash+0x38/0x50
[478381.225697] [<ffffffff817745bd>] lookup_slow+0x45/0xab
[478381.226941] [<ffffffff811c690e>] link_path_walk+0x7ae/0x820
[478381.227880] [<ffffffff811c6a42>] path_init+0xc2/0x430
[478381.228677] [<ffffffff813e6e26>] ? security_file_alloc+0x16/0x20
[478381.229776] [<ffffffff811c8c57>] path_openat+0x77/0x620
[478381.230767] [<ffffffff81185c6e>] ? page_add_file_rmap+0x2e/0x70
[478381.232019] [<ffffffff811cb253>] do_filp_open+0x43/0xa0
[478381.233016] [<ffffffff8108c4a9>] ? creds_are_invalid+0x29/0x70
[478381.234072] [<ffffffff811c0cb0>] do_open_execat+0x70/0x170
[478381.235039] [<ffffffff811c1bf8>] do_execveat_common.isra.36+0x1b8/0x6e0
[478381.236051] [<ffffffff811c214c>] do_execve+0x2c/0x30
[478381.236809] [<ffffffff811ca392>] ? getname+0x12/0x20
[478381.237564] [<ffffffff811c23be>] SyS_execve+0x2e/0x40
[478381.238338] [<ffffffff81780a1d>] stub_execve+0x6d/0xa0
[478381.239126] ------------[ cut here ]------------
[478381.239915] ------------[ cut here ]------------
[478381.240606] INFO: task python2.7:1168375 blocked for more than 120 seconds.
[478381.242673] Not tainted 4.0.9-38_fbk5_hotfix1_2936_g85409c6 #1
[478381.243653] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[478381.244902] python2.7 D ffff88005cf8fb98 0 1168375 1168248 0x00000080
[478381.244904] ffff88005cf8fb98 ffff88016c1f0980 ffffffff81c134c0 ffff88016c1f11a0
[478381.246023] ffff88005cf8ffd8 ffff880466cd0cbc ffff88016c1f0980 00000000ffffffff
[478381.247138] ffff880466cd0cc0 ffff88005cf8fbb8 ffffffff8177cea7 ffff88005cf8fcc8
[478381.248252] Call Trace:
[478381.248630] [<ffffffff8177cea7>] schedule+0x37/0x90
[478381.249382] [<ffffffff8177d08e>] schedule_preempt_disabled+0xe/0x10
[478381.250465] [<ffffffff8177e892>] __mutex_lock_slowpath+0x92/0x100
[478381.251409] [<ffffffff8177e91b>] mutex_lock+0x1b/0x2f
[478381.252199] [<ffffffff817745ae>] lookup_slow+0x36/0xab
[478381.253023] [<ffffffff811c690e>] link_path_walk+0x7ae/0x820
[478381.253877] [<ffffffff811aeb41>] ? try_charge+0xc1/0x700
[478381.254690] [<ffffffff811c6a42>] path_init+0xc2/0x430
[478381.255525] [<ffffffff813e6e26>] ? security_file_alloc+0x16/0x20
[478381.256450] [<ffffffff811c8c57>] path_openat+0x77/0x620
[478381.257256] [<ffffffff8115b2fb>] ? lru_cache_add_active_or_unevictable+0x2b/0xa0
[478381.258390] [<ffffffff8117b623>] ? handle_mm_fault+0x13f3/0x1720
[478381.259309] [<ffffffff811cb253>] do_filp_open+0x43/0xa0
[478381.260139] [<ffffffff811d7ae2>] ? __alloc_fd+0x42/0x120
[478381.260962] [<ffffffff811b95ac>] do_sys_open+0x13c/0x230
[478381.261779] [<ffffffff81011393>] ? syscall_trace_enter_phase1+0x113/0x170
[478381.262851] [<ffffffff811b96c2>] SyS_open+0x22/0x30
[478381.263598] [<ffffffff81780532>] system_call_fastpath+0x12/0x17
[478381.264551] ------------[ cut here ]------------
[478381.265377] ------------[ cut here ]------------
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
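The boost described in this commit can be modelled in user space roughly as follows. This is only a sketch under assumptions: the `demo_*` names are hypothetical, the class values mirror the kernel's IOPRIO_CLASS_* numbering, and boosting to best-effort and restoring afterwards are guesses at what "temporarily" means here, not the scheduler's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the IO priority classes (RT=1, BE=2, IDLE=3). */
enum demo_ioprio_class {
	DEMO_CLASS_RT   = 1,
	DEMO_CLASS_BE   = 2,
	DEMO_CLASS_IDLE = 3,
};

/* Hypothetical per-task queue: current class plus the class it started with. */
struct demo_queue {
	enum demo_ioprio_class cls;
	enum demo_ioprio_class org_cls;
};

/*
 * If the request carries the "priority" hint and the queue is idle-class,
 * lift it to best-effort so that a waiter on the same file system resource
 * is not stuck behind IO that never gets scheduled.
 * ASSUMPTION: best-effort is the target class.
 */
static void demo_boost_for_prio(struct demo_queue *q, bool req_prio)
{
	assert(q != NULL);
	if (req_prio && q->cls == DEMO_CLASS_IDLE)
		q->cls = DEMO_CLASS_BE;
}

/* Undo the boost once the priority request is done (assumed restore point). */
static void demo_restore(struct demo_queue *q)
{
	assert(q != NULL);
	q->cls = q->org_cls;
}
```

In the traces above, the ionice task would be the idle-class queue: its directory read is exactly the IO that has to make progress before the python2.7 task can take the mutex.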
-
Ming Lei authored
Use BIO_MAX_PAGES instead and we will remove BIO_MAX_SIZE. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@canonical.com> Tested-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Ming Lei authored
No one needs this macro, so remove it. The motivation is to support multipage bvecs, in which we only know the maximum count of bvecs supported in the bio, rather than a maximum size or maximum number of sectors. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@canonical.com> Tested-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
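To see why a byte or sector limit stops being meaningful once a single bvec may span several pages, consider this small sketch. The `demo_*` names and the limit of 256 vectors are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: illustrative cap on the number of vectors per IO. */
#define DEMO_MAX_VECS 256u

/* Hypothetical segment: only the length matters for this illustration. */
struct demo_seg {
	unsigned int len;   /* bytes covered by this segment */
};

/* Sum the bytes described by nr segments; only the count is bounded. */
static unsigned long demo_total_bytes(const struct demo_seg *segs,
				      unsigned int nr)
{
	unsigned long total = 0;

	assert(segs != NULL || nr == 0);
	assert(nr <= DEMO_MAX_VECS);
	for (unsigned int i = 0; i < nr; i++)
		total += segs[i].len;
	return total;
}
```

Two single-page segments of 4096 bytes and two multipage segments of 16384 bytes have the same count but very different totals, so the count is the only limit that can be stated up front.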
-
Ming Lei authored
BIO_MAX_PAGES is used as maximum count of bvecs, so replace BIO_MAX_SECTORS with BIO_MAX_PAGES since BIO_MAX_SECTORS is to be removed. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@canonical.com> Tested-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Ming Lei authored
bvec has had a native, mature iterator for a long time, so there is no need to use the reinvented wheel for iterating bvecs in lib/iov_iter.c. Two ITER_BVEC test cases were run: - xfstest (-g auto) on loop dio/aio, no regression found - swap file works well under extreme stress (stress-ng --all 64 -t 800 -v); lots of OOMs are triggered, and the whole system still survives Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@canonical.com> Tested-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Ming Lei authored
bvec_iter_advance() only writes to the iterator parameter, so the base address of the bvec can safely be marked const. Without this change, a compiler warning appears in the following patch, which implements iterate_bvec() in lib/iov_iter.c with the bvec iterator. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@canonical.com> Tested-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
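A simplified model of such an advance helper shows why the vector array can be const: only the iterator is written. The `demo_*` types and fields below are hypothetical stand-ins, not the kernel's struct bio_vec or struct bvec_iter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vector element: only the length matters here. */
struct demo_vec {
	unsigned int len;
};

/* Hypothetical iterator: bytes remaining, current index, bytes consumed
 * in the current element. */
struct demo_iter {
	unsigned int size;
	unsigned int idx;
	unsigned int done;
};

/*
 * Advance the iterator by `bytes`. The vector array is read-only, so it
 * is taken as a pointer to const; callers holding a const array can pass
 * it without a cast or a warning.
 */
static void demo_iter_advance(const struct demo_vec *bv,
			      struct demo_iter *it, unsigned int bytes)
{
	assert(bv != NULL && it != NULL);
	assert(bytes <= it->size);

	while (bytes) {
		unsigned int left = bv[it->idx].len - it->done;
		unsigned int step = bytes < left ? bytes : left;

		bytes -= step;
		it->size -= step;
		it->done += step;
		if (it->done == bv[it->idx].len) {
			/* Current element exhausted: move to the next one. */
			it->done = 0;
			it->idx++;
		}
	}
}
```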
-
Ming Lei authored
This patch moves 'struct bio_vec' and 'struct bvec_iter' into 'include/linux/bvec.h', then always include this header into 'include/linux/blk_types.h'. With this change, both 'struct bvec_iter' and bvec iterator helpers don't depend on CONFIG_BLOCK any more, then we can use bvec iterator to implement iterate_bvec(): lib/iov_iter.c. Reviewed-by: Christoph Hellwig <hch@lst.de> Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Ming Lei <ming.lei@canonical.com> Tested-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Ming Lei authored
The bvec iterator helpers should also be used to implement iterate_bvec() in lib/iov_iter.c, so move them into one header; that keeps the bvec iterator header independent of CONFIG_BLOCK. Then we can remove the reinvented wheel in iterate_bvec(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@canonical.com> Tested-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Omar Sandoval authored
If ->queue_rq() returns BLK_MQ_RQ_QUEUE_OK, we use continue and skip over the rest of the loop body. However, dptr is assigned later in the loop body, and the BLK_MQ_RQ_QUEUE_OK case is exactly the case that we'd want it for. NVMe isn't actually using BLK_MQ_F_DEFER_ISSUE yet, nor is any other in-tree driver, but if the code's going to be there, it might as well work. Fixes: 74c45052 ("blk-mq: add a 'list' parameter to ->queue_rq()") Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
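The control-flow problem described here can be reproduced in a small stand-alone model. Everything named `demo_*` is hypothetical; the loop only imitates the shape of the dispatch loop and is not the blk-mq code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_status { DEMO_OK, DEMO_BUSY };

/*
 * Walk n pretend requests whose driver results are given in `results`.
 * Returns how many requests were issued while a defer list was set.
 * With use_continue == true the OK case jumps to the next iteration and
 * never reaches the assignment at the bottom of the loop body.
 */
static int demo_dispatch(const enum demo_status *results, int n,
			 bool use_continue)
{
	int defer_list = 0;   /* stand-in for the driver-side list */
	int *dptr = NULL;     /* only set after the first request  */
	int deferred = 0;

	assert(results != NULL || n == 0);

	for (int i = 0; i < n; i++) {
		if (dptr)
			deferred++;   /* this request saw the defer list */

		switch (results[i]) {
		case DEMO_OK:
			if (use_continue)
				continue; /* skips the dptr assignment below */
			break;
		case DEMO_BUSY:
			return deferred;
		}

		/* First request done and more remain: start deferring. */
		if (!dptr && i + 1 < n)
			dptr = &defer_list;
	}
	return deferred;
}
```

With three successful requests, the `continue` variant never sets the defer pointer, while leaving the switch with `break` lets the second and third requests see it.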
-
Christoph Hellwig authored
Keep the 32-bit CPU and cmd_type flags together to avoid holes on 64-bit architectures. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
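The effect of grouping two 32-bit members can be checked with a toy structure. The layouts below are hypothetical and much smaller than the real request structure; the sizes depend on the ABI, so the sketch only asserts what holds on common 32-bit and 64-bit targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit field, pointer, 32-bit field: on LP64 each 32-bit member is
 * followed by 4 bytes of padding. */
struct demo_req_split {
	int32_t  cpu;
	void    *queue;
	uint32_t cmd_type;
};

/* Same members with the two 32-bit fields adjacent: they share one
 * 8-byte slot on LP64, so no hole is needed before the pointer. */
struct demo_req_packed {
	int32_t  cpu;
	uint32_t cmd_type;
	void    *queue;
};

/* Bytes saved by the reordering on the current target (0 on ILP32). */
static size_t demo_bytes_saved(void)
{
	assert(sizeof(struct demo_req_packed) <= sizeof(struct demo_req_split));
	return sizeof(struct demo_req_split) - sizeof(struct demo_req_packed);
}
```

On a typical x86-64 build the split layout occupies 24 bytes and the packed one 16.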
-
- 08 Jun, 2016 4 commits
-
-
Mike Christie authored
This was missed from my last patchset. This patch has ext4 crypto code use the bio op helper to set the operation. The operation (discard, write, writesame, etc) is now defined separately from the other REQ bits. They still share the bi_rw field to save space, so we use these helpers so modules do not have to worry about setting/overwriting info. Jens, I am not sure how you handle patches on top of patches in the next branches. If you merge patches that fix issues in previous patches in next, then this patch could be part of commit 95fe6c1a Author: Mike Christie <mchristi@redhat.com> Date: Sun Jun 5 14:31:48 2016 -0500 block, fs, mm, drivers: use bio set/get op accessors Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jan Kara authored
Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jan Kara <jack@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jeff Moyer authored
Expose interfaces to tune time slices of CFQ IO scheduler in microseconds. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Jeff Moyer authored
Convert all time-keeping in CFQ IO scheduler from jiffies to nanoseconds so that we can later make the intervals more fine-grained than jiffies. One jiffy is several milliseconds, and even for today's rotating disks that is a noticeable amount of time, so we leave the disk unnecessarily idle. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
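The granularity argument can be made concrete with a little arithmetic. The helpers below are a hypothetical sketch; the tick rates used in the note (100, 250 and 1000 ticks per second) are common kernel configurations and are assumptions, not something this commit states.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000ULL

/* Length of one tick in nanoseconds for a given tick rate. */
static uint64_t demo_tick_ns(unsigned int hz)
{
	assert(hz > 0);
	return DEMO_NSEC_PER_SEC / hz;
}

/*
 * A tick-based timer cannot wait for less than a whole tick, so a
 * requested interval is effectively rounded up to a tick multiple.
 */
static uint64_t demo_round_up_to_ticks_ns(uint64_t want_ns, unsigned int hz)
{
	uint64_t tick = demo_tick_ns(hz);

	return (want_ns + tick - 1) / tick * tick;
}
```

At 250 ticks per second one tick is 4 ms, so an idle window meant to last 0.5 ms would stretch to 4 ms; with nanosecond time-keeping the 0.5 ms value can be represented directly.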
-
- 07 Jun, 2016 22 commits
-
-
Mike Christie authored
To avoid confusion between REQ_OP_FLUSH, which is handled by request_fn drivers, and upper layers requesting the block layer perform a flush sequence along with possibly a WRITE, this patch renames REQ_FLUSH to REQ_PREFLUSH. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
The last patch added a REQ_OP_FLUSH for request_fn drivers and the next patch renames REQ_FLUSH to REQ_PREFLUSH which will be used by file systems and make_request_fn drivers so they can send a write/flush combo. This patch drops xen's use of REQ_FLUSH to track if it supports REQ_OP_FLUSH requests, so REQ_FLUSH can be deleted. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Juergen Gross <kernel@pfupf.net> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
This adds a REQ_OP_FLUSH operation that is sent to request_fn based drivers by the block layer's flush code, instead of sending requests with the request->cmd_flags REQ_FLUSH bit set. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
This patch drops the compat definition of req_op where it matches the rq_flag_bits definitions, and drops the related old and compat code that allowed users to set either the op or flags for the operation. We also then store the operation in the bi_rw/cmd_flags field similar to how we used to store the bio ioprio where it sat in the upper bits of the field. Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
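Storing the operation in the upper bits of a flags word can be sketched as below. The `DEMO_*` names, the 3-bit width and the flag values are illustrative assumptions and are not copied from the kernel headers.

```c
#include <assert.h>
#include <limits.h>

/* ASSUMPTION: 3 bits are reserved for the operation in this sketch. */
#define DEMO_OP_BITS   3u
#define DEMO_OP_SHIFT  (sizeof(unsigned int) * CHAR_BIT - DEMO_OP_BITS)
#define DEMO_FLAG_MASK ((1u << DEMO_OP_SHIFT) - 1u)

/* Hypothetical operations and modifier flags. */
enum demo_op { DEMO_OP_READ, DEMO_OP_WRITE, DEMO_OP_DISCARD, DEMO_OP_FLUSH };
#define DEMO_FLAG_SYNC (1u << 0)
#define DEMO_FLAG_META (1u << 1)

/* Pack an operation into the top bits and the flags into the rest. */
static unsigned int demo_pack(enum demo_op op, unsigned int flags)
{
	assert((unsigned int)op < (1u << DEMO_OP_BITS));
	assert((flags & ~DEMO_FLAG_MASK) == 0);
	return ((unsigned int)op << DEMO_OP_SHIFT) | flags;
}

/* Accessors: callers never touch the shared word directly. */
static enum demo_op demo_op_of(unsigned int rw)
{
	return (enum demo_op)(rw >> DEMO_OP_SHIFT);
}

static unsigned int demo_flags_of(unsigned int rw)
{
	return rw & DEMO_FLAG_MASK;
}
```

Because the operation is a value rather than a bit, two operations can no longer be OR-ed together by accident, and the accessors keep the flags from overwriting it.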
-
Mike Christie authored
We don't need bi_rw to be so large on 64 bit archs, so reduce it to unsigned int. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
In the next patch, we drop the compat code and make the op a separate value that is hidden in bi_rw. To give the op and rq flag bits room to grow, this moves prio to its own field. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
The block layer will set the correct READ/WRITE operation flags/fields when creating a request, so there is no need for drivers to set the REQ_WRITE flag. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
Have blktrace use the req/bio op accessor to get the REQ_OP. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
The req operation REQ_OP is separated from the rq_flag_bits definition. This converts the block layer drivers to use req_op to get the op from the request struct. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
This patch converts the is_sync helpers to use separate variables for the operation and flags. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
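A helper of this shape, taking the operation and the flags as separate arguments, might look like the sketch below. The `demo_*` names are hypothetical, and the rule used (reads count as synchronous, other operations only when a sync flag is set) is an assumption about the helper's semantics that the commit message itself does not spell out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical operation values and flag bit for illustration only. */
enum demo_req_op { DEMO_REQ_READ, DEMO_REQ_WRITE, DEMO_REQ_DISCARD };
#define DEMO_REQ_SYNC (1u << 0)

/*
 * ASSUMED rule: a read is always treated as synchronous; any other
 * operation is synchronous only if the caller passed the sync flag.
 */
static bool demo_is_sync(enum demo_req_op op, unsigned int flags)
{
	assert(op <= DEMO_REQ_DISCARD);
	return op == DEMO_REQ_READ || (flags & DEMO_REQ_SYNC) != 0;
}
```

Keeping the two arguments apart means the test on the operation is an equality check, while the flags are still examined bit by bit.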
-
Mike Christie authored
This patch converts the block layer merging code to use separate variables for the operation and flags, and to check req_op for the REQ_OP. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
The bio and request operation and flags are going to be separate definitions, so we cannot pass them in as a bitmap. This patch converts the blkg_rwstat code and its caller, cfq, to pass in the values separately. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
This patch converts the elevator code to use separate variables for the operation and flags, and to check req_op for the REQ_OP. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
This patch modifies the blk mq request creation code to use separate variables for the operation and flags, because in the next patches the struct request users will be converted as was done for bios. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
This patch prepares *_get_request/*_put_request and freed_request, to use separate variables for the operation and flags. In the next patches the struct request users will be converted like was done for bios where the op and flags are set separately. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
The bio users should now always be setting up the bio op. This patch has the block layer copy that to the request. Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
Separate the op from the rq_flag_bits and have xen set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
Separate the op from the rq_flag_bits and have the target layer set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
Separate the op from the rq_flag_bits and have md set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
Separate the op from the rq_flag_bits and have drbd set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
Separate the op from the rq_flag_bits and have bcache set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Mike Christie authored
Separate the op from the rq_flag_bits and have dm set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-