- 20 Aug, 2019 1 commit
-
-
Revanth Rajashekar authored
Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com> Reviewed-by: Scott Bauer <sbauer@plzdonthack.me> Reviewed-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 19 Aug, 2019 1 commit
-
-
Junxiao Bi authored
The dispatch list is not used any more, as the legacy block IO stack has been removed. Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 15 Aug, 2019 2 commits
-
-
Tejun Heo authored
As inode wb switching may make sync(2) miss some inodes, they're synchronized using wb_switch_rwsem so that no wb switching happens while sync(2) is in progress. In addition to synchronizing the actual switching, the rwsem is also used to prevent queueing new switch attempts while sync(2) is in progress. This is to avoid queueing too many instances while the rwsem is held by sync(2). Unfortunately, this is too agressive and can block wb switching for a long time if sync(2) is frequent. The goal is avoiding expolding the number of scheduled switches, not avoiding scheduling anything. Let's use wb_switch_rwsem only for synchronizing the actual switching and sync(2) and use isw_nr_in_flight instead for limiting the maximum number of scheduled switches. The limit is set to 1024 which should be more than enough while still avoiding extreme situations. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Tejun Heo authored
WB_FRN_TIME_CUT_DIV is used to tell the foreign inode detection logic to ignore short writeback rounds to prevent getting confused by a burst of short writebacks. The parameter is currently 2 meaning that anything smaller than half of the running average writback duration will be ignored. This is unnecessarily aggressive. The detection logic uses 16 history slots and is already reasonably protected against some short bursts confusing it and the current parameter can lead to tens of seconds of missed detection depending on the writeback pattern. Let's change the parameter to 8, so that it only ignores writeback with are smaller than 12.5% of the current running average. v2: Add comment explaining what's going on with the foreign detection parameters. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 14 Aug, 2019 1 commit
-
-
Johannes Weiner authored
psi tracks the time tasks wait for refaulting pages to become uptodate, but it does not track the time spent submitting the IO. The submission part can be significant if backing storage is contended or when cgroup throttling (io.latency) is in effect - a lot of time is spent in submit_bio(). In that case, we underreport memory pressure. Annotate submit_bio() to account submission time as memory stall when the bio is reading userspace workingset pages. Tested-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 12 Aug, 2019 1 commit
-
-
zhengbin authored
If blk_mq_init_allocated_queue->elevator_init_mq fails, need to release the previously requested resources. Fixes: d3484991 ("blk-mq-sched: allow setting of default IO scheduler") Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 09 Aug, 2019 3 commits
-
-
Jann Horn authored
As sparse points out, these two copy_from_user() should actually be copy_to_user(). Fixes: 229b53c9 ("take floppy compat ioctls to sodding floppy.c") Cc: stable@vger.kernel.org Acked-by: Alexander Popov <alex.popov@linux.com> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
A previous commit correctly removed set-but-not-read variables, but this left two new variables now unused. Kill them. Fixes: ba6f7da9 ("lightnvm: remove set but not used variables 'data_len' and 'rq_len'") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Alessio Balsini authored
Enabling Direct I/O with loop devices helps reducing memory usage by avoiding double caching. 32 bit applications running on 64 bits systems are currently not able to request direct I/O because is missing from the lo_compat_ioctl. This patch fixes the compatibility issue mentioned above by exporting LOOP_SET_DIRECT_IO as additional lo_compat_ioctl() entry. The input argument for this ioctl is a single long converted to a 1-bit boolean, so compatibility is preserved. Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Alessio Balsini <balsini@android.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 08 Aug, 2019 2 commits
-
-
Zhou Wang authored
In function sg_split, the second sg_calculate_split will return -EINVAL when in_mapped_nents is 0. Indeed there is no need to do second sg_calculate_split and sg_split_mapped when in_mapped_nents is 0, as in_mapped_nents indicates no mapped entry in original sgl. Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com> Acked-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
YueHaibing authored
drivers/lightnvm/pblk-read.c: In function pblk_submit_read_gc: drivers/lightnvm/pblk-read.c:423:6: warning: variable data_len set but not used [-Wunused-but-set-variable] drivers/lightnvm/pblk-recovery.c: In function pblk_recov_scan_oob: drivers/lightnvm/pblk-recovery.c:368:15: warning: variable rq_len set but not used [-Wunused-but-set-variable] They are not used since commit 48e5da72 ("lightnvm: move metadata mapping to lower level driver") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 07 Aug, 2019 10 commits
-
-
https://github.com/liu-song-6/linuxJens Axboe authored
Pull MD changes from Song. * 'md-next' of https://github.com/liu-song-6/linux: raid1: factor out a common routine to handle the completion of sync write md: don't call spare_active in md_reap_sync_thread if all member devices can't work md: don't set In_sync if array is frozen md: allow last device to be forcibly removed from RAID1/RAID10. md: Convert to use int_pow() md/raid10: end bio when the device faulty md/raid1: end bio when the device faulty md/raid6: Set R5_ReadError when there is read failure on parity disk raid1: use an int as the return value of raise_barrier()
-
Hou Tao authored
It's just code clean-up. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Song Liu <songliubraving@fb.com>
-
Guoqing Jiang authored
When add one disk to array, the md_reap_sync_thread is responsible to activate the spare and set In_sync flag for the new member in spare_active(). But if raid1 has one member disk A, and disk B is added to the array. Then we offline A before all the datas are synchronized from A to B, obviously B doesn't have the latest data as A, but B is still marked with In_sync flag. So let's not call spare_active under the condition, otherwise B is still showed with 'U' state which is not correct. Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com>
-
Guoqing Jiang authored
When a disk is added to array, the following path is called in mdadm. Manage_subdevs -> sysfs_freeze_array -> Manage_add -> sysfs_set_str(&info, NULL, "sync_action","idle") Then from kernel side, Manage_add invokes the path (add_new_disk -> validate_super = super_1_validate) to set In_sync flag. Since In_sync means "device is in_sync with rest of array", and the new added disk need to resync thread to help the synchronization of data. And md_reap_sync_thread would call spare_active to set In_sync for the new added disk finally. So don't set In_sync if array is in frozen. Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com>
-
Guoqing Jiang authored
When the 'last' device in a RAID1 or RAID10 reports an error, we do not mark it as failed. This would serve little purpose as there is no risk of losing data beyond that which is obviously lost (as there is with RAID5), and there could be other sectors on the device which are readable, and only readable from this device. This in general this maximises access to data. However the current implementation also stops an admin from removing the last device by direct action. This is rarely useful, but in many case is not harmful and can make automation easier by removing special cases. Also, if an attempt to write metadata fails the device must be marked as faulty, else an infinite loop will result, attempting to update the metadata on all non-faulty devices. So add 'fail_last_dev' member to 'struct mddev', then we can bypasses the 'last disk' checks for RAID1 and RAID10, and control the behavior per array by change sysfs node. Signed-off-by: NeilBrown <neilb@suse.de> [add sysfs node for fail_last_dev by Guoqing] Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com>
-
Andy Shevchenko authored
Instead of linear approach to calculate power of 10, use generic int_pow() which does it better. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Song Liu <songliubraving@fb.com>
-
Yufen Yu authored
Just like raid1, we do not queue write error bio to retry write and acknowlege badblocks, when the device is faulty. Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Song Liu <songliubraving@fb.com>
-
Yufen Yu authored
When write bio return error, it would be added to conf->retry_list and wait for raid1d thread to retry write and acknowledge badblocks. In narrow_write_error(), the error bio will be split in the unit of badblock shift (such as one sector) and raid1d thread issues them one by one. Until all of the splited bio has finished, raid1d thread can go on processing other things, which is time consuming. But, there is a scene for error handling that is not necessary. When the device has been set faulty, flush_bio_list() may end bios in pending_bio_list with error status. Since these bios has not been issued to the device actually, error handlding to retry write and acknowledge badblocks make no sense. Even without that scene, when the device is faulty, badblocks info can not be written out to the device. Thus, we also no need to handle the error IO. Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Song Liu <songliubraving@fb.com>
-
Xiao Ni authored
7471fb77 ("md/raid6: Fix anomily when recovering a single device in RAID6.") avoids rereading P when it can be computed from other members. However, this misses the chance to re-write the right data to P. This patch sets R5_ReadError if the re-read fails. Also, when re-read is skipped, we also missed the chance to reset rdev->read_errors to 0. It can fail the disk when there are many read errors on P member disk (other disks don't have read error) V2: upper layer read request don't read parity/Q data. So there is no need to consider such situation. This is Reported-by: kbuild test robot <lkp@intel.com> Fixes: 7471fb77 ("md/raid6: Fix anomily when recovering a single device in RAID6.") Cc: <stable@vger.kernel.org> #4.4+ Signed-off-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com>
-
Hou Tao authored
Using a sector_t as the return value is misleading, because raise_barrier() only return 0 or -EINTR. Also add comments for the return values of raise_barrier(). Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Song Liu <songliubraving@fb.com>
-
- 06 Aug, 2019 4 commits
-
-
Hans Holmberg authored
Now that there no module users left of bio_map_kern, stop exporting the symbol. Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Hans Holmberg <hans@owltronix.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Hans Holmberg authored
There is no reason now not to use kvmalloc, so replace the internal metadata allocation scheme. Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Hans Holmberg <hans@owltronix.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Hans Holmberg authored
Now that blk_rq_map_kern can map both kmem and vmem, move internal metadata mapping down to the lower level driver. Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Hans Holmberg <hans@owltronix.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Hans Holmberg authored
Move the redundant sync handling interface and wait for a completion in the lightnvm core instead. Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Hans Holmberg <hans@owltronix.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 05 Aug, 2019 15 commits
-
-
Ming Lei authored
Spread queues among present CPUs first, then building mapping on other non-present CPUs. So we can minimize count of dead queues which are mapped by un-present CPUs only. Then bad IO performance can be avoided by unbalanced mapping between present CPUs and queues. The similar policy has been applied on Managed IRQ affinity. Cc: Yi Zhang <yi.zhang@redhat.com> Reported-by: Yi Zhang <yi.zhang@redhat.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
Implement .cleanup_rq() callback for freeing driver private part of the request. Then we can avoid to leak this part if the request isn't completed by SCSI, and freed by blk-mq or upper layer(such as dm-rq) finally. Cc: Ewan D. Milne <emilne@redhat.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: <stable@vger.kernel.org> Fixes: 396eaf21 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback") Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
SCSI maintains its own driver private data hooked off of each SCSI request, and the pridate data won't be freed after scsi_queue_rq() returns BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE. An upper layer driver (e.g. dm-rq) may need to retry these SCSI requests, before SCSI has fully dispatched them, due to a lower level SCSI driver's resource limitation identified in scsi_queue_rq(). Currently SCSI's per-request private data is leaked when the upper layer driver (dm-rq) frees and then retries these requests in response to BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE returns from scsi_queue_rq(). This usecase is so specialized that it doesn't warrant training an existing blk-mq interface (e.g. blk_mq_free_request) to allow SCSI to account for freeing its driver private data -- doing so would add an extra branch for handling a special case that all other consumers of SCSI (and blk-mq) won't ever need to worry about. So the most pragmatic way forward is to delegate freeing SCSI driver private data to the upper layer driver (dm-rq). Do so by adding new .cleanup_rq callback and calling a new blk_mq_cleanup_rq() method from dm-rq. A following commit will implement the .cleanup_rq() hook in scsi_mq_ops. Cc: Ewan D. Milne <emilne@redhat.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: <stable@vger.kernel.org> Fixes: 396eaf21 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback") Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Chaitanya Kulkarni authored
This patch implements newly introduced zone reset all operation for null_blk driver. Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Chaitanya Kulkarni authored
This patch implements the zone reset all operation for sd_zbc.c. We add a new boolean parameter for the sd_zbc_setup_reset_cmd() to indicate REQ_OP_ZONE_RESET_ALL command setup. Along with that we add support in the completion path for the zone reset all. Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Chaitanya Kulkarni authored
This implements REQ_OP_ZONE_RESET_ALL as a special case of the block device zone reset operations where we just simply issue bio with the newly introduced req op. We issue this req op when the number of sectors is equal to the device's partition's number of sectors and device has no partitions. We also add support so that blk_op_str() can print the new reset-all zone operation. This patch also adds a generic make request check for newly introduced REQ_OP_ZONE_RESET_ALL req_opf. We simply return error when queue is zoned and reset-all flag is not set for REQ_OP_ZONE_RESET_ALL. Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Chaitanya Kulkarni authored
This patch introduces a new request operation REQ_OP_ZONE_RESET_ALL. This is useful for the applications like mkfs where it needs to reset all the zones present on the underlying block device. As part for this patch we also introduce new QUEUE_FLAG_ZONE_RESETALL which indicates the queue zone reset all capability and corresponding helper macro. Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bart Van Assche authored
Change a reference to the legacy block layer into a reference to blk-mq. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hannes Reinecke <hare@suse.com> Cc: James Smart <james.smart@broadcom.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Jianchao Wang <jianchao.w.wang@oracle.com> Cc: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bart Van Assche authored
See also commit 8f4236d9 ("block: remove QUEUE_FLAG_BYPASS and ->bypass") # v5.0. Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bart Van Assche authored
Consider the following example: * The logical block size is 4 KB. * The physical block size is 8 KB. * max_sectors equals (16 KB >> 9) sectors. * A non-aligned 4 KB and an aligned 64 KB bio are merged into a single non-aligned 68 KB bio. The current behavior is to split such a bio into (16 KB + 16 KB + 16 KB + 16 KB + 4 KB). The start of none of these five bio's is aligned to a physical block boundary. This patch ensures that such a bio is split into four aligned and one non-aligned bio instead of being split into five non-aligned bios. This improves performance because most block devices can handle aligned requests faster than non-aligned requests. Since the physical block size is larger than or equal to the logical block size, this patch preserves the guarantee that the returned value is a multiple of the logical block size. Cc: Christoph Hellwig <hch@infradead.org> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bart Van Assche authored
Move the max_sectors check into bvec_split_segs() such that a single call to that function can do all the necessary checks. This patch optimizes the fast path further, namely if a bvec fits in a page. Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bart Van Assche authored
Simplify this function by by removing two if-tests. Other than requiring that the @sectors pointer is not NULL, this patch does not change the behavior of bvec_split_segs(). Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bart Van Assche authored
Since what the bio splitting functions do is nontrivial, document these functions. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bart Van Assche authored
Make it clear to the compiler and also to humans that the functions that query request queue properties do not modify any member of the request_queue data structure. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
blk_mq_tagset_wait_completed_request() has been applied for waiting for completed request's fn, so not necessary to use blk_mq_complete_request_sync() any more. Cc: Max Gurtovoy <maxg@mellanox.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Keith Busch <keith.busch@intel.com> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-