- 26 Jun, 2023 4 commits
-
-
Yu Kuai authored
There are following smatch warning: block/blk-wbt.c:843 wbt_init() warn: sleeping in atomic context ioc_qos_write() <- disables preempt -> wbt_enable_default() -> wbt_init() wbt_init() will be called from wbt_enable_default() if wbt is not initialized, currently this is only possible in blk_register_queue(), hence wbt_init() will never be called from iocost and this warning is false positive. However, we might support rq_qos destruction dynamically in the future, and it's better to prevent that, hence move wbt_enable_default() outside 'ioc->lock'. This is safe because queue is still freezed. Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/lkml/Y+Ja5SRs886CEz7a@kadam/Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230527010644.647900-5-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
'wb_normal' will set to 0 if 'min_lat_nsec' is 0, and 'min_lat_nsec' can only be set to 0 through sysfs configuration where 'WBT_STATE_OFF_MANUAL' is set together, in the meantime, they can only be cleared together through sysfs afterwards. Hence 'wb_normal != 0' is the same as 'rwb->enable_state != WBT_STATE_OFF_MANUAL'. The code is redundan, hence replace the checking of 'wb_normal' to 'enable_state' in rwb_enabled() and reuse rwb_enabled() for wbt_disabled(). Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230527010644.647900-4-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
enable or disable wbt is always called with queue freezed, so that wbt can never be enabled or disabled while io is still inflight, and this behaviour should always hold to avoid io hang(There have been reported several times). Therefor, the code to handle wbt enable/diskble with io inflight is not and never will be used, hence remove such dead code. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230527010644.647900-3-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
sysfs entry /sys/block/[device]/queue/wbt_lat_usec will be created even if CONFIG_BLK_WBT is disabled, while read and write will always fail. It doesn't make sense to create a sysfs entry that can't be accessed, so don't create such entry. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230527010644.647900-2-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
- 25 Jun, 2023 2 commits
-
-
Ming Lei authored
Request allocated from sched tags can't be issued via ->queue_rqs() directly, since driver tag isn't allocated yet. This is the 1st misuse of RQF_USE_SCHED for figuring out plug->has_elevator. Request allocated from sched tags can't be ended by blk_mq_end_request_batch() too, fix the 2nd RQF_USE_SCHED misuse in blk_mq_add_to_batch(). Without this patch, NVMe uring cmd passthrough IO workload can run into hang easily with real io scheduler. Fixes: dd6216bb ("blk-mq: make sure elevator callbacks aren't called for passthrough request") Reported-by: Guangwu Zhang <guazhang@redhat.com> Closes: https://lore.kernel.org/linux-block/CAGS2=YrBjpLPOKa-gzcKuuOG60AGth5794PNCDwatdnnscB9ug@mail.gmail.com/ Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230624130105.1443879-1-ming.lei@redhat.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Jinke Han authored
After commit f382fb0b ("block: remove legacy IO schedulers"), blkio.throttle.io_serviced and blkio.throttle.io_service_bytes become the only stable io stats interface of cgroup v1, and these statistics are done in the blk-throttle code. But the current code only counts the bios that are actually throttled. When the user does not add the throttle limit, the io stats for cgroup v1 has nothing. I fix it according to the statistical method of v2, and made it count all ios accurately. Fixes: a7b36ee6 ("block: move blk-throtl fast path inline") Tested-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Jinke Han <hanjinke.666@bytedance.com> Acked-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230507170631.89607-1-hanjinke.666@bytedance.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
- 23 Jun, 2023 10 commits
-
-
Jan Kara authored
Commit 2736e8ee ("block: use the holder as indication for exclusive opens") introduced a change that blkdev_put() has to get exclusive holder of the bdev as an argument. However it overlooked that register_bdev() and register_cache() overwrite the bdev->bd_holder field in the block device to point to the real owning object which was not available at the time we called blkdev_get_by_path(). Messing with bdev internals like this is a layering violation and it also causes blkdev_put() to issue warning about mismatching holders. Fix bcache to reopen the block device with appropriate holder once it is available which also restores the behavior that multiple bcache caches cannot claim the same device which was broken by commit 29499ab0 ("bcache: don't pass a stack address to blkdev_get_by_path"). Fixes: 2736e8ee ("block: use the holder as indication for exclusive opens") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev> Acked-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20230622164658.12861-2-jack@suse.czSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Jan Kara authored
Allocate holder object (cache or cached_dev) before offloading the rest of the startup to async work. This will allow us to open the block block device with proper holder. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Coly Li <colyli@suse.de> Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev> Link: https://lore.kernel.org/r/20230622164658.12861-1-jack@suse.czSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Merge tag 'md-next-20230623' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.5/block-late Pull MD fixes from Song. * tag 'md-next-20230623' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: raid10: avoid spin_lock from fastpath from raid10_unplug() md: fix 'delete_mutex' deadlock md: use mddev->external to select holder in export_rdev() md/raid1-10: fix casting from randomized structure in raid1_submit_write() md/raid10: fix the condition to call bio_end_io_acct()
-
Yu Kuai authored
Commit 0c0be98b ("md/raid10: prevent unnecessary calls to wake_up() in fast path") missed one place, for example, with: fio -direct=1 -rw=write/randwrite -iodepth=1 ... Plug and unplug are called for each io, then wake_up() from raid10_unplug() will cause lock contention as well. Avoid this contention by using wake_up_barrier() instead of wake_up(), where spin_lock is not held if waitqueue is empty. Fio test script: [global] name=random reads and writes ioengine=libaio direct=1 readwrite=randrw rwmixread=70 iodepth=64 buffered=0 filename=/dev/md0 size=1G runtime=30 time_based randrepeat=0 norandommap refill_buffers ramp_time=10 bs=4k numjobs=400 group_reporting=1 [job1] Test result with ramdisk raid10(By Ali): Before this patch With this patch READ IOPS=2033k IOPS=3642k WRITE IOPS=871k IOPS=1561K By the way, in this scenario, blk_plug_cb() will be allocated and freed for each io, this seems need to be optimized as well. Reported-and-tested-by: Ali Gholami Rudi <aligrudi@gmail.com> Closes: https://lore.kernel.org/all/20231606122233@laper.mirepesht/Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230621105728.1268542-1-yukuai1@huaweicloud.com
-
Yu Kuai authored
Commit 3ce94ce5 ("md: fix duplicate filename for rdev") introduce a new lock 'delete_mutex', and trigger a new deadlock: t1: remove rdev t2: sysfs writer rdev_attr_store rdev_attr_store mddev_lock state_store md_kick_rdev_from_array lock delete_mutex list_add mddev->deleting unlock delete_mutex mddev_unlock mddev_lock ... lock delete_mutex kobject_del // wait for sysfs writers to be done mddev_unlock lock delete_mutex // wait for delete_mutex, deadlock 'delete_mutex' is used to protect the list 'mddev->deleting', turns out that this list can be protected by 'reconfig_mutex' directly, and this lock can be removed. Fix this problem by removing the lock, and use 'reconfig_mutex' to protect the list. mddev_unlock() will move this list to a local list to be handled after 'reconfig_mutex' is dropped. Fixes: 3ce94ce5 ("md: fix duplicate filename for rdev") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230621142933.1395629-1-yukuai1@huaweicloud.com
-
Song Liu authored
mdadm test "10ddf-create-fail-rebuild" triggers warnings like the following [ 215.526357] ------------[ cut here ]------------ [ 215.527243] WARNING: CPU: 18 PID: 1264 at block/bdev.c:617 blkdev_put+0x269/0x350 [ 215.528334] Modules linked in: [ 215.528806] CPU: 18 PID: 1264 Comm: mdmon Not tainted 6.4.0-rc2+ #768 [ 215.529863] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS [ 215.531464] RIP: 0010:blkdev_put+0x269/0x350 [ 215.532167] Code: ff ff 49 8d 7d 10 e8 56 bf b8 ff 4d 8b 65 10 49 8d bc 24 58 05 00 00 e8 05 be b8 ff 41 83 ac 24 58 05 00 00 01 e9 44 ff ff ff <0f> 0b e9 52 fe ff ff 0f 0b e9 6b fe ff ff1 [ 215.534780] RSP: 0018:ffffc900040bfbf0 EFLAGS: 00010283 [ 215.535635] RAX: ffff888174001000 RBX: ffff88810b1c3b00 RCX: ffffffff819a4061 [ 215.536645] RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: ffff88810b1c3ba0 [ 215.537657] RBP: ffff88810dbde800 R08: fffffbfff0fca983 R09: fffffbfff0fca983 [ 215.538674] R10: ffffc900040bfbf0 R11: fffffbfff0fca982 R12: ffff88810b1c3b38 [ 215.539687] R13: ffff88810b1c3b10 R14: ffff88810dbdecb8 R15: ffff88810b1c3b00 [ 215.540833] FS: 00007f2aabdff700(0000) GS:ffff888dfb400000(0000) knlGS:0000000000000000 [ 215.541961] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 215.542775] CR2: 00007fa19a85d934 CR3: 000000010c076006 CR4: 0000000000370ee0 [ 215.543814] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 215.544840] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 215.545885] Call Trace: [ 215.546257] <TASK> [ 215.546608] export_rdev.isra.63+0x71/0xe0 [ 215.547338] mddev_unlock+0x1b1/0x2d0 [ 215.547898] array_state_store+0x28d/0x450 [ 215.548519] md_attr_store+0xd7/0x150 [ 215.549059] ? __pfx_sysfs_kf_write+0x10/0x10 [ 215.549702] kernfs_fop_write_iter+0x1b9/0x260 [ 215.550351] vfs_write+0x491/0x760 [ 215.550863] ? __pfx_vfs_write+0x10/0x10 [ 215.551445] ? __fget_files+0x156/0x230 [ 215.552053] ksys_write+0xc0/0x160 [ 215.552570] ? __pfx_ksys_write+0x10/0x10 [ 215.553141] ? ktime_get_coarse_real_ts64+0xec/0x100 [ 215.553878] do_syscall_64+0x3a/0x90 [ 215.554403] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 215.555125] RIP: 0033:0x7f2aade11847 [ 215.555696] Code: c3 66 90 41 54 49 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 1b fd ff ff 4c 89 e2 48 89 ee 89 df 41 89 c0 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 448 [ 215.558398] RSP: 002b:00007f2aabdfeba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 [ 215.559516] RAX: ffffffffffffffda RBX: 0000000000000010 RCX: 00007f2aade11847 [ 215.560515] RDX: 0000000000000005 RSI: 0000000000438b8b RDI: 0000000000000010 [ 215.561512] RBP: 0000000000438b8b R08: 0000000000000000 R09: 00007f2aaecf0060 [ 215.562511] R10: 000000000e3ba40b R11: 0000000000000293 R12: 0000000000000005 [ 215.563647] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000c70750 [ 215.564693] </TASK> [ 215.565029] irq event stamp: 15979 [ 215.565584] hardirqs last enabled at (15991): [<ffffffff811a7432>] __up_console_sem+0x52/0x60 [ 215.566806] hardirqs last disabled at (16000): [<ffffffff811a7417>] __up_console_sem+0x37/0x60 [ 215.568022] softirqs last enabled at (15716): [<ffffffff8277a2db>] __do_softirq+0x3eb/0x531 [ 215.569239] softirqs last disabled at (15711): [<ffffffff810d8f45>] irq_exit_rcu+0x115/0x160 [ 215.570434] ---[ end trace 0000000000000000 ]--- This means export_rdev() calls blkdev_put with a different holder than the one used by blkdev_get_by_dev(). This is because mddev->major_version == -2 is not a good check for external metadata. Fix this by using mddev->external instead. Also, do not clear mddev->external in md_clean(), as the flag might be used later in export_rdev(). Fixes: 2736e8ee ("block: use the holder as indication for exclusive opens") Cc: Christoph Hellwig <hch@lst.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Song Liu <song@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230617052405.305871-1-song@kernel.org
-
Yu Kuai authored
Following build error triggered while build with clang version 17.0.0 with W=1(this can't be reporduced with gcc 13.1.0): drivers/md/raid1-10.c:117:25: error: casting from randomized structure pointer type 'struct block_device *' to 'struct md_rdev *' 117 | struct md_rdev *rdev = (struct md_rdev *)bio->bi_bdev; | ^ Fix this by casting 'bio->bi_bdev' to 'void *', as it used to be. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202306142042.fmjfmTF8-lkp@intel.com/ Fixes: 8295efbe ("md/raid1-10: factor out a helper to submit normal write") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230616012136.3047071-1-yukuai1@huaweicloud.com
-
Li Nan authored
/sys/block/[device]/queue/iostats is used to control whether to count io stat. Write 0 to it will clear queue_flags QUEUE_FLAG_IO_STAT which means iostats is disabled. If we disable iostats and later endable it, the io issued during this period will be counted incorrectly, inflight will be decreased to -1. //T1 set iostats echo 0 > /sys/block/md0/queue/iostats clear QUEUE_FLAG_IO_STAT //T2 issue io if (QUEUE_FLAG_IO_STAT) -> false bio_start_io_acct inflight++ echo 1 > /sys/block/md0/queue/iostats set QUEUE_FLAG_IO_STAT //T3 io end if (QUEUE_FLAG_IO_STAT) -> true bio_end_io_acct inflight-- -> -1 Also, if iostats is enabled while issuing io but disabled while io end, inflight will never be decreased. Fix it by checking start_time when io end. If start_time is not 0, call bio_end_io_acct(). Fixes: 528bc2cf ("md/raid10: enable io accounting") Signed-off-by: Li Nan <linan122@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230609094320.2397604-1-linan666@huaweicloud.com
-
Yu Kuai authored
In order to prevent request_queue to be freed before cleaning up blktrace debugfs entries, commit db59133e ("scsi: sg: fix blktrace debugfs entries leakage") use scsi_device_get(), however, scsi_device_get() will also grab scsi module reference and scsi module can't be removed. It's reported that blktests can't unload scsi_debug after block/001: blktests (master) # ./check block block/001 (stress device hotplugging) [failed] +++ /root/blktests/results/nodev/block/001.out.bad 2023-06-19 Running block/001 Stressing sd +modprobe: FATAL: Module scsi_debug is in use. Fix this problem by grabbing request_queue reference directly, so that scsi host module can still be unloaded while request_queue will be pinged by sg device. Reported-by: Chaitanya Kulkarni <chaitanyak@nvidia.com> Link: https://lore.kernel.org/all/1760da91-876d-fc9c-ab51-999a6f66ad50@nvidia.com/ Fixes: db59133e ("scsi: sg: fix blktrace debugfs entries leakage") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230621160111.1433521-1-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Jan Kara authored
ext4_blkdev_remove() passes a wrong holder pointer to blkdev_put() which triggers a warning there. Fix it. Fixes: 2736e8ee ("block: use the holder as indication for exclusive opens") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230622165107.13687-1-jack@suse.czSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
- 22 Jun, 2023 2 commits
-
-
Christoph Hellwig authored
When we didn't find a device and didn't guess it might be a partition, it might still show up later, so don't disable rootwait for it by returning -EINVAL. Fixes: 079caa35 ("init: clear root_wait on all invalid root= strings") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230622150644.600327-1-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Jordy Zomer authored
This patch fixes a spectre-v1 gadget in cdrom. The gadget could be triggered by speculatively bypassing the cdi->capacity check. Signed-off-by: Jordy Zomer <jordyzomer@google.com> Link: https://lore.kernel.org/all/20230612110040.849318-2-jordyzomer@google.comReviewed-by: Phillip Potter <phil@philpotter.co.uk> Link: https://lore.kernel.org/all/ZI1+1OG9Ut1MqsUC@equinoxSigned-off-by: Phillip Potter <phil@philpotter.co.uk> Link: https://lore.kernel.org/r/20230617113828.1230-2-phil@philpotter.co.ukSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
- 21 Jun, 2023 7 commits
-
-
Bart Van Assche authored
Fix the documentation of the devt_from_partuuid() return value. Fix the following two recently introduced kernel-doc warnings: block/bdev.c:570: warning: Function parameter or member 'hops' not described in 'bd_finish_claiming' block/early-lookup.c:46: warning: Function parameter or member 'devt' not described in 'devt_from_partuuid' Cc: Christoph Hellwig <hch@lst.de> Fixes: 0718afd4 ("block: introduce holder ops") Fixes: cf056a43 ("init: improve the name_to_dev_t interface") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230621165054.743815-1-bvanassche@acm.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
In case of real io scheduler, q->elevator is set, so blk_mq_run_hw_queue() may just check if scheduler queue has request to dispatch, see __blk_mq_sched_dispatch_requests(). Then IO hang may be caused because all passthorugh requests may stay in sw queue. And any passthrough request should have been inserted to hctx->dispatch always. Reported-by: Guangwu Zhang <guazhang@redhat.com> Fixes: d97217e7 ("blk-mq: don't queue plugged passthrough requests into scheduler") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230621132208.1142318-1-ming.lei@redhat.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Ivan Orlov authored
Now that the driver core allows for struct class to be in read-only memory, move the bsg_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-scsi@vger.kernel.org Cc: linux-block@vger.kernel.org Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20230620180129.645646-8-gregkh@linuxfoundation.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Ivan Orlov authored
Now that the driver core allows for struct class to be in read-only memory, move the ublk_chr_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Cc: Ming Lei <ming.lei@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20230620180129.645646-7-gregkh@linuxfoundation.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Ivan Orlov authored
Now that the driver core allows for struct class to be in read-only memory, move the aoe_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Cc: Justin Sanders <justin@coraid.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20230620180129.645646-6-gregkh@linuxfoundation.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Ivan Orlov authored
Now that the driver core allows for struct class to be in read-only memory, making all 'class' structures to be declared at build time placing them into read-only memory, instead of having to be dynamically allocated at load time. Cc: "Md. Haris Iqbal" <haris.iqbal@ionos.com> Cc: Jack Wang <jinpu.wang@ionos.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20230620180129.645646-5-gregkh@linuxfoundation.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
FMODE_EXEC has nothing to do with exclusive opens, and even is of the wrong type. We need to check for BLK_OPEN_EXCL here. Fixes: 985958b8 ("block: fix wrong mode for blkdev_get_by_dev() from disk_scan_partitions()") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230621124914.185992-1-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
- 20 Jun, 2023 11 commits
-
-
Michael Schmitz authored
The Amiga partition parser module uses signed int for partition sector address and count, which will overflow for disks larger than 1 TB. Use u64 as type for sector address and size to allow using disks up to 2 TB without LBD support, and disks larger than 2 TB with LBD. The RBD format allows to specify disk sizes up to 2^128 bytes (though native OS limitations reduce this somewhat, to max 2^68 bytes), so check for u64 overflow carefully to protect against overflowing sector_t. Bail out if sector addresses overflow 32 bits on kernels without LBD support. This bug was reported originally in 2012, and the fix was created by the RDB author, Joanne Dow <jdow@earthlink.net>. A patch had been discussed and reviewed on linux-m68k at that time but never officially submitted (now resubmitted as patch 1 in this series). This patch adds additional error checking and warning messages. Reported-by: Martin Steigerwald <Martin@lichtvoll.de> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=43511 Fixes: 1da177e4 ("Linux-2.6.12-rc2") Message-ID: <201206192146.09327.Martin@lichtvoll.de> Cc: <stable@vger.kernel.org> # 5.2 Signed-off-by: Michael Schmitz <schmitzmic@gmail.com> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Christoph Hellwig <hch@infradead.org> Link: https://lore.kernel.org/r/20230620201725.7020-4-schmitzmic@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Michael Schmitz authored
The Amiga partition parser module uses signed int for partition sector address and count, which will overflow for disks larger than 1 TB. Use u64 as type for sector address and size to allow using disks up to 2 TB without LBD support, and disks larger than 2 TB with LBD. The RBD format allows to specify disk sizes up to 2^128 bytes (though native OS limitations reduce this somewhat, to max 2^68 bytes), so check for u64 overflow carefully to protect against overflowing sector_t. This bug was reported originally in 2012, and the fix was created by the RDB author, Joanne Dow <jdow@earthlink.net>. A patch had been discussed and reviewed on linux-m68k at that time but never officially submitted (now resubmitted as patch 1 of this series). Patch 3 (this series) adds additional error checking and warning messages. One of the error checks now makes use of the previously unused rdb_CylBlocks field, which causes a 'sparse' warning (cast to restricted __be32). Annotate all 32 bit fields in affs_hardblocks.h as __be32, as the on-disk format of RDB and partition blocks is always big endian. Reported-by: Martin Steigerwald <Martin@lichtvoll.de> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=43511 Fixes: 1da177e4 ("Linux-2.6.12-rc2") Message-ID: <201206192146.09327.Martin@lichtvoll.de> Cc: <stable@vger.kernel.org> # 5.2 Signed-off-by: Michael Schmitz <schmitzmic@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/r/20230620201725.7020-3-schmitzmic@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Michael Schmitz authored
The Amiga partition parser module uses signed int for partition sector address and count, which will overflow for disks larger than 1 TB. Use sector_t as type for sector address and size to allow using disks up to 2 TB without LBD support, and disks larger than 2 TB with LBD. This bug was reported originally in 2012, and the fix was created by the RDB author, Joanne Dow <jdow@earthlink.net>. A patch had been discussed and reviewed on linux-m68k at that time but never officially submitted. This patch differs from Joanne's patch only in its use of sector_t instead of unsigned int. No checking for overflows is done (see patch 3 of this series for that). Reported-by: Martin Steigerwald <Martin@lichtvoll.de> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=43511 Fixes: 1da177e4 ("Linux-2.6.12-rc2") Message-ID: <201206192146.09327.Martin@lichtvoll.de> Cc: <stable@vger.kernel.org> # 5.2 Signed-off-by: Michael Schmitz <schmitzmic@gmail.com> Tested-by: Martin Steigerwald <Martin@lichtvoll.de> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620201725.7020-2-schmitzmic@gmail.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Min Li authored
In the function bdev_add_partition(),there is no check that the start and end sectors exceed the size of the disk before calling add_partition. When we call the block's ioctl interface directly to add a partition, and the capacity of the disk is set to 0 by driver,the command will continue to execute. Signed-off-by: Min Li <min15.li@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230619091214.31615-1-min15.li@samsung.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Jingbo Xu authored
Allow of unprivileged Persistent Reservation operations on devices if the write permission check on the device node has passed. brw-rw---- 1 root disk 259, 0 Jun 13 07:09 /dev/nvme0n1 In the example above, the "disk" group of nvme0n1 is also allowed to make reservations on the device even without CAP_SYS_ADMIN. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230613084008.93795-3-jefflexu@linux.alibaba.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Jingbo Xu authored
Refuse Persistent Reservation operations on partitions as reservation on partitions doesn't make sense. Besides, introduce blkdev_pr_allowed() helper, where more policies could be placed here later. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230613084008.93795-2-jefflexu@linux.alibaba.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
In journal_init_dev(), if super bdev is used as 'j_dev_bd', then blkdev_get_by_dev() is called with NULL holder, otherwise, holder will be journal. However, later in release_journal_dev(), blkdev_put() is called with journal unconditionally, cause following warning: WARNING: CPU: 1 PID: 5034 at block/bdev.c:617 bd_end_claim block/bdev.c:617 [inline] WARNING: CPU: 1 PID: 5034 at block/bdev.c:617 blkdev_put+0x562/0x8a0 block/bdev.c:901 RIP: 0010:blkdev_put+0x562/0x8a0 block/bdev.c:901 Call Trace: <TASK> release_journal_dev fs/reiserfs/journal.c:2592 [inline] free_journal_ram+0x421/0x5c0 fs/reiserfs/journal.c:1896 do_journal_release fs/reiserfs/journal.c:1960 [inline] journal_release+0x276/0x630 fs/reiserfs/journal.c:1971 reiserfs_put_super+0xe4/0x5c0 fs/reiserfs/super.c:616 generic_shutdown_super+0x158/0x480 fs/super.c:499 kill_block_super+0x64/0xb0 fs/super.c:1422 deactivate_locked_super+0x98/0x160 fs/super.c:330 deactivate_super+0xb1/0xd0 fs/super.c:361 cleanup_mnt+0x2ae/0x3d0 fs/namespace.c:1247 task_work_run+0x16f/0x270 kernel/task_work.c:179 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0xadc/0x2a30 kernel/exit.c:874 do_group_exit+0xd4/0x2a0 kernel/exit.c:1024 __do_sys_exit_group kernel/exit.c:1035 [inline] __se_sys_exit_group kernel/exit.c:1033 [inline] __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1033 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fix this problem by passing in NULL holder in this case. Reported-by: syzbot+04625c80899f4555de39@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=04625c80899f4555de39 Fixes: 2736e8ee ("block: use the holder as indication for exclusive opens") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620111322.1014775-1-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
After commit 2736e8ee ("block: use the holder as indication for exclusive opens"), blkdev_get_by_dev() will warn if holder is NULL and mode contains 'FMODE_EXCL'. holder from blkdev_get_by_dev() from disk_scan_partitions() is always NULL, hence it should not use 'FMODE_EXCL', which is broben by the commit. For consequence, WARN_ON_ONCE() will be triggered from blkdev_get_by_dev() if user scan partitions with device opened exclusively. Fix this problem by removing 'FMODE_EXCL' from disk_scan_partitions(), as it used to be. Reported-by: syzbot+00cd27751f78817f167b@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=00cd27751f78817f167b Fixes: 2736e8ee ("block: use the holder as indication for exclusive opens") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230618140402.7556-1-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620043536.707249-1-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Demi Marie Obenour authored
Currently, associating a loop device with a different file descriptor does not increment its diskseq. This allows the following race condition: 1. Program X opens a loop device 2. Program X gets the diskseq of the loop device. 3. Program X associates a file with the loop device. 4. Program X passes the loop device major, minor, and diskseq to something. 5. Program X exits. 6. Program Y detaches the file from the loop device. 7. Program Y attaches a different file to the loop device. 8. The opener finally gets around to opening the loop device and checks that the diskseq is what it expects it to be. Even though the diskseq is the expected value, the result is that the opener is accessing the wrong file. From discussions with Christoph Hellwig, it appears that disk_force_media_change() was supposed to call inc_diskseq(), but in fact it does not. Adding a Fixes: tag to indicate this. Christoph's Reported-by is because he stated that disk_force_media_change() calls inc_diskseq(), which is what led me to discover that it should but does not. Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> Fixes: e6138dc1 ("block: add a helper to raise a media changed event") Cc: stable@vger.kernel.org # 5.15+ Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230607170837.1559-1-demi@invisiblethingslab.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Fix a missing conversion to the new BLK_OPEN constant in swim. Fixes: 05bdb996 ("block: replace fmode_t with a block-specific type for block open flags") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620043051.707196-1-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
- 16 Jun, 2023 4 commits
-
-
Ming Lei authored
After grabbing q->sysfs_lock, q->elevator may become NULL because of elevator switch. Fix the NULL dereference on q->elevator by checking it with lock. Reported-by: Guangwu Zhang <guazhang@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230616132354.415109-1-ming.lei@redhat.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Now that the direct I/O helpers have switched to use iov_iter_extract_pages, these helpers are unused. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20230614140341.521331-5-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Now that all block direct I/O helpers use page pinning, this flag is unused. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20230614140341.521331-4-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Check for -EFAULT instead of wrapping the check in an ret < 0 block. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20230614140341.521331-3-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>
-