- 12 Oct, 2023 1 commit
-
-
Jens Axboe authored
Merge tag 'md-next-20231012' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.7/block Pull MD changes from Song: "1. Rewrite mddev_suspend(), by Yu Kuai; 2. Simplify md_seq_ops, by Yu Kuai; 3. Reduce unnecessary locking array_state_store(), by Mariusz Tkaczyk." * tag 'md-next-20231012' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: (23 commits) md: rename __mddev_suspend/resume() back to mddev_suspend/resume() md: remove old apis to suspend the array md: suspend array in md_start_sync() if array need reconfiguration md/raid5: replace suspend with quiesce() callback md/md-linear: cleanup linear_add() md: cleanup mddev_create/destroy_serial_pool() md: use new apis to suspend array before mddev_create/destroy_serial_pool md: use new apis to suspend array for ioctls involed array reconfiguration md: use new apis to suspend array for adding/removing rdev from state_store() md: use new apis to suspend array for sysfs apis md/raid5: use new apis to suspend array md/raid5-cache: use new apis to suspend array md/md-bitmap: use new apis to suspend array for location_store() md/dm-raid: use new apis to suspend array md: add new helpers to suspend/resume and lock/unlock array md: add new helpers to suspend/resume array md: replace is_md_suspended() with 'mddev->suspended' in md_check_recovery() md/raid5-cache: use READ_ONCE/WRITE_ONCE for 'conf->log' md: use READ_ONCE/WRITE_ONCE for 'suspend_lo' and 'suspend_hi' md/raid1: don't split discard io for write behind ...
-
- 11 Oct, 2023 20 commits
-
-
Song Liu authored
From Yu Kuai, written by Song Liu Recent tests with raid10 revealed many issues with the following scenarios: - add or remove disks to the array - issue io to the array At first, we fixed each problem independently respect that io can concurrent with array reconfiguration. However, with more issues reported continuously, I am hoping to fix these problems thoroughly. Refer to how block layer protect io with queue reconfiguration (for example, change elevator): blk_mq_freeze_queue -> wait for all io to be done, and prevent new io to be dispatched // reconfiguration blk_mq_unfreeze_queue I think we can do something similar to synchronize io with array reconfiguration. Current synchronization works as the following. For the reconfiguration operation: 1. Hold 'reconfig_mutex'; 2. Check that rdev can be added/removed, one condition is that there is no IO (for example, check nr_pending). 3. Do the actual operations to add/remove a rdev, one procedure is set/clear a pointer to rdev. 4. Check if there is still no IO on this rdev, if not, revert the change. IO path uses rcu_read_lock/unlock() to access rdev. - rcu is used wrongly; - There are lots of places involved that old rdev can be read, however, many places doesn't handle old value correctly; - Between step 3 and 4, if new io is dispatched, NULL will be read for the rdev, and data will be lost if step 4 failed. The new synchronization is similar to blk_mq_freeze_queue(). To add or remove disk: 1. Suspend the array, that is, stop new IO from being dispatched and wait for inflight IO to finish. 2. Add or remove rdevs to array; 3. Resume the array; IO path doesn't need to change for now, and all rcu implementation can be removed. Then main work is divided into 3 steps: First, first make sure new apis to suspend the array is general: - make sure suspend array will wait for io to be done(Done by [1]); - make sure suspend array can be called for all personalities(Done by [2]); - make sure suspend array can be called at any time(Done by [3]); - make sure suspend array doesn't rely on 'reconfig_mutex'(PATCH 3-5); Second replace old apis with new apis(PATCH 6-16). Specifically, the synchronization is changed from: lock reconfig_mutex suspend array make changes resume array unlock reconfig_mutex to: suspend array lock reconfig_mutex make changes unlock reconfig_mutex resume array Finally, for the remain path that involved reconfiguration, suspend the array first(PATCH 11,12, [4] and PATCH 17): Preparatory work: [1] https://lore.kernel.org/all/20230621165110.1498313-1-yukuai1@huaweicloud.com/ [2] https://lore.kernel.org/all/20230628012931.88911-2-yukuai1@huaweicloud.com/ [3] https://lore.kernel.org/all/20230825030956.1527023-1-yukuai1@huaweicloud.com/ [4] https://lore.kernel.org/all/20230825031622.1530464-1-yukuai1@huaweicloud.com/ * md-suspend-rewrite: md: rename __mddev_suspend/resume() back to mddev_suspend/resume() md: remove old apis to suspend the array md: suspend array in md_start_sync() if array need reconfiguration md/raid5: replace suspend with quiesce() callback md/md-linear: cleanup linear_add() md: cleanup mddev_create/destroy_serial_pool() md: use new apis to suspend array before mddev_create/destroy_serial_pool md: use new apis to suspend array for ioctls involed array reconfiguration md: use new apis to suspend array for adding/removing rdev from state_store() md: use new apis to suspend array for sysfs apis md/raid5: use new apis to suspend array md/raid5-cache: use new apis to suspend array md/md-bitmap: use new apis to suspend array for location_store() md/dm-raid: use new apis to suspend array md: add new helpers to suspend/resume and lock/unlock array md: add new helpers to suspend/resume array md: replace is_md_suspended() with 'mddev->suspended' in md_check_recovery() md/raid5-cache: use READ_ONCE/WRITE_ONCE for 'conf->log' md: use READ_ONCE/WRITE_ONCE for 'suspend_lo' and 'suspend_hi'
-
Yu Kuai authored
Now that the old apis are removed, __mddev_suspend/resume() can be renamed to their original names. This is done by: sed -i "s/__mddev_suspend/mddev_suspend/g" *.[ch] sed -i "s/__mddev_resume/mddev_resume/g" *.[ch] Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Signed-off-by:
Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231010151958.145896-20-yukuai1@huaweicloud.com
-
Yu Kuai authored
Now that mddev_suspend() and mddev_resume() is not used anywhere, remove them, and remove 'MD_ALLOW_SB_UPDATE' and 'MD_UPDATING_SB' as well. Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Signed-off-by:
Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231010151958.145896-19-yukuai1@huaweicloud.com
-
Yu Kuai authored
So that io won't concurrent with array reconfiguration, and it's safe to suspend the array directly because normal io won't rely on md_start_sync(). Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Signed-off-by:
Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231010151958.145896-18-yukuai1@huaweicloud.com
-
Yu Kuai authored
raid5 is the only personality to suspend array in check_reshape() and start_reshape() callback, suspend and quiesce() callback can both wait for all normal io to be done, and prevent new io to be dispatched, the difference is that suspend is implemented in common layer, and quiesce() callback is implemented in raid5. In order to cleanup all the usage of mddev_suspend(), the new apis __mddev_suspend() need to be called before 'reconfig_mutex' is held, and it's not good to affect all the personalities in common layer just for raid5. Hence replace suspend with quiesce() callaback, prepare to reomove all the users of mddev_suspend(). Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Signed-off-by:
Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231010151958.145896-17-yukuai1@huaweicloud.com
-
Yu Kuai authored
Now that caller already suspend the array, there is no need to suspend array in liner_add(). Note that mddev_suspend/resume() is not used anymore. Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Signed-off-by:
Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231010151958.145896-16-yukuai1@huaweicloud.com
-
Yu Kuai authored
Now that except for stopping the array, all the callers already suspend the array, there is no need to suspend anymore, hence remove the second parameter. Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Signed-off-by:
Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231010151958.145896-15-yukuai1@huaweicloud.com
-
Yu Kuai authored
mddev_create/destroy_serial_pool() will be called from several places where mddev_suspend() will be called later. Prepare to remove the mddev_suspend() from mddev_create/destroy_serial_pool(). Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Signed-off-by:
Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231010151958.145896-14-yukuai1@huaweicloud.com
-
Yu Kuai authored
'reconfig_mutex' will be grabbed before these ioctls, suspend array before holding the lock, so that io won't concurrent with array reconfiguration through ioctls. This is not hot path, so performance is not concerned. Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Signed-off-by:
Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231010151958.145896-13-yukuai1@huaweicloud.com
-
Yu Kuai authored
User can write 'remove' and 're-add' to trigger array reconfiguration through sysfs, suspend array in this case so that io won't concurrent with array reconfiguration. And now that all the caller of add_bound_rdev() alread suspend the array, remove mddev_suspend/resume() from add_bound_rdev() as well. Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Signed-off-by:
Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231010151958.145896-12-yukuai1@huaweicloud.com
-
Yu Kuai authored
Convert to use new apis in following sysfs apis: - level_store - suspend_lo_store - suspend_hi_store - serialize_policy_store These are not hot path, so performance is not concerned. Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Signed-off-by:
Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231010151958.145896-11-yukuai1@huaweicloud.com
-
Yu Kuai authored
Convert to use new apis, the old apis will be removed eventually. These are not hot path, so performance is not concerned. Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Signed-off-by:
Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231010151958.145896-10-yukuai1@huaweicloud.com
-
Yu Kuai authored
Convert to use new apis, the old apis will be removed eventually. Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Signed-off-by:
Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231010151958.145896-9-yukuai1@huaweicloud.com
-
Yu Kuai authored
Convert to use new apis, the old apis will be removed eventually. This is not hot path, so performance is not concerned. Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Signed-off-by:
Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231010151958.145896-8-yukuai1@huaweicloud.com
-
Yu Kuai authored
Convert to use new apis, the old apis will be removed eventually. These are not hot path, so performance is not concerned. Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Signed-off-by:
Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231010151958.145896-7-yukuai1@huaweicloud.com
-
Yu Kuai authored
The new helpers suspend the array first and then lock the array, Prepare to refactor from: mddev_lock/lock_nointr mddev_suspend ... mddev_resuem mddev_lock With: mddev_suspend_and_lock/lock_nointr ... mddev_unlock_and_resume After all the use cases is refactored, mddev_suspend/resume() will be removed. And mddev_suspend_and_lock() will also replace mddev_lock() for the case that the array will be reconfigured, in order to synchronize with io to prevent problems in many corner cases. Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Signed-off-by:
Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231010151958.145896-6-yukuai1@huaweicloud.com
-
Yu Kuai authored
Advantages for new apis: - reconfig_mutex is not required; - the weird logical that suspend array hold 'reconfig_mutex' for mddev_check_recovery() to update superblock is not needed; - the specail handling, 'pers->prepare_suspend', for raid456 is not needed; - It's safe to be called at any time once mddev is allocated, and it's designed to be used from slow path where array configuration is changed; - the new helpers is designed to be called before mddev_lock(), hence it support to be interrupted by user as well. Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Signed-off-by:
Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231010151958.145896-5-yukuai1@huaweicloud.com
-
Yu Kuai authored
Prepare to cleanup pers->prepare_suspend(), which is used to fix a deadlock in raid456 by returning error for io that is waiting for reshape to make progress in mddev_suspend(). This change will allow reshape to make progress while waiting for io to be done in mddev_suspend() in following patches. Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Signed-off-by:
Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231010151958.145896-4-yukuai1@huaweicloud.com
-
Yu Kuai authored
'conf->log' is set with 'reconfig_mutex' grabbed, however, readers are not procted, hence protect it with READ_ONCE/WRITE_ONCE to prevent reading abnormal values. Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Signed-off-by:
Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231010151958.145896-3-yukuai1@huaweicloud.com
-
Yu Kuai authored
Protect 'suspend_lo' and 'suspend_hi' with READ_ONCE/WRITE_ONCE to prevent reading abnormal values. Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Signed-off-by:
Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231010151958.145896-2-yukuai1@huaweicloud.com
-
- 09 Oct, 2023 1 commit
-
-
Yu Kuai authored
Currently, discad io is treated the same as normal write io, and for write behind case, io size is limited to: BIO_MAX_VECS * (PAGE_SIZE >> 9) For 0.5KB sector size and 4KB PAGE_SIZE, this is just 1MB. For consequence, if 'WriteMostly' is set to one of the underlying disks, then diskcard io will be splited into 1MB and it will take a long time for the diskcard to finish. Fix this problem by disable write behind for discard io. Reported-by:
Roman Mamedov <rm@romanrm.net> Closes: https://lore.kernel.org/all/6a1165f7-c792-c054-b8f0-1ad4f7b8ae01@ultracoder.org/Reported-and-tested-by:
Kirill Kirilenko <kirill@ultracoder.org> Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Signed-off-by:
Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231007112105.407449-1-yukuai1@huaweicloud.com
-
- 04 Oct, 2023 5 commits
-
-
Jan Höppner authored
The length values for volume label type and volume label id are hard-coded in several places. Provide defines for those values and replace all occurrences accordingly. Note that the length is defined and used, and not the size since the volume label type string and volume label id string are not nul-terminated. Signed-off-by:
Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by:
Stefan Haberland <sth@linux.ibm.com> Signed-off-by:
Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20230915131001.697070-4-sth@linux.ibm.comSigned-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jan Höppner authored
strncpy() is deprecated and needs to be replaced. The volume label information strings are not nul-terminated and strncpy() can simply be replaced with memcpy(). To enhance the readability of find_label() alongside this change, the following improvements are made: - Introduce the array dasd_vollabels[] containing all information necessary for the label detection. - Provide a helper function to obtain an index value corresponding to a volume label type. This allows the use of a switch statement to reduce indentation levels. - The 'temp' variable is used to check against valid volume label types. In the good case, this variable already contains the volume label type making it unnecessary to copy the information again from e.g. label->vol.vollbl. Remove the 'temp' variable and the second copy as all information are already provided. - Remove the 'found' variable and replace it with early returns Signed-off-by:
Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by:
Stefan Haberland <sth@linux.ibm.com> Signed-off-by:
Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20230915131001.697070-3-sth@linux.ibm.comSigned-off-by:
Jens Axboe <axboe@kernel.dk>
-
Jan Höppner authored
The data holding the volume label information is zeroed in case no valid volume label was found. Since the label information isn't used in that case, zeroing the data doesn't provide any value whatsoever. Remove the unnecessary memset() call accordingly. Signed-off-by:
Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by:
Stefan Haberland <sth@linux.ibm.com> Signed-off-by:
Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20230915131001.697070-2-sth@linux.ibm.comSigned-off-by:
Jens Axboe <axboe@kernel.dk>
-
Justin Stitt authored
`strncpy` is deprecated for use on NUL-terminated destination strings [1]. `aoe_iflist` is expected to be NUL-terminated which is evident by its use with string apis later on like `strspn`: | p = aoe_iflist + strspn(aoe_iflist, WHITESPACE); It also seems `aoe_iflist` does not need to be NUL-padded which means `strscpy` [2] is a suitable replacement due to the fact that it guarantees NUL-termination on the destination buffer while not unnecessarily NUL-padding. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Cc: Kees Cook <keescook@chromium.org> Cc: Xu Panda <xu.panda@zte.com.cn> Cc: Yang Yang <yang.yang29@zte.com> Signed-off-by:
Justin Stitt <justinstitt@google.com> Reviewed-by:
Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230919-strncpy-drivers-block-aoe-aoenet-c-v2-1-3d5d158410e9@google.comSigned-off-by:
Jens Axboe <axboe@kernel.dk>
-
Justin Stitt authored
`strncpy` is deprecated for use on NUL-terminated destination strings [1]. We should favor a more robust and less ambiguous interface. We expect that both `nullb->disk_name` and `disk->disk_name` be NUL-terminated: | snprintf(nullb->disk_name, sizeof(nullb->disk_name), | "%s", config_item_name(&dev->group.cg_item)); ... | pr_info("disk %s created\n", nullb->disk_name); It seems like NUL-padding may be required due to __assign_disk_name() utilizing a memcpy as opposed to a `str*cpy` api. | static inline void __assign_disk_name(char *name, struct gendisk *disk) | { | if (disk) | memcpy(name, disk->disk_name, DISK_NAME_LEN); | else | memset(name, 0, DISK_NAME_LEN); | } Then we go and print it with `__print_disk_name` which wraps `nullb_trace_disk_name()`. | #define __print_disk_name(name) nullb_trace_disk_name(p, name) This function obviously expects a NUL-terminated string. | const char *nullb_trace_disk_name(struct trace_seq *p, char *name) | { | const char *ret = trace_seq_buffer_ptr(p); | | if (name && *name) | trace_seq_printf(p, "disk=%s, ", name); | trace_seq_putc(p, 0); | | return ret; | } >From the above, we need both 1) a NUL-terminated string and 2) a NUL-padded string. So, let's use strscpy_pad() as per Kees' suggestion from v1. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Cc: Kees Cook <keescook@chromium.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by:
Justin Stitt <justinstitt@google.com> Reviewed-by:
Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230919-strncpy-drivers-block-null_blk-main-c-v3-1-10cf0a87a2c3@google.comSigned-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 03 Oct, 2023 1 commit
-
-
Joel Granados authored
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/) Remove sentinel element from cdrom_table Signed-off-by:
Joel Granados <j.granados@samsung.com> Link: https://lore.kernel.org/lkml/20231002-jag-sysctl_remove_empty_elem_drivers-v2-1-02dd0d46f71e@samsung.comReviewed-by:
Phillip Potter <phil@philpotter.co.uk> Link: https://lore.kernel.org/lkml/20231002214528.15529-1-phil@philpotter.co.ukSigned-off-by:
Phillip Potter <phil@philpotter.co.uk> Link: https://lore.kernel.org/r/20231002220203.15637-2-phil@philpotter.co.ukSigned-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 29 Sep, 2023 1 commit
-
-
Jens Axboe authored
Merge tag 'md-next-20230927' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.7/block Pull MD updates from Song: "1. Make rdev add/remove independent from daemon thread, by Yu Kuai; 2. Refactor code around quiesce() and mddev_suspend(), by Yu Kuai." * tag 'md-next-20230927' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md: replace deprecated strncpy with memcpy md/md-linear: Annotate struct linear_conf with __counted_by md: don't check 'mddev->pers' and 'pers->quiesce' from suspend_lo_store() md: don't check 'mddev->pers' from suspend_hi_store() md-bitmap: suspend array earlier in location_store() md-bitmap: remove the checking of 'pers->quiesce' from location_store() md: don't rely on 'mddev->pers' to be set in mddev_suspend() md: initialize 'writes_pending' while allocating mddev md: initialize 'active_io' while allocating mddev md: delay remove_and_add_spares() for read only array to md_start_sync() md: factor out a helper rdev_addable() from remove_and_add_spares() md: factor out a helper rdev_is_spare() from remove_and_add_spares() md: factor out a helper rdev_removeable() from remove_and_add_spares() md: delay choosing sync action to md_start_sync() md: factor out a helper to choose sync action from md_check_recovery() md: use separate work_struct for md_start_sync()
-
- 28 Sep, 2023 1 commit
-
-
Mariusz Tkaczyk authored
We don't need to lock device to reject not supported request in array_state_store(). No functional changes intended. There are differences between ioctl and sysfs handling during stopping. With this change, it will be easier to add additional steps which needs to be completed before mddev_lock() is taken. Reviewed-by:
Yu Kuai <yukuai3@huawei.com> Signed-off-by:
Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> Signed-off-by:
Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230928125517.12356-1-mariusz.tkaczyk@linux.intel.com
-
- 27 Sep, 2023 2 commits
-
-
Yu Kuai authored
Before this patch, the implementation is hacky and hard to understand: 1) md_seq_start set pos to 1; 2) md_seq_show found pos is 1, then print Personalities; 3) md_seq_next found pos is 1, then it update pos to the first mddev; 4) md_seq_show found pos is not 1 or 2, show mddev; 5) md_seq_next found pos is not 1 or 2, update pos to next mddev; 6) loop 4-5 until the last mddev, then md_seq_next update pos to 2; 7) md_seq_show found pos is 2, then print unused devices; 8) md_seq_next found pos is 2, stop; This patch remove the magic value and use seq_list_start/next/stop() directly, and move printing "Personalities" to md_seq_start(), "unsed devices" to md_seq_stop(): 1) md_seq_start print Personalities, and then set pos to first mddev; 2) md_seq_show show mddev; 3) md_seq_next update pos to next mddev; 4) loop 2-3 until the last mddev; 5) md_seq_stop print unsed devices; Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Signed-off-by:
Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230927061241.1552837-3-yukuai1@huaweicloud.com
-
Yu Kuai authored
There are no functional changes, prepare to simplify md_seq_ops in next patch. Signed-off-by:
Yu Kuai <yukuai3@huawei.com> Signed-off-by:
Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230927061241.1552837-2-yukuai1@huaweicloud.com
-
- 26 Sep, 2023 6 commits
-
-
Coly Li authored
This patch removes old code of badblocks_set(), badblocks_clear() and badblocks_check(), and make them as wrappers to call _badblocks_set(), _badblocks_clear() and _badblocks_check(). By this change now the badblock handing switch to the improved algorithm in _badblocks_set(), _badblocks_clear() and _badblocks_check(). This patch only contains the changes of old code deletion, new added code for the improved algorithms are in previous patches. Signed-off-by:
Coly Li <colyli@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Geliang Tang <geliang.tang@suse.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: NeilBrown <neilb@suse.de> Cc: Vishal L Verma <vishal.l.verma@intel.com> Cc: Xiao Ni <xni@redhat.com> Reviewed-by:
Xiao Ni <xni@redhat.com> Acked-by:
Geliang Tang <geliang.tang@suse.com> Link: https://lore.kernel.org/r/20230811170513.2300-7-colyli@suse.deSigned-off-by:
Jens Axboe <axboe@kernel.dk>
-
Coly Li authored
This patch rewrites badblocks_check() with similar coding style as _badblocks_set() and _badblocks_clear(). The only difference is bad blocks checking may handle multiple ranges in bad tables now. If a checking range covers multiple bad blocks range in bad block table, like the following condition (C is the checking range, E1, E2, E3 are three bad block ranges in bad block table), +------------------------------------+ | C | +------------------------------------+ +----+ +----+ +----+ | E1 | | E2 | | E3 | +----+ +----+ +----+ The improved badblocks_check() algorithm will divide checking range C into multiple parts, and handle them in 7 runs of a while-loop, +--+ +----+ +----+ +----+ +----+ +----+ +----+ |C1| | C2 | | C3 | | C4 | | C5 | | C6 | | C7 | +--+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ | E1 | | E2 | | E3 | +----+ +----+ +----+ And the start LBA and length of range E1 will be set as first_bad and bad_sectors for the caller. The return value rule is consistent for multiple ranges. For example if there are following bad block ranges in bad block table, Index No. Start Len Ack 0 400 20 1 1 500 50 1 2 650 20 0 the return value, first_bad, bad_sectors by calling badblocks_set() with different checking range can be the following values, Checking Start, Len Return Value first_bad bad_sectors 100, 100 0 N/A N/A 100, 310 1 400 10 100, 440 1 400 10 100, 540 1 400 10 100, 600 -1 400 10 100, 800 -1 400 10 In order to make code review easier, this patch names the improved bad block range checking routine as _badblocks_check() and does not change existing badblock_check() code yet. Later patch will delete old code of badblocks_check() and make it as a wrapper to call _badblocks_check(). Then the new added code won't mess up with the old deleted code, it will be more clear and easier for code review. Signed-off-by:
Coly Li <colyli@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Geliang Tang <geliang.tang@suse.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: NeilBrown <neilb@suse.de> Cc: Vishal L Verma <vishal.l.verma@intel.com> Cc: Xiao Ni <xni@redhat.com> Reviewed-by:
Xiao Ni <xni@redhat.com> Acked-by:
Geliang Tang <geliang.tang@suse.com> Link: https://lore.kernel.org/r/20230811170513.2300-6-colyli@suse.deSigned-off-by:
Jens Axboe <axboe@kernel.dk>
-
Coly Li authored
With the fundamental ideas and helper routines from badblocks_set() improvement, clearing bad block for multiple ranges is much simpler. With a similar idea from badblocks_set() improvement, this patch simplifies bad block range clearing into 5 situations. No matter how complicated the clearing condition is, we just look at the head part of clearing range with relative already set bad block range from the bad block table. The rested part will be handled in next run of the while-loop. Based on existing helpers added from badblocks_set(), this patch adds two more helpers, - front_clear() Clear the bad block range from bad block table which is front overlapped with the clearing range. - front_splitting_clear() Handle the condition that the clearing range hits middle of an already set bad block range from bad block table. Similar as badblocks_set(), the first part of clearing range is handled with relative bad block range which is find by prev_badblocks(). In most cases a valid hint is provided to prev_badblocks() to avoid unnecessary bad block table iteration. This patch also explains the detail algorithm code comments at beginning of badblocks.c, including which five simplified situations are categrized and how all the bad block range clearing conditions are handled by these five situations. Again, in order to make the code review easier and avoid the code changes mixed together, this patch does not modify badblock_clear() and implement another routine called _badblock_clear() for the improvement. Later patch will delete current code of badblock_clear() and make it as a wrapper to _badblock_clear(), so the code change can be much clear for review. Signed-off-by:
Coly Li <colyli@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Geliang Tang <geliang.tang@suse.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: NeilBrown <neilb@suse.de> Cc: Vishal L Verma <vishal.l.verma@intel.com> Cc: Xiao Ni <xni@redhat.com> Reviewed-by:
Xiao Ni <xni@redhat.com> Acked-by:
Geliang Tang <geliang.tang@suse.com> Link: https://lore.kernel.org/r/20230811170513.2300-5-colyli@suse.deSigned-off-by:
Jens Axboe <axboe@kernel.dk>
-
Coly Li authored
Recently I received a bug report that current badblocks code does not properly handle multiple ranges. For example, badblocks_set(bb, 32, 1, true); badblocks_set(bb, 34, 1, true); badblocks_set(bb, 36, 1, true); badblocks_set(bb, 32, 12, true); Then indeed badblocks_show() reports, 32 3 36 1 But the expected bad blocks table should be, 32 12 Obviously only the first 2 ranges are merged and badblocks_set() returns and ignores the rest setting range. This behavior is improper, if the caller of badblocks_set() wants to set a range of blocks into bad blocks table, all of the blocks in the range should be handled even the previous part encountering failure. The desired way to set bad blocks range by badblocks_set() is, - Set as many as blocks in the setting range into bad blocks table. - Merge the bad blocks ranges and occupy as less as slots in the bad blocks table. - Fast. Indeed the above proposal is complicated, especially with the following restrictions, - The setting bad blocks range can be acknowledged or not acknowledged. - The bad blocks table size is limited. - Memory allocation should be avoided. The basic idea of the patch is to categorize all possible bad blocks range setting combinations into much less simplified and more less special conditions. Inside badblocks_set() there is an implicit loop composed by jumping between labels 're_insert' and 'update_sectors'. No matter how large the setting bad blocks range is, in every loop just a minimized range from the head is handled by a pre-defined behavior from one of the categorized conditions. The logic is simple and code flow is manageable. The different relative layout between the setting range and existing bad block range are checked and handled (merge, combine, overwrite, insert) by the helpers in previous patch. This patch is to make all the helpers work together with the above idea. This patch only has the algorithm improvement for badblocks_set(). There are following patches contain improvement for badblocks_clear() and badblocks_check(). But the algorithm in badblocks_set() is fundamental and typical, other improvement in clear and check routines are based on all the helpers and ideas in this patch. In order to make the change to be more clear for code review, this patch does not directly modify existing badblocks_set(), and just add a new one named _badblocks_set(). Later patch will remove current existing badblocks_set() code and make it as a wrapper of _badblocks_set(). So the new added change won't be mixed with deleted code, the code review can be easier. Signed-off-by:
Coly Li <colyli@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Geliang Tang <geliang.tang@suse.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: NeilBrown <neilb@suse.de> Cc: Vishal L Verma <vishal.l.verma@intel.com> Cc: Wols Lists <antlists@youngman.org.uk> Cc: Xiao Ni <xni@redhat.com> Reviewed-by:
Xiao Ni <xni@redhat.com> Acked-by:
Geliang Tang <geliang.tang@suse.com> Link: https://lore.kernel.org/r/20230811170513.2300-4-colyli@suse.deSigned-off-by:
Jens Axboe <axboe@kernel.dk>
-
Coly Li authored
This patch adds several helper routines to improve badblock ranges handling. These helper routines will be used later in the improved version of badblocks_set()/badblocks_clear()/badblocks_check(). - Helpers prev_by_hint() and prev_badblocks() are used to find the bad range from bad table which the searching range starts at or after. - The following helpers are to decide the relative layout between the manipulating range and existing bad block range from bad table. - can_merge_behind() Return 'true' if the manipulating range can backward merge with the bad block range. - can_merge_front() Return 'true' if the manipulating range can forward merge with the bad block range. - can_combine_front() Return 'true' if two adjacent bad block ranges before the manipulating range can be merged. - overlap_front() Return 'true' if the manipulating range exactly overlaps with the bad block range in front of its range. - overlap_behind() Return 'true' if the manipulating range exactly overlaps with the bad block range behind its range. - can_front_overwrite() Return 'true' if the manipulating range can forward overwrite the bad block range in front of its range. - The following helpers are to add the manipulating range into the bad block table. Different routine is called with the specific relative layout between the manipulating range and other bad block range in the bad block table. - behind_merge() Merge the manipulating range with the bad block range behind its range, and return the number of merged length in unit of sector. - front_merge() Merge the manipulating range with the bad block range in front of its range, and return the number of merged length in unit of sector. - front_combine() Combine the two adjacent bad block ranges before the manipulating range into a larger one. - front_overwrite() Overwrite partial of whole bad block range which is in front of the manipulating range. The overwrite may split existing bad block range and generate more bad block ranges into the bad block table. - insert_at() Insert the manipulating range at a specific location in the bad block table. All the above helpers are used in later patches to improve the bad block ranges handling for badblocks_set()/badblocks_clear()/badblocks_check(). Signed-off-by:
Coly Li <colyli@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Geliang Tang <geliang.tang@suse.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: NeilBrown <neilb@suse.de> Cc: Vishal L Verma <vishal.l.verma@intel.com> Cc: Xiao Ni <xni@redhat.com> Reviewed-by:
Xiao Ni <xni@redhat.com> Acked-by:
Geliang Tang <geliang.tang@suse.com> Link: https://lore.kernel.org/r/20230811170513.2300-3-colyli@suse.deSigned-off-by:
Jens Axboe <axboe@kernel.dk>
-
Coly Li authored
This patch adds the following helper structure and routines into badblocks.h, - struct badblocks_context This structure is used in improved badblocks code for bad table iteration. - BB_END() The macro to calculate end LBA of a bad range record from bad table. - badblocks_full() and badblocks_empty() The inline routines to check whether bad table is full or empty. - set_changed() and clear_changed() The inline routines to set and clear 'changed' tag from struct badblocks. These new helper structure and routines can help to make the code more clear, they will be used in the improved badblocks code in following patches. Signed-off-by:
Coly Li <colyli@suse.de> Reviewed-by:
Xiao Ni <xni@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Geliang Tang <geliang.tang@suse.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: NeilBrown <neilb@suse.de> Cc: Vishal L Verma <vishal.l.verma@intel.com> Acked-by:
Geliang Tang <geliang.tang@suse.com> Link: https://lore.kernel.org/r/20230811170513.2300-2-colyli@suse.deSigned-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 25 Sep, 2023 1 commit
-
-
Justin Stitt authored
`strncpy` is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. There are three such strncpy uses that this patch addresses: The respective destination buffers are: 1) mddev->clevel 2) clevel 3) mddev->metadata_type We expect mddev->clevel to be NUL-terminated due to its use with format strings: | ret = sprintf(page, "%s\n", mddev->clevel); Furthermore, we can see that mddev->clevel is not expected to be NUL-padded as `md_clean()` merely set's its first byte to NULL -- not the entire buffer: | static void md_clean(struct mddev *mddev) | { | mddev->array_sectors = 0; | mddev->external_size = 0; | ... | mddev->level = LEVEL_NONE; | mddev->clevel[0] = 0; | ... A suitable replacement for this instance is `memcpy` as we know the number of bytes to copy and perform manual NUL-termination at a specified offset. This really decays to just a byte copy from one buffer to another. `strscpy` is also a considerable replacement but using `slen` as the length argument would result in truncation of the last byte unless something like `slen + 1` was provided which isn't the most idiomatic strscpy usage. For the next case, the destination buffer `clevel` is expected to be NUL-terminated based on its usage within kstrtol() which expects NUL-terminated strings. Note that, in context, this code removes a trailing newline which is seemingly not required as kstrtol() can handle trailing newlines implicitly. However, there exists further usage of clevel (or buf) that would also like to have the newline removed. All in all, with similar reasoning to the first case, let's just use memcpy as this is just a byte copy and NUL-termination is handled manually. The third and final case concerning `mddev->metadata_type` is more or less the same as the other two. We expect that it be NUL-terminated based on its usage with seq_printf: | seq_printf(seq, " super external:%s", | mddev->metadata_type); ... and we can surmise that NUL-padding isn't required either due to how it is handled in md_clean(): | static void md_clean(struct mddev *mddev) | { | ... | mddev->metadata_type[0] = 0; | ... So really, all these instances have precisely calculated lengths and purposeful NUL-termination so we can just use memcpy to remove ambiguity surrounding strncpy. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Signed-off-by:
Justin Stitt <justinstitt@google.com> Reviewed-by:
Kees Cook <keescook@chromium.org> Signed-off-by:
Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230925-strncpy-drivers-md-md-c-v1-1-2b0093b89c2b@google.com
-
- 22 Sep, 2023 1 commit
-
-
Kees Cook authored
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct linear_conf. Additionally, since the element count member must be set before accessing the annotated flexible array member, move its initialization earlier. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Song Liu <song@kernel.org> Cc: linux-raid@vger.kernel.org Signed-off-by:
Kees Cook <keescook@chromium.org> Reviewed-by:
Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by:
Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230915200328.never.064-kees@kernel.org
-