- 11 Oct, 2023 4 commits
-
-
Yu Kuai authored
Advantages of the new APIs: - reconfig_mutex is not required; - the weird logic where suspending the array holds 'reconfig_mutex' so that md_check_recovery() can update the superblock is not needed; - the special handling, 'pers->prepare_suspend', for raid456 is not needed; - it's safe to call them at any time once mddev is allocated, and they're designed to be used from slow paths where the array configuration is changed; - the new helpers are designed to be called before mddev_lock(), hence they can be interrupted by the user as well. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231010151958.145896-5-yukuai1@huaweicloud.com
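A minimal usage sketch of the calling pattern described above; the helper names and the interruptible flag follow this series' description and are assumptions, not a verbatim copy of the final in-tree API:

    /* suspend without reconfig_mutex; the wait may be interrupted */
    err = __mddev_suspend(mddev, true);
    if (err)
            return err;

    err = mddev_lock(mddev);
    if (!err) {
            /* change the array configuration here */
            mddev_unlock(mddev);
    }

    __mddev_resume(mddev);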
-
Yu Kuai authored
Prepare to clean up pers->prepare_suspend(), which is used to fix a deadlock in raid456 by returning an error for IO that is waiting for reshape to make progress in mddev_suspend(). This change will allow reshape to make progress while waiting for IO to be done in mddev_suspend() in the following patches. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231010151958.145896-4-yukuai1@huaweicloud.com
-
Yu Kuai authored
'conf->log' is set with 'reconfig_mutex' grabbed; however, readers are not protected, hence protect it with READ_ONCE/WRITE_ONCE to prevent reading abnormal values. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231010151958.145896-3-yukuai1@huaweicloud.com
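A minimal sketch of the annotation pattern, assuming conf->log is the struct r5l_log pointer used by raid5-cache:

    /* writer, with reconfig_mutex held */
    WRITE_ONCE(conf->log, log);

    /* lockless reader */
    struct r5l_log *log = READ_ONCE(conf->log);

    if (!log)
            return;    /* the log may be absent or being torn down */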
-
Yu Kuai authored
Protect 'suspend_lo' and 'suspend_hi' with READ_ONCE/WRITE_ONCE to prevent reading abnormal values. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231010151958.145896-2-yukuai1@huaweicloud.com
-
- 09 Oct, 2023 1 commit
-
-
Yu Kuai authored
Currently, discard IO is treated the same as normal write IO, and for the write-behind case, IO size is limited to: BIO_MAX_VECS * (PAGE_SIZE >> 9) For a 0.5KB sector size and 4KB PAGE_SIZE, this is just 1MB. As a consequence, if 'WriteMostly' is set on one of the underlying disks, discard IO will be split into 1MB pieces and it will take a long time for the discard to finish. Fix this problem by disabling write-behind for discard IO. Reported-by: Roman Mamedov <rm@romanrm.net> Closes: https://lore.kernel.org/all/6a1165f7-c792-c054-b8f0-1ad4f7b8ae01@ultracoder.org/ Reported-and-tested-by: Kirill Kirilenko <kirill@ultracoder.org> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231007112105.407449-1-yukuai1@huaweicloud.com
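A hedged sketch of the fix's shape in raid1's write path; the variable names and the surrounding loop are simplified from the real code:

    bool is_discard = bio_op(bio) == REQ_OP_DISCARD;

    /* write-behind buys nothing for a discard (there is no payload to
     * copy into a behind bio) and would force splitting into 1MB pieces */
    if (!is_discard && test_bit(WriteMostly, &rdev->flags))
            write_behind = true;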
-
- 28 Sep, 2023 1 commit
-
-
Mariusz Tkaczyk authored
We don't need to lock the device to reject an unsupported request in array_state_store(). No functional changes intended. There are differences between ioctl and sysfs handling during stopping. With this change, it will be easier to add additional steps which need to be completed before mddev_lock() is taken. Reviewed-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230928125517.12356-1-mariusz.tkaczyk@linux.intel.com
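A sketch of the early rejection, assuming md.c's existing match_word() helper and array_states table; the rest of the function is elided:

    static ssize_t
    array_state_store(struct mddev *mddev, const char *buf, size_t len)
    {
            enum array_state st = match_word(buf, array_states);

            /* reject unsupported requests before mddev_lock() is taken */
            if (st == bad_word)
                    return -EINVAL;

            /* ... take mddev_lock() and apply the state change ... */
    }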
-
- 27 Sep, 2023 2 commits
-
-
Yu Kuai authored
Before this patch, the implementation is hacky and hard to understand: 1) md_seq_start sets pos to 1; 2) md_seq_show finds pos is 1, then prints Personalities; 3) md_seq_next finds pos is 1, then updates pos to the first mddev; 4) md_seq_show finds pos is not 1 or 2, shows the mddev; 5) md_seq_next finds pos is not 1 or 2, updates pos to the next mddev; 6) loop 4-5 until the last mddev, then md_seq_next updates pos to 2; 7) md_seq_show finds pos is 2, then prints unused devices; 8) md_seq_next finds pos is 2, stops. This patch removes the magic values and uses seq_list_start/next/stop() directly, moving the printing of "Personalities" to md_seq_start() and "unused devices" to md_seq_stop(): 1) md_seq_start prints Personalities, and then sets pos to the first mddev; 2) md_seq_show shows the mddev; 3) md_seq_next updates pos to the next mddev; 4) loop 2-3 until the last mddev; 5) md_seq_stop prints unused devices. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230927061241.1552837-3-yukuai1@huaweicloud.com
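A minimal sketch of the reworked ops, assuming the usual all_mddevs list and lock; per-mddev reference counting and the exact output are simplified:

    static void *md_seq_start(struct seq_file *seq, loff_t *pos)
    {
            if (*pos == 0)
                    seq_puts(seq, "Personalities : ...\n");  /* simplified */
            spin_lock(&all_mddevs_lock);
            return seq_list_start(&all_mddevs, *pos);
    }

    static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos)
    {
            return seq_list_next(v, &all_mddevs, pos);
    }

    static void md_seq_stop(struct seq_file *seq, void *v)
    {
            spin_unlock(&all_mddevs_lock);
            seq_puts(seq, "unused devices: ...\n");  /* simplified */
    }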
-
Yu Kuai authored
There are no functional changes, prepare to simplify md_seq_ops in next patch. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230927061241.1552837-2-yukuai1@huaweicloud.com
-
- 25 Sep, 2023 1 commit
-
-
Justin Stitt authored
`strncpy` is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. There are three such strncpy uses that this patch addresses: The respective destination buffers are: 1) mddev->clevel 2) clevel 3) mddev->metadata_type We expect mddev->clevel to be NUL-terminated due to its use with format strings: | ret = sprintf(page, "%s\n", mddev->clevel); Furthermore, we can see that mddev->clevel is not expected to be NUL-padded as `md_clean()` merely sets its first byte to NUL -- not the entire buffer: | static void md_clean(struct mddev *mddev) | { | mddev->array_sectors = 0; | mddev->external_size = 0; | ... | mddev->level = LEVEL_NONE; | mddev->clevel[0] = 0; | ... A suitable replacement for this instance is `memcpy`, as we know the number of bytes to copy and perform manual NUL-termination at a specified offset. This really decays to just a byte copy from one buffer to another. `strscpy` is also a viable replacement, but using `slen` as the length argument would result in truncation of the last byte unless something like `slen + 1` was provided, which isn't the most idiomatic strscpy usage. For the next case, the destination buffer `clevel` is expected to be NUL-terminated based on its usage within kstrtol(), which expects NUL-terminated strings. Note that, in context, this code removes a trailing newline, which is seemingly not required as kstrtol() can handle trailing newlines implicitly. However, there exists further usage of clevel (or buf) that would also like to have the newline removed. All in all, with similar reasoning to the first case, let's just use memcpy, as this is just a byte copy and NUL-termination is handled manually. The third and final case concerning `mddev->metadata_type` is more or less the same as the other two. We expect it to be NUL-terminated based on its usage with seq_printf: | seq_printf(seq, " super external:%s", | mddev->metadata_type); ... and we can surmise that NUL-padding isn't required either due to how it is handled in md_clean(): | static void md_clean(struct mddev *mddev) | { | ... | mddev->metadata_type[0] = 0; | ... So really, all these instances have precisely calculated lengths and purposeful NUL-termination, so we can just use memcpy to remove the ambiguity surrounding strncpy. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230925-strncpy-drivers-md-md-c-v1-1-2b0093b89c2b@google.com
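A minimal sketch of the first replacement, assuming slen is already clamped to the destination size; it mirrors the pattern the message describes rather than quoting the patch:

    /* a plain byte copy plus explicit NUL-termination replaces strncpy() */
    memcpy(mddev->clevel, buf, slen);
    mddev->clevel[slen] = '\0';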
-
- 22 Sep, 2023 20 commits
-
-
Kees Cook authored
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct linear_conf. Additionally, since the element count member must be set before accessing the annotated flexible array member, move its initialization earlier. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Song Liu <song@kernel.org> Cc: linux-raid@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230915200328.never.064-kees@kernel.org
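A sketch of the annotation; the struct linear_conf layout here is abbreviated and may not match drivers/md/md-linear.h field-for-field:

    struct linear_conf {
            struct rcu_head    rcu;
            sector_t           array_sectors;
            int                raid_disks;  /* must be set before disks[] is used */
            struct dev_info    disks[] __counted_by(raid_disks);
    };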
-
Yu Kuai authored
Now that mddev_suspend() doesn't rely on 'mddev->pers' to be set, it's safe to remove such checking. This will also allow the array to be suspended even before the array is run. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230825030956.1527023-8-yukuai1@huaweicloud.com
-
Yu Kuai authored
Now that mddev_suspend() doesn't rely on 'mddev->pers' to be set, it's safe to remove such checking. This will also allow the array to be suspended even before the array is run. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230825030956.1527023-7-yukuai1@huaweicloud.com
-
Yu Kuai authored
Now that mddev_suspend() doesn't rely on 'mddev->pers' to be set, it's safe to call mddev_suspend() earlier. This will also be helpful for refactoring mddev_suspend() later. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230825030956.1527023-6-yukuai1@huaweicloud.com
-
Yu Kuai authored
After commit 4d27e927344a ("md: don't quiesce in mddev_suspend()"), there is no need to check 'pers->quiesce' before calling mddev_suspend(). Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230825030956.1527023-5-yukuai1@huaweicloud.com
-
Yu Kuai authored
'active_io' used to be initialized while the array was running, and 'mddev->pers' is set while the array is running as well. Hence the caller must hold 'reconfig_mutex' and guarantee that 'mddev->pers' is set before calling mddev_suspend(). Now that 'active_io' is initialized when mddev is allocated, such a restriction doesn't exist anymore. In the meantime, follow-up patches will refactor mddev_suspend(), hence add a check for 'mddev->pers' to prevent a null-ptr-deref. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230825030956.1527023-4-yukuai1@huaweicloud.com
-
Yu Kuai authored
Currently 'writes_pending' is initialized in pers->run for raid1/5/10, and it's freed while deleting mddev, instead of in pers->free. pers->run can be called multiple times before mddev is deleted, and a helper mddev_init_writes_pending() is used to prevent 'writes_pending' from being initialized multiple times; this usage is safe but a little weird. On the other hand, 'writes_pending' is only initialized for raid1/5/10; however, it's used in the common layer, for example: array_state_store set_in_sync if (!mddev->in_sync) -> in_sync is used for all levels // access writes_pending There might be some implicit dependency that I haven't recognized to make sure 'writes_pending' can only be accessed for raid1/5/10, but there are no comments about that. By the way, it makes sense to initialize 'writes_pending' in the common layer because three levels already use it. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230825030956.1527023-3-yukuai1@huaweicloud.com
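A hedged sketch of doing the initialization in the common allocation path; no_op stands for md's empty percpu_ref release callback, and the rest of the init is elided:

    int mddev_init(struct mddev *mddev)
    {
            /* ... other common initialization ... */

            /* previously done per-personality via mddev_init_writes_pending() */
            if (percpu_ref_init(&mddev->writes_pending, no_op,
                                PERCPU_REF_ALLOW_REINIT, GFP_KERNEL) < 0)
                    return -ENOMEM;
            /* we want to start with the refcount at zero */
            percpu_ref_put(&mddev->writes_pending);

            return 0;
    }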
-
Yu Kuai authored
'active_io' is used for mddev_suspend() and it's initialized in md_run(); this imposes the restriction that 'reconfig_mutex' must be held and 'mddev->pers' must be set before calling mddev_suspend(). Initialize 'active_io' early so that mddev_suspend() is safe to call once mddev is allocated; this will be helpful for refactoring mddev_suspend() in the following patches. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230825030956.1527023-2-yukuai1@huaweicloud.com
-
Yu Kuai authored
Before this patch, for a read-only array: md_check_recovery() checks that 'MD_RECOVERY_NEEDED' is set, then it calls remove_and_add_spares() directly to try to remove and add rdevs from the array. After this patch: 1) md_check_recovery() checks that 'MD_RECOVERY_NEEDED' is set, that the worker 'sync_work' is not pending, and that there are rdevs that can be added or removed; then it queues new work md_start_sync(); 2) md_start_sync() calls remove_and_add_spares() and exits. This change makes sure that array reconfiguration is independent from the daemon, and it'll be much easier to synchronize it with IO, considering that IO may rely on the daemon thread to be done. Also fix a problem that 'pers->spare_active' is called after remove_and_add_spares(), which is the wrong order, because spares must be activated first, and then remove_and_add_spares() can add spares to the array, like the read-write case does: 1) the daemon sets 'MD_RECOVERY_RUNNING' and registers a new sync thread to do the recovery; 2) recovery is done, and md_do_sync() sets 'MD_RECOVERY_DONE' before returning; 3) the daemon calls 'pers->spare_active' and clears 'MD_RECOVERY_RUNNING'; 4) in the next round of the daemon, remove_and_add_spares() is called to add spares to the array. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230825031622.1530464-8-yukuai1@huaweicloud.com
-
Yu Kuai authored
There are no functional changes, just to make the code simpler and prepare to delay remove_and_add_spares() to md_start_sync(). Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230825031622.1530464-7-yukuai1@huaweicloud.com
-
Yu Kuai authored
There are no functional changes, just to make the code simpler and prepare to delay remove_and_add_spares() to md_start_sync(). Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230825031622.1530464-6-yukuai1@huaweicloud.com
-
Yu Kuai authored
There are no functional changes, just to make the code simpler and prepare to delay remove_and_add_spares() to md_start_sync(). Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230825031622.1530464-5-yukuai1@huaweicloud.com
-
Yu Kuai authored
Before this patch, for a read-write array: 1) md_check_recovery() finds that something needs to be done, and it'll try to grab 'reconfig_mutex'. The cases where md_check_recovery() needs to do something: - the array is not suspended; - the superblock needs to be updated; - 'MD_RECOVERY_NEEDED' or 'MD_RECOVERY_DONE' is set; - an unusual case related to safemode; 2) if 'MD_RECOVERY_RUNNING' is not set and 'MD_RECOVERY_NEEDED' is set, md_check_recovery() will try to choose a sync action, and then queue the work md_start_sync(); 3) md_start_sync() registers the sync_thread. After this patch, 1) is the same; 2) if 'MD_RECOVERY_RUNNING' is not set and 'MD_RECOVERY_NEEDED' is set, queue the work md_start_sync() directly; 3) md_start_sync() will try to choose a sync action, and then register the sync_thread. Because 'MD_RECOVERY_RUNNING' is cleared when the sync_thread is done, 2), 3) and md_do_sync() always run serially and can never be concurrent, so this change should not introduce any behavior change for now. Also fix a problem that md_start_sync() can clear 'MD_RECOVERY_RUNNING' without protection in the error path, which might affect the logic in md_check_recovery(). The advantage of this change is that array reconfiguration is now independent from the daemon, and it'll be much easier to synchronize it with IO, considering that IO may rely on the daemon thread to be done. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230825031622.1530464-4-yukuai1@huaweicloud.com
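A minimal sketch of the reworked hand-off; the real md_check_recovery() checks many more conditions, and sync_work is the work_struct introduced earlier in this series:

    /* in md_check_recovery(), with reconfig_mutex held */
    if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
        test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
            /* defer choosing the sync action to the workqueue */
            queue_work(md_misc_wq, &mddev->sync_work);

    /* md_start_sync() then chooses the action and registers sync_thread */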
-
Yu Kuai authored
There are no functional changes; on the one hand this makes the code cleaner, on the other hand it prevents the following checkpatch error in the next patch, which delays choosing the sync action to md_start_sync(). ERROR: do not use assignment in if condition + } else if ((spares = remove_and_add_spares(mddev, NULL))) { Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230825031622.1530464-3-yukuai1@huaweicloud.com
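The shape of the cleanup, sketched with the surrounding branches elided:

    /* before: assignment inside the if condition trips checkpatch */
    } else if ((spares = remove_and_add_spares(mddev, NULL))) {
            /* ... */
    }

    /* after: assign first, then test */
    spares = remove_and_add_spares(mddev, NULL);
    if (spares) {
            /* ... */
    }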
-
Yu Kuai authored
It's a little weird to borrow 'del_work' for md_start_sync(), so declare a new work_struct 'sync_work' for md_start_sync(). Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230825031622.1530464-2-yukuai1@huaweicloud.com
-
Chengming Zhou authored
Add batched mq_ops.queue_rqs() support in null_blk for testing. The implementation is straightforward since null_blk doesn't have commit_rqs(). We simply handle each request one by one; if errors are encountered, leave those requests in the passed-in list and return. There is about a 3.6% improvement in IOPS of fio/t/io_uring on null_blk with hw_queue_depth=256 on my test VM, from 1.09M to 1.13M. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230913151616.3164338-6-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
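A hedged sketch of the described approach; null_queue_rq_one() is a hypothetical per-request helper standing in for null_blk's real issue path:

    static void null_queue_rqs(struct request **rqlist)
    {
            struct request *requeue_list = NULL;

            do {
                    struct request *rq = rq_list_pop(rqlist);

                    /* on error, keep the request for the caller to retry */
                    if (null_queue_rq_one(rq) != BLK_STS_OK)
                            rq_list_add(&requeue_list, rq);
            } while (!rq_list_empty(*rqlist));

            *rqlist = requeue_list;
    }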
-
Chengming Zhou authored
Now we update the driver tags request table in blk_mq_get_driver_tag(), so drivers that support queue_rqs() have to update that inflight table themselves. Move it to blk_mq_start_request(), which is a better place: it's where we set up the deadline for request timeout checking, and it's just where the request becomes inflight. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230913151616.3164338-5-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
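A conceptual sketch of the move; the actual hunk in blk_mq_start_request() differs, and the tag-table publish below is an approximation of the inflight-table update the message refers to:

    void blk_mq_start_request(struct request *rq)
    {
            /* ... existing deadline setup for timeout checking ... */

            /* publish rq in the driver tag table now that it is inflight;
             * previously done in blk_mq_get_driver_tag() */
            if (rq->tag != BLK_MQ_NO_TAG)
                    WRITE_ONCE(rq->mq_hctx->tags->rqs[rq->tag], rq);
    }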
-
Chengming Zhou authored
Since active requests are already accounted when allocating driver tags, we can remove this limit now. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230913151616.3164338-4-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Chengming Zhou authored
Since the previous patch changed to only account active requests when we really allocate the driver tag, RQF_MQ_INFLIGHT can be removed and there is no double-accounting problem. 1. none elevator: the flush request will use the first pending request's driver tag and won't be double-accounted. 2. other elevators: the flush request will be accounted when allocating the driver tag at issue time, and will be unaccounted when it puts the driver tag. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230913151616.3164338-3-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Chengming Zhou authored
There is a limitation that batched queue_rqs() can't work on shared tags queues, since the accounting of active requests can't be done there. Currently we account active requests only in blk_mq_get_driver_tag(), which (with the none elevator) is not when we actually get the driver tag. To support batched queue_rqs() on shared tags queues, move the accounting of active requests to where we get the driver tag: 1. none elevator: blk_mq_get_tags() and blk_mq_get_tag() 2. other elevators: __blk_mq_alloc_driver_tag() This is clearer and matches the unaccounting side, which happens when we put the driver tag. The other good point is that we don't need the RQF_MQ_INFLIGHT trick anymore, which was used to avoid double-accounting flush requests. Now we only account when we actually get the driver tag, so all is good. We will remove RQF_MQ_INFLIGHT in the next patch. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230913151616.3164338-2-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
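A conceptual sketch of accounting at real tag-allocation time; the call sites are those listed in the message, but the surrounding code here is illustrative only:

    /* e.g. inside __blk_mq_alloc_driver_tag(), for non-none elevators */
    tag = blk_mq_get_tag(&data);
    if (tag != BLK_MQ_NO_TAG) {
            rq->tag = tag;
            /* account the active request exactly when the tag is taken */
            __blk_mq_inc_active_requests(rq->mq_hctx);
    }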
-
- 17 Sep, 2023 11 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull x86 fixes from Ingo Molnar: "Misc fixes: - Fix a UV boot crash - Skip spurious ENDBR generation on _THIS_IP_ - Fix ENDBR use in putuser() asm methods - Fix corner case boot crashes on 5-level paging - and fix a false positive WARNING on LTO kernels" * tag 'x86-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/purgatory: Remove LTO flags x86/boot/compressed: Reserve more memory for page tables x86/ibt: Avoid duplicate ENDBR in __put_user_nocheck*() x86/ibt: Suppress spurious ENDBR x86/platform/uv: Use alternate source for socket to node data
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull scheduler fixes from Ingo Molnar: "Fix a performance regression on large SMT systems, an Intel SMT4 balancing bug, and a topology setup bug on (Intel) hybrid processors" * tag 'sched-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sched: Restore the SD_ASYM_PACKING flag in the DIE domain sched/fair: Fix SMT4 group_smt_balance handling sched/fair: Optimize should_we_balance() for large SMT systems
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull objtool fix from Ingo Molnar: "Fix a cold-functions-related false-positive objtool warning that triggers on Clang" * tag 'objtool-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Fix _THIS_IP_ detection for cold functions
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds authored
Pull WARN fix from Ingo Molnar: "Fix a missing preempt-enable in the WARN() slowpath" * tag 'core-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: panic: Reenable preemption in WARN slowpath
-
Linus Torvalds authored
The choose_32_64() macros were added to deal with an odd inconsistency between the 32-bit and 64-bit layout of 'struct stat' way back when in commit a52dd971 ("vfs: de-crapify "cp_new_stat()" function"). Then a decade later Mikulas noticed that said inconsistency had been a mistake in the early x86-64 port, and shouldn't have existed in the first place. So commit 932aba1e ("stat: fix inconsistency between struct stat and struct compat_stat") removed the uses of the helpers. But the helpers remained around, unused. Get rid of them. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds authored
Pull smb client fixes from Steve French: "Three small SMB3 client fixes, one to improve a null check and two minor cleanups" * tag '6.6-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb3: fix some minor typos and repeated words smb3: correct places where ENOTSUPP is used instead of preferred EOPNOTSUPP smb3: move server check earlier when setting channel sequence number
-
git://git.samba.org/ksmbd
Linus Torvalds authored
Pull smb server fixes from Steve French: "Two ksmbd server fixes" * tag '6.6-rc1-ksmbd' of git://git.samba.org/ksmbd: ksmbd: fix passing freed memory 'aux_payload_buf' ksmbd: remove unneeded mark_inode_dirty in set_info_sec()
-
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Linus Torvalds authored
Pull ext4 fixes from Ted Ts'o: "Regression and bug fixes for ext4" * tag 'ext4_for_linus-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix rec_len verify error ext4: do not let fstrim block system suspend ext4: move setting of trimmed bit into ext4_try_to_trim_range() jbd2: Fix memory leak in journal_init_common() jbd2: Remove page size assumptions buffer: Make bh_offset() work for compound pages
-
Song Liu authored
-flto* implies -ffunction-sections. With LTO enabled, ld.lld generates multiple .text sections for purgatory.ro:

$ readelf -S purgatory.ro | grep " .text"
  [ 1] .text             PROGBITS  0000000000000000  00000040
  [ 7] .text.purgatory   PROGBITS  0000000000000000  000020e0
  [ 9] .text.warn        PROGBITS  0000000000000000  000021c0
  [13] .text.sha256_upda PROGBITS  0000000000000000  000022f0
  [15] .text.sha224_upda PROGBITS  0000000000000000  00002be0
  [17] .text.sha256_fina PROGBITS  0000000000000000  00002bf0
  [19] .text.sha224_fina PROGBITS  0000000000000000  00002cc0

This causes a WARNING from kexec_purgatory_setup_sechdrs():

WARNING: CPU: 26 PID: 110894 at kernel/kexec_file.c:919 kexec_load_purgatory+0x37f/0x390

Fix this by disabling LTO for purgatory. [ AFAICT, x86 is the only arch that supports both LTO and purgatory. ] We could also fix this with an explicit linker script to rejoin .text.* sections back into .text. However, given that the benefit of LTOing purgatory is small, simply disable the production of more .text.* sections for now. Fixes: b33fff07 ("x86, build: allow LTO to be selected") Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20230914170138.995606-1-song@kernel.org
-
Kirill A. Shutemov authored
The decompressor has a hard limit on the number of page tables it can allocate. This limit is defined at compile time and will cause boot failure if it is reached. The kernel is very strict and calculates the limit precisely for the worst-case scenario based on the current configuration. However, it is easy to forget to adjust the limit when a new use-case arises. The worst-case scenario is rarely encountered during sanity checks. In the case of enabling 5-level paging, a use-case was overlooked. The limit needs to be increased by one to accommodate the additional level. This oversight went unnoticed until Aaron attempted to run the kernel via kexec with 5-level paging and unaccepted memory enabled. Update worst-case calculations to include 5-level paging. To address this issue, let's allocate some extra space for page tables. 128K should be sufficient for any use-case. The logic can be simplified by using a single value for all kernel configurations. [ Also add a warning, should this memory run low - by Dave Hansen. ] Fixes: 34bbb000 ("x86/boot/compressed: Enable 5-level paging during decompression stage") Reported-by: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230915070221.10266-1-kirill.shutemov@linux.intel.com
-