- 24 Dec, 2023 1 commit
-
-
Coly Li authored
If prev_badblocks() returns '-1', it means no valid badblocks record before the checking range. It doesn't make sense to check whether the input checking range is overlapped with the non-existed invalid front range. This patch checkes whether 'prev >= 0' is true before calling overlap_front(), to void such invalid operations. Fixes: 3ea3354c ("badblocks: improve badblocks_check() for multiple ranges handling") Reported-and-tested-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/nvdimm/3035e75a-9be0-4bc3-8d4a-6e52c207f277@leemhuis.info/ Cc: Dan Williams <dan.j.williams@intel.com> Cc: Geliang Tang <geliang.tang@suse.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: NeilBrown <neilb@suse.de> Cc: Vishal L Verma <vishal.l.verma@intel.com> Cc: Xiao Ni <xni@redhat.com> Link: https://lore.kernel.org/r/20231224002820.20234-1-colyli@suse.deSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
- 21 Dec, 2023 1 commit
-
-
git://git.infradead.org/nvmeJens Axboe authored
Pull NVMe fixes from Keith: "nvme fixes for Linux 6.7 - Revert a commit with improper sleep context (Keith) - Fix async event handling sleep context (Maurizio)" * tag 'nvme-6.7-2023-12-21' of git://git.infradead.org/nvme: nvme-pci: fix sleeping function called from interrupt context Revert "nvme-fc: fix race between error recovery and creating association"
-
- 19 Dec, 2023 2 commits
-
-
Maurizio Lombardi authored
the nvme_handle_cqe() interrupt handler calls nvme_complete_async_event() but the latter may call nvme_auth_stop() which is a blocking function. Sleeping functions can't be called in interrupt context BUG: sleeping function called from invalid context in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/15 Call Trace: <IRQ> __cancel_work_timer+0x31e/0x460 ? nvme_change_ctrl_state+0xcf/0x3c0 [nvme_core] ? nvme_change_ctrl_state+0xcf/0x3c0 [nvme_core] nvme_complete_async_event+0x365/0x480 [nvme_core] nvme_poll_cq+0x262/0xe50 [nvme] Fix the bug by moving nvme_auth_stop() to fw_act_work (executed by the nvme_wq workqueue) Fixes: f50fff73 ("nvme: implement In-Band authentication") Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
-
Keith Busch authored
The commit was identified to might sleep in invalid context and is blocking regression testing. This reverts commit ee6fdc50. Link: https://lore.kernel.org/linux-nvme/hkhl56n665uvc6t5d6h3wtx7utkcorw4xlwi7d2t2bnonavhe6@xaan6pu43ap6/ Link: https://lists.infradead.org/pipermail/linux-nvme/2023-December/043756.htmlReported-by: Daniel Wagner <dwagner@suse.de> Reported-by: Maurizio Lombardi <mlombard@redhat.com> Cc: Michael Liang <mliang@purestorage.com> Tested-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
-
- 07 Dec, 2023 4 commits
-
-
Jens Axboe authored
Merge tag 'md-fixes-20231207-1' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.7 Pull MD fix from Song: "This change from Yu Kuai fixes a bug reported in https://bugzilla.kernel.org/show_bug.cgi?id=218200" * tag 'md-fixes-20231207-1' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md: split MD_RECOVERY_NEEDED out of mddev_resume
-
Yu Kuai authored
New mddev_resume() calls are added to synchronize IO with array reconfiguration, however, this introduces a performance regression while adding it in md_start_sync(): 1) someone sets MD_RECOVERY_NEEDED first; 2) daemon thread grabs reconfig_mutex, then clears MD_RECOVERY_NEEDED and queues a new sync work; 3) daemon thread releases reconfig_mutex; 4) in md_start_sync a) check that there are spares that can be added/removed, then suspend the array; b) remove_and_add_spares may not be called, or called without really add/remove spares; c) resume the array, then set MD_RECOVERY_NEEDED again! Loop between 2 - 4, then mddev_suspend() will be called quite often, for consequence, normal IO will be quite slow. Fix this problem by don't set MD_RECOVERY_NEEDED again in md_start_sync(), hence the loop will be broken. Fixes: bc08041b ("md: suspend array in md_start_sync() if array need reconfiguration") Suggested-by: Song Liu <song@kernel.org> Reported-by: Janpieter Sollie <janpieter.sollie@edpnet.be> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218200Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231207020724.2797445-1-yukuai1@huaweicloud.com
-
git://git.infradead.org/nvmeJens Axboe authored
Pull NVMe fixes from Keith: "nvme fixes for Linux 6.7 - Proper nvme ctrl state setting (Keith) - Passthrough command optimization (Keith) - Spectre fix (Nitesh) - Kconfig clarifications (Shin'ichiro) - Frozen state deadlock fix (Bitao) - Power setting quirk (Georg)" * tag 'nvme-6.7-2023-12-7' of git://git.infradead.org/nvme: nvme-pci: Add sleep quirk for Kingston drives nvme: fix deadlock between reset and scan nvme: prevent potential spectre v1 gadget nvme: improve NVME_HOST_AUTH and NVME_TARGET_AUTH config descriptions nvme-ioctl: move capable() admin check to the end nvme: ensure reset state check ordering nvme: introduce helper function to get ctrl state
-
Georg Gottleuber authored
Some Kingston NV1 and A2000 are wasting a lot of power on specific TUXEDO platforms in s2idle sleep if 'Simple Suspend' is used. This patch applies a new quirk 'Force No Simple Suspend' to achieve a low power sleep without 'Simple Suspend'. Signed-off-by: Werner Sembach <wse@tuxedocomputers.com> Signed-off-by: Georg Gottleuber <ggo@tuxedocomputers.com> Cc: <stable@vger.kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
-
- 06 Dec, 2023 4 commits
-
-
Jens Axboe authored
Merge tag 'md-fixes-20231206' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.7 Pull MD fixes from Song: "This set from Yu Kuai fixes issues around sync_work, which was introduced in 6.7 kernels." * tag 'md-fixes-20231206' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md: fix stopping sync thread md: don't leave 'MD_RECOVERY_FROZEN' in error path of md_set_readonly() md: fix missing flush of sync_work
-
Yu Kuai authored
Currently sync thread is stopped from multiple contex: - idle_sync_thread - frozen_sync_thread - __md_stop_writes - md_set_readonly - do_md_stop And there are some problems: 1) sync_work is flushed while reconfig_mutex is grabbed, this can deadlock because the work function will grab reconfig_mutex as well. 2) md_reap_sync_thread() can't be called directly while md_do_sync() is not finished yet, for example, commit 130443d6 ("md: refactor idle/frozen_sync_thread() to fix deadlock"). 3) If MD_RECOVERY_RUNNING is not set, there is no need to stop sync_thread at all because sync_thread must not be registered. Factor out a helper stop_sync_thread(), so that above contex will behave the same. Fix 1) by flushing sync_work after reconfig_mutex is released, before waiting for sync_thread to be done; Fix 2) bt letting daemon thread to unregister sync_thread; Fix 3) by always checking MD_RECOVERY_RUNNING first. Fixes: db5e653d ("md: delay choosing sync action to md_start_sync()") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231205094215.1824240-4-yukuai1@huaweicloud.com
-
Yu Kuai authored
If md_set_readonly() failed, the array could still be read-write, however 'MD_RECOVERY_FROZEN' could still be set, which leave the array in an abnormal state that sync or recovery can't continue anymore. Hence make sure the flag is cleared after md_set_readonly() returns. Fixes: 88724bfa ("md: wait for pending superblock updates before switching to read-only") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231205094215.1824240-3-yukuai1@huaweicloud.com
-
Yu Kuai authored
Commit ac619781 ("md: use separate work_struct for md_start_sync()") use a new sync_work to replace del_work, however, stop_sync_thread() and __md_stop_writes() was trying to wait for sync_thread to be done, hence they should switch to use sync_work as well. Noted that md_start_sync() from sync_work will grab 'reconfig_mutex', hence other contex can't held the same lock to flush work, and this will be fixed in later patches. Fixes: ac619781 ("md: use separate work_struct for md_start_sync()") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231205094215.1824240-2-yukuai1@huaweicloud.com
-
- 04 Dec, 2023 6 commits
-
-
Bitao Hu authored
If controller reset occurs when allocating namespace, both nvme_reset_work and nvme_scan_work will hang, as shown below. Test Scripts: for ((t=1;t<=128;t++)) do nsid=`nvme create-ns /dev/nvme1 -s 14537724 -c 14537724 -f 0 -m 0 \ -d 0 | awk -F: '{print($NF);}'` nvme attach-ns /dev/nvme1 -n $nsid -c 0 done nvme reset /dev/nvme1 We will find that both nvme_reset_work and nvme_scan_work hung: INFO: task kworker/u249:4:17848 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u249:4 state:D stack: 0 pid:17848 ppid: 2 flags:0x00000028 Workqueue: nvme-reset-wq nvme_reset_work [nvme] Call trace: __switch_to+0xb4/0xfc __schedule+0x22c/0x670 schedule+0x4c/0xd0 blk_mq_freeze_queue_wait+0x84/0xc0 nvme_wait_freeze+0x40/0x64 [nvme_core] nvme_reset_work+0x1c0/0x5cc [nvme] process_one_work+0x1d8/0x4b0 worker_thread+0x230/0x440 kthread+0x114/0x120 INFO: task kworker/u249:3:22404 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u249:3 state:D stack: 0 pid:22404 ppid: 2 flags:0x00000028 Workqueue: nvme-wq nvme_scan_work [nvme_core] Call trace: __switch_to+0xb4/0xfc __schedule+0x22c/0x670 schedule+0x4c/0xd0 rwsem_down_write_slowpath+0x32c/0x98c down_write+0x70/0x80 nvme_alloc_ns+0x1ac/0x38c [nvme_core] nvme_validate_or_alloc_ns+0xbc/0x150 [nvme_core] nvme_scan_ns_list+0xe8/0x2e4 [nvme_core] nvme_scan_work+0x60/0x500 [nvme_core] process_one_work+0x1d8/0x4b0 worker_thread+0x260/0x440 kthread+0x114/0x120 INFO: task nvme:28428 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:nvme state:D stack: 0 pid:28428 ppid: 27119 flags:0x00000000 Call trace: __switch_to+0xb4/0xfc __schedule+0x22c/0x670 schedule+0x4c/0xd0 schedule_timeout+0x160/0x194 do_wait_for_common+0xac/0x1d0 __wait_for_common+0x78/0x100 wait_for_completion+0x24/0x30 __flush_work.isra.0+0x74/0x90 flush_work+0x14/0x20 nvme_reset_ctrl_sync+0x50/0x74 [nvme_core] nvme_dev_ioctl+0x1b0/0x250 [nvme_core] __arm64_sys_ioctl+0xa8/0xf0 el0_svc_common+0x88/0x234 do_el0_svc+0x7c/0x90 el0_svc+0x1c/0x30 el0_sync_handler+0xa8/0xb0 el0_sync+0x148/0x180 The reason for the hang is that nvme_reset_work occurs while nvme_scan_work is still running. nvme_scan_work may add new ns into ctrl->namespaces list after nvme_reset_work frozen all ns->q in ctrl->namespaces list. The newly added ns is not frozen, so nvme_wait_freeze will wait forever. Unfortunately, ctrl->namespaces_rwsem is held by nvme_reset_work, so nvme_scan_work will also wait forever. Now we are deadlocked! PROCESS1 PROCESS2 ============== ============== nvme_scan_work ... nvme_reset_work nvme_validate_or_alloc_ns nvme_dev_disable nvme_alloc_ns nvme_start_freeze down_write ... nvme_ns_add_to_ctrl_list ... up_write nvme_wait_freeze ... down_read nvme_alloc_ns blk_mq_freeze_queue_wait down_write Fix by marking the ctrl with say NVME_CTRL_FROZEN flag set in nvme_start_freeze and cleared in nvme_unfreeze. Then the scan can check it before adding the new namespace (under the namespaces_rwsem). Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com> Reviewed-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
-
Nitesh Shetty authored
This patch fixes the smatch warning, "nvmet_ns_ana_grpid_store() warn: potential spectre issue 'nvmet_ana_group_enabled' [w] (local cap)" Prevent the contents of kernel memory from being leaked to user space via speculative execution by using array_index_nospec. Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
-
Shin'ichiro Kawasaki authored
Currently two similar config options NVME_HOST_AUTH and NVME_TARGET_AUTH have almost same descriptions. It is confusing to choose them in menuconfig. Improve the descriptions to distinguish them. Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
-
Keith Busch authored
This can be an expensive call on some kernel configs. Move it to the end after checking the cheaper ways to determine if the command is allowed. Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
-
Keith Busch authored
A different CPU may be setting the ctrl->state value, so ensure proper barriers to prevent optimizing to a stale state. Normally it isn't a problem to observe the wrong state as it is merely advisory to take a quicker path during initialization and error recovery, but seeing an old state can report unexpected ENETRESET errors when a reset request was in fact successful. Reported-by: Minh Hoang <mh2022@meta.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Hannes Reinecke <hare@suse.de>
-
Keith Busch authored
The controller state is typically written by another CPU, so reading it should ensure no optimizations are taken. This is a repeated pattern in the driver, so start with adding a convenience function that returns the controller state with READ_ONCE(). Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
-
- 02 Dec, 2023 1 commit
-
-
Jens Axboe authored
Merge tag 'md-fixes-20231201-1' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.7 Pull MD fix from Song: "This change fixes issue with raid456 reshape." * tag 'md-fixes-20231201-1' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md/raid6: use valid sector values to determine if an I/O should wait on the reshape
-
- 01 Dec, 2023 4 commits
-
-
David Jeffery authored
During a reshape or a RAID6 array such as expanding by adding an additional disk, I/Os to the region of the array which have not yet been reshaped can stall indefinitely. This is from errors in the stripe_ahead_of_reshape function causing md to think the I/O is to a region in the actively undergoing the reshape. stripe_ahead_of_reshape fails to account for the q disk having a sector value of 0. By not excluding the q disk from the for loop, raid6 will always generate a min_sector value of 0, causing a return value which stalls. The function's max_sector calculation also uses min() when it should use max(), causing the max_sector value to always be 0. During a backwards rebuild this can cause the opposite problem where it allows I/O to advance when it should wait. Fixing these errors will allow safe I/O to advance in a timely manner and delay only I/O which is unsafe due to stripes in the middle of undergoing the reshape. Fixes: 486f6055 ("md/raid5: Check all disks in a stripe_head for reshape progress") Cc: stable@vger.kernel.org # v6.0+ Signed-off-by: David Jeffery <djeffery@redhat.com> Tested-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231128181233.6187-1-djeffery@redhat.com
-
git://git.infradead.org/nvmeJens Axboe authored
Pull NVMe fixes from Keith: "nvme fixes for Linux 6.7 - Invalid namespace identification error handling (Marizio Ewan, Keith) - Fabrics keep-alive tuning (Mark)" * tag 'nvme-6.7-2023-12-01' of git://git.infradead.org/nvme: nvme-core: check for too small lba shift nvme: check for valid nvme_identify_ns() before using it nvme-core: fix a memory leak in nvme_ns_info_from_identify() nvme: fine-tune sending of first keep-alive
-
Keith Busch authored
The block layer doesn't support logical block sizes smaller than 512 bytes. The nvme spec doesn't support that small either, but the driver isn't checking to make sure the device responded with usable data. Failing to catch this will result in a kernel bug, either from a division by zero when stacking, or a zero length bio. Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Keith Busch <kbusch@kernel.org>
-
Ming Lei authored
Request queue quiesce may interrupt flush sequence, and the original request may have been marked as COMPLETE, but can't get finished because of queue quiesce. This way is fine from driver viewpoint, because flush sequence is block layer concept, and it isn't related with driver. However, driver(such as dm-rq) can call blk_mq_queue_inflight() to count & drain inflight requests, then the wait & drain never gets done because the completed & not-finished flush request is counted as inflight. Fix this issue by not counting completed flush data request as inflight in case of quiesce. Cc: Mike Snitzer <snitzer@kernel.org> Cc: David Jeffery <djeffery@redhat.com> Cc: John Pittman <jpittman@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20231201085605.577730-1-ming.lei@redhat.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
- 29 Nov, 2023 1 commit
-
-
Bart Van Assche authored
It is nontrivial to derive the role of the two attribute groups in source file block/blk-sysfs.c. Hence add a comment that explains their roles. See also commit 6d85ebf9 ("blk-sysfs: add a new attr_group for blk_mq"). Cc: Christoph Hellwig <hch@lst.de> Cc: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20231128194019.72762-1-bvanassche@acm.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
- 28 Nov, 2023 2 commits
-
-
Yu Kuai authored
Commit 1b0a151c ("blk-core: use pr_warn_ratelimited() in bio_check_ro()") fix message storm by limit the rate, however, there will still be lots of message in the long term. Fix it better by warn once for each partition. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231128123027.971610-3-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
The .bd_inode field of block_device is used in IO fast path of blkdev_write_iter() and blkdev_llseek(), so it is more efficient to keep it into the 1st cacheline. .bd_openers is only touched in open()/close(), and .bd_size_lock is only for updating bdev capacity, which is in slow path too. So swap .bd_inode layout with .bd_openers & .bd_size_lock to move .bd_inode into the 1st cache line. Cc: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231128123027.971610-2-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
- 27 Nov, 2023 3 commits
-
-
Ewan D. Milne authored
When scanning namespaces, it is possible to get valid data from the first call to nvme_identify_ns() in nvme_alloc_ns(), but not from the second call in nvme_update_ns_info_block(). In particular, if the NSID becomes inactive between the two commands, a storage device may return a buffer filled with zero as per 4.1.5.1. In this case, we can get a kernel crash due to a divide-by-zero in blk_stack_limits() because ns->lba_shift will be set to zero. PID: 326 TASK: ffff95fec3cd8000 CPU: 29 COMMAND: "kworker/u98:10" #0 [ffffad8f8702f9e0] machine_kexec at ffffffff91c76ec7 #1 [ffffad8f8702fa38] __crash_kexec at ffffffff91dea4fa #2 [ffffad8f8702faf8] crash_kexec at ffffffff91deb788 #3 [ffffad8f8702fb00] oops_end at ffffffff91c2e4bb #4 [ffffad8f8702fb20] do_trap at ffffffff91c2a4ce #5 [ffffad8f8702fb70] do_error_trap at ffffffff91c2a595 #6 [ffffad8f8702fbb0] exc_divide_error at ffffffff928506e6 #7 [ffffad8f8702fbd0] asm_exc_divide_error at ffffffff92a00926 [exception RIP: blk_stack_limits+434] RIP: ffffffff92191872 RSP: ffffad8f8702fc80 RFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff95efa0c91800 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000001 RBP: 00000000ffffffff R8: ffff95fec7df35a8 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: ffff95fed33c09a8 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #8 [ffffad8f8702fce0] nvme_update_ns_info_block at ffffffffc06d3533 [nvme_core] #9 [ffffad8f8702fd18] nvme_scan_ns at ffffffffc06d6fa7 [nvme_core] This happened when the check for valid data was moved out of nvme_identify_ns() into one of the callers. Fix this by checking in both callers. Link: https://bugzilla.kernel.org/show_bug.cgi?id=218186 Fixes: 0dd6fff2 ("nvme: bring back auto-removal of deleted namespaces during sequential scan") Cc: stable@vger.kernel.org Signed-off-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
-
Maurizio Lombardi authored
In case of error, free the nvme_id_ns structure that was allocated by nvme_identify_ns(). Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
-
Mark O'Donovan authored
Keep-alive commands are sent half-way through the kato period. This normally works well but fails when the keep-alive system is started when we are more than half way through the kato. This can happen on larger setups or due to host delays. With this change we now time the initial keep-alive command from the controller initialisation time, rather than the keep-alive mechanism activation time. Signed-off-by: Mark O'Donovan <shiftee@posteo.net> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
-
- 24 Nov, 2023 1 commit
-
-
Markus Weippert authored
Commit 028ddcac ("bcache: Remove unnecessary NULL point check in node allocations") replaced IS_ERR_OR_NULL by IS_ERR. This leads to a NULL pointer dereference. BUG: kernel NULL pointer dereference, address: 0000000000000080 Call Trace: ? __die_body.cold+0x1a/0x1f ? page_fault_oops+0xd2/0x2b0 ? exc_page_fault+0x70/0x170 ? asm_exc_page_fault+0x22/0x30 ? btree_node_free+0xf/0x160 [bcache] ? up_write+0x32/0x60 btree_gc_coalesce+0x2aa/0x890 [bcache] ? bch_extent_bad+0x70/0x170 [bcache] btree_gc_recurse+0x130/0x390 [bcache] ? btree_gc_mark_node+0x72/0x230 [bcache] bch_btree_gc+0x5da/0x600 [bcache] ? cpuusage_read+0x10/0x10 ? bch_btree_gc+0x600/0x600 [bcache] bch_gc_thread+0x135/0x180 [bcache] The relevant code starts with: new_nodes[0] = NULL; for (i = 0; i < nodes; i++) { if (__bch_keylist_realloc(&keylist, bkey_u64s(&r[i].b->key))) goto out_nocoalesce; // ... out_nocoalesce: // ... for (i = 0; i < nodes; i++) if (!IS_ERR(new_nodes[i])) { // IS_ERR_OR_NULL before 028ddcac btree_node_free(new_nodes[i]); // new_nodes[0] is NULL rw_unlock(true, new_nodes[i]); } This patch replaces IS_ERR() by IS_ERR_OR_NULL() to fix this. Fixes: 028ddcac ("bcache: Remove unnecessary NULL point check in node allocations") Link: https://lore.kernel.org/all/3DF4A87A-2AC1-4893-AE5F-E921478419A9@suse.de/ Cc: stable@vger.kernel.org Cc: Zheng Wang <zyytlz.wz@163.com> Cc: Coly Li <colyli@suse.de> Signed-off-by: Markus Weippert <markus@gekmihesg.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 23 Nov, 2023 3 commits
-
-
Arnd Bergmann authored
When CONFIG_NVME_KEYRING is enabled as a loadable module, but the TCP host code is built-in, it fails to link: arm-linux-gnueabi-ld: drivers/nvme/host/tcp.o: in function `nvme_tcp_setup_ctrl': tcp.c:(.text+0x1940): undefined reference to `nvme_tls_psk_default' The problem is that the compile-time conditionals are inconsistent here, using a mix of #ifdef CONFIG_NVME_TCP_TLS, IS_ENABLED(CONFIG_NVME_TCP_TLS) and IS_ENABLED(CONFIG_NVME_KEYRING) checks, with CONFIG_NVME_KEYRING controlling whether the implementation is actually built. Change it to use IS_ENABLED(CONFIG_NVME_KEYRING) checks consistently, which should help readability and make it less error-prone. Combining it with the check for the ctrl->opts->tls flag lets the compiler drop all the TLS code in configurations without this feature, which also helps runtime behavior in addition to avoiding the link failure. To make it possible for the compiler to build the dead code, both the tls_handshake_timeout variable and the TLS specific members of nvme_tcp_queue need to be moved out of the #ifdef block as well, but at least the former of these gets optimized out again. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20231122224719.4042108-4-arnd@kernel.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Arnd Bergmann authored
When the NVME target code is built-in but its TCP frontend is a loadable module, enabling keyring support causes a link failure: x86_64-linux-ld: vmlinux.o: in function `nvmet_ports_make': configfs.c:(.text+0x100a2110): undefined reference to `nvme_keyring_id' The problem is that CONFIG_NVME_TARGET_TCP_TLS is a 'bool' symbol that depends on the tristate CONFIG_NVME_TARGET_TCP, so any 'select' from it inherits the state of the tristate symbol rather than the intended CONFIG_NVME_TARGET one that contains the actual call. The same thing is true for CONFIG_KEYS, which itself is required for NVME_KEYRING. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20231122224719.4042108-3-arnd@kernel.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Arnd Bergmann authored
In configurations without CONFIG_NVME_TARGET_TCP_TLS, the keyring code might not be available, or using it will result in a runtime failure: x86_64-linux-ld: vmlinux.o: in function `nvmet_ports_make': configfs.c:(.text+0x100a2110): undefined reference to `nvme_keyring_id' Add a check to ensure we only check the keyring if there is a chance of it being used, which avoids both the runtime and link-time problems. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20231122224719.4042108-2-arnd@kernel.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
- 22 Nov, 2023 2 commits
-
-
git://git.infradead.org/nvmeJens Axboe authored
Pull NVMe fixes from Keith: "nvme fixes for Linux 6.7 - TCP TLS fixes (Hannes) - Authentifaction fixes (Mark, Hannes) - Properly terminate target names (Christoph)" * tag 'nvme-6.7-2023-11-22' of git://git.infradead.org/nvme: nvme: move nvme_stop_keep_alive() back to original position nvmet-tcp: always initialize tls_handshake_tmo_work nvmet: nul-terminate the NQNs passed in the connect command nvme: blank out authentication fabrics options if not configured nvme: catch errors from nvme_configure_metadata() nvme-tcp: only evaluate 'tls' option if TLS is selected nvme-auth: set explanation code for failure2 msgs nvme-auth: unlock mutex in one place only
-
Hannes Reinecke authored
Stopping keep-alive not only stops the keep-alive workqueue, but also needs to be synchronized with I/O termination as we must not send a keep-alive command after all I/O had been terminated. So to avoid any regressions move the call to stop_keep_alive() back to its original position and ensure that keep-alive is correctly stopped failing to setup the admin queue. Fixes: 4733b65d ("nvme: start keep-alive after admin queue setup") Suggested-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
-
- 21 Nov, 2023 1 commit
-
-
Li Nan authored
If a socket is processing ioctl 'NBD_SET_SOCK', config->socks might be krealloc in nbd_add_socket(), and a garbage request is received now, a UAF may occurs. T1 nbd_ioctl __nbd_ioctl nbd_add_socket blk_mq_freeze_queue T2 recv_work nbd_read_reply sock_xmit krealloc config->socks def config->socks Pass nbd_sock to nbd_read_reply(). And introduce a new function sock_xmit_recv(), which differs from sock_xmit only in the way it get socket. ================================================================== BUG: KASAN: use-after-free in sock_xmit+0x525/0x550 Read of size 8 at addr ffff8880188ec428 by task kworker/u12:1/18779 Workqueue: knbd4-recv recv_work Call Trace: __dump_stack dump_stack+0xbe/0xfd print_address_description.constprop.0+0x19/0x170 __kasan_report.cold+0x6c/0x84 kasan_report+0x3a/0x50 sock_xmit+0x525/0x550 nbd_read_reply+0xfe/0x2c0 recv_work+0x1c2/0x750 process_one_work+0x6b6/0xf10 worker_thread+0xdd/0xd80 kthread+0x30a/0x410 ret_from_fork+0x22/0x30 Allocated by task 18784: kasan_save_stack+0x1b/0x40 kasan_set_track set_alloc_info __kasan_kmalloc __kasan_kmalloc.constprop.0+0xf0/0x130 slab_post_alloc_hook slab_alloc_node slab_alloc __kmalloc_track_caller+0x157/0x550 __do_krealloc krealloc+0x37/0xb0 nbd_add_socket +0x2d3/0x880 __nbd_ioctl nbd_ioctl+0x584/0x8e0 __blkdev_driver_ioctl blkdev_ioctl+0x2a0/0x6e0 block_ioctl+0xee/0x130 vfs_ioctl __do_sys_ioctl __se_sys_ioctl+0x138/0x190 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x61/0xc6 Freed by task 18784: kasan_save_stack+0x1b/0x40 kasan_set_track+0x1c/0x30 kasan_set_free_info+0x20/0x40 __kasan_slab_free.part.0+0x13f/0x1b0 slab_free_hook slab_free_freelist_hook slab_free kfree+0xcb/0x6c0 krealloc+0x56/0xb0 nbd_add_socket+0x2d3/0x880 __nbd_ioctl nbd_ioctl+0x584/0x8e0 __blkdev_driver_ioctl blkdev_ioctl+0x2a0/0x6e0 block_ioctl+0xee/0x130 vfs_ioctl __do_sys_ioctl __se_sys_ioctl+0x138/0x190 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x61/0xc6 Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230911023308.3467802-1-linan666@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
- 20 Nov, 2023 4 commits
-
-
Jan Höppner authored
In dasd_profile_start() the amount of requests on the device queue are counted. The access to the device queue is unprotected against concurrent access. With a lot of parallel I/O, especially with alias devices enabled, the device queue can change while dasd_profile_start() is accessing the queue. In the worst case this leads to a kernel panic due to incorrect pointer accesses. Fix this by taking the device lock before accessing the queue and counting the requests. Additionally the check for a valid profile data pointer can be done earlier to avoid unnecessary locking in a hot path. Cc: <stable@vger.kernel.org> Fixes: 4fa52aa7 ("[S390] dasd: add enhanced DASD statistics interface") Reviewed-by: Stefan Haberland <sth@linux.ibm.com> Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Signed-off-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20231025132437.1223363-3-sth@linux.ibm.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Muhammad Muzammil authored
resolve typing mistake from pimary to primary Signed-off-by: Muhammad Muzammil <m.muzzammilashraf@gmail.com> Link: https://lore.kernel.org/r/20231010043140.28416-1-m.muzzammilashraf@gmail.comSigned-off-by: Stefan Haberland <sth@linux.ibm.com> Link: https://lore.kernel.org/r/20231025132437.1223363-2-sth@linux.ibm.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Chengming Zhou authored
When CONFIG_BLK_DEV_NULL_BLK_FAULT_INJECTION is enabled, null_queue_rq() would return BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE for the request, which has been marked as MQ_RQ_IN_FLIGHT by blk_mq_start_request(). Then null_queue_rqs() put these requests in the rqlist, return back to the block layer core, which would try to queue them individually again, so the warning in blk_mq_start_request() triggered. Fix it by splitting the null_queue_rq() into two parts: the first is the preparation of request, the second is the handling of request. We put the blk_mq_start_request() after the preparation part, which may fail and return back to the block layer core. The throttling also belongs to the preparation part, so move it before blk_mq_start_request(). And change the return type of null_handle_cmd() to void, since it always return BLK_STS_OK now. Reported-by: <syzbot+fcc47ba2476570cbbeb0@syzkaller.appspotmail.com> Closes: https://lore.kernel.org/all/0000000000000e6aac06098aee0c@google.com/ Fixes: d78bfa13 ("block/null_blk: add queue_rqs() support") Suggested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Link: https://lore.kernel.org/r/20231120032521.1012037-1-chengming.zhou@linux.devSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Hannes Reinecke authored
The TLS handshake timeout work item should always be initialized to avoid a crash when cancelling the workqueue. Fixes: 675b453e ("nvmet-tcp: enable TLS handshake upcall") Suggested-by: Maurizio Lombardi <mlombard@redhat.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
-