- 12 Jan, 2023 2 commits
-
-
Adrian Huang authored
Commit fb541ca4 ("md: remove lock_bdev / unlock_bdev") removes wrappers for blkdev_get/blkdev_put. However, the uninitialized local static variable of pointer type 'claim_rdev' in md_import_device() is NULL, which leads to the following warning call trace: WARNING: CPU: 22 PID: 1037 at block/bdev.c:577 bd_prepare_to_claim+0x131/0x150 CPU: 22 PID: 1037 Comm: mdadm Not tainted 6.2.0-rc3+ #69 .. RIP: 0010:bd_prepare_to_claim+0x131/0x150 .. Call Trace: <TASK> ? _raw_spin_unlock+0x15/0x30 ? iput+0x6a/0x220 blkdev_get_by_dev.part.0+0x4b/0x300 md_import_device+0x126/0x1d0 new_dev_store+0x184/0x240 md_attr_store+0x80/0xf0 kernfs_fop_write_iter+0x128/0x1c0 vfs_write+0x2be/0x3c0 ksys_write+0x5f/0xe0 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc It turns out the md device cannot be used: md: could not open device unknown-block(259,0). md: md127 stopped. Fix the issue by declaring the local static variable of struct type and passing the pointer of the variable to blkdev_get_by_dev(). Fixes: fb541ca4 ("md: remove lock_bdev / unlock_bdev") Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org>
-
git://git.infradead.org/nvmeJens Axboe authored
Pull NVMe fixes from Christoph: "nvme fixes for Linux 6.2 - Identify quirks for Apple controllers (Hector Martin) - fix error handling in nvme_pci_enable (Tong Zhang) - refuse unprivileged passthrough on partitions (Christoph Hellwig) - fix MAINTAINERS to not match nvmem subsystem headers (Russell King)" * tag 'nvme-6.2-2023-01-12' of git://git.infradead.org/nvme: MAINTAINERS: stop nvme matching for nvmem files nvme: don't allow unprivileged passthrough on partitions nvme: replace the "bool vec" arguments with flags in the ioctl path nvme: remove __nvme_ioctl nvme-pci: fix error handling in nvme_pci_enable() nvme-pci: add NVME_QUIRK_IDENTIFY_CNS quirk to Apple T2 controllers nvme-apple: add NVME_QUIRK_IDENTIFY_CNS quirk to fix regression
-
- 10 Jan, 2023 7 commits
-
-
Russell King (Oracle) authored
The nvme patterns detect all include files starting with nvme, which also picks up the nvmem subsystem header files. Fix this by using a more specific pattern. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [hch: switched to a purely inclusive pattern instead of excluding nvmem*] Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Christoph Hellwig authored
Passthrough commands can always access the entire device, and thus submitting them on partitions is an privelege escalation. In hindsight we should have never allowed any passthrough commands on partitions, but it's probably too late to change that decision now. Fixes: e4fbcf32 ("nvme: identify-namespace without CAP_SYS_ADMIN") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
Christoph Hellwig authored
To prepare for passing down more information, replace the boolean vec argument with a more extensible flags one. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
Christoph Hellwig authored
Open code __nvme_ioctl in the two callers to make future changes that pass down additional paramters in the ioctl path easier. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
Tong Zhang authored
There are two issues in nvme_pci_enable(): 1) If pci_alloc_irq_vectors() fails, device is left enabled. Fix this by adding a goto disable statement. 2) nvme_pci_configure_admin_queue could return -ENODEV, in this case, we will need to free IRQ properly. Otherwise the following warning could be triggered: [ 5.286752] WARNING: CPU: 0 PID: 33 at kernel/irq/irqdomain.c:253 irq_domain_remove+0x12d/0x140 [ 5.290547] Call Trace: [ 5.290626] <TASK> [ 5.290695] msi_remove_device_irq_domain+0xc9/0xf0 [ 5.290843] msi_device_data_release+0x15/0x80 [ 5.290978] release_nodes+0x58/0x90 [ 5.293788] WARNING: CPU: 0 PID: 33 at kernel/irq/msi.c:276 msi_device_data_release+0x76/0x80 [ 5.297573] Call Trace: [ 5.297651] <TASK> [ 5.297719] release_nodes+0x58/0x90 [ 5.297831] devres_release_all+0xef/0x140 [ 5.298339] device_unbind_cleanup+0x11/0xc0 [ 5.298479] really_probe+0x296/0x320 Fixes: a6ee7f19 ("nvme-pci: call nvme_pci_configure_admin_queue from nvme_pci_enable") Co-developed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Tong Zhang <ztong0001@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Hector Martin authored
This mirrors the quirk added to Apple Silicon controllers in apple.c. These controllers do not support the Active NS ID List command and behave identically to the SoC version judging by existing user reports/syslogs, so will need the same fix. This quirk reverts back to NVMe 1.0 behavior and disables the broken commands. Fixes: 811f4de0 ("nvme: avoid fallback to sequential scan due to transient issues") Signed-off-by: Hector Martin <marcan@marcan.st> Tested-by: Orlando Chamberlain <orlandoch.dev@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Hector Martin authored
From the get-go, this driver and the ANS syslog have been complaining about namespace identification. In 6.2-rc1, commit 811f4de0 ("nvme: avoid fallback to sequential scan due to transient issues") regressed the driver by no longer allowing fallback to sequential namespace scans, leaving us with no namespaces. It turns out that the real problem is that this controller claiming NVMe 1.1 compat is treating the CNS field as a binary field, as in NVMe 1.0. This already has a quirk, NVME_QUIRK_IDENTIFY_CNS, so set it for the controller to fix all this nonsense (including other errors triggered by other CNS commands). Fixes: 811f4de0 ("nvme: avoid fallback to sequential scan due to transient issues") Fixes: 5bd2927a ("nvme-apple: Add initial Apple SoC NVMe driver") Signed-off-by: Hector Martin <marcan@marcan.st> Reviewed-by: Sven Peter <sven@svenpeter.dev> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 09 Jan, 2023 1 commit
-
-
Tejun Heo authored
Dan reports the following smatch detected the following: block/blk-cgroup.c:1863 blkcg_schedule_throttle() warn: sleeping in atomic context caused by blkcg_schedule_throttle() calling blk_put_queue() in an non-sleepable context. blk_put_queue() acquired might_sleep() in 63f93fd6 ("block: mark blk_put_queue as potentially blocking") which transferred the might_sleep() from blk_free_queue(). blk_free_queue() acquired might_sleep() in e8c7d14a ("block: revert back to synchronous request_queue removal") while turning request_queue removal synchronous. However, this isn't necessary as nothing in the free path actually requires sleeping. It's pretty unusual to require a sleeping context in a put operation and it's not needed in the first place. Let's drop it. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lkml.kernel.org/r/Y7g3L6fntnTtOm63@kili Cc: Christoph Hellwig <hch@lst.de> Cc: Luis Chamberlain <mcgrof@kernel.org> Fixes: e8c7d14a ("block: revert back to synchronous request_queue removal") # v5.9+ Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/Y7iFwjN+XzWvLv3y@slm.duckdns.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
- 05 Jan, 2023 1 commit
-
-
Paul E. McKenney authored
Now that the SRCU Kconfig option is unconditionally selected, there is no longer any point in selecting it. Therefore, remove the "select SRCU" Kconfig statements. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 04 Jan, 2023 6 commits
-
-
Jens Axboe authored
This reverts commit f40eb998. There are apparently still users out there of this driver. While we'd love to remove it to ease the maintenance burden, let's reinstate it for now until better (userspace) solutions can be developed. Link: https://lore.kernel.org/lkml/20230104190115.ceglfefco475ev6c@pali/Reported-by: Pali Rohár <pali@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
This reverts commit 85d6ce58. We're reinstating the pktcdvd driver, which needs this API. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
This reverts commit db1c7d77. We're reinstating the pktcdvd driver, which needs this API. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
Most of control command handlers may sleep, so return -EAGAIN in case of IO_URING_F_NONBLOCK to defer the handling into io wq context. Fixes: 71f28f31 ("ublk_drv: add io_uring based userspace block driver") Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230104133235.836536-1-ming.lei@redhat.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
If we split a bio marked with REQ_NOWAIT, then we can trigger spurious EAGAIN if constituent parts of that split bio end up failing request allocations. Parts will complete just fine, but just a single failure in one of the chained bios will yield an EAGAIN final result for the parent bio. Return EAGAIN early if we end up needing to split such a bio, which allows for saner recovery handling. Cc: stable@vger.kernel.org # 5.15+ Link: https://github.com/axboe/liburing/issues/766Reported-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
This can't happen right now, but in preparation for allowing bio_split_to_limits() returning NULL if it ended the bio, check for it in all the callers. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 29 Dec, 2022 1 commit
-
-
git://git.infradead.org/nvmeJens Axboe authored
Pull NVMe fixes from Christoph: "nvme fixes for Linux 6.2 - fix various problems in handling the Command Supported and Effects log (Christoph Hellwig) - don't allow unprivileged passthrough of commands that don't transfer data but modify logical block content (Christoph Hellwig) - add a features and quirks policy document (Christoph Hellwig) - fix some really nasty code that was correct but made smatch complain (Sagi Grimberg)" * tag 'nvme-6.2-2022-12-29' of git://git.infradead.org/nvme: nvme-auth: fix smatch warning complaints nvme: consult the CSE log page for unprivileged passthrough nvme: also return I/O command effects from nvme_command_effects nvmet: don't defer passthrough commands with trivial effects to the workqueue nvmet: set the LBCC bit for commands that modify data nvmet: use NVME_CMD_EFFECTS_CSUPP instead of open coding it nvme: fix the NVME_CMD_EFFECTS_CSE_MASK definition docs, nvme: add a feature and quirk policy document
-
- 28 Dec, 2022 8 commits
-
-
Sagi Grimberg authored
When initializing auth context, there may be no secrets passed by the user. Make return code explicit when returning successfully. smatch warnings: drivers/nvme/host/auth.c:950 nvme_auth_init_ctrl() warn: missing error code? 'ret' Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Christoph Hellwig authored
Commands like Write Zeros can change the contents of a namespaces without actually transferring data. To protect against this, check the Commands Supported and Effects log is supported by the controller for any unprivileg command passthrough and refuse unprivileged passthrough if the command has any effects that can change data or metadata. Note: While the Commands Support and Effects log page has only been mandatory since NVMe 2.0, it is widely supported because Windows requires it for any command passthrough from userspace. Fixes: e4fbcf32 ("nvme: identify-namespace without CAP_SYS_ADMIN") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
-
Christoph Hellwig authored
To be able to use the Commands Supported and Effects Log for allowing unprivileged passtrough, it needs to be corretly reported for I/O commands as well. Return the I/O command effects from nvme_command_effects, and also add a default list of effects for the NVM command set. For other command sets, the Commands Supported and Effects log is required to be present already. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
-
Christoph Hellwig authored
Mask out the "Command Supported" and "Logical Block Content Change" bits and only defer execution of commands that have non-trivial effects to the workqueue for synchronous execution. This allows to execute admin commands asynchronously on controllers that provide a Command Supported and Effects log page, and will keep allowing to execute Write commands asynchronously once command effects on I/O commands are taken into account. Fixes: c1fef73f ("nvmet: add passthru code to process commands") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
-
Christoph Hellwig authored
Write, Write Zeroes, Zone append and a Zone Reset through Zone Management Send modify the logical block content of a namespace, so make sure the LBCC bit is reported for them. Fixes: b5d0b38c0475 ("nvmet: add Command Set Identifier support") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
Christoph Hellwig authored
Use NVME_CMD_EFFECTS_CSUPP instead of open coding it and assign a single value to multiple array entries instead of repeated assignments. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
Christoph Hellwig authored
3 << 16 does not generate the correct mask for bits 16, 17 and 18. Use the GENMASK macro to generate the correct mask instead. Fixes: 84fef62d ("nvme: check admin passthru command effects") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
-
Christoph Hellwig authored
This adds a document about what specification features are supported by the Linux NVMe driver, and what qualifies for a quirk if an implementation has problems following the specification. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
-
- 26 Dec, 2022 3 commits
-
-
Christoph Hellwig authored
Update the core sqsize field in addition to the PCIe-specific q_depth field as the core tagset allocation helpers rely on it. Fixes: 0da7feaa ("nvme-pci: use the tagset alloc/free helpers") Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Hugh Dickins <hughd@google.com> Link: https://lore.kernel.org/r/20221225103234.226794-3-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
While the CAP.MQES field in NVMe is a 0s based filed with a natural one off, we also need to account for the queue wrap condition and fix undo the one off again in nvme_alloc_io_tag_set. This was never properly done by the fabrics drivers, but they don't seem to care because there is no actual physical queue that can wrap around, but it became a problem when converting over the PCIe driver. Also add back the BLK_MQ_MAX_DEPTH check that was lost in the same commit. Fixes: 0da7feaa ("nvme-pci: use the tagset alloc/free helpers") Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Hugh Dickins <hughd@google.com> Link: https://lore.kernel.org/r/20221225103234.226794-2-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
Commit 64dc8c73 ("block, bfq: fix possible uaf for 'bfqq->bic'") will access 'bic->bfqq' in bic_set_bfqq(), however, bfq_exit_icq_bfqq() can free bfqq first, and then call bic_set_bfqq(), which will cause uaf. Fix the problem by moving bfq_exit_bfqq() behind bic_set_bfqq(). Fixes: 64dc8c73 ("block, bfq: fix possible uaf for 'bfqq->bic'") Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20221226030605.1437081-1-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
- 22 Dec, 2022 2 commits
-
-
git://git.infradead.org/nvmeJens Axboe authored
Pull NVMe fixes from Christoph: "nvme fixes for Linux 6.2 - fix doorbell buffer value endianness (Klaus Jensen) - fix Linux vs NVMe page size mismatch (Keith Busch) - fix a potential use memory access beyong the allocation limit (Keith Busch) - fix a multipath vs blktrace NULL pointer dereference (Yanjun Zhang)" * tag 'nvme-6.2-2022-12-22' of git://git.infradead.org/nvme: nvme: fix multipath crash caused by flush request when blktrace is enabled nvme-pci: fix page size checks nvme-pci: fix mempool alloc size nvme-pci: fix doorbell buffer value endianness
-
Yanjun Zhang authored
The flush request initialized by blk_kick_flush has NULL bio, and it may be dealt with nvme_end_req during io completion. When blktrace is enabled, nvme_trace_bio_complete with multipath activated trying to access NULL pointer bio from flush request results in the following crash: [ 2517.831677] BUG: kernel NULL pointer dereference, address: 000000000000001a [ 2517.835213] #PF: supervisor read access in kernel mode [ 2517.838724] #PF: error_code(0x0000) - not-present page [ 2517.842222] PGD 7b2d51067 P4D 0 [ 2517.845684] Oops: 0000 [#1] SMP NOPTI [ 2517.849125] CPU: 2 PID: 732 Comm: kworker/2:1H Kdump: loaded Tainted: G S 5.15.67-0.cl9.x86_64 #1 [ 2517.852723] Hardware name: XFUSION 2288H V6/BC13MBSBC, BIOS 1.13 07/27/2022 [ 2517.856358] Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp] [ 2517.859993] RIP: 0010:blk_add_trace_bio_complete+0x6/0x30 [ 2517.863628] Code: 1f 44 00 00 48 8b 46 08 31 c9 ba 04 00 10 00 48 8b 80 50 03 00 00 48 8b 78 50 e9 e5 fe ff ff 0f 1f 44 00 00 41 54 49 89 f4 55 <0f> b6 7a 1a 48 89 d5 e8 3e 1c 2b 00 48 89 ee 4c 89 e7 5d 89 c1 ba [ 2517.871269] RSP: 0018:ff7f6a008d9dbcd0 EFLAGS: 00010286 [ 2517.875081] RAX: ff3d5b4be00b1d50 RBX: 0000000002040002 RCX: ff3d5b0a270f2000 [ 2517.878966] RDX: 0000000000000000 RSI: ff3d5b0b021fb9f8 RDI: 0000000000000000 [ 2517.882849] RBP: ff3d5b0b96a6fa00 R08: 0000000000000001 R09: 0000000000000000 [ 2517.886718] R10: 000000000000000c R11: 000000000000000c R12: ff3d5b0b021fb9f8 [ 2517.890575] R13: 0000000002000000 R14: ff3d5b0b021fb1b0 R15: 0000000000000018 [ 2517.894434] FS: 0000000000000000(0000) GS:ff3d5b42bfc80000(0000) knlGS:0000000000000000 [ 2517.898299] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2517.902157] CR2: 000000000000001a CR3: 00000004f023e005 CR4: 0000000000771ee0 [ 2517.906053] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2517.909930] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 2517.913761] PKRU: 55555554 [ 2517.917558] Call Trace: [ 2517.921294] <TASK> [ 2517.924982] nvme_complete_rq+0x1c3/0x1e0 [nvme_core] [ 2517.928715] nvme_tcp_recv_pdu+0x4d7/0x540 [nvme_tcp] [ 2517.932442] nvme_tcp_recv_skb+0x4f/0x240 [nvme_tcp] [ 2517.936137] ? nvme_tcp_recv_pdu+0x540/0x540 [nvme_tcp] [ 2517.939830] tcp_read_sock+0x9c/0x260 [ 2517.943486] nvme_tcp_try_recv+0x65/0xa0 [nvme_tcp] [ 2517.947173] nvme_tcp_io_work+0x64/0x90 [nvme_tcp] [ 2517.950834] process_one_work+0x1e8/0x390 [ 2517.954473] worker_thread+0x53/0x3c0 [ 2517.958069] ? process_one_work+0x390/0x390 [ 2517.961655] kthread+0x10c/0x130 [ 2517.965211] ? set_kthread_struct+0x40/0x40 [ 2517.968760] ret_from_fork+0x1f/0x30 [ 2517.972285] </TASK> To avoid this situation, add a NULL check for req->bio before calling trace_block_bio_complete. Signed-off-by: Yanjun Zhang <zhangyanjun@cestc.cn> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 21 Dec, 2022 3 commits
-
-
Keith Busch authored
The size allocated out of the dma pool is at most NVME_CTRL_PAGE_SIZE, which may be smaller than the PAGE_SIZE. Fixes: c61b82c7 ("nvme-pci: fix PRP pool size") Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Keith Busch authored
Convert the max size to bytes to match the units of the divisor that calculates the worst-case number of PRP entries. The result is used to determine how many PRP Lists are required. The code was previously rounding this to 1 list, but we can require 2 in the worst case. In that scenario, the driver would corrupt memory beyond the size provided by the mempool. While unlikely to occur (you'd need a 4MB in exactly 127 phys segments on a queue that doesn't support SGLs), this memory corruption has been observed by kfence. Cc: Jens Axboe <axboe@kernel.dk> Fixes: 943e942e ("nvme-pci: limit max IO size and segments to avoid high order allocations") Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Klaus Jensen authored
When using shadow doorbells, the event index and the doorbell values are written to host memory. Prior to this patch, the values written would erroneously be written in host endianness. This causes trouble on big-endian platforms. Fix this by adding missing endian conversions. This issue was noticed by Guenter while testing various big-endian platforms under QEMU[1]. A similar fix required for hw/nvme in QEMU is up for review as well[2]. [1]: https://lore.kernel.org/qemu-devel/20221209110022.GA3396194@roeck-us.net/ [2]: https://lore.kernel.org/qemu-devel/20221212114409.34972-4-its@irrelevant.dk/ Fixes: f9f38e33 ("nvme: improve performance for virtual NVMe devices") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 16 Dec, 2022 1 commit
-
-
Jens Axboe authored
Since commit: b99182c5 ("bio: add pcpu caching for non-polling bio_put") we support bio caching for IRQ based IO as well, hence there's no need to manually clear REQ_ALLOC_CACHE if we disable polling on a request. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 15 Dec, 2022 2 commits
-
-
Ming Lei authored
For blk-mq, queue release handler is usually called after blk_mq_freeze_queue_wait() returns. However, the q_usage_counter->release() handler may not be run yet at that time, so this can cause a use-after-free. Fix the issue by moving percpu_ref_exit() into blk_free_queue_rcu(). Since ->release() is called with rcu read lock held, it is agreed that the race should be covered in caller per discussion from the two links. Reported-by: Zhang Wensheng <zhangwensheng@huaweicloud.com> Reported-by: Zhong Jinghua <zhongjinghua@huawei.com> Link: https://lore.kernel.org/linux-block/Y5prfOjyyjQKUrtH@T590/T/#u Link: https://lore.kernel.org/lkml/Y4%2FmzMd4evRg9yDi@fedora/ Cc: Hillf Danton <hdanton@sina.com> Cc: Yu Kuai <yukuai3@huawei.com> Cc: Dennis Zhou <dennis@kernel.org> Fixes: 2b0d3d3e ("percpu_ref: reduce memory footprint of percpu_ref in fast path") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20221215021629.74870-1-ming.lei@redhat.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Yuwei Guan authored
The 'bfqd->num_groups_with_pending_reqs' is used when CONFIG_BFQ_GROUP_IOSCHED is enabled, so let the variables and processes take effect when CONFIG_BFQ_GROUP_IOSCHED is enabled. Cc: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Yuwei Guan <Yuwei.Guan@zeekrlife.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20221110112622.389332-1-Yuwei.Guan@zeekrlife.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
- 14 Dec, 2022 3 commits
-
-
Tejun Heo authored
When a gendisk is successfully initialized but add_disk() fails such as when a loop device has invalid number of minor device numbers specified, blkcg_init_disk() is called during init and then blkcg_exit_disk() during error handling. Unfortunately, iolatency gets initialized in the former but doesn't get cleaned up in the latter. This is because, in non-error cases, the cleanup is performed by del_gendisk() calling rq_qos_exit(), the assumption being that rq_qos policies, iolatency being one of them, can only be activated once the disk is fully registered and visible. That assumption is true for wbt and iocost, but not so for iolatency as it gets initialized before add_disk() is called. It is desirable to lazy-init rq_qos policies because they are optional features and add to hot path overhead once initialized - each IO has to walk all the registered rq_qos policies. So, we want to switch iolatency to lazy init too. However, that's a bigger change. As a fix for the immediate problem, let's just add an extra call to rq_qos_exit() in blkcg_exit_disk(). This is safe because duplicate calls to rq_qos_exit() become noop's. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: darklight2357@icloud.com Cc: Josef Bacik <josef@toxicpanda.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Fixes: d7067512 ("block: introduce blk-iolatency io controller") Cc: stable@vger.kernel.org # v4.19+ Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/Y5TQ5gm3O4HXrXR3@slm.duckdns.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Isaac J. Manjarres authored
Currently, the max_loop commandline argument can be used to specify how many loop block devices are created at init time. If it is not specified on the commandline, CONFIG_BLK_DEV_LOOP_MIN_COUNT loop block devices will be created. The max_loop commandline argument can be used to override the value of CONFIG_BLK_DEV_LOOP_MIN_COUNT. However, when max_loop is set to 0 through the commandline, the current logic treats it as if it had not been set, and creates CONFIG_BLK_DEV_LOOP_MIN_COUNT devices anyway. Fix this by starting max_loop off as set to CONFIG_BLK_DEV_LOOP_MIN_COUNT. This preserves the intended behavior of creating CONFIG_BLK_DEV_LOOP_MIN_COUNT loop block devices if the max_loop commandline parameter is not specified, and allowing max_loop to be respected for all values, including 0. This allows environments that can create all of their required loop block devices on demand to not have to unnecessarily preallocate loop block devices. Fixes: 73285082 ("remove artificial software max_loop limit") Cc: stable@vger.kernel.org Cc: Ken Chen <kenchen@google.com> Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com> Link: https://lore.kernel.org/r/20221208212902.765781-1-isaacmanjarres@google.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Jiri Slaby (SUSE) authored
Since gcc13, each member of an enum has the same type as the enum [1]. And that is inherited from its members. Provided: VTIME_PER_SEC_SHIFT = 37, VTIME_PER_SEC = 1LLU << VTIME_PER_SEC_SHIFT, ... AUTOP_CYCLE_NSEC = 10LLU * NSEC_PER_SEC, the named type is unsigned long. This generates warnings with gcc-13: block/blk-iocost.c: In function 'ioc_weight_prfill': block/blk-iocost.c:3037:37: error: format '%u' expects argument of type 'unsigned int', but argument 4 has type 'long unsigned int' block/blk-iocost.c: In function 'ioc_weight_show': block/blk-iocost.c:3047:34: error: format '%u' expects argument of type 'unsigned int', but argument 3 has type 'long unsigned int' So split the anonymous enum with large values to a separate enum, so that they don't affect other members. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36113 Cc: Martin Liska <mliska@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: cgroups@vger.kernel.org Cc: linux-block@vger.kernel.org Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20221213120826.17446-1-jirislaby@kernel.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>
-