- 01 Feb, 2023 10 commits
-
-
Keith Busch authored
Instead of appending command effects flags per IO, set the known effects flags the driver needs to react to just once during initial setup. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Amit Engel authored
This is due to the fact that the host is allowed to pass the controller an sgl describing a buffer that is larger than the payload itself Signed-off-by: Amit Engel <Amit.Engel@dell.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Amit Engel authored
This provides additional details about the rq/cmd that is timed out example log if CONFIG_NVME_VERBOSE_ERRORS is configured: "nvme nvme0: queue 2 timeout cid 0xd058 type 4 opc Write (0x1)" example log if CONFIG_NVME_VERBOSE_ERRORS is not configured: "nvme nvme0: queue 2 timeout cid 0xd058 type 4 opc I/O Cmd (0x1)" Signed-off-by: Amit Engel <Amit.Engel@dell.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Amit Engel authored
nvme_opcode_str will handle io/admin/fabrics ops This improves NVMe errors logging Signed-off-by: Amit Engel <Amit.Engel@dell.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Christoph Hellwig authored
After moving the nvme_passthru_end call to the callers of nvme_execute_passthru_rq, this function has become quite pointless, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
Keith Busch authored
The 'struct nvme_iod' space is appended at the end of the preallocated 'struct request', and padded to the cache line size. This leaves some free memory (in most kernel configs) up for grabs. Instead of appending the nvme data descriptor addresses after the scatterlist, inline these for free within struct nvme_iod. There is now enough space in the mempool for 128 possibe segments. And without increasing the size of the preallocated requests, we can hold up to 5 PRP descriptor elements, allowing the driver to increase its max transfer size to 8MB. Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Keith Busch authored
The driver uses the dma entries for setting up its command's SGL/PRP lists. The dma mapping might have fewer entries than the physical segments, so check the dma mapped count to determine which nvme data layout method is more optimal. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Keith Busch authored
The max segments this driver can see is 127, well below the 256 threshold needed to add an nvme sgl segment descriptor. Remove all the useless checks and dead code. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Hannes Reinecke authored
NVMe status codes are part of the wire protocol, and shouldn't be fabricated in the stack. So with this patch the authentication code is switched over to use error codes; as a side effect authentication failures due to internal error won't be retried anymore. But that shouldn't have happened anyway. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Hannes Reinecke authored
The NVMe 2.0 spec defines the ATR and ASCR bits in the AUTHREQ connect response field to be mutually exclusive. So to clarify the handling here switch the AUTHREQ handling to use the bit definitions and check for both bits. And while we're at it, add a message to the user that secure concatenation is not supported (yet). Suggested-by: Mark Lehrer <mark.lehrer@wdc.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 31 Jan, 2023 2 commits
-
-
Matthew Wilcox authored
This was originally added for the definition of nth_page(), but we no longer use nth_page() in this header, so we can drop the heavyweight mm.h now. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20230131050132.2627124-1-willy@infradead.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
Owner of one unprivileged ublk device could be one evil user, which can grant this disk's privilege to other users deliberately, and this way could be like making one trap and waiting for other users to be caught. So only owner to open unprivileged disk even though the owner grants disk privilege to other user. This way is reasonable too given anyone can create ublk disk, and no need other's grant. Reported-by: Stefan Hajnoczi <stefanha@redhat.com> Fixes: 4093cb5a ("ublk_drv: add mechanism for supporting unprivileged ublk device") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230131040446.214583-1-ming.lei@redhat.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
- 30 Jan, 2023 14 commits
-
-
Ulf Hansson authored
Assuming that both Kconfig options, BLK_CGROUP and IOSCHED_BFQ are set, we most likely want cgroup support for BFQ too (BFQ_GROUP_IOSCHED), so let's make it default y. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20230130121240.159456-1-ulf.hansson@linaro.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
bfqd->bfq_wr_max_time is set to 0 in bfq_init_queue and is never changed. It is only used in bfq_wr_duration when bfq_wr_max_time > 0 which never meets, so bfqd->bfq_wr_max_time is not used actually. Just remove it. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230116095153.3810101-9-shikemeng@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
We jump to tag only for returning current rq. Return directly to remove this tag. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://lore.kernel.org/r/20230116095153.3810101-8-shikemeng@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
We have already avoided a circular list in bfq_setup_merge (see comments in bfq_setup_merge() for details), so bfq_queue will not appear in it's new_bfqq list. Just remove this check. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230116095153.3810101-7-shikemeng@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
The async_bfqq is assigned with bfqq->bic->bfqq[0], use it directly. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230116095153.3810101-6-shikemeng@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
Use helper macro RQ_BFQQ to get bfqq of request. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230116095153.3810101-5-shikemeng@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
Inject limit is updated or reset when time_is_before_eq_jiffies( decrease_time_jif + several msecs) or think-time state changes. decrease_time_jif is initialized to 0 and will be set to current jiffies when inject limit is updated or reset. If the jiffies is slightly greater than LONG_MAX, time_is_after_eq_jiffies(0) will keep for a long time, so as time_is_after_eq_jiffies(decrease_time_jif + several msecs). If the think-time state never chages, then the injection will not work as expected for long time. To be more specific: Function bfq_update_inject_limit maybe triggered when jiffies pasts decrease_time_jif + msecs_to_jiffies(10) in bfq_add_request by setting bfqd->wait_dispatch to true. Function bfq_reset_inject_limit are called in two conditions: 1. jiffies pasts bfqq->decrease_time_jif + msecs_to_jiffies(1000) in function bfq_add_request. 2. jiffies pasts bfqq->decrease_time_jif + msecs_to_jiffies(100) or bfq think-time state change from short to long. Fix this by initializing bfqq->decrease_time_jif to current jiffies to trigger service injection soon when service injection conditions are met. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230116095153.3810101-4-shikemeng@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
Parameter reason is never used, just remove it. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230116095153.3810101-3-shikemeng@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
Function bfq_choose_bfqq_for_injection may temporarily raise inject limit to one request if current inject_limit is 0 before search of the source queue for injection. However the search below will reset inject limit to bfqd->in_service_queue which is zero for raised inject limit. Then the temporarily raised inject limit never works as expected. Assigment limit to bfqd->in_service_queue in search is needed as limit maybe overwriten to min_t(unsigned int, 1, limit) for condition that a large in-flight request is on non-rotational devices in found queue. So we need to reset limit to bfqd->in_service_queue for normal case. Actually, we have already make sure bfqd->rq_in_driver is < limit before search, then -Limit is >= 1 as bfqd->rq_in_driver is >= 0. Then min_t(unsigned int, 1, limit) is always 1. So we can simply check bfqd->rq_in_driver with 1 instead of result of min_t(unsigned int, 1, limit) for larget request in non-rotational device case to avoid overwritting limit and the bug is gone. -For normal case, we have already check bfqd->rq_in_driver is < limit, so we can return found bfqq unconditionally to remove unncessary check. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230116095153.3810101-2-shikemeng@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
Commit 180dccb0 ("blk-mq: fix tag_get wait task can't be awakened") mentioned that in case of shared tags, there could be just one real active hctx(queue) because of lazy detection of tag idle. Then driver tag allocation may wait forever on this real active hctx(queue) if wake_batch is > hctx_max_depth where hctx_max_depth is available tags depth for the actve hctx(queue). However, the condition wake_batch > hctx_max_depth is not strong enough to avoid IO hung as the sbitmap_queue_wake_up will only wake up one wait queue for each wake_batch even though there is only one waiter in the woken wait queue. After this, there is only one tag to free and wake_batch may not be reached anymore. Commit 180dccb0 ("blk-mq: fix tag_get wait task can't be awakened") methioned that driver tag allocation may wait forever. Actually, the inactive hctx(queue) will be truely idle after at most 30 seconds and will call blk_mq_tag_wakeup_all to wake one waiter per wait queue to break the hung. But IO hung for 30 seconds is also not acceptable. Set batch size to small enough that depth of the shared hctx(queue) is enough to wake up all of the queues like sbq_calc_wake_batch do to fix this potential IO hung. Although hctx_max_depth will be clamped to at least 4 while wake_batch recalculation does not do the clamp, the wake_batch will be always recalculated to 1 when hctx_max_depth <= 4. Fixes: 180dccb0 ("blk-mq: fix tag_get wait task can't be awakened") Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://lore.kernel.org/r/20230116205059.3821738-6-shikemeng@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
There are three differences between __sbitmap_get and __sbitmap_get_shallow when searching free bit: 1. __sbitmap_get_shallow limit number of bit to search per word. __sbitmap_get has no such limit. 2. __sbitmap_get_shallow always searches with wrap set. __sbitmap_get set wrap according to round_robin. 3. __sbitmap_get_shallow always searches from first bit in first word. __sbitmap_get searches from first bit when round_robin is not set otherwise searches from SB_NR_TO_BIT(sb, alloc_hint). Add helper function sbitmap_find_bit function to do common search while accept "limit depth per word", "wrap flag" and "first bit to search" from caller to support the need of both __sbitmap_get and __sbitmap_get_shallow. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://lore.kernel.org/r/20230116205059.3821738-5-shikemeng@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
Rewrite sbitmap_find_bit_in_index as following: 1. Rename sbitmap_find_bit_in_index to sbitmap_find_bit_in_word 2. Accept "struct sbitmap_word *" directly instead of accepting "struct sbitmap *" and "int index" to get "struct sbitmap_word *". 3. Accept depth/shallow_depth and wrap for __sbitmap_get_word from caller to support need of both __sbitmap_get_shallow and __sbitmap_get. With helper function sbitmap_find_bit_in_word, we can remove repeat code in __sbitmap_get_shallow to find bit considring deferred clear. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://lore.kernel.org/r/20230116205059.3821738-4-shikemeng@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
Commit fbb564a5 ("lib/sbitmap: Fix invalid loop in __sbitmap_queue_get_batch()") mentioned that "Checking free bits when setting the target bits. Otherwise, it may reuse the busying bits." This commit add check to make sure all masked bits in word before cmpxchg is zero. Then the existing check after cmpxchg to check any zero bit is existing in masked bits in word is redundant. Actually, old value of word before cmpxchg is stored in val and we will filter out busy bits in val by "(get_mask & ~val)" after cmpxchg. So we will not reuse busy bits methioned in commit fbb564a5 ("lib/sbitmap: Fix invalid loop in __sbitmap_queue_get_batch()"). Revert new-added check to remove redundant check. Fixes: fbb564a5 ("lib/sbitmap: Fix invalid loop in __sbitmap_queue_get_batch()") Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://lore.kernel.org/r/20230116205059.3821738-3-shikemeng@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
Updates to alloc_hint in the loop in __sbitmap_get_shallow() are mostly pointless and equivalent to setting alloc_hint to zero (because SB_NR_TO_BIT() considers only low sb->shift bits from alloc_hint). So simplify the logic. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://lore.kernel.org/r/20230116205059.3821738-2-shikemeng@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
- 29 Jan, 2023 14 commits
-
-
Yu Kuai authored
Currently parent pd can be freed before child pd: t1: remove cgroup C1 blkcg_destroy_blkgs blkg_destroy list_del_init(&blkg->q_node) // remove blkg from queue list percpu_ref_kill(&blkg->refcnt) blkg_release call_rcu t2: from t1 __blkg_release blkg_free schedule_work t4: deactivate policy blkcg_deactivate_policy pd_free_fn // parent of C1 is freed first t3: from t2 blkg_free_workfn pd_free_fn If policy(for example, ioc_timer_fn() from iocost) access parent pd from child pd after pd_offline_fn(), then UAF can be triggered. Fix the problem by delaying 'list_del_init(&blkg->q_node)' from blkg_destroy() to blkg_free_workfn(), and using a new disk level mutex to synchronize blkg_free_workfn() and blkcg_deactivate_policy(). Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230119110350.2287325-4-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
A new field 'online' is added to blkg_policy_data to fix following 2 problem: 1) In blkcg_activate_policy(), if pd_alloc_fn() with 'GFP_NOWAIT' failed, 'queue_lock' will be dropped and pd_alloc_fn() will try again without 'GFP_NOWAIT'. In the meantime, remove cgroup can race with it, and pd_offline_fn() will be called without pd_init_fn() and pd_online_fn(). This way null-ptr-deference can be triggered. 2) In order to synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy(), 'list_del_init(&blkg->q_node)' will be delayed to blkg_free_workfn(), hence pd_offline_fn() can be called first in blkg_destroy(), and then blkcg_deactivate_policy() will call it again, we must prevent it. The new field 'online' will be set after pd_online_fn() and will be cleared after pd_offline_fn(), in the meantime pd_offline_fn() will only be called if 'online' is set. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230119110350.2287325-3-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
Some cgroup policies will access parent pd through child pd even after pd_offline_fn() is done. If pd_free_fn() for parent is called before child, then UAF can be triggered. Hence it's better to guarantee the order of pd_free_fn(). Currently refcount of parent blkg is dropped in __blkg_release(), which is before pd_free_fn() is called in blkg_free_work_fn() while blkg_free_work_fn() is called asynchronously. This patch make sure pd_free_fn() called from removing cgroup is ordered by delaying dropping parent refcount after calling pd_free_fn() for child. BTW, pd_free_fn() will also be called from blkcg_deactivate_policy() from deleting device, and following patches will guarantee the order. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230119110350.2287325-2-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Zhong Jinghua authored
We found that the blk_mq_hw_sysfs_store interface has no place to use. The object default_hw_ctx_attrs using blk_mq_hw_sysfs_ops only uses the show method and does not use the store method. Since this patch: 4a46f05e ("blk-mq: move hctx and ctx counters from sysfs to debugfs") moved the store method to debugfs, the store method is not used anymore. So let me do some tiny work to clean up unused code. Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230128030419.2780298-1-zhongjinghua@huawei.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
s390 iterates over the bio using bio_for_each_segment and doesn't need any bio splitting. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Link: https://lore.kernel.org/r/20230123075356.60847-1-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
ps3vram iterates over the bio one segment, that is page aligned and max page sized chunk, a time. Because of that there is no point in calling bio_split_to_limits, or explicitly setting the default limits that are only used by bio_split_to_limits. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Geoff Levand <geoff@infradead.org> Link: https://lore.kernel.org/r/20230123074718.57951-1-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
We ran into an issue where a production workload would randomly grind to a halt and not continue until the pending IO had timed out. This turned out to be a complicated interaction between queue freezing and polled IO: 1) You have an application that does polled IO. At any point in time, there may be polled IO pending. 2) You have a monitoring application that issues a passthrough command, which is marked with side effects such that it needs to freeze the queue. 3) Passthrough command is started, which calls blk_freeze_queue_start() on the device. At this point the queue is marked frozen, and any attempt to enter the queue will fail (for non-blocking) or block. 4) Now the driver calls blk_mq_freeze_queue_wait(), which will return when the queue is quiesced and pending IO has completed. 5) The pending IO is polled IO, but any attempt to poll IO through the normal iocb_bio_iopoll() -> bio_poll() will fail when it gets to bio_queue_enter() as the queue is frozen. Rather than poll and complete IO, the polling threads will sit in a tight loop attempting to poll, but failing to enter the queue to do so. The end result is that progress for either application will be stalled until all pending polled IO has timed out. This causes obvious huge latency issues for the application doing polled IO, but also long delays for passthrough command. Fix this by treating queue enter for polled IO just like we do for timeouts. This allows quick quiesce of the queue as we still poll and complete this IO, while still disallowing queueing up new IO. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Li Nan authored
vrate_min is calculated by DIV64_U64_ROUND_UP, but vrate_max is calculated by div64_u64. Vrate_min may be 1 greater than vrate_max if the input values min and max of cost.qos are equal. Signed-off-by: Li Nan <linan122@huawei.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230117070806.3857142-6-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Li Nan authored
echo max of u64 to cost.model can cause divide by 0 error. # echo 8:0 rbps=18446744073709551615 > /sys/fs/cgroup/io.cost.model divide error: 0000 [#1] PREEMPT SMP RIP: 0010:calc_lcoefs+0x4c/0xc0 Call Trace: <TASK> ioc_refresh_params+0x2b3/0x4f0 ioc_cost_model_write+0x3cb/0x4c0 ? _copy_from_iter+0x6d/0x6c0 ? kernfs_fop_write_iter+0xfc/0x270 cgroup_file_write+0xa0/0x200 kernfs_fop_write_iter+0x17d/0x270 vfs_write+0x414/0x620 ksys_write+0x73/0x160 __x64_sys_write+0x1e/0x30 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd calc_lcoefs() uses the input value of cost.model in DIV_ROUND_UP_ULL, overflow would happen if bps plus IOC_PAGE_SIZE is greater than ULLONG_MAX, it can cause divide by 0 error. Fix the problem by setting basecost Signed-off-by: Li Nan <linan122@huawei.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230117070806.3857142-5-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
Otherwise, user might get abnormal values if params is updated concurrently. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230117070806.3857142-4-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
iocost is based on rq_qos, which can only work for request based device, thus it doesn't make sense to configure iocost for bio based device. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230117070806.3857142-3-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Yu Kuai authored
This patch fixs that the return value of match_u64() from ioc_qos_write() is not checked, Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230117070806.3857142-2-yukuai1@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Arnd Bergmann authored
The behavior of 'enum' types has changed in gcc-13, so now the UNBUSY_THR_PCT constant is interpreted as a 64-bit number because it is defined as part of the same enum definition as some other constants that do not fit within a 32-bit integer. This in turn leads to some inefficient code on 32-bit architectures as well as a link error: arm-linux-gnueabi/bin/arm-linux-gnueabi-ld: block/blk-iocost.o: in function `ioc_timer_fn': blk-iocost.c:(.text+0x68e8): undefined reference to `__aeabi_uldivmod' arm-linux-gnueabi-ld: blk-iocost.c:(.text+0x6908): undefined reference to `__aeabi_uldivmod' Split the enum definition to keep the 64-bit timing constants in a separate enum type from those constants that can clearly fit within a smaller type. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230118080706.3303186-1-arnd@kernel.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
Fix the following warning: Documentation/block/ublk.rst:157: WARNING: Enumerated list ends without a blank line; unexpected unindent. Documentation/block/ublk.rst:171: WARNING: Enumerated list ends without a blank line; unexpected unindent. Fixes: 56f5160bc1b8 ("ublk_drv: add mechanism for supporting unprivileged ublk device") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230118042318.127900-1-ming.lei@redhat.comSigned-off-by: Jens Axboe <axboe@kernel.dk>
-