- 18 Dec, 2018 14 commits
-
-
Sagi Grimberg authored
When passed with nr_poll_queues setup additional queues with cq polling context IB_POLL_DIRECT (no interrupts) and make sure to set QUEUE_FLAG_POLL on the connect_q. In addition add the third queue mapping for polling queues. nvmf connect on this queue is polled for like all other requests so make nvmf_connect_io_queue poll for polling queues. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Sagi Grimberg authored
This argument will specify how many polling I/O queues to connect when creating the controller. These I/O queues will host I/O that is set with REQ_HIPRI. Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Sagi Grimberg authored
Preparation for polling support for fabrics. Polling support means that our completion queues are not generating any interrupts which means we need to poll for the nvmf io queue connect as well. Reviewed by Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Sagi Grimberg authored
Pass poll bool to indicate that we need it to poll. This prepares us for polling support in nvmf since connect is an I/O that will be queued and has to be polled in order to complete. If poll is passed, we call nvme_execute_rq_polled which sends the requests and polls for its completion. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Sagi Grimberg authored
block consumers will need it for polling requests that are sent with blk_execute_rq_nowait. Also, get rid of blk_tag_to_qc_t and open-code it instead. Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Colin Ian King authored
There is a spelling mistake in a dev_info message, fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me>
-
Christoph Hellwig authored
By duplicating the nvme_process_cq in both branches we keep the sparse lock context checking happy, so do it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
-
Christoph Hellwig authored
The block layer now enables polling support on a queue if nr_maps includes the poll map, so we should only set that if we actually support poll queues. Fixes: 6544d229 ("block: enable polling by default if a poll map is initalized") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
-
Chaitanya Kulkarni authored
This patch defines a new macro NVMET_NO_ERROR_LOC to represent the default error location value in the nvme-error-log-page. This is a pure cleanup patch and it does not change any functionality. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Colin Ian King authored
Currently the u16 req->error_loc is being compared to -1 which will always be false. Fix this by casting -1 to u16 to fix this. Detected by clang: warning: result of comparison of constant -1 with expression of type 'u16' (aka 'unsigned short') is always false [-Wtautological-constant-out-of-range-compare] Fixes: 76574f37 ("nvmet: add interface to update error-log page") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Ming Lei authored
The queue mapping of type poll only exists when set->map[HCTX_TYPE_POLL].nr_queues is bigger than zero, so enhance the constraint by checking .nr_queues of type poll before enabling IO poll. Otherwise IO race & timeout can be observed when running block/007. Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
There's a single user of this function, dm, and dm just wants to check if IO is inflight, not that it's just allocated. This fixes a hang with srp/002 in blktests with dm, where it tries to suspend but waits for inflight IO to finish first. As it checks for just allocated requests, this fails. Tested-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 17 Dec, 2018 11 commits
-
-
Ming Lei authored
From 7e849dd9 ("nvme-pci: don't share queue maps"), the mapping table won't be initialized actually if map->nr_queues is zero, so we can't use blk_mq_map_queue_type() to retrieve hctx any more. This way still may cause broken mapping, fix it by skipping zero-queues maps in blk_mq_map_swqueue(). Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dennis Zhou authored
The blk-iolatency controller measures the time from rq_qos_throttle() to rq_qos_done_bio() and attributes this time to the first bio that needs to create the request. This means if a bio is plug-mergeable or bio-mergeable, it gets to bypass the blk-iolatency controller. The recent series [1], to tag all bios w/ blkgs undermined how iolatency was determining which bios it was charging and should process in rq_qos_done_bio(). Because all bios are being tagged, this caused the atomic_t for the struct rq_wait inflight count to underflow and result in a stall. This patch adds a new flag BIO_TRACKED to let controllers know that a bio is going through the rq_qos path. blk-iolatency now checks if this flag is set to see if it should process the bio in rq_qos_done_bio(). Overloading BLK_QUEUE_ENTERED works, but makes the flag rules confusing. BIO_THROTTLED was another candidate, but the flag is set for all bios that have gone through blk-throttle code. Overloading a flag comes with the burden of making sure that when either implementation changes, a change in setting rules for one doesn't cause a bug in the other. So here, we unfortunately opt for adding a new flag. [1] https://lore.kernel.org/lkml/20181205171039.73066-1-dennis@kernel.org/ Fixes: 5cdf2e3f ("blkcg: associate blkg when associating a device") Signed-off-by: Dennis Zhou <dennis@kernel.org> Cc: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
When a request is added to rq list of sw queue(ctx), the rq may be from a different type of hctx, especially after multi queue mapping is introduced. So when dispach request from sw queue via blk_mq_flush_busy_ctxs() or blk_mq_dequeue_from_ctx(), one request belonging to other queue type of hctx can be dispatched to current hctx in case that read queue or poll queue is enabled. This patch fixes this issue by introducing per-queue-type list. Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Changed by me to not use separately cacheline aligned lists, just place them all in the same cacheline where we had just the one list and lock before. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Damien Le Moal authored
For a zoned block device using mq-deadline, if a write request for a zone is received while another write was already dispatched for the same zone, dd_dispatch_request() will return NULL and the newly inserted write request is kept in the scheduler queue waiting for the ongoing zone write to complete. With this behavior, when no other request has been dispatched, rq_list in blk_mq_sched_dispatch_requests() is empty and blk_mq_sched_mark_restart_hctx() not called. This in turn leads to __blk_mq_free_request() call of blk_mq_sched_restart() to not run the queue when the already dispatched write request completes. The newly dispatched request stays stuck in the scheduler queue until eventually another request is submitted. This problem does not affect SCSI disk as the SCSI stack handles queue restart on request completion. However, this problem is can be triggered the nullblk driver with zoned mode enabled. Fix this by always requesting a queue restart in dd_dispatch_request() if no request was dispatched while WRITE requests are queued. Fixes: 5700f691 ("mq-deadline: Introduce zone locking support") Cc: <stable@vger.kernel.org> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Add missing export of blk_mq_sched_restart() Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Now that the block layer checks if a queue map has any queues inside it there is no more reason to duplicate the maps for the non-default types. Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
We should check if a given queue map actually has queues enabled before dispatching to it. This allows drivers to not initialize optional but not used map types, which subsequently will allow fixing problems with queue map rebuilds for that case. Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
Now we only export hctx->type via sysfs, and there isn't such info in hctx entry under debugfs. We often use debugfs only to diagnose queue mapping issue, so add the support in debugfs. Queue mapping becomes a bit more complicated after multiple queue mapping is supported, we may write blktest to verify if queue mapping is valid based on blk-mq-debugfs. Given not necessary to export hctx->type twice, so remove the export from sysfs. Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
Type of each element in queue mapping table is 'unsigned int, intead of 'struct blk_mq_queue_map)', so fix it. Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
This information is helpful to either investigate issues, or understand wbt's internal behaviour. Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
blk-mq-debugfs has been proved as very helpful for debug some tough issues, such as IO hang. We have seen blk-wbt related IO hang several times, even inside Red Hat BZ, there is such report not sovled yet, so this patch adds support debugfs on rq_qos. Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Damien Le Moal authored
Add the description of the zoned, nr_zones and chunk_sectors sysfs queue attributes to Documentation/block/queue-sysfs.txt. The description of the zoned and chunk_sector attributes are mostly copied from ABI/testing/sysfs-block (added a typo fix). While at it, also fix a typo in the description of the io_poll_delay attribute. nr_zones description is also added to ABI/testing/sysfs-block and contact email address updated for the zoned attribute. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 16 Dec, 2018 8 commits
-
-
Chengguang Xu authored
blk_mq_init_queue() will not return NULL pointer to its caller, so it's better to replace IS_ERR_OR_NULL using IS_ERR in loop_add(). If in the future things change to check NULL pointer inside loop_add(), we should return -ENOMEM as return code instead of PTR_ERR(NULL). Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Chengguang Xu authored
Add __exit annotation to cleanup helper which is only called once in the module. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
This prevents a HIPRI bio from being submitted through a stacking driver that does not support polling and thus won't poll for I/O completion. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jianchao Wang authored
Replace blk_mq_request_issue_directly with blk_mq_try_issue_directly in blk_insert_cloned_request and kill it as nobody uses it any more. Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jianchao Wang authored
It is not necessary to issue request directly with bypass 'true' in blk_mq_sched_insert_requests and handle the non-issued requests itself. Just set bypass to 'false' and let blk_mq_try_issue_directly handle them totally. Remove the blk_rq_can_direct_dispatch check, because blk_mq_try_issue_directly can handle it well.If request is direct-issued unsuccessfully, insert the reset. Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jianchao Wang authored
Merge blk_mq_try_issue_directly and __blk_mq_try_issue_directly into one interface to unify the interfaces to issue requests directly. The merged interface takes over the requests totally, it could insert, end or do nothing based on the return value of .queue_rq and 'bypass' parameter. Then caller needn't any other handling any more and then code could be cleaned up. And also the commit c616cbee ( blk-mq: punt failed direct issue to dispatch list ) always inserts requests to hctx dispatch list whenever get a BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE, this is overkill and will harm the merging. We just need to do that for the requests that has been through .queue_rq. This patch also could fix this. Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 14 Dec, 2018 4 commits
-
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 13 Dec, 2018 3 commits
-
-
Sagi Grimberg authored
free the controller discard_page correctly. Fixes: cb5b7262 ("nvme: provide fallback for discard alloc failure") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
git://git.infradead.org/nvmeJens Axboe authored
Pull NVMe updates from Christoph: "Here is the second large chunk of nvme updates for 4.21: - host and target support for NVMe over TCP (Sagi Grimberg, Roy Shterman, Solganik Alexander) - error log page support in target (Chaitanya Kulkarni) plus small fixes and improvements from Jens Axboe and Chengguang Xu." * 'nvme-4.21' of git://git.infradead.org/nvme: (33 commits) nvme-rdma: support separate queue maps for read and write nvme-tcp: support separate queue maps for read and write nvme-fabrics: allow user to set nr_write_queues for separate queue maps nvme-fabrics: add missing nvmf_ctrl_options documentation blk-mq-rdma: pass in queue map to blk_mq_rdma_map_queues nvmet: update smart log with num err log entries nvmet: add error log page cmd handler nvmet: add error log support for file backend nvmet: add error log support for bdev backend nvmet: add error log support for admin-cmd nvmet: add error log support for rdma backend nvmet: add error log support for fabrics-cmd nvmet: add error log support in the core nvmet: add interface to update error-log page nvmet: add error-log definitions nvme: add error log page slot definition nvme: remove nvme_common command cdw10 array nvmet: remove unused variable nvme: provide fallback for discard alloc failure nvme: add __exit annotation ...
-
Guoju Fang authored
Sometimes flush journal may be very frequent, so it's useful to dump number of keys every time write journal. Signed-off-by: Guoju Fang <fangguoju@gmail.com> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-