- 09 May, 2020 40 commits
-
-
James Smart authored
The kbuild tst robot flagged the following 3 issues: Case 1) >> drivers/nvme/target/fc.c:1201:37: warning: Either the condition >> '!assoc' is redundant or there is possible null pointer dereference: >> assoc. [nullPointerRedundantCheck] >> struct nvmet_fc_tgtport *tgtport = assoc->tgtport; ^ >> drivers/nvme/target/fc.c:1853:7: note: Assuming that condition '!assoc' >> is not redundant >> if (!assoc) ^ >> drivers/nvme/target/fc.c:1850:37: note: Assignment >> 'assoc=nvmet_fc_find_target_assoc(tgtport,be64_to_cpu( >> rqst->associd.association_id))', assigned value is 0 >> assoc = nvmet_fc_find_target_assoc(tgtport, ^ >> drivers/nvme/target/fc.c:1896:31: note: Calling function >> 'nvmet_fc_delete_target_assoc', 1st argument 'assoc' value is 0 >> nvmet_fc_delete_target_assoc(assoc); ^ The tool isn't smart enough to see that line 1854 sets a ret value which thereafter causes the routine to exit. This occurs before any of the assoc references, so it is not an issue. There are 2 more reportings of this same failure. To quiet the tool - rework the if test that does the exit to also reference assoc. No change in logic otherwise. Case 2) drivers/nvme/target/fc.c:1202:29: warning: The scope of the variable 'queue' can be reduced. [variableScope] struct nvmet_fc_tgt_queue *queue; ^ The tool is requesting the variable be declared within the code block that utilizes it. Ignoring this report as existing code style is fine. Case 3) drivers/nvme/target/fc.c:1137:16: warning: Variable 'needrandom' is assigned a value that is never used. [unreadVariable] needrandom = true; ^ Another parsing issue with the tool. Given that parens were not used with the list_for_each_entry() check, it inadvertantly thinks the break exited the outer while loop not the inner for loop. This is not an error. But, added parens to the inner list_for_each_entry() to quiet the tool and as it is better coding style. -- james Signed-off-by: James Smart <jsmart2021@gmail.com> Reported-by: kbuild test robot <lkp@intel.com> CC: kbuild test robot <lkp@intel.com> CC: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Max Gurtovoy authored
In order to save resource allocation and utilize the completion locality in a better way (compared to SRQ per device that exist today), allocate Shared Receive Queues (SRQs) per completion vector. Associate each created QP/CQ with an appropriate SRQ according to the queue index. This association will reduce the lock contention in the fast path (compared to SRQ per device solution) and increase the locality in memory buffers. Add new module parameter for SRQ size to adjust it according to the expected load. User should make sure the size is >= 256 to avoid lack of resources. Also reduce the debug level of "last WQE reached" event that is raised when a QP is using SRQ during destruction process to relief the log. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Keith Busch authored
nvme_alloc_ns_head() doesn't use the 'struct nvme_id_ns' parameter. Remove it, and update caller accordingly. Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Keith Busch authored
Various nvme commands use a zeroes based number of dwords field. Create a helper function to convert byte lengths to this format. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
James Smart authored
Now that common helpers exist, add the ability to Send an NVME LS Request and to Abort an outstanding LS Request to the nvmet side of the driver. Signed-off-by: Paul Ely <paul.ely@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
James Smart authored
As the nvmet layer does not have the concept of a remoteport object, which can be used to identify the entity on the other end of the fabric that is to receive an LS, the hosthandle was introduced. The driver passes the hosthandle, a value representative of the remote port, with a ls request receive. The LS request will create the association. The transport will remember the hosthandle for the association, and if there is a need to initiate a LS request to the remote port for the association, the hosthandle will be used. When the driver loses connectivity with the remote port, it needs to notify the transport that the hosthandle is no longer valid, allowing the transport to terminate associations related to the hosthandle. This patch adds support to the driver for the hosthandle. The driver will use the ndlp pointer of the remote port for the hosthandle in calls to nvmet_fc_rcv_ls_req(). The discovery engine is updated to invalidate the hosthandle whenever connectivity with the remote port is lost. Signed-off-by: Paul Ely <paul.ely@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
James Smart authored
Now that common helpers exist, add the ability to receive NVME LS requests to the driver. New requests will be delivered to the transport by nvme_fc_rcv_ls_req(). In order to complete the LS, add support for Send LS Response and send LS response completion handling to the driver. Signed-off-by: Paul Ely <paul.ely@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
James Smart authored
Currently, the ability to send an NVME LS response is limited to the nvmet (controller/target) side of the driver. In preparation of both the nvme and nvmet sides supporting Send LS Response, rework the existing send ls_rsp and ls_rsp completion routines such that there is common code that can be used by both sides. Signed-off-by: Paul Ely <paul.ely@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
James Smart authored
Send LS Abort support is needed when Send LS Request is supported. Currently, the ability to abort an NVME LS request is limited to the nvme (host) side of the driver. In preparation of both the nvme and nvmet sides supporting Send LS Abort, rework the existing ls_req abort routines such that there is common code that can be used by both sides. While refactoring it was seen the logic in the abort routine was incorrect. It attempted to abort all NVME LS's on the indicated port. As such, the routine was reworked to abort only the NVME LS request that was specified. Signed-off-by: Paul Ely <paul.ely@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
James Smart authored
Currently, the ability to send an NVME LS request is limited to the nvme (host) side of the driver. In preparation of both the nvme and nvmet sides support Send LS Request, rework the existing send ls_req and ls_req completion routines such that there is common code that can be used by both sides. Signed-off-by: Paul Ely <paul.ely@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
James Smart authored
In preparation for supporting both intiator mode and target mode receiving NVME LS's, commonize the existing NVME LS request receive handling found in the base driver and in the nvmet side. Using the original lpfc_nvmet_unsol_ls_event() and lpfc_nvme_unsol_ls_buffer() routines as a templates, commonize the reception of an NVME LS request. The common routine will validate the LS request, that it was received from a logged-in node, and allocate a lpfc_async_xchg_ctx that is used to manage the LS request. The role of the port is then inspected to determine which handler is to receive the LS - nvme or nvmet. As such, the nvmet handler is tied back in. A handler is created in nvme and is stubbed out. Signed-off-by: Paul Ely <paul.ely@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
James Smart authored
The last step of commonization is to remove the 'T' suffix from state and flag field definitions. This is minor, but removes the mental association that it solely applies to nvmet use. Signed-off-by: Paul Ely <paul.ely@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
James Smart authored
To support FC-NVME-2 support (actually FC-NVME (rev 1) with Ammendment 1), both the nvme (host) and nvmet (controller/target) sides will need to be able to receive LS requests. Currently, this support is in the nvmet side only. To prepare for both sides supporting LS receive, rename lpfc_nvmet_rcv_ctx to lpfc_async_xchg_ctx and commonize the definition. Signed-off-by: Paul Ely <paul.ely@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
James Smart authored
A lot of files in lpfc include nvme headers, building up relationships that require a file to change for its headers when there is no other change necessary. It would be better to localize the nvme headers. There is also no need for separate nvme (initiator) and nvmet (tgt) header files. Refactor the inclusion of nvme headers so that all nvme items are included by lpfc_nvme.h Merge lpfc_nvmet.h into lpfc_nvme.h so that there is a single header used by both the nvme and nvmet sides. This prepares for structure sharing between the two roles. Prep to add shared function prototypes for upcoming shared routines. Signed-off-by: Paul Ely <paul.ely@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
James Smart authored
Add support for performing LS requests from target to host. Include sending request from targetport, reception into host, host sending ls rsp. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
James Smart authored
Currently nvmefc-loop only sends LS's from host to target. Slightly rework data structures and routine names to reflect this path. Allows a straight-forward conversion to be used by ls's from target to host. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
James Smart authored
As part of FC-NVME-2 (and ammendment on FC-NVME), the target is to send a Disconnect LS after an association is terminated and any exchanges for the association have been ABTS'd. The target is also not to send the receipt to any Disconnect Association LS, received to initiate the association termination or received while the association is terminating, until the Disconnect LS has been transmit. Add support for sending Disconnect Association LS after all I/O's complete (which is after ABTS'd certainly). Utilizes the new LLDD api to send ls requests. There is no need to track the Disconnect LS response or to retry after timeout. All spec requirements will have been met by waiting for i/o completion to initiate the transmission. Add support for tracking the reception of Disconnect Association and defering the response transmission until after the Disconnect Association LS has been transmit. Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
James Smart authored
In preparation to add ls request support, rename the current ls_list, which is RCV LS request only, to ls_rcv_list. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
James Smart authored
In preparation for sending LS requests for an association that terminates, save and track the hosthandle that is part of the LS's that are received to create associations. Support consists of: - Create a hostport structure that will be 1:1 mapped to a host port handle. The hostport structure is specific to a targetport. - Whenever an association is created, create a host port for the hosthandle the Create Association LS was received from. There will be only 1 hostport structure created, with all associations that have the same hosthandle sharing the hostport structure. - When the association is terminated, the hostport reference will be removed. After the last association for the host port is removed, the hostport will be deleted. - Add support for the new nvmet_fc_invalidate_host() interface. In the past, the LLDD didn't notify loss of connectivity to host ports - the LLD would simply reject new requests and wait for the kato timeout to kill the association. Now, when host port connectivity is lost, the LLDD can notify the transport. The transport will initiate the termination of all associations for that host port. When the last association has been terminated and the hosthandle will no longer be referenced, the new host_release callback will be made to the lldd. - For compatibility with prior behavior which didn't report the hosthandle: the LLDD must set hosthandle to NULL. In these cases, not LS request will be made, and no host_release callbacks will be made either. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
James Smart authored
While code reviewing saw a couple of items that can be cleaned up: - In nvmet_fc_delete_target_queue(), the routine unlocks, then checks and relocks. Reorganize to avoid the unlock/relock. - In nvmet_fc_delete_target_queue(), there's a check on the disconnect state that is unnecessary as the routine validates the state before starting any action. Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
James Smart authored
Add LS reception failure messages Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
James Smart authored
The nvme-fc host transport did not support the reception of a FC-NVME LS. Reception is necessary to implement full compliance with FC-NVME-2. Populate the LS receive handler, and specifically the handling of a Disconnect Association LS. The response to the LS, if it matched a controller, must be sent after the aborts for any I/O on any connection have been sent. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
James Smart authored
Given that both host and target now generate and receive LS's create a single table definition for LS names. Each tranport half will have a local version of the table. Convert the target side transport to use the new common Create Association LS validation routine. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
James Smart authored
Given that both host and target now generate and receive LS's create a single table definition for LS names. Each tranport half will have a local version of the table. As Create Association LS is issued by both sides, and received by both sides, create common routines to format the LS and to validate the LS. Convert the host side transport to use the new common Create Association LS formatting routine. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
James Smart authored
Convert the assoc_active boolean flag to a bitop on the flags field. The bit ops will provide atomicity. To make this change, the flags field was converted to a long type, which also affects the FCCTRL_TERMIO flag. Both FCCTRL_TERMIO and now ASSOC_ACTIVE flags are set/cleared by bit operations. Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
James Smart authored
Ensure that when allocations are done, and the lldd options indicate no private data is needed, that private pointers will be set to NULL (catches driver error that forgot to set private data size). Slightly reorg the allocations so that private data follows allocations for LS request/response buffers. Ensures better alignments for the buffers as well as the private pointer. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
James Smart authored
Current code uses NVME_FC_MAX_LS_BUFFER_SIZE (2KB) when allocating buffers for LS requests and responses. This is considerable overkill for what is actually defined. Rework code to have unions for all possible requests and responses and size based on the unions. Remove NVME_FC_MAX_LS_BUFFER_SIZE. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
James Smart authored
Routines in the target will want to be used in the host as well. Error definitions should now shared as both sides will process requests and responses to requests. Moved common declarations to new fc.h header kept in the host subdirectory. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
James Smart authored
The current LLDD api has: nvme-fc: contains api for transport to do LS requests (and aborts of them). However, there is no interface for reception of LS's and sending responses for them. nvmet-fc: contains api for transport to do reception of LS's and sending of responses for them. However, there is no interface for doing LS requests. Revise the api's so that both nvme-fc and nvmet-fc can send LS's, as well as receiving LS's and sending their responses. Change name of the rcv_ls_req struct to better reflect generic use as a context to used to send an ls rsp. Specifically: nvmefc_tgt_ls_req -> nvmefc_ls_rsp nvmefc_tgt_ls_req.nvmet_fc_private -> nvmefc_ls_rsp.nvme_fc_private Change nvmet_fc_rcv_ls_req() calling sequence to provide handle that can be used by transport in later LS request sequences for an association. nvme-fc nvmet_fc nvme_fcloop: Revise to adapt to changed names in api header. Change calling sequence to nvmet_fc_rcv_ls_req() for hosthandle. Add stubs for new interfaces: host/fc.c: nvme_fc_rcv_ls_req() target/fc.c: nvmet_fc_invalidate_host() lpfc: Revise to adapt code to changed names in api header. Change calling sequence to nvmet_fc_rcv_ls_req() for hosthandle. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
James Smart authored
A couple of minor changes occurred between 1.06 and 1.08: - Addition of NVME_SR_RSP opcode - change of SR_RSP status code 1 to Reserved Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Instead just call the CDROM layer functionality directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
The name is only printed for a not registered bdi in writeback. Use the device name there as is more useful anyway for the unlike case that the warning triggers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Merge the _node vs normal version and drop the superflous gfp_t argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Split out a new bdi_set_owner helper to set the owner, and move the policy for creating the bdi name back into genhd.c, where it belongs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
bdi_register_va is only used by super.c, which can't be modular. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
All external users of device_create_vargs are gone, so remove it and open code it in the only caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Weiping Zhang authored
rename blk_mq_alloc_rq_maps to blk_mq_alloc_map_and_requests, this function allocs both map and request, make function name align with funtion. Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Weiping Zhang authored
rename __blk_mq_alloc_rq_map to __blk_mq_alloc_map_and_request, actually it alloc both map and request, make function name align with function. Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
Alloc new map and request for new hardware queue when increse hardware queue count. Before this patch, it will show a warning for each new hardware queue, but it's not enough, these hctx have no maps and reqeust, when a bio was mapped to these hardware queue, it will trigger kernel panic when get request from these hctx. Test environment: * A NVMe disk supports 128 io queues * 96 cpus in system A corner case can always trigger this panic, there are 96 io queues allocated for HCTX_TYPE_DEFAULT type, the corresponding kernel log: nvme nvme0: 96/0/0 default/read/poll queues. Now we set nvme write queues to 96, then nvme will alloc others(32) queues for read, but blk_mq_update_nr_hw_queues does not alloc map and request for these new added io queues. So when process read nvme disk, it will trigger kernel panic when get request from these hardware context. Reproduce script: nr=$(expr `cat /sys/block/nvme0n1/device/queue_count` - 1) echo $nr > /sys/module/nvme/parameters/write_queues echo 1 > /sys/block/nvme0n1/device/reset_controller dd if=/dev/nvme0n1 of=/dev/null bs=4K count=1 [ 8040.805626] ------------[ cut here ]------------ [ 8040.805627] WARNING: CPU: 82 PID: 12921 at block/blk-mq.c:2578 blk_mq_map_swqueue+0x2b6/0x2c0 [ 8040.805627] Modules linked in: nvme nvme_core nf_conntrack_netlink xt_addrtype br_netfilter overlay xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nft_counter nf_nat_tftp nf_conntrack_tftp nft_masq nf_tables_set nft_fib_inet nft_f ib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack tun bridge nf_defrag_ipv6 nf_defrag_ipv4 stp llc ip6_tables ip_tables nft_compat rfkill ip_set nf_tables nfne tlink sunrpc intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass ipmi_ssif crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support ghash_clmulni_intel intel_ cstate intel_uncore raid0 joydev intel_rapl_perf ipmi_si pcspkr mei_me ioatdma sg ipmi_devintf mei i2c_i801 dca lpc_ich ipmi_msghandler acpi_power_meter acpi_pad xfs libcrc32c sd_mod ast i2c_algo_bit drm_vram_helper drm_ttm_helper ttm d rm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops [ 8040.805637] ahci drm i40e libahci crc32c_intel libata t10_pi wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nvme_core] [ 8040.805640] CPU: 82 PID: 12921 Comm: kworker/u194:2 Kdump: loaded Tainted: G W 5.6.0-rc5.78317c+ #2 [ 8040.805640] Hardware name: Inspur SA5212M5/YZMB-00882-104, BIOS 4.0.9 08/27/2019 [ 8040.805641] Workqueue: nvme-reset-wq nvme_reset_work [nvme] [ 8040.805642] RIP: 0010:blk_mq_map_swqueue+0x2b6/0x2c0 [ 8040.805643] Code: 00 00 00 00 00 41 83 c5 01 44 39 6d 50 77 b8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b bb 98 00 00 00 89 d6 e8 8c 81 03 00 eb 83 <0f> 0b e9 52 ff ff ff 0f 1f 00 0f 1f 44 00 00 41 57 48 89 f1 41 56 [ 8040.805643] RSP: 0018:ffffba590d2e7d48 EFLAGS: 00010246 [ 8040.805643] RAX: 0000000000000000 RBX: ffff9f013e1ba800 RCX: 000000000000003d [ 8040.805644] RDX: ffff9f00ffff6000 RSI: 0000000000000003 RDI: ffff9ed200246d90 [ 8040.805644] RBP: ffff9f00f6a79860 R08: 0000000000000000 R09: 000000000000003d [ 8040.805645] R10: 0000000000000001 R11: ffff9f0138c3d000 R12: ffff9f00fb3a9008 [ 8040.805645] R13: 000000000000007f R14: ffffffff96822660 R15: 000000000000005f [ 8040.805645] FS: 0000000000000000(0000) GS:ffff9f013fa80000(0000) knlGS:0000000000000000 [ 8040.805646] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8040.805646] CR2: 00007f7f397fa6f8 CR3: 0000003d8240a002 CR4: 00000000007606e0 [ 8040.805647] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 8040.805647] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 8040.805647] PKRU: 55555554 [ 8040.805647] Call Trace: [ 8040.805649] blk_mq_update_nr_hw_queues+0x31b/0x390 [ 8040.805650] nvme_reset_work+0xb4b/0xeab [nvme] [ 8040.805651] process_one_work+0x1a7/0x370 [ 8040.805652] worker_thread+0x1c9/0x380 [ 8040.805653] ? max_active_store+0x80/0x80 [ 8040.805655] kthread+0x112/0x130 [ 8040.805656] ? __kthread_parkme+0x70/0x70 [ 8040.805657] ret_from_fork+0x35/0x40 [ 8040.805658] ---[ end trace b5f13b1e73ccb5d3 ]--- [ 8229.365135] BUG: kernel NULL pointer dereference, address: 0000000000000004 [ 8229.365165] #PF: supervisor read access in kernel mode [ 8229.365178] #PF: error_code(0x0000) - not-present page [ 8229.365191] PGD 0 P4D 0 [ 8229.365201] Oops: 0000 [#1] SMP PTI [ 8229.365212] CPU: 77 PID: 13024 Comm: dd Kdump: loaded Tainted: G W 5.6.0-rc5.78317c+ #2 [ 8229.365232] Hardware name: Inspur SA5212M5/YZMB-00882-104, BIOS 4.0.9 08/27/2019 [ 8229.365253] RIP: 0010:blk_mq_get_tag+0x227/0x250 [ 8229.365265] Code: 44 24 04 44 01 e0 48 8b 74 24 38 65 48 33 34 25 28 00 00 00 75 33 48 83 c4 40 5b 5d 41 5c 41 5d 41 5e c3 48 8d 68 10 4c 89 ef <44> 8b 60 04 48 89 ee e8 dd f9 ff ff 83 f8 ff 75 c8 e9 67 fe ff ff [ 8229.365304] RSP: 0018:ffffba590e977970 EFLAGS: 00010246 [ 8229.365317] RAX: 0000000000000000 RBX: ffff9f00f6a79860 RCX: ffffba590e977998 [ 8229.365333] RDX: 0000000000000000 RSI: ffff9f012039b140 RDI: ffffba590e977a38 [ 8229.365349] RBP: 0000000000000010 R08: ffffda58ff94e190 R09: ffffda58ff94e198 [ 8229.365365] R10: 0000000000000011 R11: ffff9f00f6a79860 R12: 0000000000000000 [ 8229.365381] R13: ffffba590e977a38 R14: ffff9f012039b140 R15: 0000000000000001 [ 8229.365397] FS: 00007f481c230580(0000) GS:ffff9f013f940000(0000) knlGS:0000000000000000 [ 8229.365415] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8229.365428] CR2: 0000000000000004 CR3: 0000005f35e26004 CR4: 00000000007606e0 [ 8229.365444] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 8229.365460] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 8229.365476] PKRU: 55555554 [ 8229.365484] Call Trace: [ 8229.365498] ? finish_wait+0x80/0x80 [ 8229.365512] blk_mq_get_request+0xcb/0x3f0 [ 8229.365525] blk_mq_make_request+0x143/0x5d0 [ 8229.365538] generic_make_request+0xcf/0x310 [ 8229.365553] ? scan_shadow_nodes+0x30/0x30 [ 8229.365564] submit_bio+0x3c/0x150 [ 8229.365576] mpage_readpages+0x163/0x1a0 [ 8229.365588] ? blkdev_direct_IO+0x490/0x490 [ 8229.365601] read_pages+0x6b/0x190 [ 8229.365612] __do_page_cache_readahead+0x1c1/0x1e0 [ 8229.365626] ondemand_readahead+0x182/0x2f0 [ 8229.365639] generic_file_buffered_read+0x590/0xab0 [ 8229.365655] new_sync_read+0x12a/0x1c0 [ 8229.365666] vfs_read+0x8a/0x140 [ 8229.365676] ksys_read+0x59/0xd0 [ 8229.365688] do_syscall_64+0x55/0x1d0 [ 8229.365700] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com> Tested-by: Weiping Zhang <zhangweiping@didiglobal.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Weiping Zhang authored
blk_mq_realloc_tag_set_tags will update set->nr_hw_queues, so save old set->nr_hw_queues before call this function. Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-