An error occurred fetching the project authors.
- 23 Jul, 2023 2 commits
-
-
Justin Tee authored
The ndlp kref count implementation in lpfc_dev_loss_tmo_callbk() removes the initial node reference when a vport is unloading. When lpfc_cleanup() sends a DEVICE_RM event and is in NPR state, the driver calls lpfc_drop_node(). Subsequently, lpfc_drop_node() also removes an ndlp kref thinking it is the initial reference. This unintentionally introduces an extra kref decrement on the ndlp object. Fix by using the NLP_DROPPED node flag in lpfc_dev_loss_tmo_callbk() and lpfc_drop_node() to coordinate the removal of the initial node reference. In lpfc_dev_loss_tmo_callbk(), remove the SCSI transport reference provided the node is registered in the dev_loss context because the driver cannot call the SCSI transport in dev_loss context or afterwards. And, have lpfc_drop_node() not remove a reference if another thread is acting or has already acted on it. Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230712180522.112722-6-justintee8345@gmail.comSigned-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
Justin Tee authored
Conditionalize when to put an ndlp into recovery mode when processing RSCNs. As long as an ndlp state is beyond a PLOGI issue and has been mapped to a transport layer before, the ndlp qualifies to be put into recovery mode. Otherwise, treat the ndlp rport normally through the discovery engine. Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230712180522.112722-5-justintee8345@gmail.comSigned-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
- 31 May, 2023 4 commits
-
-
Justin Tee authored
Pre-existing device loss recovery logic via the NLP_IN_RECOV_POST_DEV_LOSS flag only handled Fabric Port Login, Fabric Controller, Management, and Name Server addresses. Fabric domain controllers fall under the same category for usage of the NLP_IN_RECOV_POST_DEV_LOSS flag. Add a default case statement to mark an ndlp for device loss recovery. Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230523183206.7728-4-justintee8345@gmail.comAcked-by:
Martin Wilck <mwilck@suse.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
Justin Tee authored
In dev_loss_tmo callback routine, we early return if the ndlp is in a state of rediscovery. This occurs when a target proactively PLOGIs or PRLIs after an RSCN before the dev_loss_tmo callback routine is scheduled to run. Move clear of the NLP_IN_DEV_LOSS flag before the ndlp state check in such cases. Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230523183206.7728-3-justintee8345@gmail.comSigned-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
Justin Tee authored
Due to a target port D_ID swap, it is possible for the lpfc_register_remote_port() routine to touch post mortem fc_rport memory when trying to access fc_rport->dd_data. The D_ID swap causes a simultaneous call to lpfc_unregister_remote_port(), where fc_remote_port_delete() reclaims fc_rport memory. Remove the fc_rport->dd_data->pnode NULL assignment because the following line reassigns ndlp->rport with an fc_rport object from fc_remote_port_add() anyways. The pnode nullification is superfluous. Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230523183206.7728-2-justintee8345@gmail.comAcked-by:
Martin Wilck <mwilck@suse.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
Azeem Shaikh authored
strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated [1]. In an effort to remove strlcpy() completely [2], replace strlcpy() here with strscpy(). No return values were used, so direct replacement is safe. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [2] https://github.com/KSPP/linux/issues/89Signed-off-by:
Azeem Shaikh <azeemshaikh38@gmail.com> Link: https://lore.kernel.org/r/20230530155745.343032-1-azeemshaikh38@gmail.comReviewed-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
- 08 May, 2023 1 commit
-
-
Justin Tee authored
Smatch detected a double free path because lpfc_nlp_not_used() releases an ndlp object before reaching lpfc_nlp_put() at the end of lpfc_cmpl_els_logo_acc(). Remove the outdated lpfc_nlp_not_used() routine. In lpfc_mbx_cmpl_ns_reg_login(), replace the call with lpfc_nlp_put(). In lpfc_cmpl_els_logo_acc(), replace the call with lpfc_unreg_rpi() and keep the lpfc_nlp_put() at the end of the routine. If ndlp's rpi was registered, then lpfc_unreg_rpi()'s completion routine performs the final ndlp clean up after lpfc_nlp_put() is called from lpfc_cmpl_els_logo_acc(). Otherwise if ndlp has no rpi registered, the lpfc_nlp_put() at the end of lpfc_cmpl_els_logo_acc() is the final ndlp clean up. Fixes: 4430f7fd ("scsi: lpfc: Rework locations of ndlp reference taking") Cc: <stable@vger.kernel.org> # v5.11+ Reported-by:
Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/all/Y3OefhyyJNKH%2Fiaf@kili/Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230417191558.83100-3-justintee8345@gmail.comSigned-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
- 10 Mar, 2023 2 commits
-
-
Justin Tee authored
Extended status reason code errors should mask off the IOERR_PARAM_MASK before checking strict equalities for IOERR values. Update the lpfc_error_lost_link() routine as such. Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230301231626.9621-9-justintee8345@gmail.comSigned-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
Justin Tee authored
When mapped to a target with multiple virtual ports, a link bounce sometimes results in unsuccessful rediscovery of all of the target's virtual ports. This is because a succession of repeat RSCNs for the virtual target ports leaves ndlps in the REG_LOGIN state with the NLP_REG_LOGIN_SEND flag set. With NLP_REG_LOGIN_SEND set, during the next PLOGI, the driver will UNREG_RPI. When UNREG_RPI is processed, the driver can be in the middle of PRLI_ISSUE or MAPPED state resulting in an illegal state transition by the discovery engine and stalling. Fix by calling the discovery state machine with DEVICE_RECOVERY event during RSCN processing. This will set the NLP_IGNR_REG_CMPL bit and prevent the old REG_LOGIN state from advancing. Then for the new PLOGI issue, add the check for the NLP_IGNR_REG_CMPL bit to delay issuing the new PLOGI until the queued REG_LOGIN and UNREG_LOGIN have been processed. Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230301231626.9621-6-justintee8345@gmail.comSigned-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
- 22 Feb, 2023 1 commit
-
-
Bo Liu authored
Remove the repeated word "the" in comments. [mkp: fixed additional typos in the changed lines] Link: https://lore.kernel.org/r/20230217083046.4090-1-liubo03@inspur.comSigned-off-by:
Bo Liu <liubo03@inspur.com> Reviewed-by:
Randy Dunlap <rdunlap@infradead.org> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
- 12 Jan, 2023 2 commits
-
-
Justin Tee authored
Update copyrights to 2023 for files modified in the 14.2.0.10 patch set. Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
Justin Tee authored
With faulty cables in PT2PT topology, an unintentional ndlp double kref decrement can occur. If a FLOGI request is outstanding before the link goes down, the missing FLOGI_ACC causes an F_Port ndlp to remain in the UNUSED state. During link down, lpfc_cleanup_rpis() is called and decrements an ndlp kref. Additionally, when the driver later decides to abort the FLOGI, the FLOGI completion handler decrements the ndlp kref a second time. Remove duplicate clean up logic in lpfc_cleanup_rpis() because the updated FLOGI completion handler already handles the ndlp kref decrement. Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
- 17 Nov, 2022 1 commit
-
-
Justin Tee authored
When a FLOGI completes with a sequence timeout error, a freed kref ptr dereference crash can occur due to a timing race involving ndlp referencing in lpfc_dev_loss_tmo_callbk. Fix by ensuring the driver accounts for an outstanding FLOGI when dev_loss is active. Also, don't remove the HBA_FLOGI_OUTSTANDING flag when the FLOGI is retried to allow the driver to handle the reference counts correctly in lpfc_dev_loss_tmo_handler. Reported-by:
Dietmar Hahn <dietmar.hahn@fujitsu.com> Tested-by:
Dietmar Hahn <dietmar.hahn@fujitsu.com> Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20221116011921.105995-5-justintee8345@gmail.comSigned-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
- 11 Oct, 2022 1 commit
-
-
Jason A. Donenfeld authored
Rather than truncate a 32-bit value to a 16-bit value or an 8-bit value, simply use the get_random_{u8,u16}() functions, which are faster than wasting the additional bytes from a 32-bit value. This was done mechanically with this coccinelle script: @@ expression E; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u16; typedef __be16; typedef __le16; typedef u8; @@ ( - (get_random_u32() & 0xffff) + get_random_u16() | - (get_random_u32() & 0xff) + get_random_u8() | - (get_random_u32() % 65536) + get_random_u16() | - (get_random_u32() % 256) + get_random_u8() | - (get_random_u32() >> 16) + get_random_u16() | - (get_random_u32() >> 24) + get_random_u8() | - (u16)get_random_u32() + get_random_u16() | - (u8)get_random_u32() + get_random_u8() | - (__be16)get_random_u32() + (__be16)get_random_u16() | - (__le16)get_random_u32() + (__le16)get_random_u16() | - prandom_u32_max(65536) + get_random_u16() | - prandom_u32_max(256) + get_random_u8() | - E->inet_id = get_random_u32() + E->inet_id = get_random_u16() ) @@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u16; identifier v; @@ - u16 v = get_random_u32(); + u16 v = get_random_u16(); @@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u8; identifier v; @@ - u8 v = get_random_u32(); + u8 v = get_random_u8(); @@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u16; u16 v; @@ - v = get_random_u32(); + v = get_random_u16(); @@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u8; u8 v; @@ - v = get_random_u32(); + v = get_random_u8(); // Find a potential literal @literal_mask@ expression LITERAL; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; position p; @@ ((T)get_random_u32()@p & (LITERAL)) // Examine limits @script:python add_one@ literal << literal_mask.LITERAL; RESULT; @@ value = None if literal.startswith('0x'): value = int(literal, 16) elif literal[0] in '123456789': value = int(literal, 10) if value is None: print("I don't know how to handle %s" % (literal)) cocci.include_match(False) elif value < 256: coccinelle.RESULT = cocci.make_ident("get_random_u8") elif value < 65536: coccinelle.RESULT = cocci.make_ident("get_random_u16") else: print("Skipping large mask of %s" % (literal)) cocci.include_match(False) // Replace the literal mask with the calculated result. @plus_one@ expression literal_mask.LITERAL; position literal_mask.p; identifier add_one.RESULT; identifier FUNC; @@ - (FUNC()@p & (LITERAL)) + (RESULT() & LITERAL) Reviewed-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by:
Kees Cook <keescook@chromium.org> Reviewed-by:
Yury Norov <yury.norov@gmail.com> Acked-by:
Jakub Kicinski <kuba@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake Signed-off-by:
Jason A. Donenfeld <Jason@zx2c4.com>
-
- 16 Sep, 2022 3 commits
-
-
James Smart authored
This patch fixes below Smatch reported issues: 1. lpfc_hbadisc.c:3020 lpfc_mbx_cmpl_fcf_rr_read_fcf_rec() error: uninitialized symbol 'vlan_id'. 2. lpfc_hbadisc.c:3121 lpfc_mbx_cmpl_read_fcf_rec() error: uninitialized symbol 'vlan_id'. 3. lpfc_init.c:335 lpfc_dump_wakeup_param_cmpl() warn: always true condition '(prg->dist < 4) => (0-3 < 4)' 4. lpfc_init.c:2419 lpfc_parse_vpd() warn: inconsistent indenting. 5. lpfc_init.c:13248 lpfc_sli4_enable_msi() warn: 'phba->pcidev->irq' 2147483648 can't fit into 65535 'eqhdl->irq' 6. lpfc_debugfs.c:5300 lpfc_idiag_extacc_avail_get() error: uninitialized symbol 'ext_cnt' 7. lpfc_debugfs.c:5300 lpfc_idiag_extacc_avail_get() error: uninitialized symbol 'ext_size' 8. lpfc_vmid.c:248 lpfc_vmid_get_appid() warn: sleeping in atomic context. 9. lpfc_init.c:8342 lpfc_sli4_driver_resource_setup() warn: missing error code 'rc'. 10. lpfc_init.c:13573 lpfc_sli4_hba_unset() warn: variable dereferenced before check 'phba->pport' (see line 13546) 11. lpfc_auth.c:1923 lpfc_auth_handle_dhchap_reply() error: double free of 'hash_value' Fixes: 1. Initialize vlan_id to LPFC_FCOE_NULL_VID. 2. Initialize vlan_id to LPFC_FCOE_NULL_VID. 3. prg->dist is a 2 bit field. Its value can only be between 0-3. Remove redundent check 'if (prg->dist < 4)'. 4. Fix inconsistent indenting. Moved logic into helper function lpfc_fill_vpd(). 5. Define 'eqhdl->irq' as int value as pci_irq_vector() returns int. Also, check for return value of pci_irq_vector() and log message in case of failure. 6. Initialize 'ext_cnt' to 0. 7. Initialize 'ext_size' to 0. 8. Use alloc_percpu_gfp() with GFP_ATOMIC flag. 9. 'rc' was not updated when dma_pool_create() fails. Update 'rc = -ENOMEM' when dma_pool_create() fails before calling goto statement. 10. Add check for 'phba->pport' in lpfc_cpuhp_remove(). 11. Initialize 'hash_value' to NULL, same like 'aug_chal' variable. Link: https://lore.kernel.org/r/20220911221505.117655-13-jsmart2021@gmail.comCo-developed-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
James Smart <jsmart2021@gmail.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
Firmware reports link degrade signaling via ACQES. Handlers and new additions to the SET_FEATURES mbox command are implemented so that link degrade parameters for 64GB capable links are reported through EDC ELS frames. Link: https://lore.kernel.org/r/20220911221505.117655-12-jsmart2021@gmail.comCo-developed-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
James Smart <jsmart2021@gmail.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
When a FLOGI is received before we have issued our FLOGI, the ACC response to the received FLOGI is issued with SID 2 instead of the expected fabric controller SID. Certain target vendors ignore the malformed ACC with SID 2 and wait for a properly filled ACC with a fabric controller SID. The lpfc_sli_prep_wqe() routine depends on the FC_PT2PT flag to fill in the fabric controller SID when in PT2PT mode, but due to a previous commit the flag was getting cleared. Fix by adding a check for the defer_flogi_acc flag to know whether or not to clear the FC_PT2PT flag on link up. Link: https://lore.kernel.org/r/20220911221505.117655-3-jsmart2021@gmail.com Fixes: 439b9329 ("scsi: lpfc: Fix unsolicited FLOGI receive handling during PT2PT discovery") Co-developed-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
James Smart <jsmart2021@gmail.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
- 01 Sep, 2022 2 commits
-
-
James Smart authored
The SANDiags feature is unused, and related code is removed. Link: https://lore.kernel.org/r/20220819011736.14141-6-jsmart2021@gmail.comCo-developed-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
James Smart <jsmart2021@gmail.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
During a stress offline/online test in PT2PT topology, target rediscovery can fail with a specific target vendor array. When the HBA transitions to online mode it is possible to receive an unsolicited FLOGI before processing the Link Up event. The received FLOGI will set the defer_flogi_acc_flag, which instructs the driver to wait until it transmits its own FLOGI before ACKing the received FLOGI. In this failure scenario, the link up processing clears the set defer_flogi_acc_flag before we have sent out the FLOGI. As the target has the higher WWPN and is responsible for sending the PLOGI, the target is stuck waiting for its FLOGI_ACC that the driver will never send. Remove the clear of defer_flogi_acc_flag from Link Up event processing. In this stress test case, the defer_flogi_acc_flag is cleared during the Link Down event processing anyways. Link: https://lore.kernel.org/r/20220819011736.14141-2-jsmart2021@gmail.comCo-developed-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
James Smart <jsmart2021@gmail.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
- 07 Jul, 2022 2 commits
-
-
James Smart authored
The Menlo/Hornet adapter was never released to the field. As such, driver code specific to the adapter is unnecessary and should be removed. Link: https://lore.kernel.org/r/20220701211425.2708-11-jsmart2021@gmail.comCo-developed-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
James Smart <jsmart2021@gmail.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
The RSCN_MEMENTO logic was to workaround a target that does not register both FCP and NVMe FC4 types at the same time. This caused the configuration to not produce a second RSCN for the NVMe FC4 type registration in a timely manner. The intention of the RSCN_MEMENTO flag was to always signal to try NVMe PRLI. However, there are other FCP-only target arrays in correctly behaved configurations that reject the NVMe PRLI followed by a LOGO leading to never rediscovering the target after an issue_lip (as LOGO causes a repeat of PLOGI/PRLIs). Revert the RSCN_MEMENTO patch as it is causing correctly behaved configs to fail while it exists only to succeed on a misbehaved config. Link: https://lore.kernel.org/r/20220701211425.2708-9-jsmart2021@gmail.com Fixes: 1045592f ("scsi: lpfc: Introduce FC_RSCN_MEMENTO flag for tracking post RSCN completion") Co-developed-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
James Smart <jsmart2021@gmail.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
- 11 May, 2022 4 commits
-
-
James Smart authored
After a link up, it's possible for the switch to change FDMI support (e.g. FDMI1 vs FDMI2 vs SmartSAN). If the switch reverts to FDMI1, then the revert is currently not detected. Additionally, when NPIV is configured, it's possible the physical port's RHBA is unprocessed by the switch before reciept of an NPIV port issued RPRT. This causes some switches vendors to reject the NPIV's RPRT. Fix by reinitializing base FDMI mode on link up, and defer FDMI vport RPRT submission until after confirming physical port's RHBA is completed. Link: https://lore.kernel.org/r/20220506035519.50908-10-jsmart2021@gmail.comCo-developed-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
James Smart <jsmart2021@gmail.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
During large NPIV port testing, it was sometimes seen that not all vports would log back in to the target device. There are instances when the fabric is slow to respond to a spam of GID_PT requests and as a result the SLI PORT may abort the GID_PT request because the fabric takes so long. lpfc_cmpl_ct_cmd_gid_pt() would enter the lpfc_err_lost_link() logic and attempt to lpfc_els_flush_rscn(), which is fine, but forgets to decrement the gidft_inp counter. This results in a vport->gidft_inp never reaching 0 and never restarting discovery again. Decrement vport->gidft_inp if lpfc_err_lost_link() is true for both lpfc_cmpl_ct_cmd_gid_pt() and lpfc_cmpl_ct_cmd_gid_ft(). Increase logging info during RSCN timeout and lpfc_err_lost_link() events. Link: https://lore.kernel.org/r/20220506035519.50908-8-jsmart2021@gmail.comCo-developed-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
James Smart <jsmart2021@gmail.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
After running a short external loopback test, when the external loopback is removed and a normal cable inserted that is directly connected to a target device, the system oops in the llpfc_set_rrq_active() routine. When the loopback was inserted an FLOGI was transmit. As we're looped back, we receive the FLOGI request. The FLOGI is ABTS'd as we recognize the same wppn thus understand it's a loopback. However, as the ABTS sends address information the port is not set to (fffffe), the ABTS is dropped on the wire. A short 1 frame loopback test is run and completes before the ABTS times out. The looback is unplugged and the new cable plugged in, and the an FLOGI to the new device occurs and completes. Due to a mixup in ref counting the completion of the new FLOGI releases the fabric ndlp. Then the original ABTS completes and references the released ndlp generating the oops. Correct by no-op'ing the ABTS when in loopback mode (it will be dropped anyway). Added a flag to track the mode to recognize when it should be no-op'd. Link: https://lore.kernel.org/r/20220506035519.50908-5-jsmart2021@gmail.comCo-developed-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
James Smart <jsmart2021@gmail.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
During testing with repeated asynchronous resets of the target, an issue was found when the driver issues a LOGO to disconnect its login and recover all exchanges. The LOGO command takes a node reference but neglects to remove it, keeping the node reference count artifically high. Add a call to lpfc_nlp_put() to lpfc_nlp_logo_unreg() and move the mempool free call to the routine exit along with the needed put. This is always safe as this will not be the last reference removed as lpfc_unreg_rpi() ensures there is an additional reference on the ndlp. Link: https://lore.kernel.org/r/20220506035519.50908-4-jsmart2021@gmail.comCo-developed-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
James Smart <jsmart2021@gmail.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
- 19 Apr, 2022 4 commits
-
-
James Smart authored
Do not rely on vendor version field of the CSPs to determine if we are in a FA-PWWN environment. Instead, use the following procedure: First, during HBA initialization, driver does a READ_CONFIG to determine if FA-PWWN is configured on the HBA. A LPFC_FAWWPN_CONFIG hba_flag is set accordingly. Next, when the link comes up before the driver gets a link up event, the firmware logs into the fabric with FA-PWWN. If the fabric port does not support FA-PWWN, the driver will get a Misconfigured FA-WWN async event before the link up. A LPFC_FAWWPN_FABRIC hba_flag will be set accordingly. Finally, if the fabric supports FA-PWWN, the firmware will replace its CSPs WWN with the Fabric Assigned ones. Then after link up, the driver will retrieve the Fabric Assigned WWN when it does a READ_SPARAM mbox command. Link: https://lore.kernel.org/r/20220412222008.126521-23-jsmart2021@gmail.comCo-developed-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
James Smart <jsmart2021@gmail.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
The intention of this patch is to refactor mailbox memory allocation and cleanup steps in one routine respectively to prevent memory leaks or memory errors related to mailbox commands. There are trivial localized fixes as well. Provide lpfc_mbox_rsrc_prep() - this routine allocates the dmabuf and the mbuf associated with it. It also catches allocation errors and returns status. Provide lpfc_mbox_rsrc_cleanup() - this routine verifies a dmabuf exists and if so releases the associated mbuf and the dmabuf memory. It then sets the ctx_buf to NULL and releases the mailbox memory to the mailbox pool. Link: https://lore.kernel.org/r/20220412222008.126521-22-jsmart2021@gmail.comCo-developed-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
James Smart <jsmart2021@gmail.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
The lpfc_iocbq data structure has void * pointers that are overloaded to be as many as 8 different data types and the driver translates the void * by casting. This patch removes the void * pointers by declaring the specific types needed by the driver. It also expands the context_un to include more seldom used pointer types to save structure bytes. It also groups the u8 types together to pack the 8 bytes needed. This work allows the lpfc_iocbq data structure to be more strongly typed and keeps it from being allocated from the 512 byte slab. [mkp: rolled in zeroday fix] Link: https://lore.kernel.org/r/20220412222008.126521-21-jsmart2021@gmail.comReported-by:
kernel test robot <lkp@intel.com> Co-developed-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
James Smart <jsmart2021@gmail.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
During an NVMe target reboot, the target may initialize itself as FCP only during the first RSCN and shortly after trigger a second RSCN claiming NVMe support. The timing of these RSCNs occur before FCP-PRLI for the first RSCN completes leading discovery issues over NVMe. Change RSCN and NVME-PRLI send logic based on a new FC_RSCN_MEMENTO flag that signals when lpfc_end_rscn() is completed and serves as a memento that discovery was started from RSCN. Link: https://lore.kernel.org/r/20220412222008.126521-20-jsmart2021@gmail.comCo-developed-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
James Smart <jsmart2021@gmail.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
- 30 Mar, 2022 2 commits
-
-
James Smart authored
When injecting EEH errors the port is getting hung up waiting on the node list to empty, message number 0233. The driver is stuck at this point and also can't unload. The driver makes transport remoteport delete calls which try to abort I/O's, but the EEH daemon has already called the driver to detach and the detachment has set the global FC_UNLOADING flag. There are several code paths that will avoid I/O cleanup if the FC_UNLOADING flag is set, resulting in transports waiting for I/O while the driver is waiting on transports to clean up. Additionally, during study of the list, a locking issue was found in lpfc_sli_abort_iocb_ring that could corrupt the list. A special case was added to the lpfc_cleanup() routine to call lpfc_sli_flush_rings() if the driver is FC_UNLOADING and if the pci-slot is offline (e.g. EEH). The SLI4 part of lpfc_sli_abort_iocb_ring() is changed to use the ring_lock. Also added code to cancel the I/Os if the pci-slot is offline and added checks and returns for the FC_UNLOADING and HBA_IOQ_FLUSH flags to prevent trying to send an I/O that we cannot handle. Link: https://lore.kernel.org/r/20220317032737.45308-3-jsmart2021@gmail.comCo-developed-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
James Smart <jsmart2021@gmail.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
Following EEH errors, the driver can crash or hang when deleting the localport or when attempting to unload. The EEH handlers in the driver did not notify the NVMe-FC transport before tearing the driver down. This was delayed until the resume steps. This worked for SCSI because lpfc_block_scsi() would notify the scsi_fc_transport that the target was not available but it would not clean up all the references to the ndlp. The SLI3 prep for dev reset handler did the lpfc_offline_prep() and lpfc_offline() calls to get the port stopped before restarting. The SLI4 version of the prep for dev reset just destroyed the queues and did not stop NVMe from continuing. Also because the port was not really stopped the localport destroy would hang because the transport was still waiting for I/O. Additionally, a devloss tmo can fire and post events to a stopped worker thread creating another hang condition. lpfc_sli4_prep_dev_for_reset() is modified to call lpfc_offline_prep() and lpfc_offline() rather than just lpfc_scsi_dev_block() to ensure both SCSI and NVMe transports are notified to block I/O to the driver. Logic is added to devloss handler and worker thread to clean up ndlp references and quiesce appropriately. Link: https://lore.kernel.org/r/20220317032737.45308-2-jsmart2021@gmail.comCo-developed-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
James Smart <jsmart2021@gmail.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
- 15 Mar, 2022 2 commits
-
-
James Smart authored
Update copyrights to 2022 for files modified in the 14.2.0.0 patch set. Link: https://lore.kernel.org/r/20220225022308.16486-18-jsmart2021@gmail.comCo-developed-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
James Smart <jsmart2021@gmail.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
This patch refactors the remaining ELS paths to use SLI-4 as the primary interface. Paths include RRQ, RSCN, unsolicited ELS RQST and RSP paths, ELS timeouts, etc.: - Remove unused routines lpfc_sli4_bpl2sgl and lpfc_sli4_iocb2wqe - Conversion away from using SLI-3 iocb structures to set/access fields in common routines. Use the new generic get/set routines that were added. This move changes code from indirect structure references to using local variables with the generic routines. - Refactor routines when setting non-generic fields, to have both SLI3 and SLI4 specific sections. This replaces the set-as-SLI3 then translate to SLI4 behavior of the past. Link: https://lore.kernel.org/r/20220225022308.16486-12-jsmart2021@gmail.comCo-developed-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
James Smart <jsmart2021@gmail.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
- 07 Dec, 2021 2 commits
-
-
James Smart authored
Extraneous teardown routines are present in the firmware dump path causing altered states in firmware captures. When a firmware dump is requested via sysfs, trigger the dump immediately without tearing down structures and changing adapter state. The driver shall rely on pre-existing firmware error state clean up handlers to restore the adapter. Link: https://lore.kernel.org/r/20211204002644.116455-6-jsmart2021@gmail.comCo-developed-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
James Smart <jsmart2021@gmail.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
The driver is calling schedule_timeout after the DA_ID nameserver request and LOGO commands are issued to the fabric by the initiator virtual endport. These fixed delay functions are causing long delays in the driver's worker thread when processing discovery I/Os in a serialized fashion, which is then triggering mailbox timeout errors artificially. To fix this, don't wait on the DA_ID request to complete and call wait_event_timeout to allow the vport delete thread to make progress on an event driven basis rather than fixing the wait time. Link: https://lore.kernel.org/r/20211204002644.116455-5-jsmart2021@gmail.comCo-developed-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
James Smart <jsmart2021@gmail.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
- 21 Oct, 2021 2 commits
-
-
James Smart authored
A link bounce to a slow fabric may observe FDISC response delays lasting longer than devloss tmo. Current logic decrements the final fabric node kref during a devloss tmo event. This results in a NULL ptr dereference crash if the FDISC completes for that fabric node after devloss tmo. Fix by adding the NLP_IN_RECOV_POST_DEV_LOSS flag, which is set when devloss tmo triggers and we've noticed that fabric node recovery has already started or finished in between the time lpfc_dev_loss_tmo_callbk queues lpfc_dev_loss_tmo_handler. If fabric node recovery succeeds, then the driver reverses the devloss tmo marked kref put with a kref get. If fabric node recovery fails, then the final kref put relies on the ELS timing out or the REG_LOGIN cmpl routine. Link: https://lore.kernel.org/r/20211020211417.88754-8-jsmart2021@gmail.comCo-developed-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
James Smart <jsmart2021@gmail.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
An error is detected with the following report when unloading the driver: "KASAN: use-after-free in lpfc_unreg_rpi+0x1b1b" The NLP_REG_LOGIN_SEND nlp_flag is set in lpfc_reg_fab_ctrl_node(), but the flag is not cleared upon completion of the login. This allows a second call to lpfc_unreg_rpi() to proceed with nlp_rpi set to LPFC_RPI_ALLOW_ERROR. This results in a use after free access when used as an rpi_ids array index. Fix by clearing the NLP_REG_LOGIN_SEND nlp_flag in lpfc_mbx_cmpl_fc_reg_login(). Link: https://lore.kernel.org/r/20211020211417.88754-5-jsmart2021@gmail.comCo-developed-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
James Smart <jsmart2021@gmail.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
- 15 Sep, 2021 3 commits
-
-
James Smart authored
Injecting errors on the PCI slot while the driver is handling NVMe I/O will cause crashes and hangs. There are several rather difficult scenarios occurring. The main issue is that the adapter can report a PCI error before or simultaneously to the PCI subsystem reporting the error. Both paths have different entry points and currently there is no interlock between them. Thus multiple teardown paths are competing and all heck breaks loose. Complicating things is the NVMs path. To a large degree, I/O was able to be shutdown for a full FC port on the SCSI stack. But on NVMe, there isn't a similar call. At best, it works on a per-controller basis, but even at the controller level, it's a controller "reset" call. All of which means I/O is still flowing on different CPUs with reset paths expecting hw access (mailbox commands) to execute properly. The following modifications are made: - A new flag is set in PCI error entrypoints so the driver can track being called by that path. - An interlock is added in the SLI hw error path and the PCI error path such that only one of the paths proceeds with the teardown logic. - RPI cleanup is patched such that RPIs are marked unregistered w/o mbx cmds in cases of hw error. - If entering the SLI port re-init calls, a case where SLI error teardown was quick and beat the PCI calls now reporting error, check whether the SLI port is still live on the PCI bus. - In the PCI reset code to bring the adapter back, recheck the IRQ settings. Different checks for SLI3 vs SLI4. - In I/O completions, that may be called as part of the cleanup or underway just before the hw error, check the state of the adapter. If in error, shortcut handling that would expect further adapter completions as the hw error won't be sending them. - In routines waiting on I/O completions, which may have been in progress prior to the hw error, detect the device is being torn down and abort from their waits and just give up. This points to a larger issue in the driver on ref-counting for data structures, as it doesn't have ref-counting on q and port structures. We'll do this fix for now as it would be a major rework to be done differently. - Fix the NVMe cleanup to simulate NVMe I/O completions if I/O is being failed back due to hw error. - In I/O buf allocation, done at the start of new I/Os, check hw state and fail if hw error. Link: https://lore.kernel.org/r/20210910233159.115896-10-jsmart2021@gmail.comCo-developed-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
James Smart <jsmart2021@gmail.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
On link up and node discovery, a remote port is registered with the SCSI transport and the driver sets fc4_xpt_flags to track transport registration. A link down event causes the driver to deregister with the SCSI transport, starting the devloss timer, and calls a local unreg routine to clear the login state. Part of the login state is the fc4_xpt_flags. However, with tape devices that support sequence level error recovery, which wants to preserve the login, the local unreg routine is skipped, thus the flags aren't cleared. A subsequent link up, ADISC is performed and the lpfc_nlp_reg_node() routine is called. As the fc4_xpt_flags is not clear, it's believed the node is already registered with the transport. Unfortunately, the registration was already terminated. Eventually the devloss tmo timer expires and tears down the device. Fix by ensuring the tape device, known by the ADISC flag, is always unregistered if the link drops. Link: https://lore.kernel.org/r/20210910233159.115896-6-jsmart2021@gmail.comCo-developed-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
James Smart <jsmart2021@gmail.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
James Smart authored
A test scenario encountered an unload hang while an FLOGI ELS was in flight when a link down condition occurred. The driver fails unload as it never releases the fport node. For most nodes, when the link drops, devloss tmo is started and the timeout will cause the final node release. For the Fport, as it has not yet registered with the SCSI transport, there is no devloss timer to be started, so there is no final release. Additionally, the link down sequence causes ABORTS to be issued for pending ELS's. The completions from the ABORTS perform the release of node references. However, as the adapter is being reset to be unloaded, those completions will never occur. Fix by the following: - In the ELS cleanup, recognize when unloading and place the ELS's on a different list that immediately cleans up/completes the ELS's. It's recognized that this condition primarily affects only the fport, with other ports having normal clean up logic that handles things. - Resolve the devloss issue by, when cleaning up nodes on after link down, recognizing when the fabric node does not have a completed state (its state is UNUSED) and removing a reference so the node can delete after the ELS reference is released. Link: https://lore.kernel.org/r/20210910233159.115896-5-jsmart2021@gmail.comCo-developed-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
Justin Tee <justin.tee@broadcom.com> Signed-off-by:
James Smart <jsmart2021@gmail.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-