Commit 21fc7933 authored by Tomer Tayar's avatar Tomer Tayar Committed by Oded Gabbay

habanalabs/gaudi2: mark PCIE access error as fatal

F/W events are enabled in a late phase of the device init, so an event
for a PCIE access error during the init, can be received after the init
is already done and considered as successful.
A resulting device reset, which does the same H/W init, can end
similarly with this event right after the reset is done and considered
as successful, and a loop of this sequence can continue.

To avoid it mark the PCIE access error as a fatal event, so after 2
consecutive events no more resets will be done.
Signed-off-by: default avatarTomer Tayar <ttayar@habana.ai>
Reviewed-by: default avatarOded Gabbay <ogabbay@kernel.org>
Signed-off-by: default avatarOded Gabbay <ogabbay@kernel.org>
parent f018c54e
......@@ -8532,6 +8532,7 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent
case GAUDI2_EVENT_PCIE_ADDR_DEC_ERR:
gaudi2_print_pcie_addr_dec_info(hdev,
le64_to_cpu(eq_entry->intr_cause.intr_cause_data));
reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
break;
case GAUDI2_EVENT_HMMU0_PAGE_FAULT_OR_WR_PERM ... GAUDI2_EVENT_HMMU12_SECURITY_ERROR:
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment