Commit 22fca179 authored by Gavin Shan's avatar Gavin Shan Committed by Michael Ellerman

powerpc/eeh: Clear frozen device state in time

The problem was reported by Carol: In the scenario of passing mlx4
adapter to guest, EEH error could be recovered successfully. When
returning the device back to host, the driver (mlx4_core.ko)
couldn't be loaded successfully because of error number -5 (-EIO)
returned from mlx4_get_ownership(), which hits offlined PCI device.
The root cause is that we missed to put the affected devices into
normal state on clearing PE isolated state right after PE reset.

The patch fixes above issue by putting the affected devices to
normal state when clearing PE isolated state in eeh_pe_state_clear().

Cc: stable@vger.kernel.org
Reported-by: default avatarCarol L. Soto <clsoto@us.ibm.com>
Signed-off-by: default avatarGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
parent d9df1b5d
...@@ -584,6 +584,8 @@ static void *__eeh_pe_state_clear(void *data, void *flag) ...@@ -584,6 +584,8 @@ static void *__eeh_pe_state_clear(void *data, void *flag)
{ {
struct eeh_pe *pe = (struct eeh_pe *)data; struct eeh_pe *pe = (struct eeh_pe *)data;
int state = *((int *)flag); int state = *((int *)flag);
struct eeh_dev *edev, *tmp;
struct pci_dev *pdev;
/* Keep the state of permanently removed PE intact */ /* Keep the state of permanently removed PE intact */
if ((pe->freeze_count > EEH_MAX_ALLOWED_FREEZES) && if ((pe->freeze_count > EEH_MAX_ALLOWED_FREEZES) &&
...@@ -592,9 +594,22 @@ static void *__eeh_pe_state_clear(void *data, void *flag) ...@@ -592,9 +594,22 @@ static void *__eeh_pe_state_clear(void *data, void *flag)
pe->state &= ~state; pe->state &= ~state;
/* Clear check count since last isolation */ /*
if (state & EEH_PE_ISOLATED) * Special treatment on clearing isolated state. Clear
* check count since last isolation and put all affected
* devices to normal state.
*/
if (!(state & EEH_PE_ISOLATED))
return NULL;
pe->check_count = 0; pe->check_count = 0;
eeh_pe_for_each_dev(pe, edev, tmp) {
pdev = eeh_dev_to_pci_dev(edev);
if (!pdev)
continue;
pdev->error_state = pci_channel_io_normal;
}
return NULL; return NULL;
} }
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment