Commit 8778b276 authored by Aaron Young, committed by David S. Miller

ldmvsw: tx queue stuck in stopped state after LDC reset

This patch fixes an issue with the ldmvsw driver where the network
connection of a guest domain becomes non-functional after the guest
domain has panicked and rebooted.

The root cause is the following series of events:

1. The guest domain panics, so the guest no longer processes
   network packets (from the ldmvsw driver).
2. The ldmvsw driver (in the control domain) eventually exerts flow
   control because no tx drings are available, and stops the tx queue
   for the guest domain.
3. The LDC of the network connection for the guest is reset when
   the guest domain reboots after the panic.
4. The LDC reset event is received by the ldmvsw driver, which
   responds by clearing the tx queue for the guest.
5. ldmvsw waits indefinitely for a DATA ACK from the guest, which is
   the normal method to re-enable the tx queue. But the ACK never comes
   because the tx queue was cleared due to the LDC reset.

To fix this issue, in addition to clearing the tx queue, re-enable the
tx queue on an LDC reset. This prevents ldmvsw from getting caught in
this deadlocked state of waiting for a DATA ACK which will never come.
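
The event sequence and the effect of the fix can be illustrated with a
small user-space model. This is only a sketch under stated assumptions:
struct model_port, RING_SIZE and all model_* functions are hypothetical
names invented for illustration and are not part of the sunvnet code.
The model assumes a DATA ACK can only arrive for descriptors that are
still outstanding.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 4	/* hypothetical number of tx descriptors */

/* Toy stand-in for per-port state; NOT the real struct vnet_port. */
struct model_port {
	int  tx_pending;	/* descriptors sent but not yet ACKed */
	bool queue_stopped;	/* flow control asserted on the tx queue */
	bool dev_running;	/* stands in for netif_running(dev) */
};

/* Queue one packet; stop the queue once the ring is full. */
static bool model_xmit(struct model_port *p)
{
	if (p->queue_stopped)
		return false;
	p->tx_pending++;
	assert(p->tx_pending <= RING_SIZE);
	if (p->tx_pending == RING_SIZE)
		p->queue_stopped = true;
	return true;
}

/* A DATA ACK frees outstanding descriptors and wakes the queue.
 * With nothing outstanding there is nothing to ACK, so no wakeup.
 */
static void model_data_ack(struct model_port *p)
{
	if (p->tx_pending > 0) {
		p->tx_pending = 0;
		p->queue_stopped = false;
	}
}

/* LDC reset: the ring is always cleared. The queue is woken only
 * when wake_on_reset is set, which models the behavior of this patch.
 */
static void model_ldc_reset(struct model_port *p, bool wake_on_reset)
{
	p->tx_pending = 0;
	if (wake_on_reset && p->dev_running)
		p->queue_stopped = false;
}

/* Returns true if the tx queue is still stopped after the reboot. */
bool simulate_guest_panic(bool wake_on_reset)
{
	struct model_port p = {
		.tx_pending = 0,
		.queue_stopped = false,
		.dev_running = true,
	};

	/* Steps 1-2: the guest stops ACKing and the ring fills up. */
	while (model_xmit(&p))
		;
	/* Steps 3-4: the guest reboots and the LDC is reset. */
	model_ldc_reset(&p, wake_on_reset);
	/* Step 5: the rebooted guest has nothing left to ACK. */
	model_data_ack(&p);

	return p.queue_stopped;
}
```

In this model, simulate_guest_panic(false) leaves the queue stopped
(the deadlock described above), while simulate_guest_panic(true)
leaves it running.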

Signed-off-by: Aaron Young <Aaron.Young@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
parent 04f762e8
@@ -704,9 +704,8 @@ static int handle_mcast(struct vnet_port *port, void *msgbuf)
 	return 0;
 }
 
-/* Got back a STOPPED LDC message on port. If the queue is stopped,
- * wake it up so that we'll send out another START message at the
- * next TX.
+/* If the queue is stopped, wake it up so that we'll
+ * send out another START message at the next TX.
  */
 static void maybe_tx_wakeup(struct vnet_port *port)
 {
@@ -734,6 +733,7 @@ EXPORT_SYMBOL_GPL(sunvnet_port_is_up_common);
 
 static int vnet_event_napi(struct vnet_port *port, int budget)
 {
+	struct net_device *dev = VNET_PORT_TO_NET_DEVICE(port);
 	struct vio_driver_state *vio = &port->vio;
 	int tx_wakeup, err;
 	int npkts = 0;
@@ -747,6 +747,16 @@ static int vnet_event_napi(struct vnet_port *port, int budget)
 		if (event == LDC_EVENT_RESET) {
 			vnet_port_reset(port);
 			vio_port_up(vio);
+
+			/* If the device is running but its tx queue was
+			 * stopped (due to flow control), restart it.
+			 * This is necessary since vnet_port_reset()
+			 * clears the tx drings and thus we may never get
+			 * back a VIO_TYPE_DATA ACK packet - which is
+			 * the normal mechanism to restart the tx queue.
+			 */
+			if (netif_running(dev))
+				maybe_tx_wakeup(port);
 		}
 		port->rx_event = 0;
 		return 0;