• Ronald Wahl's avatar
    net: ks8851: Fix TX stall caused by TX buffer overrun · 3dc5d445
    Ronald Wahl authored
    There is a bug in the ks8851 Ethernet driver that more data is written
    to the hardware TX buffer than actually available. This is caused by
    wrong accounting of the free TX buffer space.
    
    The driver maintains a tx_space variable that represents the TX buffer
    space that is deemed to be free. The ks8851_start_xmit_spi() function
    adds an SKB to a queue if tx_space is large enough and reduces tx_space
    by the amount of buffer space it will later need in the TX buffer and
    then schedules a work item. If there is not enough space then the TX
    queue is stopped.
    
    The worker function ks8851_tx_work() dequeues all the SKBs and writes
    the data into the hardware TX buffer. The last packet will trigger an
    interrupt after it was send. Here it is assumed that all data fits into
    the TX buffer.
    
    In the interrupt routine (which runs asynchronously because it is a
    threaded interrupt) tx_space is updated with the current value from the
    hardware. Also the TX queue is woken up again.
    
    Now it could happen that after data was sent to the hardware and before
    handling the TX interrupt new data is queued in ks8851_start_xmit_spi()
    when the TX buffer space had still some space left. When the interrupt
    is actually handled tx_space is updated from the hardware but now we
    already have new SKBs queued that have not been written to the hardware
    TX buffer yet. Since tx_space has been overwritten by the value from the
    hardware the space is not accounted for.
    
    Now we have more data queued then buffer space available in the hardware
    and ks8851_tx_work() will potentially overrun the hardware TX buffer. In
    many cases it will still work because often the buffer is written out
    fast enough so that no overrun occurs but for example if the peer
    throttles us via flow control then an overrun may happen.
    
    This can be fixed in different ways. The most simple way would be to set
    tx_space to 0 before writing data to the hardware TX buffer preventing
    the queuing of more SKBs until the TX interrupt has been handled. I have
    chosen a slightly more efficient (and still rather simple) way and
    track the amount of data that is already queued and not yet written to
    the hardware. When new SKBs are to be queued the already queued amount
    of data is honoured when checking free TX buffer space.
    
    I tested this with a setup of two linked KS8851 running iperf3 between
    the two in bidirectional mode. Before the fix I got a stall after some
    minutes. With the fix I saw now issues anymore after hours.
    
    Fixes: 3ba81f3e ("net: Micrel KS8851 SPI network driver")
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Eric Dumazet <edumazet@google.com>
    Cc: Jakub Kicinski <kuba@kernel.org>
    Cc: Paolo Abeni <pabeni@redhat.com>
    Cc: Ben Dooks <ben.dooks@codethink.co.uk>
    Cc: Tristram Ha <Tristram.Ha@microchip.com>
    Cc: netdev@vger.kernel.org
    Cc: stable@vger.kernel.org # 5.10+
    Signed-off-by: default avatarRonald Wahl <ronald.wahl@raritan.com>
    Reviewed-by: default avatarSimon Horman <horms@kernel.org>
    Link: https://lore.kernel.org/r/20231214181112.76052-1-rwahl@gmx.deSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
    3dc5d445
ks8851.h 13.2 KB