    spi: bcm2835: Speed up TX-only DMA transfers by clearing RX FIFO · 8259bf66
    Lukas Wunner authored
    The BCM2835 SPI driver currently sets the SPI_CONTROLLER_MUST_RX flag.
    When performing a TX-only transfer, this flag causes the SPI core to
    allocate and DMA-map a dummy buffer into which the RX FIFO contents are
    copied.  The dummy buffer is necessary because the chip is not capable
    of disabling the receiver or automatically throwing away received data.
    Not reading the RX FIFO isn't an option either since transmission is
    halted once it's full.
    
    Avoid the overhead induced by the dummy buffer by preallocating a
    reusable DMA transaction which cyclically clears the RX FIFO.  The
    transaction requires very little CPU time to submit and generates no
    interrupts while running.  Specifics are provided in kerneldoc comments.
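The constraint described above can be illustrated with a toy model in plain C. This is a sketch, not driver code: the FIFO depth of 64 bytes and all names (`spi_model`, `model_tx_only`, `clear_every`) are assumptions made for illustration. Every byte clocked out also lands one byte in the RX FIFO, so a TX-only transfer stalls unless something keeps emptying that FIFO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FIFO_DEPTH 64	/* assumed RX FIFO depth, for illustration only */

struct spi_model {
	size_t rx_fill;	/* bytes currently sitting in the RX FIFO */
};

/* Clock out one byte; full duplex means one byte is received as well. */
static bool model_clock_byte(struct spi_model *m)
{
	if (m->rx_fill == FIFO_DEPTH)
		return false;	/* RX FIFO full: transmission halts */
	m->rx_fill++;
	assert(m->rx_fill <= FIFO_DEPTH);
	return true;
}

/* Stand-in for the cyclic clearing: discard everything received so far. */
static void model_clear_rx(struct spi_model *m)
{
	m->rx_fill = 0;
}

/*
 * Try to send len bytes without ever reading the RX FIFO.  If clear_every
 * is nonzero, the RX FIFO is cleared after every clear_every bytes sent.
 * Returns the number of bytes sent before the model stalls or finishes.
 */
static size_t model_tx_only(size_t len, size_t clear_every)
{
	struct spi_model m = { 0 };
	size_t sent = 0;

	while (sent < len && model_clock_byte(&m)) {
		sent++;
		if (clear_every && sent % clear_every == 0)
			model_clear_rx(&m);
	}
	return sent;
}
```

In this model, `model_tx_only(1000, 0)` stalls once the RX FIFO is full, whereas `model_tx_only(1000, 32)` runs to completion because the FIFO never fills up.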
    
    With a ks8851 Ethernet chip attached to the SPI controller, I am seeing
    a 30 us reduction in ping time with this commit (1.819 ms vs. 1.849 ms,
    average of 100,000 packets) as well as a 2% reduction in CPU time
    (75:08 vs. 76:39 for transmission of 5 GByte over the SPI bus).
    
    The commit uses the TX DMA interrupt to signal completion of a transfer.
    This interrupt is raised once all bytes have been written to the
    TX FIFO and it is then necessary to busy-wait for the TX FIFO to become
    empty before the transfer can be finalized.  As an alternative approach,
    I have explored using the SPI controller's DONE interrupt to detect
    completion.  This interrupt is signaled when the TX FIFO becomes empty,
    avoiding the need to busy-wait.  However, latency deteriorates compared
    to the present commit and, surprisingly, CPU time is slightly higher as
    well:
    
    It turns out that in 45% of the cases, no busy-waiting is needed at all
    and in 76% of the cases, fewer than 10 busy-wait iterations are
    sufficient for the TX FIFO to drain.  This was measured on an RT kernel.
    On a vanilla kernel, wakeup latency is worse and thus fewer iterations
    are needed.  The measurements were made with an SPI clock of 20 MHz;
    they may differ slightly for slower or faster clock speeds.
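The iteration counts quoted above refer to a polling loop of roughly the following shape. This is a userspace sketch under stated assumptions, not the driver's code: the `CS_DONE` bit position, the `read_cs_fn` callback type and the fake register are hypothetical stand-ins for a real memory-mapped status register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CS_DONE (1u << 16)	/* assumed position of the "TX FIFO empty" flag */

/* Hypothetical accessor for the controller's status register. */
typedef uint32_t (*read_cs_fn)(void *ctx);

/*
 * Poll until the DONE flag is set.  Returns the number of busy-wait
 * iterations, i.e. 0 if the TX FIFO had already drained on entry.
 */
static unsigned int wait_tx_drained(read_cs_fn read_cs, void *ctx)
{
	unsigned int iterations = 0;

	assert(read_cs != NULL);
	while (!(read_cs(ctx) & CS_DONE))
		iterations++;
	return iterations;
}

/* Fake register that reports DONE after a given number of reads. */
struct fake_cs {
	unsigned int polls_until_done;
};

static uint32_t fake_read_cs(void *ctx)
{
	struct fake_cs *f = ctx;

	if (f->polls_until_done == 0)
		return CS_DONE;
	f->polls_until_done--;
	return 0;
}

/* Convenience wrapper: how many iterations does the loop spin? */
static unsigned int demo_wait(unsigned int polls_until_done)
{
	struct fake_cs f = { .polls_until_done = polls_until_done };

	return wait_tx_drained(fake_read_cs, &f);
}
```

Here `demo_wait(0)` corresponds to the case where no busy-waiting is needed at all, because the flag is already set on the first read.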
    
    Previously we always used the RX DMA interrupt to signal completion of a
    transfer.  Using the TX DMA interrupt now introduces a race condition:
    TX DMA is always started before RX DMA so that bytes are already clocked
    out while RX DMA is still being set up.  But if a TX-only transfer is
    very short, then the TX DMA interrupt may occur before RX DMA is set up.
    If the interrupt happens to occur on the same CPU, setup of RX DMA may
    even be delayed until after the interrupt was handled.
    
    I've solved this by having the TX DMA callback clear the RX FIFO while
    busy-waiting for the TX FIFO to drain, thus avoiding a dependency on
    setup of RX DMA.  Additionally, I am using a lock-free mechanism with
    two flags, tx_dma_active and rx_dma_active, plus memory barriers to
    terminate RX DMA either by the TX DMA callback or immediately after
    setting it up, whichever wins the race.  I've explored an alternative
    approach which temporarily disables the TX DMA callback until RX DMA
    has been set up (using tasklet_disable(), local_bh_disable() or
    local_irq_save()), but the performance was minimally worse.
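The two-flag scheme can be sketched in userspace C11 as follows. Only the flag names tx_dma_active and rx_dma_active come from the description above. Everything else (the `xfer_state` struct, the function names, the termination counter) is hypothetical, and sequentially consistent C11 atomics stand in for the kernel's memory barriers. The point of the sketch is that RX DMA is terminated exactly once, no matter which side wins the race.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct xfer_state {
	atomic_bool tx_dma_active;
	atomic_bool rx_dma_active;
	atomic_int rx_terminations;	/* how often RX DMA was terminated */
};

static void xfer_init(struct xfer_state *s)
{
	atomic_init(&s->tx_dma_active, false);
	atomic_init(&s->rx_dma_active, false);
	atomic_init(&s->rx_terminations, 0);
}

/* Flip rx_dma_active from true to false; only one caller can succeed. */
static bool claim_rx_termination(struct xfer_state *s)
{
	bool expected = true;

	return atomic_compare_exchange_strong(&s->rx_dma_active,
					      &expected, false);
}

/* Submitting path: mark TX DMA as running before it is issued. */
static void start_tx_dma(struct xfer_state *s)
{
	atomic_store(&s->tx_dma_active, true);
}

/* TX DMA completion callback: terminate RX DMA if it is already set up. */
static void tx_dma_done(struct xfer_state *s)
{
	atomic_store(&s->tx_dma_active, false);
	if (claim_rx_termination(s))
		atomic_fetch_add(&s->rx_terminations, 1);
}

/* Submitting path: set up RX DMA, then check whether TX already finished. */
static void start_rx_dma(struct xfer_state *s)
{
	atomic_store(&s->rx_dma_active, true);
	if (!atomic_load(&s->tx_dma_active) && claim_rx_termination(s))
		atomic_fetch_add(&s->rx_terminations, 1);
}

/* Common case: the TX DMA callback runs after RX DMA was set up. */
static int demo_normal_order(void)
{
	struct xfer_state s;

	xfer_init(&s);
	start_tx_dma(&s);
	start_rx_dma(&s);
	tx_dma_done(&s);
	assert(!atomic_load(&s.rx_dma_active));
	return atomic_load(&s.rx_terminations);
}

/* Very short transfer: the callback runs before RX DMA was set up. */
static int demo_early_callback(void)
{
	struct xfer_state s;

	xfer_init(&s);
	start_tx_dma(&s);
	tx_dma_done(&s);
	start_rx_dma(&s);
	assert(!atomic_load(&s.rx_dma_active));
	return atomic_load(&s.rx_terminations);
}
```

Both orderings end with a single termination. In the first, the callback wins the compare-and-exchange. In the second, the callback finds rx_dma_active still false, so the submitting path terminates RX DMA right after setting it up.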
    
    [Nathan Chancellor contributed a DMA mapping fixup for an early version
    of this commit, hence his Signed-off-by.]
    Tested-by: Nuno Sá <nuno.sa@analog.com>
    Tested-by: Noralf Trønnes <noralf@tronnes.org>
    Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
    Signed-off-by: Lukas Wunner <lukas@wunner.de>
    Acked-by: Stefan Wahren <wahrenst@gmx.net>
    Acked-by: Martin Sperl <kernel@martin.sperl.org>
    Cc: Robert Jarzmik <robert.jarzmik@free.fr>
    Link: https://lore.kernel.org/r/874949385f28251e2dcaa9494e39a27b50e9f9e4.1568187525.git.lukas@wunner.de
    Signed-off-by: Mark Brown <broonie@kernel.org>
spi-bcm2835.c 38 KB