    RDMA/cxgb4: SQ flush fix · b4e2901c
    Steve Wise authored
    There is a race when moving a QP from RTS->CLOSING where an SQ work
    request could be posted after the FW receives the RDMA_RI/FINI WR.
    That SQ work request will never be processed and should be completed
    with FLUSHED status.  However, c4iw_flush_sq() was dropping the
    oldest SQ work request when in CLOSING or IDLE states instead of
    completing the pending work request.  If that oldest pending work
    request was actually complete and has a CQE in the CQ, then when that
    CQE is processed in poll_cq, we'll BUG_ON() due to the inconsistent
    SQ/CQ state.
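
The inconsistency described above can be illustrated with a small user-space toy model. This is only a sketch under stated assumptions: the struct, the function names, and the ring layout are hypothetical and are not taken from the cxgb4 driver. It shows how dropping the oldest SQ entry leaves a CQE that no longer matches anything the SQ tracks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, NOT the cxgb4 driver: every name below is hypothetical. */
#define TOY_SQ_DEPTH 8

struct toy_sq {
    size_t head;   /* index of the oldest outstanding WR */
    size_t count;  /* number of outstanding WRs */
};

/* Models the old behaviour: silently drop the oldest WR. */
static void toy_buggy_drop_oldest(struct toy_sq *sq)
{
    assert(sq->count <= TOY_SQ_DEPTH);
    if (sq->count == 0)
        return;
    sq->head = (sq->head + 1) % TOY_SQ_DEPTH;
    sq->count--;
}

/*
 * Models the poll path looking at a CQE for WR index idx. Returns false
 * when the SQ no longer tracks that WR as its oldest entry, which is the
 * kind of SQ/CQ mismatch the commit message says ends in BUG_ON().
 */
static bool toy_cqe_matches_sq(const struct toy_sq *sq, size_t idx)
{
    return sq->count > 0 && sq->head == idx;
}
```

In this model, a CQE for WR 0 matches the SQ before the drop and no longer matches afterwards, even though the hardware really did complete WR 0.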
    
    This is a very small timing hole and has only been hit once so far.
    
    The fix is two-fold:
    
    1) c4iw_flush_sq() MUST always flush all non-completed WRs with FLUSHED
       status regardless of the QP state.
    
    2) In c4iw_modify_rc_qp(), always set the "in error" bit on the queue
       before moving the state out of RTS.  This ensures that the state
       transition will not happen while another thread is in
       post_rc_send(), because set_state() and post_rc_send() both acquire
       the qp spinlock.  Also, once we transition the state out of RTS,
       subsequent calls to post_rc_send() will fail because the "in error"
       bit is set.  I don't think this fully closes the race where the FW
       can get a FINI followed by an SQ work request being posted (because
       they are posted to different EQs), but the #1 fix will handle the
       issue by flushing the SQ work request.
       issue by flushing the SQ work request.
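
The two parts of the fix can be sketched with a single-threaded toy model. This is an illustration only, assuming hypothetical names (toy_qp, toy_post_send, toy_leave_rts, toy_flush_sq) that do not appear in the driver; the real code serializes the post and state-change paths with the qp spinlock, which this sketch only notes in comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, NOT the cxgb4 driver: every name below is hypothetical. */
#define TOY_SQ_DEPTH 8

enum toy_qp_state { TOY_RTS, TOY_CLOSING, TOY_IDLE };
enum toy_wr_status { TOY_WR_PENDING, TOY_WR_DONE, TOY_WR_FLUSHED };

struct toy_qp {
    enum toy_qp_state state;
    bool in_error;                        /* models the "in error" bit */
    enum toy_wr_status sq[TOY_SQ_DEPTH];  /* status of each posted WR */
    size_t sq_count;                      /* number of WRs posted so far */
};

/* Models the post path; in the driver this runs under the qp spinlock. */
static int toy_post_send(struct toy_qp *qp)
{
    if (qp->in_error || qp->sq_count == TOY_SQ_DEPTH)
        return -1;                        /* rejected, nothing is queued */
    qp->sq[qp->sq_count++] = TOY_WR_PENDING;
    return 0;
}

/* Fix 2: set the in-error bit first, then move the state out of RTS. */
static void toy_leave_rts(struct toy_qp *qp, enum toy_qp_state next)
{
    assert(next != TOY_RTS);
    qp->in_error = true;                  /* later posts now fail */
    qp->state = next;
}

/* Fix 1: flush every non-completed WR regardless of the QP state. */
static size_t toy_flush_sq(struct toy_qp *qp)
{
    size_t flushed = 0;

    for (size_t i = 0; i < qp->sq_count; i++) {
        if (qp->sq[i] == TOY_WR_PENDING) {
            qp->sq[i] = TOY_WR_FLUSHED;
            flushed++;
        }
    }
    return flushed;                       /* completed WRs are untouched */
}
```

In this model a WR that has already completed keeps its status, every still-pending WR ends up FLUSHED, and no new WR can be queued once the in-error bit is set.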

    Signed-off-by: Steve Wise <swise@opengridcomputing.com>
    Signed-off-by: Roland Dreier <roland@purestorage.com>
cq.c 24.6 KB