    IB/hfi1: Fix yield logic in send engine · dd1ed108
    Mike Marciniszyn authored
    When there are many RC QPs and an RDMA READ request
    is sent, timeouts occur on the requester side because
    of fairness issues among RC QPs on their respective SDMA
    engines on the responder side.  This also affects write and
    send, but to a lesser extent.
    
    Complicating the issue, the current code checks whether the
    workqueue is congested before scheduling other QPs.  However,
    this check is based on the number of active entries in the
    workqueue, which was found to be too big for
    workqueue_congested() to be effective.
    
    Fix by reducing the number of active entries from the default
    of num_sdma to HFI1_MAX_ACTIVE_WORKQUEUE_ENTRIES.  The new value
    was determined by experimentation, with retry counts monitored
    to find the correct setting.
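
    The effect of the limit on the congestion check can be sketched with a
    toy user-space model.  This is not the driver or kernel code: the toy_*
    names and the numeric limits are hypothetical, and the model assumes a
    queue reports congestion only once items are waiting beyond its
    active-entry limit.

```c
#include <stdbool.h>

/* Toy user-space model; NOT the kernel workqueue implementation. */
struct toy_wq {
    int max_active;  /* limit on concurrently active work items */
    int nr_active;   /* items currently counted as active */
    int nr_delayed;  /* items waiting because the limit was reached */
};

static struct toy_wq toy_wq_init(int max_active)
{
    struct toy_wq wq = { max_active, 0, 0 };
    return wq;
}

/* Queue one item: it becomes active if there is room, else it waits. */
static void toy_queue_work(struct toy_wq *wq)
{
    if (wq->nr_active < wq->max_active)
        wq->nr_active++;
    else
        wq->nr_delayed++;
}

/* Assumed congestion rule: congested only when something is waiting. */
static bool toy_congested(const struct toy_wq *wq)
{
    return wq->nr_delayed > 0;
}

/* Queue n items and report whether a sender would be told to yield. */
static bool toy_should_yield_after(int max_active, int n)
{
    struct toy_wq wq = toy_wq_init(max_active);

    for (int i = 0; i < n; i++)
        toy_queue_work(&wq);
    return toy_congested(&wq);
}
```

    In this model, queuing 8 items against a limit of 16 never reports
    congestion, so a sender never yields; with a limit of 4 the same load
    does report congestion.  Both limits are illustrative only.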
    
    Tracing to investigate any future issues is also added.
    Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
    Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
    Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
ruc.c 26.3 KB