    RDMA/cxgb4: Endpoint timeout fixes · b33bd0cb
    Steve Wise authored
    1) Timed-out endpoint processing can be starved.  If CPL messages
       flow continually into the driver, endpoints on the timeout list
       are not serviced.  This condition exposed the other bugs below.
    
    Solution: In process_work(), call process_timedout_eps() after each CPL
    is processed.
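
The starvation fix above can be sketched as a small user-space model. This is an illustration only, not the driver source: struct work_model and its counters are hypothetical, and each CPL handler call is reduced to a counter update. Only the names process_work() and process_timedout_eps() come from this commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of the work loop; not the driver source. */
struct work_model {
    int cpls_pending;          /* CPL messages still waiting to be handled */
    int timeouts_pending;      /* endpoints waiting on the timeout list */
    int cpls_done;             /* CPL messages handled so far */
    int timeouts_done;         /* timed-out endpoints handled so far */
    int cpls_at_first_timeout; /* cpls_done when the first timeout ran, -1 if none */
};

/* Drain every endpoint currently on the modelled timeout list. */
static void process_timedout_eps(struct work_model *m)
{
    assert(m != NULL);
    while (m->timeouts_pending > 0) {
        if (m->cpls_at_first_timeout < 0)
            m->cpls_at_first_timeout = m->cpls_done;
        m->timeouts_pending--;
        m->timeouts_done++;
    }
}

/* Handle CPL messages, servicing the timeout list after each one. */
static void process_work(struct work_model *m)
{
    assert(m != NULL);
    while (m->cpls_pending > 0) {
        m->cpls_pending--;          /* stands in for one CPL handler call */
        m->cpls_done++;
        process_timedout_eps(m);    /* the fix: run after every CPL */
    }
}
```

With 1000 pending CPL messages and one timed-out endpoint, this model services the endpoint right after the first CPL, not after the thousandth.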
    
    2) Connection events can be processed even though the endpoint is on
       the timeout list.  If the endpoint is scheduled for timeout
       processing, then we must ignore MPA Start Requests and Replies.
    
    Solution: Change stop_ep_timer() to return 1 if the ep has already been
    queued for timeout processing.  All the callers of stop_ep_timer() need
    to check this and act accordingly.  There are just a few cases where
    the caller needs to do something different if stop_ep_timer() returns 1:
    
    a) in process_mpa_reply(), ignore the reply; process_timeout()
       will abort the connection.

    b) in process_mpa_request(), ignore the request; process_timeout()
       will abort the connection.
    
    It is ok for callers of stop_ep_timer() to abort the connection since
    that will leave the state in ABORTING or DEAD, and process_timeout()
    now ignores timeouts when the ep is in these states.
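
The new stop_ep_timer() contract can be sketched as follows. This is a simplified user-space model under stated assumptions, not the driver code: struct ep_model, its fields, and the EP_CONNECTING and EP_ESTABLISHED states are hypothetical. Only the function names and the ABORTING and DEAD states are taken from this commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical endpoint states; only ABORTING and DEAD are named in the text. */
enum ep_state { EP_CONNECTING, EP_ESTABLISHED, EP_ABORTING, EP_DEAD };

/* Hypothetical stand-in for the driver's endpoint structure. */
struct ep_model {
    enum ep_state state;
    bool timer_armed;      /* endpoint timer is running */
    bool timeout_queued;   /* already queued for timeout processing */
    int replies_handled;   /* MPA replies actually acted upon */
    int aborts;            /* aborts issued by process_timeout() */
};

/* Stop the timer; return 1 if the ep was already queued for timeout. */
static int stop_ep_timer(struct ep_model *ep)
{
    assert(ep != NULL);
    ep->timer_armed = false;
    return ep->timeout_queued ? 1 : 0;
}

/* Ignore the reply when timeout processing is already pending. */
static void process_mpa_reply(struct ep_model *ep)
{
    if (stop_ep_timer(ep))
        return;                     /* process_timeout() will abort */
    ep->replies_handled++;
    ep->state = EP_ESTABLISHED;
}

/* Abort the connection unless it is already ABORTING or DEAD. */
static void process_timeout(struct ep_model *ep)
{
    assert(ep != NULL);
    ep->timeout_queued = false;
    if (ep->state == EP_ABORTING || ep->state == EP_DEAD)
        return;                     /* timeout is ignored in these states */
    ep->state = EP_ABORTING;
    ep->aborts++;
}
```

In this model, a reply that arrives after the endpoint was queued for timeout is dropped and the later process_timeout() call performs the abort. A timeout that fires on an endpoint already in ABORTING or DEAD is a no-op.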
    
    3) Double insertion on the timeout list.  Since the endpoint timers
       are used for connection setup and teardown, we need to guard
       against the possibility that an endpoint is already on the timeout
       list.  This is a rare condition and only seen under heavy load and
       in the presence of the above 2 bugs.
    
    Solution: In ep_timeout(), don't queue the endpoint if it is already on
    the queue.
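
The double-insertion guard can be sketched like this. It is a minimal user-space model, not the driver's list handling: struct ep_model, struct timeout_list, and the queued flag are hypothetical, and the real driver may track list membership differently. Only the name ep_timeout() comes from this commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical endpoint with an intrusive singly linked list hook. */
struct ep_model {
    struct ep_model *next;
    bool queued;           /* true while the ep is on the timeout list */
};

/* Hypothetical timeout list with head, tail, and length. */
struct timeout_list {
    struct ep_model *head;
    struct ep_model *tail;
    int len;
};

/* Queue the ep for timeout processing; return false if already queued. */
static bool ep_timeout(struct timeout_list *l, struct ep_model *ep)
{
    assert(l != NULL && ep != NULL);
    if (ep->queued)
        return false;      /* guard: never insert the same ep twice */
    ep->queued = true;
    ep->next = NULL;
    if (l->tail)
        l->tail->next = ep;
    else
        l->head = ep;
    l->tail = ep;
    l->len++;
    return true;
}
```

Calling ep_timeout() twice for the same endpoint leaves the list unchanged on the second call, so the list cannot be corrupted by a repeated insertion.
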
    Signed-off-by: Steve Wise <swise@opengridcomputing.com>
    Signed-off-by: Roland Dreier <roland@purestorage.com>
cm.c 100 KB