    RDMA/ocrdma: Depend on async link events from CNA · 10a214dc
    Devesh Sharma authored
    Recently Doug Ledford reported a deadlock between the ocrdma load
    sequence and the NetworkManager service issuing "open" on a be2net
    interface.
    
    The deadlock happens when any be2net hook (e.g. open/close) is called
    in parallel with "insmod ocrdma.ko".
    
    A. be2net sends administrative open/close events to ocrdma while holding
       device_list_mutex. It does this from its ndo_open/ndo_stop hooks, which
       run under rtnl_lock. So the lock order on this path is
       rtnl_lock ---> device_list lock.
    
    B. When a new ocrdma RoCE device is registered, the InfiniBand stack now
       takes rtnl_lock in ib_register_device(), in the GID initialization
       routines. So the lock order on this path is device_list lock ---> rtnl_lock.
    
    These two paths take the same pair of locks in opposite order, which
    causes the deadlock.
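    To make the inversion concrete, here is a minimal user-space sketch of a
    lock-order checker, loosely in the spirit of the kernel's lockdep but not
    lockdep itself. The lock IDs and function names (record_nesting,
    has_inversion, and so on) are hypothetical and exist only for this
    illustration. Each path records "inner lock taken while outer lock held";
    a cycle in the recorded order means two paths can deadlock.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock IDs for this sketch; names mirror the locks above. */
enum lock_id { LOCK_RTNL, LOCK_DEVICE_LIST, NUM_LOCKS };

/* order[a][b] is true once some path took lock b while holding lock a. */
static bool order[NUM_LOCKS][NUM_LOCKS];

static void reset_lock_order(void)
{
    memset(order, 0, sizeof(order));
}

/* Record that 'inner' was acquired while 'outer' was held. */
static void record_nesting(enum lock_id outer, enum lock_id inner)
{
    order[outer][inner] = true;
}

/* Depth-first search: can 'to' be reached from 'from' via recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < NUM_LOCKS; next++)
        if (order[from][next] && !seen[next] && reachable(next, to, seen))
            return true;
    return false;
}

/* A potential ABBA deadlock exists if an edge a->b also has a path b->a. */
static bool has_inversion(void)
{
    for (int a = 0; a < NUM_LOCKS; a++) {
        for (int b = 0; b < NUM_LOCKS; b++) {
            if (a == b || !order[a][b])
                continue;
            bool seen[NUM_LOCKS] = { false };
            if (reachable(b, a, seen))
                return true;
        }
    }
    return false;
}
```

    Recording path A alone (rtnl_lock, then device_list lock) reports no
    problem; adding path B (device_list lock, then rtnl_lock) closes the
    cycle and has_inversion() returns true.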
    
    With this patch, ocrdma stops using the administrative open and close
    events injected by the be2net driver. These events were used to dispatch
    PORT_ACTIVE and PORT_ERROR events to the IB stack. Instead, this patch
    implements logic to receive the async link events that the CNA generates
    whenever a link-state change is detected. From now on, these async events
    are used to dispatch PORT_ACTIVE and PORT_ERROR events to the IB stack.
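    A minimal user-space sketch of the new flow, assuming a simplified event
    model: an async link-state event (up or down) is translated directly into
    a port event and handed to a dispatch callback. The enums, the callback
    type and handle_link_event() are hypothetical stand-ins, not the ocrdma
    or IB core API. The point is that the handler needs only the event
    payload, so no device-list lock is taken on this path.

```c
#include <stddef.h>

/* Hypothetical async link-state event as reported by the adapter (CNA). */
enum link_state { LINK_DOWN, LINK_UP };

/* Hypothetical port events mirroring PORT_ACTIVE / PORT_ERROR above. */
enum port_event { PORT_EVENT_ACTIVE, PORT_EVENT_ERROR };

/* Callback standing in for "dispatch this event to the IB stack". */
typedef void (*dispatch_fn)(enum port_event ev, void *ctx);

/* Translate one async link event into a port event and dispatch it.
 * Only the event payload is used; no device-list lock is acquired. */
static enum port_event handle_link_event(enum link_state state,
                                         dispatch_fn dispatch, void *ctx)
{
    enum port_event ev =
        (state == LINK_UP) ? PORT_EVENT_ACTIVE : PORT_EVENT_ERROR;

    if (dispatch != NULL)
        dispatch(ev, ctx);
    return ev;
}

/* Example dispatcher for demonstration: stores the last event in *ctx. */
static void record_event(enum port_event ev, void *ctx)
{
    *(enum port_event *)ctx = ev;
}
```

    Passing record_event with a pointer to an enum port_event lets a caller
    observe what would have been dispatched for each link transition.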
    
    Depending on async events from the CNA removes the need to hold
    device_list_mutex on this path and thus breaks the circular lock
    dependency that caused the deadlock.

    Reported-by: Doug Ledford <dledford@redhat.com>
    CC: Sathya Perla <sathya.perla@avagotech.com>
    Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
    Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com>
    Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>