    xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting · ec62f40d
    Chuck Lever authored
    Devesh Sharma <Devesh.Sharma@Emulex.Com> reports that after a
    disconnect, his HCA is failing to create a fresh QP, leaving
    ia->ri_id->qp set to NULL. But xprtrdma still allows RPCs to
    wake up and post LOCAL_INV as they exit, causing an oops.
    
    rpcrdma_ep_connect() is allowing the wake-up by leaking the QP
    creation error code (-EPERM in this case) to the RPC client's
    generic layer. xprt_connect_status() does not recognize -EPERM, so
    it kills pending RPC tasks immediately rather than retrying the
    connect.
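
As a rough illustration of why an unrecognized error code matters, here is a minimal userspace sketch. The function `classify_connect_status()`, the enum, and the particular list of retryable codes are hypothetical and chosen for illustration; they are not the kernel's actual xprt_connect_status() logic.

```c
#include <errno.h>

/* Hypothetical outcome of inspecting a connect status. */
enum connect_action { CONNECT_RETRY, CONNECT_KILL };

/*
 * Illustrative only: codes the generic layer "knows" lead to a retry;
 * anything else falls through to the default and the pending RPC
 * task is terminated immediately.
 */
static enum connect_action classify_connect_status(int status)
{
    switch (status) {
    case -ECONNREFUSED:
    case -ECONNRESET:
    case -ENETUNREACH:
    case -EHOSTUNREACH:
    case -EAGAIN:
        return CONNECT_RETRY;
    default:
        /* -EPERM from a failed QP creation would land here. */
        return CONNECT_KILL;
    }
}
```

Under this model, a transport that hands back -EPERM causes the task to be killed, while one that hands back a code from the recognized set gets another connect attempt.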
    
    Re-arrange the QP creation logic so that when it fails on reconnect,
    it leaves ->qp with the old QP rather than NULL.  If pending RPC
    tasks wake and exit, LOCAL_INV work requests will flush rather than
    oops.
    
    On initial connect, leaving ->qp == NULL is OK, since there are no
    pending RPCs that might use ->qp. But be sure not to try to destroy
    a NULL QP when rpcrdma_ep_connect() is retried.
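
The re-arrangement described above can be sketched in userspace roughly as follows. All `mock_*` names and types are hypothetical stand-ins, not the kernel's RDMA CM API, and the choice of -ENETUNREACH as the returned code is an assumption standing in for "some code the generic layer retries on".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the connection ID and its queue pair. */
struct mock_qp { int unused; };
struct mock_id { struct mock_qp *qp; };
struct mock_ia { struct mock_id *ri_id; };

/* Models QP creation on the HCA; `fail` forces the error path. */
static int mock_create_qp(struct mock_id *id, int fail)
{
    if (fail)
        return -EPERM;
    id->qp = calloc(1, sizeof(*id->qp));
    return id->qp ? 0 : -ENOMEM;
}

/* Tear down an ID, destroying its QP only if one was ever created. */
static void mock_teardown(struct mock_id *id)
{
    if (id->qp) {               /* guard: never destroy a NULL QP */
        free(id->qp);
        id->qp = NULL;
    }
    free(id);
}

/*
 * Reconnect: build the replacement ID and QP first, and swap them in
 * only on success. On failure the old ID, with its old QP, stays put.
 */
static int mock_reconnect(struct mock_ia *ia, int fail)
{
    struct mock_id *fresh, *old;

    assert(ia && ia->ri_id);
    fresh = calloc(1, sizeof(*fresh));
    if (!fresh)
        return -ENOMEM;

    if (mock_create_qp(fresh, fail)) {
        mock_teardown(fresh);
        return -ENETUNREACH;    /* assumption: a retryable code */
    }

    old = ia->ri_id;
    ia->ri_id = fresh;
    mock_teardown(old);         /* safe even if old->qp is NULL */
    return 0;
}
```

The point of the ordering is that ia->ri_id->qp is never observed as NULL after a failed reconnect, and that teardown tolerates an ID whose QP was never created, as can happen when the initial connect is retried.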

    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
verbs.c 46.9 KB