    xprtrdma: Detect unreachable NFS/RDMA servers more reliably · 33849792
    Chuck Lever authored
    Current NFS clients rely on connection loss to determine when to
    retransmit. In particular, for protocols like NFSv4, clients no
    longer rely on RPC timeouts to drive retransmission: NFSv4 servers
    are required to terminate a connection when they need a client to
    retransmit pending RPCs.
    
    When a server is no longer reachable, either because it has crashed
    or because the network path has broken, the server cannot actively
    terminate a connection. Thus NFS clients depend on transport-level
    keepalive to determine when a connection must be replaced and
    pending RPCs retransmitted.
    
    However, RDMA RC connections do not have a native keepalive
    mechanism. If an NFS/RDMA server crashes after a client has sent
    RPCs successfully (an RC ACK has been received for every
    on-the-wire RDMA request), there is no way for the client to know
    the connection is moribund.
    
    In addition, new RDMA requests are subject to the RPC-over-RDMA
    credit limit. If the client has consumed all granted credits with
    NFS traffic, it is not allowed to send another RDMA request until
    the server replies. Thus it has no way to send a true keepalive when
    the workload has already consumed all credits with pending RPCs.
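
The credit constraint described above can be sketched as a toy model.
This is an illustration only, not the xprtrdma implementation: the
names toy_credits, toy_try_send, and toy_reply_received are
hypothetical, and the real transport tracks credits differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an RPC-over-RDMA credit window. */
struct toy_credits {
	unsigned int granted;	/* credits most recently granted by the server */
	unsigned int in_flight;	/* requests sent and still awaiting a reply */
};

/* Returns true if a new request (including a keepalive) may be sent. */
static bool toy_try_send(struct toy_credits *c)
{
	assert(c != NULL);
	if (c->in_flight >= c->granted)
		return false;	/* window full: nothing more may be sent */
	c->in_flight++;
	return true;
}

/* A reply retires one request and carries a fresh credit grant. */
static void toy_reply_received(struct toy_credits *c, unsigned int new_grant)
{
	assert(c != NULL);
	if (c->in_flight > 0)
		c->in_flight--;
	c->granted = new_grant;
}
```

In this model, once in_flight reaches granted, toy_try_send() keeps
returning false until a reply arrives. If the server is unreachable, no
reply ever arrives, so a keepalive request can never be sent.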
    
    To address this, forcibly disconnect a transport when an RPC times
    out. This keeps a moribund connection from blocking the detection
    of failover or other configuration changes on the server.
    
    Note that even if the connection is still good, retransmitting
    any RPC will trigger a disconnect thanks to this logic in
    xprt_rdma_send_request:
    
    	/* Must suppress retransmit to maintain credits */
    	if (req->rl_connect_cookie == xprt->connect_cookie)
    		goto drop_connection;
    	req->rl_connect_cookie = xprt->connect_cookie;
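
The effect of the cookie check quoted above can be sketched in a
standalone model. The toy_* names are hypothetical, and two details are
assumptions rather than facts from this commit: that the transport's
cookie changes on every new connection, and that a fresh request starts
with a cookie value no live connection uses (0 here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the transport and the request. */
struct toy_xprt {
	unsigned int connect_cookie;	/* assumed to change per connection */
	bool connected;
};

struct toy_req {
	unsigned int rl_connect_cookie;	/* cookie seen at the last send */
};

enum toy_send_result { TOY_SENT, TOY_DROPPED };

/* Establish a new connection instance and advance the cookie. */
static void toy_connect(struct toy_xprt *x)
{
	assert(x != NULL);
	x->connect_cookie++;
	x->connected = true;
}

/* Mirrors the quoted check: a second send of the same request on the
 * same connection instance drops the connection instead of sending. */
static enum toy_send_result toy_send_request(struct toy_xprt *x,
					     struct toy_req *r)
{
	assert(x != NULL && r != NULL);
	if (r->rl_connect_cookie == x->connect_cookie) {
		x->connected = false;	/* models "goto drop_connection" */
		return TOY_DROPPED;
	}
	r->rl_connect_cookie = x->connect_cookie;
	return TOY_SENT;
}
```

In this model the first send on a connection succeeds, a retransmit on
the same connection drops it, and the retransmit succeeds only after
toy_connect() has established a new connection instance.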
    
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
transport.c 24.3 KB