    xprtrdma: Pad optimization, revisited · 2324fbed
    Chuck Lever authored
    The NetApp Linux team discovered that with NFS/RDMA servers that do
    not support RFC 8797, the Linux client is forming NFSv4.x WRITE
    requests incorrectly.
    
    When a server does not support RFC 8797, the Linux NFS client
    disables implicit chunk round-up for odd-length Read and Write
    chunks. The goal was to support old servers that need that padding
    to be sent explicitly by clients.
    
    In that case the Linux NFS client included the tail kvec in the
    Read chunk, since the tail contained any needed padding. That meant
    a separate memory registration was needed for the tail kvec, adding
    to the cost of forming such requests. To avoid that cost for at
    most 3 bytes of zeroes that are always ignored by receivers, we try
    to use implicit roundup when possible.
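
    As an illustration of where the "3 bytes" figure comes from, here
    is a minimal sketch of the XDR pad arithmetic. The helper names
    (sketch_xdr_pad_size, sketch_xdr_align_size) are hypothetical and
    are not the functions this patch touches; the only assumption is
    the 4-byte XDR alignment unit.

```c
#include <assert.h>
#include <stddef.h>

/* XDR encodes all items in multiples of 4 bytes. */
#define SKETCH_XDR_UNIT 4u

/*
 * Hypothetical helper: number of zero bytes needed to bring a payload
 * of 'len' bytes up to the next XDR boundary.
 */
static size_t sketch_xdr_pad_size(size_t len)
{
	size_t pad = (SKETCH_XDR_UNIT - (len % SKETCH_XDR_UNIT)) % SKETCH_XDR_UNIT;

	assert(pad < SKETCH_XDR_UNIT);	/* the pad is never more than 3 bytes */
	return pad;
}

/*
 * Hypothetical helper: payload length after round-up, i.e. the length
 * a chunk covers once the pad is counted.
 */
static size_t sketch_xdr_align_size(size_t len)
{
	return len + sketch_xdr_pad_size(len);
}
```

    For example, a 4097-byte WRITE payload needs 3 pad bytes and is
    rounded up to 4100 bytes, while a 4096-byte payload needs none.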
    
    For NFSv4.x, the tail kvec also sometimes contains a trailing
    GETATTR operation. The Linux NFS client unintentionally includes
    that GETATTR operation in the Read chunk as well as inline.
    
    The fix is simply to /never/ include the tail kvec when forming a
    data payload Read chunk. The padding is thus now always present.
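
    The rule can be sketched as follows. This is a simplified model,
    not the code in rpc_rdma.c: the types and the function
    sketch_form_read_chunk are hypothetical, the page data is treated
    as a single segment rather than one segment per page, and page_len
    is assumed to already count any XDR pad.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the parts of an RPC send buffer. */
struct sketch_xdr_buf {
	size_t head_len;	/* RPC and NFS headers, sent inline */
	size_t page_len;	/* data payload (assumed to include any XDR pad) */
	size_t tail_len;	/* trailing operations such as GETATTR, sent inline */
};

enum sketch_part { SKETCH_HEAD, SKETCH_PAGES, SKETCH_TAIL };

struct sketch_seg {
	enum sketch_part part;
	size_t len;
};

/*
 * Build the segment list for a data payload Read chunk.
 * Only the page data is added; the tail is never part of the chunk,
 * so a trailing GETATTR cannot leak into it.
 * Returns the number of segments written to 'segs'.
 */
static size_t sketch_form_read_chunk(const struct sketch_xdr_buf *buf,
				     struct sketch_seg *segs, size_t max_segs)
{
	size_t n = 0;

	assert(buf != NULL && segs != NULL && max_segs > 0);

	if (buf->page_len > 0)
		segs[n++] = (struct sketch_seg){ SKETCH_PAGES, buf->page_len };

	/* Deliberately no SKETCH_TAIL segment, whatever tail_len is. */
	return n;
}
```

    With head_len = 200, page_len = 4100 and tail_len = 16, the sketch
    yields one segment of 4100 bytes; the 16-byte tail travels inline
    only.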
    
    Note that since commit 9ed5af26 ("SUNRPC: Clean up the handling
    of page padding in rpc_prepare_reply_pages()") [Dec 2020] the NFS
    client passes payload data to the transport with the padding in
    xdr->pages instead of in the send buffer's tail kvec. So now the
    Linux NFS client appends XDR padding to all odd-sized Read chunks.
    This shouldn't be a problem because:
    
     - RFC 8166-compliant servers are supposed to work with or without
       that XDR padding in Read chunks.
    
     - Since the padding is now in the same memory region as the data
       payload, a separate memory registration is not needed. In
       addition, the link layer extends data in RDMA Read responses to
       4-byte boundaries anyway. Thus omitting the padding no longer
       saves anything.
    
    Because older kernels include the payload's XDR padding in the
    tail kvec, a fix there will be more complicated. Thus backporting
    this patch is not recommended.
    
    Reported-by: Olga Kornievskaia <Olga.Kornievskaia@netapp.com>
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Reviewed-by: Tom Talpey <tom@talpey.com>
    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
rpc_rdma.c 39.1 KB