    svcrdma: Increase the per-transport rw_ctx count · 2da0f610
    Chuck Lever authored
    rdma_rw_mr_factor() returns the smallest number of MRs needed to
    move a particular number of pages. svcrdma currently asks for the
    number of MRs needed to move RPCSVC_MAXPAGES pages (a little over
    one megabyte), as that is the number of pages in the largest
    r/wsize the server supports.
    
    This call assumes that the client's NIC can bundle a full one
    megabyte payload in a single rdma_segment. In fact, most NICs cannot
    handle a full megabyte with a single rkey / rdma_segment. Clients
    will typically split even a single Read chunk into many segments.
    
    The server needs one MR to read each rdma_segment in a Read chunk,
    and thus each segment needs its own rw_ctx.
    
    svcrdma has been vastly underestimating the number of rw_ctxs needed
    to handle 64 RPC requests with large Read chunks using small
    rdma_segments.
    
    Unfortunately there doesn't seem to be a good way to estimate this
    number without knowing the client NIC's capabilities. Even then,
    the client RPC/RDMA implementation is still free to split a chunk
    into smaller segments (for example, it might be using physical
    registration, which needs an rdma_segment per page).
    
    The best we can do for now is choose a number that will guarantee
    forward progress in the worst case (one page per segment).
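    
    The arithmetic behind that worst case can be sketched as follows.
    This is a standalone illustration, not the code in this patch: the
    page size, payload size, request count, and the example_* names are
    all assumptions made for the sketch, and the real values come from
    the kernel headers and the transport's negotiated limits.

```c
#include <assert.h>

/* Hypothetical values, for illustration only. */
#define EXAMPLE_PAGE_SIZE   4096u            /* assumed page size */
#define EXAMPLE_MAX_PAYLOAD (1024u * 1024u)  /* assumed largest r/wsize */
#define EXAMPLE_MAX_REQS    64u              /* assumed requests in flight */

/*
 * Pages needed for the largest payload, plus one extra page standing in
 * for the "a little over one megabyte" mentioned above (an assumption).
 */
static unsigned int example_max_pages(void)
{
	return EXAMPLE_MAX_PAYLOAD / EXAMPLE_PAGE_SIZE + 1;
}

/*
 * rw_ctxs needed if every request carries a maximum-size Read chunk and
 * the client puts pages_per_segment pages in each rdma_segment.
 */
static unsigned int example_ctxts_needed(unsigned int pages_per_segment)
{
	unsigned int pages = example_max_pages();
	unsigned int segs;

	assert(pages_per_segment > 0);

	/* Round up: a partly filled segment still needs its own rw_ctx. */
	segs = (pages + pages_per_segment - 1) / pages_per_segment;
	return segs * EXAMPLE_MAX_REQS;
}
```

    Under these assumptions, example_ctxts_needed(1) models the worst
    case (one page per segment), while passing the full page count
    models the old assumption of one segment per chunk. The gap between
    the two results is the size of the underestimate described above.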
    
    At some later point, we could add some mechanisms to make this
    much less of a problem:
    - Add a core API to add more rw_ctxs to an already-established QP
    - svcrdma could treat rw_ctx exhaustion as a temporary error and
      try again
    - Limit the number of Reads in flight
    
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
svc_rdma_transport.c 19.3 KB