xprtrdma: Fix latency regression on NUMA NFS/RDMA clients

With v4.15, on one of my NFS/RDMA clients I measured a nearly doubling in the latency of small read and write system calls. There was no change in server round trip time. The extra latency appears in the whole RPC execution path. "git bisect" settled on commit ccede759 ("xprtrdma: Spread reply processing over more CPUs") . After some experimentation, I found that leaving the WQ bound and allowing the scheduler to pick the dispatch CPU seems to eliminate the long latencies, and it does not introduce any new regressions. The fix is implemented by reverting only the part of commit ccede759 ("xprtrdma: Spread reply processing over more CPUs") that dispatches RPC replies specifically on the CPU where the matching RPC call was made. Interestingly, saving the CPU number and later queuing reply processing there was effective _only_ for a NFS READ and WRITE request. On my NUMA client, in-kernel RPC reply processing for asynchronous RPCs was dispatched on the same CPU where the RPC call was made, as expected. However synchronous RPCs seem to get their reply dispatched on some other CPU than where the call was placed, every time. Fixes: ccede759 ("xprtrdma: Spread reply processing over ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Fix latency regression on NUMA NFS/RDMA clients
With v4.15, on one of my NFS/RDMA clients I measured a nearly doubling in the latency of small read and write system calls. There was no change in server round trip time. The extra latency appears in the whole RPC execution path. "git bisect" settled on commit ccede759 ("xprtrdma: Spread reply processing over more CPUs") . After some experimentation, I found that leaving the WQ bound and allowing the scheduler to pick the dispatch CPU seems to eliminate the long latencies, and it does not introduce any new regressions. The fix is implemented by reverting only the part of commit ccede759 ("xprtrdma: Spread reply processing over more CPUs") that dispatches RPC replies specifically on the CPU where the matching RPC call was made. Interestingly, saving the CPU number and later queuing reply processing there was effective _only_ for a NFS READ and WRITE request. On my NUMA client, in-kernel RPC reply processing for asynchronous RPCs was dispatched on the same CPU where the RPC call was made, as expected. However synchronous RPCs seem to get their reply dispatched on some other CPU than where the call was placed, every time. Fixes: ccede759 ("xprtrdma: Spread reply processing over ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
6720a899 · Chuck Lever · Anna Schumaker · 661e50bc · 6720a899 · 6720a899
Commit 6720a899 authored Feb 28, 2018 by Chuck Lever Committed by Anna Schumaker Apr 10, 2018
Showing with 1 addition and 4 deletions

net/sunrpc/xprtrdma/rpc_rdma.c net/sunrpc/xprtrdma/rpc_rdma.c +1 -1

net/sunrpc/xprtrdma/transport.c net/sunrpc/xprtrdma/transport.c +0 -2

net/sunrpc/xprtrdma/xprt_rdma.h net/sunrpc/xprtrdma/xprt_rdma.h +0 -1

No files found.
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -1366,7 +1366,7 @@ void rpcrdma_reply_handler(struct rpcrdma_rep *rep)

 	trace_xprtrdma_reply(rqst->rq_task, rep, req, credits);

-	queue_work_on(req->rl_cpu, rpcrdma_receive_wq, &rep->rr_work);
+	queue_work(rpcrdma_receive_wq, &rep->rr_work);
 	return;

 out_badstatus:

--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -52,7 +52,6 @@
 #include <linux/slab.h>
 #include <linux/seq_file.h>
 #include <linux/sunrpc/addr.h>
-#include <linux/smp.h>

 #include "xprt_rdma.h"

@@ -651,7 +650,6 @@ xprt_rdma_allocate(struct rpc_task *task)
 	if (!rpcrdma_get_recvbuf(r_xprt, req, rqst->rq_rcvsize, flags))
 		goto out_fail;

-	req->rl_cpu = smp_processor_id();
 	req->rl_connect_cookie = 0;	/* our reserved value */
 	rpcrdma_set_xprtdata(rqst, req);
 	rqst->rq_buffer = req->rl_sendbuf->rg_base;

--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -334,7 +334,6 @@ enum {
 struct rpcrdma_buffer;
 struct rpcrdma_req {
 	struct list_head	rl_list;
-	int			rl_cpu;
 	unsigned int		rl_connect_cookie;
 	struct rpcrdma_buffer	*rl_buffer;
 	struct rpcrdma_rep	*rl_reply;