Merge tag 'rxrpc-next-20221108' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

rxrpc changes David Howells says: ==================== rxrpc: Increasing SACK size and moving away from softirq, part 1 AF_RXRPC has some issues that need addressing: (1) The SACK table has a maximum capacity of 255, but for modern networks that isn't sufficient. This is hard to increase in the upstream code because of the way the application thread is coupled to the softirq and retransmission side through a ring buffer. Adjustments to the rx protocol allows a capacity of up to 8192, and having a ring sufficiently large to accommodate that would use an excessive amount of memory as this is per-call. (2) Processing ACKs in softirq mode causes the ACKs get conflated, with only the most recent being considered. Whilst this has the upside that the retransmission algorithm only needs to deal with the most recent ACK, it causes DATA transmission for a call to be very bursty because DATA packets cannot be transmitted in softirq mode. Rather transmission must be delegated to either the application thread or a workqueue, so there tend to be sudden bursts of traffic for any particular call due to scheduling delays. (3) All crypto in a single call is done in series; however, each DATA packet is individually encrypted so encryption and decryption of large calls could be parallelised if spare CPU resources are available. This is the first of a number of sets of patches that try and address them. The overall aims of these changes include: (1) To get rid of the TxRx ring and instead pass the packets round in queues (eg. sk_buff_head). On the Tx side, each ACK packet comes with a SACK table that can be parsed as-is, so there's no particular need to maintain our own; we just have to refer to the ACK. On the Rx side, we do need to maintain a SACK table with one bit per entry - but only if packets go missing - and we don't want to have to perform a complex transformation to get the information into an ACK packet. (2) To try and move almost all processing of received packets out of the softirq handler and into a high-priority kernel I/O thread. Only the transferral of packets would be left there. I would still use the encap_rcv hook to receive packets as there's a noticeable performance drop from letting the UDP socket put the packets into its own queue and then getting them out of there. (3) To make the I/O thread also do all the transmission. The app thread would be responsible for packaging the data into packets and then buffering them for the I/O thread to transmit. This would make it easier for the app thread to run ahead of the I/O thread, and would mean the I/O thread is less likely to have to wait around for a new packet to come available for transmission. (4) To logically partition the socket/UAPI/KAPI side of things from the I/O side of things. The local endpoint, connection, peer and call objects would belong to the I/O side. The socket side would not then touch the private internals of calls and suchlike and would not change their states. It would only look at the send queue, receive queue and a way to pass a message to cause an abort. (5) To remove as much locking, synchronisation, barriering and atomic ops as possible from the I/O side. Exclusion would be achieved by limiting modification of state to the I/O thread only. Locks would still need to be used in communication with the UDP socket and the AF_RXRPC socket API. (6) To provide crypto offload kernel threads that, when there's slack in the system, can see packets that need crypting and provide parallelisation in dealing with them. (7) To remove the use of system timers. Since each timer would then send a poke to the I/O thread, which would then deal with it when it had the opportunity, there seems no point in using system timers if, instead, a list of timeouts can be sensibly consulted. An I/O thread only then needs to schedule with a timeout when it is idle. (8) To use zero-copy sendmsg to send packets. This would make use of the I/O thread being the sole transmitter on the socket to manage the dead-reckoning sequencing of the completion notifications. There is a problem with zero-copy, though: the UDP socket doesn't handle running out of option memory very gracefully. With regard to this first patchset, the changes made include: (1) Some fixes, including a fallback for proc_create_net_single_write(), setting ack.bufferSize to 0 in ACK packets and a fix for rxrpc congestion management, which shouldn't be saving the cwnd value between calls. (2) Improvements in rxrpc tracepoints, including splitting the timer tracepoint into a set-timer and a timer-expired trace. (3) Addition of a new proc file to display some stats. (4) Some code cleanups, including removing some unused bits and unnecessary header inclusions. (5) A change to the recently added UDP encap_err_rcv hook so that it has the same signature as {ip,ipv6}_icmp_error(), and then just have rxrpc point its UDP socket's hook directly at those. (6) Definition of a new struct, rxrpc_txbuf, that is used to hold transmissible packets of DATA and ACK type in a single 2KiB block rather than using an sk_buff. This allows the buffer to be on a number of queues simultaneously more easily, and also guarantees that the entire block is in a single unit for zerocopy purposes and that the data payload is aligned for in-place crypto purposes. (7) ACK txbufs are allocated at proposal and queued for later transmission rather than being stored in a single place in the rxrpc_call struct, which means only a single ACK can be pending transmission at a time. The queue is then drained at various points. This allows the ACK generation code to be simplified. (8) The Rx ring buffer is removed. When a jumbo packet is received (which comprises a number of ordinary DATA packets glued together), it used to be pointed to by the ring multiple times, with an annotation in a side ring indicating which subpacket was in that slot - but this is no longer possible. Instead, the packet is cloned once for each subpacket, barring the last, and the range of data is set in the skb private area. This makes it easier for the subpackets in a jumbo packet to be decrypted in parallel. (9) The Tx ring buffer is removed. The side annotation ring that held the SACK information is also removed. Instead, in the event of packet loss, the SACK data attached an ACK packet is parsed. (10) Allocate an skcipher request when needed in the rxkad security class rather than caching one in the rxrpc_call struct. This deals with a race between externally-driven call disconnection getting rid of the skcipher request and sendmsg/recvmsg trying to use it because they haven't seen the completion yet. This is also needed to support parallelisation as the skcipher request cannot be used by two or more threads simultaneously. (11) Call udp_sendmsg() and udpv6_sendmsg() directly rather than going through kernel_sendmsg() so that we can provide our own iterator (zerocopy explicitly doesn't work with a KVEC iterator). This also lets us avoid the overhead of the security hook. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'rxrpc-next-20221108' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
rxrpc changes David Howells says: ==================== rxrpc: Increasing SACK size and moving away from softirq, part 1 AF_RXRPC has some issues that need addressing: (1) The SACK table has a maximum capacity of 255, but for modern networks that isn't sufficient. This is hard to increase in the upstream code because of the way the application thread is coupled to the softirq and retransmission side through a ring buffer. Adjustments to the rx protocol allows a capacity of up to 8192, and having a ring sufficiently large to accommodate that would use an excessive amount of memory as this is per-call. (2) Processing ACKs in softirq mode causes the ACKs get conflated, with only the most recent being considered. Whilst this has the upside that the retransmission algorithm only needs to deal with the most recent ACK, it causes DATA transmission for a call to be very bursty because DATA packets cannot be transmitted in softirq mode. Rather transmission must be delegated to either the application thread or a workqueue, so there tend to be sudden bursts of traffic for any particular call due to scheduling delays. (3) All crypto in a single call is done in series; however, each DATA packet is individually encrypted so encryption and decryption of large calls could be parallelised if spare CPU resources are available. This is the first of a number of sets of patches that try and address them. The overall aims of these changes include: (1) To get rid of the TxRx ring and instead pass the packets round in queues (eg. sk_buff_head). On the Tx side, each ACK packet comes with a SACK table that can be parsed as-is, so there's no particular need to maintain our own; we just have to refer to the ACK. On the Rx side, we do need to maintain a SACK table with one bit per entry - but only if packets go missing - and we don't want to have to perform a complex transformation to get the information into an ACK packet. (2) To try and move almost all processing of received packets out of the softirq handler and into a high-priority kernel I/O thread. Only the transferral of packets would be left there. I would still use the encap_rcv hook to receive packets as there's a noticeable performance drop from letting the UDP socket put the packets into its own queue and then getting them out of there. (3) To make the I/O thread also do all the transmission. The app thread would be responsible for packaging the data into packets and then buffering them for the I/O thread to transmit. This would make it easier for the app thread to run ahead of the I/O thread, and would mean the I/O thread is less likely to have to wait around for a new packet to come available for transmission. (4) To logically partition the socket/UAPI/KAPI side of things from the I/O side of things. The local endpoint, connection, peer and call objects would belong to the I/O side. The socket side would not then touch the private internals of calls and suchlike and would not change their states. It would only look at the send queue, receive queue and a way to pass a message to cause an abort. (5) To remove as much locking, synchronisation, barriering and atomic ops as possible from the I/O side. Exclusion would be achieved by limiting modification of state to the I/O thread only. Locks would still need to be used in communication with the UDP socket and the AF_RXRPC socket API. (6) To provide crypto offload kernel threads that, when there's slack in the system, can see packets that need crypting and provide parallelisation in dealing with them. (7) To remove the use of system timers. Since each timer would then send a poke to the I/O thread, which would then deal with it when it had the opportunity, there seems no point in using system timers if, instead, a list of timeouts can be sensibly consulted. An I/O thread only then needs to schedule with a timeout when it is idle. (8) To use zero-copy sendmsg to send packets. This would make use of the I/O thread being the sole transmitter on the socket to manage the dead-reckoning sequencing of the completion notifications. There is a problem with zero-copy, though: the UDP socket doesn't handle running out of option memory very gracefully. With regard to this first patchset, the changes made include: (1) Some fixes, including a fallback for proc_create_net_single_write(), setting ack.bufferSize to 0 in ACK packets and a fix for rxrpc congestion management, which shouldn't be saving the cwnd value between calls. (2) Improvements in rxrpc tracepoints, including splitting the timer tracepoint into a set-timer and a timer-expired trace. (3) Addition of a new proc file to display some stats. (4) Some code cleanups, including removing some unused bits and unnecessary header inclusions. (5) A change to the recently added UDP encap_err_rcv hook so that it has the same signature as {ip,ipv6}_icmp_error(), and then just have rxrpc point its UDP socket's hook directly at those. (6) Definition of a new struct, rxrpc_txbuf, that is used to hold transmissible packets of DATA and ACK type in a single 2KiB block rather than using an sk_buff. This allows the buffer to be on a number of queues simultaneously more easily, and also guarantees that the entire block is in a single unit for zerocopy purposes and that the data payload is aligned for in-place crypto purposes. (7) ACK txbufs are allocated at proposal and queued for later transmission rather than being stored in a single place in the rxrpc_call struct, which means only a single ACK can be pending transmission at a time. The queue is then drained at various points. This allows the ACK generation code to be simplified. (8) The Rx ring buffer is removed. When a jumbo packet is received (which comprises a number of ordinary DATA packets glued together), it used to be pointed to by the ring multiple times, with an annotation in a side ring indicating which subpacket was in that slot - but this is no longer possible. Instead, the packet is cloned once for each subpacket, barring the last, and the range of data is set in the skb private area. This makes it easier for the subpackets in a jumbo packet to be decrypted in parallel. (9) The Tx ring buffer is removed. The side annotation ring that held the SACK information is also removed. Instead, in the event of packet loss, the SACK data attached an ACK packet is parsed. (10) Allocate an skcipher request when needed in the rxkad security class rather than caching one in the rxrpc_call struct. This deals with a race between externally-driven call disconnection getting rid of the skcipher request and sendmsg/recvmsg trying to use it because they haven't seen the completion yet. This is also needed to support parallelisation as the skcipher request cannot be used by two or more threads simultaneously. (11) Call udp_sendmsg() and udpv6_sendmsg() directly rather than going through kernel_sendmsg() so that we can provide our own iterator (zerocopy explicitly doesn't work with a KVEC iterator). This also lets us avoid the overhead of the security hook. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
3ca6c3b4 · David S. Miller · bd039b5e · 30d95efe · 3ca6c3b4 · 3ca6c3b4
Commit 3ca6c3b4 authored Nov 09, 2022 by David S. Miller
33 changed files
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -208,8 +208,10 @@ static inline void proc_remove(struct proc_dir_entry *de) {}
 static inline int remove_proc_subtree(const char *name, struct proc_dir_entry *parent) { return 0; }

 #define proc_create_net_data(name, mode, parent, ops, state_size, data) ({NULL;})
+#define proc_create_net_data_write(name, mode, parent, ops, write, state_size, data) ({NULL;})
 #define proc_create_net(name, mode, parent, state_size, ops) ({NULL;})
 #define proc_create_net_single(name, mode, parent, show, data) ({NULL;})
+#define proc_create_net_single_write(name, mode, parent, show, write, data) ({NULL;})

 static inline struct pid *tgid_pidfd_to_pid(const struct file *file)
 {

--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -70,7 +70,8 @@ struct udp_sock {
 	 * For encapsulation sockets.
 	 */
 	int (*encap_rcv)(struct sock *sk, struct sk_buff *skb);
-	void (*encap_err_rcv)(struct sock *sk, struct sk_buff *skb, unsigned int udp_offset);
+	void (*encap_err_rcv)(struct sock *sk, struct sk_buff *skb, int err,
+			      __be16 port, u32 info, u8 *payload);
 	int (*encap_err_lookup)(struct sock *sk, struct sk_buff *skb);
 	void (*encap_destroy)(struct sock *sk);


--- a/include/net/udp_tunnel.h
+++ b/include/net/udp_tunnel.h
@@ -68,8 +68,8 @@ typedef int (*udp_tunnel_encap_rcv_t)(struct sock *sk, struct sk_buff *skb);
 typedef int (*udp_tunnel_encap_err_lookup_t)(struct sock *sk,
 					     struct sk_buff *skb);
 typedef void (*udp_tunnel_encap_err_rcv_t)(struct sock *sk,
-					   struct sk_buff *skb,
-					   unsigned int udp_offset);
+					   struct sk_buff *skb, int err,
+					   __be16 port, u32 info, u8 *payload);
 typedef void (*udp_tunnel_encap_destroy_t)(struct sock *sk);
 typedef struct sk_buff *(*udp_tunnel_gro_receive_t)(struct sock *sk,
 						    struct list_head *head,

--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -6435,6 +6435,7 @@ void skb_condense(struct sk_buff *skb)
 	 */
 	skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
 }
+EXPORT_SYMBOL(skb_condense);

 #ifdef CONFIG_SKB_EXTENSIONS
 static void *skb_ext_get_ptr(struct skb_ext *ext, enum skb_ext_id id)

--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -433,6 +433,7 @@ void ip_icmp_error(struct sock *sk, struct sk_buff *skb, int err,
 	}
 	kfree_skb(skb);
 }
+EXPORT_SYMBOL_GPL(ip_icmp_error);

 void ip_local_error(struct sock *sk, int err, __be32 daddr, __be16 port, u32 info)
 {

--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -784,7 +784,8 @@ int __udp4_lib_err(struct sk_buff *skb, u32 info, struct udp_table *udptable)
 	if (tunnel) {
 		/* ...not for tunnels though: we don't have a sending socket */
 		if (udp_sk(sk)->encap_err_rcv)
-			udp_sk(sk)->encap_err_rcv(sk, skb, iph->ihl << 2);
+			udp_sk(sk)->encap_err_rcv(sk, skb, err, uh->dest, info,
+						  (u8 *)(uh+1));
 		goto out;
 	}
 	if (!inet->recverr) {

--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -334,6 +334,7 @@ void ipv6_icmp_error(struct sock *sk, struct sk_buff *skb, int err,
 	if (sock_queue_err_skb(sk, skb))
 		kfree_skb(skb);
 }
+EXPORT_SYMBOL_GPL(ipv6_icmp_error);

 void ipv6_local_error(struct sock *sk, int err, struct flowi6 *fl6, u32 info)
 {

--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -632,7 +632,8 @@ int __udp6_lib_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	/* Tunnels don't have an application socket: don't pass errors back */
 	if (tunnel) {
 		if (udp_sk(sk)->encap_err_rcv)
-			udp_sk(sk)->encap_err_rcv(sk, skb, offset);
+			udp_sk(sk)->encap_err_rcv(sk, skb, err, uh->dest,
+						  ntohl(info), (u8 *)(uh+1));
 		goto out;
 	}

@@ -1639,6 +1640,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	err = 0;
 	goto out;
 }
+EXPORT_SYMBOL(udpv6_sendmsg);

 void udpv6_destroy_sock(struct sock *sk)
 {

--- a/net/rxrpc/Makefile
+++ b/net/rxrpc/Makefile
@@ -30,6 +30,7 @@ rxrpc-y := \
 	sendmsg.o \
 	server_key.o \
 	skbuff.o \
+	txbuf.o \
 	utils.o

 rxrpc-$(CONFIG_PROC_FS) += proc.o

--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -39,7 +39,7 @@ atomic_t rxrpc_debug_id;
 EXPORT_SYMBOL(rxrpc_debug_id);

 /* count of skbs currently in use */
-atomic_t rxrpc_n_tx_skbs, rxrpc_n_rx_skbs;
+atomic_t rxrpc_n_rx_skbs;

 struct workqueue_struct *rxrpc_workqueue;

@@ -979,7 +979,7 @@ static int __init af_rxrpc_init(void)
 		goto error_call_jar;
 	}

-	rxrpc_workqueue = alloc_workqueue("krxrpcd", 0, 1);
+	rxrpc_workqueue = alloc_workqueue("krxrpcd", WQ_HIGHPRI | WQ_MEM_RECLAIM | WQ_UNBOUND, 1);
 	if (!rxrpc_workqueue) {
 		pr_notice("Failed to allocate work queue\n");
 		goto error_work_queue;
@@ -1059,7 +1059,6 @@ static void __exit af_rxrpc_exit(void)
 	sock_unregister(PF_RXRPC);
 	proto_unregister(&rxrpc_proto);
 	unregister_pernet_device(&rxrpc_net_ops);
-	ASSERTCMP(atomic_read(&rxrpc_n_tx_skbs), ==, 0);
 	ASSERTCMP(atomic_read(&rxrpc_n_rx_skbs), ==, 0);

 	/* Make sure the local and peer records pinned by any dying connections

--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
--- a/net/rxrpc/call_accept.c
+++ b/net/rxrpc/call_accept.c
@@ -248,8 +248,7 @@ static void rxrpc_send_ping(struct rxrpc_call *call, struct sk_buff *skb)

 	if (call->peer->rtt_count < 3 ||
 	    ktime_before(ktime_add_ms(call->peer->rtt_last_req, 1000), now))
-		rxrpc_propose_ACK(call, RXRPC_ACK_PING, sp->hdr.serial,
-				  true, true,
+		rxrpc_send_ACK(call, RXRPC_ACK_PING, sp->hdr.serial,
 			       rxrpc_propose_ack_ping_for_params);
 }

@@ -325,7 +324,8 @@ static struct rxrpc_call *rxrpc_alloc_incoming_call(struct rxrpc_sock *rx,
 	call->security = conn->security;
 	call->security_ix = conn->security_ix;
 	call->peer = rxrpc_get_peer(conn->params.peer);
-	call->cong_cwnd = call->peer->cong_cwnd;
+	call->cong_ssthresh = call->peer->cong_ssthresh;
+	call->tx_last_sent = ktime_get_real();
 	return call;
 }


--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -52,7 +52,7 @@ static void rxrpc_call_timer_expired(struct timer_list *t)
 	_enter("%d", call->debug_id);

 	if (call->state < RXRPC_CALL_COMPLETE) {
-		trace_rxrpc_timer(call, rxrpc_timer_expired, jiffies);
+		trace_rxrpc_timer_expired(call, jiffies);
 		__rxrpc_queue_call(call);
 	} else {
 		rxrpc_put_call(call, rxrpc_call_put);
@@ -129,16 +129,6 @@ struct rxrpc_call *rxrpc_alloc_call(struct rxrpc_sock *rx, gfp_t gfp,
 	if (!call)
 		return NULL;

-	call->rxtx_buffer = kcalloc(RXRPC_RXTX_BUFF_SIZE,
-				    sizeof(struct sk_buff *),
-				    gfp);
-	if (!call->rxtx_buffer)
-		goto nomem;
-
-	call->rxtx_annotations = kcalloc(RXRPC_RXTX_BUFF_SIZE, sizeof(u8), gfp);
-	if (!call->rxtx_annotations)
-		goto nomem_2;
-
 	mutex_init(&call->user_mutex);

 	/* Prevent lockdep reporting a deadlock false positive between the afs
@@ -155,37 +145,39 @@ struct rxrpc_call *rxrpc_alloc_call(struct rxrpc_sock *rx, gfp_t gfp,
 	INIT_LIST_HEAD(&call->accept_link);
 	INIT_LIST_HEAD(&call->recvmsg_link);
 	INIT_LIST_HEAD(&call->sock_link);
+	INIT_LIST_HEAD(&call->tx_buffer);
+	skb_queue_head_init(&call->recvmsg_queue);
+	skb_queue_head_init(&call->rx_oos_queue);
 	init_waitqueue_head(&call->waitq);
-	spin_lock_init(&call->lock);
 	spin_lock_init(&call->notify_lock);
+	spin_lock_init(&call->tx_lock);
 	spin_lock_init(&call->input_lock);
+	spin_lock_init(&call->acks_ack_lock);
 	rwlock_init(&call->state_lock);
 	refcount_set(&call->ref, 1);
 	call->debug_id = debug_id;
 	call->tx_total_len = -1;
 	call->next_rx_timo = 20 * HZ;
 	call->next_req_timo = 1 * HZ;
+	atomic64_set(&call->ackr_window, 0x100000001ULL);

 	memset(&call->sock_node, 0xed, sizeof(call->sock_node));

-	/* Leave space in the ring to handle a maxed-out jumbo packet */
 	call->rx_winsize = rxrpc_rx_window_size;
 	call->tx_winsize = 16;
-	call->rx_expect_next = 1;

+	if (RXRPC_TX_SMSS > 2190)
 		call->cong_cwnd = 2;
-	call->cong_ssthresh = RXRPC_RXTX_BUFF_SIZE - 1;
+	else if (RXRPC_TX_SMSS > 1095)
+		call->cong_cwnd = 3;
+	else
+		call->cong_cwnd = 4;
+	call->cong_ssthresh = RXRPC_TX_MAX_WINDOW;

 	call->rxnet = rxnet;
 	call->rtt_avail = RXRPC_CALL_RTT_AVAIL_MASK;
 	atomic_inc(&rxnet->nr_calls);
 	return call;
-
-nomem_2:
-	kfree(call->rxtx_buffer);
-nomem:
-	kmem_cache_free(rxrpc_call_jar, call);
-	return NULL;
 }

 /*
@@ -206,7 +198,6 @@ static struct rxrpc_call *rxrpc_alloc_client_call(struct rxrpc_sock *rx,
 		return ERR_PTR(-ENOMEM);
 	call->state = RXRPC_CALL_CLIENT_AWAIT_CONN;
 	call->service_id = srx->srx_service;
-	call->tx_phase = true;
 	now = ktime_get_real();
 	call->acks_latest_ts = now;
 	call->cong_tstamp = now;
@@ -223,7 +214,7 @@ static void rxrpc_start_call_timer(struct rxrpc_call *call)
 	unsigned long now = jiffies;
 	unsigned long j = now + MAX_JIFFY_OFFSET;

-	call->ack_at = j;
+	call->delay_ack_at = j;
 	call->ack_lost_at = j;
 	call->resend_at = j;
 	call->ping_at = j;
@@ -510,16 +501,12 @@ void rxrpc_get_call(struct rxrpc_call *call, enum rxrpc_call_trace op)
 }

 /*
- * Clean up the RxTx skb ring.
+ * Clean up the Rx skb ring.
 */
 static void rxrpc_cleanup_ring(struct rxrpc_call *call)
 {
-	int i;
-
-	for (i = 0; i < RXRPC_RXTX_BUFF_SIZE; i++) {
-		rxrpc_free_skb(call->rxtx_buffer[i], rxrpc_skb_cleaned);
-		call->rxtx_buffer[i] = NULL;
-	}
+	skb_queue_purge(&call->recvmsg_queue);
+	skb_queue_purge(&call->rx_oos_queue);
 }

 /*
@@ -539,10 +526,8 @@ void rxrpc_release_call(struct rxrpc_sock *rx, struct rxrpc_call *call)

 	ASSERTCMP(call->state, ==, RXRPC_CALL_COMPLETE);

-	spin_lock_bh(&call->lock);
 	if (test_and_set_bit(RXRPC_CALL_RELEASED, &call->flags))
 		BUG();
-	spin_unlock_bh(&call->lock);

 	rxrpc_put_call_slot(call);
 	rxrpc_delete_call_timer(call);
@@ -656,8 +641,6 @@ static void rxrpc_destroy_call(struct work_struct *work)

 	rxrpc_put_connection(call->conn);
 	rxrpc_put_peer(call->peer);
-	kfree(call->rxtx_buffer);
-	kfree(call->rxtx_annotations);
 	kmem_cache_free(rxrpc_call_jar, call);
 	if (atomic_dec_and_test(&rxnet->nr_calls))
 		wake_up_var(&rxnet->nr_calls);
@@ -684,6 +667,8 @@ static void rxrpc_rcu_destroy_call(struct rcu_head *rcu)
 */
 void rxrpc_cleanup_call(struct rxrpc_call *call)
 {
+	struct rxrpc_txbuf *txb;
+
 	_net("DESTROY CALL %d", call->debug_id);

 	memset(&call->sock_node, 0xcd, sizeof(call->sock_node));
@@ -692,7 +677,13 @@ void rxrpc_cleanup_call(struct rxrpc_call *call)
 	ASSERT(test_bit(RXRPC_CALL_RELEASED, &call->flags));

 	rxrpc_cleanup_ring(call);
-	rxrpc_free_skb(call->tx_pending, rxrpc_skb_cleaned);
+	while ((txb = list_first_entry_or_null(&call->tx_buffer,
+					       struct rxrpc_txbuf, call_link))) {
+		list_del(&txb->call_link);
+		rxrpc_put_txbuf(txb, rxrpc_txbuf_put_cleaned);
+	}
+	rxrpc_put_txbuf(call->tx_pending, rxrpc_txbuf_put_cleaned);
+	rxrpc_free_skb(call->acks_soft_tbl, rxrpc_skb_cleaned);

 	call_rcu(&call->rcu, rxrpc_rcu_destroy_call);
 }

--- a/net/rxrpc/conn_client.c
+++ b/net/rxrpc/conn_client.c
@@ -363,7 +363,8 @@ static struct rxrpc_bundle *rxrpc_prep_call(struct rxrpc_sock *rx,
 	if (!cp->peer)
 		goto error;

-	call->cong_cwnd = cp->peer->cong_cwnd;
+	call->tx_last_sent = ktime_get_real();
+	call->cong_ssthresh = cp->peer->cong_ssthresh;
 	if (call->cong_cwnd >= call->cong_ssthresh)
 		call->cong_mode = RXRPC_CALL_CONGEST_AVOIDANCE;
 	else

--- a/net/rxrpc/conn_object.c
+++ b/net/rxrpc/conn_object.c
@@ -175,7 +175,7 @@ void __rxrpc_disconnect_call(struct rxrpc_connection *conn,
 		trace_rxrpc_disconnect_call(call);
 		switch (call->completion) {
 		case RXRPC_CALL_SUCCEEDED:
-			chan->last_seq = call->rx_hard_ack;
+			chan->last_seq = call->rx_highest_seq;
 			chan->last_type = RXRPC_PACKET_TYPE_ACK;
 			break;
 		case RXRPC_CALL_LOCALLY_ABORTED:
@@ -207,7 +207,7 @@ void rxrpc_disconnect_call(struct rxrpc_call *call)
 {
 	struct rxrpc_connection *conn = call->conn;

-	call->peer->cong_cwnd = call->cong_cwnd;
+	call->peer->cong_ssthresh = call->cong_ssthresh;

 	if (!hlist_unhashed(&call->error_link)) {
 		spin_lock_bh(&call->peer->lock);

--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
--- a/net/rxrpc/insecure.c
+++ b/net/rxrpc/insecure.c
@@ -25,16 +25,16 @@ static int none_how_much_data(struct rxrpc_call *call, size_t remain,
 	return 0;
 }

-static int none_secure_packet(struct rxrpc_call *call, struct sk_buff *skb,
-			      size_t data_size)
+static int none_secure_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb)
 {
 	return 0;
 }

-static int none_verify_packet(struct rxrpc_call *call, struct sk_buff *skb,
-			      unsigned int offset, unsigned int len,
-			      rxrpc_seq_t seq, u16 expected_cksum)
+static int none_verify_packet(struct rxrpc_call *call, struct sk_buff *skb)
 {
+	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
+
+	sp->flags |= RXRPC_RX_VERIFIED;
 	return 0;
 }

@@ -42,11 +42,6 @@ static void none_free_call_crypto(struct rxrpc_call *call)
 {
 }

-static void none_locate_data(struct rxrpc_call *call, struct sk_buff *skb,
-			     unsigned int *_offset, unsigned int *_len)
-{
-}
-
 static int none_respond_to_challenge(struct rxrpc_connection *conn,
 				     struct sk_buff *skb,
 				     u32 *_abort_code)
@@ -95,7 +90,6 @@ const struct rxrpc_security rxrpc_no_security = {
 	.how_much_data			= none_how_much_data,
 	.secure_packet			= none_secure_packet,
 	.verify_packet			= none_verify_packet,
-	.locate_data			= none_locate_data,
 	.respond_to_challenge		= none_respond_to_challenge,
 	.verify_response		= none_verify_response,
 	.clear				= none_clear,

--- a/net/rxrpc/local_object.c
+++ b/net/rxrpc/local_object.c
@@ -23,6 +23,19 @@
 static void rxrpc_local_processor(struct work_struct *);
 static void rxrpc_local_rcu(struct rcu_head *);

+/*
+ * Handle an ICMP/ICMP6 error turning up at the tunnel.  Push it through the
+ * usual mechanism so that it gets parsed and presented through the UDP
+ * socket's error_report().
+ */
+static void rxrpc_encap_err_rcv(struct sock *sk, struct sk_buff *skb, int err,
+				__be16 port, u32 info, u8 *payload)
+{
+	if (ip_hdr(skb)->version == IPVERSION)
+		return ip_icmp_error(sk, skb, err, port, info, payload);
+	return ipv6_icmp_error(sk, skb, err, port, info, payload);
+}
+
 /*
 * Compare a local to an address.  Return -ve, 0 or +ve to indicate less than,
 * same or greater than.
@@ -84,6 +97,8 @@ static struct rxrpc_local *rxrpc_alloc_local(struct rxrpc_net *rxnet,
 		local->rxnet = rxnet;
 		INIT_HLIST_NODE(&local->link);
 		INIT_WORK(&local->processor, rxrpc_local_processor);
+		INIT_LIST_HEAD(&local->ack_tx_queue);
+		spin_lock_init(&local->ack_tx_lock);
 		init_rwsem(&local->defrag_sem);
 		skb_queue_head_init(&local->reject_queue);
 		skb_queue_head_init(&local->event_queue);
@@ -419,6 +434,11 @@ static void rxrpc_local_processor(struct work_struct *work)
 			break;
 		}

+		if (!list_empty(&local->ack_tx_queue)) {
+			rxrpc_transmit_ack_packets(local);
+			again = true;
+		}
+
 		if (!skb_queue_empty(&local->reject_queue)) {
 			rxrpc_reject_packets(local);
 			again = true;

--- a/net/rxrpc/misc.c
+++ b/net/rxrpc/misc.c
@@ -16,12 +16,6 @@
 */
 unsigned int rxrpc_max_backlog __read_mostly = 10;

-/*
- * How long to wait before scheduling ACK generation after seeing a
- * packet with RXRPC_REQUEST_ACK set (in jiffies).
- */
-unsigned long rxrpc_requested_ack_delay = 1;
-
 /*
 * How long to wait before scheduling an ACK with subtype DELAY (in jiffies).
 *
@@ -46,10 +40,7 @@ unsigned long rxrpc_idle_ack_delay = HZ / 2;
 * limit is hit, we should generate an EXCEEDS_WINDOW ACK and discard further
 * packets.
 */
-unsigned int rxrpc_rx_window_size = RXRPC_INIT_RX_WINDOW_SIZE;
-#if (RXRPC_RXTX_BUFF_SIZE - 1) < RXRPC_INIT_RX_WINDOW_SIZE
-#error Need to reduce RXRPC_INIT_RX_WINDOW_SIZE
-#endif
+unsigned int rxrpc_rx_window_size = 255;

 /*
 * Maximum Rx MTU size.  This indicates to the sender the size of jumbo packet
@@ -62,15 +53,3 @@ unsigned int rxrpc_rx_mtu = 5692;
 * sender that we're willing to handle.
 */
 unsigned int rxrpc_rx_jumbo_max = 4;
-
-const s8 rxrpc_ack_priority[] = {
-	[0]				= 0,
-	[RXRPC_ACK_DELAY]		= 1,
-	[RXRPC_ACK_REQUESTED]		= 2,
-	[RXRPC_ACK_IDLE]		= 3,
-	[RXRPC_ACK_DUPLICATE]		= 4,
-	[RXRPC_ACK_OUT_OF_SEQUENCE]	= 5,
-	[RXRPC_ACK_EXCEEDS_WINDOW]	= 6,
-	[RXRPC_ACK_NOSPACE]		= 7,
-	[RXRPC_ACK_PING_RESPONSE]	= 8,
-};
--- a/net/rxrpc/net_ns.c
+++ b/net/rxrpc/net_ns.c
@@ -101,6 +101,8 @@ static __net_init int rxrpc_init_net(struct net *net)
 	proc_create_net("locals", 0444, rxnet->proc_net,
 			&rxrpc_local_seq_ops,
 			sizeof(struct seq_net_private));
+	proc_create_net_single_write("stats", S_IFREG | 0644, rxnet->proc_net,
+				     rxrpc_stats_show, rxrpc_stats_clear, NULL);
 	return 0;

 err_proc:

--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
--- a/net/rxrpc/peer_event.c
+++ b/net/rxrpc/peer_event.c
@@ -16,257 +16,12 @@
 #include <net/sock.h>
 #include <net/af_rxrpc.h>
 #include <net/ip.h>
-#include <net/icmp.h>
 #include "ar-internal.h"

-static void rxrpc_adjust_mtu(struct rxrpc_peer *, unsigned int);
 static void rxrpc_store_error(struct rxrpc_peer *, struct sock_exterr_skb *);
 static void rxrpc_distribute_error(struct rxrpc_peer *, int,
 				   enum rxrpc_call_completion);

-/*
- * Find the peer associated with an ICMPv4 packet.
- */
-static struct rxrpc_peer *rxrpc_lookup_peer_icmp_rcu(struct rxrpc_local *local,
-						     struct sk_buff *skb,
-						     unsigned int udp_offset,
-						     unsigned int *info,
-						     struct sockaddr_rxrpc *srx)
-{
-	struct iphdr *ip, *ip0 = ip_hdr(skb);
-	struct icmphdr *icmp = icmp_hdr(skb);
-	struct udphdr *udp = (struct udphdr *)(skb->data + udp_offset);
-
-	_enter("%u,%u,%u", ip0->protocol, icmp->type, icmp->code);
-
-	switch (icmp->type) {
-	case ICMP_DEST_UNREACH:
-		*info = ntohs(icmp->un.frag.mtu);
-		fallthrough;
-	case ICMP_TIME_EXCEEDED:
-	case ICMP_PARAMETERPROB:
-		ip = (struct iphdr *)((void *)icmp + 8);
-		break;
-	default:
-		return NULL;
-	}
-
-	memset(srx, 0, sizeof(*srx));
-	srx->transport_type = local->srx.transport_type;
-	srx->transport_len = local->srx.transport_len;
-	srx->transport.family = local->srx.transport.family;
-
-	/* Can we see an ICMP4 packet on an ICMP6 listening socket?  and vice
-	 * versa?
-	 */
-	switch (srx->transport.family) {
-	case AF_INET:
-		srx->transport_len = sizeof(srx->transport.sin);
-		srx->transport.family = AF_INET;
-		srx->transport.sin.sin_port = udp->dest;
-		memcpy(&srx->transport.sin.sin_addr, &ip->daddr,
-		       sizeof(struct in_addr));
-		break;
-
-#ifdef CONFIG_AF_RXRPC_IPV6
-	case AF_INET6:
-		srx->transport_len = sizeof(srx->transport.sin);
-		srx->transport.family = AF_INET;
-		srx->transport.sin.sin_port = udp->dest;
-		memcpy(&srx->transport.sin.sin_addr, &ip->daddr,
-		       sizeof(struct in_addr));
-		break;
-#endif
-
-	default:
-		WARN_ON_ONCE(1);
-		return NULL;
-	}
-
-	_net("ICMP {%pISp}", &srx->transport);
-	return rxrpc_lookup_peer_rcu(local, srx);
-}
-
-#ifdef CONFIG_AF_RXRPC_IPV6
-/*
- * Find the peer associated with an ICMPv6 packet.
- */
-static struct rxrpc_peer *rxrpc_lookup_peer_icmp6_rcu(struct rxrpc_local *local,
-						      struct sk_buff *skb,
-						      unsigned int udp_offset,
-						      unsigned int *info,
-						      struct sockaddr_rxrpc *srx)
-{
-	struct icmp6hdr *icmp = icmp6_hdr(skb);
-	struct ipv6hdr *ip, *ip0 = ipv6_hdr(skb);
-	struct udphdr *udp = (struct udphdr *)(skb->data + udp_offset);
-
-	_enter("%u,%u,%u", ip0->nexthdr, icmp->icmp6_type, icmp->icmp6_code);
-
-	switch (icmp->icmp6_type) {
-	case ICMPV6_DEST_UNREACH:
-		*info = ntohl(icmp->icmp6_mtu);
-		fallthrough;
-	case ICMPV6_PKT_TOOBIG:
-	case ICMPV6_TIME_EXCEED:
-	case ICMPV6_PARAMPROB:
-		ip = (struct ipv6hdr *)((void *)icmp + 8);
-		break;
-	default:
-		return NULL;
-	}
-
-	memset(srx, 0, sizeof(*srx));
-	srx->transport_type = local->srx.transport_type;
-	srx->transport_len = local->srx.transport_len;
-	srx->transport.family = local->srx.transport.family;
-
-	/* Can we see an ICMP4 packet on an ICMP6 listening socket?  and vice
-	 * versa?
-	 */
-	switch (srx->transport.family) {
-	case AF_INET:
-		_net("Rx ICMP6 on v4 sock");
-		srx->transport_len = sizeof(srx->transport.sin);
-		srx->transport.family = AF_INET;
-		srx->transport.sin.sin_port = udp->dest;
-		memcpy(&srx->transport.sin.sin_addr,
-		       &ip->daddr.s6_addr32[3], sizeof(struct in_addr));
-		break;
-	case AF_INET6:
-		_net("Rx ICMP6");
-		srx->transport.sin.sin_port = udp->dest;
-		memcpy(&srx->transport.sin6.sin6_addr, &ip->daddr,
-		       sizeof(struct in6_addr));
-		break;
-	default:
-		WARN_ON_ONCE(1);
-		return NULL;
-	}
-
-	_net("ICMP {%pISp}", &srx->transport);
-	return rxrpc_lookup_peer_rcu(local, srx);
-}
-#endif /* CONFIG_AF_RXRPC_IPV6 */
-
-/*
- * Handle an error received on the local endpoint as a tunnel.
- */
-void rxrpc_encap_err_rcv(struct sock *sk, struct sk_buff *skb,
-			 unsigned int udp_offset)
-{
-	struct sock_extended_err ee;
-	struct sockaddr_rxrpc srx;
-	struct rxrpc_local *local;
-	struct rxrpc_peer *peer;
-	unsigned int info = 0;
-	int err;
-	u8 version = ip_hdr(skb)->version;
-	u8 type = icmp_hdr(skb)->type;
-	u8 code = icmp_hdr(skb)->code;
-
-	rcu_read_lock();
-	local = rcu_dereference_sk_user_data(sk);
-	if (unlikely(!local)) {
-		rcu_read_unlock();
-		return;
-	}
-
-	rxrpc_new_skb(skb, rxrpc_skb_received);
-
-	switch (ip_hdr(skb)->version) {
-	case IPVERSION:
-		peer = rxrpc_lookup_peer_icmp_rcu(local, skb, udp_offset,
-						  &info, &srx);
-		break;
-#ifdef CONFIG_AF_RXRPC_IPV6
-	case 6:
-		peer = rxrpc_lookup_peer_icmp6_rcu(local, skb, udp_offset,
-						   &info, &srx);
-		break;
-#endif
-	default:
-		rcu_read_unlock();
-		return;
-	}
-
-	if (peer && !rxrpc_get_peer_maybe(peer))
-		peer = NULL;
-	if (!peer) {
-		rcu_read_unlock();
-		return;
-	}
-
-	memset(&ee, 0, sizeof(ee));
-
-	switch (version) {
-	case IPVERSION:
-		switch (type) {
-		case ICMP_DEST_UNREACH:
-			switch (code) {
-			case ICMP_FRAG_NEEDED:
-				rxrpc_adjust_mtu(peer, info);
-				rcu_read_unlock();
-				rxrpc_put_peer(peer);
-				return;
-			default:
-				break;
-			}
-
-			err = EHOSTUNREACH;
-			if (code <= NR_ICMP_UNREACH) {
-				/* Might want to do something different with
-				 * non-fatal errors
-				 */
-				//harderr = icmp_err_convert[code].fatal;
-				err = icmp_err_convert[code].errno;
-			}
-			break;
-
-		case ICMP_TIME_EXCEEDED:
-			err = EHOSTUNREACH;
-			break;
-		default:
-			err = EPROTO;
-			break;
-		}
-
-		ee.ee_origin = SO_EE_ORIGIN_ICMP;
-		ee.ee_type = type;
-		ee.ee_code = code;
-		ee.ee_errno = err;
-		break;
-
-#ifdef CONFIG_AF_RXRPC_IPV6
-	case 6:
-		switch (type) {
-		case ICMPV6_PKT_TOOBIG:
-			rxrpc_adjust_mtu(peer, info);
-			rcu_read_unlock();
-			rxrpc_put_peer(peer);
-			return;
-		}
-
-		icmpv6_err_convert(type, code, &err);
-
-		if (err == EACCES)
-			err = EHOSTUNREACH;
-
-		ee.ee_origin = SO_EE_ORIGIN_ICMP6;
-		ee.ee_type = type;
-		ee.ee_code = code;
-		ee.ee_errno = err;
-		break;
-#endif
-	}
-
-	trace_rxrpc_rx_icmp(peer, &ee, &srx);
-
-	rxrpc_distribute_error(peer, err, RXRPC_CALL_NETWORK_ERROR);
-	rcu_read_unlock();
-	rxrpc_put_peer(peer);
-}
-
 /*
 * Find the peer associated with a local error.
 */
@@ -283,6 +38,9 @@ static struct rxrpc_peer *rxrpc_lookup_peer_local_rcu(struct rxrpc_local *local,
 	srx->transport_len = local->srx.transport_len;
 	srx->transport.family = local->srx.transport.family;

+	/* Can we see an ICMP4 packet on an ICMP6 listening socket?  and vice
+	 * versa?
+	 */
 	switch (srx->transport.family) {
 	case AF_INET:
 		srx->transport_len = sizeof(srx->transport.sin);
@@ -412,20 +170,38 @@ void rxrpc_error_report(struct sock *sk)
 	}
 	rxrpc_new_skb(skb, rxrpc_skb_received);
 	serr = SKB_EXT_ERR(skb);
+	if (!skb->len && serr->ee.ee_origin == SO_EE_ORIGIN_TIMESTAMPING) {
+		_leave("UDP empty message");
+		rcu_read_unlock();
+		rxrpc_free_skb(skb, rxrpc_skb_freed);
+		return;
+	}

-	if (serr->ee.ee_origin == SO_EE_ORIGIN_LOCAL) {
 	peer = rxrpc_lookup_peer_local_rcu(local, skb, &srx);
 	if (peer && !rxrpc_get_peer_maybe(peer))
 		peer = NULL;
-		if (peer) {
-			trace_rxrpc_rx_icmp(peer, &serr->ee, &srx);
-			rxrpc_store_error(peer, serr);
+	if (!peer) {
+		rcu_read_unlock();
+		rxrpc_free_skb(skb, rxrpc_skb_freed);
+		_leave(" [no peer]");
+		return;
 	}
+
+	trace_rxrpc_rx_icmp(peer, &serr->ee, &srx);
+
+	if ((serr->ee.ee_origin == SO_EE_ORIGIN_ICMP &&
+	     serr->ee.ee_type == ICMP_DEST_UNREACH &&
+	     serr->ee.ee_code == ICMP_FRAG_NEEDED)) {
+		rxrpc_adjust_mtu(peer, serr->ee.ee_info);
+		goto out;
 	}

+	rxrpc_store_error(peer, serr);
+out:
 	rcu_read_unlock();
 	rxrpc_free_skb(skb, rxrpc_skb_freed);
 	rxrpc_put_peer(peer);
+
 	_leave("");
 }


--- a/net/rxrpc/peer_object.c
+++ b/net/rxrpc/peer_object.c
@@ -227,12 +227,7 @@ struct rxrpc_peer *rxrpc_alloc_peer(struct rxrpc_local *local, gfp_t gfp)

 		rxrpc_peer_init_rtt(peer);

-		if (RXRPC_TX_SMSS > 2190)
-			peer->cong_cwnd = 2;
-		else if (RXRPC_TX_SMSS > 1095)
-			peer->cong_cwnd = 3;
-		else
-			peer->cong_cwnd = 4;
+		peer->cong_ssthresh = RXRPC_TX_MAX_WINDOW;
 		trace_rxrpc_peer(peer->debug_id, rxrpc_peer_new, 1, here);
 	}


--- a/net/rxrpc/proc.c
+++ b/net/rxrpc/proc.c
@@ -54,8 +54,9 @@ static int rxrpc_call_seq_show(struct seq_file *seq, void *v)
 	struct rxrpc_call *call;
 	struct rxrpc_net *rxnet = rxrpc_net(seq_file_net(seq));
 	unsigned long timeout = 0;
-	rxrpc_seq_t tx_hard_ack, rx_hard_ack;
+	rxrpc_seq_t acks_hard_ack;
 	char lbuff[50], rbuff[50];
+	u64 wtmp;

 	if (v == &rxnet->calls) {
 		seq_puts(seq,
@@ -90,8 +91,8 @@ static int rxrpc_call_seq_show(struct seq_file *seq, void *v)
 		timeout -= jiffies;
 	}

-	tx_hard_ack = READ_ONCE(call->tx_hard_ack);
-	rx_hard_ack = READ_ONCE(call->rx_hard_ack);
+	acks_hard_ack = READ_ONCE(call->acks_hard_ack);
+	wtmp   = atomic64_read_acquire(&call->ackr_window);
 	seq_printf(seq,
 		   "UDP   %-47.47s %-47.47s %4x %08x %08x %s %3u"
 		   " %-8.8s %08x %08x %08x %02x %08x %02x %08x %06lx\n",
@@ -105,8 +106,8 @@ static int rxrpc_call_seq_show(struct seq_file *seq, void *v)
 		   rxrpc_call_states[call->state],
 		   call->abort_code,
 		   call->debug_id,
-		   tx_hard_ack, READ_ONCE(call->tx_top) - tx_hard_ack,
-		   rx_hard_ack, READ_ONCE(call->rx_top) - rx_hard_ack,
+		   acks_hard_ack, READ_ONCE(call->tx_top) - acks_hard_ack,
+		   lower_32_bits(wtmp), upper_32_bits(wtmp) - lower_32_bits(wtmp),
 		   call->rx_serial,
 		   timeout);

@@ -216,7 +217,7 @@ static int rxrpc_peer_seq_show(struct seq_file *seq, void *v)
 		seq_puts(seq,
 			 "Proto Local                                          "
 			 " Remote                                         "
-			 " Use  CW   MTU LastUse      RTT      RTO\n"
+			 " Use SST   MTU LastUse      RTT      RTO\n"
 			 );
 		return 0;
 	}
@@ -234,7 +235,7 @@ static int rxrpc_peer_seq_show(struct seq_file *seq, void *v)
 		   lbuff,
 		   rbuff,
 		   refcount_read(&peer->ref),
-		   peer->cong_cwnd,
+		   peer->cong_ssthresh,
 		   peer->mtu,
 		   now - peer->last_tx_at,
 		   peer->srtt_us >> 3,
@@ -397,3 +398,98 @@ const struct seq_operations rxrpc_local_seq_ops = {
 	.stop   = rxrpc_local_seq_stop,
 	.show   = rxrpc_local_seq_show,
 };
+
+/*
+ * Display stats in /proc/net/rxrpc/stats
+ */
+int rxrpc_stats_show(struct seq_file *seq, void *v)
+{
+	struct rxrpc_net *rxnet = rxrpc_net(seq_file_single_net(seq));
+
+	seq_printf(seq,
+		   "Data     : send=%u sendf=%u\n",
+		   atomic_read(&rxnet->stat_tx_data_send),
+		   atomic_read(&rxnet->stat_tx_data_send_frag));
+	seq_printf(seq,
+		   "Data-Tx  : nr=%u retrans=%u\n",
+		   atomic_read(&rxnet->stat_tx_data),
+		   atomic_read(&rxnet->stat_tx_data_retrans));
+	seq_printf(seq,
+		   "Data-Rx  : nr=%u reqack=%u jumbo=%u\n",
+		   atomic_read(&rxnet->stat_rx_data),
+		   atomic_read(&rxnet->stat_rx_data_reqack),
+		   atomic_read(&rxnet->stat_rx_data_jumbo));
+	seq_printf(seq,
+		   "Ack      : fill=%u send=%u skip=%u\n",
+		   atomic_read(&rxnet->stat_tx_ack_fill),
+		   atomic_read(&rxnet->stat_tx_ack_send),
+		   atomic_read(&rxnet->stat_tx_ack_skip));
+	seq_printf(seq,
+		   "Ack-Tx   : req=%u dup=%u oos=%u exw=%u nos=%u png=%u prs=%u dly=%u idl=%u\n",
+		   atomic_read(&rxnet->stat_tx_acks[RXRPC_ACK_REQUESTED]),
+		   atomic_read(&rxnet->stat_tx_acks[RXRPC_ACK_DUPLICATE]),
+		   atomic_read(&rxnet->stat_tx_acks[RXRPC_ACK_OUT_OF_SEQUENCE]),
+		   atomic_read(&rxnet->stat_tx_acks[RXRPC_ACK_EXCEEDS_WINDOW]),
+		   atomic_read(&rxnet->stat_tx_acks[RXRPC_ACK_NOSPACE]),
+		   atomic_read(&rxnet->stat_tx_acks[RXRPC_ACK_PING]),
+		   atomic_read(&rxnet->stat_tx_acks[RXRPC_ACK_PING_RESPONSE]),
+		   atomic_read(&rxnet->stat_tx_acks[RXRPC_ACK_DELAY]),
+		   atomic_read(&rxnet->stat_tx_acks[RXRPC_ACK_IDLE]));
+	seq_printf(seq,
+		   "Ack-Rx   : req=%u dup=%u oos=%u exw=%u nos=%u png=%u prs=%u dly=%u idl=%u\n",
+		   atomic_read(&rxnet->stat_rx_acks[RXRPC_ACK_REQUESTED]),
+		   atomic_read(&rxnet->stat_rx_acks[RXRPC_ACK_DUPLICATE]),
+		   atomic_read(&rxnet->stat_rx_acks[RXRPC_ACK_OUT_OF_SEQUENCE]),
+		   atomic_read(&rxnet->stat_rx_acks[RXRPC_ACK_EXCEEDS_WINDOW]),
+		   atomic_read(&rxnet->stat_rx_acks[RXRPC_ACK_NOSPACE]),
+		   atomic_read(&rxnet->stat_rx_acks[RXRPC_ACK_PING]),
+		   atomic_read(&rxnet->stat_rx_acks[RXRPC_ACK_PING_RESPONSE]),
+		   atomic_read(&rxnet->stat_rx_acks[RXRPC_ACK_DELAY]),
+		   atomic_read(&rxnet->stat_rx_acks[RXRPC_ACK_IDLE]));
+	seq_printf(seq,
+		   "Why-Req-A: acklost=%u already=%u mrtt=%u ortt=%u\n",
+		   atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_ack_lost]),
+		   atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_already_on]),
+		   atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_more_rtt]),
+		   atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_old_rtt]));
+	seq_printf(seq,
+		   "Why-Req-A: nolast=%u retx=%u slows=%u smtxw=%u\n",
+		   atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_no_srv_last]),
+		   atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_retrans]),
+		   atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_slow_start]),
+		   atomic_read(&rxnet->stat_why_req_ack[rxrpc_reqack_small_txwin]));
+	seq_printf(seq,
+		   "Buffers  : txb=%u rxb=%u\n",
+		   atomic_read(&rxrpc_nr_txbuf),
+		   atomic_read(&rxrpc_n_rx_skbs));
+	return 0;
+}
+
+/*
+ * Clear stats if /proc/net/rxrpc/stats is written to.
+ */
+int rxrpc_stats_clear(struct file *file, char *buf, size_t size)
+{
+	struct seq_file *m = file->private_data;
+	struct rxrpc_net *rxnet = rxrpc_net(seq_file_single_net(m));
+
+	if (size > 1 || (size == 1 && buf[0] != '\n'))
+		return -EINVAL;
+
+	atomic_set(&rxnet->stat_tx_data, 0);
+	atomic_set(&rxnet->stat_tx_data_retrans, 0);
+	atomic_set(&rxnet->stat_tx_data_send, 0);
+	atomic_set(&rxnet->stat_tx_data_send_frag, 0);
+	atomic_set(&rxnet->stat_rx_data, 0);
+	atomic_set(&rxnet->stat_rx_data_reqack, 0);
+	atomic_set(&rxnet->stat_rx_data_jumbo, 0);
+
+	atomic_set(&rxnet->stat_tx_ack_fill, 0);
+	atomic_set(&rxnet->stat_tx_ack_send, 0);
+	atomic_set(&rxnet->stat_tx_ack_skip, 0);
+	memset(&rxnet->stat_tx_acks, 0, sizeof(rxnet->stat_tx_acks));
+	memset(&rxnet->stat_rx_acks, 0, sizeof(rxnet->stat_rx_acks));
+
+	memset(&rxnet->stat_why_req_ack, 0, sizeof(rxnet->stat_why_req_ack));
+	return size;
+}
--- a/net/rxrpc/protocol.h
+++ b/net/rxrpc/protocol.h
@@ -84,7 +84,7 @@ struct rxrpc_jumbo_header {
 		__be16	_rsvd;		/* reserved */
 		__be16	cksum;		/* kerberos security checksum */
 	};
-};
+} __packed;

 #define RXRPC_JUMBO_DATALEN	1412	/* non-terminal jumbo packet data length */
 #define RXRPC_JUMBO_SUBPKTLEN	(RXRPC_JUMBO_DATALEN + sizeof(struct rxrpc_jumbo_header))
@@ -132,13 +132,6 @@ struct rxrpc_ackpacket {

 } __packed;

-/* Some ACKs refer to specific packets and some are general and can be updated. */
-#define RXRPC_ACK_UPDATEABLE ((1 << RXRPC_ACK_REQUESTED)	|	\
-			      (1 << RXRPC_ACK_PING_RESPONSE)	|	\
-			      (1 << RXRPC_ACK_DELAY)		|	\
-			      (1 << RXRPC_ACK_IDLE))
-
-
 /*
 * ACK packets can have a further piece of information tagged on the end
 */

--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
--- a/net/rxrpc/rxkad.c
+++ b/net/rxrpc/rxkad.c
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
--- a/net/rxrpc/skbuff.c
+++ b/net/rxrpc/skbuff.c
--- a/net/rxrpc/sysctl.c
+++ b/net/rxrpc/sysctl.c
@@ -14,7 +14,7 @@ static struct ctl_table_header *rxrpc_sysctl_reg_table;
 static const unsigned int four = 4;
 static const unsigned int max_backlog = RXRPC_BACKLOG_MAX - 1;
 static const unsigned int n_65535 = 65535;
-static const unsigned int n_max_acks = RXRPC_RXTX_BUFF_SIZE - 1;
+static const unsigned int n_max_acks = 255;
 static const unsigned long one_jiffy = 1;
 static const unsigned long max_jiffies = MAX_JIFFY_OFFSET;

@@ -26,15 +26,6 @@ static const unsigned long max_jiffies = MAX_JIFFY_OFFSET;
 */
 static struct ctl_table rxrpc_sysctl_table[] = {
 	/* Values measured in milliseconds but used in jiffies */
-	{
-		.procname	= "req_ack_delay",
-		.data		= &rxrpc_requested_ack_delay,
-		.maxlen		= sizeof(unsigned long),
-		.mode		= 0644,
-		.proc_handler	= proc_doulongvec_ms_jiffies_minmax,
-		.extra1		= (void *)&one_jiffy,
-		.extra2		= (void *)&max_jiffies,
-	},
 	{
 		.procname	= "soft_ack_delay",
 		.data		= &rxrpc_soft_ack_delay,

--- a/net/rxrpc/txbuf.c
+++ b/net/rxrpc/txbuf.c