Commit a9744f7c authored by Magnus Karlsson's avatar Magnus Karlsson Committed by Alexei Starovoitov

xsk: fix potential race in SKB TX completion code

There is a potential race in the TX completion code for the SKB
case. One process enters the sendmsg code of an AF_XDP socket in order
to send a frame. The execution eventually trickles down to the driver
that is told to send the packet. However, it decides to drop the
packet due to some error condition (e.g., rings full) and frees the
SKB. This will trigger the SKB destructor and a completion will be
sent to the AF_XDP user space through its
single-producer/single-consumer queues.

At the same time a TX interrupt has fired on another core and it
dispatches the TX completion code in the driver. It does its HW
specific things and ends up freeing the SKB associated with the
transmitted packet. This will trigger the SKB destructor and a
completion will be sent to the AF_XDP user space through its
single-producer/single-consumer queues. With a pseudo call stack, it
would look like this:

Core 1:
sendmsg() being called in the application
  netdev_start_xmit()
    Driver entered through ndo_start_xmit
      Driver decides to free the SKB for some reason (e.g., rings full)
        Destructor of SKB called
          xskq_produce_addr() is called to signal completion to user space

Core 2:
TX completion irq
  NAPI loop
    Driver irq handler for TX completions
      Frees the SKB
        Destructor of SKB called
          xskq_produce_addr() is called to signal completion to user space

We now have a violation of the single-producer/single-consumer
principle for our queues as there are two threads trying to produce at
the same time on the same queue.

Fixed by introducing a spin_lock in the destructor. In regards to the
performance, I get around 1.74 Mpps for txonly before and after the
introduction of the spinlock. There is of course some impact due to
the spin lock but it is in the less significant digits that are too
noisy for me to measure. But let us say that the version without the
spin lock got 1.745 Mpps in the best case and the version with 1.735
Mpps in the worst case, then that would mean a maximum drop in
performance of 0.5%.

Fixes: 35fcde7f ("xsk: support for Tx")
Signed-off-by: default avatarMagnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
parent c03079c9
...@@ -60,6 +60,10 @@ struct xdp_sock { ...@@ -60,6 +60,10 @@ struct xdp_sock {
bool zc; bool zc;
/* Protects multiple processes in the control path */ /* Protects multiple processes in the control path */
struct mutex mutex; struct mutex mutex;
/* Mutual exclusion of NAPI TX thread and sendmsg error paths
* in the SKB destructor callback.
*/
spinlock_t tx_completion_lock;
u64 rx_dropped; u64 rx_dropped;
}; };
......
...@@ -199,8 +199,11 @@ static void xsk_destruct_skb(struct sk_buff *skb) ...@@ -199,8 +199,11 @@ static void xsk_destruct_skb(struct sk_buff *skb)
{ {
u64 addr = (u64)(long)skb_shinfo(skb)->destructor_arg; u64 addr = (u64)(long)skb_shinfo(skb)->destructor_arg;
struct xdp_sock *xs = xdp_sk(skb->sk); struct xdp_sock *xs = xdp_sk(skb->sk);
unsigned long flags;
spin_lock_irqsave(&xs->tx_completion_lock, flags);
WARN_ON_ONCE(xskq_produce_addr(xs->umem->cq, addr)); WARN_ON_ONCE(xskq_produce_addr(xs->umem->cq, addr));
spin_unlock_irqrestore(&xs->tx_completion_lock, flags);
sock_wfree(skb); sock_wfree(skb);
} }
...@@ -755,6 +758,7 @@ static int xsk_create(struct net *net, struct socket *sock, int protocol, ...@@ -755,6 +758,7 @@ static int xsk_create(struct net *net, struct socket *sock, int protocol,
xs = xdp_sk(sk); xs = xdp_sk(sk);
mutex_init(&xs->mutex); mutex_init(&xs->mutex);
spin_lock_init(&xs->tx_completion_lock);
local_bh_disable(); local_bh_disable();
sock_prot_inuse_add(net, &xsk_proto, 1); sock_prot_inuse_add(net, &xsk_proto, 1);
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment