Merge branch 'bpf-libbpf-af-xdp'

Magnus Karlsson says: ==================== This patch proposes to add AF_XDP support to libbpf. The main reason for this is to facilitate writing applications that use AF_XDP by offering higher-level APIs that hide many of the details of the AF_XDP uapi. This is in the same vein as libbpf facilitates XDP adoption by offering easy-to-use higher level interfaces of XDP functionality. Hopefully this will facilitate adoption of AF_XDP, make applications using it simpler and smaller, and finally also make it possible for applications to benefit from optimizations in the AF_XDP user space access code. Previously, people just copied and pasted the code from the sample application into their application, which is not desirable. The proposed interface is composed of two parts: * Low-level access interface to the four rings and the packet * High-level control plane interface for creating and setting up umems and AF_XDP sockets. This interface also loads a simple XDP program that routes all traffic on a queue up to the AF_XDP socket. The sample program has been updated to use this new interface and in that process it lost roughly 300 lines of code. I cannot detect any performance degradations due to the use of this library instead of the previous functions that were inlined in the sample application. But I did measure this on a slower machine and not the Broadwell that we normally use. The rings are now called xsk_ring and when a producer operates on it. It is xsk_ring_prod and for a consumer it is xsk_ring_cons. This way we can get some compile time error checking that the rings are used correctly. Comments and contenplations: * The current behaviour is that the library loads an XDP program (if requested to do so) but the clean up of this program is left to the application. It would be possible to implement this cleanup in the library, but it would require state to be kept on netdev level, which there is none at the moment, and the synchronization of this between processes. All this adding complexity. But when we get an XDP program per queue id, then it becomes trivial to also remove the XDP program when the application exits. This proposal from Jesper, Björn and others will also improve the performance of libbpf, since most of the XDP program code can be removed when that feature is supported. * In a future release, I am planning on adding a higher level data plane interface too. This will be based around recvmsg and sendmsg with the use of struct iovec for batching, without the user having to know anything about the underlying four rings of an AF_XDP socket. There will be one semantic difference though from the standard recvmsg and that is that the kernel will fill in the iovecs instead of the application. But the rest should be the same as the libc versions so that application writers feel at home. Patch 1: adds AF_XDP support in libbpf Patch 2: updates the xdpsock sample application to use the libbpf functions Patch 3: Documentation update to help first time users Changes v5 to v6: * Fixed prog_fd bug found by Xiaolong Ye. Thanks! Changes v4 to v5: * Added a FAQ to the documentation * Removed xsk_umem__get_data and renamed xsk_umem__get_dat_raw to xsk_umem__get_data * Replaced the netlink code with bpf_get_link_xdp_id() * Dynamic allocation of the map sizes. They are now sized after the max number of queueus on the netdev in question. Changes v3 to v4: * Dropped the pr_*() patch in favor of Yonghong Song's patch set * Addressed the review comments of Daniel Borkmann, mainly leaking of file descriptors at clean up and making the data plane APIs all static inline (with the exception of xsk_umem__get_data that uses an internal structure I do not want to expose). * Fixed the netlink callback as suggested by Maciej Fijalkowski. * Removed an unecessary include in the sample program as spotted by Ilia Fillipov. Changes v2 to v3: * Added automatic loading of a simple XDP program that routes all traffic on a queue up to the AF_XDP socket. This program loading can be disabled. * Updated function names to be consistent with the libbpf naming convention * Moved all code to xsk.[ch] * Removed all the XDP program loading code from the sample since this is now done by libbpf * The initialization functions now return a handle as suggested by Alexei * const statements added in the API where applicable. Changes v1 to v2: * Fixed cleanup of library state on error. * Moved API to initial version * Prefixed all public functions by xsk__ instead of xsk_ * Added comment about changed default ring sizes, batch size and umem size in the sample application commit message * The library now only creates an Rx or Tx ring if the respective parameter is != NULL ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Merge branch 'bpf-libbpf-af-xdp'
Magnus Karlsson says: ==================== This patch proposes to add AF_XDP support to libbpf. The main reason for this is to facilitate writing applications that use AF_XDP by offering higher-level APIs that hide many of the details of the AF_XDP uapi. This is in the same vein as libbpf facilitates XDP adoption by offering easy-to-use higher level interfaces of XDP functionality. Hopefully this will facilitate adoption of AF_XDP, make applications using it simpler and smaller, and finally also make it possible for applications to benefit from optimizations in the AF_XDP user space access code. Previously, people just copied and pasted the code from the sample application into their application, which is not desirable. The proposed interface is composed of two parts: * Low-level access interface to the four rings and the packet * High-level control plane interface for creating and setting up umems and AF_XDP sockets. This interface also loads a simple XDP program that routes all traffic on a queue up to the AF_XDP socket. The sample program has been updated to use this new interface and in that process it lost roughly 300 lines of code. I cannot detect any performance degradations due to the use of this library instead of the previous functions that were inlined in the sample application. But I did measure this on a slower machine and not the Broadwell that we normally use. The rings are now called xsk_ring and when a producer operates on it. It is xsk_ring_prod and for a consumer it is xsk_ring_cons. This way we can get some compile time error checking that the rings are used correctly. Comments and contenplations: * The current behaviour is that the library loads an XDP program (if requested to do so) but the clean up of this program is left to the application. It would be possible to implement this cleanup in the library, but it would require state to be kept on netdev level, which there is none at the moment, and the synchronization of this between processes. All this adding complexity. But when we get an XDP program per queue id, then it becomes trivial to also remove the XDP program when the application exits. This proposal from Jesper, Björn and others will also improve the performance of libbpf, since most of the XDP program code can be removed when that feature is supported. * In a future release, I am planning on adding a higher level data plane interface too. This will be based around recvmsg and sendmsg with the use of struct iovec for batching, without the user having to know anything about the underlying four rings of an AF_XDP socket. There will be one semantic difference though from the standard recvmsg and that is that the kernel will fill in the iovecs instead of the application. But the rest should be the same as the libc versions so that application writers feel at home. Patch 1: adds AF_XDP support in libbpf Patch 2: updates the xdpsock sample application to use the libbpf functions Patch 3: Documentation update to help first time users Changes v5 to v6: * Fixed prog_fd bug found by Xiaolong Ye. Thanks! Changes v4 to v5: * Added a FAQ to the documentation * Removed xsk_umem__get_data and renamed xsk_umem__get_dat_raw to xsk_umem__get_data * Replaced the netlink code with bpf_get_link_xdp_id() * Dynamic allocation of the map sizes. They are now sized after the max number of queueus on the netdev in question. Changes v3 to v4: * Dropped the pr_*() patch in favor of Yonghong Song's patch set * Addressed the review comments of Daniel Borkmann, mainly leaking of file descriptors at clean up and making the data plane APIs all static inline (with the exception of xsk_umem__get_data that uses an internal structure I do not want to expose). * Fixed the netlink callback as suggested by Maciej Fijalkowski. * Removed an unecessary include in the sample program as spotted by Ilia Fillipov. Changes v2 to v3: * Added automatic loading of a simple XDP program that routes all traffic on a queue up to the AF_XDP socket. This program loading can be disabled. * Updated function names to be consistent with the libbpf naming convention * Moved all code to xsk.[ch] * Removed all the XDP program loading code from the sample since this is now done by libbpf * The initialization functions now return a handle as suggested by Alexei * const statements added in the API where applicable. Changes v1 to v2: * Fixed cleanup of library state on error. * Moved API to initial version * Prefixed all public functions by xsk__ instead of xsk_ * Added comment about changed default ring sizes, batch size and umem size in the sample application commit message * The library now only creates an Rx or Tx ring if the respective parameter is != NULL ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
143bdc2e · Daniel Borkmann · 740f8a65 · 0f4a9b7d · 143bdc2e · 143bdc2e
Commit 143bdc2e authored Feb 25, 2019 by Daniel Borkmann
13 changed files
--- a/Documentation/networking/af_xdp.rst
+++ b/Documentation/networking/af_xdp.rst
@@ -295,6 +295,41 @@ using::
 For XDP_SKB mode, use the switch "-S" instead of "-N" and all options
 can be displayed with "-h", as usual.

+FAQ
+=======
+
+Q: I am not seeing any traffic on the socket. What am I doing wrong?
+
+A: When a netdev of a physical NIC is initialized, Linux usually
+   allocates one Rx and Tx queue pair per core. So on a 8 core system,
+   queue ids 0 to 7 will be allocated, one per core. In the AF_XDP
+   bind call or the xsk_socket__create libbpf function call, you
+   specify a specific queue id to bind to and it is only the traffic
+   towards that queue you are going to get on you socket. So in the
+   example above, if you bind to queue 0, you are NOT going to get any
+   traffic that is distributed to queues 1 through 7. If you are
+   lucky, you will see the traffic, but usually it will end up on one
+   of the queues you have not bound to.
+
+   There are a number of ways to solve the problem of getting the
+   traffic you want to the queue id you bound to. If you want to see
+   all the traffic, you can force the netdev to only have 1 queue, queue
+   id 0, and then bind to queue 0. You can use ethtool to do this::
+
+   sudo ethtool -L <interface> combined 1
+
+   If you want to only see part of the traffic, you can program the
+   NIC through ethtool to filter out your traffic to a single queue id
+   that you can bind your XDP socket to. Here is one example in which
+   UDP traffic to and from port 4242 are sent to queue 2::
+
+   sudo ethtool -N <interface> rx-flow-hash udp4 fn
+   sudo ethtool -N <interface> flow-type udp4 src-port 4242 dst-port \
+   4242 action 2
+
+   A number of other ways are possible all up to the capabilitites of
+   the NIC you have.
+
 Credits
 =======

@@ -309,4 +344,3 @@ Credits
 - Michael S. Tsirkin
 - Qi Z Zhang
 - Willem de Bruijn
-
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -163,7 +163,6 @@ always += xdp2skb_meta_kern.o
 always += syscall_tp_kern.o
 always += cpustat_kern.o
 always += xdp_adjust_tail_kern.o
-always += xdpsock_kern.o
 always += xdp_fwd_kern.o
 always += task_fd_query_kern.o
 always += xdp_sample_pkts_kern.o

--- a/samples/bpf/xdpsock.h
+++ b/samples/bpf/xdpsock.h
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef XDPSOCK_H_
-#define XDPSOCK_H_
-
-/* Power-of-2 number of sockets */
-#define MAX_SOCKS 4
-
-/* Round-robin receive */
-#define RR_LB 0
-
-#endif /* XDPSOCK_H_ */
--- a/samples/bpf/xdpsock_kern.c
+++ b/samples/bpf/xdpsock_kern.c
-// SPDX-License-Identifier: GPL-2.0
-#define KBUILD_MODNAME "foo"
-#include <uapi/linux/bpf.h>
-#include "bpf_helpers.h"
-
-#include "xdpsock.h"
-
-struct bpf_map_def SEC("maps") qidconf_map = {
-	.type		= BPF_MAP_TYPE_ARRAY,
-	.key_size	= sizeof(int),
-	.value_size	= sizeof(int),
-	.max_entries	= 1,
-};
-
-struct bpf_map_def SEC("maps") xsks_map = {
-	.type = BPF_MAP_TYPE_XSKMAP,
-	.key_size = sizeof(int),
-	.value_size = sizeof(int),
-	.max_entries = MAX_SOCKS,
-};
-
-struct bpf_map_def SEC("maps") rr_map = {
-	.type = BPF_MAP_TYPE_PERCPU_ARRAY,
-	.key_size = sizeof(int),
-	.value_size = sizeof(unsigned int),
-	.max_entries = 1,
-};
-
-SEC("xdp_sock")
-int xdp_sock_prog(struct xdp_md *ctx)
-{
-	int *qidconf, key = 0, idx;
-	unsigned int *rr;
-
-	qidconf = bpf_map_lookup_elem(&qidconf_map, &key);
-	if (!qidconf)
-		return XDP_ABORTED;
-
-	if (*qidconf != ctx->rx_queue_index)
-		return XDP_PASS;
-
-#if RR_LB /* NB! RR_LB is configured in xdpsock.h */
-	rr = bpf_map_lookup_elem(&rr_map, &key);
-	if (!rr)
-		return XDP_ABORTED;
-
-	*rr = (*rr + 1) & (MAX_SOCKS - 1);
-	idx = *rr;
-#else
-	idx = 0;
-#endif
-
-	return bpf_redirect_map(&xsks_map, idx, 0);
-}
-
-char _license[] SEC("license") = "GPL";
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
--- a/tools/include/uapi/linux/ethtool.h
+++ b/tools/include/uapi/linux/ethtool.h
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * ethtool.h: Defines for Linux ethtool.
+ *
+ * Copyright (C) 1998 David S. Miller (davem@redhat.com)
+ * Copyright 2001 Jeff Garzik <jgarzik@pobox.com>
+ * Portions Copyright 2001 Sun Microsystems (thockin@sun.com)
+ * Portions Copyright 2002 Intel (eli.kupermann@intel.com,
+ *                                christopher.leech@intel.com,
+ *                                scott.feldman@intel.com)
+ * Portions Copyright (C) Sun Microsystems 2008
+ */
+
+#ifndef _UAPI_LINUX_ETHTOOL_H
+#define _UAPI_LINUX_ETHTOOL_H
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/if_ether.h>
+
+#define ETHTOOL_GCHANNELS       0x0000003c /* Get no of channels */
+
+/**
+ * struct ethtool_channels - configuring number of network channel
+ * @cmd: ETHTOOL_{G,S}CHANNELS
+ * @max_rx: Read only. Maximum number of receive channel the driver support.
+ * @max_tx: Read only. Maximum number of transmit channel the driver support.
+ * @max_other: Read only. Maximum number of other channel the driver support.
+ * @max_combined: Read only. Maximum number of combined channel the driver
+ *	support. Set of queues RX, TX or other.
+ * @rx_count: Valid values are in the range 1 to the max_rx.
+ * @tx_count: Valid values are in the range 1 to the max_tx.
+ * @other_count: Valid values are in the range 1 to the max_other.
+ * @combined_count: Valid values are in the range 1 to the max_combined.
+ *
+ * This can be used to configure RX, TX and other channels.
+ */
+
+struct ethtool_channels {
+	__u32	cmd;
+	__u32	max_rx;
+	__u32	max_tx;
+	__u32	max_other;
+	__u32	max_combined;
+	__u32	rx_count;
+	__u32	tx_count;
+	__u32	other_count;
+	__u32	combined_count;
+};
+
+#endif /* _UAPI_LINUX_ETHTOOL_H */
--- a/tools/include/uapi/linux/if_xdp.h
+++ b/tools/include/uapi/linux/if_xdp.h
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * if_xdp: XDP socket user-space interface
+ * Copyright(c) 2018 Intel Corporation.
+ *
+ * Author(s): Björn Töpel <bjorn.topel@intel.com>
+ *	      Magnus Karlsson <magnus.karlsson@intel.com>
+ */
+
+#ifndef _LINUX_IF_XDP_H
+#define _LINUX_IF_XDP_H
+
+#include <linux/types.h>
+
+/* Options for the sxdp_flags field */
+#define XDP_SHARED_UMEM	(1 << 0)
+#define XDP_COPY	(1 << 1) /* Force copy-mode */
+#define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
+
+struct sockaddr_xdp {
+	__u16 sxdp_family;
+	__u16 sxdp_flags;
+	__u32 sxdp_ifindex;
+	__u32 sxdp_queue_id;
+	__u32 sxdp_shared_umem_fd;
+};
+
+struct xdp_ring_offset {
+	__u64 producer;
+	__u64 consumer;
+	__u64 desc;
+};
+
+struct xdp_mmap_offsets {
+	struct xdp_ring_offset rx;
+	struct xdp_ring_offset tx;
+	struct xdp_ring_offset fr; /* Fill */
+	struct xdp_ring_offset cr; /* Completion */
+};
+
+/* XDP socket options */
+#define XDP_MMAP_OFFSETS		1
+#define XDP_RX_RING			2
+#define XDP_TX_RING			3
+#define XDP_UMEM_REG			4
+#define XDP_UMEM_FILL_RING		5
+#define XDP_UMEM_COMPLETION_RING	6
+#define XDP_STATISTICS			7
+
+struct xdp_umem_reg {
+	__u64 addr; /* Start of packet data area */
+	__u64 len; /* Length of packet data area */
+	__u32 chunk_size;
+	__u32 headroom;
+};
+
+struct xdp_statistics {
+	__u64 rx_dropped; /* Dropped for reasons other than invalid desc */
+	__u64 rx_invalid_descs; /* Dropped due to invalid descriptor */
+	__u64 tx_invalid_descs; /* Dropped due to invalid descriptor */
+};
+
+/* Pgoff for mmaping the rings */
+#define XDP_PGOFF_RX_RING			  0
+#define XDP_PGOFF_TX_RING		 0x80000000
+#define XDP_UMEM_PGOFF_FILL_RING	0x100000000ULL
+#define XDP_UMEM_PGOFF_COMPLETION_RING	0x180000000ULL
+
+/* Rx/Tx descriptor */
+struct xdp_desc {
+	__u64 addr;
+	__u32 len;
+	__u32 options;
+};
+
+/* UMEM descriptor is __u64 */
+
+#endif /* _LINUX_IF_XDP_H */
--- a/tools/lib/bpf/Build
+++ b/tools/lib/bpf/Build
-libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o netlink.o bpf_prog_linfo.o libbpf_probes.o
+libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o netlink.o bpf_prog_linfo.o libbpf_probes.o xsk.o
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -164,6 +164,9 @@ $(BPF_IN): force elfdep bpfdep
 	@(test -f ../../include/uapi/linux/if_link.h -a -f ../../../include/uapi/linux/if_link.h && ( \
 	(diff -B ../../include/uapi/linux/if_link.h ../../../include/uapi/linux/if_link.h >/dev/null) || \
 	echo "Warning: Kernel ABI header at 'tools/include/uapi/linux/if_link.h' differs from latest version at 'include/uapi/linux/if_link.h'" >&2 )) || true
+	@(test -f ../../include/uapi/linux/if_xdp.h -a -f ../../../include/uapi/linux/if_xdp.h && ( \
+	(diff -B ../../include/uapi/linux/if_xdp.h ../../../include/uapi/linux/if_xdp.h >/dev/null) || \
+	echo "Warning: Kernel ABI header at 'tools/include/uapi/linux/if_xdp.h' differs from latest version at 'include/uapi/linux/if_xdp.h'" >&2 )) || true
 	$(Q)$(MAKE) $(build)=libbpf

 $(OUTPUT)libbpf.so: $(BPF_IN)
@@ -174,7 +177,7 @@ $(OUTPUT)libbpf.a: $(BPF_IN)
 	$(QUIET_LINK)$(RM) $@; $(AR) rcs $@ $^

 $(OUTPUT)test_libbpf: test_libbpf.cpp $(OUTPUT)libbpf.a
-	$(QUIET_LINK)$(CXX) $^ -lelf -o $@
+	$(QUIET_LINK)$(CXX) $(INCLUDES) $^ -lelf -o $@

 check: check_abi


--- a/tools/lib/bpf/README.rst
+++ b/tools/lib/bpf/README.rst
@@ -9,7 +9,7 @@ described here. It's recommended to follow these conventions whenever a
 new function or type is added to keep libbpf API clean and consistent.

 All types and functions provided by libbpf API should have one of the
-following prefixes: ``bpf_``, ``btf_``, ``libbpf_``.
+following prefixes: ``bpf_``, ``btf_``, ``libbpf_``, ``xsk_``.

 System call wrappers
 --------------------
@@ -62,6 +62,19 @@ Auxiliary functions and types that don't fit well in any of categories
 described above should have ``libbpf_`` prefix, e.g.
 ``libbpf_get_error`` or ``libbpf_prog_type_by_name``.

+AF_XDP functions
+-------------------
+
+AF_XDP functions should have an ``xsk_`` prefix, e.g.
+``xsk_umem__get_data`` or ``xsk_umem__create``. The interface consists
+of both low-level ring access functions and high-level configuration
+functions. These can be mixed and matched. Note that these functions
+are not reentrant for performance reasons.
+
+Please take a look at Documentation/networking/af_xdp.rst in the Linux
+kernel source tree on how to use XDP sockets and for some common
+mistakes in case you do not get any traffic up to user space.
+
 libbpf ABI
 ==========


--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -147,4 +147,10 @@ LIBBPF_0.0.2 {
 		btf_ext__new;
 		btf_ext__reloc_func_info;
 		btf_ext__reloc_line_info;
+		xsk_umem__create;
+		xsk_socket__create;
+		xsk_umem__delete;
+		xsk_socket__delete;
+		xsk_umem__fd;
+		xsk_socket__fd;
 } LIBBPF_0.0.1;
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
--- a/tools/lib/bpf/xsk.h
+++ b/tools/lib/bpf/xsk.h
+/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
+
+/*
+ * AF_XDP user-space access library.
+ *
+ * Copyright(c) 2018 - 2019 Intel Corporation.
+ *
+ * Author(s): Magnus Karlsson <magnus.karlsson@intel.com>
+ */
+
+#ifndef __LIBBPF_XSK_H
+#define __LIBBPF_XSK_H
+
+#include <stdio.h>
+#include <stdint.h>
+#include <linux/if_xdp.h>
+
+#include "libbpf.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Do not access these members directly. Use the functions below. */
+#define DEFINE_XSK_RING(name) \
+struct name { \
+	__u32 cached_prod; \
+	__u32 cached_cons; \
+	__u32 mask; \
+	__u32 size; \
+	__u32 *producer; \
+	__u32 *consumer; \
+	void *ring; \
+}
+
+DEFINE_XSK_RING(xsk_ring_prod);
+DEFINE_XSK_RING(xsk_ring_cons);
+
+struct xsk_umem;
+struct xsk_socket;
+
+static inline __u64 *xsk_ring_prod__fill_addr(struct xsk_ring_prod *fill,
+					      __u32 idx)
+{
+	__u64 *addrs = (__u64 *)fill->ring;
+
+	return &addrs[idx & fill->mask];
+}
+
+static inline const __u64 *
+xsk_ring_cons__comp_addr(const struct xsk_ring_cons *comp, __u32 idx)
+{
+	const __u64 *addrs = (const __u64 *)comp->ring;
+
+	return &addrs[idx & comp->mask];
+}
+
+static inline struct xdp_desc *xsk_ring_prod__tx_desc(struct xsk_ring_prod *tx,
+						      __u32 idx)
+{
+	struct xdp_desc *descs = (struct xdp_desc *)tx->ring;
+
+	return &descs[idx & tx->mask];
+}
+
+static inline const struct xdp_desc *
+xsk_ring_cons__rx_desc(const struct xsk_ring_cons *rx, __u32 idx)
+{
+	const struct xdp_desc *descs = (const struct xdp_desc *)rx->ring;
+
+	return &descs[idx & rx->mask];
+}
+
+static inline __u32 xsk_prod_nb_free(struct xsk_ring_prod *r, __u32 nb)
+{
+	__u32 free_entries = r->cached_cons - r->cached_prod;
+
+	if (free_entries >= nb)
+		return free_entries;
+
+	/* Refresh the local tail pointer.
+	 * cached_cons is r->size bigger than the real consumer pointer so
+	 * that this addition can be avoided in the more frequently
+	 * executed code that computs free_entries in the beginning of
+	 * this function. Without this optimization it whould have been
+	 * free_entries = r->cached_prod - r->cached_cons + r->size.
+	 */
+	r->cached_cons = *r->consumer + r->size;
+
+	return r->cached_cons - r->cached_prod;
+}
+
+static inline __u32 xsk_cons_nb_avail(struct xsk_ring_cons *r, __u32 nb)
+{
+	__u32 entries = r->cached_prod - r->cached_cons;
+
+	if (entries == 0) {
+		r->cached_prod = *r->producer;
+		entries = r->cached_prod - r->cached_cons;
+	}
+
+	return (entries > nb) ? nb : entries;
+}
+
+static inline size_t xsk_ring_prod__reserve(struct xsk_ring_prod *prod,
+					    size_t nb, __u32 *idx)
+{
+	if (unlikely(xsk_prod_nb_free(prod, nb) < nb))
+		return 0;
+
+	*idx = prod->cached_prod;
+	prod->cached_prod += nb;
+
+	return nb;
+}
+
+static inline void xsk_ring_prod__submit(struct xsk_ring_prod *prod, size_t nb)
+{
+	/* Make sure everything has been written to the ring before signalling
+	 * this to the kernel.
+	 */
+	smp_wmb();
+
+	*prod->producer += nb;
+}
+
+static inline size_t xsk_ring_cons__peek(struct xsk_ring_cons *cons,
+					 size_t nb, __u32 *idx)
+{
+	size_t entries = xsk_cons_nb_avail(cons, nb);
+
+	if (likely(entries > 0)) {
+		/* Make sure we do not speculatively read the data before
+		 * we have received the packet buffers from the ring.
+		 */
+		smp_rmb();
+
+		*idx = cons->cached_cons;
+		cons->cached_cons += entries;
+	}
+
+	return entries;
+}
+
+static inline void xsk_ring_cons__release(struct xsk_ring_cons *cons, size_t nb)
+{
+	*cons->consumer += nb;
+}
+
+static inline void *xsk_umem__get_data(void *umem_area, __u64 addr)
+{
+	return &((char *)umem_area)[addr];
+}
+
+LIBBPF_API int xsk_umem__fd(const struct xsk_umem *umem);
+LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
+
+#define XSK_RING_CONS__DEFAULT_NUM_DESCS      2048
+#define XSK_RING_PROD__DEFAULT_NUM_DESCS      2048
+#define XSK_UMEM__DEFAULT_FRAME_SHIFT    11 /* 2048 bytes */
+#define XSK_UMEM__DEFAULT_FRAME_SIZE     (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
+#define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0
+
+struct xsk_umem_config {
+	__u32 fill_size;
+	__u32 comp_size;
+	__u32 frame_size;
+	__u32 frame_headroom;
+};
+
+/* Flags for the libbpf_flags field. */
+#define XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD (1 << 0)
+
+struct xsk_socket_config {
+	__u32 rx_size;
+	__u32 tx_size;
+	__u32 libbpf_flags;
+	__u32 xdp_flags;
+	__u16 bind_flags;
+};
+
+/* Set config to NULL to get the default configuration. */
+LIBBPF_API int xsk_umem__create(struct xsk_umem **umem,
+				void *umem_area, __u64 size,
+				struct xsk_ring_prod *fill,
+				struct xsk_ring_cons *comp,
+				const struct xsk_umem_config *config);
+LIBBPF_API int xsk_socket__create(struct xsk_socket **xsk,
+				  const char *ifname, __u32 queue_id,
+				  struct xsk_umem *umem,
+				  struct xsk_ring_cons *rx,
+				  struct xsk_ring_prod *tx,
+				  const struct xsk_socket_config *config);
+
+/* Returns 0 for success and -EBUSY if the umem is still in use. */
+LIBBPF_API int xsk_umem__delete(struct xsk_umem *umem);
+LIBBPF_API void xsk_socket__delete(struct xsk_socket *xsk);
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+#endif /* __LIBBPF_XSK_H */