Commit c12a0376 authored by Andrii Nakryiko

Merge branch 'bpf: Add user-space-publisher ring buffer map type'

David Vernet says:

====================
This patch set defines a new map type, BPF_MAP_TYPE_USER_RINGBUF, which
provides single-user-space-producer / single-kernel-consumer semantics over
a ring buffer.  Along with the new map type, a helper function called
bpf_user_ringbuf_drain() is added, which allows a BPF program to specify a
callback with the following signature, to which samples are posted by the
helper:

long (*callback_fn)(struct bpf_dynptr *dynptr, void *context);

The program can then use the bpf_dynptr_read() or bpf_dynptr_data() helper
functions to safely read the sample from the dynptr. There are currently no
helpers available to determine the size of the sample, but one could easily
be added if required.
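
For illustration only (not part of the patches themselves), a minimal
BPF-side sketch in the spirit of the selftests included below; the map
size, sample layout, and attach point are arbitrary placeholders:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

char _license[] SEC("license") = "GPL";

struct sample {
	int pid;
	long value;
};

struct {
	__uint(type, BPF_MAP_TYPE_USER_RINGBUF);
	__uint(max_entries, 256 * 1024); /* rounded to a multiple of page size by libbpf */
} user_ringbuf SEC(".maps");

static long handle_sample(struct bpf_dynptr *dynptr, void *ctx)
{
	struct sample s;

	/* Copy the user-space-produced sample out of the dynptr. */
	if (bpf_dynptr_read(&s, sizeof(s), dynptr, 0, 0))
		return 1;	/* stop draining */
	/* ... use s ... */
	return 0;		/* continue with the next sample */
}

SEC("tracepoint/syscalls/sys_enter_getpgid")
int drain_user_samples(void *ctx)
{
	long n;

	/* Returns the number of drained samples, or a negative error. */
	n = bpf_user_ringbuf_drain(&user_ringbuf, handle_sample, NULL, 0);
	if (n < 0)
		bpf_printk("drain failed: %ld", n);
	return 0;
}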

On the user-space side, libbpf has been updated to export a new
'struct user_ring_buffer' type, along with the following symbols:

struct user_ring_buffer *
user_ring_buffer__new(int map_fd,
                      const struct user_ring_buffer_opts *opts);
void user_ring_buffer__free(struct user_ring_buffer *rb);
void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size);
void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb,
                                         __u32 size, int timeout_ms);
void user_ring_buffer__discard(struct user_ring_buffer *rb, void *sample);
void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample);

These symbols are exported in a new LIBBPF_1.1.0 section of the libbpf
linker version script.
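
For completeness, a hedged user-space producer sketch using these symbols;
error handling is minimal, and 'ringbuf_map_fd' is assumed to be the fd of
a BPF_MAP_TYPE_USER_RINGBUF map (e.g. obtained via bpf_map__fd() on a
loaded object):

#include <errno.h>
#include <unistd.h>
#include <bpf/libbpf.h>

struct sample {
	int pid;
	long value;
}; /* same layout as whatever the BPF-side callback expects */

static int produce_one(int ringbuf_map_fd)
{
	struct user_ring_buffer *rb;
	struct sample *s;
	int err = 0;

	rb = user_ring_buffer__new(ringbuf_map_fd, NULL);
	if (!rb)
		return -errno;

	/* Reserve an 8-byte-aligned region, blocking up to 1s for space. */
	s = user_ring_buffer__reserve_blocking(rb, sizeof(*s), 1000);
	if (!s) {
		err = -errno;
		goto out;
	}

	s->pid = getpid();
	s->value = 42;

	/* Publish the sample; a BPF program consumes it via
	 * bpf_user_ringbuf_drain().
	 */
	user_ring_buffer__submit(rb, s);
out:
	user_ring_buffer__free(rb);
	return err;
}
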
Signed-off-by: David Vernet <void@manifault.com>
---
v5 -> v6:
- Fixed s/BPF_MAP_TYPE_RINGBUF/BPF_MAP_TYPE_USER_RINGBUF/ typo in the
  libbpf user ringbuf doxygen header comment for user_ring_buffer__new()
  (Andrii).
- Specify that the pointer returned from user_ring_buffer__reserve() and
  its blocking counterpart is 8-byte aligned (Andrii).
- Renamed user_ringbuf__commit() to user_ringbuf_commit(), as it's static
  (Andrii).
- Another slight reworking of user_ring_buffer__reserve_blocking() to
  remove some extraneous nanosecond variables + checking (Andrii).
- Add a final check of user_ring_buffer__reserve() in
  user_ring_buffer__reserve_blocking().
- Moved busy bit lock / unlock logic from __bpf_user_ringbuf_peek() to
  bpf_user_ringbuf_drain() (Andrii).
- -ENOSPC -> -ENODATA for an empty ring buffer in
  __bpf_user_ringbuf_peek() (Andrii).
- Updated BPF_RB_FORCE_WAKEUP to force a wakeup notification to be sent
  even if no sample was drained.
- Changed a bit of the wording in the UAPI header for
  bpf_user_ringbuf_drain() to mention the BPF_RB_FORCE_WAKEUP behavior.
- Remove extra space after return in ringbuf_map_poll_user() (Andrii).
- Removed now-extraneous paragraph from the commit summary of patch 2/4
  (Andrii).
v4 -> v5:
- DENYLISTed the user-ringbuf test suite on s390x. We have a number of
  programs in progs/user_ringbuf_success.c that user space triggers by
  invoking a syscall. Not all of these syscalls are available
  on s390x. If and when we add the ability to kick the kernel from
  user-space, or if we end up using iterators for that per Hao's
  suggestion, we could re-enable this test suite on s390x.
- Fixed a few more places that needed ringbuffer -> ring buffer.
v3 -> v4:
- Update BPF_MAX_USER_RINGBUF_SAMPLES to not specify a bit, and instead
  just specify a number of samples. (Andrii)
- Update "ringbuffer" in comments and commit summaries to say "ring
  buffer". (Andrii)
- Return -E2BIG from bpf_user_ringbuf_drain() both when a sample can't
  fit into the ring buffer, and when it can't fit into a dynptr. (Andrii)
- Don't loop over samples in __bpf_user_ringbuf_peek() if a sample was
  discarded. Instead, return -EAGAIN so the caller can deal with it. Also
  updated the caller to detect -EAGAIN and skip over it when iterating.
  (Andrii)
- Removed the heuristic for notifying user-space when a sample is drained,
  causing the ring buffer to no longer be full. This may be useful in the
  future, but is being removed now because it's strictly a heuristic.
- Re-add BPF_RB_FORCE_WAKEUP flag to bpf_user_ringbuf_drain(). (Andrii)
- Remove helper_allocated_dynptr tracker from verifier. (Andrii)
- Add libbpf function header comments to tools/lib/bpf/libbpf.h, so that
  they will be included in rendered libbpf docs. (Andrii)
- Add symbols to a new LIBBPF_1.1.0 section in linker version script,
  rather than including them in LIBBPF_1.0.0. (Andrii)
- Remove libbpf_err() calls from static libbpf functions. (Andrii)
- Check user_ring_buffer_opts instead of ring_buffer_opts in
  user_ring_buffer__new(). (Andrii)
- Avoid an extra if in the hot path in user_ringbuf__commit(). (Andrii)
- Use ENOSPC rather than ENODATA if no space is available in the ring
  buffer. (Andrii)
- Don't round sample size in header to 8, but still round size that is
  reserved and written to 8, and validate positions are multiples of 8
  (Andrii).
- Use nanoseconds for most calculations in
  user_ring_buffer__reserve_blocking(). (Andrii)
- Don't use CHECK() in testcases, instead use ASSERT_*. (Andrii)
- Use SEC("?raw_tp") instead of SEC("?raw_tp/sys_nanosleep") in negative
  test. (Andrii)
- Move test_user_ringbuf.h header to live next to BPF program instead of
  a directory up from both it and the user-space test program. (Andrii)
- Update bpftool help message / docs to also include user_ringbuf.
v2 -> v3:
- Lots of formatting fixes, such as keeping things on one line if they fit
  within 100 characters, and removing some extraneous newlines. Applies
  to all diffs in the patch-set. (Andrii)
- Renamed ring_buffer_user__* symbols to user_ring_buffer__*. (Andrii)
- Added a missing smp_mb__before_atomic() in
  __bpf_user_ringbuf_sample_release(). (Hao)
- Restructure how and when notification events are sent from the kernel to
  the user-space producers via the .map_poll() callback for the
  BPF_MAP_TYPE_USER_RINGBUF map. Before, we only sent a notification when
  the ringbuffer was fully drained. Now, we guarantee user-space that
  we'll send an event at least once per bpf_user_ringbuf_drain(), as long
  as at least one sample was drained, and BPF_RB_NO_WAKEUP was not passed.
  As a heuristic, we also send a notification event any time a sample being
  drained causes the ringbuffer to no longer be full. (Andrii)
- Continuing on the above point, updated
  user_ring_buffer__reserve_blocking() to loop around epoll_wait() until a
  sufficiently large sample is found. (Andrii)
- Communicate BPF_RINGBUF_BUSY_BIT and BPF_RINGBUF_DISCARD_BIT in sample
  headers. The ringbuffer implementation still only supports
  single-producer semantics, but we can now add synchronization support in
  user_ring_buffer__reserve(), and will automatically get multi-producer
  semantics. (Andrii)
- Updated some commit summaries, specifically adding more details where
  warranted. (Andrii)
- Improved function documentation for bpf_user_ringbuf_drain(), more
  clearly explaining all function arguments and return types, as well as
  the semantics for waking up user-space producers.
- Add function header comments for user_ring_buffer__reserve{_blocking}().
  (Andrii)
- Round all samples up to 8 bytes in the user-space producer, and enforce
  that all samples are properly aligned in the kernel. (Andrii)
- Added testcases that verify that bpf_user_ringbuf_drain() properly
  validates samples, and returns error conditions if any invalid samples
  are encountered. (Andrii)
- Move atomic_t busy field out of the consumer page, and into the
  struct bpf_ringbuf. (Andrii)
- Split ringbuf_map_{mmap, poll}_{kern, user}() into separate
  implementations. (Andrii)
- Don't silently consume errors in bpf_user_ringbuf_drain(). (Andrii)
- Remove magic number of samples (4096) from bpf_user_ringbuf_drain(),
  and instead use BPF_MAX_USER_RINGBUF_SAMPLES macro, which allows
  128k samples. (Andrii)
- Remove MEM_ALLOC modifier from PTR_TO_DYNPTR register in verifier, and
  instead rely solely on the register being PTR_TO_DYNPTR. (Andrii)
- Move freeing of atomic_t busy bit to before we invoke irq_work_queue() in
  __bpf_user_ringbuf_sample_release(). (Andrii)
- Only check for BPF_RB_NO_WAKEUP flag in bpf_ringbuf_drain().
- Remove libbpf function names from kernel smp_{load, store}* comments in
  the kernel. (Andrii)
- Don't use double-underscore naming convention in libbpf functions.
  (Andrii)
- Use proper __u32 and __u64 for types where we need to guarantee their
  size. (Andrii)

v1 -> v2:
- Following Joanne landing 88374342 ("bpf: Fix ref_obj_id for dynptr
  data slices in verifier") [0], removed [PATCH 1/5] bpf: Clear callee
  saved regs after updating REG0 [1]. (Joanne)
- Following the above adjustment, updated check_helper_call() to not store
  a reference for bpf_dynptr_data() if the register containing the dynptr
  is of type MEM_ALLOC. (Joanne)
- Fixed casting issue pointed out by kernel test robot by adding a missing
  (uintptr_t) cast. (lkp)

[0] https://lore.kernel.org/all/20220809214055.4050604-1-joannelkoong@gmail.com/
[1] https://lore.kernel.org/all/20220808155341.2479054-1-void@manifault.com/
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
parents 3a74904c e5a9df51
...@@ -451,7 +451,7 @@ enum bpf_type_flag { ...@@ -451,7 +451,7 @@ enum bpf_type_flag {
/* DYNPTR points to memory local to the bpf program. */ /* DYNPTR points to memory local to the bpf program. */
DYNPTR_TYPE_LOCAL = BIT(8 + BPF_BASE_TYPE_BITS), DYNPTR_TYPE_LOCAL = BIT(8 + BPF_BASE_TYPE_BITS),
/* DYNPTR points to a ringbuf record. */ /* DYNPTR points to a kernel-produced ringbuf record. */
DYNPTR_TYPE_RINGBUF = BIT(9 + BPF_BASE_TYPE_BITS), DYNPTR_TYPE_RINGBUF = BIT(9 + BPF_BASE_TYPE_BITS),
/* Size is known at compile time. */ /* Size is known at compile time. */
...@@ -656,6 +656,7 @@ enum bpf_reg_type { ...@@ -656,6 +656,7 @@ enum bpf_reg_type {
PTR_TO_MEM, /* reg points to valid memory region */ PTR_TO_MEM, /* reg points to valid memory region */
PTR_TO_BUF, /* reg points to a read/write buffer */ PTR_TO_BUF, /* reg points to a read/write buffer */
PTR_TO_FUNC, /* reg points to a bpf program function */ PTR_TO_FUNC, /* reg points to a bpf program function */
PTR_TO_DYNPTR, /* reg points to a dynptr */
__BPF_REG_TYPE_MAX, __BPF_REG_TYPE_MAX,
/* Extended reg_types. */ /* Extended reg_types. */
...@@ -1394,6 +1395,11 @@ struct bpf_array { ...@@ -1394,6 +1395,11 @@ struct bpf_array {
#define BPF_MAP_CAN_READ BIT(0) #define BPF_MAP_CAN_READ BIT(0)
#define BPF_MAP_CAN_WRITE BIT(1) #define BPF_MAP_CAN_WRITE BIT(1)
/* Maximum number of user-producer ring buffer samples that can be drained in
* a call to bpf_user_ringbuf_drain().
*/
#define BPF_MAX_USER_RINGBUF_SAMPLES (128 * 1024)
static inline u32 bpf_map_flags_to_cap(struct bpf_map *map) static inline u32 bpf_map_flags_to_cap(struct bpf_map *map)
{ {
u32 access_flags = map->map_flags & (BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG); u32 access_flags = map->map_flags & (BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG);
...@@ -2495,6 +2501,7 @@ extern const struct bpf_func_proto bpf_loop_proto; ...@@ -2495,6 +2501,7 @@ extern const struct bpf_func_proto bpf_loop_proto;
extern const struct bpf_func_proto bpf_copy_from_user_task_proto; extern const struct bpf_func_proto bpf_copy_from_user_task_proto;
extern const struct bpf_func_proto bpf_set_retval_proto; extern const struct bpf_func_proto bpf_set_retval_proto;
extern const struct bpf_func_proto bpf_get_retval_proto; extern const struct bpf_func_proto bpf_get_retval_proto;
extern const struct bpf_func_proto bpf_user_ringbuf_drain_proto;
const struct bpf_func_proto *tracing_prog_func_proto( const struct bpf_func_proto *tracing_prog_func_proto(
enum bpf_func_id func_id, const struct bpf_prog *prog); enum bpf_func_id func_id, const struct bpf_prog *prog);
...@@ -2639,7 +2646,7 @@ enum bpf_dynptr_type { ...@@ -2639,7 +2646,7 @@ enum bpf_dynptr_type {
BPF_DYNPTR_TYPE_INVALID, BPF_DYNPTR_TYPE_INVALID,
/* Points to memory that is local to the bpf program */ /* Points to memory that is local to the bpf program */
BPF_DYNPTR_TYPE_LOCAL, BPF_DYNPTR_TYPE_LOCAL,
/* Underlying data is a ringbuf record */ /* Underlying data is a kernel-produced ringbuf record */
BPF_DYNPTR_TYPE_RINGBUF, BPF_DYNPTR_TYPE_RINGBUF,
}; };
......
...@@ -126,6 +126,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops) ...@@ -126,6 +126,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
#endif #endif
BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops) BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops) BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_USER_RINGBUF, user_ringbuf_map_ops)
BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint) BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing) BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
......
...@@ -928,6 +928,7 @@ enum bpf_map_type { ...@@ -928,6 +928,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_INODE_STORAGE, BPF_MAP_TYPE_INODE_STORAGE,
BPF_MAP_TYPE_TASK_STORAGE, BPF_MAP_TYPE_TASK_STORAGE,
BPF_MAP_TYPE_BLOOM_FILTER, BPF_MAP_TYPE_BLOOM_FILTER,
BPF_MAP_TYPE_USER_RINGBUF,
}; };
/* Note that tracing related programs such as /* Note that tracing related programs such as
...@@ -5387,6 +5388,43 @@ union bpf_attr { ...@@ -5387,6 +5388,43 @@ union bpf_attr {
* Return * Return
* Current *ktime*. * Current *ktime*.
* *
* long bpf_user_ringbuf_drain(struct bpf_map *map, void *callback_fn, void *ctx, u64 flags)
* Description
* Drain samples from the specified user ring buffer, and invoke
* the provided callback for each such sample:
*
* long (\*callback_fn)(struct bpf_dynptr \*dynptr, void \*ctx);
*
* If **callback_fn** returns 0, the helper will continue to try
* and drain the next sample, up to a maximum of
* BPF_MAX_USER_RINGBUF_SAMPLES samples. If the return value is 1,
* the helper will skip the rest of the samples and return. Other
* return values are not used now, and will be rejected by the
* verifier.
* Return
* The number of drained samples if no error was encountered while
* draining samples, or 0 if no samples were present in the ring
* buffer. If a user-space producer was epoll-waiting on this map,
* and at least one sample was drained, they will receive an event
* notification notifying them of available space in the ring
* buffer. If the BPF_RB_NO_WAKEUP flag is passed to this
* function, no wakeup notification will be sent. If the
* BPF_RB_FORCE_WAKEUP flag is passed, a wakeup notification will
* be sent even if no sample was drained.
*
* On failure, the returned value is one of the following:
*
* **-EBUSY** if the ring buffer is contended, and another calling
* context was concurrently draining the ring buffer.
*
* **-EINVAL** if user-space is not properly tracking the ring
* buffer due to the producer position not being aligned to 8
* bytes, a sample not being aligned to 8 bytes, or the producer
* position not matching the advertised length of a sample.
*
* **-E2BIG** if user-space has tried to publish a sample which is
* larger than the size of the ring buffer, or which cannot fit
* within a struct bpf_dynptr.
*/ */
#define __BPF_FUNC_MAPPER(FN) \ #define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \ FN(unspec), \
...@@ -5598,6 +5636,7 @@ union bpf_attr { ...@@ -5598,6 +5636,7 @@ union bpf_attr {
FN(tcp_raw_check_syncookie_ipv4), \ FN(tcp_raw_check_syncookie_ipv4), \
FN(tcp_raw_check_syncookie_ipv6), \ FN(tcp_raw_check_syncookie_ipv6), \
FN(ktime_get_tai_ns), \ FN(ktime_get_tai_ns), \
FN(user_ringbuf_drain), \
/* */ /* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper /* integer value in 'imm' field of BPF_CALL instruction selects which helper
......
...@@ -1659,6 +1659,8 @@ bpf_base_func_proto(enum bpf_func_id func_id) ...@@ -1659,6 +1659,8 @@ bpf_base_func_proto(enum bpf_func_id func_id)
return &bpf_for_each_map_elem_proto; return &bpf_for_each_map_elem_proto;
case BPF_FUNC_loop: case BPF_FUNC_loop:
return &bpf_loop_proto; return &bpf_loop_proto;
case BPF_FUNC_user_ringbuf_drain:
return &bpf_user_ringbuf_drain_proto;
default: default:
break; break;
} }
......
This diff is collapsed.
...@@ -563,6 +563,7 @@ static const char *reg_type_str(struct bpf_verifier_env *env, ...@@ -563,6 +563,7 @@ static const char *reg_type_str(struct bpf_verifier_env *env,
[PTR_TO_BUF] = "buf", [PTR_TO_BUF] = "buf",
[PTR_TO_FUNC] = "func", [PTR_TO_FUNC] = "func",
[PTR_TO_MAP_KEY] = "map_key", [PTR_TO_MAP_KEY] = "map_key",
[PTR_TO_DYNPTR] = "dynptr_ptr",
}; };
if (type & PTR_MAYBE_NULL) { if (type & PTR_MAYBE_NULL) {
...@@ -5688,6 +5689,12 @@ static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK } ...@@ -5688,6 +5689,12 @@ static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK }
static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } }; static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } };
static const struct bpf_reg_types timer_types = { .types = { PTR_TO_MAP_VALUE } }; static const struct bpf_reg_types timer_types = { .types = { PTR_TO_MAP_VALUE } };
static const struct bpf_reg_types kptr_types = { .types = { PTR_TO_MAP_VALUE } }; static const struct bpf_reg_types kptr_types = { .types = { PTR_TO_MAP_VALUE } };
static const struct bpf_reg_types dynptr_types = {
.types = {
PTR_TO_STACK,
PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL,
}
};
static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = { static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
[ARG_PTR_TO_MAP_KEY] = &map_key_value_types, [ARG_PTR_TO_MAP_KEY] = &map_key_value_types,
...@@ -5714,7 +5721,7 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = { ...@@ -5714,7 +5721,7 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
[ARG_PTR_TO_CONST_STR] = &const_str_ptr_types, [ARG_PTR_TO_CONST_STR] = &const_str_ptr_types,
[ARG_PTR_TO_TIMER] = &timer_types, [ARG_PTR_TO_TIMER] = &timer_types,
[ARG_PTR_TO_KPTR] = &kptr_types, [ARG_PTR_TO_KPTR] = &kptr_types,
[ARG_PTR_TO_DYNPTR] = &stack_ptr_types, [ARG_PTR_TO_DYNPTR] = &dynptr_types,
}; };
static int check_reg_type(struct bpf_verifier_env *env, u32 regno, static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
...@@ -6066,6 +6073,13 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, ...@@ -6066,6 +6073,13 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
err = check_mem_size_reg(env, reg, regno, true, meta); err = check_mem_size_reg(env, reg, regno, true, meta);
break; break;
case ARG_PTR_TO_DYNPTR: case ARG_PTR_TO_DYNPTR:
/* We only need to check for initialized / uninitialized helper
* dynptr args if the dynptr is not PTR_TO_DYNPTR, as the
* assumption is that if it is, that a helper function
* initialized the dynptr on behalf of the BPF program.
*/
if (base_type(reg->type) == PTR_TO_DYNPTR)
break;
if (arg_type & MEM_UNINIT) { if (arg_type & MEM_UNINIT) {
if (!is_dynptr_reg_valid_uninit(env, reg)) { if (!is_dynptr_reg_valid_uninit(env, reg)) {
verbose(env, "Dynptr has to be an uninitialized dynptr\n"); verbose(env, "Dynptr has to be an uninitialized dynptr\n");
...@@ -6240,6 +6254,10 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env, ...@@ -6240,6 +6254,10 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
func_id != BPF_FUNC_ringbuf_discard_dynptr) func_id != BPF_FUNC_ringbuf_discard_dynptr)
goto error; goto error;
break; break;
case BPF_MAP_TYPE_USER_RINGBUF:
if (func_id != BPF_FUNC_user_ringbuf_drain)
goto error;
break;
case BPF_MAP_TYPE_STACK_TRACE: case BPF_MAP_TYPE_STACK_TRACE:
if (func_id != BPF_FUNC_get_stackid) if (func_id != BPF_FUNC_get_stackid)
goto error; goto error;
...@@ -6359,6 +6377,10 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env, ...@@ -6359,6 +6377,10 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
if (map->map_type != BPF_MAP_TYPE_RINGBUF) if (map->map_type != BPF_MAP_TYPE_RINGBUF)
goto error; goto error;
break; break;
case BPF_FUNC_user_ringbuf_drain:
if (map->map_type != BPF_MAP_TYPE_USER_RINGBUF)
goto error;
break;
case BPF_FUNC_get_stackid: case BPF_FUNC_get_stackid:
if (map->map_type != BPF_MAP_TYPE_STACK_TRACE) if (map->map_type != BPF_MAP_TYPE_STACK_TRACE)
goto error; goto error;
...@@ -6885,6 +6907,29 @@ static int set_find_vma_callback_state(struct bpf_verifier_env *env, ...@@ -6885,6 +6907,29 @@ static int set_find_vma_callback_state(struct bpf_verifier_env *env,
return 0; return 0;
} }
static int set_user_ringbuf_callback_state(struct bpf_verifier_env *env,
struct bpf_func_state *caller,
struct bpf_func_state *callee,
int insn_idx)
{
/* bpf_user_ringbuf_drain(struct bpf_map *map, void *callback_fn, void
* callback_ctx, u64 flags);
* callback_fn(struct bpf_dynptr_t* dynptr, void *callback_ctx);
*/
__mark_reg_not_init(env, &callee->regs[BPF_REG_0]);
callee->regs[BPF_REG_1].type = PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL;
__mark_reg_known_zero(&callee->regs[BPF_REG_1]);
callee->regs[BPF_REG_2] = caller->regs[BPF_REG_3];
/* unused */
__mark_reg_not_init(env, &callee->regs[BPF_REG_3]);
__mark_reg_not_init(env, &callee->regs[BPF_REG_4]);
__mark_reg_not_init(env, &callee->regs[BPF_REG_5]);
callee->in_callback_fn = true;
return 0;
}
static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx) static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
{ {
struct bpf_verifier_state *state = env->cur_state; struct bpf_verifier_state *state = env->cur_state;
...@@ -7344,12 +7389,18 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn ...@@ -7344,12 +7389,18 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
case BPF_FUNC_dynptr_data: case BPF_FUNC_dynptr_data:
for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) { for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
if (arg_type_is_dynptr(fn->arg_type[i])) { if (arg_type_is_dynptr(fn->arg_type[i])) {
struct bpf_reg_state *reg = &regs[BPF_REG_1 + i];
if (meta.ref_obj_id) { if (meta.ref_obj_id) {
verbose(env, "verifier internal error: meta.ref_obj_id already set\n"); verbose(env, "verifier internal error: meta.ref_obj_id already set\n");
return -EFAULT; return -EFAULT;
} }
/* Find the id of the dynptr we're tracking the reference of */
meta.ref_obj_id = stack_slot_get_id(env, &regs[BPF_REG_1 + i]); if (base_type(reg->type) != PTR_TO_DYNPTR)
/* Find the id of the dynptr we're
* tracking the reference of
*/
meta.ref_obj_id = stack_slot_get_id(env, reg);
break; break;
} }
} }
...@@ -7358,6 +7409,10 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn ...@@ -7358,6 +7409,10 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
return -EFAULT; return -EFAULT;
} }
break; break;
case BPF_FUNC_user_ringbuf_drain:
err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
set_user_ringbuf_callback_state);
break;
} }
if (err) if (err)
...@@ -12635,6 +12690,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env, ...@@ -12635,6 +12690,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
case BPF_MAP_TYPE_ARRAY_OF_MAPS: case BPF_MAP_TYPE_ARRAY_OF_MAPS:
case BPF_MAP_TYPE_HASH_OF_MAPS: case BPF_MAP_TYPE_HASH_OF_MAPS:
case BPF_MAP_TYPE_RINGBUF: case BPF_MAP_TYPE_RINGBUF:
case BPF_MAP_TYPE_USER_RINGBUF:
case BPF_MAP_TYPE_INODE_STORAGE: case BPF_MAP_TYPE_INODE_STORAGE:
case BPF_MAP_TYPE_SK_STORAGE: case BPF_MAP_TYPE_SK_STORAGE:
case BPF_MAP_TYPE_TASK_STORAGE: case BPF_MAP_TYPE_TASK_STORAGE:
......
...@@ -55,7 +55,7 @@ MAP COMMANDS ...@@ -55,7 +55,7 @@ MAP COMMANDS
| | **devmap** | **devmap_hash** | **sockmap** | **cpumap** | **xskmap** | **sockhash** | | **devmap** | **devmap_hash** | **sockmap** | **cpumap** | **xskmap** | **sockhash**
| | **cgroup_storage** | **reuseport_sockarray** | **percpu_cgroup_storage** | | **cgroup_storage** | **reuseport_sockarray** | **percpu_cgroup_storage**
| | **queue** | **stack** | **sk_storage** | **struct_ops** | **ringbuf** | **inode_storage** | | **queue** | **stack** | **sk_storage** | **struct_ops** | **ringbuf** | **inode_storage**
| | **task_storage** | **bloom_filter** } | | **task_storage** | **bloom_filter** | **user_ringbuf** }
DESCRIPTION DESCRIPTION
=========== ===========
......
...@@ -1459,7 +1459,7 @@ static int do_help(int argc, char **argv) ...@@ -1459,7 +1459,7 @@ static int do_help(int argc, char **argv)
" devmap | devmap_hash | sockmap | cpumap | xskmap | sockhash |\n" " devmap | devmap_hash | sockmap | cpumap | xskmap | sockhash |\n"
" cgroup_storage | reuseport_sockarray | percpu_cgroup_storage |\n" " cgroup_storage | reuseport_sockarray | percpu_cgroup_storage |\n"
" queue | stack | sk_storage | struct_ops | ringbuf | inode_storage |\n" " queue | stack | sk_storage | struct_ops | ringbuf | inode_storage |\n"
" task_storage | bloom_filter }\n" " task_storage | bloom_filter | user_ringbuf }\n"
" " HELP_SPEC_OPTIONS " |\n" " " HELP_SPEC_OPTIONS " |\n"
" {-f|--bpffs} | {-n|--nomount} }\n" " {-f|--bpffs} | {-n|--nomount} }\n"
"", "",
......
...@@ -928,6 +928,7 @@ enum bpf_map_type { ...@@ -928,6 +928,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_INODE_STORAGE, BPF_MAP_TYPE_INODE_STORAGE,
BPF_MAP_TYPE_TASK_STORAGE, BPF_MAP_TYPE_TASK_STORAGE,
BPF_MAP_TYPE_BLOOM_FILTER, BPF_MAP_TYPE_BLOOM_FILTER,
BPF_MAP_TYPE_USER_RINGBUF,
}; };
/* Note that tracing related programs such as /* Note that tracing related programs such as
...@@ -5387,6 +5388,43 @@ union bpf_attr { ...@@ -5387,6 +5388,43 @@ union bpf_attr {
* Return * Return
* Current *ktime*. * Current *ktime*.
* *
* long bpf_user_ringbuf_drain(struct bpf_map *map, void *callback_fn, void *ctx, u64 flags)
* Description
* Drain samples from the specified user ring buffer, and invoke
* the provided callback for each such sample:
*
* long (\*callback_fn)(struct bpf_dynptr \*dynptr, void \*ctx);
*
* If **callback_fn** returns 0, the helper will continue to try
* and drain the next sample, up to a maximum of
* BPF_MAX_USER_RINGBUF_SAMPLES samples. If the return value is 1,
* the helper will skip the rest of the samples and return. Other
* return values are not used now, and will be rejected by the
* verifier.
* Return
* The number of drained samples if no error was encountered while
* draining samples, or 0 if no samples were present in the ring
* buffer. If a user-space producer was epoll-waiting on this map,
* and at least one sample was drained, they will receive an event
* notification notifying them of available space in the ring
* buffer. If the BPF_RB_NO_WAKEUP flag is passed to this
* function, no wakeup notification will be sent. If the
* BPF_RB_FORCE_WAKEUP flag is passed, a wakeup notification will
* be sent even if no sample was drained.
*
* On failure, the returned value is one of the following:
*
* **-EBUSY** if the ring buffer is contended, and another calling
* context was concurrently draining the ring buffer.
*
* **-EINVAL** if user-space is not properly tracking the ring
* buffer due to the producer position not being aligned to 8
* bytes, a sample not being aligned to 8 bytes, or the producer
* position not matching the advertised length of a sample.
*
* **-E2BIG** if user-space has tried to publish a sample which is
* larger than the size of the ring buffer, or which cannot fit
* within a struct bpf_dynptr.
*/ */
#define __BPF_FUNC_MAPPER(FN) \ #define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \ FN(unspec), \
...@@ -5598,6 +5636,7 @@ union bpf_attr { ...@@ -5598,6 +5636,7 @@ union bpf_attr {
FN(tcp_raw_check_syncookie_ipv4), \ FN(tcp_raw_check_syncookie_ipv4), \
FN(tcp_raw_check_syncookie_ipv6), \ FN(tcp_raw_check_syncookie_ipv6), \
FN(ktime_get_tai_ns), \ FN(ktime_get_tai_ns), \
FN(user_ringbuf_drain), \
/* */ /* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper /* integer value in 'imm' field of BPF_CALL instruction selects which helper
......
...@@ -163,6 +163,7 @@ static const char * const map_type_name[] = { ...@@ -163,6 +163,7 @@ static const char * const map_type_name[] = {
[BPF_MAP_TYPE_INODE_STORAGE] = "inode_storage", [BPF_MAP_TYPE_INODE_STORAGE] = "inode_storage",
[BPF_MAP_TYPE_TASK_STORAGE] = "task_storage", [BPF_MAP_TYPE_TASK_STORAGE] = "task_storage",
[BPF_MAP_TYPE_BLOOM_FILTER] = "bloom_filter", [BPF_MAP_TYPE_BLOOM_FILTER] = "bloom_filter",
[BPF_MAP_TYPE_USER_RINGBUF] = "user_ringbuf",
}; };
static const char * const prog_type_name[] = { static const char * const prog_type_name[] = {
...@@ -2372,6 +2373,12 @@ static size_t adjust_ringbuf_sz(size_t sz) ...@@ -2372,6 +2373,12 @@ static size_t adjust_ringbuf_sz(size_t sz)
return sz; return sz;
} }
static bool map_is_ringbuf(const struct bpf_map *map)
{
return map->def.type == BPF_MAP_TYPE_RINGBUF ||
map->def.type == BPF_MAP_TYPE_USER_RINGBUF;
}
static void fill_map_from_def(struct bpf_map *map, const struct btf_map_def *def) static void fill_map_from_def(struct bpf_map *map, const struct btf_map_def *def)
{ {
map->def.type = def->map_type; map->def.type = def->map_type;
...@@ -2386,7 +2393,7 @@ static void fill_map_from_def(struct bpf_map *map, const struct btf_map_def *def ...@@ -2386,7 +2393,7 @@ static void fill_map_from_def(struct bpf_map *map, const struct btf_map_def *def
map->btf_value_type_id = def->value_type_id; map->btf_value_type_id = def->value_type_id;
/* auto-adjust BPF ringbuf map max_entries to be a multiple of page size */ /* auto-adjust BPF ringbuf map max_entries to be a multiple of page size */
if (map->def.type == BPF_MAP_TYPE_RINGBUF) if (map_is_ringbuf(map))
map->def.max_entries = adjust_ringbuf_sz(map->def.max_entries); map->def.max_entries = adjust_ringbuf_sz(map->def.max_entries);
if (def->parts & MAP_DEF_MAP_TYPE) if (def->parts & MAP_DEF_MAP_TYPE)
...@@ -4369,7 +4376,7 @@ int bpf_map__set_max_entries(struct bpf_map *map, __u32 max_entries) ...@@ -4369,7 +4376,7 @@ int bpf_map__set_max_entries(struct bpf_map *map, __u32 max_entries)
map->def.max_entries = max_entries; map->def.max_entries = max_entries;
/* auto-adjust BPF ringbuf map max_entries to be a multiple of page size */ /* auto-adjust BPF ringbuf map max_entries to be a multiple of page size */
if (map->def.type == BPF_MAP_TYPE_RINGBUF) if (map_is_ringbuf(map))
map->def.max_entries = adjust_ringbuf_sz(map->def.max_entries); map->def.max_entries = adjust_ringbuf_sz(map->def.max_entries);
return 0; return 0;
......
...@@ -1011,6 +1011,7 @@ LIBBPF_API int bpf_tc_query(const struct bpf_tc_hook *hook, ...@@ -1011,6 +1011,7 @@ LIBBPF_API int bpf_tc_query(const struct bpf_tc_hook *hook,
/* Ring buffer APIs */ /* Ring buffer APIs */
struct ring_buffer; struct ring_buffer;
struct user_ring_buffer;
typedef int (*ring_buffer_sample_fn)(void *ctx, void *data, size_t size); typedef int (*ring_buffer_sample_fn)(void *ctx, void *data, size_t size);
...@@ -1030,6 +1031,112 @@ LIBBPF_API int ring_buffer__poll(struct ring_buffer *rb, int timeout_ms); ...@@ -1030,6 +1031,112 @@ LIBBPF_API int ring_buffer__poll(struct ring_buffer *rb, int timeout_ms);
LIBBPF_API int ring_buffer__consume(struct ring_buffer *rb); LIBBPF_API int ring_buffer__consume(struct ring_buffer *rb);
LIBBPF_API int ring_buffer__epoll_fd(const struct ring_buffer *rb); LIBBPF_API int ring_buffer__epoll_fd(const struct ring_buffer *rb);
struct user_ring_buffer_opts {
size_t sz; /* size of this struct, for forward/backward compatibility */
};
#define user_ring_buffer_opts__last_field sz
/* @brief **user_ring_buffer__new()** creates a new instance of a user ring
* buffer.
*
* @param map_fd A file descriptor to a BPF_MAP_TYPE_USER_RINGBUF map.
* @param opts Options for how the ring buffer should be created.
* @return A user ring buffer on success; NULL and errno being set on a
* failure.
*/
LIBBPF_API struct user_ring_buffer *
user_ring_buffer__new(int map_fd, const struct user_ring_buffer_opts *opts);
/* @brief **user_ring_buffer__reserve()** reserves a pointer to a sample in the
* user ring buffer.
* @param rb A pointer to a user ring buffer.
* @param size The size of the sample, in bytes.
* @return A pointer to an 8-byte aligned reserved region of the user ring
* buffer; NULL, and errno being set if a sample could not be reserved.
*
* This function is *not* thread safe, and callers must synchronize accessing
* this function if there are multiple producers. If a size is requested that
* is larger than the size of the entire ring buffer, errno will be set to
* E2BIG and NULL is returned. If the ring buffer could accommodate the size,
* but currently does not have enough space, errno is set to ENOSPC and NULL is
* returned.
*
* After initializing the sample, callers must invoke
* **user_ring_buffer__submit()** to post the sample to the kernel. Otherwise,
* the sample must be freed with **user_ring_buffer__discard()**.
*/
LIBBPF_API void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size);
/* @brief **user_ring_buffer__reserve_blocking()** reserves a record in the
* ring buffer, possibly blocking for up to @timeout_ms until a sample becomes
* available.
* @param rb The user ring buffer.
* @param size The size of the sample, in bytes.
* @param timeout_ms The amount of time, in milliseconds, for which the caller
* should block when waiting for a sample. -1 causes the caller to block
* indefinitely.
* @return A pointer to an 8-byte aligned reserved region of the user ring
* buffer; NULL, and errno being set if a sample could not be reserved.
*
* This function is *not* thread safe, and callers must synchronize
* accessing this function if there are multiple producers
*
* If **timeout_ms** is -1, the function will block indefinitely until a sample
* becomes available. Otherwise, **timeout_ms** must be non-negative, or errno
* is set to EINVAL, and NULL is returned. If **timeout_ms** is 0, no blocking
* will occur and the function will return immediately after attempting to
* reserve a sample.
*
* If **size** is larger than the size of the entire ring buffer, errno is set
* to E2BIG and NULL is returned. If the ring buffer could accommodate
* **size**, but currently does not have enough space, the caller will block
* until at most **timeout_ms** has elapsed. If insufficient space is available
* at that time, errno is set to ENOSPC, and NULL is returned.
*
* The kernel guarantees that it will wake up this thread to check if
* sufficient space is available in the ring buffer at least once per
* invocation of the **bpf_ringbuf_drain()** helper function, provided that at
* least one sample is consumed, and the BPF program did not invoke the
* function with BPF_RB_NO_WAKEUP. A wakeup may occur sooner than that, but the
* kernel does not guarantee this. If the helper function is invoked with
* BPF_RB_FORCE_WAKEUP, a wakeup event will be sent even if no sample is
* consumed.
*
* When a sample of size **size** is found within **timeout_ms**, a pointer to
* the sample is returned. After initializing the sample, callers must invoke
* **user_ring_buffer__submit()** to post the sample to the ring buffer.
* Otherwise, the sample must be freed with **user_ring_buffer__discard()**.
*/
LIBBPF_API void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb,
__u32 size,
int timeout_ms);
/* @brief **user_ring_buffer__submit()** submits a previously reserved sample
* into the ring buffer.
* @param rb The user ring buffer.
* @param sample A reserved sample.
*
* It is not necessary to synchronize amongst multiple producers when invoking
* this function.
*/
LIBBPF_API void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample);
/* @brief **user_ring_buffer__discard()** discards a previously reserved sample.
* @param rb The user ring buffer.
* @param sample A reserved sample.
*
* It is not necessary to synchronize amongst multiple producers when invoking
* this function.
*/
LIBBPF_API void user_ring_buffer__discard(struct user_ring_buffer *rb, void *sample);
/* @brief **user_ring_buffer__free()** frees a ring buffer that was previously
* created with **user_ring_buffer__new()**.
* @param rb The user ring buffer being freed.
*/
LIBBPF_API void user_ring_buffer__free(struct user_ring_buffer *rb);
/* Perf buffer APIs */ /* Perf buffer APIs */
struct perf_buffer; struct perf_buffer;
......
...@@ -368,3 +368,13 @@ LIBBPF_1.0.0 { ...@@ -368,3 +368,13 @@ LIBBPF_1.0.0 {
libbpf_bpf_prog_type_str; libbpf_bpf_prog_type_str;
perf_buffer__buffer; perf_buffer__buffer;
}; };
LIBBPF_1.1.0 {
global:
user_ring_buffer__discard;
user_ring_buffer__free;
user_ring_buffer__new;
user_ring_buffer__reserve;
user_ring_buffer__reserve_blocking;
user_ring_buffer__submit;
} LIBBPF_1.0.0;
...@@ -231,6 +231,7 @@ static int probe_map_create(enum bpf_map_type map_type) ...@@ -231,6 +231,7 @@ static int probe_map_create(enum bpf_map_type map_type)
return btf_fd; return btf_fd;
break; break;
case BPF_MAP_TYPE_RINGBUF: case BPF_MAP_TYPE_RINGBUF:
case BPF_MAP_TYPE_USER_RINGBUF:
key_size = 0; key_size = 0;
value_size = 0; value_size = 0;
max_entries = 4096; max_entries = 4096;
......
...@@ -4,6 +4,6 @@ ...@@ -4,6 +4,6 @@
#define __LIBBPF_VERSION_H #define __LIBBPF_VERSION_H
#define LIBBPF_MAJOR_VERSION 1 #define LIBBPF_MAJOR_VERSION 1
#define LIBBPF_MINOR_VERSION 0 #define LIBBPF_MINOR_VERSION 1
#endif /* __LIBBPF_VERSION_H */ #endif /* __LIBBPF_VERSION_H */
...@@ -16,6 +16,7 @@ ...@@ -16,6 +16,7 @@
#include <asm/barrier.h> #include <asm/barrier.h>
#include <sys/mman.h> #include <sys/mman.h>
#include <sys/epoll.h> #include <sys/epoll.h>
#include <time.h>
#include "libbpf.h" #include "libbpf.h"
#include "libbpf_internal.h" #include "libbpf_internal.h"
...@@ -39,6 +40,23 @@ struct ring_buffer { ...@@ -39,6 +40,23 @@ struct ring_buffer {
int ring_cnt; int ring_cnt;
}; };
struct user_ring_buffer {
struct epoll_event event;
unsigned long *consumer_pos;
unsigned long *producer_pos;
void *data;
unsigned long mask;
size_t page_size;
int map_fd;
int epoll_fd;
};
/* 8-byte ring buffer header structure */
struct ringbuf_hdr {
__u32 len;
__u32 pad;
};
static void ringbuf_unmap_ring(struct ring_buffer *rb, struct ring *r) static void ringbuf_unmap_ring(struct ring_buffer *rb, struct ring *r)
{ {
if (r->consumer_pos) { if (r->consumer_pos) {
...@@ -300,3 +318,256 @@ int ring_buffer__epoll_fd(const struct ring_buffer *rb) ...@@ -300,3 +318,256 @@ int ring_buffer__epoll_fd(const struct ring_buffer *rb)
{ {
return rb->epoll_fd; return rb->epoll_fd;
} }
static void user_ringbuf_unmap_ring(struct user_ring_buffer *rb)
{
if (rb->consumer_pos) {
munmap(rb->consumer_pos, rb->page_size);
rb->consumer_pos = NULL;
}
if (rb->producer_pos) {
munmap(rb->producer_pos, rb->page_size + 2 * (rb->mask + 1));
rb->producer_pos = NULL;
}
}
void user_ring_buffer__free(struct user_ring_buffer *rb)
{
if (!rb)
return;
user_ringbuf_unmap_ring(rb);
if (rb->epoll_fd >= 0)
close(rb->epoll_fd);
free(rb);
}
static int user_ringbuf_map(struct user_ring_buffer *rb, int map_fd)
{
struct bpf_map_info info;
__u32 len = sizeof(info);
void *tmp;
struct epoll_event *rb_epoll;
int err;
memset(&info, 0, sizeof(info));
err = bpf_obj_get_info_by_fd(map_fd, &info, &len);
if (err) {
err = -errno;
pr_warn("user ringbuf: failed to get map info for fd=%d: %d\n", map_fd, err);
return err;
}
if (info.type != BPF_MAP_TYPE_USER_RINGBUF) {
pr_warn("user ringbuf: map fd=%d is not BPF_MAP_TYPE_USER_RINGBUF\n", map_fd);
return -EINVAL;
}
rb->map_fd = map_fd;
rb->mask = info.max_entries - 1;
/* Map read-only consumer page */
tmp = mmap(NULL, rb->page_size, PROT_READ, MAP_SHARED, map_fd, 0);
if (tmp == MAP_FAILED) {
err = -errno;
pr_warn("user ringbuf: failed to mmap consumer page for map fd=%d: %d\n",
map_fd, err);
return err;
}
rb->consumer_pos = tmp;
/* Map read-write the producer page and data pages. We map the data
* region as twice the total size of the ring buffer to allow the
* simple reading and writing of samples that wrap around the end of
* the buffer. See the kernel implementation for details.
*/
tmp = mmap(NULL, rb->page_size + 2 * info.max_entries,
PROT_READ | PROT_WRITE, MAP_SHARED, map_fd, rb->page_size);
if (tmp == MAP_FAILED) {
err = -errno;
pr_warn("user ringbuf: failed to mmap data pages for map fd=%d: %d\n",
map_fd, err);
return err;
}
rb->producer_pos = tmp;
rb->data = tmp + rb->page_size;
rb_epoll = &rb->event;
rb_epoll->events = EPOLLOUT;
if (epoll_ctl(rb->epoll_fd, EPOLL_CTL_ADD, map_fd, rb_epoll) < 0) {
err = -errno;
pr_warn("user ringbuf: failed to epoll add map fd=%d: %d\n", map_fd, err);
return err;
}
return 0;
}
struct user_ring_buffer *
user_ring_buffer__new(int map_fd, const struct user_ring_buffer_opts *opts)
{
struct user_ring_buffer *rb;
int err;
if (!OPTS_VALID(opts, user_ring_buffer_opts))
return errno = EINVAL, NULL;
rb = calloc(1, sizeof(*rb));
if (!rb)
return errno = ENOMEM, NULL;
rb->page_size = getpagesize();
rb->epoll_fd = epoll_create1(EPOLL_CLOEXEC);
if (rb->epoll_fd < 0) {
err = -errno;
pr_warn("user ringbuf: failed to create epoll instance: %d\n", err);
goto err_out;
}
err = user_ringbuf_map(rb, map_fd);
if (err)
goto err_out;
return rb;
err_out:
user_ring_buffer__free(rb);
return errno = -err, NULL;
}
static void user_ringbuf_commit(struct user_ring_buffer *rb, void *sample, bool discard)
{
__u32 new_len;
struct ringbuf_hdr *hdr;
uintptr_t hdr_offset;
hdr_offset = rb->mask + 1 + (sample - rb->data) - BPF_RINGBUF_HDR_SZ;
hdr = rb->data + (hdr_offset & rb->mask);
new_len = hdr->len & ~BPF_RINGBUF_BUSY_BIT;
if (discard)
new_len |= BPF_RINGBUF_DISCARD_BIT;
/* Synchronizes with smp_load_acquire() in __bpf_user_ringbuf_peek() in
* the kernel.
*/
__atomic_exchange_n(&hdr->len, new_len, __ATOMIC_ACQ_REL);
}
void user_ring_buffer__discard(struct user_ring_buffer *rb, void *sample)
{
user_ringbuf_commit(rb, sample, true);
}
void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample)
{
user_ringbuf_commit(rb, sample, false);
}
void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size)
{
__u32 avail_size, total_size, max_size;
/* 64-bit to avoid overflow in case of extreme application behavior */
__u64 cons_pos, prod_pos;
struct ringbuf_hdr *hdr;
/* Synchronizes with smp_store_release() in __bpf_user_ringbuf_peek() in
* the kernel.
*/
cons_pos = smp_load_acquire(rb->consumer_pos);
/* Synchronizes with smp_store_release() in user_ringbuf_commit() */
prod_pos = smp_load_acquire(rb->producer_pos);
max_size = rb->mask + 1;
avail_size = max_size - (prod_pos - cons_pos);
/* Round up total size to a multiple of 8. */
total_size = (size + BPF_RINGBUF_HDR_SZ + 7) / 8 * 8;
if (total_size > max_size)
return errno = E2BIG, NULL;
if (avail_size < total_size)
return errno = ENOSPC, NULL;
hdr = rb->data + (prod_pos & rb->mask);
hdr->len = size | BPF_RINGBUF_BUSY_BIT;
hdr->pad = 0;
/* Synchronizes with smp_load_acquire() in __bpf_user_ringbuf_peek() in
* the kernel.
*/
smp_store_release(rb->producer_pos, prod_pos + total_size);
return (void *)rb->data + ((prod_pos + BPF_RINGBUF_HDR_SZ) & rb->mask);
}
static __u64 ns_elapsed_timespec(const struct timespec *start, const struct timespec *end)
{
__u64 start_ns, end_ns, ns_per_s = 1000000000;
start_ns = (__u64)start->tv_sec * ns_per_s + start->tv_nsec;
end_ns = (__u64)end->tv_sec * ns_per_s + end->tv_nsec;
return end_ns - start_ns;
}
void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb, __u32 size, int timeout_ms)
{
void *sample;
int err, ms_remaining = timeout_ms;
struct timespec start;
if (timeout_ms < 0 && timeout_ms != -1)
return errno = EINVAL, NULL;
if (timeout_ms != -1) {
err = clock_gettime(CLOCK_MONOTONIC, &start);
if (err)
return NULL;
}
do {
int cnt, ms_elapsed;
struct timespec curr;
__u64 ns_per_ms = 1000000;
sample = user_ring_buffer__reserve(rb, size);
if (sample)
return sample;
else if (errno != ENOSPC)
return NULL;
/* The kernel guarantees at least one event notification
* delivery whenever at least one sample is drained from the
* ring buffer in an invocation to bpf_ringbuf_drain(). Other
* additional events may be delivered at any time, but only one
* event is guaranteed per bpf_ringbuf_drain() invocation,
* provided that a sample is drained, and the BPF program did
* not pass BPF_RB_NO_WAKEUP to bpf_ringbuf_drain(). If
* BPF_RB_FORCE_WAKEUP is passed to bpf_ringbuf_drain(), a
* wakeup event will be delivered even if no samples are
* drained.
*/
cnt = epoll_wait(rb->epoll_fd, &rb->event, 1, ms_remaining);
if (cnt < 0)
return NULL;
if (timeout_ms == -1)
continue;
err = clock_gettime(CLOCK_MONOTONIC, &curr);
if (err)
return NULL;
ms_elapsed = ns_elapsed_timespec(&start, &curr) / ns_per_ms;
ms_remaining = timeout_ms - ms_elapsed;
} while (ms_remaining > 0);
/* Try one more time to reserve a sample after the specified timeout has elapsed. */
return user_ring_buffer__reserve(rb, size);
}
...@@ -71,3 +71,4 @@ cb_refs # expected error message unexpected err ...@@ -71,3 +71,4 @@ cb_refs # expected error message unexpected err
cgroup_hierarchical_stats # JIT does not support calling kernel function (kfunc) cgroup_hierarchical_stats # JIT does not support calling kernel function (kfunc)
htab_update # failed to attach: ERROR: strerror_r(-524)=22 (trampoline) htab_update # failed to attach: ERROR: strerror_r(-524)=22 (trampoline)
tracing_struct # failed to auto-attach: -524 (trampoline) tracing_struct # failed to auto-attach: -524 (trampoline)
user_ringbuf # failed to find kernel BTF type ID of '__s390x_sys_prctl': -3 (?)
This diff is collapsed.
/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
#ifndef _TEST_USER_RINGBUF_H
#define _TEST_USER_RINGBUF_H
#define TEST_OP_64 4
#define TEST_OP_32 2
enum test_msg_op {
TEST_MSG_OP_INC64,
TEST_MSG_OP_INC32,
TEST_MSG_OP_MUL64,
TEST_MSG_OP_MUL32,
// Must come last.
TEST_MSG_OP_NUM_OPS,
};
struct test_msg {
enum test_msg_op msg_op;
union {
__s64 operand_64;
__s32 operand_32;
};
};
struct sample {
int pid;
int seq;
long value;
char comm[16];
};
#endif /* _TEST_USER_RINGBUF_H */
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include "bpf_misc.h"
char _license[] SEC("license") = "GPL";
struct sample {
int pid;
int seq;
long value;
char comm[16];
};
struct {
__uint(type, BPF_MAP_TYPE_USER_RINGBUF);
} user_ringbuf SEC(".maps");
static long
bad_access1(struct bpf_dynptr *dynptr, void *context)
{
const struct sample *sample;
sample = bpf_dynptr_data(dynptr - 1, 0, sizeof(*sample));
bpf_printk("Was able to pass bad pointer %lx\n", (__u64)dynptr - 1);
return 0;
}
/* A callback that accesses a dynptr in a bpf_user_ringbuf_drain callback should
* not be able to read before the pointer.
*/
SEC("?raw_tp/sys_nanosleep")
int user_ringbuf_callback_bad_access1(void *ctx)
{
bpf_user_ringbuf_drain(&user_ringbuf, bad_access1, NULL, 0);
return 0;
}
static long
bad_access2(struct bpf_dynptr *dynptr, void *context)
{
const struct sample *sample;
sample = bpf_dynptr_data(dynptr + 1, 0, sizeof(*sample));
bpf_printk("Was able to pass bad pointer %lx\n", (__u64)dynptr + 1);
return 0;
}
/* A callback that accesses a dynptr in a bpf_user_ringbuf_drain callback should
* not be able to read past the end of the pointer.
*/
SEC("?raw_tp/sys_nanosleep")
int user_ringbuf_callback_bad_access2(void *ctx)
{
bpf_user_ringbuf_drain(&user_ringbuf, bad_access2, NULL, 0);
return 0;
}
static long
write_forbidden(struct bpf_dynptr *dynptr, void *context)
{
*((long *)dynptr) = 0;
return 0;
}
/* A callback that accesses a dynptr in a bpf_user_ringbuf_drain callback should
* not be able to write to that pointer.
*/
SEC("?raw_tp/sys_nanosleep")
int user_ringbuf_callback_write_forbidden(void *ctx)
{
bpf_user_ringbuf_drain(&user_ringbuf, write_forbidden, NULL, 0);
return 0;
}
static long
null_context_write(struct bpf_dynptr *dynptr, void *context)
{
*((__u64 *)context) = 0;
return 0;
}
/* A callback that accesses a dynptr in a bpf_user_ringbuf_drain callback should
* not be able to write to that pointer.
*/
SEC("?raw_tp/sys_nanosleep")
int user_ringbuf_callback_null_context_write(void *ctx)
{
bpf_user_ringbuf_drain(&user_ringbuf, null_context_write, NULL, 0);
return 0;
}
static long
null_context_read(struct bpf_dynptr *dynptr, void *context)
{
__u64 id = *((__u64 *)context);
bpf_printk("Read id %lu\n", id);
return 0;
}
/* A callback that accesses a dynptr in a bpf_user_ringbuf_drain callback should
* not be able to write to that pointer.
*/
SEC("?raw_tp/sys_nanosleep")
int user_ringbuf_callback_null_context_read(void *ctx)
{
bpf_user_ringbuf_drain(&user_ringbuf, null_context_read, NULL, 0);
return 0;
}
static long
try_discard_dynptr(struct bpf_dynptr *dynptr, void *context)
{
bpf_ringbuf_discard_dynptr(dynptr, 0);
return 0;
}
/* A callback that accesses a dynptr in a bpf_user_ringbuf_drain callback should
* not be able to read past the end of the pointer.
*/
SEC("?raw_tp/sys_nanosleep")
int user_ringbuf_callback_discard_dynptr(void *ctx)
{
bpf_user_ringbuf_drain(&user_ringbuf, try_discard_dynptr, NULL, 0);
return 0;
}
static long
try_submit_dynptr(struct bpf_dynptr *dynptr, void *context)
{
bpf_ringbuf_submit_dynptr(dynptr, 0);
return 0;
}
/* A callback that accesses a dynptr in a bpf_user_ringbuf_drain callback should
* not be able to read past the end of the pointer.
*/
SEC("?raw_tp/sys_nanosleep")
int user_ringbuf_callback_submit_dynptr(void *ctx)
{
bpf_user_ringbuf_drain(&user_ringbuf, try_submit_dynptr, NULL, 0);
return 0;
}
static long
invalid_drain_callback_return(struct bpf_dynptr *dynptr, void *context)
{
return 2;
}
/* A callback that accesses a dynptr in a bpf_user_ringbuf_drain callback should
* not be able to write to that pointer.
*/
SEC("?raw_tp/sys_nanosleep")
int user_ringbuf_callback_invalid_return(void *ctx)
{
bpf_user_ringbuf_drain(&user_ringbuf, invalid_drain_callback_return, NULL, 0);
return 0;
}
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include "bpf_misc.h"
#include "test_user_ringbuf.h"
char _license[] SEC("license") = "GPL";
struct {
__uint(type, BPF_MAP_TYPE_USER_RINGBUF);
} user_ringbuf SEC(".maps");
struct {
__uint(type, BPF_MAP_TYPE_RINGBUF);
} kernel_ringbuf SEC(".maps");
/* inputs */
int pid, err, val;
int read = 0;
/* Counter used for end-to-end protocol test */
__u64 kern_mutated = 0;
__u64 user_mutated = 0;
__u64 expected_user_mutated = 0;
static int
is_test_process(void)
{
int cur_pid = bpf_get_current_pid_tgid() >> 32;
return cur_pid == pid;
}
static long
record_sample(struct bpf_dynptr *dynptr, void *context)
{
const struct sample *sample = NULL;
struct sample stack_sample;
int status;
static int num_calls;
if (num_calls++ % 2 == 0) {
status = bpf_dynptr_read(&stack_sample, sizeof(stack_sample), dynptr, 0, 0);
if (status) {
bpf_printk("bpf_dynptr_read() failed: %d\n", status);
err = 1;
return 0;
}
} else {
sample = bpf_dynptr_data(dynptr, 0, sizeof(*sample));
if (!sample) {
bpf_printk("Unexpectedly failed to get sample\n");
err = 2;
return 0;
}
stack_sample = *sample;
}
__sync_fetch_and_add(&read, 1);
return 0;
}
static void
handle_sample_msg(const struct test_msg *msg)
{
switch (msg->msg_op) {
case TEST_MSG_OP_INC64:
kern_mutated += msg->operand_64;
break;
case TEST_MSG_OP_INC32:
kern_mutated += msg->operand_32;
break;
case TEST_MSG_OP_MUL64:
kern_mutated *= msg->operand_64;
break;
case TEST_MSG_OP_MUL32:
kern_mutated *= msg->operand_32;
break;
default:
bpf_printk("Unrecognized op %d\n", msg->msg_op);
err = 2;
}
}
static long
read_protocol_msg(struct bpf_dynptr *dynptr, void *context)
{
const struct test_msg *msg = NULL;
msg = bpf_dynptr_data(dynptr, 0, sizeof(*msg));
if (!msg) {
err = 1;
bpf_printk("Unexpectedly failed to get msg\n");
return 0;
}
handle_sample_msg(msg);
return 0;
}
static int publish_next_kern_msg(__u32 index, void *context)
{
struct test_msg *msg = NULL;
int operand_64 = TEST_OP_64;
int operand_32 = TEST_OP_32;
msg = bpf_ringbuf_reserve(&kernel_ringbuf, sizeof(*msg), 0);
if (!msg) {
err = 4;
return 1;
}
switch (index % TEST_MSG_OP_NUM_OPS) {
case TEST_MSG_OP_INC64:
msg->operand_64 = operand_64;
msg->msg_op = TEST_MSG_OP_INC64;
expected_user_mutated += operand_64;
break;
case TEST_MSG_OP_INC32:
msg->operand_32 = operand_32;
msg->msg_op = TEST_MSG_OP_INC32;
expected_user_mutated += operand_32;
break;
case TEST_MSG_OP_MUL64:
msg->operand_64 = operand_64;
msg->msg_op = TEST_MSG_OP_MUL64;
expected_user_mutated *= operand_64;
break;
case TEST_MSG_OP_MUL32:
msg->operand_32 = operand_32;
msg->msg_op = TEST_MSG_OP_MUL32;
expected_user_mutated *= operand_32;
break;
default:
bpf_ringbuf_discard(msg, 0);
err = 5;
return 1;
}
bpf_ringbuf_submit(msg, 0);
return 0;
}
static void
publish_kern_messages(void)
{
if (expected_user_mutated != user_mutated) {
bpf_printk("%lu != %lu\n", expected_user_mutated, user_mutated);
err = 3;
return;
}
bpf_loop(8, publish_next_kern_msg, NULL, 0);
}
SEC("fentry/" SYS_PREFIX "sys_prctl")
int test_user_ringbuf_protocol(void *ctx)
{
long status = 0;
struct sample *sample = NULL;
struct bpf_dynptr ptr;
if (!is_test_process())
return 0;
status = bpf_user_ringbuf_drain(&user_ringbuf, read_protocol_msg, NULL, 0);
if (status < 0) {
bpf_printk("Drain returned: %ld\n", status);
err = 1;
return 0;
}
publish_kern_messages();
return 0;
}
SEC("fentry/" SYS_PREFIX "sys_getpgid")
int test_user_ringbuf(void *ctx)
{
int status = 0;
struct sample *sample = NULL;
struct bpf_dynptr ptr;
if (!is_test_process())
return 0;
err = bpf_user_ringbuf_drain(&user_ringbuf, record_sample, NULL, 0);
return 0;
}
static long
do_nothing_cb(struct bpf_dynptr *dynptr, void *context)
{
__sync_fetch_and_add(&read, 1);
return 0;
}
SEC("fentry/" SYS_PREFIX "sys_getrlimit")
int test_user_ringbuf_epoll(void *ctx)
{
long num_samples;
if (!is_test_process())
return 0;
num_samples = bpf_user_ringbuf_drain(&user_ringbuf, do_nothing_cb, NULL, 0);
if (num_samples <= 0)
err = 1;
return 0;
}