Commit 0bc23a1d authored by Daniel Borkmann

Merge branch 'bpf-umd-debug'

Alexei Starovoitov says:

====================
This patch set is the first real user of the user mode driver facility. The
general use case for a user mode driver is to ship vmlinux with preloaded BPF
programs. In this particular case the user mode driver populates a bpffs
instance with two BPF iterators. In several months the BPF_LSM project will
need to preload the kernel with its own set of BPF programs and attach them to
LSM hooks instead of bpffs. BPF iterators and BPF_LSM are unstable from a uapi
perspective. They are tracing based and peek into arbitrary kernel data
structures. One could ask why a kernel module cannot embed BPF programs
directly. The reason is that libbpf is necessary to load them. First libbpf
loads the BPF Type Format, then creates BPF maps and populates them. Then it
relocates code sections inside the BPF programs, loads them, and finally
attaches them to events. Theoretically libbpf could be rewritten to work in
the kernel, but that is a massive undertaking, and maintaining an in-kernel
libbpf alongside the user space libbpf would be another challenge. Another
obstacle to embedding BPF programs in a kernel module is the sys_bpf API.
Loading of programs, BTF, and maps goes through the verifier, which validates
and optimizes the code. It is possible to provide an in-kernel API for all
sys_bpf commands (load progs, create maps, update maps, load BTF, etc), but
that is a huge amount of work and a permanent maintenance headache.
Hence the decision is to ship vmlinux with user mode drivers that load
BPF programs. Just like kernel modules extend vmlinux, BPF programs
are safe extensions of the kernel, and some of them need to ship with vmlinux.

This patch set adds a kernel module with user mode driver that populates bpffs
with two BPF iterators.

$ mount bpffs /my/bpffs/ -t bpf
$ ls -la /my/bpffs/
total 4
drwxrwxrwt  2 root root    0 Jul  2 00:27 .
drwxr-xr-x 19 root root 4096 Jul  2 00:09 ..
-rw-------  1 root root    0 Jul  2 00:27 maps.debug
-rw-------  1 root root    0 Jul  2 00:27 progs.debug

The user mode driver will load the BPF Type Format, create BPF maps, populate
them, load two BPF programs, attach them to BPF iterators, and finally send the
two bpf_link IDs back to the kernel.
The kernel will pin the two bpf_links into the newly mounted bpffs instance
under the names "progs.debug" and "maps.debug". These two files become human
readable.

$ cat /my/bpffs/progs.debug
  id name            attached
  11 dump_bpf_map    bpf_iter_bpf_map
  12 dump_bpf_prog   bpf_iter_bpf_prog
  27 test_pkt_access
  32 test_main       test_pkt_access test_pkt_access
  33 test_subprog1   test_pkt_access_subprog1 test_pkt_access
  34 test_subprog2   test_pkt_access_subprog2 test_pkt_access
  35 test_subprog3   test_pkt_access_subprog3 test_pkt_access
  36 new_get_skb_len get_skb_len test_pkt_access
  37 new_get_skb_ifindex get_skb_ifindex test_pkt_access
  38 new_get_constant get_constant test_pkt_access

The BPF program dump_bpf_prog() in iterators.bpf.c prints this data about
all BPF programs currently loaded in the system. This information is unstable
and will change from kernel to kernel.

In some sense this output is similar to 'bpftool prog show', which uses the
stable API to retrieve information about BPF programs. The BPF subsystem grows
quickly and there is always demand to show as much info about BPF things as
possible. But we cannot expose all that info via the stable uapi of the bpf
syscall, since the details change so much. Right now a BPF program can be
attached to only one other BPF program. Folks are working on patches to enable
multi-attach, but for debugging it's necessary to see the current state. There
is no uapi for that, but the above output shows it:
  37 new_get_skb_ifindex  get_skb_ifindex test_pkt_access
  38 new_get_constant     get_constant    test_pkt_access
     [1]                  [2]             [3]
[1] is the full name of the BPF prog from BTF.
[2] is the name of the function inside the target BPF prog.
[3] is the name of the target BPF prog.

[2] and [3] are not exposed via uapi, since they will change from single to
multi attach soon. There are many other cases where bpf internals are useful
for debugging but shouldn't be exposed via uapi due to the high rate of change.

systemd mounts /sys/fs/bpf at startup, so this kernel module with its user
mode driver needs to be available early. BPF_LSM will most likely need to
preload BPF programs even earlier.

A few interesting observations:
- though bpffs comes with the two human readable files "progs.debug" and
  "maps.debug", they can be removed. 'rm -f /sys/fs/bpf/progs.debug' will remove
  the bpf_link and the kernel will automatically unload the corresponding BPF
  progs, maps, and BTFs. In the future '-o remount' will be able to restore
  them. This is not implemented yet.

- 'ps aux|grep bpf_preload' shows nothing. The user mode driver loaded the BPF
  iterators and exited. Nothing is lingering in user space at this point.

- We can consider giving 0644 permissions to "progs.debug" and "maps.debug"
  to allow unprivileged users to see the BPF objects loaded in the system.
  We cannot do that with "bpftool prog show", since it uses the cap_sys_admin
  parts of the bpf syscall.

- The functionality split between the core kernel, the bpf_preload kernel
  module and the user mode driver is very similar to the bpfilter style of
  interaction.

- Similar BPF iterators can be used as unstable extensions to /proc. For
  example, mounting /proc could prepopulate some subdirectory in there with
  a BPF iterator that prints QUIC sockets instead of tcp and udp (a
  hypothetical sketch of such an iterator follows this list).
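The following is a purely hypothetical sketch of that idea, not part of this
set: an iterator in the same shape as dump_bpf_map() further down, here
walking tasks just to show the pattern (assumes vmlinux.h/CO-RE definitions
for task_struct):

  SEC("iter/task")
  int dump_tasks(struct bpf_iter__task *ctx)
  {
          struct seq_file *seq = ctx->meta->seq;
          struct task_struct *task = ctx->task;

          if (!task)
                  return 0;
          if (ctx->meta->seq_num == 0)
                  BPF_SEQ_PRINTF(seq, "  pid comm\n");
          BPF_SEQ_PRINTF(seq, "%5d %s\n", task->pid, task->comm);
          return 0;
  }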

Changelog:

v5->v6:
- refactored Makefiles with Andrii's help
  - switched to explicit $(MAKE) style
  - switched to userldlibs instead of userldflags
  - fixed build issue with libbpf Makefile due to invocation from kbuild
- fixed menuconfig order as spotted by Daniel
- introduced CONFIG_USERMODE_DRIVER bool that is selected by bpfilter and bpf_preload

v4->v5:
- addressed Song's and Andrii's feedback; s/pages/max_entries/

v3->v4:
- took THIS_MODULE in patch 3 as suggested by Daniel to simplify the code.
- converted the BPF iterator to use BTF (when available) to print the full BPF
  program name instead of the 16-byte truncated version.
  This is something I've been using drgn scripts for.
  Take a look at get_name() in iterators.bpf.c to see how short it is compared
  to what user space bpftool would have to do to print the same full name:
  . get prog info via obj_info_by_fd
  . do get_fd_by_id from info->btf_id
  . fetch the potentially large BTF of the program from the kernel
  . parse that BTF in user space to figure out all type boundaries and the string section
  . read info->func_info to get the btf_id of the func_proto from there
  . find that btf_id in the parsed BTF
  That's quite a bit of work for bpftool compared to a few lines in get_name()
  (a rough user space sketch of these steps is included after this changelog
  entry). I guess it would be good to make bpftool do this info extraction
  anyway. While doing this BTF reading in the kernel I realized that the
  verifier is not smart enough to follow double pointers (added to my todo
  list); otherwise get_name() would have been even shorter.
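A rough, hypothetical user space sketch of those steps, using libbpf calls
available as of this series (error handling omitted; assumes the program was
loaded with BTF func_info; buf/prog_id are illustrative names):

  #include <stdio.h>
  #include <unistd.h>
  #include <bpf/bpf.h>
  #include <bpf/btf.h>

  static const char *prog_full_name(__u32 prog_id, char *buf, size_t sz)
  {
          struct bpf_prog_info info = {};
          struct bpf_func_info finfo = {};
          __u32 len = sizeof(info);
          const struct btf_type *t;
          struct btf *btf = NULL;
          int fd;

          fd = bpf_prog_get_fd_by_id(prog_id);
          info.nr_func_info = 1;                     /* ask for func_info[0] */
          info.func_info_rec_size = sizeof(finfo);
          info.func_info = (__u64)(unsigned long)&finfo;
          bpf_obj_get_info_by_fd(fd, &info, &len);   /* prog info incl. btf_id */

          btf__get_from_id(info.btf_id, &btf);       /* fetch + parse prog BTF */
          t = btf__type_by_id(btf, finfo.type_id);   /* func type from func_info */
          snprintf(buf, sz, "%s", btf__name_by_offset(btf, t->name_off));
          btf__free(btf);
          close(fd);
          return buf;
  }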

v2->v3:
- fixed module unload race (Daniel)
- added selftest (Daniel)
- fixed build bot warning

v1->v2:
- changed names to 'progs.debug' and 'maps.debug' to hopefully better indicate
  the instability of the text output. Having a dot in the name also guarantees
  that these special files will not conflict with normal bpf objects pinned
  in bpffs, since a dot is disallowed for normal pins.
- instead of hard coding link_name in the core bpf code, moved it into the UMD.
- cleaned up error handling.
- addressed review comments from Yonghong and Andrii.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
parents 6e9cab2e edb65ee5
@@ -1358,6 +1358,7 @@ int btf_check_type_match(struct bpf_verifier_env *env, struct bpf_prog *prog,
			 struct btf *btf, const struct btf_type *t);
struct bpf_prog *bpf_prog_by_id(u32 id);
+struct bpf_link *bpf_link_by_id(u32 id);
const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id);
#else /* !CONFIG_BPF_SYSCALL */
...
@@ -1710,6 +1710,8 @@ config BPF_JIT_DEFAULT_ON
	def_bool ARCH_WANT_DEFAULT_BPF_JIT || BPF_JIT_ALWAYS_ON
	depends on HAVE_EBPF_JIT && BPF_JIT
+source "kernel/bpf/preload/Kconfig"
config USERFAULTFD
	bool "Enable userfaultfd() system call"
	depends on MMU
...
@@ -12,7 +12,7 @@ obj-y = fork.o exec_domain.o panic.o \
	    notifier.o ksysfs.o cred.o reboot.o \
	    async.o range.o smpboot.o ucount.o regset.o
-obj-$(CONFIG_BPFILTER) += usermode_driver.o
+obj-$(CONFIG_USERMODE_DRIVER) += usermode_driver.o
obj-$(CONFIG_MODULES) += kmod.o
obj-$(CONFIG_MULTIUSER) += groups.o
...
@@ -29,3 +29,4 @@ ifeq ($(CONFIG_BPF_JIT),y)
obj-$(CONFIG_BPF_SYSCALL) += bpf_struct_ops.o
obj-${CONFIG_BPF_LSM} += bpf_lsm.o
endif
+obj-$(CONFIG_BPF_PRELOAD) += preload/
@@ -20,6 +20,7 @@
#include <linux/filter.h>
#include <linux/bpf.h>
#include <linux/bpf_trace.h>
+#include "preload/bpf_preload.h"
enum bpf_type {
	BPF_TYPE_UNSPEC = 0,
@@ -369,9 +370,10 @@ static struct dentry *
bpf_lookup(struct inode *dir, struct dentry *dentry, unsigned flags)
{
	/* Dots in names (e.g. "/sys/fs/bpf/foo.bar") are reserved for future
-	 * extensions.
+	 * extensions. That allows popoulate_bpffs() create special files.
	 */
-	if (strchr(dentry->d_name.name, '.'))
+	if ((dir->i_mode & S_IALLUGO) &&
+	    strchr(dentry->d_name.name, '.'))
		return ERR_PTR(-EPERM);
	return simple_lookup(dir, dentry, flags);
@@ -409,6 +411,27 @@ static const struct inode_operations bpf_dir_iops = {
	.unlink = simple_unlink,
};
/* pin iterator link into bpffs */
static int bpf_iter_link_pin_kernel(struct dentry *parent,
const char *name, struct bpf_link *link)
{
umode_t mode = S_IFREG | S_IRUSR;
struct dentry *dentry;
int ret;
inode_lock(parent->d_inode);
dentry = lookup_one_len(name, parent, strlen(name));
if (IS_ERR(dentry)) {
inode_unlock(parent->d_inode);
return PTR_ERR(dentry);
}
ret = bpf_mkobj_ops(dentry, mode, link, &bpf_link_iops,
&bpf_iter_fops);
dput(dentry);
inode_unlock(parent->d_inode);
return ret;
}
static int bpf_obj_do_pin(const char __user *pathname, void *raw,
			  enum bpf_type type)
{
@@ -638,6 +661,91 @@ static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
	return 0;
}
struct bpf_preload_ops *bpf_preload_ops;
EXPORT_SYMBOL_GPL(bpf_preload_ops);
static bool bpf_preload_mod_get(void)
{
/* If bpf_preload.ko wasn't loaded earlier then load it now.
* When bpf_preload is built into vmlinux the module's __init
* function will populate it.
*/
if (!bpf_preload_ops) {
request_module("bpf_preload");
if (!bpf_preload_ops)
return false;
}
/* And grab the reference, so the module doesn't disappear while the
* kernel is interacting with the kernel module and its UMD.
*/
if (!try_module_get(bpf_preload_ops->owner)) {
pr_err("bpf_preload module get failed.\n");
return false;
}
return true;
}
static void bpf_preload_mod_put(void)
{
if (bpf_preload_ops)
/* now user can "rmmod bpf_preload" if necessary */
module_put(bpf_preload_ops->owner);
}
static DEFINE_MUTEX(bpf_preload_lock);
static int populate_bpffs(struct dentry *parent)
{
struct bpf_preload_info objs[BPF_PRELOAD_LINKS] = {};
struct bpf_link *links[BPF_PRELOAD_LINKS] = {};
int err = 0, i;
/* grab the mutex to make sure the kernel interactions with bpf_preload
* UMD are serialized
*/
mutex_lock(&bpf_preload_lock);
/* if bpf_preload.ko wasn't built into vmlinux then load it */
if (!bpf_preload_mod_get())
goto out;
if (!bpf_preload_ops->info.tgid) {
/* preload() will start UMD that will load BPF iterator programs */
err = bpf_preload_ops->preload(objs);
if (err)
goto out_put;
for (i = 0; i < BPF_PRELOAD_LINKS; i++) {
links[i] = bpf_link_by_id(objs[i].link_id);
if (IS_ERR(links[i])) {
err = PTR_ERR(links[i]);
goto out_put;
}
}
for (i = 0; i < BPF_PRELOAD_LINKS; i++) {
err = bpf_iter_link_pin_kernel(parent,
objs[i].link_name, links[i]);
if (err)
goto out_put;
/* do not unlink successfully pinned links even
* if later link fails to pin
*/
links[i] = NULL;
}
/* finish() will tell UMD process to exit */
err = bpf_preload_ops->finish();
if (err)
goto out_put;
}
out_put:
bpf_preload_mod_put();
out:
mutex_unlock(&bpf_preload_lock);
for (i = 0; i < BPF_PRELOAD_LINKS && err; i++)
if (!IS_ERR_OR_NULL(links[i]))
bpf_link_put(links[i]);
return err;
}
static int bpf_fill_super(struct super_block *sb, struct fs_context *fc)
{
	static const struct tree_descr bpf_rfiles[] = { { "" } };
@@ -654,8 +762,8 @@ static int bpf_fill_super(struct super_block *sb, struct fs_context *fc)
	inode = sb->s_root->d_inode;
	inode->i_op = &bpf_dir_iops;
	inode->i_mode &= ~S_IALLUGO;
+	populate_bpffs(sb->s_root);
	inode->i_mode |= S_ISVTX | opts->mode;
	return 0;
}
@@ -705,6 +813,8 @@ static int __init bpf_init(void)
{
	int ret;
+	mutex_init(&bpf_preload_lock);
	ret = sysfs_create_mount_point(fs_kobj, "bpf");
	if (ret)
		return ret;
...
# SPDX-License-Identifier: GPL-2.0-only
config USERMODE_DRIVER
bool
default n
menuconfig BPF_PRELOAD
bool "Preload BPF file system with kernel specific program and map iterators"
depends on BPF
select USERMODE_DRIVER
help
This builds kernel module with several embedded BPF programs that are
pinned into BPF FS mount point as human readable files that are
useful in debugging and introspection of BPF programs and maps.
if BPF_PRELOAD
config BPF_PRELOAD_UMD
tristate "bpf_preload kernel module with user mode driver"
depends on CC_CAN_LINK
depends on m || CC_CAN_LINK_STATIC
default m
help
This builds bpf_preload kernel module with embedded user mode driver.
endif
# SPDX-License-Identifier: GPL-2.0
LIBBPF_SRCS = $(srctree)/tools/lib/bpf/
LIBBPF_A = $(obj)/libbpf.a
LIBBPF_OUT = $(abspath $(obj))
$(LIBBPF_A):
$(Q)$(MAKE) -C $(LIBBPF_SRCS) OUTPUT=$(LIBBPF_OUT)/ $(LIBBPF_OUT)/libbpf.a
userccflags += -I $(srctree)/tools/include/ -I $(srctree)/tools/include/uapi \
-I $(srctree)/tools/lib/ -Wno-unused-result
userprogs := bpf_preload_umd
bpf_preload_umd-objs := iterators/iterators.o
bpf_preload_umd-userldlibs := $(LIBBPF_A) -lelf -lz
$(obj)/bpf_preload_umd: $(LIBBPF_A)
$(obj)/bpf_preload_umd_blob.o: $(obj)/bpf_preload_umd
obj-$(CONFIG_BPF_PRELOAD_UMD) += bpf_preload.o
bpf_preload-objs += bpf_preload_kern.o bpf_preload_umd_blob.o
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _BPF_PRELOAD_H
#define _BPF_PRELOAD_H
#include <linux/usermode_driver.h>
#include "iterators/bpf_preload_common.h"
struct bpf_preload_ops {
struct umd_info info;
int (*preload)(struct bpf_preload_info *);
int (*finish)(void);
struct module *owner;
};
extern struct bpf_preload_ops *bpf_preload_ops;
#define BPF_PRELOAD_LINKS 2
#endif
// SPDX-License-Identifier: GPL-2.0
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/init.h>
#include <linux/module.h>
#include <linux/pid.h>
#include <linux/fs.h>
#include <linux/sched/signal.h>
#include "bpf_preload.h"
extern char bpf_preload_umd_start;
extern char bpf_preload_umd_end;
static int preload(struct bpf_preload_info *obj);
static int finish(void);
static struct bpf_preload_ops umd_ops = {
.info.driver_name = "bpf_preload",
.preload = preload,
.finish = finish,
.owner = THIS_MODULE,
};
static int preload(struct bpf_preload_info *obj)
{
int magic = BPF_PRELOAD_START;
loff_t pos = 0;
int i, err;
ssize_t n;
err = fork_usermode_driver(&umd_ops.info);
if (err)
return err;
/* send the start magic to let UMD proceed with loading BPF progs */
n = kernel_write(umd_ops.info.pipe_to_umh,
&magic, sizeof(magic), &pos);
if (n != sizeof(magic))
return -EPIPE;
/* receive bpf_link IDs and names from UMD */
pos = 0;
for (i = 0; i < BPF_PRELOAD_LINKS; i++) {
n = kernel_read(umd_ops.info.pipe_from_umh,
&obj[i], sizeof(*obj), &pos);
if (n != sizeof(*obj))
return -EPIPE;
}
return 0;
}
static int finish(void)
{
int magic = BPF_PRELOAD_END;
struct pid *tgid;
loff_t pos = 0;
ssize_t n;
/* send the last magic to UMD. It will do a normal exit. */
n = kernel_write(umd_ops.info.pipe_to_umh,
&magic, sizeof(magic), &pos);
if (n != sizeof(magic))
return -EPIPE;
tgid = umd_ops.info.tgid;
wait_event(tgid->wait_pidfd, thread_group_exited(tgid));
umd_ops.info.tgid = NULL;
return 0;
}
static int __init load_umd(void)
{
int err;
err = umd_load_blob(&umd_ops.info, &bpf_preload_umd_start,
&bpf_preload_umd_end - &bpf_preload_umd_start);
if (err)
return err;
bpf_preload_ops = &umd_ops;
return err;
}
static void __exit fini_umd(void)
{
bpf_preload_ops = NULL;
/* kill UMD in case it's still there due to earlier error */
kill_pid(umd_ops.info.tgid, SIGKILL, 1);
umd_ops.info.tgid = NULL;
umd_unload_blob(&umd_ops.info);
}
late_initcall(load_umd);
module_exit(fini_umd);
MODULE_LICENSE("GPL");
/* SPDX-License-Identifier: GPL-2.0 */
.section .init.rodata, "a"
.global bpf_preload_umd_start
bpf_preload_umd_start:
.incbin "kernel/bpf/preload/bpf_preload_umd"
.global bpf_preload_umd_end
bpf_preload_umd_end:
# SPDX-License-Identifier: GPL-2.0-only
/.output
# SPDX-License-Identifier: GPL-2.0
OUTPUT := .output
CLANG ?= clang
LLC ?= llc
LLVM_STRIP ?= llvm-strip
DEFAULT_BPFTOOL := $(OUTPUT)/sbin/bpftool
BPFTOOL ?= $(DEFAULT_BPFTOOL)
LIBBPF_SRC := $(abspath ../../../../tools/lib/bpf)
BPFOBJ := $(OUTPUT)/libbpf.a
BPF_INCLUDE := $(OUTPUT)
INCLUDES := -I$(OUTPUT) -I$(BPF_INCLUDE) -I$(abspath ../../../../tools/lib) \
-I$(abspath ../../../../tools/include/uapi)
CFLAGS := -g -Wall
abs_out := $(abspath $(OUTPUT))
ifeq ($(V),1)
Q =
msg =
else
Q = @
msg = @printf ' %-8s %s%s\n' "$(1)" "$(notdir $(2))" "$(if $(3), $(3))";
MAKEFLAGS += --no-print-directory
submake_extras := feature_display=0
endif
.DELETE_ON_ERROR:
.PHONY: all clean
all: iterators.skel.h
clean:
$(call msg,CLEAN)
$(Q)rm -rf $(OUTPUT) iterators
iterators.skel.h: $(OUTPUT)/iterators.bpf.o | $(BPFTOOL)
$(call msg,GEN-SKEL,$@)
$(Q)$(BPFTOOL) gen skeleton $< > $@
$(OUTPUT)/iterators.bpf.o: iterators.bpf.c $(BPFOBJ) | $(OUTPUT)
$(call msg,BPF,$@)
$(Q)$(CLANG) -g -O2 -target bpf $(INCLUDES) \
-c $(filter %.c,$^) -o $@ && \
$(LLVM_STRIP) -g $@
$(OUTPUT):
$(call msg,MKDIR,$@)
$(Q)mkdir -p $(OUTPUT)
$(BPFOBJ): $(wildcard $(LIBBPF_SRC)/*.[ch] $(LIBBPF_SRC)/Makefile) | $(OUTPUT)
$(Q)$(MAKE) $(submake_extras) -C $(LIBBPF_SRC) \
OUTPUT=$(abspath $(dir $@))/ $(abspath $@)
$(DEFAULT_BPFTOOL):
$(Q)$(MAKE) $(submake_extras) -C ../../../../tools/bpf/bpftool \
prefix= OUTPUT=$(abs_out)/ DESTDIR=$(abs_out) install
WARNING:
If you change "iterators.bpf.c" do "make -j" in this directory to rebuild "iterators.skel.h".
Make sure to have clang 10 installed.
See Documentation/bpf/bpf_devel_QA.rst
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _BPF_PRELOAD_COMMON_H
#define _BPF_PRELOAD_COMMON_H
#define BPF_PRELOAD_START 0x5555
#define BPF_PRELOAD_END 0xAAAA
struct bpf_preload_info {
char link_name[16];
int link_id;
};
#endif
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2020 Facebook */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>
#pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record)
struct seq_file;
struct bpf_iter_meta {
struct seq_file *seq;
__u64 session_id;
__u64 seq_num;
};
struct bpf_map {
__u32 id;
char name[16];
__u32 max_entries;
};
struct bpf_iter__bpf_map {
struct bpf_iter_meta *meta;
struct bpf_map *map;
};
struct btf_type {
__u32 name_off;
};
struct btf_header {
__u32 str_len;
};
struct btf {
const char *strings;
struct btf_type **types;
struct btf_header hdr;
};
struct bpf_prog_aux {
__u32 id;
char name[16];
const char *attach_func_name;
struct bpf_prog *linked_prog;
struct bpf_func_info *func_info;
struct btf *btf;
};
struct bpf_prog {
struct bpf_prog_aux *aux;
};
struct bpf_iter__bpf_prog {
struct bpf_iter_meta *meta;
struct bpf_prog *prog;
};
#pragma clang attribute pop
static const char *get_name(struct btf *btf, long btf_id, const char *fallback)
{
struct btf_type **types, *t;
unsigned int name_off;
const char *str;
if (!btf)
return fallback;
str = btf->strings;
types = btf->types;
bpf_probe_read_kernel(&t, sizeof(t), types + btf_id);
name_off = BPF_CORE_READ(t, name_off);
if (name_off >= btf->hdr.str_len)
return fallback;
return str + name_off;
}
SEC("iter/bpf_map")
int dump_bpf_map(struct bpf_iter__bpf_map *ctx)
{
struct seq_file *seq = ctx->meta->seq;
__u64 seq_num = ctx->meta->seq_num;
struct bpf_map *map = ctx->map;
if (!map)
return 0;
if (seq_num == 0)
BPF_SEQ_PRINTF(seq, " id name max_entries\n");
BPF_SEQ_PRINTF(seq, "%4u %-16s%6d\n", map->id, map->name, map->max_entries);
return 0;
}
SEC("iter/bpf_prog")
int dump_bpf_prog(struct bpf_iter__bpf_prog *ctx)
{
struct seq_file *seq = ctx->meta->seq;
__u64 seq_num = ctx->meta->seq_num;
struct bpf_prog *prog = ctx->prog;
struct bpf_prog_aux *aux;
if (!prog)
return 0;
aux = prog->aux;
if (seq_num == 0)
BPF_SEQ_PRINTF(seq, " id name attached\n");
BPF_SEQ_PRINTF(seq, "%4u %-16s %s %s\n", aux->id,
get_name(aux->btf, aux->func_info[0].type_id, aux->name),
aux->attach_func_name, aux->linked_prog->aux->name);
return 0;
}
char LICENSE[] SEC("license") = "GPL";
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2020 Facebook */
#include <argp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <bpf/libbpf.h>
#include <bpf/bpf.h>
#include <sys/mount.h>
#include "iterators.skel.h"
#include "bpf_preload_common.h"
int to_kernel = -1;
int from_kernel = 0;
static int send_link_to_kernel(struct bpf_link *link, const char *link_name)
{
struct bpf_preload_info obj = {};
struct bpf_link_info info = {};
__u32 info_len = sizeof(info);
int err;
err = bpf_obj_get_info_by_fd(bpf_link__fd(link), &info, &info_len);
if (err)
return err;
obj.link_id = info.id;
if (strlen(link_name) >= sizeof(obj.link_name))
return -E2BIG;
strcpy(obj.link_name, link_name);
if (write(to_kernel, &obj, sizeof(obj)) != sizeof(obj))
return -EPIPE;
return 0;
}
int main(int argc, char **argv)
{
struct rlimit rlim = { RLIM_INFINITY, RLIM_INFINITY };
struct iterators_bpf *skel;
int err, magic;
int debug_fd;
debug_fd = open("/dev/console", O_WRONLY | O_NOCTTY | O_CLOEXEC);
if (debug_fd < 0)
return 1;
to_kernel = dup(1);
close(1);
dup(debug_fd);
/* now stdin and stderr point to /dev/console */
read(from_kernel, &magic, sizeof(magic));
if (magic != BPF_PRELOAD_START) {
printf("bad start magic %d\n", magic);
return 1;
}
setrlimit(RLIMIT_MEMLOCK, &rlim);
/* libbpf opens BPF object and loads it into the kernel */
skel = iterators_bpf__open_and_load();
if (!skel) {
/* iterators.skel.h is little endian.
* libbpf doesn't support automatic little->big conversion
* of BPF bytecode yet.
* The program load will fail in such case.
*/
printf("Failed load could be due to wrong endianness\n");
return 1;
}
err = iterators_bpf__attach(skel);
if (err)
goto cleanup;
/* send two bpf_link IDs with names to the kernel */
err = send_link_to_kernel(skel->links.dump_bpf_map, "maps.debug");
if (err)
goto cleanup;
err = send_link_to_kernel(skel->links.dump_bpf_prog, "progs.debug");
if (err)
goto cleanup;
/* The kernel will proceed with pinnging the links in bpffs.
* UMD will wait on read from pipe.
*/
read(from_kernel, &magic, sizeof(magic));
if (magic != BPF_PRELOAD_END) {
printf("bad final magic %d\n", magic);
err = -EINVAL;
}
cleanup:
iterators_bpf__destroy(skel);
return err != 0;
}
This diff is collapsed.
@@ -4014,40 +4014,50 @@ static int link_detach(union bpf_attr *attr)
	return ret;
}
-static int bpf_link_inc_not_zero(struct bpf_link *link)
+static struct bpf_link *bpf_link_inc_not_zero(struct bpf_link *link)
{
-	return atomic64_fetch_add_unless(&link->refcnt, 1, 0) ? 0 : -ENOENT;
+	return atomic64_fetch_add_unless(&link->refcnt, 1, 0) ? link : ERR_PTR(-ENOENT);
}
-#define BPF_LINK_GET_FD_BY_ID_LAST_FIELD link_id
-static int bpf_link_get_fd_by_id(const union bpf_attr *attr)
+struct bpf_link *bpf_link_by_id(u32 id)
{
	struct bpf_link *link;
-	u32 id = attr->link_id;
-	int fd, err;
-	if (CHECK_ATTR(BPF_LINK_GET_FD_BY_ID))
-		return -EINVAL;
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
+	if (!id)
+		return ERR_PTR(-ENOENT);
	spin_lock_bh(&link_idr_lock);
-	link = idr_find(&link_idr, id);
	/* before link is "settled", ID is 0, pretend it doesn't exist yet */
+	link = idr_find(&link_idr, id);
	if (link) {
		if (link->id)
-			err = bpf_link_inc_not_zero(link);
+			link = bpf_link_inc_not_zero(link);
		else
-			err = -EAGAIN;
+			link = ERR_PTR(-EAGAIN);
	} else {
-		err = -ENOENT;
+		link = ERR_PTR(-ENOENT);
	}
	spin_unlock_bh(&link_idr_lock);
+	return link;
+}
-	if (err)
-		return err;
+#define BPF_LINK_GET_FD_BY_ID_LAST_FIELD link_id
+static int bpf_link_get_fd_by_id(const union bpf_attr *attr)
+{
+	struct bpf_link *link;
+	u32 id = attr->link_id;
+	int fd;
+	if (CHECK_ATTR(BPF_LINK_GET_FD_BY_ID))
+		return -EINVAL;
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+	link = bpf_link_by_id(id);
+	if (IS_ERR(link))
+		return PTR_ERR(link);
	fd = bpf_link_new_fd(link);
	if (fd < 0)
...
@@ -2,6 +2,7 @@
menuconfig BPFILTER
	bool "BPF based packet filtering framework (BPFILTER)"
	depends on NET && BPF && INET
+	select USERMODE_DRIVER
	help
	  This builds experimental bpfilter framework that is aiming to
	  provide netfilter compatible functionality via BPF
...
# SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
# Most of this file is copied from tools/lib/traceevent/Makefile
+RM ?= rm
+srctree = $(abs_srctree)
LIBBPF_VERSION := $(shell \
	grep -oE '^LIBBPF_([0-9.]+)' libbpf.map | \
	sort -rV | head -n1 | cut -d'_' -f2)
@@ -188,7 +191,7 @@ $(OUTPUT)libbpf.so.$(LIBBPF_VERSION): $(BPF_IN_SHARED)
	@ln -sf $(@F) $(OUTPUT)libbpf.so.$(LIBBPF_MAJOR_VERSION)
$(OUTPUT)libbpf.a: $(BPF_IN_STATIC)
-	$(QUIET_LINK)$(RM) $@; $(AR) rcs $@ $^
+	$(QUIET_LINK)$(RM) -f $@; $(AR) rcs $@ $^
$(OUTPUT)libbpf.pc:
	$(QUIET_GEN)sed -e "s|@PREFIX@|$(prefix)|" \
@@ -291,7 +294,7 @@ cscope:
	cscope -b -q -I $(srctree)/include -f cscope.out
tags:
-	rm -f TAGS tags
+	$(RM) -f TAGS tags
	ls *.c *.h | xargs $(TAGS_PROG) -a
# Declare the contents of the .PHONY variable as phony. We keep that
...
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2020 Facebook */
#define _GNU_SOURCE
#include <sched.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <test_progs.h>
#define TDIR "/sys/kernel/debug"
static int read_iter(char *file)
{
/* 1024 should be enough to get contiguous 4 "iter" letters at some point */
char buf[1024];
int fd, len;
fd = open(file, 0);
if (fd < 0)
return -1;
while ((len = read(fd, buf, sizeof(buf))) > 0)
if (strstr(buf, "iter")) {
close(fd);
return 0;
}
close(fd);
return -1;
}
static int fn(void)
{
int err, duration = 0;
err = unshare(CLONE_NEWNS);
if (CHECK(err, "unshare", "failed: %d\n", errno))
goto out;
err = mount("", "/", "", MS_REC | MS_PRIVATE, NULL);
if (CHECK(err, "mount /", "failed: %d\n", errno))
goto out;
err = umount(TDIR);
if (CHECK(err, "umount " TDIR, "failed: %d\n", errno))
goto out;
err = mount("none", TDIR, "tmpfs", 0, NULL);
if (CHECK(err, "mount", "mount root failed: %d\n", errno))
goto out;
err = mkdir(TDIR "/fs1", 0777);
if (CHECK(err, "mkdir "TDIR"/fs1", "failed: %d\n", errno))
goto out;
err = mkdir(TDIR "/fs2", 0777);
if (CHECK(err, "mkdir "TDIR"/fs2", "failed: %d\n", errno))
goto out;
err = mount("bpf", TDIR "/fs1", "bpf", 0, NULL);
if (CHECK(err, "mount bpffs "TDIR"/fs1", "failed: %d\n", errno))
goto out;
err = mount("bpf", TDIR "/fs2", "bpf", 0, NULL);
if (CHECK(err, "mount bpffs " TDIR "/fs2", "failed: %d\n", errno))
goto out;
err = read_iter(TDIR "/fs1/maps.debug");
if (CHECK(err, "reading " TDIR "/fs1/maps.debug", "failed\n"))
goto out;
err = read_iter(TDIR "/fs2/progs.debug");
if (CHECK(err, "reading " TDIR "/fs2/progs.debug", "failed\n"))
goto out;
out:
umount(TDIR "/fs1");
umount(TDIR "/fs2");
rmdir(TDIR "/fs1");
rmdir(TDIR "/fs2");
umount(TDIR);
exit(err);
}
void test_test_bpffs(void)
{
int err, duration = 0, status = 0;
pid_t pid;
pid = fork();
if (CHECK(pid == -1, "clone", "clone failed %d", errno))
return;
if (pid == 0)
fn();
err = waitpid(pid, &status, 0);
if (CHECK(err == -1 && errno != ECHILD, "waitpid", "failed %d", errno))
return;
if (CHECK(WEXITSTATUS(status), "bpffs test ", "failed %d", WEXITSTATUS(status)))
return;
}