- 28 Jan, 2021 2 commits
-
-
Stanislav Fomichev authored
Return 3 to indicate that permission check for port 111 should be skipped. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210127193140.3170382-2-sdf@google.com
-
Stanislav Fomichev authored
At the moment, BPF_CGROUP_INET{4,6}_BIND hooks can rewrite user_port to the privileged ones (< ip_unprivileged_port_start), but it will be rejected later on in the __inet_bind or __inet6_bind. Let's add another return value to indicate that CAP_NET_BIND_SERVICE check should be ignored. Use the same idea as we currently use in cgroup/egress where bit #1 indicates CN. Instead, for cgroup/bind{4,6}, bit #1 indicates that CAP_NET_BIND_SERVICE should be bypassed. v5: - rename flags to be less confusing (Andrey Ignatov) - rework BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY to work on flags and accept BPF_RET_SET_CN (no behavioral changes) v4: - Add missing IPv6 support (Martin KaFai Lau) v3: - Update description (Martin KaFai Lau) - Fix capability restore in selftest (Martin KaFai Lau) v2: - Switch to explicit return code (Martin KaFai Lau) Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Andrey Ignatov <rdna@fb.com> Link: https://lore.kernel.org/bpf/20210127193140.3170382-1-sdf@google.com
-
- 27 Jan, 2021 2 commits
-
-
Cong Wang authored
sk_psock_destroy() is a RCU callback, I can't see any reason why it could be used outside. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jakub Sitnicki <jakub@cloudflare.com> Cc: Lorenz Bauer <lmb@cloudflare.com> Link: https://lore.kernel.org/bpf/20210127221501.46866-1-xiyou.wangcong@gmail.com
-
Menglong Dong authored
This 'BPF_ADD' is duplicated, and I belive it should be 'BPF_AND'. Fixes: 981f94c3 ("bpf: Add bitwise atomic instructions") Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Brendan Jackman <jackmanb@google.com> Link: https://lore.kernel.org/bpf/20210127022507.23674-1-dong.menglong@zte.com.cn
-
- 26 Jan, 2021 1 commit
-
-
Andrii Nakryiko authored
Fix bug in handling bpf_testmod unloading that will cause test_progs exiting prematurely if bpf_testmod unloading failed. This is especially problematic when running a subset of test_progs that doesn't require root permissions and doesn't rely on bpf_testmod, yet will fail immediately due to exit(1) in unload_bpf_testmod(). Fixes: 9f7fa225 ("selftests/bpf: Add bpf_testmod kernel module for testing") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210126065019.1268027-1-andrii@kernel.org
-
- 25 Jan, 2021 17 commits
-
-
Tiezhu Yang authored
There exists many build warnings when make M=samples/bpf on the Loongson platform, this issue is MIPS related, x86 compiles just fine. Here are some warnings: CC samples/bpf/ibumad_user.o samples/bpf/ibumad_user.c: In function ‘dump_counts’: samples/bpf/ibumad_user.c:46:24: warning: format ‘%llu’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘__u64’ {aka ‘long unsigned int’} [-Wformat=] printf("0x%02x : %llu\n", key, value); ~~~^ ~~~~~ %lu CC samples/bpf/offwaketime_user.o samples/bpf/offwaketime_user.c: In function ‘print_ksym’: samples/bpf/offwaketime_user.c:34:17: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘__u64’ {aka ‘long unsigned int’} [-Wformat=] printf("%s/%llx;", sym->name, addr); ~~~^ ~~~~ %lx samples/bpf/offwaketime_user.c: In function ‘print_stack’: samples/bpf/offwaketime_user.c:68:17: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 3 has type ‘__u64’ {aka ‘long unsigned int’} [-Wformat=] printf(";%s %lld\n", key->waker, count); ~~~^ ~~~~~ %ld MIPS needs __SANE_USERSPACE_TYPES__ before <linux/types.h> to select 'int-ll64.h' in arch/mips/include/uapi/asm/types.h, then it can avoid build warnings when printing __u64 with %llu, %llx or %lld. The header tools/include/linux/types.h defines __SANE_USERSPACE_TYPES__, it seems that we can include <linux/types.h> in the source files which have build warnings, but it has no effect due to actually it includes usr/include/linux/types.h instead of tools/include/linux/types.h, the problem is that "usr/include" is preferred first than "tools/include" in samples/bpf/Makefile, that sounds like a ugly hack to -Itools/include before -Iusr/include. So define __SANE_USERSPACE_TYPES__ for MIPS in samples/bpf/Makefile is proper, if add "TPROGS_CFLAGS += -D__SANE_USERSPACE_TYPES__" in samples/bpf/Makefile, it appears the following error: Auto-detecting system features: ... libelf: [ on ] ... zlib: [ on ] ... bpf: [ OFF ] BPF API too old make[3]: *** [Makefile:293: bpfdep] Error 1 make[2]: *** [Makefile:156: all] Error 2 With #ifndef __SANE_USERSPACE_TYPES__ in tools/include/linux/types.h, the above error has gone and this ifndef change does not hurt other compilations. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/1611551146-14052-1-git-send-email-yangtiezhu@loongson.cn
-
Florian Lehner authored
Update struct bpf_perf_event_data with the addr field to match the tools headers with the kernel headers. Signed-off-by: Florian Lehner <dev@der-flo.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210123185221.23946-1-dev@der-flo.net
-
Björn Töpel authored
There is no need to cast to void * when the argument is void *. Avoid cluttering of code. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-13-bjorn.topel@gmail.com
-
Björn Töpel authored
Use calloc instead of malloc where it makes sense, and avoid C++-style void *-cast. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-12-bjorn.topel@gmail.com
-
Björn Töpel authored
The data variable is only used locally. Instead of using the heap, stick to using the stack. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-11-bjorn.topel@gmail.com
-
Björn Töpel authored
Use C89 rules for variable definition. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-10-bjorn.topel@gmail.com
-
Björn Töpel authored
Instead of casting from void *, let us use the actual type in gen_udp_hdr(). Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-9-bjorn.topel@gmail.com
-
Björn Töpel authored
Instead of casting from void *, let us use the actual type in init_iface_config(). Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-8-bjorn.topel@gmail.com
-
Björn Töpel authored
Let us use a local variable in nsswitchthread(), so we can remove a lot of casting for better readability. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-7-bjorn.topel@gmail.com
-
Björn Töpel authored
Introduce a local variable to get rid of lot of casting. Move common code outside the if/else-clause. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-6-bjorn.topel@gmail.com
-
Björn Töpel authored
The allocated entry is immediately overwritten by an assignment. Fix that. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-5-bjorn.topel@gmail.com
-
Björn Töpel authored
Silence three checkpatch style warnings. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-4-bjorn.topel@gmail.com
-
Björn Töpel authored
The enums undef and bidi are not used. Remove them. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-3-bjorn.topel@gmail.com
-
Björn Töpel authored
Instead of passing void * all over the place, let us pass the actual type (ifobject) and remove the void-ptr-to-type-ptr casting. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-2-bjorn.topel@gmail.com
-
Björn Töpel authored
Add detection for kernel version, and adapt the BPF program based on kernel support. This way, users will get the best possible performance from the BPF program. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Marek Majtyka <alardam@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/bpf/20210122105351.11751-4-bjorn.topel@gmail.com
-
Björn Töpel authored
Fold xp_assign_dev and __xp_assign_dev. The former directly calls the latter. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/bpf/20210122105351.11751-3-bjorn.topel@gmail.com
-
Björn Töpel authored
The explicit_free parameter of the __xsk_rcv() function was used to mark whether the call was via the generic XDP or the native XDP path. Instead of clutter the code with if-statements and "true/false" parameters which are hard to understand, simply move the explicit free to the __xsk_map_redirect() which is always called from the native XDP path. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/bpf/20210122105351.11751-2-bjorn.topel@gmail.com
-
- 22 Jan, 2021 3 commits
-
-
Hangbin Liu authored
This patch add a xdp program on egress to show that we can modify the packet on egress. In this sample we will set the pkt's src mac to egress's mac address. The xdp_prog will be attached when -X option supplied. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/20210122025007.2968381-1-liuhangbin@gmail.com
-
Tobias Klauser authored
s/bounts/bounds/ Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210121174324.24127-1-tklauser@distanz.ch
-
Tiezhu Yang authored
The current LLVM and Clang build procedure in samples/bpf/README.rst is out of date. See below that the links are not accessible any more. $ git clone http://llvm.org/git/llvm.git Cloning into 'llvm'... fatal: unable to access 'http://llvm.org/git/llvm.git/': Maximum (20) redirects followed $ git clone --depth 1 http://llvm.org/git/clang.git Cloning into 'clang'... fatal: unable to access 'http://llvm.org/git/clang.git/': Maximum (20) redirects followed The LLVM community has adopted new ways to build the compiler. There are different ways to build LLVM and Clang, the Clang Getting Started page [1] has one way. As Yonghong said, it is better to copy the build procedure in Documentation/bpf/bpf_devel_QA.rst to keep consistent. I verified the procedure and it is proved to be feasible, so we should update README.rst to reflect the reality. At the same time, update the related comment in Makefile. Additionally, as Fangrui said, the dir llvm-project/llvm/build/install is not used, BUILD_SHARED_LIBS=OFF is the default option [2], so also change Documentation/bpf/bpf_devel_QA.rst together. At last, we recommend that developers who want the fastest incremental builds use the Ninja build system [1], you can find it in your system's package manager, usually the package is ninja or ninja-build [3], so add ninja to build dependencies suggested by Nathan. [1] https://clang.llvm.org/get_started.html [2] https://www.llvm.org/docs/CMake.html [3] https://github.com/ninja-build/ninja/wiki/Pre-built-Ninja-packagesSigned-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Cc: Fangrui Song <maskray@google.com> Link: https://lore.kernel.org/bpf/1611279584-26047-1-git-send-email-yangtiezhu@loongson.cn
-
- 21 Jan, 2021 4 commits
-
-
Junlin Yang authored
Change 'exeeds' to 'exceeds'. Signed-off-by: Junlin Yang <yangjunlin@yulong.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210121122309.1501-1-angkery@163.com
-
Jiri Olsa authored
For very large ELF objects (with many sections), we could get special value SHN_XINDEX (65535) for elf object's string table index - e_shstrndx. Call elf_getshdrstrndx to get the proper string table index, instead of reading it directly from ELF header. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210121202203.9346-4-jolsa@kernel.org
-
Brendan Jackman authored
Alexei pointed out [1] that this wording is pretty confusing. Here's an attempt to be more explicit and clear. [1] https://lore.kernel.org/bpf/CAADnVQJVvwoZsE1K+6qRxzF7+6CvZNzygnoBW9tZNWJELk5c=Q@mail.gmail.com/Signed-off-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210120133946.2107897-3-jackmanb@google.com
-
Brendan Jackman authored
This fixes up the markup to fix a warning, be more consistent with use of monospace, and use the correct .rst syntax for <em> (* instead of _). Signed-off-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Link: https://lore.kernel.org/bpf/20210120133946.2107897-2-jackmanb@google.com
-
- 20 Jan, 2021 11 commits
-
-
Alexei Starovoitov authored
Stanislav Fomichev says: ==================== First patch adds custom getsockopt for TCP_ZEROCOPY_RECEIVE to remove kmalloc and lock_sock overhead from the dat path. Second patch removes kzalloc/kfree from getsockopt for the common cases. Third patch switches cgroup_bpf_enabled to be per-attach to to add only overhead for the cgroup attach types used on the system. No visible user-side changes. v9: - include linux/tcp.h instead of netinet/tcp.h in sockopt_sk.c - note that v9 depends on the commit 4be34f3d ("bpf: Don't leak memory in bpf getsockopt when optlen == 0") from bpf tree v8: - add bpi.h to tools/include/uapi in the same patch (Martin KaFai Lau) - kmalloc instead of kzalloc when exporting buffer (Martin KaFai Lau) - note that v8 depends on the commit 4be34f3d ("bpf: Don't leak memory in bpf getsockopt when optlen == 0") from bpf tree v7: - add comment about buffer contents for retval != 0 (Martin KaFai Lau) - export tcp.h into tools/include/uapi (Martin KaFai Lau) - note that v7 depends on the commit 4be34f3d ("bpf: Don't leak memory in bpf getsockopt when optlen == 0") from bpf tree v6: - avoid indirect cost for new bpf_bypass_getsockopt (Eric Dumazet) v5: - reorder patches to reduce the churn (Martin KaFai Lau) v4: - update performance numbers - bypass_bpf_getsockopt (Martin KaFai Lau) v3: - remove extra newline, add comment about sizeof tcp_zerocopy_receive (Martin KaFai Lau) - add another patch to remove lock_sock overhead from TCP_ZEROCOPY_RECEIVE; technically, this makes patch #1 obsolete, but I'd still prefer to keep it to help with other socket options v2: - perf numbers for getsockopt kmalloc reduction (Song Liu) - (sk) in BPF_CGROUP_PRE_CONNECT_ENABLED (Song Liu) - 128 -> 64 buffer size, BUILD_BUG_ON (Martin KaFai Lau) ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Stanislav Fomichev authored
When we attach any cgroup hook, the rest (even if unused/unattached) start to contribute small overhead. In particular, the one we want to avoid is __cgroup_bpf_run_filter_skb which does two redirections to get to the cgroup and pushes/pulls skb. Let's split cgroup_bpf_enabled to be per-attach to make sure only used attach types trigger. I've dropped some existing high-level cgroup_bpf_enabled in some places because BPF_PROG_CGROUP_XXX_RUN macros usually have another cgroup_bpf_enabled check. I also had to copy-paste BPF_CGROUP_RUN_SA_PROG_LOCK for GETPEERNAME/GETSOCKNAME because type for cgroup_bpf_enabled[type] has to be constant and known at compile time. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210115163501.805133-4-sdf@google.com
-
Stanislav Fomichev authored
When we attach a bpf program to cgroup/getsockopt any other getsockopt() syscall starts incurring kzalloc/kfree cost. Let add a small buffer on the stack and use it for small (majority) {s,g}etsockopt values. The buffer is small enough to fit into the cache line and cover the majority of simple options (most of them are 4 byte ints). It seems natural to do the same for setsockopt, but it's a bit more involved when the BPF program modifies the data (where we have to kmalloc). The assumption is that for the majority of setsockopt calls (which are doing pure BPF options or apply policy) this will bring some benefit as well. Without this patch (we remove about 1% __kmalloc): 3.38% 0.07% tcp_mmap [kernel.kallsyms] [k] __cgroup_bpf_run_filter_getsockopt | --3.30%--__cgroup_bpf_run_filter_getsockopt | --0.81%--__kmalloc Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210115163501.805133-3-sdf@google.com
-
Stanislav Fomichev authored
Add custom implementation of getsockopt hook for TCP_ZEROCOPY_RECEIVE. We skip generic hooks for TCP_ZEROCOPY_RECEIVE and have a custom call in do_tcp_getsockopt using the on-stack data. This removes 3% overhead for locking/unlocking the socket. Without this patch: 3.38% 0.07% tcp_mmap [kernel.kallsyms] [k] __cgroup_bpf_run_filter_getsockopt | --3.30%--__cgroup_bpf_run_filter_getsockopt | --0.81%--__kmalloc With the patch applied: 0.52% 0.12% tcp_mmap [kernel.kallsyms] [k] __cgroup_bpf_run_filter_getsockopt_kern Note, exporting uapi/tcp.h requires removing netinet/tcp.h from test_progs.h because those headers have confliciting definitions. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210115163501.805133-2-sdf@google.com
-
Yonghong Song authored
llvm patch https://reviews.llvm.org/D84002 permitted to emit empty rodata datasec if the elf .rodata section contains read-only data from local variables. These local variables will be not emitted as BTF_KIND_VARs since llvm converted these local variables as static variables with private linkage without debuginfo types. Such an empty rodata datasec will make skeleton code generation easy since for skeleton a rodata struct will be generated if there is a .rodata elf section. The existence of a rodata btf datasec is also consistent with the existence of a rodata map created by libbpf. The btf with such an empty rodata datasec will fail in the kernel though as kernel will reject a datasec with zero vlen and zero size. For example, for the below code, int sys_enter(void *ctx) { int fmt[6] = {1, 2, 3, 4, 5, 6}; int dst[6]; bpf_probe_read(dst, sizeof(dst), fmt); return 0; } We got the below btf (bpftool btf dump ./test.o): [1] PTR '(anon)' type_id=0 [2] FUNC_PROTO '(anon)' ret_type_id=3 vlen=1 'ctx' type_id=1 [3] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED [4] FUNC 'sys_enter' type_id=2 linkage=global [5] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=SIGNED [6] ARRAY '(anon)' type_id=5 index_type_id=7 nr_elems=4 [7] INT '__ARRAY_SIZE_TYPE__' size=4 bits_offset=0 nr_bits=32 encoding=(none) [8] VAR '_license' type_id=6, linkage=global-alloc [9] DATASEC '.rodata' size=0 vlen=0 [10] DATASEC 'license' size=0 vlen=1 type_id=8 offset=0 size=4 When loading the ./test.o to the kernel with bpftool, we see the following error: libbpf: Error loading BTF: Invalid argument(22) libbpf: magic: 0xeb9f ... [6] ARRAY (anon) type_id=5 index_type_id=7 nr_elems=4 [7] INT __ARRAY_SIZE_TYPE__ size=4 bits_offset=0 nr_bits=32 encoding=(none) [8] VAR _license type_id=6 linkage=1 [9] DATASEC .rodata size=24 vlen=0 vlen == 0 libbpf: Error loading .BTF into kernel: -22. BTF is optional, ignoring. Basically, libbpf changed .rodata datasec size to 24 since elf .rodata section size is 24. The kernel then rejected the BTF since vlen = 0. Note that the above kernel verifier failure can be worked around with changing local variable "fmt" to a static or global, optionally const, variable. This patch permits a datasec with vlen = 0 in kernel. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210119153519.3901963-1-yhs@fb.com
-
Alexei Starovoitov authored
Qais Yousef says: ==================== Changes in v3: * Fix not returning error value correctly in trigger_module_test_write() (Yonghong) * Add Yonghong acked-by to patch 1. Changes in v2: * Fix compilation error. (Andrii) * Make the new test use write() instead of read() (Andrii) Add some missing glue logic to teach bpf about bare tracepoints - tracepoints without any trace event associated with them. Bare tracepoints are declare with DECLARE_TRACE(). Full tracepoints are declare with TRACE_EVENT(). BPF can attach to these tracepoints as RAW_TRACEPOINT() only as there're no events in tracefs created with them. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Qais Yousef authored
Reuse module_attach infrastructure to add a new bare tracepoint to check we can attach to it as a raw tracepoint. Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210119122237.2426878-3-qais.yousef@arm.com
-
Alexei Starovoitov authored
Gary Lin says: ==================== This patch series implements jump padding to x64 jit to cover some corner cases that used to consume more than 20 jit passes and caused failure. v4: - Add the detailed comments about the possible padding bytes - Add the second test case which triggers jmp_cond padding and imm32 nop jmp padding. - Add the new test case as another subprog v3: - Copy the instructions of prologue separately or the size calculation of the first BPF instruction would include the prologue. - Replace WARN_ONCE() with pr_err() and EFAULT - Use MAX_PASSES in the for loop condition check - Remove the "padded" flag from x64_jit_data. For the extra pass of subprogs, padding is always enabled since it won't hurt the images that converge without padding. v2: - Simplify the sample code in the commit description and provide the jit code - Check the expected padding bytes with WARN_ONCE - Move the 'padded' flag to 'struct x64_jit_data' - Remove the EXPECTED_FAIL flag from bpf_fill_maxinsns11() in test_bpf - Add 2 verifier tests ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Qais Yousef authored
Some subsystems only have bare tracepoints (a tracepoint with no associated trace event) to avoid the problem of trace events being an ABI that can't be changed. >From bpf presepective, bare tracepoints are what it calls RAW_TRACEPOINT(). Since bpf assumed there's 1:1 mapping, it relied on hooking to DEFINE_EVENT() macro to create bpf mapping of the tracepoints. Since bare tracepoints use DECLARE_TRACE() to create the tracepoint, bpf had no knowledge about their existence. By teaching bpf_probe.h to parse DECLARE_TRACE() in a similar fashion to DEFINE_EVENT(), bpf can find and attach to the new raw tracepoints. Enabling that comes with the contract that changes to raw tracepoints don't constitute a regression if they break existing bpf programs. We need the ability to continue to morph and modify these raw tracepoints without worrying about any ABI. Update Documentation/bpf/bpf_design_QA.rst to document this contract. Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210119122237.2426878-2-qais.yousef@arm.com
-
Gary Lin authored
There are 3 tests added into verifier's jit tests to trigger x64 jit jump padding. The first test can be represented as the following assembly code: 1: bpf_call bpf_get_prandom_u32 2: if r0 == 1 goto pc+128 3: if r0 == 2 goto pc+128 ... 129: if r0 == 128 goto pc+128 130: goto pc+128 131: goto pc+127 ... 256: goto pc+2 257: goto pc+1 258: r0 = 1 259: ret We first store a random number to r0 and add the corresponding conditional jumps (2~129) to make verifier believe that those jump instructions from 130 to 257 are reachable. When the program is sent to x64 jit, it starts to optimize out the NOP jumps backwards from 257. Since there are 128 such jumps, the program easily reaches 15 passes and triggers jump padding. Here is the x64 jit code of the first test: 0: 0f 1f 44 00 00 nop DWORD PTR [rax+rax*1+0x0] 5: 66 90 xchg ax,ax 7: 55 push rbp 8: 48 89 e5 mov rbp,rsp b: e8 4c 90 75 e3 call 0xffffffffe375905c 10: 48 83 f8 01 cmp rax,0x1 14: 0f 84 fe 04 00 00 je 0x518 1a: 48 83 f8 02 cmp rax,0x2 1e: 0f 84 f9 04 00 00 je 0x51d ... f6: 48 83 f8 18 cmp rax,0x18 fa: 0f 84 8b 04 00 00 je 0x58b 100: 48 83 f8 19 cmp rax,0x19 104: 0f 84 86 04 00 00 je 0x590 10a: 48 83 f8 1a cmp rax,0x1a 10e: 0f 84 81 04 00 00 je 0x595 ... 500: 0f 84 83 01 00 00 je 0x689 506: 48 81 f8 80 00 00 00 cmp rax,0x80 50d: 0f 84 76 01 00 00 je 0x689 513: e9 71 01 00 00 jmp 0x689 518: e9 6c 01 00 00 jmp 0x689 ... 5fe: e9 86 00 00 00 jmp 0x689 603: e9 81 00 00 00 jmp 0x689 608: 0f 1f 00 nop DWORD PTR [rax] 60b: eb 7c jmp 0x689 60d: eb 7a jmp 0x689 ... 683: eb 04 jmp 0x689 685: eb 02 jmp 0x689 687: 66 90 xchg ax,ax 689: b8 01 00 00 00 mov eax,0x1 68e: c9 leave 68f: c3 ret As expected, a 3 bytes NOPs is inserted at 608 due to the transition from imm32 jmp to imm8 jmp. A 2 bytes NOPs is also inserted at 687 to replace a NOP jump. The second test case is tricky. Here is the assembly code: 1: bpf_call bpf_get_prandom_u32 2: if r0 == 1 goto pc+2048 3: if r0 == 2 goto pc+2048 ... 2049: if r0 == 2048 goto pc+2048 2050: goto pc+2048 2051: goto pc+16 2052: goto pc+15 ... 2064: goto pc+3 2065: goto pc+2 2066: goto pc+1 ... [repeat "goto pc+16".."goto pc+1" 127 times] ... 4099: r0 = 2 4100: ret There are 4 major parts of the program. 1) 1~2049: Those are instructions to make 2050~4098 reachable. Some of them also could generate the padding for jmp_cond. 2) 2050: This is the target instruction for the imm32 nop jmp padding. 3) 2051~4098: The repeated "goto 1~16" instructions are designed to be consumed by the nop jmp optimization. In the end, those instrucitons become 128 continuous 0 offset jmp and are optimized out in 1 pass, and this make insn 2050 an imm32 nop jmp in the next pass, so that we can trigger the 5 bytes padding. 4) 4099~4100: Those are the instructions to end the program. The x64 jit code is like this: 0: 0f 1f 44 00 00 nop DWORD PTR [rax+rax*1+0x0] 5: 66 90 xchg ax,ax 7: 55 push rbp 8: 48 89 e5 mov rbp,rsp b: e8 bc 7b d5 d3 call 0xffffffffd3d57bcc 10: 48 83 f8 01 cmp rax,0x1 14: 0f 84 7e 66 00 00 je 0x6698 1a: 48 83 f8 02 cmp rax,0x2 1e: 0f 84 74 66 00 00 je 0x6698 24: 48 83 f8 03 cmp rax,0x3 28: 0f 84 6a 66 00 00 je 0x6698 2e: 48 83 f8 04 cmp rax,0x4 32: 0f 84 60 66 00 00 je 0x6698 38: 48 83 f8 05 cmp rax,0x5 3c: 0f 84 56 66 00 00 je 0x6698 42: 48 83 f8 06 cmp rax,0x6 46: 0f 84 4c 66 00 00 je 0x6698 ... 666c: 48 81 f8 fe 07 00 00 cmp rax,0x7fe 6673: 0f 1f 40 00 nop DWORD PTR [rax+0x0] 6677: 74 1f je 0x6698 6679: 48 81 f8 ff 07 00 00 cmp rax,0x7ff 6680: 0f 1f 40 00 nop DWORD PTR [rax+0x0] 6684: 74 12 je 0x6698 6686: 48 81 f8 00 08 00 00 cmp rax,0x800 668d: 0f 1f 40 00 nop DWORD PTR [rax+0x0] 6691: 74 05 je 0x6698 6693: 0f 1f 44 00 00 nop DWORD PTR [rax+rax*1+0x0] 6698: b8 02 00 00 00 mov eax,0x2 669d: c9 leave 669e: c3 ret Since insn 2051~4098 are optimized out right before the padding pass, there are several conditional jumps from the first part are replaced with imm8 jmp_cond, and this triggers the 4 bytes padding, for example at 6673, 6680, and 668d. On the other hand, Insn 2050 is replaced with the 5 bytes nops at 6693. The third test is to invoke the first and second tests as subprogs to test bpf2bpf. Per the system log, there was one more jit happened with only one pass and the same jit code was produced. v4: - Add the second test case which triggers jmp_cond padding and imm32 nop jmp padding. - Add the new test case as another subprog Signed-off-by: Gary Lin <glin@suse.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210119102501.511-4-glin@suse.com
-
Gary Lin authored
With NOPs padding, x64 jit now can handle the jump cases like bpf_fill_maxinsns11(). Signed-off-by: Gary Lin <glin@suse.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210119102501.511-3-glin@suse.com
-