- 02 Feb, 2021 1 commit
-
-
Tiezhu Yang authored
There exists many build errors when make M=samples/bpf on the Loongson platform. This issue is MIPS related, x86 compiles just fine. Here are some errors: CLANG-bpf samples/bpf/sockex2_kern.o In file included from samples/bpf/sockex2_kern.c:2: In file included from ./include/uapi/linux/in.h:24: In file included from ./include/linux/socket.h:8: In file included from ./include/linux/uio.h:8: In file included from ./include/linux/kernel.h:11: In file included from ./include/linux/bitops.h:32: In file included from ./arch/mips/include/asm/bitops.h:19: In file included from ./arch/mips/include/asm/barrier.h:11: ./arch/mips/include/asm/addrspace.h:13:10: fatal error: 'spaces.h' file not found ^~~~~~~~~~ 1 error generated. CLANG-bpf samples/bpf/sockex2_kern.o In file included from samples/bpf/sockex2_kern.c:2: In file included from ./include/uapi/linux/in.h:24: In file included from ./include/linux/socket.h:8: In file included from ./include/linux/uio.h:8: In file included from ./include/linux/kernel.h:11: In file included from ./include/linux/bitops.h:32: In file included from ./arch/mips/include/asm/bitops.h:22: In file included from ./arch/mips/include/asm/cpu-features.h:13: In file included from ./arch/mips/include/asm/cpu-info.h:15: In file included from ./include/linux/cache.h:6: ./arch/mips/include/asm/cache.h:12:10: fatal error: 'kmalloc.h' file not found ^~~~~~~~~~~ 1 error generated. CLANG-bpf samples/bpf/sockex2_kern.o In file included from samples/bpf/sockex2_kern.c:2: In file included from ./include/uapi/linux/in.h:24: In file included from ./include/linux/socket.h:8: In file included from ./include/linux/uio.h:8: In file included from ./include/linux/kernel.h:11: In file included from ./include/linux/bitops.h:32: In file included from ./arch/mips/include/asm/bitops.h:22: ./arch/mips/include/asm/cpu-features.h:15:10: fatal error: 'cpu-feature-overrides.h' file not found ^~~~~~~~~~~~~~~~~~~~~~~~~ 1 error generated. $ find arch/mips/include/asm -name spaces.h | sort arch/mips/include/asm/mach-ar7/spaces.h ... arch/mips/include/asm/mach-generic/spaces.h ... arch/mips/include/asm/mach-loongson64/spaces.h ... arch/mips/include/asm/mach-tx49xx/spaces.h $ find arch/mips/include/asm -name kmalloc.h | sort arch/mips/include/asm/mach-generic/kmalloc.h arch/mips/include/asm/mach-ip32/kmalloc.h arch/mips/include/asm/mach-tx49xx/kmalloc.h $ find arch/mips/include/asm -name cpu-feature-overrides.h | sort arch/mips/include/asm/mach-ath25/cpu-feature-overrides.h ... arch/mips/include/asm/mach-generic/cpu-feature-overrides.h ... arch/mips/include/asm/mach-loongson64/cpu-feature-overrides.h ... arch/mips/include/asm/mach-tx49xx/cpu-feature-overrides.h In the arch/mips/Makefile, there exists the following board-dependent options: include arch/mips/Kbuild.platforms cflags-y += -I$(srctree)/arch/mips/include/asm/mach-generic So we can do the similar things in samples/bpf/Makefile, just add platform specific and generic include dir for MIPS Loongson64 to fix the build errors. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/1611669925-25315-1-git-send-email-yangtiezhu@loongson.cn
-
- 29 Jan, 2021 6 commits
-
-
Tobias Klauser authored
!perfmon_capable() is checked before the last switch(func_id) in bpf_base_func_proto. Thus, the cases BPF_FUNC_trace_printk and BPF_FUNC_snprintf_btf can be moved to that last switch(func_id) to omit the inline !perfmon_capable() checks. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210127174615.3038-1-tklauser@distanz.ch
-
Stanislav Fomichev authored
Those hooks run as BPF_CGROUP_RUN_SA_PROG_LOCK and operate on a locked socket. Note that we could remove the switch for prog->expected_attach_type altogether since all current sock_addr attach types are covered. However, it makes sense to keep it as a safe-guard in case new sock_addr attach types are added that might not operate on a locked socket. Therefore, avoid to let this slip through. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210127232853.3753823-5-sdf@google.com
-
Stanislav Fomichev authored
I'll extend them in the next patch. It's easier to work with C than with asm. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210127232853.3753823-4-sdf@google.com
-
Stanislav Fomichev authored
Those hooks run as BPF_CGROUP_RUN_SA_PROG_LOCK and operate on a locked socket. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210127232853.3753823-3-sdf@google.com
-
Stanislav Fomichev authored
Can be used to query/modify socket state for unconnected UDP sendmsg. Those hooks run as BPF_CGROUP_RUN_SA_PROG_LOCK and operate on a locked socket. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210127232853.3753823-2-sdf@google.com
-
Sedat Dilek authored
When dealing with BPF/BTF/pahole and DWARF v5 I wanted to build bpftool. While looking into the source code I found duplicate assignments in misc tools for the LLVM eco system, e.g. clang and llvm-objcopy. Move the Clang, LLC and/or LLVM utils definitions to tools/scripts/Makefile.include file and add missing includes where needed. Honestly, I was inspired by the commit c8a950d0 ("tools: Factor HOSTCC, HOSTLD, HOSTAR definitions"). I tested with bpftool and perf on Debian/testing AMD64 and LLVM/Clang v11.1.0-rc1. Build instructions: [ make and make-options ] MAKE="make V=1" MAKE_OPTS="HOSTCC=clang HOSTCXX=clang++ HOSTLD=ld.lld CC=clang LD=ld.lld LLVM=1 LLVM_IAS=1" MAKE_OPTS="$MAKE_OPTS PAHOLE=/opt/pahole/bin/pahole" [ clean-up ] $MAKE $MAKE_OPTS -C tools/ clean [ bpftool ] $MAKE $MAKE_OPTS -C tools/bpf/bpftool/ [ perf ] PYTHON=python3 $MAKE $MAKE_OPTS -C tools/perf/ I was careful with respecting the user's wish to override custom compiler, linker, GNU/binutils and/or LLVM utils settings. Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> # tools/build and tools/perf Link: https://lore.kernel.org/bpf/20210128015117.20515-1-sedat.dilek@gmail.com
-
- 28 Jan, 2021 2 commits
-
-
Stanislav Fomichev authored
Return 3 to indicate that permission check for port 111 should be skipped. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210127193140.3170382-2-sdf@google.com
-
Stanislav Fomichev authored
At the moment, BPF_CGROUP_INET{4,6}_BIND hooks can rewrite user_port to the privileged ones (< ip_unprivileged_port_start), but it will be rejected later on in the __inet_bind or __inet6_bind. Let's add another return value to indicate that CAP_NET_BIND_SERVICE check should be ignored. Use the same idea as we currently use in cgroup/egress where bit #1 indicates CN. Instead, for cgroup/bind{4,6}, bit #1 indicates that CAP_NET_BIND_SERVICE should be bypassed. v5: - rename flags to be less confusing (Andrey Ignatov) - rework BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY to work on flags and accept BPF_RET_SET_CN (no behavioral changes) v4: - Add missing IPv6 support (Martin KaFai Lau) v3: - Update description (Martin KaFai Lau) - Fix capability restore in selftest (Martin KaFai Lau) v2: - Switch to explicit return code (Martin KaFai Lau) Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Andrey Ignatov <rdna@fb.com> Link: https://lore.kernel.org/bpf/20210127193140.3170382-1-sdf@google.com
-
- 27 Jan, 2021 2 commits
-
-
Cong Wang authored
sk_psock_destroy() is a RCU callback, I can't see any reason why it could be used outside. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jakub Sitnicki <jakub@cloudflare.com> Cc: Lorenz Bauer <lmb@cloudflare.com> Link: https://lore.kernel.org/bpf/20210127221501.46866-1-xiyou.wangcong@gmail.com
-
Menglong Dong authored
This 'BPF_ADD' is duplicated, and I belive it should be 'BPF_AND'. Fixes: 981f94c3 ("bpf: Add bitwise atomic instructions") Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Brendan Jackman <jackmanb@google.com> Link: https://lore.kernel.org/bpf/20210127022507.23674-1-dong.menglong@zte.com.cn
-
- 26 Jan, 2021 1 commit
-
-
Andrii Nakryiko authored
Fix bug in handling bpf_testmod unloading that will cause test_progs exiting prematurely if bpf_testmod unloading failed. This is especially problematic when running a subset of test_progs that doesn't require root permissions and doesn't rely on bpf_testmod, yet will fail immediately due to exit(1) in unload_bpf_testmod(). Fixes: 9f7fa225 ("selftests/bpf: Add bpf_testmod kernel module for testing") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210126065019.1268027-1-andrii@kernel.org
-
- 25 Jan, 2021 17 commits
-
-
Tiezhu Yang authored
There exists many build warnings when make M=samples/bpf on the Loongson platform, this issue is MIPS related, x86 compiles just fine. Here are some warnings: CC samples/bpf/ibumad_user.o samples/bpf/ibumad_user.c: In function ‘dump_counts’: samples/bpf/ibumad_user.c:46:24: warning: format ‘%llu’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘__u64’ {aka ‘long unsigned int’} [-Wformat=] printf("0x%02x : %llu\n", key, value); ~~~^ ~~~~~ %lu CC samples/bpf/offwaketime_user.o samples/bpf/offwaketime_user.c: In function ‘print_ksym’: samples/bpf/offwaketime_user.c:34:17: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘__u64’ {aka ‘long unsigned int’} [-Wformat=] printf("%s/%llx;", sym->name, addr); ~~~^ ~~~~ %lx samples/bpf/offwaketime_user.c: In function ‘print_stack’: samples/bpf/offwaketime_user.c:68:17: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 3 has type ‘__u64’ {aka ‘long unsigned int’} [-Wformat=] printf(";%s %lld\n", key->waker, count); ~~~^ ~~~~~ %ld MIPS needs __SANE_USERSPACE_TYPES__ before <linux/types.h> to select 'int-ll64.h' in arch/mips/include/uapi/asm/types.h, then it can avoid build warnings when printing __u64 with %llu, %llx or %lld. The header tools/include/linux/types.h defines __SANE_USERSPACE_TYPES__, it seems that we can include <linux/types.h> in the source files which have build warnings, but it has no effect due to actually it includes usr/include/linux/types.h instead of tools/include/linux/types.h, the problem is that "usr/include" is preferred first than "tools/include" in samples/bpf/Makefile, that sounds like a ugly hack to -Itools/include before -Iusr/include. So define __SANE_USERSPACE_TYPES__ for MIPS in samples/bpf/Makefile is proper, if add "TPROGS_CFLAGS += -D__SANE_USERSPACE_TYPES__" in samples/bpf/Makefile, it appears the following error: Auto-detecting system features: ... libelf: [ on ] ... zlib: [ on ] ... bpf: [ OFF ] BPF API too old make[3]: *** [Makefile:293: bpfdep] Error 1 make[2]: *** [Makefile:156: all] Error 2 With #ifndef __SANE_USERSPACE_TYPES__ in tools/include/linux/types.h, the above error has gone and this ifndef change does not hurt other compilations. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/1611551146-14052-1-git-send-email-yangtiezhu@loongson.cn
-
Florian Lehner authored
Update struct bpf_perf_event_data with the addr field to match the tools headers with the kernel headers. Signed-off-by: Florian Lehner <dev@der-flo.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210123185221.23946-1-dev@der-flo.net
-
Björn Töpel authored
There is no need to cast to void * when the argument is void *. Avoid cluttering of code. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-13-bjorn.topel@gmail.com
-
Björn Töpel authored
Use calloc instead of malloc where it makes sense, and avoid C++-style void *-cast. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-12-bjorn.topel@gmail.com
-
Björn Töpel authored
The data variable is only used locally. Instead of using the heap, stick to using the stack. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-11-bjorn.topel@gmail.com
-
Björn Töpel authored
Use C89 rules for variable definition. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-10-bjorn.topel@gmail.com
-
Björn Töpel authored
Instead of casting from void *, let us use the actual type in gen_udp_hdr(). Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-9-bjorn.topel@gmail.com
-
Björn Töpel authored
Instead of casting from void *, let us use the actual type in init_iface_config(). Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-8-bjorn.topel@gmail.com
-
Björn Töpel authored
Let us use a local variable in nsswitchthread(), so we can remove a lot of casting for better readability. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-7-bjorn.topel@gmail.com
-
Björn Töpel authored
Introduce a local variable to get rid of lot of casting. Move common code outside the if/else-clause. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-6-bjorn.topel@gmail.com
-
Björn Töpel authored
The allocated entry is immediately overwritten by an assignment. Fix that. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-5-bjorn.topel@gmail.com
-
Björn Töpel authored
Silence three checkpatch style warnings. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-4-bjorn.topel@gmail.com
-
Björn Töpel authored
The enums undef and bidi are not used. Remove them. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-3-bjorn.topel@gmail.com
-
Björn Töpel authored
Instead of passing void * all over the place, let us pass the actual type (ifobject) and remove the void-ptr-to-type-ptr casting. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-2-bjorn.topel@gmail.com
-
Björn Töpel authored
Add detection for kernel version, and adapt the BPF program based on kernel support. This way, users will get the best possible performance from the BPF program. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Marek Majtyka <alardam@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/bpf/20210122105351.11751-4-bjorn.topel@gmail.com
-
Björn Töpel authored
Fold xp_assign_dev and __xp_assign_dev. The former directly calls the latter. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/bpf/20210122105351.11751-3-bjorn.topel@gmail.com
-
Björn Töpel authored
The explicit_free parameter of the __xsk_rcv() function was used to mark whether the call was via the generic XDP or the native XDP path. Instead of clutter the code with if-statements and "true/false" parameters which are hard to understand, simply move the explicit free to the __xsk_map_redirect() which is always called from the native XDP path. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/bpf/20210122105351.11751-2-bjorn.topel@gmail.com
-
- 22 Jan, 2021 3 commits
-
-
Hangbin Liu authored
This patch add a xdp program on egress to show that we can modify the packet on egress. In this sample we will set the pkt's src mac to egress's mac address. The xdp_prog will be attached when -X option supplied. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/20210122025007.2968381-1-liuhangbin@gmail.com
-
Tobias Klauser authored
s/bounts/bounds/ Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210121174324.24127-1-tklauser@distanz.ch
-
Tiezhu Yang authored
The current LLVM and Clang build procedure in samples/bpf/README.rst is out of date. See below that the links are not accessible any more. $ git clone http://llvm.org/git/llvm.git Cloning into 'llvm'... fatal: unable to access 'http://llvm.org/git/llvm.git/': Maximum (20) redirects followed $ git clone --depth 1 http://llvm.org/git/clang.git Cloning into 'clang'... fatal: unable to access 'http://llvm.org/git/clang.git/': Maximum (20) redirects followed The LLVM community has adopted new ways to build the compiler. There are different ways to build LLVM and Clang, the Clang Getting Started page [1] has one way. As Yonghong said, it is better to copy the build procedure in Documentation/bpf/bpf_devel_QA.rst to keep consistent. I verified the procedure and it is proved to be feasible, so we should update README.rst to reflect the reality. At the same time, update the related comment in Makefile. Additionally, as Fangrui said, the dir llvm-project/llvm/build/install is not used, BUILD_SHARED_LIBS=OFF is the default option [2], so also change Documentation/bpf/bpf_devel_QA.rst together. At last, we recommend that developers who want the fastest incremental builds use the Ninja build system [1], you can find it in your system's package manager, usually the package is ninja or ninja-build [3], so add ninja to build dependencies suggested by Nathan. [1] https://clang.llvm.org/get_started.html [2] https://www.llvm.org/docs/CMake.html [3] https://github.com/ninja-build/ninja/wiki/Pre-built-Ninja-packagesSigned-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Cc: Fangrui Song <maskray@google.com> Link: https://lore.kernel.org/bpf/1611279584-26047-1-git-send-email-yangtiezhu@loongson.cn
-
- 21 Jan, 2021 4 commits
-
-
Junlin Yang authored
Change 'exeeds' to 'exceeds'. Signed-off-by: Junlin Yang <yangjunlin@yulong.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210121122309.1501-1-angkery@163.com
-
Jiri Olsa authored
For very large ELF objects (with many sections), we could get special value SHN_XINDEX (65535) for elf object's string table index - e_shstrndx. Call elf_getshdrstrndx to get the proper string table index, instead of reading it directly from ELF header. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210121202203.9346-4-jolsa@kernel.org
-
Brendan Jackman authored
Alexei pointed out [1] that this wording is pretty confusing. Here's an attempt to be more explicit and clear. [1] https://lore.kernel.org/bpf/CAADnVQJVvwoZsE1K+6qRxzF7+6CvZNzygnoBW9tZNWJELk5c=Q@mail.gmail.com/Signed-off-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210120133946.2107897-3-jackmanb@google.com
-
Brendan Jackman authored
This fixes up the markup to fix a warning, be more consistent with use of monospace, and use the correct .rst syntax for <em> (* instead of _). Signed-off-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Link: https://lore.kernel.org/bpf/20210120133946.2107897-2-jackmanb@google.com
-
- 20 Jan, 2021 4 commits
-
-
Alexei Starovoitov authored
Stanislav Fomichev says: ==================== First patch adds custom getsockopt for TCP_ZEROCOPY_RECEIVE to remove kmalloc and lock_sock overhead from the dat path. Second patch removes kzalloc/kfree from getsockopt for the common cases. Third patch switches cgroup_bpf_enabled to be per-attach to to add only overhead for the cgroup attach types used on the system. No visible user-side changes. v9: - include linux/tcp.h instead of netinet/tcp.h in sockopt_sk.c - note that v9 depends on the commit 4be34f3d ("bpf: Don't leak memory in bpf getsockopt when optlen == 0") from bpf tree v8: - add bpi.h to tools/include/uapi in the same patch (Martin KaFai Lau) - kmalloc instead of kzalloc when exporting buffer (Martin KaFai Lau) - note that v8 depends on the commit 4be34f3d ("bpf: Don't leak memory in bpf getsockopt when optlen == 0") from bpf tree v7: - add comment about buffer contents for retval != 0 (Martin KaFai Lau) - export tcp.h into tools/include/uapi (Martin KaFai Lau) - note that v7 depends on the commit 4be34f3d ("bpf: Don't leak memory in bpf getsockopt when optlen == 0") from bpf tree v6: - avoid indirect cost for new bpf_bypass_getsockopt (Eric Dumazet) v5: - reorder patches to reduce the churn (Martin KaFai Lau) v4: - update performance numbers - bypass_bpf_getsockopt (Martin KaFai Lau) v3: - remove extra newline, add comment about sizeof tcp_zerocopy_receive (Martin KaFai Lau) - add another patch to remove lock_sock overhead from TCP_ZEROCOPY_RECEIVE; technically, this makes patch #1 obsolete, but I'd still prefer to keep it to help with other socket options v2: - perf numbers for getsockopt kmalloc reduction (Song Liu) - (sk) in BPF_CGROUP_PRE_CONNECT_ENABLED (Song Liu) - 128 -> 64 buffer size, BUILD_BUG_ON (Martin KaFai Lau) ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Stanislav Fomichev authored
When we attach any cgroup hook, the rest (even if unused/unattached) start to contribute small overhead. In particular, the one we want to avoid is __cgroup_bpf_run_filter_skb which does two redirections to get to the cgroup and pushes/pulls skb. Let's split cgroup_bpf_enabled to be per-attach to make sure only used attach types trigger. I've dropped some existing high-level cgroup_bpf_enabled in some places because BPF_PROG_CGROUP_XXX_RUN macros usually have another cgroup_bpf_enabled check. I also had to copy-paste BPF_CGROUP_RUN_SA_PROG_LOCK for GETPEERNAME/GETSOCKNAME because type for cgroup_bpf_enabled[type] has to be constant and known at compile time. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210115163501.805133-4-sdf@google.com
-
Stanislav Fomichev authored
When we attach a bpf program to cgroup/getsockopt any other getsockopt() syscall starts incurring kzalloc/kfree cost. Let add a small buffer on the stack and use it for small (majority) {s,g}etsockopt values. The buffer is small enough to fit into the cache line and cover the majority of simple options (most of them are 4 byte ints). It seems natural to do the same for setsockopt, but it's a bit more involved when the BPF program modifies the data (where we have to kmalloc). The assumption is that for the majority of setsockopt calls (which are doing pure BPF options or apply policy) this will bring some benefit as well. Without this patch (we remove about 1% __kmalloc): 3.38% 0.07% tcp_mmap [kernel.kallsyms] [k] __cgroup_bpf_run_filter_getsockopt | --3.30%--__cgroup_bpf_run_filter_getsockopt | --0.81%--__kmalloc Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210115163501.805133-3-sdf@google.com
-
Stanislav Fomichev authored
Add custom implementation of getsockopt hook for TCP_ZEROCOPY_RECEIVE. We skip generic hooks for TCP_ZEROCOPY_RECEIVE and have a custom call in do_tcp_getsockopt using the on-stack data. This removes 3% overhead for locking/unlocking the socket. Without this patch: 3.38% 0.07% tcp_mmap [kernel.kallsyms] [k] __cgroup_bpf_run_filter_getsockopt | --3.30%--__cgroup_bpf_run_filter_getsockopt | --0.81%--__kmalloc With the patch applied: 0.52% 0.12% tcp_mmap [kernel.kallsyms] [k] __cgroup_bpf_run_filter_getsockopt_kern Note, exporting uapi/tcp.h requires removing netinet/tcp.h from test_progs.h because those headers have confliciting definitions. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210115163501.805133-2-sdf@google.com
-