- 17 Jul, 2021 3 commits
-
-
Alan Maguire authored
By using the stack for this small structure, we avoid the need for freeing memory in error paths. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1626475617-25984-4-git-send-email-alan.maguire@oracle.com
-
Alan Maguire authored
__s64 can be defined as either long or long long, depending on the architecture. On ppc64le it's defined as long, giving this error: In file included from btf_dump.c:22: btf_dump.c: In function 'btf_dump_type_data_check_overflow': libbpf_internal.h:111:22: error: format '%lld' expects argument of type 'long long int', but argument 3 has type '__s64' {aka 'long int'} [-Werror=format=] 111 | libbpf_print(level, "libbpf: " fmt, ##__VA_ARGS__); \ | ^~~~~~~~~~ libbpf_internal.h:114:27: note: in expansion of macro '__pr' 114 | #define pr_warn(fmt, ...) __pr(LIBBPF_WARN, fmt, ##__VA_ARGS__) | ^~~~ btf_dump.c:1992:3: note: in expansion of macro 'pr_warn' 1992 | pr_warn("unexpected size [%lld] for id [%u]\n", | ^~~~~~~ btf_dump.c:1992:32: note: format string is defined here 1992 | pr_warn("unexpected size [%lld] for id [%u]\n", | ~~~^ | | | long long int | %ld Cast to size_t and use %zu instead. Reported-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1626475617-25984-3-git-send-email-alan.maguire@oracle.com
-
Alan Maguire authored
If data is packed, data structures can store it outside of usual boundaries. For example a 4-byte int can be stored on a unaligned boundary in a case like this: struct s { char f1; int f2; } __attribute((packed)); ...the int is stored at an offset of one byte. Some platforms have problems dereferencing data that is not aligned with its size, and code exists to handle most cases of this for BTF typed data display. However pointer display was missed, and a simple function to test if "ptr_is_aligned(data, data_sz)" would help clarify this code. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1626475617-25984-2-git-send-email-alan.maguire@oracle.com
-
- 16 Jul, 2021 37 commits
-
-
Andrii Nakryiko authored
Alan Maguire says: ==================== Add a libbpf dumper function that supports dumping a representation of data passed in using the BTF id associated with the data in a manner similar to the bpf_snprintf_btf helper. Default output format is identical to that dumped by bpf_snprintf_btf() (bar using tabs instead of spaces for indentation, but the indent string can be customized also); for example, a "struct sk_buff" representation would look like this: (struct sk_buff){ (union){ (struct){ .next = (struct sk_buff *)0xffffffffffffffff, .prev = (struct sk_buff *)0xffffffffffffffff, (union){ .dev = (struct net_device *)0xffffffffffffffff, .dev_scratch = (long unsigned int)18446744073709551615, }, }, ... Patch 1 implements the dump functionality in a manner similar to that in kernel/bpf/btf.c, but with a view to fitting into libbpf more naturally. For example, rather than using flags, boolean dump options are used to control output. In addition, rather than combining checks for display (such as is this field zero?) and actual display - as is done for the kernel code - the code is organized to separate zero and overflow checks from type display. Patch 2 adds ASSERT_STRNEQ() for use in the following BTF dumper tests. Patch 3 consists of selftests that utilize a dump printf function to snprintf the dump output to a string for comparison with expected output. Tests deliberately mirror those in snprintf_btf helper test to keep output consistent, but also cover overflow handling, var/section display. Changes since v5 [1] - readjust dump options to avoid unnecessary padding (Andrii, patch 1). - tidied up bitfield data checking/retrieval using Andrii's suggestions. Removed code where we adjust data pointer prior to calling bitfield functions as this adjustment is not needed, provided we use the type size as the number of bytes to iterate over when retrieving the full value we apply bit shifting operations to retrieve the bitfield value. With these chances, the *_int_bits() functions were no longer needed (Andrii, patch 1). - coalesced the "is zero" checking for ints, floats and pointers into btf_dump_base_type_check_zero(), using a memcmp() of the size of the data. This can be derived from t->size for ints and floats, and pointer size is retrieved from dump's ptr_sz field (Andrii, patch 1). - Added alignment-aware handling for int, enum, float retrieval. Packed data structures can force ints, enums and floats to be aligned on different boundaries; for example, the struct p { char f1; int f2; } __attribute__((packed)); ...will have the int f2 field offset at byte 1, rather than at byte 4 for an unpacked structure. The problem is directly dereferencing that as an int is problematic on some platforms. For ints and enums, we can reuse bitfield retrieval to get the value for display, while for floats we use a local union of the floating-point types and memcpy into it, ensuring we can then dereference pointers into that union which will have safe alignment (Andrii, patch 1). - added comments to explain why we increment depth prior to displaying opening parens, and decrement it prior to displaying closing parens for structs, unions and arrays. The reason is that we don't want to have a trailing newline when displaying a type. The logic that handles this says "don't show a newline when the depth we're at is 0". For this to work for opening parens then we need to bump depth before showing opening parens + newline, and when we close out structure we need to show closing parens after reducing depth so that we don't append a newline to a top-level structure. So as a result we have struct foo {\n struct bar {\n }\n } - silently truncate provided indent string with strncat() if > 31 bytes (Andrii, patch 1). - fixed ASSERT_STRNEQ() macro to show only n bytes of string (Andrii, patch 2). - fixed strncat() of type data string to avoid stack corruption (Andrii, patch 3). - removed early returns from dump type tests (Andrii, patch 3). - have tests explicitly specify prefix (enum, struct, union) (Andrii, patch 3). - switch from CHECK() to ASSERT_* where possible (Andrii, patch 3). Changes since v4 [2] - Andrii kindly provided code to unify emitting a prepended cast (for example "(int)") with existing code, and this had the nice benefit of adding array indices in type specifications (Andrii, patches 1, 3) - Fixed indent_str option to make it a const char *, stored in a fixed-length buffer internally (Andrii, patch 1) - Reworked bit shift logic to minimize endian-specific interactions, and use same macros as found elsewhere in libbpf to determine endianness (Andrii, patch 1) - Fixed type emitting to ensure that a trailing '\n' is not displayed; newlines are added during struct/array display, but for a single type the last character is no longer a newline (Andrii, patches 1, 3) - Added support for ASSERT_STRNEQ() macro (Andrii, patch 2) - Split tests into subtests for int, char, enum etc rather than one "dump type data" subtest (Andrii, patch 3) - Made better use of ASSERT* macros (Andrii, patch 3) - Got rid of some other TEST_* macros that were unneeded (Andrii, patch 3) - Switched to using "struct fs_context" to verify enum bitfield values (Andrii, patch 3) Changes since v3 [3] - Retained separation of emitting of type name cast prefixing type values from existing functionality such as btf_dump_emit_type_chain() since initial code-shared version had so many exceptions it became hard to read. For example, we don't emit a type name if the type to be displayed is an array member, we also always emit "forward" definitions for structs/unions that aren't really forward definitions (we just want a "struct foo" output for "(struct foo){.bar = ...". We also always ignore modifiers const/volatile/restrict as they clutter output when emitting large types. - Added configurable 4-char indent string option; defaults to tab (Andrii) - Added support for BTF_KIND_FLOAT and associated tests (Andrii) - Added support for BTF_KIND_FUNC_PROTO function pointers to improve output of "ops" structures; for example: (struct file_operations){ .owner = (struct module *)0xffffffffffffffff, .llseek = (loff_t(*)(struct file *, loff_t, int))0xffffffffffffffff, ... Added associated test also (Andrii) - Added handling for enum bitfields and associated test (Andrii) - Allocation of "struct btf_dump_data" done on-demand (Andrii) - Removed ".field = " output from function emitting type name and into caller (Andrii) - Removed BTF_INT_OFFSET() support (Andrii) - Use libbpf_err() to set errno for error cases (Andrii) - btf_dump_dump_type_data() returns size written, which is used when returning successfully from btf_dump__dump_type_data() (Andrii) Changes since v2 [4] - Renamed function to btf_dump__dump_type_data, reorganized arguments such that opts are last (Andrii) - Modified code to separate questions about display such as have we overflowed?/is this field zero? from actual display of typed data, such that we ask those questions separately from the code that actually displays typed data (Andrii) - Reworked code to handle overflow - where we do not provide enough data for the type we wish to display - by returning -E2BIG and attempting to present as much data as possible. Such a mode of operation allows for tracers which retrieve partial data (such as first 1024 bytes of a "struct task_struct" say), and want to display that partial data, while also knowing that it is not the full type. Such tracers can then denote this (perhaps via "..." or similar). - Explored reusing existing type emit functions, such as passing in a type id stack with a single type id to btf_dump_emit_type_chain() to support the display of typed data where a "cast" is prepended to the data to denote its type; "(int)1", "(struct foo){", etc. However the task of emitting a ".field_name = (typecast)" did not match well with model of walking the stack to display innermost types first and made the resultant code harder to read. Added a dedicated btf_dump_emit_type_name() function instead which is only ~70 lines (Andrii) - Various cleanups around bitfield macros, unneeded member iteration macros, avoiding compiler complaints when displaying int da ta by casting to long long, etc (Andrii) - Use DECLARE_LIBBPF_OPTS() in defining opts for tests (Andrii) - Added more type tests, overflow tests, var tests and section tests. Changes since RFC [5] - The initial approach explored was to share the kernel code with libbpf using #defines to paper over the different needs; however it makes more sense to try and fit in with libbpf code style for maintenance. A comment in the code points at the implementation in kernel/bpf/btf.c and notes that any issues found in it should be fixed there or vice versa; mirroring the tests should help with this also (Andrii) [1] https://lore.kernel.org/bpf/1624092968-5598-1-git-send-email-alan.maguire@oracle.com/ [2] https://lore.kernel.org/bpf/CAEf4BzYtbnphCkhz0epMKE4zWfvSOiMpu+-SXp9hadsrRApuZw@mail.gmail.com/T/ [3] https://lore.kernel.org/bpf/1622131170-8260-1-git-send-email-alan.maguire@oracle.com/ [4] https://lore.kernel.org/bpf/1610921764-7526-1-git-send-email-alan.maguire@oracle.com/ [5] https://lore.kernel.org/bpf/1610386373-24162-1-git-send-email-alan.maguire@oracle.com/ ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
-
Alan Maguire authored
Test various type data dumping operations by comparing expected format with the dumped string; an snprintf-style printf function is used to record the string dumped. Also verify overflow handling where the data passed does not cover the full size of a type, such as would occur if a tracer has a portion of the 8k "struct task_struct". Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1626362126-27775-4-git-send-email-alan.maguire@oracle.com
-
Alan Maguire authored
It will support strncmp()-style string comparisons. Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1626362126-27775-3-git-send-email-alan.maguire@oracle.com
-
Alan Maguire authored
Add a BTF dumper for typed data, so that the user can dump a typed version of the data provided. The API is int btf_dump__dump_type_data(struct btf_dump *d, __u32 id, void *data, size_t data_sz, const struct btf_dump_type_data_opts *opts); ...where the id is the BTF id of the data pointed to by the "void *" argument; for example the BTF id of "struct sk_buff" for a "struct skb *" data pointer. Options supported are - a starting indent level (indent_lvl) - a user-specified indent string which will be printed once per indent level; if NULL, tab is chosen but any string <= 32 chars can be provided. - a set of boolean options to control dump display, similar to those used for BPF helper bpf_snprintf_btf(). Options are - compact : omit newlines and other indentation - skip_names: omit member names - emit_zeroes: show zero-value members Default output format is identical to that dumped by bpf_snprintf_btf(), for example a "struct sk_buff" representation would look like this: struct sk_buff){ (union){ (struct){ .next = (struct sk_buff *)0xffffffffffffffff, .prev = (struct sk_buff *)0xffffffffffffffff, (union){ .dev = (struct net_device *)0xffffffffffffffff, .dev_scratch = (long unsigned int)18446744073709551615, }, }, ... If the data structure is larger than the *data_sz* number of bytes that are available in *data*, as much of the data as possible will be dumped and -E2BIG will be returned. This is useful as tracers will sometimes not be able to capture all of the data associated with a type; for example a "struct task_struct" is ~16k. Being able to specify that only a subset is available is important for such cases. On success, the amount of data dumped is returned. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1626362126-27775-2-git-send-email-alan.maguire@oracle.com
-
Andrii Nakryiko authored
Shuyi Cheng says: ==================== This patch set adds the ability to point to a custom BTF for the purposes of BPF CO-RE relocations. This is useful for using BPF CO-RE on old kernels that don't yet natively support kernel (vmlinux) BTF and thus libbpf needs application's help in locating kernel BTF generated separately from the kernel itself. This was already possible to do through bpf_object__load's attribute struct, but that makes it inconvenient to use with BPF skeleton, which only allows to specify bpf_object_open_opts during the open step. Thus, add the ability to override vmlinux BTF at open time. Patch #1 adds libbpf changes. Patch #2 fixes pre-existing memory leak detected during the code review. Patch #3 switches existing selftests to using open_opts for custom BTF. Changelog: ---------- v3: https://lore.kernel.org/bpf/CAEf4BzY2cdT44bfbMus=gei27ViqGE1BtGo6XrErSsOCnqtVJg@mail.gmail.com/T/#m877eed1d4cf0a1d3352d3f3d6c5ff158be45c542 v3->v4: - Follow Andrii's suggestion to modify cover letter description. - Delete function bpf_object__load_override_btf. - Follow Dan's suggestion to add fixes tag and modify commit msg to patch #2. - Add pathch #3 to switch existing selftests to using open_opts. v2: https://lore.kernel.org/bpf/CAEf4Bza_ua+tjxdhyy4nZ8Boeo+scipWmr_1xM1pC6N5wyuhAA@mail.gmail.com/T/#mf9cf86ae0ffa96180ac29e4fd12697eb70eccd0f v2->v3: - Load the BTF specified by btf_custom_path to btf_vmlinux_override instead of btf_bmlinux. - Fix the memory leak that may be introduced by the second version of the patch. - Add a new patch to fix the possible memory leak caused by obj->kconfig. v1: https://lore.kernel.org/bpf/CAEf4BzaGjEC4t1OefDo11pj2-HfNy0BLhs_G2UREjRNTmb2u=A@mail.gmail.com/t/#m4d9f7c6761fbd2b436b5dfe491cd864b70225804 v1->v2: - Change custom_btf_path to btf_custom_path. - If the length of btf_custom_path of bpf_obj_open_opts is too long, return ERR_PTR(-ENAMETOOLONG). - Add `custom BTF is in addition to vmlinux BTF` with btf_custom_path field. ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
-
Shuyi Cheng authored
This patch mainly replaces the bpf_object_load_attr of the core_autosize.c and core_reloc.c files with bpf_object_open_opts. Signed-off-by: Shuyi Cheng <chengshuyi@linux.alibaba.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1626180159-112996-4-git-send-email-chengshuyi@linux.alibaba.com
-
Shuyi Cheng authored
If the strdup() fails then we need to call bpf_object__close(obj) to avoid a resource leak. Fixes: 166750bc ("libbpf: Support libbpf-provided extern variables") Signed-off-by: Shuyi Cheng <chengshuyi@linux.alibaba.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1626180159-112996-3-git-send-email-chengshuyi@linux.alibaba.com
-
Shuyi Cheng authored
btf_custom_path allows developers to load custom BTF which libbpf will subsequently use for CO-RE relocation instead of vmlinux BTF. Having btf_custom_path in bpf_object_open_opts one can directly use the skeleton's <objname>_bpf__open_opts() API to pass in the btf_custom_path parameter, as opposed to using bpf_object__load_xattr() which is slated to be deprecated ([0]). This work continues previous work started by another developer ([1]). [0] https://lore.kernel.org/bpf/CAEf4BzbJZLjNoiK8_VfeVg_Vrg=9iYFv+po-38SMe=UzwDKJ=Q@mail.gmail.com/#t [1] https://yhbt.net/lore/all/CAEf4Bzbgw49w2PtowsrzKQNcxD4fZRE6AKByX-5-dMo-+oWHHA@mail.gmail.com/Signed-off-by: Shuyi Cheng <chengshuyi@linux.alibaba.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1626180159-112996-2-git-send-email-chengshuyi@linux.alibaba.com
-
Roy, UjjaL authored
Add new heading for extensions to make it more readable. Also, add one more example of filtering interface index for better understanding. Signed-off-by: Roy, UjjaL <royujjal@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/CAADnVQJ=DoRDcVkaXmY3EmNdLoO7gq1mkJOn5G=00wKH8qUtZQ@mail.gmail.com
-
Andrii Nakryiko authored
b910eaaa ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper") fixed the problem with cgroup-local storage use in BPF by pre-allocating per-CPU array of 8 cgroup storage pointers to accommodate possible BPF program preemptions and nested executions. While this seems to work good in practice, it introduces new and unnecessary failure mode in which not all BPF programs might be executed if we fail to find an unused slot for cgroup storage, however unlikely it is. It might also not be so unlikely when/if we allow sleepable cgroup BPF programs in the future. Further, the way that cgroup storage is implemented as ambiently-available property during entire BPF program execution is a convenient way to pass extra information to BPF program and helpers without requiring user code to pass around extra arguments explicitly. So it would be good to have a generic solution that can allow implementing this without arbitrary restrictions. Ideally, such solution would work for both preemptable and sleepable BPF programs in exactly the same way. This patch introduces such solution, bpf_run_ctx. It adds one pointer field (bpf_ctx) to task_struct. This field is maintained by BPF_PROG_RUN family of macros in such a way that it always stays valid throughout BPF program execution. BPF program preemption is handled by remembering previous current->bpf_ctx value locally while executing nested BPF program and restoring old value after nested BPF program finishes. This is handled by two helper functions, bpf_set_run_ctx() and bpf_reset_run_ctx(), which are supposed to be used before and after BPF program runs, respectively. Restoring old value of the pointer handles preemption, while bpf_run_ctx pointer being a property of current task_struct naturally solves this problem for sleepable BPF programs by "following" BPF program execution as it is scheduled in and out of CPU. It would even allow CPU migration of BPF programs, even though it's not currently allowed by BPF infra. This patch cleans up cgroup local storage handling as a first application. The design itself is generic, though, with bpf_run_ctx being an empty struct that is supposed to be embedded into a specific struct for a given BPF program type (bpf_cg_run_ctx in this case). Follow up patches are planned that will expand this mechanism for other uses within tracing BPF programs. To verify that this change doesn't revert the fix to the original cgroup storage issue, I ran the same repro as in the original report ([0]) and didn't get any problems. Replacing bpf_reset_run_ctx(old_run_ctx) with bpf_reset_run_ctx(NULL) triggers the issue pretty quickly (so repro does work). [0] https://lore.kernel.org/bpf/YEEvBUiJl2pJkxTd@krava/ Fixes: b910eaaa ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210712230615.3525979-1-andrii@kernel.org
-
Peilin Ye authored
Currently netdevsim only supports a single queue per port, which is insufficient for testing multi-queue TC schedulers e.g. sch_mq. Extend the current sysfs interface so that users can create ports with multiple queues: $ echo "[ID] [PORT_COUNT] [NUM_QUEUES]" > /sys/bus/netdevsim/new_device As an example, echoing "2 4 8" creates 4 ports, with 8 queues per port. Note, this is compatible with the current interface, with default number of queues set to 1. For example, echoing "2 4" creates 4 ports with 1 queue per port; echoing "2" simply creates 1 port with 1 queue. Reviewed-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Mark Gray authored
The Open vSwitch kernel module uses the upcall mechanism to send packets from kernel space to user space when it misses in the kernel space flow table. The upcall sends packets via a Netlink socket. Currently, a Netlink socket is created for every vport. In this way, there is a 1:1 mapping between a vport and a Netlink socket. When a packet is received by a vport, if it needs to be sent to user space, it is sent via the corresponding Netlink socket. This mechanism, with various iterations of the corresponding user space code, has seen some limitations and issues: * On systems with a large number of vports, there is a correspondingly large number of Netlink sockets which can limit scaling. (https://bugzilla.redhat.com/show_bug.cgi?id=1526306) * Packet reordering on upcalls. (https://bugzilla.redhat.com/show_bug.cgi?id=1844576) * A thundering herd issue. (https://bugzilla.redhat.com/show_bug.cgi?id=1834444) This patch introduces an alternative, feature-negotiated, upcall mode using a per-cpu dispatch rather than a per-vport dispatch. In this mode, the Netlink socket to be used for the upcall is selected based on the CPU of the thread that is executing the upcall. In this way, it resolves the issues above as: a) The number of Netlink sockets scales with the number of CPUs rather than the number of vports. b) Ordering per-flow is maintained as packets are distributed to CPUs based on mechanisms such as RSS and flows are distributed to a single user space thread. c) Packets from a flow can only wake up one user space thread. The corresponding user space code can be found at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385139.html Bugzilla: https://bugzilla.redhat.com/1844576Signed-off-by: Mark Gray <mark.d.gray@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Bill Wendling authored
Fix the clang build warning: drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c:1862:13: error: variable 'cur_data_offset' set but not used [-Werror,-Wunused-but-set-variable] dma_addr_t cur_data_offset; Signed-off-by: Bill Wendling <morbo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christophe JAILLET authored
Use 'bitmap_alloc()/bitmap_free()' instead of hand-writing it. This makes the code less verbose. Also, use 'bitmap_alloc()' instead of 'bitmap_zalloc()' because the bitmap is fully overridden by a 'bitmap_copy()' call just after its allocation. While at it, remove an extra and unneeded space. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yajun Deng authored
It has been deal with the 'if (err' statement in rtnetlink_send() and rtnl_unicast(). so remove unnecessary if statement. v2: use the raw name rtnetlink_send(). Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yajun Deng authored
The netlink_{broadcast, unicast} don't deal with 'if (err > 0' statement but nlmsg_{multicast, unicast} do. The nlmsg_notify() contains them. so use nlmsg_notify() instead. so that the caller wouldn't deal with 'if (err > 0' statement. v2: use nlmsg_notify() will do well. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Haiyue Wang authored
The 'tail' pointer is also free-running count, so it needs to be masked as 'adminq_prod_cnt' does, to become an index value of AdminQ buffer. Fixes: 5cdad90d ("gve: Batch AQ commands for creating and destroying queues.") Signed-off-by: Haiyue Wang <haiyue.wang@intel.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller authored
Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-07-15 The following pull-request contains BPF updates for your *net-next* tree. We've added 45 non-merge commits during the last 15 day(s) which contain a total of 52 files changed, 3122 insertions(+), 384 deletions(-). The main changes are: 1) Introduce bpf timers, from Alexei. 2) Add sockmap support for unix datagram socket, from Cong. 3) Fix potential memleak and UAF in the verifier, from He. 4) Add bpf_get_func_ip helper, from Jiri. 5) Improvements to generic XDP mode, from Kumar. 6) Support for passing xdp_md to XDP programs in bpf_prog_run, from Zvi. =================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexei Starovoitov authored
Cong Wang says: ==================== From: Cong Wang <cong.wang@bytedance.com> This is the last patchset of the original large patchset. In the previous patchset, a new BPF sockmap program BPF_SK_SKB_VERDICT was introduced and UDP began to support it too. In this patchset, we add BPF_SK_SKB_VERDICT support to Unix datagram socket, so that we can finally splice Unix datagram socket and UDP socket. Please check each patch description for more details. To see the big picture, the previous patchsets are available here: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=1e0ab70778bd86a90de438cc5e1535c115a7c396 https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=89d69c5d0fbcabd8656459bc8b1a476d6f1efee4 and this patchset is available here: https://github.com/congwang/linux/tree/sockmap3Acked-by: John Fastabend <john.fastabend@gmail.com> --- v5: lift socket state check for dgram remove ->unhash() case add retries for EAGAIN in all test cases remove an unused parameter of __unix_dgram_recvmsg() rebase on the latest bpf-next v4: fix af_unix disconnect case add unix_unhash() split out two small patches reduce u->iolock critical section remove an unused parameter of __unix_dgram_recvmsg() v3: fix Kconfig dependency make unix_read_sock() static fix a UAF in unix_release() add a missing header unix_bpf.c v2: separate out from the original large patchset rebase to the latest bpf-next clean up unix_read_sock() export sock_map_close() factor out some helpers in selftests for code reuse ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Cong Wang authored
Add two test cases to ensure redirection between udp and unix work bidirectionally. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-12-xiyou.wangcong@gmail.com
-
Cong Wang authored
Add a test case to ensure redirection between two AF_UNIX datagram sockets work. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-11-xiyou.wangcong@gmail.com
-
Cong Wang authored
Factor out a common helper add_to_sockmap() which adds two sockets into a sockmap. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-10-xiyou.wangcong@gmail.com
-
Cong Wang authored
Factor out a common helper udp_socketpair() which creates a pair of connected UDP sockets. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-9-xiyou.wangcong@gmail.com
-
Cong Wang authored
We have to implement unix_dgram_bpf_recvmsg() to replace the original ->recvmsg() to retrieve skmsg from ingress_msg. AF_UNIX is again special here because the lack of sk_prot->recvmsg(). I simply add a special case inside unix_dgram_recvmsg() to call sk->sk_prot->recvmsg() directly. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-8-xiyou.wangcong@gmail.com
-
Cong Wang authored
Now we can implement unix_bpf_update_proto() to update sk_prot, especially prot->close(). Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-7-xiyou.wangcong@gmail.com
-
Cong Wang authored
Unlike af_inet, unix_proto is very different, it does not even have a ->close(). We have to add a dummy implementation to satisfy sockmap. Normally it is just a nop, it is introduced only for sockmap to replace it. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-6-xiyou.wangcong@gmail.com
-
Cong Wang authored
Currently only unix stream socket sets TCP_ESTABLISHED, datagram socket can set this too when they connect to its peer socket. At least __ip4_datagram_connect() does the same. This will be used to determine whether an AF_UNIX datagram socket can be redirected to in sockmap. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-5-xiyou.wangcong@gmail.com
-
Cong Wang authored
Implement ->read_sock() for AF_UNIX datagram socket, it is pretty much similar to udp_read_sock(). Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-4-xiyou.wangcong@gmail.com
-
Cong Wang authored
TCP and other connection oriented sockets have accept() for each incoming connection on the server side, hence they can just insert those fd's from accept() to sockmap, which are of course established. Now with datagram sockets begin to support sockmap and redirection, the restriction is no longer applicable to them, as they have no accept(). So we have to lift this restriction for them. This is fine, because inside bpf_sk_redirect_map() we still have another socket status check, sock_map_redirect_allowed(), as a guard. This also means they do not have to be removed from sockmap when disconnecting. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-3-xiyou.wangcong@gmail.com
-
Cong Wang authored
Currently sock_map still has Kconfig dependency on CONFIG_INET, but there is no actual functional dependency on it after we introduce ->psock_update_sk_prot(). We have to extend it to CONFIG_NET now as we are going to support AF_UNIX. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210704190252.11866-2-xiyou.wangcong@gmail.com
-
Alexei Starovoitov authored
Jiri Olsa says: ==================== Add bpf_get_func_ip helper that returns IP address of the caller function for trampoline and krobe programs. There're 2 specific implementation of the bpf_get_func_ip helper, one for trampoline progs and one for kprobe/kretprobe progs. The trampoline helper call is replaced/inlined by the verifier with simple move instruction. The kprobe/kretprobe is actual helper call that returns prepared caller address. Also available at: https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git bpf/get_func_ip v4 changes: - dropped jit/x86 check for get_func_ip tracing check [Alexei] - added code to bpf_get_func_ip_tracing [Alexei] and tested that it works without inlining [Alexei] - changed has_get_func_ip to check_get_func_ip [Andrii] - replaced test assert loop with explicit asserts [Andrii] - adde bpf_program__attach_kprobe_opts function and use it for offset setup [Andrii] - used bpf_program__set_autoload(false) for test6 [Andrii] - added Masami's ack v3 changes: - resend with Masami in cc and v3 in each patch subject v2 changes: - use kprobe_running to get kprobe instead of cpu var [Masami] - added support to add kprobe on function+offset and test for that [Alan] ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Jiri Olsa authored
Adding test for bpf_get_func_ip in kprobe+ofset probe. Because of the offset value it's arch specific, enabling the new test only for x86_64 architecture. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210714094400.396467-9-jolsa@kernel.org
-
Alan Maguire authored
kprobes can be placed on most instructions in a function, not just entry, and ftrace and bpftrace support the function+offset notification for probe placement. Adding parsing of func_name into func+offset to bpf_program__attach_kprobe() allows the user to specify SEC("kprobe/bpf_fentry_test5+0x6") ...for example, and the offset can be passed to perf_event_open_probe() to support kprobe attachment. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210714094400.396467-8-jolsa@kernel.org
-
Jiri Olsa authored
Adding bpf_program__attach_kprobe_opts that does the same as bpf_program__attach_kprobe, but takes opts argument. Currently opts struct holds just retprobe bool, but we will add new field in following patch. The function is not exported, so there's no need to add size to the struct bpf_program_attach_kprobe_opts for now. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210714094400.396467-7-jolsa@kernel.org
-
Jiri Olsa authored
Adding test for bpf_get_func_ip helper for fentry, fexit, kprobe, kretprobe and fmod_ret programs. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210714094400.396467-6-jolsa@kernel.org
-
Jiri Olsa authored
Adding bpf_get_func_ip helper for BPF_PROG_TYPE_KPROBE programs, so it's now possible to call bpf_get_func_ip from both kprobe and kretprobe programs. Taking the caller's address from 'struct kprobe::addr', which is defined for both kprobe and kretprobe. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/bpf/20210714094400.396467-5-jolsa@kernel.org
-
Jiri Olsa authored
Adding bpf_get_func_ip helper for BPF_PROG_TYPE_TRACING programs, specifically for all trampoline attach types. The trampoline's caller IP address is stored in (ctx - 8) address. so there's no reason to actually call the helper, but rather fixup the call instruction and return [ctx - 8] value directly. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210714094400.396467-4-jolsa@kernel.org
-