- 28 Sep, 2021 27 commits
-
-
Yucong Sun authored
"?:" is a GNU C extension, some environment has warning flags for its use, or even prohibit it directly. This patch avoid triggering these problems by simply expand it to its full form, no functionality change. Signed-off-by: Yucong Sun <fallentree@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210928184221.1545079-1-fallentree@fb.com
-
Alexei Starovoitov authored
Andrii Nakryiko says: ==================== Implement opt-in stricter BPF program section name (SEC()) handling logic. For a lot of supported ELF section names, enforce exact section name match with no arbitrary characters added at the end. See patch #9 for more details. To allow this, patches #2 through #4 clean up and preventively fix selftests, normalizing existing SEC() usage across multiple selftests. While at it, those patches also reduce the amount of remaining bpf_object__find_program_by_title() uses, which should be completely removed soon, given it's an API with ambiguous semantics and will be deprecated and eventually removed in libbpf 1.0. Patch #1 also introduces SEC("tc") as an alias for SEC("classifier"). "tc" is a better and less misleading name, so patch #3 replaces all classifier* uses with nice and short SEC("tc"). Last patch is also fixing "sk_lookup/" definition to not require and not allow extra "/blah" parts after it, which serve no meaning. All the other patches are gradual internal libbpf changes to: - allow this optional strict logic for ELF section name handling; - allow new use case (for now for "struct_ops", but that could be extended to, say, freplace definitions), in which it can be used stand-alone to specify just type (SEC("struct_ops")), or also accept extra parameters which can be utilized by libbpf to either get more data or double-check valid use (e.g., SEC("struct_ops/dctcp_init") to specify desired struct_ops operation that is supposed to be implemented); - get libbpf's internal logic ready to allow other libraries and applications to specify their custom handlers for ELF section name for BPF programs. All the pieces are in place, the only thing preventing making this as public libbpf API is reliance on internal type for specifying BPF program load attributes. The work is planned to revamp related low-level libbpf APIs, at which point it will be possible to just re-use such new types for coordination between libbpf and custom handlers. These changes are a part of libbpf 1.0 effort ([0]). They are also intended to be applied on top of the previous preparatory series [1], so currently CI will be failing to apply them to bpf-next until that patch set is landed. Once it is landed, kernel-patches daemon will automatically retest this patch set. [0] https://github.com/libbpf/libbpf/wiki/Libbpf:-the-road-to-v1.0#stricter-and-more-uniform-bpf-program-section-name-sec-handling [1] https://patchwork.kernel.org/project/netdevbpf/list/?series=547675&state=* v3->v4: - replace SEC("classifier*") with SEC("tc") (Daniel); v2->v3: - applied acks, addressed most feedback, added comments to new flags (Dave); v1->v2: - rebase onto latest bpf-next and resolve merge conflicts w/ Dave's changes. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Andrii Nakryiko authored
Update "sk_lookup/" definition to be a stand-alone type specifier, with backwards-compatible prefix match logic in non-libbpf-1.0 mode. Currently in selftests all the "sk_lookup/<whatever>" uses just use <whatever> for duplicated unique name encoding, which is redundant as BPF program's name (C function name) uniquely and descriptively identifies the intended use for such BPF programs. With libbpf's SEC_DEF("sk_lookup") definition updated, switch existing sk_lookup programs to use "unqualified" SEC("sk_lookup") section names, with no random text after it. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/bpf/20210928161946.2512801-11-andrii@kernel.org
-
Andrii Nakryiko authored
Implement strict ELF section name handling for BPF programs. It utilizes `libbpf_set_strict_mode()` framework and adds new flag: LIBBPF_STRICT_SEC_NAME. If this flag is set, libbpf will enforce exact section name matching for a lot of program types that previously allowed just partial prefix match. E.g., if previously SEC("xdp_whatever_i_want") was allowed, now in strict mode only SEC("xdp") will be accepted, which makes SEC("") definitions cleaner and more structured. SEC() now won't be used as yet another way to uniquely encode BPF program identifier (for that C function name is better and is guaranteed to be unique within bpf_object). Now SEC() is strictly BPF program type and, depending on program type, extra load/attach parameter specification. Libbpf completely supports multiple BPF programs in the same ELF section, so multiple BPF programs of the same type/specification easily co-exist together within the same bpf_object scope. Additionally, a new (for now internal) convention is introduced: section name that can be a stand-alone exact BPF program type specificator, but also could have extra parameters after '/' delimiter. An example of such section is "struct_ops", which can be specified by itself, but also allows to specify the intended operation to be attached to, e.g., "struct_ops/dctcp_init". Note, that "struct_ops_some_op" is not allowed. Such section definition is specified as "struct_ops+". This change is part of libbpf 1.0 effort ([0], [1]). [0] Closes: https://github.com/libbpf/libbpf/issues/271 [1] https://github.com/libbpf/libbpf/wiki/Libbpf:-the-road-to-v1.0#stricter-and-more-uniform-bpf-program-section-name-sec-handlingSigned-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/bpf/20210928161946.2512801-10-andrii@kernel.org
-
Andrii Nakryiko authored
Complete SEC() table refactoring towards unified form by rewriting BPF_APROG_SEC and BPF_EAPROG_SEC definitions with SEC_DEF(SEC_ATTACHABLE_OPT) (for optional expected_attach_type) and SEC_DEF(SEC_ATTACHABLE) (mandatory expected_attach_type), respectively. Drop BPF_APROG_SEC, BPF_EAPROG_SEC, and BPF_PROG_SEC_IMPL macros after that, leaving SEC_DEF() macro as the only one used. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/bpf/20210928161946.2512801-9-andrii@kernel.org
-
Andrii Nakryiko authored
Refactor ELF section handler definitions table to use a set of flags and unified SEC_DEF() macro. This allows for more succinct and table-like set of definitions, and allows to more easily extend the logic without adding more verbosity (this is utilized in later patches in the series). This approach is also making libbpf-internal program pre-load callback not rely on bpf_sec_def definition, which demonstrates that future pluggable ELF section handlers will be able to achieve similar level of integration without libbpf having to expose extra types and APIs. For starters, update SEC_DEF() definitions and make them more succinct. Also convert BPF_PROG_SEC() and BPF_APROG_COMPAT() definitions to a common SEC_DEF() use. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/bpf/20210928161946.2512801-8-andrii@kernel.org
-
Andrii Nakryiko authored
Move closer to not relying on bpf_sec_def internals that won't be part of public API, when pluggable SEC() handlers will be allowed. Drop pre-calculated prefix length, and in various helpers don't rely on this prefix length availability. Also minimize reliance on knowing bpf_sec_def's prefix for few places where section prefix shortcuts are supported (e.g., tp vs tracepoint, raw_tp vs raw_tracepoint). Given checking some string for having a given string-constant prefix is such a common operation and so annoying to be done with pure C code, add a small macro helper, str_has_pfx(), and reuse it throughout libbpf.c where prefix comparison is performed. With __builtin_constant_p() it's possible to have a convenient helper that checks some string for having a given prefix, where prefix is either string literal (or compile-time known string due to compiler optimization) or just a runtime string pointer, which is quite convenient and saves a lot of typing and string literal duplication. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/bpf/20210928161946.2512801-7-andrii@kernel.org
-
Andrii Nakryiko authored
Refactor internals of libbpf to allow adding custom SEC() handling logic easily from outside of libbpf. To that effect, each SEC()-handling registration sets mandatory program type/expected attach type for a given prefix and can provide three callbacks called at different points of BPF program lifetime: - init callback for right after bpf_program is initialized and prog_type/expected_attach_type is set. This happens during bpf_object__open() step, close to the very end of constructing bpf_object, so all the libbpf APIs for querying and updating bpf_program properties should be available; - pre-load callback is called right before BPF_PROG_LOAD command is called in the kernel. This callbacks has ability to set both bpf_program properties, as well as program load attributes, overriding and augmenting the standard libbpf handling of them; - optional auto-attach callback, which makes a given SEC() handler support auto-attachment of a BPF program through bpf_program__attach() API and/or BPF skeletons <skel>__attach() method. Each callbacks gets a `long cookie` parameter passed in, which is specified during SEC() handling. This can be used by callbacks to lookup whatever additional information is necessary. This is not yet completely ready to be exposed to the outside world, mainly due to non-public nature of struct bpf_prog_load_params. Instead of making it part of public API, we'll wait until the planned low-level libbpf API improvements for BPF_PROG_LOAD and other typical bpf() syscall APIs, at which point we'll have a public, probably OPTS-based, way to fully specify BPF program load parameters, which will be used as an interface for custom pre-load callbacks. But this change itself is already a good first step to unify the BPF program hanling logic even within the libbpf itself. As one example, all the extra per-program type handling (sleepable bit, attach_btf_id resolution, unsetting optional expected attach type) is now more obvious and is gathered in one place. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/bpf/20210928161946.2512801-6-andrii@kernel.org
-
Andrii Nakryiko authored
Normalize all the other non-conforming SEC() usages across all selftests. This is in preparation for libbpf to start to enforce stricter SEC() rules in libbpf 1.0 mode. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/bpf/20210928161946.2512801-5-andrii@kernel.org
-
Andrii Nakryiko authored
Convert all SEC("classifier*") uses to a new and strict SEC("tc") section name. In reference_tracking selftests switch from ambiguous searching by program title (section name) to non-ambiguous searching by name in some selftests, getting closer to completely removing bpf_object__find_program_by_title(). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210928161946.2512801-4-andrii@kernel.org
-
Andrii Nakryiko authored
Convert almost all SEC("xdp_blah") uses to strict SEC("xdp") to comply with strict libbpf 1.0 logic of exact section name match for XDP program types. There is only one exception, which is only tested through iproute2 and defines multiple XDP programs within the same BPF object. Given iproute2 still works in non-strict libbpf mode and it doesn't have means to specify XDP programs by its name (not section name/title), leave that single file alone for now until iproute2 gains lookup by function/program name. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/bpf/20210928161946.2512801-3-andrii@kernel.org
-
Andrii Nakryiko authored
As argued in [0], add "tc" ELF section definition for SCHED_CLS BPF program type. "classifier" is a misleading terminology and should be migrated away from. [0] https://lore.kernel.org/bpf/270e27b1-e5be-5b1c-b343-51bd644d0747@iogearbox.net/Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210928161946.2512801-2-andrii@kernel.org
-
Johan Almbladh authored
This patch adds a tail call limit test where the program also emits a BPF_CALL to an external function prior to the tail call. Mainly testing that JITed programs preserve its internal register state, for example tail call count, across such external calls. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210914091842.4186267-15-johan.almbladh@anyfinetworks.com
-
Johan Almbladh authored
This patch fixes an error in the tail call limit test that caused the test to fail on for x86-64 JIT. Previously, the register R0 was used to report the total number of tail calls made. However, after a tail call fall-through, the value of the R0 register is undefined. Now, all tail call error path tests instead use context state to store the count. Fixes: 874be05f ("bpf, tests: Add tail call test suite") Reported-by: Paul Chaignon <paul@cilium.io> Reported-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn> Link: https://lore.kernel.org/bpf/20210914091842.4186267-14-johan.almbladh@anyfinetworks.com
-
Johan Almbladh authored
This patch adds tests of the high 32 bits of 64-bit BPF_END conversions. It also adds a mirrored set of tests where the source bytes are reversed. The MSB of each byte is now set on the high word instead, possibly affecting sign-extension during conversion in a different way. Mainly for JIT testing. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210914091842.4186267-13-johan.almbladh@anyfinetworks.com
-
Johan Almbladh authored
This patch expands the branch conversion test introduced by 66e5eb84 ("bpf, tests: Add branch conversion JIT test"). The test now includes a JMP with maximum eBPF offset. This triggers branch conversion for the 64-bit MIPS JIT. Additional variants are also added for cases when the branch is taken or not taken. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210914091842.4186267-12-johan.almbladh@anyfinetworks.com
-
Johan Almbladh authored
This patch adds a set of tests for JMP and JMP32 operations where the branch decision is know at JIT time. Mainly testing JIT behaviour. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210914091842.4186267-11-johan.almbladh@anyfinetworks.com
-
Johan Almbladh authored
This patch adds a set of tests for JMP to verify that the JITed jump offset is calculated correctly. We pretend that the verifier has inserted any zero extensions to make the jump-over operations JIT to one instruction each, in order to control the exact JITed jump offset. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210914091842.4186267-10-johan.almbladh@anyfinetworks.com
-
Johan Almbladh authored
This patch adds a new flag to indicate that the verified did insert zero-extensions, even though the verifier is not being run for any of the tests. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210914091842.4186267-9-johan.almbladh@anyfinetworks.com
-
Johan Almbladh authored
This patch adds a test for the 64-bit immediate load, a two-instruction operation, to verify correctness for all possible magnitudes of the immediate operand. Mainly intended for JIT testing. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210914091842.4186267-8-johan.almbladh@anyfinetworks.com
-
Johan Almbladh authored
This patch adds a new type of jump test where the program jumps forwards and backwards with increasing offset. It mainly tests JITs where a relative jump may generate different JITed code depending on the offset size, read MIPS. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210914091842.4186267-7-johan.almbladh@anyfinetworks.com
-
Johan Almbladh authored
This patch adds a set of tests for conditional JMP and JMP32 operations to verify correctness for all possible magnitudes of the immediate and register operands. Mainly intended for JIT testing. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210914091842.4186267-6-johan.almbladh@anyfinetworks.com
-
Johan Almbladh authored
This patch adds a set of tests for ALU64 and ALU32 arithmetic and bitwise logical operations to verify correctness for all possible magnitudes of the register and immediate operands. Mainly intended for JIT testing. The patch introduces a pattern generator that can be used to drive extensive tests of different kinds of operations. It is parameterized to allow tuning of the operand combinations to test. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210914091842.4186267-5-johan.almbladh@anyfinetworks.com
-
Johan Almbladh authored
This patch adds a set of tests for ALU64 and ALU32 shift operations to verify correctness for all possible values of the shift value. Mainly intended for JIT testing. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210914091842.4186267-4-johan.almbladh@anyfinetworks.com
-
Johan Almbladh authored
The test suite used to call any fill_helper callbacks to generate eBPF program data for all test cases at once. This caused ballooning memory requirements as more extensive test cases were added. Now the each fill_helper is called before the test is run and the allocated memory released afterwards, before the next test case is processed. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210914091842.4186267-3-johan.almbladh@anyfinetworks.com
-
Johan Almbladh authored
This patch allows a test cast to specify the number of runs to use. For compatibility with existing test case definitions, the default value 0 is interpreted as MAX_TESTRUNS. A reduced number of runs is useful for complex test programs where 1000 runs may take a very long time. Instead of reducing what is tested, one can instead reduce the number of times the test is run. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210914091842.4186267-2-johan.almbladh@anyfinetworks.com
-
Toke Høiland-Jørgensen authored
When parsing legacy map definitions, libbpf would error out when encountering an STT_SECTION symbol. This becomes a problem because some versions of binutils will produce SECTION symbols for every section when processing an ELF file, so BPF files run through 'strip' will end up with such symbols, making libbpf refuse to load them. There's not really any reason why erroring out is strictly necessary, so change libbpf to just ignore SECTION symbols when parsing the ELF. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210927205810.715656-1-toke@redhat.com
-
- 27 Sep, 2021 13 commits
-
-
Daniel Borkmann authored
Magnus Karlsson says: ==================== This patch set introduces a batched interface for Rx buffer allocation in AF_XDP buffer pool. Instead of using xsk_buff_alloc(*pool), drivers can now use xsk_buff_alloc_batch(*pool, **xdp_buff_array, max). Instead of returning a pointer to an xdp_buff, it returns the number of xdp_buffs it managed to allocate up to the maximum value of the max parameter in the function call. Pointers to the allocated xdp_buff:s are put in the xdp_buff_array supplied in the call. This could be a SW ring that already exists in the driver or a new structure that the driver has allocated. u32 xsk_buff_alloc_batch(struct xsk_buff_pool *pool, struct xdp_buff **xdp, u32 max); When using this interface, the driver should also use the new interface below to set the relevant fields in the struct xdp_buff. The reason for this is that xsk_buff_alloc_batch() does not fill in the data and data_meta fields for you as is the case with xsk_buff_alloc(). So it is not sufficient to just set data_end (effectively the size) anymore in the driver. The reason for this is performance as explained in detail in the commit message. void xsk_buff_set_size(struct xdp_buff *xdp, u32 size); Patch 6 also optimizes the buffer allocation in the aligned case. In this case, we can skip the reinitialization of most fields in the xdp_buff_xsk struct at allocation time. As the number of elements in the heads array is equal to the number of possible buffers in the umem, we can initialize them once and for all at bind time and then just point to the correct one in the xdp_buff_array that is returned to the driver. No reason to have a stack of free head entries. In the unaligned case, the buffers can reside anywhere in the umem, so this optimization is not possible as we still have to fill in the right information in the xdp_buff every single time one is allocated. I have updated i40e and ice to use this new batched interface. These are the throughput results on my 2.1 GHz Cascade Lake system: Aligned mode: ice: +11% / -9 cycles/pkt i40e: +12% / -9 cycles/pkt Unaligned mode: ice: +1.5% / -1 cycle/pkt i40e: +1% / -1 cycle/pkt For the aligned case, batching provides around 40% of the performance improvement and the aligned optimization the rest, around 60%. Would have expected a ~4% boost for unaligned with this data, but I only get around 1%. Do not know why. Note that memory consumption in aligned mode is also reduced by this patch set. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
Magnus Karlsson authored
Add a test for the frame_headroom feature that can be set on the umem. The logic added validates that all offsets in all tests and packets are valid, not just the ones that have a specifically configured frame_headroom. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210922075613.12186-14-magnus.karlsson@gmail.com
-
Magnus Karlsson authored
Change the interleaving of packets in unaligned mode. With the current buffer addresses in the packet stream, the last buffer in the umem could not be used as a large packet could potentially write over the end of the umem. The kernel correctly threw this buffer address away and refused to use it. This is perfectly fine for all regular packet streams, but the ones used for unaligned mode have every other packet being at some different offset. As we will add checks for correct offsets in the next patch, this needs to be fixed. Just start these page-boundary straddling buffers one page earlier so that the last one is not on the last page of the umem, making all buffers valid. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210922075613.12186-13-magnus.karlsson@gmail.com
-
Magnus Karlsson authored
Add a test where a single packet is sent and received. This might sound like a silly test, but since many of the interfaces in xsk are batched, it is important to be able to validate that we did not break something as fundamental as just receiving single packets, instead of batches of packets at high speed. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210922075613.12186-12-magnus.karlsson@gmail.com
-
Magnus Karlsson authored
Introduce pacing of traffic so that the Tx thread can never send more packets than the receiver has processed plus the number of packets it can have in its umem. So at any point in time, the number of in flight packets (not processed by the Rx thread) are less than or equal to the number of packets that can be held in the Rx thread's umem. The batch size is also increased to improve running time. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210922075613.12186-11-magnus.karlsson@gmail.com
-
Magnus Karlsson authored
The socket creation retry unnecessarily registered the umem once for every retry. No reason to do this. It wastes memory and it might lead to too many pages being locked at some point and the failure of a test. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210922075613.12186-10-magnus.karlsson@gmail.com
-
Magnus Karlsson authored
Fix a problem where the fill ring was populated with too many entries. If number of buffers in the umem was smaller than the fill ring size, the code used to loop over from the beginning of the umem and start putting the same buffers in again. This is racy indeed as a later packet can be received overwriting an earlier one before the Rx thread manages to validate it. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210922075613.12186-9-magnus.karlsson@gmail.com
-
Magnus Karlsson authored
Fix missing initialization of the member rx_pkt_nb in the packet stream. This leads to some tests declaring success too early as the test thought all packets had already been received. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210922075613.12186-8-magnus.karlsson@gmail.com
-
Magnus Karlsson authored
Optimize for the aligned case by precomputing the parameter values of the xdp_buff_xsk and xdp_buff structures in the heads array. We can do this as the heads array size is equal to the number of chunks in the umem for the aligned case. Then every entry in this array will reflect a certain chunk/frame and can therefore be prepopulated with the correct values and we can drop the use of the free_heads stack. Note that it is not possible to allocate more buffers than what has been allocated in the aligned case since each chunk can only contain a single buffer. We can unfortunately not do this in the unaligned case as one chunk might contain multiple buffers. In this case, we keep the old scheme of populating a heads entry every time it is used and using the free_heads stack. Also move xp_release() and xp_get_handle() to xsk_buff_pool.h. They were for some reason in xsk.c even though they are buffer pool operations. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210922075613.12186-7-magnus.karlsson@gmail.com
-
Magnus Karlsson authored
Use the new xsk batched rx allocation interface for the zero-copy data path. As the array of struct xdp_buff pointers kept by the driver is really a ring that wraps, the allocation routine is modified to detect a wrap and in that case call the allocation function twice. The allocation function cannot deal with wrapped rings, only arrays. As we now know exactly how many buffers we get and that there is no wrapping, the allocation function can be simplified even more as all if-statements in the allocation loop can be removed, improving performance. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210922075613.12186-6-magnus.karlsson@gmail.com
-
Magnus Karlsson authored
Use the new xsk batched rx allocation interface for the zero-copy data path. As the array of struct xdp_buff pointers kept by the driver is really a ring that wraps, the allocation routine is modified to detect a wrap and in that case call the allocation function twice. The allocation function cannot deal with wrapped rings, only arrays. As we now know exactly how many buffers we get and that there is no wrapping, the allocation function can be simplified even more as all if-statements in the allocation loop can be removed, improving performance. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210922075613.12186-5-magnus.karlsson@gmail.com
-
Magnus Karlsson authored
In order to use the new xsk batched buffer allocation interface, a pointer to an array of struct xsk_buff pointers need to be provided so that the function can put the result of the allocation there. In the ice driver, we already have a ring that stores pointers to xdp_buffs. This is only used for the xsk zero-copy driver and is a union with the structure that is used for the regular non zero-copy path. Unfortunately, that structure is larger than the xdp_buffs pointers which mean that there will be a stride (of 20 bytes) between each xdp_buff pointer. And feeding this into the xsk_buff_alloc_batch interface will not work since it assumes a regular array of xdp_buff pointers (each 8 bytes with 0 bytes in-between them on a 64-bit system). To fix this, remove the xdp_buff pointer from the rx_buf union and move it one step higher to the union above which only has pointers to arrays in it. This solves the problem and we can directly feed the SW ring of xdp_buff pointers straight into the allocation function in the next patch when that interface is used. This will improve performance. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210922075613.12186-4-magnus.karlsson@gmail.com
-
Magnus Karlsson authored
Add a new driver interface xsk_buff_alloc_batch() offering batched buffer allocations to improve performance. The new interface takes three arguments: the buffer pool to allocated from, a pointer to an array of struct xdp_buff pointers which will contain pointers to the allocated xdp_buffs, and an unsigned integer specifying the max number of buffers to allocate. The return value is the actual number of buffers that the allocator managed to allocate and it will be in the range 0 <= N <= max, where max is the third parameter to the function. u32 xsk_buff_alloc_batch(struct xsk_buff_pool *pool, struct xdp_buff **xdp, u32 max); A second driver interface is also introduced that need to be used in conjunction with xsk_buff_alloc_batch(). It is a helper that sets the size of struct xdp_buff and is used by the NIC Rx irq routine when receiving a packet. This helper sets the three struct members data, data_meta, and data_end. The two first ones is in the xsk_buff_alloc() case set in the allocation routine and data_end is set when a packet is received in the receive irq function. This unfortunately leads to worse performance since the xdp_buff is touched twice with a long time period in between leading to an extra cache miss. Instead, we fill out the xdp_buff with all 3 fields at one single point in time in the driver, when the size of the packet is known. Hence this helper. Note that the driver has to use this helper (or set all three fields itself) when using xsk_buff_alloc_batch(). xsk_buff_alloc() works as before and does not require this. void xsk_buff_set_size(struct xdp_buff *xdp, u32 size); Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210922075613.12186-3-magnus.karlsson@gmail.com
-