Commit c231c22a authored by Jiong Wang's avatar Jiong Wang Committed by Alexei Starovoitov

bpf: doc: update answer for 32-bit subregister question

There has been quite a few progress around the two steps mentioned in the
answer to the following question:

  Q: BPF 32-bit subregister requirements

This patch updates the answer to reflect what has been done.

v2:
 - Add missing full stop. (Song Liu)
 - Minor tweak on one sentence. (Song Liu)

v1:
 - Integrated rephrase from Quentin and Jakub
Reviewed-by: default avatarQuentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
Acked-by: default avatarSong Liu <songliubraving@fb.com>
Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
parent d168286d
...@@ -172,11 +172,31 @@ registers which makes BPF inefficient virtual machine for 32-bit ...@@ -172,11 +172,31 @@ registers which makes BPF inefficient virtual machine for 32-bit
CPU architectures and 32-bit HW accelerators. Can true 32-bit registers CPU architectures and 32-bit HW accelerators. Can true 32-bit registers
be added to BPF in the future? be added to BPF in the future?
A: NO. The first thing to improve performance on 32-bit archs is to teach A: NO.
LLVM to generate code that uses 32-bit subregisters. Then second step
is to teach verifier to mark operations where zero-ing upper bits But some optimizations on zero-ing the upper 32 bits for BPF registers are
is unnecessary. Then JITs can take advantage of those markings and available, and can be leveraged to improve the performance of JITed BPF
drastically reduce size of generated code and improve performance. programs for 32-bit architectures.
Starting with version 7, LLVM is able to generate instructions that operate
on 32-bit subregisters, provided the option -mattr=+alu32 is passed for
compiling a program. Furthermore, the verifier can now mark the
instructions for which zero-ing the upper bits of the destination register
is required, and insert an explicit zero-extension (zext) instruction
(a mov32 variant). This means that for architectures without zext hardware
support, the JIT back-ends do not need to clear the upper bits for
subregisters written by alu32 instructions or narrow loads. Instead, the
back-ends simply need to support code generation for that mov32 variant,
and to overwrite bpf_jit_needs_zext() to make it return "true" (in order to
enable zext insertion in the verifier).
Note that it is possible for a JIT back-end to have partial hardware
support for zext. In that case, if verifier zext insertion is enabled,
it could lead to the insertion of unnecessary zext instructions. Such
instructions could be removed by creating a simple peephole inside the JIT
back-end: if one instruction has hardware support for zext and if the next
instruction is an explicit zext, then the latter can be skipped when doing
the code generation.
Q: Does BPF have a stable ABI? Q: Does BPF have a stable ABI?
------------------------------ ------------------------------
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment