Commit b2ad54e1 authored by Xu Kuohai's avatar Xu Kuohai Committed by Daniel Borkmann

bpf, arm64: Implement bpf_arch_text_poke() for arm64

Implement bpf_arch_text_poke() for arm64, so bpf prog or bpf trampoline
can be patched with it.

When the target address is NULL, the original instruction is patched to
a NOP.

When the target address and the source address are within the branch
range, the original instruction is patched to a bl instruction to the
target address directly.

To support attaching bpf trampoline to both regular kernel function and
bpf prog, we follow the ftrace patchsite way for bpf prog. That is, two
instructions are inserted at the beginning of bpf prog, the first one
saves the return address to x9, and the second is a nop which will be
patched to a bl instruction when a bpf trampoline is attached.

However, when a bpf trampoline is attached to bpf prog, the distance
between target address and source address may exceed 128MB, the maximum
branch range, because bpf trampoline and bpf prog are allocated
separately with vmalloc. So long jump should be handled.

When a bpf prog is constructed, a plt pointing to empty trampoline
dummy_tramp is placed at the end:

        bpf_prog:
                mov x9, lr
                nop // patchsite
                ...
                ret

        plt:
                ldr x10, target
                br x10
        target:
                .quad dummy_tramp // plt target

This is also the state when no trampoline is attached.

When a short-jump bpf trampoline is attached, the patchsite is patched to
a bl instruction to the trampoline directly:

        bpf_prog:
                mov x9, lr
                bl <short-jump bpf trampoline address> // patchsite
                ...
                ret

        plt:
                ldr x10, target
                br x10
        target:
                .quad dummy_tramp // plt target

When a long-jump bpf trampoline is attached, the plt target is filled with
the trampoline address and the patchsite is patched to a bl instruction to
the plt:

        bpf_prog:
                mov x9, lr
                bl plt // patchsite
                ...
                ret

        plt:
                ldr x10, target
                br x10
        target:
                .quad <long-jump bpf trampoline address>

dummy_tramp is used to prevent another CPU from jumping to an unknown
location during the patching process, making the patching process easier.

The patching process is as follows:

1. when neither the old address or the new address is a long jump, the
   patchsite is replaced with a bl to the new address, or nop if the new
   address is NULL;

2. when the old address is not long jump but the new one is, the
   branch target address is written to plt first, then the patchsite
   is replaced with a bl instruction to the plt;

3. when the old address is long jump but the new one is not, the address
   of dummy_tramp is written to plt first, then the patchsite is replaced
   with a bl to the new address, or a nop if the new address is NULL;

4. when both the old address and the new address are long jump, the
   new address is written to plt and the patchsite is not changed.
Signed-off-by: default avatarXu Kuohai <xukuohai@huawei.com>
Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
Reviewed-by: default avatarJakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: default avatarKP Singh <kpsingh@kernel.org>
Reviewed-by: default avatarJean-Philippe Brucker <jean-philippe@linaro.org>
Acked-by: default avatarSong Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220711150823.2128542-4-xukuohai@huawei.com
parent f1e8a24e
......@@ -80,6 +80,12 @@
#define A64_STR64I(Xt, Xn, imm) A64_LS_IMM(Xt, Xn, imm, 64, STORE)
#define A64_LDR64I(Xt, Xn, imm) A64_LS_IMM(Xt, Xn, imm, 64, LOAD)
/* LDR (literal) */
#define A64_LDR32LIT(Wt, offset) \
aarch64_insn_gen_load_literal(0, offset, Wt, false)
#define A64_LDR64LIT(Xt, offset) \
aarch64_insn_gen_load_literal(0, offset, Xt, true)
/* Load/store register pair */
#define A64_LS_PAIR(Rt, Rt2, Rn, offset, ls, type) \
aarch64_insn_gen_load_store_pair(Rt, Rt2, Rn, offset, \
......@@ -270,6 +276,7 @@
#define A64_BTI_C A64_HINT(AARCH64_INSN_HINT_BTIC)
#define A64_BTI_J A64_HINT(AARCH64_INSN_HINT_BTIJ)
#define A64_BTI_JC A64_HINT(AARCH64_INSN_HINT_BTIJC)
#define A64_NOP A64_HINT(AARCH64_INSN_HINT_NOP)
/* DMB */
#define A64_DMB_ISH aarch64_insn_gen_dmb(AARCH64_INSN_MB_ISH)
......
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment