    sched/preempt: Add PREEMPT_DYNAMIC using static keys · 99cf983c
    Author: Mark Rutland <mark.rutland@arm.com>
    Where an architecture selects HAVE_STATIC_CALL but not
    HAVE_STATIC_CALL_INLINE, each static call has an out-of-line trampoline
    which will either branch to a callee or return to the caller.
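    A minimal userspace model of the trampoline behaviour described above,
    assuming a function pointer can stand in for the runtime-patched branch
    (so, unlike a real trampoline, this model makes an indirect call).
    `struct static_call_model`, `trampoline_call()` and `fake_cond_resched()`
    are hypothetical names, not kernel APIs.

```c
#include <stddef.h>

/* Hypothetical stand-in for a preemption function's signature. */
typedef int (*callee_fn)(void);

/*
 * Hypothetical model of an out-of-line static call trampoline. In the
 * kernel the trampoline is a few instructions rewritten at runtime; here
 * a function pointer plays the role of the patched branch.
 */
struct static_call_model {
	callee_fn target;	/* NULL: the trampoline just returns */
};

/* What each call site branches to: forward to the callee, or return 0. */
static int trampoline_call(const struct static_call_model *sc)
{
	if (sc->target == NULL)
		return 0;		/* "return to the caller" */
	return sc->target();	/* "branch to the callee" */
}

/* Hypothetical callee; pretends a reschedule happened. */
static int fake_cond_resched(void)
{
	return 1;
}
```

    Every call pays for the jump into the trampoline, even when the
    trampoline immediately returns, which is the cost the rest of this
    message tries to avoid.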
    
    On such architectures, a number of constraints can conspire to make
    those trampolines more complicated and potentially less useful than we'd
    like. For example:
    
    * Hardware and software control flow integrity schemes can require the
      addition of "landing pad" instructions (e.g. `BTI` for arm64), which
      will also be present at the "real" callee.
    
    * Limited branch ranges can require that trampolines generate or load an
      address into a register and perform an indirect branch (or at least
      have a slow path that does so). This loses some of the benefits of
      having a direct branch.
    
    * Interaction with SW CFI schemes can be complicated and fragile, e.g.
      requiring that we can recognise idiomatic codegen and remove
      indirections we understand, at least until clang provides more
      helpful mechanisms for dealing with this.
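    To put the branch-range constraint in numbers for arm64 (general ISA
    background rather than something stated in this commit): `B` and `BL`
    encode a signed 26-bit word offset, so a direct branch reaches only
    about +/-128 MiB. The hypothetical helper below checks whether a direct
    branch from `pc` can reach `target`; when it cannot, a trampoline has to
    build the address in a register and use an indirect `BR` instead.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * An arm64 B/BL encodes a signed 26-bit word offset, so the reachable
 * byte offset is imm26 * 4, i.e. [-2^27, 2^27 - 4] (+/-128 MiB).
 */
#define ARM64_B_MIN_OFFSET	(-((int64_t)1 << 27))
#define ARM64_B_MAX_OFFSET	(((int64_t)1 << 27) - 4)

/* Hypothetical helper: can a direct B at 'pc' reach 'target'? */
static bool arm64_b_in_range(uint64_t pc, uint64_t target)
{
	/* Unsigned subtraction wraps; reinterpret as a signed displacement. */
	int64_t offset = (int64_t)(target - pc);

	if (offset & 3)
		return false;	/* branch targets must be 4-byte aligned */
	return offset >= ARM64_B_MIN_OFFSET && offset <= ARM64_B_MAX_OFFSET;
}
```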
    
    For PREEMPT_DYNAMIC, we don't need the full power of static calls, as we
    really only need to enable/disable specific preemption functions. We can
    achieve the same effect without a number of the pain points above by
    using static keys to fold early returns into the preemption functions
    themselves rather than placing them in an out-of-line trampoline, effectively
    inlining the trampoline into the start of the function.
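    A userspace sketch of the static-key approach, assuming a plain boolean
    can stand in for a static key. Kernel code would use the static key API
    (for instance `DEFINE_STATIC_KEY_TRUE()` with `static_branch_unlikely()`),
    where the check compiles to the single patched `B`/`NOP` that the arm64
    example in this message shows; here it is an ordinary load and compare.
    `struct static_key_model`, `model_cond_resched()`,
    `model_dynamic_cond_resched()` and the `fake_*` variables are
    hypothetical stand-ins.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace stand-in for a static key. In the kernel the
 * check would be a single B/NOP patched at runtime; here it is an
 * ordinary load and conditional branch.
 */
struct static_key_model {
	bool enabled;
};

static inline bool model_key_enabled(const struct static_key_model *key)
{
	return key->enabled;
}

/* Hypothetical key and fake scheduler state for the demonstration. */
static struct static_key_model model_cond_resched_key = { .enabled = true };
static bool fake_need_resched;	/* stands in for "a reschedule is due" */
static int fake_schedule_calls;	/* counts simulated reschedules */

/* Stand-in for the regular __cond_resched() body. */
static int model_cond_resched(void)
{
	if (!fake_need_resched)
		return 0;
	fake_schedule_calls++;	/* where preempt_schedule_common() would run */
	fake_need_resched = false;
	return 1;
}

/*
 * The trampoline's "return to the caller" case becomes an early return
 * at the top of the function itself.
 */
static int model_dynamic_cond_resched(void)
{
	if (!model_key_enabled(&model_cond_resched_key))
		return 0;	/* disabled: the NOP case, return early */
	return model_cond_resched();
}
```

    Clearing `model_cond_resched_key.enabled` makes
    `model_dynamic_cond_resched()` return 0 without running the body, with
    no separate trampoline in between.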
    
    For arm64, this results in good code generation. For example, the
    dynamic_cond_resched() wrapper looks as follows when enabled. When
    disabled, the first `B` is replaced with a `NOP`, resulting in an early
    return.
    
    | <dynamic_cond_resched>:
    |        bti     c
    |        b       <dynamic_cond_resched+0x10>     // or `nop`
    |        mov     w0, #0x0
    |        ret
    |        mrs     x0, sp_el0
    |        ldr     x0, [x0, #8]
    |        cbnz    x0, <dynamic_cond_resched+0x8>
    |        paciasp
    |        stp     x29, x30, [sp, #-16]!
    |        mov     x29, sp
    |        bl      <preempt_schedule_common>
    |        mov     w0, #0x1
    |        ldp     x29, x30, [sp], #16
    |        autiasp
    |        ret
    
    ... compared to the regular form of the function:
    
    | <__cond_resched>:
    |        bti     c
    |        mrs     x0, sp_el0
    |        ldr     x1, [x0, #8]
    |        cbz     x1, <__cond_resched+0x18>
    |        mov     w0, #0x0
    |        ret
    |        paciasp
    |        stp     x29, x30, [sp, #-16]!
    |        mov     x29, sp
    |        bl      <preempt_schedule_common>
    |        mov     w0, #0x1
    |        ldp     x29, x30, [sp], #16
    |        autiasp
    |        ret
    
    Any architecture which implements static keys should be able to use this
    to implement PREEMPT_DYNAMIC with similar cost to non-inlined static
    calls. Since this is likely to have greater overhead than (inlined)
    static calls, PREEMPT_DYNAMIC is only defaulted to enabled when
    HAVE_PREEMPT_DYNAMIC_CALL is selected.

    Signed-off-by: Mark Rutland <mark.rutland@arm.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Acked-by: Ard Biesheuvel <ardb@kernel.org>
    Acked-by: Frederic Weisbecker <frederic@kernel.org>
    Link: https://lore.kernel.org/r/20220214165216.2231574-6-mark.rutland@arm.com