• Christophe Leroy's avatar
    powerpc/32: Add support for out-of-line static calls · 5c810ced
    Christophe Leroy authored
    Add support for out-of-line static calls on PPC32. This change
    improve performance of calls to global function pointers by
    using direct calls instead of indirect calls.
    
    The trampoline is initialy populated with a 'blr' or branch to target,
    followed by an unreachable long jump sequence.
    
    In order to cater with parallele execution, the trampoline needs to
    be updated in a way that ensures it remains consistent at all time.
    This means we can't use the traditional lis/addi to load r12 with
    the target address, otherwise there would be a window during which
    the first instruction contains the upper part of the new target
    address while the second instruction still contains the lower part of
    the old target address. To avoid that the target address is stored
    just after the 'bctr' and loaded from there with a single instruction.
    
    Then, depending on the target distance, arch_static_call_transform()
    will either replace the first instruction by a direct 'bl <target>' or
    'nop' in order to have the trampoline fall through the long jump
    sequence.
    
    For the special case of __static_call_return0(), to avoid the risk of
    a far branch, a version of it is inlined at the end of the trampoline.
    
    Performancewise the long jump sequence is probably not better than
    the indirect calls set by GCC when we don't use static calls, but
    such calls are unlikely to be required on powerpc32: With most
    configurations the kernel size is far below 32 Mbytes so only
    modules may happen to be too far. And even modules are likely to
    be close enough as they are allocated below the kernel core and
    as close as possible of the kernel text.
    
    static_call selftest is running successfully with this change.
    
    With this patch, __do_irq() has the following sequence to trace
    irq entries:
    
    	c0004a00 <__SCT__tp_func_irq_entry>:
    	c0004a00:	48 00 00 e0 	b       c0004ae0 <__traceiter_irq_entry>
    	c0004a04:	3d 80 c0 00 	lis     r12,-16384
    	c0004a08:	81 8c 4a 1c 	lwz     r12,18972(r12)
    	c0004a0c:	7d 89 03 a6 	mtctr   r12
    	c0004a10:	4e 80 04 20 	bctr
    	c0004a14:	38 60 00 00 	li      r3,0
    	c0004a18:	4e 80 00 20 	blr
    	c0004a1c:	00 00 00 00 	.long 0x0
    ...
    	c0005654 <__do_irq>:
    ...
    	c0005664:	7c 7f 1b 78 	mr      r31,r3
    ...
    	c00056a0:	81 22 00 00 	lwz     r9,0(r2)
    	c00056a4:	39 29 00 01 	addi    r9,r9,1
    	c00056a8:	91 22 00 00 	stw     r9,0(r2)
    	c00056ac:	3d 20 c0 af 	lis     r9,-16209
    	c00056b0:	81 29 74 cc 	lwz     r9,29900(r9)
    	c00056b4:	2c 09 00 00 	cmpwi   r9,0
    	c00056b8:	41 82 00 10 	beq     c00056c8 <__do_irq+0x74>
    	c00056bc:	80 69 00 04 	lwz     r3,4(r9)
    	c00056c0:	7f e4 fb 78 	mr      r4,r31
    	c00056c4:	4b ff f3 3d 	bl      c0004a00 <__SCT__tp_func_irq_entry>
    
    Before this patch, __do_irq() was doing the following to trace irq
    entries:
    
    	c0005700 <__do_irq>:
    ...
    	c0005710:	7c 7e 1b 78 	mr      r30,r3
    ...
    	c000574c:	93 e1 00 0c 	stw     r31,12(r1)
    	c0005750:	81 22 00 00 	lwz     r9,0(r2)
    	c0005754:	39 29 00 01 	addi    r9,r9,1
    	c0005758:	91 22 00 00 	stw     r9,0(r2)
    	c000575c:	3d 20 c0 af 	lis     r9,-16209
    	c0005760:	83 e9 f4 cc 	lwz     r31,-2868(r9)
    	c0005764:	2c 1f 00 00 	cmpwi   r31,0
    	c0005768:	41 82 00 24 	beq     c000578c <__do_irq+0x8c>
    	c000576c:	81 3f 00 00 	lwz     r9,0(r31)
    	c0005770:	80 7f 00 04 	lwz     r3,4(r31)
    	c0005774:	7d 29 03 a6 	mtctr   r9
    	c0005778:	7f c4 f3 78 	mr      r4,r30
    	c000577c:	4e 80 04 21 	bctrl
    	c0005780:	85 3f 00 0c 	lwzu    r9,12(r31)
    	c0005784:	2c 09 00 00 	cmpwi   r9,0
    	c0005788:	40 82 ff e4 	bne     c000576c <__do_irq+0x6c>
    
    Behind the fact of now using a direct 'bl' instead of a
    'load/mtctr/bctr' sequence, we can also see that we get one less
    register on the stack.
    Signed-off-by: default avatarChristophe Leroy <christophe.leroy@csgroup.eu>
    Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/6ec2a7865ed6a5ec54ab46d026785bafe1d837ea.1630484892.git.christophe.leroy@csgroup.eu
    5c810ced
Makefile 6.34 KB