    powerpc: Improve (in|out)_[bl]eXX() asm code · 0f3d6bcd
    Trent Piepho authored
    Since commit 4cb3cee0 the code generated
    for the in_beXX() and out_beXX() mmio functions has been sub-optimal.
    
    The out_leXX() family of functions is created with the macro
    DEF_MMIO_OUT_LE() while the out_beXX() family is created with
    DEF_MMIO_OUT_BE().  In what was perhaps a bit too much macro use, both of
    these macros are in turn created via the macro DEF_MMIO_OUT().
    
    For the LE versions, eventually they boil down to an asm that will look
    something like this:
    asm("sync; stwbrx %1,0,%2" : "=m" (*addr) : "r" (val), "r" (addr));
    
    The issue is that the "stwbrx" instruction only comes in an indexed, or
    'x', version, in which the address is represented by the sum of two
    registers (the "0,%2").  Unfortunately, gcc doesn't have a constraint for
    an indexed memory reference.  The "m" constraint allows both indexed and
    offset, i.e. register plus constant, memory references and there is no
    "stwbr" version for offset references.  "m" also allows updating addresses
    and there is no 'u' version of "stwbrx" like there is with "stwux".
    
    The unused first operand to the asm is just to tell gcc that *addr is an
    output of the asm.  The address used is passed in a single register via the
    third asm operand, and the index register is just hard coded as 0.  This
    means gcc is forced to put the address in a single register and can't use
    index addressing, e.g. if one has the data in register 9, a base address in
    register 3 and an index in register 4, gcc must emit code like "add 11,4,3;
    stwbrx 9,0,11" instead of just "stwbrx 9,4,3".  This costs an extra add
    instruction and another register.
    
    For gcc 4.0 and older, there doesn't appear to be anything that can be
    done.  But for 4.1 and newer, there is a 'Z' constraint.  It does not allow
    "updating" addresses, but does allow both indexed and offset addresses.
    However, the only allowed constant offset is 0.  We can then use the
    undocumented 'y' operand modifier, which causes gcc to convert "0(reg)"
    into the equivalent "0,reg" format that can be used with stwbrx.
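    
    As a rough illustration of the 'Z' plus '%y' approach described above, a
    store could be sketched as below.  The name out_le32_sketch is
    hypothetical, the asm path is only compiled for big-endian PowerPC, and the
    byte-wise fallback is merely a stand-in so the sketch builds on other
    hosts; none of this is claimed to be the exact code from this patch.

```c
#include <stdint.h>

/* Hypothetical helper (not the kernel's out_le32): store val at addr in
 * little-endian byte order. */
static inline void out_le32_sketch(volatile uint32_t *addr, uint32_t val)
{
#if defined(__powerpc__) && defined(__BYTE_ORDER__) && \
    __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    /* "=Z" lets gcc choose an indexed or register-indirect address, and
     * %y0 prints that address in the "ra,rb" form stwbrx requires, so no
     * separate address operand is needed. */
    __asm__ __volatile__("sync; stwbrx %1,%y0"
                         : "=Z" (*addr) : "r" (val));
#else
    /* Portable stand-in: write the bytes lowest-order first. */
    volatile uint8_t *p = (volatile uint8_t *)addr;
    p[0] = (uint8_t)(val);
    p[1] = (uint8_t)(val >> 8);
    p[2] = (uint8_t)(val >> 16);
    p[3] = (uint8_t)(val >> 24);
#endif
}
```

    With this shape, a call such as out_le32_sketch(&base[idx], val) leaves gcc
    free to use the base and index registers directly in the stwbrx.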
    
    This brings us to the problem with the BE version.  In this case, the "stw"
    instruction does have both indexed and non-indexed versions.  The final asm
    ends up looking like this:
    asm("sync; stw%U0%X0 %1,%0" : "=m" (*addr) : "r" (val), "r" (addr));
    
    The undocumented codes "%U0" and "%X0" will generate a 'u' if the memory
    reference should be an auto-updating one, and an 'x' if the memory
    reference is indexed, respectively.  The third operand is unused; it's just
    there because the asm code is reused from the LE version.  However, gcc
    does not know this, and generates unnecessary code to stick addr in a
    register!  To use the example from the LE version, gcc will generate "add
    11,4,3; stwx 9,4,3".  It is able to use the indexed address "4,3" for the
    "stwx", but still thinks it needs to put 4+3 into register 11, which will
    never be used.
    
    This also ends up happening a lot for the offset addressing mode, where
    common code like this:  out_be32(&device_registers->some_register, data);
    uses an instruction like "stw 9, 42(3)", where register 3 has the pointer
    device_registers and 42 is the offset of some_register in that structure.
    gcc will be forced to generate the unnecessary instruction "addi 11, 3, 42"
    to put the address into a single (unused) register.
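    
    One way to avoid the wasted address computation is simply not to pass addr
    as a separate operand in the BE case.  The following is a minimal sketch
    under that assumption; out_be32_sketch and struct device_registers_sketch
    are hypothetical names, and the non-PowerPC branch is only a stand-in so
    the example builds on other hosts.

```c
#include <stdint.h>

/* Hypothetical register block, used only to illustrate offset addressing. */
struct device_registers_sketch {
    uint32_t control;
    uint32_t some_register;
};

/* Hypothetical helper (not the kernel's out_be32): store val at addr in
 * big-endian byte order. */
static inline void out_be32_sketch(volatile uint32_t *addr, uint32_t val)
{
#if defined(__powerpc__) && defined(__BYTE_ORDER__) && \
    __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    /* Only the "=m" operand describes the address, so gcc may emit an
     * offset or indexed store without also materialising addr in a
     * register.  %U0 and %X0 append 'u' or 'x' when needed. */
    __asm__ __volatile__("sync; stw%U0%X0 %1,%0"
                         : "=m" (*addr) : "r" (val));
#else
    /* Portable stand-in: write the bytes highest-order first. */
    volatile uint8_t *p = (volatile uint8_t *)addr;
    p[0] = (uint8_t)(val >> 24);
    p[1] = (uint8_t)(val >> 16);
    p[2] = (uint8_t)(val >> 8);
    p[3] = (uint8_t)(val);
#endif
}
```

    A call like out_be32_sketch(&dev->some_register, data) then only needs the
    single offset store, with no extra "addi" for an unused operand.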
    
    The in_* versions end up having these exact same problems as well.
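    
    The load side can be sketched the same way, with the 'Z' memory operand as
    an input instead of an output.  This is only an illustration: the name
    in_le32_sketch is hypothetical, any ordering instructions a real accessor
    may want after the load are deliberately left out, and the fallback branch
    exists just so the sketch builds on non-PowerPC hosts.

```c
#include <stdint.h>

/* Hypothetical helper (not the kernel's in_le32): load a 32-bit value
 * stored at addr in little-endian byte order. */
static inline uint32_t in_le32_sketch(const volatile uint32_t *addr)
{
#if defined(__powerpc__) && defined(__BYTE_ORDER__) && \
    __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    uint32_t ret;
    /* "Z" as an input lets gcc pick an indexed or register-indirect
     * address; %y1 prints it in the "ra,rb" form lwbrx requires. */
    __asm__ __volatile__("sync; lwbrx %0,%y1"
                         : "=r" (ret) : "Z" (*addr));
    return ret;
#else
    /* Portable stand-in: assemble the value from its bytes. */
    const volatile uint8_t *p = (const volatile uint8_t *)addr;
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
#endif
}
```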
    
    Signed-off-by: Trent Piepho <tpiepho@freescale.com>
    CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    CC: Andreas Schwab <schwab@suse.de>
    Signed-off-by: Paul Mackerras <paulus@samba.org>