• Cyril Bur's avatar
    powerpc: Always save/restore checkpointed regs during treclaim/trecheckpoint · eb5c3f1c
    Cyril Bur authored
    Lazy save and restore of FP/Altivec means that a userspace process can
    be sent to userspace with FP or Altivec disabled and loaded only as
    required (by way of an FP/Altivec unavailable exception). Transactional
    Memory complicates this situation as a transaction could be started
    without FP/Altivec being loaded up. This causes the hardware to
    checkpoint incorrect registers. Handling FP/Altivec unavailable
    exceptions while a thread is transactional requires a reclaim and
    recheckpoint to ensure the CPU has correct state for both sets of
    registers.
    
    tm_reclaim() has optimisations to not always save the FP/Altivec
    registers to the checkpointed save area. This was originally done
    because the caller might have information that the checkpointed
    registers aren't valid due to lazy save and restore. We've also been a
    little vague as to how tm_reclaim() leaves the FP/Altivec state since it
    doesn't necessarily always save it to the thread struct. This has lead
    to an (incorrect) assumption that it leaves the checkpointed state on
    the CPU.
    
    tm_recheckpoint() has similar optimisations in reverse. It may not
    always reload the checkpointed FP/Altivec registers from the thread
    struct before the trecheckpoint. It is therefore quite unclear where it
    expects to get the state from. This didn't help with the assumption
    made about tm_reclaim().
    
    These optimisations sit in what is by definition a slow path. If a
    process has to go through a reclaim/recheckpoint then its transaction
    will be doomed on returning to userspace. This mean that the process
    will be unable to complete its transaction and be forced to its failure
    handler. This is already an out if line case for userspace. Furthermore,
    the cost of copying 64 times 128 bits from registers isn't very long[0]
    (at all) on modern processors. As such it appears these optimisations
    have only served to increase code complexity and are unlikely to have
    had a measurable performance impact.
    
    Our transactional memory handling has been riddled with bugs. A cause
    of this has been difficulty in following the code flow, code complexity
    has not been our friend here. It makes sense to remove these
    optimisations in favour of a (hopefully) more stable implementation.
    
    This patch does mean that some times the assembly will needlessly save
    'junk' registers which will subsequently get overwritten with the
    correct value by the C code which calls the assembly function. This
    small inefficiency is far outweighed by the reduction in complexity for
    general TM code, context switching paths, and transactional facility
    unavailable exception handler.
    
    0: I tried to measure it once for other work and found that it was
    hiding in the noise of everything else I was working with. I find it
    exceedingly likely this will be the case here.
    Signed-off-by: default avatarCyril Bur <cyrilbur@gmail.com>
    Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
    eb5c3f1c
process.c 52.4 KB