Commit eb5c3f1c authored by Cyril Bur, committed by Michael Ellerman

powerpc: Always save/restore checkpointed regs during treclaim/trecheckpoint

Lazy save and restore of FP/Altivec means that a userspace process can
be sent to userspace with FP or Altivec disabled and loaded only as
required (by way of an FP/Altivec unavailable exception). Transactional
Memory complicates this situation as a transaction could be started
without FP/Altivec being loaded up. This causes the hardware to
checkpoint incorrect registers. Handling FP/Altivec unavailable
exceptions while a thread is transactional requires a reclaim and
recheckpoint to ensure the CPU has correct state for both sets of
registers.
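
To make that flow concrete, after this patch the facility unavailable
handlers all follow roughly the shape below (a trimmed sketch based on
fp_unavailable_tm() in the traps.c hunks further down; the TM-active
checks and the reload of the transactional VMX state are omitted):

	void fp_unavailable_tm(struct pt_regs *regs)
	{
		/* Reclaim: abort the transaction and save both the
		 * checkpointed and the transactional register state
		 * into the thread struct. */
		tm_reclaim_current(TM_CAUSE_FAC_UNAV);

		/* FP is in use from here on. */
		current->thread.load_fp = 1;

		/* Recheckpoint: reload the checkpointed state from the
		 * thread struct and set up a new hardware checkpoint,
		 * so the CPU has correct state for both register sets. */
		tm_recheckpoint(&current->thread);
	}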

tm_reclaim() has optimisations to not always save the FP/Altivec
registers to the checkpointed save area. This was originally done
because the caller might have information that the checkpointed
registers aren't valid due to lazy save and restore. We've also been a
little vague as to how tm_reclaim() leaves the FP/Altivec state since it
doesn't necessarily always save it to the thread struct. This has led
to an (incorrect) assumption that it leaves the checkpointed state on
the CPU.

tm_recheckpoint() has similar optimisations in reverse. It may not
always reload the checkpointed FP/Altivec registers from the thread
struct before the trecheckpoint. It is therefore quite unclear where it
expects to get the state from, which only reinforced the assumption
made about tm_reclaim().

These optimisations sit in what is by definition a slow path. If a
process has to go through a reclaim/recheckpoint then its transaction
will be doomed on returning to userspace. This means that the process
will be unable to complete its transaction and will be forced into its
failure handler. This is already an out-of-line case for userspace.
Furthermore, the cost of copying 64 x 128-bit registers is negligible[0]
on modern processors. As such it appears these optimisations have only
served to increase code complexity and are unlikely to have had a
measurable performance impact.

Our transactional memory handling has been riddled with bugs. A cause
of this has been difficulty in following the code flow; code complexity
has not been our friend here. It makes sense to remove these
optimisations in favour of a (hopefully) more stable implementation.

This patch does mean that sometimes the assembly will needlessly save
'junk' registers which will subsequently be overwritten with the
correct value by the C code which calls the assembly function. This
small inefficiency is far outweighed by the reduction in complexity of
the general TM code, the context switching paths, and the transactional
facility unavailable exception handlers.
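
For example, with this patch the C reclaim path reads roughly as below
(a sketch assembled from the process.c hunks further down; the
surrounding function and the MSR_FP memcpy sit in context the diff does
not show, so treat those exact names as inferred by symmetry):

	giveup_all(container_of(thr, struct task_struct, thread));

	/* The assembly now unconditionally dumps FP/Altivec into the
	 * checkpointed save areas. */
	tm_reclaim(thr, cause);

	/* If a facility was off inside the transaction, whatever the
	 * assembly just saved for it is junk; the live state is the
	 * correct checkpointed value, so copy it over the top. */
	if ((thr->ckpt_regs.msr & MSR_FP) == 0)
		memcpy(&thr->ckfp_state, &thr->fp_state,
		       sizeof(struct thread_fp_state));
	if ((thr->ckpt_regs.msr & MSR_VEC) == 0)
		memcpy(&thr->ckvr_state, &thr->vr_state,
		       sizeof(struct thread_vr_state));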

0: I tried to measure it once for other work and found that it was
hiding in the noise of everything else I was working with. I find it
exceedingly likely this will be the case here.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
parent 91381b9c
@@ -11,10 +11,9 @@
 extern void tm_enable(void);
 extern void tm_reclaim(struct thread_struct *thread,
-		       unsigned long orig_msr, uint8_t cause);
+		       uint8_t cause);
 extern void tm_reclaim_current(uint8_t cause);
-extern void tm_recheckpoint(struct thread_struct *thread,
-			    unsigned long orig_msr);
+extern void tm_recheckpoint(struct thread_struct *thread);
 extern void tm_abort(uint8_t cause);
 extern void tm_save_sprs(struct thread_struct *thread);
 extern void tm_restore_sprs(struct thread_struct *thread);
...
@@ -876,6 +876,8 @@ static void tm_reclaim_thread(struct thread_struct *thr,
 	giveup_all(container_of(thr, struct task_struct, thread));
 
+	tm_reclaim(thr, cause);
+
 	/*
 	 * If we are in a transaction and FP is off then we can't have
 	 * used FP inside that transaction. Hence the checkpointed
@@ -894,8 +896,6 @@ static void tm_reclaim_thread(struct thread_struct *thr,
 	if ((thr->ckpt_regs.msr & MSR_VEC) == 0)
 		memcpy(&thr->ckvr_state, &thr->vr_state,
 		       sizeof(struct thread_vr_state));
-
-	tm_reclaim(thr, thr->ckpt_regs.msr, cause);
 }
 
 void tm_reclaim_current(uint8_t cause)
@@ -946,11 +946,9 @@ static inline void tm_reclaim_task(struct task_struct *tsk)
 	tm_save_sprs(thr);
 }
 
-extern void __tm_recheckpoint(struct thread_struct *thread,
-			      unsigned long orig_msr);
+extern void __tm_recheckpoint(struct thread_struct *thread);
 
-void tm_recheckpoint(struct thread_struct *thread,
-		     unsigned long orig_msr)
+void tm_recheckpoint(struct thread_struct *thread)
 {
 	unsigned long flags;
@@ -969,15 +967,13 @@ void tm_recheckpoint(struct thread_struct *thread,
 	 */
 	tm_restore_sprs(thread);
 
-	__tm_recheckpoint(thread, orig_msr);
+	__tm_recheckpoint(thread);
 
 	local_irq_restore(flags);
 }
 
 static inline void tm_recheckpoint_new_task(struct task_struct *new)
 {
-	unsigned long msr;
-
 	if (!cpu_has_feature(CPU_FTR_TM))
 		return;
@@ -996,13 +992,11 @@ static inline void tm_recheckpoint_new_task(struct task_struct *new)
 		tm_restore_sprs(&new->thread);
 		return;
 	}
-	msr = new->thread.ckpt_regs.msr;
 	/* Recheckpoint to restore original checkpointed register state. */
-	TM_DEBUG("*** tm_recheckpoint of pid %d "
-		 "(new->msr 0x%lx, new->origmsr 0x%lx)\n",
-		 new->pid, new->thread.regs->msr, msr);
+	TM_DEBUG("*** tm_recheckpoint of pid %d (new->msr 0x%lx)\n",
+		 new->pid, new->thread.regs->msr);
 
-	tm_recheckpoint(&new->thread, msr);
+	tm_recheckpoint(&new->thread);
 
 	/*
 	 * The checkpointed state has been restored but the live state has
...
@@ -880,7 +880,7 @@ static long restore_tm_user_regs(struct pt_regs *regs,
 	/* Make sure the transaction is marked as failed */
 	current->thread.tm_texasr |= TEXASR_FS;
 	/* This loads the checkpointed FP/VEC state, if used */
-	tm_recheckpoint(&current->thread, msr);
+	tm_recheckpoint(&current->thread);
 
 	/* This loads the speculative FP/VEC state, if used */
 	msr_check_and_set(msr & (MSR_FP | MSR_VEC));
...
@@ -552,7 +552,7 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
 	/* Make sure the transaction is marked as failed */
 	tsk->thread.tm_texasr |= TEXASR_FS;
 	/* This loads the checkpointed FP/VEC state, if used */
-	tm_recheckpoint(&tsk->thread, msr);
+	tm_recheckpoint(&tsk->thread);
 
 	msr_check_and_set(msr & (MSR_FP | MSR_VEC));
 	if (msr & MSR_FP) {
...
@@ -79,15 +79,12 @@ _GLOBAL(tm_abort)
 	blr
 
 /* void tm_reclaim(struct thread_struct *thread,
- *		   unsigned long orig_msr,
  *		   uint8_t cause)
  *
  *	- Performs a full reclaim.  This destroys outstanding
  *	  transactions and updates thread->regs.tm_ckpt_* with the
  *	  original checkpointed state.  Note that thread->regs is
  *	  unchanged.
- *	- FP regs are written back to thread->transact_fpr before
- *	  reclaiming.  These are the transactional (current) versions.
  *
  * Purpose is to both abort transactions of, and preserve the state of,
  * a transactions at a context switch. We preserve/restore both sets of process
@@ -98,9 +95,9 @@ _GLOBAL(tm_abort)
  * Call with IRQs off, stacks get all out of sync for some periods in here!
  */
 _GLOBAL(tm_reclaim)
-	mfcr	r6
+	mfcr	r5
 	mflr	r0
-	stw	r6, 8(r1)
+	stw	r5, 8(r1)
 	std	r0, 16(r1)
 	std	r2, STK_GOT(r1)
 	stdu	r1, -TM_FRAME_SIZE(r1)
@@ -108,7 +105,6 @@ _GLOBAL(tm_reclaim)
 
 	/* We've a struct pt_regs at [r1+STACK_FRAME_OVERHEAD]. */
 	std	r3, STK_PARAM(R3)(r1)
-	std	r4, STK_PARAM(R4)(r1)
 	SAVE_NVGPRS(r1)
 
 	/* We need to setup MSR for VSX register save instructions. */
@@ -138,8 +134,8 @@ _GLOBAL(tm_reclaim)
 	std	r1, PACAR1(r13)
 
 	/* Clear MSR RI since we are about to change r1, EE is already off. */
-	li	r4, 0
-	mtmsrd	r4, 1
+	li	r5, 0
+	mtmsrd	r5, 1
 
 	/*
 	 * BE CAREFUL HERE:
@@ -151,7 +147,7 @@ _GLOBAL(tm_reclaim)
 	 * to user register state.  (FPRs, CCR etc. also!)
 	 * Use an sprg and a tm_scratch in the PACA to shuffle.
 	 */
-	TRECLAIM(R5)				/* Cause in r5 */
+	TRECLAIM(R4)				/* Cause in r4 */
 
 	/* ******************** GPRs ******************** */
 	/* Stash the checkpointed r13 away in the scratch SPR and get the real
@@ -242,40 +238,30 @@ _GLOBAL(tm_reclaim)
 
 	/* ******************** FPR/VR/VSRs ************
-	 * After reclaiming, capture the checkpointed FPRs/VRs /if used/.
-	 *
-	 * (If VSX used, FP and VMX are implied.  Or, we don't need to look
-	 * at MSR.VSX as copying FP regs if .FP, vector regs if .VMX covers it.)
-	 *
-	 * We're passed the thread's MSR as the second parameter
+	 * After reclaiming, capture the checkpointed FPRs/VRs.
 	 *
 	 * We enabled VEC/FP/VSX in the msr above, so we can execute these
 	 * instructions!
 	 */
-	ld	r4, STK_PARAM(R4)(r1)		/* Second parameter, MSR * */
 	mr	r3, r12
-	andis.	r0, r4, MSR_VEC@h
-	beq	dont_backup_vec
 
+	/* Altivec (VEC/VMX/VR)*/
 	addi	r7, r3, THREAD_CKVRSTATE
 	SAVE_32VRS(0, r6, r7)	/* r6 scratch, r7 transact vr state */
 	mfvscr	v0
 	li	r6, VRSTATE_VSCR
 	stvx	v0, r7, r6
-dont_backup_vec:
 
+	/* VRSAVE */
 	mfspr	r0, SPRN_VRSAVE
 	std	r0, THREAD_CKVRSAVE(r3)
 
-	andi.	r0, r4, MSR_FP
-	beq	dont_backup_fp
-
+	/* Floating Point (FP) */
 	addi	r7, r3, THREAD_CKFPSTATE
 	SAVE_32FPRS_VSRS(0, R6, R7)	/* r6 scratch, r7 transact fp state */
 	mffs	fr0
 	stfd	fr0,FPSTATE_FPSCR(r7)
-dont_backup_fp:
 
 	/* TM regs, incl TEXASR -- these live in thread_struct.  Note they've
 	 * been updated by the treclaim, to explain to userland the failure
@@ -343,22 +329,19 @@ _GLOBAL(__tm_recheckpoint)
 	 */
 	subi	r7, r7, STACK_FRAME_OVERHEAD
 
+	/* We need to setup MSR for FP/VMX/VSX register save instructions. */
 	mfmsr	r6
-	/* R4 = original MSR to indicate whether thread used FP/Vector etc. */
-
-	/* Enable FP/vec in MSR if necessary! */
-	lis	r5, MSR_VEC@h
+	mr	r5, r6
 	ori	r5, r5, MSR_FP
-	and.	r5, r4, r5
-	beq	restore_gprs			/* if neither, skip both */
-
+#ifdef CONFIG_ALTIVEC
+	oris	r5, r5, MSR_VEC@h
+#endif
 #ifdef CONFIG_VSX
 BEGIN_FTR_SECTION
-	oris	r5, r5, MSR_VSX@h
+	oris	r5,r5, MSR_VSX@h
 END_FTR_SECTION_IFSET(CPU_FTR_VSX)
 #endif
-	or	r5, r6, r5			/* Set MSR.FP+.VSX/.VEC */
-	mtmsr	r5
+	mtmsrd	r5
 
 #ifdef CONFIG_ALTIVEC
 	/*
@@ -367,28 +350,20 @@ _GLOBAL(__tm_recheckpoint)
 	 * thread.fp_state[] version holds the 'live' (transactional)
 	 * and will be loaded subsequently by any FPUnavailable trap.
 	 */
-	andis.	r0, r4, MSR_VEC@h
-	beq	dont_restore_vec
-
 	addi	r8, r3, THREAD_CKVRSTATE
 	li	r5, VRSTATE_VSCR
 	lvx	v0, r8, r5
 	mtvscr	v0
 	REST_32VRS(0, r5, r8)			/* r5 scratch, r8 ptr */
-dont_restore_vec:
 	ld	r5, THREAD_CKVRSAVE(r3)
 	mtspr	SPRN_VRSAVE, r5
 #endif
 
-	andi.	r0, r4, MSR_FP
-	beq	dont_restore_fp
-
 	addi	r8, r3, THREAD_CKFPSTATE
 	lfd	fr0, FPSTATE_FPSCR(r8)
 	MTFSF_L(fr0)
 	REST_32FPRS_VSRS(0, R4, R8)
-dont_restore_fp:
 
 	mtmsr	r6				/* FP/Vec off again! */
 
 restore_gprs:
...
@@ -1693,7 +1693,7 @@ void fp_unavailable_tm(struct pt_regs *regs)
 	 * If VMX is in use, the VRs now hold checkpointed values,
 	 * so we don't want to load the VRs from the thread_struct.
 	 */
-	tm_recheckpoint(&current->thread, orig_msr | MSR_FP);
+	tm_recheckpoint(&current->thread);
 
 	/* If VMX is in use, get the transactional values back */
 	if (orig_msr & MSR_VEC) {
@@ -1721,7 +1721,7 @@ void altivec_unavailable_tm(struct pt_regs *regs)
 		 regs->nip, regs->msr);
 	tm_reclaim_current(TM_CAUSE_FAC_UNAV);
 	current->thread.load_vec = 1;
-	tm_recheckpoint(&current->thread, orig_msr | MSR_VEC);
+	tm_recheckpoint(&current->thread);
 	current->thread.used_vr = 1;
 
 	if (orig_msr & MSR_FP) {
@@ -1733,8 +1733,6 @@ void altivec_unavailable_tm(struct pt_regs *regs)
 
 void vsx_unavailable_tm(struct pt_regs *regs)
 {
-	unsigned long orig_msr = regs->msr;
-
 	/* See the comments in fp_unavailable_tm().  This works similarly,
 	 * though we're loading both FP and VEC registers in here.
 	 *
@@ -1748,28 +1746,16 @@ void vsx_unavailable_tm(struct pt_regs *regs)
 	current->thread.used_vsr = 1;
 
-	/* If FP and VMX are already loaded, we have all the state we need */
-	if ((orig_msr & (MSR_FP | MSR_VEC)) == (MSR_FP | MSR_VEC)) {
-		regs->msr |= MSR_VSX;
-		return;
-	}
-
 	/* This reclaims FP and/or VR regs if they're already enabled */
 	tm_reclaim_current(TM_CAUSE_FAC_UNAV);
 
 	current->thread.load_vec = 1;
 	current->thread.load_fp = 1;
 
-	/* This loads & recheckpoints FP and VRs; but we have
-	 * to be sure not to overwrite previously-valid state.
-	 */
-	tm_recheckpoint(&current->thread, orig_msr | MSR_FP | MSR_VEC);
-
-	msr_check_and_set(orig_msr & (MSR_FP | MSR_VEC));
-
-	if (orig_msr & MSR_FP)
-		load_fp_state(&current->thread.fp_state);
-	if (orig_msr & MSR_VEC)
-		load_vr_state(&current->thread.vr_state);
+	tm_recheckpoint(&current->thread);
+
+	msr_check_and_set(MSR_FP | MSR_VEC);
+	load_fp_state(&current->thread.fp_state);
+	load_vr_state(&current->thread.vr_state);
 }
 
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
...