• ARM: avoid Cortex-A9 livelock on tight dmb loops · e2cadf02
    Russell King authored
    [ Upstream commit 5388a5b8 ]
    
    machine_crash_nonpanic_core() does this:
    
    	while (1)
    		cpu_relax();
    
    because the kernel has crashed, and we have no known safe way to deal
    with the CPU.  So, we place the CPU into an infinite loop which we
    expect it to never exit - at least not until the system as a whole is
    reset by some method.
    
    In the absence of erratum 754327, this code assembles to:
    
    	b	.
    
    In other words, an infinite loop.  When erratum 754327 is enabled,
    this becomes:
    
    1:	dmb
    	b	1b
    
    It has been observed on some systems (e.g. OMAP4) that, if a crash
    is triggered, the system tries to kexec into the panic kernel, but
    fails after taking the secondary CPU down - placing it into one of
    these loops.  This causes the system to livelock, and the most
    noticeable effect is that the system stops after issuing:
    
    	Loading crashdump kernel...
    
    to the system console.
    
    The solution I came up with, tested as working, was to add wfe() to
    these infinite loops thusly:
    
    	while (1) {
    		cpu_relax();
    		wfe();
    	}
    
    which, without 754327, builds to:
    
    1:	wfe
    	b	1b
    
    or, with 754327 enabled:
    
    1:	dmb
    	wfe
    	b	1b
    
    Adding "wfe" does two things depending on the environment we're running
    under:
    - where we're running on bare metal, and the processor implements
      "wfe", it stops us spinning endlessly in a loop where we're never
      going to do any useful work.
    - if we're running in a VM, it allows the CPU to be given back to the
      hypervisor and rescheduled for other purposes (maybe a different VM)
      rather than wasting CPU cycles inside a crashed VM.
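
    As a rough illustration of the hint itself (not part of this patch),
    the sketch below issues a real "wfe" only when built for 32-bit ARMv7
    or later, and otherwise falls back to a plain compiler barrier to
    model a core where the hint is a no-op.  park_hint() and park_for()
    are hypothetical names, and the loop is bounded only so that it can
    be run; the kernel loop never exits.

```c
/* Sketch only: park_hint() and park_for() are hypothetical names. */
#if defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 7
/* 32-bit ARMv7+: emit the real hint; the "memory" clobber stops the
 * compiler from moving memory accesses across it. */
#define park_hint()	__asm__ __volatile__("wfe" : : : "memory")
#else
/* Any other target: model a core where the hint does nothing. */
#define park_hint()	__asm__ __volatile__("" : : : "memory")
#endif

/* Bounded stand-in for the kernel's while (1) loop, so it can be run. */
static unsigned int park_for(unsigned int iterations)
{
	unsigned int done = 0;

	while (done < iterations) {
		park_hint();
		done++;
	}
	return done;
}
```

    On bare metal the hint lets the core idle until an event arrives; a
    hypervisor can be configured to trap it and schedule something else.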
    
    However, in light of erratum 794072, Will Deacon wanted to see 10 nops
    as well - which is reasonable to cover the case where we have erratum
    754327 enabled _and_ we have a processor that doesn't implement the
    wfe hint.
    
    So, we now end up with:
    
    1:      wfe
            b       1b
    
    when erratum 754327 is disabled, or:
    
    1:      dmb
            nop
            nop
            nop
            nop
            nop
            nop
            nop
            nop
            nop
            nop
            wfe
            b       1b
    
    when erratum 754327 is enabled.  We also get the dmb + 10 nop
    sequence elsewhere in the kernel, in terminating loops.
    
    This is reasonable - it means we get the workaround for erratum
    794072 when erratum 754327 is enabled, but still relinquish the dead
    processor - either by placing it in a lower power mode when wfe is
    implemented as such, or by returning it to the hypervisor.  In the
    case where wfe is a no-op, we use the workaround specified in erratum
    794072 to avoid the problem.
    
    These are two entirely orthogonal problems - the 10 nops address
    erratum 794072, and the wfe is an optimisation that makes the system
    more efficient when crashed, either in terms of power consumption or
    by allowing the host/other VMs to make use of the CPU.
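
    The way the two pieces combine can be modelled in plain C.  This is
    a userspace sketch, not the kernel code: MODEL_ERRATA_754327 is a
    hypothetical stand-in for the erratum config option, and the
    model_*() helpers only record mnemonics into a string rather than
    executing anything.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the erratum 754327 config option; build
 * with -DMODEL_ERRATA_754327=0 to model a kernel without it. */
#ifndef MODEL_ERRATA_754327
#define MODEL_ERRATA_754327 1
#endif

/* Records the "instructions" one loop iteration would issue. */
static char trace[256];

static void trace_reset(void)
{
	trace[0] = '\0';
}

/* Append one mnemonic plus a ';' separator; drop it if the buffer is full. */
static void emit(const char *insn)
{
	size_t used = strlen(trace);
	size_t need = strlen(insn) + 1;	/* mnemonic + ';' */

	if (used + need < sizeof(trace)) {
		strcat(trace, insn);
		strcat(trace, ";");
	}
}

#if MODEL_ERRATA_754327
/* Erratum 754327 enabled: barrier, then the 10 nops for erratum 794072. */
static void model_cpu_relax(void)
{
	int i;

	emit("dmb");
	for (i = 0; i < 10; i++)
		emit("nop");
}
#else
/* Erratum 754327 disabled: no instructions are emitted. */
static void model_cpu_relax(void)
{
}
#endif

/* The wfe hint is added in both configurations. */
static void model_wfe(void)
{
	emit("wfe");
}

/* One pass through the loop body; the "b 1b" is the loop itself. */
static const char *model_loop_iteration(void)
{
	trace_reset();
	model_cpu_relax();
	model_wfe();
	return trace;
}
```

    With the default setting one iteration records dmb, ten nops, then
    wfe; with MODEL_ERRATA_754327 set to 0 only the wfe remains.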
    
    I don't see any reason not to use kexec() inside a VM - it has the
    potential to provide automated recovery from a failure of the VM's
    kernel with the opportunity for saving a crashdump of the failure.
    A panic() with a reboot timeout won't do that, and reading the
    libvirt documentation, setting on_reboot to "preserve" won't either
    (the documentation states "The preserve action for an on_reboot event
    is treated as a destroy".)  Surely it has to be a good thing to
    avoid having CPUs spinning inside a VM that is doing no useful
    work.

    Acked-by: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
    Signed-off-by: Sasha Levin <sashal@kernel.org>