• Nathan Lynch's avatar
    powerpc/rtas: use device model APIs and serialization during LPM · a6717c01
    Nathan Lynch authored
    The LPAR migration implementation and userspace-initiated cpu hotplug
    can interleave their executions like so:
    
    1. Set cpu 7 offline via sysfs.
    
    2. Begin a partition migration, whose implementation requires the OS
       to ensure all present cpus are online; cpu 7 is onlined:
    
         rtas_ibm_suspend_me -> rtas_online_cpus_mask -> cpu_up
    
       This sets cpu 7 online in all respects except for the cpu's
       corresponding struct device; dev->offline remains true.
    
    3. Set cpu 7 online via sysfs. _cpu_up() determines that cpu 7 is
       already online and returns success. The driver core (device_online)
       sets dev->offline = false.
    
    4. The migration completes and restores cpu 7 to offline state:
    
         rtas_ibm_suspend_me -> rtas_offline_cpus_mask -> cpu_down
    
    This leaves cpu7 in a state where the driver core considers the cpu
    device online, but in all other respects it is offline and
    unused. Attempts to online the cpu via sysfs appear to succeed but the
    driver core actually does not pass the request to the lower-level
    cpuhp support code. This makes the cpu unusable until the cpu device
    is manually set offline and then online again via sysfs.
    
    Instead of directly calling cpu_up/cpu_down, the migration code should
    use the higher-level device core APIs to maintain consistent state and
    serialize operations.
    
    Fixes: 120496ac ("powerpc: Bring all threads online prior to migration/hibernation")
    Signed-off-by: default avatarNathan Lynch <nathanl@linux.ibm.com>
    Reviewed-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
    Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20190802192926.19277-2-nathanl@linux.ibm.com
    a6717c01
rtas.c 29.1 KB