    powerpc/pseries/mobility: rebuild cacheinfo hierarchy post-migration · e610a466
    Nathan Lynch authored
    It's common for the platform to replace the cache device nodes after a
    migration. Since the cacheinfo code is never informed about this, it
    never drops its references to the source system's cache nodes. This
    leaves it in an inconsistent state, which produces warnings and oopses
    as soon as a CPU goes online or offline after the migration, e.g.
    
      cache for /cpus/l3-cache@3113(Unified) refers to cache for /cpus/l2-cache@200d(Unified)
      WARNING: CPU: 15 PID: 86 at arch/powerpc/kernel/cacheinfo.c:176 release_cache+0x1bc/0x1d0
      [...]
      NIP release_cache+0x1bc/0x1d0
      LR  release_cache+0x1b8/0x1d0
      Call Trace:
        release_cache+0x1b8/0x1d0 (unreliable)
        cacheinfo_cpu_offline+0x1c4/0x2c0
        unregister_cpu_online+0x1b8/0x260
        cpuhp_invoke_callback+0x114/0xf40
        cpuhp_thread_fun+0x270/0x310
        smpboot_thread_fn+0x2c8/0x390
        kthread+0x1b8/0x1c0
        ret_from_kernel_thread+0x5c/0x68
    
    Using device tree notifiers won't work since we want to rebuild the
    hierarchy only after all the removals and additions have occurred and
    the device tree is in a consistent state. Call cacheinfo_teardown()
    before processing device tree updates, and rebuild the hierarchy
    afterward.
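
    The ordering described above can be illustrated with a small userspace
    model. This is only a sketch, not the kernel code: every toy_* name,
    the fixed-size node table, and the replacement node names are
    assumptions made for illustration. It shows that references collected
    before the update go stale once the nodes are swapped, and that tearing
    down first and rebuilding afterward leaves none behind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_NODES 8

/* Toy stand-in for a cache device node (hypothetical, not struct device_node). */
struct toy_node {
	char name[32];
	bool present;	/* still attached to the toy device tree */
};

struct toy_node tree[MAX_NODES];
size_t next_slot;	/* slots are never reused, so old pointers stay detectable */

/* Toy stand-in for the references the cacheinfo code holds into the tree. */
const struct toy_node *cache_refs[MAX_NODES];
size_t nr_cache_refs;

struct toy_node *toy_attach(const char *name)
{
	assert(next_slot < MAX_NODES);
	struct toy_node *n = &tree[next_slot++];

	snprintf(n->name, sizeof(n->name), "%s", name);
	n->present = true;
	return n;
}

/* Drop every reference the cache hierarchy holds. */
void toy_cacheinfo_teardown(void)
{
	memset(cache_refs, 0, sizeof(cache_refs));
	nr_cache_refs = 0;
}

/* Rebuild the hierarchy from whatever nodes are attached right now. */
void toy_cacheinfo_rebuild(void)
{
	assert(nr_cache_refs == 0);	/* only valid after a teardown */
	for (size_t i = 0; i < next_slot; i++)
		if (tree[i].present)
			cache_refs[nr_cache_refs++] = &tree[i];
}

/* Model the platform replacing all cache nodes after a migration. */
void toy_devicetree_update(void)
{
	for (size_t i = 0; i < next_slot; i++)
		tree[i].present = false;
	toy_attach("l2-cache@new");	/* hypothetical destination nodes */
	toy_attach("l3-cache@new");
}

/* Count references that still point at detached nodes. */
size_t toy_stale_refs(void)
{
	size_t stale = 0;

	for (size_t i = 0; i < nr_cache_refs; i++)
		if (!cache_refs[i]->present)
			stale++;
	return stale;
}

/* The ordering the commit message describes: teardown, update, rebuild. */
void toy_post_mobility_fixup(void)
{
	toy_cacheinfo_teardown();
	toy_devicetree_update();
	toy_cacheinfo_rebuild();
}
```

    In this model, calling toy_devicetree_update() on its own leaves
    toy_stale_refs() nonzero, which corresponds to the inconsistent state
    behind the warning above; going through toy_post_mobility_fixup()
    keeps it at zero.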
    
    Fixes: 410bccf9 ("powerpc/pseries: Partition migration in the kernel")
    Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
    Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
mobility.c 9.21 KB