1. 29 Apr, 2019 1 commit
    • Nathan Fontenot's avatar
      powerpc/pseries: Track LMB nid instead of using device tree · b2d3b5ee
      Nathan Fontenot authored
      When removing memory we need to remove the memory from the node
      it was added to instead of looking up the node it should be in
      in the device tree.
      
      During testing we have seen scenarios where the affinity for a
      LMB changes due to a partition migration or PRRN event. In these
      cases the node the LMB exists in may not match the node the device
      tree indicates it belongs in. This can lead to a system crash
      when trying to DLPAR remove the LMB after a migration or PRRN
      event. The current code looks up the node in the device tree to
      remove the LMB from, the crash occurs when we try to offline this
      node and it does not have any data, i.e. node_data[nid] == NULL.
      
      36:mon> e
      cpu 0x36: Vector: 300 (Data Access) at [c0000001828b7810]
          pc: c00000000036d08c: try_offline_node+0x2c/0x1b0
          lr: c0000000003a14ec: remove_memory+0xbc/0x110
          sp: c0000001828b7a90
         msr: 800000000280b033
         dar: 9a28
       dsisr: 40000000
        current = 0xc0000006329c4c80
        paca    = 0xc000000007a55200   softe: 0        irq_happened: 0x01
          pid   = 76926, comm = kworker/u320:3
      
      36:mon> t
      [link register   ] c0000000003a14ec remove_memory+0xbc/0x110
      [c0000001828b7a90] c00000000006a1cc arch_remove_memory+0x9c/0xd0 (unreliable)
      [c0000001828b7ad0] c0000000003a14e0 remove_memory+0xb0/0x110
      [c0000001828b7b20] c0000000000c7db4 dlpar_remove_lmb+0x94/0x160
      [c0000001828b7b60] c0000000000c8ef8 dlpar_memory+0x7e8/0xd10
      [c0000001828b7bf0] c0000000000bf828 handle_dlpar_errorlog+0xf8/0x160
      [c0000001828b7c60] c0000000000bf8cc pseries_hp_work_fn+0x3c/0xa0
      [c0000001828b7c90] c000000000128cd8 process_one_work+0x298/0x5a0
      [c0000001828b7d20] c000000000129068 worker_thread+0x88/0x620
      [c0000001828b7dc0] c00000000013223c kthread+0x1ac/0x1c0
      [c0000001828b7e30] c00000000000b45c ret_from_kernel_thread+0x5c/0x80
      
      To resolve this we need to track the node a LMB belongs to when
      it is added to the system so we can remove it from that node instead
      of the node that the device tree indicates it should belong to.
      Signed-off-by: default avatarNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      b2d3b5ee
  2. 28 Apr, 2019 1 commit
  3. 21 Apr, 2019 32 commits
  4. 20 Apr, 2019 6 commits