    powerpc/powernv: Use kernel crash path for machine checks · 6fcd6baa
    Nicholas Piggin authored
    There are quite a few machine check exceptions that can be caused by
    kernel bugs. To make debugging easier, use the kernel crash path in
    cases of synchronous machine checks that occur in kernel mode, if that
    would not result in the machine going straight to panic or crash dump.
    
    The downside is that die()ing the process in kernel mode can still
    leave the system unstable. Setting panic_on_oops always forces the
    system to fail-stop, so systems where that behaviour is important
    still do the right thing.
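
    The policy described above can be sketched as a small decision
    function. This is an illustrative userspace model, not the code in
    this patch: struct mce_ctx, enum mce_action and mce_choose_action()
    are hypothetical names, and the handling of asynchronous and
    user-mode errors is an assumption added only to make the sketch
    complete.

```c
#include <stdbool.h>

/* Hypothetical model of the inputs to the decision; not kernel types. */
struct mce_ctx {
    bool synchronous;     /* error was caused by the interrupted instruction */
    bool user_mode;       /* interrupted context was userspace */
    bool die_would_crash; /* an oops would go straight to panic or crash
                             dump, e.g. because panic_on_oops is set */
};

enum mce_action {
    MCE_PLATFORM_PANIC,   /* report via the platform path and fail-stop */
    MCE_SIGNAL_TASK,      /* assumed: signal the user task, keep running */
    MCE_KERNEL_OOPS       /* kernel crash path: oops, kill the task */
};

static enum mce_action mce_choose_action(const struct mce_ctx *c)
{
    /* Assumption: only synchronous errors can be pinned on one task. */
    if (!c->synchronous)
        return MCE_PLATFORM_PANIC;

    /* Assumption: user-mode errors are handled by signalling the task. */
    if (c->user_mode)
        return MCE_SIGNAL_TASK;

    /* If the oops would panic anyway, keep the platform error report. */
    if (c->die_would_crash)
        return MCE_PLATFORM_PANIC;

    /* Kernel mode, synchronous, oops is survivable: use the crash path. */
    return MCE_KERNEL_OOPS;
}
```

    In this model, a synchronous kernel-mode error with panic_on_oops
    clear yields MCE_KERNEL_OOPS, while the same error with
    panic_on_oops set falls back to MCE_PLATFORM_PANIC.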
    
    As a test, when triggering an i-side 0111b error (ifetch from foreign
    address) in kernel mode process context on POWER9, the kernel currently
    dies quickly like this:
    
      Severe Machine check interrupt [Not recovered]
        NIP [ffff000000000000]: 0xffff000000000000
        Initiator: CPU
        Error type: Real address [Instruction fetch (foreign)]
      [  127.426651616,0] OPAL: Reboot requested due to Platform error.
          Effective[  127.426693712,3] OPAL: Reboot requested due to Platform error. address: ffff000000000000
      opal: Reboot type 1 not supported
      Kernel panic - not syncing: PowerNV Unrecovered Machine Check
      CPU: 56 PID: 4425 Comm: syscall Tainted: G   M            4.12.0-rc1-13857-ga4700a26-dirty #35
      Call Trace:
      [  128.017988928,4] IPMI: BUG: Dropping ESEL on the floor due to
        buggy/mising code in OPAL for this BMC
        Rebooting in 10 seconds..
      Trying to free IRQ 496 from IRQ context!
    
    After this patch, the process is killed and the kernel continues with
    this message, which gives enough information to identify the offending
    branch (i.e., with CFAR):
    
      Severe Machine check interrupt [Not recovered]
        NIP [ffff000000000000]: 0xffff000000000000
        Initiator: CPU
        Error type: Real address [Instruction fetch (foreign)]
          Effective address: ffff000000000000
      Oops: Machine check, sig: 7 [#1]
      SMP NR_CPUS=2048
      NUMA
      PowerNV
      Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 ...
      CPU: 22 PID: 4436 Comm: syscall Tainted: G   M            4.12.0-rc1-13857-ga4700a26-dirty #36
      task: c000000932300000 task.stack: c000000932380000
      NIP: ffff000000000000 LR: 00000000217706a4 CTR: ffff000000000000
      REGS: c00000000fc8fd80 TRAP: 0200   Tainted: G   M             (4.12.0-rc1-13857-ga4700a26-dirty)
      MSR: 90000000001c1003 <SF,HV,ME,RI,LE>
        CR: 24000484  XER: 20000000
      CFAR: c000000000004c80 DAR: 0000000021770a90 DSISR: 0a000000 SOFTE: 1
      GPR00: 0000000000001ebe 00007fffce4818b0 0000000021797f00 0000000000000000
      GPR04: 00007fff8007ac24 0000000044000484 0000000000004000 00007fff801405e8
      GPR08: 900000000280f033 0000000024000484 0000000000000000 0000000000000030
      GPR12: 9000000000001003 00007fff801bc370 0000000000000000 0000000000000000
      GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      GPR28: 00007fff801b0000 0000000000000000 00000000217707a0 00007fffce481918
      NIP [ffff000000000000] 0xffff000000000000
      LR [00000000217706a4] 0x217706a4
      Call Trace:
      Instruction dump:
      XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
      XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
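
    As a worked check of the "i-side 0111b" description against the
    MSR value in the oops above, the error code can be extracted from
    the saved register image. This is a sketch under a stated
    assumption: the mask 0x1c0000 and shift 18 are assumed to cover the
    i-side error code field, and iside_error_code() is a hypothetical
    helper, not a kernel function.

```c
#include <stdint.h>

/* Assumed position of the i-side error code in the saved MSR (SRR1). */
#define ISIDE_CODE_MASK  0x00000000001c0000ULL
#define ISIDE_CODE_SHIFT 18

/* Return the 3-bit i-side error code under the assumption above. */
static unsigned int iside_error_code(uint64_t srr1)
{
    return (unsigned int)((srr1 & ISIDE_CODE_MASK) >> ISIDE_CODE_SHIFT);
}
```

    Under that assumption, iside_error_code(0x90000000001c1003ULL)
    evaluates to 7, i.e. 0111b, matching the error described above.
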
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>