    pseries/eeh: Fix the kdump kernel crash during eeh_pseries_init · eb8257a1
    Mahesh Salgaonkar authored
    On a pseries LPAR, when an empty slot is assigned to the partition or
    the system is in single LPAR mode, the kdump kernel crashes while
    issuing a PHB reset.
    
    In the kdump scenario, we traverse all PHBs and issue a reset using the
    pe_config_addr of the first child device present under each PHB.
    However, the code assumes that no PHB slot can be empty and uses
    list_first_entry() to get the first child device under the PHB. Since
    list_first_entry() expects the list to be non-empty, on an empty list
    it returns an invalid pci_dn entry, and the code ends up dereferencing
    a NULL phb pointer through pci_dn->phb, which crashes the kdump kernel.
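
The failure mode described above can be illustrated outside the kernel. The following is a minimal userspace sketch, not the patch itself: the list helpers are simplified stand-ins for those in <linux/list.h>, and `phb_stub`, `pci_dn_stub`, `pci_dn_stub_init()` and `first_child_or_null()` are hypothetical names introduced only for this example. It assumes the guard is a list_empty() check placed before list_first_entry(), which is one way to skip empty slots.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-ins for the kernel's <linux/list.h> helpers. */
struct list_head {
    struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Like the kernel macro: blindly converts head->next, even when the list
 * is empty and head->next points back at the head itself. */
#define list_first_entry(head, type, member) \
    container_of((head)->next, type, member)

static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static int list_empty(const struct list_head *head)
{
    return head->next == head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    assert(entry != NULL && head != NULL);
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Hypothetical, trimmed-down versions of the kernel structures. */
struct phb_stub {
    int id;
};

struct pci_dn_stub {
    struct phb_stub *phb;
    struct list_head child_list; /* children of this node */
    struct list_head list;       /* link in the parent's child_list */
};

static void pci_dn_stub_init(struct pci_dn_stub *pdn, struct phb_stub *phb)
{
    pdn->phb = phb;
    init_list_head(&pdn->child_list);
    init_list_head(&pdn->list);
}

/* Returns the first child under a PHB node, or NULL for an empty slot. */
static struct pci_dn_stub *first_child_or_null(struct pci_dn_stub *phb_dn)
{
    if (list_empty(&phb_dn->child_list))
        return NULL; /* empty slot: the caller skips this PHB */
    return list_first_entry(&phb_dn->child_list, struct pci_dn_stub, list);
}
```

A caller looping over PHBs would skip any PHB for which first_child_or_null() returns NULL. Without the list_empty() check, list_first_entry() would apply container_of() to the list head itself and yield a pointer that does not refer to a real pci_dn.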
    
    This patch fixes the kdump kernel crash below by skipping empty slots:
    
      audit: initializing netlink subsys (disabled)
      thermal_sys: Registered thermal governor 'fair_share'
      thermal_sys: Registered thermal governor 'step_wise'
      cpuidle: using governor menu
      pstore: Registered nvram as persistent store backend
      Issue PHB reset ...
      audit: type=2000 audit(1631267818.000:1): state=initialized audit_enabled=0 res=1
      BUG: Kernel NULL pointer dereference on read at 0x00000268
      Faulting instruction address: 0xc000000008101fb0
      Oops: Kernel access of bad area, sig: 7 [#1]
      LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
      Modules linked in:
      CPU: 7 PID: 1 Comm: swapper/7 Not tainted 5.14.0 #1
      NIP:  c000000008101fb0 LR: c000000009284ccc CTR: c000000008029d70
      REGS: c00000001161b840 TRAP: 0300   Not tainted  (5.14.0)
      MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 28000224  XER: 20040002
      CFAR: c000000008101f0c DAR: 0000000000000268 DSISR: 00080000 IRQMASK: 0
      ...
      NIP pseries_eeh_get_pe_config_addr+0x100/0x1b0
      LR  __machine_initcall_pseries_eeh_pseries_init+0x2cc/0x350
      Call Trace:
        0xc00000001161bb80 (unreliable)
        __machine_initcall_pseries_eeh_pseries_init+0x2cc/0x350
        do_one_initcall+0x60/0x2d0
        kernel_init_freeable+0x350/0x3f8
        kernel_init+0x3c/0x17c
        ret_from_kernel_thread+0x5c/0x64
    
    Fixes: 5a090f7c ("powerpc/pseries: PCIE PHB reset")
    Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
    [mpe: Tweak wording and trim oops]
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/163215558252.413351.8600189949820258982.stgit@jupiter
eeh_pseries.c 23.8 KB