• Daniel J Blueman's avatar
    EDAC, amd64_edac: Prevent OOPS with >16 memory controllers · 0c510cc8
    Daniel J Blueman authored
    When DRAM errors occur on memory controllers after EDAC_MAX_MCS (16),
    the kernel fatally dereferences unallocated structures, see splat below;
    this occurs on at least NumaConnect systems.
    
    Fix by checking if a memory controller info structure was found.
    
    BUG: unable to handle kernel NULL pointer dereference at 0000000000000320
    IP: [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0
    PGD 2f8b5a3067 PUD 2f8b5a2067 PMD 0
    Oops: 0000 [#2] SMP
    Modules linked in:
    CPU: 224 PID: 11930 Comm: stream_c.exe.gn Tainted: G   D    3.19.0 #1
    Hardware name: Supermicro H8QGL/H8QGL, BIOS 3.5b    01/28/2015
    task: ffff8807dbfb8c00 ti: ffff8807dd16c000 task.ti: ffff8807dd16c000
    RIP: 0010:[<ffffffff819f714f>] [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0
    RSP: 0000:ffff8907dfc03c48 EFLAGS: 00010297
    RAX: 0000000000000001 RBX: 9c67400010080a13 RCX: 0000000000001dc6
    RDX: 000000001dc61dc6 RSI: ffff8907dfc03df0 RDI: 000000000000001c
    RBP: ffff8907dfc03ce8 R08: 0000000000000000 R09: 0000000000000022
    R10: ffff891fffa30380 R11: 00000000001cfc90 R12: 0000000000000008
    R13: 0000000000000000 R14: 000000000000001c R15: 00009c6740001000
    FS: 00007fa97ee18700(0000) GS:ffff8907dfc00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000320 CR3: 0000003f889b8000 CR4: 00000000000407e0
    Stack:
     0000000000000000 ffff8907dfc03df0 0000000000000008 9c67400010080a13
     000000000000001c 00009c6740001000 ffff8907dfc03c88 ffffffff810e4f9a
     ffff8907dfc03ce8 ffffffff81b375b9 0000000000000000 0000000000000010
    Call Trace:
     <IRQ>
     ? vprintk_default
     ? printk
     amd_decode_mce
     notifier_call_chain
     atomic_notifier_call_chain
     mce_log
     machine_check_poll
     mce_timer_fn
     ? mce_cpu_restart
     call_timer_fn.isra.29
     run_timer_softirq
     __do_softirq
     irq_exit
     smp_apic_timer_interrupt
     apic_timer_interrupt
     <EOI>
     ? down_read_trylock
     __do_page_fault
     ? __schedule
     do_page_fault
     page_fault
    Signed-off-by: default avatarDaniel J Blueman <daniel@numascale.com>
    Link: http://lkml.kernel.org/r/1424144078-24589-1-git-send-email-daniel@numascale.com
    Cc: stable@vger.kernel.org
    [ Boris: massage commit message ]
    Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
    0c510cc8
amd64_edac.c 77.8 KB