    EDAC, sb_edac: Fix reporting for patrol scrubber errors · 8489b17c
    Qiuxu Zhuo authored
    sb_edac sometimes reports the wrong DIMM for a memory error found by
    the patrol scrubber. That is because the hardware provides only a 4KB
    page-aligned address for the error case.
    
    This means that the EDAC driver will point at the DIMM matching offset
    0x0 in the 4KB page, but because of interleaving across channels and
    ranks, the actual DIMM involved may be different if the error is on some
    other cache line within the page.
    
    Therefore, reconstruct the socket/iMC/channel information from the "mce"
    structure passed to the EDAC driver. The DIMM cannot be determined, so
    pass "dimm=-1" to the EDAC core. It will report that all the DIMMs on
    that channel may be affected.
    
    Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
    Cc: Aristeu Rozanski <aris@redhat.com>
    Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
    Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
    Cc: linux-edac <linux-edac@vger.kernel.org>
    Link: http://lkml.kernel.org/r/20180907230828.13901-3-tony.luck@intel.com
    [ Improve comments on the functions to convert bank number
      to memory controller number. Minor cleanup to commit message. ]
    Signed-off-by: Tony Luck <tony.luck@intel.com>
    [ Massage commit message more. ]
    Signed-off-by: Borislav Petkov <bp@suse.de>
sb_edac.c 94.2 KB