    Merge git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac · 87a5af24
    Linus Torvalds authored
    Pull EDAC internal API changes from Mauro Carvalho Chehab:
     "This changeset is the first part of a series of patches that fixes the
      EDAC subsystem.  In this set, it changes the kernel EDAC API in order
      to properly represent the Intel i3/i5/i7, Xeon 3xxx/5xxx/7xxx, and
      Intel E5-xxxx memory controllers.
    
      The EDAC core used to assume that:
    
           - the DRAM chip select pin is directly accessed by the memory
             controller
    
           - when multiple channels are used, they're all filled with the
             same type of memory.
    
      Neither of the above premises has been true on Intel memory
      controllers since 2002, when RAMBUS and FB-DIMMs were introduced and
      the Advanced Memory Buffer, or some similar technology, began hiding
      direct access to the DRAM pins.
    
      So, the existing drivers for those chipsets had to lie to the EDAC
      core, generally telling it that just one channel is filled.  That
      produces hard-to-understand error messages like:
    
           EDAC MC0: CE row 3, channel 0, label "DIMM1": 1 Unknown error(s): memory read error on FATAL area : cpu=0 Err=0008:00c2 (ch=2), addr = 0xad1f73480 => socket=0, Channel=0(mask=2), rank=1
    
      The location information there (row 3, channel 0) is completely bogus:
      it has no physical meaning, and is just a set of random values that the
      driver uses to talk with the EDAC core.  The error actually happened
      at CPU socket 0, channel 0, slot 1, but this is not reported anywhere,
      as the EDAC core doesn't know anything about the memory layout.  So,
      only advanced users who know how the EDAC driver works and who test
      their systems to see how DIMMs are mapped can actually benefit from
      such error logs.
    
      This patch series fixes the error report logic, in order to allow the
      EDAC drivers to expose the memory architecture they use to the EDAC
      core.  So, as the EDAC core now understands how the memory is
      organized, it can provide a useful report:
    
           EDAC MC0: CE memory read error on DIMM1 (channel:0 slot:1 page:0x364b1b offset:0x600 grain:32 syndrome:0x0 - count:1 area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:4)
    
      The location of the DIMM where the error happened is reported as "MC0"
      (CPU socket #0), at "channel:0 slot:1", and matches the physical
      location of the DIMM.
    
      There are two remaining issues not covered by this patch series:
    
           - The EDAC sysfs API will still report bogus values.  So,
             userspace tools like edac-utils will still use the bogus data;
    
           - A new tracepoint-based way to get the binary information
             about the errors still has to be added.
    
      Those are addressed in a second series of patches (also in -next),
      but will probably miss the train for 3.5, due to the slow review
      process."
    
    Fix up trivial conflict (due to spelling correction of removed code) in
    drivers/edac/edac_device.c
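
The new-style report quoted in the pull message is regular enough that a userspace script could pull the DIMM location out of it. The following is only a minimal sketch, not part of this merge: the function name `parse_edac_line` and the regular expression are hypothetical, and the layout it assumes is inferred solely from the single sample line above (a label without spaces followed by `key:value` pairs in parentheses), so other drivers or kernel versions may well print something different.

```python
import re

# Sample line copied verbatim from the pull message.
SAMPLE = ('EDAC MC0: CE memory read error on DIMM1 (channel:0 slot:1 '
          'page:0x364b1b offset:0x600 grain:32 syndrome:0x0 - count:1 '
          'area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:4)')

# Assumed layout: "EDAC MC<n>: <severity> <message> on <label> (<details>)".
# This pattern is a guess based on the one sample line, not a documented format.
LINE_RE = re.compile(
    r'EDAC MC(?P<mc>\d+): (?P<severity>\w+) (?P<message>.+?) on '
    r'(?P<label>\S+) \((?P<details>.*)\)\s*$')


def parse_edac_line(line):
    """Return a dict describing one report line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    fields = {}
    for token in m.group('details').split():
        # Split on the first ':' only, so "err_code:0001:0090" keeps its value whole.
        key, sep, value = token.partition(':')
        if sep:  # tokens without ':' (such as the lone "-") are skipped
            fields[key] = value
    return {
        'mc': int(m.group('mc')),
        'severity': m.group('severity'),
        'message': m.group('message'),
        'label': m.group('label'),
        'fields': fields,
    }


if __name__ == '__main__':
    report = parse_edac_line(SAMPLE)
    print(report['label'], report['fields']['channel'], report['fields']['slot'])
```

Old-style lines such as the "CE row 3, channel 0" example do not match this pattern and yield `None`, which is one simple way for a script to tell the two formats apart.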
    
    * git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (42 commits)
      i7core: fix ranks information at the per-channel struct
      i5000: Fix the fatal error handling
      i5100_edac: Fix a warning when compiled with 32 bits
      i82975x_edac: Test nr_pages earlier to save a few CPU cycles
      e752x_edac: provide more info about how DIMMS/ranks are mapped
      i5000_edac: Fix the logic that retrieves memory information
      i5400_edac: improve debug messages to better represent the filled memory
      edac: Cleanup the logs for i7core and sb edac drivers
      edac: Initialize the dimm label with the known information
      edac: Remove the legacy EDAC ABI
      x38_edac: convert driver to use the new edac ABI
      tile_edac: convert driver to use the new edac ABI
      sb_edac: convert driver to use the new edac ABI
      r82600_edac: convert driver to use the new edac ABI
      ppc4xx_edac: convert driver to use the new edac ABI
      pasemi_edac: convert driver to use the new edac ABI
      mv64x60_edac: convert driver to use the new edac ABI
      mpc85xx_edac: convert driver to use the new edac ABI
      i82975x_edac: convert driver to use the new edac ABI
      i82875p_edac: convert driver to use the new edac ABI
      ...
edac_core.h 14.3 KB