    Merge tag 'ras_core_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · aa35a483
    Linus Torvalds authored
    Pull RAS updates from Borislav Petkov:
    
     - Add initial support for RAS hardware found on AMD server GPUs (MI200).
    
       Those GPUs and CPUs are connected together through the coherent
       fabric and the GPU memory controllers report errors through x86's MCA
       so EDAC needs to support them. The amd64_edac driver now supports HBM
       (High Bandwidth Memory) and thus such heterogeneous memory controller
       systems.
    
     - Other small cleanups and improvements
    
    * tag 'ras_core_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      EDAC/amd64: Cache and use GPU node map
      EDAC/amd64: Add support for AMD heterogeneous Family 19h Model 30h-3Fh
      EDAC/amd64: Document heterogeneous system enumeration
      x86/MCE/AMD, EDAC/mce_amd: Decode UMC_V2 ECC errors
      x86/amd_nb: Re-sort and re-indent PCI defines
      x86/amd_nb: Add MI200 PCI IDs
      ras/debugfs: Fix error checking for debugfs_create_dir()
      x86/MCE: Check a hw error's address to determine proper recovery action
amd64_edac.c 112 KB