Commit aa35a483 authored by Linus Torvalds

Merge tag 'ras_core_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS updates from Borislav Petkov:

 - Add initial support for RAS hardware found on AMD server GPUs (MI200).

   Those GPUs and CPUs are connected together through the coherent
   fabric and the GPU memory controllers report errors through x86's MCA
   so EDAC needs to support them. The amd64_edac driver now supports HBM
   (High Bandwidth Memory) and thus such heterogeneous memory controller
   systems.

 - Other small cleanups and improvements

* tag 'ras_core_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  EDAC/amd64: Cache and use GPU node map
  EDAC/amd64: Add support for AMD heterogeneous Family 19h Model 30h-3Fh
  EDAC/amd64: Document heterogeneous system enumeration
  x86/MCE/AMD, EDAC/mce_amd: Decode UMC_V2 ECC errors
  x86/amd_nb: Re-sort and re-indent PCI defines
  x86/amd_nb: Add MI200 PCI IDs
  ras/debugfs: Fix error checking for debugfs_create_dir()
  x86/MCE: Check a hw error's address to determine proper recovery action
parents e5ce2f19 4251566e
@@ -106,6 +106,16 @@ will occupy those chip-select rows.
This term is avoided because it is unclear when needing to distinguish
between chip-select rows and socket sets.
* High Bandwidth Memory (HBM)
HBM is a new memory type with low power consumption and ultra-wide
communication lanes. It uses vertically stacked memory chips (DRAM dies)
interconnected by microscopic wires called "through-silicon vias," or
TSVs.
Several stacks of HBM chips connect to the CPU or GPU through an ultra-fast
interconnect called the "interposer". Therefore, HBM's characteristics
are nearly indistinguishable from on-chip integrated RAM.
Memory Controllers
------------------
@@ -176,3 +186,113 @@ nodes::
the L1 and L2 directories would be "edac_device_block's"
.. kernel-doc:: drivers/edac/edac_device.h
Heterogeneous system support
----------------------------
An AMD heterogeneous system is built by connecting the data fabrics of
both CPUs and GPUs via custom xGMI links. Thus, the data fabric on the
GPU nodes can be accessed the same way as the data fabric on CPU nodes.
The MI200 accelerators are data center GPUs. They have 2 data fabrics,
and each GPU data fabric contains four Unified Memory Controllers (UMC).
Each UMC contains eight channels. Each UMC channel controls one 128-bit
HBM2e (2GB) channel (equivalent to 8 X 2GB ranks). This creates a total
of 4096 bits of DRAM data bus per data fabric (4 UMCs x 8 channels x 128 bits).
While the UMC is interfacing a 16GB (8high X 2GB DRAM) HBM stack, each UMC
channel is interfacing 2GB of DRAM (represented as rank).
Memory controllers on AMD GPU nodes can be represented in EDAC thusly:
GPU DF / GPU Node -> EDAC MC
GPU UMC -> EDAC CSROW
GPU UMC channel -> EDAC CHANNEL
For example: a heterogeneous system with 1 AMD CPU is connected to
4 MI200 (Aldebaran) GPUs using xGMI.
Some more heterogeneous hardware details:
- The CPU UMC (Unified Memory Controller) is mostly the same as the GPU UMC.
They have chip selects (csrows) and channels. However, the layouts are different
for performance, physical layout, or other reasons.
- CPU UMCs use 1 channel. In this case, UMC = EDAC channel. This follows the
marketing speak: CPU has X memory channels, etc.
- CPU UMCs use up to 4 chip selects, so UMC chip select = EDAC CSROW.
- GPU UMCs use 1 chip select, so UMC = EDAC CSROW.
- GPU UMCs use 8 channels, so UMC channel = EDAC channel.
The EDAC subsystem provides a mechanism to handle AMD heterogeneous
systems by calling system specific ops for both CPUs and GPUs.
AMD GPU nodes are enumerated in sequential order based on the PCI
hierarchy, and the first GPU node is assumed to have a Node ID value
following those of the CPU nodes after the latter are fully populated::
$ ls /sys/devices/system/edac/mc/
mc0 - CPU MC node 0
mc1 |
mc2 |- GPU card[0] => node 0(mc1), node 1(mc2)
mc3 |
mc4 |- GPU card[1] => node 0(mc3), node 1(mc4)
mc5 |
mc6 |- GPU card[2] => node 0(mc5), node 1(mc6)
mc7 |
mc8 |- GPU card[3] => node 0(mc7), node 1(mc8)
For example, a heterogeneous system with one AMD CPU is connected to
four MI200 (Aldebaran) GPUs using xGMI. This topology can be represented
via the following sysfs entries::
/sys/devices/system/edac/mc/..
CPU # CPU node
├── mc 0
GPU Nodes are enumerated sequentially after CPU nodes have been populated
GPU card 1 # Each MI200 GPU has 2 nodes/mcs
├── mc 1 # GPU node 0 == mc1, Each MC node has 4 UMCs/CSROWs
│   ├── csrow 0 # UMC 0
│   │   ├── channel 0 # Each UMC has 8 channels
│   │   ├── channel 1 # size of each channel is 2 GB, so each UMC has 16 GB
│   │   ├── channel 2
│   │   ├── channel 3
│   │   ├── channel 4
│   │   ├── channel 5
│   │   ├── channel 6
│   │   ├── channel 7
│   ├── csrow 1 # UMC 1
│   │   ├── channel 0
│   │   ├── ..
│   │   ├── channel 7
│   ├── .. ..
│   ├── csrow 3 # UMC 3
│   │   ├── channel 0
│   │   ├── ..
│   │   ├── channel 7
│   ├── rank 0
│   ├── .. ..
│   ├── rank 31 # total 32 ranks/dimms from 4 UMCs
├── mc 2 # GPU node 1 == mc2
│   ├── .. # each GPU has total 64 GB
GPU card 2
├── mc 3
│   ├── ..
├── mc 4
│   ├── ..
GPU card 3
├── mc 5
│   ├── ..
├── mc 6
│   ├── ..
GPU card 4
├── mc 7
│   ├── ..
├── mc 8
│   ├── ..
@@ -21,8 +21,11 @@
 #define PCI_DEVICE_ID_AMD_17H_M60H_ROOT 0x1630
 #define PCI_DEVICE_ID_AMD_17H_MA0H_ROOT 0x14b5
 #define PCI_DEVICE_ID_AMD_19H_M10H_ROOT 0x14a4
+#define PCI_DEVICE_ID_AMD_19H_M40H_ROOT 0x14b5
 #define PCI_DEVICE_ID_AMD_19H_M60H_ROOT 0x14d8
 #define PCI_DEVICE_ID_AMD_19H_M70H_ROOT 0x14e8
+#define PCI_DEVICE_ID_AMD_MI200_ROOT 0x14bb
 #define PCI_DEVICE_ID_AMD_17H_DF_F4 0x1464
 #define PCI_DEVICE_ID_AMD_17H_M10H_DF_F4 0x15ec
 #define PCI_DEVICE_ID_AMD_17H_M30H_DF_F4 0x1494
@@ -31,12 +34,12 @@
 #define PCI_DEVICE_ID_AMD_17H_MA0H_DF_F4 0x1728
 #define PCI_DEVICE_ID_AMD_19H_DF_F4 0x1654
 #define PCI_DEVICE_ID_AMD_19H_M10H_DF_F4 0x14b1
-#define PCI_DEVICE_ID_AMD_19H_M40H_ROOT 0x14b5
 #define PCI_DEVICE_ID_AMD_19H_M40H_DF_F4 0x167d
 #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F4 0x166e
 #define PCI_DEVICE_ID_AMD_19H_M60H_DF_F4 0x14e4
 #define PCI_DEVICE_ID_AMD_19H_M70H_DF_F4 0x14f4
 #define PCI_DEVICE_ID_AMD_19H_M78H_DF_F4 0x12fc
+#define PCI_DEVICE_ID_AMD_MI200_DF_F4 0x14d4
 /* Protect the PCI config register pairs used for SMN. */
 static DEFINE_MUTEX(smn_mutex);
@@ -53,6 +56,7 @@ static const struct pci_device_id amd_root_ids[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M40H_ROOT) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M60H_ROOT) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M70H_ROOT) },
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_MI200_ROOT) },
 	{}
 };
@@ -81,6 +85,7 @@ static const struct pci_device_id amd_nb_misc_ids[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M60H_DF_F3) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M70H_DF_F3) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M78H_DF_F3) },
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_MI200_DF_F3) },
 	{}
 };
@@ -101,6 +106,7 @@ static const struct pci_device_id amd_nb_link_ids[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M40H_DF_F4) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M50H_DF_F4) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_CNB17H_F4) },
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_MI200_DF_F4) },
 	{}
 };
......
@@ -715,11 +715,13 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 bool amd_mce_is_memory_error(struct mce *m)
 {
+	enum smca_bank_types bank_type;
 	/* ErrCodeExt[20:16] */
 	u8 xec = (m->status >> 16) & 0x1f;
+	bank_type = smca_get_bank_type(m->extcpu, m->bank);
 	if (mce_flags.smca)
-		return smca_get_bank_type(m->extcpu, m->bank) == SMCA_UMC && xec == 0x0;
+		return (bank_type == SMCA_UMC || bank_type == SMCA_UMC_V2) && xec == 0x0;
 	return m->bank == 4 && xec == 0x8;
 }
@@ -1050,7 +1052,7 @@ static const char *get_name(unsigned int cpu, unsigned int bank, struct threshol
 	if (bank_type >= N_SMCA_BANK_TYPES)
 		return NULL;
-	if (b && bank_type == SMCA_UMC) {
+	if (b && (bank_type == SMCA_UMC || bank_type == SMCA_UMC_V2)) {
 		if (b->block < ARRAY_SIZE(smca_umc_block_names))
 			return smca_umc_block_names[b->block];
 		return NULL;
......
@@ -1533,7 +1533,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		/* If this triggers there is no way to recover. Die hard. */
 		BUG_ON(!on_thread_stack() || !user_mode(regs));
-		if (kill_current_task)
+		if (!mce_usable_address(&m))
 			queue_task_work(&m, msg, kill_me_now);
 		else
 			queue_task_work(&m, msg, kill_me_maybe);
......
@@ -16,6 +16,7 @@
 #include <linux/slab.h>
 #include <linux/mmzone.h>
 #include <linux/edac.h>
+#include <linux/bitfield.h>
 #include <asm/cpu_device_id.h>
 #include <asm/msr.h>
 #include "edac_module.h"
......
@@ -1186,7 +1186,8 @@ static void decode_smca_error(struct mce *m)
 	if (xec < smca_mce_descs[bank_type].num_descs)
 		pr_cont(", %s.\n", smca_mce_descs[bank_type].descs[xec]);
-	if (bank_type == SMCA_UMC && xec == 0 && decode_dram_ecc)
+	if ((bank_type == SMCA_UMC || bank_type == SMCA_UMC_V2) &&
+	    xec == 0 && decode_dram_ecc)
 		decode_dram_ecc(topology_die_id(m->extcpu), m);
 }
......
@@ -46,7 +46,7 @@ int __init ras_add_daemon_trace(void)
 	fentry = debugfs_create_file("daemon_active", S_IRUSR, ras_debugfs_dir,
 				     NULL, &trace_fops);
-	if (!fentry)
+	if (IS_ERR(fentry))
 		return -ENODEV;
 	return 0;
......
@@ -568,6 +568,7 @@
 #define PCI_DEVICE_ID_AMD_19H_M60H_DF_F3 0x14e3
 #define PCI_DEVICE_ID_AMD_19H_M70H_DF_F3 0x14f3
 #define PCI_DEVICE_ID_AMD_19H_M78H_DF_F3 0x12fb
+#define PCI_DEVICE_ID_AMD_MI200_DF_F3 0x14d3
 #define PCI_DEVICE_ID_AMD_CNB17H_F3 0x1703
 #define PCI_DEVICE_ID_AMD_LANCE 0x2000
 #define PCI_DEVICE_ID_AMD_LANCE_HOME 0x2001
......