Commit 9c91e6a5 authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'edac_for_5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:
 "A lot of changes this time around, details below.

  From the next cycle onwards, we'll switch the EDAC tree to topic
  branches (instead of a single edac-for-next branch) which should make
  the changes handling more flexible, hopefully. We'll see.

  Summary:

   - Rework error logging functions to accept a count of errors
     parameter (Hanna Hawa)

   - Part one of substantial EDAC core + ghes_edac driver cleanup
     (Robert Richter)

   - Print additional useful logging information in skx_* (Tony Luck)

   - Improve amd64_edac hw detection + cleanups (Yazen Ghannam)

   - Misc cleanups, fixes and code improvements"

* tag 'edac_for_5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: (35 commits)
  EDAC/altera: Use the Altera System Manager driver
  EDAC/altera: Cleanup the ECC Manager
  EDAC/altera: Use fast register IO for S10 IRQs
  EDAC/ghes: Do not warn when incrementing refcount on 0
  EDAC/Documentation: Describe CPER module definition and DIMM ranks
  EDAC: Unify the mc_event tracepoint call
  EDAC/ghes: Remove intermediate buffer pvt->detail_location
  EDAC/ghes: Fix grain calculation
  EDAC/ghes: Use standard kernel macros for page calculations
  EDAC: Remove misleading comment in struct edac_raw_error_desc
  EDAC/mc: Reduce indentation level in edac_mc_handle_error()
  EDAC/mc: Remove needless zero string termination
  EDAC/mc: Do not BUG_ON() in edac_mc_alloc()
  EDAC: Introduce an mci_for_each_dimm() iterator
  EDAC: Remove EDAC_DIMM_OFF() macro
  EDAC: Replace EDAC_DIMM_PTR() macro with edac_get_dimm() function
  EDAC/amd64: Get rid of the ECC disabled long message
  EDAC/ghes: Fix locking and memory barrier issues
  EDAC/amd64: Check for memory before fully initializing an instance
  EDAC/amd64: Use cached data when checking for ECC
  ...
parents 752272f1 5781823f
......@@ -330,9 +330,12 @@ There can be multiple csrows and multiple channels.
.. [#f4] Nowadays, the term DIMM (Dual In-line Memory Module) is widely
used to refer to a memory module, although there are other memory
packaging alternatives, like SO-DIMM, SIMM, etc. Along this document,
and inside the EDAC system, the term "dimm" is used for all memory
modules, even when they use a different kind of packaging.
packaging alternatives, like SO-DIMM, SIMM, etc. The UEFI
specification (Version 2.7) defines a memory module in the Common
Platform Error Record (CPER) section to be an SMBIOS Memory Device
(Type 17). Along this document, and inside the EDAC subsystem, the term
"dimm" is used for all memory modules, even when they use a
different kind of packaging.
Memory controllers allow for several csrows, with 8 csrows being a
typical value. Yet, the actual number of csrows depends on the layout of
......@@ -349,12 +352,14 @@ controllers. The following example will assume 2 channels:
| | ``ch0`` | ``ch1`` |
+============+===========+===========+
| ``csrow0`` | DIMM_A0 | DIMM_B0 |
+------------+ | |
| ``csrow1`` | | |
| | rank0 | rank0 |
+------------+ - | - |
| ``csrow1`` | rank1 | rank1 |
+------------+-----------+-----------+
| ``csrow2`` | DIMM_A1 | DIMM_B1 |
+------------+ | |
| ``csrow3`` | | |
| | rank0 | rank0 |
+------------+ - | - |
| ``csrow3`` | rank1 | rank1 |
+------------+-----------+-----------+
In the above example, there are 4 physical slots on the motherboard
......@@ -374,11 +379,13 @@ which the memory DIMM is placed. Thus, when 1 DIMM is placed in each
Channel, the csrows cross both DIMMs.
Memory DIMMs come single or dual "ranked". A rank is a populated csrow.
Thus, 2 single ranked DIMMs, placed in slots DIMM_A0 and DIMM_B0 above
will have just one csrow (csrow0). csrow1 will be empty. On the other
hand, when 2 dual ranked DIMMs are similarly placed, then both csrow0
and csrow1 will be populated. The pattern repeats itself for csrow2 and
csrow3.
In the example above 2 dual ranked DIMMs are similarly placed. Thus,
both csrow0 and csrow1 are populated. On the other hand, when 2 single
ranked DIMMs are placed in slots DIMM_A0 and DIMM_B0, then they will
have just one csrow (csrow0) and csrow1 will be empty. The pattern
repeats itself for csrow2 and csrow3. Also note that some memory
controllers don't have any logic to identify the memory module, see
``rankX`` directories below.
The representation of the above is reflected in the directory
tree in EDAC's sysfs interface. Starting in directory
......
......@@ -14,6 +14,7 @@
#include <linux/interrupt.h>
#include <linux/irqchip/chained_irq.h>
#include <linux/kernel.h>
#include <linux/mfd/altera-sysmgr.h>
#include <linux/mfd/syscon.h>
#include <linux/notifier.h>
#include <linux/of_address.h>
......@@ -275,7 +276,6 @@ static int a10_unmask_irq(struct platform_device *pdev, u32 mask)
return ret;
}
static int socfpga_is_a10(void);
static int altr_sdram_probe(struct platform_device *pdev)
{
const struct of_device_id *id;
......@@ -399,7 +399,7 @@ static int altr_sdram_probe(struct platform_device *pdev)
goto err;
/* Only the Arria10 has separate IRQs */
if (socfpga_is_a10()) {
if (of_machine_is_compatible("altr,socfpga-arria10")) {
/* Arria10 specific initialization */
res = a10_init(mc_vbase);
if (res < 0)
......@@ -502,68 +502,6 @@ module_platform_driver(altr_sdram_edac_driver);
#endif /* CONFIG_EDAC_ALTERA_SDRAM */
/**************** Stratix 10 EDAC Memory Controller Functions ************/
/**
* s10_protected_reg_write
* Write to a protected SMC register.
* @context: Not used.
* @reg: Address of register
* @value: Value to write
* Return: INTEL_SIP_SMC_STATUS_OK (0) on success
* INTEL_SIP_SMC_REG_ERROR on error
* INTEL_SIP_SMC_RETURN_UNKNOWN_FUNCTION if not supported
*/
static int s10_protected_reg_write(void *context, unsigned int reg,
unsigned int val)
{
struct arm_smccc_res result;
unsigned long offset = (unsigned long)context;
arm_smccc_smc(INTEL_SIP_SMC_REG_WRITE, offset + reg, val, 0, 0,
0, 0, 0, &result);
return (int)result.a0;
}
/**
* s10_protected_reg_read
* Read the status of a protected SMC register
* @context: Not used.
* @reg: Address of register
* @value: Value read.
* Return: INTEL_SIP_SMC_STATUS_OK (0) on success
* INTEL_SIP_SMC_REG_ERROR on error
* INTEL_SIP_SMC_RETURN_UNKNOWN_FUNCTION if not supported
*/
static int s10_protected_reg_read(void *context, unsigned int reg,
unsigned int *val)
{
struct arm_smccc_res result;
unsigned long offset = (unsigned long)context;
arm_smccc_smc(INTEL_SIP_SMC_REG_READ, offset + reg, 0, 0, 0,
0, 0, 0, &result);
*val = (unsigned int)result.a1;
return (int)result.a0;
}
static const struct regmap_config s10_sdram_regmap_cfg = {
.name = "s10_ddr",
.reg_bits = 32,
.reg_stride = 4,
.val_bits = 32,
.max_register = 0xffd12228,
.reg_read = s10_protected_reg_read,
.reg_write = s10_protected_reg_write,
.use_single_read = true,
.use_single_write = true,
};
/************** </Stratix10 EDAC Memory Controller Functions> ***********/
/************************* EDAC Parent Probe *************************/
static const struct of_device_id altr_edac_device_of_match[];
......@@ -1008,16 +946,6 @@ static int __maybe_unused altr_init_memory_port(void __iomem *ioaddr, int port)
return ret;
}
static int socfpga_is_a10(void)
{
return of_machine_is_compatible("altr,socfpga-arria10");
}
static int socfpga_is_s10(void)
{
return of_machine_is_compatible("altr,socfpga-stratix10");
}
static __init int __maybe_unused
altr_init_a10_ecc_block(struct device_node *np, u32 irq_mask,
u32 ecc_ctrl_en_mask, bool dual_port)
......@@ -1033,34 +961,10 @@ altr_init_a10_ecc_block(struct device_node *np, u32 irq_mask,
/* Get the ECC Manager - parent of the device EDACs */
np_eccmgr = of_get_parent(np);
if (socfpga_is_a10()) {
ecc_mgr_map = syscon_regmap_lookup_by_phandle(np_eccmgr,
"altr,sysmgr-syscon");
} else {
struct device_node *sysmgr_np;
struct resource res;
uintptr_t base;
sysmgr_np = of_parse_phandle(np_eccmgr,
"altr,sysmgr-syscon", 0);
if (!sysmgr_np) {
edac_printk(KERN_ERR, EDAC_DEVICE,
"Unable to find altr,sysmgr-syscon\n");
return -ENODEV;
}
if (of_address_to_resource(sysmgr_np, 0, &res)) {
of_node_put(sysmgr_np);
return -ENOMEM;
}
ecc_mgr_map =
altr_sysmgr_regmap_lookup_by_phandle(np_eccmgr,
"altr,sysmgr-syscon");
/* Need physical address for SMCC call */
base = res.start;
ecc_mgr_map = regmap_init(NULL, NULL, (void *)base,
&s10_sdram_regmap_cfg);
of_node_put(sysmgr_np);
}
of_node_put(np_eccmgr);
if (IS_ERR(ecc_mgr_map)) {
edac_printk(KERN_ERR, EDAC_DEVICE,
......@@ -1125,9 +1029,6 @@ static int __init __maybe_unused altr_init_a10_ecc_device_type(char *compat)
int irq;
struct device_node *child, *np;
if (!socfpga_is_a10() && !socfpga_is_s10())
return -ENODEV;
np = of_find_compatible_node(NULL, NULL,
"altr,socfpga-a10-ecc-manager");
if (!np) {
......@@ -2178,33 +2079,9 @@ static int altr_edac_a10_probe(struct platform_device *pdev)
platform_set_drvdata(pdev, edac);
INIT_LIST_HEAD(&edac->a10_ecc_devices);
if (socfpga_is_a10()) {
edac->ecc_mgr_map =
syscon_regmap_lookup_by_phandle(pdev->dev.of_node,
"altr,sysmgr-syscon");
} else {
struct device_node *sysmgr_np;
struct resource res;
uintptr_t base;
sysmgr_np = of_parse_phandle(pdev->dev.of_node,
"altr,sysmgr-syscon", 0);
if (!sysmgr_np) {
edac_printk(KERN_ERR, EDAC_DEVICE,
"Unable to find altr,sysmgr-syscon\n");
return -ENODEV;
}
if (of_address_to_resource(sysmgr_np, 0, &res))
return -ENOMEM;
/* Need physical address for SMCC call */
base = res.start;
edac->ecc_mgr_map = devm_regmap_init(&pdev->dev, NULL,
(void *)base,
&s10_sdram_regmap_cfg);
}
edac->ecc_mgr_map =
altr_sysmgr_regmap_lookup_by_phandle(pdev->dev.of_node,
"altr,sysmgr-syscon");
if (IS_ERR(edac->ecc_mgr_map)) {
edac_printk(KERN_ERR, EDAC_DEVICE,
......@@ -2270,18 +2147,7 @@ static int altr_edac_a10_probe(struct platform_device *pdev)
if (!of_device_is_available(child))
continue;
if (of_device_is_compatible(child, "altr,socfpga-a10-l2-ecc") ||
of_device_is_compatible(child, "altr,socfpga-a10-ocram-ecc") ||
of_device_is_compatible(child, "altr,socfpga-eth-mac-ecc") ||
of_device_is_compatible(child, "altr,socfpga-nand-ecc") ||
of_device_is_compatible(child, "altr,socfpga-dma-ecc") ||
of_device_is_compatible(child, "altr,socfpga-usb-ecc") ||
of_device_is_compatible(child, "altr,socfpga-qspi-ecc") ||
#ifdef CONFIG_EDAC_ALTERA_SDRAM
of_device_is_compatible(child, "altr,sdram-edac-s10") ||
#endif
of_device_is_compatible(child, "altr,socfpga-sdmmc-ecc"))
if (of_match_node(altr_edac_a10_device_of_match, child))
altr_edac_a10_device_add(edac, child);
#ifdef CONFIG_EDAC_ALTERA_SDRAM
......
......@@ -16,12 +16,11 @@ module_param(ecc_enable_override, int, 0644);
static struct msr __percpu *msrs;
static struct amd64_family_type *fam_type;
/* Per-node stuff */
static struct ecc_settings **ecc_stngs;
/* Number of Unified Memory Controllers */
static u8 num_umcs;
/*
* Valid scrub rates for the K8 hardware memory scrubber. We map the scrubbing
* bandwidth to a valid bit pattern. The 'set' operation finds the 'matching-
......@@ -454,7 +453,7 @@ static void get_cs_base_and_mask(struct amd64_pvt *pvt, int csrow, u8 dct,
for (i = 0; i < pvt->csels[dct].m_cnt; i++)
#define for_each_umc(i) \
for (i = 0; i < num_umcs; i++)
for (i = 0; i < fam_type->max_mcs; i++)
/*
* @input_addr is an InputAddr associated with the node given by mci. Return the
......@@ -2224,6 +2223,7 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "K8",
.f1_id = PCI_DEVICE_ID_AMD_K8_NB_ADDRMAP,
.f2_id = PCI_DEVICE_ID_AMD_K8_NB_MEMCTL,
.max_mcs = 2,
.ops = {
.early_channel_count = k8_early_channel_count,
.map_sysaddr_to_csrow = k8_map_sysaddr_to_csrow,
......@@ -2234,6 +2234,7 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F10h",
.f1_id = PCI_DEVICE_ID_AMD_10H_NB_MAP,
.f2_id = PCI_DEVICE_ID_AMD_10H_NB_DRAM,
.max_mcs = 2,
.ops = {
.early_channel_count = f1x_early_channel_count,
.map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
......@@ -2244,6 +2245,7 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F15h",
.f1_id = PCI_DEVICE_ID_AMD_15H_NB_F1,
.f2_id = PCI_DEVICE_ID_AMD_15H_NB_F2,
.max_mcs = 2,
.ops = {
.early_channel_count = f1x_early_channel_count,
.map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
......@@ -2254,6 +2256,7 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F15h_M30h",
.f1_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F1,
.f2_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F2,
.max_mcs = 2,
.ops = {
.early_channel_count = f1x_early_channel_count,
.map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
......@@ -2264,6 +2267,7 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F15h_M60h",
.f1_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F1,
.f2_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F2,
.max_mcs = 2,
.ops = {
.early_channel_count = f1x_early_channel_count,
.map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
......@@ -2274,6 +2278,7 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F16h",
.f1_id = PCI_DEVICE_ID_AMD_16H_NB_F1,
.f2_id = PCI_DEVICE_ID_AMD_16H_NB_F2,
.max_mcs = 2,
.ops = {
.early_channel_count = f1x_early_channel_count,
.map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
......@@ -2284,6 +2289,7 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F16h_M30h",
.f1_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F1,
.f2_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F2,
.max_mcs = 2,
.ops = {
.early_channel_count = f1x_early_channel_count,
.map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
......@@ -2294,6 +2300,7 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F17h",
.f0_id = PCI_DEVICE_ID_AMD_17H_DF_F0,
.f6_id = PCI_DEVICE_ID_AMD_17H_DF_F6,
.max_mcs = 2,
.ops = {
.early_channel_count = f17_early_channel_count,
.dbam_to_cs = f17_addr_mask_to_cs_size,
......@@ -2303,6 +2310,7 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F17h_M10h",
.f0_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F0,
.f6_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F6,
.max_mcs = 2,
.ops = {
.early_channel_count = f17_early_channel_count,
.dbam_to_cs = f17_addr_mask_to_cs_size,
......@@ -2312,6 +2320,7 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F17h_M30h",
.f0_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F0,
.f6_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F6,
.max_mcs = 8,
.ops = {
.early_channel_count = f17_early_channel_count,
.dbam_to_cs = f17_addr_mask_to_cs_size,
......@@ -2321,6 +2330,7 @@ static struct amd64_family_type family_types[] = {
.ctl_name = "F17h_M70h",
.f0_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F0,
.f6_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F6,
.max_mcs = 2,
.ops = {
.early_channel_count = f17_early_channel_count,
.dbam_to_cs = f17_addr_mask_to_cs_size,
......@@ -2838,8 +2848,6 @@ static void read_mc_regs(struct amd64_pvt *pvt)
edac_dbg(1, " DIMM type: %s\n", edac_mem_types[pvt->dram_type]);
determine_ecc_sym_sz(pvt);
dump_misc_regs(pvt);
}
/*
......@@ -2936,6 +2944,7 @@ static int init_csrows_df(struct mem_ctl_info *mci)
dimm->mtype = pvt->dram_type;
dimm->edac_mode = edac_mode;
dimm->dtype = dev_type;
dimm->grain = 64;
}
}
......@@ -3012,6 +3021,7 @@ static int init_csrows(struct mem_ctl_info *mci)
dimm = csrow->channels[j]->dimm;
dimm->mtype = pvt->dram_type;
dimm->edac_mode = edac_mode;
dimm->grain = 64;
}
}
......@@ -3178,43 +3188,27 @@ static void restore_ecc_error_reporting(struct ecc_settings *s, u16 nid,
amd64_warn("Error restoring NB MCGCTL settings!\n");
}
/*
* EDAC requires that the BIOS have ECC enabled before
* taking over the processing of ECC errors. A command line
* option allows to force-enable hardware ECC later in
* enable_ecc_error_reporting().
*/
static const char *ecc_msg =
"ECC disabled in the BIOS or no ECC capability, module will not load.\n"
" Either enable ECC checking or force module loading by setting "
"'ecc_enable_override'.\n"
" (Note that use of the override may cause unknown side effects.)\n";
static bool ecc_enabled(struct pci_dev *F3, u16 nid)
static bool ecc_enabled(struct amd64_pvt *pvt)
{
u16 nid = pvt->mc_node_id;
bool nb_mce_en = false;
u8 ecc_en = 0, i;
u32 value;
if (boot_cpu_data.x86 >= 0x17) {
u8 umc_en_mask = 0, ecc_en_mask = 0;
struct amd64_umc *umc;
for_each_umc(i) {
u32 base = get_umc_base(i);
umc = &pvt->umc[i];
/* Only check enabled UMCs. */
if (amd_smn_read(nid, base + UMCCH_SDP_CTRL, &value))
continue;
if (!(value & UMC_SDP_INIT))
if (!(umc->sdp_ctrl & UMC_SDP_INIT))
continue;
umc_en_mask |= BIT(i);
if (amd_smn_read(nid, base + UMCCH_UMC_CAP_HI, &value))
continue;
if (value & UMC_ECC_ENABLED)
if (umc->umc_cap_hi & UMC_ECC_ENABLED)
ecc_en_mask |= BIT(i);
}
......@@ -3227,7 +3221,7 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid)
/* Assume UMC MCA banks are enabled. */
nb_mce_en = true;
} else {
amd64_read_pci_cfg(F3, NBCFG, &value);
amd64_read_pci_cfg(pvt->F3, NBCFG, &value);
ecc_en = !!(value & NBCFG_ECC_ENABLE);
......@@ -3240,11 +3234,10 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid)
amd64_info("Node %d: DRAM ECC %s.\n",
nid, (ecc_en ? "enabled" : "disabled"));
if (!ecc_en || !nb_mce_en) {
amd64_info("%s", ecc_msg);
if (!ecc_en || !nb_mce_en)
return false;
}
return true;
else
return true;
}
static inline void
......@@ -3278,8 +3271,7 @@ f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
}
}
static void setup_mci_misc_attrs(struct mem_ctl_info *mci,
struct amd64_family_type *fam)
static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
{
struct amd64_pvt *pvt = mci->pvt_info;
......@@ -3298,7 +3290,7 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci,
mci->edac_cap = determine_edac_cap(pvt);
mci->mod_name = EDAC_MOD_STR;
mci->ctl_name = fam->ctl_name;
mci->ctl_name = fam_type->ctl_name;
mci->dev_name = pci_name(pvt->F3);
mci->ctl_page_to_phys = NULL;
......@@ -3312,8 +3304,6 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci,
*/
static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)
{
struct amd64_family_type *fam_type = NULL;
pvt->ext_model = boot_cpu_data.x86_model >> 4;
pvt->stepping = boot_cpu_data.x86_stepping;
pvt->model = boot_cpu_data.x86_model;
......@@ -3401,51 +3391,15 @@ static const struct attribute_group *amd64_edac_attr_groups[] = {
NULL
};
/* Set the number of Unified Memory Controllers in the system. */
static void compute_num_umcs(void)
{
u8 model = boot_cpu_data.x86_model;
if (boot_cpu_data.x86 < 0x17)
return;
if (model >= 0x30 && model <= 0x3f)
num_umcs = 8;
else
num_umcs = 2;
edac_dbg(1, "Number of UMCs: %x", num_umcs);
}
static int init_one_instance(unsigned int nid)
static int hw_info_get(struct amd64_pvt *pvt)
{
struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
struct amd64_family_type *fam_type = NULL;
struct mem_ctl_info *mci = NULL;
struct edac_mc_layer layers[2];
struct amd64_pvt *pvt = NULL;
u16 pci_id1, pci_id2;
int err = 0, ret;
ret = -ENOMEM;
pvt = kzalloc(sizeof(struct amd64_pvt), GFP_KERNEL);
if (!pvt)
goto err_ret;
pvt->mc_node_id = nid;
pvt->F3 = F3;
ret = -EINVAL;
fam_type = per_family_init(pvt);
if (!fam_type)
goto err_free;
int ret = -EINVAL;
if (pvt->fam >= 0x17) {
pvt->umc = kcalloc(num_umcs, sizeof(struct amd64_umc), GFP_KERNEL);
if (!pvt->umc) {
ret = -ENOMEM;
goto err_free;
}
pvt->umc = kcalloc(fam_type->max_mcs, sizeof(struct amd64_umc), GFP_KERNEL);
if (!pvt->umc)
return -ENOMEM;
pci_id1 = fam_type->f0_id;
pci_id2 = fam_type->f6_id;
......@@ -3454,21 +3408,37 @@ static int init_one_instance(unsigned int nid)
pci_id2 = fam_type->f2_id;
}
err = reserve_mc_sibling_devs(pvt, pci_id1, pci_id2);
if (err)
goto err_post_init;
ret = reserve_mc_sibling_devs(pvt, pci_id1, pci_id2);
if (ret)
return ret;
read_mc_regs(pvt);
return 0;
}
static void hw_info_put(struct amd64_pvt *pvt)
{
if (pvt->F0 || pvt->F1)
free_mc_sibling_devs(pvt);
kfree(pvt->umc);
}
static int init_one_instance(struct amd64_pvt *pvt)
{
struct mem_ctl_info *mci = NULL;
struct edac_mc_layer layers[2];
int ret = -EINVAL;
/*
* We need to determine how many memory channels there are. Then use
* that information for calculating the size of the dynamic instance
* tables in the 'mci' structure.
*/
ret = -EINVAL;
pvt->channel_count = pvt->ops->early_channel_count(pvt);
if (pvt->channel_count < 0)
goto err_siblings;
return ret;
ret = -ENOMEM;
layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
......@@ -3480,24 +3450,18 @@ static int init_one_instance(unsigned int nid)
* Always allocate two channels since we can have setups with DIMMs on
* only one channel. Also, this simplifies handling later for the price
* of a couple of KBs tops.
*
* On Fam17h+, the number of controllers may be greater than two. So set
* the size equal to the maximum number of UMCs.
*/
if (pvt->fam >= 0x17)
layers[1].size = num_umcs;
else
layers[1].size = 2;
layers[1].size = fam_type->max_mcs;
layers[1].is_virt_csrow = false;
mci = edac_mc_alloc(nid, ARRAY_SIZE(layers), layers, 0);
mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0);
if (!mci)
goto err_siblings;
return ret;
mci->pvt_info = pvt;
mci->pdev = &pvt->F3->dev;
setup_mci_misc_attrs(mci, fam_type);
setup_mci_misc_attrs(mci);
if (init_csrows(mci))
mci->edac_cap = EDAC_FLAG_NONE;
......@@ -3505,31 +3469,30 @@ static int init_one_instance(unsigned int nid)
ret = -ENODEV;
if (edac_mc_add_mc_with_groups(mci, amd64_edac_attr_groups)) {
edac_dbg(1, "failed edac_mc_add_mc()\n");
goto err_add_mc;
edac_mc_free(mci);
return ret;
}
return 0;
}
err_add_mc:
edac_mc_free(mci);
err_siblings:
free_mc_sibling_devs(pvt);
err_post_init:
if (pvt->fam >= 0x17)
kfree(pvt->umc);
static bool instance_has_memory(struct amd64_pvt *pvt)
{
bool cs_enabled = false;
int cs = 0, dct = 0;
err_free:
kfree(pvt);
for (dct = 0; dct < fam_type->max_mcs; dct++) {
for_each_chip_select(cs, dct, pvt)
cs_enabled |= csrow_enabled(cs, dct, pvt);
}
err_ret:
return ret;
return cs_enabled;
}
static int probe_one_instance(unsigned int nid)
{
struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
struct amd64_pvt *pvt = NULL;
struct ecc_settings *s;
int ret;
......@@ -3540,8 +3503,29 @@ static int probe_one_instance(unsigned int nid)
ecc_stngs[nid] = s;
if (!ecc_enabled(F3, nid)) {
ret = 0;
pvt = kzalloc(sizeof(struct amd64_pvt), GFP_KERNEL);
if (!pvt)
goto err_settings;
pvt->mc_node_id = nid;
pvt->F3 = F3;
fam_type = per_family_init(pvt);
if (!fam_type)
goto err_enable;
ret = hw_info_get(pvt);
if (ret < 0)
goto err_enable;
ret = 0;
if (!instance_has_memory(pvt)) {
amd64_info("Node %d: No DIMMs detected.\n", nid);
goto err_enable;
}
if (!ecc_enabled(pvt)) {
ret = -ENODEV;
if (!ecc_enable_override)
goto err_enable;
......@@ -3556,7 +3540,7 @@ static int probe_one_instance(unsigned int nid)
goto err_enable;
}
ret = init_one_instance(nid);
ret = init_one_instance(pvt);
if (ret < 0) {
amd64_err("Error probing instance: %d\n", nid);
......@@ -3566,9 +3550,15 @@ static int probe_one_instance(unsigned int nid)
goto err_enable;
}
dump_misc_regs(pvt);
return ret;
err_enable:
hw_info_put(pvt);
kfree(pvt);
err_settings:
kfree(s);
ecc_stngs[nid] = NULL;
......@@ -3595,14 +3585,13 @@ static void remove_one_instance(unsigned int nid)
restore_ecc_error_reporting(s, nid, F3);
free_mc_sibling_devs(pvt);
kfree(ecc_stngs[nid]);
ecc_stngs[nid] = NULL;
/* Free the EDAC CORE resources */
mci->pvt_info = NULL;
hw_info_put(pvt);
kfree(pvt);
edac_mc_free(mci);
}
......@@ -3668,8 +3657,6 @@ static int __init amd64_edac_init(void)
if (!msrs)
goto err_free;
compute_num_umcs();
for (i = 0; i < amd_nb_num(); i++) {
err = probe_one_instance(i);
if (err) {
......
......@@ -479,6 +479,8 @@ struct low_ops {
struct amd64_family_type {
const char *ctl_name;
u16 f0_id, f1_id, f2_id, f6_id;
/* Maximum number of memory controllers per die/node. */
u8 max_mcs;
struct low_ops ops;
};
......
......@@ -281,16 +281,11 @@ static int aspeed_probe(struct platform_device *pdev)
struct device *dev = &pdev->dev;
struct edac_mc_layer layers[2];
struct mem_ctl_info *mci;
struct resource *res;
void __iomem *regs;
u32 reg04;
int rc;
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
if (!res)
return -ENOENT;
regs = devm_ioremap_resource(dev, res);
regs = devm_platform_ioremap_resource(pdev, 0);
if (IS_ERR(regs))
return PTR_ERR(regs);
......
......@@ -555,12 +555,16 @@ static inline int edac_device_get_panic_on_ue(struct edac_device_ctl_info
return edac_dev->panic_on_ue;
}
void edac_device_handle_ce(struct edac_device_ctl_info *edac_dev,
int inst_nr, int block_nr, const char *msg)
void edac_device_handle_ce_count(struct edac_device_ctl_info *edac_dev,
unsigned int count, int inst_nr, int block_nr,
const char *msg)
{
struct edac_device_instance *instance;
struct edac_device_block *block = NULL;
if (!count)
return;
if ((inst_nr >= edac_dev->nr_instances) || (inst_nr < 0)) {
edac_device_printk(edac_dev, KERN_ERR,
"INTERNAL ERROR: 'instance' out of range "
......@@ -582,27 +586,31 @@ void edac_device_handle_ce(struct edac_device_ctl_info *edac_dev,
if (instance->nr_blocks > 0) {
block = instance->blocks + block_nr;
block->counters.ce_count++;
block->counters.ce_count += count;
}
/* Propagate the count up the 'totals' tree */
instance->counters.ce_count++;
edac_dev->counters.ce_count++;
instance->counters.ce_count += count;
edac_dev->counters.ce_count += count;
if (edac_device_get_log_ce(edac_dev))
edac_device_printk(edac_dev, KERN_WARNING,
"CE: %s instance: %s block: %s '%s'\n",
edac_dev->ctl_name, instance->name,
block ? block->name : "N/A", msg);
"CE: %s instance: %s block: %s count: %d '%s'\n",
edac_dev->ctl_name, instance->name,
block ? block->name : "N/A", count, msg);
}
EXPORT_SYMBOL_GPL(edac_device_handle_ce);
EXPORT_SYMBOL_GPL(edac_device_handle_ce_count);
void edac_device_handle_ue(struct edac_device_ctl_info *edac_dev,
int inst_nr, int block_nr, const char *msg)
void edac_device_handle_ue_count(struct edac_device_ctl_info *edac_dev,
unsigned int count, int inst_nr, int block_nr,
const char *msg)
{
struct edac_device_instance *instance;
struct edac_device_block *block = NULL;
if (!count)
return;
if ((inst_nr >= edac_dev->nr_instances) || (inst_nr < 0)) {
edac_device_printk(edac_dev, KERN_ERR,
"INTERNAL ERROR: 'instance' out of range "
......@@ -624,22 +632,22 @@ void edac_device_handle_ue(struct edac_device_ctl_info *edac_dev,
if (instance->nr_blocks > 0) {
block = instance->blocks + block_nr;
block->counters.ue_count++;
block->counters.ue_count += count;
}
/* Propagate the count up the 'totals' tree */
instance->counters.ue_count++;
edac_dev->counters.ue_count++;
instance->counters.ue_count += count;
edac_dev->counters.ue_count += count;
if (edac_device_get_log_ue(edac_dev))
edac_device_printk(edac_dev, KERN_EMERG,
"UE: %s instance: %s block: %s '%s'\n",
edac_dev->ctl_name, instance->name,
block ? block->name : "N/A", msg);
"UE: %s instance: %s block: %s count: %d '%s'\n",
edac_dev->ctl_name, instance->name,
block ? block->name : "N/A", count, msg);
if (edac_device_get_panic_on_ue(edac_dev))
panic("EDAC %s: UE instance: %s block %s '%s'\n",
edac_dev->ctl_name, instance->name,
block ? block->name : "N/A", msg);
panic("EDAC %s: UE instance: %s block %s count: %d '%s'\n",
edac_dev->ctl_name, instance->name,
block ? block->name : "N/A", count, msg);
}
EXPORT_SYMBOL_GPL(edac_device_handle_ue);
EXPORT_SYMBOL_GPL(edac_device_handle_ue_count);
......@@ -286,27 +286,60 @@ extern int edac_device_add_device(struct edac_device_ctl_info *edac_dev);
extern struct edac_device_ctl_info *edac_device_del_device(struct device *dev);
/**
* edac_device_handle_ue():
* perform a common output and handling of an 'edac_dev' UE event
* Log correctable errors.
*
* @edac_dev: pointer to struct &edac_device_ctl_info
* @inst_nr: number of the instance where the UE error happened
* @block_nr: number of the block where the UE error happened
* @inst_nr: number of the instance where the CE error happened
* @count: Number of errors to log.
* @block_nr: number of the block where the CE error happened
* @msg: message to be printed
*/
void edac_device_handle_ce_count(struct edac_device_ctl_info *edac_dev,
unsigned int count, int inst_nr, int block_nr,
const char *msg);
/**
* Log uncorrectable errors.
*
* @edac_dev: pointer to struct &edac_device_ctl_info
* @inst_nr: number of the instance where the CE error happened
* @count: Number of errors to log.
* @block_nr: number of the block where the CE error happened
* @msg: message to be printed
*/
extern void edac_device_handle_ue(struct edac_device_ctl_info *edac_dev,
int inst_nr, int block_nr, const char *msg);
void edac_device_handle_ue_count(struct edac_device_ctl_info *edac_dev,
unsigned int count, int inst_nr, int block_nr,
const char *msg);
/**
* edac_device_handle_ce():
* perform a common output and handling of an 'edac_dev' CE event
* edac_device_handle_ce(): Log a single correctable error
*
* @edac_dev: pointer to struct &edac_device_ctl_info
* @inst_nr: number of the instance where the CE error happened
* @block_nr: number of the block where the CE error happened
* @msg: message to be printed
*/
extern void edac_device_handle_ce(struct edac_device_ctl_info *edac_dev,
int inst_nr, int block_nr, const char *msg);
static inline void
edac_device_handle_ce(struct edac_device_ctl_info *edac_dev, int inst_nr,
int block_nr, const char *msg)
{
edac_device_handle_ce_count(edac_dev, 1, inst_nr, block_nr, msg);
}
/**
* edac_device_handle_ue(): Log a single uncorrectable error
*
* @edac_dev: pointer to struct &edac_device_ctl_info
* @inst_nr: number of the instance where the UE error happened
* @block_nr: number of the block where the UE error happened
* @msg: message to be printed
*/
static inline void
edac_device_handle_ue(struct edac_device_ctl_info *edac_dev, int inst_nr,
int block_nr, const char *msg)
{
edac_device_handle_ue_count(edac_dev, 1, inst_nr, block_nr, msg);
}
/**
* edac_device_alloc_index: Allocate a unique device index number
......@@ -316,5 +349,4 @@ extern void edac_device_handle_ce(struct edac_device_ctl_info *edac_dev,
*/
extern int edac_device_alloc_index(void);
extern const char *edac_layer_name[];
#endif
......@@ -145,15 +145,18 @@ static void edac_mc_dump_channel(struct rank_info *chan)
edac_dbg(4, " channel->dimm = %p\n", chan->dimm);
}
static void edac_mc_dump_dimm(struct dimm_info *dimm, int number)
static void edac_mc_dump_dimm(struct dimm_info *dimm)
{
char location[80];
if (!dimm->nr_pages)
return;
edac_dimm_info_location(dimm, location, sizeof(location));
edac_dbg(4, "%s%i: %smapped as virtual row %d, chan %d\n",
dimm->mci->csbased ? "rank" : "dimm",
number, location, dimm->csrow, dimm->cschannel);
dimm->idx, location, dimm->csrow, dimm->cschannel);
edac_dbg(4, " dimm = %p\n", dimm);
edac_dbg(4, " dimm->label = '%s'\n", dimm->label);
edac_dbg(4, " dimm->nr_pages = 0x%x\n", dimm->nr_pages);
......@@ -314,25 +317,28 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
struct dimm_info *dimm;
u32 *ce_per_layer[EDAC_MAX_LAYERS], *ue_per_layer[EDAC_MAX_LAYERS];
unsigned int pos[EDAC_MAX_LAYERS];
unsigned int size, tot_dimms = 1, count = 1;
unsigned int idx, size, tot_dimms = 1, count = 1;
unsigned int tot_csrows = 1, tot_channels = 1, tot_errcount = 0;
void *pvt, *p, *ptr = NULL;
int i, j, row, chn, n, len, off;
int i, j, row, chn, n, len;
bool per_rank = false;
BUG_ON(n_layers > EDAC_MAX_LAYERS || n_layers == 0);
if (WARN_ON(n_layers > EDAC_MAX_LAYERS || n_layers == 0))
return NULL;
/*
* Calculate the total amount of dimms and csrows/cschannels while
* in the old API emulation mode
*/
for (i = 0; i < n_layers; i++) {
tot_dimms *= layers[i].size;
if (layers[i].is_virt_csrow)
tot_csrows *= layers[i].size;
for (idx = 0; idx < n_layers; idx++) {
tot_dimms *= layers[idx].size;
if (layers[idx].is_virt_csrow)
tot_csrows *= layers[idx].size;
else
tot_channels *= layers[i].size;
tot_channels *= layers[idx].size;
if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
if (layers[idx].type == EDAC_MC_LAYER_CHIP_SELECT)
per_rank = true;
}
......@@ -425,19 +431,15 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
memset(&pos, 0, sizeof(pos));
row = 0;
chn = 0;
for (i = 0; i < tot_dimms; i++) {
for (idx = 0; idx < tot_dimms; idx++) {
chan = mci->csrows[row]->channels[chn];
off = EDAC_DIMM_OFF(layer, n_layers, pos[0], pos[1], pos[2]);
if (off < 0 || off >= tot_dimms) {
edac_mc_printk(mci, KERN_ERR, "EDAC core bug: EDAC_DIMM_OFF is trying to do an illegal data access\n");
goto error;
}
dimm = kzalloc(sizeof(**mci->dimms), GFP_KERNEL);
if (!dimm)
goto error;
mci->dimms[off] = dimm;
mci->dimms[idx] = dimm;
dimm->mci = mci;
dimm->idx = idx;
/*
* Copy DIMM location and initialize it.
......@@ -714,6 +716,7 @@ int edac_mc_add_mc_with_groups(struct mem_ctl_info *mci,
edac_mc_dump_mci(mci);
if (edac_debug_level >= 4) {
struct dimm_info *dimm;
int i;
for (i = 0; i < mci->nr_csrows; i++) {
......@@ -730,9 +733,9 @@ int edac_mc_add_mc_with_groups(struct mem_ctl_info *mci,
if (csrow->channels[j]->dimm->nr_pages)
edac_mc_dump_channel(csrow->channels[j]);
}
for (i = 0; i < mci->tot_dimms; i++)
if (mci->dimms[i]->nr_pages)
edac_mc_dump_dimm(mci->dimms[i], i);
mci_for_each_dimm(mci, dimm)
edac_mc_dump_dimm(dimm);
}
#endif
mutex_lock(&mem_ctls_mutex);
......@@ -1055,6 +1058,21 @@ void edac_raw_mc_handle_error(const enum hw_event_mc_err_type type,
{
char detail[80];
int pos[EDAC_MAX_LAYERS] = { e->top_layer, e->mid_layer, e->low_layer };
u8 grain_bits;
/* Sanity-check driver-supplied grain value. */
if (WARN_ON_ONCE(!e->grain))
e->grain = 1;
grain_bits = fls_long(e->grain - 1);
/* Report the error via the trace interface */
if (IS_ENABLED(CONFIG_RAS))
trace_mc_event(type, e->msg, e->label, e->error_count,
mci->mc_idx, e->top_layer, e->mid_layer,
e->low_layer,
(e->page_frame_number << PAGE_SHIFT) | e->offset_in_page,
grain_bits, e->syndrome, e->other_detail);
/* Memory type dependent details about the error */
if (type == HW_EVENT_ERR_CORRECTED) {
......@@ -1090,11 +1108,11 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
const char *msg,
const char *other_detail)
{
struct dimm_info *dimm;
char *p;
int row = -1, chan = -1;
int pos[EDAC_MAX_LAYERS] = { top_layer, mid_layer, low_layer };
int i, n_labels = 0;
u8 grain_bits;
struct edac_raw_error_desc *e = &mci->error_desc;
edac_dbg(3, "MC%d\n", mci->mc_idx);
......@@ -1150,9 +1168,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
p = e->label;
*p = '\0';
for (i = 0; i < mci->tot_dimms; i++) {
struct dimm_info *dimm = mci->dimms[i];
mci_for_each_dimm(mci, dimm) {
if (top_layer >= 0 && top_layer != dimm->location[0])
continue;
if (mid_layer >= 0 && mid_layer != dimm->location[1])
......@@ -1170,37 +1186,37 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
* channel/memory controller/... may be affected.
* Also, don't show errors for empty DIMM slots.
*/
if (e->enable_per_layer_report && dimm->nr_pages) {
if (n_labels >= EDAC_MAX_LABELS) {
e->enable_per_layer_report = false;
break;
}
n_labels++;
if (p != e->label) {
strcpy(p, OTHER_LABEL);
p += strlen(OTHER_LABEL);
}
strcpy(p, dimm->label);
p += strlen(p);
*p = '\0';
if (!e->enable_per_layer_report || !dimm->nr_pages)
continue;
/*
* get csrow/channel of the DIMM, in order to allow
* incrementing the compat API counters
*/
edac_dbg(4, "%s csrows map: (%d,%d)\n",
mci->csbased ? "rank" : "dimm",
dimm->csrow, dimm->cschannel);
if (row == -1)
row = dimm->csrow;
else if (row >= 0 && row != dimm->csrow)
row = -2;
if (chan == -1)
chan = dimm->cschannel;
else if (chan >= 0 && chan != dimm->cschannel)
chan = -2;
if (n_labels >= EDAC_MAX_LABELS) {
e->enable_per_layer_report = false;
break;
}
n_labels++;
if (p != e->label) {
strcpy(p, OTHER_LABEL);
p += strlen(OTHER_LABEL);
}
strcpy(p, dimm->label);
p += strlen(p);
/*
* get csrow/channel of the DIMM, in order to allow
* incrementing the compat API counters
*/
edac_dbg(4, "%s csrows map: (%d,%d)\n",
mci->csbased ? "rank" : "dimm",
dimm->csrow, dimm->cschannel);
if (row == -1)
row = dimm->csrow;
else if (row >= 0 && row != dimm->csrow)
row = -2;
if (chan == -1)
chan = dimm->cschannel;
else if (chan >= 0 && chan != dimm->cschannel)
chan = -2;
}
if (!e->enable_per_layer_report) {
......@@ -1234,20 +1250,6 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
if (p > e->location)
*(p - 1) = '\0';
/* Sanity-check driver-supplied grain value. */
if (WARN_ON_ONCE(!e->grain))
e->grain = 1;
grain_bits = fls_long(e->grain - 1);
/* Report the error via the trace interface */
if (IS_ENABLED(CONFIG_RAS))
trace_mc_event(type, e->msg, e->label, e->error_count,
mci->mc_idx, e->top_layer, e->mid_layer,
e->low_layer,
(e->page_frame_number << PAGE_SHIFT) | e->offset_in_page,
grain_bits, e->syndrome, e->other_detail);
edac_raw_mc_handle_error(type, mci, e);
}
EXPORT_SYMBOL_GPL(edac_mc_handle_error);
......@@ -557,14 +557,8 @@ static ssize_t dimmdev_ce_count_show(struct device *dev,
{
struct dimm_info *dimm = to_dimm(dev);
u32 count;
int off;
off = EDAC_DIMM_OFF(dimm->mci->layers,
dimm->mci->n_layers,
dimm->location[0],
dimm->location[1],
dimm->location[2]);
count = dimm->mci->ce_per_layer[dimm->mci->n_layers-1][off];
count = dimm->mci->ce_per_layer[dimm->mci->n_layers-1][dimm->idx];
return sprintf(data, "%u\n", count);
}
......@@ -574,14 +568,8 @@ static ssize_t dimmdev_ue_count_show(struct device *dev,
{
struct dimm_info *dimm = to_dimm(dev);
u32 count;
int off;
off = EDAC_DIMM_OFF(dimm->mci->layers,
dimm->mci->n_layers,
dimm->location[0],
dimm->location[1],
dimm->location[2]);
count = dimm->mci->ue_per_layer[dimm->mci->n_layers-1][off];
count = dimm->mci->ue_per_layer[dimm->mci->n_layers-1][dimm->idx];
return sprintf(data, "%u\n", count);
}
......@@ -633,8 +621,7 @@ static const struct device_type dimm_attr_type = {
/* Create a DIMM object under specifed memory controller device */
static int edac_create_dimm_object(struct mem_ctl_info *mci,
struct dimm_info *dimm,
int index)
struct dimm_info *dimm)
{
int err;
dimm->mci = mci;
......@@ -644,9 +631,9 @@ static int edac_create_dimm_object(struct mem_ctl_info *mci,
dimm->dev.parent = &mci->dev;
if (mci->csbased)
dev_set_name(&dimm->dev, "rank%d", index);
dev_set_name(&dimm->dev, "rank%d", dimm->idx);
else
dev_set_name(&dimm->dev, "dimm%d", index);
dev_set_name(&dimm->dev, "dimm%d", dimm->idx);
dev_set_drvdata(&dimm->dev, dimm);
pm_runtime_forbid(&mci->dev);
......@@ -928,7 +915,8 @@ static const struct device_type mci_attr_type = {
int edac_create_sysfs_mci_device(struct mem_ctl_info *mci,
const struct attribute_group **groups)
{
int i, err;
struct dimm_info *dimm;
int err;
/* get the /sys/devices/system/edac subsys reference */
mci->dev.type = &mci_attr_type;
......@@ -952,13 +940,12 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci,
/*
* Create the dimm/rank devices
*/
for (i = 0; i < mci->tot_dimms; i++) {
struct dimm_info *dimm = mci->dimms[i];
mci_for_each_dimm(mci, dimm) {
/* Only expose populated DIMMs */
if (!dimm->nr_pages)
continue;
err = edac_create_dimm_object(mci, dimm, i);
err = edac_create_dimm_object(mci, dimm);
if (err)
goto fail_unregister_dimm;
}
......@@ -973,12 +960,9 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci,
return 0;
fail_unregister_dimm:
for (i--; i >= 0; i--) {
struct dimm_info *dimm = mci->dimms[i];
if (!dimm->nr_pages)
continue;
device_unregister(&dimm->dev);
mci_for_each_dimm(mci, dimm) {
if (device_is_registered(&dimm->dev))
device_unregister(&dimm->dev);
}
device_unregister(&mci->dev);
......@@ -990,7 +974,7 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci,
*/
void edac_remove_sysfs_mci_device(struct mem_ctl_info *mci)
{
int i;
struct dimm_info *dimm;
edac_dbg(0, "\n");
......@@ -1001,8 +985,7 @@ void edac_remove_sysfs_mci_device(struct mem_ctl_info *mci)
edac_delete_csrow_objects(mci);
#endif
for (i = 0; i < mci->tot_dimms; i++) {
struct dimm_info *dimm = mci->dimms[i];
mci_for_each_dimm(mci, dimm) {
if (dimm->nr_pages == 0)
continue;
edac_dbg(1, "unregistering device %s\n", dev_name(&dimm->dev));
......
......@@ -21,14 +21,22 @@ struct ghes_edac_pvt {
struct mem_ctl_info *mci;
/* Buffers for the error handling routine */
char detail_location[240];
char other_detail[160];
char other_detail[400];
char msg[80];
};
static atomic_t ghes_init = ATOMIC_INIT(0);
static refcount_t ghes_refcount = REFCOUNT_INIT(0);
/*
* Access to ghes_pvt must be protected by ghes_lock. The spinlock
* also provides the necessary (implicit) memory barrier for the SMP
* case to make the pointer visible on another CPU.
*/
static struct ghes_edac_pvt *ghes_pvt;
/* GHES registration mutex */
static DEFINE_MUTEX(ghes_reg_mutex);
/*
* Sync with other, potentially concurrent callers of
* ghes_edac_report_mem_error(). We don't know what the
......@@ -79,15 +87,15 @@ static void ghes_edac_count_dimms(const struct dmi_header *dh, void *arg)
(*num_dimm)++;
}
static int get_dimm_smbios_index(u16 handle)
static int get_dimm_smbios_index(struct mem_ctl_info *mci, u16 handle)
{
struct mem_ctl_info *mci = ghes_pvt->mci;
int i;
struct dimm_info *dimm;
for (i = 0; i < mci->tot_dimms; i++) {
if (mci->dimms[i]->smbios_handle == handle)
return i;
mci_for_each_dimm(mci, dimm) {
if (dimm->smbios_handle == handle)
return dimm->idx;
}
return -1;
}
......@@ -98,9 +106,7 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
if (dh->type == DMI_ENTRY_MEM_DEVICE) {
struct memdev_dmi_entry *entry = (struct memdev_dmi_entry *)dh;
struct dimm_info *dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
mci->n_layers,
dimm_fill->count, 0, 0);
struct dimm_info *dimm = edac_get_dimm(mci, dimm_fill->count, 0, 0);
u16 rdr_mask = BIT(7) | BIT(13);
if (entry->size == 0xffff) {
......@@ -198,13 +204,9 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
enum hw_event_mc_err_type type;
struct edac_raw_error_desc *e;
struct mem_ctl_info *mci;
struct ghes_edac_pvt *pvt = ghes_pvt;
struct ghes_edac_pvt *pvt;
unsigned long flags;
char *p;
u8 grain_bits;
if (!pvt)
return;
/*
* We can do the locking below because GHES defers error processing
......@@ -216,12 +218,17 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
spin_lock_irqsave(&ghes_lock, flags);
pvt = ghes_pvt;
if (!pvt)
goto unlock;
mci = pvt->mci;
e = &mci->error_desc;
/* Cleans the error report buffer */
memset(e, 0, sizeof (*e));
e->error_count = 1;
e->grain = 1;
strcpy(e->label, "unknown label");
e->msg = pvt->msg;
e->other_detail = pvt->other_detail;
......@@ -311,13 +318,13 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
/* Error address */
if (mem_err->validation_bits & CPER_MEM_VALID_PA) {
e->page_frame_number = mem_err->physical_addr >> PAGE_SHIFT;
e->offset_in_page = mem_err->physical_addr & ~PAGE_MASK;
e->page_frame_number = PHYS_PFN(mem_err->physical_addr);
e->offset_in_page = offset_in_page(mem_err->physical_addr);
}
/* Error grain */
if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK)
e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);
e->grain = ~mem_err->physical_addr_mask + 1;
/* Memory error location, mapped on e->location */
p = e->location;
......@@ -348,7 +355,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
p += sprintf(p, "DIMM DMI handle: 0x%.4x ",
mem_err->mem_dev_handle);
index = get_dimm_smbios_index(mem_err->mem_dev_handle);
index = get_dimm_smbios_index(mci, mem_err->mem_dev_handle);
if (index >= 0) {
e->top_layer = index;
e->enable_per_layer_report = true;
......@@ -360,6 +367,8 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
/* All other fields are mapped on e->other_detail */
p = pvt->other_detail;
p += snprintf(p, sizeof(pvt->other_detail),
"APEI location: %s ", e->location);
if (mem_err->validation_bits & CPER_MEM_VALID_ERROR_STATUS) {
u64 status = mem_err->error_status;
......@@ -433,16 +442,9 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
if (p > pvt->other_detail)
*(p - 1) = '\0';
/* Generate the trace event */
grain_bits = fls_long(e->grain);
snprintf(pvt->detail_location, sizeof(pvt->detail_location),
"APEI location: %s %s", e->location, e->other_detail);
trace_mc_event(type, e->msg, e->label, e->error_count,
mci->mc_idx, e->top_layer, e->mid_layer, e->low_layer,
(e->page_frame_number << PAGE_SHIFT) | e->offset_in_page,
grain_bits, e->syndrome, pvt->detail_location);
edac_raw_mc_handle_error(type, mci, e);
unlock:
spin_unlock_irqrestore(&ghes_lock, flags);
}
......@@ -457,10 +459,12 @@ static struct acpi_platform_list plat_list[] = {
int ghes_edac_register(struct ghes *ghes, struct device *dev)
{
bool fake = false;
int rc, num_dimm = 0;
int rc = 0, num_dimm = 0;
struct mem_ctl_info *mci;
struct ghes_edac_pvt *pvt;
struct edac_mc_layer layers[1];
struct ghes_edac_dimm_fill dimm_fill;
unsigned long flags;
int idx = -1;
if (IS_ENABLED(CONFIG_X86)) {
......@@ -472,11 +476,14 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
idx = 0;
}
/* finish another registration/unregistration instance first */
mutex_lock(&ghes_reg_mutex);
/*
* We have only one logical memory controller to which all DIMMs belong.
*/
if (atomic_inc_return(&ghes_init) > 1)
return 0;
if (refcount_inc_not_zero(&ghes_refcount))
goto unlock;
/* Get the number of DIMMs */
dmi_walk(ghes_edac_count_dimms, &num_dimm);
......@@ -494,12 +501,13 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, sizeof(struct ghes_edac_pvt));
if (!mci) {
pr_info("Can't allocate memory for EDAC data\n");
return -ENOMEM;
rc = -ENOMEM;
goto unlock;
}
ghes_pvt = mci->pvt_info;
ghes_pvt->ghes = ghes;
ghes_pvt->mci = mci;
pvt = mci->pvt_info;
pvt->ghes = ghes;
pvt->mci = mci;
mci->pdev = dev;
mci->mtype_cap = MEM_FLAG_EMPTY;
......@@ -527,8 +535,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
dimm_fill.mci = mci;
dmi_walk(ghes_edac_dmidecode, &dimm_fill);
} else {
struct dimm_info *dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
mci->n_layers, 0, 0, 0);
struct dimm_info *dimm = edac_get_dimm(mci, 0, 0, 0);
dimm->nr_pages = 1;
dimm->grain = 128;
......@@ -541,23 +548,48 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
if (rc < 0) {
pr_info("Can't register at EDAC core\n");
edac_mc_free(mci);
return -ENODEV;
rc = -ENODEV;
goto unlock;
}
return 0;
spin_lock_irqsave(&ghes_lock, flags);
ghes_pvt = pvt;
spin_unlock_irqrestore(&ghes_lock, flags);
/* only set on success */
refcount_set(&ghes_refcount, 1);
unlock:
mutex_unlock(&ghes_reg_mutex);
return rc;
}
void ghes_edac_unregister(struct ghes *ghes)
{
struct mem_ctl_info *mci;
unsigned long flags;
if (!ghes_pvt)
return;
mutex_lock(&ghes_reg_mutex);
if (atomic_dec_return(&ghes_init))
return;
if (!refcount_dec_and_test(&ghes_refcount))
goto unlock;
mci = ghes_pvt->mci;
/*
* Wait for the irq handler being finished.
*/
spin_lock_irqsave(&ghes_lock, flags);
mci = ghes_pvt ? ghes_pvt->mci : NULL;
ghes_pvt = NULL;
edac_mc_del_mc(mci->pdev);
edac_mc_free(mci);
spin_unlock_irqrestore(&ghes_lock, flags);
if (!mci)
goto unlock;
mci = edac_mc_del_mc(mci->pdev);
if (mci)
edac_mc_free(mci);
unlock:
mutex_unlock(&ghes_reg_mutex);
}
......@@ -154,8 +154,7 @@ static int i10nm_get_dimm_config(struct mem_ctl_info *mci)
ndimms = 0;
for (j = 0; j < I10NM_NUM_DIMMS; j++) {
dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
mci->n_layers, i, j, 0);
dimm = edac_get_dimm(mci, i, j, 0);
mtr = I10NM_GET_DIMMMTR(imc, i, j);
mcddrtcfg = I10NM_GET_MCDDRTCFG(imc, i, j);
edac_dbg(1, "dimmmtr 0x%x mcddrtcfg 0x%x (mc%d ch%d dimm%d)\n",
......
......@@ -392,8 +392,7 @@ static int i3200_probe1(struct pci_dev *pdev, int dev_idx)
unsigned long nr_pages;
for (j = 0; j < nr_channels; j++) {
struct dimm_info *dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
mci->n_layers, i, j, 0);
struct dimm_info *dimm = edac_get_dimm(mci, i, j, 0);
nr_pages = drb_to_nr_pages(drbs, stacked, j, i);
if (nr_pages == 0)
......
......@@ -1275,9 +1275,8 @@ static int i5000_init_csrows(struct mem_ctl_info *mci)
if (!MTR_DIMMS_PRESENT(mtr))
continue;
dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers,
channel / MAX_BRANCHES,
channel % MAX_BRANCHES, slot);
dimm = edac_get_dimm(mci, channel / MAX_BRANCHES,
channel % MAX_BRANCHES, slot);
csrow_megs = pvt->dimm_info[slot][channel].megabytes;
dimm->grain = 8;
......
......@@ -713,7 +713,6 @@ static int i5100_read_spd_byte(const struct mem_ctl_info *mci,
{
struct i5100_priv *priv = mci->pvt_info;
u16 w;
unsigned long et;
pci_read_config_word(priv->mc, I5100_SPDDATA, &w);
if (i5100_spddata_busy(w))
......@@ -724,7 +723,6 @@ static int i5100_read_spd_byte(const struct mem_ctl_info *mci,
0, 0));
/* wait up to 100ms */
et = jiffies + HZ / 10;
udelay(100);
while (1) {
pci_read_config_word(priv->mc, I5100_SPDDATA, &w);
......@@ -848,21 +846,17 @@ static void i5100_init_interleaving(struct pci_dev *pdev,
static void i5100_init_csrows(struct mem_ctl_info *mci)
{
int i;
struct i5100_priv *priv = mci->pvt_info;
struct dimm_info *dimm;
for (i = 0; i < mci->tot_dimms; i++) {
struct dimm_info *dimm;
const unsigned long npages = i5100_npages(mci, i);
const unsigned int chan = i5100_csrow_to_chan(mci, i);
const unsigned int rank = i5100_csrow_to_rank(mci, i);
mci_for_each_dimm(mci, dimm) {
const unsigned long npages = i5100_npages(mci, dimm->idx);
const unsigned int chan = i5100_csrow_to_chan(mci, dimm->idx);
const unsigned int rank = i5100_csrow_to_rank(mci, dimm->idx);
if (!npages)
continue;
dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers,
chan, rank, 0);
dimm->nr_pages = npages;
dimm->grain = 32;
dimm->dtype = (priv->mtr[chan][rank].width == 4) ?
......
......@@ -548,8 +548,8 @@ static void i5400_proccess_non_recoverable_info(struct mem_ctl_info *mci,
ras = nrec_ras(info);
cas = nrec_cas(info);
edac_dbg(0, "\t\tDIMM= %d Channels= %d,%d (Branch= %d DRAM Bank= %d Buffer ID = %d rdwr= %s ras= %d cas= %d)\n",
rank, channel, channel + 1, branch >> 1, bank,
edac_dbg(0, "\t\t%s DIMM= %d Channels= %d,%d (Branch= %d DRAM Bank= %d Buffer ID = %d rdwr= %s ras= %d cas= %d)\n",
type, rank, channel, channel + 1, branch >> 1, bank,
buf_id, rdwr_str(rdwr), ras, cas);
/* Only 1 bit will be on */
......@@ -1054,8 +1054,6 @@ static void i5400_get_mc_regs(struct mem_ctl_info *mci)
u32 actual_tolm;
u16 limit;
int slot_row;
int maxch;
int maxdimmperch;
int way0, way1;
pvt = mci->pvt_info;
......@@ -1065,9 +1063,6 @@ static void i5400_get_mc_regs(struct mem_ctl_info *mci)
pci_read_config_dword(pvt->system_address, AMBASE + sizeof(u32),
&pvt->u.ambase_top);
maxdimmperch = pvt->maxdimmperch;
maxch = pvt->maxch;
edac_dbg(2, "AMBASE= 0x%lx MAXCH= %d MAX-DIMM-Per-CH= %d\n",
(long unsigned int)pvt->ambase, pvt->maxch, pvt->maxdimmperch);
......@@ -1170,17 +1165,13 @@ static int i5400_init_dimms(struct mem_ctl_info *mci)
{
struct i5400_pvt *pvt;
struct dimm_info *dimm;
int ndimms, channel_count;
int max_dimms;
int ndimms;
int mtr;
int size_mb;
int channel, slot;
pvt = mci->pvt_info;
channel_count = pvt->maxch;
max_dimms = pvt->maxdimmperch;
ndimms = 0;
/*
......@@ -1196,8 +1187,7 @@ static int i5400_init_dimms(struct mem_ctl_info *mci)
if (!MTR_DIMMS_PRESENT(mtr))
continue;
dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers,
channel / 2, channel % 2, slot);
dimm = edac_get_dimm(mci, channel / 2, channel % 2, slot);
size_mb = pvt->dimm_info[slot][channel].megabytes;
......
......@@ -580,7 +580,7 @@ static void i7300_enable_error_reporting(struct mem_ctl_info *mci)
* @ch: Channel number within the branch (0 or 1)
* @branch: Branch number (0 or 1)
* @dinfo: Pointer to DIMM info where dimm size is stored
* @p_csrow: Pointer to the struct csrow_info that corresponds to that element
* @dimm: Pointer to the struct dimm_info that corresponds to that element
*/
static int decode_mtr(struct i7300_pvt *pvt,
int slot, int ch, int branch,
......@@ -794,8 +794,7 @@ static int i7300_init_csrows(struct mem_ctl_info *mci)
for (ch = 0; ch < max_channel; ch++) {
int channel = to_channel(ch, branch);
dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
mci->n_layers, branch, ch, slot);
dimm = edac_get_dimm(mci, branch, ch, slot);
dinfo = &pvt->dimm_info[slot][channel];
......@@ -817,7 +816,7 @@ static int i7300_init_csrows(struct mem_ctl_info *mci)
/**
* decode_mir() - Decodes Memory Interleave Register (MIR) info
* @int mir_no: number of the MIR register to decode
* @mir_no: number of the MIR register to decode
* @mir: array with the MIR data cached on the driver
*/
static void decode_mir(int mir_no, u16 mir[MAX_MIR])
......
......@@ -585,8 +585,7 @@ static int get_dimm_config(struct mem_ctl_info *mci)
if (!DIMM_PRESENT(dimm_dod[j]))
continue;
dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers,
i, j, 0);
dimm = edac_get_dimm(mci, i, j, 0);
banks = numbank(MC_DOD_NUMBANK(dimm_dod[j]));
ranks = numrank(MC_DOD_NUMRANK(dimm_dod[j]));
rows = numrow(MC_DOD_NUMROW(dimm_dod[j]));
......
......@@ -490,9 +490,7 @@ static int ie31200_probe1(struct pci_dev *pdev, int dev_idx)
if (dimm_info[j][i].dual_rank) {
nr_pages = nr_pages / 2;
dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
mci->n_layers, (i * 2) + 1,
j, 0);
dimm = edac_get_dimm(mci, (i * 2) + 1, j, 0);
dimm->nr_pages = nr_pages;
edac_dbg(0, "set nr pages: 0x%lx\n", nr_pages);
dimm->grain = 8; /* just a guess */
......@@ -503,8 +501,7 @@ static int ie31200_probe1(struct pci_dev *pdev, int dev_idx)
dimm->dtype = DEV_UNKNOWN;
dimm->edac_mode = EDAC_UNKNOWN;
}
dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
mci->n_layers, i * 2, j, 0);
dimm = edac_get_dimm(mci, i * 2, j, 0);
dimm->nr_pages = nr_pages;
edac_dbg(0, "set nr pages: 0x%lx\n", nr_pages);
dimm->grain = 8; /* same guess */
......
......@@ -1231,7 +1231,7 @@ static void apl_get_dimm_config(struct mem_ctl_info *mci)
if (!(chan_mask & BIT(i)))
continue;
dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers, i, 0, 0);
dimm = edac_get_dimm(mci, i, 0, 0);
if (!dimm) {
edac_dbg(0, "No allocated DIMM for channel %d\n", i);
continue;
......@@ -1311,7 +1311,7 @@ static void dnv_get_dimm_config(struct mem_ctl_info *mci)
if (!ranks_of_dimm[j])
continue;
dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers, i, j, 0);
dimm = edac_get_dimm(mci, i, j, 0);
if (!dimm) {
edac_dbg(0, "No allocated DIMM for channel %d DIMM %d\n", i, j);
continue;
......
......@@ -254,18 +254,20 @@ static const u32 rir_offset[MAX_RIR_RANGES][MAX_RIR_WAY] = {
* FIXME: Implement the error count reads directly
*/
static const u32 correrrcnt[] = {
0x104, 0x108, 0x10c, 0x110,
};
#define RANK_ODD_OV(reg) GET_BITFIELD(reg, 31, 31)
#define RANK_ODD_ERR_CNT(reg) GET_BITFIELD(reg, 16, 30)
#define RANK_EVEN_OV(reg) GET_BITFIELD(reg, 15, 15)
#define RANK_EVEN_ERR_CNT(reg) GET_BITFIELD(reg, 0, 14)
#if 0 /* Currently unused*/
static const u32 correrrcnt[] = {
0x104, 0x108, 0x10c, 0x110,
};
static const u32 correrrthrsld[] = {
0x11c, 0x120, 0x124, 0x128,
};
#endif
#define RANK_ODD_ERR_THRSLD(reg) GET_BITFIELD(reg, 16, 30)
#define RANK_EVEN_ERR_THRSLD(reg) GET_BITFIELD(reg, 0, 14)
......@@ -1340,7 +1342,7 @@ static void knl_show_mc_route(u32 reg, char *s)
*/
static int knl_get_dimm_capacity(struct sbridge_pvt *pvt, u64 *mc_sizes)
{
u64 sad_base, sad_size, sad_limit = 0;
u64 sad_base, sad_limit = 0;
u64 tad_base, tad_size, tad_limit, tad_deadspace, tad_livespace;
int sad_rule = 0;
int tad_rule = 0;
......@@ -1427,7 +1429,6 @@ static int knl_get_dimm_capacity(struct sbridge_pvt *pvt, u64 *mc_sizes)
edram_only = KNL_EDRAM_ONLY(dram_rule);
sad_limit = pvt->info.sad_limit(dram_rule)+1;
sad_size = sad_limit - sad_base;
pci_read_config_dword(pvt->pci_sad0,
pvt->info.interleave_list[sad_rule], &interleave_reg);
......@@ -1620,7 +1621,7 @@ static int __populate_dimms(struct mem_ctl_info *mci,
}
for (j = 0; j < max_dimms_per_channel; j++) {
dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers, i, j, 0);
dimm = edac_get_dimm(mci, i, j, 0);
if (pvt->info.type == KNIGHTS_LANDING) {
pci_read_config_dword(pvt->knl.pci_channel[i],
knl_mtr_reg, &mtr);
......@@ -2952,7 +2953,7 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
struct mem_ctl_info *new_mci;
struct sbridge_pvt *pvt = mci->pvt_info;
enum hw_event_mc_err_type tp_event;
char *type, *optype, msg[256];
char *optype, msg[256];
bool ripv = GET_BITFIELD(m->mcgstatus, 0, 0);
bool overflow = GET_BITFIELD(m->status, 62, 62);
bool uncorrected_error = GET_BITFIELD(m->status, 61, 61);
......@@ -2981,14 +2982,11 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
if (uncorrected_error) {
core_err_cnt = 1;
if (ripv) {
type = "FATAL";
tp_event = HW_EVENT_ERR_FATAL;
} else {
type = "NON_FATAL";
tp_event = HW_EVENT_ERR_UNCORRECTED;
}
} else {
type = "CORRECTED";
tp_event = HW_EVENT_ERR_CORRECTED;
}
......@@ -3200,7 +3198,6 @@ static struct notifier_block sbridge_mce_dec = {
static void sbridge_unregister_mci(struct sbridge_dev *sbridge_dev)
{
struct mem_ctl_info *mci = sbridge_dev->mci;
struct sbridge_pvt *pvt;
if (unlikely(!mci || !mci->pvt_info)) {
edac_dbg(0, "MC: dev = %p\n", &sbridge_dev->pdev[0]->dev);
......@@ -3209,8 +3206,6 @@ static void sbridge_unregister_mci(struct sbridge_dev *sbridge_dev)
return;
}
pvt = mci->pvt_info;
edac_dbg(0, "MC: mci = %p, dev = %p\n",
mci, &sbridge_dev->pdev[0]->dev);
......
......@@ -46,7 +46,8 @@ static struct skx_dev *get_skx_dev(struct pci_bus *bus, u8 idx)
}
enum munittype {
CHAN0, CHAN1, CHAN2, SAD_ALL, UTIL_ALL, SAD
CHAN0, CHAN1, CHAN2, SAD_ALL, UTIL_ALL, SAD,
ERRCHAN0, ERRCHAN1, ERRCHAN2,
};
struct munit {
......@@ -68,6 +69,9 @@ static const struct munit skx_all_munits[] = {
{ 0x2040, { PCI_DEVFN(10, 0), PCI_DEVFN(12, 0) }, 2, 2, CHAN0 },
{ 0x2044, { PCI_DEVFN(10, 4), PCI_DEVFN(12, 4) }, 2, 2, CHAN1 },
{ 0x2048, { PCI_DEVFN(11, 0), PCI_DEVFN(13, 0) }, 2, 2, CHAN2 },
{ 0x2043, { PCI_DEVFN(10, 3), PCI_DEVFN(12, 3) }, 2, 2, ERRCHAN0 },
{ 0x2047, { PCI_DEVFN(10, 7), PCI_DEVFN(12, 7) }, 2, 2, ERRCHAN1 },
{ 0x204b, { PCI_DEVFN(11, 3), PCI_DEVFN(13, 3) }, 2, 2, ERRCHAN2 },
{ 0x208e, { }, 1, 0, SAD },
{ }
};
......@@ -104,10 +108,18 @@ static int get_all_munits(const struct munit *m)
}
switch (m->mtype) {
case CHAN0: case CHAN1: case CHAN2:
case CHAN0:
case CHAN1:
case CHAN2:
pci_dev_get(pdev);
d->imc[i].chan[m->mtype].cdev = pdev;
break;
case ERRCHAN0:
case ERRCHAN1:
case ERRCHAN2:
pci_dev_get(pdev);
d->imc[i].chan[m->mtype - ERRCHAN0].edev = pdev;
break;
case SAD_ALL:
pci_dev_get(pdev);
d->sad_all = pdev;
......@@ -177,8 +189,7 @@ static int skx_get_dimm_config(struct mem_ctl_info *mci)
pci_read_config_dword(imc->chan[i].cdev, 0x8C, &amap);
pci_read_config_dword(imc->chan[i].cdev, 0x400, &mcddrtcfg);
for (j = 0; j < SKX_NUM_DIMMS; j++) {
dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
mci->n_layers, i, j, 0);
dimm = edac_get_dimm(mci, i, j, 0);
pci_read_config_dword(imc->chan[i].cdev,
0x80 + 4 * j, &mtr);
if (IS_DIMM_PRESENT(mtr)) {
......@@ -216,6 +227,39 @@ static int skx_get_dimm_config(struct mem_ctl_info *mci)
#define SKX_ILV_REMOTE(tgt) (((tgt) & 8) == 0)
#define SKX_ILV_TARGET(tgt) ((tgt) & 7)
static void skx_show_retry_rd_err_log(struct decoded_addr *res,
char *msg, int len)
{
u32 log0, log1, log2, log3, log4;
u32 corr0, corr1, corr2, corr3;
struct pci_dev *edev;
int n;
edev = res->dev->imc[res->imc].chan[res->channel].edev;
pci_read_config_dword(edev, 0x154, &log0);
pci_read_config_dword(edev, 0x148, &log1);
pci_read_config_dword(edev, 0x150, &log2);
pci_read_config_dword(edev, 0x15c, &log3);
pci_read_config_dword(edev, 0x114, &log4);
n = snprintf(msg, len, " retry_rd_err_log[%.8x %.8x %.8x %.8x %.8x]",
log0, log1, log2, log3, log4);
pci_read_config_dword(edev, 0x104, &corr0);
pci_read_config_dword(edev, 0x108, &corr1);
pci_read_config_dword(edev, 0x10c, &corr2);
pci_read_config_dword(edev, 0x110, &corr3);
if (len - n > 0)
snprintf(msg + n, len - n,
" correrrcnt[%.4x %.4x %.4x %.4x %.4x %.4x %.4x %.4x]",
corr0 & 0xffff, corr0 >> 16,
corr1 & 0xffff, corr1 >> 16,
corr2 & 0xffff, corr2 >> 16,
corr3 & 0xffff, corr3 >> 16);
}
static bool skx_sad_decode(struct decoded_addr *res)
{
struct skx_dev *d = list_first_entry(skx_edac_list, typeof(*d), list);
......@@ -659,7 +703,7 @@ static int __init skx_init(void)
}
}
skx_set_decode(skx_decode);
skx_set_decode(skx_decode, skx_show_retry_rd_err_log);
if (nvdimm_count && skx_adxl_get() == -ENODEV)
skx_printk(KERN_NOTICE, "Only decoding DDR4 address!\n");
......
......@@ -37,6 +37,7 @@ static char *adxl_msg;
static char skx_msg[MSG_SIZE];
static skx_decode_f skx_decode;
static skx_show_retry_log_f skx_show_retry_rd_err_log;
static u64 skx_tolm, skx_tohm;
static LIST_HEAD(dev_edac_list);
......@@ -100,6 +101,7 @@ void __exit skx_adxl_put(void)
static bool skx_adxl_decode(struct decoded_addr *res)
{
struct skx_dev *d;
int i, len = 0;
if (res->addr >= skx_tohm || (res->addr >= skx_tolm &&
......@@ -118,6 +120,24 @@ static bool skx_adxl_decode(struct decoded_addr *res)
res->channel = (int)adxl_values[component_indices[INDEX_CHANNEL]];
res->dimm = (int)adxl_values[component_indices[INDEX_DIMM]];
if (res->imc > NUM_IMC - 1) {
skx_printk(KERN_ERR, "Bad imc %d\n", res->imc);
return false;
}
list_for_each_entry(d, &dev_edac_list, list) {
if (d->imc[0].src_id == res->socket) {
res->dev = d;
break;
}
}
if (!res->dev) {
skx_printk(KERN_ERR, "No device for src_id %d imc %d\n",
res->socket, res->imc);
return false;
}
for (i = 0; i < adxl_component_count; i++) {
if (adxl_values[i] == ~0x0ull)
continue;
......@@ -131,9 +151,10 @@ static bool skx_adxl_decode(struct decoded_addr *res)
return true;
}
void skx_set_decode(skx_decode_f decode)
void skx_set_decode(skx_decode_f decode, skx_show_retry_log_f show_retry_log)
{
skx_decode = decode;
skx_show_retry_rd_err_log = show_retry_log;
}
int skx_get_src_id(struct skx_dev *d, int off, u8 *id)
......@@ -452,34 +473,17 @@ static void skx_unregister_mci(struct skx_imc *imc)
edac_mc_free(mci);
}
static struct mem_ctl_info *get_mci(int src_id, int lmc)
{
struct skx_dev *d;
if (lmc > NUM_IMC - 1) {
skx_printk(KERN_ERR, "Bad lmc %d\n", lmc);
return NULL;
}
list_for_each_entry(d, &dev_edac_list, list) {
if (d->imc[0].src_id == src_id)
return d->imc[lmc].mci;
}
skx_printk(KERN_ERR, "No mci for src_id %d lmc %d\n", src_id, lmc);
return NULL;
}
static void skx_mce_output_error(struct mem_ctl_info *mci,
const struct mce *m,
struct decoded_addr *res)
{
enum hw_event_mc_err_type tp_event;
char *type, *optype;
char *optype;
bool ripv = GET_BITFIELD(m->mcgstatus, 0, 0);
bool overflow = GET_BITFIELD(m->status, 62, 62);
bool uncorrected_error = GET_BITFIELD(m->status, 61, 61);
bool recoverable;
int len;
u32 core_err_cnt = GET_BITFIELD(m->status, 38, 52);
u32 mscod = GET_BITFIELD(m->status, 16, 31);
u32 errcode = GET_BITFIELD(m->status, 0, 15);
......@@ -490,14 +494,11 @@ static void skx_mce_output_error(struct mem_ctl_info *mci,
if (uncorrected_error) {
core_err_cnt = 1;
if (ripv) {
type = "FATAL";
tp_event = HW_EVENT_ERR_FATAL;
} else {
type = "NON_FATAL";
tp_event = HW_EVENT_ERR_UNCORRECTED;
}
} else {
type = "CORRECTED";
tp_event = HW_EVENT_ERR_CORRECTED;
}
......@@ -539,12 +540,12 @@ static void skx_mce_output_error(struct mem_ctl_info *mci,
}
}
if (adxl_component_count) {
snprintf(skx_msg, MSG_SIZE, "%s%s err_code:0x%04x:0x%04x %s",
len = snprintf(skx_msg, MSG_SIZE, "%s%s err_code:0x%04x:0x%04x %s",
overflow ? " OVERFLOW" : "",
(uncorrected_error && recoverable) ? " recoverable" : "",
mscod, errcode, adxl_msg);
} else {
snprintf(skx_msg, MSG_SIZE,
len = snprintf(skx_msg, MSG_SIZE,
"%s%s err_code:0x%04x:0x%04x socket:%d imc:%d rank:%d bg:%d ba:%d row:0x%x col:0x%x",
overflow ? " OVERFLOW" : "",
(uncorrected_error && recoverable) ? " recoverable" : "",
......@@ -553,6 +554,9 @@ static void skx_mce_output_error(struct mem_ctl_info *mci,
res->bank_group, res->bank_address, res->row, res->column);
}
if (skx_show_retry_rd_err_log)
skx_show_retry_rd_err_log(res, skx_msg + len, MSG_SIZE - len);
edac_dbg(0, "%s\n", skx_msg);
/* Call the helper to output message */
......@@ -583,15 +587,12 @@ int skx_mce_check_error(struct notifier_block *nb, unsigned long val,
if (adxl_component_count) {
if (!skx_adxl_decode(&res))
return NOTIFY_DONE;
mci = get_mci(res.socket, res.imc);
} else {
if (!skx_decode || !skx_decode(&res))
return NOTIFY_DONE;
mci = res.dev->imc[res.imc].mci;
} else if (!skx_decode || !skx_decode(&res)) {
return NOTIFY_DONE;
}
mci = res.dev->imc[res.imc].mci;
if (!mci)
return NOTIFY_DONE;
......
......@@ -64,6 +64,7 @@ struct skx_dev {
u8 src_id, node_id;
struct skx_channel {
struct pci_dev *cdev;
struct pci_dev *edev;
struct skx_dimm {
u8 close_pg;
u8 bank_xor_enable;
......@@ -113,10 +114,11 @@ struct decoded_addr {
typedef int (*get_dimm_config_f)(struct mem_ctl_info *mci);
typedef bool (*skx_decode_f)(struct decoded_addr *res);
typedef void (*skx_show_retry_log_f)(struct decoded_addr *res, char *msg, int len);
int __init skx_adxl_get(void);
void __exit skx_adxl_put(void);
void skx_set_decode(skx_decode_f decode);
void skx_set_decode(skx_decode_f decode, skx_show_retry_log_f show_retry_log);
int skx_get_src_id(struct skx_dev *d, int off, u8 *id);
int skx_get_node_id(struct skx_dev *d, u8 *id);
......
......@@ -135,7 +135,7 @@ static void ti_edac_setup_dimm(struct mem_ctl_info *mci, u32 type)
u32 val;
u32 memsize;
dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers, 0, 0, 0);
dimm = edac_get_dimm(mci, 0, 0, 0);
val = ti_edac_readl(edac, EMIF_SDRAM_CONFIG);
......
......@@ -362,78 +362,6 @@ struct edac_mc_layer {
*/
#define EDAC_MAX_LAYERS 3
/**
* EDAC_DIMM_OFF - Macro responsible to get a pointer offset inside a pointer
* array for the element given by [layer0,layer1,layer2]
* position
*
* @layers: a struct edac_mc_layer array, describing how many elements
* were allocated for each layer
* @nlayers: Number of layers at the @layers array
* @layer0: layer0 position
* @layer1: layer1 position. Unused if n_layers < 2
* @layer2: layer2 position. Unused if n_layers < 3
*
* For 1 layer, this macro returns "var[layer0] - var";
*
* For 2 layers, this macro is similar to allocate a bi-dimensional array
* and to return "var[layer0][layer1] - var";
*
* For 3 layers, this macro is similar to allocate a tri-dimensional array
* and to return "var[layer0][layer1][layer2] - var".
*
* A loop could be used here to make it more generic, but, as we only have
* 3 layers, this is a little faster.
*
* By design, layers can never be 0 or more than 3. If that ever happens,
* a NULL is returned, causing an OOPS during the memory allocation routine,
* with would point to the developer that he's doing something wrong.
*/
#define EDAC_DIMM_OFF(layers, nlayers, layer0, layer1, layer2) ({ \
int __i; \
if ((nlayers) == 1) \
__i = layer0; \
else if ((nlayers) == 2) \
__i = (layer1) + ((layers[1]).size * (layer0)); \
else if ((nlayers) == 3) \
__i = (layer2) + ((layers[2]).size * ((layer1) + \
((layers[1]).size * (layer0)))); \
else \
__i = -EINVAL; \
__i; \
})
/**
* EDAC_DIMM_PTR - Macro responsible to get a pointer inside a pointer array
* for the element given by [layer0,layer1,layer2] position
*
* @layers: a struct edac_mc_layer array, describing how many elements
* were allocated for each layer
* @var: name of the var where we want to get the pointer
* (like mci->dimms)
* @nlayers: Number of layers at the @layers array
* @layer0: layer0 position
* @layer1: layer1 position. Unused if n_layers < 2
* @layer2: layer2 position. Unused if n_layers < 3
*
* For 1 layer, this macro returns "var[layer0]";
*
* For 2 layers, this macro is similar to allocate a bi-dimensional array
* and to return "var[layer0][layer1]";
*
* For 3 layers, this macro is similar to allocate a tri-dimensional array
* and to return "var[layer0][layer1][layer2]";
*/
#define EDAC_DIMM_PTR(layers, var, nlayers, layer0, layer1, layer2) ({ \
typeof(*var) __p; \
int ___i = EDAC_DIMM_OFF(layers, nlayers, layer0, layer1, layer2); \
if (___i < 0) \
__p = NULL; \
else \
__p = (var)[___i]; \
__p; \
})
struct dimm_info {
struct device dev;
......@@ -443,6 +371,7 @@ struct dimm_info {
unsigned int location[EDAC_MAX_LAYERS];
struct mem_ctl_info *mci; /* the parent */
unsigned int idx; /* index within the parent dimm array */
u32 grain; /* granularity of reported error in bytes */
enum dev_type dtype; /* memory device type */
......@@ -528,15 +457,10 @@ struct errcount_attribute_data {
* (typically, a memory controller error)
*/
struct edac_raw_error_desc {
/*
* NOTE: everything before grain won't be cleaned by
* edac_raw_error_desc_clean()
*/
char location[LOCATION_SIZE];
char label[(EDAC_MC_LABEL_LEN + 1 + sizeof(OTHER_LABEL)) * EDAC_MAX_LABELS];
long grain;
/* the vars below and grain will be cleaned on every new error report */
u16 error_count;
int top_layer;
int mid_layer;
......@@ -669,4 +593,70 @@ struct mem_ctl_info {
bool fake_inject_ue;
u16 fake_inject_count;
};
#endif
#define mci_for_each_dimm(mci, dimm) \
for ((dimm) = (mci)->dimms[0]; \
(dimm); \
(dimm) = (dimm)->idx + 1 < (mci)->tot_dimms \
? (mci)->dimms[(dimm)->idx + 1] \
: NULL)
/**
* edac_get_dimm_by_index - Get DIMM info at @index from a memory
* controller
*
* @mci: MC descriptor struct mem_ctl_info
* @index: index in the memory controller's DIMM array
*
* Returns a struct dimm_info * or NULL on failure.
*/
static inline struct dimm_info *
edac_get_dimm_by_index(struct mem_ctl_info *mci, int index)
{
if (index < 0 || index >= mci->tot_dimms)
return NULL;
if (WARN_ON_ONCE(mci->dimms[index]->idx != index))
return NULL;
return mci->dimms[index];
}
/**
* edac_get_dimm - Get DIMM info from a memory controller given by
* [layer0,layer1,layer2] position
*
* @mci: MC descriptor struct mem_ctl_info
* @layer0: layer0 position
* @layer1: layer1 position. Unused if n_layers < 2
* @layer2: layer2 position. Unused if n_layers < 3
*
* For 1 layer, this function returns "dimms[layer0]";
*
* For 2 layers, this function is similar to allocating a two-dimensional
* array and returning "dimms[layer0][layer1]";
*
* For 3 layers, this function is similar to allocating a tri-dimensional
* array and returning "dimms[layer0][layer1][layer2]";
*/
static inline struct dimm_info *edac_get_dimm(struct mem_ctl_info *mci,
int layer0, int layer1, int layer2)
{
int index;
if (layer0 < 0
|| (mci->n_layers > 1 && layer1 < 0)
|| (mci->n_layers > 2 && layer2 < 0))
return NULL;
index = layer0;
if (mci->n_layers > 1)
index = index * mci->layers[1].size + layer1;
if (mci->n_layers > 2)
index = index * mci->layers[2].size + layer2;
return edac_get_dimm_by_index(mci, index);
}
#endif /* _LINUX_EDAC_H_ */
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment