Commit 9f988030 authored by Muralidhara M K's avatar Muralidhara M K Committed by Borislav Petkov (AMD)

EDAC/mce_amd: Remove SMCA Extended Error code descriptions

On AMD systems with Scalable MCA each machine check error of a SMCA bank
type has an associated bit position in the bank's control (CTL)
register.

An error's bit position in the CTL register is used during error decoding
for offsetting into the corresponding bank's error description structure.
As new errors are being added in newer AMD systems for existing SMCA bank
types, the underlying SMCA architecture guarantees that the bit positions
of existing errors are not altered.

However, on some AMD systems some of the existing bit definitions in the
CTL register of SMCA bank type are reassigned without defining new HWID
and McaType. Consequently, the errors whose bit definitions have been
reassigned in the CTL register are being erroneously decoded.

Remove SMCA Extended Error Code descriptions, this avoids decoding
issues for incorrectly reassigned bits, and avoids the related
maintenance burden in the kernel. But the bank type and Extended Error
Code value for an error will continue to be printed as a convenience.

The decoding of SMCA Extended Error Code description can be done by
referring to AMD documentation or use external tools such as rasdaemon.

Offline decoding can be done using below option in rasdaemon. For example:

  $ rasdaemon -p --status <STATUS> --ipid <IPID> --smca

Also, the user can pass particular family and model to decode the error
string.

$ rasdaemon -p --status <STATUS> --ipid <IPID> --smca --family <CPU Family>
	--model <CPU Model> --bank <BANK_NUM>

Refer to the rasdaemon commit for details:

  https://github.com/mchehab/rasdaemon/commit/932118b04a04104dfac6b8536Signed-off-by: default avatarMuralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: default avatarBorislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: default avatarYazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20231102114225.2006878-2-muralimk@amd.com
parent ff03ff32
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment