EDAC/mce_amd: Remove SMCA Extended Error code descriptions
On AMD systems with Scalable MCA each machine check error of a SMCA bank type has an associated bit position in the bank's control (CTL) register. An error's bit position in the CTL register is used during error decoding for offsetting into the corresponding bank's error description structure. As new errors are being added in newer AMD systems for existing SMCA bank types, the underlying SMCA architecture guarantees that the bit positions of existing errors are not altered. However, on some AMD systems some of the existing bit definitions in the CTL register of SMCA bank type are reassigned without defining new HWID and McaType. Consequently, the errors whose bit definitions have been reassigned in the CTL register are being erroneously decoded. Remove SMCA Extended Error Code descriptions, this avoids decoding issues for incorrectly reassigned bits, and avoids the related maintenance burden in the kernel. But the bank type and Extended Error Code value for an error will continue to be printed as a convenience. The decoding of SMCA Extended Error Code description can be done by referring to AMD documentation or use external tools such as rasdaemon. Offline decoding can be done using below option in rasdaemon. For example: $ rasdaemon -p --status <STATUS> --ipid <IPID> --smca Also, the user can pass particular family and model to decode the error string. $ rasdaemon -p --status <STATUS> --ipid <IPID> --smca --family <CPU Family> --model <CPU Model> --bank <BANK_NUM> Refer to the rasdaemon commit for details: https://github.com/mchehab/rasdaemon/commit/932118b04a04104dfac6b8536Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lore.kernel.org/r/20231102114225.2006878-2-muralimk@amd.com
Showing
Please register or sign in to comment