    x86/MCE/AMD: Allow Reserved types to be overwritten in smca_banks[] · 966af209
    Authored by Yazen Ghannam
    Each logical CPU in Scalable MCA systems controls a unique set of MCA
    banks in the system. These banks are not shared between CPUs. The bank
    types and ordering will be the same across CPUs on currently available
    systems.
    
    However, some CPUs may see a bank as Reserved/Read-as-Zero (RAZ) while
    other CPUs do not. In this case, the bank seen as Reserved on one CPU is
    assumed to be the same type as the bank seen as a known type on another
    CPU.
    
    In general, this occurs when the hardware represented by the MCA bank
    is disabled, e.g. disabled memory controllers on certain models.
    The MCA bank is disabled in the hardware, so there is no possibility of
    getting an MCA/MCE from it even if it is assumed to have a known type.
    
    For example:
    
    Full system:
    	Bank  |  Type seen on CPU0  |  Type seen on CPU1
    	------------------------------------------------
    	 0    |         LS          |          LS
    	 1    |         UMC         |          UMC
    	 2    |         CS          |          CS
    
    System with hardware disabled:
    	Bank  |  Type seen on CPU0  |  Type seen on CPU1
    	------------------------------------------------
    	 0    |         LS          |          LS
    	 1    |         UMC         |          RAZ
    	 2    |         CS          |          CS
    
    For this reason, there is a single, global struct smca_banks[] that is
    populated at boot time: each CPU fills in the array as it comes online.
    However, an entry is not updated once it already exists.
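    This first-writer-wins behavior can be modeled in a few lines. The
    block below is a minimal sketch, not the kernel code. The enum,
    NUM_BANKS and configure_bank_old() are hypothetical stand-ins, and
    the real smca_banks[] entries and helpers look different. Here,
    SMCA_NONE (zero) marks a slot that no CPU has filled yet.

```c
#include <assert.h>

/* Hypothetical, simplified model of smca_banks[]; the kernel's real
 * struct, field and function names differ. */
enum smca_bank_type { SMCA_NONE = 0, SMCA_RESERVED, SMCA_LS, SMCA_UMC, SMCA_CS };

#define NUM_BANKS 3

/* Static storage starts zeroed, i.e. every slot is SMCA_NONE. */
static enum smca_bank_type smca_banks[NUM_BANKS];

/* Called for each bank as each CPU comes online (hypothetical helper). */
static void configure_bank_old(unsigned int bank, enum smca_bank_type seen)
{
	assert(bank < NUM_BANKS);

	/* Old check: any existing entry, even a Reserved one, blocks updates. */
	if (smca_banks[bank] != SMCA_NONE)
		return;

	smca_banks[bank] = seen;
}
```

    Suppose CPU0 reports LS, Reserved, CS and CPU1 then reports LS, UMC,
    CS. Bank 1 stays stuck at SMCA_RESERVED because CPU1's call returns
    early.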
    
    This works as expected when the first CPU (usually CPU0) has all
    possible MCA banks enabled. But if the first CPU has only a subset
    enabled, it will save a "Reserved" type in smca_banks[] for each bank
    it sees as RAZ. Successive CPUs will then not be able to update
    smca_banks[] even if they encounter a known bank type.
    
    This may result in unexpected behavior. Depending on the system
    configuration, a user may observe issues enumerating the MCA
    thresholding sysfs interface. The issues may be as trivial as sysfs
    entries not being available, or as severe as system hangs.
    
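    For example, if CPU0 sees bank 1 as RAZ while CPU1 sees it as UMC,
    the entry for bank 1 keeps the Reserved type: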
    
    	Bank  |  Type seen on CPU0  |  Type seen on CPU1
    	------------------------------------------------
    	 0    |         LS          |          LS
    	 1    |         RAZ         |          UMC
    	 2    |         CS          |          CS
    
    Extend the smca_banks[] entry check to return early only if the
    existing entry is a non-Reserved type. Otherwise, continue, so that a
    CPU that encounters a known bank type can update smca_banks[].
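    The extended check can be sketched with a small, hypothetical model.
    The enum, NUM_BANKS and configure_bank() are illustrative names, not
    the kernel's. The helper returns early only when the slot already
    holds a known type, so a Reserved slot stays writable.

```c
#include <assert.h>

/* Hypothetical, simplified model of smca_banks[]; the kernel's real
 * struct, field and function names differ. */
enum smca_bank_type { SMCA_NONE = 0, SMCA_RESERVED, SMCA_LS, SMCA_UMC, SMCA_CS };

#define NUM_BANKS 3

/* Static storage starts zeroed, i.e. every slot is SMCA_NONE. */
static enum smca_bank_type smca_banks[NUM_BANKS];

/* Called for each bank as each CPU comes online (hypothetical helper). */
static void configure_bank(unsigned int bank, enum smca_bank_type seen)
{
	assert(bank < NUM_BANKS);

	/* Return early only if the entry already holds a non-Reserved type. */
	if (smca_banks[bank] != SMCA_NONE && smca_banks[bank] != SMCA_RESERVED)
		return;

	/* Empty or Reserved: record what this CPU sees. */
	smca_banks[bank] = seen;
}
```

    The update only goes one way. A later CPU that sees RAZ cannot
    downgrade a known type to Reserved, which matches the assumption
    above that a Reserved bank has the same type as the known bank seen
    on another CPU.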
    
    Fixes: 68627a69 ("x86/mce/AMD, EDAC/mce_amd: Enumerate Reserved SMCA bank type")
    Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: linux-edac <linux-edac@vger.kernel.org>
    Cc: <stable@vger.kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: x86-ml <x86@kernel.org>
    Link: https://lkml.kernel.org/r/20191121141508.141273-1-Yazen.Ghannam@amd.com