    x86/mm: Avoid truncating memblocks for SGX memory · 28e5e44a
    Fan Du authored
    tl;dr:
    
    Several SGX users reported seeing the following message on NUMA systems:
    
      sgx: [Firmware Bug]: Unable to map EPC section to online node. Fallback to the NUMA node 0.
    
    This turned out to be the memblock code mistakenly throwing away SGX
    memory.
    
    === Full Changelog ===
    
    The 'max_pfn' variable represents the highest known RAM address.  It can
    be used, for instance, to quickly determine for which physical addresses
    there is mem_map[] space allocated.  The numa_meminfo code makes an
    effort to throw out ("trim") all memory blocks which are above 'max_pfn'.
    
    SGX memory is not considered RAM (it is marked as "Reserved" in the
    e820) and is not taken into account by max_pfn. Despite this, SGX memory
    areas have NUMA affinity and are enumerated in the ACPI SRAT table. The
    existing SGX code uses the numa_meminfo mechanism to look up the NUMA
    affinity for its memory areas.
    
    In cases where SGX memory is above max_pfn (usually just the one EPC
    section in the highest-numbered NUMA node), the numa_meminfo block is
    truncated at 'max_pfn', which is below the SGX memory.  When the SGX
    code tries to look up the affinity of this memory, the lookup fails and
    produces an error message:
    
      sgx: [Firmware Bug]: Unable to map EPC section to online node. Fallback to the NUMA node 0.
    
    and assigns the memory to NUMA node 0.
    
    Instead of silently truncating the memory block at 'max_pfn' and
    dropping the SGX memory, add the truncated portion to
    'numa_reserved_meminfo'.  This allows the SGX code to later determine
    the NUMA affinity of its 'Reserved' area.
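
    The idea can be modeled in a few lines of userspace C.  This is only
    an illustrative sketch, not the kernel code: 'struct meminfo',
    add_blk(), trim_to_limit() and MAX_BLKS are hypothetical names made up
    for this example, and 'limit' stands in for the address that
    corresponds to 'max_pfn'.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_BLKS 8 /* hypothetical capacity, chosen for this sketch only */

struct blk {
	uint64_t start; /* inclusive */
	uint64_t end;   /* exclusive */
	int nid;
};

struct meminfo {
	size_t nr;
	struct blk blk[MAX_BLKS];
};

/* Append [start, end) for node 'nid'. Returns -1 if the table is full. */
static int add_blk(struct meminfo *mi, uint64_t start, uint64_t end, int nid)
{
	if (start >= end)
		return 0; /* empty range, nothing to record */
	if (mi->nr >= MAX_BLKS)
		return -1;
	mi->blk[mi->nr++] = (struct blk){ start, end, nid };
	return 0;
}

/*
 * Clamp every block in 'mi' to [0, limit). Instead of discarding the part
 * at or above 'limit', record it in 'reserved' with the same node ID.
 */
static int trim_to_limit(struct meminfo *mi, struct meminfo *reserved,
			 uint64_t limit)
{
	size_t i = 0;

	while (i < mi->nr) {
		struct blk *b = &mi->blk[i];

		if (b->end > limit) {
			uint64_t cut = b->start > limit ? b->start : limit;

			/* preserve the affinity of the non-RAM tail */
			if (add_blk(reserved, cut, b->end, b->nid))
				return -1;
			b->end = limit;
		}

		if (b->start >= b->end) {
			/* nothing left below 'limit': drop the block
			 * (swap with the last entry, order not kept) */
			mi->blk[i] = mi->blk[--mi->nr];
			continue;
		}
		i++;
	}
	return 0;
}
```

    Assuming the firmware described node 1 as one range from 0x2080000000
    to 0x4080000000 (an assumption, the original range is not shown here),
    trimming at 0x4000000000 leaves node 1's RAM block ending at the limit
    and records the remaining 0x4000000000-0x4080000000 piece in the
    reserved table with nid 1.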
    
    Before, numa_meminfo looked like this (from 'crash'):
    
      blk = { start =          0x0, end = 0x2080000000, nid = 0x0 }
            { start = 0x2080000000, end = 0x4000000000, nid = 0x1 }
    
    numa_reserved_meminfo is empty.
    
    With this patch, numa_meminfo is unchanged:
    
      blk = { start =          0x0, end = 0x2080000000, nid = 0x0 }
            { start = 0x2080000000, end = 0x4000000000, nid = 0x1 }
    
    and numa_reserved_meminfo has an entry for node 1's SGX memory:
    
      blk =  { start = 0x4000000000, end = 0x4080000000, nid = 0x1 }
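
    To see why the extra entry matters, here is a minimal userspace sketch
    of an address-to-node lookup that consults the RAM table first and the
    reserved table second.  It uses the ranges shown above; 'struct range',
    range_to_nid(), lookup_nid() and NO_NODE are hypothetical names for
    this example and are not the kernel's interfaces.

```c
#include <stddef.h>
#include <stdint.h>

#define NO_NODE (-1) /* hypothetical "no affinity found" marker */

struct range {
	uint64_t start; /* inclusive */
	uint64_t end;   /* exclusive */
	int nid;
};

/* Tables mirroring the 'crash' output above. */
static const struct range ram_tbl[] = {
	{ 0x0ULL,          0x2080000000ULL, 0 },
	{ 0x2080000000ULL, 0x4000000000ULL, 1 },
};

static const struct range rsv_tbl[] = {
	{ 0x4000000000ULL, 0x4080000000ULL, 1 },
};

/* Return the node ID of the range containing 'addr', or NO_NODE. */
static int range_to_nid(const struct range *tbl, size_t n, uint64_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= tbl[i].start && addr < tbl[i].end)
			return tbl[i].nid;
	return NO_NODE;
}

/* Look in the RAM table first, then fall back to the reserved table. */
static int lookup_nid(const struct range *ram, size_t nr_ram,
		      const struct range *rsv, size_t nr_rsv, uint64_t addr)
{
	int nid = range_to_nid(ram, nr_ram, addr);

	if (nid == NO_NODE)
		nid = range_to_nid(rsv, nr_rsv, addr);
	return nid;
}
```

    With an empty reserved table (the "before" state), a lookup of
    0x4000000000 finds no node, which corresponds to the case where the SGX
    code prints the warning and falls back to node 0.  With the reserved
    entry present, the same lookup yields node 1.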
    
     [ daveh: completely rewrote/reworked changelog ]
    
    Fixes: 5d30f92e ("x86/NUMA: Provide a range-to-target_node lookup facility")
    Reported-by: Reinette Chatre <reinette.chatre@intel.com>
    Signed-off-by: Fan Du <fan.du@intel.com>
    Signed-off-by: Dave Hansen <dave.hansen@intel.com>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
    Reviewed-by: Dan Williams <dan.j.williams@intel.com>
    Reviewed-by: Dave Hansen <dave.hansen@intel.com>
    Cc: <stable@vger.kernel.org>
    Link: https://lkml.kernel.org/r/20210617194657.0A99CB22@viggo.jf.intel.com
numa.c 24.3 KB