    x86: Interleave emulated nodes over physical nodes · adc19389
    David Rientjes authored
    Add interleaved NUMA emulation support
    
    This patch interleaves emulated nodes over the system's physical
    nodes. This is required for interleave optimizations since
    mempolicies, for example, operate by iterating over a nodemask and
    act without knowledge of node distances.  It can also be used for
    testing memory latencies and NUMA bugs in the kernel.
    
    There are a couple of ways to do this:
    
     - divide the number of emulated nodes by the number of physical
       nodes and allocate the result on each physical node, or
    
     - allocate each successive emulated node on a different physical
       node until all memory is exhausted.
    
    The disadvantage of the first option is that, depending on the
    asymmetry in capacity among the physical nodes, emulated nodes on
    one physical node may differ substantially in size from those on
    another.
    
    The disadvantage of the second option is that, also depending on
    the asymmetry in capacity among the physical nodes, more emulated
    nodes may be allocated on one physical node than on another.
    
    This patch implements the second option; we accept the possibility
    that one physical node may host slightly more emulated nodes than
    another in exchange for avoiding node size asymmetry.
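The second option can be sketched in a few lines of userspace C. This is only an illustration under stated assumptions, not the code in numa_64.c: `struct phys_node`, `interleave_emulated_nodes()` and the fixed `emu_size` are hypothetical names introduced here, and each physical node is modelled solely by its free byte count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a physical node is reduced to its free byte count. */
struct phys_node {
	uint64_t free_bytes;
};

/*
 * Place up to nr_emu emulated nodes of emu_size bytes each, starting each
 * successive emulated node on the next physical node in round-robin order.
 * owner[i] receives the index of the physical node backing emulated node i.
 * Returns the number of emulated nodes actually placed.
 */
static size_t interleave_emulated_nodes(struct phys_node *phys, size_t nr_phys,
					uint64_t emu_size, size_t nr_emu,
					size_t *owner)
{
	size_t placed = 0;
	size_t next = 0;	/* physical node to try first */

	assert(nr_phys > 0 && emu_size > 0);

	while (placed < nr_emu) {
		size_t tried;

		/* Skip physical nodes that cannot hold another emulated node. */
		for (tried = 0; tried < nr_phys; tried++) {
			size_t p = (next + tried) % nr_phys;

			if (phys[p].free_bytes >= emu_size) {
				phys[p].free_bytes -= emu_size;
				owner[placed++] = p;
				next = (p + 1) % nr_phys;
				break;
			}
		}
		if (tried == nr_phys)
			break;	/* all memory is exhausted */
	}
	return placed;
}
```

With two physical nodes holding 300 and 100 units and an emulated node size of 100, the sketch places four emulated nodes on physical nodes 0, 1, 0, 0. All emulated nodes have the same size, but the larger physical node ends up hosting more of them, which is the trade-off described above.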
    
     [ Note that "node capacity" of a physical node is not only a
       function of its addressable range, but also is affected by
       subtracting out the amount of reserved memory over that range.
       NUMA emulation only deals with available, non-reserved memory
       quantities. ]
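As a rough illustration of the note above, node capacity can be modelled as the node's address span minus whatever reserved ranges fall inside it. The types and helpers below (`struct mem_range`, `overlap()`, `node_capacity()`) are hypothetical and assume the reserved ranges do not overlap one another; the kernel derives these figures from its own firmware memory map rather than from a list like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open address range [start, end). */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/* Number of bytes shared by two ranges, or 0 if they are disjoint. */
static uint64_t overlap(struct mem_range a, struct mem_range b)
{
	uint64_t lo = a.start > b.start ? a.start : b.start;
	uint64_t hi = a.end < b.end ? a.end : b.end;

	return hi > lo ? hi - lo : 0;
}

/*
 * Available capacity of a node: its addressable span minus the reserved
 * memory inside that span. Assumes the reserved ranges are mutually
 * non-overlapping, so no byte is subtracted twice.
 */
static uint64_t node_capacity(struct mem_range node,
			      const struct mem_range *reserved,
			      size_t nr_reserved)
{
	uint64_t capacity;
	size_t i;

	assert(node.end >= node.start);

	capacity = node.end - node.start;
	for (i = 0; i < nr_reserved; i++)
		capacity -= overlap(node, reserved[i]);
	return capacity;
}
```

For a node spanning [0, 1000) with reserved ranges [100, 200) and [900, 1100), the capacity is 800: only the part of the second reserved range that lies inside the node is subtracted.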
    
    We ensure that at least a minimal amount of available memory is
    allocated to each node.  We also make sure that at least this
    amount of memory is available in ZONE_DMA32 for any node that
    includes both ZONE_DMA32 and ZONE_NORMAL.
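The two minimum-size rules can be expressed as a single predicate. This is a sketch under assumptions: `emulated_node_size_ok()` and its parameters are hypothetical, and the caller is assumed to have already computed the available (non-reserved) byte counts and whether the node spans both zones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Check the two minimum-size rules for one emulated node.
 *
 * avail_total: available bytes in the whole node
 * avail_dma32: available bytes of the node that lie in ZONE_DMA32
 * spans_both:  true if the node includes both ZONE_DMA32 and ZONE_NORMAL
 * min_avail:   minimal amount of available memory required (hypothetical
 *              parameter; the actual threshold is chosen by the kernel)
 */
static bool emulated_node_size_ok(uint64_t avail_total, uint64_t avail_dma32,
				  bool spans_both, uint64_t min_avail)
{
	assert(avail_dma32 <= avail_total);

	/* Rule 1: every node gets at least the minimal amount. */
	if (avail_total < min_avail)
		return false;

	/* Rule 2: a node spanning both zones needs that much in ZONE_DMA32. */
	if (spans_both && avail_dma32 < min_avail)
		return false;

	return true;
}
```

A node with 80 available units, only 16 of them in ZONE_DMA32, fails a minimum of 32 when it spans both zones, even though its total is large enough.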
    
    This patch also cleans the emulation code up by no longer passing
    the statically allocated struct bootnode array among the various
    functions. This init.data array is not allocated on the stack since
    it may be very large, so it can instead be accessed at file scope.
    
    The WARN_ON() for nodes_cover_memory() when faking proximity
    domains is removed since it relies on successive nodes always
    having greater start addresses than previous nodes; with
    interleaving this is no longer always true.
    
    Signed-off-by: David Rientjes <rientjes@google.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
    Cc: Yinghai Lu <yinghai@kernel.org>
    Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
    Cc: Ankita Garg <ankita@in.ibm.com>
    Cc: Len Brown <len.brown@intel.com>
    LKML-Reference: <alpine.DEB.1.00.0909251519150.14754@chino.kir.corp.google.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
numa_64.c 22.7 KB