1. 02 Feb, 2010 5 commits
  2. 01 Dec, 2009 1 commit
    • H. Peter Anvin's avatar
      x86, mm: Correct the implementation of is_untracked_pat_range() · ccef0864
      H. Peter Anvin authored
      The semantics the PAT code expect of is_untracked_pat_range() is "is
      this range completely contained inside the untracked region."  This
      means that checkin 8a271389 was
      technically wrong, because the implementation needlessly confusing.
      
      The sane interface is for it to take a semiclosed range like just
      about everything else (as evidenced by the sheer number of "- 1"'s
      removed by that patch) so change the actual implementation to match.
      Reported-by: default avatarSuresh Siddha <suresh.b.siddha@intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jack Steiner <steiner@sgi.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      LKML-Reference: <20091119202341.GA4420@sgi.com>
      ccef0864
  3. 26 Nov, 2009 1 commit
  4. 24 Nov, 2009 6 commits
  5. 23 Nov, 2009 5 commits
    • Jack Steiner's avatar
      x86: UV SGI: Don't track GRU space in PAT · fd12a0d6
      Jack Steiner authored
      GRU space is always mapped as WB in the page table. There is
      no need to track the mappings in the PAT. This also eliminates
      the "freeing invalid memtype" messages when the GRU space is
      unmapped.
      Signed-off-by: default avatarJack Steiner <steiner@sgi.com>
      LKML-Reference: <20091119202341.GA4420@sgi.com>
      [ v2: fix build failure ]
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      fd12a0d6
    • Cliff Wickman's avatar
      x86: SGI UV: Fix BAU initialization · e38e2af1
      Cliff Wickman authored
      A memory mapped register that affects the SGI UV Broadcast
      Assist Unit's interrupt handling may sometimes be unintialized.
      
      Remove the condition on its initialization, as that condition
      can be randomly satisfied by a hardware reset.
      Signed-off-by: default avatarCliff Wickman <cpw@sgi.com>
      Cc: <stable@kernel.org>
      LKML-Reference: <E1NBGB9-0005nU-Dp@eag09.americas.sgi.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      e38e2af1
    • Yinghai Lu's avatar
      x86, numa: Use near(er) online node instead of roundrobin for NUMA · d9c2d5ac
      Yinghai Lu authored
      CPU to node mapping is set via the following sequence:
      
       1. numa_init_array(): Set up roundrobin from cpu to online node
      
       2. init_cpu_to_node(): Set that according to apicid_to_node[]
      			according to srat only handle the node that
      			is online, and leave other cpu on node
      			without ram (aka not online) to still
      			roundrobin.
      
      3. later call srat_detect_node for Intel/AMD, will use first_online
         node or nearby node.
      
      Problem is that setup_per_cpu_areas() is not called between 2 and 3,
      the per_cpu for cpu on node with ram is on different node, and could
      put that on node with two hops away.
      
      So try to optimize this and add find_near_online_node() and call
      init_cpu_to_node().
      Signed-off-by: default avatarYinghai Lu <yinghai@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <4B07A739.3030104@kernel.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      d9c2d5ac
    • Yinghai Lu's avatar
      x86, numa, bootmem: Only free bootmem on NUMA failure path · 021428ad
      Yinghai Lu authored
      In the NUMA bootmem setup failure path we freed nodedata_phys
      incorrectly.
      Signed-off-by: default avatarYinghai Lu <yinghai@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <4B07A739.3030104@kernel.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      021428ad
    • Yinghai Lu's avatar
      x86: Change crash kernel to reserve via reserve_early() · 44280733
      Yinghai Lu authored
      use find_e820_area()/reserve_early() instead.
      
      -v2: address Eric's request, to restore original semantics.
           will fail, if the provided address can not be used.
      Signed-off-by: default avatarYinghai Lu <yinghai@kernel.org>
      Acked-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      LKML-Reference: <4B09E2F9.7040403@kernel.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      44280733
  6. 19 Nov, 2009 1 commit
    • Jan Beulich's avatar
      x86: Eliminate redundant/contradicting cache line size config options · 350f8f56
      Jan Beulich authored
      Rather than having X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT
      (with inconsistent defaults), just having the latter suffices as
      the former can be easily calculated from it.
      
      To be consistent, also change X86_INTERNODE_CACHE_BYTES to
      X86_INTERNODE_CACHE_SHIFT, and set it to 7 (128 bytes) for NUMA
      to account for last level cache line size (which here matters
      more than L1 cache line size).
      
      Finally, make sure the default value for X86_L1_CACHE_SHIFT,
      when X86_GENERIC is selected, is being seen before that for the
      individual CPU model options (other than on x86-64, where
      GENERIC_CPU is part of the choice construct, X86_GENERIC is a
      separate option on ix86).
      Signed-off-by: default avatarJan Beulich <jbeulich@novell.com>
      Acked-by: default avatarRavikiran Thirumalai <kiran@scalex86.org>
      Acked-by: default avatarNick Piggin <npiggin@suse.de>
      LKML-Reference: <4AFD5710020000780001F8F0@vpn.id2.novell.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      350f8f56
  7. 17 Nov, 2009 1 commit
  8. 16 Nov, 2009 6 commits
  9. 11 Nov, 2009 1 commit
  10. 08 Nov, 2009 1 commit
  11. 02 Nov, 2009 3 commits
  12. 27 Oct, 2009 1 commit
    • Steven Rostedt's avatar
      tracing: allow to change permissions for text with dynamic ftrace enabled · 883242dd
      Steven Rostedt authored
      The commit 74e08179
      x86-64: align RODATA kernel section to 2MB with CONFIG_DEBUG_RODATA
      prevents text sections from becoming read/write using set_memory_rw.
      
      The dynamic ftrace changes all text pages to read/write just before
      converting the calls to tracing to nops, and vice versa.
      
      I orginally just added a flag to allow this transaction when ftrace
      did the change, but I also found that when the CPA testing was running
      it would remove the read/write as well, and ftrace does not do the text
      conversion on boot up, and the CPA changes caused the dynamic tracer
      to fail on self tests.
      
      The current solution I have is to simply not to prevent
      change_page_attr from setting the RW bit for kernel text pages.
      Reported-by: default avatarIngo Molnar <mingo@elte.hu>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      883242dd
  13. 24 Oct, 2009 1 commit
  14. 23 Oct, 2009 1 commit
  15. 20 Oct, 2009 3 commits
    • Suresh Siddha's avatar
      x86-64: add comment for RODATA large page retainment · d6cc1c3a
      Suresh Siddha authored
      Add a comment explaining why RODATA is aligned to 2 MB.
      Signed-off-by: default avatarSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      d6cc1c3a
    • Suresh Siddha's avatar
      x86-64: align RODATA kernel section to 2MB with CONFIG_DEBUG_RODATA · 74e08179
      Suresh Siddha authored
      CONFIG_DEBUG_RODATA chops the large pages spanning boundaries of kernel
      text/rodata/data to small 4KB pages as they are mapped with different
      attributes (text as RO, RODATA as RO and NX etc).
      
      On x86_64, preserve the large page mappings for kernel text/rodata/data
      boundaries when CONFIG_DEBUG_RODATA is enabled. This is done by allowing the
      RODATA section to be hugepage aligned and having same RWX attributes
      for the 2MB page boundaries
      
      Extra Memory pages padding the sections will be freed during the end of the boot
      and the kernel identity mappings will have different RWX permissions compared to
      the kernel text mappings.
      
      Kernel identity mappings to these physical pages will be mapped with smaller
      pages but large page mappings are still retained for kernel text,rodata,data
      mappings.
      Signed-off-by: default avatarSuresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <20091014220254.190119924@sbs-t61.sc.intel.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      74e08179
    • Suresh Siddha's avatar
      x86-64: preserve large page mapping for 1st 2MB kernel txt with CONFIG_DEBUG_RODATA · b9af7c0d
      Suresh Siddha authored
      In the first 2MB, kernel text is co-located with kernel static
      page tables setup by head_64.S.  CONFIG_DEBUG_RODATA chops this
      2MB large page mapping to small 4KB pages as we mark the kernel text as RO,
      leaving the static page tables as RW.
      
      With CONFIG_DEBUG_RODATA disabled, OLTP run on NHM-EP shows 1% improvement
      with 2% reduction in system time and 1% improvement in iowait idle time.
      
      To recover this, move the kernel static page tables to .data section, so that
      we don't have to break the first 2MB of kernel text to small pages with
      CONFIG_DEBUG_RODATA.
      Signed-off-by: default avatarSuresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <20091014220254.063193621@sbs-t61.sc.intel.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      b9af7c0d
  16. 12 Oct, 2009 3 commits
    • David Rientjes's avatar
      x86: Interleave emulated nodes over physical nodes · adc19389
      David Rientjes authored
      Add interleaved NUMA emulation support
      
      This patch interleaves emulated nodes over the system's physical
      nodes. This is required for interleave optimizations since
      mempolicies, for example, operate by iterating over a nodemask and
      act without knowledge of node distances.  It can also be used for
      testing memory latencies and NUMA bugs in the kernel.
      
      There're a couple of ways to do this:
      
       - divide the number of emulated nodes by the number of physical
         nodes and allocate the result on each physical node, or
      
       - allocate each successive emulated node on a different physical
         node until all memory is exhausted.
      
      The disadvantage of the first option is, depending on the asymmetry
      in node capacities of each physical node, emulated nodes may
      substantially differ in size on a particular physical node compared
      to another.
      
      The disadvantage of the second option is, also depending on the
      asymmetry in node capacities of each physical node, there may be
      more emulated nodes allocated on a single physical node as another.
      
      This patch implements the second option; we sacrifice the
      possibility that we may have slightly more emulated nodes on a
      particular physical node compared to another in lieu of node size
      asymmetry.
      
       [ Note that "node capacity" of a physical node is not only a
         function of its addressable range, but also is affected by
         subtracting out the amount of reserved memory over that range.
         NUMA emulation only deals with available, non-reserved memory
         quantities. ]
      
      We ensure there is at least a minimal amount of available memory
      allocated to each node.  We also make sure that at least this
      amount of available memory is available in ZONE_DMA32 for any node
      that includes both ZONE_DMA32 and ZONE_NORMAL.
      
      This patch also cleans the emulation code up by no longer passing
      the statically allocated struct bootnode array among the various
      functions. This init.data array is not allocated on the stack since
      it may be very large and thus it may be accessed at file scope.
      
      The WARN_ON() for nodes_cover_memory() when faking proximity
      domains is removed since it relies on successive nodes always
      having greater start addresses than previous nodes; with
      interleaving this is no longer always true.
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Ankita Garg <ankita@in.ibm.com>
      Cc: Len Brown <len.brown@intel.com>
      LKML-Reference: <alpine.DEB.1.00.0909251519150.14754@chino.kir.corp.google.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      adc19389
    • David Rientjes's avatar
      x86: Export srat physical topology · 8716273c
      David Rientjes authored
      This is the counterpart to "x86: export k8 physical topology" for
      SRAT. It is not as invasive because the acpi code already seperates
      node setup into detection and registration steps, with the
      exception of registering e820 active regions in
      acpi_numa_memory_affinity_init().  This is now moved to
      acpi_scan_nodes() if NUMA emulation is disabled or deferred.
      
      acpi_numa_init() now returns a value which specifies whether an
      underlying SRAT was located.  If so, that topology can be used by
      the emulation code to interleave emulated nodes over physical nodes
      or to register the nodes for ACPI.
      
      acpi_get_nodes() may now be used to export the srat physical
      topology of the machine for NUMA emulation.
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Ankita Garg <ankita@in.ibm.com>
      Cc: Len Brown <len.brown@intel.com>
      LKML-Reference: <alpine.DEB.1.00.0909251518580.14754@chino.kir.corp.google.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      8716273c
    • David Rientjes's avatar
      x86: Export k8 physical topology · 8ee2debc
      David Rientjes authored
      To eventually interleave emulated nodes over physical nodes, we
      need to know the physical topology of the machine without actually
      registering it.  This does the k8 node setup in two parts:
      detection and registration.  NUMA emulation can then used the
      physical topology detected to setup the address ranges of emulated
      nodes accordingly.  If emulation isn't used, the k8 nodes are
      registered as normal.
      
      Two formals are added to the x86 NUMA setup functions: `acpi' and
      `k8'. These represent whether ACPI or K8 NUMA has been detected;
      both cannot be true at the same time.  This specifies to the NUMA
      emulation code whether an underlying physical NUMA topology exists
      and which interface to use.
      
      This patch deals solely with separating the k8 setup path into
      Northbridge detection and registration steps and leaves the ACPI
      changes for a subsequent patch.  The `acpi' formal is added here,
      however, to avoid touching all the header files again in the next
      patch.
      
      This approach also ensures emulated nodes will not span physical
      nodes so the true memory latency is not misrepresented.
      
      k8_get_nodes() may now be used to export the k8 physical topology
      of the machine for NUMA emulation.
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Ankita Garg <ankita@in.ibm.com>
      Cc: Len Brown <len.brown@intel.com>
      LKML-Reference: <alpine.DEB.1.00.0909251518400.14754@chino.kir.corp.google.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      8ee2debc