1. 24 Feb, 2009 21 commits
    • Merge branch 'tj-percpu' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into core/percpu · 0edcf8d6
      Ingo Molnar authored
      Conflicts:
      	arch/x86/include/asm/pgtable.h
    • Merge branch 'x86/core' into core/percpu · 87b20307
      Ingo Molnar authored
    • Merge branches 'x86/acpi', 'x86/apic', 'x86/asm', 'x86/cleanups', 'x86/mm',... · a852cbfa
      Ingo Molnar authored
      Merge branches 'x86/acpi', 'x86/apic', 'x86/asm', 'x86/cleanups', 'x86/mm', 'x86/signal' and 'x86/urgent'; commit 'v2.6.29-rc6' into x86/core
    • x86: efi_stub_32,64 - add missing ENDPROCs · 9f331119
      Cyrill Gorcunov authored
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: heukelum@fastmail.fm
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: head_64.S - use GLOBAL macro · bc8b2b92
      Cyrill Gorcunov authored
      Impact: cleanup
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: heukelum@fastmail.fm
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: entry_64.S - add missing ENDPROC · b3baaa13
      Cyrill Gorcunov authored
      native_usergs_sysret64 is described as
      
      	extern void native_usergs_sysret64(void)
      
      so let's add ENDPROC here.
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: heukelum@fastmail.fm
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: invalid_vm86_irq -- use predefined macros · 57e37293
      Cyrill Gorcunov authored
      Impact: cleanup
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: heukelum@fastmail.fm
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: head_64.S - use IDT_ENTRIES instead of hardcoded number · 5e112ae2
      Cyrill Gorcunov authored
      Impact: cleanup
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: heukelum@fastmail.fm
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: head_64.S - remove useless balign · 2a0b1001
      Cyrill Gorcunov authored
      Impact: cleanup
      
      NEXT_PAGE already has 'balign' so no
      need to keep this redundant one.
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: heukelum@fastmail.fm
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: fix performance regression in write() syscall · 30d697fa
      Salman Qazi authored
      While the introduction of __copy_from_user_nocache (see commit:
      0812a579) may have been an improvement
      for sufficiently large writes, there is evidence to show that it is
      detrimental for small writes.  Unixbench's fstime test gives the
      following results for 256 byte writes with MAX_BLOCK of 2000:
      
          2.6.29-rc6 ( 5 samples, each in KB/sec ):
          283750, 295200, 294500, 293000, 293300
      
          2.6.29-rc6 + this patch (5 samples, each in KB/sec):
          313050, 3106750, 293350, 306300, 307900
      
          2.6.18
          395700, 342000, 399100, 366050, 359850
      
          See w_test() in src/fstime.c in unixbench version 4.1.0.  Basically, the above test
          consists of counting how much we can write in this manner:
      
          alarm(10);
          while (!sigalarm) {
                  for (f_blocks = 0; f_blocks < 2000; ++f_blocks) {
                         write(f, buf, 256);
                  }
                  lseek(f, 0L, 0);
          }
      
      Note, there are other components to the write syscall regression
      that are not addressed here.
      Signed-off-by: Salman Qazi <sqazi@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
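
The small-write pattern quoted in the commit above can be reproduced outside unixbench with a short helper. This is only a sketch: `write_small_blocks` and its parameters are hypothetical names, not unixbench code, and it performs a single pass without the `alarm()`-based timing of the original test.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * One pass of the small-write pattern: issue n_blocks writes of block_size
 * bytes each, then rewind the file. Returns the number of bytes written,
 * or -1 on an oversized block, a failed or short write, or a failed seek.
 * Hypothetical helper for illustration; not taken from unixbench.
 */
long write_small_blocks(int fd, int n_blocks, size_t block_size)
{
    char buf[4096];
    long total = 0;

    if (block_size > sizeof(buf))
        return -1;
    memset(buf, 'x', block_size);

    for (int i = 0; i < n_blocks; ++i) {
        ssize_t n = write(fd, buf, block_size);
        if (n != (ssize_t)block_size)
            return -1;
        total += n;
    }

    /* Rewind so the next pass overwrites the same blocks. */
    if (lseek(fd, 0L, SEEK_SET) < 0)
        return -1;
    return total;
}
```

Calling it repeatedly for a fixed wall-clock interval with `n_blocks = 2000` and `block_size = 256`, then dividing the byte total by the elapsed time, gives a throughput figure comparable in spirit to the KB/sec samples in the commit message.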
    • percpu: add __read_mostly to variables which are mostly read only · 40150d37
      Tejun Heo authored
      Most global variables in percpu allocator are initialized during boot
      and read only from that point on.  Add __read_mostly as per Rusty's
      suggestion.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
    • x86: add remapping percpu first chunk allocator · 8ac83757
      Tejun Heo authored
      Impact: add better first percpu allocation for NUMA
      
      On NUMA, embedding allocator can't be used as different units can't be
      made to fall in the correct NUMA nodes.  To use large page mapping,
      each unit needs to be remapped.  However, percpu areas are usually
      much smaller than large page size and unused space hurts a lot as the
      number of cpus grows.  This allocator remaps large pages for each chunk
      but gives back the unused part to the bootmem allocator, making the large
      pages mapped twice.
      
      This adds slightly to the TLB pressure but is much better than using
      4k mappings while still being NUMA-friendly.
      
      Ingo suggested that this would be the correct approach for NUMA.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
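
To get a feel for why the unused space matters in the remapping allocator above, the arithmetic can be sketched as follows. The function names are hypothetical, and the sizes used in the note after the block (2 MiB large pages, 64 KiB per-CPU units) are illustrative assumptions, not figures from the commit.

```c
#include <stddef.h>

/* Bytes of one large page left over after placing a single per-CPU unit. */
size_t unused_bytes(size_t large_page_size, size_t unit_size)
{
    return unit_size < large_page_size ? large_page_size - unit_size : 0;
}

/* Bytes that would sit idle across all CPUs if nothing were given back. */
size_t total_unused(size_t nr_cpus, size_t large_page_size, size_t unit_size)
{
    return nr_cpus * unused_bytes(large_page_size, unit_size);
}
```

Under those assumed sizes, each unit leaves 1984 KiB of its large page unused, so the idle space grows linearly with the CPU count unless it is returned to the boot-time allocator.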
    • x86: add embedding percpu first chunk allocator · 89c92151
      Tejun Heo authored
      Impact: add better first percpu allocation for !NUMA
      
      On !NUMA, we can simply allocate contiguous memory and use it for the
      first chunk without mapping it into the vmalloc area.  As the memory area
      is covered by the large page physical memory mapping, it allows the
      dynamic percpu allocator to not add any TLB overhead for the static
      percpu area and whatever falls into the first chunk, and the
      implementation is very simple too.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • x86: separate out setup_pcpu_4k() from setup_per_cpu_areas() · 5f5d8405
      Tejun Heo authored
      Impact: modularize percpu first chunk allocation
      
      x86 is gonna have a few different strategies for the first chunk
      allocation.  Modularize it by separating out the current allocation
      mechanism into pcpu_alloc_bootmem() and setup_pcpu_4k().
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • percpu: give more latitude to arch specific first chunk initialization · 8d408b4b
      Tejun Heo authored
      Impact: more latitude for first percpu chunk allocation
      
      The first percpu chunk serves the kernel static percpu area and may or
      may not contain extra room for further dynamic allocation.
      Initialization of the first chunk needs to be done before normal
      memory allocation service is up, so it has its own init path -
      pcpu_setup_static().
      
      It seems archs need more latitude while initializing the first chunk,
      for example to take advantage of large page mapping.  This patch makes
      the following changes to allow this.
      
      * Define PERCPU_DYNAMIC_RESERVE to give arch hint about how much space
        to reserve in the first chunk for further dynamic allocation.
      
      * Rename pcpu_setup_static() to pcpu_setup_first_chunk().
      
      * Make pcpu_setup_first_chunk() much more flexible by fetching page
        pointer by callback and adding optional @unit_size, @free_size and
        @base_addr arguments which allow archs to selectively customize parts
        of chunk initialization to their liking.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • percpu: remove unit_size power-of-2 restriction · d9b55eeb
      Tejun Heo authored
      Impact: allow unit_size to be arbitrary multiple of PAGE_SIZE
      
      In dynamic percpu allocator, there is no reason the unit size should
      be power of two.  Remove the restriction.
      
      A non-power-of-two unit size means that empty chunks fall into the
      same slot index as lightly occupied chunks, which is bad for reclaiming.
      Reserve an extra slot for empty chunks.
      Signed-off-by: Tejun Heo <tj@kernel.org>
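
The slot collision described in the commit above can be illustrated with a userspace sketch. It assumes slots are derived from the highest set bit of a chunk's free size; the constant `SLOT_BASE_SHIFT` and all function names are assumptions made for illustration, not the kernel's definitions.

```c
/* Assumed base shift for the slot calculation (illustrative value). */
#define SLOT_BASE_SHIFT 5

/* 1-based position of the highest set bit; 0 when x == 0. */
static int highest_bit(unsigned int x)
{
    int bit = 0;

    while (x) {
        bit++;
        x >>= 1;
    }
    return bit;
}

/* Slot chosen purely from the free size in bytes. */
int size_to_slot(unsigned int free_size)
{
    int slot = highest_bit(free_size) - SLOT_BASE_SHIFT + 2;

    return slot > 1 ? slot : 1;
}

/* Number of slots, with one extra slot reserved for empty chunks. */
int nr_slots(unsigned int unit_size)
{
    return size_to_slot(unit_size) + 2;
}

/* Completely empty chunks always land in the last, dedicated slot. */
int chunk_slot(unsigned int free_size, unsigned int unit_size)
{
    if (free_size == unit_size)
        return nr_slots(unit_size) - 1;
    return size_to_slot(free_size);
}
```

With a power-of-two unit size such as 16384, only a fully free chunk reaches the top bit, so it gets a slot of its own naturally. With a unit size of 12288, a fully free chunk and one with 10000 bytes free share the same highest bit, which is why the sketch routes empty chunks to a dedicated slot.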
    • x86: update populate_extra_pte() and add populate_extra_pmd() · 458a3e64
      Tejun Heo authored
      Impact: minor change to populate_extra_pte() and addition of pmd flavor
      
      Update populate_extra_pte() to return pointer to the pte_t for the
      specified address and add populate_extra_pmd() which only populates
      till the pmd and returns pointer to the pmd entry for the address.
      
      For 64bit, pud/pmd/pte fill functions are separated out from
      set_pte_vaddr[_pud]() and used for set_pte_vaddr[_pud]() and
      populate_extra_{pte|pmd}().
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • vmalloc: add @align to vm_area_register_early() · c0c0a293
      Tejun Heo authored
      Impact: allow larger alignment for early vmalloc area allocation
      
      Some early vmalloc users might want larger alignment, for example, for
      custom large page mapping.  Add @align to vm_area_register_early().
      While at it, drop docbook comment on non-existent @size.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
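
What an alignment argument means for an early area allocator can be sketched with a simple bump allocator. Everything here (`struct early_region`, `early_reserve`, `align_up`) is a hypothetical userspace illustration and not the vm_area_register_early() implementation.

```c
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of two). */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Hypothetical early region handed out front to back. */
struct early_region {
    uintptr_t start;   /* base address of the region */
    uintptr_t offset;  /* bytes consumed so far, including alignment padding */
};

/* Reserve size bytes at the requested alignment and return the start address. */
uintptr_t early_reserve(struct early_region *r, uintptr_t size, uintptr_t align)
{
    uintptr_t addr = align_up(r->start + r->offset, align);

    r->offset = addr + size - r->start;
    return addr;
}
```

A caller that wants to map an area with large pages would pass the large page size as `align`, accepting that some padding is skipped before the returned address.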
    • bootmem: reorder interface functions and add a missing one · 2d0aae41
      Tejun Heo authored
      Impact: cleanup and addition of missing interface wrapper
      
      The interface functions in bootmem.h were ordered in a not so orderly
      manner.  Reorder them such that
      
      * functions allocating the same area group together -
        ie. alloc_bootmem group and alloc_bootmem_low group.
      
      * functions w/o node parameter come before the ones w/ node parameter.
      
      * nopanic variants are immediately below their panicky counterparts.
      
      While at it, add alloc_bootmem_pages_node_nopanic() which was missing.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Johannes Weiner <hannes@saeurebad.de>
    • bootmem: clean up arch-specific bootmem wrapping · c1329375
      Tejun Heo authored
      Impact: cleaner and consistent bootmem wrapping
      
      By setting CONFIG_HAVE_ARCH_BOOTMEM_NODE, archs can define
      arch-specific wrappers for bootmem allocation.  However, this is done
      a bit strangely in that only the high level convenience macros can be
      changed while lower level, but still exported, interface functions
      can't be wrapped.  This not only is messy but also leads to a strange
      situation where alloc_bootmem() does what the arch wants it to do but
      the equivalent __alloc_bootmem() call doesn't, although they should be
      able to be used interchangeably.
      
      This patch updates bootmem such that archs can override / wrap the
      backend function - alloc_bootmem_core() - instead of the high-level
      interface functions to allow simpler and consistent wrapping.  Also,
      HAVE_ARCH_BOOTMEM_NODE is renamed to HAVE_ARCH_BOOTMEM.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Johannes Weiner <hannes@saeurebad.de>
    • percpu: fix pcpu_chunk_struct_size · cb83b42e
      Tejun Heo authored
      Impact: fix short allocation leading to memory corruption
      
      While dropping rvalue wrapping macros around global parameters,
      pcpu_chunk_struct_size was set incorrectly resulting in shorter page
      pointer array.  Fix it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
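
The class of bug fixed above, a structure with a trailing pointer array whose allocation size is computed too small, can be sketched like this. `struct chunk` and `chunk_struct_size` are hypothetical stand-ins, and the commit message does not say which term of the size computation was wrong, so the formula below is an assumption about the general shape only.

```c
#include <stddef.h>

struct page;  /* opaque stand-in for the kernel's page descriptor */

/* Hypothetical chunk header followed by one page pointer per backing page. */
struct chunk {
    int free_size;
    struct page *pages[];  /* flexible array member, sized at allocation time */
};

/*
 * Bytes needed for the header plus a page pointer for every page of every
 * unit. Omitting either factor yields an array that is too short, and
 * writes past its end corrupt neighbouring memory.
 */
size_t chunk_struct_size(size_t nr_units, size_t pages_per_unit)
{
    return sizeof(struct chunk)
         + nr_units * pages_per_unit * sizeof(struct page *);
}
```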
  2. 23 Feb, 2009 13 commits
  3. 22 Feb, 2009 6 commits
    • acpi: add some missing section markers · 0d3a9cf5
      Jeremy Fitzhardinge authored
      early_acpi_os_unmap_memory() is an __init function, and
      acpi_os_unmap_memory() is allowed to access an __init function
      until acpi_gbl_permanent_mmap is set up.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Len Brown <len.brown@intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: refactor x86_quirks support · 8e6dafd6
      Ingo Molnar authored
      Impact: cleanup
      
      Make x86_quirks support more transparent. The high-level
      methods are now named:
      
        extern void x86_quirk_pre_intr_init(void);
        extern void x86_quirk_intr_init(void);
      
        extern void x86_quirk_trap_init(void);
      
        extern void x86_quirk_pre_time_init(void);
        extern void x86_quirk_time_init(void);
      
      This makes it clear that if some platform extension has to
      do something here that it is considered ... weird, and is
      discouraged.
      
      Also remove arch_hooks.h and move it into setup.h (and other
      header files where appropriate).
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
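
One way such quirk entry points can be structured is as wrappers around optional platform hooks, with the default path taken unless a hook asks to skip it. The sketch below is a userspace illustration under that assumption; `struct quirks`, `active_quirks`, the sample hooks, and the return-value convention are all hypothetical and are not taken from the patch.

```c
#include <stddef.h>

/* Hypothetical table of optional platform hooks; NULL means "no quirk". */
struct quirks {
    int (*pre_time_init)(void);  /* nonzero return: skip the default path */
};

struct quirks default_quirks;                   /* all hooks NULL */
struct quirks *active_quirks = &default_quirks;
int default_calls;                              /* counts default-path runs */

static void default_pre_time_init(void)
{
    default_calls++;
}

/* Wrapper: consult the quirk first, then fall back to the default. */
void quirk_pre_time_init(void)
{
    if (active_quirks->pre_time_init && active_quirks->pre_time_init())
        return;
    default_pre_time_init();
}

/* Sample hooks for experimentation. */
int hook_skip_default(void)
{
    return 1;
}

int hook_then_default(void)
{
    return 0;
}
```

Naming the wrapper after the quirk rather than the generic init step keeps the unusual path visible at the call site, which matches the stated intent of making platform extensions look exceptional.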
    • x86: remove various unused subarch hooks · d85a881d
      Ingo Molnar authored
      Impact: remove dead code
      
      Remove:
      
       - pre_setup_arch_hook()
       - mca_nmi_hook()
      
      If needed they can be added back via an x86_quirk handler.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: remove the Voyager 32-bit subarch · 965c7eca
      Ingo Molnar authored
      Impact: remove unused/broken code
      
      The Voyager subarch last built successfully on the v2.6.26 kernel
      and has been stale since then and does not build on the v2.6.27,
      v2.6.28 and v2.6.29-rc5 kernels.
      
      No actual users beyond the maintainer reported this breakage.
      Patches were sent and most of the fixes were accepted but the
      discussion around how to do a few remaining issues cleanly
      fizzled out with no resolution and the code remained broken.
      
      In the v2.6.30 x86 tree development cycle 32-bit subarch support
      has been reworked and removed - and the Voyager code, beyond the
      build problems already known, needs serious and significant
      changes and probably a rewrite to support it.
      
      CONFIG_X86_VOYAGER has been marked BROKEN then. The maintainer has
      been notified but no patches have been sent so far to fix it.
      
      While all other subarchs have been converted to the new scheme,
      voyager is still broken. We'd prefer to receive patches which
      clean up the current situation in a constructive way, but even in
      case of removal there is no obstacle to add that support back
      after the issues have been sorted out in a mutually acceptable
      fashion.
      
      So remove this inactive code for now.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • selinux: Fix the NetLabel glue code for setsockopt() · 09c50b4a
      Paul Moore authored
      At some point we (okay, I) managed to break the ability for users to use the
      setsockopt() syscall to set IPv4 options when NetLabel was not active on the
      socket in question.  The problem was noticed by someone trying to use the
      "-R" (record route) option of ping:
      
       # ping -R 10.0.0.1
       ping: record route: No message of desired type
      
      The solution is relatively simple: we catch the unlabeled socket case and
      clear the error code, allowing the operation to succeed.  Please note that we
      still deny users the ability to override IPv4 options on sockets which have
      NetLabel labeling active; this is done to ensure the labeling remains intact.
      Signed-off-by: Paul Moore <paul.moore@hp.com>
      Signed-off-by: James Morris <jmorris@namei.org>
    • cipso: Fix documentation comment · 586c2500
      Paul Moore authored
      The CIPSO protocol engine incorrectly stated that the FIPS-188 specification
      could be found in the kernel's Documentation directory.  This patch corrects
      that by removing the comment and directing users to the FIPS-188 document
      hosted online.  For the sake of completeness I've also included a link to the
      CIPSO draft specification on the NetLabel website.
      
      Thanks to Randy Dunlap for spotting the error and letting me know.
      Signed-off-by: Paul Moore <paul.moore@hp.com>
      Signed-off-by: James Morris <jmorris@namei.org>