1. 24 Jun, 2009 4 commits
    • Tejun Heo's avatar
      percpu: cleanup percpu array definitions · 204fba4a
      Tejun Heo authored
      Currently, the following three different ways to define percpu arrays
      are in use.
      
      1. DEFINE_PER_CPU(elem_type[array_len], array_name);
      2. DEFINE_PER_CPU(elem_type, array_name[array_len]);
      3. DEFINE_PER_CPU(elem_type, array_name)[array_len];
      
      Unify to #1 which correctly separates the roles of the two parameters
      and thus allows more flexibility in the way percpu variables are
      defined.
      
      [ Impact: cleanup ]
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: linux-mm@kvack.org
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: David S. Miller <davem@davemloft.net>
      204fba4a
    • Jesper Nilsson's avatar
      CRIS: Change DEFINE_PER_CPU of current_pgd to be non volatile. · fe87f94f
      Jesper Nilsson authored
      The DEFINE_PER_CPU of current_pgd was on CRIS defined using volatile,
      which is not needed. Remove volatile.
      
      Tested on an ARTPEC-3 (CRISv32) board.
      
      tj: extern DEFINE_PER_CPU() replaced with DECLARE_PER_CPU()
      
      [ Impact: code cleanup ]
      Signed-off-by: default avatarJesper Nilsson <jesper.nilsson@axis.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      fe87f94f
    • Tejun Heo's avatar
      linker script: throw away .discard section · 405d967d
      Tejun Heo authored
      x86 throws away .discard section but no other archs do.  Also,
      .discard is not thrown away while linking modules.  Make every arch
      and module linking throw it away.  This will be used to define dummy
      variables for percpu declarations and definitions.
      
      This patch is based on Ivan Kokshaysky's alpha percpu patch.
      
      [ Impact: always throw away everything in .discard ]
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Cc: Bryan Wu <cooloney@kernel.org>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Ingo Molnar <mingo@elte.hu>
      405d967d
    • Tejun Heo's avatar
      percpu: use dynamic percpu allocator as the default percpu allocator · e74e3962
      Tejun Heo authored
      This patch makes most !CONFIG_HAVE_SETUP_PER_CPU_AREA archs use
      dynamic percpu allocator.  The first chunk is allocated using
      embedding helper and 8k is reserved for modules.  This ensures that
      the new allocator behaves almost identically to the original allocator
      as long as static percpu variables are concerned, so it shouldn't
      introduce much breakage.
      
      s390 and alpha use custom SHIFT_PERCPU_PTR() to work around addressing
      range limit the addressing model imposes.  Unfortunately, this breaks
      if the address is specified using a variable, so for now, the two
      archs aren't converted.
      
      The following architectures are affected by this change.
      
      * sh
      * arm
      * cris
      * mips
      * sparc(32)
      * blackfin
      * avr32
      * parisc (broken, under investigation)
      * m32r
      * powerpc(32)
      
      As this change makes the dynamic allocator the default one,
      CONFIG_HAVE_DYNAMIC_PER_CPU_AREA is replaced with its invert -
      CONFIG_HAVE_LEGACY_PER_CPU_AREA, which is added to yet-to-be converted
      archs.  These archs implement their own setup_per_cpu_areas() and the
      conversion is not trivial.
      
      * powerpc(64)
      * sparc(64)
      * ia64
      * alpha
      * s390
      
      Boot and batch alloc/free tests on x86_32 with debug code (x86_32
      doesn't use default first chunk initialization).  Compile tested on
      sparc(32), powerpc(32), arm and alpha.
      
      Kyle McMartin reported that this change breaks parisc.  The problem is
      still under investigation and he is okay with pushing this patch
      forward and fixing parisc later.
      
      [ Impact: use dynamic allocator for most archs w/o custom percpu setup ]
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Acked-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Reviewed-by: default avatarChristoph Lameter <cl@linux.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Bryan Wu <cooloney@kernel.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Grant Grundler <grundler@parisc-linux.org>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      e74e3962
  2. 22 Jun, 2009 8 commits
    • Tejun Heo's avatar
      x86: ensure percpu lpage doesn't consume too much vmalloc space · 0017c869
      Tejun Heo authored
      On extreme configuration (e.g. 32bit 32-way NUMA machine), lpage
      percpu first chunk allocator can consume too much of vmalloc space.
      Make it fall back to 4k allocator if the consumption goes over 20%.
      
      [ Impact: add sanity check for lpage percpu first chunk allocator ]
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarJan Beulich <JBeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      0017c869
    • Tejun Heo's avatar
      x86: implement percpu_alloc kernel parameter · fa8a7094
      Tejun Heo authored
      According to Andi, it isn't clear whether lpage allocator is worth the
      trouble as there are many processors where PMD TLB is far scarcer than
      PTE TLB.  The advantage or disadvantage probably depends on the actual
      size of percpu area and specific processor.  As performance
      degradation due to TLB pressure tends to be highly workload specific
      and subtle, it is difficult to decide which way to go without more
      data.
      
      This patch implements percpu_alloc kernel parameter to allow selecting
      which first chunk allocator to use to ease debugging and testing.
      
      While at it, make sure all the failure paths report why something
      failed to help determining why certain allocator isn't working.  Also,
      kill the "Great future plan" comment which had already been realized
      quite some time ago.
      
      [ Impact: allow explicit percpu first chunk allocator selection ]
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarJan Beulich <JBeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      fa8a7094
    • Tejun Heo's avatar
      x86: fix pageattr handling for lpage percpu allocator and re-enable it · e59a1bb2
      Tejun Heo authored
      lpage allocator aliases a PMD page for each cpu and returns whatever
      is unused to the page allocator.  When the pageattr of the recycled
      pages are changed, this makes the two aliases point to the overlapping
      regions with different attributes which isn't allowed and known to
      cause subtle data corruption in certain cases.
      
      This can be handled in simliar manner to the x86_64 highmap alias.
      pageattr code should detect if the target pages have PMD alias and
      split the PMD alias and synchronize the attributes.
      
      pcpur allocator is updated to keep the allocated PMD pages map sorted
      in ascending address order and provide pcpu_lpage_remapped() function
      which binary searches the array to determine whether the given address
      is aliased and if so to which address.  pageattr is updated to use
      pcpu_lpage_remapped() to detect the PMD alias and split it up as
      necessary from cpa_process_alias().
      
      Jan Beulich spotted the original problem and incorrect usage of vaddr
      instead of laddr for lookup.
      
      With this, lpage percpu allocator should work correctly.  Re-enable
      it.
      
      [ Impact: fix subtle lpage pageattr bug and re-enable lpage ]
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarJan Beulich <JBeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      e59a1bb2
    • Tejun Heo's avatar
      x86: reorganize cpa_process_alias() · 992f4c1c
      Tejun Heo authored
      Reorganize cpa_process_alias() so that new alias condition can be
      added easily.
      
      Jan Beulich spotted problem in the original cleanup thread which
      incorrectly assumed the two existing conditions were mutially
      exclusive.
      
      [ Impact: code reorganization ]
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      992f4c1c
    • Tejun Heo's avatar
      x86: prepare setup_pcpu_lpage() for pageattr fix · 0ff2587f
      Tejun Heo authored
      Make the following changes in preparation of coming pageattr updates.
      
      * Define and use array of struct pcpul_ent instead of array of
        pointers.  The only difference is ->cpu field which is set but
        unused yet.
      
      * Rename variables according to the above change.
      
      * Rename local variable vm to pcpul_vm and move it out of the
        function.
      
      [ Impact: no functional difference ]
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      0ff2587f
    • Tejun Heo's avatar
      x86: rename remap percpu first chunk allocator to lpage · 97c9bf06
      Tejun Heo authored
      The "remap" allocator remaps large pages to build the first chunk;
      however, the name isn't very good because 4k allocator remaps too and
      the whole point of the remap allocator is using large page mapping.
      The allocator will be generalized and exported outside of x86, rename
      it to lpage before that happens.
      
      percpu_alloc kernel parameter is updated to accept both "remap" and
      "lpage" for lpage allocator.
      
      [ Impact: code cleanup, kernel parameter argument updated ]
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      97c9bf06
    • Tejun Heo's avatar
      x86: fix duplicate free in setup_pcpu_remap() failure path · c5806df9
      Tejun Heo authored
      In the failure path, setup_pcpu_remap() tries to free the area which
      has already been freed to make holes in the large page.  Fix it.
      
      [ Impact: fix duplicate free in failure path ]
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      c5806df9
    • Tejun Heo's avatar
      percpu: fix too lazy vunmap cache flushing · 85ae87c1
      Tejun Heo authored
      In pcpu_unmap(), flushing virtual cache on vunmap can't be delayed as
      the page is going to be returned to the page allocator.  Only TLB
      flushing can be put off such that vmalloc code can handle it lazily.
      Fix it.
      
      [ Impact: fix subtle virtual cache flush bug ]
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      85ae87c1
  3. 21 Jun, 2009 22 commits
  4. 20 Jun, 2009 6 commits
    • Johannes Weiner's avatar
      mm: page_alloc: clear PG_locked before checking flags on free · c277331d
      Johannes Weiner authored
      da456f14 "page allocator: do not disable interrupts in free_page_mlock()" moved
      the PG_mlocked clearing after the flag sanity checking which makes mlocked
      pages always trigger 'bad page'.  Fix this by clearing the bit up front.
      Reported--and-debugged-by: default avatarPeter Chubb <peter.chubb@nicta.com.au>
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMel Gorman <mel@csn.ul.ie>
      Tested-by: default avatarMaxim Levitsky <maximlevitsky@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c277331d
    • Linus Torvalds's avatar
      x86, 64-bit: Clean up user address masking · 9063c61f
      Linus Torvalds authored
      The discussion about using "access_ok()" in get_user_pages_fast() (see
      commit 7f818906: "x86: don't use
      'access_ok()' as a range check in get_user_pages_fast()" for details and
      end result), made us notice that x86-64 was really being very sloppy
      about virtual address checking.
      
      So be way more careful and straightforward about masking x86-64 virtual
      addresses:
      
       - All the VIRTUAL_MASK* variants now cover half of the address
         space, it's not like we can use the full mask on a signed
         integer, and the larger mask just invites mistakes when
         applying it to either half of the 48-bit address space.
      
       - /proc/kcore's kc_offset_to_vaddr() becomes a lot more
         obvious when it transforms a file offset into a
         (kernel-half) virtual address.
      
       - Unify/simplify the 32-bit and 64-bit USER_DS definition to
         be based on TASK_SIZE_MAX.
      
      This cleanup and more careful/obvious user virtual address checking also
      uncovered a buglet in the x86-64 implementation of strnlen_user(): it
      would do an "access_ok()" check on the whole potential area, even if the
      string itself was much shorter, and thus return an error even for valid
      strings. Our sloppy checking had hidden this.
      
      So this fixes 'strnlen_user()' to do this properly, the same way we
      already handled user strings in 'strncpy_from_user()'.  Namely by just
      checking the first byte, and then relying on fault handling for the
      rest.  That always works, since we impose a guard page that cannot be
      mapped at the end of the user space address space (and even if we
      didn't, we'd have the address space hole).
      Acked-by: default avatarIngo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9063c61f
    • Linus Torvalds's avatar
      Merge branch 'irq-fixes-for-linus' of... · 2453d6ff
      Linus Torvalds authored
      Merge branch 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
      
      * 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
        genirq, irq.h: Fix kernel-doc warnings
        genirq: fix comment to say IRQ_WAKE_THREAD
      2453d6ff
    • Linus Torvalds's avatar
      Merge branch 'perfcounters-fixes-for-linus' of... · 12e24f34
      Linus Torvalds authored
      Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
      
      * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits)
        perfcounter: Handle some IO return values
        perf_counter: Push perf_sample_data through the swcounter code
        perf_counter tools: Define and use our own u64, s64 etc. definitions
        perf_counter: Close race in perf_lock_task_context()
        perf_counter, x86: Improve interactions with fast-gup
        perf_counter: Simplify and fix task migration counting
        perf_counter tools: Add a data file header
        perf_counter: Update userspace callchain sampling uses
        perf_counter: Make callchain samples extensible
        perf report: Filter to parent set by default
        perf_counter tools: Handle lost events
        perf_counter: Add event overlow handling
        fs: Provide empty .set_page_dirty() aop for anon inodes
        perf_counter: tools: Makefile tweaks for 64-bit powerpc
        perf_counter: powerpc: Add processor back-end for MPC7450 family
        perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels
        perf_counter: powerpc: Change how processor-specific back-ends get selected
        perf_counter: powerpc: Use unsigned long for register and constraint values
        perf_counter: powerpc: Enable use of software counters on 32-bit powerpc
        perf_counter tools: Add and use isprint()
        ...
      12e24f34
    • Linus Torvalds's avatar
      Merge branch 'sched-fixes-for-linus' of... · 1eb51c33
      Linus Torvalds authored
      Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
      
      * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
        sched: Fix out of scope variable access in sched_slice()
        sched: Hide runqueues from direct refer at source code level
        sched: Remove unneeded __ref tag
        sched, x86: Fix cpufreq + sched_clock() TSC scaling
      1eb51c33
    • Linus Torvalds's avatar
      Merge branch 'tracing-fixes-for-linus' of... · b0b7065b
      Linus Torvalds authored
      Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
      
      * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits)
        tracing/urgent: warn in case of ftrace_start_up inbalance
        tracing/urgent: fix unbalanced ftrace_start_up
        function-graph: add stack frame test
        function-graph: disable when both x86_32 and optimize for size are configured
        ring-buffer: have benchmark test print to trace buffer
        ring-buffer: do not grab locks in nmi
        ring-buffer: add locks around rb_per_cpu_empty
        ring-buffer: check for less than two in size allocation
        ring-buffer: remove useless compile check for buffer_page size
        ring-buffer: remove useless warn on check
        ring-buffer: use BUF_PAGE_HDR_SIZE in calculating index
        tracing: update sample event documentation
        tracing/filters: fix race between filter setting and module unload
        tracing/filters: free filter_string in destroy_preds()
        ring-buffer: use commit counters for commit pointer accounting
        ring-buffer: remove unused variable
        ring-buffer: have benchmark test handle discarded events
        ring-buffer: prevent adding write in discarded area
        tracing/filters: strloc should be unsigned short
        tracing/filters: operand can be negative
        ...
      
      Fix up kmemcheck-induced conflict in kernel/trace/ring_buffer.c manually
      b0b7065b