Commits · 0093b108a6853d8398e1acf9021bcbd4666f4594 · nexedi / linux

26 Feb, 2009 2 commits

Merge branch 'tj-percpu' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into core/percpu · 0093b108
Ingo Molnar authored Feb 26, 2009

percpu: fix too low alignment restriction on UP · e3176036

Tejun Heo authored Feb 26, 2009

UP __alloc_percpu() triggered WARN_ON_ONCE() if the requested
alignment was larger than that of unsigned long long, which is too
small for cacheline-aligned allocations.  Bump the limit up to
SMP_CACHE_BYTES, which kmalloc allocations generally guarantee.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ingo Molnar <mingo@elte.hu>
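
A minimal userspace sketch of the UP check after this change follows. It assumes SMP_CACHE_BYTES is 64 (the real value is arch-specific) and uses calloc() in place of kzalloc(). The names up_alloc_percpu() and warned are hypothetical and are not the kernel's code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumption: 64-byte cachelines; the kernel's SMP_CACHE_BYTES is per-arch. */
#define SMP_CACHE_BYTES 64

/* Hypothetical flag modelling WARN_ON_ONCE(): only the first offence is reported. */
static int warned;

/* Model of the UP __alloc_percpu(): on UP, per-cpu data is a plain zeroed
 * allocation, so alignments above what kmalloc generally provides
 * (SMP_CACHE_BYTES) get a one-time warning. */
static void *up_alloc_percpu(size_t size, size_t align)
{
	if (align > SMP_CACHE_BYTES && !warned) {
		warned = 1;
		fprintf(stderr, "WARNING: percpu align %zu > %d\n",
			align, SMP_CACHE_BYTES);
	}
	/* calloc() stands in for kzalloc(size, GFP_KERNEL); it does not
	 * reproduce kmalloc's alignment guarantees. */
	return calloc(1, size);
}
```

Under the old limit, __alignof__(unsigned long long) (typically 8), a cacheline-aligned request such as align = 64 would have warned. With SMP_CACHE_BYTES as the limit, it does not.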

25 Feb, 2009 4 commits

alloc_percpu: fix UP build · d2b02615

Ingo Molnar authored Feb 25, 2009

Impact: build fix

The !SMP branch had a leftover 'gfp' reference:

 include/linux/percpu.h: In function '__alloc_percpu':
 include/linux/percpu.h:160: error: 'gfp' undeclared (first use in this function)
 include/linux/percpu.h:160: error: (Each undeclared identifier is reported only once
 include/linux/percpu.h:160: error: for each function it appears in.)

Use GFP_KERNEL like the SMP version does.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

alloc_percpu: add align argument to __alloc_percpu, fix · 0dcec8c2

Ingo Molnar authored Feb 25, 2009

Impact: build fix

API was changed, but not all usage sites were converted:

 net/ipv4/route.c: In function ‘ip_rt_init’:
 net/ipv4/route.c:3379: error: too few arguments to function ‘__alloc_percpu’

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86: convert cacheflush macros to inline functions · d3251005

Tejun Heo authored Feb 25, 2009

Impact: cleanup

Unused macro parameters cause spurious unused variable warnings.
Convert all cacheflush macros to inline functions to avoid the
warnings and achieve better type checking.
Signed-off-by: Tejun Heo <tj@kernel.org>
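
The sketch below shows why an empty inline function silences the warning where an empty macro does not. flush_cache_range() is one of the x86 no-op cacheflush hooks. The two-field struct and the flush_user_range() caller are hypothetical stand-ins.

```c
#include <assert.h>

/* Stand-in for the kernel type; only a pointer to it is passed around. */
struct vm_area_struct {
	unsigned long vm_start;
	unsigned long vm_end;
};

/* Old style, for comparison: the macro discards its arguments, so 'end'
 * in flush_user_range() below would draw -Wunused-variable.
 *	#define flush_cache_range(vma, start, end) do { } while (0)
 */

/* New style: an empty static inline still consumes every argument, and
 * the compiler type-checks them (passing an int as @vma draws a diagnostic). */
static inline void flush_cache_range(struct vm_area_struct *vma,
				     unsigned long start, unsigned long end)
{
}

/* Hypothetical caller whose only use of 'end' is the flush call. */
static unsigned long flush_user_range(struct vm_area_struct *vma)
{
	unsigned long start = vma->vm_start;
	unsigned long end = vma->vm_end;

	flush_cache_range(vma, start, end);
	return start;
}
```

The inline version compiles to nothing, just like the macro. The only differences are the diagnostics.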

x86, percpu: fix minor bugs in setup_percpu.c · 24ff9542

Tejun Heo authored Feb 25, 2009

Recent changes in setup_percpu.c made a now-meaningless DBG()
statement fail to compile and introduced a
comparison-of-different-types warning.  Fix them.

The compile failure was reported by Ingo Molnar.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ingo Molnar <mingo@elte.hu>

24 Feb, 2009 21 commits

Merge branch 'tj-percpu' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into core/percpu · 0edcf8d6
Ingo Molnar authored Feb 24, 2009
```
Conflicts:
	arch/x86/include/asm/pgtable.h
```

Merge branch 'x86/core' into core/percpu · 87b20307
Ingo Molnar authored Feb 24, 2009

Merge branches 'x86/acpi', 'x86/apic', 'x86/asm', 'x86/cleanups', 'x86/mm',... · a852cbfa

Ingo Molnar authored Feb 24, 2009

Merge branches 'x86/acpi', 'x86/apic', 'x86/asm', 'x86/cleanups', 'x86/mm', 'x86/signal' and 'x86/urgent'; commit 'v2.6.29-rc6' into x86/core

x86: efi_stub_32,64 - add missing ENDPROCs · 9f331119

Cyrill Gorcunov authored Feb 23, 2009

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: heukelum@fastmail.fm
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86: head_64.S - use GLOBAL macro · bc8b2b92

Cyrill Gorcunov authored Feb 23, 2009

Impact: cleanup
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: heukelum@fastmail.fm
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86: entry_64.S - add missing ENDPROC · b3baaa13

Cyrill Gorcunov authored Feb 23, 2009

native_usergs_sysret64 is described as

	extern void native_usergs_sysret64(void)

so let's add ENDPROC here.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: heukelum@fastmail.fm
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86: invalid_vm86_irq -- use predefined macros · 57e37293

Cyrill Gorcunov authored Feb 23, 2009

Impact: cleanup
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: heukelum@fastmail.fm
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86: head_64.S - use IDT_ENTRIES instead of hardcoded number · 5e112ae2

Cyrill Gorcunov authored Feb 23, 2009

Impact: cleanup
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: heukelum@fastmail.fm
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86: head_64.S - remove useless balign · 2a0b1001

Cyrill Gorcunov authored Feb 23, 2009

Impact: cleanup

NEXT_PAGE already has 'balign' so no
need to keep this redundant one.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: heukelum@fastmail.fm
Signed-off-by: Ingo Molnar <mingo@elte.hu>

x86: fix performance regression in write() syscall · 30d697fa

Salman Qazi authored Feb 23, 2009

While the introduction of __copy_from_user_nocache (see commit
0812a579) may have been an improvement for sufficiently large writes,
there is evidence that it is detrimental for small writes.  Unixbench's
fstime test gives the following results for 256-byte writes with
MAX_BLOCK of 2000:

    2.6.29-rc6 ( 5 samples, each in KB/sec ):
    283750, 295200, 294500, 293000, 293300

    2.6.29-rc6 + this patch (5 samples, each in KB/sec):
    313050, 3106750, 293350, 306300, 307900

    2.6.18
    395700, 342000, 399100, 366050, 359850

    See w_test() in src/fstime.c in unixbench version 4.1.0.  Basically, the above test
    consists of counting how much we can write in this manner:

    alarm(10);
    while (!sigalarm) {
            for (f_blocks = 0; f_blocks < 2000; ++f_blocks) {
                   write(f, buf, 256);
            }
            lseek(f, 0L, 0);
    }

Note, there are other components to the write syscall regression
that are not addressed here.
Signed-off-by: Salman Qazi <sqazi@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
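
The following is a self-contained C version of the quoted loop, for reproducing the numbers. It makes three assumptions. The duration is a parameter instead of a fixed 10 seconds, the caller chooses the file path, and KB means 1024 bytes. It is a sketch, not unixbench's own w_test().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

static volatile sig_atomic_t sigalarm;

static void on_alarm(int sig)
{
	(void)sig;
	sigalarm = 1;
}

/* Write 2000 blocks of 256 bytes, rewind, and repeat until the alarm
 * fires.  Returns throughput in KB/sec, or -1 on error. */
static long fstime_write_kbps(const char *path, unsigned int seconds)
{
	char buf[256] = { 0 };
	long long bytes = 0;
	long result = -1;
	struct sigaction sa = { 0 };
	int f;

	assert(seconds > 0);
	f = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
	if (f < 0)
		return -1;
	sa.sa_handler = on_alarm;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGALRM, &sa, NULL);
	sigalarm = 0;
	alarm(seconds);
	while (!sigalarm) {
		for (int f_blocks = 0; f_blocks < 2000; ++f_blocks) {
			if (write(f, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
				goto out;
			bytes += sizeof(buf);
		}
		lseek(f, 0L, SEEK_SET);
	}
	result = (long)(bytes / 1024 / seconds);
out:
	alarm(0);
	close(f);
	unlink(path);
	return result;
}
```

As in the original test, the file is rewound after every 2000 × 256 = 512,000 bytes. The benchmark therefore mostly measures syscall and copy overhead, not disk bandwidth.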

percpu: add __read_mostly to variables which are mostly read only · 40150d37

Tejun Heo authored Feb 24, 2009

Most global variables in the percpu allocator are initialized during
boot and are read-only from that point on.  Add __read_mostly as per
Rusty's suggestion.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>

x86: add remapping percpu first chunk allocator · 8ac83757

Tejun Heo authored Feb 24, 2009

Impact: add better first percpu allocation for NUMA

On NUMA, the embedding allocator can't be used because the different
units can't be made to fall in the correct NUMA nodes.  To use large
page mapping, each unit needs to be remapped.  However, percpu areas
are usually much smaller than the large page size, and the unused
space hurts a lot as the number of cpus grows.  This allocator remaps
large pages for each chunk but gives the unused part back to the
bootmem allocator, so those large pages end up mapped twice.

This adds slightly to the TLB pressure but is much better than using
4k mappings while still being NUMA-friendly.

Ingo suggested that this would be the correct approach for NUMA.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>

x86: add embedding percpu first chunk allocator · 89c92151

Tejun Heo authored Feb 24, 2009

Impact: add better first percpu allocation for !NUMA

On !NUMA, we can simply allocate contiguous memory and use it for the
first chunk without mapping it into the vmalloc area.  As the memory
area is covered by the large page physical memory mapping, the dynamic
percpu allocator adds no TLB overhead for the static percpu area or
anything else that falls into the first chunk, and the implementation
is very simple too.
Signed-off-by: Tejun Heo <tj@kernel.org>

x86: separate out setup_pcpu_4k() from setup_per_cpu_areas() · 5f5d8405

Tejun Heo authored Feb 24, 2009

Impact: modularize percpu first chunk allocation

x86 is going to have a few different strategies for the first chunk
allocation.  Modularize it by separating out the current allocation
mechanism into pcpu_alloc_bootmem() and setup_pcpu_4k().
Signed-off-by: Tejun Heo <tj@kernel.org>

percpu: give more latitude to arch specific first chunk initialization · 8d408b4b

Tejun Heo authored Feb 24, 2009

Impact: more latitude for first percpu chunk allocation

The first percpu chunk serves the kernel static percpu area and may or
may not contain extra room for further dynamic allocation.
Initialization of the first chunk needs to be done before normal
memory allocation service is up, so it has its own init path -
pcpu_setup_static().

It seems archs need more latitude while initializing the first chunk,
for example, to take advantage of large page mapping.  This patch makes
the following changes to allow this.

* Define PERCPU_DYNAMIC_RESERVE to give archs a hint about how much
  space to reserve in the first chunk for further dynamic allocation.

* Rename pcpu_setup_static() to pcpu_setup_first_chunk().

* Make pcpu_setup_first_chunk() much more flexible by fetching page
  pointers through a callback and adding optional @unit_size,
  @free_size and @base_addr arguments, which allow archs to selectively
  handle parts of the chunk initialization as they see fit.
Signed-off-by: Tejun Heo <tj@kernel.org>

percpu: remove unit_size power-of-2 restriction · d9b55eeb

Tejun Heo authored Feb 24, 2009

Impact: allow unit_size to be arbitrary multiple of PAGE_SIZE

In the dynamic percpu allocator, there is no reason for the unit size
to be a power of two.  Remove the restriction.

A non-power-of-two unit size means that empty chunks fall into the
same slot index as lightly occupied chunks, which is bad for
reclaiming.  Reserve an extra slot for empty chunks.
Signed-off-by: Tejun Heo <tj@kernel.org>
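
The userspace sketch below maps free sizes to slots in the style of mm/percpu.c and shows why the extra slot is needed. It assumes that PCPU_SLOT_BASE_SHIFT is 5, that a slot is derived from the highest set bit of the free size, and that slot 0 is kept for full chunks. The helper names are illustrative, not copied from the patch.

```c
#include <assert.h>

#define PCPU_SLOT_BASE_SHIFT 5	/* assumed; free sizes below 16 bytes share slot 1 */

static int pcpu_unit_size;	/* bytes per unit, no longer forced to 2^n */
static int pcpu_nr_slots;

/* 1-based index of the highest set bit, 0 for 0 (like the kernel's fls()). */
static int fls_int(unsigned int x)
{
	int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/* Size class from the highest set bit of the free size. */
static int raw_size_to_slot(int size)
{
	int slot = fls_int((unsigned int)size) - PCPU_SLOT_BASE_SHIFT + 2;

	return slot > 1 ? slot : 1;
}

/* A fully free chunk gets the reserved last slot so reclaim can find it. */
static int pcpu_size_to_slot(int size)
{
	if (size == pcpu_unit_size)
		return pcpu_nr_slots - 1;
	return raw_size_to_slot(size);
}

static void pcpu_init_slots(int unit_size)
{
	pcpu_unit_size = unit_size;
	/* slot 0 is left for full chunks, then the size classes, then one
	 * extra slot reserved for empty chunks */
	pcpu_nr_slots = raw_size_to_slot(unit_size) + 2;
}
```

With a 3-page (12288-byte) unit, an empty chunk and a chunk with 9000 free bytes share the same raw size class, because both have bit 13 as their highest set bit. The reserved slot keeps them apart. With a power-of-two unit size, every smaller free size already has a lower highest bit, so the problem never arose.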

x86: update populate_extra_pte() and add populate_extra_pmd() · 458a3e64

Tejun Heo authored Feb 24, 2009

Impact: minor change to populate_extra_pte() and addition of pmd flavor

Update populate_extra_pte() to return a pointer to the pte_t for the
specified address, and add populate_extra_pmd(), which populates only
down to the pmd level and returns a pointer to the pmd entry for the
address.

For 64bit, pud/pmd/pte fill functions are separated out from
set_pte_vaddr[_pud]() and used for set_pte_vaddr[_pud]() and
populate_extra_{pte|pmd}().
Signed-off-by: Tejun Heo <tj@kernel.org>

vmalloc: add @align to vm_area_register_early() · c0c0a293

Tejun Heo authored Feb 24, 2009

Impact: allow larger alignment for early vmalloc area allocation

Some early vmalloc users might want larger alignment, for example, for
custom large page mapping.  Add @align to vm_area_register_early().
While at it, drop the docbook comment for the non-existent @size.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
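
The userspace model below shows one straightforward way to honour an @align argument when carving early areas out of the vmalloc range. It rounds the next free address up to @align, hands that address out, and advances the offset past the area. Everything here is an assumption for illustration: the VMALLOC_START value, the two-field struct (the kernel function takes a struct vm_struct *), and the _sketch function name.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE	4096UL
#define VMALLOC_START	0xf8000000UL	/* assumed example value */
/* Round @x up to a multiple of @a (a power of two), like the kernel's ALIGN(). */
#define ALIGN_UP(x, a)	(((x) + (a) - 1) & ~((unsigned long)(a) - 1))

struct early_vm {
	unsigned long addr;
	size_t size;
};

static size_t vm_init_off;	/* bytes of the vmalloc range already handed out */

static void vm_area_register_early_sketch(struct early_vm *vm, size_t align)
{
	unsigned long addr = ALIGN_UP(VMALLOC_START + vm_init_off, align);

	/* the next area may start on the first page after this one */
	vm_init_off = ALIGN_UP(addr + vm->size, PAGE_SIZE) - VMALLOC_START;
	vm->addr = addr;
}
```

Passing a 2 MiB alignment lets the area be backed by a single large page. The cost is a gap of at most align - PAGE_SIZE bytes of unused address space.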

bootmem: reorder interface functions and add a missing one · 2d0aae41

Tejun Heo authored Feb 24, 2009

Impact: cleanup and addition of missing interface wrapper

The interface functions in bootmem.h were arranged in a not so orderly
manner.  Reorder them such that

* functions allocating the same area are grouped together,
  i.e. the alloc_bootmem group and the alloc_bootmem_low group.

* functions w/o node parameter come before the ones w/ node parameter.

* nopanic variants are immediately below their panicky counterparts.

While at it, add alloc_bootmem_pages_node_nopanic() which was missing.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@saeurebad.de>

bootmem: clean up arch-specific bootmem wrapping · c1329375

Tejun Heo authored Feb 24, 2009

Impact: cleaner and consistent bootmem wrapping

By setting CONFIG_HAVE_ARCH_BOOTMEM_NODE, archs can define
arch-specific wrappers for bootmem allocation.  However, this is done
a bit strangely in that only the high level convenience macros can be
changed while lower level, but still exported, interface functions
can't be wrapped.  This is not only messy but also leads to a strange
situation where alloc_bootmem() does what the arch wants it to do but
the equivalent __alloc_bootmem() call doesn't, although the two should
be interchangeable.

This patch updates bootmem such that archs can override / wrap the
backend function, alloc_bootmem_core(), instead of the high-level
interface functions to allow simpler and consistent wrapping.  Also,
HAVE_ARCH_BOOTMEM_NODE is renamed to HAVE_ARCH_BOOTMEM.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@saeurebad.de>
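
The layering argument can be modelled in a few lines of C. Every public entry point funnels into one core function, so an arch that wraps the core changes alloc_bootmem() and __alloc_bootmem() alike. All names below are hypothetical: the _sketch suffixes, ARCH_WRAPS_CORE and arch_calls. calloc() stands in for the bootmem allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Set to 0 to model an arch that does not override the core. */
#define ARCH_WRAPS_CORE 1

static int arch_calls;	/* allocations that went through the arch wrapper */

/* Generic backend; calloc() stands in for the bootmem bitmap allocator. */
static void *generic_alloc_core(size_t size, size_t align)
{
	(void)align;	/* a real allocator would honour @align */
	return calloc(1, size);
}

#if ARCH_WRAPS_CORE
/* Arch override of the backend: placement policy would go here. */
static void *alloc_core(size_t size, size_t align)
{
	arch_calls++;
	return generic_alloc_core(size, align);
}
#else
#define alloc_core generic_alloc_core
#endif

/* Low-level interface and convenience macro both reach alloc_core(). */
static void *__alloc_bootmem_sketch(size_t size, size_t align)
{
	return alloc_core(size, align);
}
#define alloc_bootmem_sketch(size) __alloc_bootmem_sketch((size), 64)
```

If only the convenience macro were wrapped, a direct __alloc_bootmem_sketch() call would bypass the arch policy. That is the inconsistency described above.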

percpu: fix pcpu_chunk_struct_size · cb83b42e

Tejun Heo authored Feb 24, 2009

Impact: fix short allocation leading to memory corruption

While dropping rvalue wrapping macros around global parameters,
pcpu_chunk_struct_size was set incorrectly, resulting in a page
pointer array that was too short.  Fix it.
Signed-off-by: Tejun Heo <tj@kernel.org>
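
A size like pcpu_chunk_struct_size covers a structure that ends in an array of page pointers. It must therefore include the header plus one pointer for every page of every unit. The sketch below shows that arithmetic with a C99 flexible array member. The struct layout and helper names are simplified stand-ins, not the allocator's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct page;	/* opaque: only pointers are stored */

struct pcpu_chunk_sketch {
	int free_size;
	struct page *page[];	/* nr_units * unit_pages entries */
};

/* Header plus one page pointer per page of every unit.  Dropping any
 * factor of the product shrinks the array, and writes to the last
 * entries then land outside the allocation. */
static size_t chunk_struct_size(size_t nr_units, size_t unit_pages)
{
	return sizeof(struct pcpu_chunk_sketch) +
	       nr_units * unit_pages * sizeof(struct page *);
}

static struct pcpu_chunk_sketch *alloc_chunk(size_t nr_units, size_t unit_pages)
{
	return calloc(1, chunk_struct_size(nr_units, unit_pages));
}
```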

23 Feb, 2009 13 commits

x86, doc: fix references to Documentation/x86/i386/boot.txt · 954a8b81

Kyle McMartin authored Feb 19, 2009

Impact: Documentation fix

The amazing dancing boot.txt file has jumped places again.  It should
never have been in Documentation/x86/i386, since it never was
32-bit-specific, but it unfortunately ended up there for a while.
Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

x86: minor cleanup in the espfix code · bda3a897

Stas Sergeev authored Feb 23, 2009

Impact: Cleanup

Checkin be44d2aa eliminates the use of
a 16-bit stack for espfix.  However, at least one instruction remained
that only operated on the low 16 bits of %esp.

This is not a bug per se because the kernel stack is always an aligned
4K or 8K block.  Therefore it cannot cross 64K boundaries; this code,
in fact, relies strictly on that fact.

However, it's a lot cleaner (and, for that matter, smaller) to operate
on the entire 32-bit register.
Signed-off-by: Stas Sergeev <stsp@aknet.ru>
CC: Zachary Amsden <zach@vmware.com>
CC: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
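
The alignment argument can be checked in C. Take a block whose size divides 64K and whose base is aligned to that size. Such a block never straddles a 64K boundary, so rewriting only bits 0-15 of an address inside it gives the same result as a full 32-bit write. The helpers are illustrative only and do not reproduce the espfix assembly.

```c
#include <assert.h>
#include <stdint.h>

/* True if [base, base + size) stays inside one 64K-aligned window. */
static int within_one_64k_window(uint32_t base, uint32_t size)
{
	return (base >> 16) == ((base + size - 1) >> 16);
}

/* Effect of a 16-bit register write on a 32-bit %esp: only bits 0-15 change. */
static uint32_t write_low16(uint32_t esp, uint16_t lo)
{
	return (esp & 0xffff0000u) | lo;
}
```

The checks walk every 8K-aligned block in a 1 MiB range. They then show that a block which is only 4K-aligned can cross a 64K boundary.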

x86: check mptable physptr with max_low_pfn on 32bit · ecda0628

Yinghai Lu authored Feb 22, 2009

Impact: fix early crash on LinuxBIOS systems

Kevin O'Connor reported that Coreboot aka LinuxBIOS tries to put the
mptable somewhere very high, well above max_low_pfn (below which
BIOSes generally put the mptable), causing a panic.

The BIOS will probably be changed to be compatible with older
Linux versions, but nevertheless the MP-spec does not forbid
an MP-table in arbitrary system RAM, so make sure it all
works even if the table is in an unexpected place.

Check physptr against max_low_pfn * PAGE_SIZE.
Reported-by: Kevin O'Connor <kevin@koconnor.net>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Stefan Reinauer <stepan@coresystems.de>
Cc: coreboot@coreboot.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
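
One way to express the check is to compute the lowmem limit as max_low_pfn * PAGE_SIZE and touch only the part of the table that lies below it. This is a hedged sketch. The function name, the clamping of a table that straddles the limit, and the 4K PAGE_SIZE are all assumptions, not the exact mpparse.c change.

```c
#include <assert.h>

#define PAGE_SIZE 4096UL	/* assumed: 4 KiB pages, as on x86 */

/* Bytes of an MP table at @physptr (length @size) that may be touched
 * early on 32-bit, when only memory below max_low_pfn is mapped. */
static unsigned long mptable_reservable(unsigned long physptr,
					unsigned long size,
					unsigned long max_low_pfn)
{
	unsigned long end = max_low_pfn * PAGE_SIZE;

	if (physptr >= end)
		return 0;		/* above lowmem: leave it alone */
	if (physptr + size > end)
		return end - physptr;	/* straddles the limit: clamp */
	return size;
}
```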

Linux 2.6.29-rc6 · 20f4d6c3
Linus Torvalds authored Feb 22, 2009

acpi/doc: add missing param value · af23f573

Randy Dunlap authored Feb 22, 2009

Add missing parameter value to list of available values
for acpi=<value>.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 · 83105092

Linus Torvalds authored Feb 22, 2009

* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm/i915: Add missing mutex_lock(&dev->struct_mutex)
  drm/i915: fix WC mapping in non-GEM i915 code.
  drm/i915: Fix regression in 95ca9d
  drm/i915: Retire requests from i915_gem_busy_ioctl.
  drm/i915: suspend/resume GEM when KMS is active
  drm/i915: Don't let a device flush to prepare buffers clear new write_domains.
  drm/i915: Cut two args to set_to_gpu_domain that confused this tricky path.

drm/i915: Add missing mutex_lock(&dev->struct_mutex) · 5004417d

Pierre Willenbrock authored Feb 23, 2009

There might be a nicer way to fix this, but this is the simplest for now.
Signed-off-by: Pierre Willenbrock <pierre@pirsoft.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/i915: fix WC mapping in non-GEM i915 code. · 6fb88588
Jesse Barnes authored Feb 23, 2009
```
[airlied - taken from mailing list posting]
Signed-off-by: Dave Airlie <airlied@redhat.com>
```

drm/i915: Fix regression in 95ca9d · bab2d1f6

Chris Wilson authored Feb 20, 2009

The object is dereferenced before the NULL check. Oops.

Fixes http://bugs.freedesktop.org/show_bug.cgi?id=20235
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
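
The bug class is easiest to see side by side. The object type below is a hypothetical stand-in, and -EBADF is an assumed error code, not necessarily what the i915 fix returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a GEM object with driver-private data. */
struct gem_object_sketch {
	int handle;
	void *driver_private;
};

/* Buggy shape, for comparison only:
 *	void *priv = obj->driver_private;	<- faults when obj == NULL
 *	if (obj == NULL)
 *		return -EBADF;
 */
static int get_driver_private(struct gem_object_sketch *obj, void **priv)
{
	if (obj == NULL)		/* check first ... */
		return -EBADF;
	*priv = obj->driver_private;	/* ... then dereference */
	return 0;
}
```

Besides the crash, dereferencing before the check lets the compiler assume the pointer is non-NULL and drop the later check entirely.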

drm/i915: Retire requests from i915_gem_busy_ioctl. · f21289b3

Eric Anholt authored Feb 18, 2009

This ensures that the user gets the latest information from the hardware
on whether the buffer is busy, potentially reducing the working set of objects
that the user chooses.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/i915: suspend/resume GEM when KMS is active · 5669fcac

Jesse Barnes authored Feb 17, 2009

In the KMS case, we need to suspend/resume GEM as well.  So on suspend, make
sure we idle GEM and stop any new rendering from coming in, and on resume,
re-init the framebuffer and clear the suspended flag.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/i915: Don't let a device flush to prepare buffers clear new write_domains. · efbeed96

Eric Anholt authored Feb 19, 2009

The problem was that object_set_to_gpu_domain would set the new write_domains
that are getting set by this batchbuffer, then the accumulated flushes required
for all the objects in preparation for this batchbuffer were posted, and the
brand new write domain would get cleared by the flush being posted. Instead,
hang on to the new (or old if we're not changing it) value and set it after
the flush is queued.

Visible results of this included conformance test failures from reads
shortly after writes (where the new write domain had been lost and thus
was not flushed and waited on), and it is a suspected cause of hangs in
some apps when a write domain is lost on a buffer that gets reused for
instruction or command state.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
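
The toy model below illustrates the ordering problem. It assumes that posting a flush clears the write domain of any object the flush covers. The domain bit, structure and function names are hypothetical. The model follows the ordering described above, not the i915 code.

```c
#include <assert.h>

#define DOMAIN_RENDER 0x2u	/* hypothetical write-domain bit */

struct gem_obj_sketch {
	unsigned int write_domain;
};

/* Posting a flush for @domains retires any write domain it covers. */
static void post_flush(struct gem_obj_sketch *obj, unsigned int domains)
{
	if (obj->write_domain & domains)
		obj->write_domain = 0;
}

/* Buggy order: the new write domain is set, then the queued flush wipes it. */
static void set_domain_buggy(struct gem_obj_sketch *obj,
			     unsigned int new_write, unsigned int flush_domains)
{
	if (new_write)
		obj->write_domain = new_write;
	post_flush(obj, flush_domains);
}

/* Fixed order: hold on to the new (or old, if unchanged) value, post the
 * flush, then apply it. */
static void set_domain_fixed(struct gem_obj_sketch *obj,
			     unsigned int new_write, unsigned int flush_domains)
{
	unsigned int pending = new_write ? new_write : obj->write_domain;

	post_flush(obj, flush_domains);
	obj->write_domain = pending;
}
```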

drm/i915: Cut two args to set_to_gpu_domain that confused this tricky path. · 8b0e378a

Eric Anholt authored Feb 19, 2009

While not strictly required, it helped while thinking about the following
change.  This change should be invariant.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>