Commits · dac5e7849034825665bb114dc554e35cae449f49 · Kirill Smelkov / linux

01 Apr, 2010 24 commits

efifb: fix framebuffer handoff · dac5e784

Marcin Slusarz authored Feb 22, 2010

commit 89f3f219 upstream.

Commit 4410f391 ("fbdev: add support for
handoff from firmware to hw framebuffers") didn't add fb_destroy
operation to efifb.  Fix it and change aperture_size to match size
passed to request_mem_region.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=15151Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Reported-by: Alex Zhavnerchik <alex.vizor@gmail.com>
Tested-by: Alex Zhavnerchik <alex.vizor@gmail.com>
Acked-by: Peter Jones <pjones@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

dac5e784

drm/edid: Unify detailed block parsing between base and extension blocks · 1ba878ba

Adam Jackson authored Dec 03, 2009

commit 9cf00977 upstream.

Also fix an embarassing bug in standard timing subblock parsing that
would result in an infinite loop.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

1ba878ba

PCI: unconditionally clear AER uncorr status register during cleanup · 1ebf19e8

Andrew Patterson authored Dec 03, 2009

commit 6cdfd995 upstream.

The current implementation of pci_cleanup_aer_uncorrect_error_status
only clears either fatal or non-fatal error status bits depending
on the state of the I/O channel. This implementation will then often
leave some bits set after PCI error recovery completes.  The uncleared bit
settings will then be falsely reported the next time an AER interrupt is
generated for that hierarchy. An easy way to illustrate this issue is to
use the aer-inject module to simultaneously inject both an uncorrectable
non-fatal and uncorrectable fatal error.  One of the errors will not be
cleared.

This patch resolves this issue by unconditionally clearing all bits in
the AER uncorrectable status register. All settings and corrective action
strategies are saved and determined before
pci_cleanup_aer_uncorrect_error_status is called, so this change should not
affect errory handling functionality.
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Alex Chiang <achiang@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

1ebf19e8

tracing: Do not record user stack trace from NMI context · 80736c50

Steven Rostedt authored Mar 12, 2010

commit b6345879 upstream.

A bug was found with Li Zefan's ftrace_stress_test that caused applications
to segfault during the test.

Placing a tracing_off() in the segfault code, and examining several
traces, I found that the following was always the case. The lock tracer
was enabled (lockdep being required) and userstack was enabled. Testing
this out, I just enabled the two, but that was not good enough. I needed
to run something else that could trigger it. Running a load like hackbench
did not work, but executing a new program would. The following would
trigger the segfault within seconds:

  # echo 1 > /debug/tracing/options/userstacktrace
  # echo 1 > /debug/tracing/events/lock/enable
  # while :; do ls > /dev/null ; done

Enabling the function graph tracer and looking at what was happening
I finally noticed that all cashes happened just after an NMI.

 1)               |    copy_user_handle_tail() {
 1)               |      bad_area_nosemaphore() {
 1)               |        __bad_area_nosemaphore() {
 1)               |          no_context() {
 1)               |            fixup_exception() {
 1)   0.319 us    |              search_exception_tables();
 1)   0.873 us    |            }
[...]
 1)   0.314 us    |  __rcu_read_unlock();
 1)   0.325 us    |    native_apic_mem_write();
 1)   0.943 us    |  }
 1)   0.304 us    |  rcu_nmi_exit();
[...]
 1)   0.479 us    |  find_vma();
 1)               |  bad_area() {
 1)               |    __bad_area() {

After capturing several traces of failures, all of them happened
after an NMI. Curious about this, I added a trace_printk() to the NMI
handler to read the regs->ip to see where the NMI happened. In which I
found out it was here:

ffffffff8135b660 <page_fault>:
ffffffff8135b660:       48 83 ec 78             sub    $0x78,%rsp
ffffffff8135b664:       e8 97 01 00 00          callq  ffffffff8135b800 <error_entry>

What was happening is that the NMI would happen at the place that a page
fault occurred. It would call rcu_read_lock() which was traced by
the lock events, and the user_stack_trace would run. This would trigger
a page fault inside the NMI. I do not see where the CR2 register is
saved or restored in NMI handling. This means that it would corrupt
the page fault handling that the NMI interrupted.

The reason the while loop of ls helped trigger the bug, was that
each execution of ls would cause lots of pages to be faulted in, and
increase the chances of the race happening.

The simple solution is to not allow user stack traces in NMI context.
After this patch, I ran the above "ls" test for a couple of hours
without any issues. Without this patch, the bug would trigger in less
than a minute.
Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

80736c50

tracing: Disable buffer switching when starting or stopping trace · 95b9725f

Steven Rostedt authored Mar 12, 2010

commit a2f80714 upstream.

When the trace iterator is read, tracing_start() and tracing_stop()
is called to stop tracing while the iterator is processing the trace
output.

These functions disable both the standard buffer and the max latency
buffer. But if the wakeup tracer is running, it can switch these
buffers between the two disables:

  buffer = global_trace.buffer;
  if (buffer)
      ring_buffer_record_disable(buffer);

      <<<--------- swap happens here

  buffer = max_tr.buffer;
  if (buffer)
      ring_buffer_record_disable(buffer);

What happens is that we disabled the same buffer twice. On tracing_start()
we can enable the same buffer twice. All ring_buffer_record_disable()
must be matched with a ring_buffer_record_enable() or the buffer
can be disable permanently, or enable prematurely, and cause a bug
where a reset happens while a trace is commiting.

This patch protects these two by taking the ftrace_max_lock to prevent
a switch from occurring.

Found with Li Zefan's ftrace_stress_test.
Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

95b9725f

tracing: Use same local variable when resetting the ring buffer · c087612a

Steven Rostedt authored Mar 12, 2010

commit 283740c6 upstream.

In the ftrace code that resets the ring buffer it references the
buffer with a local variable, but then uses the tr->buffer as the
parameter to reset. If the wakeup tracer is running, which can
switch the tr->buffer with the max saved buffer, this can break
the requirement of disabling the buffer before the reset.

   buffer = tr->buffer;
   ring_buffer_record_disable(buffer);
   synchronize_sched();
   __tracing_reset(tr->buffer, cpu);

If the tr->buffer is swapped, then the reset is not happening to the
buffer that was disabled. This will cause the ring buffer to fail.

Found with Li Zefan's ftrace_stress_test.
Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

c087612a

Bluetooth: Fix sleeping function in RFCOMM within invalid context · 1020738f

Marcel Holtmann authored Feb 03, 2010

commit 485f1eff upstream.

With the commit 9e726b17 the
rfcomm_session_put() gets accidentially called from a timeout
callback and results in this:

BUG: sleeping function called from invalid context at net/core/sock.c:1897
in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper
Pid: 0, comm: swapper Tainted: P           2.6.32 #31
Call Trace:
 <IRQ>  [<ffffffff81036455>] __might_sleep+0xf8/0xfa
 [<ffffffff8138ef1d>] lock_sock_nested+0x29/0xc4
 [<ffffffffa03921b3>] lock_sock+0xb/0xd [l2cap]
 [<ffffffffa03948e6>] l2cap_sock_shutdown+0x1c/0x76 [l2cap]
 [<ffffffff8106adea>] ? clockevents_program_event+0x75/0x7e
 [<ffffffff8106bea2>] ? tick_dev_program_event+0x37/0xa5
 [<ffffffffa0394967>] l2cap_sock_release+0x27/0x67 [l2cap]
 [<ffffffff8138c971>] sock_release+0x1a/0x67
 [<ffffffffa03d2492>] rfcomm_session_del+0x34/0x53 [rfcomm]
 [<ffffffffa03d24c5>] rfcomm_session_put+0x14/0x16 [rfcomm]
 [<ffffffffa03d28b4>] rfcomm_session_timeout+0xe/0x1a [rfcomm]
 [<ffffffff810554a8>] run_timer_softirq+0x1e2/0x29a
 [<ffffffffa03d28a6>] ? rfcomm_session_timeout+0x0/0x1a [rfcomm]
 [<ffffffff8104e0f6>] __do_softirq+0xfe/0x1c5
 [<ffffffff8100e8ce>] ? timer_interrupt+0x1a/0x21
 [<ffffffff8100cc4c>] call_softirq+0x1c/0x28
 [<ffffffff8100e05b>] do_softirq+0x33/0x6b
 [<ffffffff8104daf6>] irq_exit+0x36/0x85
 [<ffffffff8100d7a9>] do_IRQ+0xa6/0xbd
 [<ffffffff8100c493>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff812585b3>] ? acpi_idle_enter_bm+0x269/0x294
 [<ffffffff812585a9>] ? acpi_idle_enter_bm+0x25f/0x294
 [<ffffffff81373ddc>] ? cpuidle_idle_call+0x97/0x107
 [<ffffffff8100aca0>] ? cpu_idle+0x53/0xaa
 [<ffffffff81429006>] ? rest_init+0x7a/0x7c
 [<ffffffff8177bc8c>] ? start_kernel+0x389/0x394
 [<ffffffff8177b29c>] ? x86_64_start_reservations+0xac/0xb0
 [<ffffffff8177b384>] ? x86_64_start_kernel+0xe4/0xeb

To fix this, the rfcomm_session_put() needs to be moved out of
rfcomm_session_timeout() into rfcomm_process_sessions(). In that
context it is perfectly fine to sleep and disconnect the socket.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Tested-by: David John <davidjon@xenontk.org>
Cc: Chase Douglas <chase.douglas@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

1020738f

function-graph: Init curr_ret_stack with ret_stack · 090fe6cb

Steven Rostedt authored Mar 12, 2010

commit ea14eb71 upstream.

If the graph tracer is active, and a task is forked but the allocating of
the processes graph stack fails, it can cause crash later on.

This is due to the temporary stack being NULL, but the curr_ret_stack
variable is copied from the parent. If it is not -1, then in
ftrace_graph_probe_sched_switch() the following:

	for (index = next->curr_ret_stack; index >= 0; index--)
		next->ret_stack[index].calltime += timestamp;

Will cause a kernel OOPS.

Found with Li Zefan's ftrace_stress_test.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

090fe6cb

ring-buffer: Move disabled check into preempt disable section · 9dfa691c

Lai Jiangshan authored Mar 08, 2010

commit 52fbe9cd upstream.

The ring buffer resizing and resetting relies on a schedule RCU
action. The buffers are disabled, a synchronize_sched() is called
and then the resize or reset takes place.

But this only works if the disabling of the buffers are within the
preempt disabled section, otherwise a window exists that the buffers
can be written to while a reset or resize takes place.
Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4B949E43.2010906@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

9dfa691c

ath5k: fix setup for CAB queue · a0d8ffd6

Bob Copeland authored Jan 20, 2010

commit a951ae21 upstream.

The beacon sent gating doesn't seem to work with any combination
of flags.  Thus, buffered frames tend to stay buffered forever,
using up tx descriptors.

Instead, use the DBA gating and hold transmission of the buffered
frames until 80% of the beacon interval has elapsed using the ready
time.  This fixes the following error in AP mode:

   ath5k phy0: no further txbuf available, dropping packet

Add a comment to acknowledge that this isn't the best solution.
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Acked-by: Nick Kossifidis <mickflemm@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

a0d8ffd6

ath5k: dont use external sleep clock in AP mode · d69b8a11

Bob Copeland authored Mar 11, 2010

commit 5d6ce628 upstream

When using the external sleep clock in AP mode, the
TSF increments too quickly, causing beacon interval
to be much lower than it is supposed to be, resulting
in lots of beacon-not-ready interrupts.

This fixes http://bugzilla.kernel.org/show_bug.cgi?id=14802.
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Acked-by: Nick Kossifidis <mickflemm@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Cc: Luis Rodriguez <lrodriguez@atheros.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

d69b8a11

i2c-i801: Don't use the block buffer for I2C block writes · cde4ff84

Jean Delvare authored Mar 13, 2010

commit c074c39d upstream.

Experience has shown that the block buffer can only be used for SMBus
(not I2C) block transactions, even though the datasheet doesn't
mention this limitation.
Reported-by: Felix Rubinstein <felixru@gmail.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Oleg Ryjkov <oryjkov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

cde4ff84

Input: i8042 - add ALDI/MEDION netbook E1222 to qurik reset table · 93c676ec

Christoph Fritz authored Mar 13, 2010

commit 31968ecf upstream.

ALDI/MEDION netbook E1222 needs to be in the reset quirk list for
its touchpad's proper function.
Reported-by: Michael Fischer <mifi@gmx.de>
Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

93c676ec

Input: alps - add support for the touchpad on Toshiba Tecra A11-11L · e2d83be6

Thomas Bächler authored Mar 09, 2010

commit eb8bff85 upstream.
Signed-off-by: Thomas Bächler <thomas@archlinux.org>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

e2d83be6

timekeeping: Prevent oops when GENERIC_TIME=n · 74b17607

john stultz authored Mar 01, 2010

commit ad6759fb upstream.

Aaro Koskinen reported an issue in kernel.org bugzilla #15366, where
on non-GENERIC_TIME systems, accessing
/sys/devices/system/clocksource/clocksource0/current_clocksource
results in an oops.

It seems the timekeeper/clocksource rework missed initializing the
curr_clocksource value in the !GENERIC_TIME case.

Thanks to Aaro for reporting and diagnosing the issue as well as
testing the fix!
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
LKML-Reference: <1267475683.4216.61.camel@localhost.localdomain>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

74b17607

ALSA: hda: enable MSI for Gateway M-6866 · bc176c0d

Daniel T Chen authored Mar 15, 2010

BugLink: https://bugs.launchpad.net/bugs/538918

The OR has verified that explicitly enabling MSI is necessary for audio
to be audible by default.

This patch is only applicable to 2.6.32.y; in 2.6.33, MSI is enabled by
default.
Reported-by: Sam Townsend <stownsend42@sbcglobal.net>
Tested-by: Sam Townsend <stownsend42@sbcglobal.net>
Acked-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Daniel T Chen <crimsun@ubuntu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

bc176c0d

ALSA: hda - Fix input source elements of secondary ADCs on Realtek · 35d14dd0

Takashi Iwai authored Mar 08, 2010

commit 5311114d upstream.

Since alc_auto_create_input_ctls() doesn't set the elements for the
secondary ADCs, "Input Source" elemtns for these also get empty, resulting
in buggy outputs of alsactl like:
	control.14 {
		comment.access 'read write'
		comment.type ENUMERATED
		comment.count 1
		iface MIXER
		name 'Input Source'
		index 1
		value 0
	}

This patch fixes alc_mux_enum_*() (and others) to fall back to the
first entry if the secondary input mux is empty.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

35d14dd0

tg3: Fix 5906 transmit hangs · 9d3ba2e2

Matt Carlson authored Mar 12, 2010

This is a resubmit backport of commit 92c6b8d1
to kernel version 2.6.32. The gentoo bug report can be found at
https://bugs.gentoo.org/show_bug.cgi?id=301091. Thanks to Matt Carlson for his
assistance and working me to fix a regression caused by the initial patch.  The
original description is as follows:

The 5906 has trouble with fragments that are less than 8 bytes in size.  This
patch works around the problem by pivoting the 5906's transmit routine to
tg3_start_xmit_dma_bug() and introducing a new SHORT_DMA_BUG flag that enables
code to detect and react to the problematic condition.
Signed-off-by: Mike Pagano <mpagano@gentoo.org>
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

9d3ba2e2

tg3: Fix tg3_poll_controller() passing wrong pointer to tg3_interrupt() · 1ab72755

Louis Rilling authored Mar 09, 2010

commit fe234f0e upstream.

Commit 09943a18
	Author: Matt Carlson <mcarlson@broadcom.com>
	Date:   Fri Aug 28 14:01:57 2009 +0000

	tg3: Convert ISR parameter to tnapi

forgot to update tg3_poll_controller(), leading to intermittent crashes with
netpoll.

Fix this.
Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

1ab72755

MIPS: Cleanup forgotten label_module_alloc in tlbex.c · 16dc6f06

David Daney authored Dec 03, 2009

commit abbdc3d8 upstream.

commit c8af165342e83a4eb078c9607d29a7c399d30a53 (lmo) rsp.
e0cc87f5 (kernel.org) left
label_module_alloc unused.  Remove it now.
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Cc: linux-mips@linux-mips.org
Patchwork: http://patchwork.linux-mips.org/patch/752/Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

16dc6f06

ARM: Fix decompressor's kernel size estimation for ROM=y · 5f1cbd20

Russell King authored Feb 25, 2010

commit 98e12b5a upstream.

Commit 2552fc27 changed the way the decompressor decides if it is safe
to decompress the kernel directly to its final location.  Unfortunately,
it took the top of the compressed data as being the stack pointer,
which it is for ROM=n cases.  However, for ROM=y, the stack pointer
is not relevant, and results in the wrong answer.

Fix this by explicitly storing the end of the biggybacked data in the
decompressor, and use that to calculate the compressed image size.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

5f1cbd20

decompress: fix new decompressor for PIC · e99a084b

Russell King authored Mar 10, 2010

commit 5ceaa2f3 upstream.

The ARM kernel decompressor wants to be able to relocate r/w data
independently from the rest of the image, and we do this by ensuring that
r/w data has global visibility.  Define STATIC_RW_DATA to be empty to
achieve this.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Alain Knaff <alain@knaff.lu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

e99a084b

drivers/scsi/ses.c: eliminate double free · b4c8997c

Julia Lawall authored Mar 10, 2010

commit 9b3a6549 upstream.

The few lines below the kfree of hdr_buf may go to the label err_free
which will also free hdr_buf.  The most straightforward solution seems to
be to just move the kfree of hdr_buf after these gotos.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@r@
identifier E;
expression E1;
iterator I;
statement S;
@@

*kfree(E);
... when != E = E1
    when != I(E,...) S
    when != &E
*kfree(E);
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

b4c8997c

drm/i915: fix gpio register detection logic for BIOS without VBT · f0c63934

Shaohua Li authored Nov 18, 2009

commit 29874f44 upstream.

if no VBT is present, crt_ddc_bus will be left at 0, and cause us
to use that for the GPIO register offset. That's never a valid register
offset, so let the "undefined" value be 0 instead of -1.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
[anholt: clarified the commit message a bit]
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

f0c63934

15 Mar, 2010 16 commits

Linux 2.6.32.10 · dd49f626
Greg Kroah-Hartman authored Mar 15, 2010

dd49f626

x86, mm: Allow highmem user page tables to be disabled at boot time · 1942aeab

Ian Campbell authored Feb 17, 2010

commit 14315592 upstream.

Distros generally (I looked at Debian, RHEL5 and SLES11) seem to
enable CONFIG_HIGHPTE for any x86 configuration which has highmem
enabled. This means that the overhead applies even to machines which
have a fairly modest amount of high memory and which therefore do not
really benefit from allocating PTEs in high memory but still pay the
price of the additional mapping operations.

Running kernbench on a 4G box I found that with CONFIG_HIGHPTE=y but
no actual highptes being allocated there was a reduction in system
time used from 59.737s to 55.9s.

With CONFIG_HIGHPTE=y and highmem PTEs being allocated:
  Average Optimal load -j 4 Run (std deviation):
  Elapsed Time 175.396 (0.238914)
  User Time 515.983 (5.85019)
  System Time 59.737 (1.26727)
  Percent CPU 263.8 (71.6796)
  Context Switches 39989.7 (4672.64)
  Sleeps 42617.7 (246.307)

With CONFIG_HIGHPTE=y but with no highmem PTEs being allocated:
  Average Optimal load -j 4 Run (std deviation):
  Elapsed Time 174.278 (0.831968)
  User Time 515.659 (6.07012)
  System Time 55.9 (1.07799)
  Percent CPU 263.8 (71.266)
  Context Switches 39929.6 (4485.13)
  Sleeps 42583.7 (373.039)

This patch allows the user to control the allocation of PTEs in
highmem from the command line ("userpte=nohigh") but retains the
status-quo as the default.

It is possible that some simple heuristic could be developed which
allows auto-tuning of this option however I don't have a sufficiently
large machine available to me to perform any particularly meaningful
experiments. We could probably handwave up an argument for a threshold
at 16G of total RAM.

Assuming 768M of lowmem we have 196608 potential lowmem PTE
pages. Each page can map 2M of RAM in a PAE-enabled configuration,
meaning a maximum of 384G of RAM could potentially be mapped using
lowmem PTEs.

Even allowing generous factor of 10 to account for other required
lowmem allocations, generous slop to account for page sharing (which
reduces the total amount of RAM mappable by a given number of PT
pages) and other innacuracies in the estimations it would seem that
even a 32G machine would not have a particularly pressing need for
highmem PTEs. I think 32G could be considered to be at the upper bound
of what might be sensible on a 32 bit machine (although I think in
practice 64G is still supported).

It's seems questionable if HIGHPTE is even a win for any amount of RAM
you would sensibly run a 32 bit kernel on rather than going 64 bit.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
LKML-Reference: <1266403090-20162-1-git-send-email-ian.campbell@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

1942aeab

sched: Don't use possibly stale sched_class · 04833a6a

Thomas Gleixner authored Feb 17, 2010

commit 83ab0aa0 upstream.

setscheduler() saves task->sched_class outside of the rq->lock held
region for a check after the setscheduler changes have become
effective. That might result in checking a stale value.

rtmutex_setprio() has the same problem, though it is protected by
p->pi_lock against setscheduler(), but for correctness sake (and to
avoid bad examples) it needs to be fixed as well.

Retrieve task->sched_class inside of the rq->lock held region.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

04833a6a

sched: Fix SMT scheduler regression in find_busiest_queue() · 76d07136

Suresh Siddha authored Feb 12, 2010

commit 9000f05c upstream.

Fix a SMT scheduler performance regression that is leading to a scenario
where SMT threads in one core are completely idle while both the SMT threads
in another core (on the same socket) are busy.

This is caused by this commit (with the problematic code highlighted)

   commit bdb94aa5
   Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
   Date:   Tue Sep 1 10:34:38 2009 +0200

   sched: Try to deal with low capacity

   @@ -4203,15 +4223,18 @@ find_busiest_queue()
   ...
	for_each_cpu(i, sched_group_cpus(group)) {
   +	unsigned long power = power_of(i);

   ...

   -	wl = weighted_cpuload(i);
   +	wl = weighted_cpuload(i) * SCHED_LOAD_SCALE;
   +	wl /= power;

   -	if (rq->nr_running == 1 && wl > imbalance)
   +	if (capacity && rq->nr_running == 1 && wl > imbalance)
		continue;

On a SMT system, power of the HT logical cpu will be 589 and
the scheduler load imbalance (for scenarios like the one mentioned above)
can be approximately 1024 (SCHED_LOAD_SCALE). The above change of scaling
the weighted load with the power will result in "wl > imbalance" and
ultimately resulting in find_busiest_queue() return NULL, causing
load_balance() to think that the load is well balanced. But infact
one of the tasks can be moved to the idle core for optimal performance.

We don't need to use the weighted load (wl) scaled by the cpu power to
compare with  imabalance. In that condition, we already know there is only a
single task "rq->nr_running == 1" and the comparison between imbalance,
wl is to make sure that we select the correct priority thread which matches
imbalance. So we really need to compare the imabalnce with the original
weighted load of the cpu and not the scaled load.

But in other conditions where we want the most hammered(busiest) cpu, we can
use scaled load to ensure that we consider the cpu power in addition to the
actual load on that cpu, so that we can move the load away from the
guy that is getting most hammered with respect to the actual capacity,
as compared with the rest of the cpu's in that busiest group.

Fix it.
Reported-by: Ma Ling <ling.ma@intel.com>
Initial-Analysis-by: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1266023662.2808.118.camel@sbs-t61.sc.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

76d07136

sched: Fix sched_mv_power_savings for !SMT · d9d97367

Vaidyanathan Srinivasan authored Feb 08, 2010

commit 28f53181 upstream.

Fix for sched_mc_powersavigs for pre-Nehalem platforms.
Child sched domain should clear SD_PREFER_SIBLING if parent will have
SD_POWERSAVINGS_BALANCE because they are contradicting.

Sets the flags correctly based on sched_mc_power_savings.
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100208100555.GD2931@dirshya.in.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

d9d97367

KVM: x86 emulator: Check CPL level during privilege instruction emulation · 0e352d47

Gleb Natapov authored Feb 10, 2010

commit e92805ac upstream.

Add CPL checking in case emulator is tricked into emulating
privilege instruction from userspace.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

0e352d47

KVM: x86 emulator: Add group9 instruction decoding · 9a93be43

Gleb Natapov authored Feb 10, 2010

commit 60a29d4e upstream.

Use groups mechanism to decode 0F C7 instructions.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

9a93be43

KVM: x86 emulator: Forbid modifying CS segment register by mov instruction · 3e338f53

Gleb Natapov authored Feb 18, 2010

commit 8b9f4414 upstream.

Inject #UD if guest attempts to do so. This is in accordance to Intel
SDM.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

3e338f53

KVM: x86 emulator: Add group8 instruction decoding · 89412b0b

Gleb Natapov authored Feb 10, 2010

commit 2db2c2eb upstream.

Use groups mechanism to decode 0F BA instructions.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

89412b0b

dm: free dm_io before bio_endio not after · f23fb645

Mikulas Patocka authored Mar 06, 2010

commit a97f925a upstream.

Free the dm_io structure before calling bio_endio() instead of after it,
to ensure that the io_pool containing it is not referenced after it is
freed.

This partially fixes a problem described here
  https://www.redhat.com/archives/dm-devel/2010-February/msg00109.html

thread 1:
bio_endio(bio, io_error);
/* scheduling happens */
					thread 2:
					close the device
					remove the device
thread 1:
free_io(md, io);

Thread 2, when removing the device, sees non-empty md->io_pool (because the
io hasn't been freed by thread 1 yet) and may crash with BUG in mempool_free.
Thread 1 may also crash, when freeing into a nonexisting mempool.

To fix this we must make sure that bio_endio() is the last call and
the md structure is not accessed afterwards.

There is another bio_endio in process_barrier, but it is called from the thread
and the thread is destroyed prior to freeing the mempools, so this call is
not affected by the bug.

A similar bug exists with module unloads - the module may be unloaded
immediately after bio_endio - but that is more difficult to fix.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

f23fb645

NFS: Fix an allocation-under-spinlock bug · f7081342

Trond Myklebust authored Mar 02, 2010

commit ebed9203 upstream.

sunrpc_cache_update() will always call detail->update() from inside the
detail->hash_lock, so it cannot allocate memory.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

f7081342

rtc-coh901331: fix braces in resume code · b8e5ba11

James Hogan authored Mar 05, 2010

commit 5a98c04d upstream.

The else part of the if statement is indented but does not have braces
around it. It clearly should since it uses clk_enable and clk_disable
which are supposed to balance.
Signed-off-by: James Hogan <james@albanarts.com>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

b8e5ba11

s3cmci: s3cmci_card_present: Use no_detect to decide whether there is a card detect pin · 31ec453d

Lars-Peter Clausen authored Mar 05, 2010

commit dc2ed552 upstream.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: Ben Dooks <ben-linux@fluff.org>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

31ec453d

SUNRPC: Handle EINVAL error returns from the TCP connect operation · c73b9e0e

Trond Myklebust authored Mar 02, 2010

commit 9fcfe0c8 upstream.

This can, for instance, happen if the user specifies a link local IPv6
address.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

c73b9e0e

sunrpc: remove unnecessary svc_xprt_put · 967f9400

Neil Brown authored Feb 27, 2010

commit ab1b18f7 upstream.

The 'struct svc_deferred_req's on the xpt_deferred queue do not
own a reference to the owning xprt.  This is seen in svc_revisit
which is where things are added to this queue.  dr->xprt is set to
NULL and the reference to the xprt it put.

So when this list is cleaned up in svc_delete_xprt, we mustn't
put the reference.

Also, replace the 'for' with a 'while' which is arguably
simpler and more likely to compile efficiently.

Cc: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

967f9400

drm/ttm: handle OOM in ttm_tt_swapout · d234eee1

Maarten Maathuis authored Feb 20, 2010

commit 290e5505 upstream.

- Without this change I get a general protection fault.
- Also use PTR_ERR where applicable.
Signed-off-by: Maarten Maathuis <madman2003@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Acked-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

d234eee1