Commits · 4a29865689dbb87a02e3b0fff4a4ae5041273173 · Kirill Smelkov / linux

06 May, 2011 14 commits

rcu: make rcutorture version numbers available through debugfs · 4a298656

Paul E. McKenney authored Apr 03, 2011

It is not possible to accurately correlate rcutorture output with that
of debugfs. This patch therefore adds a debugfs file that prints out
the rcutorture version number, permitting easy correlation.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

4a298656

rcu: add tracing for RCU's kthread run states. · d71df90e

Paul E. McKenney authored Mar 29, 2011

Add tracing to help debugging situations when RCU's kthreads are not
running but are supposed to be.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

d71df90e

rcu: add callback-queue information to rcudata output · 0ac3d136

Paul E. McKenney authored Mar 28, 2011

This commit adds an indication of the state of the callback queue using
a string of four characters following the "ql=" integer queue length.
The first character is "N" if there are callbacks that have been
queued that are not yet ready to be handled by the next grace period, or
"." otherwise. The second character is "R" if there are callbacks queued
that are ready to be handled by the next grace period, or "." otherwise.
The third character is "W" if there are callbacks waiting for the current
grace period, or "." otherwise. Finally, the fourth character is "D"
if there are callbacks that have been handled by a prior grace period
and are waiting to be invoked, or ".".

Note that callbacks that are in the process of being invoked are
not shown. These callbacks would have been removed from the rcu_data
structure's list by rcu_do_batch() prior to being executed. (These
callbacks are also not reflected in the "ql=" total, FWIW.)

Also, document the new callback-queue trace information.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

0ac3d136

rcu: Update RCU's trace.txt documentation for new format · 2fa218d8

Paul E. McKenney authored Mar 27, 2011

The trace.txt file had obsolete output for the debugfs rcu/rcudata
file, so update it.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

2fa218d8

rcu: Add boosting to TREE_PREEMPT_RCU tracing · 0ea1f2eb

Paul E. McKenney authored Feb 22, 2011

Includes total number of tasks boosted, number boosted on behalf of each
of normal and expedited grace periods, and statistics on attempts to
initiate boosting that failed for various reasons.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

0ea1f2eb

rcu: eliminate unused boosting statistics · 67b98dba

Paul E. McKenney authored Feb 21, 2011

The n_rcu_torture_boost_allocerror and n_rcu_torture_boost_afferror
statistics are not actually incremented anymore, so eliminate them.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

67b98dba

rcu: avoid hammering sched with yet another bound RT kthread · 3acf4a9a

Paul E. McKenney authored Apr 17, 2011

The scheduler does not appear to take kindly to having multiple
real-time threads bound to a CPU that is going offline.  So this
commit is a temporary hack-around to avoid that happening.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

3acf4a9a

rcu: put per-CPU kthread at non-RT priority during CPU hotplug operations · e3995a25

Paul E. McKenney authored Apr 18, 2011

If you are doing CPU hotplug operations, it is best not to have
CPU-bound realtime tasks running CPU-bound on the outgoing CPU.
So this commit makes per-CPU kthreads run at non-realtime priority
during that time.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

e3995a25

rcu: Force per-rcu_node kthreads off of the outgoing CPU · 0f962a5e

Paul E. McKenney authored Apr 14, 2011

The scheduler has had some heartburn in the past when too many real-time
kthreads were affinitied to the outgoing CPU. So, this commit lightens
the load by forcing the per-rcu_node and the boost kthreads off of the
outgoing CPU. Note that RCU's per-CPU kthread remains on the outgoing
CPU until the bitter end, as it must in order to preserve correctness.

Also avoid disabling hardirqs across calls to set_cpus_allowed_ptr(),
given that this function can block.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

0f962a5e

rcu: priority boosting for TREE_PREEMPT_RCU · 27f4d280

Paul E. McKenney authored Feb 07, 2011

Add priority boosting for TREE_PREEMPT_RCU, similar to that for
TINY_PREEMPT_RCU.  This is enabled by the default-off RCU_BOOST
kernel parameter.  The priority to which to boost preempted
RCU readers is controlled by the RCU_BOOST_PRIO kernel parameter
(defaulting to real-time priority 1) and the time to wait before
boosting the readers who are blocking a given grace period is
controlled by the RCU_BOOST_DELAY kernel parameter (defaulting to
500 milliseconds).
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

27f4d280

rcu: move TREE_RCU from softirq to kthread · a26ac245

Paul E. McKenney authored Jan 12, 2011

If RCU priority boosting is to be meaningful, callback invocation must
be boosted in addition to preempted RCU readers. Otherwise, in presence
of CPU real-time threads, the grace period ends, but the callbacks don't
get invoked. If the callbacks don't get invoked, the associated memory
doesn't get freed, so the system is still subject to OOM.

But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit
moves the callback invocations to a kthread, which can be boosted easily.

Also add comments and properly synchronized all accesses to
rcu_cpu_kthread_task, as suggested by Lai Jiangshan.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

a26ac245

rcu: merge TREE_PREEPT_RCU blocked_tasks[] lists · 12f5f524

Paul E. McKenney authored Nov 29, 2010

Combine the current TREE_PREEMPT_RCU ->blocked_tasks[] lists in the
rcu_node structure into a single ->blkd_tasks list with ->gp_tasks
and ->exp_tasks tail pointers.  This is in preparation for RCU priority
boosting, which will add a third dimension to the combinatorial explosion
in the ->blocked_tasks[] case, but simply a third pointer in the new
->blkd_tasks case.

Also update documentation to reflect blocked_tasks[] merge
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

12f5f524

rcu: Decrease memory-barrier usage based on semi-formal proof · e59fb312

Paul E. McKenney authored Sep 07, 2010

Commit d09b62df fixed grace-period synchronization, but left some smp_mb()
invocations in rcu_process_callbacks() that are no longer needed, but
sheer paranoia prevented them from being removed. This commit removes
them and provides a proof of correctness in their absence. It also adds
a memory barrier to rcu_report_qs_rsp() immediately before the update to
rsp->completed in order to handle the theoretical possibility that the
compiler or CPU might move massive quantities of code into a lock-based
critical section. This also proves that the sheer paranoia was not
entirely unjustified, at least from a theoretical point of view.

In addition, the old dyntick-idle synchronization depended on the fact
that grace periods were many milliseconds in duration, so that it could
be assumed that no dyntick-idle CPU could reorder a memory reference
across an entire grace period. Unfortunately for this design, the
addition of expedited grace periods breaks this assumption, which has
the unfortunate side-effect of requiring atomic operations in the
functions that track dyntick-idle state for RCU. (There is some hope
that the algorithms used in user-level RCU might be applied here, but
some work is required to handle the NMIs that user-space applications
can happily ignore. For the short term, better safe than sorry.)

This proof assumes that neither compiler nor CPU will allow a lock
acquisition and release to be reordered, as doing so can result in
deadlock. The proof is as follows:

1. A given CPU declares a quiescent state under the protection of
its leaf rcu_node's lock.

2. If there is more than one level of rcu_node hierarchy, the
last CPU to declare a quiescent state will also acquire the
->lock of the next rcu_node up in the hierarchy, but only
after releasing the lower level's lock. The acquisition of this
lock clearly cannot occur prior to the acquisition of the leaf
node's lock.

3. Step 2 repeats until we reach the root rcu_node structure.
Please note again that only one lock is held at a time through
this process. The acquisition of the root rcu_node's ->lock
must occur after the release of that of the leaf rcu_node.

4. At this point, we set the ->completed field in the rcu_state
structure in rcu_report_qs_rsp(). However, if the rcu_node
hierarchy contains only one rcu_node, then in theory the code
preceding the quiescent state could leak into the critical
section. We therefore precede the update of ->completed with a
memory barrier. All CPUs will therefore agree that any updates
preceding any report of a quiescent state will have happened
before the update of ->completed.

5. Regardless of whether a new grace period is needed, rcu_start_gp()
will propagate the new value of ->completed to all of the leaf
rcu_node structures, under the protection of each rcu_node's ->lock.
If a new grace period is needed immediately, this propagation
will occur in the same critical section that ->completed was
set in, but courtesy of the memory barrier in #4 above, is still
seen to follow any pre-quiescent-state activity.

6. When a given CPU invokes __rcu_process_gp_end(), it becomes
aware of the end of the old grace period and therefore makes
any RCU callbacks that were waiting on that grace period eligible
for invocation.

If this CPU is the same one that detected the end of the grace
period, and if there is but a single rcu_node in the hierarchy,
we will still be in the single critical section. In this case,
the memory barrier in step #4 guarantees that all callbacks will
be seen to execute after each CPU's quiescent state.

On the other hand, if this is a different CPU, it will acquire
the leaf rcu_node's ->lock, and will again be serialized after
each CPU's quiescent state for the old grace period.

On the strength of this proof, this commit therefore removes the memory
barriers from rcu_process_callbacks() and adds one to rcu_report_qs_rsp().
The effect is to reduce the number of memory barriers by one and to
reduce the frequency of execution from about once per scheduling tick
per CPU to once per grace period.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

e59fb312

rcu: Remove conditional compilation for RCU CPU stall warnings · a00e0d71

Paul E. McKenney authored Feb 08, 2011

The RCU CPU stall warnings can now be controlled using the
rcu_cpu_stall_suppress boot-time parameter or via the same parameter
from sysfs. There is therefore no longer any reason to have
kernel config parameters for this feature. This commit therefore
removes the RCU_CPU_STALL_DETECTOR and RCU_CPU_STALL_DETECTOR_RUNNABLE
kernel config parameters. The RCU_CPU_STALL_TIMEOUT parameter remains
to allow the timeout to be tuned and the RCU_CPU_STALL_VERBOSE parameter
remains to allow task-stall information to be suppressed if desired.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

a00e0d71

04 May, 2011 4 commits

Linux 2.6.39-rc6 · 0ee5623f
Linus Torvalds authored May 03, 2011

0ee5623f

Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 · d2af6768

Linus Torvalds authored May 03, 2011

* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm/radeon/kms: fix gart setup on fusion parts (v2)
  drm: Send pending vblank events before disabling vblank.
  drm/radeon: fix regression on atom cards with hardcoded EDID record.
  drm/radeon/kms: add some new pci ids

d2af6768

drm/radeon/kms: fix gart setup on fusion parts (v2) · 8aeb96f8

Alex Deucher authored May 03, 2011

Out of the entire GART/VM subsystem, the hw designers changed
the location of 3 regs.

v2: airlied: add parameter for userspace to work from.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>

8aeb96f8

drm: Send pending vblank events before disabling vblank. · 498548ec

Christopher James Halse Rogers authored Apr 27, 2011

This is the least-bad behaviour.  It means that we signal the
vblank event before it actually happens, but since we're disabling
vblanks there's no guarantee that it will *ever* happen otherwise.

This prevents GL applications which use WaitMSC from hanging
indefinitely.
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

498548ec

03 May, 2011 8 commits

drm/radeon: fix regression on atom cards with hardcoded EDID record. · eaa4f5e1

Dave Airlie authored May 01, 2011

Since fafcf94e introduced an edid size, it seems to have broken this path.

This manifest as oops on T500 Lenovo laptops with dual graphics primarily.

Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=33812

cc: stable@kernel.org
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

eaa4f5e1

drm/radeon/kms: add some new pci ids · e2c85d8e

Alex Deucher authored May 03, 2011

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>

e2c85d8e

logfs: initialize superblock entries earlier · cce2c56e

Linus Torvalds authored May 03, 2011

In particular, s_freeing_list needs to be initialized early, since it is
used on some of the error paths when mounts fail.  The mapping inode,
for example, would be initialized and then free'd on an error path
before s_freeing_list was initialized, but the inode drop operation
needs the s_freeing_list to be set up.

Normally you'd never see this, because not only is logfs fairly rare,
but a successful mount will never have any issues.
Reported-by: werner <w.landgraf@ru.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cce2c56e

Merge branch 'stable/bug-fixes-for-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen · 609cfda5

Linus Torvalds authored May 03, 2011

* 'stable/bug-fixes-for-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: mask_rw_pte mark RO all pagetable pages up to pgt_buf_top
  xen/mmu: Add workaround "x86-64, mm: Put early page table high"

609cfda5

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc · 9a6cd4b4

Linus Torvalds authored May 03, 2011

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
  mmc: sdhci: Check mrq != NULL in sdhci_tasklet_finish
  mmc: sdhci: Check mrq->cmd in sdhci_tasklet_finish
  mmc: tmio: fix .set_ios(MMC_POWER_UP) handling
  mmc: fix a race between card-detect rescan and clock-gate work instances
  mmc: omap: Fix possible NULL pointer deref
  mmc: core: mmc_add_card(): fix missing break in switch statement
  mmc: sdhci-pci: Fix error case in sdhci_pci_probe_slot()

9a6cd4b4

Merge branches 'x86-fixes-for-linus' and 'irq-fixes-for-linus' of... · bab0dcc7

Linus Torvalds authored May 03, 2011

Merge branches 'x86-fixes-for-linus' and 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, reboot: Fix relocations in reboot_32.S
  x86, NUMA: Fix empty memblk detection in numa_cleanup_meminfo()
  x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processors

* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  genirq: Fix typo CONFIG_GENIRC_IRQ_SHOW_LEVEL

bab0dcc7

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 497ff034

Linus Torvalds authored May 02, 2011

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: wm831x-ts - move BTN_TOUCH reporting to data transfer
  Input: wm831x-ts - allow IRQ flags to be specified
  Input: wm831x-ts - fix races with IRQ management

497ff034

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 · 5933f2ae

Linus Torvalds authored May 02, 2011

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits)
  sysctl: net: call unregister_net_sysctl_table where needed
  Revert: veth: remove unneeded ifname code from veth_newlink()
  smsc95xx: fix reset check
  tg3: Fix failure to enable WoL by default when possible
  networking: inappropriate ioctl operation should return ENOTTY
  amd8111e: trivial typo spelling: Negotitate -> Negotiate
  ipv4: don't spam dmesg with "Using LC-trie" messages
  af_unix: Only allow recv on connected seqpacket sockets.
  mii: add support of pause frames in mii_get_an
  net: ftmac100: fix scheduling while atomic during PHY link status change
  usbnet: Transfer of maintainership
  usbnet: add support for some Huawei modems with cdc-ether ports
  bnx2: cancel timer on device removal
  iwl4965: fix "Received BA when not expected"
  iwlagn: fix "Received BA when not expected"
  dsa/mv88e6131: fix unknown multicast/broadcast forwarding on mv88e6085
  usbnet: Resubmit interrupt URB if device is open
  iwl4965: fix "TX Power requested while scanning"
  iwlegacy: led stay solid on when no traffic
  b43: trivial: update module info about ucode16_mimo firmware
  ...

5933f2ae

02 May, 2011 14 commits

sysctl: net: call unregister_net_sysctl_table where needed · ff538818

Lucian Adrian Grijincu authored May 01, 2011

ctl_table_headers registered with register_net_sysctl_table should
have been unregistered with the equivalent unregister_net_sysctl_table
Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ff538818

Revert: veth: remove unneeded ifname code from veth_newlink() · 6c8c4446

Jiri Pirko authored Apr 30, 2011

84c49d8c ("veth: remove unneeded
ifname code from veth_newlink()") caused regression on veth
creation. This patch reverts the original one.
Reported-by: Michał Mirosław <mirqus@gmail.com>
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6c8c4446

smsc95xx: fix reset check · d9460920

Rabin Vincent authored Apr 30, 2011

The reset loop check should check the MII_BMCR register value for
BMCR_RESET rather than for MII_BMCR (the register address, which also
happens to be zero).
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: David S. Miller <davem@davemloft.net>

d9460920

tg3: Fix failure to enable WoL by default when possible · 6fdbab9d

Rafael J. Wysocki authored Apr 28, 2011

tg3 is supposed to enable WoL by default on adapters which support
that, but it fails to do so unless the adapter's
/sys/devices/.../power/wakeup file contains 'enabled' during the
initialization of the adapter.  Fix that by making tg3 use
device_set_wakeup_enable() to enable wakeup automatically whenever
WoL should be enabled by default.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

6fdbab9d

networking: inappropriate ioctl operation should return ENOTTY · 41c31f31

Lifeng Sun authored Apr 27, 2011

ioctl() calls against a socket with an inappropriate ioctl operation
are incorrectly returning EINVAL rather than ENOTTY:

  [ENOTTY]
      Inappropriate I/O control operation.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=33992Signed-off-by: Lifeng Sun <lifongsun@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

41c31f31

x86, reboot: Fix relocations in reboot_32.S · 7806a49a

H. Peter Anvin authored May 02, 2011

The use of base for %ebx in this file is arbitrary, *except* that we
also use it to compute the real-mode segment.  Therefore, make it so
that r_base really is the true address to which %ebx points.

This resolves kernel bugzilla 33302.
Reported-and-tested-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/n/tip-08os5wi3yq1no0y4i5m4z7he@git.kernel.org

7806a49a

amd8111e: trivial typo spelling: Negotitate -> Negotiate · 983960b1
Joe Perches authored May 02, 2011
```
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
```
983960b1

xen: mask_rw_pte mark RO all pagetable pages up to pgt_buf_top · b9269dc7

Stefano Stabellini authored Apr 12, 2011

mask_rw_pte is currently checking if a pfn is a pagetable page if it
falls in the range pgt_buf_start - pgt_buf_end but that is incorrect
because pgt_buf_end is a moving target: pgt_buf_top is the real
boundary.
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

b9269dc7

xen/mmu: Add workaround "x86-64, mm: Put early page table high" · a3864783

Konrad Rzeszutek Wilk authored Apr 29, 2011

As a consequence of the commit:

commit 4b239f45
Author: Yinghai Lu <yinghai@kernel.org>
Date:   Fri Dec 17 16:58:28 2010 -0800

    x86-64, mm: Put early page table high

it causes the Linux kernel to crash under Xen:

mapping kernel into physical memory
Xen: setup ISA identity maps
about to get started...
(XEN) mm.c:2466:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn b1d89 (pfn bacf7)
(XEN) mm.c:3027:d0 Error while pinning mfn b1d89
(XEN) traps.c:481:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
...

The reason is that at some point init_memory_mapping is going to reach
the pagetable pages area and map those pages too (mapping them as normal
memory that falls in the range of addresses passed to init_memory_mapping
as argument). Some of those pages are already pagetable pages (they are
in the range pgt_buf_start-pgt_buf_end) therefore they are going to be
mapped RO and everything is fine.
Some of these pages are not pagetable pages yet (they fall in the range
pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they
are going to be mapped RW.  When these pages become pagetable pages and
are hooked into the pagetable, xen will find that the guest has already
a RW mapping of them somewhere and fail the operation.
The reason Xen requires pagetables to be RO is that the hypervisor needs
to verify that the pagetables are valid before using them. The validation
operations are called "pinning" (more details in arch/x86/xen/mmu.c).

In order to fix the issue we mark all the pages in the entire range
pgt_buf_start-pgt_buf_top as RO, however when the pagetable allocation
is completed only the range pgt_buf_start-pgt_buf_end is reserved by
init_memory_mapping. Hence the kernel is going to crash as soon as one
of the pages in the range pgt_buf_end-pgt_buf_top is reused (b/c those
ranges are RO).

For this reason, this function is introduced which is called _after_
the init_memory_mapping has completed (in a perfect world we would
call this function from init_memory_mapping, but lets ignore that).

Because we are called _after_ init_memory_mapping the pgt_buf_[start,
end,top] have all changed to new values (b/c another init_memory_mapping
is called). Hence, the first time we enter this function, we save
away the pgt_buf_start value and update the pgt_buf_[end,top].

When we detect that the "old" pgt_buf_start through pgt_buf_end
PFNs have been reserved (so memblock_x86_reserve_range has been called),
we immediately set out to RW the "old" pgt_buf_end through pgt_buf_top.

And then we update those "old" pgt_buf_[end|top] with the new ones
so that we can redo this on the next pagetable.
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Reviewed-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
[v1: Updated with Jeremy's comments]
[v2: Added the crash output]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

a3864783

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 · badb0295
David S. Miller authored May 02, 2011

badb0295

Merge branch 'for-linus' of git://git.infradead.org/ubifs-2.6 · adadfe48

Linus Torvalds authored May 02, 2011

* 'for-linus' of git://git.infradead.org/ubifs-2.6:
  UBIFS: seek journal heads to the latest bud in replay
  UBIFS: do not free write-buffers when in R/O mode

adadfe48

Merge branch 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm · 625a3b60

Linus Torvalds authored May 02, 2011

* 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm: (47 commits)
  CLKDEV: Fix clkdev return value for NULL clk case
  ARM: 6891/1: prevent heap corruption in OABI semtimedop
  ARM: kprobes: Tidy-up kprobes-decode.c
  ARM: kprobes: Add emulation of hint instructions like NOP and WFI
  ARM: kprobes: Add emulation of SBFX, UBFX, BFI and BFC instructions
  ARM: kprobes: Add emulation of MOVW and MOVT instructions
  ARM: kprobes: Reject probing of undefined data processing instructions
  ARM: kprobes: Remove redundant code in space_1111
  ARM: kprobes: Fix emulation of PLD instructions
  ARM: kprobes: Reject probing of SETEND instructions
  ARM: kprobes: Consolidate stub decoding functions
  ARM: kprobes: Reject probing of all coprocessor instructions
  ARM: kprobes: Fix emulation of USAD8 instructions
  ARM: kprobes: Fix emulation of SMUAD, SMUSD and SMMUL instructions
  ARM: kprobes: Fix emulation of SXTB16, SXTB, SXTH, UXTB16, UXTB and UXTH instructions
  ARM: kprobes: Reject probing of undefined media instructions
  ARM: kprobes: Add emulation of RBIT instruction
  ARM: kprobes: Reject probing of LDRB instructions which load PC
  ARM: kprobes: Fix emulation of LDRD and STRD instructions
  ARM: kprobes: Reject probing of LDR/STR instructions which update PC unpredictably
  ...

625a3b60

genirq: Fix typo CONFIG_GENIRC_IRQ_SHOW_LEVEL · 94b2c363

Geert Uytterhoeven authored Apr 30, 2011

commit ab7798ff ("genirq: Expand generic
show_interrupts()") added the Kconfig option GENERIC_IRQ_SHOW_LEVEL to
accomodate PowerPC, but this doesn't actually enable the functionality due
to a typo in the #ifdef check.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Linux/PPC Development <linuxppc-dev@lists.ozlabs.org>
Link: http://lkml.kernel.org/r/%3Calpine.DEB.2.00.1104302251370.19068%40ayla.of.borg%3ESigned-off-by: Thomas Gleixner <tglx@linutronix.de>

94b2c363

UBIFS: seek journal heads to the latest bud in replay · 52c6e6f9

Artem Bityutskiy authored Apr 25, 2011

This is the second fix of the following symptom:

UBIFS error (pid 34456): could not find an empty LEB

which sometimes happens after power cuts when we mount the file-system - UBIFS
refuses it with the above error message which comes from the
'ubifs_rcvry_gc_commit()' function. I can reproduce this using the integck test
with the UBIFS power cut emulation enabled.

Analysis of the problem.

Currently UBIFS replay seeks the journal heads to the last _replayed_ bud.
But the buds are replayed out-of-order, so the replay basically seeks journal
heads to the "random" bud belonging to this head, and not to the _last_ one.

The result of this is that the GC head may be seeked to a full LEB with no free
space, or very little free space. And 'ubifs_rcvry_gc_commit()' tries to find a
fully or mostly dirty LEB to match the current GC head (because we need to
garbage-collect that dirty LEB at one go, because we do not have @c->gc_lnum).
So 'ubifs_find_dirty_leb()' fails and we fall back to finding an empty LEB and
also fail. As a result - recovery fails and mounting fails.

This patch teaches the replay to initialize the GC heads exactly to the latest
buds, i.e. the buds which have the largest sequence number in corresponding
log reference nodes.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: stable@kernel.org

52c6e6f9