Commits · 32bd671d6cbeda60dc73be77fa2b9037d9a9bfa0 · Kirill Smelkov / linux

05 Feb, 2009 3 commits

signal: re-add dead task accumulation stats. · 32bd671d

Peter Zijlstra authored Feb 05, 2009

We're going to split the process wide cpu accounting into two parts:

 - clocks; which can take all the time they want since they run
           from user context.

 - timers; which need constant time tracing but can affort the overhead
           because they're default off -- and rare.

The clock readout will go back to a full sum of the thread group, for this
we need to re-add the exit stats that were removed in the initial itimer
rework (f06febc9: timers: fix itimer/many thread hang).

Furthermore, since that full sum can be rather slow for large thread groups
and we have the complete dead task stats, revert the do_notify_parent time
computation.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

32bd671d

Merge branch 'sched/urgent' into timers/urgent · 83895147

Ingo Molnar authored Feb 05, 2009

Merging it here because an upcoming timers/urgent fix relies on
a change already in sched/urgent and not yet upstream.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

83895147

x86: fix hpet timer reinit for x86_64 · a6a95406

Pavel Emelyanov authored Feb 04, 2009

There's a small problem with hpet_rtc_reinit function - it checks
for the:

	hpet_readl(HPET_COUNTER) - hpet_t1_cmp > 0

to continue increasing both the HPET_T1_CMP (register) and the
hpet_t1_cmp (variable).

But since the HPET_COUNTER is always 32-bit, if the hpet_t1_cmp
is 64-bit this condition will always be FALSE once the latter hits
the 32-bit boundary, and we can have a situation, when we don't
increase the HPET_T1_CMP register high enough.

The result - timer stops ticking, since HPET_T1_CMP becomes less,
than the COUNTER and never increased again.

The solution is (based on Linus's suggestion) to not compare 64-bits
(on 64-bit x86), but to do the comparison on 32-bit signed
integers.
Reported-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

a6a95406

04 Feb, 2009 1 commit

sched: fix nohz load balancer on cpu offline · 483b4ee6

Suresh Siddha authored Feb 04, 2009

Christian Borntraeger reports:

> After a logical cpu offline, even on a complete idle system, there
> is one cpu with full ticks. It turns out that nohz.cpu_mask has the
> the offlined cpu still set.
>
> In select_nohz_load_balancer() we check if the system is completely
> idle to turn of load balancing. We compare cpu_online_map with
> nohz.cpu_mask.  Since cpu_online_map is updated on cpu unplug,
> but nohz.cpu_mask is not, the check fails and the scheduler believes
> that we need an "idle load balancer" even on a fully idle system.
> Since the ilb cpu does not deactivate the timer tick this breaks NOHZ.

Fix the select_nohz_load_balancer() to not set the nohz.cpu_mask
while a cpu is going offline.
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

483b4ee6

03 Feb, 2009 12 commits

sched: add missing kernel-doc in sched.h · 35626129

Randy Dunlap authored Feb 02, 2009

Add kernel-doc notation for @lock:

include/linux/sched.h:457: No description found for parameter 'lock'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

35626129

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 · b1792e36

Linus Torvalds authored Feb 02, 2009

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI hotplug: Change link order of pciehp & acpiphp
  PCI hotplug: fakephp: Allocate PCI resources before adding the device
  PCI MSI: Fix undefined shift by 32
  PCI PM: Do not wait for buses in B2 or B3 during resume
  PCI PM: Power up devices before restoring their state
  PCI PM: Fix hibernation breakage on EeePC 701
  PCI: irq and pci_ids patch for Intel Tigerpoint DeviceIDs
  PCI PM: Fix suspend error paths and testing facility breakage

b1792e36

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 · 859281ff

Linus Torvalds authored Feb 02, 2009

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
  slub: fix per cpu kmem_cache_cpu array memory leak
  kmalloc: return NULL instead of link failure

859281ff

Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc · 93bfbd71

Linus Torvalds authored Feb 02, 2009

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  fbdev/atyfb: Fix DSP config on some PowerMacs & PowerBooks
  powerpc: Fix oops on some machines due to incorrect pr_debug()
  powerpc/ps3: Printing fixups for l64 to ll64 convserion drivers/net
  powerpc/5200: update device tree binding documentation
  powerpc/5200: Bugfix for PCI mapping of memory and IMMR
  powerpc/5200: update defconfigs

93bfbd71

Merge branch 'sched-fixes-for-linus' of... · 31c952dc

Linus Torvalds authored Feb 02, 2009

Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched_rt: don't use first_cpu on cpumask created with cpumask_and
  sched: fix buddie group latency
  sched: clear buddies more aggressively
  sched: symmetric sync vs avg_overlap
  sched: fix sync wakeups
  cpuset: fix possible deadlock in async_rebuild_sched_domains

31c952dc

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6 · 9e6235e9

Linus Torvalds authored Feb 02, 2009

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: (45 commits)
  V4L/DVB (10411): s5h1409: Perform s5h1409 soft reset after tuning
  V4L/DVB (10403): saa7134-alsa: saa7130 doesn't support digital audio
  V4L/DVB (10229): ivtv: fix memory leak
  V4L/DVB (10385): gspca - main: Fix memory leak when USB disconnection while streaming.
  V4L/DVB (10325): em28xx: Fix for fail to submit URB with IRQs and Pre-emption Disabled
  V4L/DVB (10317): radio-mr800: fix radio->muted and radio->stereo
  V4L/DVB (10314): cx25840: ignore TUNER_SET_CONFIG in the command callback.
  V4L/DVB (10288): af9015: bug fix: stick does not work always when plugged
  V4L/DVB (10287): af9015: fix second FE
  V4L/DVB (10270): saa7146: fix unbalanced mutex_lock/unlock
  V4L/DVB (10265): budget.c driver: Kernel oops: "BUG: unable to handle kernel paging request at ffffffff
  V4L/DVB (10261): em28xx: fix kernel panic on audio shutdown
  V4L/DVB (10257): em28xx: Fix for KWorld 330U Board
  V4L/DVB (10256): em28xx: Fix for KWorld 330U AC97
  V4L/DVB (10254): em28xx: Fix audio URB transfer buffer race condition
  V4L/DVB (10250): cx25840: fix regression: fw not loaded on first use
  V4L/DVB (10248): v4l-dvb: fix a bunch of compile warnings.
  V4L/DVB (10243): em28xx: fix compile warning
  V4L/DVB (10240): Fix obvious swapped names in v4l2_subdev logic
  V4L/DVB (10233): [PATCH] Terratec Cinergy DT XS Diversity new USB ID (0ccd:0081)
  ...

9e6235e9

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc · 5c350d93

Linus Torvalds authored Feb 02, 2009

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc:
  pxamci: enable DMA for write ops after CMD/RESP
  pxamci: replace #ifdef CONFIG_PXA27x with if (cpu_is_pxa27x())
  ricoh_mmc: Use suspend_late/resume_early
  mmci: Add support for ST Micro derivate
  mmc: Add a MX2/MX3 specific SDHC driver

5c350d93

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 · 017f5178

Linus Torvalds authored Feb 02, 2009

* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6:
  icside: fix PCB version 6 support (v2)
  tx4939ide: typo fix and minor cleanup
  ide: add CS5536 host driver (v3)
  ide: Force VIA IDE legacy interrupts for AmigaOne boards
  IDE: Unregister and disable devices if initialization fails.
  ide: fix ide_register_port() failure handling
  ide: struct device - replace bus_id with dev_name(), dev_set_name()
  ide-cd: fix DMA for non bio-backed requests

017f5178

Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/dvrabel/uwb · 17294ab2

Linus Torvalds authored Feb 02, 2009

* 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/dvrabel/uwb:
  uwb: lock rc->rsvs_lock with spin_lock_bh()
  wusb: timeout when waiting for ASL/PZL updates in whci-hcd
  uwb: remove unused #include <version.h>'s
  wusb: return -ENOTCONN when resetting a port with no connected device
  uwb: safely remove all reservations

17294ab2

Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block · 86adf8ad

Linus Torvalds authored Feb 02, 2009

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: add text file detailing queue/ sysfs files
  bio.h: If they MUST be inlined, then use __always_inline
  Fix misleading comment in bio.h
  block: fix inconsistent parenthesisation of QUEUE_FLAG_DEFAULT
  block: fix oops in blk_queue_io_stat()

86adf8ad

virtio-pci: do not oops on config change if driver not loaded · 3fff0179

Mark McLoughlin authored Feb 03, 2009

The host really shouldn't be notifying us of config changes
before the device status is VIRTIO_CONFIG_S_DRIVER or
VIRTIO_CONFIG_S_DRIVER_OK.

However, if we do happen to be interrupted while we're not
attached to a driver, we really shouldn't oops. Prevent
this simply by checking that device->driver is non-NULL
before trying to notify the driver of config changes.

Problem observed by doing a "set_link virtio.0 down" with
QEMU before the net driver had been loaded.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3fff0179

modules: Use a better scheme for refcounting · 720eba31

Eric Dumazet authored Feb 03, 2009

Current refcounting for modules (done if CONFIG_MODULE_UNLOAD=y) is
using a lot of memory.

Each 'struct module' contains an [NR_CPUS] array of full cache lines.

This patch uses existing infrastructure (percpu_modalloc() &
percpu_modfree()) to allocate percpu space for the refcount storage.

Instead of wasting NR_CPUS*128 bytes (on i386), we now use
nr_cpu_ids*sizeof(local_t) bytes.

On a typical distro, where NR_CPUS=8, shiping 2000 modules, we reduce
size of module files by about 2 Mbytes. (1Kb per module)

Instead of having all refcounters in the same memory node - with TLB misses
because of vmalloc() - this new implementation permits to have better
NUMA properties, since each  CPU will use storage on its preferred node,
thanks to percpu storage.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

720eba31

02 Feb, 2009 23 commits

pxamci: enable DMA for write ops after CMD/RESP · b6018958

Cliff Brake authored Jan 22, 2009

With the PXA270 MMC hardware, there seems to be an issue of
data corruption on writes where a 4KB data block is offset
by one byte.

If we delay enabling the DMA for writes until after the CMD/RESP
has finished, the problem seems to be fixed.

related to PXA270 Erratum #91
Tested-by: Vernon Sauder <VernonInHand@gmail.com>
Signed-off-by: Cliff Brake <cbrake@bec-systems.com>
Acked-by: Eric Miao <eric.miao@marvell.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

b6018958

pxamci: replace #ifdef CONFIG_PXA27x with if (cpu_is_pxa27x()) · e10a854c

Cliff Brake authored Jan 22, 2009

Signed-off-by: Cliff Brake <cbrake@bec-systems.com>
Acked-by: Eric Miao <eric.miao@marvell.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

e10a854c

ricoh_mmc: Use suspend_late/resume_early · 06cc1c88

philipl@overt.org authored Jan 18, 2009

If ricoh_mmc suspends before sdhci_pci, it will pull the card
out from under the controller, which could leave the system in
a very confused state.

Using suspend_late/resume_early ensures that sdhci_pci suspends first
and resumes second.
Signed-off-by: Philip Langdale <philipl@overt.org>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

06cc1c88

mmci: Add support for ST Micro derivate · cc30d60e

Linus Walleij authored Jan 04, 2009

This patch adds support for the ST Microelectronics version of
the PL180 PrimeCell. They use designer ID 0x80 and have a few
alterations/bugfixes related to open drain and HW flow control.
They also add some SDIO registers, I am unsure if these are
in ST HW only or if this is things also added in later ARM
revisions, but they are included in the mmci.h file for
completeness.
Signed-off-by: Linus Walleij <linus.walleij@ericsson.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

cc30d60e

mmc: Add a MX2/MX3 specific SDHC driver · d96be879

Sascha Hauer authored Jan 06, 2009

This patch adds a MX2/MX3 specific SDHC driver. The hardware is basically
the same as in the MX1, but unlike the MX1 controller the MX2
controller just works as expected. Since the MX1 driver has more
workarounds for bugs than anything else I had no success with supporting
MX1 and MX2 in a sane way in one driver.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>

d96be879

icside: fix PCB version 6 support (v2) · d224b626

Bartlomiej Zolnierkiewicz authored Feb 02, 2009

We need to pass struct ide_port_info also to ide_host_register().

v2:
Fix v5/v6 mismatch noticed by Russell.

Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

d224b626

tx4939ide: typo fix and minor cleanup · 9711a537

Atsushi Nemoto authored Feb 02, 2009

The bcount is greater than 0 and less than or equal to 0x10000.
Thus '(bcount & 0xffff) == 0x0000' can be simplified as 'bcount == 0x10000'.
Suggested-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

9711a537

ide: add CS5536 host driver (v3) · a77dcc43

Bartlomiej Zolnierkiewicz authored Feb 02, 2009

This is a port of libata's pata_cs5536.c (written by Martin K. Petersen)
to IDE subsystem.

Changes done while at it:

* Reprogram PIO/MWDMA timings if needed before and after DMA transfer
  (chipset uses shared PIO/MWDMA timings).

* Fix cable detection to report 80-wires cable if BIOS set it for any
  device on a port (IDE core will do drive-side cable detection later).

* Don't disable UDMA while programming PIO timings.

* Simplify PCI/MSR support.

Pros of having IDE host driver in addition to libata's one:

* IDE is much lighter than SCSI+libata, the host driver itself is also
  a bit smaller:

   text    data     bss     dec     hex filename
   1261     496       4    1761     6e1 drivers/ata/pata_cs5536.o
   1242     128       4    1374     55e drivers/ide/cs5536.o

* This allows use of IDE features which are unavailable under libata.

v2:
* Fixes per review from Sergei:
  - simplify dependency check in Kconfig
  - use IDE_DRV_MASK also for ->drive_data
  - disable UDMA when programming MWDMA
  - program new DTC timings only when necessary
  - fix printk() level in cs5536_init_one()

* Fix patch description according to comments from Alan and Sergei.

v3:
* Smarter masking of UDMA bits per Sergei's suggestion.

Cc: Martin K. Petersen <mkp@mkp.net>
Cc: Karl Auerbach <karl@iwl.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

a77dcc43

ide: Force VIA IDE legacy interrupts for AmigaOne boards · 9f6514c1

Gerhard Pircher authored Feb 02, 2009

The AmigaOne uses the onboard VIA IDE controller in legacy mode (like the
Pegasos).
Signed-off-by: Gerhard Pircher <gerhard_pircher@gmx.net>
Cc: "Grant Likely" <grant.likely@secretlab.ca>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

9f6514c1

IDE: Unregister and disable devices if initialization fails. · 51d6ac70

Ian Campbell authored Feb 02, 2009

On reboot the loop in device_shutdown gets confused by these partially
initialized devices and goes into an infinite loop. Therefore unregister
and disable these devices.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
[bart: remove leftover hwif->present clearing + update patch description]
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

51d6ac70

ide: fix ide_register_port() failure handling · 9a100f4b

Bartlomiej Zolnierkiewicz authored Feb 02, 2009

* Factor out port freeing from ide_host_free() to ide_free_port().

* Add ide_disable_port() and use it on ide_register_port() failure.

Cc: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

9a100f4b

ide: struct device - replace bus_id with dev_name(), dev_set_name() · e5461f38

Kay Sievers authored Feb 02, 2009

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: linux-ide@vger.kernel.org
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

e5461f38

ide-cd: fix DMA for non bio-backed requests · 9e772d01

Borislav Petkov authored Feb 02, 2009

This one fixes http://bugzilla.kernel.org/show_bug.cgi?id=12320.
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

9e772d01

Merge branch 'master' of... · 8f049155

David Vrabel authored Feb 02, 2009

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into for-upstream

8f049155

block: add text file detailing queue/ sysfs files · cbb5901b
Jens Axboe authored Feb 02, 2009
```
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
```
cbb5901b

bio.h: If they MUST be inlined, then use __always_inline · c52440a6

Alberto Bertogli authored Feb 02, 2009

bvec_kmap_irq() and bvec_kunmap_irq() comments say they MUST be inlined,
so mark them as __always_inline.
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

c52440a6

Fix misleading comment in bio.h · 20b636bf

Alberto Bertogli authored Feb 02, 2009

The comment says "remember to add offset!", but the function already adds
it.
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

20b636bf

Merge branches 'topic/slab/fixes' and 'topic/slub/fixes' into for-linus · f58914e4
Pekka Enberg authored Feb 02, 2009

f58914e4
block: fix inconsistent parenthesisation of QUEUE_FLAG_DEFAULT · 0648e10d
Jens Axboe authored Feb 02, 2009
```
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
```
0648e10d

block: fix oops in blk_queue_io_stat() · fb8ec18c

Jens Axboe authored Feb 02, 2009

Some initial probe requests don't have disk->queue mapped yet, so we
can't rely on a non-NULL queue in blk_queue_io_stat(). Wrap it in
blk_do_io_stat().
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

fb8ec18c

fbdev/atyfb: Fix DSP config on some PowerMacs & PowerBooks · 7fbb7cad

Risto Suominen authored Jan 13, 2009

Since the complete re-write in 2.6.10, some PowerMacs (At least PowerMac 5500
and PowerMac G3 Beige rev A) with ATI Mach64 chip have suffered from unstable
columns in their framebuffer image. This seems to depend on a value (4) read
from PLL_EXT_CNTL register, which leads to incorrect DSP config parameters to
be written to the chip. This patch uses a value calculated by aty_init_pll_ct
instead, as a starting point.

There are questions as to whether this should be extended to other platforms
or maybe made dependent on specific chip types, but in the meantime, this has
been tested on various powermacs and works for them so let's commit it.
Signed-off-by: Risto Suominen <Risto.Suominen@gmail.com>
Tested-by: Michael Pettersson <mike@it.uu.se>
Cc: <stable@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

7fbb7cad

powerpc: Fix oops on some machines due to incorrect pr_debug() · 59b608c2

Benjamin Herrenschmidt authored Feb 01, 2009

Recently, a patch left DEBUG enabled in the powerpc common PCI code,
resulting in an old bug in a pr_debug() statement to show up and cause
a NULL dereference on some machines.

This fixes the pr_debug() statement and reverts to DEBUG not being
force-enabled in that file.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

59b608c2

powerpc/ps3: Printing fixups for l64 to ll64 convserion drivers/net · 309ea626

Stephen Rothwell authored Jan 13, 2009

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

309ea626

01 Feb, 2009 1 commit

Manually revert "mlock: downgrade mmap sem while populating mlocked regions" · 27421e21

Linus Torvalds authored Feb 01, 2009

This essentially reverts commit 8edb08ca.

It downgraded our mmap semaphore to a read-lock while mlocking pages, in
order to allow other threads (and external accesses like "ps" et al) to
walk the vma lists and take page faults etc.  Which is a nice idea, but
the implementation does not work.

Because we cannot upgrade the lock back to a write lock without
releasing the mmap semaphore, the code had to release the lock entirely
and then re-take it as a writelock.  However, that meant that the caller
possibly lost the vma chain that it was following, since now another
thread could come in and mmap/munmap the range.

The code tried to work around that by just looking up the vma again and
erroring out if that happened, but quite frankly, that was just a buggy
hack that doesn't actually protect against anything (the other thread
could just have replaced the vma with another one instead of totally
unmapping it).

The only way to downgrade to a read map _reliably_ is to do it at the
end, which is likely the right thing to do: do all the 'vma' operations
with the write-lock held, then downgrade to a read after completing them
all, and then do the "populate the newly mlocked regions" while holding
just the read lock.  And then just drop the read-lock and return to user
space.

The (perhaps somewhat simpler) alternative is to just make all the
callers of mlock_vma_pages_range() know that the mmap lock got dropped,
and just re-grab the mmap semaphore if it needs to mlock more than one
vma region.

So we can do this "downgrade mmap sem while populating mlocked regions"
thing right, but the way it was done here was absolutely not correct.
Thus the revert, in the expectation that we will do it all correctly
some day.

Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

27421e21