Commits · ef4256733506f2459a0c436b62267d22a3f0cec6 · Kirill Smelkov / linux

26 Jul, 2010 14 commits

md/bitmap: optimise scanning of empty bitmaps. · ef425673

NeilBrown authored Jun 01, 2010

A bitmap is stored as one page per 2048 bits.
If none of the bits are set, the page is not allocated.

When bitmap_get_counter finds that a page isn't allocate,
it just reports that one bit work of space isn't flagged,
rather than reporting that 2048 bits worth of space are
unflagged.
This can cause searches for flagged bits (e.g. bitmap_close_sync)
to do more work than is really necessary.

So change bitmap_get_counter (when creating) to report a number of
blocks that more accurately reports the range of the device for which
no counter currently exists.
Signed-off-by: NeilBrown <neilb@suse.de>

ef425673

md/bitmap: clean up plugging calls. · b63d7c2e

NeilBrown authored Jun 01, 2010

1/ use md_unplug in bitmap.c as we will soon be using bitmaps under
  arrays with no queue attached.

2/ Don't bother plugging the queue when we set a bit in the bitmap.
   The reason for this was to encourage as many bits as possible to
   get set before we unplug and write stuff out.
   However every personality already plugs the queue after
   bitmap_startwrite either directly (raid1/raid10) or be setting
   STRIPE_BIT_DELAY which causes the queue to be plugged later
   (raid5).
Signed-off-by: NeilBrown <neilb@suse.de>

b63d7c2e

md/bitmap: reduce dependence on sysfs. · 5ff5afff

NeilBrown authored Jun 01, 2010

For dm-raid45 we will want to use bitmaps in dm-targets which don't
have entries in sysfs, so cope with the mddev not living in sysfs.
Signed-off-by: NeilBrown <neilb@suse.de>

5ff5afff

md/bitmap: white space clean up and similar. · ac2f40be

NeilBrown authored Jun 01, 2010

Fixes some whitespace problems
Fixed some checkpatch.pl complaints.
Replaced kmalloc ... memset(0), with kzalloc
Fixed an unlikely memory leak on an error path.
Reformatted a number of 'if/else' sets, sometimes
replacing goto with an else clause.
Removed some old comments and commented-out code.
Signed-off-by: NeilBrown <neilb@suse.de>

ac2f40be

md/raid5: export raid5 unplugging interface. · 9f7c2220

NeilBrown authored Jul 26, 2010

Also remove remaining accesses to ->queue and ->gendisk when ->queue
is NULL (As it is in a DM target).
Signed-off-by: NeilBrown <neilb@suse.de>

9f7c2220

md/plug: optionally use plugger to unplug an array during resync/recovery. · 252ac522

NeilBrown authored Jun 01, 2010

If an array doesn't have a 'queue' then md_do_sync cannot
unplug it.
In that case it will have a 'plugger', so make that available
to the mddev, and use it to unplug the array if needed.
Signed-off-by: NeilBrown <neilb@suse.de>

252ac522

md/raid5: add simple plugging infrastructure. · 2ac87401

NeilBrown authored Jun 01, 2010

md/raid5 uses the plugging infrastructure provided by the block layer
and 'struct request_queue'.  However when we plug raid5 under dm there
is no request queue so we cannot use that.

So create a similar infrastructure that is much lighter weight and use
it for raid5.
Signed-off-by: NeilBrown <neilb@suse.de>

2ac87401

md/raid5: export is_congested test · 11d8a6e3

NeilBrown authored Jul 26, 2010

the dm module will need this for dm-raid45.

Also only access ->queue->backing_dev_info->congested_fn
if ->queue actually exists.  It won't in a dm target.
Signed-off-by: NeilBrown <neilb@suse.de>

11d8a6e3

raid5: Don't set read-ahead when there is no queue · 4a5add49

NeilBrown authored Jun 01, 2010

dm-raid456 does not provide a 'queue' for raid5 to use,
so we must make raid5 stop depending on the queue.

First: read_ahead
dm handles read-ahead adjustment fully in userspace, so
simply don't do any readahead adjustments if there is
no queue.

Also re-arrange code slightly so all the accesses to ->queue are
together.

Finally, move the blk_queue_merge_bvec function into the 'if' as
the ->split_io setting in dm-raid456 has the same effect.
Signed-off-by: NeilBrown <neilb@suse.de>

4a5add49

md: add support for raising dm events. · 768a418d

NeilBrown authored Jul 26, 2010

dm uses scheduled work to raise events to user-space.
So allow md device to have work_structs and schedule them on an error.
Signed-off-by: NeilBrown <neilb@suse.de>

768a418d

md: export various start/stop interfaces · 390ee602

NeilBrown authored Jun 01, 2010

export entry points for starting and stopping md arrays.
This will be used by a module to make md/raid5 work under
dm.
Also stop calling md_stop_writes from md_stop, as that won't
work well with dm - it will want to call the two separately.
Signed-off-by: NeilBrown <neilb@suse.de>

390ee602

md: split out md_rdev_init · e8bb9a83

NeilBrown authored Jun 01, 2010

This functionality will be needed separately in a subsequent patch, so
split it into it's own exported function.
Signed-off-by: NeilBrown <neilb@suse.de>

e8bb9a83

md: be more careful setting MD_CHANGE_CLEAN · 676e42d8

NeilBrown authored Jun 01, 2010

When MD_CHANGE_CLEAN is set we might block in md_write_start.
So we should only set it when fairly sure that something will clear
it.

There are two places where it is set so as to encourage a metadata
update to record the progress of resync/recovery.  This should only
be done if the internal metadata update mechanisms are in use, which
can be tested by by inspecting '->persistent'.
Signed-off-by: NeilBrown <neilb@suse.de>

676e42d8

md/raid5: ensure we create a unique name for kmem_cache when mddev has no gendisk · f4be6b43

NeilBrown authored Jun 01, 2010

We will shortly allow md devices with no gendisk (they are attached to
a dm-target instead).  That will cause mdname() to return 'mdX'.
There is one place where mdname really needs to be unique: when
creating the name for a slab cache.
So in that case, if there is no gendisk, you the address of the mddev
formatted in HEX to provide a unique name.
Signed-off-by: NeilBrown <neilb@suse.de>

f4be6b43

21 Jul, 2010 2 commits

md/raid5: factor out code for changing size of stripe cache. · c41d4ac4

NeilBrown authored Jun 01, 2010

Separate the actual 'change' code from the sysfs interface
so that it can eventually be called internally.
Signed-off-by: NeilBrown <neilb@suse.de>

c41d4ac4

md: reduce dependence on sysfs. · 00bcb4ac

NeilBrown authored Jun 01, 2010

We will want md devices to live as dm targets where sysfs is not
visible.  So allow md to not connect to sysfs.
Signed-off-by: NeilBrown <neilb@suse.de>

00bcb4ac

19 Jul, 2010 9 commits

Merge branch 'x86-fixes-for-linus' of... · d0c6f625

Linus Torvalds authored Jul 19, 2010

Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, pci, mrst: Add extra sanity check in walking the PCI extended cap chain
  x86: Fix x2apic preenabled system with kexec
  x86: Force HPET readback_cmp for all ATI chipsets

d0c6f625

Merge branch 'kmemleak' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-2.6-cm · 46ac0cc9

Linus Torvalds authored Jul 19, 2010

* 'kmemleak' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-2.6-cm:
  kmemleak: Add support for NO_BOOTMEM configurations
  kmemleak: Annotate false positive in init_section_page_cgroup()

46ac0cc9

Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6 · 2decd5a7

Linus Torvalds authored Jul 19, 2010

* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
  [S390] cio: fix potential overflow in chpid descriptor
  [S390] add missing device put
  [S390] dasd: use correct label location for diag fba disks

2decd5a7

intel_scu_ipc: Oops/crash fixes · b4fd4f89

Sreedhara DS authored Jul 19, 2010

- fix reversing of command/sub arguments
- fix a crash if the i2c interface is called before the device is found
Signed-off-by: Sreedhara DS <sreedhara.ds@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b4fd4f89

kmemleak: Add support for NO_BOOTMEM configurations · 9078370c

Catalin Marinas authored Jul 19, 2010

With commits 08677214 and 59be5a8e, alloc_bootmem()/free_bootmem() and
friends use the early_res functions for memory management when
NO_BOOTMEM is enabled. This patch adds the kmemleak calls in the
corresponding code paths for bootmem allocations.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: stable@kernel.org

9078370c

kmemleak: Annotate false positive in init_section_page_cgroup() · 7952f988

Catalin Marinas authored Jul 19, 2010

The pointer to the page_cgroup table allocated in
init_section_page_cgroup() is stored in section->page_cgroup as (base -
pfn). Since this value does not point to the beginning or inside the
allocated memory block, kmemleak reports a false positive.

This was reported in bugzilla.kernel.org as #16297.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Adrien Dessemond <adrien.dessemond@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Andrew Morton <akpm@linux-foundation.org>

7952f988

[S390] cio: fix potential overflow in chpid descriptor · 878c4956

Sebastian Ott authored Jul 19, 2010

The length filed in the chsc response block (if valid)
has a value of n*(sizeof(chp_desc))+8 (for the response
block header). When we memcopied from the response block
to the actual descriptor we copied 8 bytes too much.
The bug was not revealed since the descriptor is embedded
in struct channel_path.
Since we only write one descriptor at a time ignore the
length value and use sizeof(*desc).
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

878c4956

[S390] add missing device put · 0abccf77

Stefan Haberland authored Jul 19, 2010

The dasd_alias_show function does not return a device reference
in case the device is an alias.
Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

0abccf77

[S390] dasd: use correct label location for diag fba disks · cffab6bc

Peter Oberparleiter authored Jul 19, 2010

Partition boundary calculation fails for DASD FBA disks under the
following conditions:
- disk is formatted with CMS FORMAT with a blocksize of more than
  512 bytes
- all of the disk is reserved to a single CMS file using CMS RESERVE
- the disk is accessed using the DIAG mode of the DASD driver

Under these circumstances, the partition detection code tries to
read the CMS label block containing partition-relevant information
from logical block offset 1, while it is in fact located at physical
block offset 1.

Fix this problem by using the correct CMS label block location
depending on the device type as determined by the DASD SENSE ID
information.
Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

cffab6bc

18 Jul, 2010 5 commits

Merge branch 'x86/kprobes' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland · a9f7f2e7
Linus Torvalds authored Jul 18, 2010
```
* 'x86/kprobes' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland:
  x86: kprobes: fix swapped segment registers in kretprobe
```
a9f7f2e7

x86: kprobes: fix swapped segment registers in kretprobe · a1974798

Roland McGrath authored Jul 16, 2010

In commit f007ea26, the order of the %es and %ds segment registers
got accidentally swapped, so synthesized 'struct pt_regs' frames
have the two values inverted. It's almost sure that these values
never matter, and that they also never differ. But wrong is wrong.
Signed-off-by: Roland McGrath <roland@redhat.com>

a1974798

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 · 2044f228
Linus Torvalds authored Jul 18, 2010
```
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: fall back to original BIOS BAR addresses
```
2044f228

Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 · bea9a6d2

Linus Torvalds authored Jul 18, 2010

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
  ocfs2: Silence gcc warning in ocfs2_write_zero_page().
  jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactions
  ocfs2/dlm: Remove BUG_ON from migration in the rare case of a down node
  ocfs2: Don't duplicate pages past i_size during CoW.
  ocfs2: tighten up strlen() checking
  ocfs2: Make xattr reflink work with new local alloc reservation.
  ocfs2: make xattr extension work with new local alloc reservation.
  ocfs2: Remove the redundant cpu_to_le64.
  ocfs2/dlm: don't access beyond bitmap size
  ocfs2: No need to zero pages past i_size.
  ocfs2: Zero the tail cluster when extending past i_size.
  ocfs2: When zero extending, do it by page.
  ocfs2: Limit default local alloc size within bitmap range.
  ocfs2: Move orphan scan work to ocfs2_wq.
  fs/ocfs2/dlm: Add missing spin_unlock

bea9a6d2

drm/i915: add 'reclaimable' to i915 self-reclaimable page allocations · cd9f040d

Linus Torvalds authored Jul 18, 2010

The hibernate issues that got fixed in commit 985b823b ("drm/i915:
fix hibernation since i915 self-reclaim fixes") turn out to have been
incomplete.  Vefa Bicakci tested lots of hibernate cycles, and without
the __GFP_RECLAIMABLE flag the system eventually fails to resume.

With the flag added, Vefa can apparently hibernate forever (or until he
gets bored running his automated scripts, whichever comes first).

The reclaimable flag was there originally, and was one of the flags that
were dropped (unintentionally) by commit 4bdadb97 ("drm/i915:
Selectively enable self-reclaim") that introduced all these problems,
but I didn't want to just blindly add back all the flags in commit
985b823b, and it looked like __GFP_RECLAIM wasn't necessary.  It
clearly was.

I still suspect that there is some subtle reason we're missing that
causes the problems, but __GFP_RECLAIMABLE is certainly not wrong to use
in this context, and is what the code historically used.  And we have no
idea what the causes the corruption without it.
Reported-and-tested-by: M. Vefa Bicakci <bicave@superonline.com>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cd9f040d

16 Jul, 2010 9 commits

x86, pci, mrst: Add extra sanity check in walking the PCI extended cap chain · f82c3d71

Jacob Pan authored Jul 16, 2010

The fixed bar capability structure is searched in PCI extended
configuration space.  We need to make sure there is a valid capability
ID to begin with otherwise, the search code may stuck in a infinite
loop which results in boot hang.  This patch adds additional check for
cap ID 0, which is also invalid, and indicates end of chain.

End of chain is supposed to have all fields zero, but that doesn't
seem to always be the case in the field.
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
LKML-Reference: <1279306706-27087-1-git-send-email-jacob.jun.pan@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

f82c3d71

x86: Fix x2apic preenabled system with kexec · fd19dce7

Yinghai Lu authored Jul 15, 2010

Found one x2apic system kexec loop test failed
when CONFIG_NMI_WATCHDOG=y (old) or CONFIG_LOCKUP_DETECTOR=y (current tip)

first kernel can kexec second kernel, but second kernel can not kexec third one.

it can be duplicated on another system with BIOS preenabled x2apic.
First kernel can not kexec second kernel.

It turns out, when kernel boot with pre-enabled x2apic, it will not execute
disable_local_APIC on shutdown path.

when init_apic_mappings() is called in setup_arch, it will skip setting of
apic_phys when x2apic_mode is set. ( x2apic_mode is much early check_x2apic())
Then later, disable_local_APIC() will bail out early because !apic_phys.

So check !x2apic_mode in x2apic_mode in disable_local_APIC with !apic_phys.

another solution could be updating init_apic_mappings() to set apic_phys even
for preenabled x2apic system. Actually even for x2apic system, that lapic
address is mapped already in early stage.

BTW: is there any x2apic preenabled system with apicid of boot cpu > 255?
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4C3EB22B.3000701@kernel.org>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: stable@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

fd19dce7

ocfs2: Silence gcc warning in ocfs2_write_zero_page(). · 5453258d

Joel Becker authored Jul 16, 2010

ocfs2_write_zero_page() has a loop that won't ever be skipped, but gcc
doesn't know that.  Set ret=0 just to make gcc happy.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

5453258d

PCI: fall back to original BIOS BAR addresses · 58c84eda

Bjorn Helgaas authored Jul 15, 2010

If we fail to assign resources to a PCI BAR, this patch makes us try the
original address from BIOS rather than leaving it disabled.

Linux tries to make sure all PCI device BARs are inside the upstream
PCI host bridge or P2P bridge apertures, reassigning BARs if necessary.
Windows does similar reassignment.

Before this patch, if we could not move a BAR into an aperture, we left
the resource unassigned, i.e., at address zero. Windows leaves such BARs
at the original BIOS addresses, and this patch makes Linux do the same.

This is a bit ugly because we disable the resource long before we try to
reassign it, so we have to keep track of the BIOS BAR address somewhere.
For lack of a better place, I put it in the struct pci_dev.

I think it would be cleaner to attempt the assignment immediately when the
claim fails, so we could easily remember the original address. But we
currently claim motherboard resources in the middle, after attempting to
claim PCI resources and before assigning new PCI resources, and changing
that is a fairly big job.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16263Reported-by: Andrew <nitr0@seti.kr.ua>
Tested-by: Andrew <nitr0@seti.kr.ua>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

58c84eda

Merge branch 'perf-fixes-for-linus' of... · f469461d

Linus Torvalds authored Jul 16, 2010

Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing: Add alignment to syscall metadata declarations
  perf: Sync callchains with period based hits
  perf: Resurrect flat callchains
  perf: Version String fix, for fallback if not from git
  perf: Version String fix, using kernel version

f469461d

Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes · 79140bc4

Linus Torvalds authored Jul 16, 2010

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
  GFS2: rename causes kernel Oops
  GFS2: BUG in gfs2_adjust_quota
  GFS2: Fix kernel NULL pointer dereference by dlm_astd
  GFS2: recovery stuck on transaction lock
  GFS2: O_TRUNC not working on stuffed files across cluster

79140bc4

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · cc10b6ff

Linus Torvalds authored Jul 16, 2010

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: w90p910_ts - fix call to setup_timer()
  Input: synaptics - fix wrong dimensions check
  Input: i8042 - mark stubs in i8042.h "static inline"

cc10b6ff

Input: w90p910_ts - fix call to setup_timer() · 5b39187f

Wan ZongShun authored Jul 15, 2010

No need to take address, w90p910_ts is already a pointer.
Signed-off-by: Wan ZongShun <mcuos.com@gmail.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>

5b39187f

Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 042bd1ff
Linus Torvalds authored Jul 15, 2010
```
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: skcipher - avoid NULL dereference
```
042bd1ff

15 Jul, 2010 1 commit

jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactions · 13ceef09

Jan Kara authored Jul 14, 2010

OCFS2 uses t_commit trigger to compute and store checksum of the just
committed blocks. When a buffer has b_frozen_data, checksum is computed
for it instead of b_data but this can result in an old checksum being
written to the filesystem in the following scenario:

1) transaction1 is opened
2) handle1 is opened
3) journal_access(handle1, bh)
    - This sets jh->b_transaction to transaction1
4) modify(bh)
5) journal_dirty(handle1, bh)
6) handle1 is closed
7) start committing transaction1, opening transaction2
8) handle2 is opened
9) journal_access(handle2, bh)
    - This copies off b_frozen_data to make it safe for transaction1 to commit.
      jh->b_next_transaction is set to transaction2.
10) jbd2_journal_write_metadata() checksums b_frozen_data
11) the journal correctly writes b_frozen_data to the disk journal
12) handle2 is closed
    - There was no dirty call for the bh on handle2, so it is never queued for
      any more journal operation
13) Checkpointing finally happens, and it just spools the bh via normal buffer
writeback.  This will write b_data, which was never triggered on and thus
contains a wrong (old) checksum.

This patch fixes the problem by calling the trigger at the moment data is
frozen for journal commit - i.e., either when b_frozen_data is created by
do_get_write_access or just before we write a buffer to the log if
b_frozen_data does not exist. We also rename the trigger to t_frozen as
that better describes when it is called.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

13ceef09