Commits · 3424bf6a772cff606fc4bc24a3639c937afb547f · nexedi / linux

24 Jun, 2010 12 commits

md/raid5: don't include 'spare' drives when reshaping to fewer devices. · 3424bf6a

NeilBrown authored Jun 17, 2010

There are few situations where it would make any sense to add a spare
when reducing the number of devices in an array, but it is
conceivable:  A 6 drive RAID6 with two missing devices could be
reshaped to a 5 drive RAID6, and a spare could become available
just in time for the reshape, but not early enough to have been
recovered first.  'freezing' recovery can make this easy to
do without any races.

However doing such a thing is a bad idea.  md will not record the
partially-recovered state of the 'spare' and when the reshape
finishes it will think that the spare is still spare.
The easiest way to avoid this confusion is to simply disallow it.
Signed-off-by: NeilBrown <neilb@suse.de>

3424bf6a

md/raid5: add a missing 'continue' in a loop. · 2f115882

NeilBrown authored Jun 17, 2010

As the comment says, the tail of this loop only applies to devices
that are not fully in sync, so if In_sync was set, we should avoid
the rest of the loop.

This bug will hardly ever cause an actual problem.  The worst it
can do is allow an array to be assembled that is dirty and degraded,
which is not generally a good idea (without warning the sysadmin
first).

This will only happen if the array is RAID4 or a RAID5/6 in an
intermediate state during a reshape and so has one drive that is
all 'parity' - no data - while some other device has failed.

This is certainly possible, but not at all common.
Signed-off-by: NeilBrown <neilb@suse.de>

2f115882
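
The loop shape this message describes can be sketched in userspace C. This is only an illustration under stated assumptions: `toy_rdev`, its fields and `toy_count_dirty_parity()` are hypothetical names, not the raid5.c code. The early `continue` keeps fully in-sync devices away from the tail of the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an md member device. */
struct toy_rdev {
    bool in_sync;      /* device is fully in sync */
    bool only_parity;  /* device holds only parity, no data */
};

/*
 * Count devices that are out of sync but hold only parity.
 * The tail of the loop applies only to devices that are not fully
 * in sync, so in-sync devices must skip it.
 */
int toy_count_dirty_parity(const struct toy_rdev *devs, size_t n)
{
    int dirty_parity_disks = 0;

    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (devs[i].in_sync)
            continue;  /* the kind of statement the fix adds */

        /* Only reached for devices that are not fully in sync. */
        if (!devs[i].only_parity)
            continue;
        dirty_parity_disks++;
    }
    return dirty_parity_disks;
}
```

Without the first `continue`, an in-sync device that holds only parity would also be counted, which is the situation the message says can let a dirty, degraded array be assembled.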

md/raid5: Allow recovered part of partially recovered devices to be in-sync · 415e72d0

NeilBrown authored Jun 17, 2010

During a recovery or reshape the early part of some devices might be
in-sync while the later parts are not.
When we know we are looking at an early part it is good to treat that
part as in-sync for stripe calculations.

This is particularly important for a reshape which suffers device
failure.  Treating the data as in-sync can mean the difference between
data-safety and data-loss.
Signed-off-by: NeilBrown <neilb@suse.de>

415e72d0

md/raid5: More careful check for "has array failed". · 674806d6

NeilBrown authored Jun 16, 2010

When we are reshaping an array, the device failure combinations
that cause us to decide that the array has failed are more subtle.

In particular, any 'spare' will be fully in-sync in the section
of the array that has already been reshaped, thus failures that
affect only that section are less critical.

So encode this subtlety in a new function and call it as appropriate.

The case that showed this problem was a 4 drive RAID5 to 8 drive RAID6
conversion where the last two devices failed.
This resulted in:

  good good good good incomplete good good failed failed

while converting a 5-drive RAID6 to 8 drive RAID5.
The incomplete device causes the whole array to look bad,
but as it was actually good for the section that had been
converted to 8-drives, all the data was actually safe.
Reported-by: Terry Morris <tbmorris@tbmorris.com>
Signed-off-by: NeilBrown <neilb@suse.de>

674806d6
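
A rough userspace sketch of the idea, assuming a reshape that adds devices: count failures separately for the section still in the old geometry and for the section already reshaped, and treat a still-recovering device as usable only in the latter. All names here (`toy_conf`, `toy_state`, `toy_has_failed()`) are hypothetical, and this is not the function the commit adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_state { TOY_IN_SYNC, TOY_RECOVERING, TOY_FAULTY };

/* Hypothetical model of an array in the middle of a growing reshape. */
struct toy_conf {
    int previous_raid_disks;      /* device count before the reshape */
    int raid_disks;               /* device count after the reshape */
    int max_degraded;             /* failures the level can tolerate */
    const enum toy_state *state;  /* one entry per device slot */
};

/* Count unusable slots among the first ndisks devices. */
static int toy_degraded(const struct toy_conf *conf, int ndisks,
                        bool recovering_is_missing)
{
    int degraded = 0;

    for (int i = 0; i < ndisks; i++) {
        if (conf->state[i] == TOY_FAULTY)
            degraded++;
        else if (conf->state[i] == TOY_RECOVERING && recovering_is_missing)
            degraded++;
    }
    return degraded;
}

/* Assumes raid_disks >= previous_raid_disks (a growing reshape). */
bool toy_has_failed(const struct toy_conf *conf)
{
    assert(conf != NULL && conf->state != NULL);
    assert(conf->raid_disks >= conf->previous_raid_disks);

    /* Section not yet reshaped: a recovering device has no data here. */
    if (toy_degraded(conf, conf->previous_raid_disks, true) > conf->max_degraded)
        return true;

    /* Section already reshaped: the recovering device is in sync here. */
    return toy_degraded(conf, conf->raid_disks, false) > conf->max_degraded;
}
```

With one recovering device and two failed devices, a naive count of three unusable devices would exceed a tolerance of two, while the per-section count does not.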

md: Don't update ->recovery_offset when reshaping an array to fewer devices. · 70fffd0b

NeilBrown authored Jun 16, 2010

When an array is reshaped to have fewer devices, the reshape proceeds
from the end of the devices to the beginning.

If a device happens to be non-In_sync (which is possible but rare)
we would normally update the ->recovery_offset as the reshape
progresses. However that would be wrong as the recovery_offset records
that the early part of the device is in_sync, while in fact it would
only be the later part that is in_sync, and in any case the offset
number would be measured from the wrong end of the device.

Relatedly, if after a reshape a spare is discovered to not be
recovered all the way to the end, do not allow spare_active
to incorporate it in the array.

This becomes relevant in the following sample scenario:

A 4 drive RAID5 is converted to a 6 drive RAID6 in a combined
operation.
The RAID5->RAID6 conversion will cause a 5th drive to be included as a
spare, then the 5drive -> 6drive reshape will effectively rebuild that
spare as it progresses.  The 6th drive is treated as in_sync the whole
time as there is never any case that we might consider reading from
it, but must not because there is no valid data.

If we interrupt this reshape part-way through and reverse it to return
to a 5-drive RAID6 (or even a 4-drive RAID5), we don't want to update
the recovery_offset - as that would be wrong - and we don't want to
include that spare as active in the 5-drive RAID6 when the reversed
reshape completes, as it will still be mostly out-of-sync.
Signed-off-by: NeilBrown <neilb@suse.de>

70fffd0b

md/raid5: avoid oops when number of devices is reduced then increased. · e4e11e38

NeilBrown authored Jun 16, 2010

The entries in the stripe_cache maintained by raid5 are enlarged
when we increase the number of devices in the array, but not
shrunk when we reduce the number of devices.
So if entries are added after reducing the number of devices, we
must ensure to initialise the whole entry, not just the part that
is currently relevant.  Otherwise if we enlarge the array again,
we will reference uninitialised values.

As grow_buffers/shrink_buffer now want to use a count that is stored
explicitly in the raid_conf, they should get it from there rather than
being passed it as a parameter.
Signed-off-by: NeilBrown <neilb@suse.de>

e4e11e38
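
A minimal sketch of the initialisation rule, in userspace C with hypothetical types: the entry is sized for `pool_size` slots (an assumed name for the count kept in the configuration), so every slot is initialised even when fewer devices are currently in use.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_DISKS 16

/* Hypothetical configuration: pool_size is the largest device count
 * the cache entries were ever sized for; raid_disks is the current one. */
struct toy_conf {
    int raid_disks;
    int pool_size;
};

struct toy_stripe {
    const struct toy_conf *conf;
    void *page[TOY_MAX_DISKS];
};

/* Initialise the whole entry, taking the count from the configuration
 * rather than from a parameter. */
void toy_init_stripe(struct toy_stripe *sh, const struct toy_conf *conf)
{
    assert(sh != NULL && conf != NULL);
    assert(conf->raid_disks <= conf->pool_size);
    assert(conf->pool_size <= TOY_MAX_DISKS);

    sh->conf = conf;
    for (int i = 0; i < conf->pool_size; i++)
        sh->page[i] = NULL;  /* includes slots beyond raid_disks */
}
```

If only the first `raid_disks` slots were cleared, a later grow back towards `pool_size` would read whatever happened to be in the remaining slots.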

md: enable raid4->raid0 takeover · 049d6c1e

Maciej Trela authored Jun 16, 2010

Only level 5 with layout=PARITY_N can be taken over to raid0 now.
Let's allow level 4 as well.
Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

049d6c1e

md: clear layout after ->raid0 takeover · 001048a3

Maciej Trela authored Jun 16, 2010

After takeover from raid5/10 -> raid0 mddev->layout is not cleared.
Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

001048a3

md: fix raid10 takeover: use new_layout for setup_conf · f73ea873

Maciej Trela authored Jun 16, 2010

Use mddev->new_layout in setup_conf.
Also use new_chunk, and don't set ->degraded in takeover().  That
gets set in run()
Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

f73ea873

md: fix handling of array level takeover that re-arranges devices. · e93f68a1

NeilBrown authored Jun 15, 2010

Most array level changes leave the list of devices largely unchanged,
possibly causing one at the end to become redundant.
However conversions between RAID0 and RAID10 need to renumber
all devices (except 0).

This renumbering is currently being done in the ->run method when the
new personality takes over.  However this is too late as the common
code in md.c might already have invalidated some of the devices if
they had a ->raid_disk number that appeared too high.

Moving it into the ->takeover method is too early as the array is
still active at that time and wrong ->raid_disk numbers could cause
confusion.

So add a ->new_raid_disk field to mdk_rdev_s and use it to communicate
the new raid_disk number.
Now the common code knows exactly which devices need to be renumbered,
and which can be invalidated, and can do it all at a convenient time
when the array is suspended.
It can also update some symlinks in sysfs which previously were not
updated correctly.
Reported-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

e93f68a1
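
The hand-off through a `new_raid_disk` field could look roughly like this in a userspace model. The struct and function are hypothetical stand-ins, and the rule that a new number at or beyond the new device count marks a device for invalidation is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a member device. */
struct toy_rdev {
    int raid_disk;      /* current slot, negative if not in the array */
    int new_raid_disk;  /* slot requested by the new personality */
};

/*
 * Apply the renumbering in one place, at a time when the array is
 * suspended.  Returns how many devices changed slot or were invalidated.
 */
int toy_apply_renumber(struct toy_rdev *devs, size_t n, int raid_disks)
{
    int changed = 0;

    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        struct toy_rdev *rdev = &devs[i];

        if (rdev->raid_disk < 0)
            continue;                 /* not an active member */
        if (rdev->new_raid_disk >= raid_disks)
            rdev->new_raid_disk = -1; /* no slot in the new layout */
        if (rdev->new_raid_disk == rdev->raid_disk)
            continue;                 /* nothing to do */
        rdev->raid_disk = rdev->new_raid_disk;
        changed++;
    }
    return changed;
}
```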

md: raid10: Fix null pointer dereference in fix_read_error() · 0544a21d

Prasanna S. Panchamukhi authored Jun 24, 2010

Such a NULL pointer dereference can occur when the driver is fixing
read errors/bad blocks and the disk is physically removed,
causing a system crash. This patch checks that
rcu_dereference() returns a valid rdev before accessing it in fix_read_error().

Cc: stable@kernel.org
Signed-off-by: Prasanna S. Panchamukhi <prasanna.panchamukhi@riverbed.com>
Signed-off-by: Rob Becker <rbecker@riverbed.com>
Signed-off-by: NeilBrown <neilb@suse.de>

0544a21d
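
The shape of the check can be shown in plain userspace C. There is no RCU here, so an ordinary pointer load stands in for rcu_dereference(); the types, the `-ENODEV` return value and the function name are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for a member device and its mirror slot. */
struct toy_rdev {
    int nr_pending;
};

struct toy_mirror {
    struct toy_rdev *rdev;  /* NULL once the disk has been removed */
};

/* Take a reference on the device only if it is still present. */
int toy_fix_one(struct toy_mirror *m)
{
    struct toy_rdev *rdev;

    assert(m != NULL);
    rdev = m->rdev;  /* kernel code would use rcu_dereference() here */
    if (rdev == NULL)
        return -ENODEV;  /* disk vanished: skip instead of crashing */

    rdev->nr_pending++;
    return 0;
}
```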

Restore partition detection of newly created md arrays. · f3b99be1

NeilBrown authored Jun 24, 2010

Commit b821eaa5 broke partition
detection for md arrays.

The logic was almost right.  However if revalidate_disk is called
when the device is not yet open, bdev->bd_disk won't be set, so the
flush_disk() call will not set bd_invalidated.

So when md_open is called we still need to ensure that
->bd_invalidated gets set.  This is easily done with a call to
check_disk_size_change in the place where the offending commit removed
check_disk_change.  At the important times, the size will have changed
from 0 to non-zero, so check_disk_size_change will set bd_invalidated.
Tested-by: Duncan <1i5t5.duncan@cox.net>
Reported-by: Duncan <1i5t5.duncan@cox.net>
Signed-off-by: NeilBrown <neilb@suse.de>

f3b99be1

12 Jun, 2010 1 commit

Linux 2.6.35-rc3 · 7e27d6e7

Linus Torvalds authored Jun 11, 2010

7e27d6e7

11 Jun, 2010 27 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 · 4cea8706

Linus Torvalds authored Jun 11, 2010

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  wimax/i2400m: fix missing endian correction read in fw loader
  net8139: fix a race at the end of NAPI
  pktgen: Fix accuracy of inter-packet delay.
  pkt_sched: gen_estimator: add a new lock
  net: deliver skbs on inactive slaves to exact matches
  ipv6: fix ICMP6_MIB_OUTERRORS
  r8169: fix mdio_read and update mdio_write according to hw specs
  gianfar: Revive the driver for eTSEC devices (disable timestamping)
  caif: fix a couple range checks
  phylib: Add support for the LXT973 phy.
  net: Print num_rx_queues imbalance warning only when there are allocated queues

4cea8706

Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 · 7ae1277a

Linus Torvalds authored Jun 11, 2010

* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  PM / x86: Save/restore MISC_ENABLE register

7ae1277a

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable · b25b550b

Linus Torvalds authored Jun 11, 2010

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: The file argument for fsync() is never null
  Btrfs: handle ERR_PTR from posix_acl_from_xattr()
  Btrfs: avoid BUG when dropping root and reference in same transaction
  Btrfs: prohibit a operation of changing acl's mask when noacl mount option used
  Btrfs: should add a permission check for setfacl
  Btrfs: btrfs_lookup_dir_item() can return ERR_PTR
  Btrfs: btrfs_read_fs_root_no_name() returns ERR_PTRs
  Btrfs: unwind after btrfs_start_transaction() errors
  Btrfs: btrfs_iget() returns ERR_PTR
  Btrfs: handle kzalloc() failure in open_ctree()
  Btrfs: handle error returns from btrfs_lookup_dir_item()
  Btrfs: Fix BUG_ON for fs converted from extN
  Btrfs: Fix null dereference in relocation.c
  Btrfs: fix remap_file_pages error
  Btrfs: uninitialized data is check_path_shared()
  Btrfs: fix fallocate regression
  Btrfs: fix loop device on top of btrfs

b25b550b

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 · eda05477

Linus Torvalds authored Jun 11, 2010

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: clear bridge resource range if BIOS assigned bad one
  PCI: hotplug/cpqphp, fix NULL dereference
  Revert "PCI: create function symlinks in /sys/bus/pci/slots/N/"
  PCI: change resource collision messages from KERN_ERR to KERN_INFO

eda05477

PCI: clear bridge resource range if BIOS assigned bad one · 837c4ef1

Yinghai Lu authored Jun 03, 2010

Yannick found that video does not work with 2.6.34.  The cause of this
bug was that the BIOS had assigned the wrong range to the PCI bridge
above the video device.  Before 2.6.34 the kernel would have shrunk
the size of the bridge window, but since
  d65245c3 PCI: don't shrink bridge resources
the kernel will avoid shrinking BIOS ranges.

So zero out the old range if we fail to claim it at boot time; this will
cause us to allocate a new range at startup, restoring the 2.6.34
behavior.

Fixes regression https://bugzilla.kernel.org/show_bug.cgi?id=16009.
Reported-by: Yannick <yannick.roehlly@free.fr>
Acked-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

837c4ef1

PCI: hotplug/cpqphp, fix NULL dereference · a7ef7d1f

Jiri Slaby authored Jun 09, 2010

There are devices out there which are PCI Hot-plug controllers with
compaq PCI IDs, but are not bridges, hence have pdev->subordinate
NULL. But cpqphp expects the pointer to be non-NULL.

Add a check to the probe function to avoid oopses like:
BUG: unable to handle kernel NULL pointer dereference at 00000050
IP: [<f82e3c41>] cpqhpc_probe+0x951/0x1120 [cpqphp]
*pdpt = 0000000033779001 *pde = 0000000000000000
...

The device here was:
00:0b.0 PCI Hot-plug controller [0804]: Compaq Computer Corporation PCI Hotplug Controller [0e11:a0f7] (rev 11)
	Subsystem: Compaq Computer Corporation Device [0e11:a2f8]
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

a7ef7d1f

Revert "PCI: create function symlinks in /sys/bus/pci/slots/N/" · 3be434f0

Jesse Barnes authored Jun 11, 2010

This reverts commit 75568f80.

Since they're just a convenience anyway, remove these symlinks since
they're causing duplicate filename errors in the wild.
Acked-by: Alex Chiang <achiang@canonical.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

3be434f0

PCI: change resource collision messages from KERN_ERR to KERN_INFO · f6d440da

Bjorn Helgaas authored Jun 03, 2010

We can often deal with PCI resource issues by moving devices around. In
that case, there's no point in alarming the user with messages like these.
There are many bug reports where the message itself is the only problem,
e.g., https://bugs.launchpad.net/ubuntu/+source/linux/+bug/413419 .
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

f6d440da

Btrfs: The file argument for fsync() is never null · 6f902af4

Dan Carpenter authored May 29, 2010

The "file" argument for fsync is never null so we can remove this check.

What drew my attention here is that 7ea80859: "drop unused dentry
argument to ->fsync" introduced an unconditional dereference at the
start of the function and that generated a smatch warning.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

6f902af4

Btrfs: handle ERR_PTR from posix_acl_from_xattr() · 834e7475

Dan Carpenter authored May 29, 2010

posix_acl_from_xattr() returns both ERR_PTRs and null, but it's OK to
pass null values to set_cached_acl()
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

834e7475

Btrfs: avoid BUG when dropping root and reference in same transaction · 15e70000

Sage Weil authored May 17, 2010

If btrfs_ioctl_snap_destroy() deletes a snapshot but finishes
with end_transaction(), the cleaner kthread may come in and
drop the root in the same transaction.  If that's the case, the
root's refs still == 1 in the tree when btrfs_del_root() deletes
the item, because commit_fs_roots() hasn't updated it yet (that
happens during the commit).

This wasn't a problem before only because
btrfs_ioctl_snap_destroy() would commit the transaction before dropping
the dentry reference, so the dead root wouldn't get queued up until
after the fs root item was updated in the btree.

Since it is not an error to drop the root reference and the root in the
same transaction, just drop the BUG_ON() in btrfs_del_root().
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

15e70000

Btrfs: prohibit a operation of changing acl's mask when noacl mount option used · 731e3d1b

Shi Weihua authored May 18, 2010

When using the POSIX File System Test Suite (pjd-fstest) to test btrfs,
some setfacl cases failed when the noacl mount option was used.
I simplified the commands used in pjd-fstest, and the following steps
reproduce the problem.
------------------------
# cd btrfs-part/
# mkdir aaa
# setfacl -m m::rw aaa    <- succeeded, but not expected by pjd-fstest.
------------------------
On ext3, the same command produces a warning message such as:
  setfacl: aaa/: Operation not supported
which is what pjd-fstest expects.

So I compared acl.c in btrfs and ext3 and created this patch based on that.
Fortunately, it works.
Signed-off-by: Shi Weihua <shiwh@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

731e3d1b

Btrfs: should add a permission check for setfacl · 2f26afba

Shi Weihua authored May 18, 2010

On btrfs, do the following
------------------
# su user1
# cd btrfs-part/
# touch aaa
# getfacl aaa
  # file: aaa
  # owner: user1
  # group: user1
  user::rw-
  group::rw-
  other::r--
# su user2
# cd btrfs-part/
# setfacl -m u::rwx aaa
# getfacl aaa
  # file: aaa
  # owner: user1
  # group: user1
  user::rwx           <- setfacl succeeded
  group::rw-
  other::r--
------------------
but user2 should be prohibited from changing user1's ACL.
In fact, on ext3 and other filesystems, a message occurs:
  setfacl: aaa: Operation not permitted

This patch fixes it.
Signed-off-by: Shi Weihua <shiwh@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

2f26afba
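
The missing check amounts to an ownership test before the ACL is changed. The sketch below is a userspace illustration with hypothetical names (`toy_inode`, `toy_acl_set()`, a boolean privilege flag); it is not the btrfs code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode with an owner and permission bits. */
struct toy_inode {
    unsigned int owner_uid;
    unsigned int mode;
};

/* Change the permission bits only for the owner or a privileged caller. */
int toy_acl_set(struct toy_inode *inode, unsigned int caller_uid,
                bool caller_privileged, unsigned int new_mode)
{
    assert(inode != NULL);
    if (caller_uid != inode->owner_uid && !caller_privileged)
        return -EPERM;  /* "Operation not permitted" */

    inode->mode = new_mode;
    return 0;
}
```

In the transcript above, user2 running setfacl on user1's file corresponds to the `-EPERM` branch.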

Btrfs: btrfs_lookup_dir_item() can return ERR_PTR · cf1e99a4

Dan Carpenter authored May 29, 2010

btrfs_lookup_dir_item() can return either ERR_PTRs or null.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

cf1e99a4
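
For readers unfamiliar with the convention, here is a userspace re-creation of the error-pointer idiom and a caller that handles all three outcomes: an error pointer, null, or a valid pointer. The `toy_` helpers only imitate the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(); the lookup function and its return values are made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *toy_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long toy_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top TOY_MAX_ERRNO addresses. */
static inline int toy_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

static int toy_item = 42;

/* Hypothetical lookup: error pointer, NULL (not found) or an item. */
void *toy_lookup(int key)
{
    if (key < 0)
        return toy_err_ptr(-EIO);
    if (key == 0)
        return NULL;
    return &toy_item;
}

/* A caller has to test for both failure forms before dereferencing. */
int toy_use(int key)
{
    void *p = toy_lookup(key);

    if (toy_is_err(p))
        return (int)toy_ptr_err(p);
    if (p == NULL)
        return -ENOENT;
    assert(p == &toy_item);
    return *(int *)p;
}
```

Checking only for null would let an error pointer through to the dereference, which is the class of bug these commits address.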

Btrfs: btrfs_read_fs_root_no_name() returns ERR_PTRs · 3140c9a3

Dan Carpenter authored May 29, 2010

btrfs_read_fs_root_no_name() returns ERR_PTRs on error so I added a
check for that.  It's not clear to me if it can also return NULL
pointers or not so I left the original NULL pointer check as is.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

3140c9a3

Btrfs: unwind after btrfs_start_transaction() errors · d327099a

Dan Carpenter authored May 29, 2010

This was added by a22285a6: "Btrfs: Integrate metadata reservation
with start_transaction".  If we goto out here then we skip all the
unwinding and there are locks still held etc.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

d327099a

Btrfs: btrfs_iget() returns ERR_PTR · 4cbd1149

Dan Carpenter authored May 29, 2010

btrfs_iget() returns an ERR_PTR() on failure and not null.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

4cbd1149

Btrfs: handle kzalloc() failure in open_ctree() · 676e4c86

Dan Carpenter authored May 29, 2010

Unwind and return -ENOMEM if the allocation fails here.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

676e4c86
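
The unwind-on-failure pattern can be sketched in userspace C, with malloc() standing in for kzalloc(). The struct, the function and the `fail_second` switch (used only to simulate an allocation failure) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical context holding two allocations. */
struct toy_ctx {
    char *a;
    char *b;
};

/* Set up both buffers; on failure, release what was already taken. */
int toy_setup(struct toy_ctx *ctx, int fail_second)
{
    int ret;

    assert(ctx != NULL);
    ctx->a = NULL;
    ctx->b = NULL;

    ctx->a = malloc(16);
    if (ctx->a == NULL) {
        ret = -ENOMEM;
        goto out;
    }

    ctx->b = fail_second ? NULL : malloc(16);
    if (ctx->b == NULL) {
        ret = -ENOMEM;
        goto out_free_a;  /* jump to the label that undoes step one */
    }
    return 0;

out_free_a:
    free(ctx->a);
    ctx->a = NULL;
out:
    return ret;
}
```

Jumping straight to `out` from the second failure would skip the cleanup of the first allocation, which mirrors the "skip all the unwinding" problem described in the start_transaction fix.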

Btrfs: handle error returns from btrfs_lookup_dir_item() · fb4f6f91

Dan Carpenter authored May 29, 2010

If btrfs_lookup_dir_item() fails, we can just let the mount fail
with an error.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

fb4f6f91

Btrfs: Fix BUG_ON for fs converted from extN · 3bf84a5a

Yan, Zheng authored May 31, 2010

Tree blocks can live in data block groups in FS converted from extN.
So it's easy to trigger the BUG_ON.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

3bf84a5a

Btrfs: Fix null dereference in relocation.c · 046f264f

Yan, Zheng authored May 31, 2010

Fix a potential null dereference in relocation.c
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Acked-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

046f264f

Merge branch 'wimax-2.6.35.y' of git://git.kernel.org/pub/scm/linux/kernel/git/inaky/wimax · e79aa867

David S. Miller authored Jun 11, 2010

e79aa867

wimax/i2400m: fix missing endian correction read in fw loader · a385a53e

Inaky Perez-Gonzalez authored Jun 11, 2010

i2400m_fw_hdr_check() was accessing hardware field
bcf_hdr->module_type (little endian 32) without converting to host
byte order.
Reported-by: Данилин Михаил <mdanilin@nsg.net.ru>
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>

a385a53e
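
As an illustration of the conversion involved, the following userspace sketch decodes a 32-bit little-endian field byte by byte, which gives the same result on any host. The struct layout and function names are hypothetical; kernel code would use le32_to_cpu() on the field instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header with one little-endian 32-bit field, kept as raw
 * bytes so the example does not depend on the host byte order. */
struct toy_bcf_hdr {
    uint8_t module_type[4];
};

/* Assemble a host-order value from four little-endian bytes. */
static uint32_t toy_le32_to_host(const uint8_t b[4])
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Read the field in host byte order before comparing or masking it. */
uint32_t toy_hdr_module_type(const struct toy_bcf_hdr *hdr)
{
    assert(hdr != NULL);
    return toy_le32_to_host(hdr->module_type);
}
```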

Merge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6 · 891a9894

Linus Torvalds authored Jun 11, 2010

* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6:
  kbuild: Create output directory in Makefile.modbuiltin
  kbuild: Generate modules.builtin in make modules

891a9894

Merge branch 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6 · f1f6ea35

Linus Torvalds authored Jun 11, 2010

* 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6:
  pcmcia: avoid validate_cis failure on CIS override
  pcmcia: dev_node removal bugfix
  pcmcia: yenta_socket.c Remove extra #ifdef CONFIG_YENTA_TI
  pcmcia: only keep saved I365_CSCINT flag if there is no PCI irq

f1f6ea35

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client · 63c70a0d

Linus Torvalds authored Jun 11, 2010

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: try to send partial cap release on cap message on missing inode
  ceph: release cap on import if we don't have the inode
  ceph: fix misleading/incorrect debug message
  ceph: fix atomic64_t initialization on ia64
  ceph: fix lease revocation when seq doesn't match
  ceph: fix f_namelen reported by statfs
  ceph: fix memory leak in statfs
  ceph: fix d_subdirs ordering problem

63c70a0d

Btrfs: fix remap_file_pages error · 058a457e

Miao Xie authored May 20, 2010

When we use remap_file_pages() to remap a file, it always returns an
error, because btrfs didn't set VM_CAN_NONLINEAR for the vma.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

058a457e