Commits · d7f124f129a6aea99938e0d4172c741b56fefeda · Kirill Smelkov / linux

13 Jun, 2011 3 commits

ceph: fix sync and dio writes across stripe boundaries · d7f124f1

Sage Weil authored Jun 13, 2011

We were iterating across stripe boundaries properly, but not moving the
write buffer pointer forward. This caused us to rewrite the same data
after the break. Fix by adjusting the data pointer forward, and
recalculating the io and buffer alignment after the break.
Signed-off-by: Sage Weil <sage@newdream.net>

d7f124f1
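
A minimal userspace sketch of the bug class this commit describes, not the kernel code itself. `striped_write` and `STRIPE_SIZE` are hypothetical names, and the 8-byte stripe is an assumption chosen so the boundaries are easy to see. The point is the `data += n` step: the file offset and the source pointer must both advance after each stripe-bounded chunk.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stripe size, chosen small so the boundaries are easy to see. */
#define STRIPE_SIZE 8u

/*
 * Write len bytes from data into file at offset pos, never crossing a
 * stripe boundary within a single chunk. Returns the number of chunks.
 */
static int striped_write(unsigned char *file, size_t pos,
                         const unsigned char *data, size_t len)
{
    int chunks = 0;

    while (len > 0) {
        size_t room = STRIPE_SIZE - (pos % STRIPE_SIZE); /* bytes left in this stripe */
        size_t n = len < room ? len : room;

        memcpy(file + pos, data, n);
        pos += n;
        data += n;  /* without this, the next chunk rewrites the same bytes */
        len -= n;
        chunks++;
    }
    return chunks;
}
```

Writing 12 bytes at offset 5 with this stripe size is split into chunks of 3, 8, and 1 bytes.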

libceph: fix page calculation for non-page-aligned io · 9bb0ce2b

Sage Weil authored Jun 13, 2011

Set the page count correctly for non-page-aligned IO.  We were already
doing this correctly for alignment, but not the page count.  Fixes
DIRECT_IO writes from unaligned pages.
Signed-off-by: Sage Weil <sage@newdream.net>

9bb0ce2b
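
The arithmetic behind this kind of fix can be sketched as follows. This is an illustration under assumptions, not the patched function: the names are hypothetical and 4 KiB pages are assumed. A byte range that starts partway into a page can touch one more page than its length alone suggests.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12                               /* assumes 4 KiB pages */
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Page count if the offset of the first byte within its page is ignored. */
static uint64_t pages_ignoring_offset(uint64_t len)
{
    return (len + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}

/* Page count for the byte range [off, off + len). */
static uint64_t pages_spanned(uint64_t off, uint64_t len)
{
    return ((off + len + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT)
           - (off >> SKETCH_PAGE_SHIFT);
}
```

For example, a 2-byte range starting at offset 4095 straddles two pages, although its length alone rounds up to one.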

ceph: fix page alignment corrections · 773e9b44

Sage Weil authored Jun 07, 2011

 dd if=/dev/urandom of=/mnt/fs_depot/dd10 bs=500 seek=8388 count=1
 dd if=/mnt/fs_depot/dd10 of=/root/dd10out bs=500 skip=8388 count=1
Reported-by: Henry C Chang <henry.cy.chang@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>

773e9b44

08 Jun, 2011 5 commits

ceph: unwind canceled flock state · 0c1f91f2

Sage Weil authored May 25, 2011

If we request a lock and then abort (e.g., ^C), we need to send a matching
unlock request to the MDS to unwind our lock attempt to avoid indefinitely
blocking other clients.
Reported-by: Brian Chrisman <brchrisman@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>

0c1f91f2

ceph: fix ENOENT logic in striped_read · 0e98728f

Sage Weil authored Jun 07, 2011

Getting ENOENT is equivalent to reading 0 bytes.  Make that correction
before setting up the hit_stripe and was_short flags.

Fixes the following case:
 dd if=/dev/zero of=/mnt/fs_depot/dd3 bs=1 seek=1048576 count=0
 dd if=/mnt/fs_depot/dd3 of=/root/ddout1 skip=8 bs=500 count=2 iflag=direct
Reported-by: Henry C Chang <henry.cy.chang@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>

0e98728f
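
A rough sketch of the ordering this commit describes. `hit_stripe` and `was_short` are the flag names from the message; the struct, the function name, and the exact flag definitions are assumptions made for illustration.

```c
#include <errno.h>
#include <stdbool.h>

struct read_state {
    bool hit_stripe;   /* this read stopped at a stripe boundary */
    bool was_short;    /* fewer bytes came back than were requested */
};

/*
 * ret:      result of the object read (byte count or negative errno)
 * this_len: bytes requested from this object
 * left:     bytes still wanted by the caller overall
 */
static long account_read(long ret, long this_len, long left,
                         struct read_state *st)
{
    if (ret == -ENOENT)
        ret = 0;  /* a missing object reads as zero bytes */

    /* Only now derive the flags, so they see the corrected value. */
    st->hit_stripe = this_len < left;
    st->was_short = ret >= 0 && ret < this_len;
    return ret;
}
```

If the flags were computed before the correction, a missing object would not count as a short read, because a negative `ret` fails the `ret >= 0` test.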

ceph: fix short sync reads from the OSD · c3cd6283

Sage Weil authored Jun 01, 2011

If we get a short read from the OSD because the object is small, we need to
zero the remainder of the buffer.  For O_DIRECT reads, the attempted range
is not trimmed to i_size by the VFS, so we were actually looping
indefinitely.

Fix by trimming by i_size, and then unconditionally zeroing the trailing
range.
Reported-by: Jeff Wu <cpwu@tnsoft.com.cn>
Signed-off-by: Sage Weil <sage@newdream.net>

c3cd6283
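
The two steps named in the fix (trim to i_size, then zero the tail) could look roughly like this in userspace. `finish_short_read` and its parameters are hypothetical, and the buffer is assumed to correspond to file offset `pos`.

```c
#include <stddef.h>
#include <string.h>

/*
 * buf:    destination buffer for the range starting at file offset pos
 * want:   bytes the caller asked for
 * got:    bytes the object store actually returned
 * i_size: current file size
 * Returns the number of bytes to report back to the caller.
 */
static size_t finish_short_read(unsigned char *buf, size_t pos, size_t want,
                                size_t got, size_t i_size)
{
    if (pos >= i_size)
        return 0;                      /* nothing to read at or past EOF */
    if (want > i_size - pos)
        want = i_size - pos;           /* trim the request to i_size */
    if (got > want)
        got = want;
    memset(buf + got, 0, want - got);  /* zero the part the object did not cover */
    return want;
}
```

Because the request is trimmed before the tail is zeroed, a reader that starts at or beyond EOF gets 0 back and stops instead of retrying.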

ceph: fix sync vs canceled write · 25845472

Sage Weil authored Jun 03, 2011

If we cancel a write, trigger the safe completions to prevent a sync from
blocking indefinitely in ceph_osdc_sync().
Signed-off-by: Sage Weil <sage@newdream.net>

25845472

ceph: use ihold when we already have an inode ref · 70b666c3

Sage Weil authored May 27, 2011

We should use ihold whenever we already have a stable inode ref, even
when we aren't holding i_lock.  This avoids adding new and unnecessary
locking dependencies.
Signed-off-by: Sage Weil <sage@newdream.net>

70b666c3

24 May, 2011 8 commits

ceph: fix cap flush race reentrancy · db354052

Sage Weil authored May 24, 2011

In e9964c10 we changed cap flushing to do a delicate dance because some
inodes on the cap_dirty list could be in a migrating state (got EXPORT but
not IMPORT) in which we couldn't actually flush and move from
dirty->flushing, breaking the while (!empty) { process first } loop
structure.  It worked for a single sync thread, but was not reentrant and
triggered infinite loops when multiple syncers came along.

Instead, move inodes with dirty caps to a separate cap_dirty_migrating list
when in the limbo export-but-no-import state, allowing us to go back to
the simple loop structure (which was reentrant).  This is cleaner and more
robust.

Audited the cap_dirty users and this looks fine:
list_empty(&ci->i_dirty_item) is still a reliable indicator of whether we
have dirty caps (which list we're on is irrelevant) and list_del_init()
calls still do the right thing.
Signed-off-by: Sage Weil <sage@newdream.net>

db354052
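
A toy model of why the separate list restores the simple loop. The types and function names are hypothetical, and a singly linked list stands in for the kernel's list_head. Entries that cannot be flushed yet are parked elsewhere, so every pass of `while (!empty) { process first }` removes the head and the loop terminates regardless of how many callers run it.

```c
#include <stdbool.h>
#include <stddef.h>

struct item {
    bool migrating;      /* exported but not yet imported: cannot flush */
    struct item *next;
};

struct list {
    struct item *head;
};

static void push(struct list *l, struct item *it)
{
    it->next = l->head;
    l->head = it;
}

static struct item *pop(struct list *l)
{
    struct item *it = l->head;

    if (it)
        l->head = it->next;
    return it;
}

/* Queue a dirty item on the list that matches its state. */
static void mark_dirty(struct list *dirty, struct list *dirty_migrating,
                       struct item *it)
{
    push(it->migrating ? dirty_migrating : dirty, it);
}

/* Flush everything on the dirty list; returns the number of items flushed. */
static int flush_dirty(struct list *dirty)
{
    int flushed = 0;

    while (pop(dirty) != NULL)
        flushed++;  /* every entry here is flushable, so the loop always ends */
    return flushed;
}
```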

libceph: subscribe to osdmap when cluster is full · cd634fb6

Sage Weil authored May 12, 2011

When the cluster is marked full, subscribe to subsequent map updates to
ensure we find out promptly when it is no longer full. This will prevent
us from spewing ENOSPC for (much) longer than necessary.
Signed-off-by: Sage Weil <sage@newdream.net>

cd634fb6

libceph: handle new osdmap down/state change encoding · 7662d8ff

Sage Weil authored May 03, 2011

Old incrementals encode a 0 value (nearly always) when an osd goes down.
Change that to allow any state bit(s) to be flipped.  Special case 0 to
mean flip the CEPH_OSD_UP bit to mimic the old behavior.
Signed-off-by: Sage Weil <sage@newdream.net>

7662d8ff
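
The decoding rule can be sketched as a single XOR. `CEPH_OSD_UP` is the name used in the message; the bit values below and the function name are assumptions for illustration.

```c
#include <stdint.h>

/* Assumed bit values, for illustration only. */
#define CEPH_OSD_EXISTS 1u
#define CEPH_OSD_UP     2u

/* Apply one incremental state entry to an OSD's current state bits. */
static uint8_t apply_state_change(uint8_t state, uint8_t xorstate)
{
    if (xorstate == 0)
        xorstate = CEPH_OSD_UP;  /* old encoding: 0 means "flip the UP bit" */
    return state ^ xorstate;     /* new encoding: flip whichever bits are set */
}
```

With this rule an old-style 0 entry and an explicit `CEPH_OSD_UP` entry both take an up OSD down, while newer maps can flip other bits as well.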

rbd: handle online resize of underlying rbd image · 9db4b3e3

Sage Weil authored Apr 19, 2011

If we get a notification that the image header has changed, check for
a change in the image size.
Signed-off-by: Sage Weil <sage@newdream.net>

9db4b3e3

ceph: avoid inode lookup on nfs fh reconnect · 45e3d3ee

Sage Weil authored Apr 06, 2011

If we get the inode from the MDS, we have a reference in req; don't do a
fresh lookup.
Signed-off-by: Sage Weil <sage@newdream.net>

45e3d3ee

ceph: use LOOKUPINO to make unconnected nfs fh more reliable · 3c454cf2

Sage Weil authored Apr 06, 2011

If we are unable to locate an inode by ino, ask the MDS using the new
LOOKUPINO command.
Signed-off-by: Sage Weil <sage@newdream.net>

3c454cf2

rbd: use snprintf for disk->disk_name · aedfec59

Sage Weil authored May 12, 2011

Signed-off-by: Sage Weil <sage@newdream.net>

aedfec59

rbd: cleanup: make kfree match kmalloc · 916d4d67

Sage Weil authored May 12, 2011

Signed-off-by: Sage Weil <sage@newdream.net>

916d4d67

19 May, 2011 16 commits

rbd: warn on update_snaps failure on notify · 13143d2d

Sage Weil authored May 12, 2011

Signed-off-by: Sage Weil <sage@newdream.net>

13143d2d

ceph: check return value for start_request in writepages · 9d6fcb08

Sage Weil authored May 12, 2011

Since we pass the nofail arg, we should never get an error; BUG if we do.
(And fix the function to not return an error if __map_request fails.)
Signed-off-by: Sage Weil <sage@newdream.net>

9d6fcb08

ceph: remove useless check · 6b4a3b51

Sage Weil authored May 12, 2011

rc is only ever 0 or negative in this method.
Signed-off-by: Sage Weil <sage@newdream.net>

6b4a3b51

libceph: add missing breaks in addr_set_port · a2a79609

Sage Weil authored May 12, 2011

Signed-off-by: Sage Weil <sage@newdream.net>

a2a79609

libceph: fix TAG_WAIT case · 04177882

Sage Weil authored May 12, 2011

If we get a WAIT as a client something went wrong; error out.  And don't
fall through to an unrelated case.
Signed-off-by: Sage Weil <sage@newdream.net>

04177882
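
A small sketch of the pattern, with hypothetical tag names and values rather than the real protocol constants. The unexpected tag returns an error from its own case, so control never reaches the case that follows it.

```c
/* Hypothetical tag values, for illustration only. */
enum tag { TAG_READY, TAG_WAIT, TAG_RETRY };

/* Returns 0 on success, -1 on a protocol error. */
static int handle_tag(enum tag tag, int *retries)
{
    switch (tag) {
    case TAG_WAIT:
        /* A client should never be told to wait: report an error here
         * instead of falling through into the unrelated case below. */
        return -1;
    case TAG_RETRY:
        (*retries)++;
        return 0;
    case TAG_READY:
    default:
        return 0;
    }
}
```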

ceph: fix broken comparison in readdir loop · da39822c

Sage Weil authored May 12, 2011

Both off and fi->offset are unsigned, so the difference is always >= 0.
Compare them directly instead of the sign of the difference.
Signed-off-by: Sage Weil <sage@newdream.net>

da39822c
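
To illustrate the pitfall with hypothetical helper names: subtracting a larger unsigned value from a smaller one wraps around to a huge positive number, so testing the sign of the difference tells you nothing. Comparing the operands directly states the intent.

```c
#include <stdbool.h>

/* Unsigned subtraction wraps; the result is never negative. */
static unsigned int offset_delta(unsigned int off, unsigned int start)
{
    return off - start;
}

/* True if off lies in the window [start, start + nr). */
static bool in_window(unsigned int off, unsigned int start, unsigned int nr)
{
    return off >= start && off - start < nr;
}
```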

libceph: fix osdmap timestamp assignment · 31456665

Sage Weil authored May 12, 2011

Signed-off-by: Sage Weil <sage@newdream.net>

31456665

ceph: fix rare potential cap leak · 3540303f

Sage Weil authored May 12, 2011

If we grab new_cap, retake the lock, and find we already have a cap now
for the given mds, release new_cap.
Signed-off-by: Sage Weil <sage@newdream.net>

3540303f

libceph: use snprintf for unknown addrs · 12a2f643

Sage Weil authored May 12, 2011

Signed-off-by: Sage Weil <sage@newdream.net>

12a2f643

libceph: use snprintf for formatting object name · 2dab036b

Sage Weil authored May 12, 2011

Signed-off-by: Sage Weil <sage@newdream.net>

2dab036b

ceph: use snprintf for dirstat content · ae598083

Sage Weil authored May 12, 2011

We allocate a buffer for rstats if the dirstat option is enabled.  Use
snprintf.
Signed-off-by: Sage Weil <sage@newdream.net>

ae598083
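
The general pattern behind the snprintf conversions in this series, sketched in userspace C. `format_name` and its format string are hypothetical; the actual buffers and formats are in the patches. Unlike sprintf, snprintf is told the buffer size and never writes past it, and its return value reveals truncation.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format "<prefix><id>" into buf without ever writing past size bytes.
 * Returns 0 on success, -1 if the output had to be truncated.
 */
static int format_name(char *buf, size_t size, const char *prefix, int id)
{
    int n = snprintf(buf, size, "%s%d", prefix, id);

    /* snprintf returns the length it wanted to write, excluding the NUL. */
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With a 6-byte buffer, formatting prefix "rbd" and id 12345 leaves a NUL-terminated "rbd12" in the buffer and reports truncation.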

libceph: fix uninitialized value when no get_authorizer method is set · e8f54ce1

Sage Weil authored May 12, 2011

If there is no get_authorizer method we set the out_kvec to a bogus
pointer. The length is also zero in that case, so it doesn't much matter,
but it's better not to add the empty item in the first place.
Signed-off-by: Sage Weil <sage@newdream.net>

e8f54ce1

libceph: remove unused variable · 1b366985

Sage Weil authored May 12, 2011

Signed-off-by: Sage Weil <sage@newdream.net>

1b366985

libceph: handle connection reopen race with callbacks · 0da5d703

Sage Weil authored May 19, 2011

If a connection is closed and/or reopened (ceph_con_close, ceph_con_open)
it can race with a callback.  con_work does various state checks for
closed or reopened sockets at the beginning, but drops con->mutex before
making callbacks.  We need to check for state bit changes after retaking
the lock to ensure we restart con_work and execute those CLOSED/OPENING
tests or else we may end up operating under stale assumptions.

In Jim's case, this was causing 'bad tag' errors.

There are four cases where we re-take the con->mutex inside con_work: catch
them all and return EAGAIN from try_{read,write} so that we can restart
con_work.
Reported-by: Jim Schutt <jaschut@sandia.gov>
Tested-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Sage Weil <sage@newdream.net>

0da5d703
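
A single-threaded sketch of the "recheck after retaking the lock" pattern described here. The struct, the boolean standing in for con->mutex, and the function names are all hypothetical; real code would use an actual mutex and state bits.

```c
#include <errno.h>
#include <stdbool.h>

struct conn {
    bool locked;  /* stands in for con->mutex being held */
    bool closed;  /* stands in for a CLOSED/OPENING state bit */
};

typedef void (*conn_cb)(struct conn *con);

/* Example callbacks: one leaves the connection alone, one closes it. */
static void cb_noop(struct conn *con)  { (void)con; }
static void cb_close(struct conn *con) { con->closed = true; }

/*
 * Run a callback with the lock dropped, then recheck the state once the
 * lock is held again. Returns -EAGAIN if the caller must restart its work.
 */
static int run_callback(struct conn *con, conn_cb cb)
{
    con->locked = false;   /* drop the lock around the callback */
    cb(con);
    con->locked = true;    /* retake it */

    if (con->closed)
        return -EAGAIN;    /* state changed underneath us: start over */
    return 0;
}
```

The caller treats -EAGAIN as a signal to go back to the top of its work function and rerun the state checks, rather than continuing with assumptions made before the lock was dropped.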

ceph: take reference on mds request r_unsafe_dir · 3b663780

Sage Weil authored May 18, 2011

We put ourselves on an inode list for the parent directory of metadata
operations so that an fsync on the directory will wait for metadata updates
to commit to disk.  We weren't holding a reference to that directory,
however, and under certain workloads (fsstress in this case) the directory
can go away.
Signed-off-by: Sage Weil <sage@newdream.net>

3b663780

Linux 2.6.39 · 61c4f2c8

Linus Torvalds authored May 18, 2011

61c4f2c8

18 May, 2011 8 commits

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 · 3f80fbff

Linus Torvalds authored May 18, 2011

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
  configfs: Fix race between configfs_readdir() and configfs_d_iput()
  configfs: Don't try to d_delete() negative dentries.
  ocfs2/dlm: Target node death during resource migration leads to thread spin
  ocfs2: Skip mount recovery for hard-ro mounts
  ocfs2/cluster: Heartbeat mismatch message improved
  ocfs2/cluster: Increase the live threshold for global heartbeat
  ocfs2/dlm: Use negotiated o2dlm protocol version
  ocfs2: skip existing hole when removing the last extent_rec in punching-hole codes.
  ocfs2: Initialize data_ac (might be used uninitialized)

3f80fbff

Merge branch 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6 · fce51958

Linus Torvalds authored May 18, 2011

* 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6:
  drivercore: revert addition of of_match to struct device
  of: fix race when matching drivers

fce51958

Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus · 7103dbed

Linus Torvalds authored May 18, 2011

* 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus:
  MIPS: Kludge IP27 build for 2.6.39.
  MIPS: AR7: Fix GPIO register size for Titan variant.
  MIPS: Fix duplicate invocation of notify_die.
  MIPS: RB532: Fix iomap resource size miscalculation.

7103dbed

drivercore: revert addition of of_match to struct device · b1608d69

Grant Likely authored May 18, 2011

Commit b826291c, "drivercore/dt: add a match table pointer to struct
device" added an of_match pointer to struct device to cache the
of_match_table entry discovered at driver match time.  This was unsafe
because matching is not an atomic operation with probing a driver.  If
two or more drivers try to match the same device at the same time, the
cached matching entry pointer could get overwritten.

This patch reverts the of_match cache pointer and reworks all users to
call of_match_device() directly instead.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>

b1608d69

of: fix race when matching drivers · 01294d82

Milton Miller authored May 18, 2011

If two drivers are probing devices at the same time, both will write
their match table result to the dev->of_match cache at the same time.

Only write the result if the device matches.

In a thread titled "SBus devices sometimes detected, sometimes not",
Meelis reported his SBus hme was not detected about 50% of the time.
From the debug suggested by Grant it was obvious another driver matched
some devices between the call to match the hme and the hme discovery
failing.
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Milton Miller <miltonm@bga.com>
[grant.likely: modified to only call of_match_device() once]
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>

01294d82

Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block · a2b9c1f6

Linus Torvalds authored May 18, 2011

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: don't delay blk_run_queue_async
  scsi: remove performance regression due to async queue run
  blk-throttle: Use task_subsys_state() to determine a task's blkio_cgroup
  block: rescan partitions on invalidated devices on -ENOMEDIA too
  cdrom: always check_disk_change() on open
  block: unexport DISK_EVENT_MEDIA_CHANGE for legacy/fringe drivers

a2b9c1f6

MIPS: Kludge IP27 build for 2.6.39. · a5602a32

Ralf Baechle authored May 18, 2011

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

a5602a32

MIPS: AR7: Fix GPIO register size for Titan variant. · 3e9957b4

Florian Fainelli authored May 13, 2011

The 'size' variable contains the correct register size for both AR7
and Titan, but we never used it to ioremap the correct register size.
This problem only shows up on Titan.

[ralf@linux-mips.org: Fixed the fix.  The original patch as in patchwork
recognizes the problem correctly then fails to fix it ...]
Reported-by: Alexander Clouter <alex@digriz.org.uk>
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Patchwork: https://patchwork.linux-mips.org/patch/2380/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

3e9957b4