Commits · a1c15c59feee36267c43142a41152fbf7402afb6 · nexedi / linux

24 May, 2011 16 commits

loop: handle on-demand devices correctly · a1c15c59

Namhyung Kim authored May 24, 2011

When finding or allocating a loop device, loop_probe() did not take
partition numbers into account, so it could end up with a different
device. Consider the following example:

$ sudo modprobe loop max_part=15
$ ls -l /dev/loop*
brw-rw---- 1 root disk 7,   0 2011-05-24 22:16 /dev/loop0
brw-rw---- 1 root disk 7,  16 2011-05-24 22:16 /dev/loop1
brw-rw---- 1 root disk 7,  32 2011-05-24 22:16 /dev/loop2
brw-rw---- 1 root disk 7,  48 2011-05-24 22:16 /dev/loop3
brw-rw---- 1 root disk 7,  64 2011-05-24 22:16 /dev/loop4
brw-rw---- 1 root disk 7,  80 2011-05-24 22:16 /dev/loop5
brw-rw---- 1 root disk 7,  96 2011-05-24 22:16 /dev/loop6
brw-rw---- 1 root disk 7, 112 2011-05-24 22:16 /dev/loop7
$ sudo mknod /dev/loop8 b 7 128
$ sudo losetup /dev/loop8 ~/temp/disk-with-3-parts.img
$ sudo losetup -a
/dev/loop128: [0805]:278201 (/home/namhyung/temp/disk-with-3-parts.img)
$ ls -l /dev/loop*
brw-rw---- 1 root disk 7,    0 2011-05-24 22:16 /dev/loop0
brw-rw---- 1 root disk 7,   16 2011-05-24 22:16 /dev/loop1
brw-rw---- 1 root disk 7, 2048 2011-05-24 22:18 /dev/loop128
brw-rw---- 1 root disk 7, 2049 2011-05-24 22:18 /dev/loop128p1
brw-rw---- 1 root disk 7, 2050 2011-05-24 22:18 /dev/loop128p2
brw-rw---- 1 root disk 7, 2051 2011-05-24 22:18 /dev/loop128p3
brw-rw---- 1 root disk 7,   32 2011-05-24 22:16 /dev/loop2
brw-rw---- 1 root disk 7,   48 2011-05-24 22:16 /dev/loop3
brw-rw---- 1 root disk 7,   64 2011-05-24 22:16 /dev/loop4
brw-rw---- 1 root disk 7,   80 2011-05-24 22:16 /dev/loop5
brw-rw---- 1 root disk 7,   96 2011-05-24 22:16 /dev/loop6
brw-rw---- 1 root disk 7,  112 2011-05-24 22:16 /dev/loop7
brw-r--r-- 1 root root 7,  128 2011-05-24 22:17 /dev/loop8

After this patch, /dev/loop8 (instead of /dev/loop128) is
accessed correctly.

In addition, the 'range' passed to blk_register_region() should
cover the whole range of dev_t that LOOP_MAJOR can address. It does
not need to be limited by partition numbers unless the 'max_loop'
param was specified.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Laurent Vivier <Laurent.Vivier@bull.net>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

a1c15c59
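The minor numbers in the listing above suggest how the lookup goes wrong: with max_part=15 each loop device owns 16 minors, so minor 128 belongs to device index 8, not 128. A minimal sketch of that arithmetic follows. The helper names are hypothetical and this is not the actual patch; it only assumes that the partition shift is derived from max_part the way the kernel's fls() would.

```c
/* Hypothetical stand-in for the kernel's fls(): 1-based index of the
 * highest set bit, 0 when x == 0. */
static int fls_sketch(unsigned int x)
{
    int bit = 0;

    while (x) {
        bit++;
        x >>= 1;
    }
    return bit;
}

/* Map a device minor to a loop device index, taking the minors
 * reserved for partitions into account (assumed behaviour). */
static unsigned int loop_index_from_minor(unsigned int minor,
                                          unsigned int max_part)
{
    int part_shift = max_part ? fls_sketch(max_part) : 0;

    return minor >> part_shift;
}
```

With max_part=15 the shift is 4, so minor 128 maps to index 8, while treating 128 itself as the index gives a first minor of 128 << 4 = 2048, which matches the stray /dev/loop128 node in the listing.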

loop: limit 'max_part' module param to DISK_MAX_PARTS · 78f4bb36

Namhyung Kim authored May 24, 2011

The 'max_part' parameter controls the maximum number of partitions
a loop block device can have. However, if a user specifies a very
large value, it would exceed the limit of the device minor number
and can cause a kernel panic (or, at least, produce invalid
device nodes in some cases).

On my desktop system, the following command kills the kernel. On qemu,
it triggers a similar oops but the kernel stays alive:

$ sudo modprobe loop max_part=100000
 ------------[ cut here ]------------
 kernel BUG at /media/Linux_Data/project/linux/fs/sysfs/group.c:65!
 invalid opcode: 0000 [#1] SMP
 last sysfs file:
 CPU 0
 Modules linked in: loop(+)

 Pid: 43, comm: insmod Tainted: G        W   2.6.39-qemu+ #155 Bochs Bochs
 RIP: 0010:[<ffffffff8113ce61>]  [<ffffffff8113ce61>] internal_create_group+0x2a/0x170
 RSP: 0018:ffff880007b3fde8  EFLAGS: 00000246
 RAX: 00000000ffffffef RBX: ffff880007b3d878 RCX: 00000000000007b4
 RDX: ffffffff8152da50 RSI: 0000000000000000 RDI: ffff880007b3d878
 RBP: ffff880007b3fe38 R08: ffff880007b3fde8 R09: 0000000000000000
 R10: ffff88000783b4a8 R11: ffff880007b3d878 R12: ffffffff8152da50
 R13: ffff880007b3d868 R14: 0000000000000000 R15: ffff880007b3d800
 FS:  0000000002137880(0063) GS:ffff880007c00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000422680 CR3: 0000000007b50000 CR4: 00000000000006b0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
 Process insmod (pid: 43, threadinfo ffff880007b3e000, task ffff880007afb9c0)
 Stack:
  ffff880007b3fe58 ffffffff811e66dd ffff880007b3fe58 ffffffff811e570b
  0000000000000010 ffff880007b3d800 ffff880007a7b390 ffff880007b3d868
  0000000000400920 ffff880007b3d800 ffff880007b3fe48 ffffffff8113cfc8
 Call Trace:
  [<ffffffff811e66dd>] ? device_add+0x4bc/0x5af
  [<ffffffff811e570b>] ? dev_set_name+0x3c/0x3e
  [<ffffffff8113cfc8>] sysfs_create_group+0xe/0x12
  [<ffffffff810b420e>] blk_trace_init_sysfs+0x14/0x16
  [<ffffffff8116a090>] blk_register_queue+0x47/0xf7
  [<ffffffff8116f527>] add_disk+0xdf/0x290
  [<ffffffffa00060eb>] loop_init+0xeb/0x1b8 [loop]
  [<ffffffffa0006000>] ? 0xffffffffa0005fff
  [<ffffffff8100020a>] do_one_initcall+0x7a/0x12e
  [<ffffffff81096804>] sys_init_module+0x9c/0x1e0
  [<ffffffff813329bb>] system_call_fastpath+0x16/0x1b
 Code: c3 55 48 89 e5 41 57 41 56 41 89 f6 41 55 41 54 49 89 d4 53 48 89 fb 48 83 ec 28 48 85 ff 74 0b 85 f6 75 0b 48 83 7f 30 00 75 14 <0f> 0b eb fe 48 83 7f 30 00 b9 ea ff ff ff 0f 84 18 01 00 00 49
 RIP  [<ffffffff8113ce61>] internal_create_group+0x2a/0x170
  RSP <ffff880007b3fde8>
 ---[ end trace a123eb592043acad ]---
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Laurent Vivier <Laurent.Vivier@bull.net>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

78f4bb36
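A bounds check of the kind the title describes could look roughly like the sketch below. It is illustrative only, not the committed code: the function name is hypothetical, and the limit of 256 partitions per disk is an assumed value for DISK_MAX_PARTS.

```c
#include <errno.h>

/* Assumed value of the kernel's DISK_MAX_PARTS limit. */
#define DISK_MAX_PARTS_SKETCH 256ULL

/* Validate a requested max_part value (hypothetical helper).
 * Returns the partition shift on success, -EINVAL if the number of
 * minors needed per device would exceed the assumed limit. */
static int loop_check_max_part(unsigned int max_part)
{
    int part_shift = 0;
    unsigned int v = max_part;

    /* Bits needed to number max_part partitions plus the whole disk. */
    while (v) {
        part_shift++;
        v >>= 1;
    }

    if ((1ULL << part_shift) > DISK_MAX_PARTS_SKETCH)
        return -EINVAL;

    return part_shift;
}
```

Under these assumptions max_part=15 is accepted with a shift of 4, while max_part=100000 needs 17 bits and is rejected before any disk is registered.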

drbd: fix warning · 0ddf72be

Andrew Morton authored May 23, 2011

In file included from drivers/block/drbd/drbd_main.c:54:
drivers/block/drbd/drbd_int.h:1190: warning: parameter has incomplete type

Forward declarations of enums do not work.

Fix it unpleasantly by moving the prototype.

Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Lars Ellenberg <drbd-dev@lists.linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

0ddf72be

drbd: fix warning · 9b2f61ae
Philipp Reisner authored May 24, 2011
```
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
```
9b2f61ae

drbd: Fix spelling · 24c4830c

Bart Van Assche authored May 21, 2011

Found these with the help of ispell -l.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>

24c4830c

drbd: fix schedule in atomic · 9a0d9d03

Lars Ellenberg authored May 02, 2011

An administrative detach used to request a state change directly to D_DISKLESS,
first suspending IO to avoid the last put_ldev() occurring from an endio handler,
potentially in irq context.

This is not enough on the receiving side (typically secondary), we may miss
some peer_req on the way to local disk, which then may do the last put_ldev()
from their drbd_peer_request_endio().

This patch makes the detach always go through the intermediate D_FAILED state.
We may consider renaming it to D_DETACHING.

An alternative approach would be to create yet another work item to be scheduled
on the worker, do the destructor work from there, and get the timing right.

Manually picked commit 564040f from the drbd 8.4 branch.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

9a0d9d03

drbd: Take a more conservative approach when deciding max_bio_size · 99432fcc

Philipp Reisner authored May 20, 2011

The old (optimistic) implementation could shrink the bio size
on a primary device.

Shrinking the bio size on a primary device is bad, since
we might still get BIOs with the old (bigger) size shortly after
we published the new size.

The new implementation is more conservative, and eventually
increases the max_bio_size on a primary device (which is valid).
It does so when it knows the local limit AND the remote limit.

We cache the last seen max_bio_size of the peer in the meta
data, and rely on that to make the operation of single
nodes more efficient.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

99432fcc

drbd: Fixed state transitions after async outdate-peer-handler returned · 21423fa7
Philipp Reisner authored May 17, 2011
```
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
```
21423fa7

drbd: Disallow the peer_disk_state to be D_OUTDATED while connected · fa7d9396

Philipp Reisner authored May 17, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

fa7d9396

drbd: Fix for the connection problems on high latency links · a8e40792

Philipp Reisner authored May 13, 2011

It seems that the real cause of all the issues was that
we did not notice in drbd_try_connect() when the other
guy closes one socket if the round trip time gets higher
than 100ms. That 100ms was hard coded!
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

a8e40792

drbd: fix potential activity log refcount imbalance in error path · 76727f68

Lars Ellenberg authored May 16, 2011

It is no longer sufficient to trigger on local WRITE;
we need to check (rq_state & RQ_IN_ACT_LOG)
before calling drbd_al_complete_io, also in the error path.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

76727f68

drbd: Only downgrade the disk state in case of disk failures · d2e17807

Philipp Reisner authored Mar 14, 2011

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

d2e17807

drbd: fix disconnect/reconnect loop, if ping-timeout == ping-int · f36af18c

Lars Ellenberg authored Mar 09, 2011

If there is no replication traffic within the idle timeout
(ping-int seconds), DRBD will send a P_PING,
and adjust the timeout to ping-timeout.

If there is no P_PING_ACK received within this ping-timeout,
DRBD finally drops the connection, and tries to re-establish it.

To decide which timeout was active, we compared the current timeout
with the ping-timeout, and dropped the connection if they were equal.

By default, ping-int is 10 seconds, ping-timeout is 500 ms.

Unfortunately, if you configure ping-timeout to be the same as ping-int,
expiry of the idle-timeout was mistaken for a missing ping ack,
and caused an immediate reconnection attempt.

Fix:
Allow both timeouts to be equal, use a local variable
to store which timeout is active.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

f36af18c
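The difference between the two checks can be sketched as follows. This is an illustration under assumed names, not the DRBD code: the enum and both functions are hypothetical, and timeouts are plain millisecond values.

```c
#include <stdbool.h>

/* Which receive timeout is currently armed (hypothetical enum). */
enum active_timeout {
    TIMEOUT_IDLE,   /* waiting ping-int for any traffic      */
    TIMEOUT_PING    /* waiting ping-timeout for a P_PING_ACK */
};

/* Old style: infer the active timeout by comparing values.
 * Ambiguous when ping-int and ping-timeout are configured equal. */
static bool ping_ack_missed_by_value(unsigned int cur_timeout_ms,
                                     unsigned int ping_timeout_ms)
{
    return cur_timeout_ms == ping_timeout_ms;
}

/* New style: remember explicitly which timeout was armed. */
static bool ping_ack_missed_by_state(enum active_timeout active)
{
    return active == TIMEOUT_PING;
}
```

With both settings at 10 seconds, an expired idle timeout satisfies the value comparison and would drop the connection, whereas the explicit state correctly reports that no ping was outstanding.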

drbd: fix potential distributed deadlock · 53ea4331

Lars Ellenberg authored Mar 08, 2011

We limit ourselves to a configurable maximum number of pages used as
temporary bio pages.

If the configured "max_buffers" is not big enough to match the bandwidth
of the respective deployment, a distributed deadlock could be triggered
by e.g. fast online verify and heavy application IO.

TCP connections would block on congestion, because both receivers
would wait on pages to become available.

Fortunately the respective senders in this case would be able to give
back some pages already. So do that.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

53ea4331

lru_cache.h: fix comments referring to ts_ instead of lc_ · 600942e0

Lars Ellenberg authored Jan 27, 2011

For some time we contemplated calling the "struct lru_cache"
a "struct tracked_set", and some comments kept the ts_ prefix.

Fix those to match the member field names.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

600942e0

drbd: Fix for application IO with the on-io-error=pass-on policy · 738a84b2

Philipp Reisner authored Mar 03, 2011

In case a write fails on the local disk, go into D_INCONSISTENT
disk state. That causes future reads of that block to be shipped
to the peer.

Read retry remote was already in place.

Actually, the documentation needs to get fixed now, since the
application is still shielded from the error (as long as we have
only a single disk failing). The difference to detach is that
we keep the disk, and therefore might keep all the other, still
working sectors up to date.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

738a84b2

19 May, 2011 1 commit

Merge branches 'for-jens/xen-backend-fixes' and 'for-jens/xen-blkback-v3.3' of... · 779d5306

Jens Axboe authored May 19, 2011

Merge branches 'for-jens/xen-backend-fixes' and 'for-jens/xen-blkback-v3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-2.6.40/drivers

779d5306

18 May, 2011 3 commits

xen/p2m: Add EXPORT_SYMBOL_GPL to the M2P override functions. · c9ce9e43

Konrad Rzeszutek Wilk authored Apr 20, 2011

If the backends, which use these two functions, are compiled as
a module we need these two functions to be exported.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

c9ce9e43

xen/p2m/m2p/gnttab: Support GNTMAP_host_map in the M2P override. · d5431d52

Konrad Rzeszutek Wilk authored Feb 28, 2011

We supported the M2P (and P2M) override only for the
GNTMAP_contains_pte type mappings, meaning that the grant
operations would "contain the machine address of the PTE to update".
If the flag is unset, then the grant operation
"contains a host virtual address". The latter case means that
the Hypervisor takes care of updating our page table
(specifically the PTE entry) with the guest's MFN. As such we should
not try to do anything with the PTE. Previous to this patch
we would try to clear the PTE, which resulted in the Xen hypervisor
being upset with us:

(XEN) mm.c:1066:d0 Attempt to implicitly unmap a granted PTE c0100000ccc59067
(XEN) domain_crash called from mm.c:1067
(XEN) Domain 0 (vcpu#0) crashed on cpu#3:
(XEN) ----[ Xen-4.0-110228  x86_64  debug=y  Not tainted ]----

and crashing us.

This patch allows us to inhibit the PTE clearing in the PV guest
if the GNTMAP_contains_pte is not set.

On the m2p_remove_override path we provide the same parameter.

Sadly, in the grant-table driver we do not have a mechanism to
tell m2p_remove_override whether to clear the PTE or not. Since
the grant-table driver is used by user-space, we can safely assume
that it operates only on PTEs. Hence the implementation returns
-EOPNOTSUPP for !GNTMAP_contains_pte. In the future
we can implement the support for this. It will require some extra
accounting structure to keep track of the page[i], and the flag.

[v1: Added documentation details, made it return -EOPNOTSUPP instead
 of trying to do a half-way implementation]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

d5431d52
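The flag handling described in this message can be summarised in a small sketch. The function names are hypothetical and the flag's bit position is an assumption made for illustration; only the decision logic is taken from the text.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumed bit position for GNTMAP_contains_pte (illustrative only). */
#define GNTMAP_CONTAINS_PTE_SKETCH (1u << 4)

/* Override path: touch the PTE only when the grant operation carried
 * the machine address of a PTE; otherwise the hypervisor owns it. */
static bool m2p_should_clear_pte(unsigned int map_flags)
{
    return (map_flags & GNTMAP_CONTAINS_PTE_SKETCH) != 0;
}

/* Grant-table driver path: mappings without the flag are not
 * supported and are rejected up front. */
static int gnttab_check_flags(unsigned int map_flags)
{
    if (!(map_flags & GNTMAP_CONTAINS_PTE_SKETCH))
        return -EOPNOTSUPP;
    return 0;
}
```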

xen/blkback: don't fail empty barrier requests · 8ab52150

Jan Beulich authored May 17, 2011

The sector number on empty barrier requests may (will?) be -1, which,
given that it's being treated as unsigned 64-bit quantity, will almost
always exceed the actual (virtual) disk's size.

Inspired by Konrad's "When writting barriers set the sector number to
zero...".

While at it also add overflow checking to the math in vbd_translate().
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

8ab52150
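A sketch of the two checks mentioned above: an overflow-safe range test, and a bypass for barrier requests that carry no data. The function names and the way an empty barrier is detected are assumptions for illustration, not the driver's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when [sector, sector + nr_sects) lies inside the disk and the
 * addition does not wrap around 64 bits. */
static bool range_within_disk(uint64_t sector, uint64_t nr_sects,
                              uint64_t disk_sects)
{
    uint64_t end = sector + nr_sects;

    if (end < sector)           /* unsigned wrap-around */
        return false;
    return end <= disk_sects;
}

/* An empty barrier has no payload, so its sector number (possibly
 * (uint64_t)-1) is meaningless and is not range-checked. */
static bool request_acceptable(bool is_barrier, uint64_t sector,
                               uint64_t nr_sects, uint64_t disk_sects)
{
    if (is_barrier && nr_sects == 0)
        return true;
    return range_within_disk(sector, nr_sects, disk_sects);
}
```

Without the bypass, a sector of -1 read as an unsigned 64-bit value fails the range test on any realistically sized disk, which is the failure the message describes.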

13 May, 2011 1 commit

xen/blkback: fix xenbus_transaction_start() hang caused by double xenbus_transaction_end() · 496b318e

Laszlo Ersek authored May 13, 2011

vbd_resize() up_read()'s xs_state.suspend_mutex twice in a row via double
xenbus_transaction_end() calls. The next down_read() in
xenbus_transaction_start() (at eg. the next resize attempt) hangs.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=618317
Acked-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

496b318e

12 May, 2011 19 commits

xen/blkback: Align the tabs on the structure. · 51854322

Konrad Rzeszutek Wilk authored May 12, 2011

The recent changes caused this field of the structure to be offset a bit.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

51854322

xen/blkback: if log_stats is enabled print out the data. · cca537af

Konrad Rzeszutek Wilk authored May 12, 2011

And not depend on the driver being built with -DDEBUG flag.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

cca537af

xen/blkback: Add the prefix XEN in the common.h. · 5a577e38
Konrad Rzeszutek Wilk authored May 12, 2011
```
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
```
5a577e38
xen/blkback: Prefix 'vbd' with 'xen' in structs and functions. · 3d814731
Konrad Rzeszutek Wilk authored May 12, 2011
```
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
```
3d814731
xen/blkback: Change structure name blkif_st to xen_blkif. · 30fd1502
Konrad Rzeszutek Wilk authored May 12, 2011
```
No need for that '_st' and xen_blkif is more apt.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
```
30fd1502
xen/blkback: Remove the unused typedefs. · 325a6486
Konrad Rzeszutek Wilk authored May 12, 2011
```
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
```
325a6486
xen/blkback: Move include/xen/blkif.h into drivers/block/xen-blkback/common.h · 452a6b2b
Konrad Rzeszutek Wilk authored May 12, 2011
```
No point in having the blkif.h file. It is not used by the frontend.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
```
452a6b2b
xen/blkback: Fixing some more of the cleanpatch.pl warnings. · b0f80127
Konrad Rzeszutek Wilk authored May 12, 2011
```
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
```
b0f80127
xen/blkback: Checkpatch.pl recommend against multiple assigments. · 03e0edf9
Konrad Rzeszutek Wilk authored May 12, 2011
```
CHECK: multiple assignments should be avoided
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
```
03e0edf9
xen/blkback: Fix checkpatch.pl warnings about more than 80 lines. · 41ca4d38
Konrad Rzeszutek Wilk authored May 12, 2011
```
Break up the macro usage.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
```
41ca4d38
xen/blkback: Flesh out the description in the Kconfig. · a4c34858
Konrad Rzeszutek Wilk authored May 12, 2011
```
with more details.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
```
a4c34858
xen/blkback: Fix spelling mistakes. · b9fc0296
Konrad Rzeszutek Wilk authored May 11, 2011
```
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
```
b9fc0296
xen/blkback: Move blkif_get_x86_[32|64]_req to common.h in block/xen-blkback dir. · 68c88dd7
Konrad Rzeszutek Wilk authored May 11, 2011
```
From the blkif.h header, which was exposed to the frontend.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
```
68c88dd7
xen/blkback: Removing the debug_lvl option. · 72468bfc
Konrad Rzeszutek Wilk authored May 11, 2011
```
It is not really used for anything.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
```
72468bfc
xen/blkback: Use the DRV_PFX in the pr_.. macros. · 22b20f2d
Konrad Rzeszutek Wilk authored May 12, 2011
```
To make it easier to read.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
```
22b20f2d
xen/blkback: Make the DPRINTK uniform. · 1afbd730
Konrad Rzeszutek Wilk authored May 11, 2011
```
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
```
1afbd730

xen/blkback: Change printk/DPRINTK to pr_.. type variant. · ebe81906

Konrad Rzeszutek Wilk authored May 12, 2011

And also make them uniform and prefix the message with 'xen-blkback'.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

ebe81906

xen-blkfront: Introduce BLKIF_OP_FLUSH_DISKCACHE support. · edf6ef59

Konrad Rzeszutek Wilk authored May 03, 2011

If the backend supports the 'feature-flush-cache' mode, use that
instead of the 'feature-barrier' support.

Currently there are three backends that support the 'feature-flush-cache'
mode: NetBSD, Solaris and the Linux kernel. The 'flush' option is a much
more light-weight version than the 'barrier' support, so let's try to use
it, as there are no filesystems in the kernel that use full barriers anymore.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

edf6ef59
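The preference order described here can be sketched as a small selection function. The enum and function names are hypothetical; the two inputs stand for whatever the frontend learned from the backend's 'feature-flush-cache' and 'feature-barrier' advertisements.

```c
#include <stdbool.h>

/* How the frontend asks the backend to make data durable
 * (hypothetical enum for illustration). */
enum flush_mode_sketch {
    FLUSH_MODE_NONE,       /* backend advertises neither feature */
    FLUSH_MODE_BARRIER,    /* fall back to 'feature-barrier'     */
    FLUSH_MODE_DISKCACHE   /* preferred: 'feature-flush-cache'   */
};

/* Prefer the lighter flush operation whenever it is advertised. */
static enum flush_mode_sketch pick_flush_mode(bool has_barrier,
                                              bool has_flush_cache)
{
    if (has_flush_cache)
        return FLUSH_MODE_DISKCACHE;
    if (has_barrier)
        return FLUSH_MODE_BARRIER;
    return FLUSH_MODE_NONE;
}
```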

xen-blkfront: Provide for 'feature-flush-cache' the BLKIF_OP_WRITE_FLUSH_CACHE operation. · 6dcfb751

Konrad Rzeszutek Wilk authored May 05, 2011

The operation BLKIF_OP_WRITE_FLUSH_CACHE has existed in the Xen
tree header file for years but it was never present in the Linux tree
because neither the frontend nor the backend supported this interface.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

6dcfb751