Commits · 617049aa7d753e8c821ac77126ab90e9f1b66d6d · nexedi / linux

10 Mar, 2011 40 commits

drbd: Fixed an issue with AHEAD -> SYNC_SOURCE transitions · 617049aa

Philipp Reisner authored Dec 22, 2010

Create a new barrier when leaving the AHEAD mode.

  Otherwise we trigger the assertion in req_mod(, barrier_acked)
  D_ASSERT(req->rq_state & RQ_NET_SENT);

The new barrier is created by recycling the newest existing one.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: ratelimit io error messages · 07194272

Lars Ellenberg authored Dec 20, 2010

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: There might be a resync after unfreezing IO due to no disk [Bugz 332] · 3f98688a

Philipp Reisner authored Dec 20, 2010

When on-no-data-accessible is set to suspend-io, also consider that
a Primary, SyncTarget node loses its connection.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: fix potential access of on-stack wait_queue_head_t after return · 725a97e4

Lars Ellenberg authored Dec 19, 2010

I ran into something declaring itself as "spinlock deadlock",
 BUG: spinlock lockup on CPU#1, kjournald/27816, ffff88000ad6bca0
 Pid: 27816, comm: kjournald Tainted: G        W 2.6.34.6 #2
 Call Trace:
  <IRQ>  [<ffffffff811ba0aa>] do_raw_spin_lock+0x11e/0x14d
  [<ffffffff81340fde>] _raw_spin_lock_irqsave+0x6a/0x81
  [<ffffffff8103b694>] ? __wake_up+0x22/0x50
  [<ffffffff8103b694>] __wake_up+0x22/0x50
  [<ffffffffa07ff661>] bm_async_io_complete+0x258/0x299 [drbd]
but the call traces do not fit at all,
all other cpus are cpu_idle.

I think it may be this race:

drbd_bm_write_page
 wait_queue_head_t io_wait;
 atomic_t in_flight;
 bm_async_io
  submit_bio
					bm_async_io_complete
					  if (atomic_dec_and_test(in_flight))
 wait_event(io_wait,
	atomic_read(in_flight) == 0)
 return
					    wake_up(io_wait)

The wake_up now accesses the wait_queue_head_t spinlock, which is no
longer valid, since the stack frame of drbd_bm_write_page has been
clobbered now.

Fix this by using struct completion, which does both the condition test
as well as the wake_up inside its spinlock, so this race cannot happen.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
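
As a rough illustration of why the fix closes the race, here is a userspace sketch that uses POSIX threads rather than the kernel's struct completion. Every name ending in `_sketch` is hypothetical and not taken from the drbd source. The point it tries to show is that the "done" flag is changed, tested, and signalled while holding one lock, so the waiter cannot observe the condition and return between the flag update and the wakeup.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a completion: the "done" flag and the
 * wakeup are both handled while holding the same lock. */
struct completion_sketch {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
};

static void completion_init_sketch(struct completion_sketch *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

static void complete_sketch(struct completion_sketch *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;                 /* condition changes under the lock */
    pthread_cond_signal(&c->cond);  /* wakeup is issued under the same lock */
    pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion_sketch(struct completion_sketch *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)                /* condition is tested under the lock */
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Stands in for the asynchronous I/O completion callback. */
static void *io_complete_sketch(void *arg)
{
    complete_sketch(arg);
    return NULL;
}

/* Stands in for the submitting function that owns the on-stack object. */
static bool write_page_sketch(void)
{
    struct completion_sketch c;
    pthread_t io;

    completion_init_sketch(&c);
    if (pthread_create(&io, NULL, io_complete_sketch, &c) != 0)
        return false;
    wait_for_completion_sketch(&c);
    /* Joined only to keep this userspace sketch tidy before c goes away. */
    pthread_join(io, NULL);
    assert(c.done);
    pthread_mutex_destroy(&c.lock);
    pthread_cond_destroy(&c.cond);
    return true;
}
```

The `pthread_join` call is an extra safety net specific to this sketch; the commit itself relies on the completion's internal locking.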


drbd: improve on bitmap write out timing · 06d33e96

Lars Ellenberg authored Dec 18, 2010

Even though we now track the need for bitmap writeout per bitmap page,
there is no need to trigger the writeout while a resync is going on.

Once the resync is finished (or aborted),
we trigger bitmap writeout anyway.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: spelling fix in log message · 418e0a92

Lars Ellenberg authored Dec 18, 2010

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: be less noisy with some log messages · 7648cdfe

Lars Ellenberg authored Dec 17, 2010

We expect changes to a bitmap page in drbd_bm_write_page,
that's why we submit a copy page.

If a page changes during global writeout, that would be unexpected,
and reason to warn, though.

Also, page writeout can often be skipped (on activity log transactions
during normal operation, for example); there is no need to log that every time.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: serialize sending of resync uuid with pending w_send_oos · 5a22db89

Lars Ellenberg authored Dec 17, 2010

To improve the latency of IO requests during bitmap exchange,
we recently allowed writes while waiting for the bitmap, sending "set
out-of-sync" information packets for any newly dirtied bits.

We have to make sure that the new resync-uuid does not overtake
these "set oos" packets. Once the resync-uuid is received, the
sync target starts the resync process, and expects the bitmap to
only be cleared, not re-set.

If we use this protocol extension, we queue the generation and sending
of the resync-uuid on the worker, which naturally serializes with all
previously queued packets.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
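
The ordering argument above can be pictured with a small FIFO queue. This is only a sketch under assumed names (`work_kind_sketch`, `wq_push_sketch`, `wq_pop_sketch` are hypothetical and do not appear in drbd): anything queued after the "set oos" work items is necessarily handled after them, which is the property the commit relies on when it queues the resync-uuid on the worker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WQ_CAPACITY_SKETCH 16u

/* Hypothetical work item kinds, for illustration only. */
enum work_kind_sketch { WORK_SEND_OOS, WORK_SEND_SYNC_UUID };

/* Single-consumer FIFO: items leave in exactly the order they were queued. */
struct work_queue_sketch {
    enum work_kind_sketch items[WQ_CAPACITY_SKETCH];
    size_t head;   /* index of the next item to pop */
    size_t tail;   /* index of the next free slot */
};

static bool wq_push_sketch(struct work_queue_sketch *q, enum work_kind_sketch k)
{
    assert(q != NULL);
    if (q->tail - q->head == WQ_CAPACITY_SKETCH)
        return false;                           /* queue is full */
    q->items[q->tail++ % WQ_CAPACITY_SKETCH] = k;
    return true;
}

static bool wq_pop_sketch(struct work_queue_sketch *q, enum work_kind_sketch *out)
{
    assert(q != NULL && out != NULL);
    if (q->head == q->tail)
        return false;                           /* queue is empty */
    *out = q->items[q->head++ % WQ_CAPACITY_SKETCH];
    return true;
}
```

Pushing two `WORK_SEND_OOS` items followed by `WORK_SEND_SYNC_UUID` and then popping three times yields them in that same order, so the uuid cannot overtake the earlier items.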


drbd: add debugging assert to make sure the protocol is clean · f735e363

Lars Ellenberg authored Dec 17, 2010

We expect to only receive the recently introduced "set out of sync"
packets in specific states. If we receive them in different states, that
may confuse the resync process to the point where it won't terminate, or
thinks it made negative progress.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: Documenting drbd_should_do_remote() and drbd_should_send_oos() · c88d65e2

Philipp Reisner authored Dec 20, 2010

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: fix potential dereference of NULL pointer · 2265b473

Lars Ellenberg authored Dec 16, 2010

If drbd had crypto digest algorithms configured and is then unconfigured
(but not unloaded), it frees the algorithms but does not reset the
config. If it is then reconfigured to use the very same algorithm, it
"forgets" to re-allocate the algorithms, thinking that the config has
not changed in that respect.
It will then Oops on the first attempt to actually use those algorithms.

Fix this by resetting the config to defaults after cleanup.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
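
The failure mode and the fix can be sketched in plain C. The structure and function names below are hypothetical and much simpler than the real drbd configuration code; a `malloc`ed buffer stands in for an allocated digest algorithm. The key line is the reset of the stored name in the cleanup function: without it, reconfiguring with the same name would take the "unchanged" shortcut and leave a NULL handle behind.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a device configuration. */
struct conf_sketch {
    char alg_name[32];  /* configured algorithm name, "" when unset */
    void *alg;          /* allocated algorithm handle, NULL when unset */
};

/* Allocate a handle only when the configured name actually changes. */
static int configure_sketch(struct conf_sketch *c, const char *name)
{
    assert(strlen(name) < sizeof(c->alg_name));
    if (strcmp(c->alg_name, name) == 0)
        return 0;                   /* config unchanged: keep current handle */

    void *alg = malloc(16);         /* placeholder for the real allocation */
    if (alg == NULL)
        return -1;
    free(c->alg);
    c->alg = alg;
    strcpy(c->alg_name, name);
    return 0;
}

/* Free the handle AND reset the config to its defaults, so that a later
 * configure_sketch() with the same name allocates again. */
static void cleanup_sketch(struct conf_sketch *c)
{
    free(c->alg);
    c->alg = NULL;
    c->alg_name[0] = '\0';
}
```

Calling `configure_sketch`, `cleanup_sketch`, and then `configure_sketch` again with the same name leaves `alg` non-NULL, which is the behaviour the fix aims for.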


drbd: move bitmap write from resync_finished to after_state_change · 02851e9f

Lars Ellenberg authored Dec 16, 2010

We must not call it directly from resync_finished,
as we may be in either receiver or worker context there.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: Removed a reference to debug macros removed long time ago · 84e7c0f7

Lars Ellenberg authored Dec 16, 2010

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: get rid of unused debug code · 6850c442

Lars Ellenberg authored Dec 16, 2010

A long time ago, we had paranoia code in the bitmap that allocated one
extra word, assigned a magic value, and checked on every occasion that
the magic value was still unchanged.

That debug code is unused, and the extra long word complicates the code a bit.
Get rid of it.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: allow petabyte storage on 64bit arch · 4b0715f0

Lars Ellenberg authored Dec 14, 2010

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: bitmap keep track of changes vs on-disk bitmap · 19f843aa

Lars Ellenberg authored Dec 15, 2010

When we set or clear bits in a bitmap page,
also set a flag in the page->private pointer.

This allows us to skip writes of unchanged pages.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
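
A minimal sketch of the idea, with assumptions stated: the commit stores the flag in the `page->private` pointer, whereas this illustration uses a separate boolean field, and it marks a page as changed only when a bit actually flips (a choice made here, not something the commit message specifies). All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_BYTES_SKETCH 4096u

/* Hypothetical bitmap page with a "changed since last write-out" flag. */
struct bm_page_sketch {
    unsigned char data[PAGE_BYTES_SKETCH];
    bool changed;   /* the commit keeps this flag in page->private instead */
};

/* Set one bit and remember that the page now differs from the on-disk copy. */
static void page_set_bit_sketch(struct bm_page_sketch *p, size_t bit)
{
    assert(bit < PAGE_BYTES_SKETCH * 8u);
    unsigned char mask = (unsigned char)(1u << (bit & 7u));
    if (!(p->data[bit >> 3] & mask)) {
        p->data[bit >> 3] |= mask;
        p->changed = true;
    }
}

/* "Write out" only the changed pages; returns how many were written. */
static size_t write_changed_pages_sketch(struct bm_page_sketch *pages, size_t n)
{
    size_t written = 0;
    for (size_t i = 0; i < n; i++) {
        if (!pages[i].changed)
            continue;               /* unchanged page: skip the write */
        /* a real implementation would submit the page for I/O here */
        pages[i].changed = false;
        written++;
    }
    return written;
}
```

After one bit is set in one of three pages, a first write-out pass touches a single page and a second pass touches none.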


drbd: store in-core bitmap little endian, regardless of architecture · 95a0f10c

Lars Ellenberg authored Dec 15, 2010

Our on-disk bitmap is a little endian bitstream.
Up to now, we have stored the in-core copy of that in
native endian, applying byte order conversion when necessary.

Instead, keep the bitmap pages little endian, as they are read from disk,
and use the generic_*_le_bit family of functions.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
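
To illustrate what "little endian bitstream" means in practice, the sketch below addresses the bitmap byte by byte, so bit N always lives in byte N / 8 at position N % 8 no matter what the host byte order is. It does not use the kernel's generic_*_le_bit helpers mentioned in the commit; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit N of a little-endian bitstream: byte N / 8, bit position N % 8. */
static void le_set_bit_sketch(uint8_t *map, size_t bit)
{
    assert(map != NULL);
    map[bit >> 3] |= (uint8_t)(1u << (bit & 7u));
}

static void le_clear_bit_sketch(uint8_t *map, size_t bit)
{
    assert(map != NULL);
    map[bit >> 3] &= (uint8_t)~(1u << (bit & 7u));
}

static bool le_test_bit_sketch(const uint8_t *map, size_t bit)
{
    assert(map != NULL);
    return ((map[bit >> 3] >> (bit & 7u)) & 1u) != 0;
}
```

Because the layout in memory matches the layout on disk, pages handled this way can be read and written without a byte-order conversion pass.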


drbd: bitmap: don't count unused bits (fix non-terminating resync) · 7777a8ba

Lars Ellenberg authored Dec 15, 2010

We trusted the on-disk bitmap to have unused bits cleared.
In case that is not true for whatever reason,
and we take a code path where the unused bits don't get cleared
elsewhere (bm_clear_surplus is not called), we may miscount the bits,
and get confused during resync, waiting for bits to get cleared that we
don't even use: the resync process would not terminate.

Fix this by masking out unused bits in __bm_count_bits.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
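
The masking step can be sketched as follows. This is an illustration only, not the drbd `__bm_count_bits` code: the helper names are hypothetical, and for simplicity it numbers bits within native `unsigned long` words. Surplus bits in the last word are masked off before counting, so stale bits past the end of the bitmap never contribute to the total.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD_SKETCH (sizeof(unsigned long) * CHAR_BIT)

/* Portable population count for one word. */
static size_t popcount_word_sketch(unsigned long w)
{
    size_t n = 0;
    while (w) {
        w &= w - 1;     /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Count set bits among the first nbits bits of words[], ignoring any
 * surplus bits in the final, partially used word. */
static size_t count_bits_masked_sketch(const unsigned long *words, size_t nbits)
{
    assert(words != NULL);
    size_t full = nbits / BITS_PER_WORD_SKETCH;
    size_t rest = nbits % BITS_PER_WORD_SKETCH;
    size_t total = 0;

    for (size_t i = 0; i < full; i++)
        total += popcount_word_sketch(words[i]);

    if (rest) {
        unsigned long mask = (1UL << rest) - 1;   /* keep only the used bits */
        total += popcount_word_sketch(words[full] & mask);
    }
    return total;
}
```

With two all-ones words and a bitmap that is three bits longer than one word, the count is one word's worth of bits plus three, rather than two full words.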


drbd: Rename __inc_ap_bio_cond to may_inc_ap_bio · 1b881ef7

Andreas Gruenbacher authored Dec 13, 2010

The old name is confusing: the function does not increment anything.
Also rename _inc_ap_bio_cond to inc_ap_bio_cond: there is no need for
an underscore.
Finally, make it clear that these functions return boolean values.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: Fix: drbd_bitmap_io does not return an enum determine_dev_size · 24dccabb

Andreas Gruenbacher authored Dec 12, 2010

I guess bitmap I/O errors are supposed to cause drbd_determin_dev_size
to return dev_size_error.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: receive_bitmap_plain: Get rid of ugly and useless enum · 2c46407d

Andreas Gruenbacher authored Dec 11, 2010

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: send_bitmap_rle_or_plain: Get rid of ugly and useless enum · f70af118

Andreas Gruenbacher authored Dec 11, 2010

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: receive_bitmap: Missing free_page() on error path · 78fcbdae

Andreas Gruenbacher authored Dec 10, 2010

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: receive_bitmap: Avoid casting enum drbd_state_rv to int · de1f8e4a

Andreas Gruenbacher authored Dec 10, 2010

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: receive_bitmap: Fix the wrong return value · 4114be81

Andreas Gruenbacher authored Dec 10, 2010

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: drbd_nl_disk_conf: Avoid a compiler warning · f2024e7c

Andreas Gruenbacher authored Dec 10, 2010

Warning: comparison between ‘enum drbd_ret_code’ and ‘enum drbd_state_rv’
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: Use the standard bool, true, and false keywords · 81e84650

Andreas Gruenbacher authored Dec 09, 2010

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: This code is dead now · 6184ea21

Andreas Gruenbacher authored Dec 09, 2010

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: Another small enum drbd_state_rv cleanup · bb437946

Andreas Gruenbacher authored Dec 09, 2010

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: Be more explicit about functions that return an enum drbd_state_rv · bf885f8a

Andreas Gruenbacher authored Dec 08, 2010

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: Rename enum drbd_state_ret_codes to enum drbd_state_rv · c8b32563

Andreas Gruenbacher authored Dec 08, 2010

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: Rename enum drbd_ret_codes to enum drbd_ret_code · 116676ca

Andreas Gruenbacher authored Dec 08, 2010

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: Get rid of unnecessary macros (2) · 0cf9d27e

Andreas Gruenbacher authored Dec 07, 2010

The FAULT_ACTIVE macro just wraps the drbd_insert_fault macro for no
apparent reason.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: Get rid of unnecessary macros (1) · 662d91a2

Andreas Gruenbacher authored Dec 07, 2010

This macro doesn't save much code, but makes things a lot harder to read.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: Rename drbd_make_request_26 to drbd_make_request · 2f58dcfc

Andreas Gruenbacher authored Dec 13, 2010

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: Remove left-over prototype · 96756784

Andreas Gruenbacher authored Dec 09, 2010

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: Make sure that drbd_send() has sent the right number of bytes · cab2f74b

Andreas Gruenbacher authored Dec 09, 2010

Reviewed-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>


drbd: fix incomplete error message · 220df4d0

Lars Ellenberg authored Dec 09, 2010

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: Removed an unnecessary #undef · 7e458c32

Andreas Gruenbacher authored Dec 08, 2010

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


drbd: fix regression, we need to close drbd epochs during normal operation · 8a3c1044

Lars Ellenberg authored Dec 05, 2010

commit e2041475e6ddb081734d161f6421977323f5a9b9
drbd: Starting with protocol 96 we can allow app-IO while receiving the bitmap

Contained a bad chunk that tried to optimize away drbd barriers during
bitmap exchange, but accidentally dropped them for normal mode as well.

Impact: depending on activity log size and access pattern, activity log
extents may not be recycled in time, causing IO to block indefinitely.

Fix: skip drbd barriers only if there is no connection to send them on,
or the request being completed has not been on the network at all.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
