- 11 Mar, 2011 2 commits
-
-
Stephen M. Cameron authored
This bit got lost somewhere along the way. Without this, panic. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
Stephen M. Cameron authored
This attribute, requested by Redhat, allows kexec-tools to know whether the controller can honor the reset_devices kernel parameter and actually reset the controller. For kdump to work properly it is necessary that the reset_devices parameter be honored. This attribute enables kexec-tools to warn the user if they attempt to designate a non-resettable controller as the dump device. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
-
- 10 Mar, 2011 38 commits
-
-
Or Gerlitz authored
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Andreas Gruenbacher authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
This code became obsolete and unused last December with drbd: bitmap keep track of changes vs on-disk bitmap Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Just deal with it more gracefully, if we fail to add even a single page to an empty bio. We used to BUG_ON() there, but it has been observed in some Xen deployment, so we need to handle that case more robustly now. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
This is an addendum to drbd: serialize admin requests for new resync with pending bitmap io It avoids a race that could trigger "FIXME" assert log messages. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
When we receive a barrier ack, we walk the ring list of drbd requests in the transfer log of the respective epoch, do some housekeeping, and free those objects. We tried to keep epochs of mirrored and unmirrored drbd requests separate, and assert that no local-only requests are present in a barrier_acked epoch. It turns out that this has quite a number of corner cases and would add bloated code without functional benefit. We now revert the (insufficient) commits drbd: Fixed an issue with AHEAD -> SYNC_SOURCE transitions drbd: Ensure that an epoch contains only requests of one kind and instead fix the processing of barrier acks to cope with a mix of local-only and mirrored requests. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
If we fail to send the information that we lost our disk, we have no connection, and no disk: no access to data anymore. That is either expected (deconfiguration), or there will be so much noise in the logs that "Sending state failed" is not useful at all. Drop it. If the reason for a shorter than expected receive was a signal, which we sent because we already decided to disconnect, these additional log messages are confusing and useless. This patch follows this pattern: - dev_warn(DEV, "short read expecting header on sock: r=%d\n", r); + if (!signal_pending(current)) + dev_warn(DEV, "short read expecting header on sock: r=%d\n", r); Also make them all dev_warn for consistency. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Now that we do no longer in-place endian-swap the bitmap, we allow selected bitmap operations (testing bits, sometimes even settting bits) during some bulk operations. This caused us to hit a lot of FIXME asserts similar to FIXME asender in drbd_bm_count_bits, bitmap locked for 'write from resync_finished' by worker Which now is nonsense: looking at the bitmap is perfectly legal as long as it is not being resized. This cosmetic patch defines some flags to describe expectations in finer detail, so the asserts in e.g. bm_change_bits_to() can be skipped if appropriate. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
All decisions about sync, sync direction, and wether or not to allow a connect or attach are based on our set of UUIDs to tag a data generation. Log changes to the UUIDs whenever they occur, logging "new current UUID P:Q:R:S" is more useful than "Creating new current UUID". Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
When the user clears the sync-pause flag, and sync stays in pause state, give hints to the user, why it still is in pause state. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
The "lazy writeout" of cleared bitmap pages happens during resync, and should happen again once the resync finishes cleanly, or is aborted. If resync finished cleanly, or was aborted because of peer disk failure, we trigger the writeout from worker context in the after state change work. If resync was aborted because of connection failure, we should not immediately trigger bitmap writeout, but rather postpone the writeout to after the connection cleanup happened. We now do it in the receiver context from drbd_disconnect(). If resync was aborted because of local disk failure, well, there is nothing to write to anymore. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
This is a minor optimization and cleanup, and also considerably reduces some harmless (but noisy) race with the connection cleanup code. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
The assert in drbd_req.c:755 forces us to have only requests of one kind in an epoch. The two kinds we distinguish here are: local-only or mirrored. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Protocol A has no P_WRITE_ACKs, but has P_NEG_ACKs. The master bio might already be completed, therefore the request is no longer in the collision hash. => Do not try to validate block_id as request In Protocol B we might already have got a P_RECV_ACK but then get a P_NEG_ACK after wards. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
The point is that drbd_disconnect() can be called with a cstate of WFConnection. That happens if the user issues "drbdsetup disconnect" while the drbd_connect() function executes. Then drbdd_init() will call drbdd(), which in turn will return without receiving any packets. Then drbdd_init() will end up calling drbd_disconnect() with a cstate of WFConnection. Bottom line: This assertion is wrong as it is, and we do not see value in fixing it. => Removing it. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
The test if rs_pending_cnt == 0 was too weak. Using Test for unacked_cnt == 0 instead. Moved that into the worker. Since unacked_cnt gets already increased when an P_RS_DATA_REQ comes in. Also using a timer to make Ahead -> SyncSource -> Ahead cycles slower... Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
See also commit from 2009-08-15 "drbd_uuid_compare(): Do not full sync in case a P_SYNC_UUID packet gets lost." We saw cases where the History UUIDs where not as expected. So the detection of the special case did not trigger. With the sync UUID no longer being a random number, but deducible from the previous bitmap UUID, the detection of this special case becomes more reliable. The SyncUUID now is the previous bitmap UUID + 0x1000000000000. Rule 5a: Cs = H1p & H1p + Offset = Bp Connection was lost before SyncUUID Packet came through. Corrent (peer) UUIDs: Bp = H1p H1p = H2p H2p = 0 Become Sync target. Rule 7a: Cp = H1s & H1s + Offset = Bs Connection was lost before SyncUUID Packet came through. Correct (own) UUIDs: Bs = H1s H1s = H2s H2s = 0 Become Sync source. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Andreas Gruenbacher authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Besides removed a few lines of code, this moves the inspection of the state from before the queuing process to after the queuing. I.e. more closely to the actual invocation of the work. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
We may not get from SyncSource to Ahead if we have sent some P_RS_DATA_REPLY packets to the peer and are waiting for P_WRITE_ACK. Again, this is not relevant for proper tuned systems, but makes sure that the not-tuned system does not get diverging bitmaps. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
When the sync source node replies to a P_RS_DATA_REQUEST packet when it is already in ahead mode. I.e. those two packets crossed each other on the wire, that may lead to diverging bitmaps. This never happens in a well-tuned-system. In a well-tuned- system the resync controller has reduced the resync speed to zero long before we got into ahead-mode. But we have to be prepared for the not-well-tuned-system of course as well. Because -> diverging bitmaps = non terminating resync. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
Create a new barrier when leaving the AHEAD mode. Otherwise we trigger the assertion in req_mod(, barrier_acked) D_ASSERT(req->rq_state & RQ_NET_SENT); The new barrier is created by recycling the newest existing one. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Philipp Reisner authored
When on-no-data-accessible is set to suspend-io, also consider that a Primary, SyncTarget node losses its connection. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
I run into something declaring itself as "spinlock deadlock", BUG: spinlock lockup on CPU#1, kjournald/27816, ffff88000ad6bca0 Pid: 27816, comm: kjournald Tainted: G W 2.6.34.6 #2 Call Trace: <IRQ> [<ffffffff811ba0aa>] do_raw_spin_lock+0x11e/0x14d [<ffffffff81340fde>] _raw_spin_lock_irqsave+0x6a/0x81 [<ffffffff8103b694>] ? __wake_up+0x22/0x50 [<ffffffff8103b694>] __wake_up+0x22/0x50 [<ffffffffa07ff661>] bm_async_io_complete+0x258/0x299 [drbd] but the call traces do not fit at all, all other cpus are cpu_idle. I think it may be this race: drbd_bm_write_page wait_queue_head_t io_wait; atomic_t in_flight; bm_async_io submit_bio bm_async_io_complete if (atomic_dec_and_test(in_flight)) wait_event(io_wait, atomic_read(in_flight) == 0) return wake_up(io_wait) The wake_up now accesses the wait_queue_head_t spinlock, which is no longer valid, since the stack frame of drbd_bm_write_page has been clobbered now. Fix this by using struct completion, which does both the condition test as well as the wake_up inside its spinlock, so this race cannot happen. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-
Lars Ellenberg authored
Even though we now track the need for bitmap writeout per bitmap page, there is no need to trigger the writeout while a resync is going on. Once the resync is finished (or aborted), we trigger bitmap writeout anyways. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
-