1. 21 Jan, 2013 1 commit
    • drbd: fix potential protocol error and resulting disconnect/reconnect · 2681f7f6
      Lars Ellenberg authored
      When we notice a disk failure on the receiving side,
      we stop sending it new incoming writes.
      
      Depending on exact timing of various events, the same transfer log epoch
      could end up containing both replicated (before we noticed the failure)
      and local-only requests (after we noticed the failure).
      
      The sanity checks in tl_release(), called when receiving a
      P_BARRIER_ACK, check that the ack'ed transfer log epoch matches
      the expected epoch, and the number of contained writes matches
      the number of ack'ed writes.
      
      In this case, they counted both replicated and local-only writes,
      but the peer only acknowledges those it has seen.  We get a mismatch,
      resulting in a protocol error and disconnect/reconnect cycle.
      
      Messages logged are
        "BAD! BarrierAck #%u received with n_writes=%u, expected n_writes=%u!\n"
      
      A similar issue can also be triggered when starting a resync while
      having a healthy replication link, by invalidating one side, forcing a
      full sync, or attaching to a diskless node.
      
      Fix this by closing the current epoch if the state changes in a way
      that would change the replication intent of the next write.
      
      Epochs now contain either only non-replicated,
      or only replicated writes.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
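
      The fix described above can be illustrated with a minimal userspace
      sketch. This is not DRBD code: the types and the functions tl_init(),
      tl_close_epoch(), tl_state_change() and tl_add_write() are hypothetical
      names made up for illustration. It only models the rule that a state
      change which flips the replication intent closes a non-empty epoch, so
      that no epoch mixes replicated and local-only writes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified transfer-log model; not DRBD's real structures. */
struct epoch {
	unsigned int nr;       /* epoch number */
	unsigned int n_writes; /* writes recorded in this epoch */
	bool replicated;       /* intent shared by every write in this epoch */
};

struct tl {
	struct epoch cur;      /* epoch currently being filled */
	bool replicating;      /* would the next write be sent to the peer? */
	unsigned int closed;   /* epochs closed so far */
};

void tl_init(struct tl *tl, bool replicating)
{
	tl->cur.nr = 0;
	tl->cur.n_writes = 0;
	tl->cur.replicated = replicating;
	tl->replicating = replicating;
	tl->closed = 0;
}

/* Close the current epoch and open a new, empty one. */
void tl_close_epoch(struct tl *tl)
{
	tl->closed++;
	tl->cur.nr++;
	tl->cur.n_writes = 0;
}

/*
 * State change: if the replication intent flips while the current epoch
 * already holds writes, close that epoch so it never mixes both kinds.
 */
void tl_state_change(struct tl *tl, bool replicating)
{
	if (tl->replicating != replicating && tl->cur.n_writes > 0)
		tl_close_epoch(tl);
	tl->replicating = replicating;
}

/* Record one write; returns the number of the epoch it landed in. */
unsigned int tl_add_write(struct tl *tl)
{
	if (tl->cur.n_writes == 0)
		tl->cur.replicated = tl->replicating;
	/* Invariant: every write in an epoch has the same intent. */
	assert(tl->cur.replicated == tl->replicating);
	tl->cur.n_writes++;
	return tl->cur.nr;
}
```

      In this model, two replicated writes followed by a state change to
      "not replicating" and one more write end up in two different epochs,
      so an acknowledgement count for the first epoch covers only writes the
      peer has actually seen.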
  2. 06 Dec, 2012 5 commits
  3. 01 Dec, 2012 1 commit
  4. 30 Nov, 2012 3 commits
    • drbd: fixup after wait_event_lock_irq() addition to generic code · 2cecb730
      Jens Axboe authored
      Compiling drbd yields:
      
      drivers/block/drbd/drbd_state.c: In function ‘_conn_request_state’:
      drivers/block/drbd/drbd_state.c:1804:5: error: macro "wait_event_lock_irq" passed 4 arguments, but takes just 3
      drivers/block/drbd/drbd_state.c:1801:3: error: ‘wait_event_lock_irq’ undeclared (first use in this function)
      drivers/block/drbd/drbd_state.c:1801:3: note: each undeclared identifier is reported only once for each function it appears in
      drivers/block/drbd/drbd_state.c: At top level:
      drivers/block/drbd/drbd_state.c:1734:1: warning: ‘_conn_rq_cond’ defined but not used [-Wunused-function]
      
      This is because drbd had copied the MD definition of wait_event_lock_irq()
      as well. Remove the drbd copy.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • loop: Limit the number of requests in the bio list · 7b5a3522
      Lukas Czerner authored
      Currently there is no limit on the number of requests in the loop bio
      list. This can lead to some nasty situations when the caller spawns
      tons of bio requests, taking a huge amount of memory. This is even more
      obvious with discard, where blkdev_issue_discard() will submit all bios
      for the range and wait for them to finish afterwards. On really big loop
      devices with a slow backing file system this can lead to an OOM
      situation, as reported by Dave Chinner.
      
      With this patch we will wait in loop_make_request() if the number of
      bios in the loop bio list would exceed 'nr_congestion_on'.
      We'll wake up the process as we process the bios from the list. Some
      threshold hysteresis is in place to avoid high-frequency oscillation.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Reported-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
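
      The throttling with hysteresis described above can be sketched as a
      small single-threaded model. This is not the loop driver's code: the
      struct and the functions throttle_submit() and throttle_complete() are
      hypothetical, and the separate lower "off" threshold is an assumption
      about how the hysteresis could be expressed. Only the upper threshold
      'nr_congestion_on' is named in the commit message.

```c
#include <stdbool.h>

/* Hypothetical model of a bounded bio list with on/off thresholds. */
struct throttle {
	unsigned int count; /* bios currently queued */
	unsigned int on;    /* start blocking at or above this count */
	unsigned int off;   /* stop blocking once count drops below this */
	bool blocked;       /* submitters are currently made to wait */
};

/* Returns true if the bio was queued, false if the submitter must wait. */
bool throttle_submit(struct throttle *t)
{
	if (!t->blocked && t->count >= t->on)
		t->blocked = true;
	if (t->blocked)
		return false;
	t->count++;
	return true;
}

/*
 * One queued bio has been processed. Returns true if waiting submitters
 * should be woken, which only happens once the count has fallen below
 * the lower threshold, not as soon as it dips under the upper one.
 */
bool throttle_complete(struct throttle *t)
{
	if (t->count > 0)
		t->count--;
	if (t->blocked && t->count < t->off) {
		t->blocked = false;
		return true;
	}
	return false;
}
```

      Keeping the two thresholds apart means a submitter that was put to
      sleep is not woken for every single completed bio, which is the
      oscillation the hysteresis is meant to avoid.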
    • wait: add wait_event_lock_irq() interface · eed8c02e
      Lukas Czerner authored
      New wait_event{_interruptible}_lock_irq{_cmd} macros added. This commit
      moves the private wait_event_lock_irq() macro from MD to the regular wait
      includes, introduces a new macro wait_event_lock_irq_cmd() to replace the
      ugly old method of omitting the cmd parameter, and makes use of the new
      macros in MD. It also introduces the _interruptible_ variant.
      
      The new interface is for use when one has a special lock to protect the
      data structures used in the condition, or when one also needs to invoke
      "cmd" before putting the task to sleep.
      
      All new macros are expected to be called with the lock taken. The lock
      is released before sleep and is reacquired afterwards. We will leave the
      macro with the lock held.
      
      Note to DM: IMO this should also fix a theoretical race on the waitqueue
      when using wait_event_lock_irq() and wait_event() simultaneously, caused
      by the lack of locking around current state setting and wait queue removal.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
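
      The locking contract described above (enter with the lock held, drop it
      around "cmd" and the sleep, leave with it held again) can be modelled
      in plain userspace C. This is only a toy model of the control flow, not
      the kernel macros: struct toy_lock, TOY_WAIT_EVENT_LOCK_CMD(),
      work_kick() and work_wait_idle() are hypothetical names, there is no
      real wait queue, and the sleep is elided.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock that only tracks whether it is held; stands in for a spinlock. */
struct toy_lock {
	bool held;
};

void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/*
 * Model of the lock discipline: must be entered with the lock held.
 * While the condition is false: release the lock, run cmd, (a real
 * implementation would sleep here), then retake the lock and recheck.
 * The condition is always evaluated with the lock held, and the macro
 * is left with the lock held.
 */
#define TOY_WAIT_EVENT_LOCK_CMD(condition, lock, cmd)	\
do {							\
	assert((lock)->held);				\
	while (!(condition)) {				\
		toy_lock_release(lock);			\
		cmd;					\
		toy_lock_acquire(lock);			\
	}						\
} while (0)

struct work {
	struct toy_lock lock;
	int pending;            /* protected by lock in a real program */
	int cmd_runs;           /* how often the cmd was invoked */
	bool cmd_saw_lock_held; /* set if cmd ever ran under the lock */
};

/* Hypothetical "cmd": makes progress and records the lock state it saw. */
void work_kick(struct work *w)
{
	if (w->lock.held)
		w->cmd_saw_lock_held = true;
	w->cmd_runs++;
	w->pending--;
}

/* Caller must hold w->lock; returns with w->lock still held. */
void work_wait_idle(struct work *w)
{
	TOY_WAIT_EVENT_LOCK_CMD(w->pending == 0, &w->lock, work_kick(w));
}
```

      A caller would take the lock, call work_wait_idle(), and still own the
      lock afterwards, while work_kick() never runs with the lock held.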
  5. 26 Nov, 2012 2 commits
  6. 23 Nov, 2012 5 commits
  7. 12 Nov, 2012 1 commit
  8. 09 Nov, 2012 22 commits