    drbd: do not reset rs_pending_cnt too early · 0029d624
    Lars Ellenberg authored
    Fix asserts like
      block drbd0: in got_BlockAck:4634: rs_pending_cnt = -35 < 0 !
    
    We reset the resync LRU cache and related information (rs_pending_cnt)
    once a resync or online verify has finished successfully, or if the
    replication connection is lost.
    
    We also need to reset it if a resync or online verify is aborted
    because a lower level disk failed.
    
    In that case the replication link is still established,
    and we may still have packets queued in the network buffers
    which want to touch rs_pending_cnt.
    
    We do not have any synchronization mechanism to know for sure when all
    such pending resync-related packets have been drained.
    
    To keep this counter from going negative (and violating the ASSERT that
    it will always be >= 0), just do not reset it when we lose a disk.
    
    It is good enough to make sure it is re-initialized before the next
    resync can start: reset it when we re-attach a disk.
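
The race described above can be illustrated with a small user-space sketch. This is not the DRBD code: `struct toy_device` and every `toy_*` function are hypothetical names invented for illustration, and the packet flow is reduced to a plain counter. The request count of 35 used in the note below is chosen only to mirror the value in the quoted log line.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-device resync state (not the real struct). */
struct toy_device {
    int rs_pending_cnt; /* resync requests sent whose ack has not arrived yet */
};

/* A resync request goes out on the replication link. */
static void toy_send_rs_request(struct toy_device *d)
{
    d->rs_pending_cnt++;
}

/* An ack arrives. Returns false if the counter went negative,
 * which is the condition the ASSERT in the log complains about. */
static bool toy_got_block_ack(struct toy_device *d)
{
    d->rs_pending_cnt--;
    return d->rs_pending_cnt >= 0;
}

/* Re-initialize the resync bookkeeping. */
static void toy_reset(struct toy_device *d)
{
    d->rs_pending_cnt = 0;
}

/*
 * Simulate: `sent` requests are in flight, the local disk fails, the
 * acks still queued in the network buffers drain, then the disk is
 * re-attached. With reset_on_disk_loss = true the counter is cleared
 * while acks are still in flight (the too-early reset); with false it
 * is cleared only on re-attach. Returns the lowest counter value seen.
 */
static int toy_lowest_count(int sent, bool reset_on_disk_loss)
{
    struct toy_device d = { 0 };
    int lowest = 0;
    int i;

    for (i = 0; i < sent; i++)
        toy_send_rs_request(&d);

    if (reset_on_disk_loss)
        toy_reset(&d); /* too early: acks are still on the wire */

    for (i = 0; i < sent; i++) {
        if (!toy_got_block_ack(&d) && d.rs_pending_cnt < lowest)
            lowest = d.rs_pending_cnt;
    }

    toy_reset(&d); /* re-attach: safe, and done before the next resync */
    return lowest;
}
```

In this sketch, `toy_lowest_count(35, true)` bottoms out at -35, while `toy_lowest_count(35, false)` never drops below 0, because the late acks only consume what the earlier requests added.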
    Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
    Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
drbd_nl.c 74.3 KB