    drbd: fix resync finished detection · 5ab7d2c0
    Lars Ellenberg authored
    This fixes one recent regression,
    and one long existing bug.
    
    The bug:
    drbd_try_clear_on_disk_bm() assumed that all "count" bits have to be
    accounted in the resync extent corresponding to the start sector.
    
    Since we allow application requests to cross our "extent" boundaries,
    this assumption is no longer true. This can result in misaccounting and
    scary messages
    ("BAD! sector=12345s enr=6 rs_left=-7 rs_failed=0 count=58 cstate=...").
    Worse, if the last bit to be cleared during resync resides in a
    previously misaccounted resync extent, the resync is never recognized
    as finished and stays "stalled" forever, even though all blocks are in
    sync again and all bits have been cleared.
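
The accounting problem described above can be illustrated with a minimal, stand-alone sketch. This is not the DRBD implementation: `BITS_PER_EXTENT`, `NR_EXTENTS`, the `rs_left[]` array and `account_cleared()` are hypothetical names and values chosen only for illustration. The sketch splits a run of cleared bits at extent boundaries, so that each extent is charged only for the bits that actually fall inside it.

```c
#include <assert.h>

/* Illustrative values, NOT taken from the commit or from DRBD headers. */
#define BITS_PER_EXTENT 4096UL
#define NR_EXTENTS      4UL

/* Hypothetical per-extent counters: bits still out of sync in each extent. */
static long rs_left[NR_EXTENTS] = { 6, 10, 0, 0 };

/*
 * Account `count` cleared bits starting at bit number `sbnr`.
 * Instead of charging everything to the extent that contains `sbnr`,
 * walk the range and split it at every extent boundary.
 */
static void account_cleared(unsigned long sbnr, unsigned long count)
{
    while (count > 0) {
        unsigned long enr  = sbnr / BITS_PER_EXTENT;       /* extent number */
        unsigned long room = BITS_PER_EXTENT - (sbnr % BITS_PER_EXTENT);
        unsigned long n    = count < room ? count : room;  /* bits in this extent */

        assert(enr < NR_EXTENTS);
        rs_left[enr] -= (long)n;

        sbnr  += n;
        count -= n;
    }
}
```

In this sketch, calling `account_cleared(4090, 16)` charges 6 bits to extent 0 and 10 bits to extent 1. Under the old assumption, all 16 bits would be charged to extent 0, driving its counter negative (6 - 16 = -10) while extent 1 would never reach zero.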
    
    The regression was introduced by
        drbd: get rid of atomic update on disk bitmap works
    
    For an "empty" resync (rs_total == 0), we must not "finish" the
    resync on the SyncSource before the SyncTarget knows all relevant
    information (sync uuid).  We need to wait for the full round-trip;
    the SyncTarget will then explicitly notify us.
    
    Also for normal, non-empty resyncs (rs_total > 0), the resync-finished
    condition needs to be tested before the schedule() in wait_for_work, or
    it is likely to be missed.
    Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
    Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
drbd_int.h 76.7 KB