    dm raid: fix incorrect sync_ratio when degraded · da1e1488
    Jonathan Brassow authored
    Upstream commit 4102d9de ("dm raid: fix rs_get_progress()
    synchronization state/ratio") in combination with commit 7c29744e
    ("dm raid: simplify rs_get_progress()") introduced a regression by
    incorrectly reporting a sync_ratio of 0 for degraded raid sets.  This
    caused lvm2 to fail to repair raid legs automatically.
    
    Fix this by identifying the degraded state via the MD_RECOVERY_INTR
    flag and returning mddev->recovery_cp if that flag is set.
    
    MD sets recovery = [ MD_RECOVERY_RECOVER MD_RECOVERY_INTR
    MD_RECOVERY_NEEDED ] when a RAID member fails.  It then shuts down any
    running sync thread, which leaves all MD_RECOVERY_* flags cleared.  The
    bug occurs if a status is requested in the short window it takes to
    shut down the sync thread and clear the flags, because we were keying
    in on MD_RECOVERY_NEEDED, interpreting it as the initial phase of a
    "recover" sync thread.  That interpretation is wrong when
    MD_RECOVERY_INTR is also set.
    
    This also explains why the bug only happened when automatic repair was
    enabled and not with the normal 'manual' method.  It is impossible to
    react quickly enough to hit the problematic window without automation.
    
    Fix passes automatic repair tests.
    
    Fixes: 7c29744e ("dm raid: simplify rs_get_progress()")
    Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
    Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
    Signed-off-by: Mike Snitzer <snitzer@redhat.com>