    md/raid1,raid10: always abort recover on write error. · 9a28f9e6
    NeilBrown authored
    commit 2446dba0 upstream.
    
    Currently we don't abort recovery on a write error if the write error
    to the recovering device was triggered by normal IO (as opposed to
    recovery IO).
    
    This means that for one bitmap region, the recovery might write to the
    recovering device for a few sectors, then not bother for subsequent
    sectors (as it never writes to failed devices).  In this case
    the bitmap bit will be cleared, but it really shouldn't.
    
    The result is that if the recovering device fails and is then re-added
    (after fixing whatever hardware problem triggered the failure),
    the second recovery won't redo the region it was in the middle of,
    so some of the device will not be recovered properly.
    
    If we abort the recovery, the region being processed will be cancelled
    (bit not cleared) and the whole region will be retried.
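
To illustrate the difference, here is a minimal toy model of one bitmap region. It is not the kernel code and not the patch itself: TOY_REGION_SECTORS, struct toy_result and toy_recover_region() are hypothetical names invented for this sketch, and the device failure is simulated by a sector index rather than by real IO.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical region size; real bitmap chunks are much larger. */
#define TOY_REGION_SECTORS 8

struct toy_result {
    bool bit_still_set;   /* true if the region would be retried later */
    int sectors_missed;   /* sectors never written to the recovering device */
};

/*
 * Simulate recovery of one region. The recovering device "fails" (as if
 * from a normal-IO write error) just before sector fail_at is recovered.
 * abort_on_error selects between the old behaviour (false) and the
 * behaviour described in this commit (true).
 */
struct toy_result toy_recover_region(int fail_at, bool abort_on_error)
{
    bool faulty = false;
    bool aborted = false;
    int written = 0;

    assert(fail_at >= 0 && fail_at < TOY_REGION_SECTORS);

    for (int s = 0; s < TOY_REGION_SECTORS; s++) {
        if (s == fail_at)
            faulty = true;          /* write error from normal IO */
        if (faulty) {
            if (abort_on_error) {
                aborted = true;     /* recovery interrupted */
                break;
            }
            continue;               /* never write to a failed device */
        }
        written++;                  /* sector copied to recovering device */
    }

    struct toy_result res;
    /* The bit is cleared only when the pass ran to the end of the region. */
    res.bit_still_set = aborted;
    res.sectors_missed = TOY_REGION_SECTORS - written;
    return res;
}
```

In this model, toy_recover_region(3, false) clears the bit even though five sectors were never written, which is the situation described above. With abort_on_error set to true the bit stays set, so a later recovery would redo the whole region.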
    
    As the bug can result in data corruption the patch is suitable for
    -stable.  For kernels prior to 3.11 there is a conflict in raid10.c
    which will require care.
    
    Original-from: jiao hui <jiaohui@bwstor.com.cn>
    Reported-and-tested-by: jiao hui <jiaohui@bwstor.com.cn>
    Signed-off-by: NeilBrown <neilb@suse.de>
    [bwh: Backported to 3.2: adjust context]
    Signed-off-by: Ben Hutchings <ben@decadent.org.uk>