    dm raid1: handle write failures · 72f4b314
    Jonathan Brassow authored
    This patch gives mirror the ability to handle device failures
    during normal write operations.
    
    The 'write_callback' function is called when a write completes.
    If all the writes failed or succeeded, we report failure or
    success respectively.  If some of the writes failed, we call
    fail_mirror, which increments the error count for the device, notes
    the type of error encountered (DM_RAID1_WRITE_ERROR), and
    selects a new primary (if necessary).  Note that the primary
    device can never change while the mirror is not in-sync (that is,
    while recovery is happening).  This means that the scenario
    where a failed write changes the primary and gives
    recovery_complete a chance to misread the primary never happens.
    The fact that the primary can change has necessitated the change
    to the default_mirror field.  We need to protect against reading
    garbage while the primary changes.  We then add the bio to a new
    list in the mirror set, 'failures'.  For every bio in the 'failures'
    list, we call a new function, '__bio_mark_nosync', where we mark
    the region 'not-in-sync' in the log and set the region
    state to RH_NOSYNC.  Userspace must also be notified of the
    failure.  This is done by 'raising an event' (dm_table_event()).
    If fail_mirror is called in process context the event can be raised
    right away.  If in interrupt context, the event is deferred to the
    kmirrord thread, which raises the event if 'event_waiting' is set.
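
    The write-completion flow described above can be sketched as a small
    user-space model.  This is an illustration under stated assumptions, not
    the code in dm-raid1.c: the struct layouts, the bitmask argument, the
    'in_interrupt' flag, the counters and the helper
    'worker_raise_deferred_event' are all hypothetical stand-ins, and
    choosing the first error-free device as the new primary is an assumed
    policy.

```c
#include <stdbool.h>
#include <stddef.h>

/* User-space model only; these are not the kernel's structures. */
enum write_result { WRITE_OK, WRITE_FAILED, WRITE_DEGRADED };

#define WRITE_ERROR_BIT 0  /* stands in for DM_RAID1_WRITE_ERROR */

struct mirror {
    unsigned error_count;      /* failures seen on this device */
    unsigned long error_type;  /* bitmask of error kinds seen */
};

struct mirror_set {
    struct mirror *mirrors;
    size_t nr_mirrors;         /* assumed smaller than the bit width of unsigned long */
    size_t primary;            /* index of the default mirror */
    bool in_sync;              /* false while recovery is running */
    bool event_waiting;        /* an event is deferred to the worker thread */
    unsigned events_raised;    /* stands in for dm_table_event() calls */
    unsigned failures_queued;  /* stands in for the 'failures' bio list */
};

static void raise_event(struct mirror_set *ms)
{
    ms->events_raised++;
}

static void fail_mirror(struct mirror_set *ms, size_t idx, bool in_interrupt)
{
    struct mirror *m = &ms->mirrors[idx];

    m->error_count++;
    m->error_type |= 1UL << WRITE_ERROR_BIT;

    /* The primary may only move while the set is in sync. */
    if (idx == ms->primary && ms->in_sync) {
        for (size_t i = 0; i < ms->nr_mirrors; i++) {
            if (ms->mirrors[i].error_count == 0) {
                ms->primary = i;  /* assumed policy: first error-free device */
                break;
            }
        }
    }

    if (in_interrupt)
        ms->event_waiting = true;  /* defer the event to the worker thread */
    else
        raise_event(ms);           /* process context: raise right away */
}

/* failed_mask has bit i set when the write to mirror i failed. */
static enum write_result write_callback(struct mirror_set *ms,
                                        unsigned long failed_mask,
                                        bool in_interrupt)
{
    unsigned long all = (1UL << ms->nr_mirrors) - 1;

    failed_mask &= all;
    if (failed_mask == 0)
        return WRITE_OK;      /* every write succeeded */
    if (failed_mask == all)
        return WRITE_FAILED;  /* every write failed */

    /* Partial failure: record each failed device, then queue the bio. */
    for (size_t i = 0; i < ms->nr_mirrors; i++)
        if (failed_mask & (1UL << i))
            fail_mirror(ms, i, in_interrupt);

    ms->failures_queued++;
    return WRITE_DEGRADED;
}

/* Models the worker thread raising an event that was deferred earlier. */
static void worker_raise_deferred_event(struct mirror_set *ms)
{
    if (ms->event_waiting) {
        ms->event_waiting = false;
        raise_event(ms);
    }
}
```

    In this model, a partial failure on the primary of an in-sync set moves
    'primary' to another device, while the same failure during recovery
    leaves it unchanged.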
    
    Backwards compatibility is maintained by ignoring errors if
    the DM_FEATURES_HANDLE_ERRORS flag is not present.
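
    The compatibility rule can be illustrated with a minimal sketch.  The
    names below are hypothetical: 'FEATURE_HANDLE_ERRORS' stands in for the
    flag named above, its bit value is an assumption, and
    'should_queue_failure' is an invented helper that only models the
    decision, not the kernel code path.

```c
#include <stdbool.h>

#define FEATURE_HANDLE_ERRORS 0x1u  /* hypothetical bit value */

struct mirror_features {
    unsigned features;  /* feature bits requested when the mirror was set up */
};

/* True when error handling was requested for this mirror set. */
static bool errors_handled(const struct mirror_features *mf)
{
    return (mf->features & FEATURE_HANDLE_ERRORS) != 0;
}

/*
 * Decide whether a write with some failed devices is queued for
 * failure handling.  Without the flag, errors are ignored, which
 * matches the behaviour before this change.
 */
static bool should_queue_failure(const struct mirror_features *mf,
                                 unsigned nr_failed)
{
    return nr_failed > 0 && errors_handled(mf);
}
```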
    
    Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
    Signed-off-by: Alasdair G Kergon <agk@redhat.com>
dm-raid1.c 36.9 KB