• Hidehiro Kawai's avatar
    jbd2: fix error handling for checkpoint io · 44519faf
    Hidehiro Kawai authored
    When a checkpointing IO fails, current JBD2 code doesn't check the
    error and continue journaling.  This means latest metadata can be
    lost from both the journal and filesystem.
    
    This patch leaves the failed metadata blocks in the journal space
    and aborts journaling in the case of jbd2_log_do_checkpoint().
    To achieve this, we need to do:
    
    1. don't remove the failed buffer from the checkpoint list where in
       the case of __try_to_free_cp_buf() because it may be released or
       overwritten by a later transaction
    2. jbd2_log_do_checkpoint() is the last chance, remove the failed
       buffer from the checkpoint list and abort the journal
    3. when checkpointing fails, don't update the journal super block to
       prevent the journaled contents from being cleaned.  For safety,
       don't update j_tail and j_tail_sequence either
    4. when checkpointing fails, notify this error to the ext4 layer so
       that ext4 don't clear the needs_recovery flag, otherwise the
       journaled contents are ignored and cleaned in the recovery phase
    5. if the recovery fails, keep the needs_recovery flag
    6. prevent jbd2_cleanup_journal_tail() from being called between
       __jbd2_journal_drop_transaction() and jbd2_journal_abort()
       (a possible race issue between jbd2_log_do_checkpoint()s called by
       jbd2_journal_flush() and __jbd2_log_wait_for_space())
    Signed-off-by: default avatarHidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
    Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
    44519faf
recovery.c 18.7 KB