jbd2: fix FS corruption possibility in jbd2_journal_destroy() on umount path

commit c0a2ad9b upstream. On umount path, jbd2_journal_destroy() writes latest transaction ID (->j_tail_sequence) to be used at next mount. The bug is that ->j_tail_sequence is not holding latest transaction ID in some cases. So, at next mount, there is chance to conflict with remaining (not overwritten yet) transactions. mount (id=10) write transaction (id=11) write transaction (id=12) umount (id=10) <= the bug doesn't write latest ID mount (id=10) write transaction (id=11) crash mount [recovery process] transaction (id=11) transaction (id=12) <= valid transaction ID, but old commit must not replay Like above, this bug become the cause of recovery failure, or FS corruption. So why ->j_tail_sequence doesn't point latest ID? Because if checkpoint transactions was reclaimed by memory pressure (i.e. bdev_try_to_free_page()), then ->j_tail_sequence is not updated. (And another case is, __jbd2_journal_clean_checkpoint_list() is called with empty transaction.) So in above cases, ->j_tail_sequence is not pointing latest transaction ID at umount path. Plus, REQ_FLUSH for checkpoint is not done too. So, to fix this problem with minimum changes, this patch updates ->j_tail_sequence, and issue REQ_FLUSH. (With more complex changes, some optimizations would be possible to avoid unnecessary REQ_FLUSH for example though.) BTW, journal->j_tail_sequence = ++journal->j_transaction_sequence; Increment of ->j_transaction_sequence seems to be unnecessary, but ext3 does this. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jiri Slaby <jslaby@suse.cz>

jbd2: fix FS corruption possibility in jbd2_journal_destroy() on umount path
commit c0a2ad9b upstream. On umount path, jbd2_journal_destroy() writes latest transaction ID (->j_tail_sequence) to be used at next mount. The bug is that ->j_tail_sequence is not holding latest transaction ID in some cases. So, at next mount, there is chance to conflict with remaining (not overwritten yet) transactions. mount (id=10) write transaction (id=11) write transaction (id=12) umount (id=10) <= the bug doesn't write latest ID mount (id=10) write transaction (id=11) crash mount [recovery process] transaction (id=11) transaction (id=12) <= valid transaction ID, but old commit must not replay Like above, this bug become the cause of recovery failure, or FS corruption. So why ->j_tail_sequence doesn't point latest ID? Because if checkpoint transactions was reclaimed by memory pressure (i.e. bdev_try_to_free_page()), then ->j_tail_sequence is not updated. (And another case is, __jbd2_journal_clean_checkpoint_list() is called with empty transaction.) So in above cases, ->j_tail_sequence is not pointing latest transaction ID at umount path. Plus, REQ_FLUSH for checkpoint is not done too. So, to fix this problem with minimum changes, this patch updates ->j_tail_sequence, and issue REQ_FLUSH. (With more complex changes, some optimizations would be possible to avoid unnecessary REQ_FLUSH for example though.) BTW, journal->j_tail_sequence = ++journal->j_transaction_sequence; Increment of ->j_transaction_sequence seems to be unnecessary, but ext3 does this. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
040c8d12 · OGAWA Hirofumi · Jiri Slaby · 12d51716 · 040c8d12
Commit 040c8d12 authored Mar 09, 2016 by OGAWA Hirofumi Committed by Jiri Slaby Apr 11, 2016
Hide whitespace changes
Inline Side-by-side

Showing with 12 additions and 5 deletions

fs/jbd2/journal.c fs/jbd2/journal.c +12 -5

No files found.
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1425,11 +1425,12 @@ int jbd2_journal_update_sb_log_tail(journal_t *journal, tid_t tail_tid,
 /**
 * jbd2_mark_journal_empty() - Mark on disk journal as empty.
 * @journal: The journal to update.
+ * @write_op: With which operation should we write the journal sb
 *
 * Update a journal's dynamic superblock fields to show that journal is empty.
 * Write updated superblock to disk waiting for IO to complete.
 */
-static void jbd2_mark_journal_empty(journal_t *journal)
+static void jbd2_mark_journal_empty(journal_t *journal, int write_op)
 {
 	journal_superblock_t *sb = journal->j_superblock;
@@ -1447,7 +1448,7 @@ static void jbd2_mark_journal_empty(journal_t *journal)
 	sb->s_start    = cpu_to_be32(0);
 	read_unlock(&journal->j_state_lock);
-	jbd2_write_superblock(journal, WRITE_FUA);
+	jbd2_write_superblock(journal, write_op);
 	/* Log is no longer empty */
 	write_lock(&journal->j_state_lock);
@@ -1732,7 +1733,13 @@ int jbd2_journal_destroy(journal_t *journal)
 	if (journal->j_sb_buffer) {
 		if (!is_journal_aborted(journal)) {
 			mutex_lock(&journal->j_checkpoint_mutex);
-			jbd2_mark_journal_empty(journal);
+			write_lock(&journal->j_state_lock);
+			journal->j_tail_sequence =
+				++journal->j_transaction_sequence;
+			write_unlock(&journal->j_state_lock);
+			jbd2_mark_journal_empty(journal, WRITE_FLUSH_FUA);
 			mutex_unlock(&journal->j_checkpoint_mutex);
 		} else
 			err = -EIO;
@@ -1993,7 +2000,7 @@ int jbd2_journal_flush(journal_t *journal)
 	 * the magic code for a fully-recovered superblock.  Any future
 	 * commits of data to the journal will restore the current
 	 * s_start value. */
-	jbd2_mark_journal_empty(journal);
+	jbd2_mark_journal_empty(journal, WRITE_FUA);
 	mutex_unlock(&journal->j_checkpoint_mutex);
 	write_lock(&journal->j_state_lock);
 	J_ASSERT(!journal->j_running_transaction);
@@ -2039,7 +2046,7 @@ int jbd2_journal_wipe(journal_t *journal, int write)
 	if (write) {
 		/* Lock to make assertions happy... */
 		mutex_lock(&journal->j_checkpoint_mutex);
-		jbd2_mark_journal_empty(journal);
+		jbd2_mark_journal_empty(journal, WRITE_FUA);
 		mutex_unlock(&journal->j_checkpoint_mutex);
 	}