btrfs: avoid unnecessary log mutex contention when syncing log

When syncing the log we acquire the root's log mutex just to update the root's last_log_commit. This is unnecessary because: 1) At this point there can only be one task updating this value, which is the task committing the current log transaction. Any task that enters btrfs_sync_log() has to wait for the previous log transaction to commit and wait for the current log transaction to commit if someone else already started it (in this case it never reaches to the point of updating last_log_commit, as that is done by the committing task); 2) All readers of the root's last_log_commit don't acquire the root's log mutex. This is to avoid blocking the readers, potentially for too long and because getting a stale value of last_log_commit does not cause any functional problem, in the worst case getting a stale value results in logging an inode unnecessarily. Plus it's actually very rare to get a stale value that results in unnecessarily logging the inode. So in order to avoid unnecessary contention on the root's log mutex, which is used for several different purposes, like starting/joining a log transaction and starting writeback of a log transaction, stop acquiring the log mutex for updating the root's last_log_commit. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: avoid unnecessary log mutex contention when syncing log
When syncing the log we acquire the root's log mutex just to update the root's last_log_commit. This is unnecessary because: 1) At this point there can only be one task updating this value, which is the task committing the current log transaction. Any task that enters btrfs_sync_log() has to wait for the previous log transaction to commit and wait for the current log transaction to commit if someone else already started it (in this case it never reaches to the point of updating last_log_commit, as that is done by the committing task); 2) All readers of the root's last_log_commit don't acquire the root's log mutex. This is to avoid blocking the readers, potentially for too long and because getting a stale value of last_log_commit does not cause any functional problem, in the worst case getting a stale value results in logging an inode unnecessarily. Plus it's actually very rare to get a stale value that results in unnecessarily logging the inode. So in order to avoid unnecessary contention on the root's log mutex, which is used for several different purposes, like starting/joining a log transaction and starting writeback of a log transaction, stop acquiring the log mutex for updating the root's last_log_commit. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
e1a6d264 · Filipe Manana · David Sterba · cceaa89f · e1a6d264
Commit e1a6d264 authored Jul 20, 2021 by Filipe Manana Committed by David Sterba Aug 23, 2021
Hide whitespace changes
Inline Side-by-side

Showing with 10 additions and 4 deletions

fs/btrfs/tree-log.c fs/btrfs/tree-log.c +10 -4

No files found.
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3328,10 +3328,16 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
 		goto out_wake_log_root;
 	}
-	mutex_lock(&root->log_mutex);
+	/*
-	if (root->last_log_commit < log_transid)
+	 * We know there can only be one task here, since we have not yet set
-		root->last_log_commit = log_transid;
+	 * root->log_commit[index1] to 0 and any task attempting to sync the
-	mutex_unlock(&root->log_mutex);
+	 * log must wait for the previous log transaction to commit if it's
+	 * still in progress or wait for the current log transaction commit if
+	 * someone else already started it. We use <= and not < because the
+	 * first log transaction has an ID of 0.
+	 */
+	ASSERT(root->last_log_commit <= log_transid);
+	root->last_log_commit = log_transid;
 out_wake_log_root:
 	mutex_lock(&log_root_tree->log_mutex);