Commit 88074c8b authored by Qu Wenruo, committed by David Sterba

btrfs: raid56: make it more explicit that cache rbio should have all its data sectors uptodate

For Btrfs RAID56, we have a caching system for btrfs raid bios (rbio).

We call cache_rbio_pages() to mark a qualified rbio ready for cache.

This happens at two places:

- finish_rmw()

  At this point we have already read all the necessary sectors from
  disk; together with the sectors in the bio list, they cover all
  data stripes.

- __raid_recover_end_io()

  At this point we have rebuilt the rbio, thus all the data sectors
  involved (from either the stripe or the bio list) are now uptodate.

Thus by the time cache_rbio_pages() is called, all data sectors should
be uptodate.
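
As a minimal standalone sketch of this invariant (the toy_rbio struct
and check_cacheable() helper are simplified stand-ins invented for
illustration, not the kernel's types):

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified model: stripe sectors are indexed with all data stripes
 * first, then P (and Q for RAID6), so the first
 * nr_data * stripe_nsectors entries are exactly the data sectors. */
struct toy_rbio {
	int nr_data;		/* number of data stripes */
	int real_stripes;	/* data + parity stripes */
	int stripe_nsectors;	/* sectors per stripe */
	bool *uptodate;		/* one flag per stripe sector */
};

static bool is_data_sector(const struct toy_rbio *rbio, int i)
{
	return i < rbio->nr_data * rbio->stripe_nsectors;
}

/* The invariant cache_rbio_pages() now asserts: every data sector
 * must already be uptodate before the rbio is cached. */
static void check_cacheable(const struct toy_rbio *rbio)
{
	int nr_sectors = rbio->real_stripes * rbio->stripe_nsectors;

	for (int i = 0; i < nr_sectors; i++)
		if (is_data_sector(rbio, i))
			assert(rbio->uptodate[i]);
}

int main(void)
{
	/* RAID5: 2 data stripes + 1 parity, 16 sectors per stripe. */
	bool flags[3 * 16] = { false };
	struct toy_rbio rbio = { 2, 3, 16, flags };

	for (int i = 0; i < 2 * 16; i++)	/* data read/rebuilt */
		flags[i] = true;

	check_cacheable(&rbio);	/* passes: parity may stay stale */
	printf("all %d data sectors uptodate\n", 2 * 16);
	return 0;
}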

This patch makes it explicit that all data sectors are uptodate when
cache_rbio_pages() is called, mostly to prepare for the incoming
verification at RMW time.

This patch adds:

- Extra ASSERT()s in cache_rbio_pages()
  This makes sure that all data sectors not covered by a bio are
  already uptodate.

- Extra ASSERT()s in steal_rbio()
  Since only a cached rbio can be stolen, every data sector in the
  source rbio should already be uptodate.

- Update __raid_recover_end_io() to mark recovered sectors uptodate
  Previously __raid_recover_end_io() would only mark failed sectors
  uptodate when doing an RMW.

  That would trip the new ASSERT()s: in the recovery case, a recovered
  failed sector would not be marked uptodate, triggering the ASSERT()
  in a later cache_rbio_pages() call.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
parent 797d74b7
@@ -176,8 +176,16 @@ static void cache_rbio_pages(struct btrfs_raid_bio *rbio)
 
 	for (i = 0; i < rbio->nr_sectors; i++) {
 		/* Some range not covered by bio (partial write), skip it */
-		if (!rbio->bio_sectors[i].page)
+		if (!rbio->bio_sectors[i].page) {
+			/*
+			 * Even if the sector is not covered by bio, if it is
+			 * a data sector it should still be uptodate as it is
+			 * read from disk.
+			 */
+			if (i < rbio->nr_data * rbio->stripe_nsectors)
+				ASSERT(rbio->stripe_sectors[i].uptodate);
 			continue;
+		}
 
 		ASSERT(rbio->stripe_sectors[i].page);
 		memcpy_page(rbio->stripe_sectors[i].page,
@@ -264,6 +272,21 @@ static void steal_rbio_page(struct btrfs_raid_bio *src,
 		dest->stripe_sectors[i].uptodate = true;
 }
 
+static bool is_data_stripe_page(struct btrfs_raid_bio *rbio, int page_nr)
+{
+	const int sector_nr = (page_nr << PAGE_SHIFT) >>
+			      rbio->bioc->fs_info->sectorsize_bits;
+
+	/*
+	 * We have ensured PAGE_SIZE is aligned with sectorsize, thus
+	 * we won't have a page which is half data half parity.
+	 *
+	 * Thus if the first sector of the page belongs to data stripes, then
+	 * the full page belongs to data stripes.
+	 */
+	return (sector_nr < rbio->nr_data * rbio->stripe_nsectors);
+}
+
 /*
  * Stealing an rbio means taking all the uptodate pages from the stripe array
  * in the source rbio and putting them into the destination rbio.
@@ -274,16 +297,26 @@ static void steal_rbio_page(struct btrfs_raid_bio *src,
 static void steal_rbio(struct btrfs_raid_bio *src, struct btrfs_raid_bio *dest)
 {
 	int i;
-	struct page *s;
 
 	if (!test_bit(RBIO_CACHE_READY_BIT, &src->flags))
 		return;
 
 	for (i = 0; i < dest->nr_pages; i++) {
-		s = src->stripe_pages[i];
-		if (!s || !full_page_sectors_uptodate(src, i))
+		struct page *p = src->stripe_pages[i];
+
+		/*
+		 * We don't need to steal P/Q pages as they will always be
+		 * regenerated for RMW or full write anyway.
+		 */
+		if (!is_data_stripe_page(src, i))
 			continue;
 
+		/*
+		 * If @src already has RBIO_CACHE_READY_BIT, it should have
+		 * all data stripe pages present and uptodate.
+		 */
+		ASSERT(p);
+		ASSERT(full_page_sectors_uptodate(src, i));
 		steal_rbio_page(src, dest, i);
 	}
 	index_stripe_sectors(dest);
@@ -2003,22 +2036,21 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
 		/* xor in the rest */
 		run_xor(pointers, rbio->nr_data - 1, sectorsize);
 	}
-	/* if we're doing this rebuild as part of an rmw, go through
-	 * and set all of our private rbio pages in the
-	 * failed stripes as uptodate. This way finish_rmw will
-	 * know they can be trusted. If this was a read reconstruction,
-	 * other endio functions will fiddle the uptodate bits
+
+	/*
+	 * No matter if this is a RMW or recovery, we should have all
+	 * failed sectors repaired, thus they are now uptodate.
+	 * Especially if we determine to cache the rbio, we need to
+	 * have at least all data sectors uptodate.
 	 */
-	if (rbio->operation == BTRFS_RBIO_WRITE) {
-		for (i = 0; i < rbio->stripe_nsectors; i++) {
-			if (faila != -1) {
-				sector = rbio_stripe_sector(rbio, faila, i);
-				sector->uptodate = 1;
-			}
-			if (failb != -1) {
-				sector = rbio_stripe_sector(rbio, failb, i);
-				sector->uptodate = 1;
-			}
+	for (i = 0; i < rbio->stripe_nsectors; i++) {
+		if (faila != -1) {
+			sector = rbio_stripe_sector(rbio, faila, i);
+			sector->uptodate = 1;
+		}
+		if (failb != -1) {
+			sector = rbio_stripe_sector(rbio, failb, i);
+			sector->uptodate = 1;
 		}
 	}
 	for (stripe = rbio->real_stripes - 1; stripe >= 0; stripe--)
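
To sanity-check the page-to-sector arithmetic used by
is_data_stripe_page(), a standalone sketch (the 64K page and 4K sector
constants below are illustrative assumptions, not the kernel's
PAGE_SHIFT or sectorsize_bits values):

#include <stdbool.h>
#include <stdio.h>

#define TOY_PAGE_SHIFT		16	/* 64K pages */
#define TOY_SECTORSIZE_BITS	12	/* 4K sectors */

static bool toy_is_data_stripe_page(int page_nr, int nr_data,
				    int stripe_nsectors)
{
	/* First sector covered by this page; since PAGE_SIZE is a
	 * multiple of sectorsize, the whole page shares one answer. */
	int sector_nr = (page_nr << TOY_PAGE_SHIFT) >> TOY_SECTORSIZE_BITS;

	return sector_nr < nr_data * stripe_nsectors;
}

int main(void)
{
	/* RAID5 with 2 data stripes and 64K stripe length: 16 sectors
	 * and 1 page per stripe, so pages 0-1 are data, page 2 is P. */
	for (int page_nr = 0; page_nr < 3; page_nr++)
		printf("page %d: %s\n", page_nr,
		       toy_is_data_stripe_page(page_nr, 2, 16) ?
		       "data" : "parity");
	return 0;
}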