    btrfs: scrub: improve tree block error reporting · 28232909
    Qu Wenruo authored
    [BUG]
    While debugging a scrub-related metadata error, it turned out that our
    metadata error reporting is not ideal.
    
    The only 3 error messages are:
    
    - BTRFS error (device dm-2): bdev /dev/mapper/test-scratch1 errs: wr 0, rd 0, flush 0, corrupt 0, gen 1
      Showing we have metadata generation mismatch errors.
    
    - BTRFS error (device dm-2): unable to fixup (regular) error at logical 7110656 on dev /dev/mapper/test-scratch1
      Showing which tree blocks are corrupted.
    
    - BTRFS warning (device dm-2): checksum/header error at logical 24772608 on dev /dev/mapper/test-scratch2, physical 3801088: metadata node (level 1) in tree 5
      Showing which physical range the corrupted metadata is at.
    
    We have to combine all 3 messages above to learn that we have a
    corrupted metadata block with a generation mismatch.
    
    And this is already the better case. If we have other problems, like an
    fsid mismatch, we cannot even tell the cause.
    
    [CAUSE]
    The problem is that scrub_checksum_tree_block() never outputs any error
    message.
    
    It just returns two bits for scrub: sblock->header_error and
    sblock->generation_error.
    
    Later we report the error in scrub_print_warning(), but since we only
    have those two bits, there is not much we can do to print any detailed
    error.
    
    [FIX]
    This patch will do the following to enhance the error reporting of
    metadata scrub:
    
    - Add extra warning (ratelimited) for every error we hit
      This can help us to distinguish the different types of errors.
      Some errors can help us to know what's going wrong immediately,
      like bytenr mismatch.
    
    - Re-order the checks
      Currently we check bytenr first, then immediately generation.
      This can lead to false generation mismatch reports when the real
      problem is an fsid mismatch.
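    
    The re-ordered checks can be sketched in plain userspace C as below.
    This is only an illustration, not the patch itself: struct tb_header,
    struct tb_expect, enum tb_error and check_tree_block() are
    hypothetical names, the bytenr and generation message wording is
    assumed, and the real code would emit a ratelimited kernel warning
    instead of filling a buffer. The point is that fsid is checked before
    generation, and each failure gets its own message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define FSID_SIZE 16

/* Hypothetical, simplified stand-in for an on-disk tree block header. */
struct tb_header {
	uint64_t bytenr;
	uint8_t fsid[FSID_SIZE];
	uint64_t generation;
};

/* Hypothetical description of what scrub expects at a logical address. */
struct tb_expect {
	uint64_t logical;
	uint8_t fsid[FSID_SIZE];
	uint64_t generation;
};

enum tb_error { TB_OK = 0, TB_BAD_BYTENR, TB_BAD_FSID, TB_BAD_GENERATION };

/*
 * Check bytenr, then fsid, then generation. Stop at the first mismatch
 * and describe it in msg, so a block that was never written back is
 * reported as a bad fsid rather than as a generation mismatch.
 */
static enum tb_error check_tree_block(const struct tb_header *h,
				      const struct tb_expect *e, int mirror,
				      char *msg, size_t len)
{
	unsigned long long logical = (unsigned long long)e->logical;

	assert(msg && len > 0);
	msg[0] = '\0';

	if (h->bytenr != e->logical) {
		/* Assumed wording, modelled on the fsid message. */
		snprintf(msg, len,
			 "tree block %llu mirror %d has bad bytenr, has %llu want %llu",
			 logical, mirror, (unsigned long long)h->bytenr, logical);
		return TB_BAD_BYTENR;
	}
	if (memcmp(h->fsid, e->fsid, FSID_SIZE) != 0) {
		/* UUID formatting is omitted to keep the sketch short. */
		snprintf(msg, len, "tree block %llu mirror %d has bad fsid",
			 logical, mirror);
		return TB_BAD_FSID;
	}
	if (h->generation != e->generation) {
		/* Assumed wording, modelled on the fsid message. */
		snprintf(msg, len,
			 "tree block %llu mirror %d has bad generation, has %llu want %llu",
			 logical, mirror, (unsigned long long)h->generation,
			 (unsigned long long)e->generation);
		return TB_BAD_GENERATION;
	}
	return TB_OK;
}
```

    With a header whose fsid and generation are both wrong, this sketch
    returns TB_BAD_FSID and never reaches the generation check, which is
    the behaviour the re-ordering is meant to achieve.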
    
    Here is the new output for the bug I'm debugging (we forgot to
    writeback tree blocks for commit roots):
    
     BTRFS warning (device dm-2): tree block 24117248 mirror 1 has bad fsid, has b77cd862-f150-4c71-90ec-7baf0544d83f want 17df6abf-23cd-445f-b350-5b3e40bfd2fc
     BTRFS warning (device dm-2): tree block 24117248 mirror 0 has bad fsid, has b77cd862-f150-4c71-90ec-7baf0544d83f want 17df6abf-23cd-445f-b350-5b3e40bfd2fc
    
    Now we can immediately tell that some tree blocks never got written
    back, instead of seeing the original confusing generation mismatch.
    
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
scrub.c 126 KB