    btrfs: tree-checker: dump the page status if hit something wrong · dd6a5719
    Qu Wenruo authored
    [BUG]
    There is a bug report about a very suspicious tree-checker error being
    triggered:
    
      BTRFS critical (device dm-0): corrupted node, root=256
    block=8550954455682405139 owner mismatch, have 11858205567642294356
    expect [256, 18446744073709551360]
      BTRFS critical (device dm-0): corrupted node, root=256
    block=8550954455682405139 owner mismatch, have 11858205567642294356
    expect [256, 18446744073709551360]
      BTRFS critical (device dm-0): corrupted node, root=256
    block=8550954455682405139 owner mismatch, have 11858205567642294356
    expect [256, 18446744073709551360]
      SELinux: inode_doinit_use_xattr:  getxattr returned 117 for dev=dm-0
    ino=5737268
    
    [ANALYZE]
    The root cause is still unclear, but there are some clues already:
    
    - Unaligned eb bytenr
      The block bytenr is 8550954455682405139, which is not even aligned to
      2.
      This bytenr is fetched from the extent buffer header, not from
      eb->start.

      This means that at the initial time of the read, the eb header bytenr
      was still correct (this is the most basic check needed to continue the
      read), but later something went wrong and corrupted at least the first
      page.
      Thus we got such an obviously incorrect value.
    
    - Invalid extent buffer header owner
      The read itself is triggered for subvolume 256, but the eb header
      owner is 11858205567642294356, which is not really possible.
      The problem here is that subvolume ids are limited to
      ((1 << 48) - 1), and this one definitely goes beyond that limit.

      So this value is garbage as well.
    
    We already have two garbage values from an extent buffer which passed
    the initial bytenr and csum checks, but whose contents became garbage
    at some later point.

    This looks like a page lifespan problem (e.g. we didn't properly hold
    the page).
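
    The two observations above can be reproduced with a few lines of
    arithmetic. The following is a minimal userspace sketch, not
    tree-checker code: the helper names and SUBVOL_ID_MAX are hypothetical,
    and the 48-bit limit is taken from the analysis above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption taken from the analysis: subvolume ids fit in 48 bits. */
#define SUBVOL_ID_MAX ((UINT64_C(1) << 48) - 1)

/* Hypothetical helper: true if bytenr is a multiple of align.
 * align must be a non-zero power of two. */
static bool is_aligned_u64(uint64_t bytenr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (bytenr & (align - 1)) == 0;
}

/* Hypothetical helper: true if owner could be a valid subvolume id
 * under the assumed 48-bit limit. */
static bool subvol_id_plausible(uint64_t owner)
{
	return owner <= SUBVOL_ID_MAX;
}
```

    With the values from the report, is_aligned_u64(8550954455682405139, 2)
    is false because the number is odd, and
    subvol_id_plausible(11858205567642294356) is false because the value
    is far above the 48-bit limit.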
    
    [ENHANCEMENT]
    The current tree-checker only outputs information from the extent
    buffer and nothing about the page status.

    So this patch enhances the tree-checker output by also dumping the
    first page, which looks like this:
    
      page:00000000aa9f3ce8 refcount:4 mapcount:0 mapping:00000000169aa6b6 index:0x1d0c pfn:0x1022e5
      memcg:ffff888103456000
      aops:btree_aops [btrfs] ino:1
      flags: 0x2ffff0000008000(private|node=0|zone=2|lastcpupid=0xffff)
      page_type: 0xffffffff()
      raw: 02ffff0000008000 0000000000000000 dead000000000122 ffff88811e06e220
      raw: 0000000000001d0c ffff888102fdb1d8 00000004ffffffff ffff888103456000
      page dumped because: eb page dump
      BTRFS critical (device dm-3): corrupt leaf: root=5 block=30457856 slot=6 ino=257 file_offset=0, invalid disk_bytenr for file extent, have 10617606235235216665, should be aligned to 4096
      BTRFS error (device dm-3): read time tree block corruption detected on logical 30457856 mirror 1
    
    The dump provides some extra info that can help us do extra
    cross-checks:
    
    - Page refcount
      If it's too low, something is definitely wrong.
    
    - Page aops
      Any mapped eb page should have btree_aops with inode number 1.
    
    - Page index
      Since a mapped eb page should have its bytenr matching the page
      position, (index << PAGE_SHIFT) should match the block bytenr
      reported in the critical line.
    
    - Page Private flag
      A mapped eb page should have the Private flag set to indicate it's
      managed by btrfs.
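
    The cross-checks listed above can be sketched as a small userspace
    helper. This is only an illustration: struct eb_page_dump and both
    functions are hypothetical, the fields are assumed to be copied by hand
    from a dump like the one above, and "refcount > 0" is an assumed
    stand-in for "not too low".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for fields read off a page dump. */
struct eb_page_dump {
	int refcount;        /* "refcount:" field */
	uint64_t index;      /* "index:" field */
	const char *aops;    /* name on the "aops:" line */
	uint64_t ino;        /* "ino:" on the "aops:" line */
	bool has_private;    /* "private" present on the "flags:" line */
};

/* Byte offset implied by a page index for a given page shift. */
static uint64_t page_index_to_bytenr(uint64_t index, unsigned int page_shift)
{
	assert(page_shift < 64);
	return index << page_shift;
}

/* True if the dumped page looks like the first page of a mapped eb
 * located at eb_bytenr. */
static bool eb_page_dump_consistent(const struct eb_page_dump *d,
				    uint64_t eb_bytenr,
				    unsigned int page_shift)
{
	return d->refcount > 0 &&                    /* assumed threshold */
	       strcmp(d->aops, "btree_aops") == 0 &&
	       d->ino == 1 &&
	       d->has_private &&
	       page_index_to_bytenr(d->index, page_shift) == eb_bytenr;
}
```

    For the sample dump above, assuming 4K pages (PAGE_SHIFT of 12),
    index 0x1d0c gives 0x1d0c << 12 = 30457856, which matches the block
    number in the "corrupt leaf" line.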
    
    Link: https://lore.kernel.org/linux-btrfs/CAHk-=whNdMaN9ntZ47XRKP6DBes2E5w7fi-0U3H2+PS18p+Pzw@mail.gmail.com/
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
tree-checker.c 64.2 KB