    btrfs: raid56: extra debugging for raid6 syndrome generation · b2324e08
    Qu Wenruo authored
    [BUG]
    I have received at least two crash reports for RAID6 syndrome
    generation.  Whether the path is AVX2 or SSE2, they all show a similar
    call trace with a corrupted RAX:
    
      BUG: kernel NULL pointer dereference, address: 0000000000000000
      #PF: supervisor read access in kernel mode
      #PF: error_code(0x0000) - not-present page
      PGD 0 P4D 0
      Oops: 0000 [#1] PREEMPT SMP PTI
      Workqueue: btrfs-rmw rmw_rbio_work [btrfs]
      RIP: 0010:raid6_sse21_gen_syndrome+0x9e/0x130 [raid6_pq]
      RAX: 0000000000000000 RBX: 0000000000001000 RCX: ffffa0ff4cfa3248
      RDX: 0000000000000000 RSI: ffffa0f74cfa3238 RDI: 0000000000000000
      Call Trace:
       <TASK>
       rmw_rbio+0x5c8/0xa80 [btrfs]
       process_one_work+0x1c7/0x3d0
       worker_thread+0x4d/0x380
       kthread+0xf3/0x120
       ret_from_fork+0x2c/0x50
       </TASK>
    
    [CAUSE]
    The cause is not known.  Recently I also hit this in the AVX512 path,
    even on a v5.15 backport, which doesn't have any of my RAID56
    rework.
    
    Furthermore according to the registers:
    
      RAX: 0000000000000000 RBX: 0000000000001000 RCX: ffffa0ff4cfa3248
    
    The RAX register should hold the number of stripes (including PQ), so
    the value 0 is not correct.  The other two registers both look sane.
    
    - RBX is the sectorsize
      For x86_64 it should always be 4K and matches the output.
    
    - RCX is the pointers array
      Which is from rbio->finish_pointers, and it looks like a sane
      kernel address.
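
    The register reasoning above can be sketched as two small checks.
    This is an illustration only: the helper names are hypothetical, and
    the address test assumes 48-bit virtual addresses (4-level paging),
    where a kernel address has bits 63..47 all set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sector size expected on x86_64, as stated in the text above (4K). */
#define DEMO_SECTORSIZE 4096u

/* Hypothetical helper: does the RBX value match the expected sectorsize? */
bool demo_rbx_matches_sectorsize(uint64_t rbx)
{
	return rbx == DEMO_SECTORSIZE;
}

/*
 * Hypothetical helper: does the value look like a kernel virtual address?
 * Assumption: 48-bit virtual addresses, so bits 63..47 must all be set
 * for an address in the upper (kernel) half.
 */
bool demo_is_kernel_address(uint64_t addr)
{
	return (addr >> 47) == 0x1ffffu;
}
```

    Applied to the dump, RBX (0x1000) and RCX (0xffffa0ff4cfa3248) pass
    these checks, while a zero value such as RAX would fail the address
    test.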
    
    [WORKAROUND]
    For now, I can only add extra debug ASSERT()s before we call the raid6
    gen_syndrome() helper and hope to catch the problem.
    
    The extra checks require both CONFIG_BTRFS_DEBUG and
    CONFIG_BTRFS_ASSERT to be enabled.
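
    A minimal userspace sketch of this kind of pre-call check follows.
    It is not the code from this patch: struct demo_rbio, its fields and
    both functions are hypothetical stand-ins, and assert() takes the
    place of the kernel's ASSERT().  Only the callback argument order
    (stripe count, byte count, pointer array) is meant to mirror the
    raid6 gen_syndrome() helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the values handed to gen_syndrome(). */
struct demo_rbio {
	int real_stripes;	/* data stripes plus P and Q */
	int nr_data;		/* data stripes only */
	size_t sectorsize;	/* bytes processed per call */
	void **pointers;	/* one buffer per stripe */
};

/* Return true when the layout is plausible for RAID6 syndrome generation. */
bool demo_rbio_is_sane(const struct demo_rbio *rbio)
{
	int i;

	/* RAID6 needs at least one data stripe plus P and Q. */
	if (rbio->real_stripes < 3)
		return false;
	if (rbio->nr_data < 1 || rbio->nr_data != rbio->real_stripes - 2)
		return false;
	if (rbio->sectorsize == 0 || rbio->pointers == NULL)
		return false;
	/* Every stripe must have a buffer behind it. */
	for (i = 0; i < rbio->real_stripes; i++)
		if (rbio->pointers[i] == NULL)
			return false;
	return true;
}

/* Call site: fail the assertion before the optimised routine can fault. */
void demo_gen_syndrome(const struct demo_rbio *rbio,
		       void (*gen)(int, size_t, void **))
{
	assert(demo_rbio_is_sane(rbio));
	gen(rbio->real_stripes, rbio->sectorsize, rbio->pointers);
}
```

    The point of checking right at the call site is that a stripe count
    of 0, like the RAX value in the reports, would be caught with a clear
    assertion message rather than a NULL dereference inside the SIMD
    routine.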
    
    My current guess is a use-after-free, but that doesn't make much sense
    when every report shows only a corrupted RAX alongside seemingly valid
    pointers.
    
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
raid56.c 74.4 KB