1. 22 Jan, 2018 40 commits
    • Liu Bo's avatar
      Btrfs: avoid losing data raid profile when deleting a device · a6f93c71
      Liu Bo authored
      We've avoided data losing raid profile when doing balance, but it
      turns out that deleting a device could also result in the same
      problem.
      
      Say we have 3 disks, and they're created with '-d raid1' profile.
      
      - We have chunk P (the only data chunk on the empty btrfs).
      
      - Suppose that chunk P's two raid1 copies reside in disk A and disk B.
      
      - Now, 'btrfs device remove disk B'
               btrfs_rm_device()
      	   -> btrfs_shrink_device()
      	      -> btrfs_relocate_chunk() #relocate any chunk on disk B
      	      	 			 to other places.
      
      - Chunk P will be removed and a new chunk will be created to hold
        those data, but as chunk P is the only one holding raid1 profile,
        after it goes away, the new chunk will be created as single profile
        which is our default profile.
      
      This fixes the problem by creating an empty data chunk before
      relocating the data chunk.
      
      Metadata/System chunk are supposed to have non-zero bytes all the time
      so their raid profile is preserved.
      Reported-by: default avatarJames Alandt <James.Alandt@wdc.com>
      Signed-off-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      a6f93c71
    • Filipe Manana's avatar
      Btrfs: fix space leak after fallocate and zero range operations · 81fdf638
      Filipe Manana authored
      If we do a buffered write after a zero range operation that has an
      unaligned (with the filesystem's sector size) end which also falls within
      an unwritten (prealloc) extent that is currently beyond the inode's
      i_size, and the zero range operation has the flag FALLOC_FL_KEEP_SIZE,
      we end up leaking data and metadata space. This happens because when
      zeroing a range we call btrfs_truncate_block(), which does delalloc
      (loads the page and partially zeroes its content), and in the buffered
      write path we only clear existing delalloc space reservation for the
      range we are writing into if that range starts at an offset smaller then
      the inode's i_size, which makes sense since we can not have delalloc
      extents beyond the i_size, only unwritten extents are allowed.
      
      Example reproducer:
      
       $ mkfs.btrfs -f /dev/sdb
       $ mount /dev/sdb /mnt
       $ xfs_io -f -c "falloc -k 428K 4K" /mnt/foobar
       $ xfs_io -c "fzero -k 0 430K" /mnt/foobar
       $ xfs_io -c "pwrite -S 0xaa 428K 4K" /mnt/foobar
       $ umount /mnt
      
      After the unmount we get the metadata and data space leaks reported in
      dmesg/syslog:
      
       [95794.602253] ------------[ cut here ]------------
       [95794.603322] WARNING: CPU: 0 PID: 31496 at fs/btrfs/inode.c:9561 btrfs_destroy_inode+0x4e/0x206 [btrfs]
       [95794.605167] Modules linked in: btrfs xfs ppdev ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper parport_pc psmouse sg i2c_piix4 parport i2c_core evdev pcspkr button serio_raw sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod virtio_scsi ata_generic crc32c_intel ata_piix floppy virtio_pci virtio_ring virtio libata scsi_mod e1000 [last unloaded: btrfs]
       [95794.613000] CPU: 0 PID: 31496 Comm: umount Tainted: G        W       4.14.0-rc6-btrfs-next-54+ #1
       [95794.614448] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
       [95794.615972] task: ffff880075aa0240 task.stack: ffffc90001734000
       [95794.617114] RIP: 0010:btrfs_destroy_inode+0x4e/0x206 [btrfs]
       [95794.618001] RSP: 0018:ffffc90001737d00 EFLAGS: 00010202
       [95794.618721] RAX: 0000000000000000 RBX: ffff880070fa1418 RCX: ffffc90001737c7c
       [95794.619645] RDX: 0000000175aa0240 RSI: 0000000000000001 RDI: ffff880070fa1418
       [95794.620711] RBP: ffffc90001737d38 R08: 0000000000000000 R09: 0000000000000000
       [95794.621932] R10: ffffc90001737c48 R11: ffff88007123e158 R12: ffff880075b6a000
       [95794.623124] R13: ffff88006145c000 R14: ffff880070fa1418 R15: ffff880070c3b4a0
       [95794.624188] FS:  00007fa6793c92c0(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
       [95794.625578] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [95794.626522] CR2: 000056338670d048 CR3: 00000000610dc005 CR4: 00000000001606f0
       [95794.627647] Call Trace:
       [95794.628128]  destroy_inode+0x3d/0x55
       [95794.628573]  evict+0x177/0x17e
       [95794.629010]  dispose_list+0x50/0x71
       [95794.629478]  evict_inodes+0x132/0x141
       [95794.630289]  generic_shutdown_super+0x3f/0x10b
       [95794.630864]  kill_anon_super+0x12/0x1c
       [95794.631383]  btrfs_kill_super+0x16/0x21 [btrfs]
       [95794.631930]  deactivate_locked_super+0x30/0x68
       [95794.632539]  deactivate_super+0x36/0x39
       [95794.633200]  cleanup_mnt+0x49/0x67
       [95794.633818]  __cleanup_mnt+0x12/0x14
       [95794.634416]  task_work_run+0x82/0xa6
       [95794.634902]  prepare_exit_to_usermode+0xe1/0x10c
       [95794.635525]  syscall_return_slowpath+0x18c/0x1af
       [95794.636122]  entry_SYSCALL_64_fastpath+0xab/0xad
       [95794.636834] RIP: 0033:0x7fa678cb99a7
       [95794.637370] RSP: 002b:00007ffccf0aaed8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
       [95794.638672] RAX: 0000000000000000 RBX: 0000563386706030 RCX: 00007fa678cb99a7
       [95794.639596] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000056338670ca90
       [95794.640703] RBP: 000056338670ca90 R08: 000056338670c740 R09: 0000000000000015
       [95794.641773] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007fa6791bae64
       [95794.643150] R13: 0000000000000000 R14: 0000563386706210 R15: 00007ffccf0ab160
       [95794.644249] Code: ff 4c 8b a8 80 06 00 00 48 8b 87 c0 01 00 00 48 85 c0 74 02 0f ff 48 83 bb e0 02 00 00 00 74 02 0f ff 83 bb 3c ff ff ff 00 74 02 <0f> ff 83 bb 40 ff ff ff 00 74 02 0f ff 48 83 bb f8 fe ff ff 00
       [95794.646929] ---[ end trace e95877675c6ec007 ]---
       [95794.647751] ------------[ cut here ]------------
       [95794.648509] WARNING: CPU: 0 PID: 31496 at fs/btrfs/inode.c:9562 btrfs_destroy_inode+0x59/0x206 [btrfs]
       [95794.649842] Modules linked in: btrfs xfs ppdev ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper parport_pc psmouse sg i2c_piix4 parport i2c_core evdev pcspkr button serio_raw sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod virtio_scsi ata_generic crc32c_intel ata_piix floppy virtio_pci virtio_ring virtio libata scsi_mod e1000 [last unloaded: btrfs]
       [95794.654659] CPU: 0 PID: 31496 Comm: umount Tainted: G        W       4.14.0-rc6-btrfs-next-54+ #1
       [95794.655894] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
       [95794.657546] task: ffff880075aa0240 task.stack: ffffc90001734000
       [95794.658433] RIP: 0010:btrfs_destroy_inode+0x59/0x206 [btrfs]
       [95794.659279] RSP: 0018:ffffc90001737d00 EFLAGS: 00010202
       [95794.660054] RAX: 0000000000000000 RBX: ffff880070fa1418 RCX: ffffc90001737c7c
       [95794.660753] RDX: 0000000175aa0240 RSI: 0000000000000001 RDI: ffff880070fa1418
       [95794.661513] RBP: ffffc90001737d38 R08: 0000000000000000 R09: 0000000000000000
       [95794.662289] R10: ffffc90001737c48 R11: ffff88007123e158 R12: ffff880075b6a000
       [95794.663393] R13: ffff88006145c000 R14: ffff880070fa1418 R15: ffff880070c3b4a0
       [95794.664342] FS:  00007fa6793c92c0(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
       [95794.665673] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [95794.666593] CR2: 000056338670d048 CR3: 00000000610dc005 CR4: 00000000001606f0
       [95794.667629] Call Trace:
       [95794.668065]  destroy_inode+0x3d/0x55
       [95794.668637]  evict+0x177/0x17e
       [95794.669179]  dispose_list+0x50/0x71
       [95794.669830]  evict_inodes+0x132/0x141
       [95794.670416]  generic_shutdown_super+0x3f/0x10b
       [95794.671103]  kill_anon_super+0x12/0x1c
       [95794.671786]  btrfs_kill_super+0x16/0x21 [btrfs]
       [95794.672552]  deactivate_locked_super+0x30/0x68
       [95794.673393]  deactivate_super+0x36/0x39
       [95794.674107]  cleanup_mnt+0x49/0x67
       [95794.674706]  __cleanup_mnt+0x12/0x14
       [95794.675279]  task_work_run+0x82/0xa6
       [95794.675795]  prepare_exit_to_usermode+0xe1/0x10c
       [95794.676507]  syscall_return_slowpath+0x18c/0x1af
       [95794.677275]  entry_SYSCALL_64_fastpath+0xab/0xad
       [95794.678006] RIP: 0033:0x7fa678cb99a7
       [95794.678600] RSP: 002b:00007ffccf0aaed8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
       [95794.679739] RAX: 0000000000000000 RBX: 0000563386706030 RCX: 00007fa678cb99a7
       [95794.680779] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000056338670ca90
       [95794.681837] RBP: 000056338670ca90 R08: 000056338670c740 R09: 0000000000000015
       [95794.682867] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007fa6791bae64
       [95794.683891] R13: 0000000000000000 R14: 0000563386706210 R15: 00007ffccf0ab160
       [95794.684843] Code: c0 01 00 00 48 85 c0 74 02 0f ff 48 83 bb e0 02 00 00 00 74 02 0f ff 83 bb 3c ff ff ff 00 74 02 0f ff 83 bb 40 ff ff ff 00 74 02 <0f> ff 48 83 bb f8 fe ff ff 00 74 02 0f ff 48 83 bb 00 ff ff ff
       [95794.687156] ---[ end trace e95877675c6ec008 ]---
       [95794.687876] ------------[ cut here ]------------
       [95794.688579] WARNING: CPU: 0 PID: 31496 at fs/btrfs/inode.c:9565 btrfs_destroy_inode+0x7d/0x206 [btrfs]
       [95794.689735] Modules linked in: btrfs xfs ppdev ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper parport_pc psmouse sg i2c_piix4 parport i2c_core evdev pcspkr button serio_raw sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod virtio_scsi ata_generic crc32c_intel ata_piix floppy virtio_pci virtio_ring virtio libata scsi_mod e1000 [last unloaded: btrfs]
       [95794.695015] CPU: 0 PID: 31496 Comm: umount Tainted: G        W       4.14.0-rc6-btrfs-next-54+ #1
       [95794.696396] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
       [95794.697956] task: ffff880075aa0240 task.stack: ffffc90001734000
       [95794.698925] RIP: 0010:btrfs_destroy_inode+0x7d/0x206 [btrfs]
       [95794.699763] RSP: 0018:ffffc90001737d00 EFLAGS: 00010206
       [95794.700434] RAX: 0000000000000000 RBX: ffff880070fa1418 RCX: ffffc90001737c7c
       [95794.701445] RDX: 0000000175aa0240 RSI: 0000000000000001 RDI: ffff880070fa1418
       [95794.702448] RBP: ffffc90001737d38 R08: 0000000000000000 R09: 0000000000000000
       [95794.703557] R10: ffffc90001737c48 R11: ffff88007123e158 R12: ffff880075b6a000
       [95794.704441] R13: ffff88006145c000 R14: ffff880070fa1418 R15: ffff880070c3b4a0
       [95794.705270] FS:  00007fa6793c92c0(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
       [95794.706341] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [95794.707001] CR2: 000056338670d048 CR3: 00000000610dc005 CR4: 00000000001606f0
       [95794.708030] Call Trace:
       [95794.708466]  destroy_inode+0x3d/0x55
       [95794.709071]  evict+0x177/0x17e
       [95794.709497]  dispose_list+0x50/0x71
       [95794.709973]  evict_inodes+0x132/0x141
       [95794.710564]  generic_shutdown_super+0x3f/0x10b
       [95794.711200]  kill_anon_super+0x12/0x1c
       [95794.711633]  btrfs_kill_super+0x16/0x21 [btrfs]
       [95794.712139]  deactivate_locked_super+0x30/0x68
       [95794.712608]  deactivate_super+0x36/0x39
       [95794.713093]  cleanup_mnt+0x49/0x67
       [95794.713514]  __cleanup_mnt+0x12/0x14
       [95794.713933]  task_work_run+0x82/0xa6
       [95794.714543]  prepare_exit_to_usermode+0xe1/0x10c
       [95794.715247]  syscall_return_slowpath+0x18c/0x1af
       [95794.715952]  entry_SYSCALL_64_fastpath+0xab/0xad
       [95794.716653] RIP: 0033:0x7fa678cb99a7
       [95794.721100] RSP: 002b:00007ffccf0aaed8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
       [95794.722052] RAX: 0000000000000000 RBX: 0000563386706030 RCX: 00007fa678cb99a7
       [95794.722856] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000056338670ca90
       [95794.723698] RBP: 000056338670ca90 R08: 000056338670c740 R09: 0000000000000015
       [95794.724736] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007fa6791bae64
       [95794.725928] R13: 0000000000000000 R14: 0000563386706210 R15: 00007ffccf0ab160
       [95794.726728] Code: 40 ff ff ff 00 74 02 0f ff 48 83 bb f8 fe ff ff 00 74 02 0f ff 48 83 bb 00 ff ff ff 00 74 02 0f ff 48 83 bb 30 ff ff ff 00 74 02 <0f> ff 48 83 bb 08 ff ff ff 00 74 02 0f ff 4d 85 e4 0f 84 52 01
       [95794.729203] ---[ end trace e95877675c6ec009 ]---
       [95794.841054] ------------[ cut here ]------------
       [95794.841829] WARNING: CPU: 0 PID: 31496 at fs/btrfs/extent-tree.c:5831 btrfs_free_block_groups+0x235/0x36a [btrfs]
       [95794.843425] Modules linked in: btrfs xfs ppdev ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper parport_pc psmouse sg i2c_piix4 parport i2c_core evdev pcspkr button serio_raw sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod virtio_scsi ata_generic crc32c_intel ata_piix floppy virtio_pci virtio_ring virtio libata scsi_mod e1000 [last unloaded: btrfs]
       [95794.850658] CPU: 0 PID: 31496 Comm: umount Tainted: G        W       4.14.0-rc6-btrfs-next-54+ #1
       [95794.852590] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
       [95794.854752] task: ffff880075aa0240 task.stack: ffffc90001734000
       [95794.855812] RIP: 0010:btrfs_free_block_groups+0x235/0x36a [btrfs]
       [95794.856811] RSP: 0018:ffffc90001737d70 EFLAGS: 00010206
       [95794.857805] RAX: 0000000080000000 RBX: ffff88006145c000 RCX: 0000000000000001
       [95794.859014] RDX: 00000001810af668 RSI: 0000000000000002 RDI: 00000000ffffffff
       [95794.860270] RBP: ffffc90001737d98 R08: 0000000000000000 R09: ffffffff817e22b9
       [95794.861525] R10: ffffc90001737c80 R11: 00000000000337fd R12: 0000000000000000
       [95794.862700] R13: ffff88006145c0c0 R14: ffff88021b61a800 R15: ffff88006145c100
       [95794.863810] FS:  00007fa6793c92c0(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
       [95794.865149] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [95794.866099] CR2: 000056338670d048 CR3: 00000000610dc005 CR4: 00000000001606f0
       [95794.867198] Call Trace:
       [95794.867626]  close_ctree+0x1db/0x2b8 [btrfs]
       [95794.868188]  ? evict_inodes+0x132/0x141
       [95794.869037]  btrfs_put_super+0x15/0x17 [btrfs]
       [95794.870400]  generic_shutdown_super+0x6a/0x10b
       [95794.871262]  kill_anon_super+0x12/0x1c
       [95794.872046]  btrfs_kill_super+0x16/0x21 [btrfs]
       [95794.872746]  deactivate_locked_super+0x30/0x68
       [95794.873687]  deactivate_super+0x36/0x39
       [95794.874639]  cleanup_mnt+0x49/0x67
       [95794.875504]  __cleanup_mnt+0x12/0x14
       [95794.876126]  task_work_run+0x82/0xa6
       [95794.876788]  prepare_exit_to_usermode+0xe1/0x10c
       [95794.877777]  syscall_return_slowpath+0x18c/0x1af
       [95794.878381]  entry_SYSCALL_64_fastpath+0xab/0xad
       [95794.878888] RIP: 0033:0x7fa678cb99a7
       [95794.879307] RSP: 002b:00007ffccf0aaed8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
       [95794.880204] RAX: 0000000000000000 RBX: 0000563386706030 RCX: 00007fa678cb99a7
       [95794.881640] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000056338670ca90
       [95794.882690] RBP: 000056338670ca90 R08: 000056338670c740 R09: 0000000000000015
       [95794.883538] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007fa6791bae64
       [95794.884562] R13: 0000000000000000 R14: 0000563386706210 R15: 00007ffccf0ab160
       [95794.885664] Code: 89 ef e8 07 ec 32 e1 e8 9d c0 ea e0 48 8d b3 28 02 00 00 48 83 c9 ff 31 d2 48 89 df e8 29 c5 ff ff 48 83 bb 80 02 00 00 00 74 02 <0f> ff 48 83 bb 88 02 00 00 00 74 02 0f ff 48 83 bb d8 02 00 00
       [95794.887980] ---[ end trace e95877675c6ec00a ]---
       [95794.888739] ------------[ cut here ]------------
       [95794.889405] WARNING: CPU: 0 PID: 31496 at fs/btrfs/extent-tree.c:5832 btrfs_free_block_groups+0x241/0x36a [btrfs]
       [95794.891020] Modules linked in: btrfs xfs ppdev ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper parport_pc psmouse sg i2c_piix4 parport i2c_core evdev pcspkr button serio_raw sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod virtio_scsi ata_generic crc32c_intel ata_piix floppy virtio_pci virtio_ring virtio libata scsi_mod e1000 [last unloaded: btrfs]
       [95794.897551] CPU: 0 PID: 31496 Comm: umount Tainted: G        W       4.14.0-rc6-btrfs-next-54+ #1
       [95794.898509] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
       [95794.899685] task: ffff880075aa0240 task.stack: ffffc90001734000
       [95794.900592] RIP: 0010:btrfs_free_block_groups+0x241/0x36a [btrfs]
       [95794.901387] RSP: 0018:ffffc90001737d70 EFLAGS: 00010206
       [95794.902300] RAX: 0000000080000000 RBX: ffff88006145c000 RCX: 0000000000000001
       [95794.903260] RDX: 00000001810af668 RSI: 0000000000000002 RDI: 00000000ffffffff
       [95794.904332] RBP: ffffc90001737d98 R08: 0000000000000000 R09: ffffffff817e22b9
       [95794.905300] R10: ffffc90001737c80 R11: 00000000000337fd R12: 0000000000000000
       [95794.906439] R13: ffff88006145c0c0 R14: ffff88021b61a800 R15: ffff88006145c100
       [95794.907459] FS:  00007fa6793c92c0(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
       [95794.908625] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [95794.909511] CR2: 000056338670d048 CR3: 00000000610dc005 CR4: 00000000001606f0
       [95794.910630] Call Trace:
       [95794.911153]  close_ctree+0x1db/0x2b8 [btrfs]
       [95794.911837]  ? evict_inodes+0x132/0x141
       [95794.912344]  btrfs_put_super+0x15/0x17 [btrfs]
       [95794.912975]  generic_shutdown_super+0x6a/0x10b
       [95794.913788]  kill_anon_super+0x12/0x1c
       [95794.914424]  btrfs_kill_super+0x16/0x21 [btrfs]
       [95794.915142]  deactivate_locked_super+0x30/0x68
       [95794.915831]  deactivate_super+0x36/0x39
       [95794.916433]  cleanup_mnt+0x49/0x67
       [95794.917045]  __cleanup_mnt+0x12/0x14
       [95794.917665]  task_work_run+0x82/0xa6
       [95794.918309]  prepare_exit_to_usermode+0xe1/0x10c
       [95794.919021]  syscall_return_slowpath+0x18c/0x1af
       [95794.919722]  entry_SYSCALL_64_fastpath+0xab/0xad
       [95794.920426] RIP: 0033:0x7fa678cb99a7
       [95794.921039] RSP: 002b:00007ffccf0aaed8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
       [95794.922303] RAX: 0000000000000000 RBX: 0000563386706030 RCX: 00007fa678cb99a7
       [95794.923335] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000056338670ca90
       [95794.924364] RBP: 000056338670ca90 R08: 000056338670c740 R09: 0000000000000015
       [95794.925435] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007fa6791bae64
       [95794.926533] R13: 0000000000000000 R14: 0000563386706210 R15: 00007ffccf0ab160
       [95794.927557] Code: 48 8d b3 28 02 00 00 48 83 c9 ff 31 d2 48 89 df e8 29 c5 ff ff 48 83 bb 80 02 00 00 00 74 02 0f ff 48 83 bb 88 02 00 00 00 74 02 <0f> ff 48 83 bb d8 02 00 00 00 74 02 0f ff 48 83 bb e0 02 00 00
       [95794.930166] ---[ end trace e95877675c6ec00b ]---
       [95794.930961] ------------[ cut here ]------------
       [95794.931727] WARNING: CPU: 0 PID: 31496 at fs/btrfs/extent-tree.c:9953 btrfs_free_block_groups+0x2bc/0x36a [btrfs]
       [95794.932729] Modules linked in: btrfs xfs ppdev ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper parport_pc psmouse sg i2c_piix4 parport i2c_core evdev pcspkr button serio_raw sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod virtio_scsi ata_generic crc32c_intel ata_piix floppy virtio_pci virtio_ring virtio libata scsi_mod e1000 [last unloaded: btrfs]
       [95794.938394] CPU: 0 PID: 31496 Comm: umount Tainted: G        W       4.14.0-rc6-btrfs-next-54+ #1
       [95794.939842] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
       [95794.941455] task: ffff880075aa0240 task.stack: ffffc90001734000
       [95794.942336] RIP: 0010:btrfs_free_block_groups+0x2bc/0x36a [btrfs]
       [95794.943268] RSP: 0018:ffffc90001737d70 EFLAGS: 00010206
       [95794.944127] RAX: ffff8802004fd0e8 RBX: ffff88006145c000 RCX: 0000000000000001
       [95794.945211] RDX: 00000001810af668 RSI: 0000000000000002 RDI: 00000000ffffffff
       [95794.946316] RBP: ffffc90001737d98 R08: 0000000000000000 R09: ffffffff817e22b9
       [95794.947271] R10: ffffc90001737c80 R11: 00000000000337fd R12: ffff8802004fd0e8
       [95794.948219] R13: ffff88006145c0c0 R14: ffff88006145e598 R15: ffff88006145c100
       [95794.949193] FS:  00007fa6793c92c0(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
       [95794.950495] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [95794.951338] CR2: 000056338670d048 CR3: 00000000610dc005 CR4: 00000000001606f0
       [95794.952361] Call Trace:
       [95794.952811]  close_ctree+0x1db/0x2b8 [btrfs]
       [95794.953522]  ? evict_inodes+0x132/0x141
       [95794.954543]  btrfs_put_super+0x15/0x17 [btrfs]
       [95794.955231]  generic_shutdown_super+0x6a/0x10b
       [95794.955916]  kill_anon_super+0x12/0x1c
       [95794.956414]  btrfs_kill_super+0x16/0x21 [btrfs]
       [95794.956953]  deactivate_locked_super+0x30/0x68
       [95794.957635]  deactivate_super+0x36/0x39
       [95794.958256]  cleanup_mnt+0x49/0x67
       [95794.958701]  __cleanup_mnt+0x12/0x14
       [95794.959181]  task_work_run+0x82/0xa6
       [95794.959635]  prepare_exit_to_usermode+0xe1/0x10c
       [95794.960182]  syscall_return_slowpath+0x18c/0x1af
       [95794.960731]  entry_SYSCALL_64_fastpath+0xab/0xad
       [95794.961438] RIP: 0033:0x7fa678cb99a7
       [95794.961990] RSP: 002b:00007ffccf0aaed8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
       [95794.963111] RAX: 0000000000000000 RBX: 0000563386706030 RCX: 00007fa678cb99a7
       [95794.963975] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000056338670ca90
       [95794.964680] RBP: 000056338670ca90 R08: 000056338670c740 R09: 0000000000000015
       [95794.965763] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007fa6791bae64
       [95794.966868] R13: 0000000000000000 R14: 0000563386706210 R15: 00007ffccf0ab160
       [95794.967800] Code: 00 00 00 4c 8b a3 98 25 00 00 49 83 bc 24 60 ff ff ff 00 75 16 49 83 bc 24 68 ff ff ff 00 75 0b 49 83 bc 24 70 ff ff ff 00 74 16 <0f> ff 49 8d b4 24 18 ff ff ff 31 c9 31 d2 48 89 df e8 93 7a ff
       [95794.970629] ---[ end trace e95877675c6ec00c ]---
       [95794.971451] BTRFS info (device sdi): space_info 1 has 7680000 free, is not full
       [95794.972351] BTRFS info (device sdi): space_info total=8388608, used=704512, pinned=0, reserved=0, may_use=4096, readonly=0
       [95794.973595] ------------[ cut here ]------------
       [95794.974353] WARNING: CPU: 0 PID: 31496 at fs/btrfs/extent-tree.c:9953 btrfs_free_block_groups+0x2bc/0x36a [btrfs]
       [95794.980163] Modules linked in: btrfs xfs ppdev ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper parport_pc psmouse sg i2c_piix4 parport i2c_core evdev pcspkr button serio_raw sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod virtio_scsi ata_generic crc32c_intel ata_piix floppy virtio_pci virtio_ring virtio libata scsi_mod e1000 [last unloaded: btrfs]
       [95794.986461] CPU: 0 PID: 31496 Comm: umount Tainted: G        W       4.14.0-rc6-btrfs-next-54+ #1
       [95794.987591] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
       [95794.988929] task: ffff880075aa0240 task.stack: ffffc90001734000
       [95794.989922] RIP: 0010:btrfs_free_block_groups+0x2bc/0x36a [btrfs]
       [95794.990715] RSP: 0018:ffffc90001737d70 EFLAGS: 00010206
       [95794.991431] RAX: ffff88020f6e70e8 RBX: ffff88006145c000 RCX: ffffffff8115a906
       [95794.992455] RDX: ffffffff8115a902 RSI: ffff880075aa0b40 RDI: ffff880075aa0b40
       [95794.993535] RBP: ffffc90001737d98 R08: 0000000000000020 R09: fffffffffffffff7
       [95794.994573] R10: 00000000ffffffc4 R11: ffff8800633b1bc0 R12: ffff88020f6e70e8
       [95794.996250] R13: 0000000000000038 R14: ffff88006145e598 R15: 0000000000000000
       [95794.997233] FS:  00007fa6793c92c0(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
       [95794.998592] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [95794.999484] CR2: 000056338670d048 CR3: 00000000610dc005 CR4: 00000000001606f0
       [95795.000542] Call Trace:
       [95795.001138]  close_ctree+0x1db/0x2b8 [btrfs]
       [95795.001885]  ? evict_inodes+0x132/0x141
       [95795.002407]  btrfs_put_super+0x15/0x17 [btrfs]
       [95795.003093]  generic_shutdown_super+0x6a/0x10b
       [95795.003720]  kill_anon_super+0x12/0x1c
       [95795.004353]  btrfs_kill_super+0x16/0x21 [btrfs]
       [95795.005095]  deactivate_locked_super+0x30/0x68
       [95795.005716]  deactivate_super+0x36/0x39
       [95795.006388]  cleanup_mnt+0x49/0x67
       [95795.006939]  __cleanup_mnt+0x12/0x14
       [95795.007512]  task_work_run+0x82/0xa6
       [95795.008124]  prepare_exit_to_usermode+0xe1/0x10c
       [95795.008994]  syscall_return_slowpath+0x18c/0x1af
       [95795.009831]  entry_SYSCALL_64_fastpath+0xab/0xad
       [95795.010610] RIP: 0033:0x7fa678cb99a7
       [95795.011193] RSP: 002b:00007ffccf0aaed8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
       [95795.012327] RAX: 0000000000000000 RBX: 0000563386706030 RCX: 00007fa678cb99a7
       [95795.013432] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000056338670ca90
       [95795.014558] RBP: 000056338670ca90 R08: 000056338670c740 R09: 0000000000000015
       [95795.015577] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007fa6791bae64
       [95795.016569] R13: 0000000000000000 R14: 0000563386706210 R15: 00007ffccf0ab160
       [95795.017662] Code: 00 00 00 4c 8b a3 98 25 00 00 49 83 bc 24 60 ff ff ff 00 75 16 49 83 bc 24 68 ff ff ff 00 75 0b 49 83 bc 24 70 ff ff ff 00 74 16 <0f> ff 49 8d b4 24 18 ff ff ff 31 c9 31 d2 48 89 df e8 93 7a ff
       [95795.020538] ---[ end trace e95877675c6ec00d ]---
       [95795.021259] BTRFS info (device sdi): space_info 4 has 1072775168 free, is not full
       [95795.022390] BTRFS info (device sdi): space_info total=1073741824, used=114688, pinned=0, reserved=0, may_use=786432, readonly=65536
      
      Fix this by ensuring the zero range operation does not call
      btrfs_truncate_block() if the corresponding extent is an unwritten one
      (it's pointless anyway, since reading from an unwritten extent yields
      zeroes).
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Tested-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      81fdf638
    • Filipe Manana's avatar
      Btrfs: fix missing inode i_size update after zero range operation · 9f13ce74
      Filipe Manana authored
      For a fallocate's zero range operation that targets a range with an end
      that is not aligned to the sector size, we can end up not updating the
      inode's i_size. This happens when the last page of the range maps to an
      unwritten (prealloc) extent and before that last page we have either a
      hole or a written extent. This is because in this scenario we relied
      on a call to btrfs_prealloc_file_range() to update the inode's i_size,
      however it can only update the i_size to the "down aligned" end of the
      range.
      
      Example:
      
       $ mkfs.btrfs -f /dev/sdc
       $ mount /dev/sdc /mnt
       $ xfs_io -f -c "pwrite -S 0xff 0 428K" /mnt/foobar
       $ xfs_io -c "falloc -k 428K 4K" /mnt/foobar
       $ xfs_io -c "fzero 0 430K" /mnt/foobar
       $ du --bytes /mnt/foobar
       438272	/mnt/foobar
      
      The inode's i_size was left as 428Kb (438272 bytes) when it should have
      been updated to 430Kb (440320 bytes).
      Fix this by always updating the inode's i_size explicitly after zeroing
      the range.
      
      Fixes: ba6d5887946ff86d93dc ("Btrfs: add support for fallocate's zero range operation")
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      9f13ce74
    • Filipe Manana's avatar
      Btrfs: use cached state when dirtying pages during buffered write · 94f45071
      Filipe Manana authored
      During a buffered IO write, we can have an extent state that we got when
      we locked the range (if the range starts at an offset lower than eof), so
      always pass it to btrfs_dirty_pages() so that setting the delalloc bit
      in the range does not need to do a full search in the inode's io tree,
      saving time and reducing the amount of time we hold the io tree's lock.
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      94f45071
    • Filipe Manana's avatar
      Btrfs: add support for fallocate's zero range operation · f27451f2
      Filipe Manana authored
      This implements support the zero range operation of fallocate. For now
      at least it's as simple as possible while reusing most of the existing
      fallocate and hole punching infrastructure.
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      f27451f2
    • Liu Bo's avatar
      Btrfs: do not merge rbios if their fail stripe index are not identical · cc54ff62
      Liu Bo authored
      Since fail stripe index in rbio would be used to decide which
      algorithm reconstruction would be run, we cannot merge rbios if
      their's fail striped indexes are different, otherwise, one of the two
      reconstructions would fail.
      Signed-off-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      cc54ff62
    • Liu Bo's avatar
      Btrfs: remove redundant check in rbio_can_merge · db34be19
      Liu Bo authored
      Given the above
      '
      if (last->operation != cur->operation)
      	return 0;
      ',
      it's guaranteed that two operations are same.
      Signed-off-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      db34be19
    • Anand Jain's avatar
      btrfs: minor style cleanups in btrfs_scan_one_device · 05a5c55d
      Anand Jain authored
      Assign ret = -EINVAL where it is actually required.
      Remove { } around single line if else code.
      Signed-off-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      05a5c55d
    • Anand Jain's avatar
      btrfs: simplify mutex unlocking code in btrfs_commit_transaction · c1f32b7c
      Anand Jain authored
      No functional change rearrange the mutex_unlock.
      Signed-off-by: default avatarAnand Jain <anand.jain@oracle.com>
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      [ edit subject ]
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      c1f32b7c
    • Anand Jain's avatar
      btrfs: rename btrfs_device::scrub_device to scrub_ctx · cadbc0a0
      Anand Jain authored
      btrfs_device::scrub_device is not a device which is being scrubbed,
      but it holds the scrub context, so rename to reflect the same. No
      functional changes here.
      Signed-off-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      cadbc0a0
    • Anand Jain's avatar
      btrfS: collapse btrfs_handle_error() into __btrfs_handle_fs_error() · 922ea899
      Anand Jain authored
      There is no other consumer for btrfs_handle_error() other than
      __btrfs_handle_fs_error(), further this function quite small.
      Merge it into its parent.
      Signed-off-by: default avatarAnand Jain <anand.jain@oracle.com>
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      [ reformat comment ]
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      922ea899
    • Anand Jain's avatar
      btrfs: remove check for BTRFS_FS_STATE_ERROR which we just set · 61ecda68
      Anand Jain authored
      __btrfs_handle_fs_error() sets BTRFS_FS_STATE_ERROR, and calls
      btrfs_handle_error() so no need to check if the BTRFS_FS_STATE_ERROR
      is set in btrfs_handle_error(). And there is no other user of
      btrfs_handle_error() as well.
      Signed-off-by: default avatarAnand Jain <anand.jain@oracle.com>
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      61ecda68
    • Liu Bo's avatar
      Btrfs: make raid6 rebuild retry more · 8810f751
      Liu Bo authored
      There is a scenario that can end up with rebuild process failing to
      return good content, i.e.
      suppose that all disks can be read without problems and if the content
      that was read out doesn't match its checksum, currently for raid6
      btrfs at most retries twice,
      
      - the 1st retry is to rebuild with all other stripes, it'll eventually
        be a raid5 xor rebuild,
      - if the 1st fails, the 2nd retry will deliberately fail parity p so
        that it will do raid6 style rebuild,
      
      however, the chances are that another non-parity stripe content also
      has something corrupted, so that the above retries are not able to
      return correct content, and users will think of this as data loss.
      More seriouly, if the loss happens on some important internal btree
      roots, it could refuse to mount.
      
      This extends btrfs to do more retries and each retry fails only one
      stripe.  Since raid6 can tolerate 2 disk failures, if there is one
      more failure besides the failure on which we're recovering, this can
      always work.
      
      The worst case is to retry as many times as the number of raid6 disks,
      but given the fact that such a scenario is really rare in practice,
      it's still acceptable.
      Signed-off-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      8810f751
    • Liu Bo's avatar
      Btrfs: fix scrub to repair raid6 corruption · 762221f0
      Liu Bo authored
      The raid6 corruption is that,
      suppose that all disks can be read without problems and if the content
      that was read out doesn't match its checksum, currently for raid6
      btrfs at most retries twice,
      
      - the 1st retry is to rebuild with all other stripes, it'll eventually
        be a raid5 xor rebuild,
      - if the 1st fails, the 2nd retry will deliberately fail parity p so
        that it will do raid6 style rebuild,
      
      however, the chances are that another non-parity stripe content also
      has something corrupted, so that the above retries are not able to
      return correct content.
      
      We've fixed normal reads to rebuild raid6 correctly with more retries
      in Patch "Btrfs: make raid6 rebuild retry more"[1], this is to fix
      scrub to do the exactly same rebuild process.
      
      [1]: https://patchwork.kernel.org/patch/10091755/Signed-off-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      762221f0
    • Anand Jain's avatar
      btrfs: factor btrfs_check_rw_degradable() to check given device · 6528b99d
      Anand Jain authored
      Update btrfs_check_rw_degradable() to check against the given device if
      its lost.
      
      We can use this function to know if the volume is going to be in
      degraded mode OR failed state, when the given device fails.  Which is
      needed when we are handling the device failed state.
      
      A preparatory patch does not affect the flow as such.
      Signed-off-by: default avatarAnand Jain <anand.jain@oracle.com>
      Reviewed-by: default avatarQu Wenruo <wqu@suse.com>
      [ enhance comment ]
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      6528b99d
    • David Sterba's avatar
      btrfs: sink unlock_extent parameter gfp_flags · e43bbe5e
      David Sterba authored
      All callers pass either GFP_NOFS or GFP_KERNEL now, so we can sink the
      parameter to the function, though we lose some of the slightly better
      semantics of GFP_KERNEL in some places, it's worth cleaning up the
      callchains.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      e43bbe5e
    • David Sterba's avatar
      btrfs: add separate helper for unlock_extent_cached with GFP_ATOMIC · d810a4be
      David Sterba authored
      There's only one instance where we pass different gfp mask to
      unlock_extent_cached. Add a separate helper for that and then we can
      drop the gfp parameter from unlock_extent_cached.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      d810a4be
    • David Sterba's avatar
      btrfs: drop unused parameters from mount_subvol · 5bedc48a
      David Sterba authored
      Recent patches reworking the mount path left some unused parameters. We
      pass a vfsmount to mount_subvol, the flags and data (ie. mount options)
      have been already applied and we will not need them.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      5bedc48a
    • Misono, Tomohiro's avatar
      btrfs: cleanup unnecessary string dup in btrfs_parse_options() · e215772c
      Misono, Tomohiro authored
      Long ago, commit edf24abe ("btrfs: sanity mount option parsing and
      early mount code") split the btrfs_parse_options() into two parts
      (btrfs_parse_early_options() and btrfs_parse_options()). As a result,
      btrfs_parse_optins no longer gets called twice and is the last one to
      parse mount option string. Therefore there is no need to dup it.
      Signed-off-by: default avatarTomohiro Misono <misono.tomohiro@jp.fujitsu.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      e215772c
    • Liu Bo's avatar
      Btrfs: remove unused wait in btrfs_stripe_hash · 203e02d9
      Liu Bo authored
      In fact nobody is waiting on @wait's waitqueue, it can be safely
      removed.
      Signed-off-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      203e02d9
    • Nikolay Borisov's avatar
      btrfs: Remove redundant pair of bio_get/set in __btrfs_submit_dio_bio · 36f7894f
      Nikolay Borisov authored
      The bio is not referenced after it has been submitted and the endio is
      going to consume the sole reference on successful submission. On error,
      the callers of __btrfs_submit_dio_bio do invoke bio_put so we don't
      leak it either.
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      36f7894f
    • Nikolay Borisov's avatar
      btrfs: Remove redundant bio_get/bio_set pair from submit_one_bio · ffc9c8dd
      Nikolay Borisov authored
      The bio is never referenced after it has been submitted so there is no
      point in getting an extra reference.
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      ffc9c8dd
    • Nikolay Borisov's avatar
      btrfs: Remove redundant bio_get/set from submit_dio_repair_bio · ea057f6d
      Nikolay Borisov authored
      The bio that is passsed is the newly created repair bio which already
      has a reference count of 1, which is going to be consumed by the
      endio routine on successful submission. On error the handler also
      calls bio_put.
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      ea057f6d
    • Nikolay Borisov's avatar
      btrfs: Remove redundant bio_get/set calls in compressed read/write paths · 32506af5
      Nikolay Borisov authored
      bio_get/set is necessary only if the bio is going to be referenced
      following submissions. In the code paths where such calls are made
      we don't really need them since the bio is referenced only if
      btrfs_map_bio returns an error. And this function can return an error
      prior to submission only. So referencing the bio is safe. Furthermore
      we do call bio_endio which will consume the last reference. So let's
      remove the redundant calls.
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      32506af5
    • Nikolay Borisov's avatar
      4271ecea
    • David Sterba's avatar
      btrfs: heuristic: call get4bits directly · 36243c91
      David Sterba authored
      As it's a single instance and local to the file, we don't need to pass
      it as an argument.
      Reviewed-by: default avatarTimofey Titovets <nefelim4ag@gmail.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      36243c91
    • David Sterba's avatar
      btrfs: heuristic: open code copy_call callback of radix sort · 7add17be
      David Sterba authored
      The callback is trivial and we don't need the abstraction for our
      purposes. Let's open code it.
      Reviewed-by: default avatarTimofey Titovets <nefelim4ag@gmail.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      7add17be
    • David Sterba's avatar
      btrfs: heuristic: open code get_num callback of radix sort · 23ae8c63
      David Sterba authored
      The callback is trivial and we don't need the abstraction for our
      purposes. Let's open code it and also make the array types explicit.
      Reviewed-by: default avatarTimofey Titovets <nefelim4ag@gmail.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      23ae8c63
    • Misono, Tomohiro's avatar
      btrfs: remove unused arg from parse_subvol_options() · 78f6beac
      Misono, Tomohiro authored
      Remove unused arg 'holder' from parse_subvol_options(), which has been
      forgotten to be cleaned in the commit b99beb110e2d ("btrfs: split
      parse_early_options() in two").
      Signed-off-by: default avatarTomohiro Misono <misono.tomohiro@jp.fujitsu.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      78f6beac
    • Misono, Tomohiro's avatar
      btrfs: remove unused setup_root_args() · 83085935
      Misono, Tomohiro authored
      Since setup_root_args() is not used anymore, just remove it.
      Signed-off-by: default avatarTomohiro Misono <misono.tomohiro@jp.fujitsu.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      83085935
    • Misono, Tomohiro's avatar
      btrfs: split parse_early_options() in two · d7407606
      Misono, Tomohiro authored
      Now parse_early_options() is used by both btrfs_mount() and
      btrfs_mount_root(). However, the former only needs subvol related part
      and the latter needs the others.
      
      Therefore extract the subvol related parts from parse_early_options() and
      move it to new parse function (parse_subvol_options()).
      Signed-off-by: default avatarTomohiro Misono <misono.tomohiro@jp.fujitsu.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      d7407606
    • Misono, Tomohiro's avatar
      btrfs: cleanup btrfs_mount() using btrfs_mount_root() · 312c89fb
      Misono, Tomohiro authored
      Cleanup btrfs_mount() by using btrfs_mount_root(). This avoids getting
      btrfs_mount() called twice in mount path.
      
      Old btrfs_mount() will do:
      0. VFS layer calls vfs_kern_mount() with registered file_system_type
         (for btrfs, btrfs_fs_type). btrfs_mount() is called on the way.
      1. btrfs_parse_early_options() parses "subvolid=" mount option and set the
         value to subvol_objectid. Otherwise, subvol_objectid has the initial
         value of 0
      2. check subvol_objectid is 5 or not. Assume this time id is not 5, then
         btrfs_mount() returns by calling mount_subvol()
      3. In mount_subvol(), original mount options are modified to contain
         "subvolid=0" in setup_root_args(). Then, vfs_kern_mount() is called with
         btrfs_fs_type and new options
      4. btrfs_mount() is called again
      5. btrfs_parse_early_options() parses "subvolid=0" and set 5 (instead of 0)
         to subvol_objectid
      6. check subvol_objectid is 5 or not. This time id is 5 and mount_subvol()
         is not called. btrfs_mount() finishes mounting a root
      7. (in mount_subvol()) with using a return vale of vfs_kern_mount(), it
         calls mount_subtree()
      8. return subvolume's dentry
      
      Reusing the same file_system_type (and btrfs_mount()) for vfs_kern_mount()
      is the cause of complication.
      
      Instead, new btrfs_mount() will do:
      1. parse subvol id related options for later use in mount_subvol()
      2. mount device's root by calling vfs_kern_mount() with
         btrfs_root_fs_type, which is not registered to VFS by
         register_filesystem(). As a result, btrfs_mount_root() is called
      3. return by calling mount_subvol()
      
      The code of 2. is moved from the first part of mount_subvol().
      
      The semantics of device holder changes from btrfs_fs_type to
      btrfs_root_fs_type and has to be used in all contexts. Otherwise we'd
      get wrong results when mount and dev scan would not check the same
      thing. (this has been found indendently and the fix is folded into this
      patch)
      Signed-off-by: default avatarTomohiro Misono <misono.tomohiro@jp.fujitsu.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      [ fold the btrfs_control_ioctl fixup, extend the comment ]
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      312c89fb
    • Misono, Tomohiro's avatar
      btrfs: add btrfs_mount_root() and new file_system_type · 72fa39f5
      Misono, Tomohiro authored
      Add btrfs_mount_root() and new file_system_type for preparation of cleanup
      of btrfs_mount(). Code path is not changed yet.
      
      btrfs_mount_root() is almost the same as current btrfs_mount(), but doesn't
      have subvolume related part.
      Signed-off-by: default avatarTomohiro Misono <misono.tomohiro@jp.fujitsu.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      72fa39f5
    • David Sterba's avatar
      btrfs: unify extent_page_data type passed as void · aab6e9ed
      David Sterba authored
      Functions called from extent_write_cache_pages used void* as generic
      callback data, but all of them convert it to extent_page_data, or use it
      directly.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      aab6e9ed
    • David Sterba's avatar
      btrfs: sink writepage parameter to extent_write_cache_pages · 935db853
      David Sterba authored
      The function extent_write_cache_pages is modelled after
      write_cache_pages which is a generic interface and the writepage
      parameter makes sense there. In btrfs we know exactly which callback
      we're going to use, so we can pass it directly.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      935db853
    • David Sterba's avatar
      btrfs: sink flush_fn to extent_write_cache_pages · 25b860e0
      David Sterba authored
      All callers pass the same value flush_write_bio.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      25b860e0
    • David Sterba's avatar
      btrfs: merge two flush_write_bio helpers · e2932ee0
      David Sterba authored
      flush_epd_write_bio is same as flush_write_bio, no point having two such
      functions. Merge them to flush_write_bio. The 'noinline' attribute is
      removed as it does not have any meaning.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      e2932ee0
    • Nikolay Borisov's avatar
      btrfs: Rename bin_search -> btrfs_bin_search · a74b35ec
      Nikolay Borisov authored
      Currently there are 2 function doing binary search on btrfs nodes:
      bin_search and btrfs_bin_search. The latter being a simple wrapper for
      the former. So eliminate the wrapper and just rename bin_search to
      btrfs_bin_search. No functional changes
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      a74b35ec
    • Nikolay Borisov's avatar
      btrfs: sink extent_write_full_page tree argument · 0a9b0e53
      Nikolay Borisov authored
      The tree argument passed to extent_write_full_page is referenced from
      the page being passed to the same function. Since we already have
      enough information to get the reference, remove the function parameter.
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      0a9b0e53
    • Nikolay Borisov's avatar
      btrfs: sink extent_write_locked_range tree parameter · 5e3ee236
      Nikolay Borisov authored
      This function is called only from submit_compressed_extents and the
      io tree being passed is always that of the inode. But we are also
      passing the inode, so just move getting the io tree pointer in
      extent_write_locked_range to simplify the signature.
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      5e3ee236