Commits · 37bf7718ffa6554bf3be4597d36aec93c5c3ea8f · Kirill Smelkov / linux

04 Mar, 2024 40 commits

btrfs: handle transaction commit errors in flush_reservations() · 37bf7718

David Sterba authored Feb 22, 2024

Other errors in flush_reservations() are already handled, both there and
in the caller. Ignoring the commit error might make some sense, as the
commit is called right after the join only to poke the whole commit
machinery into freeing space.

However, for consistency, return the error. The caller
btrfs_quota_disable() would try to start a transaction, which would
in turn fail too, so there's no effective change.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: use KMEM_CACHE() to create btrfs_free_space cache · 06c95649

Kunwu Chan authored Feb 20, 2024

Use the KMEM_CACHE() macro instead of kmem_cache_create() to simplify
the creation of SLAB caches when the default values are used.
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
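
To illustrate what the conversion amounts to, here is a minimal userspace sketch. The kmem_cache_create() below is a mock that only records its arguments, struct demo_path and demo_path_cache_init() are hypothetical names, and the macro body mirrors the shape of the kernel's KMEM_CACHE() as far as I know it; treat it as an illustration rather than the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for the kernel's struct kmem_cache (illustration only). */
struct kmem_cache {
	const char *name;
	size_t size;
	size_t align;
	unsigned int flags;
};

/* Mock with the same argument order as the kernel's kmem_cache_create(). */
static struct kmem_cache *kmem_cache_create(const char *name, size_t size,
					     size_t align, unsigned int flags,
					     void (*ctor)(void *))
{
	struct kmem_cache *c = malloc(sizeof(*c));

	(void)ctor;
	if (!c)
		return NULL;
	c->name = name;
	c->size = size;
	c->align = align;
	c->flags = flags;
	return c;
}

/* Name, size and alignment are all derived from the structure type. */
#define KMEM_CACHE(__struct, __flags)					\
	kmem_cache_create(#__struct, sizeof(struct __struct),		\
			  __alignof__(struct __struct), (__flags), NULL)

/* Hypothetical object type, not a real btrfs structure. */
struct demo_path {
	void *nodes[8];
	int slots[8];
};

static struct kmem_cache *demo_path_cache_init(void)
{
	/*
	 * Open-coded form being replaced:
	 * kmem_cache_create("demo_path", sizeof(struct demo_path), 0, 0, NULL)
	 */
	return KMEM_CACHE(demo_path, 0);
}
```

One visible difference in this sketch is that the macro passes the structure's natural alignment instead of an explicit 0.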

btrfs: use KMEM_CACHE() to create delayed ref caches · b2c7d55e

Kunwu Chan authored Feb 20, 2024

Use the KMEM_CACHE() macro instead of kmem_cache_create() to simplify
the creation of SLAB caches related to delayed refs when the default
values are used.
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: use KMEM_CACHE() to create btrfs_path cache · 66ce5447

Kunwu Chan authored Feb 20, 2024

Use the KMEM_CACHE() macro instead of kmem_cache_create() to simplify
the creation of SLAB caches when the default values are used.
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: use KMEM_CACHE() to create btrfs_trans_handle cache · 2753b4d8

Kunwu Chan authored Feb 20, 2024

Use the KMEM_CACHE() macro instead of kmem_cache_create() to simplify
the creation of SLAB caches when the default values are used.
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: use KMEM_CACHE() to create btrfs_ordered_extent cache · 4bd3e126

Kunwu Chan authored Feb 20, 2024

Use the KMEM_CACHE() macro instead of kmem_cache_create() to simplify
the creation of SLAB caches when the default values are used.
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: use KMEM_CACHE() to create btrfs_delayed_node cache · 625c1e06

Kunwu Chan authored Feb 20, 2024

Use the KMEM_CACHE() macro instead of kmem_cache_create() to simplify
the creation of SLAB caches when the default values are used.
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: uninline some static inline helpers from delayed-ref.h · d57dd52a

David Sterba authored Feb 16, 2024

The helpers do initialization or release work, none of which is
performance critical enough to require a static inline, so move
them to the .c file.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: open code trivial btrfs_lru_cache_size() · e9256716

David Sterba authored Feb 16, 2024

The helper is really trivial, reading a cache size can be done directly.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: uninline some static inline helpers from tree-log.h · c207adc1

David Sterba authored Feb 16, 2024

The helpers do initialization or release work, none of which is
performance critical enough to require a static inline, so move
them to the .c file.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: drop static inline specifiers from tree-mod-log.c · 2be1f2bf

David Sterba authored Feb 16, 2024

Using static inline in a .c file should be justified, e.g. when
functions are on a hot path but none of the affected functions seem to
be. As it's all in one compilation unit let the compiler decide.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: uninline btrfs_init_delayed_root() · 585ab692

David Sterba authored Feb 16, 2024

This is a simple initializer and not on any hot path, it does not need
to be static inline.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: uninline some static inline helpers from backref.h · 2aa756ec

David Sterba authored Feb 16, 2024

There are many helpers doing simple things but not simple enough to
justify the static inline. None of them seems to be on a hot path so
move them to .c.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: open code btrfs_backref_get_eb() · ef923440

David Sterba authored Feb 16, 2024

The helper is trivial, we can inline it. It's safe to remove the 'if' as
the iterator is always valid when used, the potential NULL was never
checked anyway.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: open code btrfs_backref_iter_free() · 56430c14

David Sterba authored Feb 16, 2024

The helper is trivial and used only once, open code it. It's safe to
remove the 'if', the pointer is validated in build_backref_tree().
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: move balance args conversion helpers to volumes.c · e6052347

David Sterba authored Feb 16, 2024

The from/to CPU/disk helpers for balance args are used only in volumes,
no need to define them in accessors.h.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: introduce offload_csum_mode to tweak checksum offloading behavior · 2761ece8

Naohiro Aota authored Feb 05, 2024

We disable offloading checksum to workqueues and do it synchronously when
the checksum algorithm is fast. However, as reported in the link below,
RAID0 with multiple devices may suffer from the sync checksum, because
"fast checksum" is still not fast enough to catch up with RAID0 writing.

We don't have an effective way to determine whether to offload or not,
so for now add a sysfs knob so this can be debugged. This is intentionally
under CONFIG_BTRFS_DEBUG so it's not exposed to users, as it may be
removed again in the future.

Introduce fs_devices->offload_csum_mode, so that a btrfs developer can
change the behavior by writing to /sys/fs/btrfs/<uuid>/offload_csum. The
default is "auto", which is the same as the previous behavior.
Alternatively, you can set "on" or "off" (or "y" or "n", whatever
kstrtobool() accepts) to always/never offload checksums.

More benchmarks need to be collected with this knob to implement proper
criteria for enabling/disabling checksum offloading.

Link: https://lore.kernel.org/linux-btrfs/20230731152223.4EFB.409509F4@e16-tech.com/
Link: https://lore.kernel.org/linux-btrfs/p3vo3g7pqn664mhmdhlotu5dzcna6vjtcoc2hb2lsgo2fwct7k@xzaxclba5tae/
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
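
The knob's three-way parsing ("auto", or anything boolean-like) could look roughly like the userspace sketch below. The enum and function names are hypothetical, and demo_parse_bool() is a deliberately reduced stand-in for kstrtobool() that accepts only a few spellings; the real sysfs handler may differ.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical names; the commit message only describes the three modes. */
enum demo_offload_csum_mode {
	DEMO_OFFLOAD_CSUM_AUTO,		/* default: keep the previous heuristic */
	DEMO_OFFLOAD_CSUM_FORCE_ON,	/* always offload to workqueues */
	DEMO_OFFLOAD_CSUM_FORCE_OFF,	/* always checksum synchronously */
};

/* Reduced stand-in for kstrtobool(): accepts only a few spellings. */
static int demo_parse_bool(const char *s, int *res)
{
	if (!strcmp(s, "on") || !strcmp(s, "y") || !strcmp(s, "1")) {
		*res = 1;
		return 0;
	}
	if (!strcmp(s, "off") || !strcmp(s, "n") || !strcmp(s, "0")) {
		*res = 0;
		return 0;
	}
	return -EINVAL;
}

/*
 * Parse the string written to the knob. Returns 0 on success and -EINVAL
 * for unknown input. A trailing newline, as sent by echo, is not handled
 * in this sketch.
 */
static int demo_offload_csum_store(const char *buf,
				   enum demo_offload_csum_mode *mode)
{
	int val;

	if (!strcmp(buf, "auto")) {
		*mode = DEMO_OFFLOAD_CSUM_AUTO;
		return 0;
	}
	if (demo_parse_bool(buf, &val))
		return -EINVAL;
	*mode = val ? DEMO_OFFLOAD_CSUM_FORCE_ON : DEMO_OFFLOAD_CSUM_FORCE_OFF;
	return 0;
}
```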

btrfs: raid56: extra debugging for raid6 syndrome generation · b2324e08

Qu Wenruo authored Jan 26, 2024

[BUG]
I have got at least two crash reports for RAID6 syndrome generation; no
matter if it's AVX2 or SSE2, they all seem to have a similar
calltrace with corrupted RAX:

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] PREEMPT SMP PTI
  Workqueue: btrfs-rmw rmw_rbio_work [btrfs]
  RIP: 0010:raid6_sse21_gen_syndrome+0x9e/0x130 [raid6_pq]
  RAX: 0000000000000000 RBX: 0000000000001000 RCX: ffffa0ff4cfa3248
  RDX: 0000000000000000 RSI: ffffa0f74cfa3238 RDI: 0000000000000000
  Call Trace:
   <TASK>
   rmw_rbio+0x5c8/0xa80 [btrfs]
   process_one_work+0x1c7/0x3d0
   worker_thread+0x4d/0x380
   kthread+0xf3/0x120
   ret_from_fork+0x2c/0x50
   </TASK>

[CAUSE]
The cause is not known.  Recently I also hit this in the AVX512 path,
and that was even in a v5.15 backport, which doesn't have any of my RAID56
rework.

Furthermore according to the registers:

  RAX: 0000000000000000 RBX: 0000000000001000 RCX: ffffa0ff4cfa3248

The RAX register holds the number of stripes (including PQ), and its
value (0) is not correct.  The other two registers look sane.

- RBX is the sectorsize
  For x86_64 it should always be 4K and matches the output.

- RCX is the pointers array
  Which is from rbio->finish_pointers, and it looks like a sane
  kernel address.

[WORKAROUND]
For now, I can only add extra debug ASSERT()s before we call the raid6
gen_syndrome() helper and hope to catch the problem.

The debugging code requires both CONFIG_BTRFS_DEBUG and CONFIG_BTRFS_ASSERT
to be enabled.

My current guess is some use-after-free, but it doesn't make much sense
that every report has only a corrupted RAX and seemingly valid pointers.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: avoid unnecessary ref initialization when freeing log tree block · 74cd8cac

Filipe Manana authored Feb 19, 2024

At btrfs_free_tree_block(), we always initialize a delayed reference
to drop the given extent buffer, but we only use it if the buffer does not
belong to a log root tree. So we are doing unnecessary work here and
increasing the duration of a critical section, as this is normally called
while holding a lock on the parent tree block (if any) and while holding a
log transaction open.

So initialize the delayed reference only if the extent buffer is not from
a log tree, avoiding unnecessary work and making the code also a bit
easier to follow.
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: send: avoid duplicated search for last extent when sending hole · 0e9e135e

Filipe Manana authored Feb 17, 2024

During an incremental send, before determining if we need to send a hole
(write operations full of zeroes) we will search for the last extent's
end offset if we are at the first slot of a leaf and the last processed
extent's end offset is smaller than the current extent's start offset.
However we are repeating this search in case we had the last extent's end
offset undefined (set to the (u64)-1 value) when we entered
maybe_send_hole(), wasting time.

So avoid this duplicated search by combining the two conditions that
trigger a search for the last extent's end offset into a single if
statement.
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
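
The effect of merging the two conditions can be sketched in userspace as follows. All names are hypothetical, the "search" is a stub that just records a value and counts calls, and the real maybe_send_hole() does considerably more; this only shows why a single if statement performs at most one lookup.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced send context. */
struct demo_hole_ctx {
	uint64_t cur_inode_last_extent;	/* UINT64_MAX means "not known yet" */
	unsigned int searches;		/* counts lookups, for illustration */
};

/* Stub for the tree search; found_end is what the search would return. */
static void demo_get_last_extent(struct demo_hole_ctx *sctx, uint64_t found_end)
{
	sctx->cur_inode_last_extent = found_end;
	sctx->searches++;
}

static void demo_maybe_send_hole(struct demo_hole_ctx *sctx, int slot,
				 uint64_t key_offset, uint64_t found_end)
{
	/*
	 * Previously two back-to-back ifs: one for the undefined value and
	 * one for "first slot and last extent ends before this key". Both
	 * could fire in the same call and search twice.
	 */
	if (sctx->cur_inode_last_extent == UINT64_MAX ||
	    (slot == 0 && sctx->cur_inode_last_extent < key_offset))
		demo_get_last_extent(sctx, found_end);

	/* ... a hole would be sent here if the last extent ends early ... */
}
```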

btrfs: factor out validation of btrfs_ioctl_vol_args_v2::name · 0478adff

David Sterba authored Feb 14, 2024

The validation of vol args v2 name in snapshot and device remove ioctls
is not done properly. A terminating NUL is written to the end of the
buffer unconditionally, assuming that this would be the last place in
case the buffer is used completely. This does not communicate back the
actual error (either an invalid or too long path).

Factor out all such cases and use a helper to do the verification,
simply look for NUL in the buffer.  There's no expected practical
change, the size of buffer is 4088, this is enough for most paths or
names.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
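
A check of this kind can be sketched in userspace as below. The structure and function names are hypothetical, the 4088-byte size is the one quoted in the message above, and returning -ENAMETOOLONG is an assumption about which error gets reported.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define DEMO_NAME_BUF_SIZE 4088	/* buffer size quoted in the commit message */

/* Hypothetical stand-in for the ioctl argument structure. */
struct demo_vol_args {
	long long fd;
	char name[DEMO_NAME_BUF_SIZE];
};

/*
 * Return 0 if name[] contains a terminating NUL, -ENAMETOOLONG otherwise
 * (assumed error code). The buffer is never modified, so an overlong name
 * is reported to the caller instead of being silently truncated.
 */
static int demo_check_vol_args_name(const struct demo_vol_args *args)
{
	if (memchr(args->name, '\0', sizeof(args->name)) == NULL)
		return -ENAMETOOLONG;
	return 0;
}
```

Compared with unconditionally writing a NUL into the last byte, the caller now learns that the input was malformed.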

btrfs: factor out validation of btrfs_ioctl_vol_args::name · 5ab2b180

David Sterba authored Feb 14, 2024

The validation of vol args name in several ioctls is not done properly.
A terminating NUL is written to the end of the buffer unconditionally,
assuming that this would be the last place in case the buffer is used
completely. This does not communicate back the actual error (either an
invalid or too long path).

Factor out all such cases and use a helper to do the verification,
simply look for NUL in the buffer. There's no expected practical change,
the size of buffer is 4088, this is enough for most paths or names.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: remove no longer used btrfs_transaction_in_commit() · f33163ee

Filipe Manana authored Feb 13, 2024

The function btrfs_transaction_in_commit() is no longer used, its last
use was removed in commit 11aeb97b ("btrfs: don't arbitrarily slow
down delalloc if we're committing"), so just remove it.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: sysfs: drop unnecessary double logical negation in acl_show() · f840ab79

Neal Gompa authored Feb 11, 2024

The IS_ENABLED() macro already guarantees the result will be a
suitable boolean return value ("1" for enabled, and "0" for disabled).
Thus, it seems that the "!!" used right before is unnecessary to force
the 0/1 values.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Neal Gompa <neal@gompa.dev>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
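
The following userspace sketch shows why the "!!" changes nothing. It reproduces a simplified form of the preprocessor trick behind IS_ENABLED(), covering only the built-in case; all demo_/DEMO_ names and both config symbols are hypothetical, and the kernel macro additionally handles modules.

```c
#include <assert.h>

/*
 * Simplified userspace copy of the trick behind IS_ENABLED(): a symbol
 * defined to 1 expands to a placeholder containing a comma, which shifts
 * the literal 1 into the "second argument" position. Anything else leaves
 * the 0 there.
 */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define demo_take_second_arg(ignored, val, ...) val
#define demo_is_defined(x) demo_is_defined2(x)
#define demo_is_defined2(val) demo_is_defined3(DEMO_ARG_PLACEHOLDER_##val)
#define demo_is_defined3(arg1_or_junk) \
	demo_take_second_arg(arg1_or_junk 1, 0, 0)
#define DEMO_IS_ENABLED(option) demo_is_defined(option)

/* Hypothetical config symbols: one enabled, one left undefined. */
#define CONFIG_DEMO_ACL 1

static int demo_acl_enabled(void)
{
	/* Expands to the literal 1, so a leading "!!" would be redundant. */
	return DEMO_IS_ENABLED(CONFIG_DEMO_ACL);
}

static int demo_other_enabled(void)
{
	/* CONFIG_DEMO_OTHER is not defined, so this expands to the literal 0. */
	return DEMO_IS_ENABLED(CONFIG_DEMO_OTHER);
}
```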

btrfs: delete BUG_ON in btrfs_init_locked_inode() · 636d91d7

David Sterba authored Feb 07, 2024

The purpose of the BUG_ON is not clear. The helper btrfs_grab_root()
could return a NULL in case args->root would be a NULL or if there are
zero references. Then we check if the root pointer stored in the inode
still exists.

The whole call chain is for iget:

btrfs_iget
  btrfs_iget_path
    btrfs_iget_locked
      iget5_locked
	btrfs_init_locked_inode

which is called from many contexts where the root pointer is used and
can safely be assumed to have enough references.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: delete pointless BUG_ONs on extent item size · bfe8a0cc

David Sterba authored Feb 06, 2024

Checking extent item size in add_inline_refs() is redundant, we do that
already in tree-checker after reading the extent buffer and it won't
change under normal circumstances.  It was added long ago in
8da6d581 ("Btrfs: added btrfs_find_all_roots()") and does not seem
to have a clear purpose.

Similar case in extent_from_logical(), added in a542ad1b ("btrfs:
added helper functions to iterate backrefs").
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: delete pointless BUG_ON check on quota root in btrfs_qgroup_account_extent() · f40a3ea9

David Sterba authored Feb 06, 2024

The BUG_ON is deep in the qgroup code where we can expect that the quota
root exists. A NULL pointer would cause a crash.

It was added long ago in 550d7a2e ("btrfs: qgroup: Add new qgroup
calculation function btrfs_qgroup_account_extents()."). It may have made
sense back then, as the quota enable/disable state machine was not as
robust as it is nowadays, so we can just delete it.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: change BUG_ONs to assertions in btrfs_qgroup_trace_subtree() · 4839c386

David Sterba authored Feb 06, 2024

The only caller do_walk_down() of btrfs_qgroup_trace_subtree() validates
the value of level and uses it several times before it's passed as an
argument. Same for root_eb that's called 'next' in the caller.

Change both BUG_ONs to assertions as this is to assure proper interface
use rather than real errors.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: change BUG_ON to assertion in tree_move_down() · 56f335e0

David Sterba authored Feb 06, 2024

There's only one caller of tree_move_down(), and it does not pass level 0,
so an assertion is better suited here.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: send: handle path ref underflow in header iterate_inode_ref() · 3c6ee34c

David Sterba authored Feb 06, 2024

Change BUG_ON to proper error handling if building the path buffer
fails. The pointers are not printed so we don't accidentally leak kernel
addresses.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: send: handle unexpected inode in header process_recorded_refs() · 5d228871

David Sterba authored Feb 06, 2024

Change BUG_ON to proper error handling when an unexpected inode number
is encountered. As the comment says this should never happen.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: send: handle unexpected data in header buffer in begin_cmd() · e80e3f73

David Sterba authored Feb 06, 2024

Change BUG_ON to proper error handling in the unlikely case of seeing
data when the command is started. This is supposed to be reset when the
command is finished (send_cmd, send_encoded_extent).
Signed-off-by: David Sterba <dsterba@suse.com>
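
The general shape of such a conversion is sketched below in userspace C. The context structure and function are hypothetical and heavily reduced, and -EINVAL is an assumed error code; the point is only that leftover data makes the function fail instead of crashing the machine.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced command context: only what the check needs. */
struct demo_cmd_ctx {
	char *send_buf;
	size_t send_size;	/* bytes already queued for the current command */
};

/*
 * Start a new command. A non-zero send_size means the previous command
 * was not reset properly, which is reported as an error (assumed -EINVAL).
 */
static int demo_begin_cmd(struct demo_cmd_ctx *sctx)
{
	if (sctx->send_buf == NULL)
		return -EINVAL;

	/* Before the change this was: BUG_ON(sctx->send_size); */
	if (sctx->send_size != 0)
		return -EINVAL;

	/* ... the command header would be set up here ... */
	return 0;
}
```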

btrfs: handle invalid root reference found in may_destroy_subvol() · 6fbc6f4a

David Sterba authored Jan 24, 2024

may_destroy_subvol() looks up a root by a key, which allows an inexact
search when key->offset is -1.  Such an item is never expected to be
found, as it would break the allowed range of a root id.
Signed-off-by: David Sterba <dsterba@suse.com>
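
The pattern of searching with offset -1 and treating an exact hit as corruption can be sketched as follows. The sorted array stands in for the tree, all function names are hypothetical, and the 0/1 return convention of demo_search() imitates a found/not-found search result; -EUCLEAN is the error used for this kind of inconsistency elsewhere in this series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef EUCLEAN
#define EUCLEAN 117	/* Linux value, "Structure needs cleaning" */
#endif

/*
 * Search a sorted array of offsets. Returns 0 on an exact match and 1
 * otherwise; *slot is the index of the match or the insert position.
 */
static int demo_search(const uint64_t *offsets, size_t n, uint64_t key,
		       size_t *slot)
{
	size_t i = 0;

	while (i < n && offsets[i] < key)
		i++;
	*slot = i;
	return (i < n && offsets[i] == key) ? 0 : 1;
}

/* Find the last item by searching for the highest possible offset. */
static int demo_find_last(const uint64_t *offsets, size_t n, uint64_t *found)
{
	size_t slot;
	int ret = demo_search(offsets, n, UINT64_MAX, &slot);

	if (ret == 0)
		return -EUCLEAN;	/* an item at offset -1 must not exist */
	if (slot == 0)
		return -ENOENT;		/* nothing precedes the search key */
	*found = offsets[slot - 1];
	return 0;
}
```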

btrfs: handle invalid extent item reference found in find_first_extent_item() · f626a0f5

David Sterba authored Jan 24, 2024

The find_first_extent_item() helper looks up an extent item by a key,
which allows an inexact search when key->offset is -1.  Such an item is
never expected to be found, as it would break the allowed range of an
extent item offset.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: handle invalid extent item reference found in extent_from_logical() · 11dcc86e

David Sterba authored Jan 24, 2024

The extent_from_logical() helper looks up an extent item by a key,
which allows an inexact search when key->offset is -1.  Such an item is
never expected to be found, as it would break the allowed range of an
extent item offset.

The same error is already handled in btrfs_backref_iter_start() so add a
comment for consistency.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: update comment and drop assertion in extent item lookup in find_parent_nodes() · 5b957989

David Sterba authored Jan 24, 2024

The same comment was added for this type of error elsewhere; unify it and
drop the assertion, as we'd find out quickly that something is wrong after
returning -EUCLEAN.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: push errors up from add_async_extent() · dbe6cda6

David Sterba authored Jan 24, 2024

The memory allocation error in add_async_extent() is not handled
properly. Return an error and push the BUG_ON to the caller. Handling it
there is not trivial, so at least make it visible.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: remove do_list variable at btrfs_clear_delalloc_extent() · 4e94ee80

Filipe Manana authored Feb 09, 2024

The "do_list" variable has a rather confusing name, so remove it and
directly use btrfs_is_free_space_inode() instead.
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: remove do_list variable at btrfs_set_delalloc_extent() · 99c15fec

Filipe Manana authored Feb 09, 2024

The "do_list" variable is only used once, plus its name/meaning is a bit
confusing, so remove it and directly use btrfs_is_free_space_inode().
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: use assertion instead of BUG_ON when adding/removing to delalloc list · d23626d8

Filipe Manana authored Feb 09, 2024

When adding or removing an inode to/from the root's delalloc list,
instead of using a BUG_ON() to validate list emptiness, use ASSERT()
since this is to check logic errors rather than real errors.
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
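
The difference between the two kinds of check can be sketched in userspace like this. DEMO_ASSERT() is a hypothetical stand-in for a debug-only assertion that disappears unless DEMO_DEBUG is defined, and the list and inode types are minimal imitations, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a debug-only assertion: active only with DEMO_DEBUG. */
#ifdef DEMO_DEBUG
#define DEMO_ASSERT(expr)	assert(expr)
#else
#define DEMO_ASSERT(expr)	((void)0)
#endif

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct demo_list {
	struct demo_list *next, *prev;
};

static void demo_list_init(struct demo_list *l)
{
	l->next = l;
	l->prev = l;
}

static int demo_list_empty(const struct demo_list *l)
{
	return l->next == l;
}

static void demo_list_add_tail(struct demo_list *item, struct demo_list *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Hypothetical inode with its delalloc list linkage. */
struct demo_inode {
	struct demo_list delalloc_entry;
};

static void demo_add_delalloc_inode(struct demo_list *root_list,
				    struct demo_inode *inode)
{
	/*
	 * A logic check rather than a runtime error: the inode must not be
	 * queued already. This used to be an unconditional BUG_ON().
	 */
	DEMO_ASSERT(demo_list_empty(&inode->delalloc_entry));
	demo_list_add_tail(&inode->delalloc_entry, root_list);
}
```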
