Commits · 58dda9c10e3fdea488180a74de622432de6b4568 · Kirill Smelkov / linux

08 May, 2024 24 commits

bcachefs: kill metadata only gc · 58dda9c1
Kent Overstreet authored Apr 06, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
58dda9c1

bcachefs: Finish converting reconstruct_alloc to errors_silent · d1b213a0

Kent Overstreet authored Apr 07, 2024

With errors_silent, reconstruct_alloc no longer requires fsck and
fix_errors to work.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

d1b213a0

bcachefs: bch2_gc() is now private to btree_gc.c · 68e14240
Kent Overstreet authored Apr 06, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
68e14240
bcachefs: for_each_btree_key_continue() · 665e8b32
Kent Overstreet authored Apr 07, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
665e8b32
bcachefs: kill for_each_btree_key_old() · a21107ee
Kent Overstreet authored Apr 06, 2024
```
Dead code
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
a21107ee

bcachefs: Optimize eytzinger0_sort() with bottom-up heapsort · 0ddb5f08

Kuan-Wei Chiu authored Apr 07, 2024

This optimization reduces the average number of comparisons required
from 2*n*log2(n) - 3*n + o(n) to n*log2(n) + 0.37*n + o(n). When n is
sufficiently large, it results in approximately 50% fewer comparisons.

Currently, eytzinger0_sort employs the textbook version of heapsort,
where during the heapify process, each level requires two comparisons
to determine the maximum among three elements. In contrast, the
bottom-up heapsort, during heapify, only compares two children at each
level until reaching a leaf node. Then, it backtracks from the leaf
node to find the correct position. Since heapify typically continues
until very close to the leaf node, the standard heapify requires about
2*log2(n) comparisons, while the bottom-up variant only needs log2(n)
comparisons.
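
The leaf-search-then-backtrack idea described above can be sketched as follows. This is a minimal illustration on a plain `int` array in ordinary 0-based heap layout; it is not the kernel's `eytzinger0_sort()`, which takes generic compare and swap callbacks. The names `sift_down_bottom_up` and `bottom_up_heapsort` are made up for this sketch.

```c
#include <stddef.h>

/* Restore the max-heap property for the sub-heap rooted at `start` in a[0..n). */
static void sift_down_bottom_up(int *a, size_t start, size_t n)
{
	size_t j = start;

	/* Phase 1: walk down along the larger child; one comparison per level. */
	while (2 * j + 2 < n)
		j = (a[2 * j + 1] > a[2 * j + 2]) ? 2 * j + 1 : 2 * j + 2;
	/* The last node on the path may have only a left child. */
	if (2 * j + 1 < n)
		j = 2 * j + 1;

	/* Phase 2: backtrack from the leaf to the slot where a[start] belongs. */
	while (j > start && a[start] > a[j])
		j = (j - 1) / 2;

	/* Phase 3: place a[start] at j and shift the path above it up one level. */
	int x = a[j];
	a[j] = a[start];
	while (j > start) {
		j = (j - 1) / 2;
		int t = a[j];
		a[j] = x;
		x = t;
	}
}

/* Sort a[0..n) in ascending order. */
void bottom_up_heapsort(int *a, size_t n)
{
	if (n < 2)
		return;

	/* Build the heap from the last internal node down to the root. */
	for (size_t i = n / 2; i-- > 0; )
		sift_down_bottom_up(a, i, n);

	/* Repeatedly move the maximum to the end and re-heapify the rest. */
	for (size_t end = n - 1; end > 0; end--) {
		int t = a[0];
		a[0] = a[end];
		a[end] = t;
		sift_down_bottom_up(a, 0, end);
	}
}
```

Phase 1 is where the saving comes from: the textbook sift-down compares the parent against both children at every level, while this variant defers all comparisons against the sifted element to the short backtracking walk in phase 2.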

The experimental data presented below is based on an array generated
by get_random_u32().

|   N   | comparisons(old) | comparisons(new) | time(old) | time(new) |
|-------|------------------|------------------|-----------|-----------|
| 10000 |     235381       |     136615       |  25545 us |  20366 us |
| 20000 |     510694       |     293425       |  31336 us |  18312 us |
| 30000 |     800384       |     457412       |  35042 us |  27386 us |
| 40000 |    1101617       |     626831       |  48779 us |  38253 us |
| 50000 |    1409762       |     799637       |  62238 us |  46950 us |
| 60000 |    1721191       |     974521       |  75588 us |  58367 us |
| 70000 |    2038536       |    1152171       |  90823 us |  68778 us |
| 80000 |    2362958       |    1333472       | 104165 us |  78625 us |
| 90000 |    2690900       |    1516065       | 116111 us |  89573 us |
| 100000|    3019413       |    1699879       | 133638 us | 100998 us |

Refs:
  BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average,
  QUICKSORT (if n is not very small)
  Ingo Wegener
  Theoretical Computer Science, 118(1); Pages 81-98, 13 September 1993
  https://doi.org/10.1016/0304-3975(93)90364-Y

Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

0ddb5f08

bcachefs: When traversing to interior nodes, propagate result to paths to same leaf node · be31bf43
Kent Overstreet authored Mar 21, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
be31bf43

bcachefs: Don't read journal just for fsck · 4dcd90b6

Kent Overstreet authored Mar 30, 2024

Reading the journal can take a decent amount of time compared to the
rest of fsck, so only read it when required.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

4dcd90b6

bcachefs: allow for custom action in fsck error messages · 19391b92

Kent Overstreet authored Mar 28, 2024

Be more explicit to the user about what we're doing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

19391b92

bcachefs: New assertion for writing to the journal after shutdown · 497c982f
Kent Overstreet authored Feb 20, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
497c982f

bcachefs: bch2_btree_path_to_text() · 00589cad

Kent Overstreet authored Apr 05, 2024

Long form version of bch2_btree_path_to_text() - useful in error
messages and tracepoints.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

00589cad

bcachefs: add btree_node_merging_disabled debug param · 55778814
Kent Overstreet authored Apr 05, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
55778814
bcachefs: bch2_hash_lookup() now returns bkey_s_c · ac01928b
Kent Overstreet authored Apr 07, 2024
```
small cleanup
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
ac01928b
bcachefs: bch2_journal_keys_dump() · 6ab71b4a
Kent Overstreet authored Apr 09, 2024
```
debug helper
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
6ab71b4a

bcachefs: bch2_btree_node_header_to_text() · 9089376f

Kent Overstreet authored Apr 10, 2024

better btree node read path error messages
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

9089376f

bcachefs: prt_printf() now respects \r\n\t · 7423330e
Kent Overstreet authored Apr 10, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
7423330e
bcachefs: printbufs: prt_printf() now handles \t\r\n · 2dcb605e
Kent Overstreet authored Apr 10, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
2dcb605e

bcachefs: printbuf improvements · acce32a5

Kent Overstreet authored Apr 10, 2024

- fix assorted (harmless) off-by-one errors
- we were inconsistent on whether out->pos stays <= out->size on
  overflow; now it does, and printbuf.overflow exists to indicate if a
  printbuf has overflowed
- factor out printbuf_advance_pos()
- printbuf_nul_terminate_reserved(); use this to reduce the number of
  printbuf_make_room() calls
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
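
As a rough illustration of the overflow invariant in the list above, here is a sketch of a fixed-size print buffer whose position never passes its size and which records truncation in a flag. The struct and function names (`sketch_printbuf`, `sketch_prt_bytes`) are hypothetical; the real printbuf can also grow its buffer, which this sketch leaves out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a printbuf with a fixed buffer. */
struct sketch_printbuf {
	char	*buf;
	size_t	size;		/* total bytes available in buf */
	size_t	pos;		/* next write offset; never exceeds size */
	bool	overflow;	/* set once any output has been dropped */
};

/* Bytes we may still write while keeping one byte back for the NUL. */
static size_t sketch_remaining(const struct sketch_printbuf *out)
{
	return out->size > out->pos ? out->size - out->pos - 1 : 0;
}

/* Append up to len bytes; truncate and flag overflow instead of overrunning. */
void sketch_prt_bytes(struct sketch_printbuf *out, const char *s, size_t len)
{
	size_t room = sketch_remaining(out);
	size_t n = len < room ? len : room;

	if (n)
		memcpy(out->buf + out->pos, s, n);
	if (n < len)
		out->overflow = true;

	out->pos += n;
	if (out->size)
		out->buf[out->pos] = '\0';	/* always inside the buffer */
}
```

Because `pos` is clamped rather than allowed to run ahead, callers can test `overflow` directly instead of comparing `pos` against `size`.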

acce32a5

bcachefs: Run upgrade/downgrade even in -o nochanges mode · 62606398

Kent Overstreet authored Apr 28, 2024

We need to be able to test these paths in dry run mode.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

62606398

bcachefs: Better write_super() error messages · 6d828691

Kent Overstreet authored May 03, 2024

When a superblock write is silently dropped, or the superblock has been
modified by another process, we need to know which device it was.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

6d828691

bcachefs: Fix xattr_to_text() unsafety · 74768337
Kent Overstreet authored May 08, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
74768337

bcachefs: bch2_bkey_format_field_overflows() · 61692c78

Kent Overstreet authored May 08, 2024

Fix another shift-by-64 by factoring out a common helper for
bch2_bkey_format_invalid() and bformat_needs_redo() (where it was
already fixed).

Reported-by: syzbot+9833a1d29d4a44361e2c@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
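
For context, shifting a 64-bit value by 64 is undefined behavior in C, so a field-width check has to special-case the full-width field. A minimal sketch of such a helper, under the assumption that a packed field stores `bits` bits on top of a base `offset`; the names and the exact overflow condition are illustrative and not taken from the actual bch2_bkey_format_field_overflows():

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest value that fits in `bits` bits, without ever shifting by 64. */
static uint64_t sketch_max_for_bits(unsigned bits)
{
	return bits >= 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/*
 * True if a packed field of `bits` bits plus `offset` could encode a value
 * larger than `unpacked_max`, the largest value the in-memory field holds.
 */
bool sketch_field_overflows(unsigned bits, uint64_t offset,
			    uint64_t unpacked_max)
{
	uint64_t packed_max = sketch_max_for_bits(bits);

	if (offset > unpacked_max)
		return true;

	/* Compare against the remaining headroom so the sum cannot wrap. */
	return packed_max > unpacked_max - offset;
}
```

Putting the guard in one helper means every caller gets the same treatment of the 64-bit case.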

61692c78

bcachefs: Fix needs_whiteout BUG_ON() in bkey_sort() · 5dfd3746

Kent Overstreet authored May 08, 2024

Btree nodes are log structured; thus, we need to emit whiteouts when
we're deleting a key that's been written out to disk.

k->needs_whiteout tracks whether a key will need a whiteout when it's
deleted, and this requires some careful handling; e.g. the key we're
deleting may not have been written out to disk, but it may have
overwritten a key that was - thus we need to carry this flag around on
overwrites.

Invariants:
There may be multiple keys for the same position in a given node (because
of overwrites), but only one of them will be a live (non-deleted) key,
and only one key for a given position will have the needs_whiteout flag
set.

Additionally, we don't want to carry around whiteouts that need to be
written in the main searchable part of a btree node - btree_iter_peek()
will have to skip past them, and this can lead to O(n^2) behavior when
doing sequential deletions (e.g. inode rm/truncate). So there's a
separate region in the btree node buffer for unwritten whiteouts; these
are merge sorted with the rest of the keys we're writing in the btree
node write path.

The unwritten whiteouts area was a later optimization that bch2_sort_keys()
didn't take into account; the unwritten whiteouts area means that we
never have deleted keys with needs_whiteout set in the main searchable
part of a btree node.

That means we can simplify and optimize some sort paths, and eliminate
an assertion that syzbot found:

- Unless we're in the btree node write path, it's always ok to drop
  whiteouts when sorting
- When sorting for a btree node write, we drop the whiteout if it's not
  from the unwritten whiteouts area, or if it's overwritten by a real
  key at the same position.

This completely eliminates some tricky logic for propagating the
needs_whiteout flag: syzbot was able to hit the assertion that checked
that there shouldn't be more than one key at the same pos with
needs_whiteout set, likely due to a combination of flipping on
needs_whiteout on all written keys (they need whiteouts if overwritten),
combined with not always dropping unneeded whiteouts, and the tricky
logic in the sort path for preserving needs_whiteout that wasn't really
needed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
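
The two sorting rules listed above reduce to a small predicate. The sketch below only restates those rules in code; the function name and boolean parameters are invented for illustration and do not correspond to the real sort path's data structures.

```c
#include <stdbool.h>

/*
 * Decide whether a whiteout (deleted key) can be dropped while sorting.
 *
 * for_node_write:          sorting on behalf of a btree node write
 * from_unwritten_area:     the whiteout came from the unwritten whiteouts area
 * overwritten_by_live_key: a live key exists at the same position
 */
bool sketch_drop_whiteout(bool for_node_write,
			  bool from_unwritten_area,
			  bool overwritten_by_live_key)
{
	/* Outside the write path, whiteouts never need to be kept. */
	if (!for_node_write)
		return true;

	/* In the write path, keep only unwritten whiteouts that still matter. */
	return !from_unwritten_area || overwritten_by_live_key;
}
```

Note that nothing here consults or propagates a needs_whiteout flag, which is the simplification the commit message describes.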

5dfd3746

bcachefs: Fix sb_clean_validate endianness conversion · 5ad1f33c
Kent Overstreet authored May 08, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
5ad1f33c

07 May, 2024 2 commits

bcachefs: Add missing sched_annotate_sleep() in bch2_journal_flush_seq_async() · 6e297a73
Kent Overstreet authored May 06, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
6e297a73

bcachefs: Fix race in bch2_write_super() · 54541c1f

Kent Overstreet authored May 06, 2024

bch2_write_super() was looping over online devices multiple times -
dropping and retaking io_ref each time.

This meant it could race with device removal; it could increment the
sequence number on a device but fail to write it - and then if the
device was re-added, it would get confused the next time around thinking
a superblock write was silently dropped.

Fix this by taking io_ref once, and stashing pointers to online devices
in a darray.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
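
The shape of the fix can be sketched in userspace as follows: pin the set of online devices once, run every pass over that pinned set, then drop the references. Everything here is a simplified stand-in; plain integers replace the kernel's refcounts, a fixed array replaces the darray, and the names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_DEVS 8

struct sketch_dev {
	bool		online;
	int		io_ref;		/* stand-in for the io_ref refcount */
	unsigned long	seq;		/* sequence number staged in memory */
	unsigned long	written;	/* sequence number last written out */
};

/* Take a reference only if the device is online right now. */
static bool sketch_tryget(struct sketch_dev *d)
{
	if (!d->online)
		return false;
	d->io_ref++;
	return true;
}

static void sketch_put(struct sketch_dev *d)
{
	d->io_ref--;
}

/* Returns the number of devices the superblock was written to. */
size_t sketch_write_super(struct sketch_dev *devs, size_t nr)
{
	struct sketch_dev *pinned[SKETCH_MAX_DEVS];
	size_t nr_pinned = 0;

	/* Take each reference exactly once and remember which devices we hold. */
	for (size_t i = 0; i < nr && nr_pinned < SKETCH_MAX_DEVS; i++)
		if (sketch_tryget(&devs[i]))
			pinned[nr_pinned++] = &devs[i];

	/* Every later pass sees the same set, so seq and written stay in step. */
	for (size_t i = 0; i < nr_pinned; i++)
		pinned[i]->seq++;
	for (size_t i = 0; i < nr_pinned; i++)
		pinned[i]->written = pinned[i]->seq;

	for (size_t i = 0; i < nr_pinned; i++)
		sketch_put(pinned[i]);

	return nr_pinned;
}
```

With the old pattern, a device that went offline between the two passes would have its sequence number bumped but never written; iterating over the pinned array rules that out.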

54541c1f

06 May, 2024 14 commits

bcachefs: BCH_SB_LAYOUT_SIZE_BITS_MAX · 71dac248

Kent Overstreet authored May 06, 2024

Define a constant for the max superblock size, to avoid a too-large
shift.

Reported-by: syzbot+a8b0fb419355c91dda7f@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

71dac248

bcachefs: Add missing skcipher_request_set_callback() call · 88ab1018
Kent Overstreet authored May 06, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
88ab1018

bcachefs: Fix snapshot_t() usage in bch2_fs_quota_read_inode() · 8060bf1d

Kent Overstreet authored May 05, 2024

bch2_fs_quota_read_inode() wasn't entirely updated to the
bch2_snapshot_tree() helper, which takes rcu lock.

Reported-by: syzbot+a3a9a61224ed3b7f0010@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

8060bf1d

bcachefs: Fix shift-by-64 in bformat_needs_redo() · 0ec5b3b7

Kent Overstreet authored May 05, 2024

Ancient versions of bcachefs produced packed formats that could
represent keys that our in memory format cannot represent;
bformat_needs_redo() has some tricky shifts to check for this sort of
overflow.

Reported-by: syzbot+594427aebfefeebe91c6@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

0ec5b3b7

bcachefs: Guard against unknown k.k->type in __bkey_invalid() · 2bb9600d

Kent Overstreet authored May 05, 2024

For forwards compatibility we have to allow unknown key types, and only
run the checks that make sense against them.

Fix a missing guard on k.k->type being known.

Reported-by: syzbot+ae4dc916da3ce51f284f@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
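
A guard of this kind might look like the sketch below, where type-specific validation is driven by a table indexed by key type. The enum, struct, and table are invented for illustration; the point is only that the index is checked before the table is consulted, and unknown types are accepted rather than rejected.

```c
#include <stdbool.h>

/* Hypothetical key types known to this version of the code. */
enum sketch_key_type {
	SKETCH_KEY_deleted,
	SKETCH_KEY_extent,
	SKETCH_KEY_NR,		/* number of known types */
};

struct sketch_key {
	unsigned type;		/* may exceed SKETCH_KEY_NR on newer filesystems */
	unsigned val_size;	/* size of the value in bytes */
};

/* Minimum value size per known type. */
static const unsigned sketch_min_val_size[SKETCH_KEY_NR] = {
	[SKETCH_KEY_deleted]	= 0,
	[SKETCH_KEY_extent]	= 16,
};

/* Returns true if the key fails validation. */
bool sketch_key_invalid(const struct sketch_key *k)
{
	/*
	 * Unknown types are allowed for forwards compatibility: skip the
	 * type-specific checks instead of indexing past the table.
	 */
	if (k->type >= SKETCH_KEY_NR)
		return false;

	return k->val_size < sketch_min_val_size[k->type];
}
```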

2bb9600d

bcachefs: Add missing validation for superblock section clean · f3905522

Kent Overstreet authored May 05, 2024

We were forgetting to check for jset entries that overrun the end of the
section - both in validate and to_text(); to_text() needs to be safe for
types that fail to validate.

Reported-by: syzbot+c48865e11e7e893ec4ab@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
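
The missing check is the usual one for walking variable-length records: before stepping to the next entry, confirm that the current one fits in what remains of the section. A minimal sketch, assuming a made-up 8-byte entry header whose `u64s` field counts the payload in 64-bit words; the layout and names are illustrative, not the on-disk jset entry format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-byte entry header; u64s is the payload length in u64s. */
struct sketch_entry_hdr {
	uint16_t	u64s;
	uint8_t		type;
	uint8_t		pad[5];
};

/*
 * Walk the entries in a section of section_u64s 64-bit words.
 * Returns the number of entries, or -1 if one overruns the section.
 */
int sketch_count_entries(const uint64_t *section, size_t section_u64s)
{
	size_t off = 0;
	int nr = 0;

	while (off < section_u64s) {
		struct sketch_entry_hdr h;
		size_t entry_u64s;

		memcpy(&h, section + off, sizeof(h));
		entry_u64s = 1 + (size_t) h.u64s;	/* header + payload */

		/* Reject an entry that claims more space than is left. */
		if (entry_u64s > section_u64s - off)
			return -1;

		off += entry_u64s;
		nr++;
	}

	return nr;
}
```

A printing routine can use the same loop and simply stop at the first overrunning entry, which keeps it safe on data that failed validation.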

f3905522

bcachefs: Fix assert in bch2_alloc_v4_invalid() · 6b8cbfc3

Kent Overstreet authored May 05, 2024

Reported-by: syzbot+10827fa6b176e1acf1d0@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

6b8cbfc3

bcachefs: fix overflow in fiemap · 9a0ec045

Reed Riley authored May 04, 2024

filefrag (and potentially other utilities that call fiemap) sometimes
pass ULONG_MAX as the length.  fiemap_prep clamps excessively large
lengths - but the calculation of end can overflow if it occurs before
calling fiemap_prep.  When this happens, filefrag assumes it has read to
the end and exits.
Signed-off-by: Reed Riley <reed@riley.engineer>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
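
The ordering problem can be shown with plain unsigned arithmetic. In the sketch below, `sketch_clamp_len()` plays the role the message attributes to fiemap_prep (limiting the length to what the file can hold), and the end offset is computed only afterwards. The function names and the `max_bytes` parameter are assumptions made for this example.

```c
#include <stdint.h>

/* Limit len so that start + len cannot pass max_bytes. */
static uint64_t sketch_clamp_len(uint64_t start, uint64_t len,
				 uint64_t max_bytes)
{
	uint64_t room;

	if (start >= max_bytes)
		return 0;

	room = max_bytes - start;
	return len < room ? len : room;
}

/* Compute the end of the requested range from the clamped length. */
uint64_t sketch_range_end(uint64_t start, uint64_t len, uint64_t max_bytes)
{
	return start + sketch_clamp_len(start, len, max_bytes);
}
```

Computing `start + len` before clamping wraps around when `len` is the maximum value, producing an end offset below `start`, so the range looks empty. That is consistent with filefrag concluding it has reached the end of the file.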

9a0ec045

bcachefs: Add a better limit for maximum number of buckets · db42549d

Kent Overstreet authored May 04, 2024

The bucket_gens array is a single array allocation (one byte per
bucket), and kernel allocations are still limited to INT_MAX.

Check this limit to avoid failing the bucket_gens array allocation.

Reported-by: syzbot+b29f436493184ea42e2b@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

db42549d

bcachefs: Fix lifetime issue in device iterator helpers · 18b4abce

Kent Overstreet authored May 04, 2024

bch2_get_next_dev() and bch2_get_next_online_dev() iterate over devices,
dropping and taking refs as they go; we can't access the previous device
(for ca->dev_idx) after we've dropped our ref to it, unless we take
rcu_read_lock() first.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

18b4abce

bcachefs: Fix bch2_dev_lookup() refcounting · 3a2d0259

Kent Overstreet authored May 04, 2024

bch2_dev_lookup() is supposed to take a ref on the device it returns;
for_each_member_device() takes refs as it iterates, but
for_each_member_device_rcu() does not.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

3a2d0259

bcachefs: Initialize bch_write_op->failed in inline data path · 1267df40

Kent Overstreet authored May 04, 2024

Normally this is initialized in __bch2_write(), which is executed in a
loop, but the inline data path skips this.

Reported-by: syzbot+fd3ccb331eb21f05d13b@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

1267df40

bcachefs: Fix refcount put in sb_field_resize error path · feb077c1
Kent Overstreet authored May 03, 2024
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
feb077c1

bcachefs: Inodes need extra padding for varint_decode_fast() · 4a8521b6

Kent Overstreet authored May 03, 2024

Reported-by: syzbot+66b9b74f6520068596a9@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

4a8521b6