Commits · 55936afe11077a84d9e1c5068169af328bbf2811 · Kirill Smelkov / linux

03 Apr, 2024 10 commits

bcachefs: Flag btrees with missing data · 55936afe

Kent Overstreet authored Mar 15, 2024

We need this to know when we should attempt to reconstruct the snapshots
btree.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

55936afe

bcachefs: Topology repair now uses nodes found by scanning to fill holes · 43f5ea46

Kent Overstreet authored Mar 16, 2024

With the new btree node scan code, we can now recover from corrupt btree
roots - simply create a new fake root at depth 1, and then insert all
the leaves we found.

If the root wasn't corrupt but there's corruption elsewhere in the
btree, we can fill in holes as needed with the newest version of a given
node(s) from the scan; we also check if a given btree node is older than
what we found from the scan.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

43f5ea46

bcachefs: Repair pass for scanning for btree nodes · 4409b808

Kent Overstreet authored Mar 11, 2024

If a btree root or interior btree node goes bad, we're going to lose a
lot of data, unless we can recover the nodes that it pointed to by
scanning.

Fortunately btree node headers are fully self describing, and
additionally the magic number is xored with the filesystem UUID, so we
can do so safely.

This implements the scanning - next patch will rework topology repair to
make use of the found nodes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

4409b808
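
To illustrate why xoring the magic number with the filesystem UUID makes scanning safe (commit 4409b808), here is a minimal userspace sketch. The constant `EXAMPLE_NODE_MAGIC`, the helper names, and the choice to fold only the first 8 UUID bytes are assumptions made for illustration; they are not taken from the bcachefs source.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical base magic; the real on-disk constant is not shown here. */
#define EXAMPLE_NODE_MAGIC 0x0123456789abcdefULL

/* Assumption: fold the first 8 bytes of the 16-byte UUID into a 64-bit value. */
static uint64_t uuid_low64(const uint8_t uuid[16])
{
	uint64_t v;

	memcpy(&v, uuid, sizeof(v));
	return v;
}

/* The magic a node header is expected to carry on this particular filesystem. */
static uint64_t node_magic_for_fs(const uint8_t uuid[16])
{
	return EXAMPLE_NODE_MAGIC ^ uuid_low64(uuid);
}

/* A scanner accepts a candidate header only if its magic matches this filesystem. */
static bool looks_like_our_node(uint64_t header_magic, const uint8_t uuid[16])
{
	return header_magic == node_magic_for_fs(uuid);
}
```

Because the expected magic differs per filesystem, a stale node left behind by another filesystem on the same device would not pass this check.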

bcachefs: Don't skip fake btree roots in fsck · b268aa4e

Kent Overstreet authored Mar 10, 2024

When a btree root is unreadable, we might still have keys from the
journal to walk and mark.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

b268aa4e

bcachefs: bch2_btree_root_alloc() -> bch2_btree_root_alloc_fake() · f2f61f41

Kent Overstreet authored Mar 14, 2024

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

f2f61f41

bcachefs: Eytzinger cleanups · ca1e02f7

Kent Overstreet authored Mar 22, 2024

Pull out eytzinger.c and kill eytzinger_cmp_fn. We now provide
eytzinger0_sort and eytzinger0_sort_r, which use the standard cmp_func_t
and cmp_r_func_t callbacks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

ca1e02f7

bcachefs: bch2_shoot_down_journal_keys() · bdbf953b

Kent Overstreet authored Mar 19, 2024

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bdbf953b

bcachefs: Clear recovery_passes_required as they complete without errors · 27fcec6c

Kent Overstreet authored Mar 30, 2024

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

27fcec6c

bcachefs: ratelimit informational fsck errors · fa14b504

Kent Overstreet authored Apr 02, 2024

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

fa14b504

bcachefs: Check for bad needs_discard before doing discard · 7ee88737

Kent Overstreet authored Apr 02, 2024

In the discard worker, we were failing to validate the bucket state -
meaning a corrupt needs_discard btree could cause us to discard a bucket
that we shouldn't.

If check_alloc_info hasn't run yet we just want to bail out, otherwise
it's a filesystem inconsistent error.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

7ee88737

02 Apr, 2024 5 commits

bcachefs: Improve bch2_btree_update_to_text() · e0319af2

Kent Overstreet authored Apr 02, 2024

Print out the mode as a string, and also print out the btree and
watermark.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

e0319af2

mean_and_variance: Drop always failing tests · 97ca7c1f

Guenter Roeck authored Feb 25, 2024

mean_and_variance_test_2 and mean_and_variance_test_4 always fail.
The input parameters to those tests are identical to the input parameters
to tests 1 and 3, yet the expected result for tests 2 and 4 is different
for the mean and stddev tests. That will always fail.

     Expected mean_and_variance_get_mean(mv) == mean[i], but
        mean_and_variance_get_mean(mv) == 22 (0x16)
        mean[i] == 10 (0xa)

Drop the bad tests.

Fixes: 65bc4109 ("mean and variance: More tests")
Closes: https://lore.kernel.org/lkml/065b94eb-6a24-4248-b7d7-d3212efb4787@roeck-us.net/
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

97ca7c1f

bcachefs: fix nocow lock deadlock · c42cd606

Kent Overstreet authored Apr 02, 2024

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

c42cd606

bcachefs: BCH_WATERMARK_interior_updates · e2a316b3

Kent Overstreet authored Apr 01, 2024

This adds a new watermark, higher priority than BCH_WATERMARK_reclaim,
for interior btree updates. We've seen a deadlock where journal replay
triggers a ton of btree node merges, and these use up all available open
buckets and then interior updates get stuck.

One cause of this is that we're currently lacking btree node merging on
write buffer btrees - that needs to be fixed as well.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

e2a316b3

bcachefs: Fix btree node reserve · ba947ecd

Kent Overstreet authored Apr 01, 2024

Sign error when checking the watermark - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

ba947ecd

01 Apr, 2024 25 commits

bcachefs: On emergency shutdown, print out current journal sequence number · b3c7fd35

Kent Overstreet authored Mar 30, 2024

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

b3c7fd35

bcachefs: Fix overlapping extent repair · eab3a3ce

Kent Overstreet authored Mar 30, 2024

overlapping extent repair was colliding with extent past end of inode
checks - don't update "extent ends at" until we know we have an extent.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

eab3a3ce

bcachefs: Fix remove_dirent() · 8ce1db80

Kent Overstreet authored Apr 01, 2024

We were missing an iter_traverse().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

8ce1db80

bcachefs: Logged op errors should be ignored · cecfed9b

Kent Overstreet authored Mar 31, 2024

If something is wrong with a logged op, we just want to delete it -
there's nothing to repair.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

cecfed9b

bcachefs: Improve -o norecovery; opts.recovery_pass_limit · 13c1e583

Kent Overstreet authored Mar 28, 2024

This adds opts.recovery_pass_limit, and redoes -o norecovery to make use
of it; this fixes some issues with -o norecovery so it can be safely
used for data recovery.

Norecovery means "don't do journal replay"; it's an important data
recovery tool when we're getting stuck in journal replay.

When using it this way we need to make sure we don't free journal keys
after startup, so we continue to overlay them: thus it needs to imply
retain_recovery_info, as well as nochanges.

recovery_pass_limit is an explicit option for telling recovery to exit
after a specific recovery pass; this is a much cleaner way of
implementing -o norecovery, as well as being a useful debug feature in
its own right.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

13c1e583
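
The "exit after a specific recovery pass" behaviour described in 13c1e583 can be sketched roughly as follows. The pass names, their order, and the function names here are hypothetical stand-ins, not the actual bcachefs pass list or API.

```c
#include <stdbool.h>

/* Hypothetical pass list; the real passes and their order are not shown here. */
enum example_pass {
	EXAMPLE_PASS_read_metadata,
	EXAMPLE_PASS_journal_replay,
	EXAMPLE_PASS_check_metadata,
	EXAMPLE_PASS_NR,
};

/*
 * Run passes in order, stopping once the pass named by 'limit' has run.
 * Returns a bitmask of the passes that ran.
 */
static unsigned run_passes_with_limit(enum example_pass limit)
{
	unsigned ran = 0;

	for (int p = 0; p < EXAMPLE_PASS_NR; p++) {
		ran |= 1U << p;		/* stand-in for actually running pass p */
		if (p == (int) limit)
			break;
	}
	return ran;
}

/* True if pass 'p' is set in the bitmask returned above. */
static bool pass_ran(unsigned ran, enum example_pass p)
{
	return ran & (1U << p);
}
```

Under these assumptions, a norecovery-style mount would correspond to setting the limit to the last pass before journal replay, so that replay itself never runs.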

bcachefs: bch2_run_explicit_recovery_pass_persistent() · 060ff30a

Kent Overstreet authored Mar 29, 2024

Flag that we need to run a recovery pass and run it - persistently, so if
we crash it'll still get run.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

060ff30a

bcachefs: Ensure bch_sb_field_ext always exists · 0a34c058

Kent Overstreet authored Mar 30, 2024

This makes bch_sb_field_ext more consistent with the rest of -o
nochanges - we don't want to be varying other codepaths based on -o
nochanges, since it's used for testing in dry run mode; also fixes some
potential null ptr derefs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

0a34c058

bcachefs: Flush journal immediately after replay if we did early repair · 4fe0eeea

Kent Overstreet authored Mar 28, 2024

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

4fe0eeea

bcachefs: Resume logged ops after fsck · af855a5f

Kent Overstreet authored Mar 23, 2024

Finishing logged ops requires the filesystem to be in a reasonably
consistent state - and other fsck passes don't require it to have
completed, so just run it last.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

af855a5f

bcachefs: Add error messages to logged ops fns · e5aa8046

Kent Overstreet authored Mar 23, 2024

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

e5aa8046

bcachefs: Split out recovery_passes.c · d2554263

Kent Overstreet authored Mar 23, 2024

We've grown a fair amount of code for managing recovery passes; tracking
which ones we're running, which ones need to be run, and flagging in the
superblock which ones need to be run on the next recovery.

So it's worth splitting out into its own file; this code is pretty
different from the code in recovery.c.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

d2554263

bcachefs: fix backpointer for missing alloc key msg · 11d5568d

Kent Overstreet authored Mar 28, 2024

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

11d5568d

bcachefs: Fix bch2_btree_increase_depth() · 7f9e5080

Kent Overstreet authored Mar 14, 2024

When we haven't yet allocated any btree nodes for a given btree, we
first need to call the regular split path to allocate one.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

7f9e5080

bcachefs: Kill bch2_bkey_ptr_data_type() · 47d2080e

Kent Overstreet authored Mar 25, 2024

Remove some duplication, and inconsistency between check_fix_ptrs and
the main ptr marking paths
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

47d2080e

bcachefs: Fix use after free in check_root_trans() · dcc1c045

Kent Overstreet authored Mar 26, 2024

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

dcc1c045

bcachefs: Fix repair path for missing indirect extents · 83bb5853

Kent Overstreet authored Mar 26, 2024

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

83bb5853

bcachefs: Fix use after free in bch2_check_fix_ptrs() · 6f5869ff

Kent Overstreet authored Mar 26, 2024

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

6f5869ff

bcachefs: Fix btree node keys accounting in topology repair path · 812a9297

Kent Overstreet authored Mar 26, 2024

When dropping keys now outside a node because we're changing the node
min/max, we need to redo the node's accounting as well.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

812a9297

bcachefs: Check btree ptr min_key in .invalid · 805b535a

Kent Overstreet authored Mar 25, 2024

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

805b535a

bcachefs: add REQ_SYNC and REQ_IDLE in write dio · bb660099

zhuxiaohui authored Mar 26, 2024

When writing a file with direct I/O on bcachefs, performance is much
lower than on other filesystems due to writeback throttling in the block
layer:

        wbt_wait+1
        __rq_qos_throttle+32
        blk_mq_submit_bio+394
        submit_bio_noacct_nocheck+649
        bch2_submit_wbio_replicas+538
        __bch2_write+2539
        bch2_direct_write+1663
        bch2_write_iter+318
        aio_write+355
        io_submit_one+1224
        __x64_sys_io_submit+169
        do_syscall_64+134
        entry_SYSCALL_64_after_hwframe+110

Set REQ_SYNC and REQ_IDLE in bio->bi_opf, as standard direct I/O does.
Signed-off-by: zhuxiaohui <zhuxiaohui.400@bytedance.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bb660099

bcachefs: Improved topology repair checks · 79032b07

Kent Overstreet authored Mar 23, 2024

Consolidate bch2_gc_check_topology() and btree_node_interior_verify(),
and replace them with an improved version,
bch2_btree_node_check_topology().

This checks that children of an interior node correctly span the full
range of the parent node with no overlaps.

Also, ensure that topology repairs at runtime are always a fatal error;
in particular, this adds a check in btree_iter_down() - if we don't find
a key while walking down the btree, that's indicative of a topology error
and should be flagged as such, not a null ptr deref.

Some checks in btree_update_interior.c remain BUG_ON()s, because we
already checked the node for topology errors when starting the update,
and the assertions indicate that we _just_ corrupted the btree node -
i.e. the problem can't be existing on-disk corruption; they indicate an
actual algorithmic bug.

In the future, we'll be annotating the fsck errors list with which
recovery pass corrects them; the open coded "run explicit recovery pass
or fatal error" in bch2_btree_node_check_topology() will in the future
be done for every fsck_err() call.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

79032b07
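
The invariant described in 79032b07, that the children of an interior node span the parent's full range with no overlaps, can be sketched in userspace as below. This simplified model uses inclusive integer key ranges; `struct key_range` and `children_span_parent()` are hypothetical names and do not reflect the real bcachefs key types or the body of bch2_btree_node_check_topology().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a node's [min_key, max_key] range, both inclusive. */
struct key_range {
	uint64_t min, max;
};

/*
 * Return true if the children, given in key order, tile the parent exactly:
 * the first child starts at parent.min, each child starts right after the
 * previous one ends, and the last child ends at parent.max.
 */
static bool children_span_parent(struct key_range parent,
				 const struct key_range *child, size_t nr)
{
	uint64_t expected_min = parent.min;

	for (size_t i = 0; i < nr; i++) {
		/* A wrong start means a hole or an overlap with the previous child. */
		if (child[i].min != expected_min ||
		    child[i].max < child[i].min ||
		    child[i].max > parent.max)
			return false;

		if (i + 1 == nr)
			return child[i].max == parent.max;

		/* Later children would fall outside the parent's range. */
		if (child[i].max == parent.max)
			return false;

		expected_min = child[i].max + 1;
	}

	return false;	/* an interior node with no children is not valid */
}
```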

bcachefs: Be careful about btree node splits during journal replay · 40cb2623

Kent Overstreet authored Mar 26, 2024

Don't pick a pivot that's going to be deleted.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

40cb2623

bcachefs: btree_and_journal_iter now respects trans->journal_replay_not_finished · 048f47e8

Kent Overstreet authored Mar 25, 2024

btree_and_journal_iter is now safe to use at runtime, not just during
recovery before journal keys have been freed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

048f47e8

bcachefs: fix trans->mem realloc in __bch2_trans_kmalloc · 36f9ef10

Hongbo Li authored Mar 25, 2024

The old code doesn't account for memory allocated from the mempool when
calling krealloc on trans->mem. Also, in bch2_trans_put, deciding to free
trans->mem with mempool_free based on the condition
"trans->mem_bytes == BTREE_TRANS_MEM_MAX" is inaccurate when trans->mem
was allocated by krealloc. Instead, we use used_mempool to record where
the memory came from, and realloc or free trans->mem accordingly.

Also, after krealloc fails in __bch2_trans_kmalloc, the old data should
be copied to the new buffer when it is allocated with mempool_alloc.

Fixes: 31403dca ("bcachefs: optimize __bch2_trans_get(), kill DEBUG_TRANSACTIONS")
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

36f9ef10
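
The bookkeeping that 36f9ef10 describes can be illustrated with a userspace analogue. This is only a sketch under stated assumptions: `malloc`/`realloc` stand in for the kernel heap, a single static buffer stands in for a mempool element, and `struct txn_buf`, `txn_buf_grow()`, `txn_buf_free()`, and the `heap_ok` failure switch are all hypothetical names, not the kernel code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define POOL_ELEM_SIZE 4096	/* hypothetical size of the one pool element */

static char pool_elem[POOL_ELEM_SIZE];	/* stand-in for a mempool element */

struct txn_buf {
	void	*mem;
	size_t	bytes;		/* capacity */
	size_t	top;		/* bytes in use */
	bool	used_pool;	/* true if mem points at pool_elem */
};

/* Grow to at least new_bytes; heap_ok=false simulates heap allocation failure. */
static int txn_buf_grow(struct txn_buf *t, size_t new_bytes, bool heap_ok)
{
	void *new_mem;

	if (new_bytes <= t->bytes)
		return 0;

	if (t->used_pool) {
		/* Pool memory must never be passed to realloc(): allocate and copy. */
		new_mem = heap_ok ? malloc(new_bytes) : NULL;
		if (!new_mem)
			return -1;
		memcpy(new_mem, t->mem, t->top);
		t->used_pool = false;	/* the pool element is released here */
	} else {
		new_mem = heap_ok ? realloc(t->mem, new_bytes) : NULL;
		if (!new_mem) {
			if (new_bytes > POOL_ELEM_SIZE)
				return -1;
			/* Fall back to the pool, carrying the old contents over. */
			new_mem = pool_elem;
			new_bytes = POOL_ELEM_SIZE;
			if (t->top)
				memcpy(new_mem, t->mem, t->top);
			free(t->mem);
			t->used_pool = true;
		}
	}

	t->mem = new_mem;
	t->bytes = new_bytes;
	return 0;
}

/* Free according to the recorded origin, not according to the buffer size. */
static void txn_buf_free(struct txn_buf *t)
{
	if (!t->used_pool)
		free(t->mem);
	t->mem = NULL;
	t->bytes = t->top = 0;
	t->used_pool = false;
}
```

The point of the explicit flag is that the buffer's size alone cannot say where it came from: a heap buffer may happen to be exactly as large as the pool element.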

bcachefs: Don't do extent merging before journal replay is finished · 57339b24

Kent Overstreet authored Mar 23, 2024

We don't normally do extent updates this early in recovery, but some of
the repair paths have to and when we do, we don't want to do anything
that requires the snapshots table.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

57339b24