Commits · 595c1e9bab7fd5512250d0e297e50a549af59b1f · Kirill Smelkov / linux

22 Oct, 2023 40 commits

Kent Overstreet authored Apr 28, 2021

There were some overflows in the time conversion functions - fix this by
converting tv_sec and tv_nsec separately. Also, set sb->time_min and
sb->time_max.

Fixes xfstests generic/258.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

595c1e9b
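
The overflow described in this commit message can be illustrated with a minimal sketch. It is not the bcachefs code: `struct ts64`, `ts_to_fs_time()`, `fs_time_to_ts()` and the `units_per_sec` parameter are hypothetical names, and the sketch assumes non-negative times and a precision that divides one second evenly. Scaling `tv_sec` to nanoseconds first would overflow a signed 64-bit value after roughly 292 years of seconds, so the two fields are scaled separately.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical stand-in for the kernel's struct timespec64. */
struct ts64 {
	int64_t tv_sec;
	long    tv_nsec;	/* 0 .. NSEC_PER_SEC - 1 */
};

/*
 * Convert to filesystem time units, where one second holds `units_per_sec`
 * units (assumed to divide NSEC_PER_SEC evenly).
 *
 * tv_sec is never multiplied by NSEC_PER_SEC, so the intermediate values
 * stay far below the int64_t limit for any realistic timestamp.
 */
static int64_t ts_to_fs_time(struct ts64 ts, int64_t units_per_sec)
{
	int64_t ns_per_unit = NSEC_PER_SEC / units_per_sec;

	return ts.tv_sec * units_per_sec + ts.tv_nsec / ns_per_unit;
}

/* Inverse conversion, again handling seconds and nanoseconds separately. */
static struct ts64 fs_time_to_ts(int64_t t, int64_t units_per_sec)
{
	int64_t ns_per_unit = NSEC_PER_SEC / units_per_sec;
	struct ts64 ts;

	ts.tv_sec  = t / units_per_sec;
	ts.tv_nsec = (long)((t % units_per_sec) * ns_per_unit);
	return ts;
}
```

With millisecond precision (`units_per_sec = 1000`), a timestamp of 20,000,000,000 seconds converts without overflow, whereas converting it to nanoseconds first would not fit in 64 bits.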

bcachefs: Add a tracepoint for when we block on journal reclaim · 4f6dad46

Kent Overstreet authored Apr 29, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

4f6dad46

bcachefs: Make sure to initialize j->last_flushed · 2ce867df

Kent Overstreet authored Apr 28, 2021

If the journal reclaim thread makes it to the timeout without ever
initializing j->last_flushed, we could end up sleeping for a very long
time.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

2ce867df

bcachefs: Ensure that fpunch updates inode timestamps · 050197b1

Kent Overstreet authored Apr 28, 2021

Fixes xfstests generic/059
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

050197b1

bcachefs: Change copygc wait amount to be min of per device waits · d4b44223

Kent Overstreet authored Apr 27, 2021

We're seeing a filesystem get stuck when all devices but one have no
more reclaimable buckets - because the copygc wait amount is currently
filesystem wide.

This patch should fix that, possibly at the expense of running too
much when only one or a few devices are full and the rebalance thread
needs to move data around.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

d4b44223
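
A rough sketch of the difference between a filesystem-wide wait and the minimum of per-device waits follows. The struct, field and function names are hypothetical, and the way each per-device wait is derived is not shown in the commit message, so it is simply taken as an input here. The filesystem-wide figure is modelled as a sum purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device state: how much more IO to wait for on this device. */
struct dev_state {
	uint64_t copygc_wait;
};

/* Illustrative aggregate: one device with no wait left is hidden by the others. */
static uint64_t copygc_wait_fs_wide(const struct dev_state *devs, size_t nr)
{
	uint64_t wait = 0;

	for (size_t i = 0; i < nr; i++)
		wait += devs[i].copygc_wait;
	return wait;
}

/* Minimum of the per-device waits: copygc wakes as soon as any device needs it. */
static uint64_t copygc_wait_min(const struct dev_state *devs, size_t nr)
{
	uint64_t wait = UINT64_MAX;

	for (size_t i = 0; i < nr; i++)
		if (devs[i].copygc_wait < wait)
			wait = devs[i].copygc_wait;
	return wait;
}
```

For three devices with waits of 4096, 0 and 8192, the aggregate is still 12288 while the minimum is 0, which matches the trade-off the message describes: copygc runs sooner, possibly more often than strictly needed.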

bcachefs: Change bch2_btree_key_cache_count() to exclude dirty keys · baa65029

Kent Overstreet authored Apr 27, 2021

We're seeing livelocks that appear to be due to
bch2_btree_key_cache_scan repeatedly scanning and blocking other tasks
from using the key cache lock - we probably shouldn't be reporting
objects that can't actually be freed yet.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

baa65029
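
The idea of reporting only objects that can actually be freed can be sketched as below. This is an assumption-laden illustration, not the kernel shrinker callback: `struct key_cache_counts` and `key_cache_count_freeable()` are hypothetical names, and the counters are plain longs rather than atomics.

```c
/* Hypothetical counters; the real key cache keeps its own bookkeeping. */
struct key_cache_counts {
	long nr_keys;	/* all cached keys */
	long nr_dirty;	/* keys that must be flushed before they can be freed */
};

/*
 * Report only clean keys as reclaimable. The counters may be read at
 * slightly different moments, so clamp at zero instead of going negative.
 */
static unsigned long key_cache_count_freeable(const struct key_cache_counts *c)
{
	long nr = c->nr_keys - c->nr_dirty;

	return nr > 0 ? (unsigned long)nr : 0;
}
```

Reporting zero when everything is dirty gives the caller no reason to keep scanning a cache it cannot shrink yet.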

bcachefs: Call bch2_inconsistent_error() on missing stripe/indirect extent · d99af4f1

Kent Overstreet authored Apr 29, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

d99af4f1

bcachefs: New tracepoint for bch2_trans_get_iter() · 3dea728c

Kent Overstreet authored Apr 29, 2021

Trying to debug an issue where after traverse_all() we shouldn't have to
traverse any iterators... yet we are.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

3dea728c

bcachefs: Fix __bch2_trans_get_iter() · d36cdb04

Kent Overstreet authored Apr 27, 2021

We need to also set iter->uptodate to indicate it needs to be traversed.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

d36cdb04

bcachefs: Evict btree nodes we're deleting · ceda1b9a

Kent Overstreet authored Apr 25, 2021

There was a bug that led to duplicate btree node pointers being inserted
at the wrong level. The new topology repair code can fix that, except
that the btree cache code gets confused when we read in a btree node
from the pointer that was at the wrong level. This patch evicts nodes
that we're deleting, which nicely solves the problem.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

ceda1b9a

bcachefs: New check_nlinks algorithm for snapshots · fc51b041

Kent Overstreet authored Apr 21, 2021

With snapshots, using a radix tree for the table of link counts won't
work anymore because we also need to distinguish between inodes with
different snapshot IDs. Instead, this patch builds up a sorted array of
inodes that have hardlinks that we can binary search on - taking
advantage of the fact that with inode backpointers, the check_nlinks()
pass _only_ needs to concern itself with inodes that have hardlinks now.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

fc51b041
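
A sorted array keyed on inode number and snapshot ID, searched with a binary search, might look like the sketch below. The struct layout and function names are hypothetical and it uses the userspace C library's `qsort()` and `bsearch()`. The actual patch may organise the table differently.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table entry: one per (inode, snapshot) pair that has hardlinks. */
struct nlink_entry {
	uint64_t inum;
	uint32_t snapshot;
	uint32_t count;		/* links found so far while walking dirents */
};

/* Order by inode number first, then by snapshot ID. */
static int nlink_cmp(const void *l, const void *r)
{
	const struct nlink_entry *a = l, *b = r;

	if (a->inum != b->inum)
		return a->inum < b->inum ? -1 : 1;
	if (a->snapshot != b->snapshot)
		return a->snapshot < b->snapshot ? -1 : 1;
	return 0;
}

static void nlink_table_sort(struct nlink_entry *t, size_t nr)
{
	qsort(t, nr, sizeof(*t), nlink_cmp);
}

/* Binary search; returns NULL when the pair is not in the table. */
static struct nlink_entry *nlink_lookup(struct nlink_entry *t, size_t nr,
					uint64_t inum, uint32_t snapshot)
{
	struct nlink_entry key = { .inum = inum, .snapshot = snapshot };

	return bsearch(&key, t, nr, sizeof(*t), nlink_cmp);
}
```

Unlike a radix tree indexed by inode number alone, the composite key distinguishes the same inode number in different snapshots, and the table only needs entries for inodes that have hardlinks.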

bcachefs: Fix a null ptr deref · e3b4b48c

Kent Overstreet authored Apr 24, 2021

Fix a few memory safety issues, found by asan in userspace.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

e3b4b48c

bcachefs: New and improved topology repair code · aae15aaf

Kent Overstreet authored Apr 24, 2021

This splits out btree topology repair into a separate pass, and makes
some improvements:
 - When we have to pick which of two overlapping nodes to drop keys
   from, we use the btree node header sequence number to preserve the
   newer node

 - the gc code has been changed so that it doesn't bail out if we're
   continuing/ignoring on fsck error - this way the dump tool can skip
   running the repair pass but still walk all reachable metadata

 - add a new superblock flag indicating when a filesystem is known to
   have btree topology issues, and the topology repair pass should be
   run

 - changing the start/end of a node might mean keys in that node have to
   be deleted: this patch handles that better by splitting it out into a
   separate function and running it explicitly in the topology repair
   code; previously those keys were only being dropped when the btree
   node was read in.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

aae15aaf

bcachefs: Fix key cache assertion · 4932e07e

Kent Overstreet authored Apr 24, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

4932e07e

bcachefs: New helper __bch2_btree_insert_keys_interior() · 0098376f

Kent Overstreet authored Apr 23, 2021

Consolidate common parts of bch2_btree_insert_keys_interior() and
btree_split_insert_keys() - prep work for adding some new topology
assertions.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

0098376f

bcachefs: Rewrite btree nodes with errors · bcd25dac

Kent Overstreet authored Apr 24, 2021

This patch adds self healing functionality for btree nodes - if we
notice a problem when reading a btree node, we just rewrite it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcd25dac

bcachefs: Fix bch2_verify_keylist_sorted · 8058b532

Kent Overstreet authored Apr 24, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

8058b532

bcachefs: Fix an out of bounds read · bc2e5d5c

Kent Overstreet authored Apr 24, 2021

bch2_varint_decode() can read up to 7 bytes past the end of the buffer,
which means we need to allocate slightly larger key cache buffers.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bc2e5d5c
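
Why a decoder that uses a wide load needs a padded buffer can be shown with a small sketch. The decoder below is a simplified stand-in and not bch2_varint_decode(): it always copies 8 bytes from the input and then keeps only the bytes that belong to the value. `VARINT_OVERREAD`, `alloc_key_buf()` and `decode_wide_load()` are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* An 8-byte load for a 1-byte value touches 7 bytes past the value. */
#define VARINT_OVERREAD 7

/* Allocate room for the payload plus slack for the decoder's over-read. */
static uint8_t *alloc_key_buf(size_t payload_bytes)
{
	return calloc(1, payload_bytes + VARINT_OVERREAD);
}

/*
 * Simplified decoder: reads p[0..7] unconditionally, then assembles a
 * little-endian value from the first `nbytes` (1 to 8) bytes.
 */
static uint64_t decode_wide_load(const uint8_t *p, unsigned nbytes)
{
	uint8_t tmp[8];
	uint64_t v = 0;

	memcpy(tmp, p, sizeof(tmp));	/* the wide read that needs the padding */
	for (unsigned i = 0; i < nbytes; i++)
		v |= (uint64_t)tmp[i] << (8 * i);
	return v;
}
```

Without the extra 7 bytes, decoding a one-byte value stored in the last byte of the buffer would read past the end of the allocation.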

bcachefs: Use mmap() instead of vmalloc_exec() in userspace · 65c0601a

Kent Overstreet authored Apr 24, 2021

Calling mmap() directly is much better than malloc() then mprotect(): we
end up with much less address space fragmentation.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

65c0601a

bcachefs: Don't BUG_ON() btree topology error · 537c32f5

Kent Overstreet authored Apr 23, 2021

This replaces an assertion in the btree merge path with a
bch2_inconsistent_error() - fsck will fix it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

537c32f5

bcachefs: Fix repair leading to replicas not marked · 1c8441be

Kent Overstreet authored Apr 23, 2021

bch2_check_fix_ptrs() was being called after checking if the replicas
set was marked - but repair could change which replicas set needed to be
marked. Oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

1c8441be

bcachefs: Lookup/create lost+found lazily · 58686a25

Kent Overstreet authored Apr 19, 2021

This is prep work for subvolumes - each subvolume will have its own
lost+found.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

58686a25

bcachefs: Don't BUG() in update_replicas · eb365fbc

Kent Overstreet authored Apr 21, 2021

Apparently, we have a bug where in mark and sweep while accounting for a
key, a replicas entry isn't found. Change the code to print out the key
we couldn't mark and halt instead of a BUG_ON().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

eb365fbc

bcachefs: Fix a deadlock on journal reclaim · f09517fc

Kent Overstreet authored Apr 20, 2021

Flushing the btree key cache needs to use allocation reserves - journal
reclaim depends on flushing the btree key cache for making forward
progress, and the allocator and copygc depend on journal reclaim making
forward progress.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

f09517fc

bcachefs: Update bch2_btree_verify() · 6adaac0b

Kent Overstreet authored Apr 20, 2021

bch2_btree_verify() verifies that the btree node on disk matches what we
have in memory. This patch changes it to verify every replica, and also
fixes it for interior btree nodes - there's a mem_ptr field which is
used as a scratch space and needs to be zeroed out for comparing with
what's on disk.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

6adaac0b

bcachefs: Fix two btree iterator leaks · 7b7278bb

Kent Overstreet authored Apr 20, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

7b7278bb

bcachefs: Punt btree writes to workqueue to submit · 51c804ed

Kent Overstreet authored Apr 06, 2021

We don't want to be submitting IO with btree locks held, and btree
writes usually aren't latency sensitive.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

51c804ed

bcachefs: Fix a use after free · 4d47b21c

Kent Overstreet authored Apr 19, 2021

Turns out, we weren't waiting on in flight btree writes when freeing
existing btree nodes. This led to stray btree writes overwriting newly
allocated buckets, but only started showing itself with some of the
recent allocator work and another patch to move submitting of btree
writes to workqueues.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

4d47b21c

bcachefs: Fix for btree_gc repairing interior btree ptrs · 8ce600d4

Kent Overstreet authored Apr 19, 2021

Using the normal transaction commit path to insert and journal updates
to interior nodes hadn't been done before this repair code was written,
so it's not surprising that there was a bug.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

8ce600d4

bcachefs: Preallocate trans mem in bch2_migrate_index_update() · e95d7edf

Kent Overstreet authored Apr 19, 2021

This will help avoid transaction restarts.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

e95d7edf

bcachefs: Allocator refactoring · 89baec78

Kent Overstreet authored Apr 17, 2021

This uses the kthread_wait_freezable() macro to simplify a lot of the
allocator thread code, along with cleaning up bch2_invalidate_bucket2().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

89baec78

bcachefs: Always check for invalid bkeys in trans commit path · fa272f33

Kent Overstreet authored Apr 18, 2021

We check for this prior to metadata being written, but we're seeing some
strange bugs lately, and this will help catch those closer to where they
occur.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

fa272f33

bcachefs: Check that keys are in the correct btrees · 27cc532e

Kent Overstreet authored Apr 17, 2021

We've started seeing bug reports of pointers to btree nodes being
detected in leaf nodes. This should catch that before it happens, and
it's something we should've been checking anyways.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

27cc532e

bcachefs: Handle errors in bch2_trans_mark_update() · 04903131

Kent Overstreet authored Apr 18, 2021

It's not actually the case that iterators are always checked here -
__bch2_trans_commit() checks for that after running triggers.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

04903131

bcachefs: Allocator thread doesn't need gc_lock anymore · 6ad060b0

Kent Overstreet authored Apr 16, 2021

Even with runtime gc (which currently isn't supported), runtime gc no
longer clears/recalculates the main set of bucket marks - it allocates
and calculates another set, updating the primary at the end.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

6ad060b0

bcachefs: gc shouldn't care about owned_by_allocator · dac1525d

Kent Overstreet authored Apr 16, 2021

The owned_by_allocator field is a purely in memory thing; even if/when
we bring back GC at runtime there's no need for it to be recalculating
this field. This is prep work for pulling it out of struct bucket, and
eventually getting rid of the bucket array.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

dac1525d

bcachefs: Refactor bchfs_fallocate() to not nest btree_trans on stack · 694015c2

Kent Overstreet authored Apr 16, 2021

Upcoming patch is going to disallow multiple btree_trans on the stack.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

694015c2

bcachefs: Fix an unused var warning in userspace · f02810a1

Kent Overstreet authored Apr 16, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

f02810a1

bcachefs: Fix some small memory leaks · f24fab9c

Kent Overstreet authored Apr 16, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

f24fab9c

bcachefs: Simplify fsck remove_dirent() · ae8bbb9f

Kent Overstreet authored Apr 16, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

ae8bbb9f