Commits · 6333bd2f1334595c553278c2580c1b155e319e43 · Kirill Smelkov / linux

22 Oct, 2023 40 commits

bcachefs: Improve handling of extents in bch2_trans_update() · 6333bd2f

Kent Overstreet authored Feb 20, 2021

The transaction update/commit path cares about whether it's inserting
extents or regular keys; extents require extra passes (handling of
overlapping extents) but sometimes we want to skip all that. This
clarifies things by adding a new member to btree_insert_entry specifying
whether the key being inserted is an extent, instead of overloading
BTREE_ITER_IS_EXTENTS.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Use x-macros for more enums · 2436cb9f

Kent Overstreet authored Feb 20, 2021

This patch standardizes all the enums that have associated string tables
(probably more enums should have string tables).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
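
As an illustration of the technique, here is a minimal x-macro sketch in C. The list name EXAMPLE_TYPES() and its entries are hypothetical and are not taken from the bcachefs source; the point is only that one list generates both the enum and its string table, so the two cannot drift apart.

```c
#include <stddef.h>

/* Hypothetical list: every x(name, nr) entry is expanded twice below. */
#define EXAMPLE_TYPES()			\
	x(deleted,		0)	\
	x(hash_whiteout,	1)	\
	x(extent,		2)

/* First expansion: the enum constants. */
enum example_type {
#define x(name, nr) EXAMPLE_TYPE_##name = nr,
	EXAMPLE_TYPES()
#undef x
	EXAMPLE_TYPE_NR
};

/* Second expansion: a NULL-terminated string table indexed by the enum. */
static const char * const example_type_strs[] = {
#define x(name, nr) [nr] = #name,
	EXAMPLE_TYPES()
#undef x
	NULL
};

/* Bounds-checked lookup so an unknown value never indexes past the table. */
static const char *example_type_str(unsigned t)
{
	return t < EXAMPLE_TYPE_NR ? example_type_strs[t] : "(unknown)";
}
```

Adding a new entry to the list updates the enum, the string table, and EXAMPLE_TYPE_NR in one place.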

bcachefs: Rename BTREE_ID enums for consistency with other enums · 41f8b09e

Kent Overstreet authored Feb 20, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Rename KEY_TYPE_whiteout -> KEY_TYPE_hash_whiteout · 79f88eba

Kent Overstreet authored Feb 20, 2021

Snapshots are going to need a different whiteout key type. Also, switch
to using BCH_BKEY_TYPES() to define the bkey value accessors.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: KEY_TYPE_discard is no longer used · c052cf82

Kent Overstreet authored Feb 19, 2021

KEY_TYPE_discard used to be used for extent whiteouts, but when handling
of overlapping extents was lifted above the core btree code it became
unused. This patch updates various code to reflect that.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill support for !BTREE_NODE_NEW_EXTENT_OVERWRITE() · f2785955

Kent Overstreet authored Feb 20, 2021

bcachefs has been aggressively migrating filesystems and btree nodes to
the new format for quite some time - this shouldn't affect anyone
anymore, and lets us delete a _lot_ of code. Also, it frees up
KEY_TYPE_discard for a new whiteout key type for snapshots.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix bch2_btree_cache_scan() · c043a330

Kent Overstreet authored Dec 27, 2021

It was counting nodes on the freed list that it skips - because we want
to leave a few so that btree splits don't touch the allocator - as nodes
that it touched, meaning that if it was called with <= 3 nodes to
reclaim, and those nodes were on the freed list, it would never do any
work.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
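
The accounting problem described above can be reproduced with a small userspace model. This is only a sketch under assumptions: the function and struct names are hypothetical, the lists are reduced to counts, and the reserve of 3 is inferred from the "<= 3 nodes" wording in the message rather than taken from the source.

```c
#include <stddef.h>

#define RESERVE 3	/* freed nodes deliberately left alone (assumed value) */

struct scan_result {
	size_t touched;	/* nodes charged against the scan budget */
	size_t freed;	/* nodes actually reclaimed */
};

/*
 * Walk nr_freed nodes on a "freed" list, then nr_live nodes on a "live"
 * list, stopping once `nr` nodes have been charged to the budget.
 * count_skipped != 0 models the old behaviour, where nodes that were
 * skipped to keep a reserve still counted as touched.
 */
static struct scan_result scan(size_t nr_freed, size_t nr_live,
			       size_t nr, int count_skipped)
{
	struct scan_result r = { 0, 0 };
	size_t i;

	for (i = 0; i < nr_freed && r.touched < nr; i++) {
		if (i < RESERVE) {
			/* Keep a few so btree splits avoid the allocator. */
			if (count_skipped)
				r.touched++;
			continue;
		}
		r.touched++;
		r.freed++;
	}

	for (i = 0; i < nr_live && r.touched < nr; i++) {
		r.touched++;
		r.freed++;
	}

	return r;
}
```

With three nodes on the freed list and a budget of three, the old accounting spends the whole budget on skipped nodes and reclaims nothing, while the corrected accounting moves on to the live list.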

bcachefs: Add a mempool for the replicas delta list · 9620c3ec

Kent Overstreet authored Apr 24, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add a mempool for btree_trans bump allocator · e131b6aa

Kent Overstreet authored Apr 24, 2021

This allocation is required for filesystem operations to make forward
progress, thus needs a mempool.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
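
The forward-progress argument can be sketched with a toy userspace pool. This is not the kernel mempool API; tiny_pool and its functions are hypothetical names used only to show the idea of falling back to a preallocated element when the normal allocation fails.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct tiny_pool {
	void	*reserve;	/* one element allocated up front */
	bool	in_use;		/* is the reserve currently handed out? */
	size_t	size;
};

static int tiny_pool_init(struct tiny_pool *p, size_t size)
{
	p->reserve = malloc(size);
	p->in_use = false;
	p->size = size;
	return p->reserve ? 0 : -1;
}

/* try_alloc is passed in so that allocation failure can be simulated. */
static void *tiny_pool_alloc(struct tiny_pool *p, void *(*try_alloc)(size_t))
{
	void *m = try_alloc(p->size);

	if (m)
		return m;
	if (!p->in_use) {
		p->in_use = true;
		return p->reserve;	/* guaranteed fallback */
	}
	/* A real mempool would wait here until an element is returned. */
	return NULL;
}

static void tiny_pool_free(struct tiny_pool *p, void *m)
{
	if (m == p->reserve)
		p->in_use = false;
	else
		free(m);
}

/* Simulated allocator that always fails, as under memory pressure. */
static void *always_fail(size_t size)
{
	(void) size;
	return NULL;
}
```

As long as whoever holds the reserved element eventually frees it, some caller can always make progress even when the general allocator fails.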

bcachefs: Start journal reclaim thread earlier · 9ae28f82

Kent Overstreet authored Jun 21, 2021

Especially in userspace, we sometimes run into resource exhaustion issues
with starting up threads after mark and sweep/fsck.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix for copygc getting stuck waiting for reserve to be filled · 2ee47eec

Kent Overstreet authored Apr 18, 2021

This fixes a regression from the patch
  bcachefs: Fix copygc dying on startup

In general only the allocator thread itself should be updating
ca->allocator_state; having the thread that wakes up the allocator set
it is an ugly hack, only needed to avoid racing with the copygc threads
when we're first starting up.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add allocator thread state to sysfs · bae895a5

Kent Overstreet authored Apr 18, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Rip out copygc pd controller · 51c66fed

Kent Overstreet authored Apr 17, 2021

We have a separate mechanism for ratelimiting copygc now - the pd
controller has only been causing problems.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add copygc wait to sysfs · 5bbe4bf9

Kent Overstreet authored Apr 13, 2021

Currently debugging an issue with copygc not running when it's supposed
to, and this is an obvious first step.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix copygc threshold · cb66fc5f

Kent Overstreet authored Apr 13, 2021

A while back the meaning of is_available_bucket() and thus also
bch_dev_usage->buckets_unavailable changed to include buckets that are
owned by the allocator - this was so that the stat could be persisted
like other allocation information, and wouldn't have to be regenerated
by walking each bucket at mount time.

This broke copygc, which needs to consider buckets that are reclaimable
and haven't yet been grabbed by the allocator thread and moved onto a
freelist. This patch fixes that by adding dev_buckets_reclaimable() for
copygc and the allocator thread, and cleans up some of the callers a bit.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't drop ptrs to btree nodes · 006d69aa

Kent Overstreet authored Apr 16, 2021

If a ptr gen doesn't match the bucket gen, the bucket likely doesn't
contain the data we want - but it's still possible the data we want
might not have been overwritten, and for btree node pointers we can verify
whether or not the node is the one we wanted with the node's sequence
number, so it's better to keep the pointer and try reading from it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
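
The decision described above might be modelled as follows. This is a hedged sketch: the types, the enum, and both function names are hypothetical, and dropping stale pointers to non-btree data is an assumption inferred from the commit title, not something the message states.

```c
#include <stdbool.h>
#include <stdint.h>

struct ptr	{ uint8_t gen; };	/* generation recorded in the pointer */
struct bucket	{ uint8_t gen; };	/* current generation of the bucket */

enum ptr_action {
	PTR_KEEP,		/* gens match: pointer is good */
	PTR_DROP,		/* stale, and nothing to verify against */
	PTR_KEEP_AND_VERIFY,	/* stale, but the read can be checked */
};

static enum ptr_action stale_ptr_action(struct ptr p, struct bucket b,
					bool is_btree_node)
{
	if (p.gen == b.gen)
		return PTR_KEEP;

	/*
	 * A gen mismatch means the bucket was probably reused, but a btree
	 * node read can be validated afterwards, so trying it is safe.
	 */
	return is_btree_node ? PTR_KEEP_AND_VERIFY : PTR_DROP;
}

/* After reading, compare the sequence number we expected with what we got. */
static bool node_is_wanted(uint64_t seq_in_key, uint64_t seq_on_disk)
{
	return seq_in_key == seq_on_disk;
}
```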

bcachefs: Fix a use-after-free in bch2_gc_mark_key() · d065472c

Kent Overstreet authored Apr 16, 2021

bch2_check_fix_ptrs() can update/reallocate k
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
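
The general shape of this class of bug, and of its fix, can be shown in plain C. The names below are hypothetical stand-ins, not the bcachefs functions: fix_key() plays the role of a helper that may reallocate the key, and mark_key() plays the role of a caller that must not keep using a pointer obtained before the call.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct key {
	size_t	len;
	char	data[];		/* flexible array member */
};

static struct key *key_new(size_t len)
{
	struct key *k = calloc(1, sizeof(*k) + len);

	if (k)
		k->len = len;
	return k;
}

/* May grow (and therefore move) the key; *kp is updated on success. */
static int fix_key(struct key **kp, size_t new_len)
{
	struct key *n;

	if (new_len <= (*kp)->len)
		return 0;

	n = realloc(*kp, sizeof(*n) + new_len);
	if (!n)
		return -1;

	memset(n->data + n->len, 0, new_len - n->len);
	n->len = new_len;
	*kp = n;
	return 0;
}

static size_t mark_key(struct key **kp)
{
	struct key *k;

	/*
	 * Buggy pattern: saving k = *kp before fix_key() and using k
	 * afterwards, since the old allocation may have been freed.
	 */
	if (fix_key(kp, (*kp)->len * 2))
		return 0;

	k = *kp;	/* reload the pointer after the call */
	return k->len;
}
```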

bcachefs: Bring back metadata only gc · 41e37786

Kent Overstreet authored Apr 16, 2021

This is useful for the filesystem dump debugging tool - when we're
hitting bugs we want to skip as much of the recovery process as
possible, and the dump tool only needs to know where metadata lives.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix bch2_write_super to obey very_degraded option · 98f2197d

Kent Overstreet authored Apr 09, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't fail mounts due to devices that are marked as failed · ed8269cc

Kent Overstreet authored Apr 02, 2021

If a given set of replicas is entirely on failed devices, don't fail the
mount: we will still fail the mount if we have some copies on non-failed
devices.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add a cond_resched() to the allocator thread · 1b057787

Kent Overstreet authored Apr 05, 2021

This is just a band-aid fix for now.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Use x-macros for compat feature bits · 19dd3172

Kent Overstreet authored Apr 04, 2021

This is to generate strings for them, so that we can print them out.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix some (spurious) warnings about uninitialized vars · 33a391a2

Kent Overstreet authored Mar 24, 2021

These are only complained about when building in userspace, for some
reason.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix an allocator startup race · 220d2062

Kent Overstreet authored Mar 11, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix bkey format generation for 32 bit fields · e01dacf7

Kent Overstreet authored Mar 20, 2021

Having a packed format that can represent a field larger than the
unpacked type breaks bkey_packed_successor() assertions - we need to fix
this to start using the snapshot field.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
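
One way to keep a packed field from representing more than its unpacked type is to clamp both the bit width and the offset when the format is generated. The sketch below is an assumption about the general approach, not the actual bcachefs code; field_format and make_field() are hypothetical names. A packed field here stores (value - offset) in `bits` bits.

```c
#include <stdint.h>

struct field_format {
	unsigned	bits;	/* width of the packed field */
	uint64_t	offset;	/* value subtracted before packing */
};

/*
 * Clamp a proposed (bits, offset) pair so that the largest packed value,
 * offset + 2^bits - 1, never exceeds what the unpacked type can hold.
 */
static struct field_format make_field(unsigned bits, uint64_t offset,
				      unsigned unpacked_bits)
{
	uint64_t unpacked_max = unpacked_bits == 64
		? UINT64_MAX
		: (1ULL << unpacked_bits) - 1;
	struct field_format f;

	if (bits > unpacked_bits)
		bits = unpacked_bits;

	if (bits == unpacked_bits) {
		/* Full width already covers every value: no offset needed. */
		offset = 0;
	} else {
		/* bits < unpacked_bits <= 64 here, so the shift is defined. */
		uint64_t span = (1ULL << bits) - 1;

		if (offset > unpacked_max - span)
			offset = unpacked_max - span;
	}

	f.bits = bits;
	f.offset = offset;
	return f;
}
```

For a 32 bit field, an 8 bit packed width with offset 0xFFFFFFF0 would otherwise reach past 0xFFFFFFFF; lowering the offset to 0xFFFFFF00 keeps the same values representable while capping the maximum at the unpacked limit.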

bcachefs: Scan for old btree nodes if necessary on mount · a4805d66

Kent Overstreet authored Mar 22, 2021

We dropped support for !BTREE_NODE_NEW_EXTENT_OVERWRITE but it turned
out there were people who still had filesystems with btree nodes in that
format in the wild. This adds a new compat feature that indicates we've
scanned for and rewritten nodes in the old format, and does that scan at
mount time if the option isn't set.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add code to scan for/rewrite old btree nodes · 1889ad5a

Kent Overstreet authored Mar 14, 2021

This adds a new data job type to scan for btree nodes in the old extent
format, and rewrite them.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Dump journal state when we get stuck · 85674154

Kent Overstreet authored Feb 24, 2021

We had a bug reported where the journal is failing to allocate a journal
write - this should help figure out what's going on.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix a 64 bit divide on 32 bit · 514852c2

Kent Overstreet authored Feb 20, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't use inode btree key cache in fsck code · fe38b720

Kent Overstreet authored Mar 07, 2021

We had a cache coherency bug with the btree key cache in the fsck code -
this fixes fsck to be consistent about not using it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't call into journal reclaim when we're not supposed to · bcdb4b97

Kent Overstreet authored Mar 07, 2021

This was causing a deadlock when btree_update_nodes_written() invokes
journal reclaim because of the btree cache being too dirty.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Create allocator threads when allocating filesystem · 59a74051

Kent Overstreet authored Mar 05, 2021

We're seeing failures to mount because of a failure to start the
allocator threads, which currently happens fairly late in the mount
process, after walking all metadata, and kthread_create() fails if
something has tried to kill the mount process, which is probably not
what we want.

This patch avoids this issue by creating, but not starting, the
allocator threads when we preallocate all of our other in memory data
structures.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix for bch2_btree_node_get_noiter() returning -ENOMEM · 18a7b972

Kent Overstreet authored Feb 23, 2021

bch2_btree_node_get_noiter() isn't used from the btree iterator code,
which retries with the btree node cache cannibalize lock held on
-ENOMEM, so we should do it ourselves if necessary.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add error message for some allocation failures · dab9ef0d

Kent Overstreet authored Feb 23, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Extents may now cross btree node boundaries · 8042b5b7

Kent Overstreet authored Feb 10, 2021

When snapshots arrive, we won't necessarily be able to arbitrarily split
existing extents - when we need to split an existing extent, we'll have to check
if the extent was overwritten in child snapshots and if so emit a
whiteout for the split in the child snapshot.

Because extents couldn't span btree nodes previously, journal replay
would sometimes have to split existing extents. That's no good anymore,
but fortunately since extent handling has already been lifted above most
of the btree code there's no real need for that rule anymore.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: iter->real_pos · 7e1a3aa9

Kent Overstreet authored Feb 11, 2021

We need to differentiate between the search position of a btree
iterator, vs. what it actually points at (what we found). This matters
for extents, where iter->pos will typically be the start of the key we
found and iter->real_pos will be the end of the key we found (which soon
won't necessarily be in the same btree node!) and it will also matter
for snapshots.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Ensure btree iterators are traversed in bch2_trans_commit() · 9f631dc1

Kent Overstreet authored Mar 09, 2021

The upcoming patch to allow extents to span btree nodes will require
this... and this assertion seems to be popping, and it's not a very good
assertion anyways.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Drop invalid stripe ptrs in fsck · 0507962f

Kent Overstreet authored Feb 17, 2021

More repair code, now that we can repair extents during initial gc.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix unnecessary read amplification when allocating ec stripes · 0ef837a0

Robbie Litchfield authored Feb 10, 2021

When allocating an erasure coding stripe, bcachefs will always reuse any
partial stripes before reserving a new stripe. This causes unnecessary
read amplification when preparing a stripe for writing. This patch changes
bcachefs to always reserve new stripes first, only relying on stripe reuse
when copygc needs more time to empty buckets from existing stripes.
Signed-off-by: Robbie Litchfield <blam.kiwi@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
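
The ordering change described above amounts to a small policy decision, sketched here with hypothetical names. The two boolean inputs are assumptions standing in for whatever the real code checks; only the preference order comes from the message.

```c
#include <stdbool.h>

enum stripe_source {
	STRIPE_NONE,	/* nothing available right now */
	STRIPE_NEW,	/* freshly reserved stripe: no existing data to read */
	STRIPE_REUSED,	/* partial stripe: existing blocks must be read first */
};

/*
 * Prefer a new stripe; fall back to reusing a partial one only when a
 * new stripe cannot be reserved (for example while copygc is still
 * emptying buckets).
 */
static enum stripe_source pick_stripe(bool can_reserve_new, bool have_partial)
{
	if (can_reserve_new)
		return STRIPE_NEW;
	if (have_partial)
		return STRIPE_REUSED;
	return STRIPE_NONE;
}
```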

bcachefs: Fsck fixes · 2bb748a6

Kent Overstreet authored Feb 12, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
