Commits · 5ea037d03cabc219f5b2ccd72b7a33fa036c9bfc · Kirill Smelkov / linux

22 Oct, 2023 40 commits

bcachefs: Assert that we're not trying to flush journal seq in the future · 5ea037d0

Kent Overstreet authored Feb 10, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix bch2_btree_iter_peek_prev() · 3d495595

Kent Overstreet authored Feb 07, 2021

This makes bch2_btree_iter_peek_prev() and bch2_btree_iter_prev()
consistent with peek() and next(), w.r.t. iter->pos.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_iter_advance_pos() · 434094be

Kent Overstreet authored Feb 07, 2021

This adds a new common helper for advancing past the last key returned
by peek().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill bch2_btree_iter_set_pos_same_leaf() · 792e2c4c

Kent Overstreet authored Feb 07, 2021

The only reason we were keeping this around was for
BTREE_INSERT_NOUNLOCK semantics - if bch2_btree_iter_set_pos() advances
to the next leaf node, it'll drop the lock on the node that we just
inserted to.

But we don't rely on BTREE_INSERT_NOUNLOCK semantics for the extents
btree, just the inodes btree, and if we do need it for the extents btree
in the future we can do it more cleanly by cloning the iterator - this
lets us delete some special cases in the btree iterator code, which is
complicated enough as it is.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Simplify btree_iter_(next|prev)_leaf() · 2b2c1a89

Kent Overstreet authored Feb 07, 2021

There's no good reason for these functions to not be using
bch2_btree_iter_set_pos().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix for hash_redo_key() in fsck · eaf79831

Kent Overstreet authored Feb 09, 2021

It's possible we're calling hash_redo_key() because of a duplicate key -
easiest fix for that is to just not use BCH_HASH_SET_MUST_CREATE.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add flushed_seq_ondisk to journal_debug_to_text() · 6a16ad95

Kent Overstreet authored Feb 09, 2021

Also, make the wait in bch2_journal_flush_seq() interruptible, not just
killable.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Redo checks for sufficient devices · fcb3431b

Kent Overstreet authored Feb 06, 2021

When the replicas mechanism was added, for tracking data by which drives
it's replicated on, the check for whether we have sufficient devices was
never updated to make use of it. This patch finally does that.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Run fsck if BCH_FEATURE_alloc_v2 isn't set · 5d428c7c

Kent Overstreet authored Feb 03, 2021

We're using BCH_FEATURE_alloc_v2 to also gate journalling updates to dev
usage - we don't have the code for reconstructing this from buckets
anymore, so we need to run fsck if it's not set.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fixes/improvements for journal entry reservations · 4b8f89af

Kent Overstreet authored Feb 03, 2021

This fixes some arithmetic bugs in "bcachefs: Journal updates to dev
usage" - additionally, it cleans things up by switching everything that
goes in every journal entry to the journal_entry_res mechanism.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Include device in btree IO error messages · 91f6ad6f

Kent Overstreet authored Feb 02, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Journal updates to dev usage · 180fb49d

Kent Overstreet authored Jan 21, 2021

This eliminates the need to scan every bucket to regenerate dev_usage at
mount time.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Persist 64 bit io clocks · 2abe5420

Kent Overstreet authored Jan 21, 2021

Originally, bcachefs - going back to bcache - stored, for each bucket, a
16 bit counter corresponding to how long it had been since the bucket
was read from. But, this required periodically rescaling counters on
every bucket to avoid wraparound. That wasn't an issue in bcache, where
we'd periodically rewrite the per bucket metadata all at once, but in
bcachefs we're trying to avoid having to walk every single bucket.

This patch switches to persisting 64 bit io clocks, corresponding to the
64 bit bucket timestamps introduced in the previous patch with
KEY_TYPE_alloc_v2.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
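
The wraparound problem described above can be illustrated with a minimal sketch. This is not the bcachefs code: the function names, and the idea of computing a bucket's age as `now - last_read`, are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: age of a bucket under a 16 bit clock. The
 * subtraction is effectively modulo 2^16, so any age of 65536 ticks or
 * more aliases onto a smaller one.
 */
static uint16_t bucket_age16(uint16_t now, uint16_t last_read)
{
	return (uint16_t)(now - last_read);
}

/* Same calculation with 64 bit timestamps: no aliasing in practice. */
static uint64_t bucket_age64(uint64_t now, uint64_t last_read)
{
	return now - last_read;
}
```

With the 16 bit version, a bucket last read exactly 65536 ticks ago is indistinguishable from one read just now, which is why narrow counters need periodic rescaling and wide ones do not.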

bcachefs: KEY_TYPE_alloc_v2 · 7f4e1d5d

Kent Overstreet authored Jan 22, 2021

This introduces a new version of KEY_TYPE_alloc, which uses the new
varint encoding introduced for inodes. This means we'll eventually be
able to support much larger bucket sizes (for SMR devices), and the
read/write time fields are expanded to 64 bits - which will be used in
the next patch to get rid of the periodic rescaling of those fields.

Also, for buckets that are members of erasure coded stripes, this adds
persistent fields for the index of the stripe they're members of and the
stripe redundancy. This is part of work to get rid of having to scan and
read into memory the alloc and stripes btrees at mount time.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
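
As a rough illustration of why a variable-length integer encoding lets fields grow to 64 bits without making every key larger, here is a generic LEB128-style sketch. It is an assumption-laden stand-in, not the actual bcachefs varint format, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode v, 7 bits per byte, low bits first; out needs up to 10 bytes. */
static size_t varint_encode(uint8_t *out, uint64_t v)
{
	size_t n = 0;

	while (v >= 0x80) {
		out[n++] = (uint8_t)((v & 0x7f) | 0x80); /* more bytes follow */
		v >>= 7;
	}
	out[n++] = (uint8_t)v;
	return n;
}

/* Decode into *v; returns bytes consumed, or 0 if the input is malformed. */
static size_t varint_decode(const uint8_t *in, uint64_t *v)
{
	uint64_t result = 0;
	size_t n;

	for (n = 0; n < 10; n++) {
		result |= (uint64_t)(in[n] & 0x7f) << (7 * n);
		if (!(in[n] & 0x80)) {
			*v = result;
			return n + 1;
		}
	}
	return 0;
}

/* Convenience wrappers for experimenting with the encoding. */
static size_t varint_len(uint64_t v)
{
	uint8_t buf[10];

	return varint_encode(buf, v);
}

static int varint_roundtrips(uint64_t v)
{
	uint8_t buf[10];
	uint64_t out = 0;
	size_t n = varint_encode(buf, v);

	return varint_decode(buf, &out) == n && out == v;
}
```

Small values occupy a single byte, while the full 64 bit range remains representable at the cost of a few extra bytes for large values.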

bcachefs: Add missing call to bch2_replicas_entry_sort() · 26452d1d

Kent Overstreet authored Feb 02, 2021

This fixes a bug introduced by "bcachefs: Improve diagnostics when
journal entries are missing" - devices in a replicas entry are supposed
to be sorted.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
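
The commit does not say why the device list must be sorted. One plausible reason, stated here as an assumption, is that a canonical order lets two entries naming the same devices compare equal bytewise. The sketch below uses hypothetical names and a plain `qsort()`; it is not the `bch2_replicas_entry_sort()` implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static int cmp_u8(const void *l, const void *r)
{
	const uint8_t *a = l, *b = r;

	return (int)*a - (int)*b;
}

/* Put a device list into canonical (ascending) order. */
static void devs_sort(uint8_t *devs, size_t nr)
{
	qsort(devs, nr, sizeof(devs[0]), cmp_u8);
}

/* After sorting, equality of the device sets is a simple memcmp(). */
static bool same_devs(uint8_t *a, uint8_t *b, size_t nr)
{
	devs_sort(a, nr);
	devs_sort(b, nr);
	return memcmp(a, b, nr) == 0;
}
```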

bcachefs: Add an assertion to check for journal writes to same location · a28bd48a

Kent Overstreet authored Jan 29, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add an option for metadata_target · d042b040

Kent Overstreet authored Jan 29, 2021

Also, make journal writes obey foreground_target and metadata_target.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Repair bad data pointers · 5fc70d3a

Kent Overstreet authored Jan 27, 2021

Now that we can repair metadata during GC, we can handle bad pointers
that would trigger errors when marked, and that just need to be
dropped.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add (partial) support for fixing btree topology · a0b73c1c

Kent Overstreet authored Jan 26, 2021

When we walk the btrees during recovery, part of that is checking that
btree topology is correct: for every interior btree node, its child
nodes should exactly span the range the parent node covers.

Previously, we had checks for this, but not repair code. Now that we
have the ability to do btree updates during initial GC, this patch adds
that repair code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
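
The topology invariant stated above (child nodes exactly span the parent's range) can be sketched as a simple check. This is a simplified model, not the bcachefs check: keys are plain integers, ranges are inclusive, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a btree node's key range, inclusive. */
struct node_range {
	uint64_t min;
	uint64_t max;
};

/*
 * Returns true if the children, in order, cover the parent's range
 * exactly: no gap, no overlap, nothing outside the parent.
 */
static bool children_span_parent(struct node_range parent,
				 const struct node_range *c, size_t nr)
{
	size_t i;

	if (!nr || c[0].min != parent.min)
		return false;

	for (i = 0; i < nr; i++) {
		if (c[i].min > c[i].max)
			return false;
		/* The next child must start right after this one ends. */
		if (i + 1 < nr &&
		    (c[i].max == UINT64_MAX || c[i + 1].min != c[i].max + 1))
			return false;
	}

	return c[nr - 1].max == parent.max;
}
```

A repair pass would act on the cases where this returns false, for example by adjusting a child's recorded range; how bcachefs does so is not covered by this sketch.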

bcachefs: Add support for doing btree updates prior to journal replay · 5b593ee1

Kent Overstreet authored Jan 26, 2021

Some errors may need to be fixed in order for GC to successfully run -
walk and mark all metadata. But we can't start the allocators and do
normal btree updates until after GC has completed, and allocation
information is known to be consistent, so we need a different method of
doing btree updates.

Fortunately, we already have code for walking the btree while overlaying
keys from the journal to be replayed. This patch adds an update path
that adds keys to the list of keys to be replayed by journal replay, and
also fixes up iterators.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
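
The idea of walking the btree while overlaying keys from the journal can be pictured as a merge of two sorted lists in which the journal wins on ties. The sketch below is a simplified model under that assumption; the `struct kv` type and `overlay_merge()` are hypothetical and do not correspond to bcachefs data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key/value pair; both inputs are sorted by pos. */
struct kv {
	uint64_t pos;
	int val;
};

/*
 * Merge btree keys with pending journal keys into out, which must have
 * room for nb + nj entries. Where both have a key at the same position,
 * the journal key overrides the btree key. Returns the number of keys
 * written.
 */
static size_t overlay_merge(const struct kv *btree, size_t nb,
			    const struct kv *journal, size_t nj,
			    struct kv *out)
{
	size_t i = 0, j = 0, n = 0;

	while (i < nb || j < nj) {
		if (j == nj || (i < nb && btree[i].pos < journal[j].pos)) {
			out[n++] = btree[i++];
		} else {
			if (i < nb && btree[i].pos == journal[j].pos)
				i++; /* shadowed by the journal key */
			out[n++] = journal[j++];
		}
	}
	return n;
}
```

In this model, a pre-replay "update" amounts to inserting a key into the sorted journal list, after which every subsequent walk sees it.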

bcachefs: Add BTREE_PTR_RANGE_UPDATED · 51d2dfb8

Kent Overstreet authored Jan 26, 2021

This is so that when we discover btree topology issues, we can just
update the pointer to a btree node and signal the btree read path that
the min/max keys in the node header should be updated from the node
pointer.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Refactor checking of btree topology · a66f7989

Kent Overstreet authored Jan 26, 2021

Still a lot of work to be done here: we can't yet repair btree topology
issues, but this patch refactors things so that we have better access to
what we need in the topology checks. Next up will be figuring out a way
to do btree updates during gc, before journal replay is done.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve diagnostics when journal entries are missing · e4c3f386

Kent Overstreet authored Jan 26, 2021

There's an outstanding bug with journal entries going missing in journal
replay. This patch adds code to print out the physical location of the
journal entries surrounding the missing entry(ies), which should make
debugging easier.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix BCH_REPLICAS_MAX check · 522c25f0

Kent Overstreet authored Jan 26, 2021

Ideally, this limit will be going away in the future.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix build in userspace · 0093a50f

Kent Overstreet authored Jan 27, 2021

The userspace bch_err() macro doesn't use the filesystem argument. Could
also be fixed with a better macro.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix an assertion · 4529ae09

Kent Overstreet authored Jan 25, 2021

If we're invalidating a bucket that has cached data in it, data_type
won't be 0 - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Mark superblocks transactionally · bfcf840d

Kent Overstreet authored Jan 22, 2021

More work towards getting rid of the in memory struct bucket: this patch
adds code for marking superblock and journal buckets via the btree, and
uses it in the device add and journal resize paths.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill bch2_invalidate_bucket() · 9afc6652

Kent Overstreet authored Jan 22, 2021

This patch is working towards eventually getting rid of the in memory
struct bucket, and relying only on the btree representation.

Since bch2_invalidate_bucket() was only used for incrementing gens, not
invalidating cached data, no other counters were being changed as a side
effect - meaning it's safe for the allocator code to increment the
bucket gen directly.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Refactor dev usage · 72eab8da

Kent Overstreet authored Jan 21, 2021

This is to make it more amenable for serialization.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill metadata only gc · 079663d8

Kent Overstreet authored Jan 21, 2021

This was useful before we had transactional updates to interior btree
nodes - but now, it's just extra unneeded complexity.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Ensure __bch2_trans_commit() always calls bch2_trans_reset() · b7cf4bd7

Kent Overstreet authored Jan 21, 2021

This was leading to a very strange bug in bch2_bucket_io_time_reset(),
where we'd retry without clearing out the list of updates.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix a faulty assertion · fdbb88ac

Kent Overstreet authored Jan 21, 2021

If journal replay hasn't finished, the journal can't be empty - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Switch replicas.c allocations to GFP_KERNEL · e46b8557

Kent Overstreet authored Jan 21, 2021

We're transitioning to memalloc_nofs_save/restore instead of GFP flags
with the rest of the kernel, and GFP_NOIO was excessively strict and
causing unnecessary allocation failures - these allocations are done
with btree locks dropped.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix loopback in dio mode · b4725cc1

Kent Overstreet authored Jan 21, 2021

We had a deadlock on page_lock, because buffered reads signal completion
by unlocking the page, but the dio read path normally dirties the pages
it's reading to with set_page_dirty_lock.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Clean up bch2_extent_can_insert · ef470b48

Kent Overstreet authored Jan 20, 2021

It was using an internal btree node iterator interface, when
bch2_btree_iter_peek_slot() sufficed. We were hitting a null ptr deref
that looked like it was from the iterator not being uptodate - this will
also fix that.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix an assertion pop · a5cd80ea

Kent Overstreet authored Jan 20, 2021

There was a race: btree node writes drop their reference on journal pins
before clearing the btree_node_write_in_flight flag.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't allocate stripes at POS_MIN · 33ccd718

Kent Overstreet authored Jan 18, 2021

In the future, stripe index 0 will be a sentinel value. This patch
doesn't disallow stripes at POS_MIN yet, leaving that for when we do the
on disk format changes.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Rework allocating buckets for stripes · 6c7585b0

Kent Overstreet authored Jan 18, 2021

Allocating buckets for existing stripes was busted, in part because the
data structures were too contorted. This reworks new stripes so that we
have an array of open buckets that matches blocks in the stripe, and
it's sparse if we're reusing an existing stripe.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Verify transaction updates are sorted · f9ef45ad

Kent Overstreet authored Jan 18, 2021

A user reported a bug that implies they might not be correctly sorted;
this should help track that down.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
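
A verification of this kind can be sketched as a single pass over adjacent pairs. The ordering used below (btree ID, then position) and the `struct update` type are assumptions for illustration; the real transaction update ordering in bcachefs may involve more fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a pending transaction update. */
struct update {
	unsigned btree_id;
	uint64_t pos;
};

/* Order by btree first, then by position within the btree. */
static int update_cmp(const struct update *l, const struct update *r)
{
	if (l->btree_id != r->btree_id)
		return l->btree_id < r->btree_id ? -1 : 1;
	if (l->pos != r->pos)
		return l->pos < r->pos ? -1 : 1;
	return 0;
}

/* True if no update sorts before its predecessor. */
static bool updates_sorted(const struct update *u, size_t nr)
{
	size_t i;

	for (i = 1; i < nr; i++)
		if (update_cmp(&u[i - 1], &u[i]) > 0)
			return false;
	return true;
}
```

In kernel code a check like this would typically sit behind a debug assertion so that a misordered list is caught at the point it is built rather than when it causes damage later.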

bcachefs: Preserve stripe blockcounts on existing stripes · c6e658ee

Kent Overstreet authored Jan 17, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
