- 05 Nov, 2023 1 commit
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- 31 Oct, 2023 6 commits
Kent Overstreet authored
data_progress_list is gone - it was redundant with moving_context_list. The upcoming rebalance rewrite is going to have it using two different move_stats objects with the same moving_context, depending on whether it's scanning or using the rebalance_work btree - this patch plumbs stats around a bit differently so that will work.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
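A minimal sketch of the plumbing direction this describes; the field and helper names below are illustrative assumptions, not the exact upstream code:

struct bch_move_stats;			/* defined elsewhere in move_types.h */

struct moving_context {
	struct bch_move_stats	*stats;	/* stats for the current phase, not owned */
	/* ... rate limiting, in-flight IO lists, ... */
};

/* one context, two stats objects: swap per phase (scan vs. rebalance_work) */
static inline void moving_ctxt_set_stats(struct moving_context *ctxt,
					 struct bch_move_stats *stats)
{
	ctxt->stats = stats;
}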
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet authored
btree_trans and moving_context are used together, and having the moving_context own the transaction object reduces some plumbing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
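Extending the sketch above, the ownership shift could look like this (helper signatures assumed; bch2_trans_get()/bch2_trans_put() are the heap-allocation helpers from the 22 Oct series below):

struct moving_context {
	struct btree_trans	*trans;		/* now owned by the context */
	/* ... */
};

void bch2_moving_ctxt_init(struct moving_context *ctxt, struct bch_fs *c)
{
	memset(ctxt, 0, sizeof(*ctxt));
	ctxt->trans = bch2_trans_get(c);	/* context allocates the transaction */
}

void bch2_moving_ctxt_exit(struct moving_context *ctxt)
{
	bch2_trans_put(ctxt->trans);		/* ...and releases it on exit */
	ctxt->trans = NULL;
}

Callers that used to pass both a btree_trans and a moving_context around can now pass just the context.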
Kent Overstreet authored
Prep work for the new rebalance code - we need a few helpers exported.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet authored
The data move path now correctly picks IO options when inodes in different snapshots have different options applied.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet authored
Since we can run with unknown btree IDs, we can't directly index btree IDs into fixed size arrays.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- 22 Oct, 2023 33 commits
Kent Overstreet authored
- fix a few uninitialized return values
- return a proper error code in lookup_lostfound()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet authored
We're using more stack than we'd like in a number of functions, and btree_trans is the biggest object that we stack allocate. But we have to do a heap allocation to initialize it anyways, so there's no real downside to heap allocating the entire thing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
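A before/after sketch of a call site, assuming get/put style helpers around the heap-allocated transaction:

/* before: btree_trans lived on the caller's stack */
void example_before(struct bch_fs *c)
{
	struct btree_trans trans;

	bch2_trans_init(&trans, c, 0, 0);
	/* ... btree operations ... */
	bch2_trans_exit(&trans);
}

/* after: the whole object is heap allocated behind get/put helpers */
void example_after(struct bch_fs *c)
{
	struct btree_trans *trans = bch2_trans_get(c);

	/* ... btree operations ... */
	bch2_trans_put(trans);
}

The stack footprint of every function that opens a transaction shrinks by sizeof(struct btree_trans).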
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet authored
More reorganization, this splits up io.c into
- io_read.c
- io_misc.c - fallocate, fpunch, truncate
- io_write.c
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet authored
Print more information out about moving contexts - fold in the output of the redundant bch2_data_jobs_to_text(), and also include information relevant to whether move_data() should be blocked.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet authored
We need to allow filesystems with metadata from newer versions to be mountable and usable by older versions. This patch enables us to roll out new btrees without a new major version number; we can now handle btree roots for unknown btree types. The unknown btree roots will be retained, and fsck (including backpointers) will check them, the same as other btree types. We add a dynamic array for the extra, unknown btree roots, in addition to the fixed size btree root array, and add new helpers for looking up btree roots.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
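A hypothetical lookup helper in the spirit of that change - known btree IDs index the fixed array, unknown IDs fall back to the dynamic array (field names are assumptions):

static inline struct btree_root *bch2_btree_id_root(struct bch_fs *c, unsigned id)
{
	if (likely(id < BTREE_ID_NR))
		return &c->btree_roots_known[id];

	BUG_ON(id - BTREE_ID_NR >= c->btree_roots_extra.nr);
	return &c->btree_roots_extra.data[id - BTREE_ID_NR];
}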
Kent Overstreet authored
Add two new helpers for printing error messages with __func__ and bch2_err_str():
- bch_err_fn
- bch_err_msg
Also kill the old error strings in the recovery path, which were causing us to incorrectly report memory allocation failures - they're not needed anymore.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
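Illustrative definitions only - the upstream macros may differ - but the shape is a thin wrapper over the existing bch_err() printk helper:

#define bch_err_fn(_c, _ret)						\
	bch_err(_c, "%s(): error %s", __func__, bch2_err_str(_ret))

#define bch_err_msg(_c, _ret, _msg)					\
	bch_err(_c, "%s(): " _msg ", error %s", __func__, bch2_err_str(_ret))

A failing call site can then report itself with just bch_err_fn(c, ret), no hand-written function name.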
Kent Overstreet authored
As with previous conversions, replace -ENOENT uses with more informative private error codes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
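The pattern, with a made-up code name - private error codes sit above the standard errno range and unwrap to the errno they refine, so existing -ENOENT checks keep working while bch2_err_str() prints the precise cause:

/* before */
return -ENOENT;

/* after: which lookup failed is now visible in error messages */
return -BCH_ERR_ENOENT_no_such_subvolume;	/* hypothetical code name */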
Kent Overstreet authored
This deletes a bch2_trans_unlock() call from __bch2_move_data(). It was redundant; bch2_move_extent() has the correct unlock call, and it was buggy because when move_extent calls bch2_extent_drop_ptrs() we don't want the transaction to be unlocked yet - this fixes a btree_iter.c assertion. Fixes https://github.com/koverstreet/bcachefs/issues/511.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet authored
It's safe to call bch2_trans_update with a k/v pair where the value hasn't been filled out, as long as the key part has been and the value is filled out by transaction commit time. This patch folds the bch2_trans_update() call into bch2_bkey_make_mut(), eliminating a bit of boilerplate.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
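A before/after sketch of a call site; the exact argument lists here are assumptions:

/* before: make the mutable copy, then queue the update by hand */
u = bch2_bkey_make_mut(trans, k);
ret = PTR_ERR_OR_ZERO(u) ?:
	bch2_trans_update(trans, &iter, u, 0);

/* after: bch2_bkey_make_mut() takes the iterator and queues the update itself */
u = bch2_bkey_make_mut(trans, &iter, &k, 0);
ret = PTR_ERR_OR_ZERO(u);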
Kent Overstreet authored
With backpointers, it's now impossible for bch2_evacuate_bucket() to be completely reliable: it can race with an extent being partially overwritten or split, which needs a new write buffer flush for the backpointer to be seen. This shouldn't be a real issue in practice; the previous patch added a new tracepoint so we'll be able to see more easily if it is.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet authored
Move path tracepoints now include the key being moved. Also, add new tracepoints for the start of move_extent, and evacuate_bucket.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet authored
We don't store backpointers in alloc keys anymore, since we gained the btree write buffer. This patch drops support for backpointers in alloc keys, and revs the on disk format version so that we know a fsck is required.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet authored
This adds a flags param to bch2_backpointer_get_key() so that we can pass BTREE_ITER_INTENT, since ec_stripe_update_extent() is updating the extent immediately.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet authored
We were going into an infinite loop when printing out backpointers, due to never incrementing bp_offset - whoops. Also limit the number of backpointers we print to 10; this is debug code and we only need to print a sample, not all of them.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
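Reconstructed shape of the fixed loop, assuming a bch2_get_next_backpointer()-style iterator (names and signature approximate):

for (nr = 0; nr < 10; nr++) {	/* debug output: a sample of ten is plenty */
	ret = bch2_get_next_backpointer(trans, bucket, gen, &bp_offset, &bp, 0);
	if (ret || bp_offset == U64_MAX)
		break;

	bch2_backpointer_to_text(out, &bp);
	prt_newline(out);

	bp_offset++;	/* the missing increment behind the infinite loop */
}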
Kent Overstreet authored
This should help with excessive 'would deadlock' transaction restarts.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet authored
This is a bit awkward: we're passing around a btree_trans, but we're not in a context where transaction restarts are handled - we should try to come up with a better way to denote situations like this.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet authored
This implements a new shutdown path for erasure coding, which is needed for the upcoming BCH_WRITE_WAIT_FOR_EC write path. The process is:
- Cancel new stripes being built up
- Close out/cancel open buckets on write points or the partial list that are for stripes
- Shutdown rebalance/copygc
- Then wait for in flight new stripes to finish
With BCH_WRITE_WAIT_FOR_EC, move ops will be waiting on stripes to fill up before they complete; the new ec shutdown path is needed for shutting down copygc/rebalance without deadlocking.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
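A rough outline of that sequence as one teardown function; every name below is a stand-in assumption for the real entry points:

static void example_ec_shutdown(struct bch_fs *c)
{
	/* 1: cancel stripes still being built up */
	ec_stripes_cancel_pending(c);

	/* 2: close out/cancel stripe open buckets on write points/partial list */
	ec_open_buckets_stop(c);

	/* 3: shut down the movers that may be waiting on stripes... */
	bch2_rebalance_stop(c);
	bch2_copygc_stop(c);

	/* 4: ...then wait for in-flight new stripes to finish */
	wait_event(c->ec_stripe_new_wait, list_empty(&c->ec_stripe_new_list));
}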
Kent Overstreet authored
This also adds bch2_write_op_to_text(): now we can see outstanding moves, useful for debugging shutdown with the upcoming BCH_WRITE_WAIT_FOR_EC and likely for other things in the future.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet authored
The copygc code itself now calls this when all moves from a given bucket are complete.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet authored
This improves copygc pipelining across multiple buckets: we now track each in flight bucket we're evacuating, with separate moving_contexts. This means that whereas previously we had to wait for outstanding moves to complete to ensure we didn't try to evacuate the same bucket twice, we can now just check buckets we want to evacuate against the pending list. This also means we can run the verify_bucket_evacuated() check without killing pipelining - meaning it can now always be enabled, not just on debug builds. This is going to be important for the upcoming erasure coding work, where moving IOs that are being erasure coded will now skip the initial replication step; instead the IOs will wait on the stripe to complete.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
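A sketch of the duplicate-evacuation check this enables; the struct and list names are assumptions:

struct bucket_in_flight {
	struct list_head	list;
	struct bpos		bucket;
};

/* skip a bucket if some moving_context is already evacuating it */
static bool bucket_being_evacuated(struct list_head *pending, struct bpos bucket)
{
	struct bucket_in_flight *i;

	list_for_each_entry(i, pending, list)
		if (bpos_eq(i->bucket, bucket))
			return true;
	return false;
}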
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet authored
Now that we have much more efficient updates to the LRU btree, this patch adds a new LRU that indexes buckets by fragmentation. This means copygc no longer has to scan every bucket to find buckets that need to be evacuated. Changes:
- A new field in bch_alloc_v4, fragmentation_lru - this corresponds to the bucket's position in the fragmentation LRU. We add a new field for this instead of calculating it as needed because we may make the fragmentation LRU optional; this field indicates whether a bucket is on the fragmentation LRU. Also, zoned devices will introduce variable bucket sizes; explicitly recording the LRU position will be safer for them.
- A new copygc path for using the fragmentation LRU instead of scanning every bucket and building up an in-memory heap.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
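One plausible way to compute the recorded LRU position, assuming the convention that 0 means "not on the fragmentation LRU" (the exact scaling is illustrative):

static u64 alloc_lru_idx_fragmentation(struct bch_alloc_v4 a, struct bch_dev *ca)
{
	/* only partially full user-data buckets are copygc candidates */
	if (a.data_type != BCH_DATA_user || !a.dirty_sectors)
		return 0;

	/* scale dirty sectors by bucket size so emptier buckets sort first */
	return div_u64((u64) a.dirty_sectors * (1ULL << 31), ca->mi.bucket_size);
}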
Kent Overstreet authored
This fixes an incorrectly handled transaction restart.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet authored
The recent nocow locking rework introduced a deadlock in the data move path: the new nocow locking scheme uses a hash table with a fixed size array for chaining, meaning on hash collision we may have to wait for other locks to be released before we can lock a bucket. And since the data move path needs to submit writes from the same thread that's taking nocow locks and submitting reads, this introduces a deadlock. This shouldn't happen often in practice, but since the data move path can keep large numbers of IOs in flight simultaneously, it's something we have to handle. This patch makes move_ctxt_wait_event() available to bch2_data_update_init() and uses it when appropriate, which is our normal solution to this kind of thing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
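The gist, as a hedged fragment - rather than blocking outright on a contended bucket lock, bch2_data_update_init() retries the trylock from inside the move context's event loop, so in-flight IOs can drain and release locks (names approximate):

/* wait until the nocow bucket lock is free or our in-flight IO drains */
move_ctxt_wait_event(ctxt,
	(locked = bch2_bucket_nocow_trylock(&c->nocow_locks, bucket, 0)) ||
	list_empty(&ctxt->ios));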
Kent Overstreet authored
This adds support for nocow mode, where we do writes in-place when possible. Patch components:
- New boolean filesystem and inode option, nocow: note that when nocow is enabled, data checksumming and compression are implicitly disabled
- To prevent in-place writes from racing with data moves (data_update.c) or bucket reuse (i.e. a bucket being reused and re-allocated while a nocow write is in flight), we have a new locking mechanism. Buckets can be locked for either data update or data move, using a fixed size hash table of two_state_shared locks. We don't have any chaining, meaning updates and moves to different buckets that hash to the same lock will wait unnecessarily - we'll want to watch for this becoming an issue.
- The allocator path also needs to check for in-place writes in flight to a given bucket before giving it out: thus we add another counter to bucket_alloc_state so we can track this.
- Fsync now may need to issue cache flushes to block devices instead of flushing the journal. We add a device bitmask to bch_inode_info, ei_devs_need_flush, which tracks devices that need to have flushes issued - note that this will lead to unnecessary flushes when other codepaths have already issued flushes, we may want to replace this with a sequence number.
- New nocow write path: look up extents, and if they're writable write to them - otherwise fall back to the normal COW write path.
XXX: switch to sequence numbers instead of bitmask for devs needing journal flush
XXX: ei_quota_lock being a mutex means bch2_nocow_write_done() needs to run in process context - see if we can improve this
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
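A minimal sketch of the fixed size lock table, built on the two_state_shared lock primitive mentioned above; table size and hash choice are illustrative:

#define BUCKET_NOCOW_LOCKS_BITS	10
#define BUCKET_NOCOW_LOCKS	(1U << BUCKET_NOCOW_LOCKS_BITS)

struct bucket_nocow_lock_table {
	two_state_lock_t	l[BUCKET_NOCOW_LOCKS];
};

static inline two_state_lock_t *
bucket_nocow_lock(struct bucket_nocow_lock_table *t, u64 dev_bucket)
{
	unsigned h = hash_64(dev_bucket, BUCKET_NOCOW_LOCKS_BITS);

	/* no chaining: buckets that collide simply share a lock */
	return t->l + (h & (BUCKET_NOCOW_LOCKS - 1));
}

The two states map to the two users: any number of nocow writes can hold a bucket's lock in one state while data moves are excluded, and vice versa.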
Kent Overstreet authored
The data update path requires special support for unwritten extents - we still need to be able to move them, but there's no need to read or write anything. This patch adds a new error code to tell bch2_move_extent() that we're short circuiting the read, and adds bch2_update_unwritten_extent() to create a reservation then call __bch2_data_update_index_update().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
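Approximate call-site shape in the move path (the argument list is elided; only the two function names are taken from the description above, and the error code name is a placeholder):

ret = bch2_data_update_init(/* trans, ctxt, &io->write, opts, k, ... */);
if (ret == -BCH_ERR_unwritten_extent_update) {
	/* unwritten: skip the read entirely, just reserve and update the index */
	bch2_update_unwritten_extent(trans, &io->write);
	ret = 0;
}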
Kent Overstreet authored
The btree key cache mainly helps with lock contention, at the cost of additional memory overhead. During some fsck passes the memory overhead really matters, but fsck is single threaded so lock contention is not an issue - so skipping the key cache during fsck will help with performance.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet authored
Previously, copygc needed to walk the entire extents & reflink btrees to find extents that needed to be moved. Now that we have backpointers, this patch implements bch2_evacuate_bucket() in the move code, which copygc now uses for evacuating mostly empty buckets. Also, thanks to the new backpointers code, copygc can now move btree nodes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet authored
This adds a debug mode where we split up the c->writes refcount into distinct refcounts for every codepath that takes a reference, and adds sysfs code to print the value of each ref. This will make it easier to debug shutdown hangs due to refcount leaks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
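A sketch of the debug plumbing, with an illustrative subset of refs (the real enumeration would cover every codepath that takes c->writes):

#ifdef BCH_WRITE_REF_DEBUG
enum bch_write_ref {
	BCH_WRITE_REF_trans,
	BCH_WRITE_REF_journal_reclaim,
	BCH_WRITE_REF_move,
	BCH_WRITE_REF_NR,
};

/* one counter per codepath instead of a single shared refcount */
static inline void bch2_write_ref_get(struct bch_fs *c, enum bch_write_ref ref)
{
	atomic_long_inc(&c->writes[ref]);
}

static inline void bch2_write_ref_put(struct bch_fs *c, enum bch_write_ref ref)
{
	atomic_long_dec(&c->writes[ref]);
}
#endif

The sysfs hook then just walks the enum printing each counter, so a refcount leak during shutdown names the codepath that leaked it.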