Commits · f2b542ba42a8b35d9dc43f5eab9791fea76bfd3a · Kirill Smelkov / linux

22 Oct, 2023 40 commits

bcachefs: Go RW before check_alloc_info() · f2b542ba

Kent Overstreet authored 2 years ago

It's possible to do btree updates before going RW by adding them to the
list of updates for journal replay to do, but this is limited by what
fits in RAM. This patch switches the second alloc info phase to run
after going RW - btree_gc has already ensured the alloc btree itself is
correct - and tweaks the allocation path to deal with the potential
small inconsistencies.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

f2b542ba

bcachefs: Better inlining in core write path · 393a1f68

Kent Overstreet authored 2 years ago

Provide inline versions of some allocation functions
 - bch2_alloc_sectors_done_inlined()
 - bch2_alloc_sectors_append_ptrs_inlined()

and use them in the core IO path.

Also, inline bch2_extent_update_i_size_sectors() and
bch2_bkey_append_ptr().

In the core write path, function call overhead matters - every function
call is a jump to a new location and a potential cache miss.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

393a1f68

bcachefs: Better inlining for bch2_alloc_to_v4_mut · 19a614d2

Kent Overstreet authored 1 year ago

This splits the slowpath out into its own function, and inlines
bch2_alloc_to_v4_mut into bch2_trans_start_alloc_update(), the main place
it's called.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

19a614d2

bcachefs: Fix bch2_bucket_alloc_early() · db36c147

Kent Overstreet authored 2 years ago

We were incorrectly retrying after a transaction restart.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

db36c147

bcachefs: Convert EAGAIN errors to private error codes · 87ced107

Kent Overstreet authored 2 years ago

More error code cleanup, for better error messages and debuggability.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

87ced107

bcachefs: Convert EROFS errors to private error codes · 858536c7

Kent Overstreet authored 2 years ago

More error code improvements - this gets us more useful error messages.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

858536c7

bcachefs: New bpos_cmp(), bkey_cmp() replacements · e88a75eb

Kent Overstreet authored 2 years ago

This patch introduces
 - bpos_eq()
 - bpos_lt()
 - bpos_le()
 - bpos_gt()
 - bpos_ge()

and equivalent replacements for bkey_cmp().

Looking at the generated assembly these could probably be improved
further, but we already see a significant code size improvement with
this patch.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

e88a75eb
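
The helpers listed in e88a75eb can be pictured with a minimal sketch. It assumes a position made of inode, offset and snapshot fields compared in that order; the struct layout here is a stand-in for illustration, not the actual bcachefs definition. The point is that each predicate returns a plain boolean instead of a three-way result that the caller then compares against zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a btree key position; field names are assumptions. */
struct bpos {
	uint64_t inode;
	uint64_t offset;
	uint32_t snapshot;
};

/* Three-way comparison, in the style of the helper being replaced. */
static inline int bpos_cmp(struct bpos l, struct bpos r)
{
	if (l.inode != r.inode)
		return l.inode < r.inode ? -1 : 1;
	if (l.offset != r.offset)
		return l.offset < r.offset ? -1 : 1;
	if (l.snapshot != r.snapshot)
		return l.snapshot < r.snapshot ? -1 : 1;
	return 0;
}

/* Equality needs no ordering at all, just field-wise comparison. */
static inline bool bpos_eq(struct bpos l, struct bpos r)
{
	return l.inode == r.inode &&
	       l.offset == r.offset &&
	       l.snapshot == r.snapshot;
}

/* Lexicographic less-than over (inode, offset, snapshot). */
static inline bool bpos_lt(struct bpos l, struct bpos r)
{
	if (l.inode != r.inode)
		return l.inode < r.inode;
	if (l.offset != r.offset)
		return l.offset < r.offset;
	return l.snapshot < r.snapshot;
}

/* The remaining predicates follow from bpos_lt(). */
static inline bool bpos_le(struct bpos l, struct bpos r) { return !bpos_lt(r, l); }
static inline bool bpos_gt(struct bpos l, struct bpos r) { return bpos_lt(r, l); }
static inline bool bpos_ge(struct bpos l, struct bpos r) { return !bpos_lt(l, r); }
```

A call site that used to read `bpos_cmp(a, b) < 0` would then read `bpos_lt(a, b)`.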

bcachefs: Kill bch2_alloc_sectors_start() · 07de1803

Kent Overstreet authored 2 years ago

Only used in one place, just inline it there.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

07de1803

bcachefs: Assorted checkpatch fixes · 3e3e02e6

Kent Overstreet authored 2 years ago

checkpatch.pl gives lots of warnings that we don't want - suggested
ignore list:

 ASSIGN_IN_IF
 UNSPECIFIED_INT	- bcachefs coding style prefers single token type names
 NEW_TYPEDEFS		- typedefs are occasionally good
 FUNCTION_ARGUMENTS	- we prefer to look at functions in .c files
			  (hopefully with docbook documentation), not .h
			  file prototypes
 MULTISTATEMENT_MACRO_USE_DO_WHILE
			- we have _many_ x-macros and other macros where
			  we can't do this
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

3e3e02e6

bcachefs: Optimize bch2_dev_usage_read() · ed80c569

Kent Overstreet authored 2 years ago

 - add bch2_dev_usage_read_fast(), which doesn't return by value -
   bch_dev_usage is big enough that we don't want the silent memcpy
 - tweak the allocation path to only call bch2_dev_usage_read() once per
   bucket allocated
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

ed80c569
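
As a rough illustration of the first point in ed80c569: a large struct returned by value is copied into the caller's storage, while an out-parameter variant fills a buffer the caller already owns. The struct layout and the demo_ names below are assumptions made for the sketch, not the real bch_dev_usage.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_NR_TYPES 16	/* assumed size; the real struct differs */

/* Hypothetical usage struct, large enough that copies are not free. */
struct demo_dev_usage {
	uint64_t buckets[DEMO_NR_TYPES];
	uint64_t sectors[DEMO_NR_TYPES];
};

struct demo_dev {
	struct demo_dev_usage usage;
};

/* Out-parameter variant: writes straight into caller-provided storage. */
static inline void demo_dev_usage_read_fast(const struct demo_dev *ca,
					    struct demo_dev_usage *out)
{
	memcpy(out, &ca->usage, sizeof(*out));
}

/* By-value variant, kept as a thin wrapper around the fast one. */
static inline struct demo_dev_usage demo_dev_usage_read(const struct demo_dev *ca)
{
	struct demo_dev_usage ret;

	demo_dev_usage_read_fast(ca, &ret);
	return ret;
}
```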

bcachefs: bucket_alloc_fail tracepoint should only fire when we have to block · adf16c6d

Kent Overstreet authored 2 years ago

We don't want to fire the bucket_alloc_fail tracepoint on transaction
restart, when we can retry immediately - only when the allocation
actually has to block.

Also, switch from sched_clock() to local_clock(), as we've been doing
elsewhere.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

adf16c6d

bcachefs: Don't quash error in bch2_bucket_alloc_set_trans() · 943f9946

Kent Overstreet authored 2 years ago

We were incorrectly returning -BCH_ERR_insufficient_devices when we'd
received a different error from bch2_bucket_alloc_trans(), which
(erroneously) turns into -EROFS further up the call chain.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

943f9946

bcachefs: bucket_alloc_state · ae10fe01

Kent Overstreet authored 2 years ago

This refactoring puts our various allocation path counters into a
dedicated struct - the upcoming nocow patch is going to add another
counter.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

ae10fe01

bcachefs: Improve bucket_alloc tracepoint · 68b6cd19

Kent Overstreet authored 2 years ago

It now includes more info - whether the bucket was for metadata or data
- and is now called in the same place as the bucket_alloc_fail
tracepoint.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

68b6cd19

bcachefs: Add private error codes for ENOSPC · 098ef98d

Kent Overstreet authored 2 years ago

Continuing the saga of introducing private dedicated error codes for
each error path, this patch converts ENOSPC to error codes that are
subtypes of ENOSPC. We've recently had a test failure where we got
-ENOSPC where we shouldn't have, and didn't have enough information to
tell where it came from, so this patch will solve that problem.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

098ef98d

bcachefs: Add persistent counters for all tracepoints · 674cfc26

Kent Overstreet authored 2 years ago

Also, do some reorganizing/renaming, convert atomic counters in bch_fs
to persistent counters, and add a few missing counters.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

674cfc26

bcachefs: bch2_bucket_alloc_trans_early -> for_each_btree_key_norestart · db346e71

Kent Overstreet authored 2 years ago

Nested btree transactions require special care, and an upcoming patch is
going to add assertions to that effect. We don't want to be using them
unnecessarily, so this patch switches bch2_bucket_alloc_trans_early() to
not handle transaction restarts.

This patch also adds a cursor so that on transaction restart we can
continue scanning from where the previous search for an empty bucket
left off.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

db346e71
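
The cursor idea in db346e71 can be sketched in isolation: the scan itself does not retry, it records how far it got and returns, and the caller's retry picks up from the saved position. Everything here is hypothetical (the demo_ names, the boolean in_use array standing in for the btree, and the budget parameter that simulates a transaction restart).

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_RESTART	(-1)	/* stands in for a transaction-restart error */
#define DEMO_NONE	(-2)	/* no empty bucket found */

/*
 * Scan buckets [*cursor, n) for one that is not in use.
 * budget is the number of buckets we may examine before a simulated restart.
 * Returns the bucket number, DEMO_RESTART, or DEMO_NONE.
 */
int64_t demo_find_empty(const bool *in_use, uint64_t n,
			uint64_t *cursor, unsigned budget)
{
	for (uint64_t b = *cursor; b < n; b++) {
		if (!budget--) {
			*cursor = b;	/* remember where to resume */
			return DEMO_RESTART;
		}
		if (!in_use[b]) {
			*cursor = b + 1;
			return (int64_t) b;
		}
	}

	*cursor = n;
	return DEMO_NONE;
}
```

A caller would loop while the result is DEMO_RESTART, passing the same cursor each time, so buckets already examined are not rescanned.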

bcachefs: EINTR -> BCH_ERR_transaction_restart · 549d173c

Kent Overstreet authored 2 years ago

Now that we have error codes, with subtypes, we can switch to our own
error code for transaction restarts - and even better, a distinct error
code for each transaction restart reason: clearer code and better
debugging.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

549d173c

bcachefs: Prevent a btree iter overflow in alloc path · 90cecb92

Kent Overstreet authored 2 years ago

In bch2_bucket_alloc_trans(), we're iterating over buckets - but not
directly with an iterator, since we're iterating over the freespace
btree.

This means that we need to clear iter->path->preserve, otherwise we'll
end up retaining a btree_path for every alloc key we touched - which is
not what we want here.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

90cecb92

bcachefs: Improved errcodes · 615f867c

Kent Overstreet authored 2 years ago

Instead of overloading standard error codes (EINTR/EAGAIN), and defining
short lists of error codes in multiple places that potentially end up
overlapping & conflicting, we're now going to have one master list of
error codes.

Error codes are defined with an x-macro: thus we also have
bch2_err_str() now.

Also, error codes have a class field. Now, instead of checking for
errors with ==, code should use bch2_err_matches(), which returns true
if the error is equal to or a sub-error of the error class.

This means we can define unique errors for every source location where
an error is generated, which will help improve our error messages.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

615f867c
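
A minimal sketch of the scheme described in 615f867c, assuming a two-column x-macro of (class, name) pairs. The demo_ identifiers, the numeric base and the specific error names are invented for illustration; only the overall shape (one master list, a generated string table, and a matches() helper that walks up the class chain) follows the commit message.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical master list: X(class, name). A class of 0 means "no parent". */
#define DEMO_ERRCODES()							\
	X(ENOSPC,			nospc_bucket_alloc)		\
	X(ENOSPC,			nospc_disk_reservation)		\
	X(0,				transaction_restart)		\
	X(DEMO_ERR_transaction_restart,	transaction_restart_relock)

enum demo_errcode {
	DEMO_ERR_START = 2048,	/* assumed base, above the standard errno range */
#define X(class, err) DEMO_ERR_##err,
	DEMO_ERRCODES()
#undef X
	DEMO_ERR_MAX
};

/* String table generated from the same list. */
static const char * const demo_errstr[] = {
#define X(class, err) [DEMO_ERR_##err - DEMO_ERR_START - 1] = #err,
	DEMO_ERRCODES()
#undef X
};

/* Parent (class) of each private error code. */
static const unsigned demo_err_parent[] = {
#define X(class, err) [DEMO_ERR_##err - DEMO_ERR_START - 1] = class,
	DEMO_ERRCODES()
#undef X
};

const char *demo_err_str(int err)
{
	err = err < 0 ? -err : err;
	if (err > DEMO_ERR_START && err < DEMO_ERR_MAX)
		return demo_errstr[err - DEMO_ERR_START - 1];
	return "(standard errno)";
}

/* True if err equals class, or is a sub-error of class at any depth. */
bool demo_err_matches(int err, int class)
{
	err = err < 0 ? -err : err;
	class = class < 0 ? -class : class;

	if (!err || !class)
		return false;

	while (err > DEMO_ERR_START && err < DEMO_ERR_MAX && err != class)
		err = (int) demo_err_parent[err - DEMO_ERR_START - 1];

	return err == class;
}
```

With this shape, a caller that only cares about "out of space" checks `demo_err_matches(ret, ENOSPC)`, while the log message can still name the exact source of the error.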

bcachefs: Improve bucket_alloc_fail tracepoint · 8ef98313

Kent Overstreet authored 2 years ago

We should be printing the number of free buckets, not just the number of
available buckets.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

8ef98313

bcachefs: Split out dev_buckets_free() · 30f0349d

Kent Overstreet authored 2 years ago

Previously, dev_buckets_available() only counted buckets that are
eligible to be allocated right now - i.e. buckets that don't have cached
data, or need discard, or need gc gens, etc.

But most users of this function want to know how many buckets are
eligible to be allocated from without moving data around - copygc,
allocator striping, which means we should be including cached data
buckets etc.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

30f0349d

bcachefs: Printbuf rework · 401ec4db

Kent Overstreet authored 1 year ago

This converts bcachefs to the modern printbuf interface/implementation,
synced with the version to be submitted upstream.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

401ec4db

bcachefs: Improve bch2_open_buckets_to_text() · 3518e6fa

Kent Overstreet authored 2 years ago

This patch updates bch2_open_buckets_to_text() to include the device and
bucket the open_bucket owns.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

3518e6fa

bcachefs: Fold bucket_state in to BCH_DATA_TYPES() · 822835ff

Kent Overstreet authored 2 years ago

Previously, we were missing accounting for buckets in need_gc_gens and
need_discard states. This matters because buckets in those states need
other btree operations done before they can be used, so they can't be
counted when checking the current number of free buckets against the
allocation watermark.

Also, we weren't directly counting free buckets at all. Now, data type 0
== BCH_DATA_free, and free buckets are counted; this means we can get
rid of the separate (poorly defined) count of unavailable buckets.

This is a new on disk format version, with upgrade and fsck required for
the accounting changes.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

822835ff

bcachefs: Kill allocator threads & freelists · f25d8215

Kent Overstreet authored 3 years ago

Now that we have new persistent data structures for the allocator, this
patch converts the allocator to use them.

Now, foreground bucket allocation uses the freespace btree to find
buckets to allocate, instead of popping buckets off the freelist.

The background allocator threads are no longer needed and are deleted,
as well as the allocator freelists. Now we only need background tasks
for invalidating buckets containing cached data (when we are low on
empty buckets), and for issuing discards.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

f25d8215

bcachefs: Run btree updates after write out of write_point · b17d3cec

Kent Overstreet authored 2 years ago

In the write path, after the write to the block device(s) completes we
have to punt to process context to do the btree update.

Instead of using the work item embedded in op->cl, this patch switches
to a per write-point work item. This helps with two different issues:

 - lock contention: btree updates to the same writepoint will (usually)
   be updating the same alloc keys
 - context switch overhead: when we're bottlenecked on btree updates,
   having a thread (running out of a work item) checking the write point
   for completed ops is cheaper than queueing up a new work item and
   waking up a kworker.

In an arbitrary benchmark, 4k random writes with fio running inside a
VM, this patch resulted in a 10% improvement in total iops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

b17d3cec

bcachefs: x-macroize alloc_reserve enum · 3e154711

Kent Overstreet authored 2 years ago

This makes an array of strings available, like our other enums.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

3e154711

bcachefs: Kill verify_not_stale() · fcf01959

Kent Overstreet authored 2 years ago

This is ancient code that's more effectively checked in other places
now.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

fcf01959

bcachefs: New in-memory array for bucket gens · a7860877

Kent Overstreet authored 3 years ago

The main in-memory bucket array is going away, but we'll still need to
keep bucket generations in memory, at least for now - ptr_stale() needs
to be an efficient operation.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

a7860877
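
To show why a compact in-memory generation array keeps a staleness check cheap, here is a small sketch. It assumes 8-bit generation numbers that wrap around and a pointer that records the generation it was created with; the demo_ names and the struct are hypothetical and only approximate what ptr_stale() needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical compact array: one 8-bit generation per bucket. */
struct demo_bucket_gens {
	uint64_t	nbuckets;
	const uint8_t	*gens;
};

/* How far generation a is ahead of b, treating the 8-bit counter as circular. */
static inline uint8_t demo_gen_after(uint8_t a, uint8_t b)
{
	int8_t d = (int8_t)(uint8_t)(a - b);

	return d > 0 ? (uint8_t) d : 0;
}

/* A pointer is stale once the bucket's generation has moved past the pointer's. */
static inline bool demo_ptr_stale(const struct demo_bucket_gens *g,
				  uint64_t bucket, uint8_t ptr_gen)
{
	assert(bucket < g->nbuckets);
	return demo_gen_after(g->gens[bucket], ptr_gen) != 0;
}
```

The check is one array load and one subtraction, which is the property the commit message wants to keep once the larger bucket array is gone.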

bcachefs: Put open_buckets in a hashtable · 9ddffaf8

Kent Overstreet authored 3 years ago

This is so that the copygc code doesn't have to refer to
bucket_mark.owned_by_allocator - assisting in getting rid of the in
memory bucket array.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

9ddffaf8
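
One way to picture a hash table of open buckets: each open_bucket stores its device and bucket number directly, and the table chains entries by array index so a lookup never has to convert a sector to a bucket. This is a sketch under assumptions; the demo_ names, table sizes and hash function are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_OPEN_BUCKETS	64
#define DEMO_HASH_BITS		6
#define DEMO_HASH_SIZE		(1U << DEMO_HASH_BITS)

/* Hypothetical open bucket: refers to (dev, bucket) directly. */
struct demo_open_bucket {
	uint8_t		dev;
	uint64_t	bucket;
	uint16_t	hash_next;	/* index of next entry in chain, 0 = end */
};

struct demo_allocator {
	struct demo_open_bucket	ob[DEMO_OPEN_BUCKETS];	/* slot 0 is reserved as "none" */
	uint16_t		hash[DEMO_HASH_SIZE];	/* chain heads, 0 = empty */
};

/* Multiplicative hash of (dev, bucket); keeps the top DEMO_HASH_BITS bits. */
static unsigned demo_hash(uint8_t dev, uint64_t bucket)
{
	uint64_t key = (bucket << 8) | dev;

	return (unsigned) ((key * 0x9E3779B97F4A7C15ULL) >> (64 - DEMO_HASH_BITS));
}

/* Record that open bucket slot idx now owns (dev, bucket). */
void demo_open_bucket_add(struct demo_allocator *c, uint16_t idx,
			  uint8_t dev, uint64_t bucket)
{
	unsigned h = demo_hash(dev, bucket);

	assert(idx > 0 && idx < DEMO_OPEN_BUCKETS);
	c->ob[idx].dev		= dev;
	c->ob[idx].bucket	= bucket;
	c->ob[idx].hash_next	= c->hash[h];
	c->hash[h]		= idx;
}

/* The question copygc needs answered: is this bucket currently open? */
bool demo_bucket_is_open(const struct demo_allocator *c,
			 uint8_t dev, uint64_t bucket)
{
	for (uint16_t i = c->hash[demo_hash(dev, bucket)]; i; i = c->ob[i].hash_next)
		if (c->ob[i].dev == dev && c->ob[i].bucket == bucket)
			return true;
	return false;
}
```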

bcachefs: Refactor open_bucket code · abe19d45

Kent Overstreet authored 3 years ago

Prep work for adding a hash table of open buckets - instead of embedding
a bch_extent_ptr, we need to refer to the bucket directly so that we're
not calling sector_to_bucket() in the hash table lookup code, which has
an expensive divide.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

abe19d45

bcachefs: bch2_alloc_sectors_append_ptrs() now takes cached flag · 57af63b2

Kent Overstreet authored 3 years ago

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

57af63b2

bcachefs: Rewrite bch2_bucket_alloc_new_fs() · 09943313

Kent Overstreet authored 3 years ago

This changes bch2_bucket_alloc_new_fs() to a simple bump allocator that
doesn't need to use the in memory bucket array, part of a larger patch
series to entirely get rid of the in memory bucket array, except for
gc/fsck.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

09943313
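
A bump allocator of the kind described in 09943313 needs only a single index per device. The sketch below is an assumption-laden illustration: the demo_ struct and field names are hypothetical, and the optional bitmap models the buckets_nouse check mentioned in the following entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device state for allocating while creating a new filesystem. */
struct demo_new_fs_dev {
	uint64_t		first_bucket;	/* first bucket usable for allocation */
	uint64_t		nbuckets;	/* total buckets on the device */
	uint64_t		next_idx;	/* bump pointer: next bucket to try */
	const unsigned long	*buckets_nouse;	/* optional bitmap of buckets to skip */
};

static bool demo_bucket_nouse(const struct demo_new_fs_dev *ca, uint64_t b)
{
	const uint64_t bits = 8 * sizeof(unsigned long);

	if (!ca->buckets_nouse)
		return false;
	return ((ca->buckets_nouse[b / bits] >> (b % bits)) & 1UL) != 0;
}

/* Returns the allocated bucket number, or -1 once the device is exhausted. */
int64_t demo_bucket_alloc_new_fs(struct demo_new_fs_dev *ca)
{
	assert(ca->first_bucket <= ca->nbuckets);

	if (ca->next_idx < ca->first_bucket)
		ca->next_idx = ca->first_bucket;

	while (ca->next_idx < ca->nbuckets) {
		uint64_t b = ca->next_idx++;	/* bump, never look back */

		if (!demo_bucket_nouse(ca, b))
			return (int64_t) b;
	}

	return -1;
}
```

Nothing is ever freed in this model, which is what lets it avoid any per-bucket in-memory state.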

bcachefs: Make sure bch2_bucket_alloc_new_fs() obeys buckets_nouse · 6be1b6d9

Kent Overstreet authored 3 years ago

This fixes the filesystem migrate tool.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

6be1b6d9

bcachefs: Convert bucket_alloc_ret to negative error codes · fc6c01e2

Kent Overstreet authored 3 years ago

Start a new header, errcode.h, for bcachefs-private error codes - more
error codes will be converted later.

This patch just converts bucket_alloc_ret so that they can be mixed with
standard error codes and passed as ERR_PTR errors - the ec.c code was
doing this already, but incorrectly.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

fc6c01e2

bcachefs: Allocator refactoring · 89baec78

Kent Overstreet authored 3 years ago

This uses the kthread_wait_freezable() macro to simplify a lot of the
allocator thread code, along with cleaning up bch2_invalidate_bucket2().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

89baec78

bcachefs: gc shouldn't care about owned_by_allocator · dac1525d

Kent Overstreet authored 3 years ago

The owned_by_allocator field is purely an in-memory thing; even if/when
we bring back GC at runtime, there's no need for it to be recalculating
this field. This is prep work for pulling it out of struct bucket, and
eventually getting rid of the bucket array.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

dac1525d

bcachefs: Fix an RCU splat · 3e07a730

Kent Overstreet authored 3 years ago

Writepoints are never deallocated so the rcu_read_lock() isn't really
needed, but we are doing lockless list traversal.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

3e07a730

bcachefs: Fix copygc threshold · cb66fc5f

Kent Overstreet authored 3 years ago

A while back the meaning of is_available_bucket() and thus also
bch_dev_usage->buckets_unavailable changed to include buckets that are
owned by the allocator - this was so that the stat could be persisted
like other allocation information, and wouldn't have to be regenerated
by walking each bucket at mount time.

This broke copygc, which needs to consider buckets that are reclaimable
and haven't yet been grabbed by the allocator thread and moved onto a
freelist. This patch fixes that by adding dev_buckets_reclaimable() for
copygc and the allocator thread, and cleans up some of the callers a bit.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

cb66fc5f