Commit 89baec78 authored by Kent Overstreet, committed by Kent Overstreet

bcachefs: Allocator refactoring

This uses the kthread_wait_freezable() macro to simplify a lot of the
allocator thread code, along with cleaning up bch2_invalidate_bucket2().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
parent fa272f33
// SPDX-License-Identifier: GPL-2.0
/*
* Primary bucket allocation code
*
* Copyright 2012 Google, Inc.
*
* Allocation in bcache is done in terms of buckets:
*
* Each bucket has associated an 8 bit gen; this gen corresponds to the gen in
* btree pointers - they must match for the pointer to be considered valid.
*
* Thus (assuming a bucket has no dirty data or metadata in it) we can reuse a
* bucket simply by incrementing its gen.
*
* The gens (along with the priorities; it's really the gens are important but
* the code is named as if it's the priorities) are written in an arbitrary list
* of buckets on disk, with a pointer to them in the journal header.
*
* When we invalidate a bucket, we have to write its new gen to disk and wait
* for that write to complete before we use it - otherwise after a crash we
* could have pointers that appeared to be good but pointed to data that had
* been overwritten.
*
* Since the gens and priorities are all stored contiguously on disk, we can
* batch this up: We fill up the free_inc list with freshly invalidated buckets,
* call prio_write(), and when prio_write() finishes we pull buckets off the
* free_inc list and optionally discard them.
*
* free_inc isn't the only freelist - if it was, we'd often have to sleep while
* priorities and gens were being written before we could allocate. c->free is a
* smaller freelist, and buckets on that list are always ready to be used.
*
* If we've got discards enabled, that happens when a bucket moves from the
* free_inc list to the free list.
*
* It's important to ensure that gens don't wrap around - with respect to
* either the oldest gen in the btree or the gen on disk. This is quite
* difficult to do in practice, but we explicitly guard against it anyways - if
* a bucket is in danger of wrapping around we simply skip invalidating it that
* time around, and we garbage collect or rewrite the priorities sooner than we
* would have otherwise.
* Foreground allocator code: allocate buckets from freelist, and allocate in
* sector granularity from writepoints.
*
* bch2_bucket_alloc() allocates a single bucket from a specific device.
*
* bch2_bucket_alloc_set() allocates one or more buckets from different devices
* in a given filesystem.
*
* invalidate_buckets() drives all the processes described above. It's called
* from bch2_bucket_alloc() and a few other places that need to make sure free
* buckets are ready.
*
* invalidate_buckets_(lru|fifo)() find buckets that are available to be
* invalidated, and then invalidate them and stick them on the free_inc list -
* in either lru or fifo order.
*/
#include "bcachefs.h"
......
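The generation scheme described in the header comment above can be illustrated with a small userspace sketch. Everything here is hypothetical and simplified: the `sketch_*` names, the struct layouts, and the `SKETCH_GEN_MAX_DELTA` threshold are assumptions made for illustration and are not taken from bcachefs. It shows only the three ideas the comment states: a pointer is valid when its gen matches the bucket's gen, a bucket is reused by incrementing its gen, and invalidation is skipped when the 8 bit gen is in danger of wrapping around relative to the oldest gen still referenced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified types; not the bcachefs structures. */
struct sketch_bucket {
	uint8_t gen;		/* current generation of the bucket */
	uint8_t oldest_gen;	/* oldest gen still referenced by some pointer */
};

struct sketch_ptr {
	uint8_t gen;		/* bucket gen at the time the pointer was created */
};

/* Assumed safety margin; the real threshold is not given in the text above. */
#define SKETCH_GEN_MAX_DELTA 96

/* A pointer is only considered valid if its gen matches the bucket's gen. */
static bool sketch_ptr_valid(const struct sketch_bucket *b,
			     const struct sketch_ptr *p)
{
	return b->gen == p->gen;
}

/* Distance from the oldest referenced gen to the current gen, modulo 256. */
static uint8_t sketch_gen_delta(const struct sketch_bucket *b)
{
	return (uint8_t)(b->gen - b->oldest_gen);
}

/*
 * Reuse a bucket by bumping its gen. Returns false (and leaves the bucket
 * untouched) when another increment would risk wrapping around.
 */
static bool sketch_invalidate(struct sketch_bucket *b)
{
	if (sketch_gen_delta(b) >= SKETCH_GEN_MAX_DELTA)
		return false;
	b->gen++;
	return true;
}

/* Small self-check exercising the three rules above. */
static void sketch_gen_selfcheck(void)
{
	struct sketch_bucket b = { .gen = 5, .oldest_gen = 5 };
	struct sketch_ptr p = { .gen = 5 };

	assert(sketch_ptr_valid(&b, &p));
	assert(sketch_invalidate(&b));		/* gen becomes 6 */
	assert(!sketch_ptr_valid(&b, &p));	/* old pointer is now stale */
}
```

The sketch deliberately leaves out the on-disk side: as the comment notes, the new gen has to be written out and that write has to complete before the bucket is handed out again.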
@@ -380,24 +380,27 @@ DEFINE_EVENT(bch_fs, gc_cannot_inc_gens,
 /* Allocator */
-TRACE_EVENT(alloc_batch,
-	TP_PROTO(struct bch_dev *ca, size_t free, size_t total),
-	TP_ARGS(ca, free, total),
+TRACE_EVENT(alloc_scan,
+	TP_PROTO(struct bch_dev *ca, u64 found, u64 inc_gen, u64 inc_gen_skipped),
+	TP_ARGS(ca, found, inc_gen, inc_gen_skipped),
 	TP_STRUCT__entry(
-		__array(char, uuid, 16 )
-		__field(size_t, free )
-		__field(size_t, total )
+		__field(dev_t, dev )
+		__field(u64, found )
+		__field(u64, inc_gen )
+		__field(u64, inc_gen_skipped )
 	),
 	TP_fast_assign(
-		memcpy(__entry->uuid, ca->uuid.b, 16);
-		__entry->free = free;
-		__entry->total = total;
+		__entry->dev = ca->disk_sb.bdev->bd_dev;
+		__entry->found = found;
+		__entry->inc_gen = inc_gen;
+		__entry->inc_gen_skipped = inc_gen_skipped;
 	),
-	TP_printk("%pU free %zu total %zu",
-		__entry->uuid, __entry->free, __entry->total)
+	TP_printk("%d,%d found %llu inc_gen %llu inc_gen_skipped %llu",
+		MAJOR(__entry->dev), MINOR(__entry->dev),
+		__entry->found, __entry->inc_gen, __entry->inc_gen_skipped)
 );
 TRACE_EVENT(invalidate,
@@ -417,8 +420,10 @@ TRACE_EVENT(invalidate,
 	),
 	TP_printk("invalidated %u sectors at %d,%d sector=%llu",
-		__entry->sectors, MAJOR(__entry->dev),
-		MINOR(__entry->dev), __entry->offset)
+		__entry->sectors,
+		MAJOR(__entry->dev),
+		MINOR(__entry->dev),
+		__entry->offset)
 );
 DECLARE_EVENT_CLASS(bucket_alloc,
@@ -426,16 +431,18 @@ DECLARE_EVENT_CLASS(bucket_alloc,
 	TP_ARGS(ca, reserve),
 	TP_STRUCT__entry(
-		__array(char, uuid, 16)
-		__field(enum alloc_reserve, reserve )
+		__field(dev_t, dev )
+		__field(enum alloc_reserve, reserve )
 	),
 	TP_fast_assign(
-		memcpy(__entry->uuid, ca->uuid.b, 16);
-		__entry->reserve = reserve;
+		__entry->dev = ca->disk_sb.bdev->bd_dev;
+		__entry->reserve = reserve;
 	),
-	TP_printk("%pU reserve %d", __entry->uuid, __entry->reserve)
+	TP_printk("%d,%d reserve %d",
+		MAJOR(__entry->dev), MINOR(__entry->dev),
+		__entry->reserve)
 );
 DEFINE_EVENT(bucket_alloc, bucket_alloc,
......
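The tracepoint changes above replace the device UUID (`%pU`) with the block device's major,minor pair. As a rough userspace analogy of what the new `alloc_scan` output looks like, the sketch below formats the same fields with `snprintf()`. It is an illustration under assumptions, not kernel code: `sketch_format_alloc_scan()` is a hypothetical helper, and it uses glibc's `major()`/`minor()`/`makedev()` from `<sys/sysmacros.h>` in place of the kernel's `MAJOR()`/`MINOR()` macros, whose `dev_t` encoding differs from the userspace one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/sysmacros.h>	/* glibc: makedev(), major(), minor() */
#include <sys/types.h>

/*
 * Hypothetical helper mirroring the format string of the new alloc_scan
 * tracepoint: "<major>,<minor> found N inc_gen N inc_gen_skipped N".
 * Returns what snprintf() returns.
 */
static int sketch_format_alloc_scan(char *buf, size_t len, dev_t dev,
				    uint64_t found, uint64_t inc_gen,
				    uint64_t inc_gen_skipped)
{
	return snprintf(buf, len,
			"%d,%d found %llu inc_gen %llu inc_gen_skipped %llu",
			(int) major(dev), (int) minor(dev),
			(unsigned long long) found,
			(unsigned long long) inc_gen,
			(unsigned long long) inc_gen_skipped);
}

/* Self-check with made-up values: device 8,16 and arbitrary counters. */
static void sketch_trace_selfcheck(void)
{
	char buf[128];

	sketch_format_alloc_scan(buf, sizeof(buf), makedev(8, 16), 10, 3, 1);
	assert(strcmp(buf, "8,16 found 10 inc_gen 3 inc_gen_skipped 1") == 0);
}
```

A major,minor pair is typically easier to match against other block-layer trace output than a 16 byte UUID, which is one plausible motivation for the change; the commit message itself does not state one.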