Commit d5acbc60 authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'for-6.7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs updates from David Sterba:
 "New features:

   - raid-stripe-tree

     New tree for logical file extent mapping where the physical mapping
     may not match on multiple devices. This is now used in zoned mode
     to implement RAID0/RAID1* profiles, but can be used in non-zoned
     mode as well. The support for RAID56 is in development and will
     eventually fix the problems with the current implementation. This
     is a backward incompatible feature and has to be enabled at mkfs
     time.

   - simple quota accounting (squota)

     A simplified mode of qgroup that accounts all space on the initial
     extent owners (a subvolume), the snapshots are then cheap to create
     and delete. The deletion of snapshots in fully accounting qgroups
     is a known CPU/IO performance bottleneck.

     The squota is not suitable for the general use case but works well
     for containers where the original subvolume exists for the whole
     time. This is a backward incompatible feature as it needs extending
     some structures, but can be enabled on an existing filesystem.

   - temporary filesystem fsid (temp_fsid)

     The fsid identifies a filesystem and is hard coded in the
     structures, which disallows mounting the same fsid found on
     different devices.

     For a single device filesystem this is not strictly necessary, a
     new temporary fsid can be generated on mount e.g. after a device is
     cloned. This will be used by Steam Deck for root partition A/B
     testing, or can be used for VM root images.

  Other user visible changes:

   - filesystems with partially finished metadata_uuid conversion cannot
     be mounted anymore and the uuid fixup has to be done by btrfs-progs
     (btrfstune).

  Performance improvements:

   - reduce reservations for checksum deletions (with enabled free space
     tree by factor of 4), on a sample workload on file with many
     extents the deletion time decreased by 12%

   - make extent state merges more efficient during insertions, reduce
     rb-tree iterations (run time of critical functions reduced by 5%)

  Core changes:

   - the integrity check functionality has been removed, this was a
     debugging feature and removal does not affect other integrity
     checks like checksums or tree-checker

   - space reservation changes:

      - more efficient delayed ref reservations, this avoids building up
        too much work or overusing or exhausting the global block
        reserve in some situations

      - move delayed refs reservation to the transaction start time,
        this prevents some ENOSPC corner cases related to exhaustion of
        global reserve

      - improvements in reducing excessive reservations for block group
        items

      - adjust overcommit logic in near full situations, account for one
        more chunk to eventually allocate metadata chunk, this is mostly
        relevant for small filesystems (<10GiB)

   - single device filesystems are scanned but not registered (except
     seed devices), this allows temp_fsid to work

   - qgroup iterations do not need GFP_ATOMIC allocations anymore

   - cleanups, refactoring, reduced data structure size, function
     parameter simplifications, error handling fixes"

* tag 'for-6.7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (156 commits)
  btrfs: open code timespec64 in struct btrfs_inode
  btrfs: remove redundant log root tree index assignment during log sync
  btrfs: remove redundant initialization of variable dirty in btrfs_update_time()
  btrfs: sysfs: show temp_fsid feature
  btrfs: disable the device add feature for temp-fsid
  btrfs: disable the seed feature for temp-fsid
  btrfs: update comment for temp-fsid, fsid, and metadata_uuid
  btrfs: remove pointless empty log context list check when syncing log
  btrfs: update comment for struct btrfs_inode::lock
  btrfs: remove pointless barrier from btrfs_sync_file()
  btrfs: add and use helpers for reading and writing last_trans_committed
  btrfs: add and use helpers for reading and writing fs_info->generation
  btrfs: add and use helpers for reading and writing log_transid
  btrfs: add and use helpers for reading and writing last_log_commit
  btrfs: support cloned-device mount capability
  btrfs: add helper function find_fsid_by_disk
  btrfs: stop reserving excessive space for block group item insertions
  btrfs: stop reserving excessive space for block group item updates
  btrfs: reorder btrfs_inode to fill gaps
  btrfs: open code btrfs_ordered_inode_tree in btrfs_inode
  ...
parents 8829687a c6e8f898
......@@ -48,27 +48,6 @@ config BTRFS_FS_POSIX_ACL
If you don't know what Access Control Lists are, say N
config BTRFS_FS_CHECK_INTEGRITY
bool "Btrfs with integrity check tool compiled in (DEPRECATED)"
depends on BTRFS_FS
help
This feature has been deprecated and will be removed in 6.7.
Adds code that examines all block write requests (including
writes of the super block). The goal is to verify that the
state of the filesystem on disk is always consistent, i.e.,
after a power-loss or kernel panic event the filesystem is
in a consistent state.
If the integrity check tool is included and activated in
the mount options, plenty of kernel memory is used, and
plenty of additional CPU cycles are spent. Enabling this
functionality is not intended for normal use.
In most cases, unless you are a btrfs developer who needs
to verify the integrity of (super)-block write requests
during the run of a regression test, say N
config BTRFS_FS_RUN_SANITY_TESTS
bool "Btrfs will run sanity tests upon loading"
depends on BTRFS_FS
......
......@@ -33,10 +33,9 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
uuid-tree.o props.o free-space-tree.o tree-checker.o space-info.o \
block-rsv.o delalloc-space.o block-group.o discard.o reflink.o \
subpage.o tree-mod-log.o extent-io-tree.o fs.o messages.o bio.o \
lru_cache.o
lru_cache.o raid-stripe-tree.o
btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
btrfs-$(CONFIG_BTRFS_FS_REF_VERIFY) += ref-verify.o
btrfs-$(CONFIG_BLK_DEV_ZONED) += zoned.o
btrfs-$(CONFIG_FS_VERITY) += verity.o
......
......@@ -4,6 +4,7 @@
#define BTRFS_ACCESSORS_H
#include <linux/stddef.h>
#include <asm/unaligned.h>
struct btrfs_map_token {
struct extent_buffer *eb;
......@@ -305,6 +306,14 @@ BTRFS_SETGET_FUNCS(timespec_nsec, struct btrfs_timespec, nsec, 32);
BTRFS_SETGET_STACK_FUNCS(stack_timespec_sec, struct btrfs_timespec, sec, 64);
BTRFS_SETGET_STACK_FUNCS(stack_timespec_nsec, struct btrfs_timespec, nsec, 32);
BTRFS_SETGET_FUNCS(stripe_extent_encoding, struct btrfs_stripe_extent, encoding, 8);
BTRFS_SETGET_FUNCS(raid_stride_devid, struct btrfs_raid_stride, devid, 64);
BTRFS_SETGET_FUNCS(raid_stride_physical, struct btrfs_raid_stride, physical, 64);
BTRFS_SETGET_STACK_FUNCS(stack_stripe_extent_encoding,
struct btrfs_stripe_extent, encoding, 8);
BTRFS_SETGET_STACK_FUNCS(stack_raid_stride_devid, struct btrfs_raid_stride, devid, 64);
BTRFS_SETGET_STACK_FUNCS(stack_raid_stride_physical, struct btrfs_raid_stride, physical, 64);
/* struct btrfs_dev_extent */
BTRFS_SETGET_FUNCS(dev_extent_chunk_tree, struct btrfs_dev_extent, chunk_tree, 64);
BTRFS_SETGET_FUNCS(dev_extent_chunk_objectid, struct btrfs_dev_extent,
......@@ -349,6 +358,9 @@ BTRFS_SETGET_FUNCS(extent_data_ref_count, struct btrfs_extent_data_ref, count, 3
BTRFS_SETGET_FUNCS(shared_data_ref_count, struct btrfs_shared_data_ref, count, 32);
BTRFS_SETGET_FUNCS(extent_owner_ref_root_id, struct btrfs_extent_owner_ref,
root_id, 64);
BTRFS_SETGET_FUNCS(extent_inline_ref_type, struct btrfs_extent_inline_ref,
type, 8);
BTRFS_SETGET_FUNCS(extent_inline_ref_offset, struct btrfs_extent_inline_ref,
......@@ -365,6 +377,8 @@ static inline u32 btrfs_extent_inline_ref_size(int type)
if (type == BTRFS_EXTENT_DATA_REF_KEY)
return sizeof(struct btrfs_extent_data_ref) +
offsetof(struct btrfs_extent_inline_ref, offset);
if (type == BTRFS_EXTENT_OWNER_REF_KEY)
return sizeof(struct btrfs_extent_inline_ref);
return 0;
}
......@@ -966,6 +980,8 @@ BTRFS_SETGET_FUNCS(qgroup_status_flags, struct btrfs_qgroup_status_item,
flags, 64);
BTRFS_SETGET_FUNCS(qgroup_status_rescan, struct btrfs_qgroup_status_item,
rescan, 64);
BTRFS_SETGET_FUNCS(qgroup_status_enable_gen, struct btrfs_qgroup_status_item,
enable_gen, 64);
/* btrfs_qgroup_info_item */
BTRFS_SETGET_FUNCS(qgroup_info_generation, struct btrfs_qgroup_info_item,
......
......@@ -9,6 +9,7 @@
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/freezer.h>
#include <trace/events/btrfs.h>
#include "async-thread.h"
#include "ctree.h"
......@@ -242,7 +243,7 @@ static void run_ordered_work(struct btrfs_workqueue *wq,
break;
trace_btrfs_ordered_sched(work);
spin_unlock_irqrestore(lock, flags);
work->ordered_func(work);
work->ordered_func(work, false);
/* now take the lock again and drop our item from the list */
spin_lock_irqsave(lock, flags);
......@@ -277,7 +278,7 @@ static void run_ordered_work(struct btrfs_workqueue *wq,
* We don't want to call the ordered free functions with
* the lock held.
*/
work->ordered_free(work);
work->ordered_func(work, true);
/* NB: work must not be dereferenced past this point. */
trace_btrfs_all_work_done(wq->fs_info, work);
}
......@@ -285,7 +286,7 @@ static void run_ordered_work(struct btrfs_workqueue *wq,
spin_unlock_irqrestore(lock, flags);
if (free_self) {
self->ordered_free(self);
self->ordered_func(self, true);
/* NB: self must not be dereferenced past this point. */
trace_btrfs_all_work_done(wq->fs_info, self);
}
......@@ -300,7 +301,7 @@ static void btrfs_work_helper(struct work_struct *normal_work)
/*
* We should not touch things inside work in the following cases:
* 1) after work->func() if it has no ordered_free
* 1) after work->func() if it has no ordered_func(..., true) to free
* Since the struct is freed in work->func().
* 2) after setting WORK_DONE_BIT
* The work may be freed in other threads almost instantly.
......@@ -329,11 +330,10 @@ static void btrfs_work_helper(struct work_struct *normal_work)
}
void btrfs_init_work(struct btrfs_work *work, btrfs_func_t func,
btrfs_func_t ordered_func, btrfs_func_t ordered_free)
btrfs_ordered_func_t ordered_func)
{
work->func = func;
work->ordered_func = ordered_func;
work->ordered_free = ordered_free;
INIT_WORK(&work->normal_work, btrfs_work_helper);
INIT_LIST_HEAD(&work->ordered_list);
work->flags = 0;
......
......@@ -13,11 +13,11 @@ struct btrfs_fs_info;
struct btrfs_workqueue;
struct btrfs_work;
typedef void (*btrfs_func_t)(struct btrfs_work *arg);
typedef void (*btrfs_ordered_func_t)(struct btrfs_work *arg, bool);
struct btrfs_work {
btrfs_func_t func;
btrfs_func_t ordered_func;
btrfs_func_t ordered_free;
btrfs_ordered_func_t ordered_func;
/* Don't touch things below */
struct work_struct normal_work;
......@@ -35,7 +35,7 @@ struct btrfs_workqueue *btrfs_alloc_ordered_workqueue(
struct btrfs_fs_info *fs_info, const char *name,
unsigned int flags);
void btrfs_init_work(struct btrfs_work *work, btrfs_func_t func,
btrfs_func_t ordered_func, btrfs_func_t ordered_free);
btrfs_ordered_func_t ordered_func);
void btrfs_queue_work(struct btrfs_workqueue *wq,
struct btrfs_work *work);
void btrfs_destroy_workqueue(struct btrfs_workqueue *wq);
......
......@@ -1129,6 +1129,9 @@ static int add_inline_refs(struct btrfs_backref_walk_ctx *ctx,
count, sc, GFP_NOFS);
break;
}
case BTRFS_EXTENT_OWNER_REF_KEY:
ASSERT(btrfs_fs_incompat(ctx->fs_info, SIMPLE_QUOTA));
break;
default:
WARN_ON(1);
}
......@@ -2998,7 +3001,7 @@ int btrfs_backref_iter_next(struct btrfs_backref_iter *iter)
}
void btrfs_backref_init_cache(struct btrfs_fs_info *fs_info,
struct btrfs_backref_cache *cache, int is_reloc)
struct btrfs_backref_cache *cache, bool is_reloc)
{
int i;
......
......@@ -247,7 +247,7 @@ struct prelim_ref {
struct rb_node rbnode;
u64 root_id;
struct btrfs_key key_for_search;
int level;
u8 level;
int count;
struct extent_inode_elem *inode_list;
u64 parent;
......@@ -440,11 +440,11 @@ struct btrfs_backref_cache {
* Reloction backref cache require more info for reloc root compared
* to generic backref cache.
*/
unsigned int is_reloc;
bool is_reloc;
};
void btrfs_backref_init_cache(struct btrfs_fs_info *fs_info,
struct btrfs_backref_cache *cache, int is_reloc);
struct btrfs_backref_cache *cache, bool is_reloc);
struct btrfs_backref_node *btrfs_backref_alloc_node(
struct btrfs_backref_cache *cache, u64 bytenr, int level);
struct btrfs_backref_edge *btrfs_backref_alloc_edge(
......@@ -533,9 +533,9 @@ void btrfs_backref_cleanup_node(struct btrfs_backref_cache *cache,
void btrfs_backref_release_cache(struct btrfs_backref_cache *cache);
static inline void btrfs_backref_panic(struct btrfs_fs_info *fs_info,
u64 bytenr, int errno)
u64 bytenr, int error)
{
btrfs_panic(fs_info, errno,
btrfs_panic(fs_info, error,
"Inconsistency in backref cache found at offset %llu",
bytenr);
}
......
......@@ -10,11 +10,11 @@
#include "volumes.h"
#include "raid56.h"
#include "async-thread.h"
#include "check-integrity.h"
#include "dev-replace.h"
#include "rcu-string.h"
#include "zoned.h"
#include "file-item.h"
#include "raid-stripe-tree.h"
static struct bio_set btrfs_bioset;
static struct bio_set btrfs_clone_bioset;
......@@ -416,6 +416,9 @@ static void btrfs_orig_write_end_io(struct bio *bio)
else
bio->bi_status = BLK_STS_OK;
if (bio_op(bio) == REQ_OP_ZONE_APPEND && !bio->bi_status)
stripe->physical = bio->bi_iter.bi_sector << SECTOR_SHIFT;
btrfs_orig_bbio_end_io(bbio);
btrfs_put_bioc(bioc);
}
......@@ -427,6 +430,8 @@ static void btrfs_clone_write_end_io(struct bio *bio)
if (bio->bi_status) {
atomic_inc(&stripe->bioc->error);
btrfs_log_dev_io_error(bio, stripe->dev);
} else if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
stripe->physical = bio->bi_iter.bi_sector << SECTOR_SHIFT;
}
/* Pass on control to the original bio this one was cloned from */
......@@ -463,8 +468,6 @@ static void btrfs_submit_dev_bio(struct btrfs_device *dev, struct bio *bio)
(unsigned long)dev->bdev->bd_dev, btrfs_dev_name(dev),
dev->devid, bio->bi_iter.bi_size);
btrfsic_check_bio(bio);
if (bio->bi_opf & REQ_BTRFS_CGROUP_PUNT)
blkcg_punt_bio_submit(bio);
else
......@@ -490,6 +493,7 @@ static void btrfs_submit_mirrored_bio(struct btrfs_io_context *bioc, int dev_nr)
bio->bi_private = &bioc->stripes[dev_nr];
bio->bi_iter.bi_sector = bioc->stripes[dev_nr].physical >> SECTOR_SHIFT;
bioc->stripes[dev_nr].bioc = bioc;
bioc->size = bio->bi_iter.bi_size;
btrfs_submit_dev_bio(bioc->stripes[dev_nr].dev, bio);
}
......@@ -499,6 +503,8 @@ static void __btrfs_submit_bio(struct bio *bio, struct btrfs_io_context *bioc,
if (!bioc) {
/* Single mirror read/write fast path. */
btrfs_bio(bio)->mirror_num = mirror_num;
if (bio_op(bio) != REQ_OP_READ)
btrfs_bio(bio)->orig_physical = smap->physical;
bio->bi_iter.bi_sector = smap->physical >> SECTOR_SHIFT;
if (bio_op(bio) != REQ_OP_READ)
btrfs_bio(bio)->orig_physical = smap->physical;
......@@ -568,13 +574,20 @@ static void run_one_async_start(struct btrfs_work *work)
*
* At IO completion time the csums attached on the ordered extent record are
* inserted into the tree.
*
* If called with @do_free == true, then it will free the work struct.
*/
static void run_one_async_done(struct btrfs_work *work)
static void run_one_async_done(struct btrfs_work *work, bool do_free)
{
struct async_submit_bio *async =
container_of(work, struct async_submit_bio, work);
struct bio *bio = &async->bbio->bio;
if (do_free) {
kfree(container_of(work, struct async_submit_bio, work));
return;
}
/* If an error occurred we just want to clean up the bio and move on. */
if (bio->bi_status) {
btrfs_orig_bbio_end_io(async->bbio);
......@@ -590,11 +603,6 @@ static void run_one_async_done(struct btrfs_work *work)
__btrfs_submit_bio(bio, async->bioc, &async->smap, async->mirror_num);
}
static void run_one_async_free(struct btrfs_work *work)
{
kfree(container_of(work, struct async_submit_bio, work));
}
static bool should_async_write(struct btrfs_bio *bbio)
{
/* Submit synchronously if the checksum implementation is fast. */
......@@ -636,8 +644,7 @@ static bool btrfs_wq_submit_bio(struct btrfs_bio *bbio,
async->smap = *smap;
async->mirror_num = mirror_num;
btrfs_init_work(&async->work, run_one_async_start, run_one_async_done,
run_one_async_free);
btrfs_init_work(&async->work, run_one_async_start, run_one_async_done);
btrfs_queue_work(fs_info->workers, &async->work);
return true;
}
......@@ -657,9 +664,11 @@ static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num)
blk_status_t ret;
int error;
smap.is_scrub = !bbio->inode;
btrfs_bio_counter_inc_blocked(fs_info);
error = btrfs_map_block(fs_info, btrfs_op(bio), logical, &map_length,
&bioc, &smap, &mirror_num, 1);
&bioc, &smap, &mirror_num);
if (error) {
ret = errno_to_blk_status(error);
goto fail;
......@@ -691,6 +700,18 @@ static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num)
bio->bi_opf |= REQ_OP_ZONE_APPEND;
}
if (is_data_bbio(bbio) && bioc &&
btrfs_need_stripe_tree_update(bioc->fs_info, bioc->map_type)) {
/*
* No locking for the list update, as we only add to
* the list in the I/O submission path, and list
* iteration only happens in the completion path, which
* can't happen until after the last submission.
*/
btrfs_get_bioc(bioc);
list_add_tail(&bioc->rst_ordered_entry, &bbio->ordered->bioc_list);
}
/*
* Csum items for reloc roots have already been cloned at this
* point, so they are handled as part of the no-checksum case.
......@@ -779,8 +800,6 @@ int btrfs_repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
bio_init(&bio, smap.dev->bdev, &bvec, 1, REQ_OP_WRITE | REQ_SYNC);
bio.bi_iter.bi_sector = smap.physical >> SECTOR_SHIFT;
__bio_add_page(&bio, page, length, pg_offset);
btrfsic_check_bio(&bio);
ret = submit_bio_wait(&bio);
if (ret) {
/* try to remap that extent elsewhere? */
......
......@@ -935,7 +935,7 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, bool wait)
caching_ctl->block_group = cache;
refcount_set(&caching_ctl->count, 2);
atomic_set(&caching_ctl->progress, 0);
btrfs_init_work(&caching_ctl->work, caching_thread, NULL, NULL);
btrfs_init_work(&caching_ctl->work, caching_thread, NULL);
spin_lock(&cache->lock);
if (cache->cached != BTRFS_CACHE_NO) {
......@@ -1286,7 +1286,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
/* Once for the lookup reference */
btrfs_put_block_group(block_group);
if (remove_rsv)
btrfs_delayed_refs_rsv_release(fs_info, 1);
btrfs_dec_delayed_refs_rsv_bg_updates(fs_info);
btrfs_free_path(path);
return ret;
}
......@@ -2601,7 +2601,7 @@ static int insert_dev_extent(struct btrfs_trans_handle *trans,
btrfs_set_dev_extent_chunk_offset(leaf, extent, chunk_offset);
btrfs_set_dev_extent_length(leaf, extent, num_bytes);
btrfs_mark_buffer_dirty(leaf);
btrfs_mark_buffer_dirty(trans, leaf);
out:
btrfs_free_path(path);
return ret;
......@@ -2709,7 +2709,7 @@ void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans)
/* Already aborted the transaction if it failed. */
next:
btrfs_delayed_refs_rsv_release(fs_info, 1);
btrfs_dec_delayed_refs_rsv_bg_inserts(fs_info);
list_del_init(&block_group->bg_list);
clear_bit(BLOCK_GROUP_FLAG_NEW, &block_group->runtime_flags);
}
......@@ -2819,8 +2819,7 @@ struct btrfs_block_group *btrfs_make_block_group(struct btrfs_trans_handle *tran
#endif
list_add_tail(&cache->bg_list, &trans->new_bgs);
trans->delayed_ref_updates++;
btrfs_update_delayed_refs_rsv(trans);
btrfs_inc_delayed_refs_rsv_bg_inserts(fs_info);
set_avail_alloc_bits(fs_info, type);
return cache;
......@@ -3025,7 +3024,7 @@ static int update_block_group_item(struct btrfs_trans_handle *trans,
cache->global_root_id);
btrfs_set_stack_block_group_flags(&bgi, cache->flags);
write_extent_buffer(leaf, &bgi, bi, sizeof(bgi));
btrfs_mark_buffer_dirty(leaf);
btrfs_mark_buffer_dirty(trans, leaf);
fail:
btrfs_release_path(path);
/*
......@@ -3051,7 +3050,6 @@ static int cache_save_setup(struct btrfs_block_group *block_group,
struct btrfs_path *path)
{
struct btrfs_fs_info *fs_info = block_group->fs_info;
struct btrfs_root *root = fs_info->tree_root;
struct inode *inode = NULL;
struct extent_changeset *data_reserved = NULL;
u64 alloc_hint = 0;
......@@ -3103,7 +3101,7 @@ static int cache_save_setup(struct btrfs_block_group *block_group,
* time.
*/
BTRFS_I(inode)->generation = 0;
ret = btrfs_update_inode(trans, root, BTRFS_I(inode));
ret = btrfs_update_inode(trans, BTRFS_I(inode));
if (ret) {
/*
* So theoretically we could recover from this, simply set the
......@@ -3370,7 +3368,7 @@ int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans)
if (should_put)
btrfs_put_block_group(cache);
if (drop_reserve)
btrfs_delayed_refs_rsv_release(fs_info, 1);
btrfs_dec_delayed_refs_rsv_bg_updates(fs_info);
/*
* Avoid blocking other tasks for too long. It might even save
* us from writing caches for block groups that are going to be
......@@ -3474,8 +3472,7 @@ int btrfs_write_dirty_block_groups(struct btrfs_trans_handle *trans)
cache_save_setup(cache, trans, path);
if (!ret)
ret = btrfs_run_delayed_refs(trans,
(unsigned long) -1);
ret = btrfs_run_delayed_refs(trans, U64_MAX);
if (!ret && cache->disk_cache_state == BTRFS_DC_SETUP) {
cache->io_ctl.inode = NULL;
......@@ -3518,7 +3515,7 @@ int btrfs_write_dirty_block_groups(struct btrfs_trans_handle *trans)
/* If its not on the io list, we need to put the block group */
if (should_put)
btrfs_put_block_group(cache);
btrfs_delayed_refs_rsv_release(fs_info, 1);
btrfs_dec_delayed_refs_rsv_bg_updates(fs_info);
spin_lock(&cur_trans->dirty_bgs_lock);
}
spin_unlock(&cur_trans->dirty_bgs_lock);
......@@ -3543,12 +3540,12 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
u64 bytenr, u64 num_bytes, bool alloc)
{
struct btrfs_fs_info *info = trans->fs_info;
struct btrfs_block_group *cache = NULL;
u64 total = num_bytes;
struct btrfs_space_info *space_info;
struct btrfs_block_group *cache;
u64 old_val;
u64 byte_in_group;
bool reclaim = false;
bool bg_already_dirty = true;
int factor;
int ret = 0;
/* Block accounting for super block */
spin_lock(&info->delalloc_root_lock);
......@@ -3560,97 +3557,86 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
btrfs_set_super_bytes_used(info->super_copy, old_val);
spin_unlock(&info->delalloc_root_lock);
while (total) {
struct btrfs_space_info *space_info;
bool reclaim = false;
cache = btrfs_lookup_block_group(info, bytenr);
if (!cache) {
ret = -ENOENT;
break;
}
space_info = cache->space_info;
factor = btrfs_bg_type_to_factor(cache->flags);
cache = btrfs_lookup_block_group(info, bytenr);
if (!cache)
return -ENOENT;
/*
* If this block group has free space cache written out, we
* need to make sure to load it if we are removing space. This
* is because we need the unpinning stage to actually add the
* space back to the block group, otherwise we will leak space.
*/
if (!alloc && !btrfs_block_group_done(cache))
btrfs_cache_block_group(cache, true);
/* An extent can not span multiple block groups. */
ASSERT(bytenr + num_bytes <= cache->start + cache->length);
byte_in_group = bytenr - cache->start;
WARN_ON(byte_in_group > cache->length);
space_info = cache->space_info;
factor = btrfs_bg_type_to_factor(cache->flags);
spin_lock(&space_info->lock);
spin_lock(&cache->lock);
/*
* If this block group has free space cache written out, we need to make
* sure to load it if we are removing space. This is because we need
* the unpinning stage to actually add the space back to the block group,
* otherwise we will leak space.
*/
if (!alloc && !btrfs_block_group_done(cache))
btrfs_cache_block_group(cache, true);
if (btrfs_test_opt(info, SPACE_CACHE) &&
cache->disk_cache_state < BTRFS_DC_CLEAR)
cache->disk_cache_state = BTRFS_DC_CLEAR;
spin_lock(&space_info->lock);
spin_lock(&cache->lock);
old_val = cache->used;
num_bytes = min(total, cache->length - byte_in_group);
if (alloc) {
old_val += num_bytes;
cache->used = old_val;
cache->reserved -= num_bytes;
space_info->bytes_reserved -= num_bytes;
space_info->bytes_used += num_bytes;
space_info->disk_used += num_bytes * factor;
spin_unlock(&cache->lock);
spin_unlock(&space_info->lock);
} else {
old_val -= num_bytes;
cache->used = old_val;
cache->pinned += num_bytes;
btrfs_space_info_update_bytes_pinned(info, space_info,
num_bytes);
space_info->bytes_used -= num_bytes;
space_info->disk_used -= num_bytes * factor;
if (btrfs_test_opt(info, SPACE_CACHE) &&
cache->disk_cache_state < BTRFS_DC_CLEAR)
cache->disk_cache_state = BTRFS_DC_CLEAR;
reclaim = should_reclaim_block_group(cache, num_bytes);
old_val = cache->used;
if (alloc) {
old_val += num_bytes;
cache->used = old_val;
cache->reserved -= num_bytes;
space_info->bytes_reserved -= num_bytes;
space_info->bytes_used += num_bytes;
space_info->disk_used += num_bytes * factor;
spin_unlock(&cache->lock);
spin_unlock(&space_info->lock);
} else {
old_val -= num_bytes;
cache->used = old_val;
cache->pinned += num_bytes;
btrfs_space_info_update_bytes_pinned(info, space_info, num_bytes);
space_info->bytes_used -= num_bytes;
space_info->disk_used -= num_bytes * factor;
spin_unlock(&cache->lock);
spin_unlock(&space_info->lock);
reclaim = should_reclaim_block_group(cache, num_bytes);
set_extent_bit(&trans->transaction->pinned_extents,
bytenr, bytenr + num_bytes - 1,
EXTENT_DIRTY, NULL);
}
spin_unlock(&cache->lock);
spin_unlock(&space_info->lock);
spin_lock(&trans->transaction->dirty_bgs_lock);
if (list_empty(&cache->dirty_list)) {
list_add_tail(&cache->dirty_list,
&trans->transaction->dirty_bgs);
trans->delayed_ref_updates++;
btrfs_get_block_group(cache);
}
spin_unlock(&trans->transaction->dirty_bgs_lock);
set_extent_bit(&trans->transaction->pinned_extents, bytenr,
bytenr + num_bytes - 1, EXTENT_DIRTY, NULL);
}
/*
* No longer have used bytes in this block group, queue it for
* deletion. We do this after adding the block group to the
* dirty list to avoid races between cleaner kthread and space
* cache writeout.
*/
if (!alloc && old_val == 0) {
if (!btrfs_test_opt(info, DISCARD_ASYNC))
btrfs_mark_bg_unused(cache);
} else if (!alloc && reclaim) {
btrfs_mark_bg_to_reclaim(cache);
}
spin_lock(&trans->transaction->dirty_bgs_lock);
if (list_empty(&cache->dirty_list)) {
list_add_tail(&cache->dirty_list, &trans->transaction->dirty_bgs);
bg_already_dirty = false;
btrfs_get_block_group(cache);
}
spin_unlock(&trans->transaction->dirty_bgs_lock);
btrfs_put_block_group(cache);
total -= num_bytes;
bytenr += num_bytes;
/*
* No longer have used bytes in this block group, queue it for deletion.
* We do this after adding the block group to the dirty list to avoid
* races between cleaner kthread and space cache writeout.
*/
if (!alloc && old_val == 0) {
if (!btrfs_test_opt(info, DISCARD_ASYNC))
btrfs_mark_bg_unused(cache);
} else if (!alloc && reclaim) {
btrfs_mark_bg_to_reclaim(cache);
}
btrfs_put_block_group(cache);
/* Modified block groups are accounted for in the delayed_refs_rsv. */
btrfs_update_delayed_refs_rsv(trans);
return ret;
if (!bg_already_dirty)
btrfs_inc_delayed_refs_rsv_bg_updates(info);
return 0;
}
/*
......
......@@ -221,7 +221,8 @@ int btrfs_block_rsv_add(struct btrfs_fs_info *fs_info,
if (num_bytes == 0)
return 0;
ret = btrfs_reserve_metadata_bytes(fs_info, block_rsv, num_bytes, flush);
ret = btrfs_reserve_metadata_bytes(fs_info, block_rsv->space_info,
num_bytes, flush);
if (!ret)
btrfs_block_rsv_add_bytes(block_rsv, num_bytes, true);
......@@ -261,7 +262,8 @@ int btrfs_block_rsv_refill(struct btrfs_fs_info *fs_info,
if (!ret)
return 0;
ret = btrfs_reserve_metadata_bytes(fs_info, block_rsv, num_bytes, flush);
ret = btrfs_reserve_metadata_bytes(fs_info, block_rsv->space_info,
num_bytes, flush);
if (!ret) {
btrfs_block_rsv_add_bytes(block_rsv, num_bytes, false);
return 0;
......@@ -279,10 +281,10 @@ u64 btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
struct btrfs_block_rsv *target = NULL;
/*
* If we are the delayed_rsv then push to the global rsv, otherwise dump
* into the delayed rsv if it is not full.
* If we are a delayed block reserve then push to the global rsv,
* otherwise dump into the global delayed reserve if it is not full.
*/
if (block_rsv == delayed_rsv)
if (block_rsv->type == BTRFS_BLOCK_RSV_DELOPS)
target = global_rsv;
else if (block_rsv != global_rsv && !btrfs_block_rsv_full(delayed_rsv))
target = delayed_rsv;
......@@ -354,6 +356,11 @@ void btrfs_update_global_block_rsv(struct btrfs_fs_info *fs_info)
min_items++;
}
if (btrfs_fs_incompat(fs_info, RAID_STRIPE_TREE)) {
num_bytes += btrfs_root_used(&fs_info->stripe_root->root_item);
min_items++;
}
/*
* But we also want to reserve enough space so we can do the fallback
* global reserve for an unlink, which is an additional
......@@ -405,6 +412,7 @@ void btrfs_init_root_block_rsv(struct btrfs_root *root)
case BTRFS_EXTENT_TREE_OBJECTID:
case BTRFS_FREE_SPACE_TREE_OBJECTID:
case BTRFS_BLOCK_GROUP_TREE_OBJECTID:
case BTRFS_RAID_STRIPE_TREE_OBJECTID:
root->block_rsv = &fs_info->delayed_refs_rsv;
break;
case BTRFS_ROOT_TREE_OBJECTID:
......@@ -517,8 +525,8 @@ struct btrfs_block_rsv *btrfs_use_block_rsv(struct btrfs_trans_handle *trans,
block_rsv->type, ret);
}
try_reserve:
ret = btrfs_reserve_metadata_bytes(fs_info, block_rsv, blocksize,
BTRFS_RESERVE_NO_FLUSH);
ret = btrfs_reserve_metadata_bytes(fs_info, block_rsv->space_info,
blocksize, BTRFS_RESERVE_NO_FLUSH);
if (!ret)
return block_rsv;
/*
......@@ -539,7 +547,7 @@ struct btrfs_block_rsv *btrfs_use_block_rsv(struct btrfs_trans_handle *trans,
* one last time to force a reservation if there's enough actual space
* on disk to make the reservation.
*/
ret = btrfs_reserve_metadata_bytes(fs_info, block_rsv, blocksize,
ret = btrfs_reserve_metadata_bytes(fs_info, block_rsv->space_info, blocksize,
BTRFS_RESERVE_FLUSH_EMERGENCY);
if (!ret)
return block_rsv;
......
......@@ -8,6 +8,8 @@
#include <linux/hash.h>
#include <linux/refcount.h>
#include <linux/fscrypt.h>
#include <trace/events/btrfs.h>
#include "extent_map.h"
#include "extent_io.h"
#include "ordered-data.h"
......@@ -79,11 +81,21 @@ struct btrfs_inode {
*/
struct btrfs_key location;
/* Cached value of inode property 'compression'. */
u8 prop_compress;
/*
* Force compression on the file using the defrag ioctl, could be
* different from prop_compress and takes precedence if set.
*/
u8 defrag_compress;
/*
* Lock for counters and all fields used to determine if the inode is in
* the log or not (last_trans, last_sub_trans, last_log_commit,
* logged_trans), to access/update new_delalloc_bytes and to update the
* VFS' inode number of bytes used.
* logged_trans), to access/update delalloc_bytes, new_delalloc_bytes,
* defrag_bytes, disk_i_size, outstanding_extents, csum_bytes and to
* update the VFS' inode number of bytes used.
*/
spinlock_t lock;
......@@ -102,8 +114,18 @@ struct btrfs_inode {
/* held while logging the inode in tree-log.c */
struct mutex log_mutex;
/*
* Counters to keep track of the number of extent item's we may use due
* to delalloc and such. outstanding_extents is the number of extent
* items we think we'll end up using, and reserved_extents is the number
* of extent items we've reserved metadata for. Protected by 'lock'.
*/
unsigned outstanding_extents;
/* used to order data wrt metadata */
struct btrfs_ordered_inode_tree ordered_tree;
spinlock_t ordered_tree_lock;
struct rb_root ordered_tree;
struct rb_node *ordered_tree_last;
/* list of all the delalloc inodes in the FS. There are times we need
* to write all the delalloc pages to disk, and this list is used
......@@ -122,28 +144,31 @@ struct btrfs_inode {
u64 generation;
/*
* transid of the trans_handle that last modified this inode
* ID of the transaction handle that last modified this inode.
* Protected by 'lock'.
*/
u64 last_trans;
/*
* transid that last logged this inode
* ID of the transaction that last logged this inode.
* Protected by 'lock'.
*/
u64 logged_trans;
/*
* log transid when this inode was last modified
* Log transaction ID when this inode was last modified.
* Protected by 'lock'.
*/
int last_sub_trans;
/* a local copy of root's last_log_commit */
/* A local copy of root's last_log_commit. Protected by 'lock'. */
int last_log_commit;
union {
/*
* Total number of bytes pending delalloc, used by stat to
* calculate the real block usage of the file. This is used
* only for files.
* only for files. Protected by 'lock'.
*/
u64 delalloc_bytes;
/*
......@@ -161,7 +186,7 @@ struct btrfs_inode {
* Total number of bytes pending delalloc that fall within a file
* range that is either a hole or beyond EOF (and no prealloc extent
* exists in the range). This is always <= delalloc_bytes and this
* is used only for files.
* is used only for files. Protected by 'lock'.
*/
u64 new_delalloc_bytes;
/*
......@@ -172,15 +197,15 @@ struct btrfs_inode {
};
/*
* total number of bytes pending defrag, used by stat to check whether
* it needs COW.
* Total number of bytes pending defrag, used by stat to check whether
* it needs COW. Protected by 'lock'.
*/
u64 defrag_bytes;
/*
* the size of the file stored in the metadata on disk. data=ordered
* The size of the file stored in the metadata on disk. data=ordered
* means the in-memory i_size might be larger than the size on disk
* because not all the blocks are written yet.
* because not all the blocks are written yet. Protected by 'lock'.
*/
u64 disk_i_size;
......@@ -214,7 +239,7 @@ struct btrfs_inode {
/*
* Number of bytes outstanding that are going to need csums. This is
* used in ENOSPC accounting.
* used in ENOSPC accounting. Protected by 'lock'.
*/
u64 csum_bytes;
......@@ -223,30 +248,13 @@ struct btrfs_inode {
/* Read-only compatibility flags, upper half of inode_item::flags */
u32 ro_flags;
/*
* Counters to keep track of the number of extent item's we may use due
* to delalloc and such. outstanding_extents is the number of extent
* items we think we'll end up using, and reserved_extents is the number
* of extent items we've reserved metadata for.
*/
unsigned outstanding_extents;
struct btrfs_block_rsv block_rsv;
/*
* Cached values of inode properties
*/
unsigned prop_compress; /* per-file compression algorithm */
/*
* Force compression on the file using the defrag ioctl, could be
* different from prop_compress and takes precedence if set
*/
unsigned defrag_compress;
struct btrfs_delayed_node *delayed_node;
/* File creation time. */
struct timespec64 i_otime;
u64 i_otime_sec;
u32 i_otime_nsec;
/* Hook into fs_info->delayed_iputs */
struct list_head delayed_iput;
......@@ -387,7 +395,7 @@ static inline bool btrfs_inode_in_log(struct btrfs_inode *inode, u64 generation)
spin_lock(&inode->lock);
if (inode->logged_trans == generation &&
inode->last_sub_trans <= inode->last_log_commit &&
inode->last_sub_trans <= inode->root->last_log_commit)
inode->last_sub_trans <= btrfs_get_root_last_log_commit(inode->root))
ret = true;
spin_unlock(&inode->lock);
return ret;
......@@ -481,9 +489,9 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
struct page *page, size_t pg_offset,
u64 start, u64 end);
int btrfs_update_inode(struct btrfs_trans_handle *trans,
struct btrfs_root *root, struct btrfs_inode *inode);
struct btrfs_inode *inode);
int btrfs_update_inode_fallback(struct btrfs_trans_handle *trans,
struct btrfs_root *root, struct btrfs_inode *inode);
struct btrfs_inode *inode);
int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct btrfs_inode *inode);
int btrfs_orphan_cleanup(struct btrfs_root *root);
int btrfs_cont_expand(struct btrfs_inode *inode, loff_t oldsize, loff_t size);
......
This diff is collapsed.
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Copyright (C) STRATO AG 2011. All rights reserved.
*/
#ifndef BTRFS_CHECK_INTEGRITY_H
#define BTRFS_CHECK_INTEGRITY_H
#ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
void btrfsic_check_bio(struct bio *bio);
#else
static inline void btrfsic_check_bio(struct bio *bio) { }
#endif
int btrfsic_mount(struct btrfs_fs_info *fs_info,
struct btrfs_fs_devices *fs_devices,
int including_extent_data, u32 print_mask);
void btrfsic_unmount(struct btrfs_fs_devices *fs_devices);
#endif
......@@ -193,12 +193,12 @@ static noinline void end_compressed_writeback(const struct compressed_bio *cb)
unsigned long index = cb->start >> PAGE_SHIFT;
unsigned long end_index = (cb->start + cb->len - 1) >> PAGE_SHIFT;
struct folio_batch fbatch;
const int errno = blk_status_to_errno(cb->bbio.bio.bi_status);
const int error = blk_status_to_errno(cb->bbio.bio.bi_status);
int i;
int ret;
if (errno)
mapping_set_error(inode->i_mapping, errno);
if (error)
mapping_set_error(inode->i_mapping, error);
folio_batch_init(&fbatch);
while (index <= end_index) {
......
This diff is collapsed.
......@@ -6,37 +6,10 @@
#ifndef BTRFS_CTREE_H
#define BTRFS_CTREE_H
#include <linux/mm.h>
#include <linux/sched/signal.h>
#include <linux/highmem.h>
#include <linux/fs.h>
#include <linux/rwsem.h>
#include <linux/semaphore.h>
#include <linux/completion.h>
#include <linux/backing-dev.h>
#include <linux/wait.h>
#include <linux/slab.h>
#include <trace/events/btrfs.h>
#include <asm/unaligned.h>
#include <linux/pagemap.h>
#include <linux/btrfs.h>
#include <linux/btrfs_tree.h>
#include <linux/workqueue.h>
#include <linux/security.h>
#include <linux/sizes.h>
#include <linux/dynamic_debug.h>
#include <linux/refcount.h>
#include <linux/crc32c.h>
#include <linux/iomap.h>
#include <linux/fscrypt.h>
#include "extent-io-tree.h"
#include "extent_io.h"
#include "extent_map.h"
#include "async-thread.h"
#include "block-rsv.h"
#include "locking.h"
#include "misc.h"
#include "fs.h"
#include "accessors.h"
struct btrfs_trans_handle;
struct btrfs_transaction;
......@@ -218,10 +191,22 @@ struct btrfs_root {
atomic_t log_commit[2];
/* Used only for log trees of subvolumes, not for the log root tree */
atomic_t log_batch;
/*
* Protected by the 'log_mutex' lock but can be read without holding
* that lock to avoid unnecessary lock contention, in which case it
* should be read using btrfs_get_root_log_transid() except if it's a
* log tree in which case it can be directly accessed. Updates to this
* field should always use btrfs_set_root_log_transid(), except for log
* trees where the field can be updated directly.
*/
int log_transid;
/* No matter the commit succeeds or not*/
int log_transid_committed;
/* Just be updated when the commit succeeds. */
/*
* Just be updated when the commit succeeds. Use
* btrfs_get_root_last_log_commit() and btrfs_set_root_last_log_commit()
* to access this field.
*/
int last_log_commit;
pid_t log_start_pid;
......@@ -326,6 +311,9 @@ struct btrfs_root {
/* Used only by log trees, when logging csum items */
struct extent_io_tree log_csum_range;
/* Used in simple quotas, track root during relocation. */
u64 relocation_src_root;
#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
u64 alloc_bytenr;
#endif
......@@ -352,6 +340,26 @@ static inline u64 btrfs_root_id(const struct btrfs_root *root)
return root->root_key.objectid;
}
static inline int btrfs_get_root_log_transid(const struct btrfs_root *root)
{
return READ_ONCE(root->log_transid);
}
static inline void btrfs_set_root_log_transid(struct btrfs_root *root, int log_transid)
{
WRITE_ONCE(root->log_transid, log_transid);
}
static inline int btrfs_get_root_last_log_commit(const struct btrfs_root *root)
{
return READ_ONCE(root->last_log_commit);
}
static inline void btrfs_set_root_last_log_commit(struct btrfs_root *root, int commit_id)
{
WRITE_ONCE(root->last_log_commit, commit_id);
}
/*
* Structure that conveys information about an extent that is going to replace
* all the extents in a file range.
......@@ -470,30 +478,6 @@ static inline u32 BTRFS_MAX_XATTR_SIZE(const struct btrfs_fs_info *info)
#define BTRFS_BYTES_TO_BLKS(fs_info, bytes) \
((bytes) >> (fs_info)->sectorsize_bits)
static inline u32 btrfs_crc32c(u32 crc, const void *address, unsigned length)
{
return crc32c(crc, address, length);
}
static inline void btrfs_crc32c_final(u32 crc, u8 *result)
{
put_unaligned_le32(~crc, result);
}
static inline u64 btrfs_name_hash(const char *name, int len)
{
return crc32c((u32)~1, name, len);
}
/*
* Figure the key offset of an extended inode ref
*/
static inline u64 btrfs_extref_hash(u64 parent_objectid, const char *name,
int len)
{
return (u64) crc32c(parent_objectid, name, len);
}
static inline gfp_t btrfs_alloc_write_mask(struct address_space *mapping)
{
return mapping_gfp_constraint(mapping, ~__GFP_FS);
......@@ -513,12 +497,42 @@ int btrfs_bin_search(struct extent_buffer *eb, int first_slot,
const struct btrfs_key *key, int *slot);
int __pure btrfs_comp_cpu_keys(const struct btrfs_key *k1, const struct btrfs_key *k2);
#ifdef __LITTLE_ENDIAN
/*
* Compare two keys, on little-endian the disk order is same as CPU order and
* we can avoid the conversion.
*/
static inline int btrfs_comp_keys(const struct btrfs_disk_key *disk_key,
const struct btrfs_key *k2)
{
const struct btrfs_key *k1 = (const struct btrfs_key *)disk_key;
return btrfs_comp_cpu_keys(k1, k2);
}
#else
/* Compare two keys in a memcmp fashion. */
static inline int btrfs_comp_keys(const struct btrfs_disk_key *disk,
const struct btrfs_key *k2)
{
struct btrfs_key k1;
btrfs_disk_key_to_cpu(&k1, disk);
return btrfs_comp_cpu_keys(&k1, k2);
}
#endif
int btrfs_previous_item(struct btrfs_root *root,
struct btrfs_path *path, u64 min_objectid,
int type);
int btrfs_previous_extent_item(struct btrfs_root *root,
struct btrfs_path *path, u64 min_objectid);
void btrfs_set_item_key_safe(struct btrfs_fs_info *fs_info,
void btrfs_set_item_key_safe(struct btrfs_trans_handle *trans,
struct btrfs_path *path,
const struct btrfs_key *new_key);
struct extent_buffer *btrfs_root_node(struct btrfs_root *root);
......@@ -536,6 +550,13 @@ int btrfs_cow_block(struct btrfs_trans_handle *trans,
struct extent_buffer *parent, int parent_slot,
struct extent_buffer **cow_ret,
enum btrfs_lock_nesting nest);
int btrfs_force_cow_block(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
struct extent_buffer *buf,
struct extent_buffer *parent, int parent_slot,
struct extent_buffer **cow_ret,
u64 search_start, u64 empty_size,
enum btrfs_lock_nesting nest);
int btrfs_copy_root(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
struct extent_buffer *buf,
......@@ -545,8 +566,10 @@ int btrfs_block_can_be_shared(struct btrfs_trans_handle *trans,
struct extent_buffer *buf);
int btrfs_del_ptr(struct btrfs_trans_handle *trans, struct btrfs_root *root,
struct btrfs_path *path, int level, int slot);
void btrfs_extend_item(struct btrfs_path *path, u32 data_size);
void btrfs_truncate_item(struct btrfs_path *path, u32 new_size, int from_end);
void btrfs_extend_item(struct btrfs_trans_handle *trans,
struct btrfs_path *path, u32 data_size);
void btrfs_truncate_item(struct btrfs_trans_handle *trans,
struct btrfs_path *path, u32 new_size, int from_end);
int btrfs_split_item(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
struct btrfs_path *path,
......@@ -567,10 +590,6 @@ int btrfs_search_slot_for_read(struct btrfs_root *root,
const struct btrfs_key *key,
struct btrfs_path *p, int find_higher,
int return_any);
int btrfs_realloc_node(struct btrfs_trans_handle *trans,
struct btrfs_root *root, struct extent_buffer *parent,
int start_slot, u64 *last_ret,
struct btrfs_key *progress);
void btrfs_release_path(struct btrfs_path *p);
struct btrfs_path *btrfs_alloc_path(void);
void btrfs_free_path(struct btrfs_path *p);
......@@ -610,7 +629,8 @@ struct btrfs_item_batch {
int nr;
};
void btrfs_setup_item_for_insert(struct btrfs_root *root,
void btrfs_setup_item_for_insert(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
struct btrfs_path *path,
const struct btrfs_key *key,
u32 data_size);
......
......@@ -337,14 +337,119 @@ int btrfs_run_defrag_inodes(struct btrfs_fs_info *fs_info)
return 0;
}
/*
* Check if two blocks addresses are close, used by defrag.
*/
static bool close_blocks(u64 blocknr, u64 other, u32 blocksize)
{
if (blocknr < other && other - (blocknr + blocksize) < SZ_32K)
return true;
if (blocknr > other && blocknr - (other + blocksize) < SZ_32K)
return true;
return false;
}
/*
* Go through all the leaves pointed to by a node and reallocate them so that
* disk order is close to key order.
*/
static int btrfs_realloc_node(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
struct extent_buffer *parent,
int start_slot, u64 *last_ret,
struct btrfs_key *progress)
{
struct btrfs_fs_info *fs_info = root->fs_info;
const u32 blocksize = fs_info->nodesize;
const int end_slot = btrfs_header_nritems(parent) - 1;
u64 search_start = *last_ret;
u64 last_block = 0;
int ret = 0;
bool progress_passed = false;
/*
* COWing must happen through a running transaction, which always
* matches the current fs generation (it's a transaction with a state
* less than TRANS_STATE_UNBLOCKED). If it doesn't, then turn the fs
* into error state to prevent the commit of any transaction.
*/
if (unlikely(trans->transaction != fs_info->running_transaction ||
trans->transid != fs_info->generation)) {
btrfs_abort_transaction(trans, -EUCLEAN);
btrfs_crit(fs_info,
"unexpected transaction when attempting to reallocate parent %llu for root %llu, transaction %llu running transaction %llu fs generation %llu",
parent->start, btrfs_root_id(root), trans->transid,
fs_info->running_transaction->transid,
fs_info->generation);
return -EUCLEAN;
}
if (btrfs_header_nritems(parent) <= 1)
return 0;
for (int i = start_slot; i <= end_slot; i++) {
struct extent_buffer *cur;
struct btrfs_disk_key disk_key;
u64 blocknr;
u64 other;
bool close = true;
btrfs_node_key(parent, &disk_key, i);
if (!progress_passed && btrfs_comp_keys(&disk_key, progress) < 0)
continue;
progress_passed = true;
blocknr = btrfs_node_blockptr(parent, i);
if (last_block == 0)
last_block = blocknr;
if (i > 0) {
other = btrfs_node_blockptr(parent, i - 1);
close = close_blocks(blocknr, other, blocksize);
}
if (!close && i < end_slot) {
other = btrfs_node_blockptr(parent, i + 1);
close = close_blocks(blocknr, other, blocksize);
}
if (close) {
last_block = blocknr;
continue;
}
cur = btrfs_read_node_slot(parent, i);
if (IS_ERR(cur))
return PTR_ERR(cur);
if (search_start == 0)
search_start = last_block;
btrfs_tree_lock(cur);
ret = btrfs_force_cow_block(trans, root, cur, parent, i,
&cur, search_start,
min(16 * blocksize,
(end_slot - i) * blocksize),
BTRFS_NESTING_COW);
if (ret) {
btrfs_tree_unlock(cur);
free_extent_buffer(cur);
break;
}
search_start = cur->start;
last_block = cur->start;
*last_ret = search_start;
btrfs_tree_unlock(cur);
free_extent_buffer(cur);
}
return ret;
}
/*
* Defrag all the leaves in a given btree.
* Read all the leaves and try to get key order to
* better reflect disk order
*/
int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
struct btrfs_root *root)
static int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
struct btrfs_root *root)
{
struct btrfs_path *path = NULL;
struct btrfs_key key;
......@@ -460,6 +565,45 @@ int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
return ret;
}
/*
* Defrag a given btree. Every leaf in the btree is read and defragmented.
*/
int btrfs_defrag_root(struct btrfs_root *root)
{
struct btrfs_fs_info *fs_info = root->fs_info;
int ret;
if (test_and_set_bit(BTRFS_ROOT_DEFRAG_RUNNING, &root->state))
return 0;
while (1) {
struct btrfs_trans_handle *trans;
trans = btrfs_start_transaction(root, 0);
if (IS_ERR(trans)) {
ret = PTR_ERR(trans);
break;
}
ret = btrfs_defrag_leaves(trans, root);
btrfs_end_transaction(trans);
btrfs_btree_balance_dirty(fs_info);
cond_resched();
if (btrfs_fs_closing(fs_info) || ret != -EAGAIN)
break;
if (btrfs_defrag_cancelled(fs_info)) {
btrfs_debug(fs_info, "defrag_root cancelled");
ret = -EAGAIN;
break;
}
}
clear_bit(BTRFS_ROOT_DEFRAG_RUNNING, &root->state);
return ret;
}
/*
* Defrag specific helper to get an extent map.
*
......@@ -891,8 +1035,8 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
* very likely resulting in a larger extent after writeback is
* triggered (except in a case of free space fragmentation).
*/
if (test_range_bit(&inode->io_tree, cur, cur + range_len - 1,
EXTENT_DELALLOC, 0, NULL))
if (test_range_bit_exists(&inode->io_tree, cur, cur + range_len - 1,
EXTENT_DELALLOC))
goto next;
/*
......
......@@ -12,7 +12,7 @@ int btrfs_add_inode_defrag(struct btrfs_trans_handle *trans,
struct btrfs_inode *inode, u32 extent_thresh);
int btrfs_run_defrag_inodes(struct btrfs_fs_info *fs_info);
void btrfs_cleanup_defrag_inodes(struct btrfs_fs_info *fs_info);
int btrfs_defrag_leaves(struct btrfs_trans_handle *trans, struct btrfs_root *root);
int btrfs_defrag_root(struct btrfs_root *root);
static inline int btrfs_defrag_cancelled(struct btrfs_fs_info *fs_info)
{
......
......@@ -322,9 +322,6 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes,
} else {
if (current->journal_info)
flush = BTRFS_RESERVE_FLUSH_LIMIT;
if (btrfs_transaction_in_commit(fs_info))
schedule_timeout(1);
}
num_bytes = ALIGN(num_bytes, fs_info->sectorsize);
......@@ -346,7 +343,8 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes,
noflush);
if (ret)
return ret;
ret = btrfs_reserve_metadata_bytes(fs_info, block_rsv, meta_reserve, flush);
ret = btrfs_reserve_metadata_bytes(fs_info, block_rsv->space_info,
meta_reserve, flush);
if (ret) {
btrfs_qgroup_free_meta_prealloc(root, qgroup_reserve);
return ret;
......
This diff is collapsed.
......@@ -135,7 +135,6 @@ int btrfs_commit_inode_delayed_inode(struct btrfs_inode *inode);
int btrfs_delayed_update_inode(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
struct btrfs_inode *inode);
int btrfs_fill_inode(struct inode *inode, u32 *rdev);
int btrfs_delayed_delete_inode_ref(struct btrfs_inode *inode);
......
This diff is collapsed.
This diff is collapsed.
......@@ -17,7 +17,6 @@
#include "print-tree.h"
#include "volumes.h"
#include "async-thread.h"
#include "check-integrity.h"
#include "dev-replace.h"
#include "sysfs.h"
#include "zoned.h"
......@@ -444,7 +443,7 @@ int btrfs_run_dev_replace(struct btrfs_trans_handle *trans)
dev_replace->item_needs_writeback = 0;
up_write(&dev_replace->rwsem);
btrfs_mark_buffer_dirty(eb);
btrfs_mark_buffer_dirty(trans, eb);
out:
btrfs_free_path(path);
......
This diff is collapsed.
......@@ -3,6 +3,10 @@
#ifndef BTRFS_DIR_ITEM_H
#define BTRFS_DIR_ITEM_H
#include <linux/crc32c.h>
struct fscrypt_str;
int btrfs_check_dir_item_collision(struct btrfs_root *root, u64 dir,
const struct fscrypt_str *name);
int btrfs_insert_dir_item(struct btrfs_trans_handle *trans,
......@@ -39,4 +43,9 @@ struct btrfs_dir_item *btrfs_match_dir_item_name(struct btrfs_fs_info *fs_info,
const char *name,
int name_len);
static inline u64 btrfs_name_hash(const char *name, int len)
{
return crc32c((u32)~1, name, len);
}
#endif
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment