Commit 149c51f8 authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'for-6.2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs updates from David Sterba:
 "This round there are a lot of cleanups and moved code so the diffstat
  looks huge, otherwise there are some nice performance improvements and
  an update to raid56 reliability.

  User visible features:

   - raid56 reliability vs performance trade off:
      - fix destructive RMW for raid5 data (raid6 still needs work): do
        full checksum verification for all data during RMW cycle, this
        should prevent rewriting potentially corrupted data without
        notice
      - stripes are cached in memory which should reduce the performance
        impact but still can hurt some workloads
      - checksums are verified after repair again
      - this is the last option without introducing additional features
        (write intent bitmap, journal, another tree), the extra checksum
        read/verification was supposed to be avoided by the original
        implementation exactly for performance reasons but that caused
        all the reliability problems

   - discard=async by default for devices that support it

   - implement emergency flush reserve to avoid almost all unnecessary
     transaction aborts due to ENOSPC in cases where there are too many
     delayed refs or delayed allocation

   - skip block group synchronization if there's no change in used
     bytes, can reduce transaction commit count for some workloads

  Performance improvements:

   - fiemap and lseek:
      - overall speedup due to skipping unnecessary or duplicate
        searches (-40% run time)
      - cache some data structures and sharedness of extents (-30% run
        time)

   - send:
      - faster backref resolution when finding clones
      - cached leaf to root mapping for faster backref walking
      - improved clone/sharing detection
      - overall run time improvements (-70%)

  Core:

   - module initialization converted to a table of function pointers run
     in a sequence

   - preparation for fscrypt, extend passing file names across calls,
     dir item can store encryption status

   - raid56 updates:
      - more accurate error tracking of sectors within stripe
      - simplify recovery path and remove dedicated endio worker kthread
      - simplify scrub call paths
      - refactoring to support the extra data checksum verification
        during RMW cycle

   - tree block parentness checks consolidated and done at metadata read
     time

   - improved error handling

   - cleanups:
      - move a lot of code for better synchronization between kernel and
        user space sources, split big files
      - enum cleanups
      - GFP flag cleanups
      - header file cleanups, prototypes, dependencies
      - redundant parameter cleanups
      - inline extent handling simplifications
      - inode parameter conversion
      - data structure cleanups, reductions, renames, merges"

* tag 'for-6.2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (249 commits)
  btrfs: print transaction aborted messages with an error level
  btrfs: sync some cleanups from progs into uapi/btrfs.h
  btrfs: do not BUG_ON() on ENOMEM when dropping extent items for a range
  btrfs: fix extent map use-after-free when handling missing device in read_one_chunk
  btrfs: remove outdated logic from overwrite_item() and add assertion
  btrfs: unify overwrite_item() and do_overwrite_item()
  btrfs: replace strncpy() with strscpy()
  btrfs: fix uninitialized variable in find_first_clear_extent_bit
  btrfs: fix uninitialized parent in insert_state
  btrfs: add might_sleep() annotations
  btrfs: add stack helpers for a few btrfs items
  btrfs: add nr_global_roots to the super block definition
  btrfs: remove BTRFS_LEAF_DATA_OFFSET
  btrfs: add helpers for manipulating leaf items and data
  btrfs: add eb to btrfs_node_key_ptr_offset
  btrfs: pass the extent buffer for the btrfs_item_nr helpers
  btrfs: move the csum helpers into ctree.h
  btrfs: move eb offset helpers into extent_io.h
  btrfs: move file_extent_item helpers into file-item.h
  btrfs: move leaf_data_end into ctree.c
  ...
parents 97971df8 b7af0635
......@@ -23,15 +23,15 @@ obj-$(CONFIG_BTRFS_FS) := btrfs.o
btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
file-item.o inode-item.o disk-io.o \
transaction.o inode.o file.o tree-defrag.o \
extent_map.o sysfs.o struct-funcs.o xattr.o ordered-data.o \
transaction.o inode.o file.o defrag.o \
extent_map.o sysfs.o accessors.o xattr.o ordered-data.o \
extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
export.o tree-log.o free-space-cache.o zlib.o lzo.o zstd.o \
compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
uuid-tree.o props.o free-space-tree.o tree-checker.o space-info.o \
block-rsv.o delalloc-space.o block-group.o discard.o reflink.o \
subpage.o tree-mod-log.o extent-io-tree.o
subpage.o tree-mod-log.o extent-io-tree.o fs.o messages.o bio.o
btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
......
......@@ -4,8 +4,9 @@
*/
#include <asm/unaligned.h>
#include "messages.h"
#include "ctree.h"
#include "accessors.h"
static bool check_setget_bounds(const struct extent_buffer *eb,
const void *ptr, unsigned off, int size)
......@@ -23,6 +24,13 @@ static bool check_setget_bounds(const struct extent_buffer *eb,
return true;
}
void btrfs_init_map_token(struct btrfs_map_token *token, struct extent_buffer *eb)
{
token->eb = eb;
token->kaddr = page_address(eb->pages[0]);
token->offset = 0;
}
/*
* Macro templates that define helpers to read/write extent buffer data of a
* given size, that are also used via ctree.h for access to item members by
......@@ -160,7 +168,7 @@ DEFINE_BTRFS_SETGET_BITS(64)
void btrfs_node_key(const struct extent_buffer *eb,
struct btrfs_disk_key *disk_key, int nr)
{
unsigned long ptr = btrfs_node_key_ptr_offset(nr);
unsigned long ptr = btrfs_node_key_ptr_offset(eb, nr);
read_eb_member(eb, (struct btrfs_key_ptr *)ptr,
struct btrfs_key_ptr, key, disk_key);
}
This diff is collapsed.
......@@ -11,10 +11,10 @@
#include <linux/sched.h>
#include <linux/sched/mm.h>
#include <linux/slab.h>
#include "ctree.h"
#include "btrfs_inode.h"
#include "xattr.h"
#include "acl.h"
struct posix_acl *btrfs_get_acl(struct inode *inode, int type, bool rcu)
{
......
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef BTRFS_ACL_H
#define BTRFS_ACL_H
#ifdef CONFIG_BTRFS_FS_POSIX_ACL
struct posix_acl *btrfs_get_acl(struct inode *inode, int type, bool rcu);
int btrfs_set_acl(struct user_namespace *mnt_userns, struct dentry *dentry,
struct posix_acl *acl, int type);
int __btrfs_set_acl(struct btrfs_trans_handle *trans, struct inode *inode,
struct posix_acl *acl, int type);
#else
#define btrfs_get_acl NULL
#define btrfs_set_acl NULL
static inline int __btrfs_set_acl(struct btrfs_trans_handle *trans,
struct inode *inode, struct posix_acl *acl,
int type)
{
return -EOPNOTSUPP;
}
#endif
#endif
This diff is collapsed.
......@@ -7,10 +7,128 @@
#define BTRFS_BACKREF_H
#include <linux/btrfs.h>
#include "messages.h"
#include "ulist.h"
#include "disk-io.h"
#include "extent_io.h"
/*
* Used by implementations of iterate_extent_inodes_t (see definition below) to
* signal that backref iteration can stop immediately and no error happened.
* The value must be non-negative and must not be 0, 1 (which is a common return
* value from things like btrfs_search_slot() and used internally in the backref
* walking code) and different from BACKREF_FOUND_SHARED and
* BACKREF_FOUND_NOT_SHARED
*/
#define BTRFS_ITERATE_EXTENT_INODES_STOP 5
/*
* Should return 0 if no errors happened and iteration of backrefs should
* continue. Can return BTRFS_ITERATE_EXTENT_INODES_STOP or any other non-zero
* value to immediately stop iteration and possibly signal an error back to
* the caller.
*/
typedef int (iterate_extent_inodes_t)(u64 inum, u64 offset, u64 num_bytes,
u64 root, void *ctx);
/*
* Context and arguments for backref walking functions. Some of the fields are
* to be filled by the caller of such functions while other are filled by the
* functions themselves, as described below.
*/
struct btrfs_backref_walk_ctx {
/*
* The address of the extent for which we are doing backref walking.
* Can be either a data extent or a metadata extent.
*
* Must always be set by the top level caller.
*/
u64 bytenr;
/*
* Offset relative to the target extent. This is only used for data
* extents, and it's meaningful because we can have file extent items
* that point only to a section of a data extent ("bookend" extents),
* and we want to filter out any that don't point to a section of the
* data extent containing the given offset.
*
* Must always be set by the top level caller.
*/
u64 extent_item_pos;
/*
* If true and bytenr corresponds to a data extent, then references from
* all file extent items that point to the data extent are considered,
* @extent_item_pos is ignored.
*/
bool ignore_extent_item_pos;
/* A valid transaction handle or NULL. */
struct btrfs_trans_handle *trans;
/*
* The file system's info object, can not be NULL.
*
* Must always be set by the top level caller.
*/
struct btrfs_fs_info *fs_info;
/*
* Time sequence acquired from btrfs_get_tree_mod_seq(), in case the
* caller joined the tree mod log to get a consistent view of b+trees
* while we do backref walking, or BTRFS_SEQ_LAST.
* When using BTRFS_SEQ_LAST, delayed refs are not checked and it uses
* commit roots when searching b+trees - this is a special case for
* qgroups used during a transaction commit.
*/
u64 time_seq;
/*
* Used to collect the bytenr of metadata extents that point to the
* target extent.
*/
struct ulist *refs;
/*
* List used to collect the IDs of the roots from which the target
* extent is accessible. Can be NULL in case the caller does not care
* about collecting root IDs.
*/
struct ulist *roots;
/*
* Used by iterate_extent_inodes() and the main backref walk code
* (find_parent_nodes()). Lookup and store functions for an optional
* cache which maps the logical address (bytenr) of leaves to an array
* of root IDs.
*/
bool (*cache_lookup)(u64 leaf_bytenr, void *user_ctx,
const u64 **root_ids_ret, int *root_count_ret);
void (*cache_store)(u64 leaf_bytenr, const struct ulist *root_ids,
void *user_ctx);
/*
* If this is not NULL, then the backref walking code will call this
* for each indirect data extent reference as soon as it finds one,
* before collecting all the remaining backrefs and before resolving
* indirect backrefs. This allows for the caller to terminate backref
* walking as soon as it finds one backref that matches some specific
* criteria. The @cache_lookup and @cache_store callbacks should not
* be NULL in order to use this callback.
*/
iterate_extent_inodes_t *indirect_ref_iterator;
/*
* If this is not NULL, then the backref walking code will call this for
* each extent item it's meant to process before it actually starts
* processing it. If this returns anything other than 0, then it stops
* the backref walking code immediately.
*/
int (*check_extent_item)(u64 bytenr, const struct btrfs_extent_item *ei,
const struct extent_buffer *leaf, void *user_ctx);
/*
* If this is not NULL, then the backref walking code will call this for
* each extent data ref it finds (BTRFS_EXTENT_DATA_REF_KEY keys) before
* processing that data ref. If this callback return false, then it will
* ignore this data ref and it will never resolve the indirect data ref,
* saving time searching for leaves in a fs tree with file extent items
* matching the data ref.
*/
bool (*skip_data_ref)(u64 root, u64 ino, u64 offset, void *user_ctx);
/* Context object to pass to the callbacks defined above. */
void *user_ctx;
};
struct inode_fs_paths {
struct btrfs_path *btrfs_path;
struct btrfs_root *fs_root;
......@@ -23,17 +141,59 @@ struct btrfs_backref_shared_cache_entry {
bool is_shared;
};
struct btrfs_backref_shared_cache {
#define BTRFS_BACKREF_CTX_PREV_EXTENTS_SIZE 8
struct btrfs_backref_share_check_ctx {
/* Ulists used during backref walking. */
struct ulist refs;
/*
* The current leaf the caller of btrfs_is_data_extent_shared() is at.
* Typically the caller (at the moment only fiemap) tries to determine
* the sharedness of data extents point by file extent items from entire
* leaves.
*/
u64 curr_leaf_bytenr;
/*
* The previous leaf the caller was at in the previous call to
* btrfs_is_data_extent_shared(). This may be the same as the current
* leaf. On the first call it must be 0.
*/
u64 prev_leaf_bytenr;
/*
* A path from a root to a leaf that has a file extent item pointing to
* a given data extent should never exceed the maximum b+tree height.
*/
struct btrfs_backref_shared_cache_entry entries[BTRFS_MAX_LEVEL];
bool use_cache;
struct btrfs_backref_shared_cache_entry path_cache_entries[BTRFS_MAX_LEVEL];
bool use_path_cache;
/*
* Cache the sharedness result for the last few extents we have found,
* but only for extents for which we have multiple file extent items
* that point to them.
* It's very common to have several file extent items that point to the
* same extent (bytenr) but with different offsets and lengths. This
* typically happens for COW writes, partial writes into prealloc
* extents, NOCOW writes after snapshoting a root, hole punching or
* reflinking within the same file (less common perhaps).
* So keep a small cache with the lookup results for the extent pointed
* by the last few file extent items. This cache is checked, with a
* linear scan, whenever btrfs_is_data_extent_shared() is called, so
* it must be small so that it does not negatively affect performance in
* case we don't have multiple file extent items that point to the same
* data extent.
*/
struct {
u64 bytenr;
bool is_shared;
} prev_extents_cache[BTRFS_BACKREF_CTX_PREV_EXTENTS_SIZE];
/*
* The slot in the prev_extents_cache array that will be used for
* storing the sharedness result of a new data extent.
*/
int prev_extents_cache_slot;
};
typedef int (iterate_extent_inodes_t)(u64 inum, u64 offset, u64 root,
void *ctx);
struct btrfs_backref_share_check_ctx *btrfs_alloc_backref_share_check_ctx(void);
void btrfs_free_backref_share_ctx(struct btrfs_backref_share_check_ctx *ctx);
int extent_from_logical(struct btrfs_fs_info *fs_info, u64 logical,
struct btrfs_path *path, struct btrfs_key *found_key,
......@@ -43,11 +203,9 @@ int tree_backref_for_extent(unsigned long *ptr, struct extent_buffer *eb,
struct btrfs_key *key, struct btrfs_extent_item *ei,
u32 item_size, u64 *out_root, u8 *out_level);
int iterate_extent_inodes(struct btrfs_fs_info *fs_info,
u64 extent_item_objectid,
u64 extent_offset, int search_commit_root,
iterate_extent_inodes_t *iterate, void *ctx,
bool ignore_offset);
int iterate_extent_inodes(struct btrfs_backref_walk_ctx *ctx,
bool search_commit_root,
iterate_extent_inodes_t *iterate, void *user_ctx);
int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info *fs_info,
struct btrfs_path *path, void *ctx,
......@@ -55,13 +213,8 @@ int iterate_inodes_from_logical(u64 logical, struct btrfs_fs_info *fs_info,
int paths_from_inode(u64 inum, struct inode_fs_paths *ipath);
int btrfs_find_all_leafs(struct btrfs_trans_handle *trans,
struct btrfs_fs_info *fs_info, u64 bytenr,
u64 time_seq, struct ulist **leafs,
const u64 *extent_item_pos, bool ignore_offset);
int btrfs_find_all_roots(struct btrfs_trans_handle *trans,
struct btrfs_fs_info *fs_info, u64 bytenr,
u64 time_seq, struct ulist **roots,
int btrfs_find_all_leafs(struct btrfs_backref_walk_ctx *ctx);
int btrfs_find_all_roots(struct btrfs_backref_walk_ctx *ctx,
bool skip_commit_root_sem);
char *btrfs_ref_to_path(struct btrfs_root *fs_root, struct btrfs_path *path,
u32 name_len, unsigned long name_off,
......@@ -77,10 +230,9 @@ int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
u64 start_off, struct btrfs_path *path,
struct btrfs_inode_extref **ret_extref,
u64 *found_off);
int btrfs_is_data_extent_shared(struct btrfs_root *root, u64 inum, u64 bytenr,
int btrfs_is_data_extent_shared(struct btrfs_inode *inode, u64 bytenr,
u64 extent_gen,
struct ulist *roots, struct ulist *tmp,
struct btrfs_backref_shared_cache *cache);
struct btrfs_backref_share_check_ctx *ctx);
int __init btrfs_prelim_ref_init(void);
void __cold btrfs_prelim_ref_exit(void);
......@@ -111,8 +263,7 @@ struct btrfs_backref_iter {
u32 end_ptr;
};
struct btrfs_backref_iter *btrfs_backref_iter_alloc(
struct btrfs_fs_info *fs_info, gfp_t gfp_flag);
struct btrfs_backref_iter *btrfs_backref_iter_alloc(struct btrfs_fs_info *fs_info);
static inline void btrfs_backref_iter_free(struct btrfs_backref_iter *iter)
{
......
This diff is collapsed.
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Copyright (C) 2007 Oracle. All rights reserved.
* Copyright (C) 2022 Christoph Hellwig.
*/
#ifndef BTRFS_BIO_H
#define BTRFS_BIO_H
#include <linux/bio.h>
#include <linux/workqueue.h>
#include "tree-checker.h"
struct btrfs_bio;
struct btrfs_fs_info;
#define BTRFS_BIO_INLINE_CSUM_SIZE 64
/*
* Maximum number of sectors for a single bio to limit the size of the
* checksum array. This matches the number of bio_vecs per bio and thus the
* I/O size for buffered I/O.
*/
#define BTRFS_MAX_BIO_SECTORS (256)
typedef void (*btrfs_bio_end_io_t)(struct btrfs_bio *bbio);
/*
* Additional info to pass along bio.
*
* Mostly for btrfs specific features like csum and mirror_num.
*/
struct btrfs_bio {
unsigned int mirror_num:7;
/*
* Extra indicator for metadata bios.
* For some btrfs bios they use pages without a mapping, thus
* we can not rely on page->mapping->host to determine if
* it's a metadata bio.
*/
unsigned int is_metadata:1;
struct bvec_iter iter;
/* for direct I/O */
u64 file_offset;
/* @device is for stripe IO submission. */
struct btrfs_device *device;
union {
/* For data checksum verification. */
struct {
u8 *csum;
u8 csum_inline[BTRFS_BIO_INLINE_CSUM_SIZE];
};
/* For metadata parentness verification. */
struct btrfs_tree_parent_check parent_check;
};
/* End I/O information supplied to btrfs_bio_alloc */
btrfs_bio_end_io_t end_io;
void *private;
/* For read end I/O handling */
struct work_struct end_io_work;
/*
* This member must come last, bio_alloc_bioset will allocate enough
* bytes for entire btrfs_bio but relies on bio being last.
*/
struct bio bio;
};
static inline struct btrfs_bio *btrfs_bio(struct bio *bio)
{
return container_of(bio, struct btrfs_bio, bio);
}
int __init btrfs_bioset_init(void);
void __cold btrfs_bioset_exit(void);
struct bio *btrfs_bio_alloc(unsigned int nr_vecs, blk_opf_t opf,
btrfs_bio_end_io_t end_io, void *private);
struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size,
btrfs_bio_end_io_t end_io, void *private);
static inline void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status)
{
bbio->bio.bi_status = status;
bbio->end_io(bbio);
}
static inline void btrfs_bio_free_csum(struct btrfs_bio *bbio)
{
if (bbio->is_metadata)
return;
if (bbio->csum != bbio->csum_inline) {
kfree(bbio->csum);
bbio->csum = NULL;
}
}
/*
* Iterate through a btrfs_bio (@bbio) on a per-sector basis.
*
* bvl - struct bio_vec
* bbio - struct btrfs_bio
* iters - struct bvec_iter
* bio_offset - unsigned int
*/
#define btrfs_bio_for_each_sector(fs_info, bvl, bbio, iter, bio_offset) \
for ((iter) = (bbio)->iter, (bio_offset) = 0; \
(iter).bi_size && \
(((bvl) = bio_iter_iovec((&(bbio)->bio), (iter))), 1); \
(bio_offset) += fs_info->sectorsize, \
bio_advance_iter_single(&(bbio)->bio, &(iter), \
(fs_info)->sectorsize))
void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
int mirror_num);
int btrfs_repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
u64 length, u64 logical, struct page *page,
unsigned int pg_offset, int mirror_num);
#endif
......@@ -17,6 +17,21 @@
#include "discard.h"
#include "raid56.h"
#include "zoned.h"
#include "fs.h"
#include "accessors.h"
#include "extent-tree.h"
#ifdef CONFIG_BTRFS_DEBUG
int btrfs_should_fragment_free_space(struct btrfs_block_group *block_group)
{
struct btrfs_fs_info *fs_info = block_group->fs_info;
return (btrfs_test_opt(fs_info, FRAGMENT_METADATA) &&
block_group->flags & BTRFS_BLOCK_GROUP_METADATA) ||
(btrfs_test_opt(fs_info, FRAGMENT_DATA) &&
block_group->flags & BTRFS_BLOCK_GROUP_DATA);
}
#endif
/*
* Return target flags in extended format or 0 if restripe for this chunk_type
......@@ -284,7 +299,7 @@ struct btrfs_block_group *btrfs_next_block_group(
return cache;
}
/**
/*
* Check if we can do a NOCOW write for a given extent.
*
* @fs_info: The filesystem information object.
......@@ -325,11 +340,9 @@ struct btrfs_block_group *btrfs_inc_nocow_writers(struct btrfs_fs_info *fs_info,
return bg;
}
/**
/*
* Decrement the number of NOCOW writers in a block group.
*
* @bg: The block group.
*
* This is meant to be called after a previous call to btrfs_inc_nocow_writers(),
* and on the block group returned by that call. Typically this is called after
* creating an ordered extent for a NOCOW write, to prevent races with scrub and
......@@ -1527,6 +1540,30 @@ static inline bool btrfs_should_reclaim(struct btrfs_fs_info *fs_info)
return true;
}
static bool should_reclaim_block_group(struct btrfs_block_group *bg, u64 bytes_freed)
{
const struct btrfs_space_info *space_info = bg->space_info;
const int reclaim_thresh = READ_ONCE(space_info->bg_reclaim_threshold);
const u64 new_val = bg->used;
const u64 old_val = new_val + bytes_freed;
u64 thresh;
if (reclaim_thresh == 0)
return false;
thresh = mult_perc(bg->length, reclaim_thresh);
/*
* If we were below the threshold before don't reclaim, we are likely a
* brand new block group and we don't want to relocate new block groups.
*/
if (old_val < thresh)
return false;
if (new_val >= thresh)
return false;
return true;
}
void btrfs_reclaim_bgs_work(struct work_struct *work)
{
struct btrfs_fs_info *fs_info =
......@@ -1594,6 +1631,40 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
up_write(&space_info->groups_sem);
goto next;
}
if (bg->used == 0) {
/*
* It is possible that we trigger relocation on a block
* group as its extents are deleted and it first goes
* below the threshold, then shortly after goes empty.
*
* In this case, relocating it does delete it, but has
* some overhead in relocation specific metadata, looking
* for the non-existent extents and running some extra
* transactions, which we can avoid by using one of the
* other mechanisms for dealing with empty block groups.
*/
if (!btrfs_test_opt(fs_info, DISCARD_ASYNC))
btrfs_mark_bg_unused(bg);
spin_unlock(&bg->lock);
up_write(&space_info->groups_sem);
goto next;
}
/*
* The block group might no longer meet the reclaim condition by
* the time we get around to reclaiming it, so to avoid
* reclaiming overly full block_groups, skip reclaiming them.
*
* Since the decision making process also depends on the amount
* being freed, pass in a fake giant value to skip that extra
* check, which is more meaningful when adding to the list in
* the first place.
*/
if (!should_reclaim_block_group(bg, bg->length)) {
spin_unlock(&bg->lock);
up_write(&space_info->groups_sem);
goto next;
}
spin_unlock(&bg->lock);
/* Get out fast, in case we're unmounting the filesystem */
......@@ -1740,8 +1811,8 @@ static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
write_sequnlock(&fs_info->profiles_lock);
}
/**
* Map a physical disk address to a list of logical addresses
/*
* Map a physical disk address to a list of logical addresses.
*
* @fs_info: the filesystem
* @chunk_start: logical address of block group
......@@ -2001,6 +2072,7 @@ static int read_one_block_group(struct btrfs_fs_info *info,
cache->length = key->offset;
cache->used = btrfs_stack_block_group_used(bgi);
cache->commit_used = cache->used;
cache->flags = btrfs_stack_block_group_flags(bgi);
cache->global_root_id = btrfs_stack_block_group_chunk_objectid(bgi);
......@@ -2481,7 +2553,7 @@ struct btrfs_block_group *btrfs_make_block_group(struct btrfs_trans_handle *tran
cache->global_root_id = calculate_global_root_id(fs_info, cache->start);
if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE))
cache->needs_free_space = 1;
set_bit(BLOCK_GROUP_FLAG_NEEDS_FREE_SPACE, &cache->runtime_flags);
ret = btrfs_load_block_group_zone_info(cache, true);
if (ret) {
......@@ -2692,6 +2764,25 @@ static int update_block_group_item(struct btrfs_trans_handle *trans,
struct extent_buffer *leaf;
struct btrfs_block_group_item bgi;
struct btrfs_key key;
u64 old_commit_used;
u64 used;
/*
* Block group items update can be triggered out of commit transaction
* critical section, thus we need a consistent view of used bytes.
* We cannot use cache->used directly outside of the spin lock, as it
* may be changed.
*/
spin_lock(&cache->lock);
old_commit_used = cache->commit_used;
used = cache->used;
/* No change in used bytes, can safely skip it. */
if (cache->commit_used == used) {
spin_unlock(&cache->lock);
return 0;
}
cache->commit_used = used;
spin_unlock(&cache->lock);
key.objectid = cache->start;
key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
......@@ -2706,7 +2797,7 @@ static int update_block_group_item(struct btrfs_trans_handle *trans,
leaf = path->nodes[0];
bi = btrfs_item_ptr_offset(leaf, path->slots[0]);
btrfs_set_stack_block_group_used(&bgi, cache->used);
btrfs_set_stack_block_group_used(&bgi, used);
btrfs_set_stack_block_group_chunk_objectid(&bgi,
cache->global_root_id);
btrfs_set_stack_block_group_flags(&bgi, cache->flags);
......@@ -2714,6 +2805,12 @@ static int update_block_group_item(struct btrfs_trans_handle *trans,
btrfs_mark_buffer_dirty(leaf);
fail:
btrfs_release_path(path);
/* We didn't update the block group item, need to revert @commit_used. */
if (ret < 0) {
spin_lock(&cache->lock);
cache->commit_used = old_commit_used;
spin_unlock(&cache->lock);
}
return ret;
}
......@@ -3211,31 +3308,6 @@ int btrfs_write_dirty_block_groups(struct btrfs_trans_handle *trans)
return ret;
}
static inline bool should_reclaim_block_group(struct btrfs_block_group *bg,
u64 bytes_freed)
{
const struct btrfs_space_info *space_info = bg->space_info;
const int reclaim_thresh = READ_ONCE(space_info->bg_reclaim_threshold);
const u64 new_val = bg->used;
const u64 old_val = new_val + bytes_freed;
u64 thresh;
if (reclaim_thresh == 0)
return false;
thresh = div_factor_fine(bg->length, reclaim_thresh);
/*
* If we were below the threshold before don't reclaim, we are likely a
* brand new block group and we don't want to relocate new block groups.
*/
if (old_val < thresh)
return false;
if (new_val >= thresh)
return false;
return true;
}
int btrfs_update_block_group(struct btrfs_trans_handle *trans,
u64 bytenr, u64 num_bytes, bool alloc)
{
......@@ -3347,8 +3419,9 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
return ret;
}
/**
* btrfs_add_reserved_bytes - update the block_group and space info counters
/*
* Update the block_group and space info counters.
*
* @cache: The cache we are manipulating
* @ram_bytes: The number of bytes of file content, and will be same to
* @num_bytes except for the compress path.
......@@ -3391,8 +3464,9 @@ int btrfs_add_reserved_bytes(struct btrfs_block_group *cache,
return ret;
}
/**
* btrfs_free_reserved_bytes - update the block_group and space info counters
/*
* Update the block_group and space info counters.
*
* @cache: The cache we are manipulating
* @num_bytes: The number of bytes in question
* @delalloc: The blocks are allocated for the delalloc write
......@@ -3449,13 +3523,13 @@ static int should_alloc_chunk(struct btrfs_fs_info *fs_info,
*/
if (force == CHUNK_ALLOC_LIMITED) {
thresh = btrfs_super_total_bytes(fs_info->super_copy);
thresh = max_t(u64, SZ_64M, div_factor_fine(thresh, 1));
thresh = max_t(u64, SZ_64M, mult_perc(thresh, 1));
if (sinfo->total_bytes - bytes_used < thresh)
return 1;
}
if (bytes_used + SZ_2M < div_factor(sinfo->total_bytes, 8))
if (bytes_used + SZ_2M < mult_perc(sinfo->total_bytes, 80))
return 0;
return 1;
}
......
......@@ -55,6 +55,10 @@ enum btrfs_block_group_flags {
BLOCK_GROUP_FLAG_CHUNK_ITEM_INSERTED,
BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE,
BLOCK_GROUP_FLAG_ZONED_DATA_RELOC,
/* Does the block group need to be added to the free space tree? */
BLOCK_GROUP_FLAG_NEEDS_FREE_SPACE,
/* Indicate that the block group is placed on a sequential zone */
BLOCK_GROUP_FLAG_SEQUENTIAL_ZONE,
};
enum btrfs_caching_type {
......@@ -99,6 +103,12 @@ struct btrfs_block_group {
u64 cache_generation;
u64 global_root_id;
/*
* The last committed used bytes of this block group, if the above @used
* is still the same as @commit_used, we don't need to update block
* group item of this block group.
*/
u64 commit_used;
/*
* If the free space extent count exceeds this number, convert the block
* group to bitmaps.
......@@ -202,15 +212,6 @@ struct btrfs_block_group {
/* Lock for free space tree operations. */
struct mutex free_space_lock;
/*
* Does the block group need to be added to the free space tree?
* Protected by free_space_lock.
*/
int needs_free_space;
/* Flag indicating this block group is placed on a sequential zone */
bool seq_zone;
/*
* Number of extents in this block group used for swap files.
* All accesses protected by the spinlock 'lock'.
......@@ -251,16 +252,7 @@ static inline bool btrfs_is_block_group_data_only(
}
#ifdef CONFIG_BTRFS_DEBUG
static inline int btrfs_should_fragment_free_space(
struct btrfs_block_group *block_group)
{
struct btrfs_fs_info *fs_info = block_group->fs_info;
return (btrfs_test_opt(fs_info, FRAGMENT_METADATA) &&
block_group->flags & BTRFS_BLOCK_GROUP_METADATA) ||
(btrfs_test_opt(fs_info, FRAGMENT_DATA) &&
block_group->flags & BTRFS_BLOCK_GROUP_DATA);
}
int btrfs_should_fragment_free_space(struct btrfs_block_group *block_group);
#endif
struct btrfs_block_group *btrfs_lookup_first_block_group(
......
......@@ -7,6 +7,8 @@
#include "transaction.h"
#include "block-group.h"
#include "disk-io.h"
#include "fs.h"
#include "accessors.h"
/*
* HOW DO BLOCK RESERVES WORK
......@@ -225,7 +227,7 @@ int btrfs_block_rsv_add(struct btrfs_fs_info *fs_info,
return ret;
}
int btrfs_block_rsv_check(struct btrfs_block_rsv *block_rsv, int min_factor)
int btrfs_block_rsv_check(struct btrfs_block_rsv *block_rsv, int min_percent)
{
u64 num_bytes = 0;
int ret = -ENOSPC;
......@@ -234,7 +236,7 @@ int btrfs_block_rsv_check(struct btrfs_block_rsv *block_rsv, int min_factor)
return 0;
spin_lock(&block_rsv->lock);
num_bytes = div_factor(block_rsv->size, min_factor);
num_bytes = mult_perc(block_rsv->size, min_percent);
if (block_rsv->reserved >= num_bytes)
ret = 0;
spin_unlock(&block_rsv->lock);
......@@ -323,31 +325,6 @@ void btrfs_block_rsv_add_bytes(struct btrfs_block_rsv *block_rsv,
spin_unlock(&block_rsv->lock);
}
int btrfs_cond_migrate_bytes(struct btrfs_fs_info *fs_info,
struct btrfs_block_rsv *dest, u64 num_bytes,
int min_factor)
{
struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
u64 min_bytes;
if (global_rsv->space_info != dest->space_info)
return -ENOSPC;
spin_lock(&global_rsv->lock);
min_bytes = div_factor(global_rsv->size, min_factor);
if (global_rsv->reserved < min_bytes + num_bytes) {
spin_unlock(&global_rsv->lock);
return -ENOSPC;
}
global_rsv->reserved -= num_bytes;
if (global_rsv->reserved < global_rsv->size)
global_rsv->full = false;
spin_unlock(&global_rsv->lock);
btrfs_block_rsv_add_bytes(dest, num_bytes, true);
return 0;
}
void btrfs_update_global_block_rsv(struct btrfs_fs_info *fs_info)
{
struct btrfs_block_rsv *block_rsv = &fs_info->global_block_rsv;
......@@ -552,5 +529,17 @@ struct btrfs_block_rsv *btrfs_use_block_rsv(struct btrfs_trans_handle *trans,
if (!ret)
return global_rsv;
}
/*
* All hope is lost, but of course our reservations are overly
* pessimistic, so instead of possibly having an ENOSPC abort here, try
* one last time to force a reservation if there's enough actual space
* on disk to make the reservation.
*/
ret = btrfs_reserve_metadata_bytes(fs_info, block_rsv, blocksize,
BTRFS_RESERVE_FLUSH_EMERGENCY);
if (!ret)
return block_rsv;
return ERR_PTR(ret);
}
......@@ -4,6 +4,7 @@
#define BTRFS_BLOCK_RSV_H
struct btrfs_trans_handle;
struct btrfs_root;
enum btrfs_reserve_flush_enum;
/*
......@@ -62,7 +63,7 @@ void btrfs_free_block_rsv(struct btrfs_fs_info *fs_info,
int btrfs_block_rsv_add(struct btrfs_fs_info *fs_info,
struct btrfs_block_rsv *block_rsv, u64 num_bytes,
enum btrfs_reserve_flush_enum flush);
int btrfs_block_rsv_check(struct btrfs_block_rsv *block_rsv, int min_factor);
int btrfs_block_rsv_check(struct btrfs_block_rsv *block_rsv, int min_percent);
int btrfs_block_rsv_refill(struct btrfs_fs_info *fs_info,
struct btrfs_block_rsv *block_rsv, u64 min_reserved,
enum btrfs_reserve_flush_enum flush);
......@@ -70,9 +71,6 @@ int btrfs_block_rsv_migrate(struct btrfs_block_rsv *src_rsv,
struct btrfs_block_rsv *dst_rsv, u64 num_bytes,
bool update_size);
int btrfs_block_rsv_use_bytes(struct btrfs_block_rsv *block_rsv, u64 num_bytes);
int btrfs_cond_migrate_bytes(struct btrfs_fs_info *fs_info,
struct btrfs_block_rsv *dest, u64 num_bytes,
int min_factor);
void btrfs_block_rsv_add_bytes(struct btrfs_block_rsv *block_rsv,
u64 num_bytes, bool update_size);
u64 btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
......
......@@ -411,29 +411,142 @@ static inline void btrfs_inode_split_flags(u64 inode_item_flags,
#define CSUM_FMT "0x%*phN"
#define CSUM_FMT_VALUE(size, bytes) size, bytes
static inline void btrfs_print_data_csum_error(struct btrfs_inode *inode,
u64 logical_start, u8 *csum, u8 *csum_expected, int mirror_num)
{
struct btrfs_root *root = inode->root;
const u32 csum_size = root->fs_info->csum_size;
/* Output minus objectid, which is more meaningful */
if (root->root_key.objectid >= BTRFS_LAST_FREE_OBJECTID)
btrfs_warn_rl(root->fs_info,
"csum failed root %lld ino %lld off %llu csum " CSUM_FMT " expected csum " CSUM_FMT " mirror %d",
root->root_key.objectid, btrfs_ino(inode),
logical_start,
CSUM_FMT_VALUE(csum_size, csum),
CSUM_FMT_VALUE(csum_size, csum_expected),
mirror_num);
else
btrfs_warn_rl(root->fs_info,
"csum failed root %llu ino %llu off %llu csum " CSUM_FMT " expected csum " CSUM_FMT " mirror %d",
root->root_key.objectid, btrfs_ino(inode),
logical_start,
CSUM_FMT_VALUE(csum_size, csum),
CSUM_FMT_VALUE(csum_size, csum_expected),
mirror_num);
}
void btrfs_submit_data_write_bio(struct btrfs_inode *inode, struct bio *bio, int mirror_num);
void btrfs_submit_data_read_bio(struct btrfs_inode *inode, struct bio *bio,
int mirror_num, enum btrfs_compression_type compress_type);
void btrfs_submit_dio_repair_bio(struct btrfs_inode *inode, struct bio *bio, int mirror_num);
blk_status_t btrfs_submit_bio_start(struct btrfs_inode *inode, struct bio *bio);
blk_status_t btrfs_submit_bio_start_direct_io(struct btrfs_inode *inode,
struct bio *bio,
u64 dio_file_offset);
int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page,
u32 pgoff, u8 *csum, const u8 * const csum_expected);
int btrfs_check_data_csum(struct btrfs_inode *inode, struct btrfs_bio *bbio,
u32 bio_offset, struct page *page, u32 pgoff);
unsigned int btrfs_verify_data_csum(struct btrfs_bio *bbio,
u32 bio_offset, struct page *page,
u64 start, u64 end);
noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
u64 *orig_start, u64 *orig_block_len,
u64 *ram_bytes, bool nowait, bool strict);
void __btrfs_del_delalloc_inode(struct btrfs_root *root, struct btrfs_inode *inode);
struct inode *btrfs_lookup_dentry(struct inode *dir, struct dentry *dentry);
int btrfs_set_inode_index(struct btrfs_inode *dir, u64 *index);
int btrfs_unlink_inode(struct btrfs_trans_handle *trans,
struct btrfs_inode *dir, struct btrfs_inode *inode,
const struct fscrypt_str *name);
int btrfs_add_link(struct btrfs_trans_handle *trans,
struct btrfs_inode *parent_inode, struct btrfs_inode *inode,
const struct fscrypt_str *name, int add_backref, u64 index);
int btrfs_delete_subvolume(struct btrfs_inode *dir, struct dentry *dentry);
int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len,
int front);
int btrfs_start_delalloc_snapshot(struct btrfs_root *root, bool in_reclaim_context);
int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, long nr,
bool in_reclaim_context);
int btrfs_set_extent_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
unsigned int extra_bits,
struct extent_state **cached_state);
struct btrfs_new_inode_args {
/* Input */
struct inode *dir;
struct dentry *dentry;
struct inode *inode;
bool orphan;
bool subvol;
/* Output from btrfs_new_inode_prepare(), input to btrfs_create_new_inode(). */
struct posix_acl *default_acl;
struct posix_acl *acl;
struct fscrypt_name fname;
};
int btrfs_new_inode_prepare(struct btrfs_new_inode_args *args,
unsigned int *trans_num_items);
int btrfs_create_new_inode(struct btrfs_trans_handle *trans,
struct btrfs_new_inode_args *args);
void btrfs_new_inode_args_destroy(struct btrfs_new_inode_args *args);
struct inode *btrfs_new_subvol_inode(struct user_namespace *mnt_userns,
struct inode *dir);
void btrfs_set_delalloc_extent(struct btrfs_inode *inode, struct extent_state *state,
u32 bits);
void btrfs_clear_delalloc_extent(struct btrfs_inode *inode,
struct extent_state *state, u32 bits);
void btrfs_merge_delalloc_extent(struct btrfs_inode *inode, struct extent_state *new,
struct extent_state *other);
void btrfs_split_delalloc_extent(struct btrfs_inode *inode,
struct extent_state *orig, u64 split);
void btrfs_set_range_writeback(struct btrfs_inode *inode, u64 start, u64 end);
vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf);
void btrfs_evict_inode(struct inode *inode);
struct inode *btrfs_alloc_inode(struct super_block *sb);
void btrfs_destroy_inode(struct inode *inode);
void btrfs_free_inode(struct inode *inode);
int btrfs_drop_inode(struct inode *inode);
int __init btrfs_init_cachep(void);
void __cold btrfs_destroy_cachep(void);
struct inode *btrfs_iget_path(struct super_block *s, u64 ino,
struct btrfs_root *root, struct btrfs_path *path);
struct inode *btrfs_iget(struct super_block *s, u64 ino, struct btrfs_root *root);
struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
struct page *page, size_t pg_offset,
u64 start, u64 end);
int btrfs_update_inode(struct btrfs_trans_handle *trans,
struct btrfs_root *root, struct btrfs_inode *inode);
int btrfs_update_inode_fallback(struct btrfs_trans_handle *trans,
struct btrfs_root *root, struct btrfs_inode *inode);
int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct btrfs_inode *inode);
int btrfs_orphan_cleanup(struct btrfs_root *root);
int btrfs_cont_expand(struct btrfs_inode *inode, loff_t oldsize, loff_t size);
void btrfs_add_delayed_iput(struct btrfs_inode *inode);
void btrfs_run_delayed_iputs(struct btrfs_fs_info *fs_info);
int btrfs_wait_on_delayed_iputs(struct btrfs_fs_info *fs_info);
int btrfs_prealloc_file_range(struct inode *inode, int mode,
u64 start, u64 num_bytes, u64 min_size,
loff_t actual_len, u64 *alloc_hint);
int btrfs_prealloc_file_range_trans(struct inode *inode,
struct btrfs_trans_handle *trans, int mode,
u64 start, u64 num_bytes, u64 min_size,
loff_t actual_len, u64 *alloc_hint);
int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page,
u64 start, u64 end, int *page_started,
unsigned long *nr_written, struct writeback_control *wbc);
int btrfs_writepage_cow_fixup(struct page *page);
void btrfs_writepage_endio_finish_ordered(struct btrfs_inode *inode,
struct page *page, u64 start,
u64 end, bool uptodate);
int btrfs_encoded_io_compression_from_extent(struct btrfs_fs_info *fs_info,
int compress_type);
int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
u64 file_offset, u64 disk_bytenr,
u64 disk_io_size,
struct page **pages);
ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter,
struct btrfs_ioctl_encoded_io_args *encoded);
ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from,
const struct btrfs_ioctl_encoded_io_args *encoded);
ssize_t btrfs_dio_read(struct kiocb *iocb, struct iov_iter *iter,
size_t done_before);
struct iomap_dio *btrfs_dio_write(struct kiocb *iocb, struct iov_iter *iter,
size_t done_before);
extern const struct dentry_operations btrfs_dentry_operations;
/* Inode locking type flags, by default the exclusive lock is taken. */
enum btrfs_ilock_type {
ENUM_BIT(BTRFS_ILOCK_SHARED),
ENUM_BIT(BTRFS_ILOCK_TRY),
ENUM_BIT(BTRFS_ILOCK_MMAP),
};
int btrfs_inode_lock(struct btrfs_inode *inode, unsigned int ilock_flags);
void btrfs_inode_unlock(struct btrfs_inode *inode, unsigned int ilock_flags);
void btrfs_update_inode_bytes(struct btrfs_inode *inode, const u64 add_bytes,
const u64 del_bytes);
void btrfs_assert_inode_range_clean(struct btrfs_inode *inode, u64 start, u64 end);
#endif
......@@ -82,6 +82,7 @@
#include <linux/mm.h>
#include <linux/string.h>
#include <crypto/hash.h>
#include "messages.h"
#include "ctree.h"
#include "disk-io.h"
#include "transaction.h"
......@@ -92,6 +93,7 @@
#include "check-integrity.h"
#include "rcu-string.h"
#include "compression.h"
#include "accessors.h"
#define BTRFSIC_BLOCK_HASHTABLE_SIZE 0x10000
#define BTRFSIC_BLOCK_LINK_HASHTABLE_SIZE 0x10000
......@@ -755,7 +757,7 @@ static int btrfsic_process_superblock_dev_mirror(
btrfs_info_in_rcu(fs_info,
"new initial S-block (bdev %p, %s) @%llu (%pg/%llu/%d)",
superblock_bdev,
rcu_str_deref(device->name), dev_bytenr,
btrfs_dev_name(device), dev_bytenr,
dev_state->bdev, dev_bytenr,
superblock_mirror_num);
list_add(&superblock_tmp->all_blocks_node,
......
......@@ -23,16 +23,19 @@
#include <crypto/hash.h>
#include "misc.h"
#include "ctree.h"
#include "fs.h"
#include "disk-io.h"
#include "transaction.h"
#include "btrfs_inode.h"
#include "volumes.h"
#include "bio.h"
#include "ordered-data.h"
#include "compression.h"
#include "extent_io.h"
#include "extent_map.h"
#include "subpage.h"
#include "zoned.h"
#include "file-item.h"
#include "super.h"
static const char* const btrfs_compress_types[] = { "", "zlib", "lzo", "zstd" };
......@@ -116,7 +119,7 @@ static int compression_decompress_bio(struct list_head *ws,
}
static int compression_decompress(int type, struct list_head *ws,
unsigned char *data_in, struct page *dest_page,
const u8 *data_in, struct page *dest_page,
unsigned long start_byte, size_t srclen, size_t destlen)
{
switch (type) {
......@@ -183,7 +186,7 @@ static void end_compressed_bio_read(struct btrfs_bio *bbio)
u64 start = bbio->file_offset + offset;
if (!status &&
(!csum || !btrfs_check_data_csum(inode, bbio, offset,
(!csum || !btrfs_check_data_csum(bi, bbio, offset,
bv.bv_page, bv.bv_offset))) {
btrfs_clean_io_failure(bi, start, bv.bv_page,
bv.bv_offset);
......@@ -191,9 +194,9 @@ static void end_compressed_bio_read(struct btrfs_bio *bbio)
int ret;
refcount_inc(&cb->pending_ios);
ret = btrfs_repair_one_sector(inode, bbio, offset,
ret = btrfs_repair_one_sector(BTRFS_I(inode), bbio, offset,
bv.bv_page, bv.bv_offset,
btrfs_submit_data_read_bio);
true);
if (ret) {
refcount_dec(&cb->pending_ios);
status = errno_to_blk_status(ret);
......@@ -1229,7 +1232,7 @@ static int btrfs_decompress_bio(struct compressed_bio *cb)
* single page, and we want to read a single page out of it.
* start_byte tells us the offset into the compressed data we're interested in
*/
int btrfs_decompress(int type, unsigned char *data_in, struct page *dest_page,
int btrfs_decompress(int type, const u8 *data_in, struct page *dest_page,
unsigned long start_byte, size_t srclen, size_t destlen)
{
struct list_head *workspace;
......@@ -1243,12 +1246,13 @@ int btrfs_decompress(int type, unsigned char *data_in, struct page *dest_page,
return ret;
}
void __init btrfs_init_compress(void)
int __init btrfs_init_compress(void)
{
btrfs_init_workspace_manager(BTRFS_COMPRESS_NONE);
btrfs_init_workspace_manager(BTRFS_COMPRESS_ZLIB);
btrfs_init_workspace_manager(BTRFS_COMPRESS_LZO);
zstd_init_workspace_manager();
return 0;
}
void __cold btrfs_exit_compress(void)
......
......@@ -6,6 +6,7 @@
#ifndef BTRFS_COMPRESSION_H
#define BTRFS_COMPRESSION_H
#include <linux/blk_types.h>
#include <linux/sizes.h>
struct btrfs_inode;
......@@ -77,7 +78,7 @@ static inline unsigned int btrfs_compress_level(unsigned int type_level)
return ((type_level & 0xF0) >> 4);
}
void __init btrfs_init_compress(void);
int __init btrfs_init_compress(void);
void __cold btrfs_exit_compress(void);
int btrfs_compress_pages(unsigned int type_level, struct address_space *mapping,
......@@ -85,7 +86,7 @@ int btrfs_compress_pages(unsigned int type_level, struct address_space *mapping,
unsigned long *out_pages,
unsigned long *total_in,
unsigned long *total_out);
int btrfs_decompress(int type, unsigned char *data_in, struct page *dest_page,
int btrfs_decompress(int type, const u8 *data_in, struct page *dest_page,
unsigned long start_byte, size_t srclen, size_t destlen);
int btrfs_decompress_buf2page(const char *buf, u32 buf_len,
struct compressed_bio *cb, u32 decompressed);
......@@ -149,7 +150,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
u64 start, struct page **pages, unsigned long *out_pages,
unsigned long *total_in, unsigned long *total_out);
int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb);
int zlib_decompress(struct list_head *ws, unsigned char *data_in,
int zlib_decompress(struct list_head *ws, const u8 *data_in,
struct page *dest_page, unsigned long start_byte, size_t srclen,
size_t destlen);
struct list_head *zlib_alloc_workspace(unsigned int level);
......@@ -160,7 +161,7 @@ int lzo_compress_pages(struct list_head *ws, struct address_space *mapping,
u64 start, struct page **pages, unsigned long *out_pages,
unsigned long *total_in, unsigned long *total_out);
int lzo_decompress_bio(struct list_head *ws, struct compressed_bio *cb);
int lzo_decompress(struct list_head *ws, unsigned char *data_in,
int lzo_decompress(struct list_head *ws, const u8 *data_in,
struct page *dest_page, unsigned long start_byte, size_t srclen,
size_t destlen);
struct list_head *lzo_alloc_workspace(unsigned int level);
......@@ -170,7 +171,7 @@ int zstd_compress_pages(struct list_head *ws, struct address_space *mapping,
u64 start, struct page **pages, unsigned long *out_pages,
unsigned long *total_in, unsigned long *total_out);
int zstd_decompress_bio(struct list_head *ws, struct compressed_bio *cb);
int zstd_decompress(struct list_head *ws, unsigned char *data_in,
int zstd_decompress(struct list_head *ws, const u8 *data_in,
struct page *dest_page, unsigned long start_byte, size_t srclen,
size_t destlen);
void zstd_init_workspace_manager(void);
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef BTRFS_DEFRAG_H
#define BTRFS_DEFRAG_H
int btrfs_defrag_file(struct inode *inode, struct file_ra_state *ra,
struct btrfs_ioctl_defrag_range_args *range,
u64 newer_than, unsigned long max_to_defrag);
int __init btrfs_auto_defrag_init(void);
void __cold btrfs_auto_defrag_exit(void);
int btrfs_add_inode_defrag(struct btrfs_trans_handle *trans,
struct btrfs_inode *inode, u32 extent_thresh);
int btrfs_run_defrag_inodes(struct btrfs_fs_info *fs_info);
void btrfs_cleanup_defrag_inodes(struct btrfs_fs_info *fs_info);
int btrfs_defrag_leaves(struct btrfs_trans_handle *trans, struct btrfs_root *root);
static inline int btrfs_defrag_cancelled(struct btrfs_fs_info *fs_info)
{
return signal_pending(current);
}
#endif
// SPDX-License-Identifier: GPL-2.0
#include "messages.h"
#include "ctree.h"
#include "delalloc-space.h"
#include "block-rsv.h"
......@@ -8,6 +9,7 @@
#include "transaction.h"
#include "qgroup.h"
#include "block-group.h"
#include "fs.h"
/*
* HOW DOES THIS WORK
......@@ -200,8 +202,8 @@ void btrfs_free_reserved_data_space(struct btrfs_inode *inode,
btrfs_qgroup_free_data(inode, reserved, start, len);
}
/**
* Release any excessive reservation
/*
* Release any excessive reservations for an inode.
*
* @inode: the inode we need to release from
* @qgroup_free: free or convert qgroup meta. Unlike normal operation, qgroup
......@@ -375,8 +377,8 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes,
return 0;
}
/**
* Release a metadata reservation for an inode
/*
* Release a metadata reservation for an inode.
*
* @inode: the inode to release the reservation for.
* @num_bytes: the number of bytes we are releasing.
......@@ -403,8 +405,9 @@ void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes,
btrfs_inode_rsv_release(inode, qgroup_free);
}
/**
* btrfs_delalloc_release_extents - release our outstanding_extents
/*
* Release our outstanding_extents for an inode.
*
* @inode: the inode to balance the reservation for.
* @num_bytes: the number of bytes we originally reserved with
*
......@@ -431,9 +434,9 @@ void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes)
btrfs_inode_rsv_release(inode, true);
}
/**
* btrfs_delalloc_reserve_space - reserve data and metadata space for
* delalloc
/*
* Reserve data and metadata space for delalloc
*
* @inode: inode we're writing to
* @start: start range we are writing to
* @len: how long the range we are writing to
......@@ -442,19 +445,19 @@ void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes)
*
* This will do the following things
*
* - reserve space in data space info for num bytes
* and reserve precious corresponding qgroup space
* - reserve space in data space info for num bytes and reserve precious
* corresponding qgroup space
* (Done in check_data_free_space)
*
* - reserve space for metadata space, based on the number of outstanding
* extents and how much csums will be needed
* also reserve metadata space in a per root over-reserve method.
* extents and how much csums will be needed also reserve metadata space in a
* per root over-reserve method.
* - add to the inodes->delalloc_bytes
* - add it to the fs_info's delalloc inodes list.
* (Above 3 all done in delalloc_reserve_metadata)
*
* Return 0 for success
* Return <0 for error(-ENOSPC or -EQUOT)
* Return <0 for error(-ENOSPC or -EDQUOT)
*/
int btrfs_delalloc_reserve_space(struct btrfs_inode *inode,
struct extent_changeset **reserved, u64 start, u64 len)
......@@ -473,7 +476,7 @@ int btrfs_delalloc_reserve_space(struct btrfs_inode *inode,
return ret;
}
/**
/*
* Release data and metadata space for delalloc
*
* @inode: inode we're releasing space for
......@@ -482,10 +485,10 @@ int btrfs_delalloc_reserve_space(struct btrfs_inode *inode,
* @len: length of the space already reserved
* @qgroup_free: should qgroup reserved-space also be freed
*
* This function will release the metadata space that was not used and will
* decrement ->delalloc_bytes and remove it from the fs_info delalloc_inodes
* list if there are no delalloc bytes left.
* Also it will handle the qgroup reserved space.
* Release the metadata space that was not used and will decrement
* ->delalloc_bytes and remove it from the fs_info->delalloc_inodes list if
* there are no delalloc bytes left. Also it will handle the qgroup reserved
* space.
*/
void btrfs_delalloc_release_space(struct btrfs_inode *inode,
struct extent_changeset *reserved,
......
......@@ -20,5 +20,8 @@ void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes,
bool qgroup_free);
int btrfs_delalloc_reserve_space(struct btrfs_inode *inode,
struct extent_changeset **reserved, u64 start, u64 len);
int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes,
u64 disk_num_bytes, bool noflush);
void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes);
#endif /* BTRFS_DELALLOC_SPACE_H */
......@@ -6,14 +6,19 @@
#include <linux/slab.h>
#include <linux/iversion.h>
#include "ctree.h"
#include "fs.h"
#include "messages.h"
#include "misc.h"
#include "delayed-inode.h"
#include "disk-io.h"
#include "transaction.h"
#include "ctree.h"
#include "qgroup.h"
#include "locking.h"
#include "inode-item.h"
#include "space-info.h"
#include "accessors.h"
#include "file-item.h"
#define BTRFS_DELAYED_WRITEBACK 512
#define BTRFS_DELAYED_BACKGROUND 128
......@@ -1412,7 +1417,7 @@ void btrfs_balance_delayed_items(struct btrfs_fs_info *fs_info)
int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
const char *name, int name_len,
struct btrfs_inode *dir,
struct btrfs_disk_key *disk_key, u8 type,
struct btrfs_disk_key *disk_key, u8 flags,
u64 index)
{
struct btrfs_fs_info *fs_info = trans->fs_info;
......@@ -1443,7 +1448,7 @@ int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
btrfs_set_stack_dir_transid(dir_item, trans->transid);
btrfs_set_stack_dir_data_len(dir_item, 0);
btrfs_set_stack_dir_name_len(dir_item, name_len);
btrfs_set_stack_dir_type(dir_item, type);
btrfs_set_stack_dir_flags(dir_item, flags);
memcpy((char *)(dir_item + 1), name, name_len);
data_len = delayed_item->data_len + sizeof(struct btrfs_item);
......@@ -1641,8 +1646,8 @@ bool btrfs_readdir_get_delayed_items(struct inode *inode,
* We can only do one readdir with delayed items at a time because of
* item->readdir_list.
*/
btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED);
btrfs_inode_lock(inode, 0);
btrfs_inode_unlock(BTRFS_I(inode), BTRFS_ILOCK_SHARED);
btrfs_inode_lock(BTRFS_I(inode), 0);
mutex_lock(&delayed_node->mutex);
item = __btrfs_first_delayed_insertion_item(delayed_node);
......@@ -1753,7 +1758,7 @@ int btrfs_readdir_delayed_dir_index(struct dir_context *ctx,
name = (char *)(di + 1);
name_len = btrfs_stack_dir_name_len(di);
d_type = fs_ftype_to_dtype(di->type);
d_type = fs_ftype_to_dtype(btrfs_dir_flags_to_ftype(di->type));
btrfs_disk_key_to_cpu(&location, &di->location);
over = !dir_emit(ctx, name, name_len,
......
......@@ -113,7 +113,7 @@ static inline void btrfs_init_delayed_root(
int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
const char *name, int name_len,
struct btrfs_inode *dir,
struct btrfs_disk_key *disk_key, u8 type,
struct btrfs_disk_key *disk_key, u8 flags,
u64 index);
int btrfs_delete_delayed_dir_index(struct btrfs_trans_handle *trans,
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment