Commit 35a891be authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'xfs-reflink-for-linus-4.9-rc1' of...

Merge tag 'xfs-reflink-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs

    < XFS has gained super CoW powers! >
     ----------------------------------
            \   ^__^
             \  (oo)\_______
                (__)\       )\/\
                    ||----w |
                    ||     ||

Pull XFS support for shared data extents from Dave Chinner:
 "This is the second part of the XFS updates for this merge cycle.  This
  pullreq contains the new shared data extents feature for XFS.

  Given the complexity and size of this change I am expecting - like the
  addition of reverse mapping last cycle - that there will be some
  follow-up bug fixes and cleanups around the -rc3 stage for issues that
  I'm sure will show up once the code hits a wider userbase.

  What it is:

  At the most basic level we are simply adding shared data extents to
  XFS - i.e. a single extent on disk can now have multiple owners. To do
  this we have to add new on-disk features to both track the shared
  extents and the number of times they've been shared. This is done by
  the new "refcount" btree that sits in every allocation group. When we
  share or unshare an extent, this tree gets updated.

  Along with this new tree, the reverse mapping tree needs to be updated
  to track each owner or a shared extent. This also needs to be updated
  ever share/unshare operation. These interactions at extent allocation
  and freeing time have complex ordering and recovery constraints, so
  there's a significant amount of new intent-based transaction code to
  ensure that operations are performed atomically from both the runtime
  and integrity/crash recovery perspectives.

  We also need to break sharing when writes hit a shared extent - this
  is where the new copy-on-write implementation comes in. We allocate
  new storage and copy the original data along with the overwrite data
  into the new location. We only do this for data as we don't share
  metadata at all - each inode has it's own metadata that tracks the
  shared data extents, the extents undergoing CoW and it's own private
  extents.

  Of course, being XFS, nothing is simple - we use delayed allocation
  for CoW similar to how we use it for normal writes. ENOSPC is a
  significant issue here - we build on the reservation code added in
  4.8-rc1 with the reverse mapping feature to ensure we don't get
  spurious ENOSPC issues part way through a CoW operation. These
  mechanisms also help minimise fragmentation due to repeated CoW
  operations. To further reduce fragmentation overhead, we've also
  introduced a CoW extent size hint, which indicates how large a region
  we should allocate when we execute a CoW operation.

  With all this functionality in place, we can hook up .copy_file_range,
  .clone_file_range and .dedupe_file_range and we gain all the
  capabilities of reflink and other vfs provided functionality that
  enable manipulation to shared extents. We also added a fallocate mode
  that explicitly unshares a range of a file, which we implemented as an
  explicit CoW of all the shared extents in a file.

  As such, it's a huge chunk of new functionality with new on-disk
  format features and internal infrastructure. It warns at mount time as
  an experimental feature and that it may eat data (as we do with all
  new on-disk features until they stabilise). We have not released
  userspace suport for it yet - userspace support currently requires
  download from Darrick's xfsprogs repo and build from source, so the
  access to this feature is really developer/tester only at this point.
  Initial userspace support will be released at the same time the kernel
  with this code in it is released.

  The new code causes 5-6 new failures with xfstests - these aren't
  serious functional failures but things the output of tests changing
  slightly due to perturbations in layouts, space usage, etc. OTOH,
  we've added 150+ new tests to xfstests that specifically exercise this
  new functionality so it's got far better test coverage than any
  functionality we've previously added to XFS.

  Darrick has done a pretty amazing job getting us to this stage, and
  special mention also needs to go to Christoph (review, testing,
  improvements and bug fixes) and Brian (caught several intricate bugs
  during review) for the effort they've also put in.

  Summary:

   - unshare range (FALLOC_FL_UNSHARE) support for fallocate

   - copy-on-write extent size hints (FS_XFLAG_COWEXTSIZE) for fsxattr
     interface

   - shared extent support for XFS

   - copy-on-write support for shared extents

   - copy_file_range support

   - clone_file_range support (implements reflink)

   - dedupe_file_range support

   - defrag support for reverse mapping enabled filesystems"

* tag 'xfs-reflink-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (71 commits)
  xfs: convert COW blocks to real blocks before unwritten extent conversion
  xfs: rework refcount cow recovery error handling
  xfs: clear reflink flag if setting realtime flag
  xfs: fix error initialization
  xfs: fix label inaccuracies
  xfs: remove isize check from unshare operation
  xfs: reduce stack usage of _reflink_clear_inode_flag
  xfs: check inode reflink flag before calling reflink functions
  xfs: implement swapext for rmap filesystems
  xfs: refactor swapext code
  xfs: various swapext cleanups
  xfs: recognize the reflink feature bit
  xfs: simulate per-AG reservations being critically low
  xfs: don't mix reflink and DAX mode for now
  xfs: check for invalid inode reflink flags
  xfs: set a default CoW extent size of 32 blocks
  xfs: convert unwritten status of reverse mappings for shared files
  xfs: use interval query for rmap alloc operations on shared files
  xfs: add shared rmap map/unmap/convert log item types
  xfs: increase log reservations for reflink
  ...
parents 40bd3a5f feac470e
......@@ -267,6 +267,11 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
(mode & ~FALLOC_FL_INSERT_RANGE))
return -EINVAL;
/* Unshare range should only be used with allocate mode. */
if ((mode & FALLOC_FL_UNSHARE_RANGE) &&
(mode & ~(FALLOC_FL_UNSHARE_RANGE | FALLOC_FL_KEEP_SIZE)))
return -EINVAL;
if (!(file->f_mode & FMODE_WRITE))
return -EBADF;
......
......@@ -55,6 +55,8 @@ xfs-y += $(addprefix libxfs/, \
xfs_ag_resv.o \
xfs_rmap.o \
xfs_rmap_btree.o \
xfs_refcount.o \
xfs_refcount_btree.o \
xfs_sb.o \
xfs_symlink_remote.o \
xfs_trans_resv.o \
......@@ -88,6 +90,7 @@ xfs-y += xfs_aops.o \
xfs_message.o \
xfs_mount.o \
xfs_mru_cache.o \
xfs_reflink.o \
xfs_stats.o \
xfs_super.o \
xfs_symlink.o \
......@@ -100,16 +103,20 @@ xfs-y += xfs_aops.o \
# low-level transaction/log code
xfs-y += xfs_log.o \
xfs_log_cil.o \
xfs_bmap_item.o \
xfs_buf_item.o \
xfs_extfree_item.o \
xfs_icreate_item.o \
xfs_inode_item.o \
xfs_refcount_item.o \
xfs_rmap_item.o \
xfs_log_recover.o \
xfs_trans_ail.o \
xfs_trans_bmap.o \
xfs_trans_buf.o \
xfs_trans_extfree.o \
xfs_trans_inode.o \
xfs_trans_refcount.o \
xfs_trans_rmap.o \
# optional features
......
......@@ -38,6 +38,7 @@
#include "xfs_trans_space.h"
#include "xfs_rmap_btree.h"
#include "xfs_btree.h"
#include "xfs_refcount_btree.h"
/*
* Per-AG Block Reservations
......@@ -108,7 +109,9 @@ xfs_ag_resv_critical(
trace_xfs_ag_resv_critical(pag, type, avail);
/* Critically low if less than 10% or max btree height remains. */
return avail < orig / 10 || avail < XFS_BTREE_MAXLEVELS;
return XFS_TEST_ERROR(avail < orig / 10 || avail < XFS_BTREE_MAXLEVELS,
pag->pag_mount, XFS_ERRTAG_AG_RESV_CRITICAL,
XFS_RANDOM_AG_RESV_CRITICAL);
}
/*
......@@ -228,6 +231,11 @@ xfs_ag_resv_init(
if (pag->pag_meta_resv.ar_asked == 0) {
ask = used = 0;
error = xfs_refcountbt_calc_reserves(pag->pag_mount,
pag->pag_agno, &ask, &used);
if (error)
goto out;
error = __xfs_ag_resv_init(pag, XFS_AG_RESV_METADATA,
ask, used);
if (error)
......@@ -238,6 +246,11 @@ xfs_ag_resv_init(
if (pag->pag_agfl_resv.ar_asked == 0) {
ask = used = 0;
error = xfs_rmapbt_calc_reserves(pag->pag_mount, pag->pag_agno,
&ask, &used);
if (error)
goto out;
error = __xfs_ag_resv_init(pag, XFS_AG_RESV_AGFL, ask, used);
if (error)
goto out;
......
......@@ -52,10 +52,23 @@ STATIC int xfs_alloc_ag_vextent_size(xfs_alloc_arg_t *);
STATIC int xfs_alloc_ag_vextent_small(xfs_alloc_arg_t *,
xfs_btree_cur_t *, xfs_agblock_t *, xfs_extlen_t *, int *);
unsigned int
xfs_refc_block(
struct xfs_mount *mp)
{
if (xfs_sb_version_hasrmapbt(&mp->m_sb))
return XFS_RMAP_BLOCK(mp) + 1;
if (xfs_sb_version_hasfinobt(&mp->m_sb))
return XFS_FIBT_BLOCK(mp) + 1;
return XFS_IBT_BLOCK(mp) + 1;
}
xfs_extlen_t
xfs_prealloc_blocks(
struct xfs_mount *mp)
{
if (xfs_sb_version_hasreflink(&mp->m_sb))
return xfs_refc_block(mp) + 1;
if (xfs_sb_version_hasrmapbt(&mp->m_sb))
return XFS_RMAP_BLOCK(mp) + 1;
if (xfs_sb_version_hasfinobt(&mp->m_sb))
......@@ -115,6 +128,8 @@ xfs_alloc_ag_max_usable(
blocks++; /* finobt root block */
if (xfs_sb_version_hasrmapbt(&mp->m_sb))
blocks++; /* rmap root block */
if (xfs_sb_version_hasreflink(&mp->m_sb))
blocks++; /* refcount root block */
return mp->m_sb.sb_agblocks - blocks;
}
......@@ -2321,6 +2336,9 @@ xfs_alloc_log_agf(
offsetof(xfs_agf_t, agf_btreeblks),
offsetof(xfs_agf_t, agf_uuid),
offsetof(xfs_agf_t, agf_rmap_blocks),
offsetof(xfs_agf_t, agf_refcount_blocks),
offsetof(xfs_agf_t, agf_refcount_root),
offsetof(xfs_agf_t, agf_refcount_level),
/* needed so that we don't log the whole rest of the structure: */
offsetof(xfs_agf_t, agf_spare64),
sizeof(xfs_agf_t)
......@@ -2458,6 +2476,10 @@ xfs_agf_verify(
be32_to_cpu(agf->agf_btreeblks) > be32_to_cpu(agf->agf_length))
return false;
if (xfs_sb_version_hasreflink(&mp->m_sb) &&
be32_to_cpu(agf->agf_refcount_level) > XFS_BTREE_MAXLEVELS)
return false;
return true;;
}
......@@ -2578,6 +2600,7 @@ xfs_alloc_read_agf(
be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNTi]);
pag->pagf_levels[XFS_BTNUM_RMAPi] =
be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAPi]);
pag->pagf_refcount_level = be32_to_cpu(agf->agf_refcount_level);
spin_lock_init(&pag->pagb_lock);
pag->pagb_count = 0;
pag->pagb_tree = RB_ROOT;
......
This diff is collapsed.
......@@ -97,6 +97,19 @@ struct xfs_extent_free_item
*/
#define XFS_BMAPI_ZERO 0x080
/*
* Map the inode offset to the block given in ap->firstblock. Primarily
* used for reflink. The range must be in a hole, and this flag cannot be
* turned on with PREALLOC or CONVERT, and cannot be used on the attr fork.
*
* For bunmapi, this flag unmaps the range without adjusting quota, reducing
* refcount, or freeing the blocks.
*/
#define XFS_BMAPI_REMAP 0x100
/* Map something in the CoW fork. */
#define XFS_BMAPI_COWFORK 0x200
#define XFS_BMAPI_FLAGS \
{ XFS_BMAPI_ENTIRE, "ENTIRE" }, \
{ XFS_BMAPI_METADATA, "METADATA" }, \
......@@ -105,12 +118,24 @@ struct xfs_extent_free_item
{ XFS_BMAPI_IGSTATE, "IGSTATE" }, \
{ XFS_BMAPI_CONTIG, "CONTIG" }, \
{ XFS_BMAPI_CONVERT, "CONVERT" }, \
{ XFS_BMAPI_ZERO, "ZERO" }
{ XFS_BMAPI_ZERO, "ZERO" }, \
{ XFS_BMAPI_REMAP, "REMAP" }, \
{ XFS_BMAPI_COWFORK, "COWFORK" }
static inline int xfs_bmapi_aflag(int w)
{
return (w == XFS_ATTR_FORK ? XFS_BMAPI_ATTRFORK : 0);
return (w == XFS_ATTR_FORK ? XFS_BMAPI_ATTRFORK :
(w == XFS_COW_FORK ? XFS_BMAPI_COWFORK : 0));
}
static inline int xfs_bmapi_whichfork(int bmapi_flags)
{
if (bmapi_flags & XFS_BMAPI_COWFORK)
return XFS_COW_FORK;
else if (bmapi_flags & XFS_BMAPI_ATTRFORK)
return XFS_ATTR_FORK;
return XFS_DATA_FORK;
}
/*
......@@ -131,13 +156,15 @@ static inline int xfs_bmapi_aflag(int w)
#define BMAP_LEFT_VALID (1 << 6)
#define BMAP_RIGHT_VALID (1 << 7)
#define BMAP_ATTRFORK (1 << 8)
#define BMAP_COWFORK (1 << 9)
#define XFS_BMAP_EXT_FLAGS \
{ BMAP_LEFT_CONTIG, "LC" }, \
{ BMAP_RIGHT_CONTIG, "RC" }, \
{ BMAP_LEFT_FILLING, "LF" }, \
{ BMAP_RIGHT_FILLING, "RF" }, \
{ BMAP_ATTRFORK, "ATTR" }
{ BMAP_ATTRFORK, "ATTR" }, \
{ BMAP_COWFORK, "COW" }
/*
......@@ -186,10 +213,15 @@ int xfs_bmapi_write(struct xfs_trans *tp, struct xfs_inode *ip,
xfs_fsblock_t *firstblock, xfs_extlen_t total,
struct xfs_bmbt_irec *mval, int *nmap,
struct xfs_defer_ops *dfops);
int __xfs_bunmapi(struct xfs_trans *tp, struct xfs_inode *ip,
xfs_fileoff_t bno, xfs_filblks_t *rlen, int flags,
xfs_extnum_t nexts, xfs_fsblock_t *firstblock,
struct xfs_defer_ops *dfops);
int xfs_bunmapi(struct xfs_trans *tp, struct xfs_inode *ip,
xfs_fileoff_t bno, xfs_filblks_t len, int flags,
xfs_extnum_t nexts, xfs_fsblock_t *firstblock,
struct xfs_defer_ops *dfops, int *done);
int xfs_bunmapi_cow(struct xfs_inode *ip, struct xfs_bmbt_irec *del);
int xfs_check_nostate_extents(struct xfs_ifork *ifp, xfs_extnum_t idx,
xfs_extnum_t num);
uint xfs_default_attroffset(struct xfs_inode *ip);
......@@ -203,8 +235,31 @@ struct xfs_bmbt_rec_host *
xfs_bmap_search_extents(struct xfs_inode *ip, xfs_fileoff_t bno,
int fork, int *eofp, xfs_extnum_t *lastxp,
struct xfs_bmbt_irec *gotp, struct xfs_bmbt_irec *prevp);
int xfs_bmapi_reserve_delalloc(struct xfs_inode *ip, xfs_fileoff_t aoff,
xfs_filblks_t len, struct xfs_bmbt_irec *got,
struct xfs_bmbt_irec *prev, xfs_extnum_t *lastx, int eof);
int xfs_bmapi_reserve_delalloc(struct xfs_inode *ip, int whichfork,
xfs_fileoff_t aoff, xfs_filblks_t len,
struct xfs_bmbt_irec *got, struct xfs_bmbt_irec *prev,
xfs_extnum_t *lastx, int eof);
enum xfs_bmap_intent_type {
XFS_BMAP_MAP = 1,
XFS_BMAP_UNMAP,
};
struct xfs_bmap_intent {
struct list_head bi_list;
enum xfs_bmap_intent_type bi_type;
struct xfs_inode *bi_owner;
int bi_whichfork;
struct xfs_bmbt_irec bi_bmap;
};
int xfs_bmap_finish_one(struct xfs_trans *tp, struct xfs_defer_ops *dfops,
struct xfs_inode *ip, enum xfs_bmap_intent_type type,
int whichfork, xfs_fileoff_t startoff, xfs_fsblock_t startblock,
xfs_filblks_t blockcount, xfs_exntst_t state);
int xfs_bmap_map_extent(struct xfs_mount *mp, struct xfs_defer_ops *dfops,
struct xfs_inode *ip, struct xfs_bmbt_irec *imap);
int xfs_bmap_unmap_extent(struct xfs_mount *mp, struct xfs_defer_ops *dfops,
struct xfs_inode *ip, struct xfs_bmbt_irec *imap);
#endif /* __XFS_BMAP_H__ */
......@@ -453,6 +453,7 @@ xfs_bmbt_alloc_block(
if (args.fsbno == NULLFSBLOCK) {
args.fsbno = be64_to_cpu(start->l);
try_another_ag:
args.type = XFS_ALLOCTYPE_START_BNO;
/*
* Make sure there is sufficient room left in the AG to
......@@ -482,6 +483,22 @@ xfs_bmbt_alloc_block(
if (error)
goto error0;
/*
* During a CoW operation, the allocation and bmbt updates occur in
* different transactions. The mapping code tries to put new bmbt
* blocks near extents being mapped, but the only way to guarantee this
* is if the alloc and the mapping happen in a single transaction that
* has a block reservation. That isn't the case here, so if we run out
* of space we'll try again with another AG.
*/
if (xfs_sb_version_hasreflink(&cur->bc_mp->m_sb) &&
args.fsbno == NULLFSBLOCK &&
args.type == XFS_ALLOCTYPE_NEAR_BNO) {
cur->bc_private.b.dfops->dop_low = true;
args.fsbno = cur->bc_private.b.firstblock;
goto try_another_ag;
}
if (args.fsbno == NULLFSBLOCK && args.minleft) {
/*
* Could not find an AG with enough free space to satisfy
......@@ -777,6 +794,7 @@ xfs_bmbt_init_cursor(
{
struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork);
struct xfs_btree_cur *cur;
ASSERT(whichfork != XFS_COW_FORK);
cur = kmem_zone_zalloc(xfs_btree_cur_zone, KM_SLEEP);
......
......@@ -45,9 +45,10 @@ kmem_zone_t *xfs_btree_cur_zone;
*/
static const __uint32_t xfs_magics[2][XFS_BTNUM_MAX] = {
{ XFS_ABTB_MAGIC, XFS_ABTC_MAGIC, 0, XFS_BMAP_MAGIC, XFS_IBT_MAGIC,
XFS_FIBT_MAGIC },
XFS_FIBT_MAGIC, 0 },
{ XFS_ABTB_CRC_MAGIC, XFS_ABTC_CRC_MAGIC, XFS_RMAP_CRC_MAGIC,
XFS_BMAP_CRC_MAGIC, XFS_IBT_CRC_MAGIC, XFS_FIBT_CRC_MAGIC }
XFS_BMAP_CRC_MAGIC, XFS_IBT_CRC_MAGIC, XFS_FIBT_CRC_MAGIC,
XFS_REFC_CRC_MAGIC }
};
#define xfs_btree_magic(cur) \
xfs_magics[!!((cur)->bc_flags & XFS_BTREE_CRC_BLOCKS)][cur->bc_btnum]
......@@ -1216,6 +1217,9 @@ xfs_btree_set_refs(
case XFS_BTNUM_RMAP:
xfs_buf_set_ref(bp, XFS_RMAP_BTREE_REF);
break;
case XFS_BTNUM_REFC:
xfs_buf_set_ref(bp, XFS_REFC_BTREE_REF);
break;
default:
ASSERT(0);
}
......
......@@ -49,6 +49,7 @@ union xfs_btree_key {
struct xfs_inobt_key inobt;
struct xfs_rmap_key rmap;
struct xfs_rmap_key __rmap_bigkey[2];
struct xfs_refcount_key refc;
};
union xfs_btree_rec {
......@@ -57,6 +58,7 @@ union xfs_btree_rec {
struct xfs_alloc_rec alloc;
struct xfs_inobt_rec inobt;
struct xfs_rmap_rec rmap;
struct xfs_refcount_rec refc;
};
/*
......@@ -72,6 +74,7 @@ union xfs_btree_rec {
#define XFS_BTNUM_INO ((xfs_btnum_t)XFS_BTNUM_INOi)
#define XFS_BTNUM_FINO ((xfs_btnum_t)XFS_BTNUM_FINOi)
#define XFS_BTNUM_RMAP ((xfs_btnum_t)XFS_BTNUM_RMAPi)
#define XFS_BTNUM_REFC ((xfs_btnum_t)XFS_BTNUM_REFCi)
/*
* For logging record fields.
......@@ -105,6 +108,7 @@ do { \
case XFS_BTNUM_INO: __XFS_BTREE_STATS_INC(__mp, ibt, stat); break; \
case XFS_BTNUM_FINO: __XFS_BTREE_STATS_INC(__mp, fibt, stat); break; \
case XFS_BTNUM_RMAP: __XFS_BTREE_STATS_INC(__mp, rmap, stat); break; \
case XFS_BTNUM_REFC: __XFS_BTREE_STATS_INC(__mp, refcbt, stat); break; \
case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break; \
} \
} while (0)
......@@ -127,6 +131,8 @@ do { \
__XFS_BTREE_STATS_ADD(__mp, fibt, stat, val); break; \
case XFS_BTNUM_RMAP: \
__XFS_BTREE_STATS_ADD(__mp, rmap, stat, val); break; \
case XFS_BTNUM_REFC: \
__XFS_BTREE_STATS_ADD(__mp, refcbt, stat, val); break; \
case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break; \
} \
} while (0)
......@@ -217,6 +223,15 @@ union xfs_btree_irec {
struct xfs_bmbt_irec b;
struct xfs_inobt_rec_incore i;
struct xfs_rmap_irec r;
struct xfs_refcount_irec rc;
};
/* Per-AG btree private information. */
union xfs_btree_cur_private {
struct {
unsigned long nr_ops; /* # record updates */
int shape_changes; /* # of extent splits */
} refc;
};
/*
......@@ -243,6 +258,7 @@ typedef struct xfs_btree_cur
struct xfs_buf *agbp; /* agf/agi buffer pointer */
struct xfs_defer_ops *dfops; /* deferred updates */
xfs_agnumber_t agno; /* ag number */
union xfs_btree_cur_private priv;
} a;
struct { /* needed for BMAP */
struct xfs_inode *ip; /* pointer to our inode */
......
......@@ -51,6 +51,8 @@ struct xfs_defer_pending {
* find all the space it needs.
*/
enum xfs_defer_ops_type {
XFS_DEFER_OPS_TYPE_BMAP,
XFS_DEFER_OPS_TYPE_REFCOUNT,
XFS_DEFER_OPS_TYPE_RMAP,
XFS_DEFER_OPS_TYPE_FREE,
XFS_DEFER_OPS_TYPE_MAX,
......
......@@ -456,9 +456,11 @@ xfs_sb_has_compat_feature(
#define XFS_SB_FEAT_RO_COMPAT_FINOBT (1 << 0) /* free inode btree */
#define XFS_SB_FEAT_RO_COMPAT_RMAPBT (1 << 1) /* reverse map btree */
#define XFS_SB_FEAT_RO_COMPAT_REFLINK (1 << 2) /* reflinked files */
#define XFS_SB_FEAT_RO_COMPAT_ALL \
(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
XFS_SB_FEAT_RO_COMPAT_RMAPBT)
XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
XFS_SB_FEAT_RO_COMPAT_REFLINK)
#define XFS_SB_FEAT_RO_COMPAT_UNKNOWN ~XFS_SB_FEAT_RO_COMPAT_ALL
static inline bool
xfs_sb_has_ro_compat_feature(
......@@ -546,6 +548,12 @@ static inline bool xfs_sb_version_hasrmapbt(struct xfs_sb *sbp)
(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_RMAPBT);
}
static inline bool xfs_sb_version_hasreflink(struct xfs_sb *sbp)
{
return XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5 &&
(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_REFLINK);
}
/*
* end of superblock version macros
*/
......@@ -641,14 +649,17 @@ typedef struct xfs_agf {
uuid_t agf_uuid; /* uuid of filesystem */
__be32 agf_rmap_blocks; /* rmapbt blocks used */
__be32 agf_padding; /* padding */
__be32 agf_refcount_blocks; /* refcountbt blocks used */
__be32 agf_refcount_root; /* refcount tree root block */
__be32 agf_refcount_level; /* refcount btree levels */
/*
* reserve some contiguous space for future logged fields before we add
* the unlogged fields. This makes the range logging via flags and
* structure offsets much simpler.
*/
__be64 agf_spare64[15];
__be64 agf_spare64[14];
/* unlogged fields, written during buffer writeback. */
__be64 agf_lsn; /* last write sequence */
......@@ -674,8 +685,11 @@ typedef struct xfs_agf {
#define XFS_AGF_BTREEBLKS 0x00000800
#define XFS_AGF_UUID 0x00001000
#define XFS_AGF_RMAP_BLOCKS 0x00002000
#define XFS_AGF_SPARE64 0x00004000
#define XFS_AGF_NUM_BITS 15
#define XFS_AGF_REFCOUNT_BLOCKS 0x00004000
#define XFS_AGF_REFCOUNT_ROOT 0x00008000
#define XFS_AGF_REFCOUNT_LEVEL 0x00010000
#define XFS_AGF_SPARE64 0x00020000
#define XFS_AGF_NUM_BITS 18
#define XFS_AGF_ALL_BITS ((1 << XFS_AGF_NUM_BITS) - 1)
#define XFS_AGF_FLAGS \
......@@ -693,6 +707,9 @@ typedef struct xfs_agf {
{ XFS_AGF_BTREEBLKS, "BTREEBLKS" }, \
{ XFS_AGF_UUID, "UUID" }, \
{ XFS_AGF_RMAP_BLOCKS, "RMAP_BLOCKS" }, \
{ XFS_AGF_REFCOUNT_BLOCKS, "REFCOUNT_BLOCKS" }, \
{ XFS_AGF_REFCOUNT_ROOT, "REFCOUNT_ROOT" }, \
{ XFS_AGF_REFCOUNT_LEVEL, "REFCOUNT_LEVEL" }, \
{ XFS_AGF_SPARE64, "SPARE64" }
/* disk block (xfs_daddr_t) in the AG */
......@@ -885,7 +902,8 @@ typedef struct xfs_dinode {
__be64 di_changecount; /* number of attribute changes */
__be64 di_lsn; /* flush sequence */
__be64 di_flags2; /* more random flags */
__u8 di_pad2[16]; /* more padding for future expansion */
__be32 di_cowextsize; /* basic cow extent size for file */
__u8 di_pad2[12]; /* more padding for future expansion */
/* fields only written to during inode creation */
xfs_timestamp_t di_crtime; /* time created */
......@@ -1041,9 +1059,14 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev)
* 16 bits of the XFS_XFLAG_s range.
*/
#define XFS_DIFLAG2_DAX_BIT 0 /* use DAX for this inode */
#define XFS_DIFLAG2_REFLINK_BIT 1 /* file's blocks may be shared */
#define XFS_DIFLAG2_COWEXTSIZE_BIT 2 /* copy on write extent size hint */
#define XFS_DIFLAG2_DAX (1 << XFS_DIFLAG2_DAX_BIT)
#define XFS_DIFLAG2_REFLINK (1 << XFS_DIFLAG2_REFLINK_BIT)
#define XFS_DIFLAG2_COWEXTSIZE (1 << XFS_DIFLAG2_COWEXTSIZE_BIT)
#define XFS_DIFLAG2_ANY (XFS_DIFLAG2_DAX)
#define XFS_DIFLAG2_ANY \
(XFS_DIFLAG2_DAX | XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE)
/*
* Inode number format:
......@@ -1353,7 +1376,9 @@ struct xfs_owner_info {
#define XFS_RMAP_OWN_AG (-5ULL) /* AG freespace btree blocks */
#define XFS_RMAP_OWN_INOBT (-6ULL) /* Inode btree blocks */
#define XFS_RMAP_OWN_INODES (-7ULL) /* Inode chunk */
#define XFS_RMAP_OWN_MIN (-8ULL) /* guard */
#define XFS_RMAP_OWN_REFC (-8ULL) /* refcount tree */
#define XFS_RMAP_OWN_COW (-9ULL) /* cow allocations */
#define XFS_RMAP_OWN_MIN (-10ULL) /* guard */
#define XFS_RMAP_NON_INODE_OWNER(owner) (!!((owner) & (1ULL << 63)))
......@@ -1433,6 +1458,62 @@ typedef __be32 xfs_rmap_ptr_t;
XFS_FIBT_BLOCK(mp) + 1 : \
XFS_IBT_BLOCK(mp) + 1)
/*
* Reference Count Btree format definitions
*
*/
#define XFS_REFC_CRC_MAGIC 0x52334643 /* 'R3FC' */
unsigned int xfs_refc_block(struct xfs_mount *mp);
/*
* Data record/key structure
*
* Each record associates a range of physical blocks (starting at
* rc_startblock and ending rc_blockcount blocks later) with a reference
* count (rc_refcount). Extents that are being used to stage a copy on
* write (CoW) operation are recorded in the refcount btree with a
* refcount of 1. All other records must have a refcount > 1 and must
* track an extent mapped only by file data forks.
*
* Extents with a single owner (attributes, metadata, non-shared file
* data) are not tracked here. Free space is also not tracked here.
* This is consistent with pre-reflink XFS.
*/
/*
* Extents that are being used to stage a copy on write are stored
* in the refcount btree with a refcount of 1 and the upper bit set
* on the startblock. This speeds up mount time deletion of stale
* staging extents because they're all at the right side of the tree.
*/
#define XFS_REFC_COW_START ((xfs_agblock_t)(1U << 31))
#define REFCNTBT_COWFLAG_BITLEN 1
#define REFCNTBT_AGBLOCK_BITLEN 31
struct xfs_refcount_rec {
__be32 rc_startblock; /* starting block number */
__be32 rc_blockcount; /* count of blocks */
__be32 rc_refcount; /* number of inodes linked here */
};
struct xfs_refcount_key {
__be32 rc_startblock; /* starting block number */
};
struct xfs_refcount_irec {
xfs_agblock_t rc_startblock; /* starting block number */
xfs_extlen_t rc_blockcount; /* count of free blocks */
xfs_nlink_t rc_refcount; /* number of inodes linked here */
};
#define MAXREFCOUNT ((xfs_nlink_t)~0U)
#define MAXREFCEXTLEN ((xfs_extlen_t)~0U)
/* btree pointer type */
typedef __be32 xfs_refcount_ptr_t;
/*
* BMAP Btree format definitions
*
......
......@@ -81,14 +81,16 @@ struct getbmapx {
#define BMV_IF_PREALLOC 0x4 /* rtn status BMV_OF_PREALLOC if req */
#define BMV_IF_DELALLOC 0x8 /* rtn status BMV_OF_DELALLOC if req */
#define BMV_IF_NO_HOLES 0x10 /* Do not return holes */
#define BMV_IF_COWFORK 0x20 /* return CoW fork rather than data */
#define BMV_IF_VALID \
(BMV_IF_ATTRFORK|BMV_IF_NO_DMAPI_READ|BMV_IF_PREALLOC| \
BMV_IF_DELALLOC|BMV_IF_NO_HOLES)
BMV_IF_DELALLOC|BMV_IF_NO_HOLES|BMV_IF_COWFORK)
/* bmv_oflags values - returned for each non-header segment */
#define BMV_OF_PREALLOC 0x1 /* segment = unwritten pre-allocation */
#define BMV_OF_DELALLOC 0x2 /* segment = delayed allocation */
#define BMV_OF_LAST 0x4 /* segment is the last in the file */
#define BMV_OF_SHARED 0x8 /* segment shared with another file */
/*
* Structure for XFS_IOC_FSSETDM.
......@@ -206,7 +208,8 @@ typedef struct xfs_fsop_resblks {
#define XFS_FSOP_GEOM_FLAGS_FTYPE 0x10000 /* inode directory types */
#define XFS_FSOP_GEOM_FLAGS_FINOBT 0x20000 /* free inode btree */
#define XFS_FSOP_GEOM_FLAGS_SPINODES 0x40000 /* sparse inode chunks */
#define XFS_FSOP_GEOM_FLAGS_RMAPBT 0x80000 /* Reverse mapping btree */
#define XFS_FSOP_GEOM_FLAGS_RMAPBT 0x80000 /* reverse mapping btree */
#define XFS_FSOP_GEOM_FLAGS_REFLINK 0x100000 /* files can share blocks */
/*
* Minimum and maximum sizes need for growth checks.
......@@ -275,7 +278,8 @@ typedef struct xfs_bstat {
#define bs_projid bs_projid_lo /* (previously just bs_projid) */
__u16 bs_forkoff; /* inode fork offset in bytes */
__u16 bs_projid_hi; /* higher part of project id */
unsigned char bs_pad[10]; /* pad space, unused */
unsigned char bs_pad[6]; /* pad space, unused */
__u32 bs_cowextsize; /* cow extent size */
__u32 bs_dmevmask; /* DMIG event mask */
__u16 bs_dmstate; /* DMIG state info */
__u16 bs_aextents; /* attribute number of extents */
......
......@@ -256,6 +256,7 @@ xfs_inode_from_disk(
to->di_crtime.t_sec = be32_to_cpu(from->di_crtime.t_sec);
to->di_crtime.t_nsec = be32_to_cpu(from->di_crtime.t_nsec);
to->di_flags2 = be64_to_cpu(from->di_flags2);
to->di_cowextsize = be32_to_cpu(from->di_cowextsize);
}
}
......@@ -305,7 +306,7 @@ xfs_inode_to_disk(
to->di_crtime.t_sec = cpu_to_be32(from->di_crtime.t_sec);
to->di_crtime.t_nsec = cpu_to_be32(from->di_crtime.t_nsec);
to->di_flags2 = cpu_to_be64(from->di_flags2);
to->di_cowextsize = cpu_to_be32(from->di_cowextsize);
to->di_ino = cpu_to_be64(ip->i_ino);
to->di_lsn = cpu_to_be64(lsn);
memset(to->di_pad2, 0, sizeof(to->di_pad2));
......@@ -357,6 +358,7 @@ xfs_log_dinode_to_disk(
to->di_crtime.t_sec = cpu_to_be32(from->di_crtime.t_sec);
to->di_crtime.t_nsec = cpu_to_be32(from->di_crtime.t_nsec);
to->di_flags2 = cpu_to_be64(from->di_flags2);
to->di_cowextsize = cpu_to_be32(from->di_cowextsize);
to->di_ino = cpu_to_be64(from->di_ino);
to->di_lsn = cpu_to_be64(from->di_lsn);
memcpy(to->di_pad2, from->di_pad2, sizeof(to->di_pad2));
......@@ -373,6 +375,9 @@ xfs_dinode_verify(
struct xfs_inode *ip,
struct xfs_dinode *dip)
{
uint16_t flags;
uint64_t flags2;
if (dip->di_magic != cpu_to_be16(XFS_DINODE_MAGIC))
return false;
......@@ -389,6 +394,23 @@ xfs_dinode_verify(
return false;
if (!uuid_equal(&dip->di_uuid, &mp->m_sb.sb_meta_uuid))
return false;
flags = be16_to_cpu(dip->di_flags);
flags2 = be64_to_cpu(dip->di_flags2);
/* don't allow reflink/cowextsize if we don't have reflink */
if ((flags2 & (XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE)) &&
!xfs_sb_version_hasreflink(&mp->m_sb))
return false;
/* don't let reflink and realtime mix */
if ((flags2 & XFS_DIFLAG2_REFLINK) && (flags & XFS_DIFLAG_REALTIME))
return false;
/* don't let reflink and dax mix */
if ((flags2 & XFS_DIFLAG2_REFLINK) && (flags2 & XFS_DIFLAG2_DAX))
return false;
return true;
}
......
......@@ -47,6 +47,7 @@ struct xfs_icdinode {
__uint16_t di_flags; /* random flags, XFS_DIFLAG_... */
__uint64_t di_flags2; /* more random flags */
__uint32_t di_cowextsize; /* basic cow extent size for file */
xfs_ictimestamp_t di_crtime; /* time created */
};
......
......@@ -121,6 +121,26 @@ xfs_iformat_fork(
return -EFSCORRUPTED;
}
if (unlikely(xfs_is_reflink_inode(ip) &&
(VFS_I(ip)->i_mode & S_IFMT) != S_IFREG)) {
xfs_warn(ip->i_mount,
"corrupt dinode %llu, wrong file type for reflink.",
ip->i_ino);
XFS_CORRUPTION_ERROR("xfs_iformat(reflink)",
XFS_ERRLEVEL_LOW, ip->i_mount, dip);
return -EFSCORRUPTED;
}
if (unlikely(xfs_is_reflink_inode(ip) &&
(ip->i_d.di_flags & XFS_DIFLAG_REALTIME))) {
xfs_warn(ip->i_mount,
"corrupt dinode %llu, has reflink+realtime flag set.",
ip->i_ino);
XFS_CORRUPTION_ERROR("xfs_iformat(reflink)",
XFS_ERRLEVEL_LOW, ip->i_mount, dip);
return -EFSCORRUPTED;
}
switch (VFS_I(ip)->i_mode & S_IFMT) {
case S_IFIFO:
case S_IFCHR:
......@@ -186,9 +206,14 @@ xfs_iformat_fork(
XFS_ERROR_REPORT("xfs_iformat(7)", XFS_ERRLEVEL_LOW, ip->i_mount);
return -EFSCORRUPTED;
}
if (error) {
if (error)
return error;
if (xfs_is_reflink_inode(ip)) {
ASSERT(ip->i_cowfp == NULL);
xfs_ifork_init_cow(ip);
}
if (!XFS_DFORK_Q(dip))
return 0;
......@@ -208,7 +233,8 @@ xfs_iformat_fork(
XFS_CORRUPTION_ERROR("xfs_iformat(8)",
XFS_ERRLEVEL_LOW,
ip->i_mount, dip);
return -EFSCORRUPTED;
error = -EFSCORRUPTED;
break;
}
error = xfs_iformat_local(ip, dip, XFS_ATTR_FORK, size);
......@@ -226,6 +252,9 @@ xfs_iformat_fork(
if (error) {
kmem_zone_free(xfs_ifork_zone, ip->i_afp);
ip->i_afp = NULL;
if (ip->i_cowfp)
kmem_zone_free(xfs_ifork_zone, ip->i_cowfp);
ip->i_cowfp = NULL;
xfs_idestroy_fork(ip, XFS_DATA_FORK);
}
return error;
......@@ -740,6 +769,9 @@ xfs_idestroy_fork(
if (whichfork == XFS_ATTR_FORK) {
kmem_zone_free(xfs_ifork_zone, ip->i_afp);
ip->i_afp = NULL;
} else if (whichfork == XFS_COW_FORK) {
kmem_zone_free(xfs_ifork_zone, ip->i_cowfp);
ip->i_cowfp = NULL;
}
}
......@@ -927,6 +959,19 @@ xfs_iext_get_ext(
}
}
/* Convert bmap state flags to an inode fork. */
struct xfs_ifork *
xfs_iext_state_to_fork(
struct xfs_inode *ip,
int state)
{
if (state & BMAP_COWFORK)
return ip->i_cowfp;
else if (state & BMAP_ATTRFORK)
return ip->i_afp;
return &ip->i_df;
}
/*
* Insert new item(s) into the extent records for incore inode
* fork 'ifp'. 'count' new items are inserted at index 'idx'.
......@@ -939,7 +984,7 @@ xfs_iext_insert(
xfs_bmbt_irec_t *new, /* items to insert */
int state) /* type of extent conversion */
{
xfs_ifork_t *ifp = (state & BMAP_ATTRFORK) ? ip->i_afp : &ip->i_df;
xfs_ifork_t *ifp = xfs_iext_state_to_fork(ip, state);
xfs_extnum_t i; /* extent record index */
trace_xfs_iext_insert(ip, idx, new, state, _RET_IP_);
......@@ -1189,7 +1234,7 @@ xfs_iext_remove(
int ext_diff, /* number of extents to remove */
int state) /* type of extent conversion */
{
xfs_ifork_t *ifp = (state & BMAP_ATTRFORK) ? ip->i_afp : &ip->i_df;
xfs_ifork_t *ifp = xfs_iext_state_to_fork(ip, state);
xfs_extnum_t nextents; /* number of extents in file */
int new_size; /* size of extents after removal */
......@@ -1934,3 +1979,20 @@ xfs_iext_irec_update_extoffs(
ifp->if_u1.if_ext_irec[i].er_extoff += ext_diff;
}
}
/*
* Initialize an inode's copy-on-write fork.
*/
void
xfs_ifork_init_cow(
struct xfs_inode *ip)
{
if (ip->i_cowfp)
return;
ip->i_cowfp = kmem_zone_zalloc(xfs_ifork_zone,
KM_SLEEP | KM_NOFS);
ip->i_cowfp->if_flags = XFS_IFEXTENTS;
ip->i_cformat = XFS_DINODE_FMT_EXTENTS;
ip->i_cnextents = 0;
}
......@@ -92,7 +92,9 @@ typedef struct xfs_ifork {
#define XFS_IFORK_PTR(ip,w) \
((w) == XFS_DATA_FORK ? \
&(ip)->i_df : \
(ip)->i_afp)
((w) == XFS_ATTR_FORK ? \
(ip)->i_afp : \
(ip)->i_cowfp))
#define XFS_IFORK_DSIZE(ip) \
(XFS_IFORK_Q(ip) ? \
XFS_IFORK_BOFF(ip) : \
......@@ -105,26 +107,38 @@ typedef struct xfs_ifork {
#define XFS_IFORK_SIZE(ip,w) \
((w) == XFS_DATA_FORK ? \
XFS_IFORK_DSIZE(ip) : \
XFS_IFORK_ASIZE(ip))
((w) == XFS_ATTR_FORK ? \
XFS_IFORK_ASIZE(ip) : \
0))
#define XFS_IFORK_FORMAT(ip,w) \
((w) == XFS_DATA_FORK ? \
(ip)->i_d.di_format : \
(ip)->i_d.di_aformat)
((w) == XFS_ATTR_FORK ? \
(ip)->i_d.di_aformat : \
(ip)->i_cformat))
#define XFS_IFORK_FMT_SET(ip,w,n) \
((w) == XFS_DATA_FORK ? \
((ip)->i_d.di_format = (n)) : \
((ip)->i_d.di_aformat = (n)))
((w) == XFS_ATTR_FORK ? \
((ip)->i_d.di_aformat = (n)) : \
((ip)->i_cformat = (n))))
#define XFS_IFORK_NEXTENTS(ip,w) \
((w) == XFS_DATA_FORK ? \
(ip)->i_d.di_nextents : \
(ip)->i_d.di_anextents)
((w) == XFS_ATTR_FORK ? \
(ip)->i_d.di_anextents : \
(ip)->i_cnextents))
#define XFS_IFORK_NEXT_SET(ip,w,n) \
((w) == XFS_DATA_FORK ? \
((ip)->i_d.di_nextents = (n)) : \
((ip)->i_d.di_anextents = (n)))
((w) == XFS_ATTR_FORK ? \
((ip)->i_d.di_anextents = (n)) : \
((ip)->i_cnextents = (n))))
#define XFS_IFORK_MAXEXT(ip, w) \
(XFS_IFORK_SIZE(ip, w) / sizeof(xfs_bmbt_rec_t))
struct xfs_ifork *xfs_iext_state_to_fork(struct xfs_inode *ip, int state);
int xfs_iformat_fork(struct xfs_inode *, struct xfs_dinode *);
void xfs_iflush_fork(struct xfs_inode *, struct xfs_dinode *,
struct xfs_inode_log_item *, int);
......@@ -169,4 +183,6 @@ void xfs_iext_irec_update_extoffs(struct xfs_ifork *, int, int);
extern struct kmem_zone *xfs_ifork_zone;
extern void xfs_ifork_init_cow(struct xfs_inode *ip);
#endif /* __XFS_INODE_FORK_H__ */
......@@ -112,7 +112,11 @@ static inline uint xlog_get_cycle(char *ptr)
#define XLOG_REG_TYPE_ICREATE 20
#define XLOG_REG_TYPE_RUI_FORMAT 21
#define XLOG_REG_TYPE_RUD_FORMAT 22
#define XLOG_REG_TYPE_MAX 22
#define XLOG_REG_TYPE_CUI_FORMAT 23
#define XLOG_REG_TYPE_CUD_FORMAT 24
#define XLOG_REG_TYPE_BUI_FORMAT 25
#define XLOG_REG_TYPE_BUD_FORMAT 26
#define XLOG_REG_TYPE_MAX 26
/*
* Flags to log operation header
......@@ -231,6 +235,10 @@ typedef struct xfs_trans_header {
#define XFS_LI_ICREATE 0x123f
#define XFS_LI_RUI 0x1240 /* rmap update intent */
#define XFS_LI_RUD 0x1241
#define XFS_LI_CUI 0x1242 /* refcount update intent */
#define XFS_LI_CUD 0x1243
#define XFS_LI_BUI 0x1244 /* bmbt update intent */
#define XFS_LI_BUD 0x1245
#define XFS_LI_TYPE_DESC \
{ XFS_LI_EFI, "XFS_LI_EFI" }, \
......@@ -242,7 +250,11 @@ typedef struct xfs_trans_header {
{ XFS_LI_QUOTAOFF, "XFS_LI_QUOTAOFF" }, \
{ XFS_LI_ICREATE, "XFS_LI_ICREATE" }, \
{ XFS_LI_RUI, "XFS_LI_RUI" }, \
{ XFS_LI_RUD, "XFS_LI_RUD" }
{ XFS_LI_RUD, "XFS_LI_RUD" }, \
{ XFS_LI_CUI, "XFS_LI_CUI" }, \
{ XFS_LI_CUD, "XFS_LI_CUD" }, \
{ XFS_LI_BUI, "XFS_LI_BUI" }, \
{ XFS_LI_BUD, "XFS_LI_BUD" }
/*
* Inode Log Item Format definitions.
......@@ -411,7 +423,8 @@ struct xfs_log_dinode {
__uint64_t di_changecount; /* number of attribute changes */
xfs_lsn_t di_lsn; /* flush sequence */
__uint64_t di_flags2; /* more random flags */
__uint8_t di_pad2[16]; /* more padding for future expansion */
__uint32_t di_cowextsize; /* basic cow extent size for file */
__uint8_t di_pad2[12]; /* more padding for future expansion */
/* fields only written to during inode creation */
xfs_ictimestamp_t di_crtime; /* time created */
......@@ -622,8 +635,11 @@ struct xfs_map_extent {
/* rmap me_flags: upper bits are flags, lower byte is type code */
#define XFS_RMAP_EXTENT_MAP 1
#define XFS_RMAP_EXTENT_MAP_SHARED 2
#define XFS_RMAP_EXTENT_UNMAP 3
#define XFS_RMAP_EXTENT_UNMAP_SHARED 4
#define XFS_RMAP_EXTENT_CONVERT 5
#define XFS_RMAP_EXTENT_CONVERT_SHARED 6
#define XFS_RMAP_EXTENT_ALLOC 7
#define XFS_RMAP_EXTENT_FREE 8
#define XFS_RMAP_EXTENT_TYPE_MASK 0xFF
......@@ -670,6 +686,102 @@ struct xfs_rud_log_format {
__uint64_t rud_rui_id; /* id of corresponding rui */
};
/*
* CUI/CUD (refcount update) log format definitions
*/
struct xfs_phys_extent {
__uint64_t pe_startblock;
__uint32_t pe_len;
__uint32_t pe_flags;
};
/* refcount pe_flags: upper bits are flags, lower byte is type code */
/* Type codes are taken directly from enum xfs_refcount_intent_type. */
#define XFS_REFCOUNT_EXTENT_TYPE_MASK 0xFF
#define XFS_REFCOUNT_EXTENT_FLAGS (XFS_REFCOUNT_EXTENT_TYPE_MASK)
/*
* This is the structure used to lay out a cui log item in the
* log. The cui_extents field is a variable size array whose
* size is given by cui_nextents.
*/
struct xfs_cui_log_format {
__uint16_t cui_type; /* cui log item type */
__uint16_t cui_size; /* size of this item */
__uint32_t cui_nextents; /* # extents to free */
__uint64_t cui_id; /* cui identifier */
struct xfs_phys_extent cui_extents[]; /* array of extents */
};
static inline size_t
xfs_cui_log_format_sizeof(
unsigned int nr)
{
return sizeof(struct xfs_cui_log_format) +
nr * sizeof(struct xfs_phys_extent);
}
/*
* This is the structure used to lay out a cud log item in the
* log. The cud_extents array is a variable size array whose
* size is given by cud_nextents;
*/
struct xfs_cud_log_format {
__uint16_t cud_type; /* cud log item type */
__uint16_t cud_size; /* size of this item */
__uint32_t __pad;
__uint64_t cud_cui_id; /* id of corresponding cui */
};
/*
* BUI/BUD (inode block mapping) log format definitions
*/
/* bmbt me_flags: upper bits are flags, lower byte is type code */
/* Type codes are taken directly from enum xfs_bmap_intent_type. */
#define XFS_BMAP_EXTENT_TYPE_MASK 0xFF
#define XFS_BMAP_EXTENT_ATTR_FORK (1U << 31)
#define XFS_BMAP_EXTENT_UNWRITTEN (1U << 30)
#define XFS_BMAP_EXTENT_FLAGS (XFS_BMAP_EXTENT_TYPE_MASK | \
XFS_BMAP_EXTENT_ATTR_FORK | \
XFS_BMAP_EXTENT_UNWRITTEN)
/*
* This is the structure used to lay out an bui log item in the
* log. The bui_extents field is a variable size array whose
* size is given by bui_nextents.
*/
struct xfs_bui_log_format {
__uint16_t bui_type; /* bui log item type */
__uint16_t bui_size; /* size of this item */
__uint32_t bui_nextents; /* # extents to free */
__uint64_t bui_id; /* bui identifier */
struct xfs_map_extent bui_extents[]; /* array of extents to bmap */
};
static inline size_t
xfs_bui_log_format_sizeof(
unsigned int nr)
{
return sizeof(struct xfs_bui_log_format) +
nr * sizeof(struct xfs_map_extent);
}
/*
* This is the structure used to lay out an bud log item in the
* log. The bud_extents array is a variable size array whose
* size is given by bud_nextents;
*/
struct xfs_bud_log_format {
__uint16_t bud_type; /* bud log item type */
__uint16_t bud_size; /* size of this item */
__uint32_t __pad;
__uint64_t bud_bui_id; /* id of corresponding bui */
};
/*
* Dquot Log format definitions.
*
......
This diff is collapsed.
/*
* Copyright (C) 2016 Oracle. All Rights Reserved.
*
* Author: Darrick J. Wong <darrick.wong@oracle.com>
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it would be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write the Free Software Foundation,
* Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
*/
#ifndef __XFS_REFCOUNT_H__
#define __XFS_REFCOUNT_H__
extern int xfs_refcount_lookup_le(struct xfs_btree_cur *cur,
xfs_agblock_t bno, int *stat);
extern int xfs_refcount_lookup_ge(struct xfs_btree_cur *cur,
xfs_agblock_t bno, int *stat);
extern int xfs_refcount_get_rec(struct xfs_btree_cur *cur,
struct xfs_refcount_irec *irec, int *stat);
enum xfs_refcount_intent_type {
XFS_REFCOUNT_INCREASE = 1,
XFS_REFCOUNT_DECREASE,
XFS_REFCOUNT_ALLOC_COW,
XFS_REFCOUNT_FREE_COW,
};
struct xfs_refcount_intent {
struct list_head ri_list;
enum xfs_refcount_intent_type ri_type;
xfs_fsblock_t ri_startblock;
xfs_extlen_t ri_blockcount;
};
extern int xfs_refcount_increase_extent(struct xfs_mount *mp,
struct xfs_defer_ops *dfops, struct xfs_bmbt_irec *irec);
extern int xfs_refcount_decrease_extent(struct xfs_mount *mp,
struct xfs_defer_ops *dfops, struct xfs_bmbt_irec *irec);
extern void xfs_refcount_finish_one_cleanup(struct xfs_trans *tp,
struct xfs_btree_cur *rcur, int error);
extern int xfs_refcount_finish_one(struct xfs_trans *tp,
struct xfs_defer_ops *dfops, enum xfs_refcount_intent_type type,
xfs_fsblock_t startblock, xfs_extlen_t blockcount,
xfs_fsblock_t *new_fsb, xfs_extlen_t *new_len,
struct xfs_btree_cur **pcur);
extern int xfs_refcount_find_shared(struct xfs_btree_cur *cur,
xfs_agblock_t agbno, xfs_extlen_t aglen, xfs_agblock_t *fbno,
xfs_extlen_t *flen, bool find_end_of_shared);
extern int xfs_refcount_alloc_cow_extent(struct xfs_mount *mp,
struct xfs_defer_ops *dfops, xfs_fsblock_t fsb,
xfs_extlen_t len);
extern int xfs_refcount_free_cow_extent(struct xfs_mount *mp,
struct xfs_defer_ops *dfops, xfs_fsblock_t fsb,
xfs_extlen_t len);
extern int xfs_refcount_recover_cow_leftovers(struct xfs_mount *mp,
xfs_agnumber_t agno);
#endif /* __XFS_REFCOUNT_H__ */
This diff is collapsed.
/*
* Copyright (C) 2016 Oracle. All Rights Reserved.
*
* Author: Darrick J. Wong <darrick.wong@oracle.com>
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it would be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write the Free Software Foundation,
* Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
*/
#ifndef __XFS_REFCOUNT_BTREE_H__
#define __XFS_REFCOUNT_BTREE_H__
/*
* Reference Count Btree on-disk structures
*/
struct xfs_buf;
struct xfs_btree_cur;
struct xfs_mount;
/*
* Btree block header size
*/
#define XFS_REFCOUNT_BLOCK_LEN XFS_BTREE_SBLOCK_CRC_LEN
/*
* Record, key, and pointer address macros for btree blocks.
*
* (note that some of these may appear unused, but they are used in userspace)
*/
#define XFS_REFCOUNT_REC_ADDR(block, index) \
((struct xfs_refcount_rec *) \
((char *)(block) + \
XFS_REFCOUNT_BLOCK_LEN + \
(((index) - 1) * sizeof(struct xfs_refcount_rec))))
#define XFS_REFCOUNT_KEY_ADDR(block, index) \
((struct xfs_refcount_key *) \
((char *)(block) + \
XFS_REFCOUNT_BLOCK_LEN + \
((index) - 1) * sizeof(struct xfs_refcount_key)))
#define XFS_REFCOUNT_PTR_ADDR(block, index, maxrecs) \
((xfs_refcount_ptr_t *) \
((char *)(block) + \
XFS_REFCOUNT_BLOCK_LEN + \
(maxrecs) * sizeof(struct xfs_refcount_key) + \
((index) - 1) * sizeof(xfs_refcount_ptr_t)))
extern struct xfs_btree_cur *xfs_refcountbt_init_cursor(struct xfs_mount *mp,
struct xfs_trans *tp, struct xfs_buf *agbp, xfs_agnumber_t agno,
struct xfs_defer_ops *dfops);
extern int xfs_refcountbt_maxrecs(struct xfs_mount *mp, int blocklen,
bool leaf);
extern void xfs_refcountbt_compute_maxlevels(struct xfs_mount *mp);
extern xfs_extlen_t xfs_refcountbt_calc_size(struct xfs_mount *mp,
unsigned long long len);
extern xfs_extlen_t xfs_refcountbt_max_size(struct xfs_mount *mp);
extern int xfs_refcountbt_calc_reserves(struct xfs_mount *mp,
xfs_agnumber_t agno, xfs_extlen_t *ask, xfs_extlen_t *used);
#endif /* __XFS_REFCOUNT_BTREE_H__ */
This diff is collapsed.
......@@ -206,4 +206,11 @@ int xfs_rmap_finish_one(struct xfs_trans *tp, enum xfs_rmap_intent_type type,
xfs_fsblock_t startblock, xfs_filblks_t blockcount,
xfs_exntst_t state, struct xfs_btree_cur **pcur);
int xfs_rmap_find_left_neighbor(struct xfs_btree_cur *cur, xfs_agblock_t bno,
uint64_t owner, uint64_t offset, unsigned int flags,
struct xfs_rmap_irec *irec, int *stat);
int xfs_rmap_lookup_le_range(struct xfs_btree_cur *cur, xfs_agblock_t bno,
uint64_t owner, uint64_t offset, unsigned int flags,
struct xfs_rmap_irec *irec, int *stat);
#endif /* __XFS_RMAP_H__ */
......@@ -35,6 +35,7 @@
#include "xfs_cksum.h"
#include "xfs_error.h"
#include "xfs_extent_busy.h"
#include "xfs_ag_resv.h"
/*
* Reverse map btree.
......@@ -512,6 +513,83 @@ void
xfs_rmapbt_compute_maxlevels(
struct xfs_mount *mp)
{
mp->m_rmap_maxlevels = xfs_btree_compute_maxlevels(mp,
mp->m_rmap_mnr, mp->m_sb.sb_agblocks);
/*
* On a non-reflink filesystem, the maximum number of rmap
* records is the number of blocks in the AG, hence the max
* rmapbt height is log_$maxrecs($agblocks). However, with
* reflink each AG block can have up to 2^32 (per the refcount
* record format) owners, which means that theoretically we
* could face up to 2^64 rmap records.
*
* That effectively means that the max rmapbt height must be
* XFS_BTREE_MAXLEVELS. "Fortunately" we'll run out of AG
* blocks to feed the rmapbt long before the rmapbt reaches
* maximum height. The reflink code uses ag_resv_critical to
* disallow reflinking when less than 10% of the per-AG metadata
* block reservation since the fallback is a regular file copy.
*/
if (xfs_sb_version_hasreflink(&mp->m_sb))
mp->m_rmap_maxlevels = XFS_BTREE_MAXLEVELS;
else
mp->m_rmap_maxlevels = xfs_btree_compute_maxlevels(mp,
mp->m_rmap_mnr, mp->m_sb.sb_agblocks);
}
/* Calculate the refcount btree size for some records. */
xfs_extlen_t
xfs_rmapbt_calc_size(
struct xfs_mount *mp,
unsigned long long len)
{
return xfs_btree_calc_size(mp, mp->m_rmap_mnr, len);
}
/*
* Calculate the maximum refcount btree size.
*/
xfs_extlen_t
xfs_rmapbt_max_size(
struct xfs_mount *mp)
{
/* Bail out if we're uninitialized, which can happen in mkfs. */
if (mp->m_rmap_mxr[0] == 0)
return 0;
return xfs_rmapbt_calc_size(mp, mp->m_sb.sb_agblocks);
}
/*
* Figure out how many blocks to reserve and how many are used by this btree.
*/
int
xfs_rmapbt_calc_reserves(
struct xfs_mount *mp,
xfs_agnumber_t agno,
xfs_extlen_t *ask,
xfs_extlen_t *used)
{
struct xfs_buf *agbp;
struct xfs_agf *agf;
xfs_extlen_t pool_len;
xfs_extlen_t tree_len;
int error;
if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
return 0;
/* Reserve 1% of the AG or enough for 1 block per record. */
pool_len = max(mp->m_sb.sb_agblocks / 100, xfs_rmapbt_max_size(mp));
*ask += pool_len;
error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agbp);
if (error)
return error;
agf = XFS_BUF_TO_AGF(agbp);
tree_len = be32_to_cpu(agf->agf_rmap_blocks);
xfs_buf_relse(agbp);
*used += tree_len;
return error;
}
......@@ -58,4 +58,11 @@ struct xfs_btree_cur *xfs_rmapbt_init_cursor(struct xfs_mount *mp,
int xfs_rmapbt_maxrecs(struct xfs_mount *mp, int blocklen, int leaf);
extern void xfs_rmapbt_compute_maxlevels(struct xfs_mount *mp);
extern xfs_extlen_t xfs_rmapbt_calc_size(struct xfs_mount *mp,
unsigned long long len);
extern xfs_extlen_t xfs_rmapbt_max_size(struct xfs_mount *mp);
extern int xfs_rmapbt_calc_reserves(struct xfs_mount *mp,
xfs_agnumber_t agno, xfs_extlen_t *ask, xfs_extlen_t *used);
#endif /* __XFS_RMAP_BTREE_H__ */
......@@ -38,6 +38,8 @@
#include "xfs_ialloc_btree.h"
#include "xfs_log.h"
#include "xfs_rmap_btree.h"
#include "xfs_bmap.h"
#include "xfs_refcount_btree.h"
/*
* Physical superblock buffer manipulations. Shared with libxfs in userspace.
......@@ -737,6 +739,13 @@ xfs_sb_mount_common(
mp->m_rmap_mnr[0] = mp->m_rmap_mxr[0] / 2;
mp->m_rmap_mnr[1] = mp->m_rmap_mxr[1] / 2;
mp->m_refc_mxr[0] = xfs_refcountbt_maxrecs(mp, sbp->sb_blocksize,
true);
mp->m_refc_mxr[1] = xfs_refcountbt_maxrecs(mp, sbp->sb_blocksize,
false);
mp->m_refc_mnr[0] = mp->m_refc_mxr[0] / 2;
mp->m_refc_mnr[1] = mp->m_refc_mxr[1] / 2;
mp->m_bsize = XFS_FSB_TO_BB(mp, 1);
mp->m_ialloc_inos = (int)MAX((__uint16_t)XFS_INODES_PER_CHUNK,
sbp->sb_inopblock);
......
......@@ -39,6 +39,7 @@ extern const struct xfs_buf_ops xfs_agf_buf_ops;
extern const struct xfs_buf_ops xfs_agfl_buf_ops;
extern const struct xfs_buf_ops xfs_allocbt_buf_ops;
extern const struct xfs_buf_ops xfs_rmapbt_buf_ops;
extern const struct xfs_buf_ops xfs_refcountbt_buf_ops;
extern const struct xfs_buf_ops xfs_attr3_leaf_buf_ops;
extern const struct xfs_buf_ops xfs_attr3_rmt_buf_ops;
extern const struct xfs_buf_ops xfs_bmbt_buf_ops;
......@@ -122,6 +123,7 @@ int xfs_log_calc_minimum_size(struct xfs_mount *);
#define XFS_INO_REF 2
#define XFS_ATTR_BTREE_REF 1
#define XFS_DQUOT_REF 1
#define XFS_REFC_BTREE_REF 1
/*
* Flags for xfs_trans_ichgtime().
......
......@@ -67,13 +67,14 @@ xfs_calc_buf_res(
* Per-extent log reservation for the btree changes involved in freeing or
* allocating an extent. In classic XFS there were two trees that will be
* modified (bnobt + cntbt). With rmap enabled, there are three trees
* (rmapbt). The number of blocks reserved is based on the formula:
* (rmapbt). With reflink, there are four trees (refcountbt). The number of
* blocks reserved is based on the formula:
*
* num trees * ((2 blocks/level * max depth) - 1)
*
* Keep in mind that max depth is calculated separately for each type of tree.
*/
static uint
uint
xfs_allocfree_log_count(
struct xfs_mount *mp,
uint num_ops)
......@@ -83,6 +84,8 @@ xfs_allocfree_log_count(
blocks = num_ops * 2 * (2 * mp->m_ag_maxlevels - 1);
if (xfs_sb_version_hasrmapbt(&mp->m_sb))
blocks += num_ops * (2 * mp->m_rmap_maxlevels - 1);
if (xfs_sb_version_hasreflink(&mp->m_sb))
blocks += num_ops * (2 * mp->m_refc_maxlevels - 1);
return blocks;
}
......@@ -809,11 +812,18 @@ xfs_trans_resv_calc(
* require a permanent reservation on space.
*/
resp->tr_write.tr_logres = xfs_calc_write_reservation(mp);
resp->tr_write.tr_logcount = XFS_WRITE_LOG_COUNT;
if (xfs_sb_version_hasreflink(&mp->m_sb))
resp->tr_write.tr_logcount = XFS_WRITE_LOG_COUNT_REFLINK;
else
resp->tr_write.tr_logcount = XFS_WRITE_LOG_COUNT;
resp->tr_write.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
resp->tr_itruncate.tr_logres = xfs_calc_itruncate_reservation(mp);
resp->tr_itruncate.tr_logcount = XFS_ITRUNCATE_LOG_COUNT;
if (xfs_sb_version_hasreflink(&mp->m_sb))
resp->tr_itruncate.tr_logcount =
XFS_ITRUNCATE_LOG_COUNT_REFLINK;
else
resp->tr_itruncate.tr_logcount = XFS_ITRUNCATE_LOG_COUNT;
resp->tr_itruncate.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
resp->tr_rename.tr_logres = xfs_calc_rename_reservation(mp);
......@@ -870,7 +880,10 @@ xfs_trans_resv_calc(
resp->tr_growrtalloc.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
resp->tr_qm_dqalloc.tr_logres = xfs_calc_qm_dqalloc_reservation(mp);
resp->tr_qm_dqalloc.tr_logcount = XFS_WRITE_LOG_COUNT;
if (xfs_sb_version_hasreflink(&mp->m_sb))
resp->tr_qm_dqalloc.tr_logcount = XFS_WRITE_LOG_COUNT_REFLINK;
else
resp->tr_qm_dqalloc.tr_logcount = XFS_WRITE_LOG_COUNT;
resp->tr_qm_dqalloc.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
/*
......
......@@ -87,6 +87,7 @@ struct xfs_trans_resv {
#define XFS_DEFAULT_LOG_COUNT 1
#define XFS_DEFAULT_PERM_LOG_COUNT 2
#define XFS_ITRUNCATE_LOG_COUNT 2
#define XFS_ITRUNCATE_LOG_COUNT_REFLINK 8
#define XFS_INACTIVE_LOG_COUNT 2
#define XFS_CREATE_LOG_COUNT 2
#define XFS_CREATE_TMPFILE_LOG_COUNT 2
......@@ -96,11 +97,13 @@ struct xfs_trans_resv {
#define XFS_LINK_LOG_COUNT 2
#define XFS_RENAME_LOG_COUNT 2
#define XFS_WRITE_LOG_COUNT 2
#define XFS_WRITE_LOG_COUNT_REFLINK 8
#define XFS_ADDAFORK_LOG_COUNT 2
#define XFS_ATTRINVAL_LOG_COUNT 1
#define XFS_ATTRSET_LOG_COUNT 3
#define XFS_ATTRRM_LOG_COUNT 3
void xfs_trans_resv_calc(struct xfs_mount *mp, struct xfs_trans_resv *resp);
uint xfs_allocfree_log_count(struct xfs_mount *mp, uint num_ops);
#endif /* __XFS_TRANS_RESV_H__ */
......@@ -21,6 +21,8 @@
/*
* Components of space reservations.
*/
#define XFS_MAX_CONTIG_RMAPS_PER_BLOCK(mp) \
(((mp)->m_rmap_mxr[0]) - ((mp)->m_rmap_mnr[0]))
#define XFS_MAX_CONTIG_EXTENTS_PER_BLOCK(mp) \
(((mp)->m_alloc_mxr[0]) - ((mp)->m_alloc_mnr[0]))
#define XFS_EXTENTADD_SPACE_RES(mp,w) (XFS_BM_MAXLEVELS(mp,w) - 1)
......@@ -28,6 +30,13 @@
(((b + XFS_MAX_CONTIG_EXTENTS_PER_BLOCK(mp) - 1) / \
XFS_MAX_CONTIG_EXTENTS_PER_BLOCK(mp)) * \
XFS_EXTENTADD_SPACE_RES(mp,w))
#define XFS_SWAP_RMAP_SPACE_RES(mp,b,w)\
(((b + XFS_MAX_CONTIG_EXTENTS_PER_BLOCK(mp) - 1) / \
XFS_MAX_CONTIG_EXTENTS_PER_BLOCK(mp)) * \
XFS_EXTENTADD_SPACE_RES(mp,w) + \
((b + XFS_MAX_CONTIG_RMAPS_PER_BLOCK(mp) - 1) / \
XFS_MAX_CONTIG_RMAPS_PER_BLOCK(mp)) * \
(mp)->m_rmap_maxlevels)
#define XFS_DAENTER_1B(mp,w) \
((w) == XFS_DATA_FORK ? (mp)->m_dir_geo->fsbcount : 1)
#define XFS_DAENTER_DBS(mp,w) \
......
......@@ -90,6 +90,7 @@ typedef __int64_t xfs_sfiloff_t; /* signed block number in a file */
*/
#define XFS_DATA_FORK 0
#define XFS_ATTR_FORK 1
#define XFS_COW_FORK 2
/*
* Min numbers of data/attr fork btree root pointers.
......@@ -109,7 +110,7 @@ typedef enum {
typedef enum {
XFS_BTNUM_BNOi, XFS_BTNUM_CNTi, XFS_BTNUM_RMAPi, XFS_BTNUM_BMAPi,
XFS_BTNUM_INOi, XFS_BTNUM_FINOi, XFS_BTNUM_MAX
XFS_BTNUM_INOi, XFS_BTNUM_FINOi, XFS_BTNUM_REFCi, XFS_BTNUM_MAX
} xfs_btnum_t;
struct xfs_name {
......
This diff is collapsed.
......@@ -28,13 +28,15 @@ enum {
XFS_IO_DELALLOC, /* covers delalloc region */
XFS_IO_UNWRITTEN, /* covers allocated but uninitialized data */
XFS_IO_OVERWRITE, /* covers already allocated extent */
XFS_IO_COW, /* covers copy-on-write extent */
};
#define XFS_IO_TYPES \
{ XFS_IO_INVALID, "invalid" }, \
{ XFS_IO_DELALLOC, "delalloc" }, \
{ XFS_IO_UNWRITTEN, "unwritten" }, \
{ XFS_IO_OVERWRITE, "overwrite" }
{ XFS_IO_OVERWRITE, "overwrite" }, \
{ XFS_IO_COW, "CoW" }
/*
* Structure for buffered I/O completions.
......
This diff is collapsed.
/*
* Copyright (C) 2016 Oracle. All Rights Reserved.
*
* Author: Darrick J. Wong <darrick.wong@oracle.com>
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it would be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write the Free Software Foundation,
* Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
*/
#ifndef __XFS_BMAP_ITEM_H__
#define __XFS_BMAP_ITEM_H__
/*
* There are (currently) two pairs of bmap btree redo item types: map & unmap.
* The common abbreviations for these are BUI (bmap update intent) and BUD
* (bmap update done). The redo item type is encoded in the flags field of
* each xfs_map_extent.
*
* *I items should be recorded in the *first* of a series of rolled
* transactions, and the *D items should be recorded in the same transaction
* that records the associated bmbt updates.
*
* Should the system crash after the commit of the first transaction but
* before the commit of the final transaction in a series, log recovery will
* use the redo information recorded by the intent items to replay the
* bmbt metadata updates in the non-first transaction.
*/
/* kernel only BUI/BUD definitions */
struct xfs_mount;
struct kmem_zone;
/*
* Max number of extents in fast allocation path.
*/
#define XFS_BUI_MAX_FAST_EXTENTS 1
/*
* Define BUI flag bits. Manipulated by set/clear/test_bit operators.
*/
#define XFS_BUI_RECOVERED 1
/*
* This is the "bmap update intent" log item. It is used to log the fact that
* some reverse mappings need to change. It is used in conjunction with the
* "bmap update done" log item described below.
*
* These log items follow the same rules as struct xfs_efi_log_item; see the
* comments about that structure (in xfs_extfree_item.h) for more details.
*/
struct xfs_bui_log_item {
struct xfs_log_item bui_item;
atomic_t bui_refcount;
atomic_t bui_next_extent;
unsigned long bui_flags; /* misc flags */
struct xfs_bui_log_format bui_format;
};
static inline size_t
xfs_bui_log_item_sizeof(
unsigned int nr)
{
return offsetof(struct xfs_bui_log_item, bui_format) +
xfs_bui_log_format_sizeof(nr);
}
/*
* This is the "bmap update done" log item. It is used to log the fact that
* some bmbt updates mentioned in an earlier bui item have been performed.
*/
struct xfs_bud_log_item {
struct xfs_log_item bud_item;
struct xfs_bui_log_item *bud_buip;
struct xfs_bud_log_format bud_format;
};
extern struct kmem_zone *xfs_bui_zone;
extern struct kmem_zone *xfs_bud_zone;
struct xfs_bui_log_item *xfs_bui_init(struct xfs_mount *);
struct xfs_bud_log_item *xfs_bud_init(struct xfs_mount *,
struct xfs_bui_log_item *);
void xfs_bui_item_free(struct xfs_bui_log_item *);
void xfs_bui_release(struct xfs_bui_log_item *);
int xfs_bui_recover(struct xfs_mount *mp, struct xfs_bui_log_item *buip);
#endif /* __XFS_BMAP_ITEM_H__ */
This diff is collapsed.
......@@ -84,7 +84,8 @@ xfs_dir2_sf_getdents(
sfp = (xfs_dir2_sf_hdr_t *)dp->i_df.if_u1.if_data;
ASSERT(dp->i_d.di_size >= xfs_dir2_sf_hdr_size(sfp->i8count));
if (dp->i_d.di_size < xfs_dir2_sf_hdr_size(sfp->i8count))
return -EFSCORRUPTED;
/*
* If the block number in the offset is out of range, we're done.
......
......@@ -92,7 +92,11 @@ extern void xfs_verifier_error(struct xfs_buf *bp);
#define XFS_ERRTAG_BMAPIFORMAT 21
#define XFS_ERRTAG_FREE_EXTENT 22
#define XFS_ERRTAG_RMAP_FINISH_ONE 23
#define XFS_ERRTAG_MAX 24
#define XFS_ERRTAG_REFCOUNT_CONTINUE_UPDATE 24
#define XFS_ERRTAG_REFCOUNT_FINISH_ONE 25
#define XFS_ERRTAG_BMAP_FINISH_ONE 26
#define XFS_ERRTAG_AG_RESV_CRITICAL 27
#define XFS_ERRTAG_MAX 28
/*
* Random factors for above tags, 1 means always, 2 means 1/2 time, etc.
......@@ -121,6 +125,10 @@ extern void xfs_verifier_error(struct xfs_buf *bp);
#define XFS_RANDOM_BMAPIFORMAT XFS_RANDOM_DEFAULT
#define XFS_RANDOM_FREE_EXTENT 1
#define XFS_RANDOM_RMAP_FINISH_ONE 1
#define XFS_RANDOM_REFCOUNT_CONTINUE_UPDATE 1
#define XFS_RANDOM_REFCOUNT_FINISH_ONE 1
#define XFS_RANDOM_BMAP_FINISH_ONE 1
#define XFS_RANDOM_AG_RESV_CRITICAL 4
#ifdef DEBUG
extern int xfs_error_test_active;
......
This diff is collapsed.
This diff is collapsed.
......@@ -26,4 +26,7 @@ extern int xfs_reserve_blocks(xfs_mount_t *mp, __uint64_t *inval,
xfs_fsop_resblks_t *outval);
extern int xfs_fs_goingdown(xfs_mount_t *mp, __uint32_t inflags);
extern int xfs_fs_reserve_ag_blocks(struct xfs_mount *mp);
extern int xfs_fs_unreserve_ag_blocks(struct xfs_mount *mp);
#endif /* __XFS_FSOPS_H__ */
......@@ -21,8 +21,8 @@
/*
* Tunable XFS parameters. xfs_params is required even when CONFIG_SYSCTL=n,
* other XFS code uses these values. Times are measured in centisecs (i.e.
* 100ths of a second) with the exception of eofb_timer, which is measured in
* seconds.
* 100ths of a second) with the exception of eofb_timer and cowb_timer, which
* are measured in seconds.
*/
xfs_param_t xfs_params = {
/* MIN DFLT MAX */
......@@ -42,6 +42,7 @@ xfs_param_t xfs_params = {
.inherit_nodfrg = { 0, 1, 1 },
.fstrm_timer = { 1, 30*100, 3600*100},
.eofb_timer = { 1, 300, 3600*24},
.cowb_timer = { 1, 1800, 3600*24},
};
struct xfs_globals xfs_globals = {
......
This diff is collapsed.
......@@ -40,6 +40,7 @@ struct xfs_eofblocks {
in xfs_inode_ag_iterator */
#define XFS_ICI_RECLAIM_TAG 0 /* inode is to be reclaimed */
#define XFS_ICI_EOFBLOCKS_TAG 1 /* inode has blocks beyond EOF */
#define XFS_ICI_COWBLOCKS_TAG 2 /* inode can have cow blocks to gc */
/*
* Flags for xfs_iget()
......@@ -70,6 +71,12 @@ int xfs_inode_free_quota_eofblocks(struct xfs_inode *ip);
void xfs_eofblocks_worker(struct work_struct *);
void xfs_queue_eofblocks(struct xfs_mount *);
void xfs_inode_set_cowblocks_tag(struct xfs_inode *ip);
void xfs_inode_clear_cowblocks_tag(struct xfs_inode *ip);
int xfs_icache_free_cowblocks(struct xfs_mount *, struct xfs_eofblocks *);
int xfs_inode_free_quota_cowblocks(struct xfs_inode *ip);
void xfs_cowblocks_worker(struct work_struct *);
int xfs_inode_ag_iterator(struct xfs_mount *mp,
int (*execute)(struct xfs_inode *ip, int flags, void *args),
int flags, void *args);
......
This diff is collapsed.
This diff is collapsed.
......@@ -368,7 +368,7 @@ xfs_inode_to_log_dinode(
to->di_crtime.t_sec = from->di_crtime.t_sec;
to->di_crtime.t_nsec = from->di_crtime.t_nsec;
to->di_flags2 = from->di_flags2;
to->di_cowextsize = from->di_cowextsize;
to->di_ino = ip->i_ino;
to->di_lsn = lsn;
memset(to->di_pad2, 0, sizeof(to->di_pad2));
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
......@@ -1159,6 +1159,7 @@ xfs_diflags_to_iflags(
inode->i_flags |= S_NOATIME;
if (S_ISREG(inode->i_mode) &&
ip->i_mount->m_sb.sb_blocksize == PAGE_SIZE &&
!xfs_is_reflink_inode(ip) &&
(ip->i_mount->m_flags & XFS_MOUNT_DAX ||
ip->i_d.di_flags2 & XFS_DIFLAG2_DAX))
inode->i_flags |= S_DAX;
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment