An error occurred fetching the project authors.
  1. 08 Jan, 2018 2 commits
  2. 09 Dec, 2017 1 commit
    • Christoph Hellwig's avatar
      xfs: remove "no-allocation" reservations for file creations · f59cf5c2
      Christoph Hellwig authored
      If we create a new file we will need an inode, and usually some metadata
      in the parent direction.  Aiming for everything to go well despite the
      lack of a reservation leads to dirty transactions cancelled under a heavy
      create/delete load.  This patch removes those nospace transactions, which
      will lead to slightly earlier ENOSPC on some workloads, but instead
      prevent file system shutdowns due to cancelling dirty transactions for
      others.
      
      A customer could observe assertations failures and shutdowns due to
      cancelation of dirty transactions during heavy NFS workloads as shown
      below:
      
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728125] XFS: Assertion failed: error != -ENOSPC, file: fs/xfs/xfs_inode.c, line: 1262
      
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728222] Call Trace:
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728246]  [<ffffffff81795daf>] dump_stack+0x63/0x81
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728262]  [<ffffffff810a1a5a>] warn_slowpath_common+0x8a/0xc0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728264]  [<ffffffff810a1b8a>] warn_slowpath_null+0x1a/0x20
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728285]  [<ffffffffa01bf403>] asswarn+0x33/0x40 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728308]  [<ffffffffa01bb07e>] xfs_create+0x7be/0x7d0 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728329]  [<ffffffffa01b6ffb>] xfs_generic_create+0x1fb/0x2e0 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728348]  [<ffffffffa01b7114>] xfs_vn_mknod+0x14/0x20 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728366]  [<ffffffffa01b7153>] xfs_vn_create+0x13/0x20 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728380]  [<ffffffff81231de5>] vfs_create+0xd5/0x140
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728390]  [<ffffffffa045ddb9>] do_nfsd_create+0x499/0x610 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728396]  [<ffffffffa0465fa5>] nfsd3_proc_create+0x135/0x210 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728401]  [<ffffffffa04561e3>] nfsd_dispatch+0xc3/0x210 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728416]  [<ffffffffa03bfa43>] svc_process_common+0x453/0x6f0 [sunrpc]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728423]  [<ffffffffa03bfdf3>] svc_process+0x113/0x1f0 [sunrpc]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728427]  [<ffffffffa0455bcf>] nfsd+0x10f/0x180 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728432]  [<ffffffffa0455ac0>] ? nfsd_destroy+0x80/0x80 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728438]  [<ffffffff810c0d58>] kthread+0xd8/0xf0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728441]  [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728451]  [<ffffffff8179d962>] ret_from_fork+0x42/0x70
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728453]  [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728454] ---[ end trace f9822c842fec81d4 ]---
      
      2017-05-30 21:17:06 kernel: ALERT: [ 2670.728477] XFS (sdb): Internal error xfs_trans_cancel at line 983 of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x4ee/0x7d0 [xfs]
      
      2017-05-30 21:17:06 kernel: ALERT: [ 2670.728684] XFS (sdb): Corruption of in-memory data detected. Shutting down filesystem
      2017-05-30 21:17:06 kernel: ALERT: [ 2670.728685] XFS (sdb): Please umount the filesystem and rectify the problem(s)
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      f59cf5c2
  3. 01 Nov, 2017 1 commit
  4. 26 Oct, 2017 1 commit
  5. 11 Oct, 2017 1 commit
  6. 01 Sep, 2017 1 commit
  7. 22 Aug, 2017 1 commit
    • Carlos Maiolino's avatar
      xfs: stop searching for free slots in an inode chunk when there are none · 2d32311c
      Carlos Maiolino authored
      In a filesystem without finobt, the Space manager selects an AG to alloc a new
      inode, where xfs_dialloc_ag_inobt() will search the AG for the free slot chunk.
      
      When the new inode is in the same AG as its parent, the btree will be searched
      starting on the parent's record, and then retried from the top if no slot is
      available beyond the parent's record.
      
      To exit this loop though, xfs_dialloc_ag_inobt() relies on the fact that the
      btree must have a free slot available, once its callers relied on the
      agi->freecount when deciding how/where to allocate this new inode.
      
      In the case when the agi->freecount is corrupted, showing available inodes in an
      AG, when in fact there is none, this becomes an infinite loop.
      
      Add a way to stop the loop when a free slot is not found in the btree, making
      the function to fall into the whole AG scan which will then, be able to detect
      the corruption and shut the filesystem down.
      
      As pointed by Brian, this might impact performance, giving the fact we
      don't reset the search distance anymore when we reach the end of the
      tree, giving it fewer tries before falling back to the whole AG search, but
      it will only affect searches that start within 10 records to the end of the tree.
      Signed-off-by: default avatarCarlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      2d32311c
  8. 11 Aug, 2017 1 commit
    • Omar Sandoval's avatar
      xfs: fix inobt inode allocation search optimization · c44245b3
      Omar Sandoval authored
      When we try to allocate a free inode by searching the inobt, we try to
      find the inode nearest the parent inode by searching chunks both left
      and right of the chunk containing the parent. As an optimization, we
      cache the leftmost and rightmost records that we previously searched; if
      we do another allocation with the same parent inode, we'll pick up the
      search where it last left off.
      
      There's a bug in the case where we found a free inode to the left of the
      parent's chunk: we need to update the cached left and right records, but
      because we already reassigned the right record to point to the left, we
      end up assigning the left record to both the cached left and right
      records.
      
      This isn't a correctness problem strictly, but it can result in the next
      allocation rechecking chunks unnecessarily or allocating inodes further
      away from the parent than it needs to. Fix it by swapping the record
      pointer after we update the cached left and right records.
      
      Fixes: bd169565 ("xfs: speed up free inode search")
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      c44245b3
  9. 28 Jun, 2017 1 commit
  10. 19 Jun, 2017 2 commits
  11. 17 Feb, 2017 1 commit
    • Chandan Rajendra's avatar
      xfs: Use xfs_icluster_size_fsb() to calculate inode chunk alignment · 8ee9fdbe
      Chandan Rajendra authored
      On a ppc64 system, executing generic/256 test with 32k block size gives the following call trace,
      
      XFS: Assertion failed: args->maxlen > 0, file: /root/repos/linux/fs/xfs/libxfs/xfs_alloc.c, line: 2026
      
      kernel BUG at /root/repos/linux/fs/xfs/xfs_message.c:113!
      Oops: Exception in kernel mode, sig: 5 [#1]
      SMP NR_CPUS=2048
      DEBUG_PAGEALLOC
      NUMA
      pSeries
      Modules linked in:
      CPU: 2 PID: 19361 Comm: mkdir Not tainted 4.10.0-rc5 #58
      task: c000000102606d80 task.stack: c0000001026b8000
      NIP: c0000000004ef798 LR: c0000000004ef798 CTR: c00000000082b290
      REGS: c0000001026bb090 TRAP: 0700   Not tainted  (4.10.0-rc5)
      MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI>
      CR: 28004428  XER: 00000000
      CFAR: c0000000004ef180 SOFTE: 1
      GPR00: c0000000004ef798 c0000001026bb310 c000000001157300 ffffffffffffffea
      GPR04: 000000000000000a c0000001026bb130 0000000000000000 ffffffffffffffc0
      GPR08: 00000000000000d1 0000000000000021 00000000ffffffd1 c000000000dd4990
      GPR12: 0000000022004444 c00000000fe00800 0000000020000000 0000000000000000
      GPR16: 0000000000000000 0000000043a606fc 0000000043a76c08 0000000043a1b3d0
      GPR20: 000001002a35cd60 c0000001026bbb80 0000000000000000 0000000000000001
      GPR24: 0000000000000240 0000000000000004 c00000062dc55000 0000000000000000
      GPR28: 0000000000000004 c00000062ecd9200 0000000000000000 c0000001026bb6c0
      NIP [c0000000004ef798] .assfail+0x28/0x30
      LR [c0000000004ef798] .assfail+0x28/0x30
      Call Trace:
      [c0000001026bb310] [c0000000004ef798] .assfail+0x28/0x30 (unreliable)
      [c0000001026bb380] [c000000000455d74] .xfs_alloc_space_available+0x194/0x1b0
      [c0000001026bb410] [c00000000045b914] .xfs_alloc_fix_freelist+0x144/0x480
      [c0000001026bb580] [c00000000045c368] .xfs_alloc_vextent+0x698/0xa90
      [c0000001026bb650] [c0000000004a6200] .xfs_ialloc_ag_alloc+0x170/0x820
      [c0000001026bb7c0] [c0000000004a9098] .xfs_dialloc+0x158/0x320
      [c0000001026bb8a0] [c0000000004e628c] .xfs_ialloc+0x7c/0x610
      [c0000001026bb990] [c0000000004e8138] .xfs_dir_ialloc+0xa8/0x2f0
      [c0000001026bbaa0] [c0000000004e8814] .xfs_create+0x494/0x790
      [c0000001026bbbf0] [c0000000004e5ebc] .xfs_generic_create+0x2bc/0x410
      [c0000001026bbce0] [c0000000002b4a34] .vfs_mkdir+0x154/0x230
      [c0000001026bbd70] [c0000000002bc444] .SyS_mkdirat+0x94/0x120
      [c0000001026bbe30] [c00000000000b760] system_call+0x38/0xfc
      Instruction dump:
      4e800020 60000000 7c0802a6 7c862378 3c82ffca 7ca72b78 38841c18 7c651b78
      38600000 f8010010 f821ff91 4bfff94d <0fe00000> 60000000 7c0802a6 7c892378
      
      When block size is larger than inode cluster size, the call to
      XFS_B_TO_FSBT(mp, mp->m_inode_cluster_size) returns 0. Also, mkfs.xfs
      would have set xfs_sb->sb_inoalignmt to 0. This causes
      xfs_ialloc_cluster_alignment() to return 0.  Due to this
      args.minalignslop (in xfs_ialloc_ag_alloc()) gets the unsigned
      equivalent of -1 assigned to it. This later causes alloc_len in
      xfs_alloc_space_available() to have a value of 0. In such a scenario
      when args.total is also 0, the assert statement "ASSERT(args->maxlen >
      0);" fails.
      
      This commit fixes the bug by replacing the call to XFS_B_TO_FSBT() in
      xfs_ialloc_cluster_alignment() with a call to xfs_icluster_size_fsb().
      Suggested-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarChandan Rajendra <chandan@linux.vnet.ibm.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      8ee9fdbe
  12. 05 Dec, 2016 2 commits
  13. 08 Nov, 2016 1 commit
  14. 03 Aug, 2016 4 commits
    • Darrick J. Wong's avatar
      xfs: add owner field to extent allocation and freeing · 340785cc
      Darrick J. Wong authored
      For the rmap btree to work, we have to feed the extent owner
      information to the the allocation and freeing functions. This
      information is what will end up in the rmap btree that tracks
      allocated extents. While we technically don't need the owner
      information when freeing extents, passing it allows us to validate
      that the extent we are removing from the rmap btree actually
      belonged to the owner we expected it to belong to.
      
      We also define a special set of owner values for internal metadata
      that would otherwise have no owner. This allows us to tell the
      difference between metadata owned by different per-ag btrees, as
      well as static fs metadata (e.g. AG headers) and internal journal
      blocks.
      
      There are also a couple of special cases we need to take care of -
      during EFI recovery, we don't actually know who the original owner
      was, so we need to pass a wildcard to indicate that we aren't
      checking the owner for validity. We also need special handling in
      growfs, as we "free" the space in the last AG when extending it, but
      because it's new space it has no actual owner...
      
      While touching the xfs_bmap_add_free() function, re-order the
      parameters to put the struct xfs_mount first.
      
      Extend the owner field to include both the owner type and some sort
      of index within the owner.  The index field will be used to support
      reverse mappings when reflink is enabled.
      
      When we're freeing extents from an EFI, we don't have the owner
      information available (rmap updates have their own redo items).
      xfs_free_extent therefore doesn't need to do an rmap update. Make
      sure that the log replay code signals this correctly.
      
      This is based upon a patch originally from Dave Chinner. It has been
      extended to add more owner information with the intent of helping
      recovery operations when things go wrong (e.g. offset of user data
      block in a file).
      
      [dchinner: de-shout the xfs_rmap_*_owner helpers]
      [darrick: minor style fixes suggested by Christoph Hellwig]
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      340785cc
    • Darrick J. Wong's avatar
      xfs: rename flist/free_list to dfops · 2c3234d1
      Darrick J. Wong authored
      Mechanical change of flist/free_list to dfops, since they're now
      deferred ops, not just a freeing list.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      2c3234d1
    • Darrick J. Wong's avatar
      xfs: change xfs_bmap_{finish,cancel,init,free} -> xfs_defer_* · 310a75a3
      Darrick J. Wong authored
      Drop the compatibility shims that we were using to integrate the new
      deferred operation mechanism into the existing code.  No new code.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      310a75a3
    • Darrick J. Wong's avatar
      xfs: rework xfs_bmap_free callers to use xfs_defer_ops · 3ab78df2
      Darrick J. Wong authored
      Restructure everything that used xfs_bmap_free to use xfs_defer_ops
      instead.  For now we'll just remove the old symbols and play some
      cpp magic to make it work; in the next patch we'll actually rename
      everything.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      3ab78df2
  15. 21 Jun, 2016 2 commits
  16. 06 Mar, 2016 1 commit
  17. 04 Jan, 2016 1 commit
  18. 12 Oct, 2015 1 commit
    • Brian Foster's avatar
      xfs: validate metadata LSNs against log on v5 superblocks · a45086e2
      Brian Foster authored
      Since the onset of v5 superblocks, the LSN of the last modification has
      been included in a variety of on-disk data structures. This LSN is used
      to provide log recovery ordering guarantees (e.g., to ensure an older
      log recovery item is not replayed over a newer target data structure).
      
      While this works correctly from the point a filesystem is formatted and
      mounted, userspace tools have some problematic behaviors that defeat
      this mechanism. For example, xfs_repair historically zeroes out the log
      unconditionally (regardless of whether corruption is detected). If this
      occurs, the LSN of the filesystem is reset and the log is now in a
      problematic state with respect to on-disk metadata structures that might
      have a larger LSN. Until either the log catches up to the highest
      previously used metadata LSN or each affected data structure is modified
      and written out without incident (which resets the metadata LSN), log
      recovery is susceptible to filesystem corruption.
      
      This problem is ultimately addressed and repaired in the associated
      userspace tools. The kernel is still responsible to detect the problem
      and notify the user that something is wrong. Check the superblock LSN at
      mount time and fail the mount if it is invalid. From that point on,
      trigger verifier failure on any metadata I/O where an invalid LSN is
      detected. This results in a filesystem shutdown and guarantees that we
      do not log metadata changes with invalid LSNs on disk. Since this is a
      known issue with a known recovery path, present a warning to instruct
      the user how to recover.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      a45086e2
  19. 19 Aug, 2015 1 commit
    • Brian Foster's avatar
      xfs: fix btree cursor error cleanups · f307080a
      Brian Foster authored
      The btree cursor cleanup function takes an error parameter that
      affects how buffers are released from the cursor. All buffers are
      released in the event of error. Several callers do not specify the
      XFS_BTREE_ERROR flag in the event of error, however. This can cause
      buffers to hang around locked or with an elevated hold count and
      thus lead to umount hangs in the event of errors.
      
      Fix up the xfs_btree_del_cursor() callers to pass XFS_BTREE_ERROR if
      the cursor is being torn down due to error.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      f307080a
  20. 29 Jul, 2015 1 commit
    • Eric Sandeen's avatar
      xfs: create new metadata UUID field and incompat flag · ce748eaa
      Eric Sandeen authored
      This adds a new superblock field, sb_meta_uuid.  If set, along with
      a new incompat flag, the code will use that field on a V5 filesystem
      to compare to metadata UUIDs, which allows us to change the user-
      visible UUID at will.  Userspace handles the setting and clearing
      of the incompat flag as appropriate, as the UUID gets changed; i.e.
      setting the user-visible UUID back to the original UUID (as stored in
      the new field) will remove the incompatible feature flag.
      
      If the incompat flag is not set, this copies the user-visible UUID into
      into the meta_uuid slot in memory when the superblock is read from disk;
      the meta_uuid field is not written back to disk in this case.
      
      The remainder of this patch simply switches verifiers, initializers,
      etc to use the new sb_meta_uuid field.
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      
      ce748eaa
  21. 04 Jun, 2015 1 commit
  22. 28 May, 2015 11 commits
    • Brian Foster's avatar
      xfs: skip unallocated regions of inode chunks in xfs_ifree_cluster() · 09b56604
      Brian Foster authored
      xfs_ifree_cluster() is called to mark all in-memory inodes and inode
      buffers as stale. This occurs after we've removed the inobt records and
      dropped any references of inobt data. xfs_ifree_cluster() uses the
      starting inode number to walk the namespace of inodes expected for a
      single chunk a cluster buffer at a time. The cluster buffer disk
      addresses are calculated by decoding the sequential inode numbers
      expected from the chunk.
      
      The problem with this approach is that if the inode chunk being removed
      is a sparse chunk, not all of the buffer addresses that are calculated
      as part of this sequence may be inode clusters. Attempting to acquire
      the buffer based on expected inode characterstics (i.e., cluster length)
      can lead to errors and is generally incorrect.
      
      We already use a couple variables to carry requisite state from
      xfs_difree() to xfs_ifree_cluster(). Rather than add a third, define a
      new internal structure to carry the existing parameters through these
      functions. Add an alloc field that represents the physical allocation
      bitmap of inodes in the chunk being removed. Modify xfs_ifree_cluster()
      to check each inode against the bitmap and skip the clusters that were
      never allocated as real inodes on disk.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      09b56604
    • Brian Foster's avatar
      xfs: only free allocated regions of inode chunks · 10ae3dc7
      Brian Foster authored
      An inode chunk is currently added to the transaction free list based on
      a simple fsb conversion and hardcoded chunk length. The nature of sparse
      chunks is such that the physical chunk of inodes on disk may consist of
      one or more discontiguous parts. Blocks that reside in the holes of the
      inode chunk are not inodes and could be allocated to any other use or
      not allocated at all.
      
      Refactor the existing xfs_bmap_add_free() call into the
      xfs_difree_inode_chunk() helper. The new helper uses the existing
      calculation if a chunk is not sparse. Otherwise, use the inobt record
      holemask to free the contiguous regions of the chunk.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      10ae3dc7
    • Brian Foster's avatar
      xfs: filter out sparse regions from individual inode allocation · 26dd5217
      Brian Foster authored
      Inode allocation from an existing record with free inodes traditionally
      selects the first inode available according to the ir_free mask. With
      sparse inode chunks, the ir_free mask could refer to an unallocated
      region. We must mask the unallocated regions out of ir_free before using
      it to select a free inode in the chunk.
      
      Update the xfs_inobt_first_free_inode() helper to find the first free
      inode available of the allocated regions of the inode chunk.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      26dd5217
    • Brian Foster's avatar
      xfs: randomly do sparse inode allocations in DEBUG mode · 1cdadee1
      Brian Foster authored
      Sparse inode allocations generally only occur when full inode chunk
      allocation fails. This requires some level of filesystem space usage and
      fragmentation.
      
      For filesystems formatted with sparse inode chunks enabled, do random
      sparse inode chunk allocs when compiled in DEBUG mode to increase test
      coverage.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      1cdadee1
    • Brian Foster's avatar
      xfs: allocate sparse inode chunks on full chunk allocation failure · 56d1115c
      Brian Foster authored
      xfs_ialloc_ag_alloc() makes several attempts to allocate a full inode
      chunk. If all else fails, reduce the allocation to the sparse length and
      alignment and attempt to allocate a sparse inode chunk.
      
      If sparse chunk allocation succeeds, check whether an inobt record
      already exists that can track the chunk. If so, inherit and update the
      existing record. Otherwise, insert a new record for the sparse chunk.
      
      Create helpers to align sparse chunk inode records and insert or update
      existing records in the inode btrees. The xfs_inobt_insert_sprec()
      helper implements the merge or update semantics required for sparse
      inode records with respect to both the inobt and finobt. To update the
      inobt, either insert a new record or merge with an existing record. To
      update the finobt, use the updated inobt record to either insert or
      replace an existing record.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      56d1115c
    • Brian Foster's avatar
      xfs: pass inode count through ordered icreate log item · 463958af
      Brian Foster authored
      v5 superblocks use an ordered log item for logging the initialization of
      inode chunks. The icreate log item is currently hardcoded to an inode
      count of 64 inodes.
      
      The agbno and extent length are used to initialize the inode chunk from
      log recovery. While an incorrect inode count does not lead to bad inode
      chunk initialization, we should pass the correct inode count such that log
      recovery has enough data to perform meaningful validity checks on the
      chunk.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      463958af
    • Brian Foster's avatar
      xfs: introduce inode record hole mask for sparse inode chunks · 5419040f
      Brian Foster authored
      The inode btrees track 64 inodes per record regardless of inode size.
      Thus, inode chunks on disk vary in size depending on the size of the
      inodes. This creates a contiguous allocation requirement for new inode
      chunks that can be difficult to satisfy on an aged and fragmented (free
      space) filesystems.
      
      The inode record freecount currently uses 4 bytes on disk to track the
      free inode count. With a maximum freecount value of 64, only one byte is
      required. Convert the freecount field to a single byte and use two of
      the remaining 3 higher order bytes left for the hole mask field. Use the
      final leftover byte for the total count field.
      
      The hole mask field tracks holes in the chunks of physical space that
      the inode record refers to. This facilitates the sparse allocation of
      inode chunks when contiguous chunks are not available and allows the
      inode btrees to identify what portions of the chunk contain valid
      inodes. The total count field contains the total number of valid inodes
      referred to by the record. This can also be deduced from the hole mask.
      The count field provides clarity and redundancy for internal record
      verification.
      
      Note that neither of the new fields can be written to disk on fs'
      without sparse inode support. Doing so writes to the high-order bytes of
      freecount and causes corruption from the perspective of older kernels.
      The on-disk inobt record data structure is updated with a union to
      distinguish between the original, "full" format and the new, "sparse"
      format. The conversion routines to get, insert and update records are
      updated to translate to and from the on-disk record accordingly such
      that freecount remains a 4-byte value on non-supported fs, yet the new
      fields of the in-core record are always valid with respect to the
      record. This means that higher level code can refer to the current
      in-core record format unconditionally and lower level code ensures that
      records are translated to/from disk according to the capabilities of the
      fs.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      5419040f
    • Brian Foster's avatar
      xfs: use sparse chunk alignment for min. inode allocation requirement · 066a1884
      Brian Foster authored
      xfs_ialloc_ag_select() iterates through the allocation groups looking
      for free inodes or free space to determine whether to allow an inode
      allocation to proceed. If no free inodes are available, it assumes that
      an AG must have an extent longer than mp->m_ialloc_blks.
      
      Sparse inode chunk support currently allows for allocations smaller than
      the traditional inode chunk size specified in m_ialloc_blks. The current
      minimum sparse allocation is set in the superblock sb_spino_align field
      at mkfs time. Create a new m_ialloc_min_blks field in xfs_mount and use
      this to represent the minimum supported allocation size for inode
      chunks. Initialize m_ialloc_min_blks at mount time based on whether
      sparse inodes are supported.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      066a1884
    • Brian Foster's avatar
      xfs: update free inode record logic to support sparse inode records · 999633d3
      Brian Foster authored
      xfs_difree_inobt() uses logic in a couple places that assume inobt
      records refer to fully allocated chunks. Specifically, the use of
      mp->m_ialloc_inos can cause problems for inode chunks that are sparsely
      allocated. Sparse inode chunks can, by definition, define a smaller
      number of inodes than a full inode chunk.
      
      Fix the logic that determines whether an inode record should be removed
      from the inobt to use the ir_free mask rather than ir_freecount. Fix the
      agi counters modification to use ir_freecount to add the actual number
      of inodes freed rather than assuming a full inode chunk.
      
      Also make sure that we preserve the behavior to not remove inode chunks
      if the block size is large enough for multiple inode chunks (e.g.,
      bsize=64k, isize=512). This behavior was previously implicit in that in
      such configurations, ir.freecount of a single record never matches
      m_ialloc_inos. Hence, add some comments as well.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      999633d3
    • Brian Foster's avatar
      xfs: create individual inode alloc. helper · d4cc540b
      Brian Foster authored
      Inode allocation from sparse inode records must filter the ir_free mask
      against ir_holemask.  In preparation for this requirement, create a
      helper to allocate an individual inode from an inode record.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      d4cc540b
    • George Wang's avatar
      xfs: use percpu_counter_read_positive for mp->m_icount · 74f9ce1c
      George Wang authored
      Function percpu_counter_read just return the current counter, which can be
      negative. This will cause the checking of "allocated inode
      counts <= m_maxicount" false positive. Use percpu_counter_read_positive can
      solve this problem, and be consistent with the purpose to introduce percpu
      mechanism to xfs.
      Signed-off-by: default avatarGeorge Wang <xuw2015@gmail.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      
      74f9ce1c
  23. 23 Feb, 2015 1 commit