1. 25 Mar, 2015 14 commits
  2. 23 Feb, 2015 26 commits
    • Merge branch 'xfs-mmap-lock' into for-next · 88e8fda9
      Dave Chinner authored
      88e8fda9
    • Merge branch 'xfs-generic-sb-counters' into for-next · 4225441a
      Dave Chinner authored
      Conflicts:
      	fs/xfs/xfs_super.c
      4225441a
    • Dave Chinner's avatar
      3cabb836
    • xfs: remove deprecated mount options · 444a7022
      Eric Sandeen authored
      We recently removed deprecated sysctls; may as well
      remove deprecated mount options as well, we've stated
      that they'd be gone by now in the docs.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      444a7022
    • xfs: xfs_alloc_fix_minleft can underflow near ENOSPC · 3790a8cd
      Dave Chinner authored
      Test generic/224 is failing with a corruption being detected on one
      of Michael's test boxes.  Debug that Michael added is indicating
      that the minleft trimming is resulting in an underflow:
      
      .....
       before fixup:              rlen          1  args->len          0
       after xfs_alloc_fix_len  : rlen          1  args->len          1
       before goto out_nominleft: rlen          1  args->len          0
       before fixup:              rlen          1  args->len          0
       after xfs_alloc_fix_len  : rlen          1  args->len          1
       after fixup:               rlen          1  args->len          1
       before fixup:              rlen          1  args->len          0
       after xfs_alloc_fix_len  : rlen          1  args->len          1
       after fixup:               rlen 4294967295  args->len 4294967295
       XFS: Assertion failed: fs_is_ok, file: fs/xfs/libxfs/xfs_alloc.c, line: 1424
      
      The "goto out_nominleft:" indicates that we are getting close to
      ENOSPC in the AG, and a couple of allocations later we underflow
      and the corruption check fires in xfs_alloc_ag_vextent_size().
      
      The issue is that the extent length fixup comparisons are done
      with variables of xfs_extlen_t types. These are unsigned so an
      underflow looks like a really big value and hence is not detected
      as being smaller than the minimum length allowed for the extent.
      Hence the corruption check fires as it is noticing that the returned
      length is longer than the original extent length passed in.
      
      This can be easily fixed by ensuring we do the underflow test on
      signed values, the same way xfs_alloc_fix_len() prevents underflow.
      So we realise in future that these casts prevent underflows from
      going undetected, add comments to the code indicating this.
      Reported-by: Michael L. Semon <mlsemon35@gmail.com>
      Tested-by: Michael L. Semon <mlsemon35@gmail.com>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      3790a8cd
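The underflow described in this commit can be reproduced in a few lines. The following is a minimal sketch, not the kernel code: `sketch_extlen_t` and both helper names are hypothetical, and the type is assumed to be unsigned 32-bit, which matches the 4294967295 seen in the debug output (an unsigned 32-bit value that has wrapped to -1).

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for xfs_extlen_t; assumed here to be an unsigned 32-bit type. */
typedef uint32_t sketch_extlen_t;

/* Unsigned comparison: len - diff wraps around, so the check passes wrongly. */
static bool trim_ok_unsigned(sketch_extlen_t len, sketch_extlen_t diff,
			     sketch_extlen_t minlen)
{
	return (sketch_extlen_t)(len - diff) >= minlen;
}

/* Signed comparison: a negative result is correctly seen as below minlen. */
static bool trim_ok_signed(sketch_extlen_t len, sketch_extlen_t diff,
			   sketch_extlen_t minlen)
{
	return (int64_t)len - (int64_t)diff >= (int64_t)minlen;
}
```

With `len = 1`, `diff = 2` and `minlen = 1`, the unsigned helper accepts the trim while the signed one rejects it. The sketch widens to `int64_t` purely to keep the arithmetic well defined in portable C; the commit itself only says the test is done on signed values.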
    • xfs: cancel failed transaction in xfs_fs_commit_blocks() · 83d5f018
      Eric Sandeen authored
      If xfs_trans_reserve fails we don't cancel the transaction,
      and we'll leak the allocated transaction pointer.
      
      Spotted by Coverity.
      Signed-off-by: Eric Sandeen <ssandeen@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      83d5f018
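The leak pattern fixed here is a common one in error paths. As a rough illustration only, the sketch below uses hypothetical userspace stand-ins (`sketch_trans_alloc`, `sketch_trans_reserve`, `sketch_trans_cancel`, `sketch_trans_commit`) rather than the real XFS transaction API, and counts live transactions so the leak would be visible.

```c
#include <errno.h>
#include <stdlib.h>

struct sketch_trans { int reserved; };

static int sketch_live_trans;	/* number of allocated, unreleased transactions */

static struct sketch_trans *sketch_trans_alloc(void)
{
	struct sketch_trans *tp = calloc(1, sizeof(*tp));

	if (tp)
		sketch_live_trans++;
	return tp;
}

static void sketch_trans_cancel(struct sketch_trans *tp)
{
	free(tp);
	sketch_live_trans--;
}

static int sketch_trans_commit(struct sketch_trans *tp)
{
	free(tp);
	sketch_live_trans--;
	return 0;
}

/* Reservation can fail, for example when space is exhausted. */
static int sketch_trans_reserve(struct sketch_trans *tp, int force_fail)
{
	if (force_fail)
		return -ENOSPC;
	tp->reserved = 1;
	return 0;
}

static int sketch_commit_blocks(int force_fail)
{
	struct sketch_trans *tp = sketch_trans_alloc();
	int error;

	if (!tp)
		return -ENOMEM;

	error = sketch_trans_reserve(tp, force_fail);
	if (error) {
		/* Without this cancel, tp would be leaked on the error path. */
		sketch_trans_cancel(tp);
		return error;
	}
	return sketch_trans_commit(tp);
}
```

After either a failed or a successful call to `sketch_commit_blocks()`, `sketch_live_trans` returns to zero, which is the property the fix restores.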
    • xfs: remove old and redundant comment in xfs_mount_validate_sb · dd5e7127
      Wang Sheng-Hui authored
      The error messages document the reason for the checks better than the comment
      and the comments about volume mounts date back to Irix and so aren't relevant
      any more. So just remove the old and redundant comment.
      Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      dd5e7127
    • xfs: clarify async write failure ratelimit message · fdadf267
      Eric Sandeen authored
      Today, when the "failing async writes" get ratelimited, we see:
      
      XFS:: 62836 callbacks suppressed
      
      Aside from the extra ":" it's not entirely clear which message is being
      suppressed, especially if other messages or ratelimits are happening
      at the same time.  Clarify this as, e.g.:
      
      XFS (dm-11): Failing async write on buffer block 0x140090. Retrying async write.
      XFS: Failing async write: 62836 callbacks suppressed
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      fdadf267
    • xfs: log unmount events on console · 3b9ce795
      Eric Sandeen authored
      There are times, when doing triage and forensics,
      that we would like to know whether a filesystem was unmounted,
      or if the plug was pulled without a clean unmount.  Log
      unmounts at the same level (NOTICE) as we log mounts.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      3b9ce795
    • xfs: Ensure we have target_ip for RENAME_EXCHANGE · fc921566
      Eric Sandeen authored
      We shouldn't get here with RENAME_EXCHANGE set and no
      target_ip, but let's be defensive, because xfs_cross_rename()
      will dereference it.
      
      Spotted by Coverity.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      fc921566
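A defensive check of this kind might look roughly like the sketch below. Everything in it is a hypothetical stand-in: the flag value, the `sketch_inode` type, the swap that plays the role of the cross rename, and the choice of `-EINVAL` as the error are all assumptions, since the commit message does not state them.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_RENAME_EXCHANGE (1u << 1)	/* hypothetical stand-in flag */

struct sketch_inode { int ino; };

static int sketch_rename(struct sketch_inode *src_ip,
			 struct sketch_inode *target_ip, unsigned int flags)
{
	/* Reject an exchange with no target before anything dereferences it. */
	if ((flags & SKETCH_RENAME_EXCHANGE) && target_ip == NULL)
		return -EINVAL;

	if (flags & SKETCH_RENAME_EXCHANGE) {
		/* Stand-in for the cross rename: swap the two inode numbers. */
		int tmp = src_ip->ino;

		src_ip->ino = target_ip->ino;
		target_ip->ino = tmp;
	}
	return 0;
}
```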
    • xfs: pass mp to XFS_WANT_CORRUPTED_RETURN · 5fb5aeee
      Eric Sandeen authored
      Today, if we hit an XFS_WANT_CORRUPTED_RETURN we don't print any
      information about which filesystem hit it.  Passing in the mp allows
      us to print the filesystem (device) name, which is a pretty critical
      piece of information.
      
      Tested by running fsfuzzer 'til I hit some.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      5fb5aeee
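To illustrate why passing the mount pointer helps, here is a simplified userspace sketch of a check macro that takes it as a parameter. It is not the kernel macro: `SKETCH_WANT_CORRUPTED_RETURN`, `sketch_mount`, `sketch_report` and the error value 117 are all hypothetical names and values chosen for the example.

```c
#include <stdio.h>

#define SKETCH_EFSCORRUPTED 117	/* hypothetical error value for this sketch */

struct sketch_mount { const char *fsname; };

static char sketch_last_msg[160];	/* last report, kept for inspection */

static void sketch_report(const struct sketch_mount *mp, const char *expr,
			  const char *file, int line)
{
	/* Having mp lets the message name the filesystem (device). */
	snprintf(sketch_last_msg, sizeof(sketch_last_msg),
		 "XFS (%s): corruption check failed: %s at %s:%d",
		 mp->fsname, expr, file, line);
}

#define SKETCH_WANT_CORRUPTED_RETURN(mp, expr)				\
	do {								\
		if (!(expr)) {						\
			sketch_report((mp), #expr, __FILE__, __LINE__);	\
			return -SKETCH_EFSCORRUPTED;			\
		}							\
	} while (0)

static int sketch_check_len(const struct sketch_mount *mp,
			    unsigned int len, unsigned int maxlen)
{
	SKETCH_WANT_CORRUPTED_RETURN(mp, len <= maxlen);
	return 0;
}
```

When the check fails for a mount named `dm-11`, the recorded message begins with `XFS (dm-11):`, which is the piece of information the commit says was previously missing.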
    • xfs: pass mp to XFS_WANT_CORRUPTED_GOTO · c29aad41
      Eric Sandeen authored
      Today, if we hit an XFS_WANT_CORRUPTED_GOTO we don't print any
      information about which filesystem hit it.  Passing in the mp allows
      us to print the filesystem (device) name, which is a pretty critical
      piece of information.
      
      Tested by running fsfuzzer 'til I hit some.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      c29aad41
    • xfs: inodes are new until the dentry cache is set up · 58c90473
      Dave Chinner authored
      Al Viro noticed a generic set of issues to do with filehandle lookup
      racing with dentry cache setup. They involve a filehandle lookup
      occurring while an inode is being created and the filehandle lookup
      racing with the dentry creation for the real file. This can lead to
      multiple dentries for the one path being instantiated. There are a
      host of other issues around this same set of paths.
      
      The underlying cause is that file handle lookup only waits on inode
      cache instantiation rather than full dentry cache instantiation. XFS
      is mostly immune to the problems discovered due to its own internal
      inode cache, but there are a couple of corner cases where races can
      happen.
      
      We currently clear the XFS_INEW flag when the inode is fully set up
      after insertion into the cache. Newly allocated inodes are inserted
      locked and so aren't usable until the allocation transaction
      commits. This, however, occurs before the dentry and security
      information is fully initialised and hence the inode is unlocked and
      available for lookups to find too early.
      
      To solve the problem, only clear the XFS_INEW flag for newly created
      inodes once the dentry is fully instantiated. This means lookups
      will retry until the XFS_INEW flag is removed from the inode and
      hence avoids the race conditions in question.
      
      This also means that xfs_create(), xfs_create_tmpfile() and
      xfs_symlink() need to finish the setup of the inode in their error
      paths if we had allocated the inode but failed later in the creation
      process. xfs_symlink(), in particular, needed a lot of help to make
      its error handling match that of xfs_create().
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      58c90473
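The idea of keeping an inode "new" until setup is complete can be modelled very loosely as follows. This is only a sketch under stated assumptions: `SKETCH_INEW`, `sketch_inode`, `sketch_lookup` and `sketch_finish_setup` are hypothetical, and returning `-EAGAIN` stands in for whatever retry mechanism the real lookup path uses.

```c
#include <errno.h>

#define SKETCH_INEW 0x1u	/* hypothetical "still being set up" flag */

struct sketch_inode { unsigned int flags; };

/* A lookup that finds a new inode must back off and retry later. */
static int sketch_lookup(const struct sketch_inode *ip)
{
	return (ip->flags & SKETCH_INEW) ? -EAGAIN : 0;
}

/*
 * Called only once the dentry is fully instantiated, or from an error
 * path that has to finish inode setup after a failed create.
 */
static void sketch_finish_setup(struct sketch_inode *ip)
{
	ip->flags &= ~SKETCH_INEW;
}
```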
    • xfs: ensure truncate forces zeroed blocks to disk · 5885ebda
      Dave Chinner authored
      A new fsync vs power fail test in xfstests indicated that XFS can
      have unreliable data consistency when doing extending truncates that
      require block zeroing. The blocks beyond EOF get zeroed in memory,
      but we never force those changes to disk before we run the
      transaction that extends the file size and exposes those blocks to
      userspace. This can result in the blocks not being correctly zeroed
      after a crash.
      
      Because in-memory behaviour is correct, tools like fsx don't pick up
      any coherency problems - it's not until the filesystem is shut down
      or the system crashes after writing the truncate transaction to the
      journal but before the zeroed data in the page cache is flushed that
      the issue is exposed.
      
      Fix this by also flushing the dirty data in the memory region between
      the old size and new size when we've found blocks that need zeroing
      in the truncate process.
      Reported-by: Liu Bo <bo.li.liu@oracle.com>
      cc: <stable@vger.kernel.org>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      5885ebda
    • xfs: Fix quota type in quota structures when reusing quota file · dfcc70a8
      Jan Kara authored
      For filesystems without separate project quota inode field in the
      superblock we just reuse project quota file for group quotas (and vice
      versa) if project quota file is allocated and we need group quota file.
      When we reuse the file, quota structures on disk suddenly have wrong
      type stored in d_flags though. Nobody really cares about this (although
      structure type reported to userspace was wrong as well) except
      that after commit 14bf61ff (quota: Switch ->get_dqblk() and
      ->set_dqblk() to use bytes as space units) assertion in
      xfs_qm_scall_getquota() started to trigger on xfs/106 test (apparently I
      was testing without XFS_DEBUG so I didn't notice when submitting the
      above commit).
      
      Fix the problem by properly resetting ddq->d_flags when running quotacheck
      for a quota file.
      
      CC: stable@vger.kernel.org
      Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      dfcc70a8
    • xfs: lock out page faults from extent swap operations · 723cac48
      Dave Chinner authored
      Extent swap operations are another extent manipulation operation
      that we need to ensure does not race against mmap page faults. The
      current code returns if the file is mapped prior to the swap being
      done, but it could potentially race against new page faults while
      the swap is in progress. Hence we should use the XFS_MMAPLOCK_EXCL
      for this operation, too.
      
      While there, fix the error path handling that can result in double
      unlocks of the inodes when cancelling the swapext transaction.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      723cac48
    • xfs: xfs_setattr_size no longer races with page faults · 0f9160b4
      Dave Chinner authored
      Now that truncate locks out new page faults, we no longer need to do
      special writeback hacks in truncate to work around potential races
      between page faults, page cache truncation and file size updates to
      ensure we get write page faults for extending truncates on sub-page
      block size filesystems. Hence we can remove the code in
      xfs_setattr_size() that handles this and update the comments around
      the code that handles page cache truncate and size updates to
      reflect the new reality.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      0f9160b4
    • xfs: take i_mmap_lock on extent manipulation operations · e8e9ad42
      Dave Chinner authored
      Now that the i_mmaplock is held across the page fault IO
      path, we add extent manipulation operation exclusion by adding
      the lock to the paths that directly modify extent maps. This
      includes truncate, hole punching and other fallocate based
      operations. The operations will now take both the i_iolock and the
      i_mmaplock in exclusive mode, thereby ensuring that all IO and page
      faults block without holding any page locks while the extent
      manipulation is in progress.
      
      This gives us the lock order during truncate of i_iolock ->
      i_mmaplock -> page_lock -> i_lock, hence providing the same
      lock order as the iolock provides for the normal IO path without
      involving the mmap_sem.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      e8e9ad42
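The truncate lock order stated in this commit (i_iolock -> i_mmaplock -> page_lock -> i_lock) can be expressed as a simple rank check. The sketch below is a toy userspace model with hypothetical names; it does not take real locks and is not how lockdep works, it only shows which acquisition sequences respect the stated order.

```c
#include <stdbool.h>

/* Ranks follow the order given in the commit message, outermost first. */
enum sketch_rank {
	SKETCH_IOLOCK,
	SKETCH_MMAPLOCK,
	SKETCH_PAGE_LOCK,
	SKETCH_ILOCK,
	SKETCH_RANK_COUNT
};

struct sketch_lock_ctx { bool held[SKETCH_RANK_COUNT]; };

/*
 * Allow taking a lock only if no lock of the same or a later rank is
 * already held; otherwise the acquisition would invert the order.
 */
static bool sketch_lock_take(struct sketch_lock_ctx *ctx, enum sketch_rank rank)
{
	for (int r = rank; r < SKETCH_RANK_COUNT; r++)
		if (ctx->held[r])
			return false;
	ctx->held[rank] = true;
	return true;
}

static void sketch_lock_drop(struct sketch_lock_ctx *ctx, enum sketch_rank rank)
{
	ctx->held[rank] = false;
}
```

In this model a truncate that takes the four locks in order succeeds, while a page fault path that already holds the mmap lock and then tries to take the iolock is rejected.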
    • xfs: use i_mmaplock on write faults · 075a924d
      Dave Chinner authored
      Take the i_mmaplock over write page faults. These come through the
      ->page_mkwrite callout, so we need to wrap that call with the
      i_mmaplock.
      
      This gives us a lock order of mmap_sem -> i_mmaplock -> page_lock
      -> i_lock.
      
      Also, move the page_mkwrite wrapper to the same region of xfs_file.c
      as the read fault wrappers and add a tracepoint.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      075a924d
    • xfs: use i_mmaplock on read faults · de0e8c20
      Dave Chinner authored
      Take the i_mmaplock over read page faults. These come through the
      ->fault callout, so we need to wrap the generic implementation
      with the i_mmaplock. While there, add tracepoints for the read
      fault as it passes through XFS.
      
      This gives us a lock order of mmap_sem -> i_mmaplock -> page_lock
      -> i_lock.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      de0e8c20
    • xfs: introduce mmap/truncate lock · 653c60b6
      Dave Chinner authored
      Right now we cannot serialise mmap against truncate or hole punch
      sanely. ->page_mkwrite is not able to take locks that the read IO
      path normally takes (i.e. the inode iolock) because that could
      result in lock inversions (read - iolock - page fault - page_mkwrite
      - iolock) and so we cannot use an IO path lock to serialise page
      write faults against truncate operations.
      
      Instead, introduce a new lock that is used *only* in the
      ->page_mkwrite path that is the equivalent of the iolock. The lock
      ordering in a page fault is i_mmaplock -> page lock -> i_ilock,
      and so in truncate we can i_iolock -> i_mmaplock and so lock out
      new write faults during the process of truncation.
      
      Because i_mmaplock is outside the page lock, we can hold it across
      all the same operations we hold the i_iolock for. The only
      difference is that we never hold the i_mmaplock in the normal IO
      path and so do not ever have the possibility that we can page fault
      inside it. Hence there are no recursion issues on the i_mmaplock
      and so we can use it to serialise page fault IO against inode
      modification operations that affect the IO path.
      
      This patch introduces the i_mmaplock infrastructure, lockdep
      annotations and initialisation/destruction code. Use of the new lock
      will be in subsequent patches.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      653c60b6
    • xfs: remove xfs_mod_incore_sb API · 964aa8d9
      Dave Chinner authored
      Now that there are no users of the bitfield based incore superblock
      modification API, just remove the whole damn lot of it, including
      all the bitfield definitions. This finally removes a lot of cruft
      that has been around for a long time.
      
      Credit goes to Christoph Hellwig for providing a great patch
      connecting all the dots to enable us to do this. This patch is
      derived from that work.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      964aa8d9
    • xfs: replace xfs_mod_incore_sb_batched · 0bd5dded
      Dave Chinner authored
      Introduce helper functions for modifying fields in the superblock
      into xfs_trans.c, the only caller of xfs_mod_incore_sb_batch().  We
      can then use these directly in xfs_trans_unreserve_and_mod_sb() and
      so remove another user of the xfs_mod_incore_sb() API without
      losing any functionality or scalability of the transaction commit
      code.
      
      Based on a patch from Christoph Hellwig.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      0bd5dded
    • xfs: introduce xfs_mod_frextents · bab98bbe
      Dave Chinner authored
      Add a new helper to modify the incore counter of free realtime
      extents. This matches the helpers used for inode and data block
      counters, and removes a significant user of the xfs_mod_incore_sb()
      interface.
      
      Based on a patch originally from Christoph Hellwig.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      bab98bbe
    • xfs: Remove icsb infrastructure · 5681ca40
      Dave Chinner authored
      Now that the in-core superblock infrastructure has been replaced with
      generic per-cpu counters, we don't need it anymore. Nuke it from
      orbit so we are sure that it won't haunt us again...
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      5681ca40
    • xfs: use generic percpu counters for free block counter · 0d485ada
      Dave Chinner authored
      XFS has hand-rolled per-cpu counters for the superblock since before
      there was any generic implementation. The free block counter is
      special in that it is used for ENOSPC detection outside transaction
      contexts for delayed allocation. This means that the counter
      needs to be accurate at zero. The current per-cpu counter code jumps
      through lots of hoops to ensure we never run past zero, but we don't
      need to make all those jumps with the generic counter
      implementation.
      
      The generic counter implementation allows us to pass a "batch"
      threshold at which the addition/subtraction to the counter value
      will be folded back into global value under lock. We can use this
      feature to reduce the batch size as we approach 0 in a very similar
      manner to the existing counters and their rebalance algorithm. If we
      use a batch size of 1 as we approach 0, then every addition and
      subtraction will be done against the global value and hence allow
      accurate detection of zero threshold crossing.
      
      Hence we can replace the handrolled, accurate-at-zero counters with
      generic percpu counters.
      
      Note: this removes just enough of the icsb infrastructure to compile
      without warnings. The rest will go in subsequent commits.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      0d485ada
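The batch threshold behaviour described in this commit can be modelled in userspace. The sketch below is a single-threaded toy, not the kernel's percpu counter implementation: `sketch_counter`, `sketch_add_batch`, `sketch_sum`, `sketch_pick_batch` and the batch values 1 and 32 are hypothetical, and the locking that protects the global value in real code is omitted.

```c
#include <stdint.h>

#define SKETCH_NR_CPUS 4

struct sketch_counter {
	int64_t count;				/* global value */
	int32_t local[SKETCH_NR_CPUS];		/* per-cpu deltas not yet folded */
};

/* Fold the per-cpu delta into the global value once it reaches the batch. */
static void sketch_add_batch(struct sketch_counter *c, int cpu,
			     int64_t amount, int32_t batch)
{
	int64_t v = c->local[cpu] + amount;

	if (v >= batch || v <= -batch) {
		c->count += v;
		c->local[cpu] = 0;
	} else {
		c->local[cpu] = (int32_t)v;
	}
}

/* Exact value: global count plus every outstanding per-cpu delta. */
static int64_t sketch_sum(const struct sketch_counter *c)
{
	int64_t sum = c->count;

	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		sum += c->local[cpu];
	return sum;
}

/* Near zero, use a batch of 1 so every change hits the global value. */
static int32_t sketch_pick_batch(const struct sketch_counter *c,
				 int64_t threshold)
{
	return c->count < threshold ? 1 : 32;
}
```

With a large batch, small changes stay in the per-cpu slot and the global value lags; with a batch of 1, every addition or subtraction is applied to the global value immediately, which is what allows an accurate test against zero.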