1. 02 Oct, 2014 1 commit
    • Mark Tinguely's avatar
      xfs: xfs_iflush_done checks the wrong log item callback · 52177937
      Mark Tinguely authored
      Commit 30136832 ("xfs: remove all the inodes on a buffer from the AIL
      in bulk") made the xfs inode flush callback more efficient by
      combining all the inode writes on the buffer and the deletions of
      the inode log item from AIL.
      
      The initial loop in this patch should be looping through all
      the log items on the buffer to see which items have
      xfs_iflush_done as their callback function. But currently,
      only the log item passed to the function has its callback
      compared to xfs_iflush_done. If the log item pointer passed to
      the function does have the xfs_iflush_done callback function,
      then all the log items on the buffer are removed from the
      li_bio_list on the buffer b_fspriv and could be removed from
      the AIL even though they may have not been written yet.
      
      This problem is masked by the fact that currently all inodes on a
      buffer will have the same calback function - either xfs_iflush_done
      or xfs_istale_done - and hence the bug cannot manifest in any way.
      Still, we need to remove the landmine so that if we add new
      callbacks in future this doesn't cause us problems.
      Signed-off-by: default avatarMark Tinguely <tinguely@sgi.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      52177937
  2. 01 Oct, 2014 10 commits
    • Brian Foster's avatar
      xfs: flush the range before zero range conversion · da5f1096
      Brian Foster authored
      XFS currently discards delalloc blocks within the target range of a
      zero range request. Unaligned start and end offsets are zeroed
      through the page cache and the internal, aligned blocks are
      converted to unwritten extents.
      
      If EOF is page aligned and covered by a delayed allocation extent.
      The inode size is not updated until I/O completion. If a zero range
      request discards a delalloc range that covers page aligned EOF as
      such, the inode size update never occurs. For example:
      
      $ rm -f /mnt/file
      $ xfs_io -fc "pwrite 0 64k" -c "zero 60k 4k" /mnt/file
      $ stat -c "%s" /mnt/file
      65536
      $ umount /mnt
      $ mount <dev> /mnt
      $ stat -c "%s" /mnt/file
      61440
      
      Update xfs_zero_file_space() to flush the range rather than discard
      delalloc blocks to ensure that inode size updates occur
      appropriately.
      
      [dchinner: Note that this is really a workaround to avoid the
      underlying problems. More work is needed (and ongoing) to fix those
      issues so this fix is being added as a temporary stop-gap measure. ]
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      da5f1096
    • Brian Foster's avatar
      xfs: restore buffer_head unwritten bit on ioend cancel · 07d08681
      Brian Foster authored
      xfs_vm_writepage() walks each buffer_head on the page, maps to the block
      on disk and attaches to a running ioend structure that represents the
      I/O submission. A new ioend is created when the type of I/O (unwritten,
      delayed allocation or overwrite) required for a particular buffer_head
      differs from the previous. If a buffer_head is a delalloc or unwritten
      buffer, the associated bits are cleared by xfs_map_at_offset() once the
      buffer_head is added to the ioend.
      
      The process of mapping each buffer_head occurs in xfs_map_blocks() and
      acquires the ilock in blocking or non-blocking mode, depending on the
      type of writeback in progress. If the lock cannot be acquired for
      non-blocking writeback, we cancel the ioend, redirty the page and
      return. Writeback will revisit the page at some later point.
      
      Note that we acquire the ilock for each buffer on the page. Therefore
      during non-blocking writeback, it is possible to add an unwritten buffer
      to the ioend, clear the unwritten state, fail to acquire the ilock when
      mapping a subsequent buffer and cancel the ioend. If this occurs, the
      unwritten status of the buffer sitting in the ioend has been lost. The
      page will eventually hit writeback again, but xfs_vm_writepage() submits
      overwrite I/O instead of unwritten I/O and does not perform unwritten
      extent conversion at I/O completion. This leads to data corruption
      because unwritten extents are treated as holes on reads and zeroes are
      returned instead of reading from disk.
      
      Modify xfs_cancel_ioend() to restore the buffer unwritten bit for ioends
      of type XFS_IO_UNWRITTEN. This ensures that unwritten extent conversion
      occurs once the page is eventually written back.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      07d08681
    • Eric Sandeen's avatar
      xfs: check for null dquot in xfs_quota_calc_throttle() · 5cca3f61
      Eric Sandeen authored
      Coverity spotted this.
      
      Granted, we *just* checked xfs_inod_dquot() in the caller (by
      calling xfs_quota_need_throttle). However, this is the only place we
      don't check the return value but the check is cheap and future-proof
      so add it.
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      5cca3f61
    • Eric Sandeen's avatar
      xfs: fix crc field handling in xfs_sb_to/from_disk · 04dd1a0d
      Eric Sandeen authored
      I discovered this in userspace, but the same change applies
      to the kernel.
      
      If we xfs_mdrestore an image from a non-crc filesystem, lo
      and behold the restored image has gained a CRC:
      
      # db/xfs_metadump.sh -o /dev/sdc1 - | xfs_mdrestore - test.img
      # xfs_db -c "sb 0" -c "p crc" /dev/sdc1
      crc = 0 (correct)
      # xfs_db -c "sb 0" -c "p crc" test.img
      crc = 0xb6f8d6a0 (correct)
      
      This is because xfs_sb_from_disk doesn't fill in sb_crc,
      but xfs_sb_to_disk(XFS_SB_ALL_BITS) does write the in-memory
      CRC to disk - so we get uninitialized memory on disk.
      
      Fix this by always initializing sb_crc to 0 when we read
      the superblock, and masking out the CRC bit from ALL_BITS
      when we write it.
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      04dd1a0d
    • Eric Sandeen's avatar
      xfs: don't send null bp to xfs_trans_brelse() · 6ee49a20
      Eric Sandeen authored
      In this case, if bp is NULL, error is set, and we send a
      NULL bp to xfs_trans_brelse, which will try to dereference it.
      
      Test whether we actually have a buffer before we try to
      free it.
      
      Coverity spotted this.
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      6ee49a20
    • Brian Foster's avatar
      xfs: check for inode size overflow in xfs_new_eof() · ce57bcf6
      Brian Foster authored
      If we write to the maximum file offset (2^63-2), XFS fails to log the
      inode size update when the page is flushed. For example:
      
      $ xfs_io -fc "pwrite `echo "2^63-1-1" | bc` 1" /mnt/file
      wrote 1/1 bytes at offset 9223372036854775806
      1.000000 bytes, 1 ops; 0.0000 sec (22.711 KiB/sec and 23255.8140 ops/sec)
      $ stat -c %s /mnt/file
      9223372036854775807
      $ umount /mnt ; mount <dev> /mnt/
      $ stat -c %s /mnt/file
      0
      
      This occurs because XFS calculates the new file size as io_offset +
      io_size, I/O occurs in block sized requests, and the maximum supported
      file size is not block aligned. Therefore, a write to the max allowable
      offset on a 4k blocksize fs results in a write of size 4k to offset
      2^63-4096 (e.g., equivalent to round_down(2^63-1, 4096), or IOW the
      offset of the block that contains the max file size). The offset plus
      size calculation (2^63 - 4096 + 4096 == 2^63) overflows the signed
      64-bit variable which goes negative and causes the > comparison to the
      on-disk inode size to fail. This returns 0 from xfs_new_eof() and
      results in no change to the inode on-disk.
      
      Update xfs_new_eof() to explicitly detect overflow of the local
      calculation and use the VFS inode size in this scenario. The VFS inode
      size is capped to the maximum and thus XFS writes the correct inode size
      to disk.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      ce57bcf6
    • Dave Chinner's avatar
      xfs: only set extent size hint when asked · a872703f
      Dave Chinner authored
      Currently the extent size hint is set unconditionally in
      xfs_ioctl_setattr() when the FSX_EXTSIZE flag is set. Hence we can
      set hints when the inode flags indicating the hint should be used
      are not set.  Hence only set the extent size hint from userspace
      when the inode has the XFS_DIFLAG_EXTSIZE flag set to indicate that
      we should have an extent size hint set on the inode.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      a872703f
    • Dave Chinner's avatar
      xfs: project id inheritance is a directory only flag · 9336e3a7
      Dave Chinner authored
      xfs_set_diflags() allows it to be set on non-directory inodes, and
      this flags errors in xfs_repair. Further, inode allocation allows
      the same directory-only flag to be inherited to non-directories.
      Make sure directory inode flags don't appear on other types of
      inodes.
      
      This fixes several xfstests scratch fileystem corruption reports
      (e.g. xfs/050) now that xfstests checks scratch filesystems after
      test completion.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      9336e3a7
    • Dave Chinner's avatar
      xfs: kill time.h · e076b0f3
      Dave Chinner authored
      The typedef for timespecs and nanotime() are completely unnecessary,
      and delay() can be moved to fs/xfs/linux.h, which means this file
      can go away.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      e076b0f3
    • Dave Chinner's avatar
      xfs: compat_xfs_bstat does not have forkoff · b1d6cc02
      Dave Chinner authored
      struct compat_xfs_bstat is missing the di_forkoff field and so does
      not fully translate the structure correctly. Fix it.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      b1d6cc02
  3. 23 Sep, 2014 7 commits
    • Dave Chinner's avatar
      xfs: flush entire last page of old EOF on truncate up · 2ebff7bb
      Dave Chinner authored
      On a sub-page sized filesystem, truncating a mapped region down
      leaves us in a world of hurt. We truncate the pagecache, zeroing the
      newly unused tail, then punch blocks out from under the page. If we
      then truncate the file back up immediately, we expose that unmapped
      hole to a dirty page mapped into the user application, and that's
      where it all goes wrong.
      
      In truncating the page cache, we avoid unmapping the tail page of
      the cache because it still contains valid data. The problem is that
      it also contains a hole after the truncate, but nobody told the mm
      subsystem that. Therefore, if the page is dirty before the truncate,
      we'll never get a .page_mkwrite callout after we extend the file and
      the application writes data into the hole on the page.  Hence when
      we come to writing that region of the page, it has no blocks and no
      delayed allocation reservation and hence we toss the data away.
      
      This patch adds code to the truncate up case to solve it, by
      ensuring the partial page at the old EOF is always cleaned after we
      do any zeroing and move the EOF upwards. We can't actually serialise
      the page writeback and truncate against page faults (yes, that
      problem AGAIN) so this is really just a best effort and assumes it
      is extremely unlikely that someone is concurrently writing to the
      page at the EOF while extending the file.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      2ebff7bb
    • Dave Chinner's avatar
      xfs: xfs_swap_extent_flush can be static · 7abbb8f9
      Dave Chinner authored
      Fix sparse warning introduced by commit 4ef897a2 ("xfs: flush both
      inodes in xfs_swap_extents").
      Signed-off-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      7abbb8f9
    • Dave Chinner's avatar
      xfs: xfs_buf_write_fail_rl_state can be static · 02cc1876
      Dave Chinner authored
      Fix sparse warning introduced by commit ac8809f9 ("xfs: abort
      metadata writeback on permanent errors").
      Signed-off-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      02cc1876
    • Fengguang Wu's avatar
      xfs: xfs_rtget_summary can be static · ea95961d
      Fengguang Wu authored
      Fix sparse warning introduced by commit afabfd30 ("xfs: combine
      xfs_rtmodify_summary and xfs_rtget_summary").
      Signed-off-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      ea95961d
    • Fabian Frederick's avatar
      xfs: remove second xfs_quota.h inclusion in xfs_icache.c · e3cf1796
      Fabian Frederick authored
      xfs_quota.h was included twice.
      Signed-off-by: default avatarFabian Frederick <fabf@skynet.be>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      
      e3cf1796
    • Eric Sandeen's avatar
      xfs: don't ASSERT on corrupt ftype · fb040131
      Eric Sandeen authored
      xfs_dir3_data_get_ftype() gets the file type off disk, but ASSERTs
      if it's invalid:
      
           ASSERT(type < XFS_DIR3_FT_MAX);
      
      We shouldn't ASSERT on bad values read from disk.  V3 dirs are
      CRC-protected, but V2 dirs + ftype are not.
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      
      fb040131
    • Dave Chinner's avatar
      xfs: xlog_cil_force_lsn doesn't always wait correctly · 8af3dcd3
      Dave Chinner authored
      When running a tight mount/unmount loop on an older kernel, RedHat
      QE found that unmount would occasionally hang in
      xfs_buf_unpin_wait() on the superblock buffer. Tracing and other
      debug work by Eric Sandeen indicated that it was hanging on the
      writing of the superblock during unmount immediately after logging
      the superblock counters in a synchronous transaction. Further debug
      indicated that the synchronous transaction was not waiting for
      completion correctly, and we narrowed it down to
      xlog_cil_force_lsn() returning NULLCOMMITLSN and hence not pushing
      the transaction in the iclog buffer to disk correctly.
      
      While this unmount superblock write code is now very different in
      mainline kernels, the xlog_cil_force_lsn() code is identical, and it
      was bisected to the backport of commit f876e446 ("xfs: always do log
      forces via the workqueue"). This commit made the CIL push
      asynchronous for log forces and hence exposed a race condition that
      couldn't occur on a synchronous push.
      
      Essentially, the xlog_cil_force_lsn() relied implicitly on the fact
      that the sequence push would be complete by the time
      xlog_cil_push_now() returned, resulting in the context being pushed
      being in the committing list. When it was made asynchronous, it was
      recognised that there was a race condition in detecting whether an
      asynchronous push has started or not and code was added to handle
      it.
      
      Unfortunately, the fix was not quite right and left a race condition
      where it it would detect an empty CIL while a push was in progress
      before the context had been added to the committing list. This was
      incorrectly seen as a "nothing to do" condition and so would tell
      xfs_log_force_lsn() that there is nothing to wait for, and hence it
      would push the iclogbufs in memory.
      
      The fix is simple, but explaining the logic and the race condition
      is a lot more complex. The fix is to add the context to the
      committing list before we start emptying the CIL. This allows us to
      detect the difference between an empty "do nothing" push and a push
      that has not started by adding a discrete "emptying the CIL" state
      to avoid the transient, incorrect "empty" condition that the
      (unchanged) waiting code was seeing.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      8af3dcd3
  4. 09 Sep, 2014 11 commits
    • Eric Sandeen's avatar
      xfs: remove rbpp check from xfs_rtmodify_summary_int · ab6978c2
      Eric Sandeen authored
      rbpp is always passed into xfs_rtmodify_summary
      and xfs_rtget_summary, so there is no need to
      test for it in xfs_rtmodify_summary_int.
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      ab6978c2
    • Eric Sandeen's avatar
      xfs: combine xfs_rtmodify_summary and xfs_rtget_summary · afabfd30
      Eric Sandeen authored
      xfs_rtmodify_summary and xfs_rtget_summary are almost identical;
      fold them into xfs_rtmodify_summary_int(), with wrappers for each of
      the original calls.
      
      The _int function modifies if a delta is passed, and returns a
      summary pointer if *sum is passed.
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      afabfd30
    • Eric Sandeen's avatar
      xfs: combine xfs_dir_canenter into xfs_dir_createname · b16ed7c1
      Eric Sandeen authored
      xfs_dir_canenter and xfs_dir_createname are
      almost identical.
      
      Fold the former into the latter, with a helpful
      wrapper for the former.  If createname is called without
      an inode number, it now only checks for space, and does
      not actually add the entry.
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      b16ed7c1
    • Eric Sandeen's avatar
      xfs: check resblks before calling xfs_dir_canenter · 94f3cad5
      Eric Sandeen authored
      Move the resblks test out of the xfs_dir_canenter,
      and into the caller.
      
      This makes a little more sense on the face of it;
      xfs_dir_canenter immediately returns if resblks !=0;
      and given some of the comments preceding the calls:
      
       * Check for ability to enter directory entry, if no space reserved.
      
      even more so.
      
      It also facilitates the next patch.
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      94f3cad5
    • Eric Sandeen's avatar
      xfs: deduplicate xlog_do_recovery_pass() · 970fd3f0
      Eric Sandeen authored
      In xlog_do_recovery_pass(), there are 2 distinct cases:
      non-wrapped and wrapped log recovery.
      
      If we find a wrapped log, we recover around the end
      of the log, and then handle the rest of recovery
      exactly as in the non-wrapped case - using exactly the same
      (duplicated) code.
      
      Rather than having the same code in both cases, we can
      get the wrapped portion out of the way first if needed,
      and then recover the non-wrapped portion of the log.
      
      There should be no functional change here, just code
      reorganization & deduplication.
      
      The patch looks a bit bigger than it really is; the last
      hunk is whitespace changes (un-indenting).
      
      Tested with xfstests "check -g log" on a stock configuration.
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      970fd3f0
    • Eric Sandeen's avatar
      xfs: lseek: the "whence" argument is called "whence" · 59f9c004
      Eric Sandeen authored
      For some reason, the older commit:
      
          965c8e59 lseek: the "whence" argument is called "whence"
      
          lseek: the "whence" argument is called "whence"
      
          But the kernel decided to call it "origin" instead.
          Fix most of the sites.
      
      left out xfs.  So fix xfs.
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarJie Liu <jeff.liu@oracle.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      59f9c004
    • Eric Sandeen's avatar
      xfs: combine xfs_seek_hole & xfs_seek_data · 49c69591
      Eric Sandeen authored
      xfs_seek_hole & xfs_seek_data are remarkably similar;
      so much so that they can be combined, saving a fair
      bit of semi-complex code duplication.
      
      The following patch passes generic/285 and generic/286,
      which specifically test seek behavior.
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarJie Liu <jeff.liu@oracle.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      49c69591
    • Brian Foster's avatar
      xfs: export log_recovery_delay to delay mount time log recovery · 2e227178
      Brian Foster authored
      XFS log recovery has been discovered to have race conditions with
      buffers when I/O errors occur. External tools are available to simulate
      I/O errors to XFS, but this alone is not sufficient for testing log
      recovery. XFS unconditionally resets the inactive region of the log
      prior to log recovery to avoid confusion over processing any partially
      written log records that might have been written before an unclean
      shutdown. Therefore, unconditional write I/O failures at mount time are
      caught by the reset sequence rather than log recovery and hinder the
      ability to test the latter.
      
      The device-mapper dm-flakey module uses an up/down timer to define a
      cycle for when to fail I/Os. Create a pre log recovery delay tunable
      that can be used to coordinate XFS log recovery with I/O errors
      simulated by dm-flakey. This facilitates coordination in userspace that
      allows the reset of stale log blocks to succeed and writes due to log
      recovery to fail. For example, define a dm-flakey instance with an
      uptime long enough to allow log reset to succeed and a log recovery
      delay long enough to allow the dm-flakey uptime to expire.
      
      The 'log_recovery_delay' sysfs tunable is exported under
      /sys/fs/xfs/debug and is only enabled for kernels compiled in XFS debug
      mode. The value is exported in units of seconds and allows for a delay
      of up to 60 seconds. Note that this is for XFS debug and test
      instrumentation purposes only and should not be used by applications. No
      delay is enabled by default.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      2e227178
    • Brian Foster's avatar
      xfs: add debug sysfs attribute set · 65b65735
      Brian Foster authored
      Create a top-level debug directory for global debug sysfs attributes.
      This directory is added and removed on XFS module initialization and
      removal respectively for DEBUG mode kernels only. It typically resides
      at /sys/fs/xfs/debug. It is located at the top level of the xfs sysfs
      hierarchy as attributes might define global behavior or behavior that
      must be configured before an xfs mount is available (e.g., log recovery
      behavior).
      
      Define the global debug kobject that represents the debug sysfs
      directory and add generic attribute show/store helpers to support future
      attributes. No debug attributes are exported as of yet.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      65b65735
    • Eric Sandeen's avatar
      xfs: add a few more verifier tests · e1b05723
      Eric Sandeen authored
      These were exposed by fsfuzzer runs; without them we fail
      in various exciting and sometimes convoluted ways when we
      encounter disk corruption.
      
      Without the MAXLEVELS tests we tend to walk off the end of
      an array in a loop like this:
      
              for (i = 0; i < cur->bc_nlevels; i++) {
                      if (cur->bc_bufs[i])
      
      Without the dirblklog test we try to allocate more memory
      than we could possibly hope for and loop forever:
      
      xfs_dabuf_map()
      	nfsb = mp->m_dir_geo->fsbcount;
      	irecs = kmem_zalloc(sizeof(irec) * nfsb, KM_SLEEP...
      
      As for the logbsize check, that's the convoluted one.
      
      If logbsize is specified at mount time, it's sanitized
      in xfs_parseargs; in particular it makes sure that it's
      not > XLOG_MAX_RECORD_BSIZE.
      
      If not specified at mount time, it comes from the superblock
      via sb_logsunit; this is limited to 256k at mkfs time as well;
      it's copied into m_logbsize in xfs_finish_flags().
      
      However, if for some reason the on-disk value is corrupt and
      too large, nothing catches it.  It's a circuitous path, but
      that size eventually finds its way to places that make the kernel
      very unhappy, leading to oopses in xlog_pack_data() because we
      use the size as an index into iclog->ic_data, but the array
      is not necessarily that big.
      
      Anyway - bounds checking when we read from disk is a good thing!
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      e1b05723
    • Brian Foster's avatar
      xfs: mark all internal workqueues as freezable · 8018ec08
      Brian Foster authored
      Workqueues must be explicitly set as freezable to ensure they are frozen
      in the assocated part of the hibernation/suspend sequence. Freezing of
      workqueues and kernel threads is important to ensure that modifications
      are not made on-disk after the hibernation image has been created.
      Otherwise, the in-memory state can become inconsistent with what is on
      disk and eventually lead to filesystem corruption. We have reports of
      free space btree corruptions that occur immediately after restore from
      hibernate that suggest the xfs-eofblocks workqueue could be causing
      such problems if it races with hibernation.
      
      Mark all of the internal XFS workqueues as freezable to ensure nothing
      changes on-disk once the freezer infrastructure freezes kernel threads
      and creates the hibernation image.
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reported-by: default avatarCarlos E. R. <carlos.e.r@opensuse.org>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      8018ec08
  5. 25 Aug, 2014 5 commits
    • Linus Torvalds's avatar
      Linux 3.17-rc2 · 52addcf9
      Linus Torvalds authored
      52addcf9
    • Linus Torvalds's avatar
      Merge tag 'nfs-for-3.17-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · f01bfc97
      Linus Torvalds authored
      Pull NFS client fixes from Trond Myklebust:
       "Highlights:
      
         - more fixes for read/write codepath regressions
           * sleeping while holding the inode lock
           * stricter enforcement of page contiguity when coalescing requests
           * fix up error handling in the page coalescing code
      
         - don't busy wait on SIGKILL in the file locking code"
      
      * tag 'nfs-for-3.17-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        nfs: Don't busy-wait on SIGKILL in __nfs_iocounter_wait
        nfs: can_coalesce_requests must enforce contiguity
        nfs: disallow duplicate pages in pgio page vectors
        nfs: don't sleep with inode lock in lock_and_join_requests
        nfs: fix error handling in lock_and_join_requests
        nfs: use blocking page_group_lock in add_request
        nfs: fix nonblocking calls to nfs_page_group_lock
        nfs: change nfs_page_group_lock argument
      f01bfc97
    • Linus Torvalds's avatar
      Merge tag 'renesas-sh-drivers-for-v3.17' of... · dd5957b7
      Linus Torvalds authored
      Merge tag 'renesas-sh-drivers-for-v3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas
      
      Pull SH driver fix from Simon Horman:
       "Confine SH_INTC to platforms that need it"
      
      * tag 'renesas-sh-drivers-for-v3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
        sh: intc: Confine SH_INTC to platforms that need it
      dd5957b7
    • Linus Torvalds's avatar
      Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus · 497c01dd
      Linus Torvalds authored
      Pull MIPS fixes from Ralf Baechle:
       "Pretty much all across the field so with this we should be in
        reasonable shape for the upcoming -rc2"
      
      * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
        MIPS: OCTEON: make get_system_type() thread-safe
        MIPS: CPS: Initialize EVA before bringing up VPEs from secondary cores
        MIPS: Malta: EVA: Rename 'eva_entry' to 'platform_eva_init'
        MIPS: EVA: Add new EVA header
        MIPS: scall64-o32: Fix indirect syscall detection
        MIPS: syscall: Fix AUDIT value for O32 processes on MIPS64
        MIPS: Loongson: Fix COP2 usage for preemptible kernel
        MIPS: NL: Fix nlm_xlp_defconfig build error
        MIPS: Remove race window in page fault handling
        MIPS: Malta: Improve system memory detection for '{e, }memsize' >= 2G
        MIPS: Alchemy: Fix db1200 PSC clock enablement
        MIPS: BCM47XX: Fix reboot problem on BCM4705/BCM4785
        MIPS: Remove duplicated include from numa.c
        MIPS: Add common plat_irq_dispatch declaration
        MIPS: MSP71xx: remove unused plat_irq_dispatch() argument
        MIPS: GIC: Remove useless parens from GICBIS().
        MIPS: perf: Mark pmu interupt IRQF_NO_THREAD
      497c01dd
    • Linus Torvalds's avatar
      Merge tag 'trace-fixes-v3.17-rc1' of... · 01e9982a
      Linus Torvalds authored
      Merge tag 'trace-fixes-v3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
      
      Pull fix for ftrace function tracer/profiler conflict from Steven Rostedt:
       "The rewrite of the ftrace code that makes it possible to allow for
        separate trampolines had a design flaw with the interaction between
        the function and function_graph tracers.
      
        The main flaw was the simplification of the use of multiple tracers
        having the same filter (like function and function_graph, that use the
        set_ftrace_filter file to filter their code).  The design assumed that
        the two tracers could never run simultaneously as only one tracer can
        be used at a time.  The problem with this assumption was that the
        function profiler could be implemented on top of the function graph
        tracer, and the function profiler could run at the same time as the
        function tracer.  This caused the assumption to be broken and when
        ftrace detected this failed assumpiton it would spit out a nasty
        warning and shut itself down.
      
        Instead of using a single ftrace_ops that switches between the
        function and function_graph callbacks, the two tracers can again use
        their own ftrace_ops.  But instead of having a complex hierarchy of
        ftrace_ops, the filter fields are placed in its own structure and the
        ftrace_ops can carefully use the same filter.  This change took a bit
        to be able to allow for this and currently only the global_ops can
        share the same filter, but this new design can easily be modified to
        allow for any ftrace_ops to share its filter with another ftrace_ops.
      
        The first four patches deal with the change of allowing the ftrace_ops
        to share the filter (and this needs to go to 3.16 as well).
      
        The fifth patch fixes a bug that was also caused by the new changes
        but only for archs other than x86, and only if those archs implement a
        direct call to the function_graph tracer which they do not do yet but
        will in the future.  It does not need to go to stable, but needs to be
        fixed before the other archs update their code to allow direct calls
        to the function_graph trampoline"
      
      * tag 'trace-fixes-v3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        ftrace: Use current addr when converting to nop in __ftrace_replace_code()
        ftrace: Fix function_profiler and function tracer together
        ftrace: Fix up trampoline accounting with looping on hash ops
        ftrace: Update all ftrace_ops for a ftrace_hash_ops update
        ftrace: Allow ftrace_ops to use the hashes from other ops
      01e9982a
  6. 24 Aug, 2014 6 commits
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7be141d0
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "A couple of EFI fixes, plus misc fixes all around the map"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        efi/arm64: Store Runtime Services revision
        firmware: Do not use WARN_ON(!spin_is_locked())
        x86_32, entry: Clean up sysenter_badsys declaration
        x86/doc: Fix the 'tlb_single_page_flush_ceiling' sysconfig path
        x86/mm: Fix sparse 'tlb_single_page_flush_ceiling' warning and make the variable read-mostly
        x86/mm: Fix RCU splat from new TLB tracepoints
      7be141d0
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 44744bb3
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "A kprobes and a perf compat ioctl fix"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf: Handle compat ioctl
        kprobes: Skip kretprobe hit in NMI context to avoid deadlock
      44744bb3
    • Linus Torvalds's avatar
      Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 959dc258
      Linus Torvalds authored
      Pull ARM SoC fixes from Olof Johansson:
       "A collection of fixes from this week, it's been pretty quiet and
        nothing really stands out as particularly noteworthy here -- mostly
        minor fixes across the field:
      
         - ODROID booting was fixed due to PMIC interrupts missing in DT
         - a collection of i.MX fixes
         - minor Tegra fix for regulators
         - Rockchip fix and addition of SoC-specific mailing list to make it
           easier to find posted patches"
      
      * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        bus: arm-ccn: Fix warning message
        ARM: shmobile: koelsch: Remove non-existent i2c6 pinmux
        ARM: tegra: apalis/colibri t30: fix on-module 5v0 supplies
        MAINTAINERS: add new Rockchip SoC list
        ARM: dts: rockchip: readd missing mmc0 pinctrl settings
        ARM: dts: ODROID i2c improvements
        ARM: dts: Enable PMIC interrupts on ODROID
        ARM: dts: imx6sx: fix the pad setting for uart CTS_B
        ARM: dts: i.MX53: fix apparent bug in VPU clks
        ARM: imx: correct gpu2d_axi and gpu3d_axi clock setting
        ARM: dts: imx6: edmqmx6: change enet reset pin
        ARM: dts: vf610-twr: Fix pinctrl_esdhc1 pin definitions.
        ARM: imx: remove unnecessary ARCH_HAS_OPP select
        ARM: imx: fix TLB missing of IOMUXC base address during suspend
        ARM: imx6: fix SMP compilation again
        ARM: dt: sun6i: Add #address-cells and #size-cells to i2c controller nodes
      959dc258
    • Linus Torvalds's avatar
      Merge tag 'gpio-v3.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · fa7f78e0
      Linus Torvalds authored
      Pull gpio fixes from Linus Walleij:
      
       - a largeish fix for the IRQ handling in the new Zynq driver.  The
         quite verbose commit message gives the exact details.
       - move some defines for gpiod flags outside an ifdef to make stub
         functions work again.
       - various minor fixes that we can accept for -rc1.
      
      * tag 'gpio-v3.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
        gpio-lynxpoint: enable input sensing in resume
        gpio: move GPIOD flags outside #ifdef
        gpio: delete unneeded test before of_node_put
        gpio: zynq: Fix IRQ handlers
        gpiolib: devres: use correct structure type name in sizeof
        MAINTAINERS: Change maintainer for gpio-bcm-kona.c
      fa7f78e0
    • Linus Torvalds's avatar
      Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · 5e30ca1e
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Intel and radeon fixes.
      
        Post KS/LC git requests from i915 and radeon stacked up.  They are all
        fixes along with some new pci ids for radeon, and one maintainers file
        entry.
      
         - i915: display fixes and irq fixes
         - radeon: pci ids, and misc gpuvm, dpm and hdp cache"
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (29 commits)
        MAINTAINERS: Add entry for Renesas DRM drivers
        drm/radeon: add additional SI pci ids
        drm/radeon: add new bonaire pci ids
        drm/radeon: add new KV pci id
        Revert "drm/radeon: Use write-combined CPU mappings of ring buffers with PCIe"
        drm/radeon: fix active_cu mask on SI and CIK after re-init (v3)
        drm/radeon: fix active cu count for SI and CIK
        drm/radeon: re-enable selective GPUVM flushing
        drm/radeon: Sync ME and PFP after CP semaphore waits v4
        drm/radeon: fix display handling in radeon_gpu_reset
        drm/radeon: fix pm handling in radeon_gpu_reset
        drm/radeon: Only flush HDP cache for indirect buffers from userspace
        drm/radeon: properly document reloc priority mask
        drm/i915: don't try to retrain a DP link on an inactive CRTC
        drm/i915: make sure VDD is turned off during system suspend
        drm/i915: cancel hotplug and dig_port work during suspend and unload
        drm/i915: fix HPD IRQ reenable work cancelation
        drm/i915: take display port power domain in DP HPD handler
        drm/i915: Don't try to enable cursor from setplane when crtc is disabled
        drm/i915: Skip load detect when intel_crtc->new_enable==true
        ...
      5e30ca1e
    • Benjamin LaHaise's avatar
      aio: fix reqs_available handling · d856f32a
      Benjamin LaHaise authored
      As reported by Dan Aloni, commit f8567a38 ("aio: fix aio request
      leak when events are reaped by userspace") introduces a regression when
      user code attempts to perform io_submit() with more events than are
      available in the ring buffer.  Reverting that commit would reintroduce a
      regression when user space event reaping is used.
      
      Fixing this bug is a bit more involved than the previous attempts to fix
      this regression.  Since we do not have a single point at which we can
      count events as being reaped by user space and io_getevents(), we have
      to track event completion by looking at the number of events left in the
      event ring.  So long as there are as many events in the ring buffer as
      there have been completion events generate, we cannot call
      put_reqs_available().  The code to check for this is now placed in
      refill_reqs_available().
      
      A test program from Dan and modified by me for verifying this bug is available
      at http://www.kvack.org/~bcrl/20140824-aio_bug.c .
      Reported-by: default avatarDan Aloni <dan@kernelim.com>
      Signed-off-by: default avatarBenjamin LaHaise <bcrl@kvack.org>
      Acked-by: default avatarDan Aloni <dan@kernelim.com>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Mateusz Guzik <mguzik@redhat.com>
      Cc: Petr Matousek <pmatouse@redhat.com>
      Cc: stable@vger.kernel.org      # v3.16 and anything that f8567a38 was backported to
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d856f32a