  1. 16 Apr, 2020 1 commit
    • xfs: move inode flush to the sync workqueue · f0f7a674
      Darrick J. Wong authored
      Move the inode dirty data flushing to a workqueue so that multiple
      threads can take advantage of a single thread's flushing work.  The
      ratelimiting technique used in bdd4ee4 was not successful, because
      threads that skipped the inode flush scan due to ratelimiting would
      ENOSPC early, which caused occasional (but noticeable) changes in
      behavior and sporadic fstest regressions.
      
      Therefore, make all the writer threads wait on a single inode flush,
      which eliminates both the stampeding hordes of flushers and the small
      window in which a write could fail with ENOSPC because it lost the
      ratelimit race even after another thread freed space.
      
      Fixes: c6425702 ("xfs: ratelimit inode flush on buffered write ENOSPC")
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
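The idea of many writers sharing one flush can be illustrated with a small single-threaded model. This is only a sketch: `flush_work_model`, `model_queue`, `model_wait`, and `simulate_enospc_writers` are hypothetical names, not the kernel's, and the model assumes all writers queue the work before any of them waits on it.

```c
#include <stdbool.h>

/* Toy stand-in for a kernel work item; every name here is hypothetical. */
struct flush_work_model {
    bool pending;        /* queued but not yet executed */
    unsigned int scans;  /* how many inode flush scans actually ran */
};

/* Queue the flush; returns false if it is already queued. */
bool model_queue(struct flush_work_model *w)
{
    if (w->pending)
        return false;
    w->pending = true;
    return true;
}

/* Wait for the flush: run the queued scan if it has not run yet. */
void model_wait(struct flush_work_model *w)
{
    if (w->pending) {
        w->scans++;          /* the one real scan */
        w->pending = false;
    }
}

/* nwriters hit ENOSPC at about the same time; returns scans performed. */
unsigned int simulate_enospc_writers(unsigned int nwriters)
{
    struct flush_work_model w = { false, 0 };
    unsigned int i;

    for (i = 0; i < nwriters; i++)
        model_queue(&w);     /* only the first call queues anything */
    for (i = 0; i < nwriters; i++)
        model_wait(&w);      /* every writer waits on the same scan */
    return w.scans;
}
```

In this model no writer skips the wait, so none of them returns ENOSPC without a completed scan behind it, yet only one scan runs regardless of the number of writers.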
  2. 31 Mar, 2020 1 commit
    • xfs: ratelimit inode flush on buffered write ENOSPC · c6425702
      Darrick J. Wong authored
      A customer reported rcu stalls and softlockup warnings on a computer
      with many CPU cores and many many more IO threads trying to write to a
      filesystem that is totally out of space.  Subsequent analysis pointed to
      the many many IO threads calling xfs_flush_inodes -> sync_inodes_sb,
      which causes a lot of wb_writeback_work to be queued.  The writeback
      worker spends so much time trying to wake the many many threads waiting
      for writeback completion that it trips the softlockup detector, and (in
      this case) the system automatically reboots.
      
      In addition, they complain that the lengthy xfs_flush_inodes scan traps
      all of those threads in uninterruptible sleep, which hampers their
      ability to kill the program or do anything else to escape the situation.
      
      If there are thousands of threads trying to write to files on a full
      filesystem, each of those threads will start separate copies of the
      inode flush scan.  This is kind of pointless since we only need one
      scan, so rate limit the inode flush.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
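As a rough illustration of rate limiting the scan, here is a minimal interval/burst limiter with the clock passed in explicitly. It is a sketch under assumptions: `ratelimit_model`, `ratelimit_allow`, and `count_scans` are hypothetical names, the tick units are arbitrary, and the interval and burst values the patch actually uses are not stated in the message above.

```c
#include <stdbool.h>

/* Hypothetical interval/burst limiter; not the kernel's ratelimit state. */
struct ratelimit_model {
    unsigned long interval; /* window length, in arbitrary ticks */
    unsigned int burst;     /* calls allowed per window */
    unsigned long begin;    /* start of the current window */
    unsigned int used;      /* calls already allowed in this window */
    bool started;           /* false until the first call */
};

/* Returns true if the caller may start a scan at time now. */
bool ratelimit_allow(struct ratelimit_model *rs, unsigned long now)
{
    if (!rs->started || now - rs->begin >= rs->interval) {
        rs->begin = now;     /* open a new window */
        rs->used = 0;
        rs->started = true;
    }
    if (rs->used < rs->burst) {
        rs->used++;
        return true;
    }
    return false;            /* caller skips the scan */
}

/* ncallers arrive at the same tick; returns how many start a scan. */
unsigned int count_scans(struct ratelimit_model *rs, unsigned int ncallers,
                         unsigned long now)
{
    unsigned int i, scans = 0;

    for (i = 0; i < ncallers; i++)
        if (ratelimit_allow(rs, now))
            scans++;
    return scans;
}
```

With a burst of 1, thousands of simultaneous callers start a single scan per window. Every caller that gets `false` proceeds without any flush having been done on its behalf.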
  3. 27 Mar, 2020 1 commit
  4. 07 Feb, 2020 2 commits
  5. 14 Jan, 2020 1 commit
    • xfs: fix s_maxbytes computation on 32-bit kernels · 932befe3
      Darrick J. Wong authored
      I observed a hang in generic/308 while running fstests on an i686 kernel.
      The hang occurred when trying to purge the pagecache on a large sparse
      file that had a page created past MAX_LFS_FILESIZE, which caused an
      integer overflow in the pagecache xarray and resulted in an infinite
      loop.
      
      I then noticed that Linus changed the definition of MAX_LFS_FILESIZE in
      commit 0cc3b0ec ("Clarify (and fix) MAX_LFS_FILESIZE macros") so
      that it is now one page short of the maximum page index on 32-bit
      kernels.  Because the XFS function to compute max offset open-codes the
      2005-era MAX_LFS_FILESIZE computation and neither the vfs nor mm perform
      any sanity checking of s_maxbytes, the code in generic/308 can create a
      page above the pagecache's limit and kaboom.
      
      Fix all this by setting s_maxbytes to MAX_LFS_FILESIZE directly and
      aborting the mount with a warning if our assumptions ever break.  I have
      no answer for why this seems to have been broken for years and nobody
      noticed.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
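The one-page discrepancy can be checked with plain 64-bit arithmetic. The sketch below assumes 4 KiB pages and a 32-bit `unsigned long`, and it assumes the two limits have the shapes `(PAGE_SIZE << 32) - 1` (older open-coded value) and `ULONG_MAX << PAGE_SHIFT` (current 32-bit `MAX_LFS_FILESIZE`). Those formulas and all function names are illustrative assumptions and should be checked against the kernel source.

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12u                  /* assumed 4 KiB pages */
#define MODEL_ULONG_MAX  UINT64_C(0xFFFFFFFF) /* 32-bit unsigned long */

/* Assumed shape of the older open-coded limit: (PAGE_SIZE << 32) - 1. */
uint64_t old_xfs_max_offset(void)
{
    return (UINT64_C(1) << MODEL_PAGE_SHIFT << 32) - 1;
}

/* Assumed shape of the 32-bit MAX_LFS_FILESIZE: ULONG_MAX << PAGE_SHIFT. */
uint64_t model_max_lfs_filesize(void)
{
    return MODEL_ULONG_MAX << MODEL_PAGE_SHIFT;
}

/* Index of the pagecache page that holds byte offset off. */
uint64_t page_index(uint64_t off)
{
    return off >> MODEL_PAGE_SHIFT;
}

/* Mount-time check in the spirit of the fix: reject larger limits. */
int maxbytes_is_sane(uint64_t s_maxbytes)
{
    return s_maxbytes <= model_max_lfs_filesize();
}
```

Under these assumptions the older limit admits a byte whose page index equals `ULONG_MAX`, where `index + 1` wraps to zero in 32-bit arithmetic, while the highest page index below the newer limit is `ULONG_MAX - 1`.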
  6. 18 Nov, 2019 2 commits
  7. 11 Nov, 2019 1 commit
  8. 06 Nov, 2019 1 commit
  9. 05 Nov, 2019 17 commits
  10. 29 Oct, 2019 10 commits
  11. 21 Oct, 2019 1 commit
  12. 06 Sep, 2019 1 commit
    • xfs: prevent CIL push holdoff in log recovery · 8ab39f11
      Dave Chinner authored
      generic/530 on a machine with enough ram and a non-preemptible
      kernel can run the AGI processing phase of log recovery entirely out
      of cache. This means it never blocks on locks, never waits for IO
      and runs entirely through the unlinked lists until it either
      completes or blocks and hangs because it has run out of log space.
      
      It runs out of log space because the background CIL push is
      scheduled but never runs. queue_work() queues the CIL work on the
      current CPU that is busy, and the workqueue code will not run it on
      any other CPU. Hence if the unlinked list processing never yields
      the CPU voluntarily, the push work is delayed indefinitely. This
      results in the CIL aggregating changes until all the log space is
      consumed.
      
      When the log recovery processing eventually blocks, the CIL flushes
      but because the last iclog isn't submitted for IO because it isn't
      full, the CIL flush never completes and nothing ever moves the log
      head forwards, or indeed inserts anything into the tail of the log,
      and hence nothing is able to get the log moving again and recovery
      hangs.
      
      There are several problems here, but the two obvious ones from
      the trace are that:
      	a) log recovery does not yield the CPU for over 4 seconds,
      	b) binding CIL pushes to a single CPU is a really bad idea.
      
      This patch addresses just these two aspects of the problem, and is
      suitable for backporting to work around any issues in older kernels.
      The more fundamental problem of preventing the CIL from consuming
      more than 50% of the log without committing will take more invasive
      and complex work, so will be done as followup work.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
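Problem (a) can be pictured with a toy single-CPU model in which a queued push only runs when the recovery loop yields. This is a sketch, not the patch: `recover_unlinked` and its parameters are hypothetical, the units are arbitrary, and the model ignores problem (b) entirely. In the kernel a yield point would typically be `cond_resched()`, though the message above does not say which call the patch uses.

```c
#include <stdbool.h>

/*
 * Toy single-CPU model. Returns true if all items were processed,
 * false if the log filled up first. All units are hypothetical.
 */
bool recover_unlinked(unsigned int nitems, unsigned int log_space,
                      unsigned int push_threshold, unsigned int yield_every)
{
    unsigned int cil = 0;       /* log space pinned by uncommitted changes */
    bool push_queued = false;   /* background push waiting for this CPU */
    unsigned int i;

    for (i = 1; i <= nitems; i++) {
        if (cil >= log_space)
            return false;       /* out of log space: recovery stalls */
        cil++;
        if (cil >= push_threshold)
            push_queued = true; /* queued on the busy CPU, cannot run yet */

        /* Yielding lets the queued push run on this CPU. */
        if (yield_every && i % yield_every == 0 && push_queued) {
            cil = 0;            /* push commits the CIL, freeing log space */
            push_queued = false;
        }
    }
    return true;
}
```

With `yield_every` set to 0 the loop never yields, so a long enough unlinked list exhausts the log. With any periodic yield, the queued push gets to run and the pinned space stays bounded.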
  13. 30 Aug, 2019 1 commit
    • fs: Fill in max and min timestamps in superblock · 22b13969
      Deepa Dinamani authored
      Fill in the appropriate limits to avoid inconsistencies
      in the vfs cached inode times when timestamps are
      outside the permitted range.
      
      Even though some filesystems are read-only, fill in the
      timestamps to reflect the on-disk representation.
      Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Acked-By: Tigran Aivazian <aivazian.tigran@gmail.com>
      Acked-by: Jeff Layton <jlayton@kernel.org>
      Cc: aivazian.tigran@gmail.com
      Cc: al@alarsen.net
      Cc: coda@cs.cmu.edu
      Cc: darrick.wong@oracle.com
      Cc: dushistov@mail.ru
      Cc: dwmw2@infradead.org
      Cc: hch@infradead.org
      Cc: jack@suse.com
      Cc: jaharkes@cs.cmu.edu
      Cc: luisbg@kernel.org
      Cc: nico@fluxnic.net
      Cc: phillip@squashfs.org.uk
      Cc: richard@nod.at
      Cc: salah.triki@gmail.com
      Cc: shaggy@kernel.org
      Cc: linux-xfs@vger.kernel.org
      Cc: codalist@coda.cs.cmu.edu
      Cc: linux-ext4@vger.kernel.org
      Cc: linux-mtd@lists.infradead.org
      Cc: jfs-discussion@lists.sourceforge.net
      Cc: reiserfs-devel@vger.kernel.org
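One way per-superblock limits can keep cached and on-disk times consistent is to clamp a timestamp into the permitted range before caching it. The sketch below assumes that approach; `sb_time_limits` and `clamp_timestamp` are hypothetical names, and how the VFS actually consumes the limits is not described in the message above.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-superblock timestamp limits. */
struct sb_time_limits {
    int64_t time_min;  /* earliest second the on-disk format can store */
    int64_t time_max;  /* latest second the on-disk format can store */
};

/* Clamp sec into [time_min, time_max] so the cached value is storable. */
int64_t clamp_timestamp(const struct sb_time_limits *sb, int64_t sec)
{
    if (sec < sb->time_min)
        return sb->time_min;
    if (sec > sb->time_max)
        return sb->time_max;
    return sec;
}
```

For a format that stores signed 32-bit seconds, the limits would be `INT32_MIN` and `INT32_MAX`, so a value beyond `INT32_MAX` is cached as `INT32_MAX` instead of as a time the disk cannot represent.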