1. 31 Mar, 2020 1 commit
    • xfs: ratelimit inode flush on buffered write ENOSPC · c6425702
      Darrick J. Wong authored
      A customer reported rcu stalls and softlockup warnings on a computer
      with many CPU cores and many many more IO threads trying to write to a
      filesystem that is totally out of space.  Subsequent analysis pointed to
      the many many IO threads calling xfs_flush_inodes -> sync_inodes_sb,
      which causes a lot of wb_writeback_work to be queued.  The writeback
      worker spends so much time trying to wake the many many threads waiting
      for writeback completion that it trips the softlockup detector, and (in
      this case) the system automatically reboots.
      
      In addition, they complain that the lengthy xfs_flush_inodes scan traps
      all of those threads in uninterruptible sleep, which hampers their
      ability to kill the program or do anything else to escape the situation.
      
      If there are thousands of threads trying to write to files on a full
      filesystem, each of those threads will start a separate copy of the
      inode flush scan.  This is pointless since we only need one scan, so
      rate limit the inode flush.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
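The commit message above says only that the inode flush is rate limited, not how. As a rough illustration of the idea, here is a minimal userspace sketch of an interval/burst limiter. It is not the kernel patch: `flush_ratelimit`, `flush_allowed`, and `writers_hit_enospc` are hypothetical names, and time is passed in as a plain tick counter to keep the model deterministic.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of an interval/burst rate limiter.
 * Time is passed in explicitly (arbitrary ticks) to keep it deterministic.
 */
struct flush_ratelimit {
    long interval;  /* length of one window, in ticks */
    int burst;      /* flush scans allowed per window */
    long begin;     /* tick at which the current window opened */
    int used;       /* scans already started in this window */
    bool started;   /* false until the first call */
};

/* Return true if the caller may start a flush scan at tick `now`. */
static bool flush_allowed(struct flush_ratelimit *rl, long now)
{
    assert(rl->interval > 0 && rl->burst > 0);

    /* Open a new window on first use or once the old one has expired. */
    if (!rl->started || now - rl->begin >= rl->interval) {
        rl->started = true;
        rl->begin = now;
        rl->used = 0;
    }
    if (rl->used >= rl->burst)
        return false;
    rl->used++;
    return true;
}

/*
 * Model `nr_writers` threads that all hit ENOSPC at tick `now`.
 * Returns how many of them actually started a flush scan.
 */
static int writers_hit_enospc(struct flush_ratelimit *rl, int nr_writers,
                              long now)
{
    int scans = 0;

    for (int i = 0; i < nr_writers; i++) {
        if (flush_allowed(rl, now))
            scans++;    /* a real implementation would flush inodes here */
    }
    return scans;
}
```

With `interval = 5` and `burst = 1`, a thousand simulated writers arriving in the same window start one scan between them. The rest return immediately instead of queueing more writeback work.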
  2. 28 Mar, 2020 3 commits
  3. 27 Mar, 2020 18 commits
  4. 26 Mar, 2020 5 commits
    • xfs: prohibit fs freezing when using empty transactions · 27fb5a72
      Darrick J. Wong authored
      I noticed that fsfreeze can take a very long time to freeze an XFS filesystem if
      there happens to be a GETFSMAP caller running in the background.  I also
      happened to notice the following in dmesg:
      
      ------------[ cut here ]------------
      WARNING: CPU: 2 PID: 43492 at fs/xfs/xfs_super.c:853 xfs_quiesce_attr+0x83/0x90 [xfs]
      Modules linked in: xfs libcrc32c ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 ip_set_hash_ip ip_set_hash_net xt_tcpudp xt_set ip_set_hash_mac ip_set nfnetlink ip6table_filter ip6_tables bfq iptable_filter sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: xfs]
      CPU: 2 PID: 43492 Comm: xfs_io Not tainted 5.6.0-rc4-djw #rc4
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1 04/01/2014
      RIP: 0010:xfs_quiesce_attr+0x83/0x90 [xfs]
      Code: 7c 07 00 00 85 c0 75 22 48 89 df 5b e9 96 c1 00 00 48 c7 c6 b0 2d 38 a0 48 89 df e8 57 64 ff ff 8b 83 7c 07 00 00 85 c0 74 de <0f> 0b 48 89 df 5b e9 72 c1 00 00 66 90 0f 1f 44 00 00 41 55 41 54
      RSP: 0018:ffffc900030f3e28 EFLAGS: 00010202
      RAX: 0000000000000001 RBX: ffff88802ac54000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffffff81e4a6f0 RDI: 00000000ffffffff
      RBP: ffff88807859f070 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000010 R12: 0000000000000000
      R13: ffff88807859f388 R14: ffff88807859f4b8 R15: ffff88807859f5e8
      FS:  00007fad1c6c0fc0(0000) GS:ffff88807e000000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f0c7d237000 CR3: 0000000077f01003 CR4: 00000000001606a0
      Call Trace:
       xfs_fs_freeze+0x25/0x40 [xfs]
       freeze_super+0xc8/0x180
       do_vfs_ioctl+0x70b/0x750
       ? __fget_files+0x135/0x210
       ksys_ioctl+0x3a/0xb0
       __x64_sys_ioctl+0x16/0x20
       do_syscall_64+0x50/0x1a0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      These two things appear to be related.  The assertion trips when another
      thread initiates a fsmap request (which uses an empty transaction) after
      the freezer waited for m_active_trans to hit zero but before the
      freezer executes the WARN_ON just prior to calling xfs_log_quiesce.
      
      The lengthy delays in freezing happen because the freezer calls
      xfs_wait_buftarg to clean out the buffer lru list.  Meanwhile, the
      GETFSMAP caller is continuing to grab and release buffers, which means
      that it can take a very long time for the buffer lru list to empty out.
      
      We fix both of these races by calling sb_start_write to obtain freeze
      protection while using empty transactions for GETFSMAP and for metadata
      scrubbing.  The other two users occur during mount, during which time we
      cannot fs freeze.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
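The fix described above makes empty-transaction users visible to the freezer by taking freeze protection first. The sketch below is a single-threaded userspace model of that handshake, not kernel code: `fs_model`, `try_start_write`, `end_write`, and `try_freeze` are hypothetical names. Where the real primitives would block, the model returns `false` so the ordering can be checked deterministically.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of freeze protection. */
struct fs_model {
    int writers;   /* callers currently holding freeze protection */
    bool frozen;   /* true once a freeze has completed */
};

/*
 * Take freeze protection before doing work (for example an fsmap-style
 * query).  The real primitive would block while the filesystem is frozen;
 * this model reports failure instead.
 */
static bool try_start_write(struct fs_model *fs)
{
    if (fs->frozen)
        return false;
    fs->writers++;
    return true;
}

/* Drop freeze protection once the work is finished. */
static void end_write(struct fs_model *fs)
{
    assert(fs->writers > 0);
    fs->writers--;
}

/*
 * Attempt to freeze.  A real freezer would wait for protected callers to
 * drain; this model reports failure while any caller is still active.
 */
static bool try_freeze(struct fs_model *fs)
{
    if (fs->writers > 0)
        return false;
    fs->frozen = true;
    return true;
}
```

In this model a query that holds protection keeps the freeze from completing underneath it, and a query that arrives after the freeze cannot start. That matches the two races the commit message describes.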
    • xfs: shutdown on failure to add page to log bio · 842a42d1
      Brian Foster authored
      If the bio_add_page() call fails, we proceed to write out a
      partially constructed log buffer. This corrupts the physical log
      such that log recovery is not possible. Worse, persistent
      occurrences of this error eventually lead to a BUG_ON() failure in
      bio_split() as iclogs wrap the end of the physical log, which
      triggers log recovery on subsequent mount.
      
      Rather than warn about writing out a corrupted log buffer, shut down
      the fs as is done for any log I/O related error. This preserves the
      consistency of the physical log such that log recovery succeeds on a
      subsequent mount. Note that this was observed on a 64k page debug
      kernel without upstream commit 59bb4798 ("mm, sl[aou]b:
      guarantee natural alignment for kmalloc(power-of-two)"), which
      demonstrated frequent iclog bio overflows due to unaligned (slab
      allocated) iclog data buffers.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
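The behavior change described above, failing hard instead of writing a partially built buffer, can be illustrated with a small userspace model. All of it is hypothetical: `bio_model`, `log_model`, and the functions below stand in for the kernel structures and are not the real bio or XFS log APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a bio and for the log state. */
struct bio_model {
    int max_pages;  /* capacity of the bio */
    int nr_pages;   /* pages added so far */
};

struct log_model {
    bool shut_down;       /* set once a log I/O error has been seen */
    int buffers_written;  /* fully constructed buffers submitted */
};

/* Add one page; fails when the bio is already full. */
static bool bio_model_add_page(struct bio_model *bio)
{
    if (bio->nr_pages >= bio->max_pages)
        return false;
    bio->nr_pages++;
    return true;
}

/* Map every page of a log buffer, or report the failure to the caller. */
static int map_log_buffer(struct bio_model *bio, int pages_needed)
{
    assert(pages_needed >= 0);

    for (int i = 0; i < pages_needed; i++) {
        if (!bio_model_add_page(bio))
            return -EIO;
    }
    return 0;
}

/*
 * Write one log buffer.  If the buffer cannot be mapped completely, shut
 * the log down rather than submitting a partially constructed buffer.
 */
static int write_log_buffer(struct log_model *log, int bio_capacity,
                            int pages_needed)
{
    struct bio_model bio = { .max_pages = bio_capacity, .nr_pages = 0 };

    if (log->shut_down)
        return -EIO;
    if (map_log_buffer(&bio, pages_needed) != 0) {
        log->shut_down = true;
        return -EIO;
    }
    log->buffers_written++;
    return 0;
}
```

In this model nothing is counted as written unless every page was mapped, and after a failure all later writes are refused. The on-disk log therefore only ever sees fully constructed buffers.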
    • xfs: directory bestfree check should release buffers · d59f44d3
      Darrick J. Wong authored
      When we're checking bestfree information in directory blocks, always
      drop the block buffer at the end of the function.  We should always
      release resources when we're done using them.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: drop all altpath buffers at the end of the sibling check · afbabf56
      Darrick J. Wong authored
      The dirattr btree checking code uses the altpath substructure of the
      dirattr state structure to check the sibling pointers of dir/attr tree
      blocks.  At the end of sibling checks, xfs_da3_path_shift could have
      changed multiple levels of buffer pointers in the altpath structure.
      Although we release the leaf level buffer, this isn't enough -- we also
      need to release the node buffers that are unique to the altpath.
      
      Not releasing all of the altpath buffers leaves them locked to the
      transaction.  This is suboptimal because we should release resources
      when we don't need them anymore.  Fix the function to loop over all
      levels of the altpath, and fix the return logic so that we always run
      the loop.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
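The cleanup rule described above is to walk every level of the altpath and release whatever is unique to it. It can be sketched in userspace as follows. This models the idea only and is not the scrub code itself: `buf_model`, `path_model`, and `release_altpath` are hypothetical names, and a simple hold count stands in for a buffer attached to a transaction.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 5

/* Hypothetical stand-in for a buffer held by a transaction. */
struct buf_model {
    int held;   /* number of outstanding holds on this buffer */
};

/* One buffer pointer per tree level, root first. */
struct path_model {
    int active;                      /* number of levels in use */
    struct buf_model *bp[MAX_DEPTH];
};

static void buf_release(struct buf_model *bp)
{
    assert(bp->held > 0);
    bp->held--;
}

/*
 * Walk every level of the alternate path.  Buffers that also appear at the
 * same level of the main path are still in use there, so they are only
 * forgotten; buffers unique to the alternate path are released.
 * Returns the number of buffers released.
 */
static int release_altpath(const struct path_model *path,
                           struct path_model *altpath)
{
    int released = 0;

    assert(altpath->active <= MAX_DEPTH);

    for (int level = 0; level < altpath->active; level++) {
        struct buf_model *bp = altpath->bp[level];

        if (bp == NULL)
            continue;
        if (level >= path->active || bp != path->bp[level]) {
            buf_release(bp);
            released++;
        }
        altpath->bp[level] = NULL;
    }
    return released;
}
```

If the two paths share a root but diverge at the node and leaf levels, the sketch releases two buffers and leaves the shared root alone. Releasing only the leaf would miss the node buffer.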
    • xfs: preserve default grace interval during quotacheck · 5885539f
      Darrick J. Wong authored
      When quotacheck runs, it zeroes all the timer fields in every dquot.
      Unfortunately, it also does this to the root dquot, which erases any
      preconfigured grace intervals and warning limits that the administrator
      may have set.  Worse yet, the incore copies of those variables remain
      set.  This cache coherence problem manifests itself as the grace
      interval mysteriously being reset back to the defaults at the /next/
      mount.
      
      Fix it by not resetting the root disk dquot's timer and warning fields.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
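A minimal userspace sketch of the fix described above: usage counters are reset for every dquot, but timers and warning counts are left alone for the root dquot. The struct layout and the name `quotacheck_reset` are hypothetical, and the sketch assumes the root dquot is the one with ID 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of an on-disk dquot. */
struct dquot_model {
    uint32_t id;       /* assumed: ID 0 is the root dquot holding defaults */
    uint64_t bcount;   /* blocks in use */
    uint64_t icount;   /* inodes in use */
    uint32_t btimer;   /* block grace timer */
    uint32_t itimer;   /* inode grace timer */
    uint16_t bwarns;   /* block warning count */
    uint16_t iwarns;   /* inode warning count */
};

/*
 * Reset a dquot at the start of a quotacheck.  Usage counters are always
 * zeroed because they are about to be recomputed.  Timers and warning
 * counts are zeroed only for non-root dquots, so the administrator's
 * defaults stored in the root dquot survive.
 */
static void quotacheck_reset(struct dquot_model *dq)
{
    assert(dq);

    dq->bcount = 0;
    dq->icount = 0;

    if (dq->id == 0)
        return;

    dq->btimer = 0;
    dq->itimer = 0;
    dq->bwarns = 0;
    dq->iwarns = 0;
}
```

Resetting a root dquot and an ordinary user dquot with this function clears the usage counts on both but keeps the root's grace timer and warning limit. The on-disk defaults then stay consistent with the incore copies across mounts.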
  5. 23 Mar, 2020 8 commits
  6. 19 Mar, 2020 5 commits