1. 15 Jun, 2020 4 commits
    • Jan Kara's avatar
      writeback: Drop I_DIRTY_TIME_EXPIRE · 5fcd5750
      Jan Kara authored
      The only use of I_DIRTY_TIME_EXPIRE is to detect in
      __writeback_single_inode() that inode got there because flush worker
      decided it's time to writeback the dirty inode time stamps (either
      because we are syncing or because of age). However we can detect this
      directly in __writeback_single_inode() and there's no need for the
      strange propagation with I_DIRTY_TIME_EXPIRE flag.
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      5fcd5750
    • Jan Kara's avatar
      writeback: Fix sync livelock due to b_dirty_time processing · f9cae926
      Jan Kara authored
      When we are processing writeback for sync(2), move_expired_inodes()
      didn't set any inode expiry value (older_than_this). This can result in
      writeback never completing if there's steady stream of inodes added to
      b_dirty_time list as writeback rechecks dirty lists after each writeback
      round whether there's more work to be done. Fix the problem by using
      sync(2) start time is inode expiry value when processing b_dirty_time
      list similarly as for ordinarily dirtied inodes. This requires some
      refactoring of older_than_this handling which simplifies the code
      noticeably as a bonus.
      
      Fixes: 0ae45f63 ("vfs: add support for a lazytime mount option")
      CC: stable@vger.kernel.org
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      f9cae926
    • Jan Kara's avatar
      writeback: Avoid skipping inode writeback · 5afced3b
      Jan Kara authored
      Inode's i_io_list list head is used to attach inode to several different
      lists - wb->{b_dirty, b_dirty_time, b_io, b_more_io}. When flush worker
      prepares a list of inodes to writeback e.g. for sync(2), it moves inodes
      to b_io list. Thus it is critical for sync(2) data integrity guarantees
      that inode is not requeued to any other writeback list when inode is
      queued for processing by flush worker. That's the reason why
      writeback_single_inode() does not touch i_io_list (unless the inode is
      completely clean) and why __mark_inode_dirty() does not touch i_io_list
      if I_SYNC flag is set.
      
      However there are two flaws in the current logic:
      
      1) When inode has only I_DIRTY_TIME set but it is already queued in b_io
      list due to sync(2), concurrent __mark_inode_dirty(inode, I_DIRTY_SYNC)
      can still move inode back to b_dirty list resulting in skipping
      writeback of inode time stamps during sync(2).
      
      2) When inode is on b_dirty_time list and writeback_single_inode() races
      with __mark_inode_dirty() like:
      
      writeback_single_inode()		__mark_inode_dirty(inode, I_DIRTY_PAGES)
        inode->i_state |= I_SYNC
        __writeback_single_inode()
      					  inode->i_state |= I_DIRTY_PAGES;
      					  if (inode->i_state & I_SYNC)
      					    bail
        if (!(inode->i_state & I_DIRTY_ALL))
        - not true so nothing done
      
      We end up with I_DIRTY_PAGES inode on b_dirty_time list and thus
      standard background writeback will not writeback this inode leading to
      possible dirty throttling stalls etc. (thanks to Martijn Coenen for this
      analysis).
      
      Fix these problems by tracking whether inode is queued in b_io or
      b_more_io lists in a new I_SYNC_QUEUED flag. When this flag is set, we
      know flush worker has queued inode and we should not touch i_io_list.
      On the other hand we also know that once flush worker is done with the
      inode it will requeue the inode to appropriate dirty list. When
      I_SYNC_QUEUED is not set, __mark_inode_dirty() can (and must) move inode
      to appropriate dirty list.
      Reported-by: default avatarMartijn Coenen <maco@android.com>
      Reviewed-by: default avatarMartijn Coenen <maco@android.com>
      Tested-by: default avatarMartijn Coenen <maco@android.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Fixes: 0ae45f63 ("vfs: add support for a lazytime mount option")
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      5afced3b
    • Jan Kara's avatar
      writeback: Protect inode->i_io_list with inode->i_lock · b35250c0
      Jan Kara authored
      Currently, operations on inode->i_io_list are protected by
      wb->list_lock. In the following patches we'll need to maintain
      consistency between inode->i_state and inode->i_io_list so change the
      code so that inode->i_lock protects also all inode's i_io_list handling.
      Reviewed-by: default avatarMartijn Coenen <maco@android.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      CC: stable@vger.kernel.org # Prerequisite for "writeback: Avoid skipping inode writeback"
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      b35250c0
  2. 09 Jun, 2020 36 commits