1. 21 Jul, 2011 16 commits
    • fs: move inode_dio_done to the end_io handler · 72c5052d
      Christoph Hellwig authored
      For filesystems that delay their end_io processing we should keep our
      i_dio_count until the processing is done.  Enable this by moving
      the inode_dio_done call to the end_io handler if one exists.  Note that
      the actual move to the workqueue for ext4 and XFS is not done in
      this patch yet, but left to the filesystem maintainers.  At least
      for XFS it's not needed yet either as XFS has an internal equivalent
      to i_dio_count.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: simplify the blockdev_direct_IO prototype · aacfc19c
      Christoph Hellwig authored
      Simple filesystems always pass inode->i_sb->s_bdev as the block device
      argument, and never need an end_io handler.  Let's simplify things for
      them and for my grepping activity by dropping these arguments.  The
      only thing not falling into that scheme is ext4, which passes an
      end_io handler without needing special flags (yet), but given how
      messy the direct I/O code there is, use of __blockdev_direct_IO
      in one instead of two out of three cases isn't going to make a large
      difference anyway.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: always maintain i_dio_count · df2d6f26
      Christoph Hellwig authored
      Maintain i_dio_count for all filesystems, not just those using DIO_LOCKING.
      This allows these filesystems to also protect truncate against direct I/O requests
      by using common code.  Right now the only non-DIO_LOCKING filesystem that
      appears to do so is XFS, which uses an opencoded variant of the i_dio_count
      scheme.
      
      Behaviour doesn't change for filesystems never calling inode_dio_wait.
      For ext4 behaviour changes when using the dioread_nonlock option, which
      previously was missing any protection between truncate and direct I/O reads.
      For ocfs2 the handcrafted i_dio_count manipulations are replaced with
      the common code now enabled.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: move inode_dio_wait calls into ->setattr · 562c72aa
      Christoph Hellwig authored
      Let filesystems handle waiting for direct I/O requests themselves instead
      of doing it beforehand.  This means filesystem-specific locks to prevent
      new dio references from appearing can be held.  This is important to allow
      generalizing i_dio_count to non-DIO_LOCKING filesystems.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • rw_semaphore: remove up/down_read_non_owner · 11b80f45
      Christoph Hellwig authored
      Now that the last user is gone these can be removed.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: kill i_alloc_sem · bd5fe6c5
      Christoph Hellwig authored
      i_alloc_sem is a rather special rw_semaphore.  It's the last one that may
      be released by a non-owner, and its write side is always mirrored by
      real exclusion.  Its intended use is to wait for all pending direct I/O
      requests to finish before starting a truncate.
      
      Replace it with a hand-grown construct:
      
       - exclusion for truncates is already guaranteed by i_mutex, so it can
         simply fall away
       - the reader side is replaced by an i_dio_count member in struct inode
         that counts the number of pending direct I/O requests.  Truncate can't
         proceed as long as it's non-zero
       - when i_dio_count reaches zero we wake up a pending truncate using
         wake_up_bit on a new bit in i_flags
       - new references to i_dio_count can't appear while we are waiting for
         it to read zero because the direct I/O code always needs i_mutex
         (or an equivalent like XFS's i_iolock) for starting a new operation.
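
      The counting scheme in the list above can be sketched in userspace C.
      This is only an analogue under stated assumptions, not the kernel
      code: struct demo_inode and all demo_* functions are hypothetical
      names, a pthread mutex protects the counter, and a condition variable
      stands in for the wake_up_bit wakeup.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the relevant part of struct inode. */
struct demo_inode {
    pthread_mutex_t lock;      /* protects dio_count in this sketch only */
    pthread_cond_t  drained;   /* plays the role of the wake_up_bit wakeup */
    int             dio_count; /* number of pending direct I/O requests */
};

void demo_inode_init(struct demo_inode *inode)
{
    pthread_mutex_init(&inode->lock, NULL);
    pthread_cond_init(&inode->drained, NULL);
    inode->dio_count = 0;
}

/* A direct I/O request starts: take a reference. */
void demo_dio_begin(struct demo_inode *inode)
{
    pthread_mutex_lock(&inode->lock);
    inode->dio_count++;
    pthread_mutex_unlock(&inode->lock);
}

/* A direct I/O request completes: drop the reference, wake waiters at zero. */
void demo_dio_done(struct demo_inode *inode)
{
    pthread_mutex_lock(&inode->lock);
    assert(inode->dio_count > 0);
    if (--inode->dio_count == 0)
        pthread_cond_broadcast(&inode->drained);
    pthread_mutex_unlock(&inode->lock);
}

/* Truncate side: block until no direct I/O request is pending. */
void demo_dio_wait(struct demo_inode *inode)
{
    pthread_mutex_lock(&inode->lock);
    while (inode->dio_count != 0)
        pthread_cond_wait(&inode->drained, &inode->lock);
    pthread_mutex_unlock(&inode->lock);
}

/* Non-blocking check, handy for observing the state. */
bool demo_truncate_may_proceed(struct demo_inode *inode)
{
    pthread_mutex_lock(&inode->lock);
    bool idle = (inode->dio_count == 0);
    pthread_mutex_unlock(&inode->lock);
    return idle;
}
```

      In this sketch a truncate would call demo_dio_wait() while holding
      whatever lock keeps new requests from starting, which is the role the
      message assigns to i_mutex or an equivalent.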
      
      This scheme is much simpler, and saves the space of a spinlock_t and a
      struct list_head in struct inode (typically 160 bits on a non-debug 64-bit
      system).
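
      The 160-bit figure can be reproduced with a little arithmetic,
      assuming (this is not stated in the message) that a non-debug
      spinlock_t occupies 32 bits and that struct list_head holds two
      64-bit pointers. demo_saved_bits is a hypothetical helper.

```c
#include <assert.h>

/* Bits saved by dropping one lock word and one two-pointer list head. */
unsigned demo_saved_bits(unsigned lock_bits, unsigned pointer_bits)
{
    return lock_bits + 2 * pointer_bits;
}

/* Assumed sizes: 32-bit lock word, 64-bit pointers. */
static_assert(32 + 2 * 64 == 160, "lock word plus two pointers");
```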
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: simplify handling of zero sized reads in __blockdev_direct_IO · f9b5570d
      Christoph Hellwig authored
      Reject zero sized reads as soon as we know our I/O length, and don't
      bother with locks or allocations that might have to be cleaned up
      otherwise.
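
      The idea can be illustrated with a small userspace sketch. It is not
      the kernel function: demo_direct_io and struct demo_dio_stats are
      hypothetical, and a counter stands in for the locking and allocation
      work that the early return avoids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical bookkeeping so the effect of the early return is visible. */
struct demo_dio_stats {
    int setups; /* times we reached the lock/allocation stage */
};

ssize_t demo_direct_io(bool is_read, const struct iovec *iov, int nr_segs,
                       struct demo_dio_stats *stats)
{
    size_t len = 0;

    assert(iov != NULL || nr_segs == 0);

    /* Work out the total I/O length first. */
    for (int i = 0; i < nr_segs; i++)
        len += iov[i].iov_len;

    /* Reject zero sized reads before anything needs cleaning up. */
    if (is_read && len == 0)
        return 0;

    /* Stand-in for taking locks and allocating per-request state. */
    stats->setups++;

    return (ssize_t)len;
}
```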
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • ext4: Rewrite ext4_page_mkwrite() to use generic helpers · 9ea7df53
      Jan Kara authored
      Rewrite ext4_page_mkwrite() to use __block_page_mkwrite() helper. This
      removes the need of using i_alloc_sem to avoid races with truncate which
      seems to be the wrong locking order according to lock ordering documented in
      mm/rmap.c. Also calling ext4_da_write_begin() as used by the old code seems to
      be problematic because we can decide to flush delay-allocated blocks which
      will acquire s_umount semaphore - again creating unpleasant lock dependency
      if not directly a deadlock.
      
      Also add a check for frozen filesystem so that we don't busyloop in page fault
      when the filesystem is frozen.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fat: remove i_alloc_sem abuse · 58268691
      Christoph Hellwig authored
      Add a new rw_semaphore to protect bmap against truncate.  Previously
      i_alloc_sem was abused for this, but it's going away in this series.
      
      Note that we can't simply use i_mutex, given that the swapon code
      calls ->bmap under it.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • VFS: Fixup kerneldoc for generic_permission() · 8c5dc70a
      Tobias Klauser authored
      The flags parameter went away in
      d749519b444db985e40b897f73ce1898b11f997e
      Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • anonfd: fix missing declaration · e46ebd27
      Tomasz Stanislawski authored
      The forward declaration of struct file_operations is
      added to avoid compilation warnings.
      Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • xfs: make use of new shrinker callout for the inode cache · 8daaa831
      Dave Chinner authored
      Convert the inode reclaim shrinker to use the new per-sb shrinker
      operations. This allows much bigger reclaim batches to be used, and
      allows the XFS inode cache to be shrunk in proportion with the VFS
      dentry and inode caches. This avoids the problem of the VFS caches
      being shrunk significantly before the XFS inode cache is shrunk
      resulting in imbalances in the caches during reclaim.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: increase shrinker batch size · 8ab47664
      Dave Chinner authored
      Now that the per-sb shrinker is responsible for shrinking 2 or more
      caches, increase the batch size to keep economies of scale for
      shrinking each cache.  Increase the shrinker batch size to 1024
      objects.
      
      To allow for a large increase in batch size, add a conditional
      reschedule to prune_icache_sb() so that we don't hold the LRU spin
      lock for too long. This mirrors the behaviour of the
      __shrink_dcache_sb(), and allows us to increase the batch size
      without needing to worry about problems caused by long lock hold
      times.
      
      To ensure that filesystems using the per-sb shrinker callouts don't
      cause problems, document that the object freeing method must
      reschedule appropriately inside loops.
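
      A rough userspace sketch of a batched freeing loop that gives up its
      lock periodically is shown here. Everything in it is an assumption
      made for illustration: struct demo_lru and demo_prune are
      hypothetical, and the sketch yields at a fixed interval, whereas a
      conditional reschedule only yields when one is actually needed.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

#define DEMO_BATCH 1024            /* batch size named in the message */
#define DEMO_RESCHED_INTERVAL 128  /* hypothetical: how often this sketch yields */

struct demo_lru {
    pthread_mutex_t lock;   /* stands in for the LRU spin lock */
    size_t nr_objects;      /* objects still on the list */
    size_t nr_yields;       /* how many times the lock was dropped */
};

/* Free up to nr_to_scan objects, dropping the lock at regular intervals. */
size_t demo_prune(struct demo_lru *lru, size_t nr_to_scan)
{
    size_t freed = 0;

    assert(lru != NULL);

    pthread_mutex_lock(&lru->lock);
    while (freed < nr_to_scan && lru->nr_objects > 0) {
        lru->nr_objects--;  /* stand-in for freeing one object */
        freed++;

        if (freed % DEMO_RESCHED_INTERVAL == 0) {
            lru->nr_yields++;
            /* Let others take the lock and the CPU before continuing. */
            pthread_mutex_unlock(&lru->lock);
            sched_yield();
            pthread_mutex_lock(&lru->lock);
        }
    }
    pthread_mutex_unlock(&lru->lock);

    return freed;
}
```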
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • superblock: add filesystem shrinker operations · 0e1fdafd
      Dave Chinner authored
      Now we have a per-superblock shrinker implementation, we can add a
      filesystem specific callout to it to allow filesystem internal
      caches to be shrunk by the superblock shrinker.
      
      Rather than perpetuate the multipurpose shrinker callback API (i.e.
      nr_to_scan == 0 meaning "tell me how many objects are freeable in the
      cache"), two operations will be added. The first will return the
      number of objects that are freeable, the second is the actual
      shrinker call.
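
      The difference between the two API shapes can be sketched as follows.
      The names here (struct demo_cache, demo_old_shrink, struct
      demo_shrink_ops and its members) are hypothetical and chosen for
      illustration; they are not the kernel's actual declarations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache with a known number of freeable objects. */
struct demo_cache {
    size_t nr_freeable;
};

/* Multipurpose style: nr_to_scan == 0 means "just report the count". */
size_t demo_old_shrink(struct demo_cache *cache, size_t nr_to_scan)
{
    if (nr_to_scan > 0) {
        size_t n = nr_to_scan < cache->nr_freeable ? nr_to_scan
                                                   : cache->nr_freeable;
        cache->nr_freeable -= n;
    }
    return cache->nr_freeable;
}

/* Split style: one operation counts, the other frees. */
struct demo_shrink_ops {
    size_t (*count_objects)(struct demo_cache *cache);
    void   (*free_objects)(struct demo_cache *cache, size_t nr_to_scan);
};

size_t demo_count(struct demo_cache *cache)
{
    return cache->nr_freeable;
}

void demo_free(struct demo_cache *cache, size_t nr_to_scan)
{
    assert(cache != NULL);
    if (nr_to_scan > cache->nr_freeable)
        nr_to_scan = cache->nr_freeable;
    cache->nr_freeable -= nr_to_scan;
}

const struct demo_shrink_ops demo_ops = {
    .count_objects = demo_count,
    .free_objects  = demo_free,
};
```

      With the split form a caller never has to encode a query as a
      special argument value, which is the ambiguity the message wants to
      stop perpetuating.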
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • inode: remove iprune_sem · 4f8c19fd
      Dave Chinner authored
      Now that we have per-sb shrinkers with a lifecycle that is a subset
      of the superblock lifecycle and can reliably detect a filesystem
      being unmounted, there is no longer any race condition for the
      iprune_sem to protect against. Hence we can remove it.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • superblock: introduce per-sb cache shrinker infrastructure · b0d40c92
      Dave Chinner authored
      With context based shrinkers, we can implement a per-superblock
      shrinker that shrinks the caches attached to the superblock. We
      currently have global shrinkers for the inode and dentry caches that
      split up into per-superblock operations via a coarse proportioning
      method that does not batch very well.  The global shrinkers also
      have a dependency - dentries pin inodes - so we have to be very
      careful about how we register the global shrinkers so that the
      implicit call order is always correct.
      
      With a per-sb shrinker callout, we can encode this dependency
      directly into the per-sb shrinker, hence avoiding the need for
      strictly ordering shrinker registrations. We also have no need for
      any proportioning code for the shrinker subsystem already provides
      this functionality across all shrinkers. Allowing the shrinker to
      operate on a single superblock at a time means that we do fewer
      superblock list traversals and less locking, and reclaim should batch more
      effectively. This should result in less CPU overhead for reclaim and
      potentially faster reclaim of items from each filesystem.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  2. 20 Jul, 2011 24 commits