1. 03 Apr, 2003 28 commits
    • [PATCH] ext3: create a slab cache for transaction handles · c20fb5f1
      Andrew Morton authored
      ext3 allocates and frees at least one handle structure for each system call.
      kmalloc and kfree are apparent in the profiles.
      
      Adding a slab cache for these objects takes the overhead for a write() from
      1.63 microseconds down to 1.56.
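
The idea of keeping freed handles around for reuse can be sketched in user space as follows. This is only an illustration of an object cache, not the kernel slab allocator or the ext3 code: `struct handle`, `struct handle_cache`, `cache_alloc()`, `cache_free()` and `cache_destroy()` are hypothetical names, and the real handle layout is not shown in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a transaction handle; not the ext3 layout. */
struct handle {
    int credits;
    struct handle *next_free;   /* links idle objects inside the cache */
};

/* A tiny object cache: freed handles are kept on a list for reuse. */
struct handle_cache {
    struct handle *free_list;
    size_t reused;              /* allocations served without malloc() */
};

static struct handle *cache_alloc(struct handle_cache *c)
{
    struct handle *h = c->free_list;

    if (h) {                    /* fast path: pop a recycled object */
        c->free_list = h->next_free;
        c->reused++;
    } else {                    /* slow path: fall back to the allocator */
        h = malloc(sizeof(*h));
        if (!h)
            return NULL;
    }
    h->credits = 0;
    h->next_free = NULL;
    return h;
}

static void cache_free(struct handle_cache *c, struct handle *h)
{
    assert(h != NULL);
    h->next_free = c->free_list;    /* push back for the next caller */
    c->free_list = h;
}

/* Return every idle object to the general-purpose allocator. */
static void cache_destroy(struct handle_cache *c)
{
    while (c->free_list) {
        struct handle *h = c->free_list;

        c->free_list = h->next_free;
        free(h);
    }
}
```

An allocate/free/allocate sequence hands back the same object the second time, which is the saving the commit message measures.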
    • [PATCH] ext3_commit_write speedup · 9aabee2e
      Andrew Morton authored
      For an appending write, ext3_commit_write() will call the expensive
      ext3_mark_inode_dirty() twice.  Once in generic_commit_write()'s extension of
      i_size and once in ext3_commit_write() itself where i_disksize is updated.
      
      But by updating i_disksize _before_ calling generic_commit_write() these can
      be piggybacked.
      
      The patch takes the overhead of a write() from 1.96 microseconds down to
      1.63.
    • [PATCH] ext3_mark_inode_dirty() speedup · f0f46afd
      Andrew Morton authored
      ext3_mark_inode_dirty() (and several other callers) use the
      ext3_reserve_inode_write() and ext3_mark_iloc_dirty() pair for journalling an
      inode's backing block.
      
      Because ext3_reserve_inode_write() gets journalling access to the block there
      is no need for ext3_mark_iloc_dirty() to do it as well.
      
      This change reduces the overhead of a write() from 2.7 microseconds to 1.95
      on a 2.7G P4.
    • [PATCH] Fix jbd assert failure on IO error. · 7ba93ca7
      Andrew Morton authored
      From: Stephen Tweedie <sct@redhat.com>
      
      The buffer_uptodate flag gets cleared on IO failure, and this can panic jbd
      when it tries to write such a buffer.  Relax the panic to be just a warning.
    • [PATCH] Add less-severe assert-failure form for ext3. · bdf6c6a6
      Andrew Morton authored
      From: Stephen Tweedie <sct@redhat.com>
      
      Add a new form of assert failure in ext3 which allows us to flag events which
      are *usually* bugs, but which can be legally triggered in the presence of IO
      failures.  Don't panic the kernel on such errors unless we've defined
      #JBD_PARANOID_IOFAIL, which will normally be set only for testing purposes.
    • [PATCH] remove dparent_lock · 723c6e83
      Andrew Morton authored
      The big SMP machines are seeing quite some contention in dnotify_parent()
      (via vfs_write).  This function is hammering the global dparent_lock.
      
      However we don't actually need a global dparent_lock for pinning down
      dentry->d_parent.  We can use dentry->d_lock for this.  That is already being
      held across d_move.
      
      This patch speeds up SDET on the 16-way by 5% and wipes dnotify_parent() off
      the profiles.
      
      It also uninlines dnotify_parent().
      
      It also uses spin_lock(), which is faster than read_lock().
      
      I'm not sure that we need to take both the source and target dentry's d_lock
      in d_move.
      
      The patch also does lots of s/__inline__/inline/ in dcache.h
    • [PATCH] real_lookup race fix · 1b8910cf
      Andrew Morton authored
      From: Maneesh Soni <maneesh@in.ibm.com>
      
      Here is a patch to use a seqlock for the real_lookup race with d_lookup, as
      suggested by Linus. The race condition can result in a duplicate dentry when
      d_lookup fails due to a concurrent d_move in some unrelated directory.
      
      Apart from real_lookup, lookup_hash()->cached_lookup() can also fail for
      the same reason. So, for that I am doing the d_lookup again.
      
      Now we have __d_lookup (called from do_lookup() during pathwalk) and
      d_lookup, which uses the seqlock to protect against the rename race.
      
      dcachebench numbers (lower is better) show little difference on a 4-way
      PIII Xeon SMP box.
      
      base-2565
      Average usec/iteration  19059.4
      Standard Deviation      503.07
      
      base-2565 + seq_lock
      Average usec/iteration  18843.2
      Standard Deviation      450.57
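
The retry pattern that a seqlock gives readers can be sketched in user space with C11 atomics. This is a minimal illustration under stated assumptions, not the kernel's seqlock API or the dcache code: `struct seqcount`, `struct entry`, `rename_entry()` and `lookup()` are hypothetical, writers are assumed to be serialised by some other lock, and the protected fields are made atomic so that the sketch stays free of data races.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical user-space sequence counter; not the kernel's seqlock_t. */
struct seqcount {
    atomic_uint seq;            /* odd while a writer is mid-update */
};

static void write_begin(struct seqcount *s)
{
    atomic_fetch_add(&s->seq, 1);               /* counter becomes odd */
}

static void write_end(struct seqcount *s)
{
    unsigned prev = atomic_fetch_add(&s->seq, 1);   /* even again */

    assert(prev & 1);           /* must pair with write_begin() */
    (void)prev;
}

static unsigned read_begin(struct seqcount *s)
{
    unsigned v;

    while ((v = atomic_load(&s->seq)) & 1)
        ;                       /* wait until no writer is active */
    return v;
}

/* Nonzero means a writer ran since read_begin(): redo the read. */
static int read_retry(struct seqcount *s, unsigned start)
{
    return atomic_load(&s->seq) != start;
}

/* A "rename" changes two fields; readers must not act on a mixed pair. */
struct entry {
    struct seqcount lock;
    atomic_int parent;
    atomic_int name;
};

static void entry_init(struct entry *e, int parent, int name)
{
    atomic_init(&e->lock.seq, 0);
    atomic_init(&e->parent, parent);
    atomic_init(&e->name, name);
}

static void rename_entry(struct entry *e, int parent, int name)
{
    write_begin(&e->lock);
    atomic_store(&e->parent, parent);
    atomic_store(&e->name, name);
    write_end(&e->lock);
}

/* Repeat the comparison until it ran without a concurrent rename. */
static int lookup(struct entry *e, int parent, int name)
{
    unsigned start;
    int found;

    do {
        start = read_begin(&e->lock);
        found = atomic_load(&e->parent) == parent &&
                atomic_load(&e->name) == name;
    } while (read_retry(&e->lock, start));
    return found;
}
```

Readers take no lock and only pay for a retry when a rename actually overlapped the lookup, which is consistent with the small difference in the benchmark figures above.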
    • [PATCH] exp_parent locking fixes · ec1d26ec
      Andrew Morton authored
      From: Neil Brown and myself.
      
      Don't do dput() inside read_lock().  It can sleep.
    • [PATCH] umsdos fixes · ca88b8e8
      Andrew Morton authored
      From: Andries.Brouwer@cwi.nl
      
      Make sure structs used by umsdos ioctls do not change size when the size of
      dev_t is changed.
    • [PATCH] Fix devfs' partition handling · 7ceef18f
      Andrew Morton authored
      From: Andre Landwehr <andre.landwehr@gmx.net>
      
      With / on an IDE hard disk, the disk's partitions do not appear in
      devfs, only the disc device. This is due to rescan_partitions
      being called twice and deleting but not re-creating the entries
      during the second call.
      
      hch has acked this.
    • [PATCH] add vt console scrollback ioctl · 8a8e9c88
      Andrew Morton authored
      From: Samuel Thibault <Samuel.Thibault@ens-lyon.fr>
      
      There is no way for a braille device driven by brltty (userland root-owned
      daemon) to scroll back the virtual console; the only way is to use the PC
      keyboard.  A very simple TIOCLINUX ioctl meets this need (tested).
      
      Also add a command for bringing the last console to the top, as keyboard.c's
      lastcons() does when pressing alt - down arrow.
    • [PATCH] sync dirty pages in fadvise(FADV_DONTNEED) · 3bc17e74
      Andrew Morton authored
      This changes the fadvise(FADV_DONTNEED) operation to start async writeout of
      any dirty pages in the file.
      
      The thinking is that if the application doesn't want to use those pages in
      the future, we may as well get IO underway against them so they can be freed
      up on the next call to fadvise().
      
      The POSIX spec does not go into any detail as to whether this is the right or
      wrong behaviour.
      
      This provides a nice way for applications which are writing streaming data
      (the main users of fadvise) to keep the amount of dirty pagecache under
      control without having to resort to system-wide VM tuning.
      
      It also provides an "async fsync()".  If the application passes in a length
      of zero, fadvise will start async writeout of the pages, but will not
      invalidate any of the file's pagecache.
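
From the application side, a streaming writer might use this roughly as sketched below, through the standard posix_fadvise() wrapper. `write_and_drop()` is a hypothetical helper, and the sketch only shows the call sequence; whether and when the kernel starts writeout is up to the kernel version in use.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600       /* for pwrite() and posix_fadvise() */
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write one chunk at offset `off`, then tell the kernel that the range
 * will not be needed again.
 *
 * Returns -1 if the write failed or was short; otherwise the result of
 * posix_fadvise(), which is 0 on success or a positive error number.
 */
static int write_and_drop(int fd, const void *buf, size_t len, off_t off)
{
    ssize_t n = pwrite(fd, buf, len, off);

    if (n < 0 || (size_t)n != len)
        return -1;
    return posix_fadvise(fd, off, (off_t)len, POSIX_FADV_DONTNEED);
}
```

Calling the helper once per chunk keeps the advice scoped to data the application has already finished with.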
    • [PATCH] Additional 3c980 device support · 89ef9495
      Andrew Morton authored
      From: "J.A. Magallon" <jamagallon@able.es>
      
      Adds support for a couple of 3c980 variants which are in pci.ids, but not in
      the driver.
    • [PATCH] aic7xxx timer deletion fix · 93bd249f
      Andrew Morton authored
      From: Zwane Mwaikambo <zwane@linuxpower.ca>
      
      ahc_linux_free_device() needs to use del_timer_sync().  slab corruption has
      been observed due to the timer handler running after the containing object
      was freed.
    • [PATCH] misc fixes · 98c20bf4
      Andrew Morton authored
      - Fix warning in sound/pci/cs46xx/cs46xx_lib.c (Martin Bligh)
      
      - pte_file() comment fix (Pete Zaitcev)
      
      - _PAGE_FILE comment clarifications
      
      - copy_to_user() check in do_proc_readlink()
    • [PATCH] struct stat - support larger dev_t · e95b2065
      Andrew Morton authored
      From: Andries.Brouwer@cwi.nl
      
      Below is a patch that changes struct stat for a number of
      architectures. Maintainers, please watch carefully.
      
      Struct stat is used to transfer information from kernel
      to user space on a stat() system call.
      It has fields st_dev, st_rdev.
      
      The size of these fields is in principle unrelated to
      the size of a dev_t in user space or the size of a
      dev_t or kdev_t in kernel space.
      
      It is just the "capacity" of the channel.
      The actual amount of useful information is the minimum
      of the four sizes (kernel dev_t, kernel kdev_t,
      user dev_t, width of stat st_dev, st_rdev fields).
      
      The goal of this patch is to make sure that the stat() and stat64()
      system calls transmit at least 32 and 64 bits, respectively.
      This is achieved by using the padding that was present already.
      We fail when no padding was present, or when the padding is on
      the wrong side (after the field, while the machine is big-endian).
      
      alpha:	stat: uses unsigned int, 32 bits
      arm:	stat: uses unsigned short - bad.
      	The padding is on one side, which means that this can
      	be made into unsigned long only on little endian systems.
      	FIXED - unless __ARMEB__.
      	stat64: used unsigned short - FIXED, now unsigned long long.
      cris:	stat: used unsigned short - FIXED, now unsigned long
      	stat64: used unsigned short - FIXED, now unsigned long long.
      i386:	stat: used unsigned short - FIXED, now unsigned long
      	stat64: used unsigned short - FIXED, now unsigned long long.
      ia64:	stat: uses unsigned long, 64 bits
      m68k:	stat: used unsigned short - bad, but this cannot be fixed
      	since m68k is big-endian, and the available padding is on
      	the wrong side. NOT FIXED.
      	stat64: used unsigned short - FIXED, now unsigned long long.
      mips:	stat: uses dev_t which is unsigned int, 32 bits
      	stat64: used unsigned long, 32 bits. NOT FIXED.
      	(There is padding on one side, so this can be fixed if __MIPSEL__.)
      mips64:	stat: uses dev_t which is unsigned int, 32 bits
      parisc:	stat: uses dev_t, 32 bits
      	stat64: uses unsigned long long, 64 bits
      ppc:	stat: uses dev_t which is unsigned int, 32 bits
      	stat64: unsigned long long, 64 bits
      ppc64:	stat: uses dev_t which is unsigned long, 64 bits
      	stat64: uses unsigned long, 64 bits
      sparc:	stat: uses unsigned short, no padding. NOT FIXED.
      	stat64: used unsigned short - FIXED, now unsigned long long.
      sparc64:stat: uses dev_t which is unsigned int, 32 bits
      	stat64: used unsigned short - FIXED, now unsigned long long.
      s390:	stat: used unsigned short, big-endian, padding on the wrong side,
      	NOT FIXED.
      	stat64: used unsigned short - FIXED, now unsigned long long.
      s390x:	stat: uses unsigned long, 64 bits
      sh:	stat: used unsigned short, but padding maybe on wrong side.
      	NOT FIXED.
      	stat64: used unsigned short - FIXED, now unsigned long long.
      v850:	stat: used __kernel_dev_t.
      	BUG: NEVER use __kernel types in a user space interface.
      	Replaced the types. FIXED - now unsigned int - 32 bits.
      	stat64: FIXED - now unsigned long long - 64 bits.
      x86_64:	stat: uses unsigned long, 64 bits
      
      So, on most architectures we achieve the aim of 32 bits for stat,
      64 bits for stat64. On all architectures we achieve at least
      16 bits for stat, 32 bits for stat64.
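
Why padding "on the wrong side" blocks the fix can be shown with a small user-space experiment. The two layouts below are hypothetical (they are not any architecture's struct stat) and assume a 16-bit unsigned short and a 32-bit unsigned int: a 16-bit field followed by 16 bits of padding is widened in place, and the helper reports what a binary built against the old layout would then read.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical old layout: 16-bit field, then 16 bits of padding. */
struct old_layout {
    unsigned short st_dev;
    unsigned short pad;
};

/* Hypothetical new layout: the field widened over the padding. */
struct new_layout {
    unsigned int st_dev;
};

/* Value an old binary reads from a record written in the new layout. */
static unsigned short old_reader_sees(unsigned int dev)
{
    struct new_layout n = { dev };
    struct old_layout o;

    assert(sizeof(o) == sizeof(n));     /* widening must not move anything */
    memcpy(&o, &n, sizeof(o));
    return o.st_dev;
}

/* Nonzero if the least significant byte is stored first. */
static int is_little_endian(void)
{
    unsigned int one = 1;
    unsigned char first;

    memcpy(&first, &one, 1);
    return first == 1;
}
```

On a little-endian machine the low 16 bits stay where the old field was, so old binaries keep working; on a big-endian machine the old field's bytes now hold the high half of the value, so a small device number reads back as 0.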
    • [PATCH] tmpfs 6/6: percentile sizing of tmpfs · 65aaef27
      Andrew Morton authored
      From: CaT <cat@zip.com.au>
      
      What this patch does is allow you to specify the max amount of memory tmpfs
      can use as a percentage of available real ram.  This (in my eyes) is useful
      so that you do not have to remember to change the setting if you want
      something other than 50% and some of your ram goes.
      
      Hugh redid the arithmetic to not overflow at 4GB; the particular order of
      lines helps RH's gcc-2.96-110 not to get confused in the do_div.  2.5 can use
      totalram_pages.  Update mount options in tmpfs Doc.
      
      There's an argument that the percentage should be of ram+swap, that's what
      Christoph originally intended.  But we set the default at 50% of ram only, so
      I believe it's more consistent to follow that precedent.
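
The overflow at 4GB mentioned above is easy to reproduce in user space. The sketch below is not the tmpfs code (which, per the message, uses do_div and totalram_pages); `percent_of_ram()` and `percent_of_ram_32bit()` are hypothetical helpers that contrast 64-bit and 32-bit arithmetic, assuming a 32-bit unsigned int.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Size limit in bytes for `percent` of a machine with `total_pages`
 * pages of `page_size` bytes each. The percentage is applied to the
 * page count in 64-bit arithmetic, then scaled to bytes.
 */
static uint64_t percent_of_ram(uint32_t total_pages, uint32_t page_size,
                               unsigned percent)
{
    uint64_t pages;

    assert(percent <= 100);
    pages = (uint64_t)total_pages * percent / 100;
    return pages * page_size;
}

/* The same calculation done in 32 bits wraps once RAM reaches 4 GB. */
static uint32_t percent_of_ram_32bit(uint32_t total_pages, uint32_t page_size,
                                     unsigned percent)
{
    uint32_t bytes = total_pages * page_size;   /* wraps modulo 2^32 */

    return bytes / 100 * percent;
}
```

For example, with 2097152 pages of 4096 bytes (8 GB), 75% comes out as 6442450944 bytes in the 64-bit version, while the 32-bit version wraps to 0.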
    • [PATCH] tmpfs 5/6: use cond_resched · 548ac1de
      Andrew Morton authored
      From: Hugh Dickins <hugh@veritas.com>
      
      cond_resched each time around the loop in shmem_file_write
      and do_shmem_file_read, matching filemap.c.
    • [PATCH] tmpfs 4/6: use mark_page_accessed · 5d86cc8b
      Andrew Morton authored
      From: Hugh Dickins <hugh@veritas.com>
      
      tmpfs pages should be surfing the LRUs in the company of their filemap
      friends: I was expecting the rules to change, but they've been stable so
      long, let's sprinkle mark_page_accessed in the equivalent places here; but
      (don't ask me why) SetPageReferenced in shmem_file_write.  Ooh, and
      shmem_populate was missing a flush_page_to_ram.
    • [PATCH] tmpfs 3/6: use generic_file_llseek · f56453c9
      Andrew Morton authored
      From: Hugh Dickins <hugh@veritas.com>
      
      default_llseek's use of BKL and not i_sem was recently exposed:
      tmpfs should be using generic_file_llseek which guards with i_sem.
    • [PATCH] tmpfs 2/6 remove shmem_readpage · 2927b748
      Andrew Morton authored
      From: Hugh Dickins <hugh@veritas.com>
      
      shmem_readpage was created to give tmpfs sendfile and loop ability; but
      they're both using shmem_file_sendfile now, so remove shmem_readpage.
    • [PATCH] tmpfs 1/6 use generic_write_checks · acad2c18
      Andrew Morton authored
      From: Hugh Dickins <hugh@veritas.com>
      
      Remove the duplicated checks in shmem_file_write(), use
      generic_write_checks() instead.
    • [PATCH] file limit checking simplification · d80bbda5
      Andrew Morton authored
      From: Hugh Dickins <hugh@veritas.com>
      
      When handling rlimit != RLIM_INFINITY, generic_write_checks tests file
      position against 0xFFFFFFFFULL, and casts it to a u32.  This code is
      carried forward from 2.4.4, and the 2.4-ac tree contains an apparently
      obvious fix to one part of it (should set count to 0 not to a negative).
      But when you think it through, it all turns out to be bogus.
      
      On a 32-bit architecture: limit is a 32-bit unsigned long, we've
      already handled *pos < 0 and *pos >= limit, so *pos here has no way
      of being > 0xFFFFFFFFULL, and thus casting it to u32 won't truncate it.
      And on a 64-bit architecture: limit is a 64-bit unsigned long, but this
      code is disallowing file position beyond the 32 bits; or if there's some
      userspace compatibility issue, with limit having to fit into 32 bits,
      the 32-bit architecture argument applies and they're still irrelevant.
      
      So just remove the 0xFFFFFFFFULL test; and in place of the u32, cast to
      typeof(limit) so it's right even if rlimits get wider.  And there's no
      way we'd want to send SIGXFSZ below the limit: remove send_sig comment.
      
      There's a similarly suspicious u32 cast a little further down, when
      checking MAX_NON_LFS.  Given its definition, that does no harm on any
      arch: but it's better changed to unsigned long, the type of MAX_NON_LFS.
    • [PATCH] bio kmapping changes · 240d3e2d
      Andrew Morton authored
      RAID5 is calling copy_data() under sh->lock.  But copy_data() does kmap(),
      which can sleep.
      
      The best fix is to use kmap_atomic() in there.  It is faster than kmap() and
      does not block.
      
      The patch removes the unused bio_kmap() and replaces __bio_kmap() with
      __bio_kmap_atomic().  I think it's best to withdraw the sleeping-and-slow
      bio_kmap() from the kernel API before someone else tries to use it.
      
      
      Also, I notice that bio_kmap_irq() was using local_save_flags().  This is a
      bug - local_save_flags() does not disable interrupts.  Converted that to
      local_irq_save().  These names are terribly chosen.
      
      This patch was acked by Jens and Neil.
    • [PATCH] Fix some compile warnings · d597f71b
      Andrew Morton authored
      From: "Martin J. Bligh" <mbligh@aracnet.com>
      
      Fix a couple of instances of "warning: suggest parentheses around assignment
      used as truth value".
    • [PATCH] monotonic clock source for hangcheck timer · 92525be5
      Andrew Morton authored
      From: john stultz <johnstul@us.ibm.com>
      
      This patch, written with the advice of Joel Becker, addresses a problem with
      the hangcheck-timer.
      
      The basic problem is that the hangcheck-timer code (Required for Oracle)
      needs an accurate hard clock which can be used to detect OS stalls (due to
      udelay() or pci bus hangs) that would cause system time to skew (it's sort of
      a sanity check that ensures the system's notion of time is accurate).
      However, currently they are using get_cycles() to fetch the cpu's TSC
      register, thus this does not work on systems w/o a synced TSC.
      
      As suggested by Andi Kleen (see thread here:
      http://www.uwsg.iu.edu/hypermail/linux/kernel/0302.0/1234.html ) I've worked
      with Joel and others to implement the monotonic_clock() interface.  Some of
      the major considerations made when writing this patch were
      
      o Needs to be able to return accurate time in the absence of multiple timer
        interrupts
      
      o Needs to be abstracted out from the hardware
      
      o Avoids impacting gettimeofday() performance
      
      This interface returns an unsigned long long representing the number of
      nanoseconds that have passed since time_init().
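
A user-space analogue of such an interface can be built on the POSIX clock_gettime() call with CLOCK_MONOTONIC. The sketch below is not the kernel's monotonic_clock(); `monotonic_ns()` and `stalled()` are hypothetical helpers, and the starting point of the clock is unspecified rather than time_init().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L     /* for clock_gettime() */
#endif
#include <assert.h>
#include <time.h>

/* Nanoseconds since an arbitrary fixed point; never goes backwards. */
static unsigned long long monotonic_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc;
    return (unsigned long long)ts.tv_sec * 1000000000ULL +
           (unsigned long long)ts.tv_nsec;
}

/* Nonzero if more than `limit_ns` has elapsed since `start`. */
static int stalled(unsigned long long start, unsigned long long limit_ns)
{
    return monotonic_ns() - start > limit_ns;
}
```

A hang checker in this style records `monotonic_ns()` when it arms its timer and calls `stalled()` when the timer fires, so the verdict does not depend on the wall clock being adjusted.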
    • [PATCH] handle bad inodes in put_inode · 68fa8120
      Andrew Morton authored
      From: "J. Bruce Fields" <bfields@fieldses.org>
      
      If the NFS daemon is presented with a filehandle for a file that has
      been deleted, it does an iget() in fs/exportfs/expfs.c:export_iget() and
      gets a bad inode back.  When it subsequently iput()s the inode, the
      result is:
      
      Mar 27 12:53:40 snoopy kernel: EXT2-fs error (device ide0(3,3)): ext2_free_blocks: Freeing blocks not in datazone - block = 1802201963, count = 27499
      Mar 27 12:53:40 snoopy kernel: Remounting filesystem read-only
      
      The same can happen if ext2_get_inode() returns an error - ext2_read_inode()
      will return an uninitialised inode and ext2_put_inode() is not allowed to go
      looking inside the bad inode.
    • [PATCH] tmpfs blk_congestion_wait fix · 505f7dd2
      Andrew Morton authored
      From: Hugh Dickins <hugh@veritas.com>
      
      The blk_congestion_waits in shmem_getpage are appropriate when the error is
      -ENOMEM, but not when the error is -EEXIST.  So add that test in the first
      instance, but omit it all in the second instance.
  2. 02 Apr, 2003 12 commits