1. 11 Jul, 2003 8 commits
  2. 10 Jul, 2003 32 commits
    • [PATCH] show_stack changes for v850 · ccfd6724
      Miles Bader authored
      ccfd6724
    • [PATCH] More irqreturn_t changes for v850 · ab48a939
      Miles Bader authored
      ab48a939
    • [PATCH] Use <asm-generic/statfs.h> on v850 · 599cd887
      Miles Bader authored
      599cd887
    • [PATCH] disk stats accounting fix · b8ac7066
      Jens Axboe authored
      We should only account file system requests, ones originating from
      __make_request(). Otherwise it skews the counters and they go negative
      really fast.
      b8ac7066
    • [PATCH] epoll-per-fd fix · 9cbdaa44
      Andrew Morton authored
      From: Davide Libenzi <davidel@xmailserver.org>
      
      Fix epoll to allow pushing of multiple file descriptors that share the same
      kernel file*.
      9cbdaa44
    • [PATCH] devfs deadlock fix · 1cf2ec10
      Andrew Morton authored
      From: Andrey Borzenkov <arvidjaar@mail.ru>
      
      I finally hit a painfully trivial way to reproduce another long-standing devfs
      problem: a deadlock between devfs_lookup and devfs_d_revalidate_wait. When
      devfs_lookup releases the directory i_sem, devfs_d_revalidate_wait grabs it
      (this does not happen on every path) and waits to be woken up. Unfortunately,
      devfs_lookup attempts to re-acquire the directory i_sem before ever waking it up ...
      
      To reproduce (2.5.74 UP or SMP - does not matter, single CPU system)
      
      ls /dev/foo & rm -f /dev/foo &
      
      or possibly in a loop, but then it easily fills up the process table. In my case it
      hangs 100% reliably, on 2.5 or 2.4.
      
      The current fix is to defer re-acquiring i_sem until all
      devfs_d_revalidate_wait waiters have been woken up.  A much better fix would be
      to ensure that ->d_revalidate is either always called under i_sem or always
      without it.  But that means changing the very heart of the VFS, and I do not dare
      to touch it.
      
      The fix has been tested on 2.4 (and is part of the unofficial Mandrake Club
      kernel); I expected the same bug to be in 2.5; I was just too stupid to see the
      way to reproduce it before.
      1cf2ec10
    • [PATCH] devfs oops fix · 934acf6c
      Andrew Morton authored
      From: Andrey Borzenkov <arvidjaar@mail.ru>
      
      Doing concurrent lookups for the same name in devfs with devfsd and modules
      enabled may result in stack corruption.

      When devfs_lookup needs to call devfsd, it arranges for other lookups for the
      same name to wait. It uses a local variable as the wait queue head. After
      devfsd returns, devfs_lookup wakes up all waiters and returns. Unfortunately,
      there is no guarantee that all waiters will actually get a chance to run and
      clean up before devfs_lookup returns, so some of them attempt to access
      already-freed storage on the stack.
      
      It is trivial to trigger with an SMP kernel (I have a single-CPU system, if it
      matters) by running
      
      while true
      do
        ls /dev/foo &
      done
      
      Without spinlock debugging, the system usually hung dead, with the reset button
      as the only way out.
      
      I was not able to reproduce it on 2.4 on a single-CPU system. In 2.4,
      devfs_d_revalidate_wait does not attempt to remove itself from the wait queue,
      so it appears to be safe.

      The patch allocates the lookup struct from the heap and adds a reference
      counter so that it is freed when it is no longer needed.
      934acf6c
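A minimal userspace sketch of the pattern the devfs oops fix describes: the shared state lives on the heap and is freed by whoever drops the last reference, so a waiter that runs late never touches a dead stack frame. The names `lookup_info`, `lookup_alloc`, `lookup_get`, `lookup_put` and `lookup_live_count` are hypothetical and are not taken from the devfs source; C11 atomics stand in for whatever counting primitive the kernel patch uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-lookup state; the real devfs struct is not shown in the log. */
struct lookup_info {
    atomic_int refcount;    /* the creator's reference plus one per waiter */
};

/* Illustration only: number of structs currently allocated. */
static atomic_int live_lookups;

/* Allocate on the heap so the object can outlive the creator's stack frame. */
struct lookup_info *lookup_alloc(void)
{
    struct lookup_info *info = malloc(sizeof(*info));

    if (info == NULL)
        return NULL;
    atomic_init(&info->refcount, 1);
    atomic_fetch_add(&live_lookups, 1);
    return info;
}

/* Each waiter takes a reference before it starts using the struct. */
void lookup_get(struct lookup_info *info)
{
    assert(info != NULL);
    atomic_fetch_add(&info->refcount, 1);
}

/* Whoever drops the last reference frees the struct. */
void lookup_put(struct lookup_info *info)
{
    assert(info != NULL);
    /* atomic_fetch_sub returns the value held before the decrement. */
    if (atomic_fetch_sub(&info->refcount, 1) == 1) {
        atomic_fetch_sub(&live_lookups, 1);
        free(info);
    }
}

int lookup_live_count(void)
{
    return atomic_load(&live_lookups);
}
```

In this sketch the creator calls `lookup_put()` after waking the waiters instead of letting a stack variable go out of scope; the memory is released only once the last waiter has also called `lookup_put()`.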
    • [PATCH] Fix yenta-socket oops · e59d9afb
      Andrew Morton authored
      From: Russell King <rmk@arm.linux.org.uk>
      
      Interrupts can sometimes occur before the socket thread is started.
      e59d9afb
    • [PATCH] yenta-socket initialisation fix · 85cea662
      Andrew Morton authored
      From: Daniel Ritz <daniel.ritz@gmx.ch>
      
      init_socket() enables interrupts, and the interrupt handler does a wakeup.
      Let's initialise that waitqueue head before turning on the interrupts.
      85cea662
    • [PATCH] oom killer fixes · 21f8b968
      Andrew Morton authored
      From: William Lee Irwin III <wli@holomorphy.com>
      
      There are reports of kernel threads being killed by the oom killer.  We
      think this is because the oom killer tries to kill a task after it has
      exited and set its ->mm to zero.  The oom killer will then try to kill all
      other tasks which have a null ->mm.
      
      Attempt to detect that case and fix it up.
      21f8b968
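The failure mode in the oom killer entry can be shown with a toy model. This is only a sketch under stated assumptions: `toy_task` and `kill_tasks_sharing_mm` are hypothetical names, and the NULL guard is one plausible way to "detect that case", not necessarily what the actual patch does.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a task: kernel threads are modelled with mm == NULL. */
struct toy_task {
    const void *mm;
    int killed;
};

/*
 * Mark every task that shares victim_mm as killed and return how many
 * were marked.  Without the NULL check, a victim that has already exited
 * (mm == NULL) would match every kernel thread in the table.
 */
int kill_tasks_sharing_mm(struct toy_task *tasks, size_t n, const void *victim_mm)
{
    int count = 0;

    assert(tasks != NULL || n == 0);
    if (victim_mm == NULL)
        return 0;    /* victim already exited: nothing sensible to match on */

    for (size_t i = 0; i < n; i++) {
        if (tasks[i].mm == victim_mm) {
            tasks[i].killed = 1;
            count++;
        }
    }
    return count;
}
```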
    • [PATCH] ext3: sync_fs() fix · af738c8a
      Andrew Morton authored
      From: Alex Tomas <bzzz@tmi.comex.ru>
      
      fsync_super() calls ->sync_fs() just after ->write_super().  But
      write_super() will start a commit.  In this case, ext3_sync_fs() will not
      itself start a commit, and it hence forgets to wait on the commit which
      ext3_write_super() started.
      
      Fix that up by making journal_start_commit() return the transaction ID of
      any currently-running transaction.
      af738c8a
    • [PATCH] JBD: transaction buffer accounting fix · 4152cdfa
      Andrew Morton authored
      From: Alex Tomas <bzzz@tmi.comex.ru>
      
      start_this_handle() takes t_outstanding_credits into account when calculating
      log free space, but journal_next_log_block() also accounts for the blocks being
      logged.  Hence, blocks are accounted twice.  This effectively reduces the
      amount of log space available to transactions and forces more commits.
      
      Fix it by decrementing t_outstanding_credits each time we allocate a new
      journal block.
      4152cdfa
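The double accounting in the JBD entry can be illustrated with a toy model. Only `t_outstanding_credits` is a name from the commit message; `toy_journal`, its other fields and the helper functions are hypothetical, and the arithmetic is a simplification of the real free-space calculation.

```c
#include <assert.h>

/* Toy journal: a reserved credit is a promise to log one block later. */
struct toy_journal {
    int total_blocks;
    int logged_blocks;             /* blocks already written to the log */
    int t_outstanding_credits;     /* blocks reserved but not yet logged */
};

/* Free space as seen by a new handle: logged and reserved blocks are both unavailable. */
int toy_free_space(const struct toy_journal *j)
{
    return j->total_blocks - j->logged_blocks - j->t_outstanding_credits;
}

/* Behaviour described as the bug: the block is counted as logged AND still reserved. */
void toy_log_block_before(struct toy_journal *j)
{
    j->logged_blocks++;
}

/* Behaviour described as the fix: logging a block consumes one reserved credit. */
void toy_log_block_after(struct toy_journal *j)
{
    assert(j->t_outstanding_credits > 0);
    j->logged_blocks++;
    j->t_outstanding_credits--;
}
```

With 100 blocks and 10 reserved credits, logging 4 of the reserved blocks leaves 86 blocks apparently free in the "before" model but 90 in the "after" model, which is why the old accounting forced commits earlier than necessary.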
    • [PATCH] JBD: checkpointing optimisations · a2df663d
      Andrew Morton authored
      From: Alex Tomas <bzzz@tmi.comex.ru>
      
      Some transaction checkpointing improvements for the JBD commit phase.  Decent
      speedups:
      
      creation of 500K files in single dir (with htree, of course):
       before: 4m16.094s, 4m12.035s, 4m11.911s
       after:  1m41.364s, 1m43.461s, 1m45.189s
      
      removal of 500K files in single dir:
       before: 43m50.161s
       after:  38m45.510s
      
      
      - Make __log_wait_for_space() recalculate the needed blocks because journal
        free space changes during commit
      
      - Make log_do_checkpoint() start scanning from the oldest transaction
      
      - Make log_do_checkpoint() stop scanning if a transaction gets dropped.
        The caller will reevaluate the transaction state and decide whether more
        space needs to be generated in the log.
      
        The effect of this is to smooth out the I/O patterns and avoid the huge
        stop-and-go which currently happens when forced checkpointing writes out
        and waits upon 3/4 of the journal's size worth of data.
      a2df663d
    • [PATCH] nbd: make nbd and block layer agree about device and · 20c52ab8
      Andrew Morton authored
      From: Paul Clements <Paul.Clements@SteelEye.com>
      
      Ensure that nbd and the block layer agree about device block sizes and total
      device sizes.
      20c52ab8
    • [PATCH] nbd: remove unneeded nbd_open/nbd_release and refcnt · 627c0412
      Andrew Morton authored
      From: Paul Clements <Paul.Clements@SteelEye.com>
      
      Remove the unneeded nbd_open and nbd_release functions.
      627c0412
    • [PATCH] NBD documentation update · f4c39f4b
      Andrew Morton authored
      From: Paul Clements <Paul.Clements@SteelEye.com>
      
      Modernise nbd.txt a bit.
      f4c39f4b
    • [PATCH] nbd: cleanup PARANOIA usage & code · d7b92e1d
      Andrew Morton authored
      From: Lou Langholtz <ldl@aros.net>
      
      This fifth patch cleans up usage of the PARANOIA sanity checking macro and
      code.  This patch modifies both drivers/block/nbd.c and
      include/linux/nbd.h.  It's intended to be applied incrementally on top of
      my fourth patch (4.1 really if you count the memset addition as .1's worth)
      that simply removed the unneeded blksize_bits field.  Again, I wanted to get
      this smaller change out of the way before my next patch, which is much more
      major.
      d7b92e1d
    • [PATCH] nbd: initialise the embedded kobject · 4f9420c6
      Andrew Morton authored
      From: Lou Langholtz <ldl@aros.net>
      
      Fixes the NBD oopses which people have been reporting.
      4f9420c6
    • [PATCH] nbd: remove unneeded blksize_bits field · 49e57bfc
      Andrew Morton authored
      From: Lou Langholtz <ldl@aros.net>
      
      This fourth patch simply removes the blksize_bits field from the nbd_device
      struct and driver implementation.  How this field made it into this driver
      to begin with is a mystery (where was Al Viro when that patch was
      submitted??).  :-)
      
      This patch modifies both drivers/block/nbd.c and include/linux/nbd.h files.
      It's intended to be applied incrementally on top of my third patch (for
      enhanced diagnostics support).
      49e57bfc
    • [PATCH] nbd: enhanced diagnostics support · 9c976399
      Andrew Morton authored
      From: Lou Langholtz <ldl@aros.net>
      
      This third patch (for enhancing diagnostics support) applies incrementally
      after my last LKML'd patch (for cosmetic changes).  These changes introduce
      configurable KERN_DEBUG level printk output for a variety of different
      things that the driver does and provides the framework for enhanced future
      debugging support as well.
      9c976399
    • [PATCH] NBD: cosmetic cleanups · 52fa6e21
      Andrew Morton authored
      From: Lou Langholtz <ldl@aros.net>
      
      It's a helpful step in being better able to identify code inefficiencies
      and problems, particularly w.r.t. locking.  It also modifies some of the
      output messages for greater consistency and better diagnostic support.
      
      In that way, this second patch is a lead-in to the third patch, which will
      simply introduce the dprintk() debugging facility that my jumbo patch
      originally had.
      
      With the cosmetics patch and debugging enhancement (patch), it will make it
      easier to fix or at least improve the locking bugs/races in NBD (that will
      likely make up the fourth patch in my envisioned roadmap).
      52fa6e21
    • [PATCH] fix for CPU scheduler load distribution · e0a3db1a
      Andrew Morton authored
      From: Ingo Molnar <mingo@elte.hu>
      
      It makes hot-balancing happen in the 'busy tick' case as well, which should
      spread out processes more aggressively.
      e0a3db1a
    • [PATCH] separate locking for vfsmounts · 91b79ba7
      Andrew Morton authored
      From: Maneesh Soni <maneesh@in.ibm.com>
      
      While path walking we do follow_mount or follow_down, which use
      dcache_lock for serialisation.  vfsmount related operations also use
      dcache_lock for all updates. I think we can use a separate lock for
      vfsmount related work and thereby improve path walking.

      The following two patches do just that. The first one replaces
      dcache_lock with a new vfsmount_lock in namespace.c. The lock is
      local to namespace.c and is not required outside. The second patch
      uses RCU to have a lock-free lookup_mnt(). The patches are quite simple
      and straightforward.
      
      The lockmeter results show reduced contention and fewer lock acquisitions
      for dcache_lock while running dcachebench* on a 4-way SMP box:
      
          SPINLOCKS         HOLD            WAIT
          UTIL  CON    MEAN(  MAX )   MEAN(  MAX )(% CPU)     TOTAL NOWAIT SPIN RJECT  NAME
      
        baselkm-2569:
          20.7% 20.9%  0.5us( 146us)  2.9us( 144us)(0.81%)  31590840 79.1% 20.9%    0%  dcache_lock
        mntlkm-2569:
          14.3% 13.6%  0.4us( 170us)  2.9us( 187us)(0.42%)  23071746 86.4% 13.6%    0%  dcache_lock
      
      We get more than 8% improvement on 4-way SMP and 44% improvement on 16-way
      NUMAQ while running dcachebench*.
      
                        Average (usecs/iteration)   Std. Deviation
                        (lower is better)
      4-way SMP
        2.5.69           15739.3                    470.90
        2.5.69-mnt       14459.6                    298.51

      16-way NUMAQ
        2.5.69          120426.5                    363.78
        2.5.69-mnt       63225.8                    427.60
      
      *dcachebench is a microbenchmark written by Bill Hartner and is available at
      http://www-124.ibm.com/developerworks/opensource/linuxperf/dcachebench/dcachebench.html
      
       vfsmount_lock.patch
       -------------------
       - Patch for replacing dcache_lock with new vfsmount_lock for all mount
         related operation. This removes the need to take dcache_lock while
         doing follow_mount or follow_down operations in path walking.
      
      I re-ran dcachebench with 2.5.70 as base on 16-way NUMAQ box.
      
                                  Average (usecs/iteration)   Std. Deviation
                                  (lower is better)
      16-way NUMAQ
      2.5.70                      120710.9                    230.67
       + vfsmount_lock.patch       65209.6                    242.97
          + lookup_mnt-rcu.patch   64042.3                    416.61
      
      So just the lock splitting (vfsmount_lock.patch) gives almost the same benefit.
      91b79ba7
    • [PATCH] Fix race condition between aio_complete and · 679c40a8
      Andrew Morton authored
      From: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
      
      We hit a memory ordering race condition on AIO ring buffer tail pointer
      between function aio_complete() and aio_read_evt().
      
      What happens is that on an architecture with a relaxed memory ordering
      model, like IPF (ia64), an explicit memory barrier is required in an SMP
      execution environment.  Consider the following case:

      One CPU is executing a tight loop of aio_read_evt(), pulling events off
      the ring buffer.  During that loop, another CPU is executing aio_complete(),
      where it puts an event into the ring buffer and then updates the tail
      pointer.  However, due to the relaxed memory ordering model, the tail pointer
      update can become visible before the actual event data does.  So the reading
      CPU sees the updated tail pointer but picks up stale event data.
      
      A memory barrier is required in this case between the event data and tail
      pointer update.  The same is true for the head pointer, but the window of the
      race condition is nil.  For correctness, it is fixed here as well.

      By the way, this bug has been fixed in the major distributor's 2.4.x
      kernel series for a while, but somehow the fix hasn't been propagated to the
      2.5 kernel yet.
      679c40a8
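The ordering requirement in the aio_complete entry can be sketched in portable C11 as a single-producer, single-consumer ring. This is not the kernel's AIO ring: `struct ring`, `RING_SIZE`, `ring_init`, `ring_put` and `ring_get` are hypothetical, and C11 release/acquire operations stand in for the kernel's explicit memory barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8u    /* hypothetical size; must be a power of two */

struct ring {
    int events[RING_SIZE];
    atomic_uint head;   /* next slot the consumer will read */
    atomic_uint tail;   /* next slot the producer will write */
};

void ring_init(struct ring *r)
{
    assert(r != NULL);
    atomic_init(&r->head, 0u);
    atomic_init(&r->tail, 0u);
}

/* Producer side (the role aio_complete() plays in the commit message). */
bool ring_put(struct ring *r, int ev)
{
    unsigned int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (tail - head == RING_SIZE)
        return false;                       /* ring is full */
    r->events[tail % RING_SIZE] = ev;
    /* Release: the event store above must be visible before the new tail. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side (the role aio_read_evt() plays in the commit message). */
bool ring_get(struct ring *r, int *ev)
{
    unsigned int head = atomic_load_explicit(&r->head, memory_order_relaxed);
    /* Acquire: pairs with the release store of tail in ring_put(). */
    unsigned int tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head == tail)
        return false;                       /* ring is empty */
    *ev = r->events[head % RING_SIZE];
    /* Release: the slot must be read before the producer may reuse it. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

If the tail store in `ring_put()` were relaxed, a consumer on a weakly ordered machine could observe the new tail and still read the old contents of the slot, which is the stale-event symptom the commit message describes.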
    • [PATCH] Bug fix in AIO initialization · b1648ead
      Andrew Morton authored
      From: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
      
      We hit this bug when we have the following scenario:
      
      One process initializes an AIO context and then forks many child
      processes.  When those child processes exit, many BUG checks
      (effectively kernel oopses) are triggered from put_ioctx(ctx) in function
      exit_aio().

      The issue was that the AIO context was incorrectly copied on fork,
      which misled all child processes into thinking they had an IO context and
      trying to free one that they did not really own.  The following patch fixes
      the issue.
      b1648ead
    • [PATCH] Set umask correctly for nfsd kernel threads · b14241c4
      Andrew Morton authored
      From: Andreas Gruenbacher <agruen@suse.de>
      
      Without acls, when creating files the umask is applied directly in the vfs.
      ACLs require that the umask is applied at the file system level, depending on
      whether or not the containing directory has a default acl.  The daemonize()
      function makes kernel threads share their fs_struct structure with the init
      process.  Among other things, fs_struct contains the umask, so all kernel
      threads share their umask with init.
      
      The kernel nfsd needs to create files with a umask of 0.  Init's umask cannot
      simply be changed to 0 --- this would have side effects on init, and init
      would have side effects on nfsd.  So this patch recreates a fs_struct
      structure for nfsd kernel threads, and sets its umask to 0.
      
      This fixes bug #721, <http://www.osdl.net/show_bug.cgi?id=721>.
      b14241c4
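Why nfsd wants a umask of 0 can be seen from the usual masking rule, sketched here in userspace C. `apply_umask` is a hypothetical helper for illustration, not a kernel function, and it ignores the default-ACL case the commit message mentions.

```c
#include <assert.h>

/*
 * Permission bits a newly created file ends up with when the umask is
 * applied: every bit set in the mask is cleared from the requested mode.
 */
unsigned int apply_umask(unsigned int requested_mode, unsigned int mask)
{
    assert((mask & ~0777u) == 0);   /* only permission bits are meaningful */
    return requested_mode & ~mask;
}
```

With a mask of 022 inherited from init, a request for mode 0666 would silently become 0644; with a mask of 0 the mode requested by the client is kept as it is.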
    • [PATCH] misc fixes · ecbaa730
      Andrew Morton authored
      - remove accidental debug code from ext3 commit.
      
      - /proc/profile documentation fix (Randy Dunlap)
      
      - use sb_breadahead() in ext2_preread_inode()
      
      - unused var in mpage_writepages()
      ecbaa730
    • [PATCH] make CONFIG_KALLSYMS default to "on" · f3eee922
      Andrew Morton authored
      From: Diego Calleja Garcia <diegocg@teleline.es>
      
      Move CONFIG_KALLSYMS out of the arch directory and into init/.
      
      It defaults to "on" unless the user explicitly turns it off in the
      "embedded systems" menu.
      f3eee922
    • [PATCH] kmap() -> kmap_atomic() in fs/exec.c · 9f1ed86f
      Andrew Morton authored
      Replace a kmap() with kmap_atomic().
      9f1ed86f
    • [PATCH] i_size atomic access · eafe5916
      Andrew Morton authored
      From: Daniel McNeil <daniel@osdl.org>
      
      This adds i_seqcount to the inode structure and then uses i_size_read() and
      i_size_write() to provide atomic access to i_size.  This is a port of
      Andrea Arcangeli's i_size atomic access patch from 2.4.  This only uses the
      generic reader/writer consistent mechanism.
      
      Before:
      mnm:/usr/src/25> size vmlinux
         text    data     bss     dec     hex filename
      2229582 1027683  162436 3419701  342e35 vmlinux
      
      After:
      mnm:/usr/src/25> size vmlinux
         text    data     bss     dec     hex filename
      2225642 1027655  162436 3415733  341eb5 vmlinux
      
      3.9k more text, a lot of it fastpath :(
      
      It's a very minor bug, and the fix has a fairly non-minor cost.  The most
      compelling reason for fixing this is that writepage() checks i_size.  If it
      sees a transient value it may decide that page is outside i_size and will
      refuse to write it.  Lost user data.
      eafe5916
    • [PATCH] i_size atomic access: infrastructure · e9b94f6a
      Andrew Morton authored
      From: Daniel McNeil <daniel@osdl.org>
      
      This adds a sequence-counter-only version of the reader/writer consistent
      mechanism to seqlock.h.  This is used in the second part of this patch to give
      atomic access to i_size.
      e9b94f6a
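A sequence counter of the kind described in the infrastructure entry can be sketched in C11 as follows. This is not the kernel's seqlock.h: `sized_obj`, `size_init`, `size_write` and `size_read` are hypothetical names, the 64-bit value is split into two halves to model a size that cannot be loaded in one step, and writers are assumed to be serialised by some outer lock because the counter alone does not exclude them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct sized_obj {
    atomic_uint seq;            /* even: stable; odd: update in progress */
    _Atomic uint32_t size_lo;   /* low half of the 64-bit size */
    _Atomic uint32_t size_hi;   /* high half of the 64-bit size */
};

void size_init(struct sized_obj *o, uint64_t size)
{
    assert(o != NULL);
    atomic_init(&o->seq, 0u);
    atomic_init(&o->size_lo, (uint32_t)size);
    atomic_init(&o->size_hi, (uint32_t)(size >> 32));
}

/* Writer: callers must already be serialised against other writers. */
void size_write(struct sized_obj *o, uint64_t size)
{
    unsigned int seq = atomic_load_explicit(&o->seq, memory_order_relaxed);

    /* Make the counter odd, then order it before the data stores. */
    atomic_store_explicit(&o->seq, seq + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&o->size_lo, (uint32_t)size, memory_order_relaxed);
    atomic_store_explicit(&o->size_hi, (uint32_t)(size >> 32), memory_order_relaxed);
    /* Make the counter even again once both halves are in place. */
    atomic_store_explicit(&o->seq, seq + 2, memory_order_release);
}

/* Reader: retry until both halves were read under one even counter value. */
uint64_t size_read(struct sized_obj *o)
{
    unsigned int begin, end;
    uint32_t lo, hi;

    do {
        begin = atomic_load_explicit(&o->seq, memory_order_acquire);
        lo = atomic_load_explicit(&o->size_lo, memory_order_relaxed);
        hi = atomic_load_explicit(&o->size_hi, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        end = atomic_load_explicit(&o->seq, memory_order_relaxed);
    } while ((begin & 1u) != 0 || begin != end);

    return ((uint64_t)hi << 32) | lo;
}
```

Readers never block the writer; they only pay for a retry when an update overlaps the read, which suits a value that is read far more often than it changes.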
    • [PATCH] wall_to_monotonic initialization fixes for · 1ac38088
      Andrew Morton authored
      From: Tim Schmielau <tim@physik3.uni-rostock.de>
      
      This patch adds (or fixes) initialization of wall_to_monotonic for a few
      more architectures.
      
      This should get rid of the strange uptime>14600 days reports, except on arm
      whose arch file layout is too unfamiliar to me.
      
      The patch is blessed by George Anzinger, but untested due to lack of
      hardware.
      1ac38088