1. 05 Nov, 2002 5 commits
      [PATCH] fix mod_timer() race · 2eb724ed
      Andrew Morton authored
      If two CPUs run mod_timer against the same not-pending timer then they
      have no locking relationship.  They can both see the timer as
      not-pending and they both add the timer to their cpu-local list.  The
      CPU which gets there second corrupts the first CPU's lists.
      
      This was causing Dave Hansen's 8-way to oops after a couple of minutes
      of specweb testing.
      
      I believe that to fix this we need locking which is associated with the
      timer itself.  The easy fix is hashed spinlocking based on the timer's
      address.  The hard fix is a lock inside the timer itself.
      
      It is hard because init_timer() becomes compulsory, to initialise that
      spinlock.  An unknown number of code paths in the kernel just wipe the
      timer to all-zeroes and start using it.
      
      I chose the hard way - it is cleaner and more idiomatic.  The patch
      also adds a "magic number" to the timer so we can detect when a timer
      was not correctly initialised.  A warning and stack backtrace are
      generated and the timer is fixed up.  After 16 such warnings the
      warning mechanism shuts itself up until a reboot.
      
      It took six patches to my kernel to stop the warnings from coming out.
      The uninitialised timers are extremely easy to find and fix.  But it
      will take some time to weed them all out.  Maybe we should go for
      the hashed locking...
      
      Note that the new timer->lock means that we can clean up some awkward
      "oh we raced, let's try again" code in timer.c.  But to do that we'd
      also need to take timer->lock in the commonly-called del_timer(), so I
      left it as-is.
      
      The lock is not needed in add_timer() because concurrent
      add_timer()/add_timer() and concurrent add_timer()/mod_timer() are
      illegal.
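
      As a rough illustration of the approach described above, here is a
      user-space sketch, not the kernel patch itself.  All names
      (sketch_timer, sketch_mod_timer, SKETCH_TIMER_MAGIC and its value) are
      hypothetical, and a C11 atomic_flag stands in for the kernel spinlock.
      It only models the per-timer lock, the magic-number check with fix-up,
      and the cap of 16 warnings.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_TIMER_MAGIC  0x4b87ad6eUL  /* illustrative value (assumption) */
#define SKETCH_MAX_WARNINGS 16            /* warnings go quiet after this many */

/* Hypothetical stand-in for the kernel's timer structure. */
struct sketch_timer {
    unsigned long magic;    /* set by init; detects all-zeroes timers */
    atomic_flag lock;       /* per-timer lock (spinlock stand-in) */
    int pending;            /* 1 if queued on some CPU's list */
    unsigned long expires;
};

int sketch_warnings;        /* how many warnings have been printed */

/* The now-compulsory initialiser: sets up the lock and the magic number. */
void sketch_init_timer(struct sketch_timer *t)
{
    assert(t != NULL);
    t->magic = SKETCH_TIMER_MAGIC;
    atomic_flag_clear(&t->lock);
    t->pending = 0;
    t->expires = 0;
}

/* Warn (at most 16 times) about an uninitialised timer, then fix it up.
 * Returns 1 if a fix-up was needed, 0 otherwise. */
int sketch_check_timer(struct sketch_timer *t)
{
    if (t->magic == SKETCH_TIMER_MAGIC)
        return 0;
    if (sketch_warnings < SKETCH_MAX_WARNINGS) {
        sketch_warnings++;
        fprintf(stderr, "uninitialised timer at %p\n", (void *)t);
    }
    sketch_init_timer(t);
    return 1;
}

/* Two callers racing on the same timer serialise on t->lock, so only one
 * of them can observe "not pending" and queue the timer.
 * Returns 1 if the timer was already pending, 0 if it was newly queued. */
int sketch_mod_timer(struct sketch_timer *t, unsigned long expires)
{
    int was_pending;

    sketch_check_timer(t);
    while (atomic_flag_test_and_set(&t->lock))
        ;                               /* spin until the lock is free */
    was_pending = t->pending;
    t->expires = expires;
    t->pending = 1;                     /* real code would (re)queue here */
    atomic_flag_clear(&t->lock);
    return was_pending;
}
```

      A timer that was merely zeroed triggers one warning on its first
      sketch_mod_timer() call and then behaves like an initialised one.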
      [PATCH] `event' removal: kill it · 3cf803fb
      Andrew Morton authored
      Final act, from Manfred:
      
      The attached patch removes 'event' entirely from the kernel: it's not
      used anymore.
      
      All event users [vfat dentry revalidation; ext2/3 inode generation;
      readdir() file position revalidation in several filesystems] were
      converted to local counters.
      [PATCH] `event' removal: other filesystems · 9448b90c
      Andrew Morton authored
      Patch from Manfred Spraul
      
      Several filesystems compare f_version and i_version to validate
      directory positions in readdir(): the directory position is revalidated
      if i_version is not equal to f_version.  Operations that could
      invalidate the cached position set i_version or f_version to
      '++event', where event is a global variable.  Global uniqueness is not
      needed: 'i_version++' and 'f_version=0' are sufficient to guarantee
      that the next readdir() will revalidate the directory position, and
      that avoids the need for an ugly global variable.
      
      The attached patch converts all filesystems except ext2, which was
      converted with a separate patch.
      [PATCH] `event' removal: ext2 · 9aefc010
      Andrew Morton authored
      Patch from Manfred Spraul
      
      Use a local counter instead of the global 'event' variable for the
      readdir() optimization.
      
      Depends on patch-event-II
      
      Background:
        The only user of i_version and f_version in ext2 is
        ext2_readdir(). As an optimization, ext2 performs the
        validation of the start position for readdir() only if
              filp->f_version != inode->i_version.
        If there was no llseek and no directory change since the
        last readdir() call, then f_pos can be trusted.
        f_version is set to 0 in get_empty_filp and during llseek.
        Right now, i_version is set to ++event during ext2_read_inode
        and commit_chunk, i.e. at inode creation and if a directory
        is changed.
        Initializing i_version to 1, and updating with i_version++
        achieves the same effect, without the need of a global variable.
        Global uniqueness is not required, there are no other uses
        of [if]_version in ext2.
      
      Change relative to the patch you have right now:
      i_version is initialized to 1 instead of 0. For ext2 it doesn't
      matter [there is always a valid 'len' value at the beginning of a
      directory data block], but it's cleaner.
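
      The reason for starting i_version at 1 can be sketched in a few
      lines.  This is an illustrative model only: demo_dir, demo_open and
      the helper functions are hypothetical names, not ext2 code.  A freshly
      opened file has f_version == 0, so a directory counter that also
      starts at 0 would let the very first readdir() skip validation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the inode and open-file structures. */
struct demo_dir  { unsigned long i_version; };
struct demo_open { unsigned long f_version; };

/* Mirrors the ext2_readdir() test: validate only if the versions differ. */
int demo_needs_validation(const struct demo_dir *d, const struct demo_open *f)
{
    assert(d != NULL && f != NULL);
    return f->f_version != d->i_version;
}

/* Would the first readdir() on a fresh open file validate its start
 * position, given the directory counter's initial value? */
int demo_first_readdir_validates(unsigned long initial_i_version)
{
    struct demo_dir d = { initial_i_version };
    struct demo_open f = { 0 };     /* fresh file: f_version == 0 */
    return demo_needs_validation(&d, &f);
}
```

      With an initial value of 1 the first call validates; with 0 it would
      not, which is harmless for ext2 for the reason given above but less
      clean.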
      [PATCH] `event' removal: core kernel · 4ccf7a32
      Andrew Morton authored
      Patch from Manfred Spraul
      
      f_version and i_version are used by filesystems to check whether they
      can reuse the f_pos position across readdir calls without validation.
      
      Right now f_version and i_version are modified by
          f_version = ++event;
          i_version = ++event;
          if (f_version != i_version) goto revalidate
      and event is a global, exported variable.
      
      But that's not needed,
          f_version  = 0;
          i_version++;
          if (f_version != i_version) goto revalidate
      works too, without the ugly 'event' variable.
      
      I got an ok from viro, and I notified the fs maintainers; no
      complaints there either.
      
      - block_dev.c, block_llseek updates f_version to '++event'.
         grep showed that no device driver uses f_version; this is dead
         code copied from the default llseek implementation.
      
      - the llseek implementations and get_empty_filp set f_version
         to '++event'
         This is not dead code, but
                  filp->f_version = 0
         achieves the same effect:
         f_version is used by the readdir() implementation of several
         filesystems to skip the revalidation of f_pos at the beginning
         of a readdir call: If llseek was not called and the filesystem
         did not change since the last readdir call, then the value in
         f_pos can be trusted.
         The implementation (for example in ext2) is
           inode->i_version = ++event;
         in all operations that change a directory
         At the beginning of file_operation->readdir():
           if(inode->i_version != filp->f_version)
                      revalidate();
           filp->f_version = inode->i_version;
         There are other users of f_version, but none of them use the
         default llseek implementation (e.g. fs/pipe.c)
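
      The counter scheme described above can be modelled in user space
      roughly as follows.  This is a sketch under assumptions, not kernel
      code: sketch_inode, sketch_file and the sketch_* functions are
      hypothetical stand-ins for struct inode, struct file and the
      filesystem hooks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's struct inode and struct file. */
struct sketch_inode { unsigned long i_version; };
struct sketch_file  { unsigned long f_version; long f_pos; };

/* Any operation that changes the directory bumps the per-inode counter. */
void sketch_dir_changed(struct sketch_inode *inode)
{
    inode->i_version++;
}

/* llseek resets f_version so that the next readdir must revalidate. */
void sketch_llseek(struct sketch_file *filp, long pos)
{
    filp->f_pos = pos;
    filp->f_version = 0;
}

/* Returns 1 if f_pos had to be revalidated, 0 if it could be trusted. */
int sketch_readdir(struct sketch_inode *inode, struct sketch_file *filp)
{
    int revalidated = 0;

    assert(inode != NULL && filp != NULL);
    if (inode->i_version != filp->f_version)
        revalidated = 1;    /* a filesystem would re-check f_pos here */
    filp->f_version = inode->i_version;
    return revalidated;
}
```

      Starting from i_version == 1 and f_version == 0, the first
      sketch_readdir() revalidates, a repeated call does not, and either a
      directory change or an llseek forces revalidation again, all without
      a global counter.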
  2. 04 Nov, 2002 7 commits
  3. 03 Nov, 2002 28 commits