1. 24 Jun, 2004 33 commits
    • [PATCH] reduce function inlining in slab.c · f875aa02
      Andrew Morton authored
      From: Manfred Spraul <manfred@colorfullife.com>
      
      slab.c contains too many inline functions:
      
      - some functions that are not performance critical were inlined.  Waste
        of text size.
      
      - The debug code relies on __builtin_return_address(0) to keep track of
        the callers.  According to rmk, gcc didn't inline some functions as
        expected and that resulted in useless debug output.  This was probably
        caused by the large debug-only inline functions.
      
      The attached patch removes most inline functions:
      
      - the empty on release/huge on debug inline functions were replaced with
        empty macros on release/normal functions on debug.
      
      - spurious inline statements were removed.
      
      The code is down to 6 inline functions: three one-liners for struct
      abstractions, one for a might_sleep_if test and two for the performance
      critical __cache_alloc / __cache_free functions.
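
      The "empty macro on release, normal function on debug" pattern described
      above can be sketched as follows.  This is only an illustration: the
      SLAB_DEBUG_SKETCH switch and the check_object()/cache_alloc_sketch() names
      are hypothetical and are not the identifiers used in slab.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical switch for this sketch; not the kernel's config symbol. */
#ifndef SLAB_DEBUG_SKETCH
#define SLAB_DEBUG_SKETCH 0
#endif

static unsigned long debug_checks_run;

#if SLAB_DEBUG_SKETCH
/* Debug build: an ordinary out-of-line function carries the large checks. */
static void check_object(const void *objp)
{
	assert(objp != NULL);
	debug_checks_run++;
}
#else
/* Release build: the call site compiles away entirely. */
#define check_object(objp) do { } while (0)
#endif

/* Small hot path: the kind of function where inline still pays off. */
static inline void *cache_alloc_sketch(void *objp)
{
	check_object(objp);
	return objp;
}

/* Number of debug checks executed so far (always 0 in a release build). */
static unsigned long debug_checks_seen(void)
{
	return debug_checks_run;
}
```

      Because the debug variant is a real function rather than a large inline,
      the compiler's inlining decisions for the hot path no longer depend on
      the size of the debug code.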
      
      Note: If an embedded arch wants to save a few bytes by uninlining
      __cache_{free,alloc}: The right way to do that is to fold the functions
      into kmem_cache_xy and then replace kmalloc with
      kmem_cache_alloc(kmem_find_general_cachep(),).
      
      Signed-Off: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] hwcache align kmalloc caches · b167eef8
      Andrew Morton authored
      From: Manfred Spraul <manfred@colorfullife.com>
      
      Reversing the patches that made all caches hw cacheline aligned had an
      unintended side effect on the kmalloc caches: Before they had the
      SLAB_HWCACHE_ALIGN flag set, now it's clear.  This breaks one sgi driver -
      it expects aligned caches.  Additionally I think it's the right thing to
      do: It costs virtually nothing (the caches are power-of-two sized) and
      could reduce false sharing.
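
      A minimal sketch of why alignment is nearly free for power-of-two sized
      caches: rounding a power-of-two size at or above the line size up to a
      cache line adds no padding.  The 64-byte line size and the helper names
      are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size for this sketch; real hardware varies. */
#define CACHE_LINE_SKETCH 64u

/* Round size up to the next multiple of align (a power of two). */
static size_t align_up(size_t size, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}

/* Bytes of padding a cache-line-aligned cache would add per object. */
static size_t align_padding(size_t size)
{
	return align_up(size, CACHE_LINE_SKETCH) - size;
}
```

      With these helpers, align_padding(128) is 0 while align_padding(96) is
      32, which is the cost an odd-sized cache would pay.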
      
      Additionally, the patch adds back the documentation for the
      SLAB_HWCACHE_ALIGN flag.
      
      Signed-Off: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] tweak the buddy allocator for better I/O merging · c75b81a5
      Andrew Morton authored
      From: William Lee Irwin III <wli@holomorphy.com>
      
      Based on Arjan van de Ven's idea, with guidance and testing from James
      Bottomley.
      
      The physical ordering of pages delivered to the IO subsystem is strongly
      related to the order in which fragments are subdivided from larger blocks
      of memory tracked by the page allocator.
      
      Consider a single MAX_ORDER block of memory in isolation acted on by a
      sequence of order 0 allocations in an otherwise empty buddy system.
      Subdividing the block beginning at the highest addresses will yield all the
      pages of the block in reverse, and subdividing the block beginning at the
      lowest addresses will yield all the pages of the block in physical address
      order.
      
      Empirical tests demonstrate this ordering is preserved, and that changing
      the order of subdivision so that the lowest page is split off first
      resolves the sglist merging difficulties encountered by driver authors at
      Adaptec and others in James Bottomley's testing.
      
      James found that before this patch, there were 40 merges out of about 32K
      segments.  Afterward, there were 24007 merges out of 19513 segments, for a
      merge rate of about 55%.  Merges of 128 segments, the maximum allowed, were
      observed afterward, where beforehand they never occurred.  It also improves
      dbench on my workstation and works fine there.
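
      The isolated MAX_ORDER block scenario described above can be modelled in
      userspace.  This is a toy sketch, not the page allocator: the struct,
      the function names and the 8-page block size are all made up for the
      example, and each order holds at most one free block.

```c
#include <assert.h>

#define SKETCH_MAX_ORDER 3	/* one block of 2^3 = 8 pages */

struct buddy_sketch {
	/* First page of the free block at each order, or -1 if none. */
	long free_block[SKETCH_MAX_ORDER + 1];
};

static void buddy_init(struct buddy_sketch *b)
{
	for (int order = 0; order < SKETCH_MAX_ORDER; order++)
		b->free_block[order] = -1;
	b->free_block[SKETCH_MAX_ORDER] = 0;
}

/*
 * Allocate one order-0 page.  low_first selects which half is handed
 * out when a block is split; the other half goes back on the free list.
 */
static long buddy_alloc_page(struct buddy_sketch *b, int low_first)
{
	int order = 0;

	while (order <= SKETCH_MAX_ORDER && b->free_block[order] < 0)
		order++;
	if (order > SKETCH_MAX_ORDER)
		return -1;	/* out of memory */

	long start = b->free_block[order];
	b->free_block[order] = -1;

	while (order > 0) {
		order--;
		long half = 1L << order;

		/* All lower orders were empty, so the slot must be free. */
		assert(b->free_block[order] < 0);
		if (low_first) {
			b->free_block[order] = start + half;
		} else {
			b->free_block[order] = start;
			start += half;
		}
	}
	return start;
}
```

      Repeated calls with low_first set hand out pages 0 through 7 in
      ascending order; with it clear, the same block is handed out as 7 down
      to 0.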
      Signed-off-by: William Lee Irwin III <wli@holomorphy.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Use fancy wakeups in wait.h · 758e48e4
      Andrew Morton authored
      Use the more SMP-friendly prepare_to_wait()/finish_wait() in wait_event() and
      friends.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] dnotify.c: use inode->i_lock in place of dn_lock · 0ac04ac1
      Andrew Morton authored
      From: "Adam J. Richter" <adam@yggdrasil.com>
      
      Replace the use of a global spinlock with the per-inode ->i_lock.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] vm: vfs shrinkage tuning · a4411519
      Andrew Morton authored
      Some people want the dentry and inode caches shrunk harder, others want them
      shrunk more reluctantly.
      
      The patch adds /proc/sys/vm/vfs_cache_pressure, which tunes the vfs cache
      versus pagecache scanning pressure.
      
      - at vfs_cache_pressure=0 we don't shrink dcache and icache at all.
      
      - at vfs_cache_pressure=100 there is no change in behaviour.
      
      - at vfs_cache_pressure > 100 we reclaim dentries and inodes harder.
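
      The three cases above are consistent with a simple linear scaling of the
      number of reclaimable objects reported to the shrinker.  The helper
      below is a sketch under that assumption; its name and signature are
      hypothetical, and the exact arithmetic in the patch may differ.

```c
#include <assert.h>
#include <limits.h>

/*
 * Scale the count of unused dentries/inodes by the tunable, where 100
 * means "unchanged", 0 means "never shrink" and larger values mean
 * "reclaim harder".
 */
static unsigned long scaled_vfs_objects(unsigned long nr_unused,
					unsigned int pressure)
{
	/* Guard against overflow of the multiplication. */
	assert(pressure == 0 || nr_unused / 100 <= ULONG_MAX / pressure);
	return (nr_unused / 100) * pressure;
}
```

      For 1000 unused objects this yields 0 at pressure 0, 1000 at pressure
      100 and 2000 at pressure 200.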
      
      
      The number of kilobytes of slab left after a slocate.cron on my 256MB test
      box:
      
      vfs_cache_pressure=100000   33480
      vfs_cache_pressure=10000    61996
      vfs_cache_pressure=1000     104056
      vfs_cache_pressure=200      166340
      vfs_cache_pressure=100      190200
      vfs_cache_pressure=50       206168
      
      Of course, this just left more directory and inode pagecache behind instead of
      vfs cache.  Interestingly, on this machine the entire slocate run fits into
      pagecache, but not into VFS caches.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] vmscan.c: dont reclaim too many pages · 42b8d994
      Andrew Morton authored
      The shrink_zone() logic can, under some circumstances, cause far too many
      pages to be reclaimed.  Say, we're scanning at high priority and suddenly hit
      a large number of reclaimable pages on the LRU.
      
      Change things so we bail out when SWAP_CLUSTER_MAX pages have been reclaimed.
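
      The early exit can be illustrated with a toy scan loop.  This is a
      sketch only: the value 32 for the threshold, the function name and the
      "pages freed per batch" model are assumptions for the example and do not
      mirror shrink_zone().

```c
#include <assert.h>

/* Assumed threshold for this sketch. */
#define SKETCH_SWAP_CLUSTER_MAX 32UL

/*
 * Scan nr_to_scan pages in batches of `batch`; each batch frees at most
 * `freed_per_batch` pages.  Returns the number of pages reclaimed.
 */
static unsigned long shrink_sketch(unsigned long nr_to_scan,
				   unsigned long batch,
				   unsigned long freed_per_batch)
{
	unsigned long reclaimed = 0;

	assert(batch > 0 && freed_per_batch <= batch);
	while (nr_to_scan > 0) {
		unsigned long this_batch =
			nr_to_scan < batch ? nr_to_scan : batch;
		unsigned long freed =
			freed_per_batch < this_batch ? freed_per_batch : this_batch;

		reclaimed += freed;
		nr_to_scan -= this_batch;
		if (reclaimed >= SKETCH_SWAP_CLUSTER_MAX)
			break;	/* enough progress: stop scanning */
	}
	return reclaimed;
}
```

      Even when asked to scan 100000 pages that are all reclaimable, the loop
      stops after the first batch instead of freeing everything it finds.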
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] vmscan.c scan rate fixes · 2332dc78
      Andrew Morton authored
      We've been futzing with the scan rates of the inactive and active lists far
      too much, and it's still not right (Anton reports interrupt-off times of over
      a second).
      
      - We have this logic in there from 2.4.early (at least) which tries to keep
        the inactive list 1/3rd the size of the active list.  Or something.
      
        I really cannot see any logic behind this, so toss it out and change the
        arithmetic in there so that all pages on both lists have equal scan rates.
      
      - Chunk the work up so we never hold interrupts off for more than 32 pages
        worth of scanning.
      
      - Make the per-zone scan-count accumulators unsigned long rather than
        atomic_t.
      
        Mainly because atomic_t's could conceivably overflow, but also because
        access to these counters is racy-by-design anyway.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] vmscan.c: shuffle things around · acba6041
      Andrew Morton authored
      Move all the data structure declarations, macros and variable definitions to
      less surprising places.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Fix and Reenable MSI Support on x86_64 · 0342e162
      Andrew Morton authored
      From: long <tlnguyen@snoqualmie.dp.intel.com>
      
      MSI support for x86_64 is currently disabled in the 2.6.x kernel.  The patch
      below fixes and re-enables it.
      
      In addition, the patch prints an info message during kernel boot if
      vector-base indexing is configured.
      
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] make irqaction use a cpu mask · 8c05319f
      Andrew Morton authored
      From: William Lee Irwin III <wli@holomorphy.com>
      
      The following patch makes irqaction's ->mask a cpumask as it was intended
      to be and wraps up the rest of the sweep.  Only struct irqaction is
      usefully greppable, so there may be some assignments to ->mask missing
      still.  This removes more code than it adds.
      
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] alpha: cpumask fixups · 4320cbbd
      Andrew Morton authored
      From: William Lee Irwin III <wli@holomorphy.com>
      
      The cpumask patches broke alpha's build, even without the irqaction
      patch, largely centering around cpu_possible_map.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] clean up cpumask_t temporaries · a3dcb7f4
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
      
      Paul Jackson's cpumask tour-de-force allows us to get rid of those stupid
      temporaries which we used to hold CPU_MASK_ALL to hand them to functions.
      This used to break NR_CPUS > BITS_PER_LONG.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpumask: comment, spacing tweaks · 02d7effd
      Andrew Morton authored
      From: Paul Jackson <pj@sgi.com>
      
      Tweak cpumask.h comments, spacing:
      
      - Add comments for cpu_present_map macros: num_present_cpus() and
        cpu_present()
      
      - Remove comments for obsolete macros: cpu_set_online(),
        cpu_set_offline()
      
      - Reorder a few comment lines, to match the code and confuse readers of
        this patch
      
      - Tabify one chunk of code
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpumask: optimize various uses of new cpumasks · 4b81e400
      Andrew Morton authored
      From: Paul Jackson <pj@sgi.com>
      
      Make use of for_each_cpu_mask() macro to simplify and optimize a couple of
      sparc64 per-CPU loops.
      
      Optimize a bit of cpumask code for asm-i386/mach-es7000
      
      Convert physids_complement() to use both args in the files
      include/asm-i386/mpspec.h, include/asm-x86_64/mpspec.h.
      
      Remove cpumask hack from asm-x86_64/topology.h routine pcibus_to_cpumask().
      
      Clarify and slightly optimize several cpumask manipulations in kernel/sched.c
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpumask: Remove no longer used obsolete macro emulation · 5ffa67fc
      Andrew Morton authored
      From: Paul Jackson <pj@sgi.com>
      
      Now that the emulation of the obsolete cpumask macros is no longer needed,
      remove it from cpumask.h
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] ppc64: cpu_online fix · ea72b241
      Andrew Morton authored
      include/asm/smp.h:55:1: warning: "cpu_possible" redefined
      include/asm/smp.h:54:1: warning: "cpu_online" redefined
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] x86_64: cpu_online fix · b8a02d07
      Andrew Morton authored
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpumask: remove obsolete cpumask macro uses - other archs · 7f1c9f57
      Andrew Morton authored
      From: Paul Jackson <pj@sgi.com>
      
      Remove by recoding other uses of the obsolete cpumask const, coerce and
      promote macros.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpumask: remove obsolete cpumask macro uses - i386 arch · 9eb0dcc1
      Andrew Morton authored
      From: Paul Jackson <pj@sgi.com>
      
      Remove by recoding i386 uses of the obsolete cpumask const, coerce and promote
      macros.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpumask: remove 26 no longer used cpumask*.h files · ed880528
      Andrew Morton authored
      From: Paul Jackson <pj@sgi.com>
      
      With the cpumask rewrite in the previous patch, these various
      include/asm-*/cpumask*.h headers are no longer used.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpumask: rewrite cpumask.h - single bitmap based implementation · f3344dc3
      Andrew Morton authored
      From: Paul Jackson <pj@sgi.com>
      
      Major rewrite of cpumask to use a single implementation, as a struct-wrapped
      bitmap.
      
      This patch leaves some 26 include/asm-*/cpumask*.h header files orphaned - to
      be removed next patch.
      
      Some nine cpumask macros for const variants and to coerce and promote between
      an unsigned long and a cpumask are obsolete.  Simple emulation wrappers are
      provided in this patch for these obsolete macros, which can be removed once
      each of the 3 archs (i386, ppc64, x86_64) using them are recoded in follow-on
      patches to not need them.
      
      The CPU_MASK_ALL macro now avoids leaving possible garbage one bits in any
      unused portion of the high word.
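
      A struct-wrapped bitmap whose "all CPUs" value leaves the unused high
      bits clear can be sketched as below.  The sk_ names and the choice of 70
      CPUs are assumptions for illustration; this is not the cpumask.h
      implementation.

```c
#include <limits.h>
#include <stddef.h>

#define SK_NR_CPUS 70	/* deliberately not a multiple of the word size */
#define SK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SK_LONGS ((SK_NR_CPUS + SK_BITS_PER_LONG - 1) / SK_BITS_PER_LONG)

/* Wrapping the array in a struct allows assignment and return by value. */
typedef struct { unsigned long bits[SK_LONGS]; } sk_cpumask_t;

/* Mask covering only the valid bits of the last (high) word. */
static unsigned long sk_last_word_mask(void)
{
	size_t rem = SK_NR_CPUS % SK_BITS_PER_LONG;

	return rem ? (1UL << rem) - 1 : ~0UL;
}

/* All valid CPU bits set, no stray one bits beyond SK_NR_CPUS. */
static sk_cpumask_t sk_cpu_mask_all(void)
{
	sk_cpumask_t m;

	for (size_t i = 0; i < SK_LONGS; i++)
		m.bits[i] = ~0UL;
	m.bits[SK_LONGS - 1] = sk_last_word_mask();
	return m;
}

/* Count the set bits across all words. */
static unsigned int sk_cpus_weight(const sk_cpumask_t *m)
{
	unsigned int w = 0;

	for (size_t i = 0; i < SK_LONGS; i++)
		for (unsigned long v = m->bits[i]; v; v &= v - 1)
			w++;
	return w;
}
```

      Because the tail is masked, a population count over the whole array
      equals the number of CPUs regardless of the word size.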
      
      An improved comment lists all available operators, for convenient browsing.
      
      From: Mikael Pettersson <mikpe@csd.uu.se>
      
        2.6.7-rc3-mm1 changed CPU_MASK_NONE into something that isn't a valid
        rvalue (it only works inside struct initializers).  This caused compile-time
        errors in perfctr in UP x86 builds.
      
      From: Arnd Bergmann <arnd@arndb.de>
      
        cpumask-5-10-rewrite-cpumaskh-single-bitmap-based from 2.6.7-rc3-mm1
        causes include2/asm/smp.h:54:1: warning: "cpu_online" redefined
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Mikael Pettersson <mikpe@csd.uu.se>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpumask: bitmap inlining and optimizations · d6cf71d3
      Andrew Morton authored
      From: Paul Jackson <pj@sgi.com>
      
      These bitmap improvements make it a suitable basis for fully supporting
      cpumask_t and nodemask_t.  Inline macros with compile-time checks enable
      generating tight code on both small and large systems (large meaning cpumask_t
      requires more than one unsigned long's worth of bits).
      
      The existing bitmap_<op> macros in lib/bitmap.c are renamed to __bitmap_<op>,
      and wrappers for each bitmap_<op> are exposed in include/linux/bitmap.h
      
      This patch _includes_ Bill Irwin's rewrite of the bitmap_shift operators to not
      require a fixed length intermediate bitmap.
      
      Improved comments list each available operator for easy browsing.
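
      The inline-wrapper idea can be sketched as follows: a small inline
      function handles the single-word case directly and defers to an
      out-of-line loop otherwise, so that a compile-time constant nbits lets
      the compiler discard the unused branch.  The sk_ names are hypothetical
      stand-ins for the bitmap_<op>/__bitmap_<op> pair.

```c
#include <limits.h>
#include <stddef.h>

#define BM_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BM_LONGS(nbits) (((nbits) + BM_BITS_PER_LONG - 1) / BM_BITS_PER_LONG)

/* General case: loop over every word (the role of __bitmap_<op>). */
static void sk_bitmap_and_slow(unsigned long *dst, const unsigned long *a,
			       const unsigned long *b, size_t nbits)
{
	for (size_t i = 0; i < BM_LONGS(nbits); i++)
		dst[i] = a[i] & b[i];
}

/* Inline wrapper: one AND instruction when the bitmap fits in a word. */
static inline void sk_bitmap_and(unsigned long *dst, const unsigned long *a,
				 const unsigned long *b, size_t nbits)
{
	if (nbits <= BM_BITS_PER_LONG)
		*dst = *a & *b;
	else
		sk_bitmap_and_slow(dst, a, b, nbits);
}
```

      On a small system the call collapses to a single word operation; on a
      large one the same source line reaches the looping implementation.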
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpumask: bitmap cleanup preparation for cpumask overhaul · ea0c1929
      Andrew Morton authored
      From: Paul Jackson <pj@sgi.com>
      
      Document the bitmap bit model and handling of unused bits.
      
      Tighten up bitmap so it does not generate nonzero bits in the unused tail if
      it is not given any on input.
      
      Add intersects, subset, xor and andnot operators.  Change bitmap_complement to
      take two operands.
      
      Add a couple of missing 'const' qualifiers on bitops test_bit and bitmap_equal
      args.
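
      A rough sketch of three of the operators named above, with the unused
      tail of the last word masked off.  The sk_ names and signatures are
      assumptions for this example rather than the lib/bitmap.c interfaces.

```c
#include <limits.h>
#include <stddef.h>

#define SK_BPL (sizeof(unsigned long) * CHAR_BIT)

/* Valid-bit mask for a partially used last word. */
static unsigned long sk_tail_mask(size_t nbits)
{
	size_t rem = nbits % SK_BPL;

	return rem ? (1UL << rem) - 1 : ~0UL;
}

/* True if a and b have at least one bit in common. */
static int sk_bitmap_intersects(const unsigned long *a,
				const unsigned long *b, size_t nbits)
{
	size_t full = nbits / SK_BPL;

	for (size_t i = 0; i < full; i++)
		if (a[i] & b[i])
			return 1;
	if (nbits % SK_BPL)
		return (a[full] & b[full] & sk_tail_mask(nbits)) != 0;
	return 0;
}

/* True if every bit set in a is also set in b. */
static int sk_bitmap_subset(const unsigned long *a,
			    const unsigned long *b, size_t nbits)
{
	size_t full = nbits / SK_BPL;

	for (size_t i = 0; i < full; i++)
		if (a[i] & ~b[i])
			return 0;
	if (nbits % SK_BPL)
		return (a[full] & ~b[full] & sk_tail_mask(nbits)) == 0;
	return 1;
}

/* Two-operand complement; bits beyond nbits stay zero in dst. */
static void sk_bitmap_complement(unsigned long *dst,
				 const unsigned long *src, size_t nbits)
{
	size_t full = nbits / SK_BPL;

	for (size_t i = 0; i < full; i++)
		dst[i] = ~src[i];
	if (nbits % SK_BPL)
		dst[full] = ~src[full] & sk_tail_mask(nbits);
}
```

      Masking in the complement is what keeps a clean input from producing
      nonzero bits in the unused tail of the output.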
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpumask: make cpu_present_map real even on non-smp · d2cec97b
      Andrew Morton authored
      From: Paul Jackson <pj@sgi.com>
      
      This patch makes cpu_present_map a real map for all configurations, instead of
      a constant for non-SMP.  It also moves the definition of cpu_present_map out
      of kernel/cpu.c into kernel/sched.c, because cpu.c isn't compiled into non-SMP
      kernels.
      
      The pattern is that each of the possible, present and online cpu maps are
      actual kernel global cpumask_t variables, for all configurations.  They are
      documented in include/linux/cpumask.h.  Some of the UP (NR_CPUS=1) code
      cheats, and hardcodes the assumption that the single bit position of these
      maps is always set, as an optimization.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] rcu: avoid passing an argument to the callback function · 8c1ce9d6
      Andrew Morton authored
      From: Dipankar Sarma <dipankar@in.ibm.com>
      
      This patch changes the call_rcu() API and avoids passing an argument to the
      callback function as suggested by Rusty.  Instead, it is assumed that the
      user has embedded the rcu head into a structure that is useful in the
      callback and the rcu_head pointer is passed to the callback.  The callback
      can use container_of() to get the pointer to its structure and work with
      it.  Together with the rcu-singly-link patch, it reduces the rcu_head size
      by 50%.  Considering that we use these in things like struct dentry and
      struct dst_entry, this is good savings in space.
      
      An example :
      
      struct my_struct {
      	struct rcu_head rcu;
      	int x;
      	int y;
      };
      
      void my_rcu_callback(struct rcu_head *head)
      {
      	struct my_struct *p = container_of(head, struct my_struct, rcu);
      	free(p);
      }
      
      void my_delete(struct my_struct *p)
      {
      	...
      	call_rcu(&p->rcu, my_rcu_callback);
      	...
      }
      Signed-Off-By: Dipankar Sarma <dipankar@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] reduce rcu_head size - core · b659a6fb
      Andrew Morton authored
      From: Dipankar Sarma <dipankar@in.ibm.com>
      
      This reduces the RCU head size by using a singly linked list to maintain them.
      The ordering of the callbacks is still maintained as before by using a tail
      pointer for the next list.
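
      A singly linked list that still appends in FIFO order thanks to a tail
      pointer can be sketched like this.  The sk_ types are illustrative only
      and do not reproduce the layout of struct rcu_head.

```c
#include <assert.h>
#include <stddef.h>

/* One pointer per element instead of the two a doubly linked list needs. */
struct sk_head {
	struct sk_head *next;
};

struct sk_cb_list {
	struct sk_head *head;
	struct sk_head **tail;	/* points at the pointer to fill in next */
};

static void sk_list_init(struct sk_cb_list *l)
{
	l->head = NULL;
	l->tail = &l->head;
}

/* Append in O(1) while preserving queueing order. */
static void sk_list_add(struct sk_cb_list *l, struct sk_head *h)
{
	assert(h != NULL);
	h->next = NULL;
	*l->tail = h;
	l->tail = &h->next;
}

/* Walk the list from the head and count its elements. */
static size_t sk_list_len(const struct sk_cb_list *l)
{
	size_t n = 0;

	for (const struct sk_head *p = l->head; p; p = p->next)
		n++;
	return n;
}
```

      Elements come back out in the order they were queued, so callbacks would
      still run in submission order.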
      
      Signed-Off-By : Dipankar Sarma <dipankar@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] rcu lock update: Code move & cleanup · 72914d30
      Andrew Morton authored
      From: Manfred Spraul <manfred@colorfullife.com>
      
      Step three for reducing cacheline thrashing within rcupdate.c:
      
      Cleanup and code move from <linux/rcupdate.h> to kernel/rcupdate.c: Remove
      internal details from the header file.
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] rcu lock update: Use a sequence lock for starting batches · 720e8a63
      Andrew Morton authored
      From: Manfred Spraul <manfred@colorfullife.com>
      
      Step two for reducing cacheline thrashing within rcupdate.c:
      
      rcu_process_callbacks always acquires rcu_ctrlblk.state.mutex and calls
      rcu_start_batch, even if the batch is already running or already scheduled to
      run.
      
      This can be avoided with a sequence lock: A sequence lock makes it possible to
      read the current batch number and next_pending atomically.  If next_pending is
      already set, then there is no need to acquire the global mutex.
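
      The reader side of a sequence lock can be sketched in userspace with C11
      atomics.  This is a simplified illustration that assumes a single
      writer; the sk_batch structure and function names are invented for the
      example and are not the kernel's seqlock API.

```c
#include <assert.h>
#include <stdatomic.h>

struct sk_batch {
	atomic_uint seq;	/* odd while a writer is mid-update */
	atomic_long cur;
	atomic_int next_pending;
};

/* Writer: bump the sequence before and after updating the fields. */
static void sk_batch_write(struct sk_batch *b, long cur, int next_pending)
{
	/* This sketch assumes writers are serialized elsewhere. */
	assert((atomic_load(&b->seq) & 1u) == 0);
	atomic_fetch_add(&b->seq, 1);
	atomic_store(&b->cur, cur);
	atomic_store(&b->next_pending, next_pending);
	atomic_fetch_add(&b->seq, 1);
}

/* Reader: retry until both fields were read under one even sequence. */
static void sk_batch_read(struct sk_batch *b, long *cur, int *next_pending)
{
	unsigned int start;

	do {
		start = atomic_load(&b->seq);
		*cur = atomic_load(&b->cur);
		*next_pending = atomic_load(&b->next_pending);
	} while ((start & 1u) || atomic_load(&b->seq) != start);
}
```

      Readers never write to the shared cacheline, which is the property that
      lets most CPUs skip the global mutex.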
      
      This means that for each grace period, there will be
      
      - one write access to the rcu_ctrlblk.batch cacheline
      
      - lots of read accesses to rcu_ctrlblk.batch (3-10*cpus_online()).  Behavior
        similar to the jiffies cacheline, shouldn't be a problem.
      
      - cpus_online()+1 write accesses to rcu_ctrlblk.state, all of them starting
        with spin_lock(&rcu_ctrlblk.state.mutex).
      
        For large enough cpus_online() this will be a problem, but all except two
        of the spin_lock calls only protect the rcu_cpu_mask bitmap, thus a
        hierarchical bitmap would allow the write accesses to be split across
        multiple cachelines.
      
      Tested on an 8-way with reaim.  Unfortunately it probably won't help with Jack
      Steiner's 'ls' test since in this test only one cpu generates rcu entries.
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] rcu lock update: Add per-cpu batch counter · 5c60169a
      Andrew Morton authored
      From: Manfred Spraul <manfred@colorfullife.com>
      
      Below is one of the patches from my rcu lock update.  Jack Steiner tested
      the first one on a 512p and it resolved the rcu cache line thrashing.  All were
      tested on osdl with STP.
      
      Step one for reducing cacheline thrashing within rcupdate.c:
      
      The current code uses the rcu_cpu_mask bitmap both for keeping track of the
      cpus that haven't gone through a quiescent state and for checking if a cpu
      should look for quiescent states.  The bitmap is frequently changed and the
      check is done by polling - together this causes cache line thrashing.
      
      If it's cheaper to access a (mostly) read-only cacheline than a cacheline that
      is frequently dirtied, then it's possible to reduce the thrashing by splitting
      the rcu_cpu_mask bitmap into two cachelines:
      
      The patch adds a generation counter and moves it into a separate cacheline.
      This makes it possible to remove all accesses to rcu_cpu_mask (in the read-write
      cacheline) from rcu_pending and at least 50% of the accesses from
      rcu_check_quiescent_state.  rcu_pending and all but one call per cpu to
      rcu_check_quiescent_state access the read-only cacheline.  Probably not enough
      for 512p, but it's a start, just for 128 bytes more memory use, without slowing
      down rcu grace periods.  Obviously the read-only cacheline is not really
      read-only: it's written once per grace period to indicate that a new grace
      period is running.
      
      Tests on an 8-way Pentium III with reaim showed some improvement:
      
      oprofile hits:
      Reference: http://khack.osdl.org/stp/293075/
      Hits	   %
      23741     0.0994  rcu_pending
      19057     0.0798  rcu_check_quiescent_state
      6530      0.0273  rcu_check_callbacks
      
      Patched: http://khack.osdl.org/stp/293076/
      8291      0.0579  rcu_pending
      5475      0.0382  rcu_check_quiescent_state
      3604      0.0252  rcu_check_callbacks
      
      The total runtime differs between both runs, thus the % number must
      be compared: Around 50% faster. I've uninlined rcu_pending for the
      test.
      
      Tested with reaim and kernbench.
      
      Description:
      
      - per-cpu quiescbatch and qs_pending fields introduced: quiescbatch contains
        the number of the last quiescent period that the cpu has seen and qs_pending
        is set if the cpu has not yet reported the quiescent state for the current
        period.  With these two fields a cpu can test if it should report a
        quiescent state without having to look at the frequently written
        rcu_cpu_mask bitmap.
      
      - curbatch split into two fields: rcu_ctrlblk.batch.completed and
        rcu_ctrlblk.batch.cur.  This makes it possible to figure out if a grace
        period is running (completed != cur) without accessing the rcu_cpu_mask
        bitmap.
      
      - rcu_ctrlblk.maxbatch removed and replaced with a true/false next_pending
        flag: next_pending=1 means that another grace period should be started
        immediately after the end of the current period.  Previously, this was
        achieved by maxbatch: curbatch==maxbatch means don't start, curbatch!=
        maxbatch means start.  A flag improves the readability: The only possible
        values for maxbatch were curbatch and curbatch+1.
      
      - rcu_ctrlblk split into two cachelines for better performance.
      
      - common code from rcu_offline_cpu and rcu_check_quiescent_state merged into
        cpu_quiet.
      
      - rcu_offline_cpu: replace spin_lock_irq with spin_lock_bh, there are no
        accesses from irq context (and there are accesses to the spinlock with
        enabled interrupts from tasklet context).
      
      - rcu_restart_cpu introduced, s390 should call it after changing nohz:
        Theoretically the global batch counter could wrap around and end up at
        RCU_quiescbatch(cpu).  Then the cpu would not look for a quiescent state and
        rcu would lock up.
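
      The per-cpu quiescbatch/qs_pending test and the completed != cur check
      described in the first two items can be sketched as below.  The struct
      layouts and function names are assumptions derived from the description,
      not the code in rcupdate.c.

```c
#include <assert.h>
#include <stddef.h>

/* Global, mostly-read batch state (written once per grace period). */
struct sk_rcu_batch {
	long cur;	/* number of the current batch */
	long completed;	/* number of the last completed batch */
};

/* Per-cpu state, private to each cpu. */
struct sk_rcu_cpu {
	long quiescbatch;	/* last batch this cpu has seen */
	int qs_pending;		/* quiescent state not yet reported */
};

/* A grace period is running while cur is ahead of completed. */
static int sk_grace_period_running(const struct sk_rcu_batch *b)
{
	return b->completed != b->cur;
}

/* Returns nonzero if this cpu still owes a quiescent-state report. */
static int sk_cpu_needs_qs(struct sk_rcu_cpu *cpu,
			   const struct sk_rcu_batch *b)
{
	assert(cpu != NULL && b != NULL);
	if (cpu->quiescbatch != b->cur) {
		/* A new batch started since we last looked. */
		cpu->quiescbatch = b->cur;
		cpu->qs_pending = 1;
	}
	return cpu->qs_pending;
}

/* Called once the cpu has passed through a quiescent state. */
static void sk_cpu_report_qs(struct sk_rcu_cpu *cpu)
{
	cpu->qs_pending = 0;
}
```

      Neither check touches a shared bitmap; only the final report would need
      the frequently written cacheline.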
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Move saved_command_line to init/main.c · b884e838
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
      
      Currently every arch declares its own char saved_command_line[].  Make sure
      every arch defines COMMAND_LINE_SIZE in asm/setup.h, and declare
      saved_command_line in linux/init.h (init/main.c contains the definition).
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] jbd needs to wait for locked buffers · 4d4f4cc4
      Andrew Morton authored
      From: Chris Mason <mason@suse.com>
      
      jbd needs to wait for any io to complete on the buffer before changing the
      end_io function.  Using set_buffer_locked means that it can change the
      end_io function while the page is in the middle of writeback, and the
      writeback bit on the page will never get cleared.
      
      Since we set the buffer dirty earlier on, if the page was previously dirty,
      pdflush or memory pressure might trigger a writepage call, which will race
      with jbd's set_buffer_locked.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Allow i386 to reenable interrupts on lock contention · 36f9f209
      Andrew Morton authored
      From: Zwane Mwaikambo <zwane@linuxpower.ca>
      
      Following up on Keith's code, I adapted the i386 code to allow enabling
      interrupts during contested locks depending on previous interrupt
      enable status. Obviously there will be a text increase (only for the non
      CONFIG_SPINLINE case), although it doesn't seem so bad. There will also be
      an increased exit latency when we attempt a lock acquisition after spinning,
      due to the extra instructions. How much this will affect performance I'm
      not sure yet, as I haven't had time to micro-bench.
      
         text    data     bss     dec     hex filename
      2628024  921731       0 3549755  362a3b vmlinux-after
      2621369  921731       0 3543100  36103c vmlinux-before
      2618313  919222       0 3537535  35fa7f vmlinux-spinline
      
      The code has been stress tested on a 16x NUMAQ (courtesy OSDL).
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  2. 23 Jun, 2004 4 commits
  3. 22 Jun, 2004 3 commits
    • [PATCH] ppc32: Support for new Apple laptop models · f4897eb3
      Jesse Barnes authored
      This adds sound support for some of the newer PowerBooks.  It appears
      that this chip supports the AWACS sample rates, but has a snapper-style
      mixer.  Tested and works on my PowerBook5,4. 
      Signed-off-by: Jesse Barnes <jbarnes@sgi.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Handle altivec assist exception properly · 7a08473b
      Paul Mackerras authored
      This is the PPC64 counterpart of the PPC32 Altivec assist exception
      handler that went in recently.
      
      On PPC64 machines with Altivec (i.e.  machines that use the PPC970 chip,
      such as the G5 powermac), the altivec floating-point instructions can
      operate in two modes: one where denormalized inputs or outputs are
      truncated to zero, and one where they aren't.  In the latter mode the
      processor can take an exception when it encounters denormalized
      floating-point inputs or outputs rather than dealing with them in
      hardware.
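
      What "denormalized" and "truncated to zero" mean for a single-precision
      value can be sketched in portable C.  The code assumes IEEE 754 binary32
      floats, and keeping the sign bit when flushing is a choice made for this
      example; the helper names are hypothetical and this is not the kernel's
      emulation code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SK_EXP_MASK  0x7f800000u	/* exponent field of a binary32 */
#define SK_FRAC_MASK 0x007fffffu	/* fraction field */
#define SK_SIGN_MASK 0x80000000u

/* Copy the bit pattern out without violating aliasing rules. */
static uint32_t sk_float_bits(float x)
{
	uint32_t bits;

	assert(sizeof(float) == sizeof(uint32_t));
	memcpy(&bits, &x, sizeof(bits));
	return bits;
}

/* Denormal: zero exponent field with a nonzero fraction. */
static int sk_is_denormal(float x)
{
	uint32_t bits = sk_float_bits(x);

	return (bits & SK_EXP_MASK) == 0 && (bits & SK_FRAC_MASK) != 0;
}

/* Truncate-to-zero mode: denormals become a (signed) zero. */
static float sk_flush_denormal(float x)
{
	uint32_t bits = sk_float_bits(x);

	if ((bits & SK_EXP_MASK) == 0)
		bits &= SK_SIGN_MASK;
	memcpy(&x, &bits, sizeof(x));
	return x;
}
```

      A value such as 1e-40f lies below the smallest normal float, so it is
      reported as denormal and flushes to zero, while ordinary values pass
      through unchanged.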
      
      This patch adds code to deal properly with the exception, by emulating
      the instruction that caused the exception.  Previously the kernel just
      switched the altivec unit into the truncate-to-zero mode, which works
      but is a bit gross.  Fortunately there are only a limited set of altivec
      instructions which can generate the assist exception, so we don't have
      to emulate the whole altivec instruction set.
      
      Note that Altivec is Motorola's name for the PowerPC vector/SIMD
      instructions; IBM calls the same thing VMX, and currently only IBM makes
      64-bit PowerPC CPU chips.  Nevertheless, I have used the term Altivec in
      the PPC64 code for consistency with the PPC32 code.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] radeonfb: Fix panel detection on some laptops · 6340e7ba
      Benjamin Herrenschmidt authored
      The code in radeonfb looking for the BIOS image currently uses the BIOS
      ROM if any, and falls back to the RAM image if not found.  This is
      unfortunately not correct for a bunch of laptops where the real panel
      data are only present in the RAM image.
      
      This works around this problem by preferring the RAM image on mobility
      chipsets.  This is definitely not the best workaround, we need some arch
      support for linking the RAM image to the PCI ID (preferably by having
      the arch snapshot it during boot, isolating us completely from the
      details of where this image is in memory).  I'll see how we can get such
      an improvement later.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>