1. 30 Oct, 2002 40 commits
    • [PATCH] hot-n-cold pages: page allocator core · a206231b
      Andrew Morton authored
      Hot/Cold pages and zone->lock amortisation
    • [PATCH] hot-n-cold pages: bulk page freeing · 1d2652dd
      Andrew Morton authored
      Patch from Martin Bligh.
      
      Implements __free_pages_bulk(), which releases multiple pages of a
      given order into the buddy lists, all within a single acquisition of
      the zone lock.
      
      This also removes current->local_pages, the per-task list of pages
      which only ever contained one page.  Its purpose was to prevent other
      tasks from stealing pages which this task had just freed up.
      
      Given that we're freeing into the per-cpu caches, and that those are
      multipage caches, and the cpu-stickiness of the scheduler, I think
      current->local_pages is no longer needed.
    • [PATCH] hot-n-cold pages: bulk page allocator · 38e419f5
      Andrew Morton authored
      This is the hot-n-cold-pages series.  It introduces a per-cpu lockless
      LIFO pool in front of the page allocator, for three reasons:
      
      1: To reduce lock contention on the buddy lock: we allocate and free
         pages in, typically, 16-page chunks.
      
      2: To return cache-warm pages to page allocation requests.
      
      3: As infrastructure for a page reservation API which can be used to
         ensure that the GFP_ATOMIC radix-tree node and pte_chain allocations
         cannot fail.  That code is not complete, and does not absolutely
         require hot-n-cold pages.  It'll work OK though.
      
      We add two queues per CPU.  The "hot" queue contains pages which the
      freeing code thought were likely to be cache-hot.  By default, new
      allocations are satisfied from this queue.
      
      The "cold" queue contains pages which the freeing code expected to be
      cache-cold.  The cold queue is mainly for lock amortisation, although
      it is possible to explicitly allocate cold pages.  The readahead code
      does that.
      
      I have been hot and cold on these patches for quite some time - the
      benefit is not great.
      
      - 4% speedup in Randy Hron's benching of the autoconf regression
        tests on a 4-way.  Most of this came from savings in pte_alloc and
        pmd_alloc: the pagetable clearing code liked the warmer pages (some
        architectures still have the pgt_cache, and can perhaps do away with
        them).
      
      - 1% to 2% speedup in kernel compiles on my 4-way and Martin's 32-way.
      
      - 60% speedup in a little test program which writes 80 kbytes to a
        file and ftruncates it to zero again.  Ran four instances of that on
        4-way and it loved the cache warmth.
      
      - 2.5% speedup in Specweb testing on 8-way
      
      - The thing which won me over: an 11% increase in throughput of the
        SDET benchmark on an 8-way PIII:
      
      	with hot & cold:
      
      	RESULT for 8 users is 17971    +12.1%
      	RESULT for 16 users is 17026   +12.0%
      	RESULT for 32 users is 17009   +10.4%
      	RESULT for 64 users is 16911   +10.3%
      
      	without:
      
      	RESULT for 8 users is 16038
      	RESULT for 16 users is 15200
      	RESULT for 32 users is 15406
      	RESULT for 64 users is 15331
      
        SDET is a very old SPEC test which simulates a development
        environment with a large number of users.  Lots of users running a
        mix of shell commands, basically.
      
      
      These patches were written by Martin Bligh and myself.
      
      This one implements rmqueue_bulk() - a function for removing multiple
      pages of a given order from the buddy lists.
      
      This is for lock amortisation: take the highly-contended zone->lock
      with less frequency, do more work once it has been acquired.
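      
      A minimal userspace sketch of the lock amortisation described above,
      assuming a single CPU and no interrupts.  Every name ending in _sketch,
      the 32-entry queue limit and the integer "pages" are illustrative
      assumptions, not the kernel's code; the counter lock_acquisitions only
      models how often zone->lock would be taken.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BATCH_SKETCH	16	/* pages moved per lock acquisition */
#define POOL_MAX_SKETCH	32	/* hypothetical high-water mark */
#define NPAGES_SKETCH	256

struct zone_sketch {			/* stand-in for the buddy free lists */
	int free[NPAGES_SKETCH];
	size_t nr_free;
	unsigned long lock_acquisitions;	/* models taking zone->lock */
};

struct pcp_sketch {			/* one CPU's LIFO "hot" queue */
	int pages[POOL_MAX_SKETCH];
	size_t count;
};

void zone_init_sketch(struct zone_sketch *z)
{
	for (size_t i = 0; i < NPAGES_SKETCH; i++)
		z->free[i] = (int)i;
	z->nr_free = NPAGES_SKETCH;
	z->lock_acquisitions = 0;
}

/* Refill: pull up to `want` pages from the zone under one lock hold. */
size_t rmqueue_bulk_sketch(struct zone_sketch *z, struct pcp_sketch *p,
			   size_t want)
{
	size_t moved = 0;

	z->lock_acquisitions++;
	while (moved < want && z->nr_free > 0 && p->count < POOL_MAX_SKETCH) {
		p->pages[p->count++] = z->free[--z->nr_free];
		moved++;
	}
	return moved;
}

/* Drain: push the `n` oldest pages back under one lock hold. */
void free_pages_bulk_sketch(struct zone_sketch *z, struct pcp_sketch *p,
			    size_t n)
{
	if (n > p->count)
		n = p->count;
	assert(z->nr_free + n <= NPAGES_SKETCH);
	z->lock_acquisitions++;
	for (size_t i = 0; i < n; i++)
		z->free[z->nr_free++] = p->pages[i];
	memmove(p->pages, p->pages + n, (p->count - n) * sizeof(p->pages[0]));
	p->count -= n;
}

/* Fast path: the zone is only touched when the per-CPU queue is empty. */
int alloc_page_sketch(struct zone_sketch *z, struct pcp_sketch *p)
{
	if (p->count == 0 && rmqueue_bulk_sketch(z, p, BATCH_SKETCH) == 0)
		return -1;
	return p->pages[--p->count];	/* LIFO: most recently freed first */
}

void free_page_sketch(struct zone_sketch *z, struct pcp_sketch *p, int page)
{
	if (p->count == POOL_MAX_SKETCH)
		free_pages_bulk_sketch(z, p, BATCH_SKETCH);
	p->pages[p->count++] = page;
}
```

      In this sketch sixteen allocations cost one simulated lock acquisition,
      and a page freed and immediately reallocated comes back from the top of
      the queue without touching the zone at all.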
    • [PATCH] percpu: convert global page accounting · afce7191
      Andrew Morton authored
      Convert global page state accounting to use per-cpu storage
      
      (I think this code remains a little buggy, btw.  Note how I do
      
      	per_cpu(page_states, cpu).member += (delta);
      
      This gets done at interrupt time and hence is assuming that
      the "+=" operation on a ulong is atomic wrt interrupts on
      all architectures. How do we feel about that assumption?)
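      
      A rough userspace illustration of per-cpu accounting, assuming four
      CPUs and a single nr_dirty member.  The _sketch names are hypothetical,
      and the sketch does not address the interrupt-atomicity question raised
      above: each writer updates only its own slot, and a reader sums all
      slots.

```c
#include <assert.h>

#define NR_CPUS_SKETCH 4

struct page_state_sketch {
	unsigned long nr_dirty;
};

static struct page_state_sketch page_states_sketch[NR_CPUS_SKETCH];

/* Writer: touches only the slot belonging to `cpu`. */
void mod_nr_dirty_sketch(int cpu, long delta)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	page_states_sketch[cpu].nr_dirty += (unsigned long)delta;
}

/* Reader: the global value is the wrapping sum over all slots. */
unsigned long read_nr_dirty_sketch(void)
{
	unsigned long sum = 0;

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += page_states_sketch[cpu].nr_dirty;
	return sum;
}
```

      A page dirtied on one CPU and cleaned on another leaves one slot
      "negative" (wrapped), but the unsigned sum over all slots is still
      correct, which is why individual slots are never read on their own.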
    • [PATCH] percpu: create an EXPORT_PER_CPU_SYMBOL() macro · 999eac41
      Andrew Morton authored
      This is needed so that per-cpu information in the core kernel can be
      accessed from modules.
    • [PATCH] percpu: convert buffer.c · e252fb96
      Andrew Morton authored
      Patch from Dipankar Sarma <dipankar@in.ibm.com>
      
      This patch makes per_cpu bh_accounting safe for cpu_possible
      allocation by using cpu notifiers.
    • [PATCH] percpu: convert softirqs · c1bf37e9
      Andrew Morton authored
      Patch from Dipankar Sarma <dipankar@in.ibm.com>
      
      This patch makes per_cpu tasklet vectors safe for cpu_possible
      allocation by using CPU notifiers.
    • [PATCH] percpu: convert timers · cf228cdc
      Andrew Morton authored
      Patch from Dipankar Sarma <dipankar@in.ibm.com>
      
      This patch changes the per-CPU data in timer management (tvec_bases)
      to use per_cpu data area and makes it safe for cpu_possible allocation
      by using CPU notifiers. End result - saving space.
      
      Depends on cpu_possible patch.
    • [PATCH] percpu: convert RCU · c12e16e2
      Andrew Morton authored
      Patch from Dipankar Sarma <dipankar@in.ibm.com>
      
      This patch converts RCU per_cpu data to use per_cpu data area
      and makes it safe for cpu_possible allocation by using CPU
      notifiers.
    • [PATCH] percpu: fix compile warning for UP builds · 0c83f291
      Andrew Morton authored
      A typical construct is:
      
      	int cpu = get_cpu();
      
      	foo = per_cpu(bar, cpu);
      	put_cpu();
      
      but this generates a compiler warning on uniprocessor builds: unused
      variable `cpu'.
      
      Add a dummy ref to `cpu' to per_cpu() to prevent this.
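      
      One way such a dummy reference can be written is with a comma
      expression.  This is a sketch under assumptions: the _sketch macros
      stand in for the uniprocessor definitions and are not copied from the
      kernel headers.

```c
#include <assert.h>

/* Uniprocessor stand-ins (illustrative names, not the kernel's macros). */
#define get_cpu_sketch()	0
#define put_cpu_sketch()	do { } while (0)

/*
 * The (void)(cpu) operand evaluates `cpu` and discards it, so the
 * compiler sees the variable as used even though the UP build ignores it.
 */
#define per_cpu_sketch(var, cpu)	(*((void)(cpu), &(var)))

static int bar_sketch = 42;

int read_bar_sketch(void)
{
	int cpu = get_cpu_sketch();
	int foo = per_cpu_sketch(bar_sketch, cpu);

	put_cpu_sketch();
	assert(foo == 42);	/* the comma expression still yields bar_sketch */
	return foo;
}
```

      Because the macro expands to a dereferenced pointer, it remains usable
      as an lvalue, e.g. per_cpu_sketch(bar_sketch, cpu)++.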
    • [PATCH] percpu: balance_dirty_pages ratelimit counters · f98bf5ff
      Andrew Morton authored
      Convert balance_dirty_pages_ratelimited() to use percpu storage
      for the ratelimiting counters.
    • [PATCH] slab: Use CPU notifiers · 4524ea04
      Andrew Morton authored
      - allocate memory for cpu buffers in cpu_up_prepare
      
      - start the timer in cpu_online
      
      - free the memory for cpu buffers in cpu_up_cancel.
    • [PATCH] slab: additional code cleanup · b464df2e
      Andrew Morton authored
      From Manfred Spraul
      
      - remove all typedefs except kmem_bufctl_t.  It's a redefinition of
        an int, i.e. it qualifies as tiny.
      
      - convert most macros to inline functions.
    • [PATCH] slab: Remove cache_chain_lock · 716b7ab1
      Andrew Morton authored
      Manfred added a new lock to protect the global list of slab caches.  We
      already have a semaphore for those, but he needs locking from timer
      context.
      
      So here we remove that lock and just do a down_trylock() on the
      existing semaphore.  If that fails, give up - we'll try again next timer
      tick.
    • [PATCH] slab: Rework the slab timer code to use add_timer_on · bf19f75e
      Andrew Morton authored
      Manfred had all this weird code to schedule a kernel thread onto a
      different CPU just so that we could bind a timer to that CPU.
      
      Convert it all to use the new add_timer_on().
    • [PATCH] slab: reap timers · fd1425d5
      Andrew Morton authored
      - add a reap timer that returns stale objects from the cpu arrays
      - use list_for_each instead of while loops
      - /proc/slabinfo layout change, for a new field about reaping.
      
      Implementation:
      slab contains 2 caches holding objects that might be usable by the
      rest of the system:
      - the cpu arrays contain objects that other cpus could use
      - the slabs_free list contains freeable slabs, i.e. pages that someone
        else might want.
      
      The patch now keeps track of accesses to the cpu arrays and to the free
      list. If there were no recent activities in one of the caches, part of
      the cache is flushed.
      
      Unlike <2.5.39, only a small part (~20%) is flushed each time.
      The older kernel would bounce heavily between refill and drain under
      memory pressure:
      
      - kmem_cache_alloc: notices that there are no objects in the cpu
              cache, loads 120 objects from the slab lists, returns 1.
              [assuming batchcount=120]
      - kmem_cache_reap is called due to memory pressure, finds 119
              objects in the cpu array and returns them to the slab lists.
      - repeat.
      
      In addition, the length of the free list is limited based on the free
      list accesses: a fixed "1" limit hurts the large object caches.
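      
      The partial-flush policy above can be sketched roughly as follows.  The
      structure, the `touched` flag and the round-up rule are assumptions
      made for illustration; the patch's actual bookkeeping may differ.

```c
#include <assert.h>
#include <stdbool.h>

struct cpu_array_sketch {
	unsigned int avail;	/* objects currently cached */
	bool touched;		/* set by alloc/free, cleared by the timer */
};

/* One timer tick: returns how many objects go back to the slab lists. */
unsigned int reap_tick_sketch(struct cpu_array_sketch *ac)
{
	unsigned int n;

	if (ac->touched) {		/* recently used: leave it alone */
		ac->touched = false;
		return 0;
	}
	n = (ac->avail + 4) / 5;	/* about 20%, rounded up */
	assert(n <= ac->avail);
	ac->avail -= n;
	return n;
}
```

      With 120 cached objects, an idle array gives back 24 objects on the
      first idle tick and 20 on the next, instead of the 119 that the older
      refill/drain cycle would return at once.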
      
      That's the last part for now, next is: [not yet written]
      - cleanup: BUG_ON instead of if() BUG
      - OOM handling for enable_cpucaches
      - remove the unconditional might_sleep() from
              cache_alloc_debugcheck_before, and make that DEBUG dependent.
      - initial NUMA support, just to collect some stats:
              Which percentage of the objects are freed on the wrong
              node? 0.1% or 20%?
    • [PATCH] slab: uninline poisoning checks · 1aabbecc
      Andrew Morton authored
      remove inline from the cache poison checks: the functions are not
      performance critical.
    • [PATCH] slab: cleanups and speedups · cad9cd51
      Andrew Morton authored
      - enable the cpu array for all caches
      
      - remove the optimized implementations for quick list access - with
        cpu arrays in all caches, the list access is now rare.
      
      - make the cpu arrays mandatory, this removes 50% of the conditional
        branches from the hot path of kmem_cache_alloc [1]
      
      - poisoning for objects with constructors
      
      Patch got a bit longer...
      
      I forgot to mention this: head arrays mean that some pages can be
      blocked due to objects in the head arrays, and not returned to
      page_alloc.c.  The current kernel never flushes the head arrays, this
      might worsen the behaviour of low memory systems.  The hunk that
      flushes the arrays regularly comes next.
      
      Detailed changelog: [to be read side by side with the patch]
      
      * docu update
      
      * "growing" is not really needed: races between grow and shrink are
        handled by retrying.  [additionally, the current kernel never
        shrinks]
      
      * move the batchcount into the cpu array:
      	the old code contained a race during cpu cache tuning:
      		update batchcount [in cachep] before or after the IPI?
      	And NUMA will need it anyway.
      
      * bootstrap support: the cpu arrays are really mandatory, nothing
        works without them.  Thus a statically allocated cpu array is needed
        for starting the allocators.
      
      * move the full, partial & free lists into a separate structure, as a
        preparation for NUMA
      
      * structure reorganization: now the cpu arrays are the most important
        part, not the lists.
      
      * dead code elimination: remove "failures", nowhere read.
      
      * dead code elimination: remove "OPTIMIZE": not implemented.  The
        idea is to skip the virt_to_page lookup for caches with on-slab slab
        structures, and use (ptr&PAGE_MASK) instead.  The details are in
        Bonwick's paper.  Not fully implemented.
      
      * remove GROWN: kernel never shrinks a cache, thus grown is
        meaningless.
      
      * bootstrap: starting the slab allocator is now a 3 stage process:
      	- nothing works, use the statically allocated cpu arrays.
      	- the smallest kmalloc allocator works, use it to allocate
      		cpu arrays.
      	- all kmalloc allocators work, use the default cpu array size
      
      * register a cpu notifier callback, and allocate the needed head
        arrays if a new cpu arrives
      
      * always enable head arrays, even for DEBUG builds.  Poisoning and
        red-zoning now happens before an object is added to the arrays.
        Insert enable_all_cpucaches into cpucache_init, there is no need for
        a separate function.
      
      * modifications to the debug checks due to the earlier calls of the
        dtor for caches with poisoning enabled
      
      * poison+ctor is now supported
      
      * squeezing 3 objects into a cacheline is hopeless, the FIXME is not
        solvable and can be removed.
      
      * add additional debug tests: check_irq_off(), check_irq_on(),
        check_spinlock_acquired().
      
      * move do_ccupdate_local nearer to do_tune_cpucache.  Should have
        been part of -04-drain.
      
      * additional objects checks.  red-zoning is tricky: it's implemented
        by increasing the object size by 2*BYTES_PER_WORD.  Thus
        BYTES_PER_WORD must be added to objp before calling the destructor,
        constructor or before returning the object from alloc.  The poison
        functions add BYTES_PER_WORD internally.
      
      * create a flagcheck function, right now the tests are duplicated in
        cache_grow [always] and alloc_debugcheck_before [DEBUG only]
      
      * modify slab list updates: all allocs are now bulk allocs that try
        to get multiple objects at once, update the list pointers only at the
        end of a bulk alloc, not once per alloc.
      
      * might_sleep was moved into kmem_flagcheck.
      
      * major hotpath change:
      	- cc always exists, no fallback
      	- cache_alloc_refill is called with disabled interrupts,
      	  and does everything to recover from an empty cpu array.
      	  Far shorter & simpler __cache_alloc [inlined in both
      	  kmalloc and kmem_cache_alloc]
      
      * __free_block, free_block, cache_flusharray: main implementation of
        returning objects to the lists.  no big changes, diff lost track.
      
      * new debug check: too early kmalloc or kmem_cache_alloc
      
      * slightly reduce the sizes of the cpu arrays: keep the size < a
        power of 2, including batchcount, avail and now limit, for optimal
        kmalloc memory efficiency.
      
      That's it.  I even found 2 bugs while reading: dtors and ctors for
      verify were called with wrong parameters, with RED_ZONE enabled, and
      some checks still assumed that POISON and ctor are incompatible.
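      
      The red-zoning arithmetic from the changelog (object size grows by
      2*BYTES_PER_WORD, and one word is added to objp before the object is
      handed out) can be sketched in userspace as below.  The marker value
      and all _sketch names are made up for illustration and are not the
      kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BYTES_PER_WORD_SKETCH	sizeof(uintptr_t)
#define RED_MAGIC_SKETCH	((uintptr_t)0xABCD1234u)	/* made-up marker */

/* Slot layout: [red word][payload of `size` bytes][red word]. */
size_t slot_size_sketch(size_t size)
{
	return size + 2 * BYTES_PER_WORD_SKETCH;
}

void redzone_arm_sketch(void *slot, size_t size)
{
	uintptr_t magic = RED_MAGIC_SKETCH;
	char *p = slot;

	memcpy(p, &magic, sizeof(magic));
	memcpy(p + BYTES_PER_WORD_SKETCH + size, &magic, sizeof(magic));
}

/* What callers, constructors and destructors see: slot plus one word. */
void *slot_to_obj_sketch(void *slot)
{
	assert(slot != NULL);
	return (char *)slot + BYTES_PER_WORD_SKETCH;
}

/* Returns nonzero while both guard words still hold the marker. */
int redzone_intact_sketch(const void *slot, size_t size)
{
	uintptr_t head, tail;
	const char *p = slot;

	memcpy(&head, p, sizeof(head));
	memcpy(&tail, p + BYTES_PER_WORD_SKETCH + size, sizeof(tail));
	return head == RED_MAGIC_SKETCH && tail == RED_MAGIC_SKETCH;
}
```

      Forgetting the one-word offset in any single path (constructor,
      destructor or return from alloc) makes that path write over the leading
      guard word, which is the class of bug the closing paragraph mentions.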
    • [PATCH] slab: remove spaces from /proc identifiers · 5bbb9ea6
      Andrew Morton authored
      From Manfred Spraul
      
      remove the space from the name of the DMA caches: they make it
      impossible to tune the caches through /proc/slabinfo, and make parsing
      /proc/slabinfo difficult
    • [PATCH] slab: take the spinlock in the drain function. · fa652753
      Andrew Morton authored
      In 2.5, local_irq_disable() provides protection against
      smp_call_function() on all architectures.  (Or it will, not sure.  But
      davem says this is OK).
      
      So a spin_lock() within the smp_call_function() callback is now
      permitted, and we can remove/cleanup the workaround.
    • [PATCH] slab: reduce internal fragmentation · 69e74939
      Andrew Morton authored
      From Manfred Spraul
      
      If an object is freed from a slab, then move the slab to the tail of
      the partial list - this should increase the probability that the other
      objects from the same page are freed, too, and that a page can be
      returned to gfp later.
      
      In other words: if we just freed an object from this page then make
      this page be the *last* page which is eligible for new allocations.
      Under the assumption that other objects in that same page are about to
      be freed up as well.
      
      The cpu arrays are now always in front of the list, i.e.  cache hit
      rates should not matter.
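      
      A small sketch of the "move to the tail on free" policy, assuming
      allocation always takes the first slab on the partial list.  The list
      helpers and struct names are hypothetical stand-ins, and the sketch
      ignores the full and free lists (a slab whose inuse count reaches zero
      would really move elsewhere).

```c
#include <assert.h>
#include <stddef.h>

struct list_node_sketch {
	struct list_node_sketch *prev, *next;
};

void list_init_sketch(struct list_node_sketch *h)
{
	h->prev = h->next = h;
}

void list_del_sketch(struct list_node_sketch *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

void list_add_tail_sketch(struct list_node_sketch *n,
			  struct list_node_sketch *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

struct slab_sketch {
	struct list_node_sketch list;
	unsigned int inuse;		/* objects currently allocated */
};

/* Freeing an object makes its slab the last candidate for new allocs. */
void slab_free_one_sketch(struct list_node_sketch *partial,
			  struct slab_sketch *s)
{
	assert(s->inuse > 0);
	s->inuse--;
	list_del_sketch(&s->list);
	list_add_tail_sketch(&s->list, partial);
}

/* Allocation scans from the head, so the tail is tried last. */
struct slab_sketch *next_alloc_slab_sketch(struct list_node_sketch *partial)
{
	if (partial->next == partial)
		return NULL;
	return (struct slab_sketch *)((char *)partial->next -
				      offsetof(struct slab_sketch, list));
}
```

      After one free from the slab at the head, the next allocation in this
      sketch is served from a different slab, giving the just-touched slab
      time to drain completely.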
    • [PATCH] slab: enable the cpu arrays on uniprocessor · 23797198
      Andrew Morton authored
      From Manfred Spraul
      
      Always enable the cpu arrays, even on uniprocessor.
      
      They provide LIFO ordering, which should improve cache hit rates.  And
      the array allocator is slightly faster than the list operations.
    • [PATCH] slab: cleanup: rename static functions · 91767dfd
      Andrew Morton authored
      From Manfred Spraul
      
      remove kmem_ from all static functions that are only used in slab.c.
      Except kmem_cache_slabmgmt, I've renamed it to alloc_slabmgmt().
    • [PATCH] slab: add_timer_on: add a timer on a particular CPU · 22331dad
      Andrew Morton authored
      add_timer_on is like add_timer, except it takes a target CPU on which
      to add the timer.
      
      The slab code needs per-cpu timers for shrinking the per-cpu caches.
    • [PATCH] slab: extended cpu notifiers · 706489d8
      Andrew Morton authored
      Patch from Dipankar Sarma  <dipankar@in.ibm.com>
      
      This is Manfred's patch which provides a CPU_UP_PREPARE cpu notifier to
      allow initialization of per_cpu data just before the cpu becomes fully
      functional.
      
      It also provides a facility for the CPU_UP_PREPARE handler to return
      NOTIFY_BAD to signify that the CPU is not permitted to come up.  If
      that happens, a CPU_UP_CANCELLED message is passed to all the handlers.
      
      The patch also fixes a bogus NOTIFY_BAD return from the softirq setup
      code.
      
      Patch has been acked by Rusty.
      
      We need this mechanism in slab for starting per-cpu timers and for
      allocating the per-cpu slab head arrays *before* the CPU has come up
      and started using slab.
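      
      The veto-and-cancel flow described above might look like this in
      outline.  It is a simplified model: the enum values, the handler
      signature and the array-based chain are assumptions for illustration
      and do not reproduce the kernel's notifier API.

```c
#include <assert.h>
#include <stddef.h>

enum { CPU_UP_PREPARE_SKETCH, CPU_ONLINE_SKETCH, CPU_UP_CANCELLED_SKETCH };
enum { NOTIFY_OK_SKETCH, NOTIFY_BAD_SKETCH };

typedef int (*cpu_notifier_sketch)(int action, int cpu);

/* Returns 0 if the CPU may come up, -1 if a PREPARE handler vetoed it. */
int cpu_up_sketch(const cpu_notifier_sketch *chain, size_t n, int cpu)
{
	size_t i;

	assert(chain != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (chain[i](CPU_UP_PREPARE_SKETCH, cpu) == NOTIFY_BAD_SKETCH) {
			/* Tell every handler, even ones that never saw PREPARE. */
			for (i = 0; i < n; i++)
				chain[i](CPU_UP_CANCELLED_SKETCH, cpu);
			return -1;
		}
	}
	for (i = 0; i < n; i++)
		chain[i](CPU_ONLINE_SKETCH, cpu);
	return 0;
}

int head_arrays_sketch;		/* head arrays currently allocated */

/* Allocates on PREPARE and undoes that on CANCELLED, like slab would. */
int slab_like_handler_sketch(int action, int cpu)
{
	(void)cpu;
	if (action == CPU_UP_PREPARE_SKETCH)
		head_arrays_sketch++;
	else if (action == CPU_UP_CANCELLED_SKETCH && head_arrays_sketch > 0)
		head_arrays_sketch--;
	return NOTIFY_OK_SKETCH;
}

/* Refuses every CPU, to exercise the cancel path. */
int veto_handler_sketch(int action, int cpu)
{
	(void)cpu;
	return action == CPU_UP_PREPARE_SKETCH ? NOTIFY_BAD_SKETCH
					       : NOTIFY_OK_SKETCH;
}
```

      Since the cancel message reaches all handlers, a handler has to
      tolerate being cancelled without having prepared, which is why the
      slab-like handler above checks its counter before decrementing.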
    • [PATCH] misc PA updates · c51fcfae
      Matthew Wilcox authored
       - Remove obsolete documentation
       - Update arch/parisc/lib
       - Remove arch/parisc/tools, we use asm-offsets.c these days
       - Update arch/parisc/Makefile, defconfig & vmlinux.lds.S
    • [PATCH] parisc64 · ff3f38bc
      Matthew Wilcox authored
      Add support for the parisc64 architecture.
    • [PATCH] perf monitor for PA-RISC · 23d66173
      Matthew Wilcox authored
      Performance monitor support for PA8000+ processors.
    • [PATCH] arch/parisc/kernel · 82430821
      Matthew Wilcox authored
      Update arch/parisc/kernel.
    • [PATCH] arch/parisc/mm · 6b3efc2a
      Matthew Wilcox authored
      Update arch/parisc/mm
    • [PATCH] include/asm-parisc · 1e0b058c
      Matthew Wilcox authored
      Update include/asm-parisc
    • [PATCH] PA-RISC math emu · db299c0d
      Matthew Wilcox authored
      Add support for unimplemented FP ops on PA processors.
    • [PATCH] sonypi driver update · af4d0bf6
      Stelian Pop authored
      This patch adds some new events to the sonypi driver (Fn key
      pressed alone, jogdial turned fast or very fast) and cleans up
      the code a little bit.
      
      Thanks to Christian Gennerat for this contribution.
    • Merge penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/epoll-0.15 · 0fca8365
      Linus Torvalds authored
      into penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/linux
    • [PATCH] sys_epoll 0.15 · f751cfc0
      Davide Libenzi authored
      Latest version of the epoll interfaces.
    • Merge bk://ldm.bkbits.net/linux-2.5-kobject · ead7001e
      Linus Torvalds authored
      into penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/linux
    • Merge osdl.org:/home/mochel/src/kernel/devel/linux-2.5-virgin · b835de74
      Patrick Mochel authored
      into osdl.org:/home/mochel/src/kernel/devel/linux-2.5-kobject
    • kobjects: add array of default attributes to subsystems, and create on registration. · c3f575f0
      Patrick Mochel authored
      struct subsystem may now contain a pointer to a NULL-terminated array of 
      default attributes to be exported when an object is registered with the subsystem.
      kobject registration will check the return values of the directory creation and 
      the creation of each file, and handle it appropriately. 
      
      The documentation has also been updated.
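      
      Walking a NULL-terminated array of default attributes and checking each
      file creation could be sketched as below.  The struct layouts, the
      callback type and the stop-at-first-error policy are assumptions; the
      commit only says that return values are checked and handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct attribute_sketch {
	const char *name;
};

struct subsystem_sketch {
	struct attribute_sketch **default_attrs;	/* NULL-terminated, may be NULL */
};

typedef int (*create_file_sketch)(const struct attribute_sketch *attr);

/* Returns 0 on success, or the first error from `create`. */
int populate_dir_sketch(const struct subsystem_sketch *s,
			create_file_sketch create, int *created)
{
	struct attribute_sketch **a;

	assert(s != NULL && create != NULL && created != NULL);
	*created = 0;
	if (s->default_attrs == NULL)
		return 0;
	for (a = s->default_attrs; *a != NULL; a++) {
		int err = create(*a);

		if (err)
			return err;	/* caller decides how to clean up */
		(*created)++;
	}
	return 0;
}

/* Demo callbacks: one always succeeds, one rejects the "size" file. */
int create_ok_sketch(const struct attribute_sketch *attr)
{
	(void)attr;
	return 0;
}

int create_fail_on_size_sketch(const struct attribute_sketch *attr)
{
	return strcmp(attr->name, "size") == 0 ? -1 : 0;
}
```

      The `created` count lets a caller remove exactly the files that were
      made before the failure.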
    • sysfs: kill struct sysfs_dir. · 332ad69d
      Patrick Mochel authored
      Previously, sysfs read() and write() calls looked for sysfs_ops in the struct
      sysfs_dir, in the kobject. Since an object belongs to a subsystem and is a member
      of a group of like devices, the sysfs_ops have been moved to struct subsystem,
      and are referenced from there.
      
      The only remaining member of struct sysfs_dir is the dentry of the object's 
      directory. That is moved out of the dir struct and directly into struct kobject.
      That saves us 4 bytes/object.
      
      All of the sysfs functions that referenced the struct have been changed to just
      reference the dentry.
    • Merge penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/kconfig · 6b668e88
      Linus Torvalds authored
      into penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/linux