1. 16 Sep, 2002 12 commits
    • [PATCH] fix elevator_linus accounting · 88c42974
      Jens Axboe authored
      elevator_linus is seriously broken wrt accounting. Marcelo recently took
      the patch to fix it in 2.4.20-pre, here's the 2.5 equiv.
      
      Right now, we account merges as costly and seeks as not. Only thing that
      prevents seek starvation is the aging scan. That is broken, very much
      so. This patch fixes that to account merges and inserts differently. A
      seek is ELV_LINUS_SEEK_COST times more costly than a merge; currently that
      define is at '16'. Doing the math on a disk, this sort of makes sense.
      
      Defaults are read latency of 1024, which means 1024 merges or 64 seeks.
      Writes are double that.
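
      The arithmetic behind those defaults can be sketched as below. This is
      only an illustration of the budget math, not the elevator code itself:
      the helper names (ops_within_budget, charge) and the READ_LATENCY and
      WRITE_LATENCY macros are hypothetical; only ELV_LINUS_SEEK_COST and the
      values 16 and 1024 come from the message above.

      ```c
      #include <assert.h>

      #define ELV_LINUS_SEEK_COST 16            /* value quoted in the message */
      #define READ_LATENCY        1024          /* default read budget (from the message) */
      #define WRITE_LATENCY       (2 * READ_LATENCY) /* "writes are double that" */

      /* Hypothetical helper: how many operations of a given cost fit in a budget. */
      static int ops_within_budget(int budget, int cost)
      {
          assert(cost > 0);
          return budget / cost;
      }

      /* Hypothetical helper: charge one request against a latency budget.
       * A merge costs 1, a seek (insert) costs ELV_LINUS_SEEK_COST. */
      static int charge(int budget, int is_seek)
      {
          return budget - (is_seek ? ELV_LINUS_SEEK_COST : 1);
      }
      ```

      With these numbers a read budget covers 1024 merges or 64 seeks, and a
      write budget covers twice as many of each.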
    • [PATCH] limit size of bio_vec pools · 33ddb687
      Jens Axboe authored
      We are currently wasting ~2MiB on the bio pools. This is ok on systems
      with plenty of ram, but it's too much for a 16mb system for instance.
      
      This patch scales the bio_vec mempool sizes a bit. The logic is mainly:
      
      +       megabytes = nr_free_pages() >> (20 - PAGE_SHIFT);
      +       if (megabytes <= 16)
      +               scale = 0;
      +       else if (megabytes <= 32)
      +               scale = 1;
      +       else if (megabytes <= 64)
      +               scale = 2;
      +       else if (megabytes <= 96)
      +               scale = 3;
      +       else if (megabytes <= 128)
      +               scale = 4;
      
      and then for mempool setup:
      
      +               if (i >= scale)
      +                       pool_entries >>= 1;
      +
      +               bp->pool = mempool_create(pool_entries, slab_pool_alloc,
                                              slab_pool_free, bp->slab);
      
      So we allocate fewer and fewer entries for the bigger sized pools. It
      doesn't make too much sense to fill the memory with sg tables for 256
      page entries on a 16mb system.
      
      In addition, we select a starting nr_pool_entries point, based on amount
      of ram as well:
      
      +       pool_entries = megabytes * 2;
      +       if (pool_entries > 256)
      +               pool_entries = 256;
      
      The end-result is that on a 128mb system, it looks like:
      
      BIO: pool of 256 setup, 14Kb (56 bytes/bio)
      biovec pool[0]:   1 bvecs: 244 entries (12 bytes)
      biovec pool[1]:   4 bvecs: 244 entries (48 bytes)
      biovec pool[2]:  16 bvecs: 244 entries (192 bytes)
      biovec pool[3]:  64 bvecs: 244 entries (768 bytes)
      biovec pool[4]: 128 bvecs: 122 entries (1536 bytes)
      biovec pool[5]: 256 bvecs:  61 entries (3072 bytes)
      
      ie a total of ~620KiB used. Booting with mem=32m gives us:
      
      BIO: pool of 256 setup, 14Kb (56 bytes/bio)
      biovec pool[0]:   1 bvecs:  56 entries (12 bytes)
      biovec pool[1]:   4 bvecs:  28 entries (48 bytes)
      biovec pool[2]:  16 bvecs:  14 entries (192 bytes)
      biovec pool[3]:  64 bvecs:   7 entries (768 bytes)
      biovec pool[4]: 128 bvecs:   3 entries (1536 bytes)
      biovec pool[5]: 256 bvecs:   1 entries (3072 bytes)
      
      ie a total of ~31KiB. Booting with 512mb makes it:
      
      BIO: pool of 256 setup, 14Kb (56 bytes/bio)
      biovec pool[0]:   1 bvecs: 256 entries (12 bytes)
      biovec pool[1]:   4 bvecs: 256 entries (48 bytes)
      biovec pool[2]:  16 bvecs: 256 entries (192 bytes)
      biovec pool[3]:  64 bvecs: 256 entries (768 bytes)
      biovec pool[4]: 128 bvecs: 256 entries (1536 bytes)
      biovec pool[5]: 256 bvecs: 256 entries (3072 bytes)
      
      which is the same as before. The cut-off point is somewhere a bit over
      256mb. Andrew suggested we may want to 'cheat' a bit here, and leave the
      busy pools alone. We know that mpage is going to be heavy on the 16
      entry pool, so it might make sense to make such a pool and not scale
      that. We can deal with that later, though.
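
      The sizing logic quoted above can be replayed in userspace to check the
      boot logs. This is a sketch under assumptions, not the kernel code:
      size_pools, pick_scale, total_bytes and NR_POOLS are hypothetical names,
      and the default scale for systems above 128 MB is assumed to be "no
      pool is halved", which is what the 512mb log suggests. The 'megabytes'
      argument stands for free memory at init time (122 and 28 reproduce the
      128mb and 32mb logs).

      ```c
      #include <assert.h>
      #include <stddef.h>

      #define NR_POOLS 6 /* pools of 1, 4, 16, 64, 128 and 256 bvecs */

      /* Per-entry sizes in bytes, taken from the boot logs above. */
      static const size_t pool_bytes[NR_POOLS] = {12, 48, 192, 768, 1536, 3072};

      /* First pool index that gets halved, following the quoted if/else chain. */
      static int pick_scale(long megabytes)
      {
          if (megabytes <= 16)  return 0;
          if (megabytes <= 32)  return 1;
          if (megabytes <= 64)  return 2;
          if (megabytes <= 96)  return 3;
          if (megabytes <= 128) return 4;
          return NR_POOLS; /* assumption: larger systems are not scaled */
      }

      /* Fill entries[] with the number of mempool entries per pool. */
      static void size_pools(long megabytes, long entries[NR_POOLS])
      {
          long pool_entries = megabytes * 2;
          int scale = pick_scale(megabytes);

          assert(megabytes >= 0);
          if (pool_entries > 256)
              pool_entries = 256;

          for (int i = 0; i < NR_POOLS; i++) {
              if (i >= scale)
                  pool_entries >>= 1; /* halve from pool 'scale' onwards */
              entries[i] = pool_entries;
          }
      }

      /* Memory used by the bio_vec pools only (the bio pool is not included). */
      static size_t total_bytes(const long entries[NR_POOLS])
      {
          size_t sum = 0;
          for (int i = 0; i < NR_POOLS; i++)
              sum += (size_t)entries[i] * pool_bytes[i];
          return sum;
      }
      ```

      For 122 free megabytes this gives 244/244/244/244/122/61 entries, and
      for 28 it gives 56/28/14/7/3/1, matching the logs. The quoted totals
      appear to include the 14Kb bio pool on top of the bio_vec pools.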
    • ide.h needs to include pci.h · e80bc959
      Jens Axboe authored
    • Jens Axboe's avatar
      04968341
    • Update promise drivers to new ide pci init scheme, remove now unused · c8b74f4b
      Jens Axboe authored
      old pdc202xx.c
    • New IDE pci low level driver setup scheme · aa509d0d
      Jens Axboe authored
    • Missing module_init() · a40bed1d
      Jens Axboe authored
    • Move pio setup and blacklists to ide-lib · 84fb4308
      Jens Axboe authored
    • Missing exports · 7526c9af
      Jens Axboe authored
    • Make sure ide init happens in the right order · 65fce515
      Jens Axboe authored
    • Cleanup Config.in, and remove unused options · 409d51dc
      Jens Axboe authored
  2. 15 Sep, 2002 14 commits
    • [PATCH] Remove CONFIG_SMP around wait_task_inactive() · 6865038a
      David Gibson authored
      Linus, please apply.  This defines wait_task_inactive() to be a no-op
      on UP machines, and removes the #ifdef CONFIG_SMP which surrounds
      current calls.
      
      This also fixes compile on UP which was broken by the addition of a
      call to wait_task_inactive in fs/exec.c which was not protected by an
      #ifdef.
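
      The pattern described here can be sketched as follows. This is not the
      patch itself: the struct layout, the on_cpu field, the busy-wait body
      and the exec_path caller are hypothetical stand-ins. The point is only
      that the #ifdef lives at the definition, so callers need none.

      ```c
      #include <assert.h>

      /* Hypothetical stand-in for the kernel's task structure. */
      struct task_struct {
          int on_cpu;
      };

      #ifdef CONFIG_SMP
      /* SMP: wait until the task is no longer running on another CPU.
       * Stand-in body; the real implementation differs. */
      static inline void wait_task_inactive(struct task_struct *p)
      {
          assert(p);
          while (p->on_cpu)
              ;
      }
      #else
      /* UP: nothing else can be running the task, so this is a no-op. */
      #define wait_task_inactive(p) do { (void)(p); } while (0)
      #endif

      /* Hypothetical caller: compiles on both UP and SMP without an #ifdef. */
      static int exec_path(struct task_struct *p)
      {
          wait_task_inactive(p);
          return 0;
      }
      ```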
    • [PATCH] various small cleanups · 16b38746
      Andrew Morton authored
      - Remove defunct active_list/inactive_list declarations (wli)
      
      - Update an obsolete comment (wli)
      
      - "mm/slab.c contains one leftover from the initial version with
        'unsigned short' bufctl entries.  The attached patch replaces '2'
        with the correct sizeof [which is now 4]" - Manfred Spraul
      
      - BUG checks for vfree/vunmap being called in interrupt context
        (because they take irq-unsafe spinlocks, I guess?) - davej
      
      - Simplify some coding in one_highpage_init() (Christoph Hellwig).
    • [PATCH] add dump_stack(): cross-arch backtrace · 5868caf6
      Andrew Morton authored
      From Christoph Hellwig, also present in 2.4.
      
      Create an arch-independent `dump_stack()' function.  So we don't need to do
      
      #ifdef CONFIG_X86
      	show_stack(0);		/* No prototype in scope! */
      #endif
      
      any more.
      
      The whole dump_stack() implementation is delegated to the architecture.
      If it doesn't provide one, there is a default do-nothing library
      function.
    • [PATCH] clean up the TLB takedown code, remove debug · 5045fffe
      Andrew Morton authored
      - Remove the temp /proc/meminfo stats
      
      - Make the mmu_gather_t be 2048 bytes again
      
      - Removed unused variable (Oleg Nesterov)
    • [PATCH] fix a bogus OOM condition for __GFP_NOFS allocations · 483a40e4
      Andrew Morton authored
      If a GFP_NOFS allocation is made when the ZONE_NORMAL inactive list is
      full of dirty or under-writeback pages, there is nothing the caller can
      do to force some page reclaim.  The caller ends up getting oom-killed.
      
      - In mempool_alloc(), don't try to perform page reclaim again.  Just
        go to sleep and wait for some elements to be returned to the pool.
      
      - In try_to_free_pages(): perform a single, short scan of the LRU and
        if that doesn't work, fail the allocation.  GFP_NOFS allocators know
        how to handle that.
    • [PATCH] hold the page ref across ->readpage · f3b3dc81
      Andrew Morton authored
      read_pages() is dropping the page refcount before running ->readpage().
      Which just happens to work, because the page is in pagecache and
      locked.
      
      But it breaks under some unconventional things which reiser4 is doing,
      and it's better/safer/saner this way anyway.
    • [PATCH] ext3 ceanup: use EXT3_SB · db748675
      Andrew Morton authored
      Patch from Jani Monoses <jani@iv.ro>
      
      "This turns the remaining parts of ext3 to EXT3_SB and turns the
       latter from a macro to inline function which returns the generic_sbp
       field of u.
      
       linux/fs.h is not touched by this patch though.
      
       Intermezzo's three uses of ext3_sb are also not changed."
    • [PATCH] add /proc/meminfo:Mapped · 73960360
      Andrew Morton authored
      The patch adds a "Mapped" field to /proc/meminfo - the amount of memory
      which is mapped into pagetables.
      
      This is a useful statistic to monitor when testing and observing the
      virtual memory system.
    • [PATCH] fix reverse map accounting leak · 05d9bac3
      Andrew Morton authored
      From Hugh Dickins.  Fix a leak in the /proc/meminfo:ReverseMaps
      accounting.
    • [PATCH] hugetlb pages · c9d3808f
      Andrew Morton authored
      Rohit Seth's ia32 huge tlb pages patch.
      
      Anton Blanchard took a look at this today; he seemed happy
      with it and said he could borrow bits.
    • [PATCH] resurrect /proc/meminfo:Buffers · fca174cc
      Andrew Morton authored
      The /proc/meminfo:Buffers statistic is quite useful - it tells us
      how effective we are being at caching filesystem metadata.
      
      For example, increases in this figure are a measure of success of the
      slablru and buffer_head-limitation patches.
      
      The patch resurrects buffermem accounting.  The metric is calculated
      on-demand, via a walk of the blockdev hashtable.
    • [PATCH] low-latency zap_page_range · e572ef2e
      Andrew Morton authored
      zap_page_range and truncate are the two main latency problems
      in the VM/VFS.  The radix-tree-based truncate grinds that into
      the dust, but no algorithmic fixes for pagetable takedown have
      presented themselves...
      
      Patch from Robert Love.
      
      Attached patch implements a low latency version of "zap_page_range()".
      
      Calls with even moderately large page ranges result in very long lock
      held times and consequently very long periods of non-preemptibility.
      This function is in my list of the top 3 worst offenders.  It is gross.
      
      This new version reimplements zap_page_range() as a loop over
      ZAP_BLOCK_SIZE chunks.  After each iteration, if a reschedule is
      pending, we drop page_table_lock and automagically preempt.  Note we can
      not blindly drop the locks and reschedule (e.g. for the non-preempt
      case) since there is a possibility to enter this codepath holding other
      locks.
      
      ... I am sure you are familiar with all this, it's the same deal as your
      low-latency work.  This patch implements the "cond_resched_lock()" as we
      discussed sometime back.  I think this solution should be acceptable to
      you and Linus.
      
      There are other misc. cleanups, too.
      
      This new zap_page_range() yields latency too-low-to-benchmark: <<1ms.
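
      The chunked loop can be sketched in userspace as below. It only models
      the control flow: the ZAP_BLOCK_SIZE value, the zap_stats struct and
      the two stub functions are assumptions for illustration, and the
      comment marks where the real code would hold page_table_lock and call
      cond_resched_lock().

      ```c
      #include <assert.h>

      #define ZAP_BLOCK_SIZE (256UL * 4096UL) /* assumed chunk size, not from the patch */

      /* Hypothetical bookkeeping so the loop's behaviour can be observed. */
      struct zap_stats {
          unsigned long chunks;         /* number of chunks unmapped */
          unsigned long resched_points; /* times we would have dropped the lock */
      };

      /* Stub: pretend a reschedule is always pending. */
      static int need_resched_stub(void)
      {
          return 1;
      }

      /* Stub: stands in for the per-chunk page table teardown. */
      static void zap_chunk(unsigned long start, unsigned long len)
      {
          (void)start;
          (void)len;
      }

      static struct zap_stats zap_range_chunked(unsigned long address,
                                                unsigned long size)
      {
          struct zap_stats st = {0, 0};
          unsigned long end = address + size;

          assert(end >= address); /* reject ranges that wrap around */

          while (address < end) {
              unsigned long block = end - address;

              if (block > ZAP_BLOCK_SIZE)
                  block = ZAP_BLOCK_SIZE;

              zap_chunk(address, block); /* done with the lock held */
              st.chunks++;
              address += block;

              /* After each iteration: where cond_resched_lock() would drop
               * the lock and reschedule if a reschedule is pending. */
              if (need_resched_stub())
                  st.resched_points++;
          }
          return st;
      }
      ```

      The lock hold time is then bounded by the work for one chunk rather
      than by the size of the whole range.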
    • Linux v2.5.35 · 697f3abe
      Linus Torvalds authored
    • Merge bk://ppc.bkbits.net/for-linus-ppc · 11a5dbb4
      Linus Torvalds authored
      into home.transmeta.com:/home/torvalds/v2.5/linux
  3. 16 Sep, 2002 10 commits
  4. 15 Sep, 2002 4 commits
    • [PATCH] thread exec fix, BK-curr · 71ee22d3
      Ingo Molnar authored
      The broadcast SIGKILL kept pending in the new thread as well, and killed
      it prematurely ...
    • Linus Torvalds's avatar
      9325c684
    • [PATCH] thread-exec-2.5.34-B1, BK-curr · 63540cea
      Ingo Molnar authored
      This implements one of the last missing POSIX threading details - exec()
      semantics.  Previous kernels had code that tried to handle it, but that
      code had a number of disadvantages:
      
       - it only worked if the exec()-ing thread was the thread group leader,
         creating an asymmetry. This does not work if the thread group leader
         has exited already.
      
       - it was racy: it sent a SIGKILL to every thread in the group but did not
         wait for them to actually process the SIGKILL. It did a yield() but
         that is not enough. All 'other' threads have to finish processing
         before we can continue with the exec().
      
      This adds the same logic, but extended with the following enhancements:
      
       - works from non-leader threads just as much as the thread group leader.
      
       - waits for all other threads to exit before continuing with the exec().
      
       - reuses the PID of the group.
      
      It would perhaps be a more generic approach to add a new syscall,
      sys_ungroup() - which would do largely what de_thread() does in this
      patch.
      
      But it's not really needed now - posix_spawn() is currently implemented
      via starting a non-CLONE_THREAD helper thread that does a sys_exec().
      There's no API currently that needs a direct exec() from a thread - but
      it could be created (such as pthread_exec_np()).  It would have the
      advantage of not having to go through a helper thread, but the
      difference is minimal.
    • [PATCH] exit-fix-2.5.34-C0, BK-curr · 7cd0a691
      Ingo Molnar authored
      This fixes one more exit-time resource accounting issue - and it's also
      a speedup and a thread-tree (to-be thread-aware pstree) visual
      improvement.
      
      In the current code we reparent detached threads to the init thread.
      This worked but was not very nice in ps output: threads showed up as
      being related to init.  There was also a resource-accounting issue, upon
      exit they update their parent's (ie.  init's) rusage fields -
      effectively losing these statistics.  Eg.  'time' under-reports CPU
      usage if the threaded app is Ctrl-C-ed prematurely.
      
      The solution is to reparent threads to the group leader - this is now
      very easy since we have p->group_leader cached and it's also valid all
      the time.  It's also somewhat faster for applications that use
      CLONE_THREAD but do not use the CLONE_DETACHED feature.