- 16 Sep, 2002 12 commits
-
-
Jens Axboe authored
elevator_linus is seriously broken wrt accounting. Marcelo recently took the patch to fix it in 2.4.20-pre; here's the 2.5 equiv.

Right now, we account merges as costly and seeks as not. The only thing that prevents seek starvation is the aging scan, and that is broken, very much so. This patch fixes that to account merges and inserts differently. A seek is ELV_LINUS_SEEK_COST times more costly than a merge; currently that define is at '16'. Doing the math on a disk, this sort of makes sense. Defaults are a read latency of 1024, which means 1024 merges or 64 seeks. Writes are double that.
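The math above, as a compilable toy (ELV_LINUS_SEEK_COST is the define named in the commit; the two latency macros are stand-ins for the described defaults):

    #include <stdio.h>

    #define ELV_LINUS_SEEK_COST 16                  /* one seek costs 16 merges */
    #define READ_LATENCY        1024                /* default budget, in merge units */
    #define WRITE_LATENCY       (2 * READ_LATENCY)  /* writes are double */

    int main(void)
    {
        printf("read budget:  %d merges or %d seeks\n",
               READ_LATENCY, READ_LATENCY / ELV_LINUS_SEEK_COST);
        printf("write budget: %d merges or %d seeks\n",
               WRITE_LATENCY, WRITE_LATENCY / ELV_LINUS_SEEK_COST);
        return 0;
    }

which prints the 1024-merges-or-64-seeks read budget described above.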
-
Jens Axboe authored
We are currently wasting ~2MiB on the bio pools. This is ok on systems with plenty of ram, but it's too much for a 16mb system, for instance. This patch scales the bio_vec mempool sizes a bit. The logic is mainly:

+	megabytes = nr_free_pages() >> (20 - PAGE_SHIFT);
+	if (megabytes <= 16)
+		scale = 0;
+	else if (megabytes <= 32)
+		scale = 1;
+	else if (megabytes <= 64)
+		scale = 2;
+	else if (megabytes <= 96)
+		scale = 3;
+	else if (megabytes <= 128)
+		scale = 4;

and then for mempool setup:

+	if (i >= scale)
+		pool_entries >>= 1;
+
+	bp->pool = mempool_create(pool_entries, slab_pool_alloc, slab_pool_free, bp->slab);

So we allocate fewer and fewer entries for the bigger sized pools. It doesn't make too much sense to fill the memory with sg tables for 256 page entries on a 16mb system. In addition, we select a starting nr_pool_entries point based on the amount of ram as well:

+	pool_entries = megabytes * 2;
+	if (pool_entries > 256)
+		pool_entries = 256;

The end result is that on a 128mb system, it looks like:

BIO: pool of 256 setup, 14Kb (56 bytes/bio)
biovec pool[0]: 1 bvecs: 244 entries (12 bytes)
biovec pool[1]: 4 bvecs: 244 entries (48 bytes)
biovec pool[2]: 16 bvecs: 244 entries (192 bytes)
biovec pool[3]: 64 bvecs: 244 entries (768 bytes)
biovec pool[4]: 128 bvecs: 122 entries (1536 bytes)
biovec pool[5]: 256 bvecs: 61 entries (3072 bytes)

ie a total of ~620KiB used. Booting with mem=32m gives us:

BIO: pool of 256 setup, 14Kb (56 bytes/bio)
biovec pool[0]: 1 bvecs: 56 entries (12 bytes)
biovec pool[1]: 4 bvecs: 28 entries (48 bytes)
biovec pool[2]: 16 bvecs: 14 entries (192 bytes)
biovec pool[3]: 64 bvecs: 7 entries (768 bytes)
biovec pool[4]: 128 bvecs: 3 entries (1536 bytes)
biovec pool[5]: 256 bvecs: 1 entries (3072 bytes)

ie a total of ~31KiB. Booting with 512mb makes it:

BIO: pool of 256 setup, 14Kb (56 bytes/bio)
biovec pool[0]: 1 bvecs: 256 entries (12 bytes)
biovec pool[1]: 4 bvecs: 256 entries (48 bytes)
biovec pool[2]: 16 bvecs: 256 entries (192 bytes)
biovec pool[3]: 64 bvecs: 256 entries (768 bytes)
biovec pool[4]: 128 bvecs: 256 entries (1536 bytes)
biovec pool[5]: 256 bvecs: 256 entries (3072 bytes)

which is the same as before. The cut-off point is somewhere a bit over 256mb. Andrew suggested we may want to 'cheat' a bit here and leave the busy pools alone. We know that mpage is going to be heavy on the 16-entry pool, so it might make sense to make such a pool and not scale that. We can deal with that later, though.
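As a worked check of the scaling: assuming roughly 28 free megabytes at boot on the mem=32m system (the exact figure depends on early reservations), pool_entries starts at 28 * 2 = 56 and scale = 1, so every pool from index 1 on halves the count. This toy reproduces the mem=32m figures shown above:

    #include <stdio.h>

    int main(void)
    {
        int megabytes = 28;                 /* assumed free mb at boot */
        int scale = 1;                      /* 16 < megabytes <= 32 */
        int pool_entries = megabytes * 2;   /* 56, below the 256 cap */

        for (int i = 0; i < 6; i++) {
            if (i >= scale)
                pool_entries >>= 1;
            printf("biovec pool[%d]: %d entries\n", i, pool_entries);
        }
        return 0;
    }

which prints 56, 28, 14, 7, 3, 1 - matching the boot output.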
-
Jens Axboe authored
-
Jens Axboe authored
-
Jens Axboe authored
-
Jens Axboe authored
old pdc202xx.c
-
Jens Axboe authored
-
Jens Axboe authored
-
Jens Axboe authored
-
Jens Axboe authored
-
Jens Axboe authored
-
Jens Axboe authored
-
- 15 Sep, 2002 14 commits
-
-
David Gibson authored
Linus, please apply. This defines wait_task_inactive() to be a no-op on UP machines and removes the #ifdef CONFIG_SMP guards around its current call sites. It also fixes the UP compile, which was broken by a call to wait_task_inactive() added in fs/exec.c without an #ifdef.
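A minimal compilable sketch of the pattern (the function name is from the commit; the scaffolding around it is invented):

    #include <stdio.h>

    #ifdef CONFIG_SMP
    void wait_task_inactive(struct task_struct *p);
    #else
    #define wait_task_inactive(p)   do { } while (0)   /* UP: no-op */
    #endif

    int main(void)
    {
        /* Callers no longer need their own #ifdef CONFIG_SMP guards:
         * on UP the call compiles away entirely. */
        wait_task_inactive(NULL);
        printf("UP build: wait_task_inactive() compiled to nothing\n");
        return 0;
    }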
-
Andrew Morton authored
- Remove defunct active_list/inactive_list declarations (wli)

- Update an obsolete comment (wli)

- "mm/slab.c contains one leftover from the initial version with 'unsigned short' bufctl entries. The attached patch replaces '2' with the correct sizeof [which is now 4]" - Manfred Spraul

- BUG checks for vfree/vunmap being called in interrupt context (because they take irq-unsafe spinlocks, I guess?) - davej

- Simplify some coding in one_highpage_init() (Christoph Hellwig).
-
Andrew Morton authored
From Christoph Hellwig, also present in 2.4.

Create an arch-independent `dump_stack()' function. So we don't need to do

#ifdef CONFIG_X86
	show_stack(0);		/* No prototype in scope! */
#endif

any more. The whole dump_stack() implementation is delegated to the architecture. If it doesn't provide one, there is a default do-nothing library function.
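A user-space analogue of that arrangement (illustrative only: the kernel gets the override-or-default effect from library link semantics; a weak symbol is used here to the same end):

    #include <stdio.h>

    /* Default do-nothing dump_stack(); an "architecture" may override
     * it with a real backtrace implementation. */
    void __attribute__((weak)) dump_stack(void)
    {
        /* no backtrace support: do nothing */
    }

    int main(void)
    {
        /* Callers need no arch #ifdef guards any more. */
        dump_stack();
        printf("dump_stack() is always safe to call\n");
        return 0;
    }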
-
Andrew Morton authored
- Remove the temp /proc/meminfo stats

- Make the mmu_gather_t be 2048 bytes again

- Remove an unused variable (Oleg Nesterov)
-
Andrew Morton authored
If a GFP_NOFS allocation is made when the ZONE_NORMAL inactive list is full of dirty or under-writeback pages, there is nothing the caller can do to force some page reclaim. The caller ends up getting oom-killed.

- In mempool_alloc(), don't try to perform page reclaim again. Just go to sleep and wait for some elements to be returned to the pool.

- In try_to_free_pages(): perform a single, short scan of the LRU and if that doesn't work, fail the allocation. GFP_NOFS allocators know how to handle that.
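A user-space sketch of the mempool_alloc() behavior described above (all names here are invented and malloc() stands in for the page allocator; the real code lives in mm/mempool.c):

    #include <pthread.h>
    #include <stdlib.h>

    struct pool {
        pthread_mutex_t lock;
        pthread_cond_t  returned;
        void           *elements[64];
        int             curr_nr;
    };

    void *pool_alloc(struct pool *p)
    {
        void *elem = malloc(128);   /* stands in for the real allocator */
        if (elem)
            return elem;

        pthread_mutex_lock(&p->lock);
        while (p->curr_nr == 0)     /* no reclaim retry: just sleep */
            pthread_cond_wait(&p->returned, &p->lock);
        elem = p->elements[--p->curr_nr];
        pthread_mutex_unlock(&p->lock);
        return elem;
    }

    void pool_free(struct pool *p, void *elem)
    {
        pthread_mutex_lock(&p->lock);
        if (p->curr_nr < 64)
            p->elements[p->curr_nr++] = elem;   /* give it back */
        else
            free(elem);                         /* pool full */
        pthread_cond_signal(&p->returned);      /* wake one allocator */
        pthread_mutex_unlock(&p->lock);
    }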
-
Andrew Morton authored
read_pages() is dropping the page refcount before running ->readpage(). Which just happens to work, because the page is in pagecache and locked. But it breaks under some unconventional things which reiser4 is doing, and it's better/safer/saner this way anyway.
-
Andrew Morton authored
Patch from Jani Monoses <jani@iv.ro> "This converts the remaining parts of ext3 to EXT3_SB and turns the latter from a macro into an inline function which returns the generic_sbp field of u. linux/fs.h is not touched by this patch, though. Intermezzo's three uses of ext3_sb are also not changed."
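A minimal sketch of that conversion, with stand-in struct layouts (the 'before' macro body is assumed, not quoted from the patch; the real definitions live in ext3's headers and linux/fs.h):

    struct ext3_sb_info {
        unsigned long s_mount_opt;
        /* ... */
    };

    struct super_block {
        union {
            void *generic_sbp;  /* generic pointer for fs-private data */
        } u;
    };

    /* Before: a macro, e.g. #define EXT3_SB(sb) (&((sb)->u.ext3_sb))
     * After: a type-checked inline function returning u.generic_sbp. */
    static inline struct ext3_sb_info *EXT3_SB(struct super_block *sb)
    {
        return (struct ext3_sb_info *)sb->u.generic_sbp;
    }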
-
Andrew Morton authored
The patch adds a "Mapped" field to /proc/meminfo - the amount of memory which is mapped into pagetables. This is a useful statistic to monitor when testing and observing the virtual memory system.
-
Andrew Morton authored
From Hugh Dickins. Fix a leak in the /proc/meminfo:ReverseMaps accounting.
-
Andrew Morton authored
Rohit Seth's ia32 huge tlb pages patch. Anton Blanchard took a look at this today; he seemed happy with it and said he could borrow bits.
-
Andrew Morton authored
The /proc/meminfo:Buffers statistic is quite useful - it tells us how effective we are being at caching filesystem metadata. For example, increases in this figure are a measure of success of the slablru and buffer_head-limitation patches. The patch resurrects buffermem accounting. The metric is calculated on-demand, via a walk of the blockdev hashtable.
-
Andrew Morton authored
zap_page_range and truncate are the two main latency problems in the VM/VFS. The radix-tree-based truncate grinds that into the dust, but no algorithmic fixes for pagetable takedown have presented themselves...

Patch from Robert Love:

Attached patch implements a low latency version of "zap_page_range()". Calls with even moderately large page ranges result in very long lock held times and consequently very long periods of non-preemptibility. This function is in my list of the top 3 worst offenders. It is gross.

This new version reimplements zap_page_range() as a loop over ZAP_BLOCK_SIZE chunks. After each iteration, if a reschedule is pending, we drop page_table_lock and automagically preempt. Note we cannot blindly drop the locks and reschedule (e.g. for the non-preempt case) since there is a possibility to enter this codepath holding other locks. ... I am sure you are familiar with all this, it's the same deal as your low-latency work.

This patch implements the "cond_resched_lock()" as we discussed some time back. I think this solution should be acceptable to you and Linus. There are other misc. cleanups, too.

This new zap_page_range() yields latency too-low-to-benchmark: <<1ms.
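A user-space sketch of the chunked-loop pattern (only ZAP_BLOCK_SIZE is named in the patch; a mutex and sched_yield() stand in for page_table_lock and cond_resched_lock(), and the rest is invented):

    #include <pthread.h>
    #include <sched.h>

    #define ZAP_BLOCK_SIZE  (256 * 4096)    /* illustrative chunk size */

    static void zap_range_lowlatency(pthread_mutex_t *lock,
                                     unsigned long address, unsigned long size)
    {
        while (size) {
            unsigned long block = size > ZAP_BLOCK_SIZE ?
                                  ZAP_BLOCK_SIZE : size;

            pthread_mutex_lock(lock);
            /* ... tear down mappings in [address, address + block) ... */
            pthread_mutex_unlock(lock);     /* latency break point */
            sched_yield();                  /* let a pending resched run */

            address += block;
            size -= block;
        }
    }

    int main(void)
    {
        pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
        zap_range_lowlatency(&lock, 0, 10 * ZAP_BLOCK_SIZE + 123);
        return 0;
    }

The key point is that lock hold time is now bounded by one chunk's work, no matter how large the range.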
-
Linus Torvalds authored
-
Linus Torvalds authored
Merge bk://ppc.bkbits.net/for-linus-ppc
into home.transmeta.com:/home/torvalds/v2.5/linux
-
- 16 Sep, 2002 10 commits
-
-
Paul Mackerras authored
-
Paul Mackerras authored
This gets rid of ide_request/free_irq, ide_get/release_lock, ide_check/request/release_region etc.
-
Paul Mackerras authored
-
Paul Mackerras authored
There is a perfectly good one in drivers/ide/ide-iops.c now.
-
Paul Mackerras authored
and add exit_group to the syscall table.
-
Paul Mackerras authored
-
Paul Mackerras authored
-
Paul Mackerras authored
-
Paul Mackerras authored
-
Paul Mackerras authored
into samba.org:/home/paulus/kernel/for-linus-ppc
-
- 15 Sep, 2002 4 commits
-
-
Ingo Molnar authored
The broadcast SIGKILL kept pending in the new thread as well, and killed it prematurely ...
-
Linus Torvalds authored
-
Ingo Molnar authored
This implements one of the last missing POSIX threading details - exec() semantics. Previous kernels had code that tried to handle it, but that code had a number of disadvantages:

- it only worked if the exec()-ing thread was the thread group leader, creating an asymmetry. This does not work if the thread group leader has exited already.

- it was racy: it sent a SIGKILL to every thread in the group but did not wait for them to actually process the SIGKILL. It did a yield() but that is not enough. All 'other' threads have to finish processing before we can continue with the exec().

This adds the same logic, but extended with the following enhancements:

- works from non-leader threads just as much as the thread group leader.

- waits for all other threads to exit before continuing with the exec().

- reuses the PID of the group.

It would perhaps be a more generic approach to add a new syscall, sys_ungroup() - which would do largely what de_thread() does in this patch. But it's not really needed now - posix_spawn() is currently implemented via starting a non-CLONE_THREAD helper thread that does a sys_exec(). There's no API currently that needs a direct exec() from a thread - but it could be created (such as pthread_exec_np()). It would have the advantage of not having to go through a helper thread, but the difference is minimal.
-
Ingo Molnar authored
This fixes one more exit-time resource accounting issue - and it's also a speedup and a thread-tree (to-be thread-aware pstree) visual improvement.

In the current code we reparent detached threads to the init thread. This worked but was not very nice in ps output: threads showed up as being related to init. There was also a resource-accounting issue: upon exit they update their parent's (ie. init's) rusage fields - effectively losing these statistics. Eg. 'time' under-reports CPU usage if the threaded app is Ctrl-C-ed prematurely.

The solution is to reparent threads to the group leader - this is now very easy since we have p->group_leader cached and it's also valid all the time. It's also somewhat faster for applications that use CLONE_THREAD but do not use the CLONE_DETACHED feature.
-