- 12 Apr, 2003 20 commits
-
-
Andrew Morton authored
From Alex Tomas and myself. It is identical in concept to the block allocator change and uses the same hashed spinlocks.
-
Andrew Morton authored
From Alex Tomas and myself.

ext2 currently uses lock_super() to protect the filesystem's in-core block allocation bitmaps. On big SMP machines the contention on that semaphore is causing high context switch rates, large amounts of idle time and reduced throughput. The context switch rate can also worsen block allocation: if several tasks are trying to allocate blocks inside the same blockgroup for different files, madly rotating between those tasks will cause the files' blocks to be intermingled.

On SDET and dbench-style workloads (lots of tasks doing lots of allocation) this patch (and a similar one for the inode allocator) improves throughput on an 8-way by ~15%. On 16-way NUMAQ the speedup is 150%.

What we do is to remove the lock altogether and just rely on the atomic semantics of test_and_set_bit(): if the allocator sees a block was free it runs test_and_set_bit(). If that fails, then we raced and the allocator will go and look for another block.

Of course, we don't really use test_and_set_bit(), because that isn't endian-independent. New atomic endian-independent functions are introduced: ext2_set_bit_atomic() and ext2_clear_bit_atomic(). We do not need ext2_test_bit_atomic(), since even if ext2_test_bit() returns the wrong result, that error will be detected and naturally handled in the subsequent ext2_set_bit_atomic().

For little-endian machines the new atomic ops map directly onto test_and_set_bit(), etc. For big-endian machines we provide the architecture's implementation with the address of a spinlock which can be taken around the nonatomic ext2_set_bit(). The spinlocks are hashed, and the hash is scaled according to the machine size. Architectures are free to implement optimised versions of ext2_set_bit_atomic() and ext2_clear_bit_atomic().
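A minimal sketch of the big-endian fallback described above (on little-endian machines the ops map straight onto test_and_set_bit()); the names follow the patch, but the exact per-architecture code may differ:

    /*
     * Big-endian fallback: take the caller-supplied (hashed) spinlock
     * around the nonatomic little-endian bitop, return the old value.
     */
    #define ext2_set_bit_atomic(lock, nr, addr)         \
    ({                                                  \
            int __ret;                                  \
            spin_lock(lock);                            \
            __ret = ext2_set_bit((nr), (addr));         \
            spin_unlock(lock);                          \
            __ret;                                      \
    })

    #define ext2_clear_bit_atomic(lock, nr, addr)       \
    ({                                                  \
            int __ret;                                  \
            spin_lock(lock);                            \
            __ret = ext2_clear_bit((nr), (addr));       \
            spin_unlock(lock);                          \
            __ret;                                      \
    })

The allocator retries on failure: if ext2_set_bit_atomic() returns 1, the bit was already set - we raced with another task - and the search moves on to another block.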
-
Andrew Morton authored
ext2 and ext3 per-blockgroup metadata needs locking. An fs-wide lock is expensive, and a per-blockgroup lock consumes too much storage (up to 32768 blockgroups per filesystem). We need something in-between.

blockgroup_locks are very simple hashed spinlocks which provide this compromise. The size of the lock is scaled by NR_CPUS to implement an additional speed/space tradeoff.

These locks are actually fairly generic. However, I presented them as something specific to ext2 and ext3 so that people wouldn't go using them all over the place: they consume a lot of storage.
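A sketch of the structure this describes, assuming a power-of-two lock count scaled from NR_CPUS (the in-tree header may differ in the exact scaling):

    #if NR_CPUS >= 32
    #define NR_BG_LOCKS 128
    #elif NR_CPUS >= 16
    #define NR_BG_LOCKS 64
    #elif NR_CPUS >= 8
    #define NR_BG_LOCKS 32
    #else
    #define NR_BG_LOCKS 16
    #endif

    struct bgl_lock {
            spinlock_t lock;
    } ____cacheline_aligned_in_smp;

    struct blockgroup_lock {
            struct bgl_lock locks[NR_BG_LOCKS];
    };

    static inline spinlock_t *
    bgl_lock_ptr(struct blockgroup_lock *bgl, unsigned int block_group)
    {
            /* hash: many blockgroups share each of the few locks */
            return &bgl->locks[block_group & (NR_BG_LOCKS - 1)].lock;
    }

The filesystem then takes the hashed lock for a group around its per-group metadata updates instead of lock_super().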
-
Andrew Morton authored
Several places in ext2 and ext3 use filesystem-wide counters which require global locking, mainly for the Orlov allocator's heuristics. To solve the contention which this causes we can trade off accuracy against speed.

This patch introduces a "percpu_counter" library type in which the counts are per-cpu and are periodically spilled into a global counter. Readers only read the global counter.

These objects are *large*. On a 32 CPU P4, they are 4 kbytes. On a 4-way P3, 128 bytes.
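A minimal sketch of the spill scheme, assuming a batch threshold FBC_BATCH at which the per-cpu delta is folded into the global count (names follow the patch, the details are illustrative):

    struct percpu_counter {
            spinlock_t lock;
            long count;             /* the approximate global count */
            long counters[NR_CPUS]; /* per-cpu deltas (padded in reality) */
    };

    #define FBC_BATCH 32    /* illustrative spill threshold */

    static void percpu_counter_mod(struct percpu_counter *fbc, long amount)
    {
            int cpu = get_cpu();
            long count = fbc->counters[cpu] + amount;

            if (count >= FBC_BATCH || count <= -FBC_BATCH) {
                    /* spill the local delta into the global counter */
                    spin_lock(&fbc->lock);
                    fbc->count += count;
                    spin_unlock(&fbc->lock);
                    count = 0;
            }
            fbc->counters[cpu] = count;
            put_cpu();
    }

    static long percpu_counter_read(struct percpu_counter *fbc)
    {
            return fbc->count;      /* readers tolerate the error */
    }

A reader can be off by up to NR_CPUS * FBC_BATCH, which is acceptable for allocator heuristics.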
-
Andrew Morton authored
From: Dave Hansen <haveblue@us.ibm.com> Documents the information in /proc/meminfo
-
Andrew Morton authored
From: Matt Porter <porter@cox.net> There was a thread a while back on lkml where Dave Hansen proposed this simple vmalloc usage reporting patch. The thread pretty much died out as most people seemed focused on what VM loading type bugs it could solve. I had posted that this type of information is really valuable in debugging embedded Linux board ports. A common example is where people do arch-specific setup that limits their vmalloc space, and then they find modules won't load. ;) Having the Vmalloc* info readily available is really useful in helping folks fix their kernel ports.
-
Andrew Morton authored
From: David Mosberger <davidm@napali.hpl.hp.com> interrupts_open() can easily try to kmalloc() more memory than supported by kmalloc. E.g., with 16KB page size and NR_CPUS==64, it would try to allocate 147456 bytes. The workaround below is to allocate 4KB per 8 CPUs. Not really a solution, but the fundamental problem is that /proc/interrupts shouldn't use a fixed buffer size in the first place. I suppose another solution would be to use vmalloc() instead. It all feels like bandaids though.
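A hedged sketch of the sizing rule described above (the helper name is hypothetical; the actual change lives in the /proc/interrupts code):

    /* hypothetical helper showing the workaround's arithmetic */
    static char *alloc_interrupts_buf(void)
    {
            /*
             * 4KB per 8 CPUs instead of one fixed size: with 16KB
             * pages and NR_CPUS == 64 the old fixed buffer came to
             * 147456 bytes, more than kmalloc can hand out.
             */
            unsigned int size = 4096 * (1 + NR_CPUS / 8);

            return kmalloc(size, GFP_KERNEL);
    }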
-
Andrew Morton authored
From: Brian Gerst and David Mosberger The previous fix to the kmalloc_sizes[] array didn't null-terminate the correct array. Fix that up, and also avoid running ARRAY_SIZE() against an array which is really a null-terminated list.
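Illustrative only - not the actual slab.c tables - but it shows the pattern at issue: a sentinel-terminated table must be walked to its terminator, and ARRAY_SIZE() must not be applied to a list whose length is defined by that sentinel:

    /* hypothetical zero-terminated size table */
    static const int sizes[] = { 32, 64, 128, 256, 0 };

    static void init_caches(void)
    {
            int i;

            /* walk to the sentinel; ARRAY_SIZE(sizes) would count
             * the terminating 0 as a real entry */
            for (i = 0; sizes[i] != 0; i++)
                    create_cache(sizes[i]);  /* hypothetical helper */
    }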
-
Andrew Morton authored
From: Christoph Hellwig <hch@lst.de> This patch is from the IA64 tree, with minor cleanups from me. Split out initialization of pgdat->node_mem_map into a separate function and allow architectures to override it. This is needed for HP IA64 machines that have a virtually mapped memory map, to support big memory holes without having to use discontigmem. (memmap_init_zone is non-static to allow the IA64 code to use it - I did that instead of passing its address into the arch hook, as is done currently in the IA64 tree.)
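One plausible shape for the override hook - a sketch under the assumption that an architecture opts in via a #define (the patch's exact spelling may differ):

    /*
     * By default the mem_map is initialized with the generic
     * memmap_init_zone().  An architecture with a virtually mapped
     * memmap (e.g. HP ia64 boxes with large holes) can define
     * __HAVE_ARCH_MEMMAP_INIT and supply its own memmap_init().
     */
    #ifndef __HAVE_ARCH_MEMMAP_INIT
    #define memmap_init(size, nid, zone, start_pfn) \
            memmap_init_zone((size), (nid), (zone), (start_pfn))
    #endif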
-
Andrew Morton authored
From: Christoph Hellwig <hch@lst.de> This patch is from the IA64 tree, with some minor cleanups by me. David described it as:

This is a performance speedup and some minor indentation fixups. The problem is that the bootmem code is (a) hugely slow and (b) has execution time that grows quadratically with the size of the bootmem bitmap. This causes noticeable slowdowns, especially on machines with (relatively) large holes in the physical memory map. Issue (b) is addressed by maintaining the "last_success" cache, so that we start the next search from the place where we last found some memory (this part of the patch could stand additional reviewing/testing). Issue (a) is addressed by using find_next_zero_bit() instead of the slow bit-by-bit testing.
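A hedged sketch of both changes in the allocator's search loop (the real __alloc_bootmem_core also handles alignment, goals and partial pages; this shows only the search):

    /* find `pages' contiguous free pages in the bootmem bitmap */
    static unsigned long find_free_run(bootmem_data_t *bdata,
                                       unsigned long eidx,
                                       unsigned long pages)
    {
            /* (b) start from the cached offset of the last success
             * instead of rescanning the bitmap from zero each time */
            unsigned long i = bdata->last_success;

            while (i < eidx) {
                    unsigned long j;

                    /* (a) word-at-a-time skip over allocated pages,
                     * replacing the old bit-by-bit test loop */
                    i = find_next_zero_bit(bdata->node_bootmem_map, eidx, i);
                    if (i + pages > eidx)
                            break;
                    for (j = i + 1; j < i + pages; j++)
                            if (test_bit(j, bdata->node_bootmem_map))
                                    goto next;
                    bdata->last_success = i;
                    return i;
            next:
                    i = j + 1;
            }
            return eidx;    /* nothing found */
    }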
-
Andrew Morton authored
Time to write a 2M file, one byte at a time:

Before:
    1.09s user 4.92s system  99% cpu 6.014 total
    0.74s user 5.28s system  99% cpu 6.023 total
    1.03s user 4.97s system 100% cpu 5.991 total

After:
    0.79s user 5.17s system  99% cpu 5.993 total
    0.79s user 5.17s system 100% cpu 5.957 total
    0.84s user 5.11s system 100% cpu 5.942 total
-
Andrew Morton authored
From: David Mosberger <davidm@napali.hpl.hp.com> The patch below is needed to make it possible to map stack pages without execution permission (as we do on ia64).
-
Andrew Morton authored
If get_block() returns -ENOSPC, __block_write_full_page() currently clears PG_uptodate. That doesn't make any sense - failure to allocate space (or an IO error) does not make the page not uptodate. It will create pages which are dirty, mapped into pagetables and not uptodate, which is a nonsensical state.
-
Andrew Morton authored
From: Jan Kara <jack@ucw.cz> Fixes a deadlock-causing lock-ranking bug between dqio_sem and journal_start(). It sets up the needed infrastructure so that the quota code's sync_dquot() operation can call into ext3 and arrange for the transaction start to be nested outside the taking of dqio_sem.
-
Andrew Morton authored
From: Hugh Dickins <hugh@veritas.com>

This patch removes the long-deprecated flush_page_to_ram. We have two different schemes for doing this cache flushing stuff: the old flush_page_to_ram way and the not-so-old flush_dcache_page etc. way - see DaveM's Documentation/cachetlb.txt. Keeping flush_page_to_ram around is confusing, and makes it harder to get this done right.

All architectures are updated, but the only ones where it amounts to more than deleting a line or two are m68k, mips, mips64 and v850. I followed a prescription from DaveM (though not to the letter), that those arches with a non-nop flush_page_to_ram need to do what it did in their clear_user_page and copy_user_page and flush_dcache_page.

Dave is concerned that, in the v850 nb85e case, this patch leaves its flush_dcache_page as it was and uses it in clear_user_page and copy_user_page, instead of making them all flush the icache as well. That may be wrong: I'm just hesitant to add cruft blindly, changing a flush_dcache macro to flush the icache too, and naively hope that the necessary flush_icache calls are already in place. Miles, please let us know which way is right for v850 nb85e - thanks.
-
Andrew Morton authored
I've had a warning in there for 4-5 months and it has never triggered. I think it's safe to remove this test.
-
Andrew Morton authored
From: Geert Uytterhoeven <geert@linux-m68k.org> It updates include/asm-{generic,parisc}/rtc.h for the recent changes in drivers/char/genrtc.c and include/asm-{m68k,ppc}/rtc.h. get_rtc_time() now returns some RTC flags instead of a 0/-1 success/failure indicator. These flags include:

- RTC_BATT_BAD: RTC battery is bad (can be detected on PA-RISC)
- RTC_24H: Clock runs in 24 hour mode

Most of these flags are the same as drivers/char/rtc.c, but RTC_BATT_BAD is a new one.
-
Andrew Morton authored
radix_tree_delete() currently returns 0 on success, -ENOENT if there was nothing to delete. But it is more useful to return the address of the deleted item on success and NULL if there was no matching item. It can potentially save a lookup+delete operation.
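The new signature, and a sketch of the pattern it enables (illustrative, not a quote of the patch):

    /* returns the deleted item, or NULL if nothing was at `index' */
    void *radix_tree_delete(struct radix_tree_root *root, unsigned long index);

    /* before: two descents of the tree */
    item = radix_tree_lookup(&tree, index);
    if (item)
            radix_tree_delete(&tree, index);

    /* after: one descent does both */
    item = radix_tree_delete(&tree, index);
    if (item)
            free_item(item);        /* hypothetical cleanup */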
-
Andrew Morton authored
- allocated storage `envp' was being leaked on an error path
- kmalloc() returns void*, no need to cast it
- don't return 0 from a void-returning function

Greg has acked this patch.
-
Ben Collins authored
-
- 11 Apr, 2003 18 commits
-
-
David S. Miller authored
into kernel.bkbits.net:/home/davem/net-2.5
-
David S. Miller authored
-
David S. Miller authored
-
Stephen Hemminger authored
-
Stephen Hemminger authored
-
Andrew Morton authored
-
Linus Torvalds authored
-
George Anzinger authored
Noted by David Mosberger: "If someone happens to arm a periodic timer at exactly 256 jiffies (as ohci happens to do on platforms with HZ=1024), then you end up getting an endless loop of timer activations, causing a machine hang. The problem is that __run_timers updates base->timer_jiffies _before_ running the callback routines. If a callback re-arms the timer at exactly 256 jiffies, add_timer() will reinsert the timer into the list that we're currently processing, which of course will cause the timer to expire immediately again, etc., etc., ad nauseam..." The answer here is to move the whole expired list to a local list header and to not look back.
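A hedged sketch of the fix in __run_timers() (simplified; locking and cascading are omitted):

    struct list_head work_list = LIST_HEAD_INIT(work_list);

    /* detach every currently-expired timer in one go: a callback that
     * re-arms its timer inserts it into the base's vectors, not into
     * the local list we are walking, so it cannot loop forever */
    list_splice_init(base->tv1.vec + index, &work_list);

    while (!list_empty(&work_list)) {
            struct timer_list *timer =
                    list_entry(work_list.next, struct timer_list, entry);

            list_del(&timer->entry);
            timer->function(timer->data);
    }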
-
Ben Collins authored
- Convert nodemgr to new driver model.
- Convert to new module_param() calls.
- Merged fixes for devfs mkdir and some sleep-in-atomic fixes from mainline 2.5-bk.
- Fix possible memory corruption on highlevel local read/write.
- Fix bitmap usage for some bitops.
- Fix bug in closing ISO stream.
- Fixes for nodemgr probing in the event of a reset storm.
- Workaround for nForce2 firewire chipset. This is preliminary.
- Conversion of SBP-2 to use new driver model in nodemgr, including providing a driver for firewire unit directories and registering proper callbacks.
-
Linus Torvalds authored
Merge bk://kernel.bkbits.net/gregkh/linux/linus-2.5
into home.transmeta.com:/home/torvalds/v2.5/linux
-
Greg Kroah-Hartman authored
into kroah.com:/home/greg/linux/BK/gregkh-2.5
-
Greg Kroah-Hartman authored
into kroah.com:/home/greg/linux/BK/i2c-2.5
-
Luca Tettamanti authored
-
Oliver Neukum authored
The driver should not mess with configurations here.
-
Oliver Neukum authored
There's no reason this driver should mess with configurations.
-
Linus Torvalds authored
Found by Rik van Riel: "There's a serious bug in the handling of the pointer returned by kmap_atomic() in nfs/dir.c. The pointer (part of desc) is passed into find_dirent_name and from there into dir_decode, which modifies the pointer. That means you end up passing a wrong address to kunmap_atomic()."
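The bug pattern in general form - a sketch, not the nfs/dir.c code itself: the address handed back by kmap_atomic() must be the one passed to kunmap_atomic(), so any cursor that advances through the page needs to be a separate pointer.

    char *kaddr, *p;

    kaddr = kmap_atomic(page, KM_USER0);
    p = kaddr;
    /* ... decoding advances p through the page ... */
    kunmap_atomic(kaddr, KM_USER0); /* the original address, not p */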
-
Linus Torvalds authored
Merge bk://kernel.bkbits.net/davem/net-2.5
into home.transmeta.com:/home/torvalds/v2.5/linux
-
Linus Torvalds authored
Merge bk://kernel.bkbits.net/davem/sparc-2.5
into home.transmeta.com:/home/torvalds/v2.5/linux
-
- 10 Apr, 2003 2 commits
-
-
David S. Miller authored
into kernel.bkbits.net:/home/davem/net-2.5
-
David S. Miller authored
-