Commits · 631709da2279701e966f8b0b749b9d121d2325a9 · nexedi / linux

31 Oct, 2002 35 commits

[PATCH] speedup heuristic for get_unmapped_area · 631709da

Andrew Morton authored Oct 31, 2002

[I was going to send shared pagetables today, but it failed in
my testing under X :( ]

the first one is an mmap inefficiency that was reported by Saurabh Desai.
The test_str02 NPTL test-utility does the following: it tests the maximum
number of threads by creating a new thread, which thread creates a new
thread itself, etc. It basically creates thousands of parallel threads,
which means thousands of thread stacks.

NPTL uses mmap() to allocate new default thread stacks - and POSIX
requires us to install a 'guard page' as well, which is done via
mprotect(PROT_NONE) on the first page of the stack. This means that tons
of NPTL threads means 2* tons of vmas per MM, all allocated in a forward
fashion starting at the virtual address of 1 GB (TASK_UNMAPPED_BASE).

Saurabh reported a slowdown after the first couple of thousands of
threads, which i can reproduce as well. The reason for this slowdown is
the get_unmapped_area() implementation, which tries to achieve the most
compact virtual memory allocation, by searching for the vma at
TASK_UNMAPPED_BASE, and then linearly searching for a hole. With thousands
of linearly allocated vmas this is an increasingly painful thing to do ...

obviously, high-performance threaded applications will create stacks
without the guard page, which triggers the anon-vma merging code so we end
up with one large vma, not tons of small vmas.

it's also possible for userspace to be smarter by setting aside a stack
space and keeping a bitmap of allocated stacks and using MAP_FIXED (this
also enables it to do the guard page not via mprotect() but by keeping the
stacks apart by 1 page - ie. half the number of vmas) - but this also
decreases flexibility.

So i think that the default behavior nevertheless makes sense as well, so
IMO we should optimize it in the kernel.

there are various solutions to this problem, none of which solve the
problem in a 100% sufficient way, so i went for the simplest approach: i
added code to cache the 'last known hole' address in mm->free_area_cache,
which is used as a hint to get_unmapped_area().

this fixed the test_str02 testcase wonderfully, thread creation
performance for this testcase is O(1) again, but this simpler solution
obviously has a number of weak spots, and the (unlikely but possible)
worst-case is quite close to the current situation. In any case, this
approach does not sacrifice the perfect VM compactness out mmap()
implementation achieves, so it's a performance optimization with no
externally visible consequences.

The most generic and still perfectly-compact VM allocation solution would
be to have a vma tree for the 'inverse virtual memory space', ie. a tree
of free virtual memory ranges, which could be searched and iterated like
the space of allocated vmas. I think we could do this by extending vmas,
but the drawback is larger vmas. This does not save us from having to scan
vmas linearly still, because the size constraint is still present, but at
least most of the anon-mmap activities are constant sized. (both malloc()
and the thread-stack allocator uses mostly fixed sizes.)

This patch contains some fixes from Dave Miller - on some architectures
it is not posible to evaluate TASK_UNMAPPED_BASE at compile-time.

631709da

[PATCH] Orlov block allocator for ext2 · b2205dc0

Andrew Morton authored Oct 31, 2002

This is Al's implementation of the Orlov block allocator for ext2.

At least doubles the throughput for the traverse-a-kernel-tree
test and is well tested.

I still need to do the ext3 version.

No effort has been put into tuning it at this time, so more gains
are probably possible.

b2205dc0

Merge bk://ldm.bkbits.net/linux-2.5-kobject · 4856e09e
Linus Torvalds authored Oct 31, 2002
```
into home.transmeta.com:/home/torvalds/v2.5/linux
```
4856e09e

kobject: don't create directory for kobject/subsystem if name is NULL. · b053262f

Patrick Mochel authored Oct 31, 2002

This allows subsystems to exist the hierarchy, but not be exported via
the filesystem. This fixes a minor flaw with partitions, as partition
objects are children of block devices, though they register with the
partition subsystem. Really, the partition subsystem shouldn't have
presence in the tree at all, yet still exist.

b053262f

Merge http://gkernel.bkbits.net/alpha-2.5 · 1baa95c5
Linus Torvalds authored Oct 31, 2002
```
into home.transmeta.com:/home/torvalds/v2.5/linux
```
1baa95c5
Merge bk://ldm.bkbits.net/linux-2.5-kobject · 6dc1ec37
Linus Torvalds authored Oct 31, 2002
```
into home.transmeta.com:/home/torvalds/v2.5/linux
```
6dc1ec37
Fix alpha build. · f32abcc0
Jeff Garzik authored Oct 31, 2002

f32abcc0
turn off kobject debugging by default. · 24555ac2
Patrick Mochel authored Oct 30, 2002

24555ac2
driverfs: die die die · 808897cf
Patrick Mochel authored Oct 30, 2002

808897cf
convert edd to use kobjects and sysfs. · 60211581
Patrick Mochel authored Oct 30, 2002

60211581
driver model: remove few remaining references to driverfs. · 293c14d9
Patrick Mochel authored Oct 30, 2002

293c14d9
make sure block device_init() is called before part_init(). · 69a304e3
Patrick Mochel authored Oct 30, 2002

69a304e3

acpi: convert to use kobjects and sysfs. · 8bebafe7

Patrick Mochel authored Oct 30, 2002

- replace driver_dir_entry in acpi_device with struct kobject.
- register acpi with firmware subsystem on startup.
- register sub-subsystem.
- put namespace hierarchy under that.

8bebafe7

create firmware subsystem and register it on startup. · c408284c
Patrick Mochel authored Oct 30, 2002

c408284c
Merge bk://extfs.bkbits.net/extfs-2.5-update · 6e6e099b
Linus Torvalds authored Oct 30, 2002
```
into penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/linux
```
6e6e099b