- 15 Aug, 2002 24 commits
-
-
Albert Cranford authored
Please reverse deadlocking change to i2c-elektor.c
-
Andrew Morton authored
The remaining source of page-at-a-time activity against pagemap_lru_lock is the anonymous pagefault path, which cannot be changed to operate against multiple pages at a time. But what we can do is to batch up just its adding of pages to the LRU, via buffering and deferral. This patch is based on work from Bill Irwin. The patch changes lru_cache_add to put the pages into a per-CPU pagevec. They are added to the LRU 16-at-a-time. And in the page reclaim code, purge the local CPU's buffer before starting. This is mainly to decrease the chances of pages staying off the LRU for very long periods: if the machine is under memory pressure, CPUs will spill their pages onto the LRU promptly. A consequence of this change is that we can have up to 15*num_cpus pages which are not on the LRU. Which could have a slight effect on VM accuracy, but I find that doubtful. If the system is under memory pressure the pages will be added to the LRU promptly, and these pages are the most-recently-touched ones - the VM isn't very interested in them anyway. This optimisation could be made SMP-specific, but I felt it best to turn it on for UP as well for consistency and better testing coverage.
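The deferral described above can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel code: `cpu_bufs`, `flush_cpu_buf()`, `sketch_lru_cache_add()` and the `lock_acquisitions` counter are hypothetical names, the "lock" is simulated by a counter, and `NR_CPUS` is fixed at 4.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH   16   /* pages buffered per CPU before spilling to the LRU */
#define NR_CPUS 4    /* assumption for this sketch */

struct page { struct page *lru_next; };    /* toy stand-in for struct page */

static struct page *lru_head;              /* toy global LRU list */
static unsigned long lru_len;              /* pages currently on the LRU */
static unsigned long lock_acquisitions;    /* simulates taking pagemap_lru_lock */

/* Hypothetical per-CPU buffer of pages not yet on the LRU. */
struct cpu_buf { unsigned nr; struct page *pages[BATCH]; };
static struct cpu_buf cpu_bufs[NR_CPUS];

/* Move every buffered page of one CPU onto the LRU under a single "lock". */
static void flush_cpu_buf(unsigned cpu)
{
    struct cpu_buf *buf;

    assert(cpu < NR_CPUS);
    buf = &cpu_bufs[cpu];
    if (buf->nr == 0)
        return;
    lock_acquisitions++;                   /* one acquisition per batch */
    for (unsigned i = 0; i < buf->nr; i++) {
        buf->pages[i]->lru_next = lru_head;
        lru_head = buf->pages[i];
        lru_len++;
    }
    buf->nr = 0;
}

/* Buffer the page; spill to the LRU only when the buffer is full. */
static void sketch_lru_cache_add(unsigned cpu, struct page *page)
{
    struct cpu_buf *buf;

    assert(cpu < NR_CPUS);
    buf = &cpu_bufs[cpu];
    buf->pages[buf->nr++] = page;
    if (buf->nr == BATCH)
        flush_cpu_buf(cpu);
}
```

With this shape at most BATCH - 1 = 15 pages per CPU sit off the LRU at any time, which matches the 15*num_cpus bound in the message. The reclaim path would call something like `flush_cpu_buf()` for the local CPU before scanning.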
-
Andrew Morton authored
Some fallout from the pagemap_lru_lock changes:
  - lru_cache_del() is no longer used. Kill it.
  - page_cache_release() almost never actually frees pages. So inline page_cache_release() and move its rarely-called slow path into (the misnamed) mm/swap.c
  - update the locking comment in filemap.c. pagemap_lru_lock used to be one of the outermost locks in the VM locking hierarchy. Now, we never take any other locks while holding pagemap_lru_lock. So it doesn't have any relationship with anything.
  - put_page() now removes pages from the LRU on the final put. The lock is interrupt safe.
-
Andrew Morton authored
It is expensive for a CPU to take an interrupt while holding the page LRU lock, because other CPUs will pile up on the lock while the interrupt runs. Disabling interrupts while holding the lock reduces contention by an additional 30% on 4-way. This is when the only source of interrupts is disk completion. The improvement will be higher with more CPUs and it will be higher if there is networking happening. The maximum hold time of this lock is 17 microseconds on 500 MHz PIII, which is well inside the kernel's maximum interrupt latency (which was 100 usecs when I last looked, a year ago). This optimisation is not needed on uniprocessor, but the patch disables IRQs while holding pagemap_lru_lock anyway, so it becomes an irq-safe spinlock, and pages can be moved from the LRU in interrupt context. pagemap_lru_lock has been renamed to _pagemap_lru_lock to pick up any missed uses, and to reliably break any out-of-tree patches which may be using the old semantics.
-
Andrew Morton authored
Convert all the bulk callers of lru_cache_del() to use the batched pagevec_lru_del() function. Change truncate_complete_page() to not delete the page from the LRU. Do it in page_cache_release() instead. (This reintroduces the problem with final-release-from-interrupt. That gets fixed further on). This patch changes the truncate locking somewhat. The removal from the LRU now happens _after_ the page has been removed from the address_space and has been unlocked. So there is now a window where the shrink_cache code can discover the to-be-freed page via the LRU list. But that's OK - the page is clean, its buffers (if any) are clean. It's not attached to any mapping.
-
Andrew Morton authored
The patch goes through the various places which were calling lru_cache_add() against bulk pages and batches them up. Also. This whole patch series improves the behaviour of the system under heavy writeback load. There is a reduction in page allocation failures, some reduction in loss of interactivity due to page allocators getting stuck on writeback from the VM. (This is still bad though). I think it's due to the change here in mpage_writepages(). That function was originally unconditionally refiling written-back pages to the head of the inactive list. The theory being that they should be moved out of the way of page allocators, who would end up waiting on them. It appears that this simply had the effect of pushing dirty, unwritten data closer to the tail of the inactive list, making things worse. So instead, if the caller is (typically) balance_dirty_pages() then leave the pages where they are on the LRU. If the caller is PF_MEMALLOC then the pages *have* to be refiled. This is because VM writeback is clustered along mapping->dirty_pages, and it's almost certain that the pages which are being written are near the tail of the LRU. If they were left there, page allocators would block on them too soon. It would effectively become a synchronous write.
-
Andrew Morton authored
Makes mpage_writepages() move pages around on the LRU sixteen-at-a-time rather than one-at-a-time.
-
Andrew Morton authored
This patch multithreads the main page reclaim function, shrink_cache(). This function used to run under pagemap_lru_lock. Instead, we grab that lock, put 32 pages from the LRU into a private list, drop the pagemap_lru_lock and then proceed to attempt to free those pages. Any pages which were successfully reclaimed are batch-freed. Pages which were not reclaimed are re-added to the LRU. This patch reduces pagemap_lru_lock contention on the 4-way by a factor of thirty. The shrink_cache() code has been simplified somewhat. refill_inactive() was being called too often - often just to process two or three pages. Fiddled with that so it processes pages at the same rate, but works on 32 pages at a time. Added a couple of mark_page_accessed() calls into mm/memory.c from 2.4. They seem appropriate. Change the shrink_caches() logic so that it will still trickle through the active list (via refill_inactive) even if the inactive list is much larger than the active list.
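The isolate-then-reclaim pattern described above can be sketched roughly as follows. This is a single-threaded user-space illustration, not the kernel's shrink_cache(): `shrink_batch()`, `lru_push()`, the `reclaimable` flag and the `lock_taken` counter are hypothetical, and the lock is only simulated by counting acquisitions.

```c
#include <assert.h>
#include <stddef.h>

#define SCAN_BATCH 32   /* pages pulled off the LRU per lock acquisition */

struct page {
    int reclaimable;    /* toy flag: can this page be freed right now? */
    struct page *next;
};

static struct page *lru;           /* toy LRU list */
static unsigned long lock_taken;   /* simulates pagemap_lru_lock acquisitions */

static void lru_push(struct page *p)
{
    p->next = lru;
    lru = p;
}

/* One reclaim pass; returns the number of pages "freed". */
static unsigned shrink_batch(void)
{
    struct page *private_list = NULL, *keep = NULL;
    unsigned taken = 0, freed = 0;

    /* Lock held: only move pages onto a private list, nothing else. */
    lock_taken++;
    while (lru != NULL && taken < SCAN_BATCH) {
        struct page *p = lru;
        lru = p->next;
        p->next = private_list;
        private_list = p;
        taken++;
    }
    /* Lock dropped: the expensive reclaim work runs without it. */
    while (private_list != NULL) {
        struct page *p = private_list;
        private_list = p->next;
        if (p->reclaimable) {
            freed++;               /* would be batch-freed in the real code */
        } else {
            p->next = keep;
            keep = p;
        }
    }
    /* Lock taken once more to re-add the pages that were not reclaimed. */
    if (keep != NULL) {
        lock_taken++;
        while (keep != NULL) {
            struct page *p = keep;
            keep = p->next;
            lru_push(p);
        }
    }
    assert(freed <= taken);
    return freed;
}
```

The point of the sketch is the lock count: each pass takes the lock at most twice regardless of how many of the 32 pages it examines.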
-
Andrew Morton authored
This is the first patch in a series of eight which address pagemap_lru_lock contention, and which simplify the VM locking hierarchy. Most testing has been done with all eight patches applied, so it would be best not to cherrypick, please.

The workload which was optimised was: 4x500MHz PIII CPUs, mem=512m, six disks, six filesystems, six processes each flat-out writing a large file onto one of the disks. ie: heavy page replacement load.

The frequency with which pagemap_lru_lock is taken is reduced by 90%. Lockmeter claims that pagemap_lru_lock contention on the 4-way has been reduced by 98%. Total amount of system time lost to lock spinning went from 2.5% to 0.85%.

Anton ran a similar test on 8-way PPC, the reduction in system time was around 25%, and the reduction in time spent playing with pagemap_lru_lock was 80%.
  http://samba.org/~anton/linux/2.5.30/standard/
versus
  http://samba.org/~anton/linux/2.5.30/akpm/

Throughput changes on uniprocessor are modest: a 1% speedup with this workload due to shortened code paths and improved cache locality.

The patches do two main things:
  1: In almost all places where the kernel was doing something with lots of pages one-at-a-time, convert the code to do the same thing sixteen-pages-at-a-time. Take the lock once rather than sixteen times. Take the lock for the minimum possible time.
  2: Multithread the pagecache reclaim function: don't hold pagemap_lru_lock while reclaiming pagecache pages. That function was massively expensive.

One fallout from this work is that we never take any other locks while holding pagemap_lru_lock. So this lock conceptually disappears from the VM locking hierarchy.

So. This is all basically a code tweak to improve kernel scalability. It does it by optimising the existing design, rather than by redesign. There is little conceptual change to how the VM works. This is as far as I can tweak it. It seems that the results are now acceptable on SMP. But things are still bad on NUMA.
It is expected that the per-zone LRU and per-zone LRU lock patches will fix NUMA as well, but that has yet to be tested.

This first patch introduces `struct pagevec', which is the basic unit of batched work. It is simply:

    struct pagevec {
        unsigned nr;
        struct page *pages[16];
    };

pagevecs are used in the following patches to get the VM away from page-at-a-time operations. This patch includes all the pagevec library functions which are used in later patches.
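To show how such a structure is typically filled and drained, here is a small user-space sketch. The struct layout follows the message; the helper names `pagevec_init()`, `pagevec_add()` and `pagevec_count()`, their signatures, and the toy `struct page` are assumptions made for this example and are not taken from the patch itself.

```c
#include <assert.h>
#include <stddef.h>

#define PAGEVEC_SIZE 16

struct page { int id; };    /* toy stand-in for the kernel's struct page */

struct pagevec {
    unsigned nr;                         /* number of valid entries */
    struct page *pages[PAGEVEC_SIZE];
};

/* Hypothetical helper: start with an empty vector. */
static void pagevec_init(struct pagevec *pvec)
{
    pvec->nr = 0;
}

/* Hypothetical helper: number of pages currently held. */
static unsigned pagevec_count(const struct pagevec *pvec)
{
    return pvec->nr;
}

/*
 * Hypothetical helper: append a page and return how many slots remain.
 * A return value of 0 tells the caller it is time to process the batch.
 */
static unsigned pagevec_add(struct pagevec *pvec, struct page *page)
{
    assert(pvec->nr < PAGEVEC_SIZE);
    pvec->pages[pvec->nr++] = page;
    return PAGEVEC_SIZE - pvec->nr;
}
```

A caller would loop over its pages, call `pagevec_add()`, and when it returns 0 take the lock once, handle all sixteen entries, then call `pagevec_init()` again.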
-
Matthew Wilcox authored
nlmsvc_notify_blocked() is only called via the fl_notify() pointer which is only called immediately after we already did a locks_delete_block(), so calling posix_unblock_lock() here is always a NOP.
-
Dave Jones authored
This patch from Pat Mochel cleans up the hell that was mtrr.c into something a lot more modular and easy to understand, by doing the implementation-per-file as has been done to various other things by Pat and myself over the last months. It's functionally identical from a kernel internal point of view, and a userspace point of view, and is basically just a very large code clean up.
-
Ingo Molnar authored
one of the debugging tests triggered a false-positive BUG() when a detached thread was straced.
-
Ingo Molnar authored
it is much cleaner to pass in the address of the user-space VM lock - this will also enable arbitrary implementations of the stack-unlock, as the fifth clone() parameter.
-
Rusty Russell authored
It's referenced by mips and mips64 (both far out of date), but never actually defined anywhere.
-
Linus Torvalds authored
http://linuxusb.bkbits.net/linus-2.5 into home.transmeta.com:/home/torvalds/v2.5/linux
-
Petr Vandrovec authored
Update ES1371 to new synchronize_irq() API.
-
Petr Vandrovec authored
line_length, type and visual moved from the display struct to the fb_info's fix structure during the last fbdev updates. Unfortunately the generic code was not updated at the same time, so now every fbdev driver is broken.
-
Petr Vandrovec authored
Characters 0x80-0x9F from ISO encodings are U+0080-U+009F, so map them both ways. Otherwise you cannot use chars 0x80-0x9F in filenames on filesystems using NLS.
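The two-way mapping described here is an identity over the C1 control range. A minimal sketch, assuming hypothetical helper names (`c1_byte_to_unicode()`, `unicode_to_c1_byte()`) that are not the kernel's NLS table interface:

```c
#include <assert.h>
#include <stdint.h>

/* Map a byte in 0x80-0x9F to its code point; return -1 for other bytes. */
static int c1_byte_to_unicode(uint8_t byte)
{
    if (byte < 0x80 || byte > 0x9F)
        return -1;
    return byte;                /* 0x80..0x9F -> U+0080..U+009F */
}

/* Reverse direction: map U+0080-U+009F back to the same byte value. */
static int unicode_to_c1_byte(uint32_t cp)
{
    if (cp < 0x80 || cp > 0x9F)
        return -1;
    return (int)cp;
}

/* Every byte in the range must survive a round trip through both maps. */
static void check_c1_round_trip(void)
{
    for (unsigned b = 0x80; b <= 0x9F; b++) {
        int cp = c1_byte_to_unicode((uint8_t)b);
        assert(cp >= 0);
        assert(unicode_to_c1_byte((uint32_t)cp) == (int)b);
    }
}
```

If only one direction were mapped, a filename containing such a byte could be converted one way but not back, which is the failure the message describes.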
-
Linus Torvalds authored
http://linux-scsi.bkbits.net/scsi-for-linus-2.5 into home.transmeta.com:/home/torvalds/v2.5/linux
-
Linus Torvalds authored
bk://ldm.bkbits.net/linux-2.5 into home.transmeta.com:/home/torvalds/v2.5/linux
-
Matthew Wilcox authored
We don't need to reenable interrupts before calling panic.
-
Alexander Viro authored
-
Alexander Viro authored
-
Alexander Viro authored
-
- 14 Aug, 2002 16 commits
-
-
James Bottomley authored
into mulgrave.(none):/home/jejb/BK/scsi-for-linus-2.5
-
Greg Kroah-Hartman authored
-
James Bottomley authored
ssh://linux-scsi@linux-scsi.bkbits.net/scsi-for-linus-2.5 into mulgrave.(none):/home/jejb/BK/scsi-for-linus-2.5
-
David Brownell authored
Moves some functions that are only used by usbfs to be private, and documents some of the interface issues that need to be cleaned up.
-
Greg Kroah-Hartman authored
-
James Bottomley authored
-
James Bottomley authored
-
James Bottomley authored
into mulgrave.(none):/home/jejb/BK/scsi-for-linus-2.5
-
James Bottomley authored
-
Douglas Gilbert authored
support using work done by Kai Makisara (on st driver, posted 2002/7/29).

Changelog: Changes since 3.5.26 (20020708)
  - re-add direct IO using Kai Makisara's work
  - re-tab to 8, start using C99-isms
  - simplify memory management

Like Kai's patch, this one needs kernel/ksyms.c altered to export get_user_pages(). Kai's worker routines st_map_user_pages() and st_unmap_user_pages() are duplicated as is. Hopefully these routines will find a home in a library soon.

The re-tabbing makes the patches rather large so here are 2 urls:
  - This tarball contains sg.h and sg.c: http://www.torque.net/sg/p/sg3527.tgz
  - This gzipped patch is against lk 2.5.31 and touches kernel/ksyms.c as well: http://www.torque.net/sg/p/sg_3527_lk2531.diff.gz

Testing is ongoing, everything works apart from "zero copy" copy. That uses mmap-ed IO on the read side and direct IO on the write side. Not too many people would be using that I suspect.

Doug Gilbert
-
Douglas Gilbert authored
Linus, Below is a patch to a file that documents the interface between the scsi mid level and lower level (HBA) drivers. The main change is documenting "autosense". bios_param()'s interface has changed. Doug Gilbert
-
Douglas Gilbert authored
support for per driver parameters added in lk 2.5.31

1.62 changes:
  - driverfs support for these options (more to come):
      /driverfs/bus/scsi/drivers/scsi_debug/delay [rw]
      /driverfs/bus/scsi/drivers/scsi_debug/num_devs [r]
      /driverfs/bus/scsi/drivers/scsi_debug/opts [rw]
  - start using some C99
  - fdisk requires EINVAL from unsupported ioctls (scsi_debug previously used ENOTTY)

1.61 changes:
  - simulate delayed responses, controlled by 'scsi_debug_delay'
  - support REPORT LUNS
  - support more MODE SENSE pages
  - [following Doug Ledford's suggestion] do autosense (i.e. set Scsi_Cmnd::sense_buffer array appropriately when a status of CHECK CONDITION is set)
  - minor driverfs support
  - start adding error injection logic, see "scsi_debug_every_nth"

Doug Gilbert
-
Patrick Mochel authored
-
Patrick Mochel authored
The device_root device was only a placeholder device that provided a head for the global device list, and a parent directory for root bridge devices. This removes the device and replaces it with an explicit global_device_list and a separate root directory. We never used any of the other fields in device_root, and we special-cased it. So, it's better off dead.
-
Patrick Mochel authored
-
Patrick Mochel authored
-