- 10 May, 2004 40 commits
-
-
Andrew Morton authored
From: "Jose R. Santos" <jrsantos@austin.ibm.com> This patch alleviates some issues seen with Linux when accessing millions of files on machines with large amounts of RAM (32GB+). Both algorithms are based on some studies that Dominique Heger was doing on hash table efficiencies in Linux. The dentry hash table has been tested on small systems with one internal IDE hard disk as well as on large SMP systems with many Fibre Channel disks. Dominique claims that in all the testing done, they did not see one case where this hash function provided worse performance, and that in most tests they were seeing better performance.
The inode hash function was done by me, based on Dominique's original work, and has only been stress tested with SpecSFS. It provided a 3% improvement over the default algorithm in the SpecSFS results and speedups in the response time of almost all filesystem operations the benchmark stresses. With the better distribution it is also possible to reduce the number of inode buckets from 32 million to 16 million and still get slightly better results. Anton was nice enough to provide some graphs that show the distribution before and after the patch at http://samba.org/~anton/linux/sfs/1/
For the dentry hash function, some of my other coworkers have put this hash function through various tests and have concluded that it is equal to or better than the default hash function. These runs were done with a (hopefully to be Open Source soon) benchmark called FFSB, which can simulate various I/O patterns across many filesystems and variable file sizes.
The SpecSFS fileset is basically a lot of small files, whose number varies depending on the size of the run. For a not so big SMP system the number of files is in the 20+ million range. Of those 20 million files, only 10% are accessed randomly by the client. The purpose of this is that the benchmark tries to stress not only the NFS layer but the VM and filesystem layers as well. The filesets are also hundreds of gigabytes in size in order to promote disk head movement by guaranteeing cache misses in memory. In SFS, 27% of the workload is lookups, and __d_lookup has been showing up high in my profiles.
For the inode hash, the problem that I see is that when running a benchmark with this huge fileset we end up trying to free a lot of inode entries during the run while trying to put new entries in cache. We end up calling ifind_fast(), which calls find_inode_fast() with inode_lock held. In order to avoid holding inode_lock for long, we needed to avoid having long chains in that hash table. When I took a look at the original hash function, I found it to be a bit too simple for any workload. My solution (in which I took advantage of Dominique's work) was to create a hash function that could generate completely different hashes depending on the hashval and the superblock, in order to have the hash scale as we added more filesystems to the machine.
Both of these problems can be somewhat tuned out by increasing the number of buckets of both the d and i caches, but it got to a point where I had 256MB of inode and 128MB of dentry hash buckets on a not so large SMP. With the hash changes I have been able to reduce the number of buckets to 128MB for the inode cache and to 32MB for the dentry cache and still get better performance.
If it helps my case... I haven't been running this benchmark for long, so I haven't been able to find a way to cheat. I need to come up with generic solutions until I can find a cheat for the benchmark. :)
SDET results: Steve Pratt seems to have an SDET setup already and he did me the favor of running SDET with a reduced dentry hash table size. I believe that his table suggests that less than 3% change is acceptable variability, but overall he got a 5% better number using the new hash algorithm.
A) x4408way1.sdet.2.6.5100000-8p.04-05-05_12.08.44 vs
B) x4408way1.sdet.2.6.5+hash-100000-8p.04-05-05_11.48.02
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 1048576 (order: 10, 4194304 bytes)
Results: Throughput
tolerance = 0.00 + 3.00% of A
                       A            B
    Threads      Ops/sec      Ops/sec    %diff         diff    tolerance
----------- ------------ ------------ -------- ------------ ------------
          1    4341.9300    4401.9500     1.38        60.02       130.26
          2    8242.2000    8165.1200    -0.94       -77.08       247.27
          4   15274.4900   15257.1000    -0.11       -17.39       458.23
          8   21326.9200   21320.7000    -0.03        -6.22       639.81
         16   23056.2100   24282.8000     5.32      1226.59       691.69 *
         32   23397.2500   24684.6100     5.50      1287.36       701.92 *
         64   23372.7600   23632.6500     1.11       259.89       701.18
        128   17009.3900   16651.9600    -2.10      -357.43       510.28
=========================================================================
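The idea of deriving the bucket from both the superblock and the hash value can be sketched in userspace roughly as follows. This is an illustration only, not the hash from the patch: `toy_inode_hash`, `TOY_MIX_CONSTANT`, and the use of a plain integer `sb_id` in place of the superblock pointer are all assumptions made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical odd 64-bit multiplier, chosen only for this sketch. */
#define TOY_MIX_CONSTANT 0x9E3779B97F4A7C15ULL

/*
 * Map (superblock identity, inode number) to a bucket index.
 * sb_id stands in for the superblock pointer; bits is log2 of the
 * number of buckets and must be in the range 1..32.
 */
static inline uint32_t toy_inode_hash(uint64_t sb_id, uint64_t ino,
                                      unsigned bits)
{
    assert(bits >= 1 && bits <= 32);

    /* Fold the superblock identity into the key so that equal inode
     * numbers on different filesystems start from different values. */
    uint64_t h = (ino ^ (sb_id * TOY_MIX_CONSTANT)) * TOY_MIX_CONSTANT;

    /* Keep the top bits of the product; they depend on every input bit. */
    return (uint32_t)(h >> (64 - bits));
}
```

With 1024 buckets, `toy_inode_hash(sb_id, ino, 10)` always returns an index below 1024. Shorter chains under each bucket mean less time spent walking a chain while a global lock is held.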
-
Andrew Morton authored
From: C.L. Tien <cltien@cmedia.com.tw> Current version from cmedia.
-
Andrew Morton authored
From: Matt Domsch <Matt_Domsch@dell.com> Clean up the edd.c driver.
* use kobject_set_name() instead of snprintf() per GregKH's recommendation.
* Add MODULE_VERSION()
* s/driverfs/sysfs/ in Kconfig
* Remove report URL message, as there have been too many BIOSs reported, virtually none of which are EDD-capable. This may return if/when I develop a better reporting method and database to capture/store the data from users.
* Remove the unused code for creating a symlink to the scsi_device. This never worked right, and I'm going to show the relationship from a userspace tool which uses libsysfs instead.
-
Andrew Morton authored
kblockd is the thread which runs unplug functions, not keventd.
-
Andrew Morton authored
From: Rusty Russell <rusty@rustcorp.com.au> Only print the tainted message the first time. Its purpose is to warn users that we can't support them, not to fill their logs.
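A print-once guard of the kind described here might look like the following userspace sketch. The function name `warn_tainted_once`, the flag name, and the message text are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Set after the first warning so later calls stay quiet. */
static bool tainted_warning_shown;

/*
 * Print the warning the first time only.
 * Returns 1 if this call printed it, 0 if it was suppressed.
 */
static int warn_tainted_once(const char *module_name)
{
    assert(module_name);

    if (tainted_warning_shown)
        return 0;

    tainted_warning_shown = true;
    /* Hypothetical wording, for illustration only. */
    fprintf(stderr, "%s: module taints the kernel\n", module_name);
    return 1;
}
```

The first caller gets the message; every later caller returns 0 without logging, which keeps the warning visible without filling the log.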
-
Andrew Morton authored
From: Paul Mackerras <paulus@samba.org> The patch below moves the ppc64 spinlocks and rwlocks out of line and into arch/ppc64/lib/locks.c, and implements _raw_spin_lock_flags for ppc64. Part of the motivation for moving the spinlocks and rwlocks out of line was that I needed to add code to the slow paths to yield the processor to the hypervisor on systems with shared processors. On these systems, a cpu as seen by the kernel is a virtual processor that is not necessarily running full-time on a real physical cpu. If we are spinning on a lock which is held by another virtual processor which is not running at the moment, we are just wasting time. In such a situation it is better to do a hypervisor call to ask it to give the rest of our time slice to the lock holder so that forward progress can be made. The one problem with out-of-line spinlock routines is that lock contention will show up in profiles in the spin_lock etc. routines rather than in the callers, as it does with inline spinlocks. I have added a CONFIG_SPINLINE config option for people that want to do profiling. In the longer term, Anton is talking about teaching the profiling code to attribute samples in the spin lock routines to the routine's caller. This patch reduces the kernel by about 80kB on my G5. With inline spinlocks selected, the kernel gets about 4kB bigger than without the patch, because _raw_spin_lock_flags is slightly bigger than _raw_spin_lock. This patch depends on the patch from Keith Owens to add _raw_spin_lock_flags.
-
Andrew Morton authored
From: Keith Owens <kaos@sgi.com> As requested by Linus, update all architectures to add the common infrastructure. Tested on ia64 and i386. Enable interrupts while waiting for a disabled spinlock, but only if interrupts were enabled before issuing spin_lock_irqsave(). This patch consists of three sections:
* An architecture independent change to call _raw_spin_lock_flags() instead of _raw_spin_lock() when the flags are available.
* An ia64 specific change to implement _raw_spin_lock_flags() and to define _raw_spin_lock(lock) as _raw_spin_lock_flags(lock, 0) for the ASM_SUPPORTED case.
* Patches for all other architectures and for ia64 with !ASM_SUPPORTED to map _raw_spin_lock_flags(lock, flags) to _raw_spin_lock(lock).
Architecture maintainers can define _raw_spin_lock_flags() to do something useful if they want to enable interrupts while waiting for a disabled spinlock.
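The behaviour described above can be modelled in userspace as a rough sketch. Everything prefixed `sim_` is hypothetical: interrupts are simulated with a plain boolean, and the lock is a C11 `atomic_flag` rather than any architecture's real spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simulated interrupt state; a stand-in for the CPU's real flag. */
static bool sim_irqs_enabled = true;
/* Counts how often interrupts were briefly re-enabled while waiting. */
static unsigned long sim_irq_windows;

typedef struct { atomic_flag locked; } sim_spinlock_t;

/* Save the old interrupt state and disable interrupts. */
static unsigned long sim_irq_save(void)
{
    unsigned long flags = sim_irqs_enabled ? 1UL : 0UL;
    sim_irqs_enabled = false;
    return flags;
}

static void sim_irq_restore(unsigned long flags)
{
    sim_irqs_enabled = (flags != 0);
}

/* One acquisition attempt; true when the lock was taken. */
static bool sim_try_lock(sim_spinlock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

static void sim_unlock(sim_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* While the lock is busy: open an interrupt window, but only if the
 * caller had interrupts enabled before it asked for the lock. */
static void sim_spin_wait_step(unsigned long flags)
{
    if (flags) {
        sim_irqs_enabled = true;   /* pending interrupts could run here */
        sim_irq_windows++;
        sim_irqs_enabled = false;  /* closed again before retrying */
    }
}

/* Flags-aware lock: spins, opening interrupt windows when allowed. */
static void sim_spin_lock_flags(sim_spinlock_t *l, unsigned long flags)
{
    assert(l);
    while (!sim_try_lock(l))
        sim_spin_wait_step(flags);
}

static unsigned long sim_spin_lock_irqsave(sim_spinlock_t *l)
{
    unsigned long flags = sim_irq_save();
    sim_spin_lock_flags(l, flags);
    return flags;
}
```

Passing 0 as the flags makes the wait loop behave like a plain spin, which mirrors how the fallback mapping ignores the flags argument.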
-
Andrew Morton authored
From: Alexey Dobriyan <adobriyan@mail.ru> Fix various kernel-doc parameters.
-
Andrew Morton authored
From: Alexey Dobriyan <adobriyan@mail.ru> DOCPROC Documentation/DocBook/parportbook.sgml Warning(drivers/parport/share.c:188): No description found for parameter 'drv' (kernel-doc parameter name is incorrect.)
-
Andrew Morton authored
From: Greg Aumann <Greg_Aumann@sil.org> This patch updates two error messages to reflect changes in the code.
-
Andrew Morton authored
From: "Theodore Ts'o" <tytso@mit.edu> This patch improves the output produced by "echo 1 > /proc/sys/vm/block_dump", in the following ways:
1) The messages are printed with KERN_DEBUG, so that even if sysklogd is running, if configured appropriately, it will not need to write to log files.
2) The inode which is dirtied by a process is now identified more precisely by inode number and filesystem ID, and by a dcache name if present.
3) In the generic filesystem sget function, the superblock id (s_id) is filled in with the filesystem type by default. Filesystems which are block-device based will override s_id, but this allows pseudo filesystems such as tmpfs, procfs, etc. to be identified in (2).
-
Andrew Morton authored
find_user() is being called from set/get_priority(), but it doesn't take the needed lock, and those callers were forgetting to drop the refcount which find_user() took.
-
Andrew Morton authored
From: Olaf Hering <olh@suse.de>
-
Andrew Morton authored
From: Chris Mason <mason@suse.com> From: Jeff Mahoney <jeffm@suse.com> Add device info to the various reiserfs warnings and panics so you can tell which filesystem triggers the message. Loosely based on code from Oleg Drokin.
-
Andrew Morton authored
From: Chris Mason <mason@suse.com> From: jeffm@suse.com reiserfs permission bug fix for xattrs
-
Andrew Morton authored
From: Chris Mason <mason@suse.com> ReiserFS support for quotas. Originally from Jan Kara
-
Andrew Morton authored
From: Chris Mason <mason@suse.com> From: jeffm@suse.com reiserfs xattr locking fixes
-
Andrew Morton authored
From: Chris Mason <mason@suse.com> From: jeffm@suse.com reiserfs support for selinux
-
Andrew Morton authored
From: Chris Mason <mason@suse.com> From: jeffm@suse.com reiserfs support for trusted xattrs
-
Andrew Morton authored
From: Chris Mason <mason@suse.com> From: jeffm@suse.com reiserfs acl support
-
Andrew Morton authored
From: Chris Mason <mason@suse.com> From: jeffm@suse.com reiserfs support for xattrs
-
Andrew Morton authored
From: Chris Mason <mason@suse.com> From: jeffm@suse.com properly init device inodes in the acl code
-
Andrew Morton authored
From: Bart Samwel <bart@samwel.tk> This patch from Micha Feigin fixes some bugs in the earlier reiserfs commit default patch. The changelog:
* If you remounted without any commit=NNN option, it would assume commit=0 and restore the defaults. This patch makes it leave the current state alone if you don't pass commit=NNN.
* Added range check for cast from unsigned long to unsigned int.
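The two fixes in this changelog can be illustrated with a small userspace sketch. The helper `apply_commit_option` and its parameters are hypothetical; the reiserfs option parser itself is not shown in this log.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Apply a parsed commit=NNN value to the current setting.
 * have_option is false when the remount passed no commit= option.
 * Returns 0 on success, -1 if the value does not fit in unsigned int.
 */
static int apply_commit_option(bool have_option, unsigned long value,
                               unsigned int default_age,
                               unsigned int *max_commit_age)
{
    assert(max_commit_age);

    if (!have_option)
        return 0;               /* leave the current state alone */

    if (value > UINT_MAX)
        return -1;              /* would be truncated by the cast */

    /* commit=0 restores the default; anything else is taken as given. */
    *max_commit_age = value ? (unsigned int)value : default_age;
    return 0;
}
```

A remount without `commit=` now returns early and leaves `*max_commit_age` untouched, and an oversized value is rejected instead of being silently truncated.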
-
Andrew Morton authored
From: FabF <Fabian.Frederick@skynet.be> Use the pre-existing enum rather than magic numbers.
-
Andrew Morton authored
From: Burton Windle <bwindle@fint.org> Fix the 3c905C 10/100 transceiver initialisation woes.
-
Andrew Morton authored
Currently, shrink_slab() will decide that it needs to scan a certain number of dentries, will call shrink_dcache_memory() requesting that this be done, and shrink_dcache_memory() will simply bale out without doing anything because the caller did not have __GFP_FS. This has the potential to disrupt our lovely pagecache-vs-slab balancing act. So change things so that shrinker callouts can return -1, indicating that they baled out. This way, shrink_slab can remember that this slab was owed a certain number of scannings and these will be correctly performed next time a __GFP_FS caller comes by.
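The accounting described here can be sketched as a small userspace model. The names `toy_shrinker`, `toy_shrink` and `toy_shrink_slab` are hypothetical, and the boolean `may_enter_fs` stands in for the caller having __GFP_FS; the real shrinker interface is not reproduced.

```c
#include <assert.h>
#include <stdbool.h>

/* A cache with a shrinker callback, modelled in userspace. */
struct toy_shrinker {
    long objects;      /* objects currently held by the cache */
    long nr_deferred;  /* scans owed from earlier bail-outs */
};

/*
 * Shrinker callout: scan up to nr_to_scan objects.
 * Returns the number of objects left, or -1 if it bailed out because
 * the caller may not enter the filesystem.
 */
static long toy_shrink(struct toy_shrinker *s, long nr_to_scan,
                       bool may_enter_fs)
{
    if (!may_enter_fs)
        return -1;
    if (nr_to_scan > s->objects)
        nr_to_scan = s->objects;
    s->objects -= nr_to_scan;
    return s->objects;
}

/* Request nr scans, carrying the debt forward when the callout bails. */
static void toy_shrink_slab(struct toy_shrinker *s, long nr,
                            bool may_enter_fs)
{
    assert(s && nr >= 0);

    long total = s->nr_deferred + nr;

    if (toy_shrink(s, total, may_enter_fs) == -1)
        s->nr_deferred = total;  /* owed to the next capable caller */
    else
        s->nr_deferred = 0;      /* debt paid */
}
```

A caller that cannot enter the filesystem leaves the cache untouched but grows `nr_deferred`, so the next capable caller scans its own share plus the backlog.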
-
Andrew Morton authored
From: Andi Kleen <ak@suse.de> We still need some kind of early CPU detection, e.g. for the AMD768 workaround and for the slab allocator to size its slabs correctly for the cache line. Also some other code already had private early CPU routines. This patch takes a new approach compared to the previous patch which caused Andrew so much grief. It only fills in a few selected fields in boot_cpu_data (only the data needed to identify the CPU type and the cache alignment). In particular the feature masks are not filled in, and the other fields are also not touched to prevent unwanted side effects. Also convert the ppro workaround to use standard cpu data now. I'm not sure if slab still has the necessary support to use the cache line size early; previously Manfred showed some serious memory savings with this for kernels that are compiled for a bigger cache line size than the CPU's (which is often the case on distribution kernels). This code could be re-enabled now with this patch.
-
Andrew Morton authored
From: Anton Blanchard <anton@samba.org> Found a few warnings when compiling with NAPI off.
-
Andrew Morton authored
From: Coywolf Qi Hunt <coywolf@greatcn.org> Since "Direct booting from floppy is no longer supported", this patch removes the bootsect_helper code from x86_64 and PC-9800.
-
Andrew Morton authored
From: Coywolf Qi Hunt <coywolf@greatcn.org> Since "Direct booting from floppy is no longer supported", this patch removes the bootsect_helper code. It also fixes a comment. The other two platforms, x86_64 and PC-9800, should be cleaned up too.
-
Andrew Morton authored
From: "Prof. BJ" <prof.bj@freemail.hu>
- m8xx_setup warning and mfmsr error fix
- ppc8xx_pic include error fix
- tqm8xxl.c typo (syntax error) fix
- commproc.c include error and prototype warning fix
(acked by Matt Porter)
-
Andrew Morton authored
From: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com> The patch fixes a problem with the ES7000 Server Management mechanism, which uses the platform register mip_port. It was not initialized, so the mechanism was not functional. The patch also fixes the APIC destination for the hierarchical and flat cluster models used in ES7000. The destination IDs reflect policies for Cascade-based systems, which use logical delivery and the lowest-priority mechanism, and for xAPIC-based models, which use physical delivery and fixed APIC destinations. The patch also turns on NO_IOAPIC_CHECK (1) to avoid error messages and attempts to rewrite the ID, because on ES7000 all IDs are hard coded in the BIOS and cannot be altered.
-
Andrew Morton authored
From: Arnd Bergmann <arnd@arndb.de> sys32_nfsservctl is the largest remaining syscall emulation handler that can be consolidated. mips and ia64 currently don't use this at all; parisc has a simpler implementation than the one used by s390, sparc and ppc, on which the new compat_sys_nfsservctl is based. The user access checks in the code are inconsistent at the very least, which should be fixed here. Compile tested only due to lack of a proper test setup.
-
Andrew Morton authored
From: Arnd Bergmann <arnd@arndb.de> sys32_select has seven mostly but not exactly identical versions, so consolidate them as compat_sys_select. Based on the ppc64 implementation, which most closely resembles sys_select. One bug that was not caught by LTP has been fixed since the first version of this patch. Tested on x86_64, ia64 and s390.
-
Andrew Morton authored
From: Arnd Bergmann <arnd@arndb.de> The code for sys32_execve/do_execve32 in most of the seven versions was copied from fs/exec.c but not kept up-to-date. The new compat_do_execve() function is based on the mips code and has been resync'ed with do_execve(). IA64 changes are from Arun Sharma. Tested on x86_64, ia64 and s390
-
Andrew Morton authored
From: Arnd Bergmann <arnd@arndb.de> The seven implementations of this have gone out of sync and are mostly buggy. The new compat_sys_* version is based on the ppc64 implementation, which most closely resembles the code in sys_readv/sys_writev. Tested on x86_64, ia64 and s390.
-
Andrew Morton authored
From: Nick Piggin <nickpiggin@yahoo.com.au> Without disturbing the read/write ratio, increase the batch expiry intervals. This will have the effect of increasing latency a little, but with improved throughput.
-
Andrew Morton authored
From: <bart@samwel.tk> Richard Atterer reported that mutt does not play well with noatime (it uses access times to check whether new mail has arrived in a folder). This patch warns about this in the doc, and adds a setting to the control script to disable the noatime remount.
-
Andrew Morton authored
From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
-
Andrew Morton authored
From: Stephen Smalley <sds@epoch.ncsc.mil> This patch changes the SELinux module to try to reset any descriptors it closes on exec (due to a lack of permission by the new domain to the inherited open file) to refer to the null device. This counters the problem of SELinux inducing program misbehavior, particularly due to having descriptors 0-2 closed when the new domain is not allowed access to the caller's tty. This is primarily to address the case where the caller is trusted with respect to the new domain, as the untrusted caller case is already handled via AT_SECURE and glibc secure mode. The code is partly based on the OpenWall LSM, which in turn drew from the OpenWall kernel patch. Note that the code does not guarantee that the descriptor is always re-opened to /dev/null; it merely makes a reasonable effort to do so, but can fail under various conditions.
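A userspace analogue of the best-effort reset described above is sketched below. This is not the SELinux code, which runs inside the kernel; `reset_fd_to_null` is a hypothetical helper built on the POSIX calls open(), dup2() and close().

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Best effort: make fd refer to /dev/null instead of leaving it closed.
 * Returns 0 on success. Returns -1 if the null device could not be
 * opened or duplicated, in which case fd is simply closed.
 */
static int reset_fd_to_null(int fd)
{
    assert(fd >= 0);

    int nullfd = open("/dev/null", O_RDWR);
    if (nullfd < 0) {
        close(fd);
        return -1;
    }

    /* If fd was already closed, open() may have reused its number. */
    if (nullfd == fd)
        return 0;

    /* dup2() closes the old fd and installs the null device in one step. */
    int rc = dup2(nullfd, fd);
    close(nullfd);
    if (rc < 0) {
        close(fd);
        return -1;
    }
    return 0;
}
```

After a successful call, writes to the descriptor are discarded and reads return end-of-file, so a program that assumes descriptors 0-2 are open keeps working. As in the commit message, the failure path falls back to a plain close.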
-