Commits · 99effef9544e4d526abf836d6d6c680853e6cf64 · Kirill Smelkov / linux

10 May, 2004 40 commits

[PATCH] dentry and inode cache hash algorithm performance changes. · 99effef9

Andrew Morton authored 20 years ago

From: "Jose R. Santos" <jrsantos@austin.ibm.com>

This patch alleviates some issues seen with Linux when accessing millions of
files on machines with large amounts of RAM (32GB+).  Both algorithms are
based on some studies that Dominique Heger was doing on hash table
efficiencies in Linux.  The dentry hash table has been tested on small systems
with one internal IDE hard disk as well as on large SMP systems with many
Fibre Channel disks.  Dominique claims that in all the testing done, they did
not see one case where this hash function provided worse performance, and that
in most tests they were seeing better performance.

The inode hash function was done by me based on Dominique's original work and
has only been stress tested with SpecSFS.  It provided a 3% improvement over
the default algorithm in the SpecSFS results and speedups in the response
time of almost all filesystem operations the benchmark stresses.  With the
better distribution it is also possible to reduce the number of inode buckets
from 32 million to 16 million and still get slightly better results.

Anton was nice enough to provide some graphs that show the distribution 
before and after the patch at http://samba.org/~anton/linux/sfs/1/

For the dentry hash function, some of my other coworkers put this hash
function through various tests and concluded that it was equal to or better
than the default hash function.  These runs were done with a (hopefully to be
Open Source soon) benchmark called FFSB, which can simulate various I/O
patterns across many filesystems and variable file sizes.

The SpecSFS fileset is basically a lot of small files, whose number varies
with the size of the run.  For a not so big SMP system the number of files is
in the 20+ million range.  Of those 20 million files only 10% are accessed
randomly by the client.  The purpose of this is that the benchmark tries to
stress not only the NFS layer but the VM and filesystem layers as well.  The
filesets are also hundreds of gigabytes in size in order to promote disk head
movement by guaranteeing cache misses in memory.  In SFS, 27% of the workload
is lookups, and __d_lookup has been showing up high in my profiles.

For the inode hash, the problem that I see is that when running a benchmark
with this huge fileset we end up trying to free a lot of inode entries during
the run while trying to put new entries in the cache.  We end up calling
ifind_fast(), which calls find_inode_fast() with inode_lock held.  To keep
inode_lock hold times short, we need to avoid long chains in that hash
table.

When I took a look at the original hash function, I found it to be a bit too
simple for any workload.  My solution (which takes advantage of Dominique's
work) was to create a hash function that could generate completely different
hashes depending on the hashval and the superblock, in order to have the hash
scale as we added more filesystems to the machine.
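
The idea of mixing the hashval with the superblock can be illustrated with a
toy user-space model.  This is not the hash from the patch: the function
names, the multiplier constant, and the 64-byte cache line are assumptions
made for illustration only.  additive_hash() stands in for a "too simple"
hash and mixed_hash() for one that scrambles both inputs together.

```python
# Toy user-space model, not kernel code.  All names and constants below are
# illustrative assumptions.
MASK64 = (1 << 64) - 1
GOLDEN_RATIO_64 = 0x9E3779B97F4A7C15  # odd 64-bit multiplier (assumed constant)
L1_CACHE_BYTES = 64                   # assumed cache-line size


def additive_hash(sb_addr, ino, bits):
    """Simple hash: add the inode number to the scaled superblock address."""
    return (ino + sb_addr // L1_CACHE_BYTES) & ((1 << bits) - 1)


def mixed_hash(sb_addr, ino, bits):
    """Multiplicative mix of the inode number and the superblock address."""
    sb_key = (sb_addr // L1_CACHE_BYTES) | 1          # force an odd multiplier
    x = ((ino + GOLDEN_RATIO_64) * sb_key) & MASK64   # combine both inputs
    x ^= x >> 32                                      # fold high bits down
    x = (x * GOLDEN_RATIO_64) & MASK64                # scramble once more
    return x >> (64 - bits)                           # keep the top `bits` bits


def longest_chain(hash_fn, sb_addrs, inodes, bits):
    """Length of the fullest bucket after hashing every (sb, inode) pair."""
    counts = {}
    for sb in sb_addrs:
        for ino in inodes:
            bucket = hash_fn(sb, ino, bits)
            counts[bucket] = counts.get(bucket, 0) + 1
    return max(counts.values())
```

In this model the additive hash always puts inode N+1 of one superblock and
inode N of a superblock one cache line further on into the same bucket, so
adding filesystems stacks entries onto the same chains.  longest_chain() can
be used to compare the two functions on any set of addresses and inode
numbers.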

Both of these problems can be somewhat tuned out by increasing the number of
buckets in both the dentry and inode caches, but it got to a point where I had
256MB of inode and 128MB of dentry hash buckets on a not so large SMP.  With
the hash changes I have been able to reduce the number of buckets to 128MB for
the inode cache and to 32MB for the dentry cache and still get better
performance.

If it helps my case...  I haven't been running this benchmark for long, so I
haven't been able to find a way to cheat.  I need to come up with generic
solutions until I can find a cheat for the benchmark.  :)


SDET results:

Steve Pratt seems to have an SDET setup already and he did me the favor of
running SDET with a reduced dentry hash table size.  I believe that his
table suggests that less than 3% change is acceptable variability, but
overall he got a 5% better number using the new hash algorithm.

A) x4408way1.sdet.2.6.5100000-8p.04-05-05_12.08.44 vs 
B) x4408way1.sdet.2.6.5+hash-100000-8p.04-05-05_11.48.02


  Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
  Inode-cache hash table entries: 1048576 (order: 10, 4194304 bytes) 

Results:Throughput

                                          tolerance = 0.00 + 3.00% of A
                      A            B
   Threads      Ops/sec      Ops/sec    %diff         diff    tolerance
----------- ------------ ------------ -------- ------------ ------------
         1    4341.9300    4401.9500     1.38        60.02       130.26 
         2    8242.2000    8165.1200    -0.94       -77.08       247.27 
         4   15274.4900   15257.1000    -0.11       -17.39       458.23 
         8   21326.9200   21320.7000    -0.03        -6.22       639.81 
        16   23056.2100   24282.8000     5.32      1226.59       691.69  * 
        32   23397.2500   24684.6100     5.50      1287.36       701.92  * 
        64   23372.7600   23632.6500     1.11       259.89       701.18 
       128   17009.3900   16651.9600    -2.10      -357.43       510.28 
=========================================================================
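
The columns of the table can be recomputed from the two throughput figures.
The sketch below assumes that %diff is relative to A, that the tolerance is 3%
of A as the table header states, and that the trailing "*" marks rows whose
difference exceeds the tolerance (that last point is inferred from the data,
not stated).  The function name is hypothetical.

```python
def compare_runs(a_ops, b_ops, tolerance_pct=3.0):
    """Recompute one table row from the A and B throughput figures."""
    diff = b_ops - a_ops                       # absolute change, B minus A
    pct_diff = diff / a_ops * 100.0            # change relative to A
    tolerance = a_ops * tolerance_pct / 100.0  # 0.00 + 3.00% of A
    significant = abs(diff) > tolerance        # assumed meaning of the "*"
    return round(pct_diff, 2), round(diff, 2), round(tolerance, 2), significant
```

For the 16-thread row, compare_runs(23056.21, 24282.80) gives a 5.32% change
against a tolerance of 691.69 ops/sec, so the row is flagged; the 1-thread row
stays within tolerance.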

99effef9

[PATCH] cmpci OSS driver update · 9e315f49
Andrew Morton authored 20 years ago
```
From: C.L. Tien <cltien@cmedia.com.tw>

Current version from cmedia.
```
9e315f49

[PATCH] EDD: follow sysfs convention, MODULE_VERSION, remove dead SCSI symlink · da78fe73

Andrew Morton authored 20 years ago

From: Matt Domsch <Matt_Domsch@dell.com>

Clean up the edd.c driver.

* use kobject_set_name() instead of snprintf() per GregKH's recommendation.
* Add MODULE_VERSION()
* s/driverfs/sysfs/ in Kconfig
* Remove report URL message, as there have been too many BIOSs reported,
  virtually none of which are EDD-capable.  This may return if/when I
  develop a better reporting method and database to capture/store the
  data from users.
* Remove the unused code for creating a symlink to the scsi_device.
  This never worked right, and I'm going to show the relationship from
  a userspace tool which uses libsysfs instead.

da78fe73

[PATCH] blk_start_queue() should use kblockd · 12db2584
Andrew Morton authored 20 years ago
```
kblockd is the thread which runs unplug functions, not keventd.
```
12db2584

[PATCH] Only Print Taint Message Once · d137ab48

Andrew Morton authored 20 years ago

From: Rusty Russell <rusty@rustcorp.com.au>

Only print the tainted message the first time.  Its purpose is to warn
users that we can't support them, not to fill their logs.

d137ab48

[PATCH] Un-inline spinlocks on ppc64 · 5dfd0a43

Andrew Morton authored 20 years ago

From: Paul Mackerras <paulus@samba.org>

The patch below moves the ppc64 spinlocks and rwlocks out of line and into
arch/ppc64/lib/locks.c, and implements _raw_spin_lock_flags for ppc64.

Part of the motivation for moving the spinlocks and rwlocks out of line was
that I needed to add code to the slow paths to yield the processor to the
hypervisor on systems with shared processors. On these systems, a cpu as
seen by the kernel is a virtual processor that is not necessarily running
full-time on a real physical cpu. If we are spinning on a lock which is
held by another virtual processor which is not running at the moment, we
are just wasting time. In such a situation it is better to do a hypervisor
call to ask it to give the rest of our time slice to the lock holder so
that forward progress can be made.

The one problem with out-of-line spinlock routines is that lock contention
will show up in profiles in the spin_lock etc. routines rather than in the
callers, as it does with inline spinlocks. I have added a CONFIG_SPINLINE
config option for people that want to do profiling. In the longer term, Anton
is talking about teaching the profiling code to attribute samples in the spin
lock routines to the routine's caller.

This patch reduces the kernel by about 80kB on my G5. With inline
spinlocks selected, the kernel gets about 4kB bigger than without the
patch, because _raw_spin_lock_flags is slightly bigger than _raw_spin_lock.

This patch depends on the patch from Keith Owens to add
_raw_spin_lock_flags.

5dfd0a43

[PATCH] Allow architectures to reenable interrupts on contended spinlocks · 07f94531

Andrew Morton authored 20 years ago

From: Keith Owens <kaos@sgi.com>

As requested by Linus, update all architectures to add the common
infrastructure.  Tested on ia64 and i386.

Enable interrupts while waiting for a disabled spinlock, but only if
interrupts were enabled before issuing spin_lock_irqsave().

This patch consists of three sections :-

* An architecture independent change to call _raw_spin_lock_flags()
  instead of _raw_spin_lock() when the flags are available.

* An ia64 specific change to implement _raw_spin_lock_flags() and to
  define _raw_spin_lock(lock) as _raw_spin_lock_flags(lock, 0) for the
  ASM_SUPPORTED case.

* Patches for all other architectures and for ia64 with !ASM_SUPPORTED
  to map _raw_spin_lock_flags(lock, flags) to _raw_spin_lock(lock).
  Architecture maintainers can define _raw_spin_lock_flags() to do
  something useful if they want to enable interrupts while waiting for
  a disabled spinlock.

07f94531

[PATCH] Kill some 'No description found...' warnings. (kernel-api.sgml) · a023cd55
Andrew Morton authored 20 years ago
```
From: Alexey Dobriyan <adobriyan@mail.ru>

Fix various kernel-doc parameters.
```
a023cd55

[PATCH] Kill a warning while making pdfdocs. · 72468a40

Andrew Morton authored 20 years ago

From: Alexey Dobriyan <adobriyan@mail.ru>

  DOCPROC Documentation/DocBook/parportbook.sgml
Warning(drivers/parport/share.c:188): No description found for parameter 'drv'
(kernel-doc parameter name is incorrect.)

72468a40

[PATCH] com90xx error message patch: check_region() gone · 8b3ca458

Andrew Morton authored 20 years ago

From: Greg Aumann <Greg_Aumann@sil.org>

This patch updates two error messages to reflect changes in the code.

8b3ca458

[PATCH] Improve laptop mode's block_dump output · 6835de14

Andrew Morton authored 20 years ago

From: "Theodore Ts'o" <tytso@mit.edu>

This patch improves the output produced by "echo 1 >
/proc/sys/vm/block_dump", in the following ways:

1) The messages are printed with KERN_DEBUG, so that even if sysklogd is
   running, if configured appropriately, it will not need to write to log
   files.

2) The inode which is dirtied by a process is now identified more
   precisely by inode number and filesystem ID, and by a dcache name if
   present.

3) In the generic filesystem sget function, the superblock id (s_id) is
   filled in with the filesystem type by default.  Filesystems which are
   block-device based will override s_id, but this allows pseudo
   filesystems such as tmpfs, procfs, etc.  to be identified in (2).

6835de14

[PATCH] find_user locking and leak fix · 475c3656

Andrew Morton authored 20 years ago

find_user() is being called from set/get_priority(), but it doesn't take the
needed lock, and those callers were forgetting to drop the refcount which
find_user() took.

475c3656

[PATCH] mptfusion depends on scsi · 5a80c2ea
Andrew Morton authored 20 years ago
```
From: Olaf Hering <olh@suse.de>
```
5a80c2ea

[PATCH] reiserfs: add device info to diagnostic messages · 9511c080

Andrew Morton authored 20 years ago

From: Chris Mason <mason@suse.com>

From: Jeff Mahoney <jeffm@suse.com>

Add device info to the various reiserfs warnings and panics so you can tell
which filesystem triggers the message.  Loosely based on code from Oleg
Drokin.

9511c080

[PATCH] reiserfs: xattr permission fix · cee42600

Andrew Morton authored 20 years ago

From: Chris Mason <mason@suse.com>

From: jeffm@suse.com

reiserfs permission bug fix for xattrs

cee42600

[PATCH] reiserfs: quota support · 446a7461

Andrew Morton authored 20 years ago

From: Chris Mason <mason@suse.com>

ReiserFS support for quotas.  Originally from Jan Kara

446a7461

[PATCH] reiserfs: xattr locking fixes · 30304fc9
Andrew Morton authored 20 years ago
```
From: Chris Mason <mason@suse.com>

From: jeffm@suse.com

reiserfs xattr locking fixes
```
30304fc9

[PATCH] reiserfs: selinux support · 647c60b9

Andrew Morton authored 20 years ago

From: Chris Mason <mason@suse.com>

From: jeffm@suse.com

reiserfs support for selinux

647c60b9

[PATCH] reiserfs: support trusted xattrs · a4a4ddc5

Andrew Morton authored 20 years ago

From: Chris Mason <mason@suse.com>

From: jeffm@suse.com

reiserfs support for trusted xattrs

a4a4ddc5

[PATCH] reiserfs: ACL support · 0acef032
Andrew Morton authored 20 years ago
```
From: Chris Mason <mason@suse.com>

From: jeffm@suse.com

reiserfs acl support
```
0acef032

[PATCH] reiserfs: xattr support · 0b1a6a8c

Andrew Morton authored 20 years ago

From: Chris Mason <mason@suse.com>

From: jeffm@suse.com

reiserfs support for xattrs

0b1a6a8c

[PATCH] reiserfs: acl device node initialization · 06803e35

Andrew Morton authored 20 years ago

From: Chris Mason <mason@suse.com>

From: jeffm@suse.com

properly init device inodes in the acl code

06803e35

[PATCH] Reiserfs commit default fix · bb0ad0aa

Andrew Morton authored 20 years ago

From: Bart Samwel <bart@samwel.tk>

This patch from Micha Feigin fixes some bugs in the earlier reiserfs 
commit default patch. The changelog:

* If you remounted without any commit=NNN option, it would assume commit=0
  and restore the defaults.  This patch makes it leave the current state alone
  if you don't pass commit=NNN.

* Added range check for cast from unsigned long to unsigned int.
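
The remount behaviour described in the changelog can be modelled in a few
lines.  This is a user-space sketch, not the reiserfs option parser: the
function name, the comma-separated option string, and the 32-bit unsigned int
limit are assumptions.

```python
UINT_MAX = 2**32 - 1  # assumes a 32-bit unsigned int


def remount_commit(current, default, options):
    """Return the commit interval in effect after a remount with `options`."""
    for opt in options.split(","):
        if not opt.startswith("commit="):
            continue
        value = int(opt[len("commit="):])   # kernel side: an unsigned long
        if value < 0 or value > UINT_MAX:   # range check before narrowing
            raise ValueError("commit interval out of range")
        return default if value == 0 else value  # commit=0 restores default
    return current                          # no commit= given: keep state
```

With a current interval of 30 and a default of 5, remounting with "noatime"
keeps 30, "commit=0" returns to 5, and "commit=10" switches to 10.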

bb0ad0aa

[PATCH] partitioning cleanup: use DOS_EXTENDED_PARTITION · 6ef00625
Andrew Morton authored 20 years ago
```
From: FabF <Fabian.Frederick@skynet.be>

Use the pre-existing enum rather than magic numbers.
```
6ef00625

[PATCH] fix 3c59x.c to allow 3c905c 100bT-FD · 073c4132
Andrew Morton authored 20 years ago
```
From: Burton Windle <bwindle@fint.org>

Fix the 3c905C 10/100 transceiver initialisation woes.
```
073c4132

[PATCH] shrink_slab: improved handling of GFP_NOFS allocations · edb41998

Andrew Morton authored 20 years ago

Currently, shrink_slab() will decide that it needs to scan a certain number of
dentries, will call shrink_dcache_memory() requesting that this be done, and
shrink_dcache_memory() will simply bale out without doing anything because the
caller did not have __GFP_FS.

This has the potential to disrupt our lovely pagecache-vs-slab balancing act. 
So change things so that shrinker callouts can return -1, indicating that they
baled out.  This way, shrink_slab can remember that this slab was owed a
certain number of scannings and these will be correctly performed next time a
__GFP_FS caller comes by.
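
The deferral scheme can be modelled in user space as follows.  This is a
simplified sketch, not the kernel's shrink_slab(): the class names, the flag
value, and the bookkeeping field `owed` are hypothetical.

```python
GFP_FS = 0x1  # hypothetical flag value for this model


class CacheModel:
    """Stand-in for a cache with a shrinker callout."""

    def __init__(self, objects):
        self.objects = objects

    def shrink(self, nr_to_scan, gfp_mask):
        if not (gfp_mask & GFP_FS):
            return -1                       # bale out and tell the caller so
        self.objects -= min(nr_to_scan, self.objects)
        return self.objects                 # objects left in the cache


class ShrinkerModel:
    def __init__(self, cache):
        self.cache = cache
        self.owed = 0                       # scans still owed to this cache


def shrink_slab(shrinker, nr_wanted, gfp_mask):
    """Request nr_wanted scans; return how many were actually performed."""
    shrinker.owed += nr_wanted
    if shrinker.cache.shrink(shrinker.owed, gfp_mask) == -1:
        return 0                            # nothing done, keep the debt
    done = shrinker.owed
    shrinker.owed = 0
    return done
```

A call without GFP_FS leaves the cache untouched but records the request, and
the next call with GFP_FS scans both the old and the new amount.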

edb41998

[PATCH] New version of early CPU detect · b528cea7

Andrew Morton authored 20 years ago

From: Andi Kleen <ak@suse.de>

We still need some kind of early CPU detection, e.g. for the AMD768
workaround and for the slab allocator to size its slabs correctly for the
cache line. Also some other code already had private early CPU routines.

This patch takes a new approach compared to the previous patch which caused
Andrew so much grief. It only fills in a few selected fields in
boot_cpu_data (only the data needed to identify the CPU type and the cache
alignment). In particular the feature masks are not filled in, and the
other fields are also not touched to prevent unwanted side effects.

Also convert the ppro workaround to use standard cpu data now.

I'm not sure if slab still has the necessary support to use the cache line
size early; previously Manfred showed some serious memory savings with this
for kernels that are compiled for a bigger cache line size than the CPU has
(which is often the case on distribution kernels). This code could be
re-enabled now with this patch.

b528cea7

[PATCH] remove some unused variables in s2io · ed67bbe7
Andrew Morton authored 20 years ago
```
From: Anton Blanchard <anton@samba.org>

Found a few warnings when compiling with NAPI off.
```
ed67bbe7

[PATCH] Remove bootsect_helper on x86_64 and pc98 · 51538d85

Andrew Morton authored 20 years ago

From: Coywolf Qi Hunt <coywolf@greatcn.org>

Since "Direct booting from floppy is no longer supported", this patch removes
the bootsect_helper code from x86_64 and PC-9800.

51538d85

[PATCH] Remove bootsect_helper and a comment fix · 7d8d2dfe

Andrew Morton authored 20 years ago

From: Coywolf Qi Hunt <coywolf@greatcn.org>

Since "Direct booting from floppy is no longer supported", this patch removes
the bootsect_helper code.  It also fixes a comment.

The other two platforms, x86_64 and PC-9800, should be cleaned up too.

7d8d2dfe

[PATCH] ppc32: ppc8xx build fixes · 79fde358

Andrew Morton authored 20 years ago

From: "Prof. BJ" <prof.bj@freemail.hu>

- m8xx_setup warning and mfmsr error fix
- ppc8xx_pic include error fix
- tqm8xxl.c typing (syntax) error fix
- commproc.c include error and prototype warning fix

(acked by Matt Porter)

79fde358

[PATCH] es7000 subarch update · 45dc4f27

Andrew Morton authored 20 years ago

From: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>

The patch fixes a problem with ES7000 Server Management mechanism that uses
platform register mip_port. It was not initialized, so the mechanism was not
functional.

The patch also fixes the APIC destination for hierarchical and flat cluster
models used in ES7000. The destination ID's reflect policies for Cascade
based systems which use logical delivery and lowest priority mechanism, and
for xAPIC based models that use physical delivery and fixed APIC destinations.

The patch also turns on NO_IOAPIC_CHECK (1) to avoid error messages and
attempts to re-write the ID, because on ES7000 all ID's are hard coded in the
BIOS and cannot be altered.

45dc4f27

[PATCH] Consolidate sys32_nfsservctl · 522cbd42

Andrew Morton authored 20 years ago

From: Arnd Bergmann <arnd@arndb.de>

sys32_nfsservctl is the largest remaining syscall emulation handler that can
be consolidated.  mips and ia64 currently don't use this at all, and parisc
has a simpler implementation than the one used by s390, sparc and ppc, on
which the new compat_sys_nfsservctl is based.

The user access checks in the code are inconsistent at least, which should be
fixed here.

Compile tested only due to lack of proper test setup.

522cbd42

[PATCH] Consolidate sys32_select · 37915f7b

Andrew Morton authored 20 years ago

From: Arnd Bergmann <arnd@arndb.de>

sys32_select has seven mostly but not exactly identical versions, so
consolidate them as compat_sys_select.  Based on the ppc64 implementation,
which most closely resembles sys_select.  One bug that was not caught by LTP
has been fixed since the first version of this patch.

Tested on x86_64, ia64 and s390.

37915f7b

[PATCH] Consolidate do_execve32 · 265e0a42

Andrew Morton authored 20 years ago

From: Arnd Bergmann <arnd@arndb.de>

The code for sys32_execve/do_execve32 in most of the seven versions was copied
from fs/exec.c but not kept up-to-date.  The new compat_do_execve() function
is based on the mips code and has been resync'ed with do_execve().  IA64
changes are from Arun Sharma.

Tested on x86_64, ia64 and s390

265e0a42

[PATCH] Consolidate sys32_readv and sys32_writev · 4791db72

Andrew Morton authored 20 years ago

From: Arnd Bergmann <arnd@arndb.de>

The seven implementations of this have gone out of sync and are mostly buggy. 
The new compat_sys_* version is based on the ppc64 implementation, which most
closely resembles the code in sys_readv/sys_writev.

Tested on x86_64, ia64 and s390.

4791db72

[PATCH] AS: increase batch expiry intervals · 8aab2013

Andrew Morton authored 20 years ago

From: Nick Piggin <nickpiggin@yahoo.com.au>

Without disturbing the read/write ratio, increase the batch expiry
intervals.  This will have the effect of increasing latency a little, but
with improved throughput.

8aab2013

[PATCH] Laptop Mode doc update · e33daf9d

Andrew Morton authored 20 years ago

From: <bart@samwel.tk>

Richard Atterer reported that mutt does not play well with noatime (it uses
access times to check whether new mail has arrived in a folder).  This patch
warns about this in the doc, and adds a setting to the control script to
disable the noatime remount.

e33daf9d

[PATCH] cyclades MAINTAINERS update · c489e9e6
Andrew Morton authored 20 years ago
```
From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
```
c489e9e6

[PATCH] selinux: reopen descriptors closed on exec to /dev/null · def3f08e

Andrew Morton authored 20 years ago

From: Stephen Smalley <sds@epoch.ncsc.mil>

This patch changes the SELinux module to try to reset any descriptors it
closes on exec (due to a lack of permission by the new domain to the inherited
open file) to refer to the null device. This counters the problem of SELinux
inducing program misbehavior, particularly due to having descriptors 0-2
closed when the new domain is not allowed access to the caller's tty. This is
primarily to address the case where the caller is trusted with respect to the
new domain, as the untrusted caller case is already handled via AT_SECURE and
glibc secure mode. The code is partly based on the OpenWall LSM, which in
turn drew from the OpenWall kernel patch. Note that the code does not
guarantee that the descriptor is always re-opened to /dev/null; it merely
makes a reasonable effort to do so, but can fail under various conditions.
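
The same best-effort idea can be shown from user space.  The sketch below is
not the SELinux code; it only illustrates pointing a descriptor number at the
null device and reporting failure instead of guaranteeing success.  The
function names are hypothetical.

```python
import os
import stat


def reset_fd_to_null(fd):
    """Best effort: make `fd` refer to the null device.  True on success."""
    try:
        null_fd = os.open(os.devnull, os.O_RDWR)
    except OSError:
        return False                 # could not open the null device
    if null_fd == fd:
        return True                  # open() already landed on the wanted slot
    try:
        os.dup2(null_fd, fd)         # replaces whatever `fd` referred to
        return True
    except OSError:
        return False
    finally:
        os.close(null_fd)            # drop the temporary descriptor


def is_null_device(fd):
    """True if `fd` is a character device with the same rdev as os.devnull."""
    st = os.fstat(fd)
    return stat.S_ISCHR(st.st_mode) and st.st_rdev == os.stat(os.devnull).st_rdev
```

A program that finds descriptors 0-2 closed or unusable could call
reset_fd_to_null() on each and carry on; a False return means it has to cope
with the descriptor being unavailable.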

def3f08e