1. 10 May, 2004 40 commits
    • [PATCH] dentry and inode cache hash algorithm performance changes. · 99effef9
      Andrew Morton authored
      From: "Jose R. Santos" <jrsantos@austin.ibm.com>
      
      It alleviates some issues seen with Linux when accessing millions of files on
      machines with large amounts of RAM (32GB+).  Both algorithms are based on some
      studies that Dominique Heger was doing on hash table efficiencies in Linux.
      The dentry hash table has been tested in small systems with one internal IDE
      hard disk as well as in large SMP systems with many Fibre Channel disks.
      Dominique claims that in all the testing done, they did not see one case where
      this hash function provided worse performance, and that in most tests they
      were seeing better performance.
      
      The inode hash function was done by me based on Dominique's original work and
      has only been stress tested with SpecSFS.  It provided a 3% improvement over
      the default algorithm in the SpecSFS results and speedups in the response
      time of almost all filesystem operations the benchmark stresses.  With the
      better distribution it is also possible to reduce the number of inode buckets
      from 32 million to 16 million and still get slightly better results.
      
      Anton was nice enough to provide some graphs that show the distribution 
      before and after the patch at http://samba.org/~anton/linux/sfs/1/
      
      For the dentry hash function, some of my other coworkers have put this hash
      function through various tests and have concluded that the hash function is
      equal to or better than the default hash function.  These runs were done with a
      (hopefully to be Open Source soon) benchmark called FFSB which can simulate
      various I/O patterns across many filesystems and variable file sizes.
      
      The SpecSFS fileset is basically a lot of small files, and it varies depending
      on the size of the run.  For a not so big SMP system the number of files is in
      the 20+ million range.  Of those 20 million files only 10% are accessed
      randomly by the client.  The purpose of this is that the benchmark tries to
      stress not only the NFS layer but the VM and filesystem layers as well.  The
      filesets are also hundreds of gigabytes in size in order to promote disk head
      movement by guaranteeing cache misses in memory.  In SFS, 27% of the workload
      is lookups, and __d_lookup has been showing up high in my profiles.
      
      For the inode hash the problem that I see is that when running a benchmark
      with this huge fileset we end up trying to free a lot of inode entries during
      the run while trying to put new entries in cache.  We end up calling
      ifind_fast(), which calls find_inode_fast() while holding inode_lock.  In
      order to avoid holding the inode_lock for long, we needed to avoid having
      long chains in that hash table.
      
      When I took a look at the original hash function, I found it to be a bit too
      simple for any workload.  My solution (which took advantage of Dominique's
      work) was to create a hash function that could generate completely
      different hashes depending on the hashval and the superblock in order to have
      the hash scale as we added more filesystems to the machine.
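
A minimal userspace sketch of the idea in the paragraph above: fold both the superblock identity and the per-inode hash value into the bucket index, so that the same inode number on two filesystems tends to land in different chains.  The names toy_inode_hash and TOY_PRIME, and the particular mixing steps, are illustrative assumptions and are not claimed to be the function this patch adds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical odd multiplier, chosen for illustration only. */
#define TOY_PRIME 0x9e370001UL

/*
 * Map (superblock identity, inode hash value) to a bucket index.
 * 'sb' stands in for the superblock pointer; 'bits' is log2 of the
 * number of buckets.
 */
static unsigned long toy_inode_hash(uintptr_t sb, unsigned long hashval,
                                    unsigned int bits)
{
    unsigned long mask, tmp;

    assert(bits > 0 && bits < 32);
    mask = (1UL << bits) - 1;

    /* Mix the superblock into the value so filesystems diverge. */
    tmp = (hashval * (unsigned long)sb) ^ (TOY_PRIME + hashval);
    /* Fold higher bits down into the range kept by the mask. */
    tmp ^= (tmp ^ TOY_PRIME) >> bits;
    return tmp & mask;
}
```

With 1024 buckets (bits = 10), toy_inode_hash(0x1000, 1, 10) and toy_inode_hash(0x2000, 1, 10) select different buckets even though the inode value is the same.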
      
      Both of these problems can be somewhat tuned out by increasing the number of
      buckets in both the dentry and inode caches, but it got to a point where I had
      256MB of inode and 128MB of dentry hash buckets on a not so large SMP.  With
      the hash changes I have been able to reduce the number of buckets to 128MB for
      the inode cache and to 32MB for the dentry cache and still get better performance.
      
      If it helps my case...  I haven't been running this benchmark for long, so I
      haven't been able to find a way to cheat.  I need to come up with generic
      solutions until I can find a cheat for the benchmark.  :)
      
      
      SDET results:
      
      Steve Pratt seems to have an SDET setup already and he did me the favor of
      running SDET with a reduced dentry hash table size.  I believe that
      his table suggests that less than 3% change is acceptable variability, but
      overall he got a 5% better number using the new hash algorithm.
      
      A) x4408way1.sdet.2.6.5100000-8p.04-05-05_12.08.44 vs 
      B) x4408way1.sdet.2.6.5+hash-100000-8p.04-05-05_11.48.02
      
      
        Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
        Inode-cache hash table entries: 1048576 (order: 10, 4194304 bytes) 
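
The figures in these two boot lines can be cross-checked with a little arithmetic.  The sketch below assumes 4-byte bucket heads and 4096-byte pages; both values are inferred from the numbers above rather than stated in the log, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for 'entries' buckets of 'bucket_size' bytes each. */
static size_t table_bytes(size_t entries, size_t bucket_size)
{
    assert(bucket_size > 0);
    return entries * bucket_size;
}

/* Smallest allocation order such that (page_size << order) >= bytes. */
static unsigned int table_order(size_t bytes, size_t page_size)
{
    unsigned int order = 0;

    assert(page_size > 0);
    while ((page_size << order) < bytes)
        order++;
    return order;
}
```

Under those assumptions, 131072 entries give 524288 bytes at order 7, and 1048576 entries give 4194304 bytes at order 10, matching the log.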
      
      Results:Throughput
      
                                                tolerance = 0.00 + 3.00% of A
                            A            B
         Threads      Ops/sec      Ops/sec    %diff         diff    tolerance
      ----------- ------------ ------------ -------- ------------ ------------
               1    4341.9300    4401.9500     1.38        60.02       130.26 
               2    8242.2000    8165.1200    -0.94       -77.08       247.27 
               4   15274.4900   15257.1000    -0.11       -17.39       458.23 
               8   21326.9200   21320.7000    -0.03        -6.22       639.81 
              16   23056.2100   24282.8000     5.32      1226.59       691.69  * 
              32   23397.2500   24684.6100     5.50      1287.36       701.92  * 
              64   23372.7600   23632.6500     1.11       259.89       701.18 
             128   17009.3900   16651.9600    -2.10      -357.43       510.28 
      =========================================================================
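
As a sketch of how the columns in this table relate, assuming the `*` marks rows whose absolute difference exceeds the tolerance (3% of A): the struct and function names below are hypothetical and only reproduce the arithmetic.

```c
#include <assert.h>

struct sdet_cmp {
    double diff;       /* B - A, in ops/sec */
    double pct;        /* diff as a percentage of A */
    double tolerance;  /* allowed variability, in ops/sec */
    int flagged;       /* 1 if |diff| exceeds the tolerance */
};

/* Compare baseline throughput 'a' with new throughput 'b'. */
static struct sdet_cmp sdet_compare(double a, double b, double tol_pct)
{
    struct sdet_cmp r;
    double mag;

    assert(a > 0.0);
    r.diff = b - a;
    r.pct = r.diff / a * 100.0;
    r.tolerance = a * tol_pct / 100.0;
    mag = r.diff < 0.0 ? -r.diff : r.diff;
    r.flagged = mag > r.tolerance;
    return r;
}
```

For the 16-thread row, sdet_compare(23056.21, 24282.80, 3.0) yields a difference of about 1226.59 against a tolerance of about 691.69, so the row is flagged; the 1-thread row stays inside its tolerance.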
    • [PATCH] cmpci OSS driver update · 9e315f49
      Andrew Morton authored
      From: C.L. Tien <cltien@cmedia.com.tw>
      
      Current version from cmedia.
    • [PATCH] EDD: follow sysfs convention, MODULE_VERSION, remove dead SCSI symlink · da78fe73
      Andrew Morton authored
      From: Matt Domsch <Matt_Domsch@dell.com>
      
      Clean up the edd.c driver.
      
      * use kobject_set_name() instead of snprintf() per GregKH's recommendation.
      * Add MODULE_VERSION()
      * s/driverfs/sysfs/ in Kconfig
      * Remove report URL message, as there have been too many BIOSs reported,
        virtually none of which are EDD-capable.  This may return if/when I
        develop a better reporting method and database to capture/store the
        data from users.
      * Remove the unused code for creating a symlink to the scsi_device.
        This never worked right, and I'm going to show the relationship from
        a userspace tool which uses libsysfs instead.
    • [PATCH] blk_start_queue() should use kblockd · 12db2584
      Andrew Morton authored
      kblockd is the thread which runs unplug functions, not keventd.
    • [PATCH] Only Print Taint Message Once · d137ab48
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
      
      Only print the tainted message the first time.  Its purpose is to warn
      users that we can't support them, not to fill their logs.
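
A minimal userspace sketch of the print-once pattern described above, assuming a single static flag guards the message.  The function name, the flag, and the message text are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flag: set after the warning has been printed once. */
static int toy_tainted_warned;

/* Returns 1 if the warning was printed, 0 if it was suppressed. */
static int toy_warn_tainted_once(FILE *out, const char *module)
{
    assert(out != NULL && module != NULL);
    if (toy_tainted_warned)
        return 0;
    toy_tainted_warned = 1;
    fprintf(out, "%s: module taints kernel (illustrative message)\n", module);
    return 1;
}
```

Only the first call produces output; later calls return 0 without writing anything.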
    • [PATCH] Un-inline spinlocks on ppc64 · 5dfd0a43
      Andrew Morton authored
      From: Paul Mackerras <paulus@samba.org>
      
      The patch below moves the ppc64 spinlocks and rwlocks out of line and into
      arch/ppc64/lib/locks.c, and implements _raw_spin_lock_flags for ppc64.
      
      Part of the motivation for moving the spinlocks and rwlocks out of line was
      that I needed to add code to the slow paths to yield the processor to the
      hypervisor on systems with shared processors.  On these systems, a cpu as
      seen by the kernel is a virtual processor that is not necessarily running
      full-time on a real physical cpu.  If we are spinning on a lock which is
      held by another virtual processor which is not running at the moment, we
      are just wasting time.  In such a situation it is better to do a hypervisor
      call to ask it to give the rest of our time slice to the lock holder so
      that forward progress can be made.
      
      The one problem with out-of-line spinlock routines is that lock contention
      will show up in profiles in the spin_lock etc.  routines rather than in the
      callers, as it does with inline spinlocks.  I have added a CONFIG_SPINLINE
      config option for people that want to do profiling.  In the longer term, Anton
      is talking about teaching the profiling code to attribute samples in the spin
      lock routines to the routine's caller.
      
      This patch reduces the kernel by about 80kB on my G5.  With inline
      spinlocks selected, the kernel gets about 4kB bigger than without the
      patch, because _raw_spin_lock_flags is slightly bigger than _raw_spin_lock.
      
      This patch depends on the patch from Keith Owens to add
      _raw_spin_lock_flags.
    • [PATCH] Allow architectures to reenable interrupts on contended spinlocks · 07f94531
      Andrew Morton authored
      From: Keith Owens <kaos@sgi.com>
      
      As requested by Linus, update all architectures to add the common
      infrastructure.  Tested on ia64 and i386.
      
      Enable interrupts while waiting for a disabled spinlock, but only if
      interrupts were enabled before issuing spin_lock_irqsave().
      
      This patch consists of three sections :-
      
      * An architecture independent change to call _raw_spin_lock_flags()
        instead of _raw_spin_lock() when the flags are available.
      
      * An ia64 specific change to implement _raw_spin_lock_flags() and to
        define _raw_spin_lock(lock) as _raw_spin_lock_flags(lock, 0) for the
        ASM_SUPPORTED case.
      
      * Patches for all other architectures and for ia64 with !ASM_SUPPORTED
        to map _raw_spin_lock_flags(lock, flags) to _raw_spin_lock(lock).
        Architecture maintainers can define _raw_spin_lock_flags() to do
        something useful if they want to enable interrupts while waiting for
        a disabled spinlock.
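
The behaviour described above can be sketched in userspace with C11 atomics.  This is a toy model under loud assumptions: interrupt state is simulated by a plain boolean, the toy_* names are hypothetical, and nothing here is the kernel's implementation.  Architectures that take the fallback path simply treat the flags variant as the plain lock operation, as the last bullet describes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Simulated single-CPU interrupt state; purely illustrative. */
static bool toy_irqs_enabled = true;
/* Counts how often the wait loop opened an interrupt window. */
static unsigned long toy_irq_windows;

typedef struct {
    atomic_flag locked;
} toy_spinlock_t;

/*
 * Spin for the lock.  While it is contended, briefly allow interrupts
 * again, but only if they were enabled before the caller disabled them.
 */
static void toy_raw_spin_lock_flags(toy_spinlock_t *lock, bool irqs_were_enabled)
{
    assert(lock != NULL);
    while (atomic_flag_test_and_set_explicit(&lock->locked,
                                             memory_order_acquire)) {
        if (irqs_were_enabled) {
            toy_irqs_enabled = true;   /* interrupts could be serviced here */
            toy_irq_windows++;
            toy_irqs_enabled = false;  /* off again before retrying */
        }
    }
}

/* Save and disable the simulated interrupt state, then take the lock. */
static void toy_spin_lock_irqsave(toy_spinlock_t *lock, bool *flags)
{
    *flags = toy_irqs_enabled;
    toy_irqs_enabled = false;
    toy_raw_spin_lock_flags(lock, *flags);
}

/* Release the lock, then restore the saved interrupt state. */
static void toy_spin_unlock_irqrestore(toy_spinlock_t *lock, bool flags)
{
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
    toy_irqs_enabled = flags;
}
```

In the uncontended case the loop body never runs, so no interrupt window is opened and the caller sees the same semantics as an ordinary irqsave lock.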
    • [PATCH] Kill some 'No description found...' warnings. (kernel-api.sgml) · a023cd55
      Andrew Morton authored
      From: Alexey Dobriyan <adobriyan@mail.ru>
      
      Fix various kernel-doc parameters.
    • [PATCH] Kill a warning while making pdfdocs. · 72468a40
      Andrew Morton authored
      From: Alexey Dobriyan <adobriyan@mail.ru>
      
        DOCPROC Documentation/DocBook/parportbook.sgml
      Warning(drivers/parport/share.c:188): No description found for parameter 'drv'
      (kernel-doc parameter name is incorrect.)
    • [PATCH] com90xx error message patch: check_region() gone · 8b3ca458
      Andrew Morton authored
      From: Greg Aumann <Greg_Aumann@sil.org>
      
      This patch updates two error messages to reflect changes in the code.
    • [PATCH] Improve laptop mode's block_dump output · 6835de14
      Andrew Morton authored
      From: "Theodore Ts'o" <tytso@mit.edu>
      
      This patch versus improves the output produced by "echo 1 >
      /proc/sys/vm/block_dump", in the following ways:
      
      1) The messages are printed with KERN_DEBUG, so that even if sysklogd is
         running, if configured appropriately, it will not need to write to log
         files.
      
      2) The inode which is dirtied by a process is now identified more
         precisely by inode number and filesystem ID, and by a dcache name if
         present.
      
      3) In the generic filesystem sget function, the superblock id (s_id) is
         filled in with the filesystem type by default.  Filesystems which are
         block-device based will override s_id, but this allows pseudo
         filesystems such as tmpfs, procfs, etc.  to be identified in (2).
    • [PATCH] find_user locking and leak fix · 475c3656
      Andrew Morton authored
      find_user() is being called from set/get_priority(), but it doesn't take the
      needed lock, and those callers were forgetting to drop the refcount which
      find_user() took.
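
A minimal sketch of the reference-counting half of this fix, assuming a lookup that takes a reference which the caller must drop.  The toy_* names and the static user table are hypothetical, and the locking half of the fix is deliberately left out to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

struct toy_user {
    unsigned int uid;
    int refcount;
};

/* Hypothetical user table; each entry starts with one base reference. */
static struct toy_user toy_users[] = { { 0, 1 }, { 1000, 1 } };

/* Lookup that takes a reference on success; the caller must drop it. */
static struct toy_user *toy_find_user(unsigned int uid)
{
    size_t i;

    for (i = 0; i < sizeof(toy_users) / sizeof(toy_users[0]); i++) {
        if (toy_users[i].uid == uid) {
            toy_users[i].refcount++;
            return &toy_users[i];
        }
    }
    return NULL;
}

/* Drop a reference obtained from toy_find_user(). */
static void toy_free_uid(struct toy_user *u)
{
    assert(u != NULL && u->refcount > 0);
    u->refcount--;
}

/* A caller in the fixed style: look up, use, then drop the reference. */
static int toy_user_exists(unsigned int uid)
{
    struct toy_user *u = toy_find_user(uid);

    if (u == NULL)
        return 0;
    toy_free_uid(u);
    return 1;
}
```

Without the toy_free_uid() call, every lookup would leave the count one higher, which is the kind of leak the commit message describes.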
    • [PATCH] mptfusion depends on scsi · 5a80c2ea
      Andrew Morton authored
      From: Olaf Hering <olh@suse.de>
    • [PATCH] reiserfs: add device info to diagnostic messages · 9511c080
      Andrew Morton authored
      From: Chris Mason <mason@suse.com>
      
      From: Jeff Mahoney <jeffm@suse.com>
      
      Add device info to the various reiserfs warnings and panics so you can tell
      which filesystem triggers the message.  Loosely based on code from Oleg
      Drokin.
    • [PATCH] reiserfs: xattr permission fix · cee42600
      Andrew Morton authored
      From: Chris Mason <mason@suse.com>
      
      From: jeffm@suse.com
      
      reiserfs permission bug fix for xattrs
    • [PATCH] reiserfs: quota support · 446a7461
      Andrew Morton authored
      From: Chris Mason <mason@suse.com>
      
      ReiserFS support for quotas.  Originally from Jan Kara
    • [PATCH] reiserfs: xattr locking fixes · 30304fc9
      Andrew Morton authored
      From: Chris Mason <mason@suse.com>
      
      From: jeffm@suse.com
      
      reiserfs xattr locking fixes
    • [PATCH] reiserfs: selinux support · 647c60b9
      Andrew Morton authored
      From: Chris Mason <mason@suse.com>
      
      From: jeffm@suse.com
      
      reiserfs support for selinux
    • [PATCH] reiserfs: support trusted xattrs · a4a4ddc5
      Andrew Morton authored
      From: Chris Mason <mason@suse.com>
      
      From: jeffm@suse.com
      
      reiserfs support for trusted xattrs
    • [PATCH] reiserfs: ACL support · 0acef032
      Andrew Morton authored
      From: Chris Mason <mason@suse.com>
      
      From: jeffm@suse.com
      
      reiserfs acl support
    • [PATCH] reiserfs: xattr support · 0b1a6a8c
      Andrew Morton authored
      From: Chris Mason <mason@suse.com>
      
      From: jeffm@suse.com
      
      reiserfs support for xattrs
    • [PATCH] reiserfs: acl device node initialization · 06803e35
      Andrew Morton authored
      From: Chris Mason <mason@suse.com>
      
      From: jeffm@suse.com
      
      properly init device inodes in the acl code
    • [PATCH] Reiserfs commit default fix · bb0ad0aa
      Andrew Morton authored
      From: Bart Samwel <bart@samwel.tk>
      
      This patch from Micha Feigin fixes some bugs in the earlier reiserfs 
      commit default patch. The changelog:
      
      * If you remounted without any commit=NNN option, it would assume commit=0
        and restore the defaults.  This patch makes it leave the current state alone
        if you don't pass commit=NNN.
      
      * Added range check for cast from unsigned long to unsigned int.
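
Both changelog items can be sketched together in userspace: leave the current value alone when no commit=NNN option is present, and range-check the parsed unsigned long before narrowing it to unsigned int.  The function name, return convention, and option scanning below are hypothetical simplifications, not the reiserfs option parser.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns 0 and leaves *commit untouched if 'opts' has no "commit=";
 * returns 1 and updates *commit if a valid value is found;
 * returns -1 if the value is malformed or does not fit an unsigned int.
 */
static int toy_parse_commit(const char *opts, unsigned int *commit)
{
    const char *p;
    char *end;
    unsigned long val;

    assert(opts != NULL && commit != NULL);
    p = strstr(opts, "commit=");
    if (p == NULL)
        return 0;                      /* option absent: keep current state */
    p += strlen("commit=");
    if (*p < '0' || *p > '9')
        return -1;                     /* no digits after "commit=" */
    errno = 0;
    val = strtoul(p, &end, 10);
    if (errno == ERANGE || val > UINT_MAX)
        return -1;                     /* would not survive the narrowing cast */
    *commit = (unsigned int)val;
    return 1;
}
```

A remount with only "noatime" therefore leaves a previously configured interval in place, while "commit=30" replaces it.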
    • [PATCH] partitioning cleanup: use DOS_EXTENDED_PARTITION · 6ef00625
      Andrew Morton authored
      From: FabF <Fabian.Frederick@skynet.be>
      
      Use the pre-existing enum rather than magic numbers.
    • [PATCH] fix 3c59x.c to allow 3c905c 100bT-FD · 073c4132
      Andrew Morton authored
      From: Burton Windle <bwindle@fint.org>
      
      Fix the 3c905C 10/100 transceiver initialisation woes.
    • [PATCH] shrink_slab: improved handling of GFP_NOFS allocations · edb41998
      Andrew Morton authored
      Currently, shrink_slab() will decide that it needs to scan a certain number of
      dentries, will call shrink_dcache_memory() requesting that this be done, and
      shrink_dcache_memory() will simply bale out without doing anything because the
      caller did not have __GFP_FS.
      
      This has the potential to disrupt our lovely pagecache-vs-slab balancing act. 
      So change things so that shrinker callouts can return -1, indicating that they
      baled out.  This way, shrink_slab can remember that this slab was owed a
      certain number of scannings and these will be correctly performed next time a
      __GFP_FS caller comes by.
    • [PATCH] New version of early CPU detect · b528cea7
      Andrew Morton authored
      From: Andi Kleen <ak@suse.de>
      
      We still need some kind of early CPU detection, e.g.  for the AMD768
      workaround and for the slab allocator to size its slabs correctly for the
      cache line.  Also some other code already had private early CPU routines.
      
      This patch takes a new approach compared to the previous patch which caused
      Andrew so much grief.  It only fills in a few selected fields in
      boot_cpu_data (only the data needed to identify the CPU type and the cache
      alignment).  In particular the feature masks are not filled in, and the
      other fields are also not touched to prevent unwanted side effects.
      
      Also convert the ppro workaround to use standard cpu data now. 
      
      I'm not sure if slab still has the necessary support to use the cache line
      size early; previously Manfred showed some serious memory savings with this
      for kernels that are compiled for a bigger cache line size than the CPU (as
      is often the case on distribution kernels).  This code could be re-enabled
      now with this patch.
    • [PATCH] remove some unused variables in s2io · ed67bbe7
      Andrew Morton authored
      From: Anton Blanchard <anton@samba.org>
      
      Found a few warnings when compiling with NAPI off.
    • [PATCH] Remove bootsect_helper on x86_64 and pc98 · 51538d85
      Andrew Morton authored
      From: Coywolf Qi Hunt <coywolf@greatcn.org>
      
      Since "Direct booting from floppy is no longer supported", this patch
      removes the bootsect_helper code from x86_64 and PC-9800.
    • [PATCH] Remove bootsect_helper and a comment fix · 7d8d2dfe
      Andrew Morton authored
      From: Coywolf Qi Hunt <coywolf@greatcn.org>
      
      Since "Direct booting from floppy is no longer supported", this patch is to
      remove the bootsect_helper code.  And also a comment fix.
      
      The other two platforms, x86_64 and PC-9800, should be cleaned up too.
    • [PATCH] ppc32: ppc8xx build fixes · 79fde358
      Andrew Morton authored
      From: "Prof. BJ" <prof.bj@freemail.hu>
      
      - m8xx_setup warning and mfmsr error fix
      - ppc8xx_pic include error fix
      - tqm8xxl.c typing (syntax) error fix
      - commproc.c include error and prototype warning fix
      
      (acked by Matt Porter)
    • [PATCH] es7000 subarch update · 45dc4f27
      Andrew Morton authored
      From: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
      
      The patch fixes a problem with ES7000 Server Management mechanism that uses
      platform register mip_port.  It was not initialized, so the mechanism was not
      functional.
      
      The patch also fixes the APIC destination for hierarchical and flat cluster
      models used in ES7000.  The destination ID's reflect policies for Cascade
      based systems which use logical delivery and lowest priority mechanism, and
      for xAPIC based models that use physical delivery and fixed APIC destinations.
      
      The patch also turns on NO_IOAPIC_CHECK (1) to avoid error messages and
      attempts to re-write the ID, because on ES7000 all ID's are hard coded in the
      BIOS and cannot be altered.
    • [PATCH] Consolidate sys32_nfsservctl · 522cbd42
      Andrew Morton authored
      From: Arnd Bergmann <arnd@arndb.de>
      
      sys32_nfsservctl is the largest remaining syscall emulation handler that can
      be consolidated.  mips and ia64 currently don't use this at all, and parisc
      has a simpler implementation than the one used by s390, sparc and ppc, which
      the new compat_sys_nfsservctl is based on.
      
      The user access checks in the code are inconsistent at least, which should be
      fixed here.
      
      Compile tested only due to lack of proper test setup.
    • [PATCH] Consolidate sys32_select · 37915f7b
      Andrew Morton authored
      From: Arnd Bergmann <arnd@arndb.de>
      
      sys32_select has seven mostly but not exactly identical versions, so
      consolidate them as compat_sys_select.  Based on the ppc64 implementation,
      which most closely resembles sys_select.  One bug that was not caught by LTP
      has been fixed since the first version of this patch.
      
      Tested on x86_64, ia64 and s390.
    • [PATCH] Consolidate do_execve32 · 265e0a42
      Andrew Morton authored
      From: Arnd Bergmann <arnd@arndb.de>
      
      The code for sys32_execve/do_execve32 in most of the seven versions was copied
      from fs/exec.c but not kept up-to-date.  The new compat_do_execve() function
      is based on the mips code and has been resync'ed with do_execve().  IA64
      changes are from Arun Sharma.
      
      Tested on x86_64, ia64 and s390
    • [PATCH] Consolidate sys32_readv and sys32_writev · 4791db72
      Andrew Morton authored
      From: Arnd Bergmann <arnd@arndb.de>
      
      The seven implementations of this have gone out of sync and are mostly buggy. 
      The new compat_sys_* version is based on the ppc64 implementation, which most
      closely resembles the code in sys_readv/sys_writev.
      
      Tested on x86_64, ia64 and s390.
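
For readers unfamiliar with what such a compat wrapper has to do, here is a hedged userspace sketch of the core step: widening an array of 32-bit iovec entries into native ones before handing them to the regular code path.  The toy_compat_iovec layout, the function name, and the INT32_MAX total-length limit are assumptions for illustration, not the code from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical layout of an iovec as passed by a 32-bit task. */
struct toy_compat_iovec {
    uint32_t iov_base;  /* 32-bit user pointer */
    uint32_t iov_len;   /* 32-bit length */
};

/*
 * Widen 'n' 32-bit entries into native iovecs.  Returns the total
 * length, or -1 if it would not fit a signed 32-bit return value.
 */
static long long toy_widen_iovec(const struct toy_compat_iovec *in,
                                 struct iovec *out, size_t n)
{
    long long total = 0;
    size_t i;

    assert(in != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        out[i].iov_base = (void *)(uintptr_t)in[i].iov_base;
        out[i].iov_len = in[i].iov_len;
        total += in[i].iov_len;
        if (total > INT32_MAX)
            return -1;
    }
    return total;
}
```

Keeping this conversion in one shared helper is the kind of consolidation the commit message argues for, since per-architecture copies are easy to let drift apart.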
    • [PATCH] AS: increase batch expiry intervals · 8aab2013
      Andrew Morton authored
      From: Nick Piggin <nickpiggin@yahoo.com.au>
      
      Without disturbing the read/write ratio, increase the batch expiry
      intervals.  This will have the effect of increasing latency a little, but
      with improved throughput.
    • [PATCH] Laptop Mode doc update · e33daf9d
      Andrew Morton authored
      From: <bart@samwel.tk>
      
      Richard Atterer reported that mutt does not play well with noatime (it uses
      access times to check whether new mail has arrived in a folder).  This patch
      warns about this in the doc, and adds a setting to the control script to
      disable the noatime remount.
    • [PATCH] cyclades MAINTAINERS update · c489e9e6
      Andrew Morton authored
      From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
    • [PATCH] selinux: reopen descriptors closed on exec to /dev/null · def3f08e
      Andrew Morton authored
      From: Stephen Smalley <sds@epoch.ncsc.mil>
      
      This patch changes the SELinux module to try to reset any descriptors it
      closes on exec (due to a lack of permission by the new domain to the inherited
      open file) to refer to the null device.  This counters the problem of SELinux
      inducing program misbehavior, particularly due to having descriptors 0-2
      closed when the new domain is not allowed access to the caller's tty.  This is
      primarily to address the case where the caller is trusted with respect to the
      new domain, as the untrusted caller case is already handled via AT_SECURE and
      glibc secure mode.  The code is partly based on the OpenWall LSM, which in
      turn drew from the OpenWall kernel patch.  Note that the code does not
      guarantee that the descriptor is always re-opened to /dev/null; it merely
      makes a reasonable effort to do so, but can fail under various conditions.
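
A userspace analogue of the best-effort behaviour described above, assuming POSIX open(), dup2(), and fcntl().  The function name is hypothetical, and the real change operates inside the kernel at exec time rather than in a process like this.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * If 'fd' is not open, try to make it refer to /dev/null.
 * Returns 0 if 'fd' is open afterwards, -1 if the attempt failed.
 */
static int toy_reopen_devnull(int fd)
{
    int nullfd;

    assert(fd >= 0);
    if (fcntl(fd, F_GETFD) != -1)
        return 0;               /* still open, nothing to do */

    nullfd = open("/dev/null", O_RDWR);
    if (nullfd < 0)
        return -1;              /* best effort only: may fail */
    if (nullfd != fd) {
        if (dup2(nullfd, fd) < 0) {
            close(nullfd);
            return -1;
        }
        close(nullfd);
    }
    return 0;
}
```

A program that expects descriptors 0-2 to exist then reads end-of-file or writes into the void instead of accidentally reusing those numbers for an unrelated file.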