1. 29 Dec, 2003 40 commits
    • [PATCH] Fix writev atomicity on pipe/fifo · 1af764e1
      Andrew Morton authored
      From: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      
      Current writev() of pipe/fifo can be interleaved with data from other
      processes doing writes even when the request size is <= PIPE_BUF.  These
      writes should in fact be atomic.
      
      The readv() side is changed as well, so that it behaves the same as
      read().  It is also faster.
      
      readv/writev version of bw_pipe in LMbench
      
      2.6.0-test9-bk12
      hirofumi@devron (i686-pc-linux-gnu)[1010]$ ./bw_pipe -m 4096 -M 5
      Pipe bandwidth: 45.53 MB/sec
      hirofumi@devron (i686-pc-linux-gnu)[1009]$ ./bw_pipe -m 1024 -M 5
      Pipe bandwidth: 20.08 MB/sec
      
      2.6.0-test9-bk12 + patch
      hirofumi@devron (i686-pc-linux-gnu)[1001]$ ./bw_pipe -m 4096 -M 5
      Pipe bandwidth: 65.98 MB/sec
      hirofumi@devron (i686-pc-linux-gnu)[1002]$ ./bw_pipe -m 1024 -M 5
      Pipe bandwidth: 32.19 MB/sec
    • [PATCH] optimize ia32 memmove · ed109bc5
      Andrew Morton authored
      From: Manfred Spraul <manfred@colorfullife.com>
      
      The memmove implementation of i386 is not optimized: it uses movsb, which is
      far slower than movsd.  The optimization is trivial: if dest is less than
      source, then call memcpy().  markw tried it on a 4xXeon with dbt2; it saved
      around 300 million cpu ticks in cache_flusharray():
      
      oprofile, GLOBAL_POWER_EVENTS, count 100k
      Before:
      c0144ed1 <cache_flusharray>: /* cache_flusharray total:  21823  0.0165 */
           6 4.5e-06 :c0144f8e:       cmp    %esi,%ebx
          11 8.3e-06 :c0144f90:       jae    c0144f9e <cache_flusharray+0xcd>
           3 2.3e-06 :c0144f92:       mov    %ebx,%edi
        7305  0.0055 :c0144f94:       repz movsb %ds:(%esi),%es:(%edi)
         201 1.5e-04 :c0144f96:       add    $0x10,%esp
      
      After:
      c0144f1d <cache_flusharray>: /* cache_flusharray total:  17959  0.0136 */
        1270 9.6e-04 :c0144f1d:       push   %ebp
      [snip]
           6 4.6e-06 :c0144fdc:       cmp    %esi,%ebx
          13 9.9e-06 :c0144fde:       jae    c0145000 <cache_flusharray+0xe3>
           2 1.5e-06 :c0144fe0:       mov    %edx,%eax
           1 7.6e-07 :c0144fe2:       mov    %ebx,%edi
          11 8.4e-06 :c0144fe4:       shr    $0x2,%eax
           1 7.6e-07 :c0144fe7:       mov    %eax,%ecx
        4129  0.0031 :c0144fe9:       repz movsl %ds:(%esi),%es:(%edi)
         261 2.0e-04 :c0144feb:       test   $0x2,%dl
          27 2.1e-05 :c0144fee:       je     c0144ff2 <cache_flusharray+0xd5>
                     :c0144ff0:       movsw  %ds:(%esi),%es:(%edi)
          95 7.2e-05 :c0144ff2:       test   $0x1,%dl
          96 7.3e-05 :c0144ff5:       je     c0144ff8 <cache_flusharray+0xdb>
                     :c0144ff7:       movsb  %ds:(%esi),%es:(%edi)
         121 9.2e-05 :c0144ff8:       add    $0x1c,%esp
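
      The idea can be sketched in portable C roughly as follows.  This is
      not the kernel code: user-space memcpy() gives no guarantee for
      overlapping buffers, so the sketch uses its own forward copy in 4-byte
      units (loosely mirroring the movsl sequence above).  The names
      forward_copy() and sketch_memmove() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy forwards in 4-byte units, then a byte tail.
 * Each word is loaded completely before it is stored, which is what makes
 * the forward direction safe when the destination lies below the source.
 */
static void forward_copy(unsigned char *d, const unsigned char *s, size_t n)
{
	while (n >= 4) {
		uint32_t w;

		memcpy(&w, s, 4);	/* load the whole word first ... */
		memcpy(d, &w, 4);	/* ... then store it */
		d += 4;
		s += 4;
		n -= 4;
	}
	while (n--)
		*d++ = *s++;
}

/* Overlap-safe copy following the rule described in the commit message. */
static void *sketch_memmove(void *dest, const void *src, size_t n)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	if (d < s) {
		/* dest below src: the fast forward copy is safe */
		forward_copy(d, s, n);
	} else {
		/* dest at or above src: copy backwards, byte by byte */
		while (n--)
			d[n] = s[n];
	}
	return dest;
}
```

      Only the dest-below-source case takes the fast path here, which
      matches the description above; the backward case stays byte-wise.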
    • [PATCH] Use NODES_SHIFT to calculate ZONE_SHIFT · e2c3c9e2
      Andrew Morton authored
      From: jbarnes@sgi.com (Jesse Barnes)
      
      Now that we have a proper NODES_SHIFT value, we need to use it to define
      ZONE_SHIFT; otherwise we'll spill over 8 bits if we have more than 85 nodes.
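
      The figure of 85 is consistent with three zones per node sharing one
      8-bit index: 85 * 3 = 255 values fit in 8 bits, 86 * 3 = 258 do not.
      The sketch below works through that arithmetic.  The zone count of
      three and the helper names are assumptions for illustration; the
      commit message does not spell them out.

```c
/* ASSUMPTION: three zones per node (for example DMA, NORMAL, HIGHMEM). */
#define SKETCH_ZONES_PER_NODE 3

/* Number of bits needed to index `count` distinct values (count >= 1). */
static unsigned int bits_needed(unsigned long count)
{
	unsigned int bits = 0;

	while ((1UL << bits) < count)
		bits++;
	return bits;
}

/* Width of a field able to index every zone on `nodes` nodes. */
static unsigned int zone_index_bits(unsigned long nodes)
{
	return bits_needed(nodes * SKETCH_ZONES_PER_NODE);
}
```

      Under these assumptions zone_index_bits(85) is 8, while
      zone_index_bits(86) is 9, which is the spill the patch avoids.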
    • [PATCH] Fix for more than 256 CPUs · e403669e
      Andrew Morton authored
      From: Paul Jackson <pj@sgi.com>
      
      The patch is needed to build NR_CPUS > 256.
      
      Without this fix, you get compile errors:
          include/linux/cpumask.h: In function `next_online_cpu':
          include/linux/cpumask.h:56: structure has no member named `val'
    • [PATCH] ia32 WP test cleanup · 6caf4668
      Andrew Morton authored
      From: Zwane Mwaikambo <zwane@arm.linux.org.uk>
      
      Make the test unconditional - we can always run it now we have fixmap
      support.
    • [PATCH] Restore /proc/pid/maps formatting · 3f3a4378
      Andrew Morton authored
      The seq_file conversion of /proc/pid/maps caused altered behaviour with
      respect to 2.4.22.  Before the conversion, spaces and tabs in filenames were
      displayed verbatim.  After the conversion they are escaped as \040, etc.
      
      Also, if the mmapped file has been unlinked the output appears as
      
      40017000-40018000 rw-p 00000000 03:02 1425800    /home/akpm/foo\040(deleted)
      
      instead of
      
      40017000-40018000 rw-p 00000000 03:02 1425800    /home/akpm/foo (deleted)
      
      This could break applications which parse /proc/pid/maps (one person has
      reported this).
      
      The patch restores the 2.4.20 behaviour.
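
      To make the difference concrete, here is a small user-space sketch of
      octal escaping driven by a set of characters to escape.  escape_path()
      is a hypothetical helper, not the seq_file code, and the escape sets
      used in the usage note are assumptions; the commit message does not
      say which set the patch ends up using.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy `path` into `out`, replacing every character
 * listed in `esc` with a backslash and three octal digits (a space
 * becomes "\040").  Returns 0 on success, -1 if `out` is too small.
 */
static int escape_path(char *out, size_t outlen, const char *path,
		       const char *esc)
{
	size_t used = 0;

	if (outlen == 0)
		return -1;

	for (; *path; path++) {
		unsigned char c = (unsigned char)*path;
		size_t need = strchr(esc, c) ? 4 : 1;

		/* keep room for this character and the final NUL */
		if (used + need + 1 > outlen)
			return -1;
		if (need == 4)
			snprintf(out + used, 5, "\\%03o", (unsigned int)c);
		else
			out[used] = (char)c;
		used += need;
	}
	out[used] = '\0';
	return 0;
}
```

      Called with an escape set containing the space character, the path
      "/home/akpm/foo (deleted)" comes out with \040 in it, as in the first
      maps line above; with a set that leaves spaces alone it comes out
      verbatim, as in the second.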
    • [PATCH] Get modpost to work properly with vmlinux in a different directory · e5d9d44e
      Andrew Morton authored
      From: "Bryan O'Sullivan" <bos@pathscale.com>
      
      The current version of modpost breaks if invoked from outside the build
      tree.  This patch fixes that, and simplifies the code a bit while it's at
      it.
    • [PATCH] Be verbose about the ia32 time source · 67fbc534
      Andrew Morton authored
      From: john stultz <johnstul@us.ibm.com>
      
      The patch arranges for each timesource type to have a name, and uses that to
      tell the user which timesource is in use at bootup time.
    • [PATCH] vmscan: reset refill_counter after refilling the inactive list · 9c8c9492
      Andrew Morton authored
      zone->refill_counter is only there to provide decent levels of work batching:
      don't call refill_inactive_zone() just for a couple of pages.
      
      But the logic in there allows it to build up to huge values and it can
      overflow (go negative) which will disable refilling altogether until it wraps
      positive again.
      
      Just reset it to zero whenever we decide to do some refilling.
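
      A minimal sketch of the batching pattern being described, with the
      reset in place.  The structure, the threshold SKETCH_BATCH and the
      function names are hypothetical stand-ins and do not reproduce the
      vmscan code.

```c
#define SKETCH_BATCH 32	/* hypothetical batching threshold */

struct sketch_zone {
	int refill_counter;	/* pages of refill work currently owed */
	long refilled;		/* total pages handed to the refill step */
};

/* Hypothetical stand-in for refill_inactive_zone(): records the work. */
static void sketch_refill(struct sketch_zone *z, int nr_pages)
{
	z->refilled += nr_pages;
}

/* Account `nr` pages of pending work; refill once a full batch is owed. */
static void sketch_account(struct sketch_zone *z, int nr)
{
	z->refill_counter += nr;
	if (z->refill_counter >= SKETCH_BATCH) {
		int count = z->refill_counter;

		/* reset, so the counter cannot build up and overflow */
		z->refill_counter = 0;
		sketch_refill(z, count);
	}
}
```

      Because the counter returns to zero each time a batch is handed off,
      it stays bounded by roughly one batch plus one increment.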
    • [PATCH] serial console registration bugfix · 6f222020
      Andrew Morton authored
      From: Bjorn Helgaas <bjorn.helgaas@hp.com>
      
      uart_set_options() can dereference a null pointer.  This happens if you
      specify a console that hasn't previously been set up by early_serial_setup().
      
      For example, on ia64, the HCDP typically tells us about line 0, so we call
      early_serial_setup() for it.  If the user specifies "console=ttyS3", we
      machine-check when trying to follow the uninitialized port->ops pointer.
      
      It's not entirely clear to me whether we should return 0 or -ENODEV or
      something.  The advantage of returning zero is that if the user specifies
      "console=ttyS0" and we just lack the HCDP, the console doesn't work as early
      as usual, but it does start working after the serial driver detects the port
      (though the baud/parity/etc from the command line are lost).  Returning
      -ENODEV seems to prevent it from ever working.
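
      The shape of such a guard can be sketched as below, assuming the
      return-zero option discussed above.  All types and names here
      (sketch_port, sketch_ops, sketch_set_options) are simplified,
      hypothetical stand-ins for the serial core structures.

```c
#include <stddef.h>

struct sketch_port;

struct sketch_ops {
	void (*set_termios)(struct sketch_port *port, int baud);
};

struct sketch_port {
	const struct sketch_ops *ops;	/* NULL until the port is set up */
	int baud;
};

/* Example operation used by a port that has been set up. */
static void sketch_record_baud(struct sketch_port *port, int baud)
{
	port->baud = baud;
}

static const struct sketch_ops sketch_ops_example = { sketch_record_baud };

/*
 * Apply console options.  A port that was never set up has no ops table,
 * so bail out before dereferencing it.  Returning 0 rather than an error
 * is the ASSUMED choice here, following the reasoning in the message.
 */
static int sketch_set_options(struct sketch_port *port, int baud)
{
	if (port->ops == NULL)
		return 0;

	port->ops->set_termios(port, baud);
	return 0;
}
```

      In this sketch the command-line settings are simply dropped for a
      port without ops, which corresponds to the "baud/parity/etc are lost"
      caveat above.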
    • [PATCH] Fix sysenter disabling in vm86 mode · 783faefa
      Andrew Morton authored
      From: Brian Gerst <bgerst@didntduck.org>
      
      The current code disables sysenter when first entering vm86 mode, but does
      not disable it again when coming back to a vm86 task after a task switch.
    • [PATCH] Add `gcc -Os' config option · ffd0cf49
      Andrew Morton authored
      From: Adrian Bunk <bunk@fs.tum.de>
      
      Allow the kernel to be built with `-Os'.
      
      It requires CONFIG_EMBEDDED.  This is to make it "hard to get at" because
      one gcc version (3.2.x I think) from RH9 generates crashy kernels with this
      option set.
    • [PATCH] Fix proc_pid_lookup vs exit race · a93fabd3
      Andrew Morton authored
      From: Manfred Spraul <manfred@colorfullife.com>
      
      Fixes a race between proc_pid_lookup and sys_exit.
      
      - The inodes and dentries for /proc/<pid>/whatever are cached in the dentry
        cache.  d_revalidate is used to protect against stale data: d_revalidate
        returns invalid if the task exited.
      
        Additionally, sys_exit flushes the dentries for the task that died -
        otherwise the dentries would stay around until they arrive at the end of
        the LRU, which could take some time.  But there is one race:
      
        - proc_pid_lookup finds a task and prepares new dentries for it. It must 
          drop all locks for that operation.
        - the process exits, and the /proc/ dentries are flushed. Nothing
          happens, because they are not yet in the hash tables.
        - proc_pid_lookup adds the task to the dentry cache.
      
        Result: dentry of a dead task in the hash tables.
      
        The patch fixes that problem by flushing again if proc_pid_lookup notices
        that the thread exited while it created the dentry.  The patch should go
        in, but it's not critical.
      
      
      - task->proc_dentry must be the dentry of /proc/<pid>.  That way sys_exit
        can flush the whole subtree at exit time.  proc_task_lookup is a direct
        copy of proc_pid_lookup and handles /proc/<>/task/<pid>.  It contains the
        lines that set task->proc_dentry.  This is bogus, and must be removed.
      
        This hunk is much more critical, because it creates a de-facto dentry leak
        (they are recovered after flushing real dentries from the cache).
    • [PATCH] Fix init_i82365 sysfs ordering oops · 0f3edb4c
      Andrew Morton authored
      From: Russell King <rmk@arm.linux.org.uk>
      
      This oops has been caused by the need to register the class before
      registering any objects against it.  Unfortunately, the class needs
      to be registered asynchronously in a separate thread to avoid driver
      model deadlock with yenta with cardbus cards inserted or standard
      PCMCIA cards not being detected correctly due to a race.
      
      I think the only real solution is to remove the class_device_create_file
      calls from all socket drivers.  This is just a simple commenting out of
      the calls, and should be suitable for the remainder of the -test kernels.
      
      Due to the number of cases that we're encountering with PCMCIA, I'm
      beginning to wonder if the driver model could be fixed to be more kind
      to PCMCIA by avoiding some of these ordering dependencies.  None of this
      would be a problem if the driver model would allow PCI device drivers to
      register PCI devices while their probe or remove functions were executing.
    • [PATCH] NSL config fixes · 4e1c704a
      Andrew Morton authored
      From: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      
      - use "select" instead of "depend"
      
      - remove the unused SMB_NLS
      
      - remove unneeded "default y" of CONFIG_NLS
      
      - revert the position of the nls menu (the middle of the filesystem menus is strange)
      
      - fix "#ifdef CONFIG_NLS" on UDF (should this add new one to Kconfig?)
    • [PATCH] Fix dcache and icache bloat with deep directories · 6f5bd3c5
      Andrew Morton authored
      This fixes the recently-reported "fsstress memory leak" problem.  It has been
      there since November 2002.
      
      shrink_dcache() has a heuristic to prevent the dcache (and hence icache) from
      getting shrunk too far: it refuses to allow the dcache to shrink below
      2*nr_used.
      
      Problem is, _all_ non-leaf dentries (directories) count as used.  So when you
      have really deep directory hierarchies (fsstress creates these), nr_used is
      really high, and there is no upper bound to the amount of pinned dcache.
      
      The patch just rips out the heuristic.  This means that dcache (and hence
      icache (and hence pagecache)) will be shrunk more aggressively.  This could
      be a problem, and tons of testing is needed - a new heuristic may be needed.
      
      However I am not able to reproduce the problem which caused me to add this
      heuristic in the first place:
      
         Simple testcase: run a huge `dd' while running a concurrent `watch -n1
         cat /proc/meminfo'.  The program text for `cat' gets loaded from disk once
         per second.
    • [PATCH] cmpci.c: remove pointless set_fs() · a61f9729
      Andrew Morton authored
      It is doing a set_fs(KERNEL_DS) for no obvious reason.
      
      Spotted by margitsw@t-online.de (Margit Schubert-While)
    • [PATCH] ext3 scheduling latency fix · 9e77aa68
      Andrew Morton authored
      Sometimes kjournald has to refile a huge number of buffers, because someone
      else wrote them out beforehand - they are all clean.
      
      This happens under a lock and scheduling latencies of 88 milliseconds on a
      2.7GHz CPU were observed.
      
      The patch forward-ports a little bit of the 2.4 low-latency patch to fix this
      problem.
      
      Worst-case on ext3 is now sub-half-millisecond, except for when the RCU
      dentry reaping softirq cuts in :(
    • [PATCH] make name_to_dev_t __init · c5427c68
      Andrew Morton authored
      It calls __init functions anyway.
    • [PATCH] Use __GFP_REPEAT for cdrom buffer · 40ea9a64
      Andrew Morton authored
      The cdrom driver does an order-4 allocation and the open will fail if that
      allocation does not succeed.  This happened to me on an unstressed 900MB
      machine.
      
      So add the __GFP_REPEAT flag in there - this will cause the page allocator to
      keep on freeing pages until the allocation succeeds.
      
      It can in theory livelock but in practice I expect it is OK: the user should
      just stop running dbench or whatever it is which is gobbling all the memory
      and the mount/open will then succeed.
    • [PATCH] scale the initial value of min_free_kbytes · 4d1ba80c
      Andrew Morton authored
      This tunable refers to the amount of free memory which the VM will attempt to
      sustain.  It is mainly needed for atomic allocations (eg, networking
      receive).
      
      It is currently hardwired to 1024k, which is far too large for small machines
      and too small for large machines.
      
      Rework it to be 128k on tiny machines and 16M on huge machines.
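
      One way such scaling could look is sketched below.  The message only
      gives the two endpoints (128k and 16M); the square-root curve, the
      factor of 16 and the function name are assumptions chosen for
      illustration.

```c
/*
 * Hypothetical scaling: floor(sqrt(lowmem_kbytes * 16)), clamped to the
 * range 128..16384 kilobytes named in the commit message.
 */
static unsigned long sketch_min_free_kbytes(unsigned long long lowmem_kbytes)
{
	unsigned long long target = lowmem_kbytes * 16;
	unsigned long long r = 0;

	/* integer square root by simple search, stopping at the upper clamp */
	while (r < 16384 && (r + 1) * (r + 1) <= target)
		r++;

	if (r < 128)
		r = 128;	/* lower clamp for tiny machines */
	return (unsigned long)r;
}
```

      Under these assumptions a machine with 16MB of low memory gets 512k,
      anything very small is held at 128k, and very large machines are
      capped at 16M.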
    • [PATCH] sqrt() fixes · e44db7e2
      Andrew Morton authored
      It turns out that the int_sqrt() function in oom_kill.c gets it wrong.
      
      But fb_sqrt() in fbmon.c gets its math right.  Move that function into
      lib/int_sqrt.c, and consolidate.
      
      (oom_kill.c fix from Thomas Schlichter <schlicht@uni-mannheim.de>)
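
      For reference, an integer square root can be computed with the
      classic digit-by-digit (binary) method, sketched here.  This is a
      generic textbook routine under a hypothetical name; it is not claimed
      to be the code that was moved into lib/int_sqrt.c.

```c
/* Returns floor(sqrt(x)) using the binary digit-by-digit method. */
static unsigned long sketch_int_sqrt(unsigned long x)
{
	unsigned long res = 0;
	/* highest power of four representable in an unsigned long */
	unsigned long one = 1UL << (sizeof(unsigned long) * 8 - 2);

	/* start from the largest power of four not exceeding x */
	while (one > x)
		one >>= 2;

	while (one != 0) {
		if (x >= res + one) {
			x -= res + one;
			res = (res >> 1) + one;	/* this result bit is set */
		} else {
			res >>= 1;		/* this result bit is clear */
		}
		one >>= 2;
	}
	return res;
}
```

      The routine uses only shifts, additions and comparisons, which is why
      this style of implementation suits kernel code with no floating point.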
    • [PATCH] compat_ioctl for i2c · 9b1ace8b
      Andrew Morton authored
      From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      
      I needed those for the G5 on ppc64, so here they are.  I was only
      able to test the SMBUS stuff though.
    • [PATCH] EFI support for ia32 · c596442a
      Andrew Morton authored
      From: Matt Tolentino <metolent@snoqualmie.dp.intel.com>
      
      Attached is a patch that enables EFI boot-up support in ia32 kernels.
      
      In order to continue to determine whether the kernel should initialize using
      EFI tables, I've temporarily added a check on the LOADER_TYPE boot parameter.
      Although I haven't requested that elilo be assigned an id for this yet, I've
      used this to determine whether the kernel should use the EFI initialization
      path as well as a check to see if the EFI_SYSTAB boot parameter contains
      anything.  If someone has a better suggestion for determining this, I'm
      open...
      
      This patch also uses the existing ioremapping functions to map the efi tables
      into kernel virtual address space.  I've added an option such that I could
      use Dave Hansen's boot_ioremap() before paging_init().  After paging_init, I
      then remap the efi memmap using bt_ioremap for use later.  This has
      eliminated the need for several functions...thanks for the suggestions and
      thanks for your help Dave.  Still this could use a look-see.
    • [PATCH] ia32 Message Signalled Interrupt support · f036d4ea
      Andrew Morton authored
      From: long <tlnguyen@snoqualmie.dp.intel.com>
      
      
      Add support for Message Signalled Interrupt delivery on ia32.
      
      With a fix from Zwane Mwaikambo <zwane@arm.linux.org.uk>
    • [PATCH] futex uninlining · 82b699a6
      Andrew Morton authored
                 text    data     bss     dec     hex filename
      Before:    4674    1040    4100    9814    2656 kernel/futex.o
      After:     4098    1176    4100    9374    249e kernel/futex.o
    • [PATCH] make /proc/tty/driver/ S_IRUSR | S_IXUSR for root only · 4b66a41d
      Andrew Morton authored
      From: Chris Wright <chrisw@osdl.org>
      
      Fix for CAN-2003-0461: /proc/tty/driver/serial in Linux 2.4.x reveals the
      exact number of characters used in serial links, which could allow local
      users to obtain potentially sensitive information such as the length of
      passwords.
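
      The mode named in the title works out to octal 0500: the owner may
      list and search the directory, and nobody else gets any access.  A
      small sketch of that check, with hypothetical helper names:

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Owner may read and search; group and others get nothing (0500). */
static mode_t sketch_tty_driver_mode(void)
{
	return S_IRUSR | S_IXUSR;
}

/* Nonzero if `mode` grants any permission to group or others. */
static int grants_non_owner_access(mode_t mode)
{
	return (mode & (S_IRWXG | S_IRWXO)) != 0;
}
```

      A world-readable directory mode such as 0555 would be flagged by
      grants_non_owner_access(), while the mode from the title is not.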
    • [PATCH] fix suid leak in /proc · ce3323db
      Andrew Morton authored
      From: Chris Wright <chrisw@osdl.org>
      
      Fix for CAN-2003-0501: The /proc filesystem in Linux allows local users to
      obtain sensitive information by opening various entries in /proc/self
      before executing a setuid program, which causes the program to fail to
      change the ownership and permissions of those entries.
    • [PATCH] fix unsigned issue with env_end - env_start · d585d2c0
      Andrew Morton authored
      From: Chris Wright <chrisw@osdl.org>
      
      Fix for CAN-2003-0462:  A race condition in the way env_start and
      env_end pointers are initialized in the execve system call and used in
      fs/proc/base.c on Linux 2.4 allows local users to cause a denial of
      service (crash).
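
      The "unsigned issue" in the title can be illustrated in isolation: if
      a reader ever observes an end value below the start value, the
      unsigned difference wraps to a huge length instead of going negative.
      The helper below is a hypothetical sketch of a guarded length
      computation, not the code in fs/proc/base.c.

```c
#include <stddef.h>

/*
 * Length of the region [start, end), or 0 when the pair is inconsistent
 * (end below start).  A plain `end - start` on unsigned values would wrap
 * around to a very large number in that case.
 */
static size_t region_len(unsigned long start, unsigned long end)
{
	if (end < start)
		return 0;
	return (size_t)(end - start);
}
```

      Checking the ordering before subtracting keeps a half-initialized
      pair of pointers from turning into an enormous copy length.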
    • [PATCH] use new steal_locks helper · 02c541ec
      Andrew Morton authored
      From: Chris Wright <chrisw@osdl.org>
      
      Use the new steal_locks helper to steal the locks from the old files struct
      left from unshare_files() when the new unshared struct files gets used.
    • [PATCH] add steal_locks helper · 088f5d72
      Andrew Morton authored
      From: Chris Wright <chrisw@osdl.org>
      
      Add steal_locks helper for use in conjunction with unshare_files to make
      sure POSIX file lock semantics aren't broken due to unshare_files.
    • [PATCH] use new unshare_files helper · 04e9bcb4
      Andrew Morton authored
      From: Chris Wright <chrisw@osdl.org>
      
      Use unshare_files during binary loading to eliminate potential leak of
      the binary's fd installed during execve().  As is, this breaks
      binfmt_som.c
    • [PATCH] unshare_files · 02cda956
      Andrew Morton authored
      From: Chris Wright <chrisw@osdl.org>
      
      Introduce unshare_files as a helper for use during execve to eliminate
      potential leak of the execve'd binary's fd.
    • Merge bk://kernel.bkbits.net/davem/compat-aio-2.6 · 3a8d5347
      Linus Torvalds authored
      into home.osdl.org:/home/torvalds/v2.5/linux
    • Merge davem@nuts.ninka.net:/disk1/davem/BK/compat-aio-2.5 · 30ca6c0b
      David S. Miller authored
      into kernel.bkbits.net:/home/davem/compat-aio-2.5
    • Merge bk://bk.arm.linux.org.uk/linux-2.6-exp · 42c1faa2
      Linus Torvalds authored
      into home.osdl.org:/home/torvalds/v2.5/linux
    • Merge bk://kernel.bkbits.net/davem/sparc-2.5 · 0b170e19
      Linus Torvalds authored
      into home.osdl.org:/home/torvalds/v2.5/linux
    • Merge bk://linuxusb.bkbits.net/usb-devel-2.6 · 9ade0432
      Linus Torvalds authored
      into home.osdl.org:/home/torvalds/v2.5/linux
    • Merge davem@nuts.ninka.net:/disk1/davem/BK/sparc-2.5 · 3a613294
      David S. Miller authored
      into kernel.bkbits.net:/home/davem/sparc-2.5
    • [SPARC]: Get sun4c functional again in 2.6.0 · f2a39c2b
      Pete Zaitcev authored
      Move some elements of task_struct into thread_info so that
      these elements are locked into the TLB in the trap handlers
      and thus will not cause a watchdog reset.