1. 29 Dec, 2003 40 commits
    • [PATCH] Don't panic in mpparse on x86-64 · 53b3aa6c
      Andrew Morton authored
      From: Andi Kleen <ak@muc.de>
      
      Merge i386 fix. Don't panic in MP table parsing when the table is bad.
    • [PATCH] Signal fixes for x86-64 · ca981c9f
      Andrew Morton authored
      From: Andi Kleen <ak@muc.de>
      
      Merge signal race fixes from i386 to x86-64.
      
      Fix a bug in system call restart, noted by John Blackwood.
    • [PATCH] Merge i386 fix for page fault to x86-64 · 2988d8dd
      Andrew Morton authored
      From: Andi Kleen <ak@muc.de>
      
      Merge the i386 fix for the page fault from Linus to x86-64
      (I'm not actually sure what it fixes, but if it's good for 32bit
      it is likely good for 64bit too)
    • [PATCH] Add more paranoid checking in x86-64 prefetch checker · cf79a124
      Andrew Morton authored
      From: Andi Kleen <ak@muc.de>
      
      Make sure we never access anything in kernel mapping while
      doing the prefetch workaround checks on x86-64.
      
      Originally suggested by Jamie Lokier.
    • [PATCH] Fix 32bit truncate on x86-64 · 8f0f4aaa
      Andrew Morton authored
      From: Andi Kleen <ak@muc.de>
      
      Another potential data corruption fix.
      
      The 32bit truncate64 on x86-64 silently truncated
      offsets larger than 32 bits. That broke MySQL, for example. Fix that.
      
      From Chris Wilson
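
      As an illustration of the class of bug described above, here is a
      minimal sketch of reassembling a 64-bit length that arrives as two
      32-bit halves. The function names and the low/high argument split are
      assumptions made for this example; this is not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rebuild a 64-bit length from two 32-bit halves,
 * the way a 32-bit caller might pass it. Not the kernel's actual code. */
uint64_t join_length(uint32_t low, uint32_t high)
{
    return ((uint64_t)high << 32) | low;
}

/* The buggy behaviour, for contrast: the high half is silently dropped. */
uint64_t join_length_truncating(uint32_t low, uint32_t high)
{
    (void)high;
    return low;
}

/* Self-check: 4 GiB + 1 survives the correct join but not the truncating one. */
void join_length_selfcheck(void)
{
    assert(join_length(1, 1) == 0x100000001ULL);
    assert(join_length_truncating(1, 1) == 1);
}
```

      With the truncating variant, a request to truncate a file to 4 GiB + 1
      would instead cut it down to a single byte, which is the kind of silent
      data loss the message refers to.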
    • [PATCH] Fix sysrq-t on x86-64 · 3959fde8
      Andrew Morton authored
      From: Andi Kleen <ak@muc.de>
      
      From Badari Pulavarty
      
      Without this sysrq-t shows the same backtrace for all processes on x86-64
    • [PATCH] Fix CPUID compilation on x86-64 · 2393a309
      Andrew Morton authored
      From: Andi Kleen <ak@muc.de>
      
      A lot of people have run into this: the x86-64 cpuid driver didn't
      compile as a module.

      Uses a kludge suggested by Sam Ravnborg.
    • [PATCH] Critical x86-64 IOMMU fixes for 2.6.0 · f2059100
      Andrew Morton authored
      From: Andi Kleen <ak@muc.de>
      
      Please consider applying this patch, I would consider it critical for x86-64.
      
      The 2.6.0 x86-64 IOMMU code unfortunately had a few problems, leading
      to non booting systems and in a few cases to data corruption.
      
      It fixes two serious bugs in handling special kinds of scatter-gather
      lists in pci_map_sg.
      
      AGP was completely broken with IOMMU because of a wrong #ifdef.
      Fix that.
      
      One TLB flush optimization I did a long time ago seems to break on
      some 3ware boards (which require the IOMMU because they don't support
      64bit addresses).  The breakage led to data corruption. This patch
      disables the optimization for now and also fixes a potential SMP race
      in the flush code. The TLB flush is now done in a slower but more
      reliable way.
      
      This patch fixes them. Please consider applying, because some of these
      problems hit quite a few people.
      
      This also disables the IOMMU_DEBUG in the defconfig. A lot of people 
      were using the IOMMU when they didn't need to, which multiplied the
      problems.
      
      IOMMU merge is disabled for now. This was an experimental optimization
      which helped with some block devices, but for production it seems to
      be better to disable it for now because there are some questionable
      corner cases when the IOMMU aperture fragments. The same is done
      for IOMMU SAC force, which was related to that. 
      
      i386 has quite broken semantics for pci_alloc_consistent(). It uses
      the standard device DMA mask instead of the consistent mask. Make us
      bug-to-bug compatible here. This fixes problems with some sound
      drivers that don't support full 32bit addressing.
    • [PATCH] Add a.out support for x86-64 · b14a4258
      Andrew Morton authored
      From: Andi Kleen <ak@muc.de>
      
      Add 32bit a.out support for x86-64.
      
      Not exactly an important bug fix, but maybe it will help someone.  This
      should increase the current 98% compatibility to i386 to perhaps 98.1% @)
      
      I tested an old a.out SuSE 4.2 installation in chroot and it worked.  It
      also ran some very old linux binaries from '92 found on ftp.funet.fi.  The
      only program that didn't was the SuSE a.out GNU emacs, but I was too lazy
      to track that down.  Core dumps are not supported.
    • [PATCH] statfs64 fix · dce80777
      Andrew Morton authored
      From: Andi Kleen <ak@muc.de>
      
      It fixes the statfs64 emulation on x86-64.  The problem is that x86-64
      needs an __attribute__((aligned)) on the compat_statfs64 structure.  The
      conclusion last time this was discussed was that the structure should be
      duplicated.
      
      Essentially it is the old shared structure copied to every user and x86-64
      uses __attribute__((packed)).
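
      To show why a layout attribute can matter for a structure shared with
      32-bit callers, here is a small sketch. The two structures are
      hypothetical, shortened stand-ins (they are not compat_statfs64), and
      __attribute__((packed)) is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in with default alignment. On typical 64-bit ABIs the
 * 64-bit member forces 8-byte alignment, so the size is rounded up. */
struct sketch_fs_natural {
    uint64_t f_blocks;
    uint32_t f_namelen;
};

/* Same members, packed: no padding is added after f_namelen. */
struct sketch_fs_packed {
    uint64_t f_blocks;
    uint32_t f_namelen;
} __attribute__((packed));

/* Runtime check of the packed layout used in this sketch. */
void sketch_layout_check(void)
{
    assert(offsetof(struct sketch_fs_packed, f_namelen) == 8);
    assert(sizeof(struct sketch_fs_packed) == 12);
    assert(sizeof(struct sketch_fs_natural) >= sizeof(struct sketch_fs_packed));
}
```

      If a 32-bit program expects the 12-byte form and the 64-bit kernel
      copies out a larger, padded structure, the two sides disagree about
      the size of the buffer.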
    • [PATCH] dm and bounce buffer panic fix · 85734c47
      Andrew Morton authored
      From: Mark Haverkamp <markh@osdl.org>
      
      About three weeks ago markw at osdl posted a mail about a panic that he
      was seeing:
      
      http://marc.theaimsgroup.com/?l=linux-kernel&m=106737176716474&w=2
      
      I believe what is happening is that the dm __clone_and_map function is
      generating bio structures with a non-zero bi_idx field.  When
      __blk_queue_bounce creates a new bio with bounce pages, it sets the bi_idx
      field to 0 rather than to the bi_idx of the original.  This causes trouble
      because bv_page pointers that are zero will be dereferenced later.  The
      following patch uses the original bio structure's bi_idx in the new bio
      structure and in copy_to_high_bio_irq and bounce_end_io.
      
      This has cleared up the panic when using the volume.
      
      (acked by Joe Thornber)
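
      The failure mode described above can be modelled in a few lines. This
      is a simplified sketch with hypothetical types (sketch_bio, seg) that
      only mimic the roles of bi_idx and bv_page; it is not the block layer
      code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SEGS 4

struct seg { const void *page; };      /* stand-in for a bio_vec / bv_page */

struct sketch_bio {
    struct seg vec[MAX_SEGS];
    int idx;                           /* first segment still to process */
    int vcnt;                          /* number of segments in vec[] */
};

/* Build a bounce copy. Only segments from orig->idx onward are filled in,
 * so the copy must start at the same index (the fix), not at 0. */
void bounce_clone(const struct sketch_bio *orig, struct sketch_bio *bounce)
{
    assert(orig->idx >= 0 && orig->idx <= orig->vcnt && orig->vcnt <= MAX_SEGS);
    memset(bounce, 0, sizeof(*bounce));
    bounce->vcnt = orig->vcnt;
    bounce->idx = orig->idx;
    for (int i = orig->idx; i < orig->vcnt; i++)
        bounce->vec[i].page = orig->vec[i].page;
}

/* Returns 1 if every segment from idx onward has a page pointer. */
int pending_pages_valid(const struct sketch_bio *b)
{
    for (int i = b->idx; i < b->vcnt; i++)
        if (b->vec[i].page == NULL)
            return 0;
    return 1;
}
```

      Resetting the copy's idx to 0 makes pending_pages_valid() fail, because
      the walk then covers entries that were never populated.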
    • [PATCH] ext3: bd_claim for journal device · 9907e736
      Andrew Morton authored
      From: Neil Brown <neilb@cse.unsw.edu.au>
      
      Change ext3 to run bd_claim() against external journal devices. It is
      significant only for those who have ext3 journals on a separate device, and
      gets exclusive access to that device.
    • [PATCH] remove include recursion from linux/pagemap.h · 1fcec52f
      Andrew Morton authored
      From: Arnaldo Carvalho de Melo <acme@conectiva.com.br>
      
      pagemap.h, do not include thyself.
    • [PATCH] remove lock_kernel() from proc_bus_pci_lseek() · 1b6f967a
      Andrew Morton authored
      Remove pointless lock_kernel(), replace with the standard-but-still-odd
      i_sem-based lseek locking.
    • [PATCH] fix oops in proc_kill_inodes() · 4617516d
      Andrew Morton authored
      proc_kill_inodes() walks the s_files list, playing with ->f_dentry.
      
      But there is a window in which __fput() will leave a file on that list with a
      null f_dentry and f_vfsmnt.
      
      I'm not sure it was ever confirmed that this fixed the reported oops, but it
      seems much better to set those fields to null _after_ removing the filp from
      the list.
      
      (Actually, there's no need to null those pointers out at all.  But whatever;
      it caught a bug).
    • [PATCH] pagefault accounting fix · d2c585d3
      Andrew Morton authored
      From: William Lee Irwin III <wli@holomorphy.com>
      
      Our accounting of minor faults versus major faults is currently quite wrong.
      
      To fix it up we need to propagate the actual fault type back to the
      higher-level code.  Repurpose the currently-unused third arg to ->nopage
      for this.
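
      The idea of reporting the fault type through an extra argument can be
      sketched as follows. All names here (the SKETCH_FAULT_* constants,
      fault_stats, the simplified two-argument handler type) are assumptions
      for illustration; the real ->nopage signature is different.

```c
#include <assert.h>
#include <stddef.h>

enum { SKETCH_FAULT_MINOR = 1, SKETCH_FAULT_MAJOR = 2 };

struct fault_stats { unsigned long min_flt, maj_flt; };

/* Simplified handler type: the out-parameter carries the fault type back. */
typedef int (*sketch_nopage_fn)(unsigned long address, int *type);

/* Page was already in memory: report a minor fault. */
int nopage_cached(unsigned long address, int *type)
{
    (void)address;
    if (type)
        *type = SKETCH_FAULT_MINOR;
    return 0;
}

/* Page had to be read in: report a major fault. */
int nopage_from_disk(unsigned long address, int *type)
{
    (void)address;
    if (type)
        *type = SKETCH_FAULT_MAJOR;
    return 0;
}

/* Higher-level code accounts for the fault using what the handler reported. */
int handle_fault(sketch_nopage_fn nopage, unsigned long address,
                 struct fault_stats *stats)
{
    int type = SKETCH_FAULT_MINOR;

    assert(nopage != NULL && stats != NULL);
    if (nopage(address, &type) != 0)
        return -1;
    if (type == SKETCH_FAULT_MAJOR)
        stats->maj_flt++;
    else
        stats->min_flt++;
    return 0;
}
```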
    • [PATCH] Remove CLONE_FILES from init kernel thread creation · 282ed003
      Andrew Morton authored
      From: James Morris <jmorris@redhat.com>
      
      The patch below removes the CLONE_FILES flag from the kernel_thread() call
      which starts init.
      
      This is to prevent other kernel threads from sharing file descriptors
      opened by init (try 'lsof /dev/initctl' on a 2.6 system :-).
      
      The reason this patch is being proposed is so that usermode helper apps
      launched via kernel threads (e.g. modprobe, hotplug) do not then inherit
      any such file descriptors.  This is not a problem in itself so far (other
      than being messy), but it is a problem for SELinux, which will otherwise
      need to grant access to /dev/initctl by modprobe and hotplug, a somewhat
      undesirable scenario.
      
      As far as I can tell, there is no reason why init needs to be spawned with
      CLONE_FILES.  Please let me know if there are any objections to the
      change, which I would like to propose for 2.6.0+ as a cleanup.
    • [PATCH] Add support for SGI's IOC4 chipset · 125a4634
      Andrew Morton authored
      From: Aniket Malatpure <aniket@sgi.com>
      
      Adds support for the IOC4 IDE part.
    • [PATCH] new /proc/irq cpumask format; consolidate cpumask display and input code · 409c7f3a
      Andrew Morton authored
      From: Paul Jackson <pj@sgi.com>
      
      This patch is a followup to one from Bill Irwin.  On Nov
      17, he had consolidated the half-dozen chunks of code
      that displayed cpumasks in /proc/irq/prof_cpu_mask and
      /proc/irq/<pid>/smp_affinity into a single routine, which he
      called format_cpumask().
      
      I believe that Andrew Morton has accepted Bill's patch into
      his 2.6.0-test10-mm1 patch set as the "format_cpumask" patch.
      I hope that the following patch will replace Bill's patch.
      I look forward to Bill's feedback on this patch.
      
      The following patch carries Bill's work further:
      
       1) It also consolidates the input side (write syscalls).
       2) It adopts a new format, same on input and output.
       3) The core routines work for any multi-word bitmask,
          not just cpumasks.
       4) The core routines avoid overrunning their output
          buffers.
      
      Note esp. for David Mosberger:
      
          The small patch I sent you and the linux-ia64 list
          yesterday entitled: "check user access ok writing
          /proc/irq/<pid>/smp_affinity" for arch ia64 only is
          _separate_ from the following patch.  Neither presumes the
          other.  However, they do collide on one line.  Last one in
          is a Monkey's Uncle and will need an updated patch from me
          (or otherwise need to resolve the one obvious collision).
      
      Details of the following patch:
      
      Both the display and input of cpumasks on 9 arch's are
      consolidated into a single pair of routines, which use the
      same format for input and output, as recommended by Tony
      Luck.  The two common routines work on any multi-word bitmask
      (array of unsigned longs).  A pair of trivial inline wrappers
      cpumask_snprintf() and cpumask_parse() hide this generality
      for the common case of cpumask input and output.
      
      My real motivation for consolidating this code will become
      visible later - when I seek to add a nodemask_t that resembles
      cpumask_t (just a different length).  These common underlying
      routines will be used there as well, following up on a suggestion
      of Christoph Hellwig that I investigate implementing nodemask_t
      as an ADT sharing infrastructure with cpumask_t.  However, I
      believe that this patch stands on its own merit, consolidating
      a couple hundred lines of duplicated code, and making the
      cpumask display format usable on very large systems.
      
      There are two exceptions to the consolidation - the alpha and
      sparc64 arch's manipulate bare unsigned longs, not cpumask_t's,
      on input (write syscall), and do stuff that was more funky than
      I could make sense of.  So the input side of these two arch's
      was left as-is.  I'd welcome someone with access to either of
      these systems to provide additional patches.
      
      The new format consists of multiple 32 bit words, separated by
      commas, displayed and input in hex.  The following comment from
      this patch describes this format further:
      
      * The ascii representation of multi-word bit masks displays each
      * 32bit word in hex (not zero filled), and for masks longer than
      * one word, uses a comma separator between words.  Words are
      * displayed in big-endian order most significant first.  And hex
      * digits within a word are also in big-endian order, of course.
      *
      * Examples:
      *   A mask with just bit 0 set displays as "1".
      *   A mask with just bit 127 set displays as "80000000,0,0,0".
      *   A mask with just bit 64 set displays as "1,0,0".
      *   A mask with bits 0, 1, 2, 4, 8, 16, 32 and 64 set displays
      *     as "1,1,10117".  The first "1" is for bit 64, the second
      *     for bit 32, the third for bit 16, and so forth, to the
      *     "7", which is for bits 2, 1 and 0.
      *   A mask with bits 32 through 39 set displays as "ff,0".
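
      The display format described in the comment above can be sketched in
      user-space C as follows. The function name mask_format_sketch and its
      word-array interface (least significant 32-bit word first) are
      assumptions for this example, not the routine added by the patch.
      Leading all-zero words are skipped, which is what the "1,0,0" example
      implies.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format words[0..nwords-1] (words[0] is least significant) as hex words
 * separated by commas, most significant first, without zero fill.
 * Returns the length the full text needs; output is truncated to buflen. */
int mask_format_sketch(char *buf, size_t buflen,
                       const uint32_t *words, size_t nwords)
{
    size_t i, len = 0;
    const char *sep = "";

    assert(buf != NULL && buflen > 0 && words != NULL && nwords > 0);
    buf[0] = '\0';

    /* Skip leading zero words, but always keep the lowest word. */
    i = nwords - 1;
    while (i > 0 && words[i] == 0)
        i--;

    for (;;) {
        /* Never hand snprintf a negative or wrapped remaining size. */
        size_t room = len < buflen ? buflen - len : 0;
        int n = snprintf(room ? buf + len : NULL, room,
                         "%s%" PRIx32, sep, words[i]);
        if (n < 0)
            return -1;
        len += (size_t)n;
        sep = ",";
        if (i == 0)
            break;
        i--;
    }
    return (int)len;
}
```

      For instance, the words {0x10117, 1, 1} format as "1,1,10117", matching
      the fourth example in the comment.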
      
      The essential reason for adding the comma breaks was to make
      the long masks from our (SGI's) big 512 CPU systems parsable by
      humans.  An unbroken string of 128 hex digits is pretty difficult
      to read.  For those who are compiling systems with CONFIG_NR_CPUS
      of 32 or less, there should be no visible change in format.
      
      There are of course a thousand possible output formats that
      meet similar criteria.  If someone wants to lobby for and seek
      consensus behind another such format, that's fine.  Now that
      the format is consolidated into a single pair of routines,
      it should be easy to adopt whatever we choose.
      
      Internally, the display routine uses snprintf to track the
      remaining space in its output buffer, to avoid the risk of
      overrunning it.
      
      A new file, lib/mask.c, is added to the lib directory, to
      hold the two common routines.  I anticipate adding a few more
      common routines for generic support of multi-word bit masks to
      lib/mask.c, in subsequent patches that will add a nodemask_t
      type as an ADT sharing implementation with cpumask_t.
    • [PATCH] cpumask.h reorg · 89832108
      Andrew Morton authored
      From: Paul Jackson <pj@sgi.com>
      
      Push the cpumask implementation from linux/cpumask.h into asm/cpumask.h, so
      that ia64 can do special things without breaking sparc64.
      
      1) Each arch has its own include/asm-<arch>/cpumask.h file
      
      2) That arch-specific header file can include <asm-generic/cpumask.h>,
         if it wants to make use of the generic cpumask implementation.
      
      3) Using code should continue to include linux/cpumask.h, which
         in turn includes asm/cpumask.h.  Some common implementation
         independent cpumask related items, such as the cpu_online_map,
         are declared directly in linux/cpumask.h.
    • [PATCH] Add lib/parser.c kernel-doc · adf9a351
      Andrew Morton authored
      From: Will Dyson <will_dyson@pobox.com>
      
      Add documentation and comments to lib/parser.c and include/linux/parser.h
    • [PATCH] IDE capability elevation fix · cb8d8fe9
      Andrew Morton authored
      From: Alan Cox <alan@redhat.com>
      
      Capability elevation bug in 2.6.0 IDE. Long fixed in 2.4.x, trivial to cure
    • [PATCH] IDE MMIO fix · 90c6dd77
      Andrew Morton authored
      From: Alan Cox <alan@redhat.com>
      
      IDE core code had the mmio==2 (ioremap) mode supported but two small changes
      had been missed for ide-dma.c.  Without this fix mmio IDE controllers bomb if
      you have plenty of memory as it uses request_mem_region on an ioremap return.
    • [PATCH] Can't disable IDE DMA · 22f4d9f1
      Andrew Morton authored
      From: Peter Chubb <peterc@gelato.unsw.edu.au>
      
      If you try to disable IDE DMA from Kconfig, you'll end up with an undefined
      symbol, ide_hwif_setup_dma().
      
      The attached rather ugly patch fixes the problem by defining a dummy
      function.
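
      The "dummy function" pattern mentioned above usually looks like the
      sketch below. SKETCH_CONFIG_DMA, sketch_hwif and the function names are
      hypothetical placeholders, not the identifiers used in the patch.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_hwif { int dma_enabled; };

#ifdef SKETCH_CONFIG_DMA
/* Real implementation, only built when the option is enabled. */
static inline void sketch_setup_dma(struct sketch_hwif *hwif)
{
    hwif->dma_enabled = 1;
}
#else
/* Dummy version: callers still compile and link, but nothing happens. */
static inline void sketch_setup_dma(struct sketch_hwif *hwif)
{
    (void)hwif;
}
#endif

/* Common code calls the function unconditionally, with no #ifdef here. */
void sketch_hwif_init(struct sketch_hwif *hwif)
{
    assert(hwif != NULL);
    hwif->dma_enabled = 0;
    sketch_setup_dma(hwif);
}
```

      Without the #else branch, building with the option disabled would leave
      sketch_hwif_init() referring to a symbol that does not exist.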
    • [PATCH] PIIX5 Doesn't work on IA64 · c1f0e653
      Andrew Morton authored
      From: Peter Chubb <peterc@gelato.unsw.edu.au>
      
      The PIIX5 IDE controller on I2000 IA64 boxen using the 460GX chipset will
      hang on startup if an ordinary harddrive is plugged into it (it seems to
      work for the LSI120 and the CDROM drives).
      
      This is because the 460GX chipset contains a PCI expansion bridge that
      works like the 450NX PXB, and has the same PCI ID (but a later revision).
      The PIIX driver, to work around interactions between PIIX4 and the 450NX
      PXB, tries to disable DMA.
      
      Unfortunately, the way it tries to disable DMA doesn't work, and the higher
      layers think that DMA is still on, and so time out waiting for DMA, and then
      hang on bootup.
      
      A simple workaround is to tighten the check for the buggy chipset, as in
      the attached patch.  However, someone with more time (and who actually
      *understands* the IDE subsystem) needs to fix the real bug as well.
    • [PATCH] ide-tape update · 8179c97e
      Andrew Morton authored
      From: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>,
            Stuart Hayes <stuart_hayes@dell.com>
      
      - Check drive's write protect bit, try to return appropriate
        errors when attempting to write a write-protected tape.
      
      - Moved "idetape_read_position" call in idetape_chrdev_open
        after the "wait_ready" call.
      
      - Added IDETAPE_MEDIUM_PRESENT flag so driver would know
        not to rewind tape after ejecting it.
      
      - Fixed bug with ide_abort_pipeline (it was deleting stages
        from tape->next_stage to end, instead of from
        new_last_stage->next; tape->next_stage was set to NULL
        by idetape_discard_read_pipeline before calling!).
      
      - Made improvements to idetape_wait_ready.
      
      - Added a few comments here and there.
      
      - Made MTOFFL unlock tape drive door before attempting to eject.
      
      - Added fixes to get Seagate STT3401A Travan working:
        Handle drives that don't support 0-length reads/writes.
        Increased timeout (retension takes ~10 minutes before irq is returned).
        Fixed request mode page packet command byte 3.
      
      Also remove code depending on NO_LONGER_REQUIRED to match 2.4.x (me).
    • [PATCH] Minor bug fixes to the compat layer · 14209d06
      Andrew Morton authored
      From: Arun Sharma <arun.sharma@intel.com>
      
      - Several instances where we were using pid_t instead of uid_t
      
      - If the caller passed a NULL `oldact' pointer into sys_sigprocmask then
        don't try to write the old sigmask there.
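
      The second fix amounts to guarding the write-back of the old mask. A
      minimal sketch, assuming a plain unsigned long mask and a hypothetical
      helper name (this is not the compat syscall itself):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: install *newmask (if given) and report the previous
 * mask through oldmask, but only when the caller supplied a pointer. */
int sketch_setmask(unsigned long *current, const unsigned long *newmask,
                   unsigned long *oldmask)
{
    unsigned long old;

    assert(current != NULL);
    old = *current;
    if (newmask)
        *current = *newmask;
    if (oldmask)            /* a NULL pointer means "not interested" */
        *oldmask = old;
    return 0;
}
```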
    • [PATCH] watchdog write() return value fixes · 41339307
      Andrew Morton authored
      From: gleb@nbase.co.il (Gleb Natapov)
      
      There is an inconsistency in the fops->write() implementations of different
      watchdog drivers.  Some of them return the number of bytes written while
      others return 1.

      I think the correct implementation should always return the number of bytes
      written (we examine the whole buffer, after all); otherwise "echo V >
      /dev/watchdog" doesn't work as expected (it doesn't stop the watchdog).
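
      One plausible way the "return 1" variant breaks "echo V" is modelled
      below. The sketch assumes a driver that scans the whole buffer for 'V'
      and resets its flag on every write, plus a caller that retries until
      all bytes are accepted. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_wdt { int expect_close; };

typedef long (*sketch_write_fn)(struct sketch_wdt *, const char *, size_t);

/* Each write clears the flag, then sets it again if 'V' is in the buffer. */
static void scan_for_magic(struct sketch_wdt *w, const char *buf, size_t len)
{
    if (len == 0)
        return;
    w->expect_close = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == 'V')
            w->expect_close = 1;
}

/* Consistent behaviour: the whole buffer was examined, so report all of it. */
long write_returns_len(struct sketch_wdt *w, const char *buf, size_t len)
{
    scan_for_magic(w, buf, len);
    return (long)len;
}

/* Inconsistent behaviour: examines everything but claims one byte. */
long write_returns_one(struct sketch_wdt *w, const char *buf, size_t len)
{
    scan_for_magic(w, buf, len);
    return len ? 1 : 0;
}

/* Caller that retries short writes, as a shell or libc might. */
void write_all(sketch_write_fn fn, struct sketch_wdt *w,
               const char *buf, size_t len)
{
    while (len > 0) {
        long n = fn(w, buf, len);
        assert(n > 0 && (size_t)n <= len);
        buf += n;
        len -= (size_t)n;
    }
}
```

      Writing "V\n" through write_returns_one() triggers a second write that
      contains only the newline, which clears the flag again in this model.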
    • [PATCH] missing padding in cpio_mkfile in usr/gen_init_cpio.c · a7380b60
      Andrew Morton authored
      From: Olaf Hering <olh@suse.de>
      
      We need to update `offset' here so that the subsequent push_pad() (which
      uses `offset') will do the right thing.
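
      The dependency between the running offset and the padding step can be
      sketched like this. The sketch only counts bytes instead of writing
      them, and the struct and function names are assumptions; the 4-byte
      alignment matches the "newc" cpio format.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_cpio {
    unsigned long offset;     /* bytes emitted so far */
    unsigned long pad_bytes;  /* padding bytes emitted so far */
};

/* Pad the archive to the next 4-byte boundary, based on offset. */
static void sketch_push_pad(struct sketch_cpio *o)
{
    while (o->offset & 3) {
        o->offset++;
        o->pad_bytes++;
    }
}

/* Emit a file body of the given size, then pad. */
void sketch_emit_file_data(struct sketch_cpio *o, size_t size)
{
    assert(o != NULL);
    /* (real code would write the file contents here) */
    o->offset += size;        /* without this update the padding is wrong */
    sketch_push_pad(o);
}
```

      If the offset update is skipped, sketch_push_pad() sees a stale,
      already aligned offset and emits no padding, so the next header starts
      at a misaligned position.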
    • [PATCH] document elevator= parameter · a5c9613f
      Andrew Morton authored
      From: Valdis.Kletnieks@vt.edu
      
      Nick wrote a nice as-iosched.txt file, but apparently nobody updated the
      kernel-parameters.txt file...
    • [PATCH] support centrino 1GHz · ce2da20e
      Andrew Morton authored
      From: Jeremy Fitzhardinge <jeremy@goop.org>
      
      I've been getting quite a lot of people mailing me about this CPU.  It
      seems Toshiba has released a machine with it.  It would be nice if this
      patch gets into a kernel soonish.  It's very low-impact.
    • [PATCH] Intel 440gx PCI IDs · a77ef229
      Andrew Morton authored
      - Add missing PCI ID
      
      - Forward-port IRQ routing workaround from 2.4.
    • [PATCH] seq_file version of /proc/interrupts · ab6b1810
      Andrew Morton authored
      From: corbet@lwn.net (Jonathan Corbet)
      
      This converts all architectures' /proc/interrupts implementation over to
      seq_file.  We need this for SMP machines with ridiculous numbers of CPUs and
      if you convert one arch, you have to convert them all...
    • [PATCH] eicon/ and hardware/eicon/ drivers using the same symbols · b031787e
      Andrew Morton authored
      From: Adrian Bunk <bunk@fs.tum.de>
      
      The legacy eicon driver in drivers/isdn/eicon is the old one and will be
      removed as soon as all features have moved to the new driver.  In any case,
      this old driver was never meant to be built as anything but a module.
    • [PATCH] fix SOUND_CMPCI Configure help entry · 54f47272
      Andrew Morton authored
      From: Adrian Bunk <bunk@fs.tum.de>
      
      The issue below is only a minor documentation fix, but it confused
      me when configuring a kernel for such a card.
    • [PATCH] find_busiest_queue() commentary fix · 2d0014c7
      Andrew Morton authored
      From: Ingo Molnar <mingo@elte.hu>
      
      Clarify a comment in the CPU scheduler.
    • [PATCH] use alloc_percpu in percpu_counters · 22565897
      Andrew Morton authored
      From: Martin Hicks <mort@wildopensource.com>
      
      Once NR_CPUS exceeds about 300, ext2 and ext3 will not compile, because the
      percpu counters in the superblocks are so huge that they cannot be kmalloced.
      
      Fix this by converting the percpu_counter mechanism to use alloc_percpu()
      rather than an NR_CPUS-sized array.
    • [PATCH] lockless semop · 55e8b1a1
      Andrew Morton authored
      From: Manfred Spraul <manfred@colorfullife.com>
      
      Attached is the lockless semop patch. I did another test run with
      idle=poll on a Pentium III, and it remained unchanged: 99.9% direct
      fast path, 0.1% race with wakeup against writing the final result code:
      
      http://khack.osdl.org/stp/282936/environment/proc/slabinfo
      
      That means there is no immediate need to add the two-stage
      implementation to finish_wait.
      
      It reduces the spinlock operations on the semaphore array spinlock by 1/3.
    • [PATCH] Fix writev atomicity on pipe/fifo · 1af764e1
      Andrew Morton authored
      From: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      
      Currently, writev() data on a pipe/fifo can be interleaved with data from
      other processes doing writes even when the request size is <= PIPE_BUF.
      These writes should in fact be atomic.

      The readv() side is also changed so that it behaves the same as read().  And
      it is faster.
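
      From user space, the guarantee discussed above is what lets a program
      send a header and a body as one record with a single writev() call. A
      minimal sketch, assuming a POSIX system; write_record is a hypothetical
      helper, and the PIPE_BUF fallback of 512 is the POSIX minimum.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef PIPE_BUF
#define PIPE_BUF 512   /* POSIX minimum, used only if the header lacks it */
#endif

/* Hypothetical helper: write header and body to a pipe as one record.
 * The combined size must not exceed PIPE_BUF for the write to be atomic. */
ssize_t write_record(int fd, const char *header, const char *body)
{
    struct iovec iov[2];

    iov[0].iov_base = (void *)header;
    iov[0].iov_len = strlen(header);
    iov[1].iov_base = (void *)body;
    iov[1].iov_len = strlen(body);
    assert(iov[0].iov_len + iov[1].iov_len <= PIPE_BUF);

    return writev(fd, iov, 2);
}
```

      With atomic writev(), several processes calling write_record() on the
      same pipe should never see one record's header followed by another
      record's body.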
      
      readv/writev version of bw_pipe in LMbench
      
      2.6.0-test9-bk12
      hirofumi@devron (i686-pc-linux-gnu)[1010]$ ./bw_pipe -m 4096 -M 5
      Pipe bandwidth: 45.53 MB/sec
      hirofumi@devron (i686-pc-linux-gnu)[1009]$ ./bw_pipe -m 1024 -M 5
      Pipe bandwidth: 20.08 MB/sec
      
      2.6.0-test9-bk12 + patch
      hirofumi@devron (i686-pc-linux-gnu)[1001]$ ./bw_pipe -m 4096 -M 5
      Pipe bandwidth: 65.98 MB/sec
      hirofumi@devron (i686-pc-linux-gnu)[1002]$ ./bw_pipe -m 1024 -M 5
      Pipe bandwidth: 32.19 MB/sec
    • [PATCH] optimize ia32 memmove · ed109bc5
      Andrew Morton authored
      From: Manfred Spraul <manfred@colorfullife.com>
      
      The memmove implementation of i386 is not optimized: it uses movsb, which is
      far slower than movsd.  The optimization is trivial: if dest is less than
      source, then call memcpy().  markw tried it on a 4xXeon with dbt2; it saved
      around 300 million cpu ticks in cache_flusharray():
      
      oprofile, GLOBAL_POWER_EVENTS, count 100k
      Before:
      c0144ed1 <cache_flusharray>: /* cache_flusharray total:  21823  0.0165 */
           6 4.5e-06 :c0144f8e:       cmp    %esi,%ebx
          11 8.3e-06 :c0144f90:       jae    c0144f9e <cache_flusharray+0xcd>
           3 2.3e-06 :c0144f92:       mov    %ebx,%edi
        7305  0.0055 :c0144f94:       repz movsb %ds:(%esi),%es:(%edi)
         201 1.5e-04 :c0144f96:       add    $0x10,%esp
      
      After:
      c0144f1d <cache_flusharray>: /* cache_flusharray total:  17959  0.0136 */
        1270 9.6e-04 :c0144f1d:       push   %ebp
      [snip]
           6 4.6e-06 :c0144fdc:       cmp    %esi,%ebx
          13 9.9e-06 :c0144fde:       jae    c0145000 <cache_flusharray+0xe3>
           2 1.5e-06 :c0144fe0:       mov    %edx,%eax
           1 7.6e-07 :c0144fe2:       mov    %ebx,%edi
          11 8.4e-06 :c0144fe4:       shr    $0x2,%eax
           1 7.6e-07 :c0144fe7:       mov    %eax,%ecx
        4129  0.0031 :c0144fe9:       repz movsl %ds:(%esi),%es:(%edi)
         261 2.0e-04 :c0144feb:       test   $0x2,%dl
          27 2.1e-05 :c0144fee:       je     c0144ff2 <cache_flusharray+0xd5>
                     :c0144ff0:       movsw  %ds:(%esi),%es:(%edi)
          95 7.2e-05 :c0144ff2:       test   $0x1,%dl
          96 7.3e-05 :c0144ff5:       je     c0144ff8 <cache_flusharray+0xdb>
                     :c0144ff7:       movsb  %ds:(%esi),%es:(%edi)
         121 9.2e-05 :c0144ff8:       add    $0x1c,%esp
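
      The direction test behind the optimization can be sketched in portable
      C. This is an illustration only: the kernel calls its own memcpy() for
      the forward case, whereas this sketch uses an explicit forward loop,
      because calling the C library memcpy() on overlapping buffers is
      undefined in user space. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Forward copy: stands in for the fast memcpy path. */
static void *copy_forward(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dest;
}

void *memmove_sketch(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    assert(n == 0 || (dest != NULL && src != NULL));

    /* If dest is below src, a forward copy never overwrites bytes that
     * are still to be read, so the fast path is safe. */
    if ((uintptr_t)dest < (uintptr_t)src)
        return copy_forward(dest, src, n);

    /* Otherwise copy backward, byte by byte, from the end. */
    while (n-- > 0)
        d[n] = s[n];
    return dest;
}
```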