1. 28 May, 2002 24 commits
    • Linus Torvalds's avatar
      Merge home.transmeta.com:/home/torvalds/v2.5/blk-plug · 9c2c68b8
      Linus Torvalds authored
      into home.transmeta.com:/home/torvalds/v2.5/linux
      9c2c68b8
    • Jens Axboe's avatar
      [PATCH] block plugging reworked · eba5b46c
      Jens Axboe authored
      This patch provides the ability for a block driver to signal it's too
      busy to receive more work and temporarily halt the request queue. In
      concept it's similar to the networking netif_{start,stop}_queue helpers.
      
      To do this cleanly, I've ripped out the old tq_disk task queue. Instead
      an internal list of plugged queues is maintained which will honor the
      current queue state (see QUEUE_FLAG_STOPPED bit). Execution of
      request_fn has been moved to tasklet context. blk_run_queues() provides
      similar functionality to the old run_task_queue(&tq_disk).
      
      Now, this only works at the request_fn level and not at the
      make_request_fn level. This is on purpose: drivers working at the
      make_request_fn level are essentially providing a piece of the block
      level infrastructure themselves. There are basically two reasons for
      doing make_request_fn style setups:
      
      o block remappers. start/stop functionality will be done at the target
        device in this case, which is the level that will signal hardware full
        (or continue) anyways.
      
      o drivers who wish to receive single entities of "buffers" and not
        merged requests etc. This could use the start/stop functionality. I'd
        suggest _still_ using a request_fn for these, but set the queue
        options so that no merging etc ever takes place. This has the added
        bonus of providing the usual request depletion throttling at the block
        level.
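
A toy userspace model may help picture the start/stop behaviour described above. Every `toy_` name here is hypothetical and is not the kernel's API; the sketch only assumes that a stopped queue is skipped when queues are run, which is what the QUEUE_FLAG_STOPPED description suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel's request queue. */
struct toy_queue {
    bool stopped;     /* models the QUEUE_FLAG_STOPPED bit */
    int  pending;     /* requests waiting to be handed to the driver */
    int  dispatched;  /* requests the driver has accepted so far */
};

/* Driver signals "too busy": the queue stops handing out work. */
static void toy_stop_queue(struct toy_queue *q)
{
    q->stopped = true;
}

/* Driver signals it can accept work again. */
static void toy_start_queue(struct toy_queue *q)
{
    q->stopped = false;
}

/* Models running the queue: a stopped queue is left untouched.
 * Returns the number of requests handed to the driver. */
static int toy_run_queue(struct toy_queue *q)
{
    assert(q->pending >= 0);
    if (q->stopped)
        return 0;

    int n = q->pending;
    q->dispatched += n;
    q->pending = 0;
    return n;
}
```

Stopping the queue leaves pending requests in place; they are dispatched on the first run after the queue is started again.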
      eba5b46c
    • Anton Blanchard's avatar
      [PATCH] TLB shootdown infrastructure in 2.5 · b8391722
      Anton Blanchard authored
      It looks like a race between exec_mmap and access_process_vm in
      proc_pid_cmdline (or any other procfs function that uses
      access_process_vm).
      b8391722
    • Jens Axboe's avatar
      [PATCH] a few ll_rw_blk exports missing · 454c37c0
      Jens Axboe authored
      o blk_get_request() and blk_put_request() need exporting
      o blk_max_pfn is used by BLOCK_BOUNCE_ANY, which modular SCSI needs
      454c37c0
    • Robert Love's avatar
      [PATCH] Robert Love likes leather and chains · 3e4a097b
      Robert Love authored
      > Hmm. That patch does not compile. "p->cpu" does not exist, it's
      > "p->thread_info->cpu". Tssk.
      
      Ouch, I am bad.  Sorry.
      
      Make the ChangeLog entry something really defamatory.
      
      	Robert Love
      3e4a097b
    • Robert Love's avatar
      [PATCH] O(1) count_active_tasks · 01bc15ed
      Robert Love authored
      This is William Irwin's algorithmically O(1) version of
      count_active_tasks (which is currently O(n) for n total tasks on the
      system).
      
      I like it a lot: we become O(1) because now we count uninterruptible
      tasks, so we can return (nr_uninterruptible + nr_running).  It does not
      introduce any overhead or hurt the case for small n, so I have no
      complaints.
      
      This copy has a small optimization over the original posting, but is
      otherwise the same thing wli posted earlier.  I have tested to make sure
      this returns accurate results and that the kernel profile improves.
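
The idea can be sketched as keeping running counters that are updated on each state transition, so the active count is a single addition instead of a walk over every task. This is a userspace illustration with hypothetical `toy_` names, not the actual patch.

```c
#include <assert.h>

/* Hypothetical counters; in the kernel the scheduler would maintain them. */
struct toy_sched {
    unsigned long nr_running;
    unsigned long nr_uninterruptible;
};

/* A running task enters uninterruptible sleep. */
static void toy_sleep_uninterruptible(struct toy_sched *s)
{
    assert(s->nr_running > 0);
    s->nr_running--;
    s->nr_uninterruptible++;
}

/* An uninterruptible task becomes runnable again. */
static void toy_wake(struct toy_sched *s)
{
    assert(s->nr_uninterruptible > 0);
    s->nr_uninterruptible--;
    s->nr_running++;
}

/* O(1): no iteration over the task list is needed. */
static unsigned long toy_count_active(const struct toy_sched *s)
{
    return s->nr_running + s->nr_uninterruptible;
}
```

Moving a task between the two states leaves the sum unchanged, which matches the intent of counting both kinds of task as active.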
      01bc15ed
    • Ivan Kokshaysky's avatar
      [PATCH] 2.5.18 pci/setup-bus.c: incorrect BUG() calls · 5ff8f2bb
      Ivan Kokshaysky authored
      Previously assigned resources are perfectly valid - just silently
      ignore them.
      5ff8f2bb
    • Robert Love's avatar
      [PATCH] real-time info in /proc/<pid>/stats · 79569bfe
      Robert Love authored
      Attached patch adds output of rt_priority and policy to
      /proc/<pid>/stats.
      
      This will not break compatibility with existing applications and will
      allow ps(1) and friends to display pertinent scheduling information.
      79569bfe
    • Jan-Benedict Glaw's avatar
      [PATCH] Trivial compile fix to fs/binfmt_em86.c · 3e7e1382
      Jan-Benedict Glaw authored
      Please apply this patch to let binfmt_em86.c compile again.
      3e7e1382
    • Linus Torvalds's avatar
      More drm updates from Keith Whitwell · 1df703fa
      Linus Torvalds authored
      1df703fa
    • Ivan Kokshaysky's avatar
      [PATCH] 2.5.18: unnamed PCI bus resources · bae651dd
      Ivan Kokshaysky authored
      As pointed out by Russell King, resource name pointers
      of the secondary PCI buses are left uninitialized in the
      non-x86 PCI allocation path.
      
      Assigning these pointers in pci_add_new_bus() fixes the problem.
      bae651dd
    • Linus Torvalds's avatar
      Merge http://fbdev.bkbits.net:8080/fbdev-2.5 · 7304ada2
      Linus Torvalds authored
      into home.transmeta.com:/home/torvalds/v2.5/linux
      7304ada2
    • Martin Dalecki's avatar
      [PATCH] 2.5.18 QUEUE_EMPTY and the unpleasant friends. · cdac1baf
      Martin Dalecki authored
       - Eliminate all usages of the obscure QUEUE_EMPTY macro.
      
       - Eliminate all unnecessary checks for RQ_INACTIVE, this can't happen during
         the time we run the request strategy routine of a single major number block
         device. Perhaps the still remaining usage in scsi and i2o_block.c should be
         killed as well, since the upper ll_rw_blk layer shouldn't pass inactive
         requests down.
      
      Those are all places where we have deeply buried and hidden major number
      indexed arrays. Let's deal with them slowly...
      cdac1baf
    • Martin Dalecki's avatar
      [PATCH] airo · 5fb231d2
      Martin Dalecki authored
      Since apparently nobody else cared thus far, and since I'm using
      this driver, well here it comes:
      
       - Adjust the airo wireless LAN card driver for the fact that modules
         don't export symbols by default any longer.
      
       - Make some stuff which obviously should be static there static as well.
         (Plenty of code in Linux actually deserves a review for this
         far too common bug...)
      5fb231d2
    • Martin Dalecki's avatar
      [PATCH] 2.5.18 IDE 72 · 9c4d67fb
      Martin Dalecki authored
       - Replace ide_delay_50m with mdelay(50). There is absolutely no reason we
         should behave differently depending on whether IDECS support is enabled.
      
       - Kill last parameter of ide_register_hw(). It should return a pointer to the
         interface registered later.
      
       - pdc202xx patches by Bartłomiej Żołnierkiewicz.
      
       - ServerWorks chipset support cleanup by Andrej Panin.
      
       - Temporarily move ide_setup_ports to main.c and unfold it in ide-pnp.c.
      9c4d67fb
    • Robert Love's avatar
      [PATCH] preempt-safe net/ code · 1bc32826
      Robert Love authored
      This fixes three locations in net/ where per-CPU data could bite us
      under preemption.  This is the result of an audit I did and should
      constitute all of the unsafe code in net/.
      
      In net/core/skbuff.c I did not have to introduce any code - just
      rearrange the grabbing of smp_processor_id() to be in the interrupt off
      region.  Pretty clean fixes.
      
      Note in the future we can use put_cpu() and get_cpu() to grab the CPU#
      safely.  I will send a patch to Marcelo so we can have a 2.4 version
      (which doesn't do the preempt stuff), too...
      1bc32826
    • Robert Love's avatar
      [PATCH] set_cpus_allowed optimization · eab0fed3
      Robert Love authored
      This adds an optimization to set_cpus_allowed: if the task is not
      running, there is no sense in kicking the migration_threads into action,
      we just need to update task->cpu.  This was suggested by Mike Kravetz.
      
      Besides being an optimization, this would prevent any future race
      between set_cpus_allowed and the migration_threads.
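
A rough userspace sketch of the decision described above. The `toy_` names are hypothetical, and the early return when the task already sits on an allowed CPU is an assumption added for the illustration, not something the message states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical task model; not the kernel's task_struct. */
struct toy_task {
    unsigned long cpus_allowed;  /* bitmask of permitted CPUs */
    int cpu;                     /* CPU the task was last placed on */
    bool running;
};

/* Index of the lowest set bit; mask must be non-zero. */
static int toy_first_cpu(unsigned long mask)
{
    int cpu = 0;
    while (!(mask & 1UL)) {
        mask >>= 1;
        cpu++;
    }
    return cpu;
}

/* Returns true when the migration thread would have to be kicked. */
static bool toy_set_cpus_allowed(struct toy_task *t, unsigned long new_mask)
{
    assert(new_mask != 0);
    t->cpus_allowed = new_mask;

    /* Assumption: nothing to do if the current CPU is still allowed. */
    if (new_mask & (1UL << t->cpu))
        return false;

    /* The optimization: a task that is not running just gets its CPU updated. */
    if (!t->running) {
        t->cpu = toy_first_cpu(new_mask);
        return false;
    }

    return true;
}
```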
      eab0fed3
    • Robert Love's avatar
      [PATCH] documentation for the new scheduler · 49610fe2
      Robert Love authored
      This adds documentation about the O(1) scheduler to Documentation/.  The
      new scheduler is complicated and providing future scheduler hackers some
      background seems a Good Thing to me.
      
      Specifically:
      
      - add Documentation/sched-coding.txt: an overview of the functions,
        magic numbers, and variables in the scheduler as well as (most
        importantly) a review of the locking semantics.
      
      - add Documentation/sched-design.txt: an edited version of Ingo's
        initial email to lkml about his scheduler.  Goes over the design,
        implementation, and goals of the scheduler.  I tried to edit it where
        needed to bring it in line with the scheduler as it is today.
      
      - modify kernel/sched.c: update your copyright and add a change entry
        for the new scheduler.
      49610fe2
    • Robert Love's avatar
      [PATCH] trivial: no "error" on preempt_count notice · 75e50517
      Robert Love authored
      The attached trivial patch simply changes the printk debug statement in
      do_exit when preempt_count!=0 to say "note" instead of "error" and log
      at KERN_INFO in lieu of KERN_ERR.
      
      I want to keep the message around a bit, but people get too paranoid
      when things like nfsd legitimately exit with a preempt_count=1.
      75e50517
    • James Simmons's avatar
      Ported Voodoo3+ cards over to new api. · 1ecd8afb
      James Simmons authored
      1ecd8afb
    • James Simmons's avatar
      More changes for new fbdev subsystem. · a49ac33b
      James Simmons authored
      a49ac33b
    • Linus Torvalds's avatar
      350d70da
    • James Simmons's avatar
      Merge · 4118216f
      James Simmons authored
      4118216f
  2. 27 May, 2002 16 commits
    • Andrew Morton's avatar
      [PATCH] avoid sys_sync livelocks · 8d04539d
      Andrew Morton authored
      This makes sure that sys_sync() will terminate.  It counts up the
      number of dirty pages in the machine and will refuse to write out more
      than 1.25 times this number of pages.  This function is called twice
      on the sys_sync() path, so the kernel will actually write 2.5x the number
      of initially-dirty pages before giving up.
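
The arithmetic can be illustrated with a small simulation. The `toy_` names and the `redirty_per_write` parameter are hypothetical; the sketch only models a writer that keeps dirtying pages while one pass is capped at 1.25 times the initial dirty count.

```c
#include <assert.h>

/* 1.25 * nr_dirty in integer arithmetic. */
static unsigned long toy_writeback_cap(unsigned long nr_dirty)
{
    return nr_dirty + nr_dirty / 4;
}

/* One writeback pass. redirty_per_write models another process dirtying
 * pages as fast as we clean them. Returns the number of pages written. */
static unsigned long toy_sync_pass(unsigned long nr_dirty,
                                   unsigned long redirty_per_write)
{
    unsigned long cap = toy_writeback_cap(nr_dirty);
    unsigned long written = 0;

    while (nr_dirty > 0 && written < cap) {
        nr_dirty--;                      /* one page cleaned */
        written++;
        nr_dirty += redirty_per_write;   /* pages dirtied meanwhile */
    }

    assert(written <= cap);              /* the pass always terminates */
    return written;
}
```

With 1000 initially dirty pages and a process that redirties one page per write, a pass stops after 1250 pages, so two passes write at most 2500, the 2.5x figure quoted above.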
      8d04539d
    • Andrew Morton's avatar
      [PATCH] move nr_active and nr_inactive into per-CPU page · ce677ce2
      Andrew Morton authored
      It might reduce pagemap_lru_lock hold times a little, and is more
      consistent.  I think all global page accounting is now inside
      page_states[].
      ce677ce2
    • Andrew Morton's avatar
      [PATCH] factor common code in page_alloc.c · 9a0bd0e3
      Andrew Morton authored
      Factor out some similar code in page_alloc.c
      9a0bd0e3
    • Andrew Morton's avatar
      [PATCH] move BH_JBD out of buffer_head.h · 28ea30f7
      Andrew Morton authored
      For historical reasons, ext3 has a private BH state bit which has
      global scope.  This patch moves it inside ext3.
      28ea30f7
    • Andrew Morton's avatar
      [PATCH] fix ext3 __FUNCTION__ warnings · ca927221
      Andrew Morton authored
      Patch from Anton Blanchard which replaces
      
      	printk(KERN_FOO __FUNCTION__ ": msg");
      
      with
      	printk(KERN_FOO "%s: msg", __FUNCTION__);
      
      in ext3.
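
The first form depends on the compiler treating `__FUNCTION__` as a string literal that can be concatenated with its neighbours, which newer GCC versions warn about. A userspace equivalent of the replacement form, using standard `__func__` and `snprintf` in place of `printk` (the function name `toy_report` is made up for the example):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats "<function name>: msg" into buf, passing the name as a %s
 * argument rather than pasting it into the format string. */
static int toy_report(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%s: msg", __func__);
}
```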
      ca927221
    • Andrew Morton's avatar
      [PATCH] generic_file_write() cleanup · 124d8831
      Andrew Morton authored
      Fixes all the goto spaghetti in generic_file_write() and turns it into
      something which humans can understand.
      
      Andi tells me that gcc3 does a decent job of relocating blocks out of
      line anyway.  This patch gives the compiler a helping hand with
      appropriate use of likely() and unlikely().
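
For readers unfamiliar with these hints, here is a minimal userspace sketch built on GCC's `__builtin_expect`. The `toy_` macro and function names are hypothetical stand-ins, and the fallback branch simply drops the hint on other compilers.

```c
#include <assert.h>
#include <stddef.h>

/* Branch hints in the spirit of likely()/unlikely(). */
#if defined(__GNUC__) || defined(__clang__)
#define toy_likely(x)   __builtin_expect(!!(x), 1)
#define toy_unlikely(x) __builtin_expect(!!(x), 0)
#else
#define toy_likely(x)   (x)
#define toy_unlikely(x) (x)
#endif

/* Sums n ints into *out. The error check is marked unlikely so the
 * compiler can keep the common path straight-line.
 * Returns 0 on success, -1 on a NULL input with n > 0. */
static int toy_sum(const int *v, size_t n, long *out)
{
    assert(out != NULL);

    if (toy_unlikely(v == NULL && n > 0))
        return -1;

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];

    *out = total;
    return 0;
}
```

The hint changes only code layout, never the result, so it is safe to apply to error paths that are expected to be rare.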
      124d8831
    • Andrew Morton's avatar
      [PATCH] remove mem_map_t · fd6dee02
      Andrew Morton authored
      Random cleanup: remove the mem_map_t typedef.  Just use 'struct page'
      everywhere.
      fd6dee02
    • Andrew Morton's avatar
      [PATCH] dirsync · bb772c58
      Andrew Morton authored
      An implementation of directory-synchronous mounts.
      
      I sent this out some months ago and it didn't generate a lot of
      interest.  Later we had one of the usual cheery exchanges with Wietse
      Venema (postfix development) and he agreed that directory synchronous
      mounts were something that he could use, and that there was benefit in
      implementing them in Linux.  If you choose to apply this I'll push the
      2.4 patch.
      
      
      
      Patch against e2fsprogs-1.26:
              http://www.zip.com.au/~akpm/linux/dirsync/e2fsprogs-1.26.patch
      
      Patch against util-linux-2.11n:
              http://www.zip.com.au/~akpm/linux/dirsync/util-linux-2.11n.patch
      
      
      The kernel patch includes implementations for ext2 and ext3. It's
      pretty simple.
      
      - When dirsync is in operation against a directory, the following operations
        are synchronous within that directory:  create, link, unlink, symlink,
        mkdir, rmdir, mknod, rename (synchronous if either the source or dest
        directory is dirsync).
      
      - dirsync is a subset of sync.  So `mount -o sync' or `chattr +S'
        give you everything which `mount -o dirsync' or `chattr +D' gives,
        plus synchronous file writes.
      
      - ext2's inode.i_attr_flags is unused, and is removed.
      
      - mount /dev/foo /mnt/bar -o dirsync  works as expected.
      
      - An ext2 or ext3 directory tree can be set dirsync with `chattr +D -R'.
      
      - dirsync is maintained as new directories are created under
        a `chattr +D' directory.  Like `chattr +S'.
      
      - Other filesystems can trivially be taught about dirsync.  It's just
        a matter of replacing `IS_SYNC(inode)' with `IS_DIRSYNC(inode)' in
        the directory update functions.  IS_SYNC will still be honoured when
        IS_DIRSYNC is used.
      
      - Non-directory files do not have their dirsync flag propagated.  So
        an S_ISREG file which is created inside a dirsync directory will not
        have its dirsync bit set.  chattr needs to do this as well.
      
      - There was a bit of version skew between e2fsprogs' idea of the
        inode flags and the kernel's.  That is sorted out here.
      
      - `lsattr' shows the dirsync flag as "D".  The letter "D" was
        previously being used for Compressed_Dirty_File.  I changed
        Compressed_Dirty_File to use "Z".  Is that OK?
      
      The mount(2) manpage needs to be taught about MS_DIRSYNC.
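
The relationship between the two checks, and the propagation rule for new inodes, might be sketched as follows. The flag values and all `toy_` names are hypothetical and do not match the kernel headers; only the logic described in the list above is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits, for illustration only. */
#define TOY_SYNC    0x1u
#define TOY_DIRSYNC 0x2u

struct toy_inode {
    unsigned int flags;
    bool is_dir;
};

static bool toy_is_sync(const struct toy_inode *i)
{
    return (i->flags & TOY_SYNC) != 0;
}

/* dirsync is a subset of sync, so either flag makes directory
 * updates synchronous. */
static bool toy_is_dirsync(const struct toy_inode *i)
{
    return (i->flags & (TOY_SYNC | TOY_DIRSYNC)) != 0;
}

/* New directories inherit dirsync from the parent; regular files do not. */
static struct toy_inode toy_create(const struct toy_inode *parent, bool is_dir)
{
    assert(parent->is_dir);

    struct toy_inode child = { 0u, is_dir };
    if (is_dir)
        child.flags |= parent->flags & TOY_DIRSYNC;
    return child;
}
```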
      bb772c58
    • Andrew Morton's avatar
      [PATCH] rename writeback_mapping to writepages · 7d608fac
      Andrew Morton authored
      Spot the difference:
      
      aops.readpage
      aops.readpages
      aops.writepage
      aops.writeback_mapping
      
      The patch renames `writeback_mapping' to `writepages'
      7d608fac
    • Andrew Morton's avatar
      [PATCH] enable direct-to-BIO readahead for ext3 · 1dd747c0
      Andrew Morton authored
      Turn on multipage no-buffers reads for ext3.
      1dd747c0
    • Andrew Morton's avatar
      [PATCH] direct-to-BIO writeback · ab9e8941
      Andrew Morton authored
      Multipage BIO writeout from the pagecache.
      
      It's pretty much the same as multipage reads.  It falls back to buffers
      if things got complex.
      
      The write case is a little more complex because it handles pages which
      have buffers and pages which do not.  If the page didn't have buffers
      this code does not add them.
      ab9e8941
    • Andrew Morton's avatar
      [PATCH] direct-to-BIO readahead · bc67de55
      Andrew Morton authored
      Implements BIO-based multipage reads into the pagecache, and turns this
      on for ext2.
      
      CPU load for `cat large_file > /dev/null' is reduced by approximately
      15%.  Similar reductions for tiobench with a single thread.  (Earlier
      claims of 25% were exaggerated - they were measured with slab debug
      enabled.  But 15% isn't bad for a load which is dominated by copy_*_user
      costs).
      
      With 2, 4 and 8 tiobench threads, throughput is increased as well, which was
      unexpected.  It's due to request queue weirdness.  (Generally the
      request queueing is doing bad things under certain workloads - that's a
      separate issue.)
      
      BIOs of up to 64 kbytes are assembled and submitted for readahead and
      for single-page reads.  So the work involved in reading 32 pages has gone
      from:
      
      	- allocate and attach 32 buffer_heads
      	- submit 32 buffer_heads
      	- allocate 32 bios
      	- submit 32 bios
      
      to:
      
      	- allocate 2 bios
      	- submit 2 bios
      
      These pages never have buffers attached.  Buffers will be attached
      later if the application writes to these pages (file overwrite).
      
      The first version of this code (in the "delayed allocation" patches)
      tries to handle everything - bios which start mid-page, bios which end
      mid-page and pages which are covered by multiple bios.  It is very
      complex code and in fact appears to be incorrect: out-of-order BIO
      completion could cause a page to come unlocked at the wrong time.
      
      This implementation is much simpler: if things get complex, it just
      falls back to the buffer-based block_read_full_page(), which isn't
      going away, and which understands all that complexity.  There's no
      point in doing this in two places.
      
      This code will bypass the buffer layer for
      
       - fully-mapped pages which are on-disk contiguous.
      
       - fully unmapped pages (holes)
      
       - partially unmapped pages, where the unmappedness is at the end of
         the page (end-of-file).
      
      and everything else falls back to buffers.
      
      This means that with blocksize == PAGE_CACHE_SIZE, 100% of pages are
      handed direct to BIO.  With a heavy 10-minute dbench run on 4k
      PAGE_CACHE_SIZE and 1k blocks, 95% of pages were handed direct to BIO.
      Almost all of the other 5% were passed to block_read_full_page()
      because they were already partially uptodate from an earlier sub-page
      write().  This ratio will fall if PAGE_CACHE_SIZE/blocksize is greater
      than four.  But if that's the case, CPU efficiency is far from the main
      concern - there are significant seek and bandwidth problems just at 4
      blocks per page.
      
      This code will stress out the block layer somewhat - RAID0 doesn't like
      multipage BIOs, and there are probably others.  RAID0 seems to struggle
      along - readahead fails but read falls back to single-page reads, which
      succeed.  Such problems may be worked around by setting MPAGE_BIO_MAX_SIZE
      to PAGE_CACHE_SIZE in fs/mpage.c.
      
      It is trivial to enable multipage reads for many other filesystems.  We
      can do that after completion of external testing of ext2.
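
The "32 buffer_heads and 32 bios versus 2 bios" comparison above follows from simple division, assuming 4 KB pages (which the 32-page figure implies for a 64 KB limit). A small helper with a hypothetical name makes the calculation explicit:

```c
#include <assert.h>

/* Number of BIOs needed to cover nr_pages contiguous pages when a single
 * BIO may carry at most max_bio_bytes. Rounds up. */
static unsigned long toy_bios_needed(unsigned long nr_pages,
                                     unsigned long page_size,
                                     unsigned long max_bio_bytes)
{
    assert(page_size > 0 && max_bio_bytes >= page_size);

    unsigned long pages_per_bio = max_bio_bytes / page_size;
    return (nr_pages + pages_per_bio - 1) / pages_per_bio;
}
```

With a 64 KB limit, 32 pages need 2 BIOs. Lowering the limit to one page, as the RAID0 workaround above suggests, brings the count back to 32.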
      bc67de55
    • Andrew Morton's avatar
      [PATCH] relax nr_to_write requirements · 47279570
      Andrew Morton authored
      Relax the requirements on the writeback_mapping a_op.
      
      This function is passed the number of pages which it should write.  The
      current fs-writeback.c code will get confused if the address_space
      writes back more pages than it was asked to.
      
      With this change the address_space may write more pages than required
      if that is convenient.  Extent-based filesystems may wish to do this.
      47279570
    • Andrew Morton's avatar
      [PATCH] mark swapout pages PageWriteback() · 357f5a5e
      Andrew Morton authored
      Pages which are under writeout to swap are locked, and not
      PageWriteback().  So page allocators do not throttle against them in
      shrink_caches().
      
      This causes enormous list scans and general coma under really heavy
      swapout loads.
      
      One fix would be to teach shrink_cache() to wait on PG_locked for swap
      pages.  The other approach is to set both PG_locked and PG_writeback
      for swap pages so they can be handled in the same manner as file-backed
      pages in shrink_cache().
      
      This patch takes the latter approach.
      357f5a5e
    • Andrew Morton's avatar
      [PATCH] fix loop driver for large BIOs · bd052817
      Andrew Morton authored
      Fix bug in the loop driver.
      
      When presented with a multipage BIO, loop is overindexing the first
      page in the BIO rather than advancing to the second page.  It scribbles
      on the backing file and/or on kernel memory.
      
      This happens with multipage BIO-based pagecache I/O and presumably with
      O_DIRECT also.
      
      The fix is much-needed with the multipage-BIO patches - using that code
      on loop-backed filesystems has rather messy results.
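
The class of bug can be pictured with a simplified segment list. The `toy_segment` type below is a hypothetical stand-in for a BIO's per-page vector, not the loop driver's code. The correct pattern takes each segment's own page pointer; the broken pattern would keep reading from the first segment's page with an ever-growing offset and run off its end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one page-sized piece of a multipage BIO. */
struct toy_segment {
    const unsigned char *page;  /* start of this segment's page */
    size_t offset;              /* offset of the data within the page */
    size_t len;                 /* number of bytes in this segment */
};

/* Copies all segments into dst, advancing to each segment's own page.
 * Returns the number of bytes copied. */
static size_t toy_copy_segments(const struct toy_segment *segs, size_t nsegs,
                                unsigned char *dst, size_t dst_len)
{
    size_t done = 0;

    for (size_t i = 0; i < nsegs; i++) {
        assert(done + segs[i].len <= dst_len);
        memcpy(dst + done, segs[i].page + segs[i].offset, segs[i].len);
        done += segs[i].len;
    }
    return done;
}
```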
      bd052817
    • Andrew Morton's avatar
      [PATCH] ext3 set_page_dirty fix · 12feeeda
      Andrew Morton authored
      The set_page_dirty() in the ext3_writepage() failure path isn't right.
      set_page_dirty() will alter buffer states - it's a "whole page"
      dirtying.
      
      __set_page_dirty_buffers() is emitting warnings when it refuses to set
      dirty a non-uptodate buffer against a partially-mapped page.
      
      All we want to do in there is to move the page back onto
      mapping->dirty_pages, without altering the state of its buffers.
      12feeeda