1. 25 Nov, 2002 30 commits
  2. 22 Nov, 2002 10 commits
    • Linux v2.5.49 · cebce9d8
      Linus Torvalds authored
    • Merge bk://bk.arm.linux.org.uk · 1ca4ebb9
      Linus Torvalds authored
      into penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/linux
    • [ARM] Fixups for 2.5.48-bkcur · 7ad26fa6
      Russell King authored
      Fix compilation errors for do_fork() and print_symbol()
    • Merge bk://cifs.bkbits.net/linux-2.5cifs · 32ff6d01
      Linus Torvalds authored
      into penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/linux
    • [PATCH] no-buffer-head ext2 option · b1ad1f4e
      Andrew Morton authored
      Implements a new set of block address_space_operations which will never
      attach buffer_heads to file pagecache.  These can be turned on for ext2
      with the `nobh' mount option.
      
      During write-intensive testing on a 7G machine, total buffer_head
      storage remained below 0.3 megabytes.  And those buffer_heads are
      against ZONE_NORMAL pagecache and will be reclaimed by ZONE_NORMAL
      memory pressure.
      
      This work is, of course, a special case for the huge highmem machines.
      Possibly it obsoletes the buffer_heads_over_limit stuff (which doesn't
      work terribly well), but that code is simple, and will provide relief
      for other filesystems.
      
      
      It should be noted that the nobh_prepare_write() function and the
      PageMappedToDisk() infrastructure are what is needed to solve the
      problem of user data corruption when the filesystem which backs a
      sparse MAP_SHARED mapping runs out of space.  We can use this code in
      filemap_nopage() to ensure that all mapped pages have space allocated
      on-disk.  Deliver SIGBUS on ENOSPC.
      
      This will require a new address_space op, I expect.
    • [PATCH] handle zones which are full of unreclaimable pages · 36fb7f84
      Andrew Morton authored
      This patch is a general solution to the situation where a zone is full
      of pinned pages.
      
      This can come about if:
      
      a) Someone has allocated all of ZONE_DMA for IO buffers
      
      b) Some application is mlocking some memory and a zone ends up full
         of mlocked pages (can happen on a 1G ia32 system)
      
      c) All of ZONE_HIGHMEM is pinned in hugetlb pages (can happen on 1G
         machines)
      
      We'll currently burn 10% of CPU in kswapd when this happens, although
      it is quite hard to trigger.
      
      The algorithm is:
      
      - If page reclaim has scanned 2 * the total number of pages in the
        zone and there have been no pages freed in that zone then mark the
        zone as "all unreclaimable".
      
      - When a zone is "all unreclaimable" page reclaim almost ignores it.
        We will perform a "light" scan at DEF_PRIORITY (typically 1/4096th of
        the zone, or 64 pages) and then forget about the zone.
      
      - When a batch of pages is freed into the zone, clear its "all
        unreclaimable" state and start full scanning again.  The assumption
        is that some state change has come about which will make reclaim
        successful again.
      
        So if a "light scan" actually frees some pages, the zone will revert to
        normal state immediately.
      
      So we're effectively putting the zone into "low power" mode, and lightly
      polling it to see if something has changed.
      
      The code works OK, but is quite hard to test - I mainly tested it by
      pinning all highmem in hugetlb pages.
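      The state machine described above can be sketched as a small
      user-space model.  This is an illustration only, not the patch's
      code: struct zone_model, scan_target(), note_scan() and note_free()
      are hypothetical names, and DEF_PRIORITY is assumed to be 12 so that
      a scan at that priority covers 1/4096th of the zone, as the message
      states.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for per-zone reclaim state. */
struct zone_model {
    unsigned long total_pages;   /* pages in the zone */
    unsigned long pages_scanned; /* scanned since pages were last freed */
    bool all_unreclaimable;      /* the "low power" flag */
};

/* Assumed value: a scan at this priority covers total_pages >> 12. */
#define DEF_PRIORITY 12

/* How many pages reclaim would scan in this zone at the given priority. */
unsigned long scan_target(const struct zone_model *z, int priority)
{
    /* An "all unreclaimable" zone only gets the light DEF_PRIORITY scan. */
    if (z->all_unreclaimable && priority != DEF_PRIORITY)
        return 0;
    return z->total_pages >> priority;
}

/* Account for a reclaim pass that scanned pages but freed none. */
void note_scan(struct zone_model *z, unsigned long scanned)
{
    z->pages_scanned += scanned;
    if (z->pages_scanned >= 2 * z->total_pages)
        z->all_unreclaimable = true;
}

/* Pages were freed into the zone: resume full scanning. */
void note_free(struct zone_model *z)
{
    z->pages_scanned = 0;
    z->all_unreclaimable = false;
}
```

      In this model a 1G zone of 4K pages (262144 pages) gets a light
      scan of 64 pages, and any call to note_free() returns the zone to
      normal scanning.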
    • [PATCH] strengthen the `incremental min' logic in the page · fee2b68d
      Andrew Morton authored
      Strengthen the `incremental min' logic in the page allocator.
      
      Currently it is allowing the allocation to succeed if the zone has
      free_pages >= pages_high.
      
      This was to avoid a lockup corner case in which all the zones were at
      pages_high so reclaim wasn't doing anything, but the incremental min
      refused to take pages from those zones anyway.
      
      But we want the incremental min zone protection to work.  So:
      
      - Only allow the allocator to dip below the incremental min if the
        caller cannot run direct reclaim.
      
      - Change the page reclaim code so that on the direct reclaim path,
        the caller can free pages beyond ->pages_high.  So if the incremental
        min test fails, the caller will go and free some more memory.
      
        Eventually, the caller will have freed enough memory for the
        incremental min test to pass against one of the zones.
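      A minimal sketch of the zone-selection rule described above,
      written as a user-space model rather than the allocator's actual
      code.  struct alloc_zone and pick_zone() are hypothetical names, and
      accumulating each zone's pages_low into the running minimum is an
      assumption about how the "incremental min" is built up.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a zone's free count and watermarks. */
struct alloc_zone {
    unsigned long free_pages;
    unsigned long pages_low;
    unsigned long pages_high;
};

/*
 * Walk the zones in fallback order and return the index of the first
 * zone the request may be taken from, or -1 if none qualifies.
 */
int pick_zone(const struct alloc_zone *zones, size_t nr_zones,
              unsigned long request, bool can_reclaim)
{
    unsigned long min = request;

    for (size_t i = 0; i < nr_zones; i++) {
        /* Each zone further down the list must clear a higher bar. */
        min += zones[i].pages_low;

        if (zones[i].free_pages >= min)
            return (int)i;

        /* Only callers that cannot reclaim may dip below the min. */
        if (!can_reclaim && zones[i].free_pages >= zones[i].pages_high)
            return (int)i;
    }
    return -1; /* a caller that can reclaim goes and frees more memory */
}
```

      A caller that gets -1 and is able to reclaim would free pages
      beyond pages_high and retry, until one zone passes the first test.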
    • [PATCH] Remove mapping->vm_writeback · 53bf7bef
      Andrew Morton authored
      The vm_writeback address_space operation was designed to provide the VM
      with a "clustered writeout" capability.  It allowed the filesystem to
      perform more intelligent writearound decisions when the VM was trying
      to clean a particular page.
      
      I can't say I ever saw any real benefit from this.  Not much writeout
      actually happens on that path; quite a lot of work has gone into
      minimising it.
      
      The default ->vm_writeback a_op which I provided wrote back the pages
      in ->dirty_pages order.  But there is one scenario in which this causes
      problems - writing a single 4G file with mem=4G.  We end up with all of
      ZONE_NORMAL full of dirty pages, but all writeback effort is against
      highmem pages.  (Because there is about 1.5G of dirty memory total).
      
      Net effect: the machine stalls ZONE_NORMAL allocation attempts until
      the ->dirty_pages writeback advances onto ZONE_NORMAL pages.
      
      This can be fixed most sweetly with additional radix-tree
      infrastructure which will be quite complex.  Later.
      
      
      So this patch dumps it all, and goes back to using writepage
      against individual pages as they come off the LRU.
    • [PATCH] Fix busy-wait with writeback to large queues · 5fa9d488
      Andrew Morton authored
      blk_congestion_wait() is a utility function which various callers use
      to throttle themselves to the rate at which the IO system can retire
      writes.
      
      The current implementation refuses to wait if no queues are "congested"
      (>75% of requests are in flight).
      
      That doesn't work if the queue is so huge that it can hold more than
      40% (dirty_ratio) of memory.  The queue simply cannot enter congestion
      because the VM refuses to allow more than 40% of memory to be dirtied.
      (This spin could happen with a lot of normal-sized queues too.)
      
      So this patch simply changes blk_congestion_wait() to throttle even if
      there are no congested queues.  It will cause the caller to sleep until
      someone puts back a write request against any queue.  (Nobody uses
      blk_congestion_wait for read congestion).
      
      The patch adds new state to backing_dev_info->state: a couple of flags
      which indicate whether there are _any_ reads or writes in flight
      against that queue.  This was added to prevent blk_congestion_wait()
      from taking a nap when there are no writes at all in flight.
      
      But the "are there any reads" info could be used to defer background
      writeout from pdflush, to reduce read-vs-write competition.  We'll see.
      
      The large request queues have made a fundamental change:
      blocking in get_request_wait() has been the main form of VM throttling
      for years.  But with large queues it doesn't work any more - all
      throttling happens in blk_congestion_wait().
      
      Also, change io_schedule_timeout() to propagate the schedule_timeout()
      return value.  I was using that in some debug code, but it should have
      been like that from day one.
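      The change in waiting policy can be illustrated with a small model.
      This is a sketch under stated assumptions, not the block layer's
      code: struct queue_model, queue_congested(), old_should_wait() and
      new_should_wait() are hypothetical names, and "congested" is modelled
      as more than 75% of the queue's requests being in flight.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one request queue. */
struct queue_model {
    unsigned long nr_requests;      /* capacity of the queue */
    unsigned long writes_in_flight; /* write requests not yet completed */
};

/* Congested: more than 75% of the requests are in flight. */
bool queue_congested(const struct queue_model *q)
{
    return q->writes_in_flight * 4 > q->nr_requests * 3;
}

/* Old policy: only wait if some queue is congested. */
bool old_should_wait(const struct queue_model *qs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (queue_congested(&qs[i]))
            return true;
    return false; /* caller returns at once and may spin */
}

/* New policy: wait whenever any write at all is in flight. */
bool new_should_wait(const struct queue_model *qs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (qs[i].writes_in_flight > 0)
            return true;
    return false; /* nothing could wake us, so do not nap */
}
```

      With one huge queue that is only 40% full, old_should_wait()
      returns false and the caller busy-waits, while new_should_wait()
      returns true and the caller sleeps until a write request is put back.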
    • [PATCH] bootmem crash fix · 40a7fe2f
      Andrew Morton authored
      From Roman Zippel.  Don't assume that physical memory starts at
      physical address zero.