1. 12 Mar, 2004 40 commits
    • [PATCH] fix the kswapd zone scanning algorithm · ffa0fb78
      Andrew Morton authored
      This removes a vestige of the old algorithm.  We don't want to skip zones if
      all_zones_ok is true: we've already precalculated which zones need scanning
      and this just stops us from ever performing kswapd reclaim from the DMA zone.
    • [PATCH] kswapd: fix lumpy page reclaim · 519ab68b
      Andrew Morton authored
      As kswapd is now scanning zones in the highmem->normal->dma direction it can
      get into competition with the page allocator: kswapd keeps on trying to free
      pages from highmem, then kswapd moves onto lowmem.  By the time kswapd has
      done proportional scanning in lowmem, someone has come in and allocated a few
      pages from highmem.  So kswapd goes back and frees some highmem, then some
      lowmem again.  But nobody has allocated any lowmem yet.  So we keep on and on
      scanning lowmem in response to highmem page allocations.
      
      With a simple `dd' on a 1G box we get:
      
       r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy wa id
       0  3      0  59340   4628 922348    0    0     4 28188 1072   808  0 10 46 44
       0  3      0  29932   4660 951760    0    0     0 30752 1078   441  1  6 30 64
       0  3      0  57568   4556 924052    0    0     0 30748 1075   478  0  8 43 49
       0  3      0  29664   4584 952176    0    0     0 30752 1075   472  0  6 34 60
       0  3      0   5304   4620 976280    0    0     4 40484 1073   456  1  7 52 41
       0  3      0 104856   4508 877112    0    0     0 18452 1074    97  0  7 67 26
       0  3      0  70768   4540 911488    0    0     0 35876 1078   746  0  7 34 59
       1  2      0  42544   4568 939680    0    0     0 21524 1073   556  0  5 43 51
       0  3      0   5520   4608 976428    0    0     4 37924 1076   836  0  7 41 51
       0  2      0   4848   4632 976812    0    0    32 12308 1092    94  0  1 33 66
      
      Simple fix: go back to scanning the zones in the dma->normal->highmem
      direction so we meet the page allocator in the middle somewhere.
      
       r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy wa id
       1  3      0   5152   3468 976548    0    0     4 37924 1071   650  0  8 64 28
       1  2      0   4888   3496 976588    0    0     0 23576 1075   726  0  6 66 27
       0  3      0   5336   3532 976348    0    0     0 31264 1072   708  0  8 60 32
       0  3      0   6168   3560 975504    0    0     0 40992 1072   683  0  6 63 31
       0  3      0   4560   3580 976844    0    0     0 18448 1073   233  0  4 59 37
       0  3      0   5840   3624 975712    0    0     4 26660 1072   800  1  8 46 45
       0  3      0   4816   3648 976640    0    0     0 40992 1073   526  0  6 47 47
       0  3      0   5456   3672 976072    0    0     0 19984 1070   320  0  5 60 35
    • [PATCH] kswapd: avoid unnecessary reclaiming from higher zones · 9ef935c2
      Andrew Morton authored
      Currently kswapd walks across all zones in dma->normal->highmem order,
      performing proportional scanning until all zones are OK.  This means that
      pressure against ZONE_NORMAL causes unnecessary reclaim of ZONE_HIGHMEM.
      
      To fix that up we change kswapd so that it walks the zones in the
      high->normal->dma direction, skipping zones which are OK.  Once it encounters
      a zone which needs some reclaim kswapd will perform proportional scanning
      against that zone as well as all the succeeding lower zones.
      
      We scan the lower zones even if they have sufficient free pages.  This is
      because
      
      a) the lower zone may be above pages_high, but because of the incremental
         min, the lower zone may still not be eligible for allocations.  That's bad
         because cache in that lower zone will then not be scanned at the correct
         rate.
      
      b) pages in this lower zone are usable for allocations against the higher
         zone.  So we do want to scan all the relevant zones at an equal rate.
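
      A minimal sketch of the walk described above, not the kernel's
      balance_pgdat() code.  The struct, the function names and the
      `free_pages <= pages_high' test for "needs reclaim" are assumptions made
      for illustration; zones are indexed dma=0, normal=1, highmem=2.

```c
/* Hypothetical per-zone state; the real struct zone has many more fields. */
struct zone_sketch {
	unsigned long free_pages;
	unsigned long pages_high;
};

/*
 * Walk from the highest zone down and return the index of the first zone
 * that needs reclaim, or -1 if every zone is OK.
 */
int find_end_zone(const struct zone_sketch *zones, int nr_zones)
{
	for (int i = nr_zones - 1; i >= 0; i--)
		if (zones[i].free_pages <= zones[i].pages_high)
			return i;
	return -1;
}

/*
 * Mark the zone that needs reclaim and all lower zones for scanning, even
 * if the lower zones have plenty of free pages.  Returns how many zones
 * were selected.
 */
int select_zones_to_scan(const struct zone_sketch *zones, int nr_zones,
			 int *scan)
{
	int end_zone = find_end_zone(zones, nr_zones);
	int count = 0;

	for (int i = 0; i < nr_zones; i++) {
		scan[i] = (i <= end_zone);
		count += scan[i];
	}
	return count;
}
```

      With pressure on the normal zone only, this selects dma and normal for
      scanning and leaves highmem alone.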
    • [PATCH] vmscan: avoid bogus throttling · bcf2fb27
      Andrew Morton authored
      - If max_scan evaluates to zero due to a very small inactive list and high
        `priority' numbers, we don't want to throttle yet.
      
      - In balance_pgdat(), we may end up not scanning any pages because all
        zones happened to be above pages_high.  Avoid throttling in this case too.
    • [PATCH] Balance inter-zone scan rates · e5f02647
      Andrew Morton authored
      When page reclaim is working out how many pages to scan in a zone (max_scan)
      it presently rounds that number up if it looks too small - for work batching.
      
      Problem is, this can result in excessive scanning against small zones which
      have few inactive pages.  So remove it.
      
      Note that it is possible for max_scan to be zero.  That's OK - it'll become
      non-zero as the priority increases.
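
      A small sketch of why max_scan can be zero for a small zone.  It assumes
      max_scan is the inactive list size shifted right by the priority number
      (the message does not spell the formula out), and the function name is
      hypothetical.

```c
/*
 * Assumed formula: scan 1/2^priority of the inactive list per pass.
 * `priority' must be smaller than the bit width of unsigned long.
 */
unsigned long max_scan_sketch(unsigned long nr_inactive, unsigned int priority)
{
	return nr_inactive >> priority;
}
```

      Under that assumption a zone with 1000 inactive pages gets a max_scan of
      zero at priority number 12, one at 9 and 1000 at 0, so the zone starts
      being scanned once reclaim becomes urgent enough.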
    • [PATCH] vmscan: drive everything via nr_to_scan · 5954a8b0
      Andrew Morton authored
      Page reclaim is currently a bit schizo: sometimes we say "go and scan this
      many pages and tell me how many pages were freed" and at other times we say
      "go and scan this many pages, but stop if you freed this many".
      
      It makes the logic harder to control and to understand.  This patch converts
      everything into the "go and scan this many pages and tell me how many pages
      were freed" model.
      
      It doesn't seem to affect performance much either way.
    • [PATCH] vmscan: zone balancing fix · b532f4af
      Andrew Morton authored
      We currently have a problem with the balancing of reclaim between zones: much
      more reclaim happens against highmem than against lowmem.
      
      This patch partially fixes this by changing the direct reclaim path so it
      does not bail out of the zone walk after having reclaimed sufficient pages
      from highmem: go on to reclaim from lowmem regardless of how many pages we
      reclaimed from highmem.
    • [PATCH] vm: scan slab in response to highmem scanning · 768c4fcc
      Andrew Morton authored
      The patch which went in six months or so back which said "only reclaim slab
      if we're scanning lowmem pagecache" was wrong.  I must have been asleep at
      the time.
      
      We do need to scan slab in response to highmem page reclaim as well.  Because
      all the math is based around the total amount of memory in the machine, and
      we know that if we're performing highmem page reclaim then the lower zones
      have no free memory.
    • [PATCH] vmscan: fix calculation of number of pages scanned · a5cc10d5
      Andrew Morton authored
      From: Nick Piggin <piggin@cyberone.com.au>
      
      The logic which calculates the number of pages which were scanned is mucked
      up.  Fix.
    • [PATCH] vm: shrink slab evenly in try_to_free_pages() · b488ea81
      Andrew Morton authored
      From: Nick Piggin <piggin@cyberone.com.au>
      
      In try_to_free_pages(), put even pressure on the slab even if we have
      reclaimed enough pages from the LRU.
    • [PATCH] shrink_slab: math precision fix · dee96113
      Andrew Morton authored
      From: Nick Piggin <piggin@cyberone.com.au>
      
      In shrink_slab(), do the multiply before the divide to avoid losing
      precision.
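
      To illustrate the effect with integer arithmetic, here is a sketch using
      hypothetical helper names and operands; it is not the shrink_slab() code
      itself.  Both helpers compute entries * scanned / lru_pages and differ
      only in the order of operations.

```c
#include <assert.h>

/* Divide first: the quotient truncates to 0 whenever scanned < lru_pages. */
unsigned long long slab_scan_divide_first(unsigned long long scanned,
					   unsigned long long lru_pages,
					   unsigned long long entries)
{
	assert(lru_pages > 0);
	return (scanned / lru_pages) * entries;
}

/* Multiply first: truncation happens once, after the product is formed. */
unsigned long long slab_scan_multiply_first(unsigned long long scanned,
					     unsigned long long lru_pages,
					     unsigned long long entries)
{
	assert(lru_pages > 0);
	return (scanned * entries) / lru_pages;
}
```

      With scanned=100, lru_pages=1000 and entries=500 the divide-first form
      yields 0 while the multiply-first form yields 50.  The product needs a
      type wide enough not to overflow, which is why the sketch uses 64-bit
      operands.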
    • [PATCH] vmscan: preserve page referenced info in refill_inactive() · 29d8c59c
      Andrew Morton authored
      From: Nick Piggin <piggin@cyberone.com.au>
      
      If refill_inactive_zone() is running in its dont-reclaim-mapped-memory mode
      we are tossing away the referenced information on active mapped pages.
      
      So put that info back if we're not going to deactivate the page.
    • [PATCH] kswapd throttling fixes · b6c1702e
      Andrew Morton authored
      The logic in balance_pgdat() is all bollixed up.
      
      - the incoming arg `nr_pages' should be used to determine if we're being
        asked to free a specific number of pages, not `to_free'.
      
      - local variable `to_free' is not appropriate for the determination of
        whether we failed to bring all zones to appropriate free pages levels.
      
        Fix this by correctly calculating `all_zones_ok' and then use
        all_zones_ok to determine whether we need to throttle kswapd.
      
      So the logic now is:
      
      
      	for (increasing priority) {
      
      		all_zones_ok = 1;
      
      		for (all zones) {
      			to_reclaim = number of pages to try to reclaim
      				     from this zone;
      			max_scan = number of pages to scan in this pass
      				   (gets larger as `priority' decreases)
      			/*
      			 * set `reclaimed' to the number of pages which were
      			 * actually freed up
      			 */
      			reclaimed = scan(max_scan pages);
      			reclaimed += shrink_slab();
      
      			to_free -= reclaimed;	/* for the `nr_pages>0' case */
      
      			/*
      			 * If this scan failed to reclaim `to_reclaim' or more
      			 * pages, we're getting into trouble.  Need to scan
      			 * some more, and throttle kswapd.   Note that this
      			 * zone may now have sufficient free pages due to
      			 * freeing activity by some other process.   That's
      			 * OK - we'll pick that info up on the next pass
      			 * through the loop.
      			 */
      			if (reclaimed < to_reclaim)
      				all_zones_ok = 0;
      		}
      		if (to_free > 0)
      			continue;	/* swsusp: need to do more work */
      		if (all_zones_ok)
      			break;		/* kswapd is done */
      		/*
      		 * OK, kswapd is getting into trouble.  Take a nap, then take
      		 * another pass across the zones.
      		 */
      		blk_congestion_wait();
      	}
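
      The loop above can be exercised in user space with a toy model.  This is
      only a sketch: the struct, the starting priority of 12, the
      `inactive >> priority' scan size and the "every scanned page is freed"
      reclaim model are all assumptions, shrink_slab() is left out, and a nap
      counter stands in for blk_congestion_wait().

```c
#define DEF_PRIORITY_SKETCH 12	/* assumed starting priority number */

/* Hypothetical zone state for the simulation. */
struct zone_sim {
	unsigned long inactive;		/* pages on the inactive list */
	unsigned long to_reclaim;	/* pages we want freed per pass */
};

/* Toy reclaim model: every scanned page is freed. */
static unsigned long scan_sim(struct zone_sim *z, unsigned long max_scan)
{
	unsigned long freed = max_scan < z->inactive ? max_scan : z->inactive;

	z->inactive -= freed;
	return freed;
}

/* Returns how many times the loop would have napped. */
int balance_sim(struct zone_sim *zones, int nr_zones, long to_free)
{
	int naps = 0;

	for (int priority = DEF_PRIORITY_SKETCH; priority >= 0; priority--) {
		int all_zones_ok = 1;

		for (int i = 0; i < nr_zones; i++) {
			unsigned long max_scan = zones[i].inactive >> priority;
			unsigned long reclaimed = scan_sim(&zones[i], max_scan);

			to_free -= (long)reclaimed;	/* nr_pages > 0 case */
			if (reclaimed < zones[i].to_reclaim)
				all_zones_ok = 0;
		}
		if (to_free > 0)
			continue;	/* asked for more pages: keep going */
		if (all_zones_ok)
			break;		/* done */
		naps++;			/* stands in for blk_congestion_wait() */
	}
	return naps;
}
```

      In this model a single zone with 4096 inactive pages and a per-pass
      target of 32 naps after each of the first six passes, because the scan
      size only reaches the target once the priority number has dropped to 6.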
    • [PATCH] mm/vmscan.c: remove unused priority argument. · 13095f7a
      Andrew Morton authored
      From: Nikita Danilov <Nikita@Namesys.COM>
      
      Now that the decision to reclaim mapped memory is taken on the basis of
      zone->prev_priority, the priority argument is no longer needed.
    • [PATCH] Narrow blk_congestion_wait races · c05d7ab9
      Andrew Morton authored
      From: Nick Piggin <piggin@cyberone.com.au>
      
      The addition of the smp_mb and the other change is to try to close the
      window for races a bit.  Obviously they can still happen, it's a racy
      interface and it doesn't matter much.
    • [PATCH] return remaining jiffies from blk_congestion_wait() · f3179458
      Andrew Morton authored
      Teach blk_congestion_wait() to return the number of jiffies remaining.  This
      is for debug, but it is also nicely consistent.
    • [PATCH] vm: per-zone vmscan instrumentation · 760d95b5
      Andrew Morton authored
      To check on zone balancing, split the /proc/vmstat:pgsteal, pgreclaim, pgalloc
      and pgscan stats into per-zone counters.
      
      Additionally, split the pgscan stats into pgscan_direct and pgscan_kswapd to
      see who's doing how much scanning.
      
      And add a metric for the number of slab objects which were scanned.
    • [PATCH] synclink.c update · bae30a3f
      Andrew Morton authored
      From: Paul Fulghum <paulkf@microgate.com>
      
      * track driver API changes
      * remove cast (kernel janitor)
    • [PATCH] synclink_cs.c update · 208516ea
      Andrew Morton authored
      From: Paul Fulghum <paulkf@microgate.com>
      
      * Track driver API changes
      * Remove cast (kernel janitor)
    • [PATCH] synclinkmp.c update · abc5e2bb
      Andrew Morton authored
      From: Paul Fulghum <paulkf@microgate.com>
      
      Patch for synclinkmp.c
      
      * Track driver API changes
      * Remove cast (kernel janitor)
      * Replace page_free call with kfree (to match kmalloc allocation)
    • [PATCH] Add barriers to avoid race in mempool_alloc/free · 66d1bbed
      Andrew Morton authored
      From: Chris Mason <mason@suse.com>
      
      mempool_alloc() and mempool_free() check pool->curr_nr without any locks
      held.  This can lead to skipping a wakeup when there are people waiting,
      and sleeping when there are free elements in the pool.
      
      I can't trigger this reliably, but sooner or later someone on ppc is
      probably going to hit it.
    • [PATCH] m68k: interrupt management cleanups · e798a41d
      Andrew Morton authored
      From: Geert Uytterhoeven <geert@linux-m68k.org>
      
      M68k interrupt management: rename routines to not confuse them with
      syscalls
      
      - sys_{request,free}_irq() -> cpu_{request,free}_irq()
      
      - q40_sys_default_handler[] -> q40_default_handler
      
      - sys_default_handler() -> default_handler()
    • [PATCH] m68k: Macintosh IDE fixes · 4edbed7b
      Andrew Morton authored
      From: Geert Uytterhoeven <geert@linux-m68k.org>
      
      Mac IDE: Make sure the core IDE driver doesn't try to request the MMIO
      ports a second time, since this will fail.
    • [PATCH] Apollo fb sysfsification · d8059782
      Andrew Morton authored
      From: Geert Uytterhoeven <geert@linux-m68k.org>
      
      Apollo fb: Add sysfs support (from James Simmons)
    • [PATCH] m68k: Amiga Framemaster II fb sysfsification · 77375e79
      Andrew Morton authored
      From: Geert Uytterhoeven <geert@linux-m68k.org>
      
      Amiga Framemaster II fb: Add sysfs support (from James Simmons)
    • [PATCH] m68k: __test_and_set_bit() · c138cf43
      Andrew Morton authored
      From: Geert Uytterhoeven <geert@linux-m68k.org>
      
      Add missing implementation for non-atomic __test_and_set_bit()
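
      For readers unfamiliar with the double-underscore variant, here is a
      generic, portable sketch of a non-atomic test-and-set on a bitmap.  It is
      not the m68k implementation from this patch; the function and macro names
      are made up for the example.

```c
#include <limits.h>

#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)

/*
 * Set bit `nr' in the bitmap at `addr' and return its previous value.
 * The read and the write are separate steps, so the caller must provide
 * its own locking if other contexts can touch the same word.
 */
int test_and_set_bit_nonatomic(unsigned int nr, unsigned long *addr)
{
	unsigned long mask = 1UL << (nr % BITS_PER_LONG_SKETCH);
	unsigned long *word = addr + nr / BITS_PER_LONG_SKETCH;
	int old = (*word & mask) != 0;

	*word |= mask;
	return old;
}
```

      The first call on a clear bit returns 0 and sets it; a second call on the
      same bit returns 1 and leaves the bitmap unchanged.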
    • [PATCH] fbdev: monitor detection fixes · c42f7c1f
      Andrew Morton authored
      From: James Simmons <jsimmons@infradead.org>,
            Kronos <kronos@kronoz.cjb.net>
      
      Various fixes and enhancements to the monitor hardware detection code.  The
      only driver that uses it is the radeon driver.
      
      The old EDID parsing code was very verbose; half of the patch addresses
      this (i.e. it prints lots of stuff iff DEBUG).  The other big change is
      the FB_MODE_IS_* stuff: we really need a way to know the origin of a video
      mode.  In this way we can select a video mode that comes from EDID instead
      of VESA or GTF.
      
      Drivers other than radeonfb won't be affected because they cannot (yet) get
      EDID from the monitor and don't use EDID related code.
    • [PATCH] Fix NULL pointer dereference in blkmtd.c · 787bc776
      Andrew Morton authored
      From: Michel Marti <michel.marti@objectxp.com>
      
      The blkmtd driver oopses in add_device().  The following trivial patch
      fixes this.
    • [PATCH] fix raid0 readahead size · 132a4161
      Andrew Morton authored
      From: Arjan van de Ven <arjanv@redhat.com>
      
      Readahead of raid0 was suboptimal; it read only 1 stride ahead.  The
      problem with this is that while it will keep all spindles busy, it will not
      actually manage to make larger IO's, eg each disk would just do the chunk
      size IO.  Doing at least 2 chunks is more than appropriate so that each
      spindle will get a chance to merge IO's.
      
      (Neil fixed raid5 and raid6 too)
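
      One way to read "at least 2 chunks" is as a read-ahead floor of two full
      stripes, so that every disk sees two chunks it can merge.  The sketch
      below follows that reading; the function name, the parameters and the
      interpretation itself are assumptions, not the patch's code.

```c
/*
 * Raise a read-ahead window (in pages) to at least two full stripes.
 * A stripe here is one chunk on each of the array's disks.
 */
unsigned long raid0_min_readahead_pages(unsigned long ra_pages,
					unsigned int raid_disks,
					unsigned long chunk_bytes,
					unsigned long page_bytes)
{
	unsigned long stripe_pages =
		(unsigned long)raid_disks * chunk_bytes / page_bytes;

	if (ra_pages < 2 * stripe_pages)
		ra_pages = 2 * stripe_pages;
	return ra_pages;
}
```

      For four disks with 64 KiB chunks and 4 KiB pages a stripe is 64 pages,
      so a 32-page window is raised to 128 pages while a 256-page window is
      left as it is.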
    • [PATCH] module.h __attribute_used__ fix · ef3555ba
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
      
      Someone added __attribute_used__ throughout module.h, but didn't remove the
      ", unused".  Looks like some arch/gcc combos still consider it unused, and
      discard the fn.
    • [PATCH] Fix CONFIG_NVRAM dependencies · b814dfb7
      Andrew Morton authored
      From: Geert Uytterhoeven <geert@linux-m68k.org>
      
      Make CONFIG_NVRAM depend on the prerequisites that are explicitly checked
      for in drivers/char/nvram.c, or on CONFIG_GENERIC_NVRAM (for PPC).
    • [PATCH] Applicom warning · 36f606d2
      Andrew Morton authored
      From: Geert Uytterhoeven <geert@linux-m68k.org>
      
      Add missing include (needed for struct inode)
    • [PATCH] Disable Macintosh device drivers for all but PPC || MAC · 72984282
      Andrew Morton authored
      From: Marc-Christian Petersen <m.c.p@wolk-project.de>
      
      The attached patch is needed to stop showing us "Macintosh device drivers"
      for all architectures via menuconfig || xconfig || gconfig.  It's only
      necessary for PPC and/or MAC.
      
      ACKed by benh.
    • [PATCH] add nowarn to a few pte chain allocators · a4fc9e26
      Andrew Morton authored
      From: Arjan van de Ven <arjanv@redhat.com>
      
      Several of the pte_chain_alloc() allocators that use GFP_ATOMIC have a
      fallback for failure that sleeps; they thus need to not warn on failure.
      Seen during a big fork on a busy system.
    • [PATCH] cciss: init section fix · 6edcc434
      Andrew Morton authored
      From: "Randy.Dunlap" <rddunlap@osdl.org>
      
      cciss_scsi_detect() can be called after init (for TAPE support).
    • [PATCH] EDD: Get Legacy Parameters · 66b61a5c
      Andrew Morton authored
      From: Matt Domsch <Matt_Domsch@dell.com>
      
      Patch below from Patrick J. LoPresti and myself.  Patrick describes:
      
      Why this patch?  The problem is that the legacy BIOS interface
      (INT13/AH=08) for querying the disk geometry returns different values
      than the extended INT13 interface which the EDD code currently uses.  This
      is because the legacy interface only provides a 10-bit cylinder field, so
      modern BIOSes "lie" about the head/sector counts in order to make more of
      the disk visible within the first 1024 cylinders.
      
      Many non-Linux applications, including the stock Windows boot loader, DOS
      fdisk, etc., rely upon the legacy interface and geometry.  So it is useful
      to be able to obtain the legacy values from a running Linux kernel.
      
      What this patch does is to add new entries under
      /sys/firmware/edd/int13_devXX named "legacy_cylinders", "legacy_heads", and
      "legacy_sectors".  These provide the geometry given by the legacy
      INT13/AH=08 BIOS interface, just like the current "default_cylinders"
      etc.  provide the geometry given by the INT13/AH=48 interface.
      
      Without this patch, I cannot use Linux to partition a drive and install
      Windows, which happens to be my application.
      
       - Pat
         http://unattended.sourceforge.net/
      
      In addition, this adds two workarounds for buggy BIOSes in the EDD int13
      calls as suggested by Ralf Brown's interrupt list.
      
      I'm also interested in moving this code out of arch/i386/kernel/edd.c and
      include/asm-i386/edd.h, as I believe it is applicable on x86-64 as well.
      However, there's no good place under drivers/ to put edd.c when it's not
      tied to a bus, but to several CPU architectures and their firmwares...
      Maybe a new directory drivers/firmware?
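
      As background on the 10-bit cylinder field mentioned above, here is a
      sketch that decodes the registers returned by the legacy INT13/AH=08
      call, following the register layout in Ralf Brown's interrupt list.  The
      struct and function names are hypothetical, and whether the new sysfs
      entries report counts or maximum numbers is not stated in this message;
      the sketch returns counts.

```c
#include <stdint.h>

/* Hypothetical container for the decoded legacy geometry. */
struct legacy_geometry {
	unsigned int cylinders;
	unsigned int heads;
	unsigned int sectors;
};

/*
 * CH holds the low 8 bits of the maximum cylinder number, CL bits 7-6 hold
 * its high 2 bits, CL bits 5-0 hold the maximum sector number (1-based),
 * and DH holds the maximum head number.
 */
struct legacy_geometry decode_int13_ah08(uint8_t ch, uint8_t cl, uint8_t dh)
{
	struct legacy_geometry g;
	unsigned int max_cylinder = ((unsigned int)(cl & 0xC0) << 2) | ch;

	g.cylinders = max_cylinder + 1;	/* cylinders are numbered from 0 */
	g.heads = (unsigned int)dh + 1;	/* heads are numbered from 0 */
	g.sectors = cl & 0x3F;		/* sectors are numbered from 1 */
	return g;
}
```

      With CH=0xFF, CL=0xFF and DH=0xFE this gives 1024 cylinders, 255 heads
      and 63 sectors, the largest cylinder count the 10-bit field can express.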
    • [PATCH] wavfront.c needs syscalls.h · e0ed9d75
      Andrew Morton authored
      sound/oss/wavfront.c: In function `wavefront_download_firmware':
      sound/oss/wavfront.c:2524: warning: implicit declaration of function `sys_open'
      sound/oss/wavfront.c:2533: warning: implicit declaration of function `sys_read'
      sound/oss/wavfront.c:2582: warning: implicit declaration of function `sys_close'
    • [PATCH] Fix reading the last block on a bdev · f8ccec6c
      Andrew Morton authored
      From: Chris Mason <mason@suse.com>
      
      This patch fixes a problem we're hitting on ia64 with page sizes > 4k.
      
      When the page size is greater than the block size, and parts of the page
      fall past the end of the device, readpage will fail because
      blkdev_get_block returns -EIO for blocks past i_size.
      
      The attached patch changes blkdev_get_block to return holes when reading
      past the end of the device, which allows us to read that last valid 4k
      block and then fill the rest of the page with zeros.  Writes will still
      fail with -EIO.
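
      The read-side behaviour can be pictured with a user-space sketch that
      fills one page from a byte array standing in for the device.  The
      function name and parameters are hypothetical, and the sketch assumes the
      page size and the device size are both multiples of the block size.

```c
#include <stddef.h>
#include <string.h>

/*
 * Fill `page' from `dev'.  Blocks that lie inside the device are copied;
 * blocks past the end are treated as holes and zero-filled instead of
 * failing the whole read.  Returns the number of valid bytes copied.
 */
size_t read_page_with_holes(const unsigned char *dev, size_t dev_size,
			    size_t page_index, size_t page_size,
			    size_t block_size, unsigned char *page)
{
	size_t page_start = page_index * page_size;
	size_t valid = 0;

	for (size_t off = 0; off < page_size; off += block_size) {
		size_t pos = page_start + off;

		if (pos + block_size <= dev_size) {
			memcpy(page + off, dev + pos, block_size);
			valid += block_size;
		} else {
			memset(page + off, 0, block_size);	/* hole */
		}
	}
	return valid;
}
```

      For a 12-byte device with 4-byte blocks and a 16-byte page, the first
      three blocks are copied and the final block of the page comes back as
      zeros.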
    • [PATCH] Fix rootfs on ramdisk · 6def6a58
      Andrew Morton authored
      From: vda <vda@port.imtp.ilyichevsk.odessa.ua>
      
      Add a missing test for the "root=/dev/ram" kernel boot option.  It's just an
      alias for /dev/ram0, but it worked in 2.4...
    • [PATCH] current_is_keventd() speedup · 6551f0aa
      Andrew Morton authored
      From: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      
      current_is_keventd() doesn't need to search across all the CPUs to identify
      itself.