1. 21 Dec, 2002 16 commits
    • [PATCH] Give kswapd writeback higher priority than pdflush · e386771c
      Andrew Morton authored
      The `low latency page reclaim' design works by preventing page
      allocators from blocking on request queues (and by preventing them from
      blocking against writeback of individual pages, but that is immaterial
      here).
      
      This has a problem in some situations.  pdflush (or a write(2)
      caller) could be saturating the queue with highmem pages.  This
      prevents anyone from writing back ZONE_NORMAL pages.  We end up doing
      enormous amounts of scanning.
      
      A test case is to mmap(MAP_SHARED) almost all of a 4G machine's memory,
      then kill the mmapping applications.  The machine instantly goes from
      0% of memory dirty to 95% or more.  pdflush kicks in and starts writing
      the least-recently-dirtied pages, which are all highmem.  The queue is
      congested so nobody will write back ZONE_NORMAL pages.  kswapd chews
      50% of the CPU scanning past dirty ZONE_NORMAL pages and page reclaim
      efficiency (pages_reclaimed/pages_scanned) falls to 2%.
      
      So this patch changes the policy for kswapd.  kswapd may use all of a
      request queue, and is prepared to block on request queues.
      
      What will now happen in the above scenario is:
      
      1: The page allocator scans some pages, fails to reclaim enough
         memory and takes a nap in blk_congestion_wait().
      
      2: kswapd() will scan the ZONE_NORMAL LRU and will start writing
         back pages.  (These pages will be rotated to the tail of the
         inactive list at IO-completion interrupt time).
      
         This writeback will saturate the queue with ZONE_NORMAL pages.
         Conveniently, pdflush will avoid the congested queues.  So we end up
         writing the correct pages.
      
      In this test, kswapd CPU utilisation falls from 50% to 2%, page reclaim
      efficiency rises from 2% to 40% and things are generally a lot happier.
      
      
      The downside is that kswapd may now do a lot less page reclaim,
      increasing page allocation latency, causing more direct reclaim,
      increasing lock contention in the VM, etc.  But I have not been able to
      demonstrate that in testing.
      
      
      The other problem is that there is only one kswapd, and there are lots
      of disks.  That is a generic problem - without being able to co-opt
      user processes we don't have enough threads to keep lots of disks saturated.
      
      One fix for this would be to add an additional "really congested"
      threshold in the request queues, so kswapd can still perform
      nonblocking writeout.  This gives kswapd priority over pdflush while
      allowing kswapd to feed many disk queues.  I doubt if this will be
      called for.
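
      The policy described above can be sketched as a small userspace
      model.  This is an illustration only, not the patch's code: the
      types and the function model_may_write() are hypothetical names,
      and the model reduces a request queue to a single "congested" bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; none of these names come from the patch. */
enum writer { WRITER_KSWAPD, WRITER_PDFLUSH, WRITER_DIRECT_RECLAIM };

struct queue_model {
    bool congested;   /* true once the request queue is saturated */
};

/*
 * Returns true if the writer should submit writeback to the queue.
 * kswapd ignores congestion and is prepared to block on the queue;
 * pdflush and direct reclaimers stay away from congested queues.
 */
static bool model_may_write(enum writer who, const struct queue_model *q)
{
    if (who == WRITER_KSWAPD)
        return true;
    return !q->congested;
}
```

      In this model only kswapd keeps writing once the queue is
      congested, which is how its ZONE_NORMAL pages get ahead of
      pdflush's highmem pages.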
    • [PATCH] Remove PF_NOWARN · 833cb2a6
      Andrew Morton authored
      We keep getting in a mess with the current->flags setting and
      unsetting.
      
      Remove current->flags:PF_NOWARN and create __GFP_NOWARN instead.
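
      A rough userspace sketch of why a per-call flag is easier to get
      right than a per-task one.  MODEL_GFP_NOWARN, model_alloc() and
      model_warnings are hypothetical names with made-up values; they
      are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bit value for illustration; not the kernel's definition. */
#define MODEL_GFP_NOWARN 0x1u

static int model_warnings;   /* counts emitted allocation-failure warnings */

/*
 * Model allocator.  'fail' forces a failure so the warning path can be
 * exercised.  The caller opts out of the warning per call via gfp_mask,
 * so there is no per-task state to set and later restore.
 */
static void *model_alloc(size_t size, unsigned int gfp_mask, bool fail)
{
    void *p = fail ? NULL : malloc(size);

    if (p == NULL && !(gfp_mask & MODEL_GFP_NOWARN))
        model_warnings++;
    return p;
}
```

      Because the suppression travels with the allocation request, a
      caller cannot forget to clear it afterwards.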
    • [PATCH] misc fixes · 72c36b7d
      Andrew Morton authored
      - A C99 initialiser in drivers/char/mem.c
      
      - Remove unneeded deref in madvise_willneed()
    • [PATCH] Add generic_file_readonly_mmap() for nommu · 503c99ef
      Andrew Morton authored
      Add a generic_file_readonly_mmap() for !CONFIG_MMU.
    • [PATCH] more informative slab poisoning · 4f781c84
      Andrew Morton authored
      slab poisons objects with 0x5a both when they are constructed and when
      they are freed.  So it is not possible to tell whether a deref of
      0x5a5a5a5a was a use-before-initialisation bug or a use-after-free bug.
      
      The patch changes it so that
      
      1) A deref of 0x5a5a5a5a means use-of-uninitialised-memory
      
      2) A deref of 0x6b6b6b6b means use-of-freed-memory.
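
      A minimal sketch of the two-pattern scheme, written as ordinary
      userspace C rather than slab code.  The byte values 0x5a and 0x6b
      come from the text above; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POISON_UNINIT 0x5a  /* from the text: use of uninitialised memory */
#define POISON_FREED  0x6b  /* from the text: use of freed memory */

/* Hypothetical helpers: fill an object with the appropriate pattern. */
static void model_poison_on_alloc(void *obj, size_t size)
{
    memset(obj, POISON_UNINIT, size);
}

static void model_poison_on_free(void *obj, size_t size)
{
    memset(obj, POISON_FREED, size);
}

/* Map a faulting 32-bit value back to the kind of bug it suggests. */
static const char *model_classify(uint32_t value)
{
    if (value == 0x5a5a5a5au)
        return "use-of-uninitialised-memory";
    if (value == 0x6b6b6b6bu)
        return "use-of-freed-memory";
    return "unknown";
}
```

      With distinct patterns, the value seen in an oops is enough to
      tell the two bug classes apart.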
    • [PATCH] fix use-after-free bug in move_vma() · 5446f21e
      Andrew Morton authored
      move_vma() calls do_munmap() and then uses the memory at *new_vma.
      
      But when starting X11 it just happens that the memory which do_munmap()
      unmapped had the same start address and range as *new_vma.  So new_vma
      is freed by do_munmap().
      
      This was never noticed before because (vm_flags & VM_LOCKED) evaluates
      false when vm_flags is 0x5a5a5a5a.  But I just changed that to 0x6b6b6b6b
      and boom - we call make_pages_present() with start == end == 0x6b6b6b6b and
      it goes BUG.
      
      So I think the right fix here is for move_vma() to not inspect the values
      of any vma's after it has called do_munmap().
      
      The patch does that, for `new_vma'.
      
      The local variable `vma' is also being used after the call to do_munmap(),
      and this may also be a bug.  Proving that this is not so, and adding a
      comment to explain why, are hereby added to Hugh's todo list ;)
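
      The general shape of the fix, sketched in userspace C: copy what
      you need out of the object before the call that may free it.
      This is not the patch's code.  struct model_vma, model_unmap() and
      model_move() are hypothetical, and MODEL_VM_LOCKED is assumed here
      to be the 0x00002000 bit, which is consistent with the text: that
      bit is clear in 0x5a5a5a5a and set in 0x6b6b6b6b.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_VM_LOCKED 0x00002000ul  /* assumed bit value, see lead-in */

struct model_vma {
    unsigned long vm_start;
    unsigned long vm_end;
    unsigned long vm_flags;
};

/* Stand-in for an unmap that frees the vma: poison it, then free it. */
static void model_unmap(struct model_vma *vma)
{
    memset(vma, 0x6b, sizeof(*vma));
    free(vma);
}

/* Returns true if the (now freed) vma was locked. */
static bool model_move(struct model_vma *new_vma)
{
    /* Snapshot before the unmap; new_vma must not be read afterwards. */
    unsigned long flags = new_vma->vm_flags;

    model_unmap(new_vma);
    return (flags & MODEL_VM_LOCKED) != 0;
}
```

      The same two poison values show why the bug stayed hidden: a
      locked-bit test on 0x5a5a5a5a is false, on 0x6b6b6b6b it is true.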
    • [PATCH] fix a page dirtying race in vmscan.c · 985babe8
      Andrew Morton authored
      There's a small window in which another CPU could dirty the page after
      we've cleaned it, and before we've moved it to mapping->dirty_pages().
      The end result is a dirty page on mapping->locked_pages, which is
      wrong.
      
      So take mapping->page_lock before clearing the dirty bit.
    • [PATCH] sync_fs deadlock fix · e101875d
      Andrew Morton authored
      Running a `mount -o remount' against ext3 deadlocks if there is heavy
      write activity.  It's a sort of AB/BA deadlock caused by calling
      log_wait_commit() under lock_super().  The caller holds lock_super()
      and is waiting for a commit, but the commit cannot complete because
      lock_super() is also used in the block allocator.
      
      The way we fixed this in the past is to drop the superblock lock inside
      ext3.  The way this patch fixes it is to arrange for lock_super() to
      not be held around the ->sync_fs() call.
      
      Also: sync_filesystems is on the sys_sync() path and is racy wrt
      unmount.  Check sb->s_root after taking sb->s_umount.
    • Sysenter cleanups (originals by Brian Gerst, updated and expanded by me): · d8ce4c5f
      Linus Torvalds authored
       - set up kernel stack pointer for sysenter at each context switch.
       - disable sysenter while in vm86 mode.
       - clean up mtrr number defines and SEP feature testing
    • [PATCH] PCI: setup-xx fixes · 2ce208e5
      Ivan Kokshaysky authored
      Don't disable PCI devices before changing the BARs, as discussed
      recently.  Disabling PCI_COMMAND_MASTER bit is an obvious bug.
      
      Further, pdev_enable_device() is a leftover from very old (2.0, I guess)
      alpha PCI code.  It's used in pci_assign_unassigned_resources() to
      enable *every* PCI device in the system.  So, if we have two graphics
      cards on the same bus, both with legacy VGA IO...  oops.
      
      Actually, only alpha relied on that due to the lack of
      pcibios_enable_device (which has already been fixed).
    • [PATCH] new attempt at sys_poll allocation (was: Re: Poll patches..) · 9dd405aa
      Manfred Spraul authored
      This replaces the dynamically allocated two-level array in sys_poll with
      a dynamically allocated linked list.  The current implementation causes
      at least two alloc/free calls, even if only one or two descriptors are
      polled.  This reduces that to one alloc/free, and the .text segment is
      around 220 bytes shorter.  The microbenchmark that polls one pipe fd is
      around 30% faster.  [1140 cycles instead of 1604 cycles, Celeron mobile
      1.13 GHz]
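
      One way to get a single alloc/free for small descriptor counts is
      a linked list of chunks, each chunk carrying its entries in the
      same allocation as its header.  The sketch below is a userspace
      illustration under that assumption, not the sys_poll code: every
      name in it is hypothetical, as is the chunk size of 64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-descriptor entry. */
struct model_entry {
    int fd;
    short events;
    short revents;
};

/* Header and entries share one allocation via a flexible array member. */
struct model_chunk {
    struct model_chunk *next;
    size_t len;
    struct model_entry entries[];
};

#define MODEL_CHUNK_MAX 64  /* hypothetical cap on entries per chunk */

static void model_free(struct model_chunk *head)
{
    while (head != NULL) {
        struct model_chunk *next = head->next;
        free(head);
        head = next;
    }
}

/* Builds the list; *allocs reports how many allocations were made. */
static struct model_chunk *model_build(const int *fds, size_t nfds,
                                       size_t *allocs)
{
    struct model_chunk *head = NULL, **tail = &head;
    size_t done = 0;

    *allocs = 0;
    while (done < nfds) {
        size_t n = nfds - done;
        if (n > MODEL_CHUNK_MAX)
            n = MODEL_CHUNK_MAX;

        struct model_chunk *c = malloc(sizeof(*c) + n * sizeof(c->entries[0]));
        if (c == NULL) {
            model_free(head);
            return NULL;
        }
        (*allocs)++;
        c->next = NULL;
        c->len = n;
        for (size_t i = 0; i < n; i++) {
            c->entries[i].fd = fds[done + i];
            c->entries[i].events = 0;
            c->entries[i].revents = 0;
        }
        *tail = c;          /* append to the end of the list */
        tail = &c->next;
        done += n;
    }
    return head;
}
```

      Polling one or two descriptors then costs one allocation and one
      free; larger sets add one allocation per additional chunk.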
    • Merge bk://linux-dj.bkbits.net/agpgart · 564dede9
      Linus Torvalds authored
      into home.transmeta.com:/home/torvalds/v2.5/linux
    • Merge tetrachloride.(none):/mnt/stuff/kernel/2.5/bk-linus · 01d8392d
      Dave Jones authored
      into tetrachloride.(none):/mnt/stuff/kernel/2.5/agpgart
    • [AGP] Make things compile again if AGP3=n · add4c230
      Dave Jones authored
    • Merge http://lia64.bkbits.net/to-linus-2.5 · c6bb6a89
      Linus Torvalds authored
      into home.transmeta.com:/home/torvalds/v2.5/linux
  2. 20 Dec, 2002 24 commits