1. 26 Sep, 2006 21 commits
    • [PATCH] mm: tracking shared dirty pages · d08b3851
      Peter Zijlstra authored
      Tracking of dirty pages in shared writeable mmap()s.
      
      The idea is simple: write protect clean shared writeable pages, catch the
      write-fault, make writeable and set dirty.  On page write-back clean all the
      PTE dirty bits and write protect them once again.
      
      The implementation is a tad harder, mainly because the default
      backing_dev_info capabilities were too loosely maintained.  Hence it is not
      enough to test the backing_dev_info for cap_account_dirty.
      
      The current heuristic is as follows, a VMA is eligible when:
       - it is shared writeable
          (vm_flags & (VM_WRITE|VM_SHARED)) == (VM_WRITE|VM_SHARED)
       - it is not a 'special' mapping
          (vm_flags & (VM_PFNMAP|VM_INSERTPAGE)) == 0
       - the backing_dev_info is cap_account_dirty
          mapping_cap_account_dirty(vma->vm_file->f_mapping)
       - f_op->mmap() didn't change the default page protection
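The four conditions can be sketched as a single predicate. This is a hypothetical helper, not the patch's code: the flag values are illustrative assumptions, and the two `bool` parameters stand in for mapping_cap_account_dirty() and for "f_op->mmap() left the default page protection alone".

```c
#include <stdbool.h>

/* Illustrative flag values for this sketch only. */
#define VM_WRITE      0x00000002UL
#define VM_SHARED     0x00000008UL
#define VM_PFNMAP     0x00000400UL
#define VM_INSERTPAGE 0x02000000UL

/* Hypothetical: should clean pages of this VMA be write-protected
 * so that dirtying them can be tracked? */
bool vma_eligible_for_dirty_tracking(unsigned long vm_flags,
                                     bool cap_account_dirty,
                                     bool default_page_prot)
{
    /* shared writeable: both bits must be set */
    if ((vm_flags & (VM_WRITE | VM_SHARED)) != (VM_WRITE | VM_SHARED))
        return false;
    /* not a 'special' mapping */
    if (vm_flags & (VM_PFNMAP | VM_INSERTPAGE))
        return false;
    /* backing_dev_info accounts dirty pages */
    if (!cap_account_dirty)
        return false;
    /* f_op->mmap() didn't change the default page protection */
    return default_page_prot;
}
```

Per the text below, mprotect() would evaluate the same predicate but skip the last check.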
      
      Pages from remap_pfn_range() are explicitly excluded because their COW
      semantics are already horrid enough (see vm_normal_page() in do_wp_page()) and
      because they don't have a backing store anyway.
      
      mprotect() is taught about the new behaviour as well.  However it overrides
      the last condition.
      
      Cleaning the pages on write-back is done with page_mkclean(), a new rmap call.
      It can be called on any page, but is currently only implemented for mapped
      pages; if the page is found to belong to a VMA that accounts dirty pages, it
      will also write-protect the PTE.
      
      Finally, in fs/buffer.c:try_to_free_buffers(), remove clear_page_dirty() from
      under ->private_lock.  This seems to be safe, since ->private_lock is used to
      serialize access to the buffers, not the page itself.  This is needed because
      clear_page_dirty() will call into page_mkclean() and would thereby violate
      the locking order.
      
      [dhowells@redhat.com: Provide a page_mkclean() implementation for NOMMU]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: VM_BUG_ON · 725d704e
      Nick Piggin authored
      Introduce a VM_BUG_ON, which is turned on with CONFIG_DEBUG_VM.  Use it in
      the lightweight, inline refcounting functions; in the PageLRU and PageActive
      checks in vmscan, because they're pretty well confined to vmscan; and in the
      page allocate/free fastpaths, which can be the hottest parts of the kernel
      for kbuilds.
      
      Unlike BUG_ON, VM_BUG_ON must not be used to execute statements with
      side-effects, and should not be used outside core mm code.
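A userspace sketch of why side effects are forbidden, assuming the non-debug VM_BUG_ON() expands to an empty statement so its condition is never evaluated. BUG_ON() here is a hypothetical stand-in that prints and calls abort(); `struct fake_page` and `fake_get_page()` are illustrative names.

```c
#include <stdio.h>
#include <stdlib.h>

/* Userspace stand-in for BUG_ON(): report and abort if cond is true. */
#define BUG_ON(cond)                                                   \
    do {                                                               \
        if (cond) {                                                    \
            fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__);     \
            abort();                                                   \
        }                                                              \
    } while (0)

/* Only checks when CONFIG_DEBUG_VM is defined; otherwise the
 * condition disappears entirely, side effects included. */
#ifdef CONFIG_DEBUG_VM
#define VM_BUG_ON(cond) BUG_ON(cond)
#else
#define VM_BUG_ON(cond) do { } while (0)
#endif

/* Hypothetical refcount helper in the style of get_page(). */
struct fake_page { int count; };

void fake_get_page(struct fake_page *p)
{
    VM_BUG_ON(p->count == 0);  /* pure check: safe to compile out */
    p->count++;                /* the real work stays outside the macro */
}
```

Writing `VM_BUG_ON(++n == 0)` would silently stop incrementing `n` in non-debug builds, which is exactly the misuse the rule above prohibits.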
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] update to the kernel kmap/kunmap API · a6ca1b99
      James Bottomley authored
      Give non-highmem architectures access to the kmap API for the purposes of
      overriding (this is what the attached patch does).
      
      The proposal is that we should now require all architectures with coherence
      issues to manage data coherence via the kmap/kunmap API.  Thus driver
      writers never have to write code like
      
          kmap(page);
          /* modify data in page */
          flush_kernel_dcache_page(page);
          kunmap(page);
      
      instead, kmap/kunmap will manage the coherence and driver (and filesystem)
      writers don't need to worry about how to flush between kmap and kunmap.
      
      For most architectures, the page only needs to be flushed if it was
      actually written to *and* there are user mappings of it, so the best
      implementation looks to be: clear the page dirty pte bit in the kernel page
      tables on kmap and on kunmap, check page->mappings for user maps, and then
      the dirty bit, and only flush if it both has user mappings and is dirty.
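A sketch of the proposed flush decision, under the assumption that per-page state can be reduced to two flags. `struct kmap_page` and the `sketch_*` functions are hypothetical; the real mechanism would use the kernel page-table dirty bit and the page's mapping information, not fields like these.

```c
#include <stdbool.h>

/* Hypothetical per-page state for this sketch. */
struct kmap_page {
    bool kernel_pte_dirty;   /* dirty bit of the kernel mapping's PTE */
    bool has_user_mappings;  /* are there user-space mappings of the page? */
    int flushes;             /* how many cache flushes were performed */
};

void sketch_kmap(struct kmap_page *p)
{
    p->kernel_pte_dirty = false;   /* clear the dirty bit on kmap */
}

void sketch_write(struct kmap_page *p)
{
    p->kernel_pte_dirty = true;    /* hardware would set it on a store */
}

void sketch_kunmap(struct kmap_page *p)
{
    /* flush only if the page has user mappings *and* was written */
    if (p->has_user_mappings && p->kernel_pte_dirty)
        p->flushes++;
}
```

Read-only kmaps and pages without user mappings skip the flush, which is where the proposal expects to save work.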
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] jbd: fix commit of ordered data buffers · 3998b930
      Jan Kara authored
      The original commit code assumes that when a buffer on the BJ_SyncData list
      is locked, it is being written to disk.  But this is not true, and hence it
      can lead to potential data loss on a crash.  Also, the code didn't account
      for the fact that journal_dirty_data() can steal buffers from the committing
      transaction and hence could write buffers that no longer belong to the
      committing transaction.  Finally, it could happen that we tried to write
      out one buffer several times.
      
      The patch below tries to solve these problems by a complete rewrite of the
      data commit code.  We go through buffers on t_sync_datalist, lock buffers
      needing write out and store them in an array.  Buffers are also immediately
      refiled to BJ_Locked list or unfiled (if the write out is completed).  When
      the array is full or we have to block on buffer lock, we submit all
      accumulated buffers for IO.
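The batching strategy can be sketched in userspace as follows. This is an illustration under simplifying assumptions, not the jbd code: `struct sim_buf`, `BATCH_SIZE` and `commit_data()` are hypothetical, "blocking on the lock" is modeled by a flag, and refiling to BJ_Locked is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

#define BATCH_SIZE 4   /* assumption: small batch for illustration */

/* Hypothetical buffer model, not struct buffer_head. */
struct sim_buf {
    bool locked_by_other;  /* would locking it require blocking? */
    bool needs_write;
    bool submitted;
};

static struct sim_buf *batch[BATCH_SIZE];
static size_t batch_len;
int submits;   /* number of batches submitted for IO */

static void submit_batch(void)
{
    if (batch_len == 0)
        return;
    for (size_t i = 0; i < batch_len; i++)
        batch[i]->submitted = true;
    batch_len = 0;
    submits++;
}

/* Accumulate buffers needing write-out; submit the batch when it is
 * full, or before we would have to block on a buffer lock. */
void commit_data(struct sim_buf *bufs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct sim_buf *b = &bufs[i];
        if (!b->needs_write)
            continue;
        if (b->locked_by_other) {
            submit_batch();              /* don't sleep on unsubmitted IO */
            b->locked_by_other = false;  /* pretend we waited for the lock */
        }
        batch[batch_len++] = b;
        if (batch_len == BATCH_SIZE)
            submit_batch();
    }
    submit_batch();   /* submit whatever is left */
}
```

Submitting before blocking matters because the buffers already locked and queued in the array would otherwise sit unsubmitted while the committing thread sleeps.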
      
      [suitable for 2.6.18.x around the 2.6.19-rc2 timeframe]
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] trigger a syntax error if percpu macros are incorrectly used · 632bbfee
      Jan Blunck authored
      get_cpu_var()/per_cpu()/__get_cpu_var() arguments must be simple
      identifiers.  Otherwise the arch dependent implementations might break.
      
      This patch enforces the correct usage of the macros by producing a syntax
      error if the variable is not a simple identifier.
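One way to get such a syntax error is sketched below; it is not necessarily the patch's exact code. It assumes GCC/Clang statement expressions (as the kernel uses) and ignores per-CPU offsets; `DEFINE_SKETCH_PER_CPU` and `SKETCH_PER_CPU` are hypothetical names. The block-scope `extern` declaration only parses when `var` is a single identifier.

```c
/* Define a "per-cpu" variable with a mangled name (single CPU here). */
#define DEFINE_SKETCH_PER_CPU(type, name) type sketch_per_cpu__##name

/* For SKETCH_PER_CPU(a[0]) the declaration below would read
 * `extern int simple_identifier_a[0](void);`, and for p->x
 * `extern int simple_identifier_p->x(void);`: both fail to compile.
 * For a plain identifier it is a harmless function declaration. */
#define SKETCH_PER_CPU(var)                              \
    (*({ extern int simple_identifier_##var(void);       \
         &sketch_per_cpu__##var; }))

DEFINE_SKETCH_PER_CPU(int, counter);
```

Because the macro yields `*pointer`, the result is an lvalue, so `SKETCH_PER_CPU(counter) = 5;` and `SKETCH_PER_CPU(counter)++;` both work for correct usage.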
      Signed-off-by: Jan Blunck <jblunck@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Fix longstanding load balancing bug in the scheduler · 0a2966b4
      Christoph Lameter authored
      The scheduler will stop load balancing if the most busy processor contains
      processes pinned via processor affinity.
      
      The scheduler currently only does one search for busiest cpu.  If it cannot
      pull any tasks away from the busiest cpu because they were pinned then the
      scheduler goes into a corner and sulks leaving the idle processors idle.
      
      For example, if processor 0 is busy running four tasks pinned via taskset,
      processor 1 has none, and two processes have just been started on
      processor 2, then the scheduler will not move one of the two processes
      away from processor 2.
      
      This patch fixes that issue by forcing the scheduler to come out of its
      corner and retry the load balancing, considering other processors.
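A simplified sketch of the retry idea, not the scheduler's code: `struct sim_rq`, `find_busiest()` and `load_balance_sketch()` are hypothetical, the "more than one task" business test is an assumption, and a `bool` array stands in for the cpumask of excluded CPUs.

```c
#include <stdbool.h>

#define NCPUS 4   /* assumption for the sketch */

/* Hypothetical per-CPU summary, not struct rq. */
struct sim_rq {
    int nr_running;   /* tasks on this CPU */
    int nr_movable;   /* tasks not pinned to this CPU */
};

/* Busiest CPU with more than one task, skipping excluded CPUs. */
static int find_busiest(const struct sim_rq *rq, int this_cpu,
                        const bool *excluded)
{
    int busiest = -1;
    for (int cpu = 0; cpu < NCPUS; cpu++) {
        if (cpu == this_cpu || excluded[cpu])
            continue;
        if (rq[cpu].nr_running > 1 &&
            (busiest < 0 || rq[cpu].nr_running > rq[busiest].nr_running))
            busiest = cpu;
    }
    return busiest;
}

/* Returns the CPU a task was pulled from, or -1. Instead of giving up
 * when everything on the busiest CPU is pinned, exclude it and retry. */
int load_balance_sketch(struct sim_rq *rq, int this_cpu)
{
    bool excluded[NCPUS] = { false };
    for (;;) {
        int busiest = find_busiest(rq, this_cpu, excluded);
        if (busiest < 0)
            return -1;                /* nothing left to try */
        if (rq[busiest].nr_movable > 0) {
            rq[busiest].nr_running--;
            rq[busiest].nr_movable--;
            rq[this_cpu].nr_running++;
            rq[this_cpu].nr_movable++;
            return busiest;
        }
        excluded[busiest] = true;     /* all pinned: retry without it */
    }
}
```

With the example above (four pinned tasks on CPU 0, none on CPU 1, two movable on CPU 2), balancing from CPU 1 first rejects CPU 0 and then pulls a task from CPU 2.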
      
      This patch was originally developed by John Hawkes and discussed at
      
          http://marc.theaimsgroup.com/?l=linux-kernel&m=113901368523205&w=2.
      
      I have removed extraneous material and gone back to equipping struct rq
      with the cpu the queue is associated with since this makes the patch much
      easier and it is likely that others in the future will have the same
      difficulty of figuring out which processor owns which runqueue.
      
      The overhead added through these patches is a single word on the stack if
      the kernel is configured to support 32 cpus or fewer (32 bit).  For 32 bit
      environments the maximum number of cpus that can be configured is 255, which
      would result in the use of 32 additional bytes on the stack.  On IA64 up to
      1k cpus can be configured, which will result in the use of 128 additional
      bytes on the stack.  The maximum additional cache footprint is one
      cacheline.  Typically memory use will be much less than a cacheline, and the
      additional cpumask will be placed on the stack in a cacheline that already
      contains other local variables.
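The stack figures above follow from rounding the cpumask up to whole machine words. A small helper (hypothetical name, mirroring the usual BITS_TO_LONGS() rounding) reproduces them: 32 cpus with 4-byte words need one word (4 bytes), 255 cpus with 4-byte words need 8 words (32 bytes), and 1024 cpus with 8-byte words need 16 words (128 bytes).

```c
#include <stddef.h>

/* Bytes occupied by a cpumask of nr_cpus bits stored in words of
 * word_bytes bytes each, rounding up to whole words. */
size_t cpumask_bytes(size_t nr_cpus, size_t word_bytes)
{
    size_t bits_per_word = word_bytes * 8;
    size_t words = (nr_cpus + bits_per_word - 1) / bits_per_word;
    return words * word_bytes;
}
```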
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: John Hawkes <hawkes@sgi.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Peter Williams <pwil3058@bigpond.net.au>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev · 656ddf79
      Linus Torvalds authored
      * 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev:
        [libata] Fix oops introduced in non-uniform port handling fix
        [PATCH] ata-piix: fixes kerneldoc error
    • [libata] Fix oops introduced in non-uniform port handling fix · 29da9f6d
      Jeff Garzik authored
      Noticed by several people.
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
    • Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 · 7e472020
      Linus Torvalds authored
      * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
        [NetLabel]: update docs with website information
        [NetLabel]: rework the Netlink attribute handling (part 2)
        [NetLabel]: rework the Netlink attribute handling (part 1)
        [Netlink]: add nla_validate_nested()
        [NETLINK]: add nla_for_each_nested() to the interface list
        [NetLabel]: change the SELinux permissions
        [NetLabel]: make the CIPSOv4 cache spinlocks bottom half safe
        [NetLabel]: correct improper handling of non-NetLabel peer contexts
        [TCP]: make cubic the default
        [TCP]: default congestion control menu
        [ATM] he: Fix __init/__devinit conflict
        [NETFILTER]: Add dscp,DSCP headers to header-y
        [DCCP]: Introduce dccp_probe
        [DCCP]: Use constants for CCIDs
        [DCCP]: Introduce constants for CCID numbers
        [DCCP]: Allow default/fallback service code.
    • Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6 · 7b29122f
      Linus Torvalds authored
      * master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
        [SOUND] sparc/amd7930: Use __devinit and __devinitdata as needed.
        [SUNLANCE]: Mark sparc_lance_probe_one as __devinit.
        [SPARC64]: Fix section-mismatch errors in solaris emul module.
    • [PATCH] VIDIOC_ENUMSTD bug · b7de567b
      Jonathan Corbet authored
      The v4l2 API documentation for VIDIOC_ENUMSTD says:
      
      	To enumerate all standards applications shall begin at index
      	zero, incrementing by one until the driver returns EINVAL.
      
      The actual code, however, tests the index this way:
      
                     if (index<=0 || index >= vfd->tvnormsize) {
                              ret=-EINVAL;
      
      So any application which passes in index=0 gets EINVAL right off the bat
      - and, in fact, this is what happens to mplayer.  So I think the
      following patch is called for, and maybe even appropriate for a 2.6.18.x
      stable release.
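A sketch of the corrected range check together with the application-side loop the documentation describes. `enumstd_check()` and `count_standards()` are hypothetical names, not v4l2 driver code; the only change from the quoted test is `index < 0` instead of `index <= 0`.

```c
#include <errno.h>

/* Index 0 is the first valid entry; only negative or too-large
 * indices are rejected. */
int enumstd_check(int index, int tvnormsize)
{
    if (index < 0 || index >= tvnormsize)
        return -EINVAL;
    return 0;
}

/* Application side: start at 0 and increment until EINVAL.
 * Returns the number of standards enumerated. */
int count_standards(int tvnormsize)
{
    int index = 0;
    while (enumstd_check(index, tvnormsize) == 0)
        index++;
    return index;
}
```

With the original `index <= 0` test, the loop in `count_standards()` would stop immediately and report zero standards, which matches what mplayer saw.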
      Signed-off-by: Jonathan Corbet <corbet@lwn.net>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] load_module: no BUG if module_subsys uninitialized · 1cc5f714
      Ed Swierk authored
      Invoking load_module() before param_sysfs_init() is called crashes in
      mod_sysfs_setup(), since the kset in module_subsys is not initialized yet.
      
      In my case, net-pf-1 is getting modprobed as a result of hotplug trying to
      create a UNIX socket.  Calls to hotplug begin after the topology_init
      initcall.
      
      Another patch for the same symptom (module_subsys-initialize-earlier.patch)
      moves param_sysfs_init() to the subsys initcalls, but this is still not
      early enough in the boot process in some cases.  In particular,
      topology_init() causes /sbin/hotplug to run, which requests net-pf-1 (the
      UNIX socket protocol) which can be compiled as a module.  Moving
      param_sysfs_init() to the postcore initcalls fixes this particular race,
      but there might well be other cases where a usermodehelper causes a module
      to load earlier still.
      
      The patch makes load_module() return an error rather than crashing the
      kernel if invoked before module_subsys is initialized.
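A userspace sketch of the guard: fail the load with an error instead of touching uninitialized state. The flag and the `sketch_*` functions are hypothetical stand-ins for module_subsys and param_sysfs_init(), and returning `-EINVAL` is an assumption about which error code is used.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for "the kset in module_subsys is set up". */
static bool module_subsys_ready;

void sketch_param_sysfs_init(void)
{
    module_subsys_ready = true;
}

/* Refuse to load (rather than crash) until sysfs setup has run. */
int sketch_load_module(const char *name)
{
    if (!module_subsys_ready) {
        fprintf(stderr, "%s: module_subsys not initialized\n", name);
        return -EINVAL;
    }
    return 0;   /* the rest of module loading would go here */
}
```

A request that arrives too early, such as the net-pf-1 modprobe triggered from hotplug, then simply fails and can be retried later instead of taking down the kernel.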
      
      Cc: Mark Huang <mlhuang@cs.princeton.edu>
      Cc: Greg KH <greg@kroah.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] i386: fix flat mode numa on a real numa system · bfa0e9a0
      keith mannthey authored
      If there is only 1 node in the system, cpus should not think they are part
      of some other node.
      
      In cases where a real NUMA system boots with the flat NUMA option, make sure
      the cpus don't claim to be part of a non-existent node.
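One way to enforce this is sketched below: any cpu whose firmware-reported node was not actually set up falls back to node 0. This is an illustrative assumption, not the patch's code; `sk_cpu_to_node`, `sk_map_cpus()` and the assumption that online nodes are numbered 0..online_nodes-1 are all hypothetical.

```c
#define SK_MAX_CPUS 8

int sk_cpu_to_node[SK_MAX_CPUS];

/* firmware_node[] is what the real NUMA firmware reports per cpu;
 * online_nodes is how many nodes the kernel set up (1 in flat mode). */
void sk_map_cpus(const int *firmware_node, int ncpus, int online_nodes)
{
    for (int cpu = 0; cpu < ncpus && cpu < SK_MAX_CPUS; cpu++) {
        int nid = firmware_node[cpu];
        /* never point a cpu at a node that does not exist */
        sk_cpu_to_node[cpu] = (nid >= 0 && nid < online_nodes) ? nid : 0;
    }
}
```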
      Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpu to node relationship fixup: map cpu to node · 3212fe15
      KAMEZAWA Hiroyuki authored
      Assume that a cpu is *physically* offlined at boot time...
      
      Because smpboot.c::smp_boot_cpu_map() cannot find the cpu's sapicid,
      numa.c::build_cpu_to_node_map() cannot build cpu<->node map for
      offlined cpu.
      
      For such cpus, cpu_to_node map should be fixed at cpu-hot-add.
      This mapping should be done before cpu onlining.
      
      This patch also handles cpu hotremove case.
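The ordering requirement can be sketched as follows: the cpu-to-node entry is filled in at hot-add, before onlining, and cleared again at hot-remove. All names and the `-1` "no node" marker are hypothetical; `sketch_cpu_up()` returns an error where the unfixed kernel would have panicked.

```c
#define SKETCH_NR_CPUS 8
#define SKETCH_NO_NODE (-1)

int sketch_cpu_to_node[SKETCH_NR_CPUS];

void sketch_init_map(void)
{
    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        sketch_cpu_to_node[cpu] = SKETCH_NO_NODE;  /* unknown at boot */
}

/* Hot-add: record the node before the cpu is brought online. */
void sketch_cpu_hot_add(int cpu, int nid)
{
    sketch_cpu_to_node[cpu] = nid;
}

/* Hot-remove: forget the mapping again. */
void sketch_cpu_hot_remove(int cpu)
{
    sketch_cpu_to_node[cpu] = SKETCH_NO_NODE;
}

/* Onlining requires a known node. */
int sketch_cpu_up(int cpu)
{
    return sketch_cpu_to_node[cpu] == SKETCH_NO_NODE ? -1 : 0;
}
```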
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpu to node relationship fixup: acpi_map_cpu2node · 08992986
      KAMEZAWA Hiroyuki authored
      Problem description:
      
        We have the additional_cpus= option for allocating possible_cpus.  But the
        nid for possible cpus is not fixed at boot time: cpus that are offlined at
        boot, or that are not in the SRAT, are not tied to their node.  This will
        cause a panic at cpu onlining.
      
      Usually, pxm_to_nid() mapping is fixed at boot time by SRAT.
      
      But, unfortunately, some systems (my system!) do not include a
      full SRAT table for possible cpus.  (So I use the
      additional_cpus= option.)
      
      For such possible cpus, pxm<->nid should be fixed at
      hot-add.  We now have acpi_map_pxm_to_node() which is also
      used at boot.  It's suitable here.
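A sketch of a pxm-to-node mapping that can be called both at boot and at hot-add: it returns the existing node for a known pxm and otherwise allocates the next free node id. This only illustrates the idea; the names, table sizes and sequential allocation policy are assumptions, not acpi_map_pxm_to_node() itself.

```c
#define SK_MAX_PXM   16
#define SK_MAX_NODES 4

static int sk_pxm_to_node[SK_MAX_PXM];
static int sk_nodes_used;

void sk_reset_pxm_map(void)
{
    for (int pxm = 0; pxm < SK_MAX_PXM; pxm++)
        sk_pxm_to_node[pxm] = -1;   /* -1: no node assigned yet */
    sk_nodes_used = 0;
}

/* Return the node for pxm, assigning a new one on first use.
 * Returns -1 if pxm is out of range or no node ids are left. */
int sk_map_pxm_to_node(int pxm)
{
    if (pxm < 0 || pxm >= SK_MAX_PXM)
        return -1;
    if (sk_pxm_to_node[pxm] >= 0)
        return sk_pxm_to_node[pxm];   /* already mapped (e.g. from SRAT) */
    if (sk_nodes_used >= SK_MAX_NODES)
        return -1;
    sk_pxm_to_node[pxm] = sk_nodes_used++;
    return sk_pxm_to_node[pxm];
}
```

Because repeated calls for the same pxm return the same node, using one function at boot and at hot-add keeps the two paths consistent.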
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] backlight: fix oops in __mutex_lock_slowpath during head /sys/class/graphics/fb0/* · 25981de5
      Michael Hanselmann authored
      It seems that not all drivers use the framebuffer_alloc() function, so they
      won't have an initialized mutex.  But those don't have a backlight anyway.
      Signed-off-by: Michael Hanselmann <linux-kernel@hansmi.ch>
      Cc: Olaf Hering <olaf@aepfle.de>
      Cc: "Antonino A. Daplas" <adaplas@pol.net>
      Cc: Daniel R Thompson <daniel.thompson@st.com>
      Cc: Jon Smirl <jonsmirl@gmail.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] do not free non slab allocated per_cpu_pageset · f3ef9ead
      David Rientjes authored
      Stops panic associated with attempting to free a non slab-allocated
      per_cpu_pageset.
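A userspace sketch of the rule: only pagesets that came from the allocator may be freed, never the statically allocated boot-time ones. Here `malloc()`/`free()` stand in for the slab allocator, and every `sk_*` name is hypothetical.

```c
#include <stdlib.h>

#define SK_NCPU 2

struct sk_pageset { int count; };

/* Static boot-time pagesets, used before the allocator is available. */
struct sk_pageset sk_boot_pageset[SK_NCPU];
struct sk_pageset *sk_zone_pcp[SK_NCPU];
int sk_frees;

void sk_setup_boot(void)
{
    for (int cpu = 0; cpu < SK_NCPU; cpu++)
        sk_zone_pcp[cpu] = &sk_boot_pageset[cpu];
}

int sk_alloc_pageset(int cpu)
{
    struct sk_pageset *p = calloc(1, sizeof(*p));
    if (!p)
        return -1;
    sk_zone_pcp[cpu] = p;
    return 0;
}

void sk_free_pageset(int cpu)
{
    struct sk_pageset *p = sk_zone_pcp[cpu];
    /* handing a static object to free() would be the bug */
    if (p != NULL && p != &sk_boot_pageset[cpu]) {
        free(p);
        sk_frees++;
    }
    sk_zone_pcp[cpu] = NULL;
}
```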
      Signed-off-by: David Rientjes <rientjes@cs.washington.edu>
      Acked-by: Christoph Lameter <clameter@sgi.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] i386 bootioremap / kexec fix · 24fd425e
      keith mannthey authored
      With CONFIG_PHYSICAL_START set to a non-default value, the i386
      boot_ioremap code calculated its pte index incorrectly, and users of
      boot_ioremap had their areas mapped wrongly (in my case, the SRAT table was
      not mapped during early boot).  This patch removes the addr < BOOT_PTE_PTRS
      constraint.
      
      [ Keith says this is applicable to 2.6.16 and 2.6.17 as well ]
      
      Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
      Cc: Vivek Goyal <vgoyal@in.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: <stable@kernel.org>
      Cc: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] rtc: lockdep fix/workaround · 0b16f21f
      Peter Zijlstra authored
      BUG: warning at kernel/lockdep.c:1816/trace_hardirqs_on() (Not tainted)
       [<c04051ee>] show_trace_log_lvl+0x58/0x171
       [<c0405802>] show_trace+0xd/0x10
       [<c040591b>] dump_stack+0x19/0x1b
       [<c043abee>] trace_hardirqs_on+0xa2/0x11e
       [<c06143c3>] _spin_unlock_irq+0x22/0x26
       [<c0541540>] rtc_get_rtc_time+0x32/0x176
       [<c0419ba4>] hpet_rtc_interrupt+0x92/0x14d
       [<c0450f94>] handle_IRQ_event+0x20/0x4d
       [<c0451055>] __do_IRQ+0x94/0xef
       [<c040678d>] do_IRQ+0x9e/0xbd
       [<c0404a49>] common_interrupt+0x25/0x2c
      DWARF2 unwinder stuck at common_interrupt+0x25/0x2c
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] autofs4: zero timeout prevents shutdown · c0ba7e51
      Ian Kent authored
      If the timeout of an autofs mount is set to zero, then umounts are disabled.
      This works fine; however, the kernel module checks the expire timeout and
      goes no further if it is zero.  This is not the right thing to do at
      shutdown, as the module is passed an option to expire mounts regardless of
      their timeout setting.
      
      This patch allows autofs to honor the force expire option.
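A sketch of the corrected decision order: the force option is checked before the zero-timeout shortcut. `can_expire_sketch()` and its parameters are hypothetical; a real expire check would also have to consider whether the mount is busy, which this sketch leaves out.

```c
#include <stdbool.h>

/* timeout and idle are in the same (arbitrary) time unit. */
bool can_expire_sketch(unsigned long timeout, unsigned long idle, bool force)
{
    if (force)
        return true;    /* forced expire (e.g. at shutdown) ignores timeout */
    if (timeout == 0)
        return false;   /* zero timeout: automatic umounts disabled */
    return idle >= timeout;
}
```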
      Signed-off-by: Ian Kent <raven@themaw.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] ata-piix: fixes kerneldoc error · c32a8fd7
      Henne authored
      Fixes an error in kerneldoc of ata_piix.c.
      Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
  2. 25 Sep, 2006 19 commits