1. 30 Jun, 2006 20 commits
    • [PATCH] zoned vm counters: conversion of nr_pagetables to per zone counter · df849a15
      Christoph Lameter authored
      Conversion of nr_page_table_pages to a per zone counter
      
      [akpm@osdl.org: bugfix]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] zoned vm counters: conversion of nr_slab to per zone counter · 9a865ffa
      Christoph Lameter authored
      - Allows reclaim to access the counter without looping over per-processor counts.
      
      - Allows accurate statistics on how many pages are used in a zone by
        the slab. This may become useful to balance slab allocations over
        various zones.
      
      [akpm@osdl.org: bugfix]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] zoned vm counters: zone_reclaim: remove /proc/sys/vm/zone_reclaim_interval · 34aa1330
      Christoph Lameter authored
      The zone_reclaim_interval was necessary because we were not able to determine
      how many unmapped pages exist in a zone.  Therefore we had to scan in
      intervals to figure out if any pages were unmapped.
      
      With the zoned counters and NR_ANON_PAGES we now know the number of pagecache
      pages and the number of mapped pages in a zone.  So we can simply skip the
      reclaim if there is an insufficient number of unmapped pages.  We use
      SWAP_CLUSTER_MAX as the boundary.
      
      Drop all support for /proc/sys/vm/zone_reclaim_interval.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] zoned vm counters: split NR_ANON_PAGES off from NR_FILE_MAPPED · f3dbd344
      Christoph Lameter authored
      The current NR_FILE_MAPPED is used by zone reclaim and the dirty load
      calculation as the number of mapped pagecache pages.  However, that is not
      true.  NR_FILE_MAPPED includes the mapped anonymous pages.  This patch
      separates those and therefore allows an accurate tracking of the anonymous
      pages per zone.
      
      It then becomes possible to determine the number of unmapped pages per zone
      and we can avoid scanning for unmapped pages if there are none.
      
      Also it may now be possible to determine the mapped/unmapped ratio in
      get_dirty_limit.  Isn't the number of anonymous pages irrelevant in that
      calculation?
      
      Note that this will change the meaning of the number of mapped pages reported
      in /proc/vmstat, /proc/meminfo, and in the per node statistics.  This may affect
      user space tools that monitor these counters!  NR_FILE_MAPPED works like
      NR_FILE_DIRTY.  It is only valid for pagecache pages.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] zoned vm counters: remove NR_FILE_MAPPED from scan control structure · bf02cf4b
      Christoph Lameter authored
      We can now access the number of pages in a mapped state in an inexpensive way
      in shrink_active_list.  So drop the nr_mapped field from scan_control.
      
      [akpm@osdl.org: bugfix]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] zoned vm counters: conversion of nr_pagecache to per zone counter · 347ce434
      Christoph Lameter authored
      Currently a single atomic variable is used to establish the size of the page
      cache in the whole machine.  The zoned VM counters have the same method of
      implementation as the nr_pagecache code but also allow the determination of
      the pagecache size per zone.
      
      Remove the special implementation for nr_pagecache and make it a zoned counter
      named NR_FILE_PAGES.
      
      Updates of the page cache counters are always performed with interrupts off.
      We can therefore use the __ variant here.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] zoned vm counters: convert nr_mapped to per zone counter · 65ba55f5
      Christoph Lameter authored
      nr_mapped is important because it allows a determination of how many pages of
      a zone are not mapped, which would allow a more efficient means of determining
      when we need to reclaim memory in a zone.
      
      We take the nr_mapped field out of the page state structure and define a new
      per zone counter named NR_FILE_MAPPED (the anonymous pages will be split off
      from NR_FILE_MAPPED in the next patch).
      
      We replace the use of nr_mapped in various kernel locations.  This avoids the
      looping over all processors in try_to_free_pages(), writeback, reclaim (swap +
      zone reclaim).
      
      [akpm@osdl.org: bugfix]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] zoned vm counters: basic ZVC (zoned vm counter) implementation · 2244b95a
      Christoph Lameter authored
      Per zone counter infrastructure
      
      The counters that we currently have for the VM are split per processor.  The
      processor, however, has not much to do with the zone these pages belong to.  We
      cannot tell, for example, how many ZONE_DMA pages are dirty.
      
      So we are blind to potential imbalances in the usage of memory in various
      zones.  For example, in a NUMA system we cannot tell how many pages are dirty on a
      particular node.  If we knew then we could put measures into the VM to balance
      the use of memory between different zones and different nodes in a NUMA
      system.  For example it would be possible to limit the dirty pages per node so
      that fast local memory is kept available even if a process is dirtying huge
      amounts of pages.
      
      Another example is zone reclaim.  We do not know how many unmapped pages exist
      per zone.  So we just have to try to reclaim.  If it is not working then we
      pause and try again later.  It would be better if we knew when it makes sense
      to reclaim unmapped pages from a zone.  This patchset allows the determination
      of the number of unmapped pages per zone.  We can remove the zone reclaim
      interval with the counters introduced here.
      
      Furthermore, the ability to have various usage statistics available will allow
      the development of new NUMA balancing algorithms that may be able to improve
      the decision making in the scheduler of when to move a process to another node
      and hopefully will also enable automatic page migration through a user space
      program that can analyse the memory load distribution and then rebalance
      memory use in order to increase performance.
      
      The counter framework here implements differential counters for each processor
      in struct zone.  The differential counters are consolidated when a threshold
      is exceeded (as done in the current implementation for nr_pagecache), when
      slab reaping occurs or when a consolidation function is called.
      
      Consolidation uses atomic operations and accumulates counters per zone in the
      zone structure and also globally in the vm_stat array.  VM functions can
      access the counts by simply indexing a global or zone specific array.
      
      The arrangement of counters in an array also simplifies processing when output
      has to be generated for /proc/*.
      
      Counters can be updated by calling inc/dec_zone_page_state or
      __inc/dec_zone_page_state analogous to *_page_state.  The second group of
      functions can be called if it is known that interrupts are disabled.
      
      Special optimized increment and decrement functions are provided.  These can
      avoid certain checks and use increment or decrement instructions that an
      architecture may provide.
      
      We also add a new CONFIG_DMA_IS_NORMAL that signifies that an architecture can
      do DMA to all memory and therefore ZONE_NORMAL will not be populated.  This is
      only currently set for IA64 SGI SN2 and currently only affects
      node_page_state().  In the best case node_page_state can be reduced to
      retrieving a single counter for the one zone on the node.
      
      [akpm@osdl.org: cleanups]
      [akpm@osdl.org: export vm_stat[] for filesystems]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] zoned vm counters: create vmstat.c/.h from page_alloc.c/.h · f6ac2354
      Christoph Lameter authored
      NOTE: ZVCs are *not* the lightweight event counters.  ZVCs are reliable whereas
      event counters do not need to be.
      
      Zone based VM statistics are necessary to be able to determine what the state
      of memory in one zone is.  In a NUMA system this can be helpful for local
      reclaim and other memory optimizations that may be able to shift VM load in
      order to get more balanced memory use.
      
      It is also useful to know how the computing load affects the memory
      allocations on various zones.  This patchset allows the retrieval of that data
      from userspace.
      
      The patchset introduces a framework for counters that is a cross between the
      existing page_stats --which are simply global counters split per cpu-- and the
      approach of deferred incremental updates implemented for nr_pagecache.
      
      Small per cpu 8 bit counters are added to struct zone.  If the counter exceeds
      certain thresholds then the counters are accumulated in an array of
      atomic_long in the zone and in a global array that sums up all zone values.
      The small 8 bit counters are next to the per cpu page pointers and so they
      will be hot in the cpu cache when pages are allocated and freed.
      
      Access to VM counter information for a zone and for the whole machine is then
      possible by simply indexing an array (Thanks to Nick Piggin for pointing out
      that approach).  The access to the total number of pages of various types
      no longer requires summing up all per cpu counters.
      
      Benefits of this patchset right now:
      
      - Ability for UP and SMP configuration to determine how memory
        is balanced between the DMA, NORMAL and HIGHMEM zones.
      
      - loops over all processors are avoided in writeback and
        reclaim paths. We can avoid caching the writeback information
        because the needed information is directly accessible.
      
      - Special handling for nr_pagecache removed.
      
      - zone_reclaim_interval vanishes since VM stats can now determine
        when it is worthwhile to do local reclaim.
      
      - Fast inline per node page state determination.
      
      - Accurate counters in /sys/devices/system/node/node*/meminfo. Current
        counters simply count which processor allocated a page somewhere
        and guesstimate based on that, so they were not useful to show
        the actual distribution of page use in a specific zone.
      
      - The swap_prefetch patch requires per node statistics in order to
        figure out when processors of a node can prefetch. This patch provides
        some of the needed numbers.
      
      - Detailed VM counters available in more /proc and /sys status files.
      
      References to earlier discussions:
      V1 http://marc.theaimsgroup.com/?l=linux-kernel&m=113511649910826&w=2
      V2 http://marc.theaimsgroup.com/?l=linux-kernel&m=114980851924230&w=2
      V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115014697910351&w=2
      V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767318740&w=2
      
      Performance tests with AIM7 did not show any regressions.  Seems to be a tad
      faster even.  Tested on ia64/NUMA.  Builds fine on i386, SMP / UP.  Includes
      fixes for s390/arm/uml arch code.
      
      This patch:
      
      Move counter code from page_alloc.c/page-flags.h to vmstat.c/h.
      
      Create vmstat.c/vmstat.h by separating the counter code and the proc
      functions.
      
      Move the vm_stat_text array before zoneinfo_show.
      
      [akpm@osdl.org: s390 build fix]
      [akpm@osdl.org: HOTPLUG_CPU build fix]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fix ISTALLION=y · 672b2714
      Adrian Bunk authored
      drivers/char/istallion.c: In function ‘stli_initbrds’:
      drivers/char/istallion.c:4150: error: implicit declaration of function ‘stli_parsebrd’
      drivers/char/istallion.c:4150: error: ‘stli_brdsp’ undeclared (first use in this function)
      drivers/char/istallion.c:4150: error: (Each undeclared identifier is reported only once
      drivers/char/istallion.c:4150: error: for each function it appears in.)
      drivers/char/istallion.c:4164: error: implicit declaration of function ‘stli_argbrds’
      
      While I was at it, I also removed the #ifdef MODULE around the initialization
      code to allow it to perhaps work when built into the kernel and made a
      needlessly global function static.
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] msr.c: use register_hotcpu_notifier() · e09793bb
      Andrew Morton authored
      register_cpu_notifier() cannot do anything in a module, in a
      !CONFIG_HOTPLUG_CPU kernel.
      
      Cc: Chandra Seetharaman <sekharan@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fix platform_device_put/del mishaps · 1017f6af
      Ingo Molnar authored
      This fixes drivers/char/pc8736x_gpio.c and drivers/char/scx200_gpio.c to
      use the platform_device_del/put ops correctly.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: Jim Cromie <jim.cromie@gmail.com>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fix drivers/video/imacfb.c compilation · 491d525f
      Ingo Molnar authored
      Fix build error on x86_64.  There's nothing even remotely close to
      imacmp_seg in the kernel, so I removed the whole line.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: Edgar Hucek <hostmaster@ed-soft.at>
      Cc: Antonino Daplas <adaplas@pol.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 · 501b7c77
      Linus Torvalds authored
      * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
        ocfs2: remove redundant NULL checks in ocfs2_direct_IO_get_blocks()
        ocfs2: clean up some osb fields
        ocfs2: fix init of uuid_net_key
        ocfs2: silence a debug print
        ocfs2: silence ENOENT during lookup of broken links
        ocfs2: Cleanup message prints
        ocfs2: silence -EEXIST from ocfs2_extent_map_insert/lookup
        [PATCH] fs/ocfs2/dlm/dlmrecovery.c: make dlm_lockres_master_requery() static
        ocfs2: warn the user on a dead timeout mismatch
        ocfs2: OCFS2_FS must depend on SYSFS
        ocfs2: Compile-time disabling of ocfs2 debugging output.
        configfs: Clear up a few extra spaces where there should be TABs.
        configfs: Release memory in configfs_example.
    • Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 · 74e651f0
      Linus Torvalds authored
      * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (30 commits)
        [TIPC]: Initial activation message now includes TIPC version number
        [TIPC]: Improve response to requests for node/link information
        [TIPC]: Fixed skb_under_panic caused by tipc_link_bundle_buf
        [IrDA]: Fix the AU1000 FIR dependencies
        [IrDA]: Fix RCU lock pairing on error path
        [XFRM]: unexport xfrm_state_mtu
        [NET]: make skb_release_data() static
        [NETFILTE] ipv4: Fix typo (Bugzilla #6753)
        [IrDA]: MCS7780 usb_driver struct should be static
        [BNX2]: Turn off link during shutdown
        [BNX2]: Use dev_kfree_skb() instead of the _irq version
        [ATM]: basic sysfs support for ATM devices
        [ATM]: [suni] change suni_init to __devinit
        [ATM]: [iphase] should be __devinit not __init
        [ATM]: [idt77105] should be __devinit not __init
        [BNX2]: Add NETIF_F_TSO_ECN
        [NET]: Add ECN support for TSO
        [AF_UNIX]: Datagram getpeersec
        [NET]: Fix logical error in skb_gso_ok
        [PKT_SCHED]: PSCHED_TADD() and PSCHED_TADD2() can result in tv_usec >= 1000000
        ...
    • [TIPC]: Improve response to requests for node/link information · ea13847b
      Allan Stephens authored
      Now allocates reply space for "get links" requests based on the number of actual
      links, not the number of potential links.  Also limits replies to "get links" and
      "get nodes" requests to 32KB to match the capabilities of the tipc-config utility
      that issued the request.
      Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
      Signed-off-by: Per Liden <per.liden@ericsson.com>
    • [TIPC]: Fixed skb_under_panic caused by tipc_link_bundle_buf · e49060c7
      Allan Stephens authored
      Now determines the tailroom of the bundle buffer by direct inspection of the
      buffer.  Previously, the buffer was assumed to have a max capacity equal to the
      link MTU, but the addition of link MTU negotiation means that the link MTU can
      increase after the bundle buffer is allocated.
      Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
      Signed-off-by: Per Liden <per.liden@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [IrDA]: Fix the AU1000 FIR dependencies · caf430f3
      Adrian Bunk authored
      AU1000 FIR is broken, it should depend on SOC_AU1000.
      
      Spotted by Jean-Luc Leger.
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [IrDA]: Fix RCU lock pairing on error path · 1bc17311
      Josh Triplett authored
      irlan_client_discovery_indication calls rcu_read_lock and rcu_read_unlock, but
      returns without unlocking in an error case.  Fix that by replacing the return
      with a goto so that the rcu_read_unlock always gets executed.
      Signed-off-by: Josh Triplett <josh@freedesktop.org>
      Acked-by: Paul E. McKenney <paulmck@us.ibm.com>
      Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 29 Jun, 2006 20 commits