01 Apr, 2010 (34 commits)
15 Mar, 2010 (6 commits)
    • Linux 2.6.32.10 · dd49f626
      Greg Kroah-Hartman authored
    • x86, mm: Allow highmem user page tables to be disabled at boot time · 1942aeab
      Ian Campbell authored
      commit 14315592 upstream.
      
      Distros generally (I looked at Debian, RHEL5 and SLES11) seem to
      enable CONFIG_HIGHPTE for any x86 configuration which has highmem
      enabled. This means that the overhead applies even to machines which
      have a fairly modest amount of high memory and which therefore do not
      really benefit from allocating PTEs in high memory but still pay the
      price of the additional mapping operations.
      
      Running kernbench on a 4G box I found that with CONFIG_HIGHPTE=y but
      no actual highptes being allocated there was a reduction in system
      time used from 59.737s to 55.9s.
      
      With CONFIG_HIGHPTE=y and highmem PTEs being allocated:
        Average Optimal load -j 4 Run (std deviation):
        Elapsed Time 175.396 (0.238914)
        User Time 515.983 (5.85019)
        System Time 59.737 (1.26727)
        Percent CPU 263.8 (71.6796)
        Context Switches 39989.7 (4672.64)
        Sleeps 42617.7 (246.307)
      
      With CONFIG_HIGHPTE=y but with no highmem PTEs being allocated:
        Average Optimal load -j 4 Run (std deviation):
        Elapsed Time 174.278 (0.831968)
        User Time 515.659 (6.07012)
        System Time 55.9 (1.07799)
        Percent CPU 263.8 (71.266)
        Context Switches 39929.6 (4485.13)
        Sleeps 42583.7 (373.039)
      
      This patch allows the user to control the allocation of PTEs in
      highmem from the command line ("userpte=nohigh") but retains the
      status quo as the default.
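
      The mechanism is small; roughly (a simplified sketch along the
      lines of the upstream commit, with config guards and error
      handling elided):

        /* GFP mask used for user page-table pages; __GFP_HIGHMEM is
         * only included when CONFIG_HIGHPTE is enabled. */
        gfp_t __userpte_alloc_gfp = GFP_KERNEL | __GFP_ZERO | __GFP_HIGHMEM;

        static int __init setup_userpte(char *arg)
        {
                if (!arg)
                        return -EINVAL;

                /* "userpte=nohigh" forces user PTE pages into lowmem. */
                if (strcmp(arg, "nohigh") == 0)
                        __userpte_alloc_gfp &= ~__GFP_HIGHMEM;
                else
                        return -EINVAL;
                return 0;
        }
        early_param("userpte", setup_userpte);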
      
      It is possible that some simple heuristic could be developed which
      allows auto-tuning of this option; however, I don't have a
      sufficiently large machine available to me to perform any
      particularly meaningful experiments. We could probably handwave up
      an argument for a threshold at 16G of total RAM.
      
      Assuming 768M of lowmem we have 196608 potential lowmem PTE
      pages. Each page can map 2M of RAM in a PAE-enabled configuration,
      meaning a maximum of 384G of RAM could potentially be mapped using
      lowmem PTEs.
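
      Spelled out:

        768M lowmem / 4K per page           = 196608 candidate PTE pages
        one PAE PTE page: 512 entries * 4K  = 2M mapped per page
        196608 pages * 2M per page          = 384G mappable via lowmem PTEs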
      
      Even allowing a generous factor of 10 to account for other required
      lowmem allocations, generous slop to account for page sharing (which
      reduces the total amount of RAM mappable by a given number of PT
      pages) and other inaccuracies in the estimations, it would seem that
      even a 32G machine would not have a particularly pressing need for
      highmem PTEs. I think 32G could be considered to be at the upper
      bound of what might be sensible on a 32 bit machine (although I
      think in practice 64G is still supported).
      
      It seems questionable whether HIGHPTE is even a win for any amount
      of RAM you would sensibly run a 32 bit kernel on rather than going
      64 bit.
      Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
      LKML-Reference: <1266403090-20162-1-git-send-email-ian.campbell@citrix.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • sched: Don't use possibly stale sched_class · 04833a6a
      Thomas Gleixner authored
      commit 83ab0aa0 upstream.
      
      setscheduler() saves task->sched_class outside of the rq->lock held
      region for a check after the setscheduler changes have become
      effective. That might result in checking a stale value.
      
      rtmutex_setprio() has the same problem; it is protected by
      p->pi_lock against setscheduler(), but for correctness' sake (and
      to avoid bad examples) it needs to be fixed as well.

      Retrieve task->sched_class inside the rq->lock held region.
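
      Schematically, the fix is just a matter of ordering (a simplified
      sketch, not the exact diff):

        /* Before: sampled outside the locked region; another cpu may
         * change p->sched_class before rq->lock is taken. */
        prev_class = p->sched_class;
        rq = task_rq_lock(p, &flags);

        /* After: sampled under rq->lock, so the value is stable for
         * the duration of the locked region. */
        rq = task_rq_lock(p, &flags);
        prev_class = p->sched_class;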
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • sched: Fix SMT scheduler regression in find_busiest_queue() · 76d07136
      Suresh Siddha authored
      commit 9000f05c upstream.
      
      Fix an SMT scheduler performance regression that leads to a
      scenario where the SMT threads in one core are completely idle
      while both SMT threads in another core (on the same socket) are
      busy.
      
      This is caused by this commit (with the problematic code highlighted):
      
         commit bdb94aa5
         Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
         Date:   Tue Sep 1 10:34:38 2009 +0200
      
         sched: Try to deal with low capacity
      
         @@ -4203,15 +4223,18 @@ find_busiest_queue()
         ...
      	for_each_cpu(i, sched_group_cpus(group)) {
         +	unsigned long power = power_of(i);
      
         ...
      
         -	wl = weighted_cpuload(i);
         +	wl = weighted_cpuload(i) * SCHED_LOAD_SCALE;
         +	wl /= power;
      
         -	if (rq->nr_running == 1 && wl > imbalance)
         +	if (capacity && rq->nr_running == 1 && wl > imbalance)
      		continue;
      
      On an SMT system, the power of an HT logical cpu will be 589, while
      the scheduler load imbalance (for scenarios like the one mentioned
      above) can be approximately 1024 (SCHED_LOAD_SCALE). The above
      change of scaling the weighted load with the power results in
      "wl > imbalance", which ultimately makes find_busiest_queue()
      return NULL, causing load_balance() to think that the load is well
      balanced. But in fact one of the tasks can be moved to the idle
      core for optimal performance.
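
      Putting numbers on it (assuming a single nice-0 task, so the raw
      weighted_cpuload(i) is about 1024):

        wl = 1024 * SCHED_LOAD_SCALE / power
           = 1024 * 1024 / 589
          ~= 1780 > imbalance (~1024), so the queue is skipped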
      
      We don't need to use the weighted load (wl) scaled by the cpu power
      to compare with imbalance. In that condition we already know there
      is only a single task ("rq->nr_running == 1"), and the comparison
      between imbalance and wl is to make sure that we select the correct
      priority thread which matches imbalance. So we really need to
      compare the imbalance with the original weighted load of the cpu
      and not the scaled load.
      
      But in other conditions, where we want the most hammered (busiest)
      cpu, we can use the scaled load to ensure that we consider the cpu
      power in addition to the actual load on that cpu, so that we can
      move the load away from the cpu that is getting hammered the most
      relative to its actual capacity, as compared with the rest of the
      cpus in that busiest group.
      
      Fix it.
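
      The resulting ordering in find_busiest_queue() looks roughly like
      this (a simplified sketch of the fix, not the exact diff):

        wl = weighted_cpuload(i);

        /* Compare against imbalance using the unscaled load. */
        if (capacity && rq->nr_running == 1 && wl > imbalance)
                continue;

        /* For picking the busiest cpu, consider the load scaled by
         * cpu power. */
        wl = (wl * SCHED_LOAD_SCALE) / power;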
      Reported-by: Ma Ling <ling.ma@intel.com>
      Initial-Analysis-by: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1266023662.2808.118.camel@sbs-t61.sc.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • sched: Fix sched_mv_power_savings for !SMT · d9d97367
      Vaidyanathan Srinivasan authored
      commit 28f53181 upstream.
      
      Fix for sched_mc_power_savings on pre-Nehalem platforms. A child
      sched domain should clear SD_PREFER_SIBLING if its parent will have
      SD_POWERSAVINGS_BALANCE, because the two flags are contradictory.

      Set the flags correctly based on sched_mc_power_savings.
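
      The flag selection then reduces to something along these lines (a
      simplified sketch of the upstream helper):

        static inline int sd_balance_for_mc_power(void)
        {
                if (sched_smt_power_savings)
                        return SD_POWERSAVINGS_BALANCE;

                /* Prefer siblings only when MC powersavings is off;
                 * the two policies contradict each other. */
                if (!sched_mc_power_savings)
                        return SD_PREFER_SIBLING;

                return 0;
        }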
      Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20100208100555.GD2931@dirshya.in.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • KVM: x86 emulator: Check CPL level during privilege instruction emulation · 0e352d47
      Gleb Natapov authored
      commit e92805ac upstream.
      
      Add a CPL check in case the emulator is tricked into emulating a
      privileged instruction from userspace.
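
      Conceptually (a simplified sketch of the idea, not the exact
      upstream diff), the emulator refuses privileged opcodes unless
      the guest is running at CPL 0:

        /* Privileged instructions may only be emulated at CPL 0;
         * otherwise inject #GP into the guest. */
        if ((c->d & Priv) && ops->cpl(ctxt->vcpu)) {
                kvm_inject_gp(ctxt->vcpu, 0);
                goto done;
        }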
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>