  1. 29 Dec, 2008 11 commits
    • sched: create "pushable_tasks" list to limit pushing to one attempt · 917b627d
      Gregory Haskins authored
      The RT scheduler employs a "push/pull" design to actively balance tasks
      within the system (on a per disjoint cpuset basis).  When a task is
      awoken, it is immediately determined if there are any lower priority
      cpus which should be preempted.  This is opposed to the way normal
      SCHED_OTHER tasks behave, which will wait for a periodic rebalancing
      operation to occur before spreading out load.
      
      When a particular RQ has more than 1 active RT task, it is said to
      be in an "overloaded" state.  Once this occurs, the system enters
      the active balancing mode, where it will try to push the task away,
      or persuade a different cpu to pull it over.  The system will stay
      in this state until it falls back to <= 1 queued RT task per RQ.
      
      However, the current implementation suffers from a limitation in the
      push logic.  Once overloaded, all tasks (other than current) on the
      RQ are analyzed on every push operation, even if they were previously
      unpushable (due to affinity, etc).  What's more, the operation stops
      at the first task that is unpushable and will not look at items
      lower in the queue.  This causes two problems:
      
      1) We can have the same tasks analyzed over and over again during each
         push, which extends out the fast path in the scheduler for no
         gain.  Consider a RQ that has dozens of tasks that are bound to a
         core.  Each one of those tasks will be encountered and skipped
         for each push operation while they are queued.
      
      2) There may be lower-priority tasks under the unpushable task that
         could have been successfully pushed, but will never be considered
         until either the unpushable task is cleared, or a pull operation
         succeeds.  The net result is a potential latency source for mid
         priority tasks.
      
      This patch aims to rectify these two conditions by introducing a new
      priority sorted list: "pushable_tasks".  A task is added to the list
      each time a task is activated or preempted.  It is removed from the
      list any time it is deactivated, made current, or fails to push.
      
      This works because a push only needs to be attempted once per task.
      After an initial failure to push, the other cpus will eventually try to
      pull the task when the conditions are proper.  This also solves the
      problem that we don't completely analyze all tasks due to encountering
      an unpushable task.  Now every task will have a push attempted (when
      appropriate).
      
      This reduces latency both by shortening the critical section of the
      rq->lock for certain workloads, and by making sure the algorithm
      considers all eligible tasks in the system.
      
      [ rostedt: added a couple more BUG_ONs ]
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
      Acked-by: Steven Rostedt <srostedt@redhat.com>
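The add/remove rules described in the commit above can be pictured with a small user-space sketch. This is not the kernel code: it uses a plain sorted singly linked list rather than the kernel's plist, and `struct task`, `struct rq`, `pushable_add`, `pushable_del`, `push_one`, and the `migratable` flag are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; a lower prio value means higher priority. */
struct task {
    int prio;
    int migratable;         /* stand-in for "affinity allows another cpu" */
    struct task *next;      /* link in the pushable list */
    int queued;             /* 1 while the task is on the pushable list */
};

struct rq {
    struct task *pushable;  /* head is the highest-priority candidate */
};

/* Called when a task is activated or preempted: insert in priority order. */
static void pushable_add(struct rq *rq, struct task *t)
{
    struct task **pp = &rq->pushable;

    assert(!t->queued);
    while (*pp && (*pp)->prio <= t->prio)
        pp = &(*pp)->next;
    t->next = *pp;
    *pp = t;
    t->queued = 1;
}

/* Called when a task is deactivated, made current, or fails to push. */
static void pushable_del(struct rq *rq, struct task *t)
{
    struct task **pp = &rq->pushable;

    while (*pp && *pp != t)
        pp = &(*pp)->next;
    if (*pp) {
        *pp = t->next;
        t->next = NULL;
        t->queued = 0;
    }
}

/*
 * One push attempt. The candidate leaves the list whether or not the
 * push succeeds, so an unpushable task is examined only once and no
 * longer hides the lower-priority tasks behind it.
 * Returns 1 if pushed, 0 if the attempt failed, -1 if the list is empty.
 */
static int push_one(struct rq *rq)
{
    struct task *t = rq->pushable;

    if (!t)
        return -1;
    pushable_del(rq, t);
    return t->migratable ? 1 : 0;
}
```

In this model a bound high-priority task costs one failed attempt, after which the next call reaches the task queued behind it.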
    • plist: fix PLIST_NODE_INIT to work with debug enabled · 4075134e
      Gregory Haskins authored
      It seems that PLIST_NODE_INIT breaks if it is used while DEBUG_PI_LIST
      is defined.  Since there are no current users of PLIST_NODE_INIT, this
      has gone undetected.  This patch fixes the build issue that appears with
      DEBUG_PI_LIST enabled later in the series, when we use it in init_task.h.
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
    • sched: add sched_class->needs_post_schedule() member · 967fc046
      Gregory Haskins authored
      We currently run class->post_schedule() outside of the rq->lock, which
      means that we need to test for the need to post_schedule outside of
      the lock to avoid a forced reacquisition.  This is currently not a problem
      as we only look at rq->rt.overloaded.  However, we want to enhance this
      going forward to look at more state to reduce the need to post_schedule to
      a bare minimum set.  Therefore, we introduce a new member-func called
      needs_post_schedule() which tests for the post_schedule condition without
      actually performing the work.  It is safe to call this
      function before the rq->lock is released, because we are guaranteed not
      to drop the lock at an intermediate point (such as what post_schedule()
      may do).
      
      We will use this later in the series.
      
      [ rostedt: removed paranoid BUG_ON ]
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
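The split between a cheap predicate and the expensive work might look roughly like the single-threaded sketch below. It assumes a toy lock flag in place of the real rq->lock, and `finish_switch`, `rq_lock`, `rq_unlock`, and `post_runs` are hypothetical names, not kernel symbols.

```c
#include <assert.h>

/* Toy model: "locked" stands in for rq->lock, checked with asserts only. */
struct rq {
    int locked;
    int rt_overloaded;   /* stand-in for rq->rt.overloaded */
    int post_runs;       /* counts how often the slow path ran */
};

static void rq_lock(struct rq *rq)   { assert(!rq->locked); rq->locked = 1; }
static void rq_unlock(struct rq *rq) { assert(rq->locked);  rq->locked = 0; }

/* Cheap predicate: only reads state and never drops the lock. */
static int needs_post_schedule(const struct rq *rq)
{
    assert(rq->locked);
    return rq->rt_overloaded;
}

/* Expensive work: runs outside the lock and takes it again itself. */
static void post_schedule(struct rq *rq)
{
    rq_lock(rq);
    rq->post_runs++;
    rq_unlock(rq);
}

/* Sample the condition while the lock is still held, act after release. */
static void finish_switch(struct rq *rq)
{
    int post;

    rq_lock(rq);
    post = needs_post_schedule(rq);
    rq_unlock(rq);

    if (post)
        post_schedule(rq);
}
```

Because the predicate runs while the lock is already held, the common case where no post-schedule work is needed never reacquires the lock.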
    • sched: make double-lock-balance fair · 8f45e2b5
      Gregory Haskins authored
      double_lock_balance() currently favors logically lower cpus since they
      often do not have to release their own lock to acquire a second lock.
      The result is that logically higher cpus can get starved when there is
      a lot of pressure on the RQs.  This can result in higher latencies on
      higher cpu-ids.
      
      This patch makes the algorithm more fair by forcing all paths to have
      to release both locks before acquiring them again.  Since callsites to
      double_lock_balance already consider it a potential preemption/reschedule
      point, they have the proper logic to recheck for atomicity violations.
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
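A minimal sketch of the fair variant, assuming toy lock flags instead of real spinlocks and a single thread of execution: the caller's lock is always dropped first, then both locks are taken in address order. `double_lock_balance_fair`, `rq_lock`, `rq_unlock`, and `unlock_count` are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a runqueue lock; asserts catch misuse in this sketch. */
struct rq {
    int locked;
    int unlock_count;    /* lets a test observe that the lock was dropped */
};

static void rq_lock(struct rq *rq)
{
    assert(!rq->locked);
    rq->locked = 1;
}

static void rq_unlock(struct rq *rq)
{
    assert(rq->locked);
    rq->locked = 0;
    rq->unlock_count++;
}

/*
 * Fair variant: every caller releases its own lock first, regardless of
 * cpu ordering, then acquires both locks in a fixed (address) order so
 * that two cpus doing this concurrently cannot deadlock.
 * Returns 1 to tell the caller that this_rq was unlocked in between and
 * its state must be rechecked.
 */
static int double_lock_balance_fair(struct rq *this_rq, struct rq *busiest)
{
    assert(this_rq->locked && this_rq != busiest);

    rq_unlock(this_rq);
    if ((uintptr_t)this_rq < (uintptr_t)busiest) {
        rq_lock(this_rq);
        rq_lock(busiest);
    } else {
        rq_lock(busiest);
        rq_lock(this_rq);
    }
    return 1;
}
```

Since no caller keeps its own lock across the call, lower-numbered runqueues lose the advantage of skipping the release step.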
    • sched: pull only one task during NEWIDLE balancing to limit critical section · 7e96fa58
      Gregory Haskins authored
      git-id c4acb2c0 attempted to limit
      newidle critical section length by stopping after at least one task
      was moved.  Further investigation has shown that there are other
      paths nested further inside the algorithm which still remain that allow
      long latencies to occur with newidle balancing.  This patch applies
      the same technique inside balance_tasks() to limit the duration of
      this optional balancing operation.
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
      CC: Nick Piggin <npiggin@suse.de>
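The early-exit technique can be illustrated with a toy move loop. This is a sketch under simplifying assumptions: tasks are reduced to a counter, and `balance_tasks_sketch` and the `*_SKETCH` enum values are hypothetical names, not the kernel's balance_tasks() or its idle types.

```c
#include <assert.h>

enum idle_type { CPU_NOT_IDLE_SKETCH, CPU_NEWLY_IDLE_SKETCH };

/* Toy runqueue: only counts tasks that could be moved. */
struct rq {
    int nr_movable;
};

/*
 * Move up to max_move tasks from src to dst. During newidle balancing
 * the loop stops after the first successful move, which bounds the time
 * spent in this optional work.
 * Returns the number of tasks moved.
 */
static int balance_tasks_sketch(struct rq *dst, struct rq *src,
                                int max_move, enum idle_type idle)
{
    int moved = 0;

    while (src->nr_movable > 0 && moved < max_move) {
        src->nr_movable--;
        dst->nr_movable++;
        moved++;

        if (idle == CPU_NEWLY_IDLE_SKETCH)
            break;      /* one task is enough to stop being idle */
    }
    return moved;
}
```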
    • sched: only try to push a task on wakeup if it is migratable · 777c2f38
      Gregory Haskins authored
      There is no sense in wasting time trying to push a task away that
      cannot move anywhere else.  We gain no benefit from trying to push
      other tasks at this point, so if the task being woken up is
      non-migratable, just skip the whole operation.  This reduces overhead
      in the wakeup path for certain tasks.
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
    • sched: use highest_prio.next to optimize pull operations · 74ab8e4f
      Gregory Haskins authored
      We currently take the rq->lock for every cpu in an overload state during
      pull_rt_task().  However, we now have enough information via the
      highest_prio.[curr|next] fields to determine if there are any tasks of
      interest to warrant the overhead of the rq->lock, before we actually take
      it.  So we use this information to reduce lock contention during the
      pull for the case where the source-rq doesn't have tasks that would preempt
      the current task.
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
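The lock-avoidance check can be sketched as a lockless comparison made before any lock is taken. The structure below is a simplified assumption: `struct prio_pair`, `try_pull_from`, and `lock_taken` are hypothetical names, and lower numeric values mean higher priority.

```c
#include <assert.h>

/* Lower value = higher priority, as with RT priorities in the scheduler. */
struct prio_pair {
    int curr;   /* priority of the highest-priority queued task */
    int next;   /* priority of the next-highest queued task */
};

struct rq {
    struct prio_pair highest_prio;
    int lock_taken;   /* counts simulated lock acquisitions */
};

/*
 * Decide whether pulling from src is worth taking its lock.
 * The highest-priority task on src is assumed to be running there, so
 * only its "next" task is a pull candidate. If that task would not
 * preempt what this_rq is running, skip src without locking it.
 * Returns 1 if src was locked for a closer look, 0 if it was skipped.
 */
static int try_pull_from(struct rq *this_rq, struct rq *src)
{
    if (src->highest_prio.next >= this_rq->highest_prio.curr)
        return 0;

    src->lock_taken++;   /* stand-in for taking src's rq->lock */
    return 1;
}
```

The comparison reads the fields without holding the source lock, so in a real system it is only a hint and the candidate still has to be rechecked once the lock is held.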
    • sched: use highest_prio.curr for pull threshold · a8728944
      Gregory Haskins authored
      highest_prio.curr is actually a more accurate way to keep track of
      the pull_rt_task() threshold since it is always up to date, even
      if the "next" task migrates during double_lock.  Therefore, stop
      looking at the "next" task object and simply use the highest_prio.curr.
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
    • sched: track the next-highest priority on each runqueue · e864c499
      Gregory Haskins authored
      We will use this later in the series to reduce the amount of rq-lock
      contention during a pull operation.
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
    • sched: cleanup inc/dec_rt_tasks · 4d984277
      Gregory Haskins authored
      Move some common definitions up to the function prologue to simplify the
      body logic.
      Signed-off-by: Gregory Haskins <ghaskins@novell.com>
    • x86: mark get_cpu_leaves() with __cpuinit annotation · 6092848a
      Sergio Luis authored
      Impact: fix section mismatch warning
      
      Commit b2bb8554 ("x86: Remove cpumask games
      in x86/kernel/cpu/intel_cacheinfo.c") introduced get_cpu_leaves(), which
      references __cpuinit cpuid4_cache_lookup().
      
      Mark get_cpu_leaves() with a __cpuinit annotation.
      Signed-off-by: Sergio Luis <sergio@larces.uece.br>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 23 Dec, 2008 4 commits
  3. 19 Dec, 2008 10 commits
  4. 18 Dec, 2008 4 commits
    • x86: use possible_cpus=NUM to extend the possible cpus allowed · 3b11ce7f
      Mike Travis authored
      Impact: add new boot parameter
      
      Use possible_cpus=NUM kernel parameter to extend the number of possible
      cpus.
      
      The ability to HOTPLUG ON cpus that are "possible" but not "present" is
      dealt with in a later patch.
      Signed-off-by: Mike Travis <travis@sgi.com>
    • x86: fix cpu_mask_to_apicid_and to include cpu_online_mask · a775a38b
      Mike Travis authored
      Impact: fix potential APIC crash
      
      In determining the destination apicid, there are usually three cpumasks
      that are considered: the incoming cpumask arg, cfg->domain and the
      cpu_online_mask.  Since we are just introducing the cpu_mask_to_apicid_and
      function, make sure it includes the cpu_online_mask in its evaluation.
      [Added with this patch.]
      
      There are two io_apic.c functions that did not previously use the
      cpu_online_mask:  setup_IO_APIC_irq and msi_compose_msg.  Both of these
      simply used cpu_mask_to_apicid(cfg->domain & TARGET_CPUS), and all but
      one arch (NUMAQ[*]) return only online cpus in the TARGET_CPUS mask,
      so the behavior is identical for all cases.
      
      [*: NUMAQ bug?]
      
      Note that alloc_cpumask_var is only used for the 32-bit cases where
      it's highly likely that the cpumask set size will be small and therefore
      CPUMASK_OFFSTACK=n.  But if that's not the case, a failed allocation
      will cause the same return value as the default.
      Signed-off-by: Mike Travis <travis@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
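The three-way intersection described in the commit above can be sketched with plain 64-bit masks. This is an illustration only: the real function works on struct cpumask and returns an APIC ID, whereas the hypothetical `pick_online_cpu` below just returns the first cpu index common to all three masks.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the lowest cpu number present in the incoming mask, the
 * domain mask, and the online mask at once, or -1 if the intersection
 * is empty. Including the online mask ensures an offline cpu is never
 * chosen as a destination.
 */
static int pick_online_cpu(uint64_t cpumask, uint64_t domain, uint64_t online)
{
    uint64_t m = cpumask & domain & online;
    int cpu;

    for (cpu = 0; cpu < 64; cpu++)
        if (m & (UINT64_C(1) << cpu))
            return cpu;
    return -1;
}
```

For example, with an incoming mask of 0xF0 and a domain of 0x3C, cpus 4 and 5 are candidates; if only cpu 5 is online, cpu 5 is the one selected.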
    • Merge branch 'x86/apic' into cpus4096 · 9a3d8f73
      Ingo Molnar authored
      This is done for conflict prevention: we merge it into the cpus4096 tree
      because upcoming cpumask changes will touch apic.c that would collide
      with x86/apic otherwise.
    • Merge branch 'linus' into cpus4096 · b9974dc6
      Ingo Molnar authored
  5. 17 Dec, 2008 11 commits