    sched/fair: Add asymmetric CPU capacity wakeup scan · b7a33161
    Morten Rasmussen authored
    Issue
    =====
    
    On asymmetric CPU capacity topologies, we currently rely on wake_cap() to
    drive select_task_rq_fair() towards either:
    
    - its slow-path (find_idlest_cpu()) if either the previous or
      current (waking) CPU has too little capacity for the waking task
    - its fast-path (select_idle_sibling()) otherwise
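
    As a rough userspace sketch of that decision (not the kernel's wake_cap()
    itself, which does more bookkeeping), the path choice can be modelled as
    below. The names pick_wake_path() and task_fits(), and the 1280/1024
    (~20%) margin, are assumptions made for illustration.

```c
#include <stdbool.h>

enum wake_path { WAKE_FAST_PATH, WAKE_SLOW_PATH };

/* Assumed margin: the task must leave ~20% headroom on the CPU. */
static bool task_fits(unsigned long task_util, unsigned long cpu_cap)
{
	return task_util * 1280 < cpu_cap * 1024;
}

/*
 * Hypothetical model: take the slow path if the smaller of the previous
 * and waking CPU capacities is too small for the task.
 */
static enum wake_path pick_wake_path(unsigned long task_util,
				     unsigned long prev_cap,
				     unsigned long this_cap)
{
	unsigned long min_cap = prev_cap < this_cap ? prev_cap : this_cap;

	return task_fits(task_util, min_cap) ? WAKE_FAST_PATH : WAKE_SLOW_PATH;
}
```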
    
    Commit:
    
      3273163c ("sched/fair: Let asymmetric CPU configurations balance at wake-up")
    
    points out that this relies on the assumption that "[...]the CPU capacities
    within an SD_SHARE_PKG_RESOURCES domain (sd_llc) are homogeneous".
    
    This assumption no longer holds on newer generations of big.LITTLE
    systems (DynamIQ), which can accommodate CPUs of different compute capacity
    within a single LLC domain. To hopefully paint a better picture, a regular
    big.LITTLE topology would look like this:
    
      +---------+ +---------+
      |   L2    | |   L2    |
      +----+----+ +----+----+
      |CPU0|CPU1| |CPU2|CPU3|
      +----+----+ +----+----+
          ^^^         ^^^
        LITTLEs      bigs
    
    which would result in the following scheduler topology:
    
      DIE [         ] <- sd_asym_cpucapacity
      MC  [   ] [   ] <- sd_llc
           0 1   2 3
    
    Conversely, a DynamIQ topology could look like:
    
      +-------------------+
      |        L3         |
      +----+----+----+----+
      | L2 | L2 | L2 | L2 |
      +----+----+----+----+
      |CPU0|CPU1|CPU2|CPU3|
      +----+----+----+----+
         ^^^^^     ^^^^^
        LITTLEs    bigs
    
    which would result in the following scheduler topology:
    
      MC [       ] <- sd_llc, sd_asym_cpucapacity
          0 1 2 3
    
    What this means is that, on DynamIQ systems, we could pass the wake_cap()
    test (IOW presume the waking task fits the CPU capacities of some LLC
    domain) and thus go through select_idle_sibling().
    This function operates on an LLC domain, which here spans both bigs and
    LITTLEs, so it could very well pick a CPU of too small capacity for the
    task, despite there being fitting idle CPUs. The outcome depends on the
    CPU iteration order, which gives us no guarantees capacity-wise.
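
    To illustrate the iteration-order dependency, here is a deliberately
    simplified stand-in for a capacity-unaware idle scan. It is not
    select_idle_sibling(), which applies several more heuristics. The struct
    cpu_model type and first_idle_cpu() are hypothetical, as are the
    capacity values used in the note below.

```c
#include <stdbool.h>
#include <stddef.h>

struct cpu_model {
	unsigned long capacity;	/* e.g. 446 for a LITTLE, 1024 for a big */
	bool idle;
};

/*
 * Return the first idle CPU found when scanning from @start with
 * wrap-around, ignoring capacity entirely. Returns -1 if none is idle.
 */
static int first_idle_cpu(const struct cpu_model *cpus, size_t nr, size_t start)
{
	for (size_t i = 0; i < nr; i++) {
		size_t cpu = (start + i) % nr;

		if (cpus[cpu].idle)
			return (int)cpu;
	}

	return -1;
}
```

    With four idle CPUs of capacities {446, 446, 1024, 1024}, a scan started
    at CPU0 returns a LITTLE while one started at CPU2 returns a big, whatever
    the utilization of the task.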
    
    Implementation
    ==============
    
    Introduce yet another select_idle_sibling() helper function that takes CPU
    capacity into account. The policy is to pick the first idle CPU which is
    big enough for the task (task_util * margin < cpu_capacity). If no
    idle CPU is big enough, we pick the idle one with the highest capacity.
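
    The policy above can be sketched in userspace as follows. This is a model
    of the described behaviour, not the kernel helper: struct cpu_model,
    pick_idle_capacity() and the 1280/1024 (~20%) margin are assumptions
    made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

struct cpu_model {
	unsigned long capacity;
	bool idle;
};

/* Assumed margin: task_util * 1.25 must stay below the CPU capacity. */
static bool fits_capacity(unsigned long task_util, unsigned long cpu_cap)
{
	return task_util * 1280 < cpu_cap * 1024;
}

/*
 * Scan from @start with wrap-around. Return the first idle CPU the task
 * fits on; failing that, the idle CPU with the highest capacity; -1 if
 * no CPU is idle.
 */
static int pick_idle_capacity(const struct cpu_model *cpus, size_t nr,
			      size_t start, unsigned long task_util)
{
	unsigned long best_cap = 0;
	int best_cpu = -1;

	for (size_t i = 0; i < nr; i++) {
		size_t cpu = (start + i) % nr;

		if (!cpus[cpu].idle)
			continue;

		if (fits_capacity(task_util, cpus[cpu].capacity))
			return (int)cpu;

		/* Remember the biggest idle CPU as a fallback. */
		if (cpus[cpu].capacity > best_cap) {
			best_cap = cpus[cpu].capacity;
			best_cpu = (int)cpu;
		}
	}

	return best_cpu;
}
```

    Unlike a capacity-unaware scan, the result for a task that only fits on a
    big CPU no longer depends on where the iteration starts, as long as a
    fitting CPU is idle.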
    
    Unlike other select_idle_sibling() helpers, this one operates on the
    sd_asym_cpucapacity sched_domain pointer, which is guaranteed to span all
    known CPU capacities in the system. As such, this will work for both
    "legacy" big.LITTLE (LITTLEs & bigs split at MC, joined at DIE) and for
    newer DynamIQ systems (e.g. LITTLEs and bigs in the same MC domain).
    
    Note that this limits the scope of select_idle_sibling() to
    select_idle_capacity() for asymmetric CPU capacity systems - the LLC domain
    will not be scanned, and no further heuristic will be applied.

    Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
    Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Quentin Perret <qperret@google.com>
    Link: https://lkml.kernel.org/r/20200206191957.12325-2-valentin.schneider@arm.com