    sched/fair: Remove the energy margin in feec() · b812fc97
    Vincent Donnefort authored
    find_energy_efficient_cpu() integrates a margin to protect tasks from
    bouncing back and forth from one CPU to another. This margin is set to
    6% of the total current energy estimated for the system. This, however,
    does not work for two reasons:
    
    1. The energy estimation is not a good absolute value:
    
    compute_energy(), used in feec(), is a good estimation for task placement
    as it allows comparing the energy with and without a task. The computed
    delta gives a good overview of the cost of a given task placement.
    It does not, however, work as an absolute estimation of the total energy
    of the system. First, it adds the contribution of idle CPUs to the
    energy; second, it mixes util_avg with util_est values. util_avg contains
    the recent history of a CPU's usage; it does not tell what the current
    utilization is. A system that has been quite busy in the near past will
    have a very high estimated energy and therefore a high margin, preventing any task
    migration to a lower capacity CPU, wasting energy. It even creates a
    self-reinforcing feedback loop: by holding the tasks on a less efficient
    CPU, the margin contributes to keeping the energy high.
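
    The difference between a placement delta and an absolute estimate can be
    sketched as follows. The linear cost model, the names pd_energy() and
    placement_delta(), and every number are illustrative assumptions; this is
    not the kernel's compute_energy().

```c
/*
 * Hypothetical, simplified model: the energy of a performance domain
 * grows linearly with its summed utilization. The real Energy Model
 * also depends on the selected performance state.
 */
static unsigned long pd_energy(unsigned long cost, unsigned long sum_util)
{
	return cost * sum_util;
}

/*
 * Cost of a placement: the estimate with the task minus the estimate
 * without it. Only this difference is compared between candidate CPUs.
 */
static unsigned long placement_delta(unsigned long cost,
				     unsigned long sum_util,
				     unsigned long task_util)
{
	return pd_energy(cost, sum_util + task_util) -
	       pd_energy(cost, sum_util);
}
```

    In this simplified model, a stale and inflated sum_util raises the
    absolute estimate but leaves the delta unchanged, which is why the delta
    remains usable for comparison while a margin derived from the absolute
    value does not.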
    
    2. The margin handicaps small tasks:
    
    On a system where the workload is composed mostly of small tasks (which is
    often the case on Android), the overall energy will be high enough to
    create a margin none of those tasks can cross. On a Pixel4, a small
    utilization of 5% on all the CPUs creates a global estimated energy of 140
    joules, as per the Energy Model declaration of that same device. This
    means that, after applying the 6% margin, any migration must save more than
    8 joules to happen. No task with a utilization lower than 40 would then be
    able to migrate away from the biggest CPU of the system.
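
    The arithmetic behind this example can be sketched as follows. The helper
    names energy_margin() and migration_allowed() are hypothetical, and the
    margin is modeled as a plain integer 6% of the total estimate, as described
    above; the actual kernel expression may differ.

```c
#include <stdbool.h>

#define MARGIN_PCT 6UL	/* margin quoted in the text above */

/* Minimum saving a migration had to exceed, using integer arithmetic. */
static unsigned long energy_margin(unsigned long total_energy)
{
	return total_energy * MARGIN_PCT / 100;
}

/* With the margin, a migration happens only if it saves more than that. */
static bool migration_allowed(unsigned long saving,
			      unsigned long total_energy)
{
	return saving > energy_margin(total_energy);
}
```

    With a total estimate of 140, energy_margin() yields 8, so a placement
    saving 8 units or fewer is rejected even though it would reduce energy.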
    
    The 6% of the overall system energy was brought by the following patch:
    
     (eb92692b sched/fair: Speed-up energy-aware wake-ups)
    
    It was previously 6% of the prev_cpu energy. Also, the following one
    made this margin value conditional on the clusters where the task fits:
    
     (8d4c97c1 sched/fair: Only compute base_energy_pd if necessary)
    
    We could simply revert that margin change to what it was, but the original
    version didn't have strong grounds either and, as demonstrated in (1.), the
    estimated energy isn't a good absolute value. Instead, remove the margin
    completely. This is made possible by recent changes that improved
    energy estimation comparison fairness (sched/fair: Remove task_util from
    effective utilization in feec()) (PM: EM: Increase energy calculation
    precision) and task utilization stabilization (sched/fair: Decay task
    util_avg during migration).
    
    Without a margin, one could fear tasks bouncing between CPUs. However,
    running LISA's eas_behaviour test coverage on three different platforms
    (Hikey960, RB-5 and DB-845) showed no issue.
    
    Removing the energy margin enables more energy-optimized placements for a
    more energy-efficient system.
    
    Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
    Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
    Tested-by: Lukasz Luba <lukasz.luba@arm.com>
    Link: https://lkml.kernel.org/r/20220621090414.433602-8-vdonnefort@google.com
fair.c 316 KB