    sched/fair: Fix throttle_list starvation with low CFS quota · baa9be4f
    Phil Auld authored
    With a very low cpu.cfs_quota_us setting, such as the minimum of 1000,
    distribute_cfs_runtime may not empty the throttled_list before it runs
    out of runtime to distribute. In that case, due to the change from
    c06f04c7 to put throttled entries at the head of the list, later entries
    on the list will starve.  Essentially, the same small set of processes gets
    pulled off the list and given CPU time, and then, when throttled again, is
    put back at the head of the list, where distribute_cfs_runtime hands runtime
    to that same set again while the rest of the list starves.
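
    To make the failure mode concrete, the sketch below is a small userspace toy
    model of the list walk (illustrative only, not kernel code; the task count,
    per-period budget and variable names are made up).  Each period only BUDGET
    entries can be refilled before the distributable runtime runs out, and the
    refilled entries are re-queued either at the head (the behaviour described
    above) or at the tail:

      /* Toy model of the throttled-list walk: illustrative only, not kernel code. */
      #include <stdio.h>
      #include <string.h>

      #define NTASKS  8
      #define BUDGET  3       /* entries refilled per period before runtime runs out */
      #define PERIODS 100

      static void simulate(int requeue_at_tail)
      {
              int list[NTASKS], serviced[NTASKS] = { 0 };
              int refilled[NTASKS], rest[NTASKS];
              int i, p, n, m;

              for (i = 0; i < NTASKS; i++)
                      list[i] = i;

              for (p = 0; p < PERIODS; p++) {
                      n = m = 0;
                      /* walk from the head until the per-period budget is exhausted */
                      for (i = 0; i < NTASKS; i++) {
                              if (n < BUDGET) {
                                      serviced[list[i]]++;
                                      refilled[n++] = list[i];
                              } else {
                                      rest[m++] = list[i];
                              }
                      }
                      /* the refilled entries throttle again and are re-queued */
                      if (requeue_at_tail) {
                              memcpy(list, rest, m * sizeof(int));
                              memcpy(list + m, refilled, n * sizeof(int));
                      } else {
                              memcpy(list, refilled, n * sizeof(int));
                              memcpy(list + n, rest, m * sizeof(int));
                      }
              }

              printf("%s re-queue:", requeue_at_tail ? "tail" : "head");
              for (i = 0; i < NTASKS; i++)
                      printf(" task%d=%d", i, serviced[i]);
              printf("\n");
      }

      int main(void)
      {
              simulate(0);    /* head: the same BUDGET tasks are serviced every period */
              simulate(1);    /* tail: service rotates and no task starves */
              return 0;
      }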
    
    Fix the issue by setting a bit in struct cfs_bandwidth when
    distribute_cfs_runtime is running, so that the code in throttle_cfs_rq can
    decide to put the throttled entry on the tail or the head of the list.  The
    bit is set/cleared by the callers of distribute_cfs_runtime while they hold
    cfs_bandwidth->lock.
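
    Roughly the shape of the change (a sketch, not the verbatim diff; the flag
    name distribute_running and the surrounding code are illustrative):

      struct cfs_bandwidth {
              ...
              bool distribute_running;   /* set while distribute_cfs_runtime() runs */
      };

      /* Callers of distribute_cfs_runtime(), e.g. the period timer, set and
       * clear the flag while holding cfs_b->lock (the lock itself is dropped
       * around the distribution): */
      cfs_b->distribute_running = 1;
      raw_spin_unlock(&cfs_b->lock);
      /* ... distribute_cfs_runtime() walks the throttled list here ... */
      raw_spin_lock(&cfs_b->lock);
      cfs_b->distribute_running = 0;

      /* throttle_cfs_rq(): an in-flight distribution must not see the newly
       * throttled entry, so add it at the head; otherwise add it at the tail
       * so that entries already on the list are not starved. */
      if (cfs_b->distribute_running)
              list_add_rcu(&cfs_rq->throttled_list, &cfs_b->throttled_cfs_rq);
      else
              list_add_tail_rcu(&cfs_rq->throttled_list, &cfs_b->throttled_cfs_rq);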
    
    This is easy to reproduce with a handful of CPU consumers. I use 'crash' on
    the live system. In some cases you can simply look at the throttled list and
    see the later entries are not changing:
    
      crash> list cfs_rq.throttled_list -H 0xffff90b54f6ade40 -s cfs_rq.runtime_remaining | paste - - | awk '{print $1"  "$4}' | pr -t -n3
        1     ffff90b56cb2d200  -976050
        2     ffff90b56cb2cc00  -484925
        3     ffff90b56cb2bc00  -658814
        4     ffff90b56cb2ba00  -275365
        5     ffff90b166a45600  -135138
        6     ffff90b56cb2da00  -282505
        7     ffff90b56cb2e000  -148065
        8     ffff90b56cb2fa00  -872591
        9     ffff90b56cb2c000  -84687
       10     ffff90b56cb2f000  -87237
       11     ffff90b166a40a00  -164582
    
      crash> list cfs_rq.throttled_list -H 0xffff90b54f6ade40 -s cfs_rq.runtime_remaining | paste - - | awk '{print $1"  "$4}' | pr -t -n3
        1     ffff90b56cb2d200  -994147
        2     ffff90b56cb2cc00  -306051
        3     ffff90b56cb2bc00  -961321
        4     ffff90b56cb2ba00  -24490
        5     ffff90b166a45600  -135138
        6     ffff90b56cb2da00  -282505
        7     ffff90b56cb2e000  -148065
        8     ffff90b56cb2fa00  -872591
        9     ffff90b56cb2c000  -84687
       10     ffff90b56cb2f000  -87237
       11     ffff90b166a40a00  -164582
    
    Sometimes it is easier to see the problem by finding a process that is being
    starved and looking at its sched_info; none of the counters advance between
    successive reads, i.e. the task never runs:
    
      crash> task ffff8eb765994500 sched_info
      PID: 7800   TASK: ffff8eb765994500  CPU: 16  COMMAND: "cputest"
        sched_info = {
          pcount = 8,
          run_delay = 697094208,
          last_arrival = 240260125039,
          last_queued = 240260327513
        },
      crash> task ffff8eb765994500 sched_info
      PID: 7800   TASK: ffff8eb765994500  CPU: 16  COMMAND: "cputest"
        sched_info = {
          pcount = 8,
          run_delay = 697094208,
          last_arrival = 240260125039,
          last_queued = 240260327513
        },

    Signed-off-by: Phil Auld <pauld@redhat.com>
    Reviewed-by: Ben Segall <bsegall@google.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: stable@vger.kernel.org
    Fixes: c06f04c7 ("sched: Fix potential near-infinite distribute_cfs_runtime() loop")
    Link: http://lkml.kernel.org/r/20181008143639.GA4019@pauld.bos.csb
    Signed-off-by: Ingo Molnar <mingo@kernel.org>