    blkcg: implement blk-iocost · 7caa4715
    Tejun Heo authored
    This patchset implements an IO cost model based work-conserving
    proportional controller.
    
    While io.latency provides the capability to comprehensively prioritize
    and protect IOs depending on the cgroups, its protection is binary -
    the lowest latency target cgroup which is suffering is protected at
    the cost of all others.  In many use cases including stacking multiple
    workload containers in a single system, it's necessary to distribute
    IO capacity with better granularity.
    
    One challenge of controlling IO resources is the lack of a trivially
    observable cost metric.  The most common metrics - bandwidth and iops
    - can be off by orders of magnitude depending on the device type and
    IO pattern.  However, the cost isn't a complete mystery.  Given
    several key attributes, we can make fairly reliable predictions on how
    expensive a given stream of IOs would be, at least compared to other
    IO patterns.
    
    The function which determines the cost of a given IO is the IO cost
    model for the device.  This controller distributes IO capacity based
    on the costs estimated by such a model.  The more accurate the cost
    model the better, but the controller adapts based on IO completion
    latency, and as long as the relative costs across different IO
    patterns are consistent and sensible, it'll adapt to the actual
    performance of the device.
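
As a rough illustration of cost-based proportional distribution, the sketch below computes each group's share of device capacity from its weight and scales an IO's estimated cost by the inverse of that share. Only active groups count toward the sum, which is what makes the scheme work-conserving. The struct, the function names, and the 16-bit fixed-point scale are assumptions made for this example and are not taken from blk-iocost.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-group state; not the kernel's data structure. */
struct grp {
	uint32_t weight;	/* configured weight */
	int active;		/* nonzero while the group is issuing IO */
};

/*
 * Share of device capacity for group @idx, as a fraction of 65536.
 * Idle groups are skipped so their capacity goes to the active ones.
 * Returns 0 for an idle group or when the active weight sum is zero.
 */
static uint32_t grp_share(const struct grp *grps, size_t n, size_t idx)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (grps[i].active)
			sum += grps[i].weight;

	if (!grps[idx].active || sum == 0)
		return 0;

	return (uint32_t)(((uint64_t)grps[idx].weight << 16) / sum);
}

/*
 * Cost charged to a group for one IO: the model's estimate divided by
 * the group's share, so a group with a smaller share uses up its
 * budget faster for the same IO.
 */
static uint64_t charged_cost(uint64_t est_cost, uint32_t share)
{
	if (share == 0)
		return UINT64_MAX;
	return (est_cost << 16) / share;
}
```

With weights 100 and 300 active and a third group idle, the first group gets a quarter of the capacity, so an IO with an estimated cost of 1000 is charged as 4000.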
    
    Currently, the only implemented cost model is a simple linear one with
    a few sets of default parameters for different classes of device.
    This covers most common devices reasonably well.  All the
    infrastructure to tune and add different cost models is already in
    place and a later patch will also allow using bpf progs for cost
    models.
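
A minimal sketch of what a linear cost model can look like, assuming the cost is a fixed per-IO base (different for sequential and random IOs) plus a per-page term. The struct, the field names, and the coefficient values in the usage note are hypothetical and chosen only for illustration; see blk-iocost.c for the actual model and its default parameters.

```c
#include <stdint.h>

/* Hypothetical coefficients for one IO direction, in arbitrary cost units. */
struct lin_coefs {
	uint64_t seq_io;	/* base cost of a sequential IO */
	uint64_t rand_io;	/* base cost of a random IO */
	uint64_t page;		/* additional cost per page transferred */
};

/* Linear estimate: base cost picked by access pattern plus a size term. */
static uint64_t linear_cost(const struct lin_coefs *c, int is_seq,
			    uint32_t pages)
{
	uint64_t base = is_seq ? c->seq_io : c->rand_io;

	return base + (uint64_t)pages * c->page;
}
```

With made-up coefficients seq_io = 10, rand_io = 100 and page = 5, an 8-page sequential IO costs 50 while the same size random IO costs 140, which reflects the kind of relative difference between IO patterns the controller relies on.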
    
    Please see the top comment in blk-iocost.c and documentation for
    more details.
    
    v2: Rebased on top of RQ_ALLOC_TIME changes and folded in Rik's fix
        for a divide-by-zero bug in current_hweight() triggered by zero
        inuse_sum.
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Andy Newell <newella@fb.com>
    Cc: Josef Bacik <jbacik@fb.com>
    Cc: Rik van Riel <riel@surriel.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
cgroup-v2.rst 91.9 KB