• Tejun Heo's avatar
    blk-iocost: Account force-charged overage in absolute vtime · 36a52481
    Tejun Heo authored
    Currently, when a bio needs to be force-charged and there isn't enough
    budget, vtime is simply pushed into the future.  This means that the
    cost of the whole bio is scaled using the current hweight and then
    charged immediately.  Until the global vtime advances beyond this
    future vtime, the cgroup won't be allowed to issue normal IOs.
    
    This is incorrect and can lead to, for example, exploding vrate or
    extended stalls if vrate range is constrained.  Consider the following
    scenario.
    
    1. A cgroup with a very low hweight runs out of budget.
    
    2. A storm of swap-out happens on it.  All of them are scaled
       according to the current low hweight and charged to vtime pushing
       it to a far future.
    
    3. All other cgroups go idle and now the above cgroup has access to
       the whole device.  However, because vtime is already wound using
       the past low hweight, what its current hweight is doesn't matter
       until global vtime catches up to the local vtime.
    
    4. As a result, either vrate gets ramped up extremely or the IOs stall
       while the underlying device is idle.
    
    This is because the hweight the overage is calculated at is different
    from the hweight that it's being paid at.
    
    Fix it by remembering the overage in absoulte vtime and continuously
    paying with the actual budget according to the current hweight at each
    period.
    
    Note that non-forced bios which wait already remembers the cost in
    absolute vtime.  This brings forced-bio accounting in line.
    Signed-off-by: default avatarTejun Heo <tj@kernel.org>
    Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
    36a52481
blk-iocost.c 66.1 KB