    block/blk-mq: Don't complete locally if capacities are different · af550e4c
    Qais Yousef authored
    The logic in blk_mq_complete_need_ipi() assumes SMP systems where all
    CPUs have equal compute capacities and only LLC cache can make
    a difference in perceived performance. But this assumption falls apart on
    HMP systems where LLC is shared, but the CPUs have different capacities.
    Staying local then can have a big performance impact if the IO request
    was done from a CPU with higher capacity but the interrupt is serviced
    on a lower capacity CPU.
    
    Use the new cpus_equal_capacity() function to check if we need to send
    an IPI.
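
The decision described above can be modelled in userspace roughly as follows. This is a minimal sketch, not the kernel code: `struct model_cpu`, the `model_*` helpers, the CPU layout and the capacity values are all hypothetical, and the other checks the real blk_mq_complete_need_ipi() performs are left out.

```c
#include <stdbool.h>

/* Hypothetical per-CPU description; not a kernel structure. */
struct model_cpu {
	int llc_id;             /* CPUs with equal ids share the LLC */
	unsigned long capacity; /* relative compute capacity (assumed values) */
};

/* Assumed layout: CPUs 0-3 are littles, 4-5 are bigs, one shared LLC. */
static const struct model_cpu model_cpus[] = {
	{ 0, 256 }, { 0, 256 }, { 0, 256 }, { 0, 256 },
	{ 0, 1024 }, { 0, 1024 },
};

static bool model_share_cache(int a, int b)
{
	return model_cpus[a].llc_id == model_cpus[b].llc_id;
}

static bool model_equal_capacity(int a, int b)
{
	return model_cpus[a].capacity == model_cpus[b].capacity;
}

/* Old behaviour as described: a shared LLC is enough to stay local. */
static bool model_need_ipi_old(int irq_cpu, int req_cpu)
{
	if (irq_cpu == req_cpu || model_share_cache(irq_cpu, req_cpu))
		return false;
	return true;
}

/* New behaviour: staying local also requires equal capacities. */
static bool model_need_ipi_new(int irq_cpu, int req_cpu)
{
	if (irq_cpu == req_cpu)
		return false;
	if (model_share_cache(irq_cpu, req_cpu) &&
	    model_equal_capacity(irq_cpu, req_cpu))
		return false;
	return true;
}
```

With the interrupt serviced on little CPU 0 and the request issued from big CPU 4, the old check stays local while the new one asks for an IPI. Two littles sharing the LLC still complete locally.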
    
    Without the patch I see the BLOCK softirq always running on little cores
    (where the hardirq is serviced). With it I can see it running on all
    cores.
    
    This was noticed after the topology change [1], where on a big.LITTLE
    system the LLC is now correctly reported as shared between all cores,
    whereas in the past it was misrepresented for historical reasons. The
    change exposed a missing dependency on capacities for such systems,
    where there can be a big performance difference between the CPUs.
    
    This of course introduced a noticeable change in behavior depending on
    how the topology is presented, leading to regressions in some workloads
    as the performance of the BLOCK softirq on littles can be noticeably
    worse on some platforms.
    
    Worth noting that we could have checked for capacities being greater
    than or equal instead of equality. That would always favour higher
    performance. But we opted for equality to match the performance of the
    requester without making an assumption that can lead to power
    trade-offs, which these systems tend to be sensitive about. If the
    requester would like to run faster, it's better to rely on the
    scheduler to place the IO requester, via some facility, on a faster
    core; then if the interrupt triggers on a CPU with a different
    capacity, we'll make sure to match the performance the requester is
    supposed to run at.
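
The trade-off between the two comparisons can be sketched as below. This is an illustrative model only: the `policy_*` names and the capacity constants are hypothetical, and both CPUs are assumed to share the LLC.

```c
#include <stdbool.h>

/* Hypothetical capacities for illustration: a little and a big CPU. */
enum { POLICY_CAP_LITTLE = 256, POLICY_CAP_BIG = 1024 };

/* Policy chosen by the patch: stay local only when capacities match. */
static bool policy_local_if_equal(unsigned long irq_cap, unsigned long req_cap)
{
	return irq_cap == req_cap;
}

/* Alternative discussed: also stay local when the IRQ CPU is faster. */
static bool policy_local_if_ge(unsigned long irq_cap, unsigned long req_cap)
{
	return irq_cap >= req_cap;
}
```

The two policies differ only when the interrupt lands on a big CPU for a request issued from a little one. The greater-than-or-equal variant would keep the completion on the big CPU, while the equality check sends it back to the requester's CPU and so does not spend big-core time the requester never asked for.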
    
    [1] https://lpc.events/event/16/contributions/1342/attachments/962/1883/LPC-2022-Android-MC-Phantom-Domains.pdf

    Signed-off-by: Qais Yousef <qyousef@layalina.io>
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Link: https://lore.kernel.org/r/20240223155749.2958009-3-qyousef@layalina.io
    Signed-off-by: Jens Axboe <axboe@kernel.dk>