    drm/i915/cmdparser: Use cached vmappings · 0b537272
    Chris Wilson authored
    The single largest factor in the overhead of parsing the commands is the
    setup of the virtual mapping to provide a contiguous block for the batch
    buffer. If we keep those vmappings around (against the better judgement
    of mm/vmalloc.c, which we offset by handwaving and looking suggestively
    at the shrinker) we can dramatically improve the performance of the
    parser for small batches (such as media workloads). Furthermore, we can
    use the prepare shmem read/write functions to determine how best to
    clflush the range (rather than every page of the object).
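
    As a rough userspace illustration of the caching idea only (this is not
    the i915 code): the sketch below keeps an expensively built contiguous
    view attached to an object and reuses it on later calls. Every name here
    (struct fake_obj, fake_obj_pin_map, fake_obj_release_map) is hypothetical,
    and a malloc'ed copy stands in for a real virtual mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ 4096u

/* Hypothetical stand-in for a buffer object: scattered pages plus a cached view. */
struct fake_obj {
    unsigned char **pages;   /* individually allocated "pages" */
    size_t npages;
    unsigned char *map;      /* cached contiguous view, NULL until first use */
    unsigned int map_setups; /* how many times the expensive setup ran */
};

/* Expensive step: build a contiguous view of the scattered pages.
 * Assumption: a copy is used here purely to model the setup cost. */
static unsigned char *fake_obj_build_map(struct fake_obj *obj)
{
    unsigned char *map = malloc(obj->npages * PAGE_SZ);

    if (!map)
        return NULL;
    for (size_t i = 0; i < obj->npages; i++)
        memcpy(map + i * PAGE_SZ, obj->pages[i], PAGE_SZ);
    obj->map_setups++;
    return map;
}

/* Return the cached view, building it only when none is cached. */
static unsigned char *fake_obj_pin_map(struct fake_obj *obj)
{
    assert(obj != NULL);
    if (!obj->map)
        obj->map = fake_obj_build_map(obj);
    return obj->map;
}

/* Drop the cached view, e.g. when memory needs to be reclaimed. */
static void fake_obj_release_map(struct fake_obj *obj)
{
    free(obj->map);
    obj->map = NULL;
}
```

    With many small batches, repeated fake_obj_pin_map() calls pay the setup
    cost once; the view is only rebuilt after fake_obj_release_map() has
    dropped it.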
    
    The impact of caching both src/dst vmaps is +80% on ivb and +140% on byt
    for the throughput on small batches. (Caching just the dst vmap and
    iterating over the src, doing a page-by-page copy, is roughly 5% slower
    on both platforms. That may be an acceptable trade-off to eliminate one
    cached vmapping, and we may be able to reduce the per-page copying overhead
    further.) For *this* simple test case, the cmdparser is now within a
    factor of 2 of ideal performance.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Matthew Auld <matthew.william.auld@gmail.com>
    Reviewed-by: Matthew Auld <matthew.auld@intel.com>
    Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-33-chris@chris-wilson.co.uk
i915_gem_execbuffer.c 53.1 KB