• Mark Hairgrove's avatar
    powerpc/powernv/npu: Use size-based ATSD invalidates · 3689c37d
    Mark Hairgrove authored
    Prior to this change only two types of ATSDs were issued to the NPU:
    invalidates targeting a single page and invalidates targeting the whole
    address space. The crossover point happened at the configurable
    atsd_threshold which defaulted to 2M. Invalidates that size or smaller
    would issue per-page invalidates for the whole range.
    
    The NPU supports more invalidation sizes however: 64K, 2M, 1G, and all.
    These invalidates target addresses aligned to their size. 2M is a common
    invalidation size for GPU-enabled applications because that is a GPU
    page size, so reducing the number of invalidates by 32x in that case is a
    clear improvement.
    
    ATSD latency is high in general so now we always issue a single invalidate
    rather than multiple. This will over-invalidate in some cases, but for any
    invalidation size over 2M it matches or improves the prior behavior.
    There's also an improvement for single-page invalidates since the prior
    version issued two invalidates for that case instead of one.
    
    With this change all issued ATSDs now perform a flush, so the flush
    parameter has been removed from all the helpers.
    
    To show the benefit here are some performance numbers from a
    microbenchmark which creates a 1G allocation then uses mprotect with
    PROT_NONE to trigger invalidates in strides across the allocation.
    
    One NPU (1 GPU):
    
             mprotect rate (GB/s)
    Stride   Before      After      Speedup
    64K         5.3        5.6           5%
    1M         39.3       57.4          46%
    2M         49.7       82.6          66%
    4M        286.6      285.7           0%
    
    Two NPUs (6 GPUs):
    
             mprotect rate (GB/s)
    Stride   Before      After      Speedup
    64K         6.5        7.4          13%
    1M         33.4       67.9         103%
    2M         38.7       93.1         141%
    4M        356.7      354.6          -1%
    
    Anything over 2M is roughly the same as before since both cases issue a
    single ATSD.
    Signed-off-by: default avatarMark Hairgrove <mhairgrove@nvidia.com>
    Reviewed-By: default avatarAlistair Popple <alistair@popple.id.au>
    Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
    3689c37d
npu-dma.c 25.4 KB