    drm/i915/guc: Don't update engine busyness stats too frequently · 59bcdb56
    Alan Previn authored
    Using two different types of workloads, it was observed that
    guc_update_engine_gt_clks was being called too frequently and/or
    causing a CPU-to-lmem bandwidth hit over PCIe. Details on
    the workloads and numbers are in the notes below.
    
    Background: At the moment, guc_update_engine_gt_clks can be invoked
    in one of three ways. #1 and #2 are infrequent under normal operating
    conditions:
         1. When a predefined "ping_delay" timer expires, so that GuC
            busyness can sample the GTPM clock counter to ensure it
            doesn't miss a wrap-around of the 32 bits of the HW counter.
            (The ping_delay is calculated as 1/8th of the time taken
            for the counter to go from 0x0 to 0xffffffff at the current
            GT frequency; this comes to about once every 28 seconds at a
            GT frequency of 19.2 MHz.)
         2. In preparation for a gt reset.
         3. In response to __gt_park events (the gt power management
            puts the gt into a lower power state when there is no work
            being done).
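
    The ping_delay arithmetic in item 1 can be sanity-checked with a small
    standalone calculation. The 19.2 MHz frequency and the 1/8th factor come
    from the description above; the helper names are illustrative only, not
    symbols from the driver:

    ```c
    #include <stdint.h>
    #include <stdio.h>

    /* Time, in seconds, for a 32-bit counter to wrap at gt_freq_hz. */
    static uint64_t wrap_seconds(uint64_t gt_freq_hz)
    {
        return (1ULL << 32) / gt_freq_hz;
    }

    /* The quoted ping_delay: 1/8th of the wrap time. */
    static uint64_t ping_delay_seconds(uint64_t gt_freq_hz)
    {
        return wrap_seconds(gt_freq_hz) / 8;
    }

    int main(void)
    {
        /* At 19.2 MHz the counter wraps after ~223 s, so the ping fires
         * roughly every 27-28 s, matching "about 28 seconds" above. */
        printf("wrap: %llu s, ping_delay: %llu s\n",
               (unsigned long long)wrap_seconds(19200000ULL),
               (unsigned long long)ping_delay_seconds(19200000ULL));
        return 0;
    }
    ```
    
    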
    
    Root-cause: For both workloads described further below, it was
    observed that when user space issues IOCTLs that unpark the gt
    momentarily, and repeats such calls many times in quick succession,
    guc_update_engine_gt_clks is called just as many times. However,
    the primary purpose of guc_update_engine_gt_clks is to ensure we don't
    miss a wrap-around while the counter is ticking. Thus, the solution
    is to skip that check when gt_park calls this function earlier
    than necessary.
    
    Solution: Snapshot jiffies when we actually update the busyness
    stats. Then read jiffies again every time intel_guc_busyness_park
    is called and bail out if we are being called too soon. Use half of
    the ping_delay as a safe threshold.
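
    A minimal standalone sketch of that throttling idea, using a simulated
    jiffies counter. All names here (update_busyness_stats, busyness_park,
    the 28000-jiffy delay) are illustrative stand-ins, not the driver's
    actual symbols:

    ```c
    #include <stdio.h>

    static unsigned long jiffies;            /* fake clock, in jiffies */
    static unsigned long last_stat_jiffies;  /* snapshot at last real update */
    static unsigned long ping_delay_jiffies = 28000; /* ~28 s at HZ=1000 */
    static int update_calls;

    static void update_busyness_stats(void)
    {
        last_stat_jiffies = jiffies;  /* snapshot only when we really update */
        update_calls++;
    }

    /* Called on every park event; bail out if the last update was less
     * than half a ping_delay ago, since the counter cannot have wrapped. */
    static void busyness_park(void)
    {
        if (jiffies - last_stat_jiffies < ping_delay_jiffies / 2)
            return;  /* too soon: skip the expensive read */
        update_busyness_stats();
    }

    int main(void)
    {
        update_busyness_stats();        /* initial snapshot at t = 0 */
        for (jiffies = 1; jiffies < 10000; jiffies++)
            busyness_park();            /* rapid park events: all skipped */
        jiffies = 15000;                /* more than ping_delay/2 later */
        busyness_park();                /* this one goes through */
        printf("updates: %d\n", update_calls);
        return 0;
    }
    ```

    Under this scheme a burst of parks collapses into a single update, which
    is the effect the call-count numbers in the notes below are measuring.
    
    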
    
    NOTE1: Workload1: IGT's gem_create was modified to create a file handle
    and allocate memory with sizes ranging from a minimum of 4K to the max
    supported (in power-of-two step sizes). It maps, modifies, and reads back
    the memory. Allocation and modification are repeated until the total
    memory allocated reaches the max. Then the file handle is closed. With
    this workload, guc_update_engine_gt_clks was called over 188 thousand
    times in the span of 15 seconds while this test ran three times. With
    this patch, the number of calls was reduced to 14.
    
    NOTE2: Workload2: 30 transcode sessions are created in quick succession.
    While these sessions are created, the pcm-iio tool was used to measure I/O
    read-operation bandwidth consumption, sampled at 100 millisecond intervals
    over the course of 20 seconds. Without this patch, the bandwidth consumed
    averaged 311 KBps per sample. With this patch, the number went down to
    about 175 KBps, which is about a 43% savings.
    Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
    Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
    Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20220623023157.211650-2-alan.previn.teres.alexis@intel.com