    drm/i915: Dynamic Parity Detection handling · e3689190
    Ben Widawsky authored
    On IVB hardware we are given an interrupt whenever a parity error
    occurs in the L3 cache. The L3 cache is used by internal GPU clients
    only.  This is a very rare occurrence (in fact, to test this I need to
    use specially instrumented silicon).
    
    When a row in the L3 cache detects a parity error the HW generates an
    interrupt. The interrupt is masked in GTIMR until we get a chance to
    read some registers and alert userspace via a uevent. With this
    information userspace can use a sysfs interface (follow-up patch) to
    remap those rows.
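
    The flow above (mask, read, notify, unmask) can be sketched in plain
    userspace C. This is only an illustration under stated assumptions:
    struct fake_gpu, the FAKE_* bit positions and the printed key names
    are hypothetical stand-ins, not the real IVB register layout or the
    driver's actual uevent code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical register model. The field layout below is an assumption
 * made for illustration, not the real IVB bit definitions. */
struct fake_gpu {
    uint32_t gtimr;         /* interrupt mask register: set bit = masked */
    uint32_t l3_err_status; /* stand-in for the L3 error status register */
};

#define FAKE_L3_PARITY_IRQ    (1u << 5)   /* assumed interrupt bit */
#define FAKE_ERR_VALID        (1u << 13)  /* assumed "error latched" bit */
#define FAKE_ERR_ROW(reg)     (((reg) >> 14) & 0x7ffu)
#define FAKE_ERR_BANK(reg)    (((reg) >> 11) & 0x3u)
#define FAKE_ERR_SUBBANK(reg) (((reg) >> 8) & 0x7u)

struct l3_parity_event {
    unsigned row, bank, subbank;
};

/* Interrupt time: mask the source so it cannot fire again until the
 * deferred work has read the error registers. */
void parity_irq(struct fake_gpu *gpu)
{
    gpu->gtimr |= FAKE_L3_PARITY_IRQ;
}

/* Deferred work: decode the latched error, clear it, report it, and
 * only then unmask. Returns 1 if an event was reported, 0 otherwise. */
int parity_work(struct fake_gpu *gpu, struct l3_parity_event *ev)
{
    uint32_t reg = gpu->l3_err_status;
    int valid = (reg & FAKE_ERR_VALID) != 0;

    /* The interrupt must still be masked while we read the registers. */
    assert(gpu->gtimr & FAKE_L3_PARITY_IRQ);

    if (valid) {
        ev->row = FAKE_ERR_ROW(reg);
        ev->bank = FAKE_ERR_BANK(reg);
        ev->subbank = FAKE_ERR_SUBBANK(reg);
        gpu->l3_err_status = 0; /* clear the latched error */

        /* Stand-in for the uevent sent to userspace. */
        printf("L3_PARITY_ERROR=1 ROW=%u BANK=%u SUBBANK=%u\n",
               ev->row, ev->bank, ev->subbank);
    }

    gpu->gtimr &= ~FAKE_L3_PARITY_IRQ; /* unmask again */
    return valid;
}
```

    Keeping the interrupt masked between parity_irq() and the end of
    parity_work() means a second error cannot overwrite the status
    register before the first one has been reported.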
    
    Way above my level of understanding, but if a given row fails, it is
    statistically more likely to fail again than a row which has not failed.
    Therefore it is desirable for an operating system to maintain a lifelong
    list of failing rows and always remap any bad rows on driver load.
    Hardware limits the number of rows that are remappable per bank/subbank,
    and should more than that many rows detect parity errors, software
    should maintain a list of the most frequent errors, and remap those
    rows.
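
    One way userspace could keep such a list is sketched below. It is a
    minimal illustration, not part of this patch: struct row_log,
    MAX_TRACKED, row_log_record() and row_log_pick() are hypothetical
    names, and the per-bank/subbank limit is passed in as a parameter
    because the actual hardware limit is not stated here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TRACKED 64 /* assumed capacity of the persistent list */

struct row_stat {
    unsigned bank, subbank, row;
    unsigned count; /* number of parity errors seen on this row */
};

struct row_log {
    struct row_stat e[MAX_TRACKED];
    size_t n;
};

/* Record one parity error. Returns 0 on success, -1 if the log is full. */
int row_log_record(struct row_log *hist, unsigned bank, unsigned subbank,
                   unsigned row)
{
    for (size_t i = 0; i < hist->n; i++) {
        struct row_stat *s = &hist->e[i];

        if (s->bank == bank && s->subbank == subbank && s->row == row) {
            s->count++;
            return 0;
        }
    }
    if (hist->n == MAX_TRACKED)
        return -1;
    hist->e[hist->n++] = (struct row_stat){ bank, subbank, row, 1 };
    return 0;
}

/* Pick up to `limit` rows of one bank/subbank, most frequent first.
 * Ties go to the row recorded earlier. Returns how many were written
 * to `out`, which must have room for `limit` entries. */
size_t row_log_pick(const struct row_log *hist, unsigned bank,
                    unsigned subbank, size_t limit, unsigned *out)
{
    unsigned char used[MAX_TRACKED] = {0};
    size_t chosen = 0;

    assert(hist->n <= MAX_TRACKED);

    while (chosen < limit) {
        size_t best = MAX_TRACKED; /* sentinel: nothing found yet */

        for (size_t i = 0; i < hist->n; i++) {
            const struct row_stat *s = &hist->e[i];

            if (used[i] || s->bank != bank || s->subbank != subbank)
                continue;
            if (best == MAX_TRACKED || s->count > hist->e[best].count)
                best = i;
        }
        if (best == MAX_TRACKED)
            break; /* fewer failing rows than the limit */
        used[best] = 1;
        out[chosen++] = hist->e[best].row;
    }
    return chosen;
}
```

    On driver load, the rows returned by row_log_pick() for each
    bank/subbank would be the ones written to the remap interface.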
    
    V2: Drop WARN_ON(IS_GEN6) (Jesse)
    DRM_DEBUG row/bank/subbank on error (Jesse)
    Comment updates (Jesse)
    Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
i915_reg.h 155 KB