    habanalabs: complete user context cleanup before hard reset · f650a95b
    Omer Shpigelman authored
    This patch fixes a bug that led to a crash during the hard reset flow.
    Before a hard reset is executed, we wait a few seconds for the user
    context cleanup to complete.
    If it hasn't completed by then, we kill the user process and move on to
    the reset flow.
    Killing the user process starts the context cleanup flow, which may take
    a while because of MMU unmaps.
    Meanwhile, the driver reset flow changes the location of the PCI DRAM
    BAR, which can interfere with the MMU, since the MMU uses that BAR.
    If the context cleanup flow doesn't finish quickly, a crash may occur
    because the PCI DRAM BAR was relocated while the MMU unmap was still
    in progress.
    Hence, add a wait between killing the user process and the start of the
    reset flow.
    Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
    Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>