habanalabs: increase timeout during reset

When doing training, the DL framework (e.g. tensorflow) performs hundreds of thousands of memory allocations and mappings. In case the driver needs to perform hard-reset during training, the driver kills the application and unmaps all those memory allocations. Unfortunately, because of that large amount of mappings, the driver isn't able to do that in the current timeout (5 seconds). Therefore, increase the timeout significantly to 30 seconds to avoid situation where the driver resets the device with active mappings, which sometime can cause a kernel bug. BTW, it doesn't mean we will spend all the 30 seconds because the reset thread checks every one second if the unmap operation is done. Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

habanalabs: increase timeout during reset
When doing training, the DL framework (e.g. tensorflow) performs hundreds of thousands of memory allocations and mappings. In case the driver needs to perform hard-reset during training, the driver kills the application and unmaps all those memory allocations. Unfortunately, because of that large amount of mappings, the driver isn't able to do that in the current timeout (5 seconds). Therefore, increase the timeout significantly to 30 seconds to avoid situation where the driver resets the device with active mappings, which sometime can cause a kernel bug. BTW, it doesn't mean we will spend all the 30 seconds because the reset thread checks every one second if the unmap operation is done. Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
7a65ee04 · Oded Gabbay · 49aba0bb · 7a65ee04
Commit 7a65ee04 authored Mar 27, 2020 by Oded Gabbay
Hide whitespace changes
Inline Side-by-side

Showing with 1 addition and 1 deletion

drivers/misc/habanalabs/habanalabs.h drivers/misc/habanalabs/habanalabs.h +1 -1

No files found.
--- a/drivers/misc/habanalabs/habanalabs.h
+++ b/drivers/misc/habanalabs/habanalabs.h
@@ -23,7 +23,7 @@

 #define HL_MMAP_CB_MASK			(0x8000000000000000ull >> PAGE_SHIFT)

-#define HL_PENDING_RESET_PER_SEC	5
+#define HL_PENDING_RESET_PER_SEC	30

 #define HL_DEVICE_TIMEOUT_USEC		1000000 /* 1 s */