    drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs · 36988070
    Rajneesh Bhardwaj authored
    Checkpoint-Restore in userspace (CRIU) is a powerful tool that can
    snapshot a running process and later restore it on the same or a
    remote machine. However, CRIU expects processes that have a device
    file (e.g. a GPU) associated with them to provide the necessary
    driver support to assist CRIU and its extensible plugin interface.
    Thus, in order to support Checkpoint-Restore of any ROCm process,
    the AMD Radeon Open Compute Kernel driver needs to provide a set of
    new APIs that supply the necessary VRAM metadata and contents to a
    userspace component (the CRIU plugin), which can store them in the
    form of image files.
    
    This introduces new ioctls that will be used to checkpoint and
    restore any KFD-bound user process. KFD only allows ioctl calls from
    the same process that opened the KFD file descriptor. Since these
    ioctls are expected to be called from a KFD CRIU plugin, which has
    elevated ptrace-attached privileges and CAP_CHECKPOINT_RESTORE
    capabilities attached to the file descriptors, modify KFD to allow
    such calls.
    
    (API redesigned by David Yat Sin)
    Suggested-by: Felix Kuehling <felix.kuehling@amd.com>
    Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Signed-off-by: David Yat Sin <david.yatsin@amd.com>
    Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>