drm/amdkfd: fix a use after free race with mmu_notifer unregister
When using mmu_notifer_unregister_no_release() the caller must ensure there is a SRCU synchronize before the mn memory is freed, otherwise use after free races are possible, for instance: CPU0 CPU1 invalidate_range_start hlist_for_each_entry_rcu(..) mmu_notifier_unregister_no_release(&p->mn) kfree(mn) if (mn->ops->invalidate_range_end) The error unwind in amdkfd misses the SRCU synchronization. amdkfd keeps the kfd_process around until the mm is released, so split the flow to fully initialize the kfd_process and register it for find_process, and with the notifier. Past this point the kfd_process does not need to be cleaned up as it is fully ready. The final failable step does a vm_mmap() and does not seem to impact the kfd_process global state. Since it also cannot be undone (and already has problems with undo if it internally fails), it has to be last. This way we don't have to try to unwind the mmu_notifier_register() and avoid the problem with the SRCU. Along the way this also fixes various other error unwind bugs in the flow. Fixes: 45102048 ("amdkfd: Add process queue manager module") Link: https://lore.kernel.org/r/20190806231548.25242-10-jgg@ziepe.caReviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Showing
Please register or sign in to comment