    drm/amdkfd: fix a use after free race with mmu_notifier unregister · 0029cab3
    Jason Gunthorpe authored
    When using mmu_notifier_unregister_no_release() the caller must ensure
    there is an SRCU synchronization before the mn memory is freed; otherwise
    use-after-free races are possible, for instance:
    
         CPU0                                      CPU1
                                          invalidate_range_start
                                             hlist_for_each_entry_rcu(..)
     mmu_notifier_unregister_no_release(&p->mn)
     kfree(mn)
                                          if (mn->ops->invalidate_range_end)
    
    The error unwind in amdkfd misses the SRCU synchronization.
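    A userspace sketch of the rule, assuming a deliberately simplified grace
    period (one global reader counter, not real SRCU) and hypothetical names
    throughout. CPU1 stays inside a read-side section while it uses the
    notifier; CPU0 unpublishes the notifier, waits for readers to drain, and
    only then frees it. The comments map each step to the kernel call it
    loosely imitates; in the kernel the wait would be an SRCU grace period,
    for example via mmu_notifier_synchronize(). Build with -pthread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-ins for struct mmu_notifier and its ops table. */
struct notifier_ops { void (*invalidate_range_end)(void); };
struct notifier { const struct notifier_ops *ops; };

static void noop_end(void) { }
static const struct notifier_ops demo_ops = { .invalidate_range_end = noop_end };

/* One global reader counter: a deliberately simplified stand-in for SRCU. */
static atomic_int readers;
static _Atomic(struct notifier *) registered; /* a one-entry "notifier list" */
static atomic_bool freed, reader_entered, reader_saw_freed;

static void read_lock(void)   { atomic_fetch_add(&readers, 1); }
static void read_unlock(void) { atomic_fetch_sub(&readers, 1); }

/* Grace period: wait until no reader is inside a read-side section. */
static void synchronize(void)
{
	while (atomic_load(&readers) != 0)
		sched_yield();
}

static void sleep_ms(long ms)
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
	nanosleep(&ts, NULL);
}

/* CPU1: finds the notifier and calls into it, like invalidate_range_*. */
static void *cpu1_invalidate(void *arg)
{
	(void)arg;
	read_lock();
	struct notifier *mn = atomic_load(&registered);
	atomic_store(&reader_entered, true);
	sleep_ms(10); /* widen the window in which CPU0 tries to free mn */
	if (mn != NULL) {
		if (atomic_load(&freed))
			atomic_store(&reader_saw_freed, true); /* would be a use after free */
		else if (mn->ops->invalidate_range_end)
			mn->ops->invalidate_range_end();
	}
	read_unlock();
	return NULL;
}

/* CPU0: unregister, wait for a grace period, and only then free. */
bool teardown_is_safe(void)
{
	struct notifier *mn = malloc(sizeof(*mn));
	pthread_t t;

	if (mn == NULL)
		return false;
	mn->ops = &demo_ops;
	atomic_store(&freed, false);
	atomic_store(&reader_entered, false);
	atomic_store(&reader_saw_freed, false);
	atomic_store(&registered, mn);

	if (pthread_create(&t, NULL, cpu1_invalidate, NULL) != 0) {
		atomic_store(&registered, NULL);
		free(mn);
		return false;
	}
	while (!atomic_load(&reader_entered))
		sched_yield();

	atomic_store(&registered, NULL); /* ~ mmu_notifier_unregister_no_release() */
	synchronize();                   /* ~ mmu_notifier_synchronize() */
	assert(atomic_load(&readers) == 0); /* the demo's only reader has left */
	atomic_store(&freed, true);
	free(mn);                        /* ~ kfree(): no reader can still hold mn */

	pthread_join(t, NULL);
	return !atomic_load(&reader_saw_freed);
}
```

    Dropping the synchronize() call reproduces the CPU0/CPU1 interleaving
    shown above: CPU1 would then touch mn->ops after CPU0 has freed it.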
    
    amdkfd keeps the kfd_process around until the mm is released, so split the
    flow: first fully initialize the kfd_process, then make it visible to
    find_process() and register it with the notifier. Past this point the
    kfd_process never needs to be cleaned up on error, as it is fully ready.
    
    The final failable step does a vm_mmap() and does not seem to impact the
    kfd_process global state. Since it also cannot be undone (and already has
    problems with undo if it internally fails), it has to be last.
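    The reordered flow can be sketched as a small userspace model. Everything
    here is hypothetical: a fixed array stands in for the find_process()
    table, a counter for the notifier registration, and a fault-injection
    argument picks which step fails. The point is the shape: failures before
    publication are unwound with a plain free(), while a failure in the last,
    irreversible step is reported without unregistering anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of the reordered creation flow; not the amdkfd code. */
struct proc { int id; bool ready; };

enum fail_at { FAIL_NONE, FAIL_INIT, FAIL_MMAP }; /* demo fault injection */

#define MAX_PROCS 8
static struct proc *table[MAX_PROCS]; /* stands in for the find_process() table */
static int ntable;
static int notifier_regs;             /* stands in for mmu_notifier_register() */

struct proc *create_process_model(int id, enum fail_at f, int *err)
{
	struct proc *p;

	*err = 0;
	p = calloc(1, sizeof(*p));
	if (p == NULL) {
		*err = -1;
		return NULL;
	}
	p->id = id;

	/* Phase 1: failable, fully unwindable setup. Nothing is visible yet,
	 * so a plain free() cannot race with any reader. */
	if (f == FAIL_INIT || ntable == MAX_PROCS) {
		free(p);
		*err = -1;
		return NULL;
	}
	p->ready = true;

	/* Phase 2: publish. From here on p is never freed on an error path;
	 * it lives until the owning mm is released. */
	assert(ntable < MAX_PROCS); /* guaranteed by the phase 1 check */
	table[ntable++] = p;
	notifier_regs++;

	/* Phase 3: the one irreversible failable step (vm_mmap() in amdkfd)
	 * runs last. Its failure is reported, but nothing above is undone. */
	if (f == FAIL_MMAP) {
		*err = -2;
		return NULL;
	}
	return p;
}
```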
    
    This way we don't have to try to unwind the mmu_notifier_register() and
    avoid the problem with the SRCU.
    
    Along the way this also fixes various other error unwind bugs in the flow.
    
    Fixes: 45102048 ("amdkfd: Add process queue manager module")
    Link: https://lore.kernel.org/r/20190806231548.25242-10-jgg@ziepe.ca
    Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>