    RDMA/ucma: Do not miss ctx destruction steps in some cases · 8ae291cc
    Jason Gunthorpe authored
    The destruction flow is very complicated here because the cm_id can be
    destroyed from the event handler at any time if the device is
    hot-removed. This leaves behind a partial ctx with no cm_id in the
    xarray, and will let user space leak memory.
    
    Make everything consistent in this flow in all places:
    
     - Return the xarray entry back to XA_ZERO_ENTRY before beginning any
       destruction. The thread that reaches this first is responsible for
       the kfree; every other thread does nothing.
    
     - Test the xarray during the special hot-removal case to block the
       queue_work. This has much simpler locking and doesn't require a
       'destroying' flag.
    
     - Fix the ref initialization so that it is only positive if cm_id !=
       NULL, then rely on that to guide the destruction process in all cases.
    
    Now the new ucma_destroy_private_ctx() can be called in all places that
    want to free the ctx, including all the error unwinds, and none of the
    details are missed.
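
The "first thread wins" rule from the list above can be sketched in user-space C. This is only an illustration under stated assumptions, not the code in ucma.c: a single C11 atomic slot stands in for the xarray entry, SLOT_RESERVED is a hypothetical stand-in for XA_ZERO_ENTRY, and toy_ctx, toy_ctx_publish(), toy_ctx_destroy() and toy_demo() are invented names. The demo calls the destroy path twice from one thread rather than racing real threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the reserved XA_ZERO_ENTRY marker. */
static char zero_entry_marker;
#define SLOT_RESERVED ((void *)&zero_entry_marker)

/* Toy context; the real ucma context carries far more state. */
struct toy_ctx {
	int id;
};

/* One table slot. The kernel code uses an xarray indexed by id. */
static _Atomic(void *) slot;

/* Counts how many times a ctx was actually freed. */
static atomic_int frees;

/* Allocate a ctx and make it visible in the slot. */
static struct toy_ctx *toy_ctx_publish(int id)
{
	struct toy_ctx *ctx = malloc(sizeof(*ctx));

	if (!ctx)
		return NULL;
	ctx->id = id;
	atomic_store(&slot, ctx);
	return ctx;
}

/*
 * Try to destroy whatever ctx is in the slot. Returns 1 if this caller
 * won the right to free it, 0 if there was nothing to do.
 */
static int toy_ctx_destroy(void)
{
	void *cur = atomic_load(&slot);

	/* Empty, or another caller has already begun destruction. */
	if (cur == NULL || cur == SLOT_RESERVED)
		return 0;

	/* Swap ctx -> reserved marker; only one caller can succeed. */
	if (!atomic_compare_exchange_strong(&slot, &cur, SLOT_RESERVED))
		return 0;

	free(cur);
	atomic_fetch_add(&frees, 1);

	/* Destruction finished: release the slot entirely. */
	atomic_store(&slot, NULL);
	return 1;
}

/* Two destroy attempts on one ctx; returns the number of frees. */
static int toy_demo(void)
{
	atomic_store(&frees, 0);
	if (!toy_ctx_publish(1))
		return -1;

	int first = toy_ctx_destroy();
	int second = toy_ctx_destroy();

	assert(first == 1 && second == 0);
	return atomic_load(&frees);
}
```

Whichever caller performs the compare-and-exchange first owns the free, so a second caller sees either the reserved marker or an empty slot and backs off without touching the ctx.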
    
    Fixes: a1d33b70 ("RDMA/ucma: Rework how new connections are passed through event delivery")
    Link: https://lore.kernel.org/r/20210105111327.230270-1-leon@kernel.org
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
ucma.c 46.6 KB