• Jason Gunthorpe's avatar
    RDMA/ipoib: Fix ABBA deadlock with ipoib_reap_ah() · 65936bf2
    Jason Gunthorpe authored
    ipoib_mcast_carrier_on_task() insanely open codes a rtnl_lock() such that
    the only time flush_workqueue() can be called is if it also clears
    IPOIB_FLAG_OPER_UP.
    
    Thus the flush inside ipoib_flush_ah() will deadlock if it gets unlucky
    enough, and lockdep doesn't help us to find it early:
    
              CPU0               CPU1          CPU2
       __ipoib_ib_dev_flush()
          down_read(vlan_rwsem)
    
                             ipoib_vlan_add()
                               rtnl_trylock()
                               down_write(vlan_rwsem)
    
    				      ipoib_mcast_carrier_on_task()
    					 while (!rtnl_trylock())
    					      msleep(20);
    
          ipoib_flush_ah()
    	flush_workqueue(priv->wq)
    
    Clean up the ah_reaper related functions and lifecycle to make sense:
    
     - Start/Stop of the reaper should only be done in open/stop NDOs, not in
       any other places
    
     - cancel and flush of the reaper should only happen in the stop NDO.
       cancel is only functional when combined with IPOIB_STOP_REAPER.
    
     - Non-stop places were flushing the AH's just need to flush out dead AH's
       synchronously and ignore the background task completely. It is fully
       locked and harmless to leave running.
    
    Which ultimately fixes the ABBA deadlock by removing the unnecessary
    flush_workqueue() from the problematic place under the vlan_rwsem.
    
    Fixes: efc82eee ("IB/ipoib: No longer use flush as a parameter")
    Link: https://lore.kernel.org/r/20200625174219.290842-1-kamalheib1@gmail.comReported-by: default avatarKamal Heib <kheib@redhat.com>
    Tested-by: default avatarKamal Heib <kheib@redhat.com>
    Signed-off-by: default avatarJason Gunthorpe <jgg@nvidia.com>
    65936bf2
ipoib_ib.c 33.4 KB