    RDMA/mlx5: Fix devlink deadlock on net namespace deletion · fbdd0049
    Parav Pandit authored
    When an mlx5 core devlink instance is reloaded in a different net
    namespace, its associated IB device is deleted and recreated.
    
    Example sequence is:
    $ ip netns add foo
    $ devlink dev reload pci/0000:00:08.0 netns foo
    $ ip netns del foo
    
    The mlx5 IB device needs to attach and detach the netdevice through the
    netdev notifier chain during the load and unload sequences. Below is the
    call graph of the unload flow:
    
    cleanup_net()
       down_read(&pernet_ops_rwsem); <- first sem acquired
         ops_pre_exit_list()
           pre_exit()
             devlink_pernet_pre_exit()
               devlink_reload()
                 mlx5_devlink_reload_down()
                   mlx5_unload_one()
                   [...]
                     mlx5_ib_remove()
                       mlx5_ib_unbind_slave_port()
                         mlx5_remove_netdev_notifier()
                           unregister_netdevice_notifier()
                             down_write(&pernet_ops_rwsem); <- recursive lock
    
    Hence, when the net namespace is deleted, the mlx5 reload results in a
    deadlock.
    
    When the deadlock occurs, the devlink mutex is also held. This deadlocks
    not only the mlx5 device under reload, but also every process that
    attempts to access unrelated devlink devices.
    
    Hence, fix this by having the mlx5 IB driver register a per-net netdev
    notifier instead of the global one. The per-net notifier operates on the
    net namespace without holding the pernet_ops_rwsem.
    
    Fixes: 4383cfcc ("net/mlx5: Add devlink reload")
    Link: https://lore.kernel.org/r/20201026134359.23150-1-parav@nvidia.com
    Signed-off-by: Parav Pandit <parav@nvidia.com>
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
main.c 127 KB