    net/mlx5: Drain fw_reset when removing device · 16d42d31
    Shay Drory authored
    In case fw sync reset is called in parallel to device removal, the device
    might get stuck in the following deadlock:
             CPU 0                        CPU 1
             -----                        -----
                                      remove_one
                                       uninit_one (locks intf_state_mutex)
    mlx5_sync_reset_now_event()
    work in fw_reset->wq.
     mlx5_enter_error_state()
      mutex_lock (intf_state_mutex)
                                       cleanup_once
                                        fw_reset_cleanup()
                                         destroy_workqueue(fw_reset->wq)
    
    Drain the fw_reset WQ, and make sure no new work is being queued, before
    entering uninit_one().
    The drain is done before devlink_unregister() since fw_reset, in some
    flows, uses the devlink API devlink_remote_reload_actions_performed().
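
    The ordering described above can be illustrated with a small single-threaded
    userspace model. This is only a sketch and not the driver code: every toy_*
    name is hypothetical, a bool stands in for intf_state_mutex, and a counter
    stands in for work queued on fw_reset->wq. Work that would run while teardown
    holds the lock is counted as a would-be deadlock instead of blocking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in the mlx5 driver. */
struct toy_dev {
    bool intf_state_locked;  /* stands in for intf_state_mutex being held */
    bool drop_new_requests;  /* set by the drain: refuse new reset work */
    int pending;             /* reset work items queued but not yet run */
    int handled;             /* work items that ran without contention */
    int would_deadlock;      /* work items that needed the lock while held */
};

/* Models a reset event trying to queue work; refused after the drain. */
bool toy_queue_reset_work(struct toy_dev *d)
{
    assert(d != NULL);
    if (d->drop_new_requests)
        return false;
    d->pending++;
    return true;
}

/* Runs all pending work; each item needs the interface-state lock. */
static void toy_run_pending(struct toy_dev *d)
{
    while (d->pending > 0) {
        d->pending--;
        if (d->intf_state_locked)
            d->would_deadlock++;  /* real code would block here forever */
        else
            d->handled++;
    }
}

/* Models the drain: stop new work first, then finish what is queued. */
void toy_drain_fw_reset(struct toy_dev *d)
{
    assert(d != NULL);
    d->drop_new_requests = true;
    toy_run_pending(d);
}

/* Models removal: optionally drain, then tear down under the lock. */
void toy_remove_one(struct toy_dev *d, bool drain_first)
{
    assert(d != NULL);
    if (drain_first)
        toy_drain_fw_reset(d);
    d->intf_state_locked = true;   /* uninit_one takes the lock */
    toy_run_pending(d);            /* destroying the queue waits for work */
    d->intf_state_locked = false;
}
```

    In this model, removing a device with one queued work item and no drain
    records one would-be deadlock, while draining first lets the work finish
    before the lock is taken and causes later queue attempts to be refused.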
    
    Fixes: 38b9f903 ("net/mlx5: Handle sync reset request event")
    Signed-off-by: Shay Drory <shayd@nvidia.com>
    Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>