• Can Guo's avatar
    scsi: ufs: Fix concurrency of error handler and other error recovery paths · 4db7a236
    Can Guo authored
    Error recovery can be invoked from multiple code paths, including hibern8
    enter/exit (from ufshcd_link_recovery), ufshcd_eh_host_reset_handler() and
    eh_work scheduled from IRQ context. Ultimately, these paths are all trying
    to invoke ufshcd_reset_and_restore() in either a synchronous or
    asynchronous manner. This causes problems:
    
     - If link recovery happens during ungate work, ufshcd_hold() would be
       called recursively. Although commit 53c12d0e ("scsi: ufs: fix error
       recovery after the hibern8 exit failure") fixed a deadlock due to
       recursive calls of ufshcd_hold() by adding a check of eh_in_progress
       into ufshcd_hold, this check allows eh_work to run in parallel while
       link recovery is running.
    
     - Similar concurrency can also happen when error recovery is invoked from
       ufshcd_eh_host_reset_handler and ufshcd_link_recovery.
    
     - Concurrency can even happen between eh_works. eh_work, currently queued
       on system_wq, is allowed to have multiple instances running in parallel,
       but we don't have proper protection for that.
    
    If any of above concurrency scenarios happen, error recovery would fail and
    lead ufs device and host into bad states. To fix the concurrency problem,
    this change queues eh_work on a single threaded workqueue and removes link
    recovery calls from the hibern8 enter/exit path. In addition, make use of
    eh_work in eh_host_reset_handler instead of calling
    ufshcd_reset_and_restore. This unifies the UFS error recovery mechanism.
    
    According to the UFSHCI JEDEC spec, hibern8 enter/exit error occurs when
    the link is broken. This essentially applies to any power mode change
    operations (since they all use PACP_PWR cmds in UniPro layer). So, if a
    power mode change operation (including AH8 enter/exit) fails, mark link
    state as UIC_LINK_BROKEN_STATE and schedule the eh_work.  In this case,
    error handler needs to do a full reset and restore to recover the link back
    to active. Before the link state is recovered to active,
    ufshcd_uic_pwr_ctrl simply returns -ENOLINK to avoid more errors.
    
    Link: https://lore.kernel.org/r/1596975355-39813-6-git-send-email-cang@codeaurora.orgReviewed-by: default avatarBean Huo <beanhuo@micron.com>
    Reviewed-by: default avatarAsutosh Das <asutoshd@codeaurora.org>
    Signed-off-by: default avatarCan Guo <cang@codeaurora.org>
    Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
    4db7a236
ufshcd.h 36.1 KB