    scsi: hisi_sas: Fix the conflict between device gone and host reset · e74006ed
    Xiang Chen authored
    When a device is gone, hisi_sas_dev_gone() checks whether the host is
    being reset; if not, it sends an internal task abort. If a reset begins
    before the internal task abort returns, the reset path checks whether
    SAS_PHY_UNUSED is set and, if not, calls hisi_sas_init_device(). By that
    time the domain_device may already be freed, fully or partly, so
    hisi_sas_init_device() may dereference a NULL pointer. It may occur as
    follows:
    
        thread0				thread1
    hisi_sas_dev_gone()
        check whether in RESET(no)
        internal task abort
    				    reset prep
    				    soft_reset
    				    ... (part of reset_done)
        internal task abort failed
        release resource anyway
        clear_itct
        device->lldd_dev=NULL
    				    hisi_sas_reset_init_all_device
    					check sas_dev->dev_type is SAS_PHY_UNUSED and
    					!device
        set dev_type SAS_PHY_UNUSED
        sas_free_device
    					hisi_sas_init_device
    					...
    
    Semaphore hisi_hba.sema is used to synchronize the device gone and host
    reset processes.
    
    To solve the issue, expand the scope that the semaphore protects so that
    the two never run at the same time.
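
The idea of widening the locked region can be sketched in user space as
follows. This is a minimal model, not the driver code: the `model_*` types
and functions are hypothetical, a pthread mutex stands in for the kernel
semaphore hisi_hba.sema, and the reset path is reduced to the step that
re-initialises devices.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's types. */
enum model_dev_type { MODEL_DEV_UNUSED, MODEL_DEV_END_DEVICE };

struct model_dev {
    enum model_dev_type type;
    void *domain_dev;      /* stands in for the domain_device pointer */
    int init_calls;        /* counts re-initialisations, for illustration */
};

struct model_hba {
    pthread_mutex_t sema;  /* user-space stand-in for hisi_hba.sema */
    struct model_dev devs[2];
};

static void model_hba_init(struct model_hba *hba, void *d0, void *d1)
{
    pthread_mutex_init(&hba->sema, NULL);
    hba->devs[0] = (struct model_dev){ MODEL_DEV_END_DEVICE, d0, 0 };
    hba->devs[1] = (struct model_dev){ MODEL_DEV_END_DEVICE, d1, 0 };
}

/* Device-gone path: the whole teardown runs under the lock, so the
 * reset path can never observe a half-released device. */
static void model_dev_gone(struct model_hba *hba, struct model_dev *dev)
{
    pthread_mutex_lock(&hba->sema);
    /* the internal task abort and resource release would happen here */
    dev->domain_dev = NULL;
    dev->type = MODEL_DEV_UNUSED;
    pthread_mutex_unlock(&hba->sema);
}

/* Reset path: re-initialise every device that is still in use. */
static void model_reset_init_all(struct model_hba *hba)
{
    pthread_mutex_lock(&hba->sema);
    for (size_t i = 0; i < 2; i++) {
        struct model_dev *dev = &hba->devs[i];

        if (dev->type == MODEL_DEV_UNUSED || dev->domain_dev == NULL)
            continue;      /* device is gone: skip it */
        dev->init_calls++; /* stands in for hisi_sas_init_device() */
    }
    pthread_mutex_unlock(&hba->sema);
}
```

Because both paths take the same lock around their entire critical work,
the reset path sees a device either fully present or fully gone.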
    
    Also, some places check whether domain_device is NULL to judge whether
    the device is gone, so clear sas_dev->sas_device when the device is gone.
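
The NULL check this relies on can be illustrated with a small user-space
sketch. The struct and helper names below are hypothetical; only the
sas_device field name is taken from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the domain_device structure. */
struct model_domain_device {
    int id;
};

/* Hypothetical stand-in for the driver's per-device structure. */
struct model_sas_dev {
    struct model_domain_device *sas_device; /* back-pointer, NULL once gone */
};

/* Callers judge "device gone" purely from the back-pointer. */
static bool model_dev_is_gone(const struct model_sas_dev *sas_dev)
{
    return sas_dev->sas_device == NULL;
}

/* Device-gone path: clear the back-pointer so later checks see NULL. */
static void model_dev_gone_clear(struct model_sas_dev *sas_dev)
{
    sas_dev->sas_device = NULL;
}
```

If the pointer were left stale, a check like model_dev_is_gone() would
report the device as present and a caller could touch freed memory.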
    
    Link: https://lore.kernel.org/r/1567774537-20003-14-git-send-email-john.garry@huawei.com
    Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
    Signed-off-by: John Garry <john.garry@huawei.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
hisi_sas_main.c 101 KB