    scsi_sysfs: protect against double execution of __scsi_remove_device() · be821fd8
    Vitaly Kuznetsov authored
    On some host errors the storvsc module tries to remove an sdev by
    scheduling a job which does the following:
    
       sdev = scsi_device_lookup(wrk->host, 0, 0, wrk->lun);
       if (sdev) {
           scsi_remove_device(sdev);
           scsi_device_put(sdev);
       }
    
    While this code seems correct, the following crash is observed:
    
     general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
     RIP: 0010:[<ffffffff81169979>]  [<ffffffff81169979>] bdi_destroy+0x39/0x220
     ...
     [<ffffffff814aecdc>] ? _raw_spin_unlock_irq+0x2c/0x40
     [<ffffffff8127b7db>] blk_cleanup_queue+0x17b/0x270
     [<ffffffffa00b54c4>] __scsi_remove_device+0x54/0xd0 [scsi_mod]
     [<ffffffffa00b556b>] scsi_remove_device+0x2b/0x40 [scsi_mod]
     [<ffffffffa00ec47d>] storvsc_remove_lun+0x3d/0x60 [hv_storvsc]
     [<ffffffff81080791>] process_one_work+0x1b1/0x530
     ...
    
    The problem is that many such jobs (for the same device) are scheduled
    simultaneously. While scsi_remove_device() takes shost->scan_mutex and
    scsi_device_lookup() fails for a device in SDEV_DEL state, there is no
    protection against a caller who did scsi_device_lookup() before we
    actually entered __scsi_remove_device(). So the whole scenario looks like
    this: two callers call scsi_device_lookup() simultaneously (or preemption
    happens) and both calls succeed; after that, both try
    scsi_remove_device(). shost->scan_mutex only serializes their calls to
    __scsi_remove_device(), and we end up running the cleanup path twice.
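
    The scenario can be modelled in plain userspace C. This is only a sketch:
    every model_* name is hypothetical and none of it is kernel API. It assumes
    the protection takes the form of an early return when the device is already
    in the deleted state, which is one plausible way to make the cleanup path
    non-reentrant. The two jobs run back to back, mirroring the serialization
    that shost->scan_mutex provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the device states relevant to the race. */
enum model_state { MODEL_RUNNING, MODEL_DEL };

struct model_sdev {
    enum model_state state;
    int refcount;       /* references handed out by model_lookup() */
    int cleanup_runs;   /* how many times the teardown path executed */
};

/* Models scsi_device_lookup(): fails once the device is marked deleted. */
static struct model_sdev *model_lookup(struct model_sdev *sdev)
{
    if (sdev->state == MODEL_DEL)
        return NULL;
    sdev->refcount++;
    return sdev;
}

/* Models scsi_device_put(): drops a reference taken by model_lookup(). */
static void model_put(struct model_sdev *sdev)
{
    assert(sdev->refcount > 0);
    sdev->refcount--;
}

/*
 * Models the removal path. With guard == true, a second caller that still
 * holds an older reference returns early instead of repeating the teardown.
 */
static void model_remove(struct model_sdev *sdev, bool guard)
{
    if (guard && sdev->state == MODEL_DEL)
        return;
    sdev->state = MODEL_DEL;
    sdev->cleanup_runs++;   /* the non-reentrant cleanup would happen here */
}

/*
 * Two jobs both look the device up before either one removes it, then run
 * their removals one after the other. Returns the number of teardown runs.
 */
static int model_two_jobs(bool guard)
{
    struct model_sdev dev = { MODEL_RUNNING, 0, 0 };
    struct model_sdev *a = model_lookup(&dev);
    struct model_sdev *b = model_lookup(&dev);

    if (a) {
        model_remove(a, guard);
        model_put(a);
    }
    if (b) {
        model_remove(b, guard);
        model_put(b);
    }
    return dev.cleanup_runs;
}
```

    Calling model_two_jobs(false) runs the teardown twice, matching the crash
    scenario, while model_two_jobs(true) runs it once. The lookup check alone
    does not help because both references were taken before the state changed.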

    Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi_sysfs.c 33.6 KB