    dm kcopyd: always complete failed jobs · d1fef414
    Dmitry Fomichev authored
    This patch fixes a problem in dm-kcopyd that may leave jobs in the
    complete queue indefinitely in the event of a backing storage failure.
    
    This behavior has been observed while running a 100% write file fio
    workload against an XFS volume created on top of a dm-zoned target
    device. If the underlying storage of dm-zoned goes offline under I/O,
    kcopyd sometimes never issues the end copy callback and the dm-zoned
    reclaim work hangs indefinitely waiting for that completion.
    
    This behavior was traced to the error handling code in the
    process_jobs() function, which places the failed job on the
    complete_jobs queue but doesn't wake up the job handler. If the backing
    device fails, all outstanding jobs may end up on the complete_jobs
    queue via this code path and then stay there forever because there are
    no more successful I/O jobs to wake up the job handler.
    
    This patch adds a wake() call to always wake up the kcopyd job wait
    queue for every I/O job that fails before dm_io() gets called for it.
    
    The patch also sets the write error status in all sub jobs that fail
    because their master job has failed.
    
    Fixes: b73c67c2 ("dm kcopyd: add sequential write feature")
    Cc: stable@vger.kernel.org
    Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
    Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
    Signed-off-by: Mike Snitzer <snitzer@redhat.com>
dm-kcopyd.c 21.5 KB
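
The hang described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: `struct job`, `struct client`, `do_work()` and `simulate_failure()` are simplified, hypothetical stand-ins, and the `wake_pending` flag models a queued work item rather than a real wait queue. The `wake_on_error` parameter switches between the behavior before and after the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kcopyd structures. */
struct job {
    struct job *next;
    struct job *master;       /* NULL for a master job */
    unsigned long write_err;  /* nonzero once the job has failed */
};

struct client {
    struct job *io_jobs;
    struct job *complete_jobs;
    bool wake_pending;        /* models a queued work item */
};

static void push(struct job **queue, struct job *job)
{
    assert(job != NULL);
    job->next = *queue;
    *queue = job;
}

static struct job *pop(struct job **queue)
{
    struct job *job = *queue;

    if (job) {
        *queue = job->next;
        job->next = NULL;
    }
    return job;
}

static void wake(struct client *kc)
{
    kc->wake_pending = true;
}

/* Fails before any I/O is issued if the master job has already failed,
 * and copies the master's write error into the sub job. */
static int run_io_job(struct job *job)
{
    if (job->master && job->master->write_err) {
        job->write_err = job->master->write_err;
        return -1;
    }
    return 0; /* in this model, "submitted" jobs are simply dropped */
}

/* Model of the worker: it only runs while a wake-up is pending. */
static int do_work(struct client *kc, bool wake_on_error)
{
    int completed = 0;
    struct job *job;

    while (kc->wake_pending) {
        kc->wake_pending = false;

        /* Run completion callbacks for everything already queued. */
        while ((job = pop(&kc->complete_jobs)) != NULL)
            completed++;

        /* Issue I/O; a failed job is moved to the complete queue. */
        while ((job = pop(&kc->io_jobs)) != NULL) {
            if (run_io_job(job) < 0) {
                push(&kc->complete_jobs, job);
                if (wake_on_error)
                    wake(kc); /* the wake-up the patch adds */
                break;
            }
        }
    }
    return completed;
}

/* Queue three sub jobs of an already failed master job and return how
 * many of them get their completion run. */
int simulate_failure(bool wake_on_error)
{
    struct job master = { .write_err = (unsigned long)-1L };
    struct job sub[3] = { { 0 } };
    struct client kc = { 0 };

    for (size_t i = 0; i < 3; i++) {
        sub[i].master = &master;
        push(&kc.io_jobs, &sub[i]);
    }
    wake(&kc); /* initial wake-up from job dispatch */
    return do_work(&kc, wake_on_error);
}
```

In this model, `simulate_failure(false)` leaves the first failed job parked on `complete_jobs` with nothing left to wake the worker, while `simulate_failure(true)` drains all three sub jobs because each failure schedules another pass.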