    block: init flush rq ref count to 1 · 8a1a3d38
    Josef Bacik authored
    [ Upstream commit b554db14 ]
    
    We discovered a problem in newer kernels where disconnecting an NBD
    device while a flush request was pending would result in a hang.  This
    is because the blk-mq timeout handler does
    
            if (!refcount_inc_not_zero(&rq->ref))
                    return true;
    
    to determine if it's ok to run the timeout handler for the request.
    Flush_rq's don't have a ref count set, so the count is still zero, the
    check fails, and we'd skip running the timeout handler for this request.
    It would just sit there in limbo forever.
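
    The effect of that check can be modeled in user space.  The sketch below
    is only an illustration, not the kernel code: struct model_rq,
    model_inc_not_zero(), model_put() and model_timeout_ran() are
    hypothetical names, and C11 atomics stand in for the kernel's
    refcount_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct request; only the ref count is modeled. */
struct model_rq {
	atomic_uint ref;
};

/*
 * User-space approximation of refcount_inc_not_zero(): take a reference
 * only if the count is currently nonzero.
 */
static bool model_inc_not_zero(atomic_uint *ref)
{
	unsigned int old = atomic_load(ref);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference previously taken by model_inc_not_zero(). */
static void model_put(atomic_uint *ref)
{
	unsigned int old = atomic_fetch_sub(ref, 1);

	assert(old != 0);
	(void)old;
}

/*
 * Mirrors the quoted check.  Returns true when the timeout handling ran
 * for rq, false when the request was skipped because its ref count was 0.
 */
static bool model_timeout_ran(struct model_rq *rq)
{
	if (!model_inc_not_zero(&rq->ref))
		return false;	/* ref == 0: request is skipped */

	/* ... a real handler would expire the request here ... */
	model_put(&rq->ref);
	return true;
}
```

    With ref left at 0, model_timeout_ran() returns false and the request
    is never expired.  With ref initialized to 1, it returns true and the
    count is back at 1 afterwards.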
    
    Fix this by always setting the refcount of any request going through
    blk_rq_init() to 1.  I tested this with a nbd-server that dropped flush
    requests to verify that it hung, and then tested with this patch to
    verify I got the timeout as expected and the error handling kicked in.
    Thanks,
    Signed-off-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
blk-core.c 105 KB