    btrfs: fix possible infinite loop in data async reclaim · c4923027
    Josef Bacik authored
    Dave reported an issue where generic/102 would sometimes hang.  This
    turned out to be because we'd get into this spot where we were no longer
    making progress on data reservations because our exit condition was not
    met.  The logic is basically
    
    while (!space_info->full && !list_empty(&space_info->tickets))
    	flush_space(space_info, flush_state);
    
    where flush_state cycles through our various flush states, but doesn't include
    ALLOC_CHUNK_FORCE.  This is because we actually lead with allocating
    chunks, and so the assumption was that once you got to the actual
    flushing states you could no longer allocate chunks.  This was a stupid
    assumption, because you could have deleted block groups that would be
    reclaimed by a transaction commit, thus unsetting space_info->full.
    This is essentially what happens with generic/102, and so sometimes
    you'd get stuck in the flushing loop because we weren't allocating
    chunks, but flushing space wasn't giving us what we needed to make
    progress.
    
    Fix this by adding ALLOC_CHUNK_FORCE to the end of our flushing states.
    That way we will eventually bail out, because the forced allocation
    leaves us with space_info->full set again when no chunk can be
    allocated, even if we freed a chunk previously.  Otherwise, as is the
    case for this test, we'll allocate our chunk and continue on our happy
    merry way.
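
    The hang and the fix can be illustrated with a small userspace model.
    This is only a sketch and not the code in space-info.c: every toy_*
    name, the struct layout, and the max_iters cap (which stands in for
    "hangs forever") are assumptions made for illustration.  It assumes
    the two ordinary flush states reclaim nothing, and that a forced chunk
    allocation either satisfies the pending tickets or sets full.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy flush states; names echo the commit message, the model is hypothetical. */
enum toy_flush_state {
	TOY_FLUSH_DELALLOC,
	TOY_COMMIT_TRANS,
	TOY_ALLOC_CHUNK_FORCE,
};

/* Hypothetical stand-in for the fields the loop condition looks at. */
struct toy_space_info {
	bool full;        /* set when a chunk allocation finds no room */
	int tickets;      /* pending data reservations */
	int free_chunks;  /* unallocated space, counted in whole chunks */
};

/* Flush state list before the fix: no chunk allocation. */
static const enum toy_flush_state old_states[] = {
	TOY_FLUSH_DELALLOC, TOY_COMMIT_TRANS,
};

/* Flush state list after the fix: forced chunk allocation comes last. */
static const enum toy_flush_state new_states[] = {
	TOY_FLUSH_DELALLOC, TOY_COMMIT_TRANS, TOY_ALLOC_CHUNK_FORCE,
};

static void toy_flush_space(struct toy_space_info *si, enum toy_flush_state st)
{
	switch (st) {
	case TOY_FLUSH_DELALLOC:
	case TOY_COMMIT_TRANS:
		/* In this model ordinary flushing frees nothing useful. */
		break;
	case TOY_ALLOC_CHUNK_FORCE:
		if (si->free_chunks > 0) {
			/* A new chunk satisfies the waiting tickets. */
			si->free_chunks--;
			si->tickets = 0;
		} else {
			/* Nothing left to allocate: mark the space as full. */
			si->full = true;
		}
		break;
	}
}

/*
 * Cycle through the flush states until the space is full or no tickets
 * remain.  Returns the number of flush calls made, or -1 if max_iters
 * calls were made without meeting the exit condition.
 */
static int toy_reclaim(struct toy_space_info *si,
		       const enum toy_flush_state *states, size_t nstates,
		       int max_iters)
{
	int iters = 0;
	size_t i = 0;

	while (!si->full && si->tickets > 0) {
		if (iters == max_iters)
			return -1;
		toy_flush_space(si, states[i]);
		i = (i + 1) % nstates;
		iters++;
	}
	return iters;
}
```

    With full unset, one ticket pending and one free chunk, the old state
    list never meets the exit condition in this model, while the new list
    allocates the chunk on its third flush.  With no free chunk, the new
    list sets full on its third flush and the loop bails out.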
    Reported-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: David Sterba <dsterba@suse.com>