• Josef Bacik's avatar
    btrfs: check WRITE_ERR when trying to read an extent buffer · 651740a5
    Josef Bacik authored
    Filipe reported a hang when we have errors on btrfs.  This turned out to
    be a side-effect of my fix c2e39305 ("btrfs: clear extent buffer
    uptodate when we fail to write it") which made it so we clear
    EXTENT_BUFFER_UPTODATE on an eb when we fail to write it out.
    
    Below is a paste of Filipe's analysis he got from using drgn to debug
    the hang
    
    """
    btree readahead code calls read_extent_buffer_pages(), sets ->io_pages to
    a value while writeback of all pages has not yet completed:
       --> writeback for the first 3 pages finishes, we clear
           EXTENT_BUFFER_UPTODATE from eb on the first page when we get an
           error.
       --> at this point eb->io_pages is 1 and we cleared Uptodate bit from the
           first 3 pages
       --> read_extent_buffer_pages() does not see EXTENT_BUFFER_UPTODATE() so
           it continues, it's able to lock the pages since we obviously don't
           hold the pages locked during writeback
       --> read_extent_buffer_pages() then computes 'num_reads' as 3, and sets
           eb->io_pages to 3, since only the first page does not have Uptodate
           bit set at this point
       --> writeback for the remaining page completes, we ended decrementing
           eb->io_pages by 1, resulting in eb->io_pages == 2, and therefore
           never calling end_extent_buffer_writeback(), so
           EXTENT_BUFFER_WRITEBACK remains in the eb's flags
       --> of course, when the read bio completes, it doesn't and shouldn't
           call end_extent_buffer_writeback()
       --> we should clear EXTENT_BUFFER_UPTODATE only after all pages of
           the eb finished writeback?  or maybe make the read pages code
           wait for writeback of all pages of the eb to complete before
           checking which pages need to be read, touch ->io_pages, submit
           read bio, etc
    
    writeback bit never cleared means we can hang when aborting a
    transaction, at:
    
        btrfs_cleanup_one_transaction()
           btrfs_destroy_marked_extents()
             wait_on_extent_buffer_writeback()
    """
    
    This is a problem because our writes are not synchronized with reads in
    any way.  We clear the UPTODATE flag and then we can easily come in and
    try to read the EB while we're still waiting on other bio's to
    complete.
    
    We have two options here, we could lock all the pages, and then check to
    see if eb->io_pages != 0 to know if we've already got an outstanding
    write on the eb.
    
    Or we can simply check to see if we have WRITE_ERR set on this extent
    buffer.  We set this bit _before_ we clear UPTODATE, so if the read gets
    triggered because we aren't UPTODATE because of a write error we're
    guaranteed to have WRITE_ERR set, and in this case we can simply return
    -EIO.  This will fix the reported hang.
    Reported-by: default avatarFilipe Manana <fdmanana@suse.com>
    Fixes: c2e39305 ("btrfs: clear extent buffer uptodate when we fail to write it")
    CC: stable@vger.kernel.org # 5.4+
    Reviewed-by: default avatarFilipe Manana <fdmanana@suse.com>
    Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    651740a5
extent_io.c 196 KB