• Christoph Hellwig's avatar
    xfs: free RT extents after updating the bmap btree · 9871d096
    Christoph Hellwig authored
    Currently xfs_bmap_del_extent_real frees RT extents before updating
    the bmap btree, while it frees regular blocks after performing the bmap
    btree update for convoluted historic reasons.  Switch to free the RT
    blocks in the same place as the regular data blocks instead to simply
    the code and fix a very theoretical bug.
    
    A short history of this code researched by Dave Chiner below:
    
    The truncate for data device extents was originally a two-phase
    operation. First it removed the bmapbt record, but because this can
    free BMBT extents, it can use up all the free space tree reservation
    space. So the transaction gets rolled to commit the BMBT change and
    the xfs_bmap_finish() call that frees the data extent runs with a
    new transaction reservation that allows different free space btrees
    to be logged without overrun.
    
    However, on crash, this could lose the free space because there was
    nothing to tell recovery about the extents removed from the BMBT,
    hence EFIs were introduced. They tie the extent free operation to the
    bmapbt record removal commit for recovery of the second phase of the
    extent removal process.
    
    Then RT extents came along. RT extent freeing does not require a
    free space btree reservation because the free space metadata is
    static and transaction size is bound. Hence we don't need to care if
    the BMBT record removal modifies the per-ag free space trees and we
    don't need a two-phase extent remove transaction. The only thing we
    have to care about is not losing space on crash.
    
    Hence instead of recording the extent for freeing in the bmap list
    for xfs_bmap_finish() to process in a new transaction, it simply
    freed the rtextent directly. So the original code (from 1994) simply
    replaced the "free AG extent later" queueing with a direct free.
    
    This code was originally at the start of xfs_dmap_del_extent(), but
    the xfs_bmap_add_free() got moved to the end of the function via the
    "do_fx" flag (the current code logic) in 1997 (commit c4fac74eaa58
    in the historic xfs-import tree) because there was a shutdown occurring
    because of a case where splitting the extent record failed because the
    BMBT split and the filesystem didn't have enough space for the split to
    be done. (FWIW, I'm not sure this can happen anymore.)
    
    The commit backed out the BMBT change on ENOSPC error, and in doing
    so I think this actually breaks RT free space tracking. However, it
    then returns an ENOSPC error, and we have a dirty transaction in the
    RT case so this will shut down the filesysetm when the transaction
    is cancelled. Hence the corrupted "bmbt now points at freed rt dev
    space" condition never make it to disk, but it's still the wrong way
    to handle the issue.
    
    IOWs, this proposed change fixes that "shutdown at ENOSPC on rt
    devices" situation that was introduced by the above commit back in
    1997.
    Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
    Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
    Reviewed-by: default avatar"Darrick J. Wong" <djwong@kernel.org>
    Signed-off-by: default avatarChandan Babu R <chandanbabu@kernel.org>
    9871d096
xfs_bmap.c 170 KB