    Ext4: Uninitialized Block Groups · 717d50e4
    Andreas Dilger authored
    In pass1 of e2fsck, every inode table in the filesystem is scanned and checked,
    regardless of whether it is in use.  This is the most time-consuming part
    of the filesystem check.  The uninitialized block group feature can greatly
    reduce e2fsck time by eliminating checking of uninitialized inodes.
    
    With this feature, there is a high water mark of used inodes for each block
    group.  Block and inode bitmaps can be uninitialized on disk via a flag in the
    group descriptor to avoid reading or scanning them at e2fsck time.  A checksum
    of each group descriptor is used to ensure that corruption in the group
    descriptor's bit flags does not cause incorrect operation.
    
    The feature is enabled through a mkfs option:
    
    	mke2fs /dev/ -O uninit_groups
    
    A patch adding support for uninitialized block groups to e2fsprogs tools has
    been posted to the linux-ext4 mailing list.
    
    The patches have been stress tested with fsstress and fsx.  In performance
    tests of e2fsck time, we have seen that e2fsck time on ext3 grows
    linearly with the total number of inodes in the filesystem.  In ext4 with the
    uninitialized block groups feature, the e2fsck time is independent of the
    total inode count and depends only on the number of used inodes.
    Since typical ext4 filesystems only use 1-10% of their inodes, this feature can
    greatly reduce e2fsck time for users, with a performance improvement of 2-20
    times, depending on how full the filesystem is.
    
    The attached graph shows the major improvements in e2fsck times in filesystems
    with a large total inode count, but few inodes in use.
    
    In each group descriptor, if we have
    
    EXT4_BG_INODE_UNINIT set in bg_flags:
            Inode table is not initialized/used in this group. So we can skip
            the consistency check during fsck.
    EXT4_BG_BLOCK_UNINIT set in bg_flags:
            No block in the group is used. So we can skip the block bitmap
            verification for this group.
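
    As a rough illustration of how a checker could act on these two flags,
    here is a minimal sketch.  The numeric flag values and both helper names
    are assumptions made for the example; the text above only names the flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values: the description names the flags but not their values. */
#define EXT4_BG_INODE_UNINIT 0x0001
#define EXT4_BG_BLOCK_UNINIT 0x0002

/* Hypothetical helper: the inode table of this group needs no checking. */
static bool can_skip_inode_table(uint16_t bg_flags)
{
	return (bg_flags & EXT4_BG_INODE_UNINIT) != 0;
}

/* Hypothetical helper: the block bitmap of this group needs no verification. */
static bool can_skip_block_bitmap(uint16_t bg_flags)
{
	return (bg_flags & EXT4_BG_BLOCK_UNINIT) != 0;
}
```

    The two flags are tested independently, so a group may have an
    uninitialized inode table while its block bitmap still has to be verified.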
    
    We also add two new fields to the group descriptor as part of the
    uninitialized group patch:
    
            __le16  bg_itable_unused;       /* Unused inodes count */
            __le16  bg_checksum;            /* crc16(sb_uuid+group+desc) */
    
    bg_itable_unused:
    
    If EXT4_BG_INODE_UNINIT is not set in bg_flags, bg_itable_unused
    records how many inodes at the end of the group's inode table are
    unused, which marks the point in the table up to which inodes are
    in use. fsck can use this to skip the inodes that are marked unused.
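
    A minimal sketch of how many inodes a checker would then have to scan in
    one group.  It assumes bg_itable_unused counts the unused inodes at the
    end of the table, as the field comment ("Unused inodes count") suggests;
    the function name, the inodes_per_group parameter and the flag value are
    hypothetical.

```c
#include <stdint.h>

#define EXT4_BG_INODE_UNINIT 0x0001 /* assumed bit value */

/*
 * Hypothetical helper: number of inodes at the start of the group's inode
 * table that still have to be scanned.
 */
static uint32_t inodes_to_scan(uint16_t bg_flags, uint32_t inodes_per_group,
			       uint16_t bg_itable_unused)
{
	/* Whole inode table is uninitialized: nothing to scan. */
	if (bg_flags & EXT4_BG_INODE_UNINIT)
		return 0;

	/* Implausible count: be conservative and scan the whole table. */
	if (bg_itable_unused > inodes_per_group)
		return inodes_per_group;

	return inodes_per_group - bg_itable_unused;
}
```

    For example, with 8192 inodes per group and bg_itable_unused = 8000, only
    the first 192 inodes of the table would be scanned.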
    
    bg_checksum:
    Now that we depend on bg_flags and bg_itable_unused to determine
    the block and inode usage, we need to make sure the group descriptor
    is not corrupt. We add a checksum to the group descriptor to
    detect corruption. If the descriptor is found to be corrupt, we
    mark all the blocks and inodes in the group as used.
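
    The checksum scheme can be sketched as follows.  The input order (uuid,
    then group number, then descriptor) follows the field comment above; the
    CRC-16 variant (reflected polynomial 0xA001), the 0xFFFF seed, the
    little-endian encoding of the group number, and all function names are
    assumptions of this sketch, not a statement of what the patch does.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16, reflected polynomial 0xA001 (assumed variant). */
static uint16_t crc16(uint16_t crc, const uint8_t *buf, size_t len)
{
	while (len--) {
		crc ^= *buf++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0xA001)
					: (uint16_t)(crc >> 1);
	}
	return crc;
}

/*
 * Hypothetical helper: checksum over the superblock uuid, the group number
 * and the first 'len' bytes of the descriptor (the caller is expected to
 * exclude the bg_checksum field itself).
 */
static uint16_t group_desc_csum(const uint8_t uuid[16], uint32_t group,
				const uint8_t *desc, size_t len)
{
	/* Serialize the group number as little-endian bytes (assumption). */
	uint8_t le_group[4] = {
		(uint8_t)group, (uint8_t)(group >> 8),
		(uint8_t)(group >> 16), (uint8_t)(group >> 24),
	};
	uint16_t crc = crc16(0xFFFF, uuid, 16); /* seed is an assumption */

	crc = crc16(crc, le_group, sizeof(le_group));
	return crc16(crc, desc, len);
}

/*
 * Hypothetical fallback policy: on a checksum mismatch, ignore the UNINIT
 * flags so that every block and inode in the group is treated as used.
 */
static uint16_t effective_flags(uint16_t bg_flags, uint16_t stored,
				uint16_t computed)
{
	return stored == computed ? bg_flags : 0;
}
```

    Including the uuid and the group number in the checksum means a descriptor
    copied from another filesystem or another group would not verify, even if
    its contents are otherwise intact.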
    
    Signed-off-by: Avantika Mathur <mathur@us.ibm.com>
    Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
    Signed-off-by: Mingming Cao <cmm@us.ibm.com>
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>