    ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling · 07b5b8e1
    Ritesh Harjani authored
    There could be a race in ext4_mb_discard_group_preallocations() where
    the 1st thread iterates through the group's bb_prealloc_list, removes
    all the PAs and adds them to the function's local list head.
    If a 2nd thread now comes in to discard the group preallocations, it
    sees that group->bb_prealloc_list is empty and returns 0.
    
    Consider a case where we have only a few groups (e.g. just group 0):
    this may even cause ext4_mb_new_blocks() (which calls
    ext4_mb_discard_group_preallocations()) to return an -ENOSPC error.
    But that is wrong, since the 2nd thread should have waited for the 1st
    thread to release all the PAs and then retried the allocation, given
    that the 1st thread was going to discard the PAs anyway.
    
    The algorithm using this percpu seq counter is as follows:
    1. We sample the percpu discard_pa_seq counter before trying for block
       allocation in ext4_mb_new_blocks().
    2. We increment this percpu discard_pa_seq counter when we either allocate
       or free these blocks i.e. while marking those blocks as used/free in
       mb_mark_used()/mb_free_blocks().
    3. We also increment this percpu seq counter when we successfully identify
       that the bb_prealloc_list is not empty and hence proceed for discarding
       of those PAs inside ext4_mb_discard_group_preallocations().
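
The three steps above can be modelled in userspace as a rough sketch. This is not the kernel code: the fixed NR_MODEL_CPUS, the plain array standing in for the percpu counter, and the helpers pa_seq_sample(), pa_seq_bump() and pa_seq_changed() are all hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_MODEL_CPUS 4 /* hypothetical CPU count for this userspace model */

/* Stand-in for the percpu discard_pa_seq counter: one slot per CPU. */
static uint64_t pa_seq[NR_MODEL_CPUS];

/* Step 1: sample the counter before attempting a block allocation. */
static uint64_t pa_seq_sample(unsigned int cpu)
{
    assert(cpu < NR_MODEL_CPUS);
    return pa_seq[cpu];
}

/*
 * Steps 2 and 3: bump the counter whenever blocks are marked used/free,
 * or when a non-empty PA list is about to be discarded.
 */
static void pa_seq_bump(unsigned int cpu)
{
    assert(cpu < NR_MODEL_CPUS);
    pa_seq[cpu]++;
}

/* True if any allocation/free/discard was recorded since the sample. */
static bool pa_seq_changed(unsigned int cpu, uint64_t sampled)
{
    return pa_seq_sample(cpu) != sampled;
}
```

In this model, a caller would take a sample, attempt the allocation, and treat a changed counter as a hint that a retry may succeed.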
    
    To make sure that the regular fast path of block allocation is not
    affected, as a small optimization we initially sample the percpu seq
    counter only on the current cpu. Only when the block allocation fails
    and no blocks were freed do we sample the percpu seq counter for all
    cpus, using ext4_get_discard_pa_seq_sum(). This happens after making
    sure that all the PAs on grp->bb_prealloc_list got freed or that the
    list is empty.
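
A simplified userspace sketch of this fast-path/slow-path split might look as follows. It is an assumption-laden model, not the patch itself: MODEL_NR_CPUS, model_seq[], model_seq_local(), model_seq_sum() and model_should_retry() are hypothetical names, and the real retry logic in mballoc.c may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4 /* hypothetical CPU count for this model */

/* Stand-in for the percpu seq counter: one slot per CPU. */
static uint64_t model_seq[MODEL_NR_CPUS];

/* Fast path: read only the slot of the CPU we are running on. */
static uint64_t model_seq_local(unsigned int cpu)
{
    assert(cpu < MODEL_NR_CPUS);
    return model_seq[cpu];
}

/* Slow path: sum the counter over all CPUs. */
static uint64_t model_seq_sum(void)
{
    uint64_t sum = 0;

    for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        sum += model_seq[cpu];
    return sum;
}

/*
 * Decide whether to retry after a failed allocation.
 * freed: number of blocks the caller managed to discard.
 * seq:   previously sampled value; refreshed when a retry is granted.
 */
static bool model_should_retry(unsigned int freed, uint64_t *seq)
{
    uint64_t now;

    if (freed > 0)
        return true;          /* we freed something ourselves */

    now = model_seq_sum();    /* expensive, failure path only */
    if (now != *seq) {
        *seq = now;           /* someone else allocated/freed/discarded */
        return true;
    }
    return false;             /* nothing changed: report -ENOSPC */
}
```

The design point being illustrated is that the all-cpu sum is paid for only on the failure path, while a successful allocation touches just one counter slot.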
    
    It can well be argued: why not just check grp->bb_free to see if
    there are any free blocks to be allocated? Here are the two concerns
    which were discussed:
    
    1. If for some reason the blocks available in the group are not
       appropriate for the allocation logic (e.g.
       EXT4_MB_HINT_GOAL_ONLY, although this is not yet implemented), then
       the retry logic may loop infinitely, since grp->bb_free is
       non-zero.
    
    2. Also, before preallocation was clubbed with block allocation under
       the same ext4_lock_group(), there were a lot of races where
       grp->bb_free could not be relied upon.

    Due to the above, this patch uses the discard_pa_seq logic to determine
    whether we should retry block allocation. If there are n threads
    trying to allocate blocks and none of them could allocate or discard
    any blocks, then all n threads will fail the block allocation and
    return an -ENOSPC error (since the seq counter will match for all of
    them, as no block allocation/discard was done during that time).

    Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
    Link: https://lore.kernel.org/r/7f254686903b87c419d798742fd9a1be34f0657b.1589955723.git.riteshh@linux.ibm.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
mballoc.c 151 KB