    btrfs: zoned: fix chunk allocation condition for zoned allocator · 82187d2e
    Naohiro Aota authored
    The ZNS specification defines a limit on the number of "active"
    zones. That limit forces us to restrict the number of block groups
    that can be used for allocation at the same time. To stay within the
    limit, commit a85f05e5 ("btrfs: zoned: avoid chunk allocation if
    active block group has enough space") reuses the existing active
    block groups as much as possible when we can't activate any other
    zone without sacrificing an already activated block group.
    
    However, the check is wrong in two ways. First, it checks the
    condition for every raid index (ffe_ctl->index). Even when the check
    is reached and "ffe_ctl->max_extent_size >=
    ffe_ctl->min_alloc_size" holds, block groups at another raid index
    may still have enough space to hold ffe_ctl->num_bytes. (This cannot
    happen in the current zoned code, which supports only the SINGLE
    profile, but it can once other RAID types are enabled.)
    
    Second, it checks the active zone availability depending on the
    raid index. The raid index is just an index for
    space_info->block_groups, so it has nothing to do with chunk allocation.
    
    These mistakes cause a faulty allocation in a certain situation.
    Consider running zoned btrfs on a device whose max_active_zone == 0
    (no limit), and suppose no block group has room to fit
    ffe_ctl->num_bytes but some block group has room to meet
    ffe_ctl->min_alloc_size (i.e. num_bytes > max_extent_size >=
    min_alloc_size).
    
    In this situation, the following occurs:
    
    - With SINGLE raid_index, it reaches the chunk allocation checking
      code
    - The check returns true because we can activate a new zone (no limit)
    - But, before allocating the chunk, it iterates to the next raid index
      (RAID5)
    - Since there are no RAID5 block groups in zoned mode, it again
      reaches the check code
    - The check returns false because of btrfs_can_activate_zone()'s "if
      (raid_index != BTRFS_RAID_SINGLE)" part
    - That results in returning -ENOSPC without allocating a new chunk
    
    As a result, we end up hitting -ENOSPC too early.
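    
    The sequence above can be reproduced with a small user-space model.
    This is only a sketch under stated assumptions: every toy_* name is
    hypothetical, only two raid indexes are modeled, and the loop is
    reduced to the check described in this message. It is not the
    kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the raid indexes; only two are modeled. */
enum toy_raid_index { TOY_RAID_SINGLE, TOY_RAID_RAID5, TOY_NR_RAID_TYPES };

/* Hypothetical device model: max_active_zones == 0 means "no limit". */
struct toy_device {
	unsigned int max_active_zones;
	unsigned int active_zones;
};

/* Hypothetical subset of the allocation control state. */
struct toy_ffe_ctl {
	uint64_t num_bytes;       /* requested size */
	uint64_t min_alloc_size;  /* smallest acceptable size */
	uint64_t max_extent_size; /* largest free extent seen so far */
	int index;                /* current raid index */
};

/* Old behavior as described: any non-SINGLE index is refused outright. */
static bool toy_can_activate_zone_by_index(const struct toy_device *dev,
					   int raid_index)
{
	if (raid_index != TOY_RAID_SINGLE)
		return false;
	return dev->max_active_zones == 0 ||
	       dev->active_zones < dev->max_active_zones;
}

/*
 * Old loop, simplified: no block group fits num_bytes, so each raid
 * index ends at the check. Returns 0 if the loop would go on to
 * allocate a chunk, or -ENOSPC if it gives up first.
 */
static int toy_old_update_loop(struct toy_ffe_ctl *ctl,
			       const struct toy_device *dev)
{
	assert(ctl->min_alloc_size <= ctl->num_bytes);

	for (ctl->index = TOY_RAID_SINGLE; ctl->index < TOY_NR_RAID_TYPES;
	     ctl->index++) {
		if (ctl->max_extent_size >= ctl->min_alloc_size &&
		    !toy_can_activate_zone_by_index(dev, ctl->index))
			return -ENOSPC; /* premature: decided per index */
	}
	return 0; /* would allocate a new chunk here */
}
```

    With an unlimited device and num_bytes > max_extent_size >=
    min_alloc_size, the model passes the SINGLE iteration and then
    returns -ENOSPC at the RAID5 iteration, before any chunk is
    allocated.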
    
    Move the check to the right place in the can_allocate_chunk() hook,
    and do the active zone check depending on the allocation flag, not on
    the raid index.
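    
    A minimal user-space sketch of the corrected shape follows. It
    assumes hypothetical toy_* names, a made-up flag layout, and a
    two-entry raid index table; it illustrates the idea of checking
    once, keyed on the allocation flags, and is not the actual patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag layout: one type bit plus a profile mask. */
#define TOY_BLOCK_GROUP_DATA (1ULL << 0)
#define TOY_PROFILE_MASK     0xff00ULL

enum toy_raid_index { TOY_RAID_SINGLE, TOY_RAID_RAID5, TOY_NR_RAID_TYPES };

/* Hypothetical device model: max_active_zones == 0 means "no limit". */
struct toy_device {
	unsigned int max_active_zones;
	unsigned int active_zones;
};

struct toy_ffe_ctl {
	uint64_t flags;           /* allocation flags (type and profile) */
	uint64_t min_alloc_size;  /* smallest acceptable size */
	uint64_t max_extent_size; /* largest free extent seen so far */
	int index;                /* current raid index */
};

/* The active zone check depends on the allocation flags only. */
static bool toy_can_activate_zone_by_flags(const struct toy_device *dev,
					   uint64_t flags)
{
	/* Only the SINGLE profile (no profile bits) is modeled here. */
	assert((flags & TOY_PROFILE_MASK) == 0);
	return dev->max_active_zones == 0 ||
	       dev->active_zones < dev->max_active_zones;
}

/*
 * Chunk allocation hook: refuse a new chunk only when an active block
 * group can still serve a smaller request and no zone can be activated.
 */
static int toy_can_allocate_chunk(const struct toy_ffe_ctl *ctl,
				  const struct toy_device *dev)
{
	if (ctl->max_extent_size >= ctl->min_alloc_size &&
	    !toy_can_activate_zone_by_flags(dev, ctl->flags))
		return -ENOSPC;
	return 0;
}

/* The loop visits every raid index first, then asks the hook once. */
static int toy_new_update_loop(struct toy_ffe_ctl *ctl,
			       const struct toy_device *dev)
{
	for (ctl->index = TOY_RAID_SINGLE; ctl->index < TOY_NR_RAID_TYPES;
	     ctl->index++) {
		/* A real allocator would search this index's block groups. */
		continue;
	}
	return toy_can_allocate_chunk(ctl, dev);
}
```

    In this model an unlimited device always proceeds to chunk
    allocation, while a device that has used up its active zones gets
    -ENOSPC only if an active block group can still satisfy
    min_alloc_size.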
    
    CC: stable@vger.kernel.org # 5.16
    Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
    Signed-off-by: David Sterba <dsterba@suse.com>