    f2fs: introduce discard_unit mount option · 4f993264
    Chao Yu authored
    As James Z reported in bugzilla:
    
    https://bugzilla.kernel.org/show_bug.cgi?id=213877
    
    [1.] One-line summary of the problem:
    Mount multiple SMR block devices exceed certain number cause system non-response
    
    [2.] Full description of the problem/report:
    Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
    Empirically, found that when the amount of SMR device * 1.5Gb > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32G RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
    The number of SMR devices with other FS mounted on this system does not interfere with the result above.
    
    [3.] Keywords (i.e., modules, networking, kernel):
    F2FS, SMR, Memory
    
    [4.] Kernel information
    [4.1.] Kernel version (uname -a):
    Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
    
    [4.2.] Kernel .config file:
    Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64
    
    [5.] Most recent kernel version which did not have the bug:
    None
    
    [6.] Output of Oops.. message (if applicable) with symbolic information
         resolved (see Documentation/admin-guide/oops-tracing.rst)
    None
    
    [7.] A small shell script or example program which triggers the
         problem (if possible)
    mount /dev/sdX /mnt/0X
    
    [8.] Memory consumption
    
    With 24 * 14T SMR Block device with F2FS
    free -g
                  total        used        free      shared  buff/cache   available
    Mem:             46          36           0           0          10          10
    Swap:             0           0           0
    
    With 3 * 14T SMR Block device with F2FS
    free -g
                   total        used        free      shared  buff/cache   available
    Mem:               7           5           0           0           1           1
    Swap:              7           0           7
    
    The root cause is that there are three bitmaps:
    - cur_valid_map
    - ckpt_valid_map
    - discard_map
    and each of them costs ~500MB of memory. {cur, ckpt}_valid_map are
    necessary, but discard_map is optional, since this bitmap is only
    useful on a mountpoint where small discard is enabled.
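
The per-bitmap cost can be estimated with a short calculation. The sketch below is not f2fs code; it assumes each bitmap holds one bit per filesystem block and that the block size is 4 KiB, and the helper names are hypothetical. For a 14 TB device it gives roughly 0.43 GB per bitmap, the same order of magnitude as the ~500MB quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB filesystem blocks. */
#define BLOCK_SIZE_BYTES 4096ULL

/* Bytes needed for one bitmap that holds one bit per block (assumed layout). */
static uint64_t bitmap_bytes(uint64_t device_bytes)
{
	uint64_t blocks = device_bytes / BLOCK_SIZE_BYTES;

	return (blocks + 7) / 8; /* round up to whole bytes */
}

/* Total bytes for nr_bitmaps such bitmaps on each of nr_devices devices. */
static uint64_t total_bitmap_bytes(uint64_t device_bytes,
				   unsigned int nr_bitmaps,
				   unsigned int nr_devices)
{
	assert(nr_bitmaps > 0 && nr_devices > 0);
	return bitmap_bytes(device_bytes) * nr_bitmaps * nr_devices;
}
```

Under these assumptions, `total_bitmap_bytes(14000000000000ULL, 3, 1)` is about 1.28 GB per device, close to the reporter's empirical 1.5GB figure, and passing 2 instead of 3 shows the one-third saving from dropping discard_map. Real memory use includes other structures, so treat these numbers as rough.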
    
    For a blkzoned device such as an SMR or ZNS device, f2fs will only issue
    discard for a section (zone) when all blocks of that section are invalid,
    so for such devices we don't need small discard functionality at all.
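
The section-level rule above can be modelled in a few lines. This is an illustrative sketch, not the f2fs implementation: the function name and the idea of passing per-segment valid-block counts as a plain array are assumptions made for the example. It shows that the decision needs only a count per segment and no per-block discard bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: valid_blocks[i] is the number of valid blocks in
 * segment i of one section. The section may be discarded only when every
 * segment in it has no valid block left.
 */
static bool section_is_discardable(const unsigned int *valid_blocks,
				   size_t segs_per_sec)
{
	assert(valid_blocks != NULL);

	for (size_t i = 0; i < segs_per_sec; i++) {
		if (valid_blocks[i] != 0)
			return false; /* at least one block is still valid */
	}
	return true;
}
```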
    
    This patch introduces a new mount option "discard_unit=block|segment|
    section" to support issuing discard with a different basic unit,
    aligned to block, segment or section, so that the user can specify
    "discard_unit=segment" or "discard_unit=section" to disable the small
    discard functionality.
    
    Note that this mount option cannot be changed by remount(), because
    the related metadata needs to be initialized during mount().
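
As a usage sketch, the option can be passed as the data string of mount(2) at first mount. The helper names below are hypothetical and the device and mount point are whatever the caller supplies; the call needs root and a kernel that includes this patch. Only the three values named above are accepted by the helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Returns 1 if unit is one of the values named in the commit message. */
static int discard_unit_is_valid(const char *unit)
{
	return strcmp(unit, "block") == 0 ||
	       strcmp(unit, "segment") == 0 ||
	       strcmp(unit, "section") == 0;
}

/* Writes "discard_unit=<unit>" into buf. Returns 0 on success, -1 on error. */
static int build_discard_unit_opt(char *buf, size_t len, const char *unit)
{
	int n;

	assert(buf != NULL && unit != NULL);
	if (!discard_unit_is_valid(unit))
		return -1;
	n = snprintf(buf, len, "discard_unit=%s", unit);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Mounts dev on dir as f2fs with the chosen discard unit (hypothetical helper). */
static int mount_f2fs_with_discard_unit(const char *dev, const char *dir,
					const char *unit)
{
	char opts[64];

	if (build_discard_unit_opt(opts, sizeof(opts), unit) != 0)
		return -1;
	/* The option must be given here; it cannot be changed on remount. */
	return mount(dev, dir, "f2fs", 0, opts);
}
```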
    
    In order to save memory, let's use "discard_unit=section" for blkzoned
    devices by default.
    
    Signed-off-by: Chao Yu <chao@kernel.org>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>