    ext4: Improve scalability of ext4 orphan file handling · 4a79a98c
    Jan Kara authored
    Even though the length of the critical section when adding / removing
    orphaned inodes was significantly reduced by using the orphan file, the
    contention on the lock protecting the orphan file still appears high in
    profiles for truncate / unlink intensive workloads with a high number of
    threads.
    
    This patch makes handling of the orphan file completely lockless. Also,
    to reduce conflicts between CPUs, different CPUs start searching for an
    empty slot in the orphan file in different blocks.
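
    The idea can be illustrated with a minimal userspace sketch using C11
    atomics. This is not the code in orphan.c: the names (orphan_block,
    orphan_add, orphan_del, dec_if_positive), the sizes, and the simple
    "cpu % NBLOCKS" choice of starting block are all assumptions made for
    illustration. A per-block counter of free entries is decremented
    atomically to reserve a slot, and the slot itself is then claimed with
    compare-and-swap, so no lock is taken.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NBLOCKS 4          /* hypothetical number of orphan-file blocks */
#define SLOTS_PER_BLOCK 8  /* hypothetical number of slots per block */

struct orphan_block {
    atomic_int free_entries;                 /* slots not yet reserved */
    _Atomic uint32_t slot[SLOTS_PER_BLOCK];  /* 0 = empty, else inode number */
};

static struct orphan_block blocks[NBLOCKS];

static void orphan_init(void)
{
    for (int b = 0; b < NBLOCKS; b++) {
        atomic_init(&blocks[b].free_entries, SLOTS_PER_BLOCK);
        for (int s = 0; s < SLOTS_PER_BLOCK; s++)
            atomic_init(&blocks[b].slot[s], 0);
    }
}

/* Reserve one entry unless the counter is already zero. */
static int dec_if_positive(atomic_int *v)
{
    int old = atomic_load(v);

    while (old > 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(v, &old, old - 1))
            return 1;
    }
    return 0;
}

/* Returns the global slot index, or -1 if every block is full. */
static int orphan_add(unsigned cpu, uint32_t ino)
{
    unsigned start = cpu % NBLOCKS;  /* spread CPUs over different blocks */

    assert(ino != 0);                /* 0 is reserved for "empty" */
    for (unsigned i = 0; i < NBLOCKS; i++) {
        unsigned b = (start + i) % NBLOCKS;

        if (!dec_if_positive(&blocks[b].free_entries))
            continue;                /* block is full, try the next one */
        /* The reservation guarantees an empty slot; retry until claimed. */
        for (;;) {
            for (int s = 0; s < SLOTS_PER_BLOCK; s++) {
                uint32_t expected = 0;

                if (atomic_compare_exchange_strong(&blocks[b].slot[s],
                                                   &expected, ino))
                    return (int)(b * SLOTS_PER_BLOCK + s);
            }
        }
    }
    return -1;
}

static void orphan_del(int idx)
{
    struct orphan_block *blk = &blocks[idx / SLOTS_PER_BLOCK];

    /* Clear the slot first, then publish it as free again. */
    atomic_store(&blk->slot[idx % SLOTS_PER_BLOCK], 0);
    atomic_fetch_add(&blk->free_entries, 1);
}
```

    In this sketch, deletion clears the slot before incrementing the
    counter, so a successful reservation always has an empty slot to find.
    Callers on different CPUs usually touch different counters and slots,
    which is what reduces cache-line conflicts.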
    
    Performance comparison of locked orphan file handling, lockless orphan
    file handling, and completely disabled orphan inode handling
    from an 80-CPU Xeon server with 526 GB of RAM, filesystem located on
    a SAS SSD, average of 5 runs:
    
    stress-orphan (microbenchmark truncating files byte-by-byte from N
    processes in parallel)
    
    Threads Time            Time            Time
            Orphan locked   Orphan lockless No orphan
      1       0.945600       0.939400        0.891200
      2       1.331800       1.246600        1.174400
      4       1.995000       1.780600        1.713200
      8       6.424200       4.900000        4.106000
     16      14.937600       8.516400        8.138000
     32      33.038200      24.565600       24.002200
     64      60.823600      39.844600       38.440200
    128     122.941400      70.950400       69.315000
    
    So we can see that with lockless orphan file handling, addition /
    deletion of orphaned inodes drops almost completely out of the picture
    even for a microbenchmark stressing it.
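
    To put a number on that, the overhead relative to the "No orphan"
    column can be computed from the table. The helper below is a
    hypothetical illustration, not part of the patch. For the 128-thread
    row it gives roughly 77% overhead for the locked variant and roughly
    2.4% for the lockless one.

```c
/* Overhead of a run time relative to a baseline, in percent. */
static double overhead_pct(double time, double baseline)
{
    return (time / baseline - 1.0) * 100.0;
}

/* 128-thread row of the stress-orphan table above. */
static const double t_locked   = 122.941400;
static const double t_lockless =  70.950400;
static const double t_noorphan =  69.315000;
```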
    
    For the reaim creat_clo workload on a ramdisk there are also noticeable
    gains (average of 5 runs):
    
    Clients         Vanilla (ops/s)        Patched (ops/s)
    creat_clo-1     14705.88 (   0.00%)    14354.07 *  -2.39%*
    creat_clo-3     27108.43 (   0.00%)    28301.89 (   4.40%)
    creat_clo-5     37406.48 (   0.00%)    45180.73 *  20.78%*
    creat_clo-7     41338.58 (   0.00%)    54687.50 *  32.29%*
    creat_clo-9     45226.13 (   0.00%)    62937.07 *  39.16%*
    creat_clo-11    44000.00 (   0.00%)    65088.76 *  47.93%*
    creat_clo-13    36516.85 (   0.00%)    68661.97 *  88.03%*
    creat_clo-15    30864.20 (   0.00%)    69551.78 * 125.35%*
    creat_clo-17    27478.45 (   0.00%)    67729.08 * 146.48%*
    creat_clo-19    25000.00 (   0.00%)    61621.62 * 146.49%*
    creat_clo-21    18772.35 (   0.00%)    63829.79 * 240.02%*
    creat_clo-23    16698.94 (   0.00%)    61938.96 * 270.92%*
    creat_clo-25    14973.05 (   0.00%)    56947.61 * 280.33%*
    creat_clo-27    16436.69 (   0.00%)    65008.03 * 295.51%*
    creat_clo-29    13949.01 (   0.00%)    69047.62 * 395.00%*
    creat_clo-31    14283.52 (   0.00%)    67982.45 * 375.95%*

    Reviewed-by: Theodore Ts'o <tytso@mit.edu>
    Reviewed-by: Lukas Czerner <lczerner@redhat.com>
    Signed-off-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20210816095713.16537-5-jack@suse.cz
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
orphan.c 18.5 KB