    md/raid5: don't do chunk aligned read on degraded array. · 9ffc8f7c
    Eric Mei authored
    When the array is degraded, a read that lands on a failed drive must be
    reconstructed by reading the rest of the data in that stripe from the
    surviving drives. A single sequential read, which visits those chunks
    anyway, therefore ends up reading the same data twice.
    
    This patch avoids the chunk-aligned read path for degraded arrays. The
    downside is that reads then go through the stripe cache, which brings
    the associated CPU overhead and an extra memory copy.
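    
    The gist of the change is an extra degraded-array check before the
    chunk-aligned fast path in raid5.c. Roughly (a sketch; the exact
    surrounding context in make_request() may differ):
    
        /* Only try the chunk-aligned fast path on a fully optimal array;
         * on a degraded array fall through to the stripe cache, which can
         * reconstruct data for the failed drive. */
        if (rw == READ && mddev->degraded == 0 &&
            mddev->reshape_position == MaxSector &&
            chunk_aligned_read(mddev, bi))
                return;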
    
    Test Results:
    The following tests were done on an enterprise storage node with Seagate
    6TB SAS drives and a Xeon E5-2648L CPU (10 cores, 1.9GHz), using a
    10-disk MD RAID6 (8+2) array with a 128 KiB chunk size.
    
    I used FIO with direct I/O, various block sizes, and a sufficient queue
    depth, and tested sequential and 100% random reads (a representative
    invocation is sketched below) against three array configurations:
     1) optimal, as baseline;
     2) degraded;
     3) degraded with this patch.
    Kernel version is 4.0-rc3.
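    
    The exact job parameters are not recorded here; for reference, an
    invocation of the kind used for the sequential case might look like the
    following, with the device path, iodepth, and runtime as illustrative
    assumptions (use --rw=randread for the random case):
    
        fio --name=seqread --filename=/dev/md0 --direct=1 \
            --ioengine=libaio --rw=read --bs=128k --iodepth=64 \
            --runtime=60 --time_based --group_reporting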
    
    Each individual test was run only once, so there may be some variation,
    but the focus here is on the overall trend.
    
    Sequential Read:
      bs (KiB)  optimal(MiB/s)  degraded(MiB/s)  degraded-with-patch (MiB/s)
       1024       1608            656              995
        512       1624            710              956
        256       1635            728              980
        128       1636            771              983
         64       1612           1119             1000
         32       1580           1420             1004
         16       1368            688              986
          8        768            647              953
          4        411            413              850
    
    Random Read:
      bs (KiB)  optimal(IOPS)  degraded(IOPS)  degraded-with-patch (IOPS)
       1024        163            160              156
        512        274            273              272
        256        426            428              424
        128        576            592              591
         64        726            724              726
         32        849            848              837
         16        900            970              971
          8        927            940              929
          4        948            940              955
    
    Some notes:
      * In sequential + optimal, as the block size gets smaller, the FIO
    thread becomes CPU bound.
      * In sequential + degraded, there is a big increase at bs 64 KiB and
    32 KiB for which I don't have an explanation.
      * In sequential + degraded-with-patch, the MD thread mostly becomes
    CPU bound.
    
    We can discuss specific data points in those results if needed, but in
    general, with this patch we get more predictable and in most cases
    significantly better sequential read performance when the array is
    degraded, and almost no noticeable impact on random reads.
    
    Performance is a complicated thing: the patch works well for this
    particular configuration, but may not be universal. For example, I
    imagine testing on an all-SSD array may give very different results. But
    I personally think that in most cases IO bandwidth is a scarcer resource
    than CPU.
    Signed-off-by: Eric Mei <eric.mei@seagate.com>
    Signed-off-by: NeilBrown <neilb@suse.de>