• Eivind Sarto's avatar
    raid5: avoid release list until last reference of the stripe · cf170f3f
    Eivind Sarto authored
    The (lockless) release_list reduces lock contention, but there is excessive
    queueing and dequeuing of stripes on this list.  A stripe will currently be
    queued on the release_list with a stripe reference count > 1.  This can cause
    the raid5 kernel thread(s) to dequeue the stripe and decrement the refcount
    without doing any other useful processing of the stripe.  The are two cases
    when the stripe can be put on the release_list multiple times before it is
    actually handled by the kernel thread(s).
    1) make_request() activates the stripe processing in 4k increments.  When a
       write request is large enough to span multiple chunks of a stripe_head, the
       first 4k chunk adds the stripe to the plug list.  The next 4k chunk that is
       processed for the same stripe puts the stripe on the release_list with a
       refcount=2.  This can cause the kernel thread to process and decrement the
       stripe before the stripe us unplugged, which again will put it back on the
       release_list.
    2) Whenever IO is scheduled on a stripe (pre-read and/or write), the stripe
       refcount is set to the number of active IO (for each chunk).  The stripe is
       released as each IO complete, and can be queued and dequeued multiple times
       on the release_list, until its refcount finally reached zero.
    
    This simple patch will ensure a stripe is only queued on the release_list when
    its refcount=1 and is ready to be handled by the kernel thread(s).  I added some
    instrumentation to raid5 and counted the number of times striped were queued on
    the release_list for a variety of write IO sizes.  Without this patch the number
    of times stripes got queued on the release_list was 100-500% higher than with
    the patch.  The excess queuing will increase with the IO size.  The patch also
    improved throughput by 5-10%.
    Signed-off-by: default avatarEivind Sarto <esarto@fusionio.com>
    Signed-off-by: default avatarNeilBrown <neilb@suse.de>
    cf170f3f
raid5.c 197 KB