    bcache: add stop_when_cache_set_failed option to backing device · 7e027ca4
    Coly Li authored
    When there are too many I/O errors on the cache device, the current bcache
    code retires the whole cache set and detaches all bcache devices. But the
    detached bcache devices are not stopped, which is problematic when bcache
    is in writeback mode.
    
    If the retired cache set holds dirty data for backing devices, subsequent
    writes to the bcache device go directly to the backing device. If the
    LBA of such a write request has a dirty version cached on the cache device,
    then the next time the cache device is re-registered and the backing device
    is re-attached to it, the stale dirty data on the cache device is written
    back to the backing device and overwrites the newer, directly written data.
    This causes silent data corruption.
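
The overwrite sequence described above can be sketched with a toy model. This is only an illustration under simplifying assumptions: the names (toy_cache, cached_write, direct_write, writeback) are hypothetical and are not bcache code, and a block device is reduced to an array of integers.

```c
#include <stdbool.h>

#define NR_BLOCKS 4

/* Toy stand-in for a cache device: one cached value and dirty flag per LBA. */
struct toy_cache {
    bool dirty[NR_BLOCKS];
    int  data[NR_BLOCKS];
};

/* Writeback-mode write while the cache is attached: only the cache is updated. */
static void cached_write(struct toy_cache *c, int lba, int value)
{
    c->data[lba] = value;
    c->dirty[lba] = true;
}

/* Write after the cache set was retired: goes straight to the backing device. */
static void direct_write(int *backing, int lba, int value)
{
    backing[lba] = value;
}

/* Writeback after re-attach: every dirty block is flushed to the backing device. */
static void writeback(struct toy_cache *c, int *backing)
{
    for (int lba = 0; lba < NR_BLOCKS; lba++) {
        if (c->dirty[lba]) {
            backing[lba] = c->data[lba];  /* stale data wins */
            c->dirty[lba] = false;
        }
    }
}
```

In this model, calling cached_write() for an LBA, then direct_write() for the same LBA, then writeback() leaves the backing array holding the older cached value, which is the corruption the paragraph describes.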
    
    But we cannot simply stop all attached bcache devices when the cache set is
    broken or disconnected. For example, consider using bcache to accelerate
    an email service. In such a workload, if the cache device is broken but no
    dirty data is lost, keeping the bcache device alive and permitting the email
    service to continue accessing user data might be a better response to the
    cache device failure.
    
    Nix <nix@esperi.org.uk> pointed out the issue and provided the above example
    to explain why it might be necessary not to stop the bcache device when the
    cache device is broken. Pavel Goran <via-bcache@pvgoran.name> made the
    brilliant suggestion of providing "always" and "auto" options for the
    per-cached-device sysfs file stop_when_cache_set_failed. If the cache set is
    retiring and the backing device has no dirty data on the cache, it should be
    safe to keep the bcache device alive. In this case, if
    stop_when_cache_set_failed is set to "auto", the device failure handling
    code will not stop this bcache device, and applications can continue to
    access the backing device through an unattached bcache device.
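
The policy described above can be sketched as a small decision function. This is a minimal userspace sketch, not the patch itself: the enum, struct, and function names are hypothetical, and the real sysfs handler may parse its input differently (for example, this sketch does not strip a trailing newline).

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical names for illustration; not taken from the patch. */
enum stop_policy {
    STOP_AUTO,    /* stop only when the cache holds dirty data for the device */
    STOP_ALWAYS,  /* stop whenever the cache set fails */
};

struct backing_dev {
    enum stop_policy stop_when_cache_set_failed;
    bool has_dirty;  /* true if the cache holds dirty data for this device */
};

/* Parse the option string; returns 0 on success, -1 on an unknown value. */
static int parse_stop_policy(const char *s, enum stop_policy *out)
{
    if (strcmp(s, "auto") == 0) {
        *out = STOP_AUTO;
        return 0;
    }
    if (strcmp(s, "always") == 0) {
        *out = STOP_ALWAYS;
        return 0;
    }
    return -1;
}

/* Decide whether a bcache device should be stopped when its cache set is retired. */
static bool should_stop_on_cache_failure(const struct backing_dev *dc)
{
    if (dc->stop_when_cache_set_failed == STOP_ALWAYS)
        return true;
    /* "auto": keep the device alive only if no dirty data would be lost. */
    return dc->has_dirty;
}
```

With "auto" and no dirty data, the function returns false and the device stays usable; with "auto" and dirty data, or with "always", it returns true.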
    
    Changelog:
    [mlyle: edited to not break string constants across lines]
    v3: fix typos pointed out by Nix.
    v2: change option values of stop_when_cache_set_failed from 1/0 to
        "auto"/"always".
    v1: initial version, stop_when_cache_set_failed can be 0 (not stop) or 1
        (always stop).
    Signed-off-by: Coly Li <colyli@suse.de>
    Reviewed-by: Michael Lyle <mlyle@lyle.org>
    Signed-off-by: Michael Lyle <mlyle@lyle.org>
    Cc: Nix <nix@esperi.org.uk>
    Cc: Pavel Goran <via-bcache@pvgoran.name>
    Cc: Junhui Tang <tang.junhui@zte.com.cn>
    Cc: Hannes Reinecke <hare@suse.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>