    FIX: md: process hangs at wait_barrier after 0->10 takeover · 02214dc5
    Krzysztof Wojcik authored
    The following symptoms were observed:
    1. After a raid0->raid10 takeover operation we have an array with 2
    missing disks.
    When we add a disk for rebuild, the recovery process starts as expected
    but does not finish: it stops at about 90% and the md126_resync process
    hangs in "D" state.
    2. Similar behavior occurs when we have a mounted raid0 array and
    execute a takeover to raid10. When we then try to unmount the array,
    the umount process hangs in "D" state.
    
    In both scenarios the processes hang in the same function, wait_barrier
    in raid10.c.
    The process waits in the "wait_event_lock_irq" macro until the
    "!conf->barrier" condition becomes true.
    In the scenarios above that never happens.
    
    The reason was that at the end of level_store, after calling pers->run,
    we call mddev_resume. With RAID10 this calls pers->quiesce(mddev, 0),
    which calls lower_barrier.
    However, raise_barrier had not been called on that 'conf' yet,
    so conf->barrier becomes negative and "!conf->barrier" stays false.
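
The imbalance described above can be illustrated with a small userspace model. This is only a sketch, not the kernel code: the struct and every model_* name are hypothetical, and locking and wait queues are left out so that only the counter arithmetic remains.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the RAID10 conf; only the counter is modeled. */
struct model_conf {
    int barrier;    /* count of raised barriers; 0 means waiters may proceed */
};

/* Models raise_barrier(): one more barrier is active. */
static inline void model_raise_barrier(struct model_conf *conf)
{
    conf->barrier++;
}

/* Models lower_barrier(): one barrier is released. */
static inline void model_lower_barrier(struct model_conf *conf)
{
    conf->barrier--;
}

/* Models the condition wait_barrier() sleeps on: "!conf->barrier". */
static inline bool model_can_proceed(const struct model_conf *conf)
{
    return !conf->barrier;
}

/* Models the end of level_store after a 0->10 takeover. */
static inline struct model_conf model_takeover_then_resume(void)
{
    struct model_conf conf = { .barrier = 0 };  /* fresh conf, never raised */

    model_lower_barrier(&conf);  /* mddev_resume -> quiesce(mddev, 0) */
    return conf;                 /* barrier is now -1 */
}
```

In this model the counter ends at -1, so model_can_proceed() returns false, and a later balanced raise/lower pair brings the counter back to -1 rather than 0.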
    
    This patch sets conf->barrier=1 after the takeover
    operation. This prevents the barrier from becoming negative when
    lower_barrier() is called.
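
The effect of the fix can be sketched in the same spirit. The code below is an assumption-laden userspace illustration, not the actual change to raid10.c: sketch_conf, sketch_takeover and sketch_resume are hypothetical names, and the preset_barrier flag exists only to compare the behavior with and without the pre-set counter.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the RAID10 conf; only the counter is modeled. */
struct sketch_conf {
    int barrier;
};

/* Builds a conf as a takeover would, optionally pre-setting the barrier. */
static inline struct sketch_conf sketch_takeover(bool preset_barrier)
{
    struct sketch_conf conf = { .barrier = 0 };

    if (preset_barrier)
        conf.barrier = 1;   /* the change described: conf->barrier = 1 */
    return conf;
}

/* Models mddev_resume -> quiesce(mddev, 0) -> lower_barrier(). */
static inline void sketch_resume(struct sketch_conf *conf)
{
    conf->barrier--;
}
```

With the pre-set value, the single lower_barrier() issued on resume takes the counter from 1 to 0, which is the state waiters need; without it the counter goes from 0 to -1.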

    Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
    Signed-off-by: NeilBrown <neilb@suse.de>
raid10.c 67.3 KB