    ocfs2: fix a tiny race that leads file system read-only · 814ce694
    Jiufei Xue authored
    When o2hb detects that a node is down, it first adds the dead node to the
    recovery map and creates ocfs2rec, which will replay the journal of the
    dead node.  The o2hb thread then calls dlm_do_local_recovery_cleanup() to
    delete the locks of the dead node.  Once those locks are gone, locks for
    other nodes can be granted, and those nodes may modify the metadata before
    the journal of the dead node has been replayed.  The details are as
    follows.
    
         N1                         N2                   N3(master)
    modifies the extent tree
    of an inode, commits the
    dirty metadata to journal,
    then goes down.
                                                     o2hb thread detects that
                                                     N1 went down, sets the
                                                     recovery map and
                                                     deletes the lock of N1.
    
                                                     dlm_thread flushes the
                                                     AST for the lock of N2.
                            has not detected the
                            death of N1, so its
                            recovery map is empty.
    
                            reads the inode from disk
                            without replaying
                            the journal of N1 and
                            modifies the extent tree
                            of the inode that N1
                            had modified.
                                                     ocfs2rec recovers the
                                                     journal of N1.
                                                     The modification by N2
                                                     is lost.
    
    The modifications by N1 and N2 are not serialized, which leads to a
    read-only file system.  We can set the recovery_waiting flag on the lock
    resource after deleting the lock of the dead node, to prevent other nodes
    from getting the lock before dlm recovery.  After dlm recovery, the
    recovery map on N2 is not empty, so ocfs2_inode_lock_full_nested() will
    wait for ocfs2 recovery.
    
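The gating idea described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel change itself: `struct lock_res`, `RES_RECOVERY_WAITING`, `local_recovery_cleanup()`, `try_grant()` and `recovery_done()` are hypothetical names invented for the illustration, and the model tracks a single exclusive holder rather than the DLM's real lock queues.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit for this model; not the kernel's definition. */
enum {
    RES_RECOVERY_WAITING = 1u << 0, /* dead node's lock dropped, recovery pending */
};

/* Simplified stand-in for a lock resource with one exclusive holder. */
struct lock_res {
    unsigned int state; /* flag bits from the enum above */
    int holder;         /* node id holding the lock, or -1 if free */
};

/*
 * Models the cleanup step run when a node is reported dead: drop the
 * dead node's lock and mark the resource as waiting for recovery.
 */
void local_recovery_cleanup(struct lock_res *res, int dead_node)
{
    assert(res != NULL);
    if (res->holder == dead_node) {
        res->holder = -1;
        res->state |= RES_RECOVERY_WAITING;
    }
}

/*
 * Models the grant path: refuse to grant while the resource is held
 * or still waiting for recovery.
 */
bool try_grant(struct lock_res *res, int node)
{
    assert(res != NULL);
    if (res->holder != -1 || (res->state & RES_RECOVERY_WAITING))
        return false;
    res->holder = node;
    return true;
}

/* Models the end of recovery: the resource may be granted again. */
void recovery_done(struct lock_res *res)
{
    assert(res != NULL);
    res->state &= ~(unsigned int)RES_RECOVERY_WAITING;
}
```

In this model, a request from N2 made between `local_recovery_cleanup()` and `recovery_done()` is refused, which corresponds to the window in the timeline where N2 would otherwise read the inode before the journal of N1 is replayed.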
    Signed-off-by: Jiufei Xue <xuejiufei@huawei.com>
    Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
    Cc: Mark Fasheh <mfasheh@suse.de>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Junxiao Bi <junxiao.bi@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dlmthread.c 21.1 KB