• Xue jiufei's avatar
    ocfs2/dlm: fix possible convert=sion deadlock · 6718cb5e
    Xue jiufei authored
    We found there is a conversion deadlock when the owner of lockres
    happened to crash before send DLM_PROXY_AST_MSG for a downconverting
    lock.  The situation is as follows:
    
    Node1                            Node2                  Node3
                               the owner of lockresA
    lock_1 granted at EX mode
    and call ocfs2_cluster_unlock
    to decrease ex_holders.
                                                     converting lock_3 from
                                                     NL to EX
                               send DLM_PROXY_AST_MSG
                               to Node1, asking Node 1
                               to downconvert.
    receiving DLM_PROXY_AST_MSG,
    thread ocfs2dc send
    DLM_CONVERT_LOCK_MSG
    to Node2 to downconvert
    lock_1(EX->NL).
                               lock_1 can be granted and
                               put it into pending_asts
                               list, return DLM_NORMAL.
                               then something happened
                               and Node2 crashed.
    received DLM_NORMAL, waiting
    for DLM_PROXY_AST_MSG.
                                                   selected as the recovery
                                                   master, receving migrate
                                                   lock from Node1, queue
                                                   lock_1 to the tail of
                                                   converting list.
    
    After dlm recovery, converting list in the master of lockresA(Node3)
    will be: converting list head <-> lock_3(NL->EX) <->lock_1(EX<->NL).
    Requested mode of lock_3 is not compatible with the granted mode of
    lock_1, so it can not be granted.  and lock_1 can not downconvert
    because covnerting queue is strictly FIFO.  So a deadlock is created.
    We think function dlm_process_recovery_data() should queue_ast for
    lock_1 or alter the order of lock_1 and lock_3, so dlm_thread can
    process lock_1 first.  And if there are multiple downconverting locks,
    they must convert form PR to NL, so no need to sort them.
    Signed-off-by: default avatarjoyce.xue <xuejiufei@huawei.com>
    Cc: Mark Fasheh <mfasheh@suse.com>
    Cc: Joel Becker <jlbec@evilplan.org>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    6718cb5e
dlmrecovery.c 86.2 KB