    ocfs2/dlm: avoid incorrect bit set in refmap on recovery master · b12a358c
    Wengang Wang authored
    commit a524812b upstream.
    
    In the following situation, an incorrect bit is left set in the refmap on
    the recovery master. The recovery master later fails to purge the lockres
    because of that stale bit.
    
    1) node A no longer has any interest in lockres A, so it is purging it.
    2) the owner of lockres A is node B, so node A sends a de-ref message
    to node B.
    3) at this point node B crashes and node C becomes the recovery master. It
    recovers lockres A (because its master is the dead node B).
    4) node A migrates lockres A to node C, which sets a refbit for node A there.
    5) node A fails to send the de-ref message to node B because node B has
    crashed. The failure is ignored and no further action is taken for lockres A.
    
    Normally, re-sending the deref message to the recovery master would fix
    this. However, ignoring the failed deref to the original master and not
    recovering the lockres to the recovery master has the same effect, and the
    latter is simpler.
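
    The scenario and the chosen fix can be illustrated with a small toy model.
    This is only a sketch under stated assumptions, not the actual patch: the
    struct, the `dropping_ref` flag, and both function names are hypothetical
    and do not correspond to the real ocfs2/dlm data structures. It models the
    refmap on the recovery master as a bitmask and compares recovering the
    lockres unconditionally with skipping a lockres whose deref is in flight.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Toy model only: every name here is hypothetical, not ocfs2 code. */
    enum { NODE_A = 0, NODE_B = 1, NODE_C = 2 };

    struct toy_lockres {
        int owner;          /* node that currently masters the lockres */
        bool dropping_ref;  /* node A is purging; a deref is in flight */
    };

    /*
     * Node A's recovery step for a lockres mastered by the dead node B.
     * Returns the refmap that the recovery master (node C) ends up with.
     */
    static uint32_t recover_to_new_master(const struct toy_lockres *res_on_a,
                                          bool skip_dropping)
    {
        uint32_t refmap_on_c = 0;

        /* Only lockres owned by the dead node are candidates for recovery. */
        assert(res_on_a->owner == NODE_B);

        /* Assumed fix: do not recover a lockres that is being purged. */
        if (skip_dropping && res_on_a->dropping_ref)
            return refmap_on_c;

        /* Migrating the lockres sets node A's refbit on node C (step 4). */
        refmap_on_c |= 1u << NODE_A;
        return refmap_on_c;
    }

    /* Returns true if node C is left with a refbit that nobody will clear. */
    static bool stale_refbit_after_crash(bool skip_dropping)
    {
        struct toy_lockres res_on_a = { .owner = NODE_B, .dropping_ref = true };
        uint32_t refmap_on_c = recover_to_new_master(&res_on_a, skip_dropping);

        /*
         * Step 5: the deref to node B fails and the failure is ignored.
         * Node A then drops the lockres and never derefs on node C.
         */
        return (refmap_on_c & (1u << NODE_A)) != 0;
    }
    ```

    In this model, `stale_refbit_after_crash(false)` reports a leftover bit,
    while `stale_refbit_after_crash(true)` does not, which matches the
    reasoning above for why skipping recovery is sufficient.
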
    
    Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
    Acked-by: Srinivas Eeda <srinivas.eeda@oracle.com>
    Signed-off-by: Joel Becker <joel.becker@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
dlmrecovery.c 83 KB