    [DLM] fix old rcom messages · 38aa8b0c
    David Teigland authored
    A reply to a recovery message will often be received after the relevant
    recovery sequence has aborted and the next recovery sequence has begun.
    We need to ignore replies to these old messages from the previous
    recovery.  There's already a way to do this for synchronous recovery
    requests using the rc_id number, but not for async.
    
    Each recovery sequence already has a locally unique sequence number
    associated with it.  This patch adds a field to the rcom (recovery
    message) structure where this recovery sequence number can be placed,
    rc_seq.  When a node sends a reply to a recovery request, it copies the
    rc_seq number it received into rc_seq_reply.  When the requesting node
    receives the reply to its recovery message, it checks whether
    rc_seq_reply matches its current recovery sequence number,
    ls_recover_seq, and if not, it ignores the old reply.
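
    The filtering rule described above can be sketched in userspace C. This
    is only an illustration, not the code from the patch: the struct names
    (rcom_sketch, ls_sketch), the helper functions, and the 64-bit field
    widths are assumptions made for the example. Only the field names
    rc_id, rc_seq, rc_seq_reply and ls_recover_seq come from the commit
    message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the rcom message header.
 * Only fields named in the commit message are modelled; the field
 * widths are an assumption. */
struct rcom_sketch {
    uint64_t rc_id;        /* used to match synchronous requests */
    uint64_t rc_seq;       /* sender's own recovery sequence number */
    uint64_t rc_seq_reply; /* rc_seq copied from the request being answered */
};

/* Hypothetical stand-in for the per-lockspace state. */
struct ls_sketch {
    uint64_t ls_recover_seq; /* locally unique id of the current recovery */
};

/* Requester: stamp the outgoing request with the current sequence. */
static void make_request(const struct ls_sketch *ls, struct rcom_sketch *req)
{
    req->rc_id = 0;
    req->rc_seq = ls->ls_recover_seq;
    req->rc_seq_reply = 0;
}

/* Replier: echo the requester's rc_seq back in rc_seq_reply.
 * Stamping the reply's rc_seq with the replier's own sequence is an
 * assumption of this sketch. */
static void make_reply(const struct ls_sketch *ls,
                       const struct rcom_sketch *req,
                       struct rcom_sketch *reply)
{
    reply->rc_id = req->rc_id;
    reply->rc_seq = ls->ls_recover_seq;
    reply->rc_seq_reply = req->rc_seq;
}

/* Requester: accept a reply only if it answers a request sent during
 * the recovery sequence that is still current. */
static bool reply_is_current(const struct ls_sketch *ls,
                             const struct rcom_sketch *reply)
{
    return reply->rc_seq_reply == ls->ls_recover_seq;
}
```

    In this sketch, a reply built from a request sent during recovery
    sequence 5 passes reply_is_current() while the requester is still at
    sequence 5, and is rejected once the requester has moved on to
    sequence 6. The comparison is made against the requester's own
    number, so the two nodes do not need to agree on sequence values.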
    
    An old, inadequate approach to filtering out old replies (checking if the
    current stage of recovery has moved back to the start) has been removed
    from two spots.
    
    The protocol version number is changed to reflect the different rcom
    structures.
    
    Signed-off-by: David Teigland <teigland@redhat.com>
    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
dlm_internal.h 14.4 KB