    tipc: fix link session and re-establish issues · 91986ee1
    Tuong Lien authored
    When a link endpoint is re-created (e.g. after a node reboot or
    interface reset), its link session number is changed to a random
    value, and the peer endpoint is synced with this new session number
    before the link is re-established.
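
    A minimal sketch of the session bookkeeping follows. It assumes the
    session number is a 16-bit value that wraps around, so "larger" and
    "less" are decided with serial-number arithmetic rather than a plain
    integer comparison. The helper names session_less() and new_session()
    are hypothetical and are not taken from the TIPC sources. rand() only
    stands in for whatever random source the kernel actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* True if 'left' comes before 'right' in 16-bit serial-number order,
 * i.e. 'right' is less than half the number space ahead of 'left'.
 * Equal values are not "less". (Hypothetical helper.) */
static bool session_less(uint16_t left, uint16_t right)
{
	uint16_t dist = (uint16_t)(right - left);

	return dist != 0 && dist < 0x8000u;
}

/* Pick a fresh session number for a re-created link endpoint.
 * rand() is only a stand-in for a real random source. (Hypothetical.) */
static uint16_t new_session(void)
{
	return (uint16_t)rand();
}
```

    With wrap-around semantics, 0 counts as "after" 65535. Under that
    assumption, a freshly drawn number is equally likely to land before
    or after the previous session.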
    
    However, this mechanism has a shortcoming that can leave the link
    never re-established, or cause it to fail shortly after it comes up.
    It happens when the peer endpoint is already in ESTABLISHING state,
    with 'peer_session' and the 'in_session' flag set, and this link
    endpoint suddenly leaves. When it comes back with a random session
    number, two situations are possible:
    
    1/ If the random session number is larger than (or equal to) the
    previous one, the peer endpoint will be updated with this new session
    upon receipt of a RESET_MSG from this endpoint, and the link can be
    re-established as normal. Otherwise, all the RESET_MSGs from this
    endpoint will be rejected by the peer. In turn, when this link endpoint
    receives an ACTIVATE_MSG from the peer, it moves to ESTABLISHED and
    starts to send STATE_MSGs, but these messages are also dropped by the
    peer because they carry the wrong session number.
    The peer link endpoint can still become ESTABLISHED after receiving a
    traffic message from this endpoint (e.g. a BCAST_PROTOCOL or
    NAME_DISTRIBUTOR message), but since all the STATE_MSGs are invalid,
    the link will sooner or later be forced down!
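
    The first situation can be modelled as a hypothetical sketch of the
    peer's acceptance rules, as described above. A RESET_MSG is accepted
    only if its session is not smaller than the recorded 'peer_session'.
    A STATE_MSG is accepted only if its session matches 'peer_session'.
    The struct and function names are illustrative, not the kernel's,
    and a plain comparison is used for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the peer endpoint's session state. */
struct peer_ep {
	uint16_t peer_session; /* session number recorded for the other side */
	bool in_session;       /* set once a session has been recorded */
};

/* RESET_MSG: rejected while in session if its session number is smaller
 * than the recorded one; otherwise the peer re-syncs to it. */
static bool peer_rcv_reset(struct peer_ep *p, uint16_t session)
{
	if (p->in_session && session < p->peer_session)
		return false;
	p->peer_session = session;
	p->in_session = true;
	return true;
}

/* STATE_MSG: accepted only if it carries exactly the recorded session. */
static bool peer_rcv_state(const struct peer_ep *p, uint16_t session)
{
	return p->in_session && session == p->peer_session;
}
```

    Suppose 'peer_session' is 1000 and the re-created endpoint uses 500.
    Both its RESET_MSGs and its later STATE_MSGs are dropped, which is
    the failure described above. With 1200, the first RESET_MSG re-syncs
    the peer.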
    
    Even if the random session number is larger than the previous one,
    the ACTIVATE_MSG from the peer may arrive first, so this link endpoint
    moves straight to ESTABLISHED without having sent out any RESET_MSG.
    Consequently, the peer link is never updated with the new session
    number, and the same link failure as above happens.
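
    The race can be reproduced in a similar hypothetical model. The only
    thing that matters is whether a RESET_MSG carrying the new session
    reaches the peer before this endpoint goes ESTABLISHED. The names are
    illustrative, and messages are assumed to be delivered without loss.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum link_state { LINK_RESET, LINK_ESTABLISHING, LINK_ESTABLISHED };

/* Hypothetical two-sided view of one link after this endpoint came back. */
struct sim {
	enum link_state local_state; /* state of this (re-created) endpoint */
	uint16_t local_session;      /* its new random session number */
	uint16_t peer_session;       /* session the peer has recorded for it */
};

/* The peer receives one RESET_MSG from this endpoint: accepted (and the
 * peer re-syncs) only if the session is not smaller than the recorded one. */
static void peer_gets_reset(struct sim *s)
{
	if (s->local_session >= s->peer_session)
		s->peer_session = s->local_session;
}

/* Pre-fix: this endpoint goes ESTABLISHED on any ACTIVATE_MSG. */
static void local_gets_activate(struct sim *s)
{
	s->local_state = LINK_ESTABLISHED;
}

/* The peer only accepts STATE_MSGs carrying the recorded session. */
static bool peer_accepts_state(const struct sim *s)
{
	return s->local_session == s->peer_session;
}
```

    If the ACTIVATE_MSG wins the race, the STATE_MSGs are rejected even
    though 1200 is larger than 1000. If a RESET_MSG goes out first, the
    same numbers work.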
    
    2/ In the other situation, the peer link endpoint has been reset for
    some reason in the meantime: its link state was moved from
    ESTABLISHING to RESET, but it is still in session, i.e. the
    'in_session' flag has not been reset.
    Now, if the random session number from this endpoint is less than the
    previous one, all the RESET_MSGs from this endpoint will be rejected by
    the peer. In the other direction, when this link endpoint receives a
    RESET_MSG from the peer, it moves to ESTABLISHING and starts to send
    ACTIVATE_MSGs, but all of these messages are rejected by the peer too.
    As a result, the link cannot be re-established and gets stuck, with
    this link endpoint in ESTABLISHING state and the peer in RESET!
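
    A small, hypothetical simulation of the second situation follows. It
    assumes the pre-fix behaviour, in which the peer keeps 'in_session'
    set after moving from ESTABLISHING to RESET. An endpoint in RESET is
    modelled as sending RESET_MSGs, and one in ESTABLISHING as sending
    ACTIVATE_MSGs. The helper names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum link_state { LINK_RESET, LINK_ESTABLISHING, LINK_ESTABLISHED };

struct endpoint {
	enum link_state state;
	uint16_t session;      /* own session number */
	uint16_t peer_session; /* session recorded for the other side */
	bool in_session;       /* pre-fix: NOT cleared on ESTABLISHING -> RESET */
};

/* Stale-session check on RESET_MSG/ACTIVATE_MSG: while in session, a
 * session number smaller than the recorded one is dropped. */
static bool accept_proto(const struct endpoint *rcv, uint16_t session)
{
	return !rcv->in_session || session >= rcv->peer_session;
}

/* Deliver the protocol message 'snd' currently sends: RESET_MSG while in
 * RESET, ACTIVATE_MSG while in ESTABLISHING. */
static void deliver(const struct endpoint *snd, struct endpoint *rcv)
{
	if (!accept_proto(rcv, snd->session))
		return;                          /* dropped as stale */
	rcv->peer_session = snd->session;
	rcv->in_session = true;
	if (snd->state == LINK_RESET && rcv->state == LINK_RESET)
		rcv->state = LINK_ESTABLISHING;  /* RESET_MSG received in RESET */
	else if (snd->state == LINK_ESTABLISHING)
		rcv->state = LINK_ESTABLISHED;   /* ACTIVATE_MSG received */
}
```

    However many rounds are exchanged, this endpoint stays in
    ESTABLISHING and the peer stays in RESET with its old 'peer_session'.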
    
    Solution:
    ===========
    
    This link endpoint should not go directly to ESTABLISHED when it
    receives an ACTIVATE_MSG from the peer, because that message may
    belong to the old session if the link was re-created. To ensure the
    session is correct before the link is re-established, the peer
    endpoint in ESTABLISHING state will send back the last session number
    it has recorded in the ACTIVATE_MSG, for verification at this
    endpoint. Then, if needed, a new and more appropriate session number
    will be generated to force a re-sync first.
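
    The verification step might look like the following sketch. The
    'dest_session' field and its 'dest_session_valid' flag are
    hypothetical names for "the session number the peer has recorded for
    us", carried in the ACTIVATE_MSG. Treating a cleared flag as "skip
    the check" is one plausible way to keep backward compatibility, not
    necessarily how the patch encodes it. Choosing "echoed value + 1" as
    the regenerated session is also an assumption. Session numbers are
    compared as wrapping 16-bit values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum link_state { LINK_RESET, LINK_ESTABLISHING, LINK_ESTABLISHED };

/* Hypothetical ACTIVATE_MSG contents relevant to the check. */
struct activate_msg {
	uint16_t session;        /* the sender's own session number */
	bool dest_session_valid; /* clear when sent by an older node */
	uint16_t dest_session;   /* session the sender has recorded for us */
};

struct local_ep {
	enum link_state state;
	uint16_t session;        /* our current session number */
};

/* 16-bit serial-number "less than" (assumed wrap-around semantics). */
static bool session_less(uint16_t left, uint16_t right)
{
	uint16_t dist = (uint16_t)(right - left);

	return dist != 0 && dist < 0x8000u;
}

/* Returns true if the link may go ESTABLISHED on this ACTIVATE_MSG. */
static bool rcv_activate(struct local_ep *l, const struct activate_msg *m)
{
	if (m->dest_session_valid && m->dest_session != l->session) {
		/* The peer still holds a different (old) session for us, so do
		 * not establish. If ours would be rejected as smaller, move it
		 * past the echoed value; the next protocol message we send then
		 * carries a session the peer will accept. */
		if (session_less(l->session, m->dest_session))
			l->session = (uint16_t)(m->dest_session + 1);
		return false;
	}
	l->state = LINK_ESTABLISHED;
	return true;
}
```

    Both failures from situation 1 are caught by this check. A smaller
    session (500 against 1000) is bumped past the echoed value. A larger
    one (1200) is kept, and the endpoint only refuses to establish. In
    both cases the next RESET_MSG re-syncs the peer before the link comes
    up.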
    
    In addition, when a link in ESTABLISHING state is reset, its state
    will move to RESET according to the link FSM, and the 'in_session'
    flag (along with the other link data) will be reset as in a normal
    link reset. The link will also be deleted if requested.
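
    The reset path described above can be sketched hypothetically. The
    link leaves ESTABLISHING for RESET, forgets its session, and is
    deleted if that was requested. Using 0 as the "no session" value and
    the 'delete_requested' flag are assumptions for illustration. The
    accept_reset() helper shows why clearing 'in_session' matters: the
    stale-session check no longer blocks a RESET_MSG with a smaller
    session number, which breaks the deadlock of situation 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum link_state { LINK_RESET, LINK_ESTABLISHING, LINK_ESTABLISHED };

struct link_ep {
	enum link_state state;
	uint16_t peer_session;  /* session recorded for the other side */
	bool in_session;
	bool delete_requested;  /* hypothetical "delete on reset" flag */
	bool deleted;
};

/* Post-fix reset: ESTABLISHING (like any state) goes to RESET and the
 * session data is cleared as in a normal link reset. */
static void link_reset(struct link_ep *l)
{
	l->state = LINK_RESET;
	l->in_session = false;
	l->peer_session = 0;    /* assumed "no session recorded" value */
	if (l->delete_requested)
		l->deleted = true;
}

/* Stale-session check on an incoming RESET_MSG (plain comparison). */
static bool accept_reset(const struct link_ep *l, uint16_t session)
{
	return !l->in_session || session >= l->peer_session;
}
```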
    
    The solution is backward compatible.

    Acked-by: Jon Maloy <jon.maloy@ericsson.com>
    Acked-by: Ying Xue <ying.xue@windriver.com>
    Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>