    tipc: eliminate risk of stalled link synchronization · 0f8b8e28
    Jon Paul Maloy authored
    In commit 6e498158 ("tipc: move link synch and failover to link aggregation level")
    we introduced a new mechanism for performing link failover and
    synchronization. We have now detected a bug in this mechanism.
    
    During link synchronization we use the arrival of any packet on
    the tunnel link to trigger a check for whether it has reached the
    synchronization point. This has turned out to be too
    permissive, since it may cause an arriving non-last SYNCH packet to
    end the synch state, only for the next SYNCH packet to initiate a
    new synch state with a new, higher synch point. This is not fatal,
    but should be avoided, because it may significantly extend the
    synchronization period, during which we are not allowed
    to send NACKs if packets are lost. In the worst case, a low-traffic
    user may see its traffic stall until a LINK_PROTOCOL state message
    triggers the link to leave synchronization state.
    
    At the same time, LINK_PROTOCOL packets which happen to have an
    (invalid) sequence number lower than the tunnel link's rcv_nxt value will
    be consistently dropped, and will never be able to resolve the situation
    described above.
    
    We fix this by exempting LINK_PROTOCOL packets from the sequence number
    check, as they should be. We also reduce (but don't completely
    eliminate) the risk of entering multiple synchronization states by only
    allowing the (logically) first SYNCH packet to initiate a synchronization
    state. This works independently of actual packet arrival order.
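
    The two checks described above can be sketched roughly as follows.
    This is a simplified, stand-alone illustration and not the code in
    node.c: the enum, struct, and function names are hypothetical, and
    the idea that the logically first SYNCH packet is recognized by a
    known sequence number (first_synch_seqno) is an assumption made
    for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message classes; not the kernel's definitions. */
enum pkt_user { USR_DATA, USR_LINK_PROTOCOL, USR_TUNNEL_SYNCH };

struct pkt {
    enum pkt_user usr;
    uint16_t seqno; /* sequence number carried by the packet */
};

/* Wraparound-safe "a is before b" for 16-bit sequence numbers. */
static bool seq_less(uint16_t a, uint16_t b)
{
    return (uint16_t)(a - b) >= 0x8000u;
}

/*
 * Sequence number check on the tunnel link. LINK_PROTOCOL packets are
 * exempt, so a state message can always get through and end a stalled
 * synchronization; other packets older than rcv_nxt are dropped.
 */
static bool drop_as_stale(const struct pkt *p, uint16_t rcv_nxt)
{
    assert(p != NULL);
    if (p->usr == USR_LINK_PROTOCOL)
        return false;
    return seq_less(p->seqno, rcv_nxt);
}

/*
 * Only the logically first SYNCH packet may start a synchronization
 * state. The test looks at the packet's own sequence number, so the
 * result does not depend on the order in which packets arrive.
 */
static bool may_start_synch(const struct pkt *p, uint16_t first_synch_seqno)
{
    assert(p != NULL);
    return p->usr == USR_TUNNEL_SYNCH && p->seqno == first_synch_seqno;
}
```

    With these helpers, a LINK_PROTOCOL packet carrying a stale sequence
    number is still accepted, while a later SYNCH packet that arrives
    first cannot open a synchronization state on its own.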
    
    Fixes: commit 6e498158 ("tipc: move link synch and failover to link aggregation level")
    Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
    Acked-by: Ying Xue <ying.xue@windriver.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
node.c 34.3 KB