• Erik Hugne's avatar
    tipc: add name distributor resiliency queue · a5325ae5
    Erik Hugne authored
    TIPC name table updates are distributed asynchronously in a cluster,
    entailing a risk of certain race conditions. E.g., if two nodes
    simultaneously issue conflicting (overlapping) publications, this may
    not be detected until both publications have reached a third node, in
    which case one of the publications will be silently dropped on that
    node. Hence, we end up with an inconsistent name table.
    
    In most cases this conflict is just a temporary race, e.g., one
    node is issuing a publication under the assumption that a previous,
    conflicting, publication has already been withdrawn by the other node.
    However, because of the (rtt related) distributed update delay, this
    may not yet hold true on all nodes. The symptom of this failure is a
    syslog message: "tipc: Cannot publish {%u,%u,%u}, overlap error".
    
    In this commit we add a resiliency queue at the receiving end of
    the name table distributor. When insertion of an arriving publication
    fails, we retain it in this queue for a short amount of time, assuming
    that another update will arrive very soon and clear the conflict. If so
    happens, we insert the publication, otherwise we drop it.
    
    The (configurable) retention value defaults to 2000 ms. Knowing from
    experience that the situation described above is extremely rare, there
    is no risk that the queue will accumulate any large number of items.
    Signed-off-by: default avatarErik Hugne <erik.hugne@ericsson.com>
    Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
    Acked-by: default avatarYing Xue <ying.xue@windriver.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    a5325ae5
net.txt 9.73 KB