    ocfs2: Reconnect after idle time out. · 5cc3bf27
    Tao Ma authored
    Currently, o2net connects to a node on hb_up and disconnects on
    hb_down and net timeout.
    
    Disconnecting on net timeout is ok, but o2net should then attempt
    to reconnect. This is because sometimes nodes get overloaded
    enough that the network connection breaks but the disk hb does not.
    And if we get into that situation, we either fence (unnecessarily)
    or wait for its disk hb to die (and sometimes hang in the process).
    
    So in this updated scheme, when the network disconnects, we keep
    attempting to reconnect till we succeed or we get a disk hb down
    event.
    
    If the other node is really dead, then we will eventually get a
    node down event. If not, we should be able to connect again and
    continue.
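
The scheme above can be modeled as a small state machine. The following is only an illustrative sketch, not the code in tcp.c: `struct peer`, `enum event`, `peer_init()` and `peer_handle()` are hypothetical names invented for this example, and the real o2net logic (work queues, delays, locking) is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one remote node; not the o2net data structures. */
enum link_state { LINK_DOWN, LINK_CONNECTING, LINK_UP };

enum event {
    EV_HB_UP,        /* disk heartbeat reports the node alive */
    EV_HB_DOWN,      /* disk heartbeat reports the node dead */
    EV_IDLE_TIMEOUT, /* network idle timeout on an established link */
    EV_CONNECT_OK,   /* a connect attempt succeeded */
    EV_CONNECT_FAIL  /* a connect attempt failed */
};

struct peer {
    bool hb_up;            /* last known disk heartbeat state */
    enum link_state link;  /* current network link state */
    unsigned attempts;     /* connect attempts since the link was last up */
};

static void peer_init(struct peer *p)
{
    p->hb_up = false;
    p->link = LINK_DOWN;
    p->attempts = 0;
}

/*
 * Apply one event to the peer. Returns true when the caller should
 * schedule a (re)connect attempt, false otherwise.
 */
static bool peer_handle(struct peer *p, enum event ev)
{
    assert(p != NULL);

    switch (ev) {
    case EV_HB_UP:
        p->hb_up = true;
        if (p->link == LINK_DOWN) {
            p->link = LINK_CONNECTING;
            p->attempts = 1;
            return true;
        }
        return false;
    case EV_HB_DOWN:
        /* The node is really gone: stop trying to reconnect. */
        p->hb_up = false;
        p->link = LINK_DOWN;
        p->attempts = 0;
        return false;
    case EV_IDLE_TIMEOUT:
    case EV_CONNECT_FAIL:
        /* Keep retrying only while the disk heartbeat is still up. */
        if (!p->hb_up) {
            p->link = LINK_DOWN;
            return false;
        }
        p->link = LINK_CONNECTING;
        p->attempts++;
        return true;
    case EV_CONNECT_OK:
        if (p->hb_up && p->link == LINK_CONNECTING) {
            p->link = LINK_UP;
            p->attempts = 0;
        }
        return false;
    }
    return false;
}
```

In this sketch a network timeout never ends the retry loop by itself; only a heartbeat-down event moves the peer back to the idle state, which mirrors the "reconnect till we succeed or we get a disk hb down event" rule described above.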
    
    Signed-off-by: Tao Ma <tao.ma@oracle.com>
    Signed-off-by: Mark Fasheh <mfasheh@suse.com>
tcp.c 54.5 KB