    fs: dlm: change handling of reconnects · ba3ab3ca
    Alexander Aring authored
    This patch changes the handling of reconnects. First, we now close only
    the connection related to the communication failure. Second, if we get a
    new connection for an already existing connection, we close the existing
    connection and take the new one.
    
    This patch significantly improves the stability of TCP connections while
    running "tcpkill -9 -i $IFACE port 21064" and generating a lot of dlm
    messages, e.g. on a gfs2 mount with many files. My test setup shows that
    a deadlock is less likely. Before this patch, I was not able to run for
    5 seconds without hitting a deadlock. After this patch, my observation is
    that the setup is more likely to survive 5 seconds or longer, but a
    deadlock still occurs after some time. My guess is that there are still
    "segments" inside the TCP write queue or retransmit queue which get
    dropped when a TCP reset is received [1]. This is hard to reproduce
    because the right message needs to be inside these queues, which can
    happen even within the first 5 seconds with this patch.
    
    [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/tcp_input.c?h=v5.8-rc6#n4122

    Signed-off-by: Alexander Aring <aahringo@redhat.com>
    Signed-off-by: David Teigland <teigland@redhat.com>