    libceph: reschedule a tick in finish_hunting() · 7b4c443d
    Ilya Dryomov authored
    If we go without an established session for a while, the backoff delay
    climbs to 30 seconds.  The keepalive timeout is also 30 seconds, so it's
    easily hit after prolonged hunting for a monitor: the tick scheduled
    while hunting doesn't fire in time to send out a keepalive, so no
    keepalive ack comes back in time either.  The established session is
    then cut, and we attempt to connect to a different monitor every 30
    seconds:
    
      [Sun Apr 1 23:37:05 2018] libceph: mon0 10.80.20.99:6789 session established
      [Sun Apr 1 23:37:36 2018] libceph: mon0 10.80.20.99:6789 session lost, hunting for new mon
      [Sun Apr 1 23:37:36 2018] libceph: mon2 10.80.20.103:6789 session established
      [Sun Apr 1 23:38:07 2018] libceph: mon2 10.80.20.103:6789 session lost, hunting for new mon
      [Sun Apr 1 23:38:07 2018] libceph: mon1 10.80.20.100:6789 session established
      [Sun Apr 1 23:38:37 2018] libceph: mon1 10.80.20.100:6789 session lost, hunting for new mon
      [Sun Apr 1 23:38:37 2018] libceph: mon2 10.80.20.103:6789 session established
      [Sun Apr 1 23:39:08 2018] libceph: mon2 10.80.20.103:6789 session lost, hunting for new mon
    
    The regular keepalive interval is 10 seconds.  After ->hunting is
    cleared in finish_hunting(), call __schedule_delayed() to ensure we
    send out a keepalive after 10 seconds.
    
    Cc: stable@vger.kernel.org # 4.7+
    Link: http://tracker.ceph.com/issues/23537
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    Reviewed-by: Jason Dillaman <dillaman@redhat.com>