    nvme-fc: change controllers first connect to use reconnect path · 4c984154
    James Smart authored
    Current code follows the framework that has been in the transports
    from the beginning where initial link-side controller connect occurs
    as part of "creating the controller". Thus that first connect fully
    talks to the controller and obtains values that can then be used
    for blk-mq setup, etc. It also means that everything about the
    controller is fully known before the "create controller" call returns.
    
    This has several weaknesses:
    - The initial create_ctrl call made by the cli will block for a long
      time as wire transactions are performed synchronously. This delay
      becomes longer if errors occur or connectivity is lost and retries
      need to be performed.
    - Code-wise, it means there is a separate connect path for the initial
      controller connect vs the (same) steps used in the reconnect path.
    - And as there are separate paths, there is separate error
      handling and retry logic. It also plays havoc with the NEW state
      (should transition out of it after successful initial connect) vs
      the RESETTING and CONNECTING (reconnect) states that want to be
      transitioned to on error.
    - As there are separate paths, recovering from errors and disruptions
      requires separate recovery/retry paths as well, which can severely
      complicate the controller state.
    
    This patch reworks the fc transport to use the same connect paths
    for the initial connection as it uses for reconnect. This makes a
    single path for error recovery and handling.
    
    This patch:
    - Removes the driving of the initial connect and replaces it with
      a state transition to CONNECTING and initiating the reconnect
      thread. A dummy state transition to RESETTING had to be traversed
      as a direct transition of NEW->CONNECTING is not allowed. Given
      that the controller is "new", the RESETTING transition is a simple
      no-op. Once in the reconnecting thread, the normal behaviors of
      ctrl_loss_tmo (max_retries * connect_delay) and dev_loss_tmo will
      apply before the controller is torn down.
    - The controller is torn down while in create_ctrl only if the
      state transitions could not be traversed and the reconnect thread
      was not scheduled.
    - The prior code used the controller state of NEW to indicate
      whether request queues had been initialized or not. For the admin
      queue, the request queue is always created, so there's no need to
      check a state. For IO queues, change to tracking whether a successful
      IO request queue create has occurred (i.e. the first successful connect).
    - The initial controller id is initialized to the dynamic controller
      id used in the initial connect message. It will be overwritten by
      the real controller id once the controller is connected on the wire.
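
    As an illustration only, the transition sequence and teardown
    fallback described above can be sketched in standalone C. This is
    not the fc.c code: the enum, struct ctrl, change_state(),
    create_ctrl_sketch() and ctrl_loss_window() are hypothetical names,
    and the transition table is a simplified assumption that only
    captures the rule that NEW->CONNECTING must pass through RESETTING.

```c
#include <stdbool.h>

/* Simplified controller states; names follow the commit message. */
enum ctrl_state {
    CTRL_NEW,
    CTRL_RESETTING,
    CTRL_CONNECTING,
    CTRL_LIVE,
    CTRL_DELETING
};

/* Hypothetical stand-in for the controller structure. */
struct ctrl {
    enum ctrl_state state;
    bool reconnect_scheduled; /* stands in for queuing the reconnect work */
    bool ioq_live;            /* set once IO request queues were created */
};

/*
 * Apply a transition only if this simplified table allows it.
 * Assumption: CONNECTING is reachable only from RESETTING, which is
 * why a "new" controller has to pass through RESETTING first.
 */
bool change_state(struct ctrl *c, enum ctrl_state next)
{
    bool ok = false;

    switch (next) {
    case CTRL_RESETTING:
        ok = (c->state == CTRL_NEW || c->state == CTRL_LIVE);
        break;
    case CTRL_CONNECTING:
        ok = (c->state == CTRL_RESETTING);
        break;
    case CTRL_LIVE:
        ok = (c->state == CTRL_CONNECTING);
        break;
    case CTRL_DELETING:
        ok = (c->state != CTRL_DELETING);
        break;
    case CTRL_NEW:
        break; /* nothing transitions back to NEW */
    }
    if (ok)
        c->state = next;
    return ok;
}

/*
 * Create path: do not connect here, just move to CONNECTING and hand
 * off to the reconnect logic. Returns 0 if the reconnect was scheduled,
 * -1 if the transitions failed and the controller is torn down instead.
 */
int create_ctrl_sketch(struct ctrl *c)
{
    c->reconnect_scheduled = false;
    c->ioq_live = false;

    if (!change_state(c, CTRL_RESETTING) ||
        !change_state(c, CTRL_CONNECTING)) {
        change_state(c, CTRL_DELETING); /* teardown path */
        return -1;
    }
    c->reconnect_scheduled = true;
    return 0;
}

/* Retry window as given in the text: max_retries * connect_delay. */
unsigned int ctrl_loss_window(unsigned int max_retries,
                              unsigned int connect_delay)
{
    return max_retries * connect_delay;
}
```

    In this sketch, a controller starting in NEW ends up in CONNECTING
    with the reconnect marked as scheduled, while a direct
    change_state(&c, CTRL_CONNECTING) from NEW is refused and leaves
    the state untouched.
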
    Signed-off-by: James Smart <james.smart@broadcom.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>