    nvme-multipath: fix hang when disk goes live over reconnect · a4a6f3c8
    Anton Eidelman authored
    nvme_mpath_init_identify(), invoked from nvme_init_identify(), fetches a
    fresh ANA log from the ctrl.  This is essential to have up-to-date
    path states for both existing namespaces and those that scan_work may
    discover once the ctrl is up.
    
    This happens in the following cases:
      1) A new ctrl is being connected.
      2) An existing ctrl is successfully reconnected.
      3) An existing ctrl is being reset.
    
    While in (1) ctrl->namespaces is empty, in (2) and (3) the ctrl may
    already have namespaces, and nvme_read_ana_log() may call
    nvme_update_ns_ana_state().
    
    This results in a hang when the ANA state of an existing namespace changes
    and makes the disk live: nvme_mpath_set_live() issues IO to the namespace
    through the ctrl, which does NOT have IO queues yet.
    
    See sample hang below.
    
    Solution:
    - nvme_update_ns_ana_state() calls nvme_mpath_set_live() only if the
      ctrl is live;
    - the nvme_read_ana_log() call from nvme_mpath_init_identify()
      therefore only fetches and parses the ANA log;
      any errors in this process will fail the ctrl setup as appropriate;
    - a separate function nvme_mpath_update()
      is called in nvme_start_ctrl();
      this parses the ANA log without fetching it.
      At this point the ctrl is live,
      therefore, disks can be set live normally.
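
    The ordering described above can be modelled in plain userspace C.
    This is only a minimal sketch, not the kernel code: the types, enum
    values and function names below are hypothetical stand-ins, and the
    disk_live flag merely represents the effect of nvme_mpath_set_live().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
enum ctrl_state { CTRL_CONNECTING, CTRL_LIVE };
enum ana_state { ANA_OPTIMIZED, ANA_NONOPTIMIZED, ANA_INACCESSIBLE };

struct ctrl {
	enum ctrl_state state;
};

struct ns {
	struct ctrl *ctrl;
	enum ana_state ana_state;
	bool disk_live;		/* models the effect of nvme_mpath_set_live() */
};

/* A path is usable when its ANA state is optimized or non-optimized. */
static bool ana_state_is_live(enum ana_state state)
{
	return state == ANA_OPTIMIZED || state == ANA_NONOPTIMIZED;
}

/*
 * Models nvme_update_ns_ana_state(): always record the new state, but
 * only set the disk live when the ctrl is live, because setting the
 * disk live issues IO that needs IO queues.
 */
static void update_ns_ana_state(struct ns *ns, enum ana_state new_state)
{
	assert(ns != NULL && ns->ctrl != NULL);

	ns->ana_state = new_state;
	if (ana_state_is_live(ns->ana_state) && ns->ctrl->state == CTRL_LIVE)
		ns->disk_live = true;
}

/*
 * Models nvme_mpath_update(): once the ctrl is live, re-apply the ANA
 * state that was already fetched, without fetching the log again.
 */
static void mpath_update(struct ns *ns, enum ana_state cached_state)
{
	assert(ns != NULL && ns->ctrl != NULL);

	if (ns->ctrl->state != CTRL_LIVE)
		return;
	update_ns_ana_state(ns, cached_state);
}
```

    In this model, calling update_ns_ana_state() while the ctrl is still
    connecting records the state but leaves disk_live false; a later
    mpath_update() with the ctrl live sets it.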
    
    Sample failure:
        nvme nvme0: starting error recovery
        nvme nvme0: Reconnecting in 10 seconds...
        block nvme0n6: no usable path - requeuing I/O
        INFO: task kworker/u8:3:312 blocked for more than 122 seconds.
              Tainted: G            E     5.14.5-1.el7.elrepo.x86_64 #1
        Workqueue: nvme-wq nvme_tcp_reconnect_ctrl_work [nvme_tcp]
        Call Trace:
         __schedule+0x2a2/0x7e0
         schedule+0x4e/0xb0
         io_schedule+0x16/0x40
         wait_on_page_bit_common+0x15c/0x3e0
         do_read_cache_page+0x1e0/0x410
         read_cache_page+0x12/0x20
         read_part_sector+0x46/0x100
         read_lba+0x121/0x240
         efi_partition+0x1d2/0x6a0
         bdev_disk_changed.part.0+0x1df/0x430
         bdev_disk_changed+0x18/0x20
         blkdev_get_whole+0x77/0xe0
         blkdev_get_by_dev+0xd2/0x3a0
         __device_add_disk+0x1ed/0x310
         device_add_disk+0x13/0x20
         nvme_mpath_set_live+0x138/0x1b0 [nvme_core]
         nvme_update_ns_ana_state+0x2b/0x30 [nvme_core]
         nvme_update_ana_state+0xca/0xe0 [nvme_core]
         nvme_parse_ana_log+0xac/0x170 [nvme_core]
         nvme_read_ana_log+0x7d/0xe0 [nvme_core]
         nvme_mpath_init_identify+0x105/0x150 [nvme_core]
         nvme_init_identify+0x2df/0x4d0 [nvme_core]
         nvme_init_ctrl_finish+0x8d/0x3b0 [nvme_core]
         nvme_tcp_setup_ctrl+0x337/0x390 [nvme_tcp]
         nvme_tcp_reconnect_ctrl_work+0x24/0x40 [nvme_tcp]
         process_one_work+0x1bd/0x360
         worker_thread+0x50/0x3d0
    
    Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Signed-off-by: Christoph Hellwig <hch@lst.de>