1. 08 Oct, 2018 12 commits
  2. 05 Oct, 2018 3 commits
    • blk-mq-debugfs: Also show requests that have not yet been started · 6d8623a7
      Bart Van Assche authored
      When debugging e.g. the SCSI timeout handler, it is important that
      requests that have not yet been started or that have already
      completed are also reported through debugfs.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Merge branch 'nvme-4.20' of git://git.infradead.org/nvme into for-4.20/block · 4f5735f3
      Jens Axboe authored
      Pull NVMe updates from Christoph:
      
      "A relatively boring merge window:
      
       - better AEN tracing (Chaitanya)
       - NUMA aware PCIe multipathing (me)
       - RDMA workqueue fixes (Sagi)
       - better bio usage in the target (Sagi)
       - FC rework for target removal (James)
       - better multipath handling of ->queue_rq failures (James)
       - various cleanups (Milan)"
      
      * 'nvme-4.20' of git://git.infradead.org/nvme:
        nvmet-rdma: use a private workqueue for delete
        nvme: take node locality into account when selecting a path
        nvmet: don't split large I/Os unconditionally
        nvme: call nvme_complete_rq when nvmf_check_ready fails for mpath I/O
        nvme-core: add async event trace helper
        nvme_fc: add 'nvme_discovery' sysfs attribute to fc transport device
        nvmet_fc: support target port removal with nvmet layer
        nvme-fc: fix for a minor typos
        nvmet: remove redundant module prefix
        nvme: fix typo in nvme_identify_ns_descs
    • nvmet-rdma: use a private workqueue for delete · 2acf70ad
      Sagi Grimberg authored
      Queue deletion is done asynchronously when the last reference on the queue
      is dropped.  Thus, in order to make sure we don't over-allocate under a
      connect/disconnect storm, we let queue deletion complete before making
      forward progress.
      
      However, given that we flush the system_wq from rdma_cm context which
      runs from a workqueue context, we can have a circular locking complaint
      [1]. Fix that by using a private workqueue for queue deletion.
      
      [1]:
      ======================================================
      WARNING: possible circular locking dependency detected
      4.19.0-rc4-dbg+ #3 Not tainted
      ------------------------------------------------------
      kworker/5:0/39 is trying to acquire lock:
      00000000a10b6db9 (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x6f/0x440 [rdma_cm]
      
      but task is already holding lock:
      00000000331b4e2c ((work_completion)(&queue->release_work)){+.+.}, at: process_one_work+0x3ed/0xa20
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #3 ((work_completion)(&queue->release_work)){+.+.}:
             process_one_work+0x474/0xa20
             worker_thread+0x63/0x5a0
             kthread+0x1cf/0x1f0
             ret_from_fork+0x24/0x30
      
      -> #2 ((wq_completion)"events"){+.+.}:
             flush_workqueue+0xf3/0x970
             nvmet_rdma_cm_handler+0x133d/0x1734 [nvmet_rdma]
             cma_ib_req_handler+0x72f/0xf90 [rdma_cm]
             cm_process_work+0x2e/0x110 [ib_cm]
             cm_req_handler+0x135b/0x1c30 [ib_cm]
             cm_work_handler+0x2b7/0x38cd [ib_cm]
             process_one_work+0x4ae/0xa20
      nvmet_rdma:nvmet_rdma_cm_handler: nvmet_rdma: disconnected (10): status 0 id 0000000040357082
             worker_thread+0x63/0x5a0
             kthread+0x1cf/0x1f0
             ret_from_fork+0x24/0x30
      nvme nvme0: Reconnecting in 10 seconds...
      
      -> #1 (&id_priv->handler_mutex/1){+.+.}:
             __mutex_lock+0xfe/0xbe0
             mutex_lock_nested+0x1b/0x20
             cma_ib_req_handler+0x6aa/0xf90 [rdma_cm]
             cm_process_work+0x2e/0x110 [ib_cm]
             cm_req_handler+0x135b/0x1c30 [ib_cm]
             cm_work_handler+0x2b7/0x38cd [ib_cm]
             process_one_work+0x4ae/0xa20
             worker_thread+0x63/0x5a0
             kthread+0x1cf/0x1f0
             ret_from_fork+0x24/0x30
      
      -> #0 (&id_priv->handler_mutex){+.+.}:
             lock_acquire+0xc5/0x200
             __mutex_lock+0xfe/0xbe0
             mutex_lock_nested+0x1b/0x20
             rdma_destroy_id+0x6f/0x440 [rdma_cm]
             nvmet_rdma_release_queue_work+0x8e/0x1b0 [nvmet_rdma]
             process_one_work+0x4ae/0xa20
             worker_thread+0x63/0x5a0
             kthread+0x1cf/0x1f0
             ret_from_fork+0x24/0x30
      
      Fixes: 777dc823 ("nvmet-rdma: occasionally flush ongoing controller teardown")
      Reported-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Tested-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  3. 03 Oct, 2018 2 commits
  4. 02 Oct, 2018 1 commit
  5. 01 Oct, 2018 10 commits
    • nvme: take node locality into account when selecting a path · f3334447
      Christoph Hellwig authored
      Make current_path an array with an entry for every possible node, and
      cache the best path on a per-node basis.  Take the node distance into
      account when selecting it.  This is primarily useful for dual-ported PCIe
      devices which are connected to PCIe root ports on different sockets.
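The per-node caching described above can be sketched as a small userspace model. This is not the kernel code: `Path`, `Head`, `find_best_path` and the distance table are hypothetical names chosen for illustration, and the real driver may weigh additional path state that this sketch ignores.

```python
# Illustrative userspace model only; all names here are hypothetical.
from dataclasses import dataclass


@dataclass(eq=False)
class Path:
    name: str
    node: int          # NUMA node the controller is attached to
    live: bool = True  # whether the path is currently usable


def find_best_path(paths, node, distance):
    """Return the usable path with the smallest distance to `node`, or None."""
    usable = [p for p in paths if p.live]
    if not usable:
        return None
    return min(usable, key=lambda p: distance[node][p.node])


class Head:
    """Caches one selected path per node, modelled on a per-node current_path array."""

    def __init__(self, paths, distance):
        self.paths = list(paths)
        self.distance = distance
        self.current_path = [None] * len(distance)  # one slot per node

    def select(self, node):
        cached = self.current_path[node]
        if cached is None or not cached.live:
            # Cache miss or stale entry: pick again and remember the result.
            cached = find_best_path(self.paths, node, self.distance)
            self.current_path[node] = cached
        return cached
```

With a two-socket distance table such as `[[10, 21], [21, 10]]`, a submitter on node 0 is given the path attached to node 0, and falls back to the remote path only when the local one is no longer usable.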
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
    • nvmet: don't split large I/Os unconditionally · 73383adf
      Sagi Grimberg authored
      If we know that the I/O size exceeds our inline bio vec, there is no
      point in using it and splitting the rest to begin with. We could
      in theory reuse the inline bio and only allocate the bio_vec,
      but it's really not worth optimizing for.
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme: call nvme_complete_rq when nvmf_check_ready fails for mpath I/O · 783f4a44
      James Smart authored
      When an I/O is rejected by nvmf_check_ready() due to validation of the
      controller state, nvmf_fail_nonready_command() will normally return
      BLK_STS_RESOURCE to requeue and retry.  However, if the controller is
      dying or the I/O is marked for NVMe multipath, the I/O is failed so that
      the controller can terminate or so that the I/O can be issued on a
      different path.  Unfortunately, as this reject point is before the
      transport has accepted the command, blk-mq ends up completing the I/O
      and never calls nvme_complete_rq(), which is where multipath may preserve
      or re-route the I/O. The end result is that the device user ends up
      seeing an EIO error.
      
      Example: single path connectivity, controller is under load, and a reset
      is induced.  An I/O is received:
      
        a) while the reset state has been set but the queues have yet to be
           stopped; or
        b) after queues are started (at end of reset) but before the reconnect
           has completed.
      
      The I/O finishes with an EIO status.
      
      This patch makes the following changes:
      
        - Adds the HOST_PATH_ERROR pathing status from TP4028
        - Modifies the reject point such that it appears to queue successfully,
          but actually completes the I/O with the new pathing status and calls
          nvme_complete_rq().
        - nvme_complete_rq() recognizes the new status, avoids resetting the
          controller (likely was already done in order to get this new status),
          and calls the multipather to clear the current path that errored.
          This allows the next command (retry or new command) to select a new
          path if there is one.
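The control flow listed above can be modelled in a few lines of userspace Python. This is a sketch under stated assumptions, not the kernel implementation: the `Request` and `Head` classes, the string status values, and the return values of `complete_rq()` are all hypothetical stand-ins, and the handling of a dying controller without multipath is simplified.

```python
# Illustrative userspace model only; classes and return values are hypothetical.
BLK_STS_OK = "BLK_STS_OK"
BLK_STS_RESOURCE = "BLK_STS_RESOURCE"
HOST_PATH_ERROR = "HOST_PATH_ERROR"


class Request:
    def __init__(self, is_mpath):
        self.is_mpath = is_mpath   # I/O is marked for NVMe multipath
        self.status = None
        self.completed = False


class Head:
    def __init__(self, current_path):
        self.current_path = current_path
        self.requeued = []         # requests waiting to be retried on another path


def complete_rq(req, head):
    """Completion hook: on a host path error, clear the path and requeue."""
    req.completed = True
    if req.status == HOST_PATH_ERROR and req.is_mpath:
        head.current_path = None   # next submission selects a new path
        head.requeued.append(req)
        return "failover"
    return "done" if req.status is None else "error"


def fail_nonready_command(req, head, ctrl_dying):
    """Reject point reached when the controller is not ready for I/O."""
    if not ctrl_dying and not req.is_mpath:
        return BLK_STS_RESOURCE    # ask blk-mq to requeue and retry later
    # Appear to queue successfully, but complete with the pathing status so
    # that the completion hook runs instead of blk-mq ending the request.
    req.status = HOST_PATH_ERROR
    complete_rq(req, head)
    return BLK_STS_OK
```

The point of the design is visible in the last function: because the reject point reports success to the block layer, the request always passes through the completion hook, which is the only place where the multipath logic gets a chance to re-route it.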
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-core: add async event trace helper · 09bd1ff4
      Chaitanya Kulkarni authored
      This patch adds a new trace event for NVMe async event notifications.
      We print the async event in decoded format when we recognize
      the event; otherwise we just dump the result.
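A minimal sketch of the "decode if recognized, otherwise dump" idea follows. The bit layout (event type in bits 2:0, event information in bits 15:8 of the completion result) and the notice values follow the NVMe specification; the function name, the output strings, and the choice to decode only notice events are assumptions made for illustration and do not reproduce the kernel's trace format.

```python
# Illustrative decoder; format_async_event and its output strings are hypothetical.
AER_TYPE_NOTICE = 0x2

# Notice event information values as defined by the NVMe specification.
NOTICE_INFO = {
    0x00: "namespace attribute changed",
    0x01: "firmware activation starting",
    0x02: "telemetry log changed",
    0x03: "ANA change",
}


def format_async_event(result):
    """Decode a recognized notice event, otherwise dump the raw result."""
    aer_type = result & 0x7          # bits 2:0: asynchronous event type
    aer_info = (result >> 8) & 0xFF  # bits 15:8: asynchronous event information
    if aer_type == AER_TYPE_NOTICE and aer_info in NOTICE_INFO:
        return f"notice: {NOTICE_INFO[aer_info]}"
    return f"result=0x{result:08x}"
```

For instance, `format_async_event(0x0302)` takes the decoded branch, while a result whose type is not a notice falls through to the raw hexadecimal dump.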
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme_fc: add 'nvme_discovery' sysfs attribute to fc transport device · 97faec53
      James Smart authored
      The fc transport device should allow for a rediscovery, as userspace
      might have lost the events. An example is udev events not being handled
      during system startup.
      
      This patch adds a sysfs entry 'nvme_discovery' on the fc class to
      have it replay all udev discovery events for all local port/remote
      port address pairs.
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvmet_fc: support target port removal with nvmet layer · ea96d649
      James Smart authored
      Currently, if a targetport has been connected to via the nvmet config
      (in other words, the add_port() transport routine was called, and the
      nvmet port pointer stored for use in upcalls on new I/O), and if the
      targetport is then removed (say the lldd driver decides to unload or
      fully reset its hardware) and then re-added (the lldd driver reloads or
      reinits its hardware), the port pointer has been lost so there's no way
      to continue to post commands up to nvmet via the transport port.
      
      Correct this by allocating a small "port context" structure that will be
      linked to by the targetport. The context will save the targetport WWNs
      and the nvmet port pointer to use for it.  Initial allocation will occur
      when the targetport is bound to via add_port().  The context will be
      deallocated when remove_port() is called.  If a targetport is removed
      while nvmet has the active port context, the targetport will be unlinked
      from the port context before removal.  If a new targetport is registered,
      the port contexts without a binding are looked through and if the WWNs
      match (so it's the same as nvmet's port context) the port context is
      linked to the new target port.  Thus new I/O can be received on the new
      targetport and operation resumes with nvmet.
      
      Additionally, this resolves the nvmet configuration changing out from
      underneath the nvme-fc target port (for example, by an nvmetcli clear).
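The port-context lifecycle described above can be sketched as a userspace model. Everything here is hypothetical: `TargetPort`, `PortContext`, `PortRegistry` and their methods are illustrative names, locking and reference counting are omitted, and the real transport code may order these steps differently.

```python
# Illustrative userspace model only; all class and method names are hypothetical.
class TargetPort:
    def __init__(self, node_name, port_name):
        self.node_name = node_name   # WWNN
        self.port_name = port_name   # WWPN
        self.ctx = None              # linked port context, if any


class PortContext:
    def __init__(self, node_name, port_name, nvmet_port):
        self.node_name = node_name
        self.port_name = port_name
        self.nvmet_port = nvmet_port  # pointer used for upcalls on new I/O
        self.targetport = None        # None while no targetport is bound


class PortRegistry:
    def __init__(self):
        self.contexts = []

    def add_port(self, tgtport, nvmet_port):
        """nvmet binds to a targetport: allocate the context and link it."""
        ctx = PortContext(tgtport.node_name, tgtport.port_name, nvmet_port)
        ctx.targetport, tgtport.ctx = tgtport, ctx
        self.contexts.append(ctx)
        return ctx

    def remove_port(self, ctx):
        """nvmet drops the port: unlink and deallocate the context."""
        if ctx.targetport is not None:
            ctx.targetport.ctx = None
        self.contexts.remove(ctx)

    def unregister_targetport(self, tgtport):
        """Targetport goes away: unlink it, but keep the context for later."""
        if tgtport.ctx is not None:
            tgtport.ctx.targetport = None
            tgtport.ctx = None

    def register_targetport(self, tgtport):
        """New targetport: re-link an unbound context whose WWNs match."""
        for ctx in self.contexts:
            if (ctx.targetport is None
                    and ctx.node_name == tgtport.node_name
                    and ctx.port_name == tgtport.port_name):
                ctx.targetport, tgtport.ctx = tgtport, ctx
                break
```

Because the context outlives the targetport, a driver reload that re-registers a port with the same WWNs picks up the saved nvmet port pointer instead of losing it.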
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • Milan P. Gandhi's avatar
      d4e4230c
    • nvmet: remove redundant module prefix · d93cb392
      Chaitanya Kulkarni authored
      This patch removes the redundant module prefix used in the pr_err() when
      nvmet_get_smart_log_nsid() fails to find the namespace provided as part
      of the smart-log command.
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • Milan P. Gandhi's avatar
    • Merge tag 'v4.19-rc6' into for-4.20/block · c0aac682
      Jens Axboe authored
      Merge -rc6 in, for two reasons:
      
      1) Resolve a trivial conflict in the blk-mq-tag.c documentation
      2) A few important regression fixes went into upstream directly, so
         they aren't in the 4.20 branch.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      
      * tag 'v4.19-rc6': (780 commits)
        Linux 4.19-rc6
        MAINTAINERS: fix reference to moved drivers/{misc => auxdisplay}/panel.c
        cpufreq: qcom-kryo: Fix section annotations
        perf/core: Add sanity check to deal with pinned event failure
        xen/blkfront: correct purging of persistent grants
        Revert "xen/blkfront: When purging persistent grants, keep them in the buffer"
        selftests/powerpc: Fix Makefiles for headers_install change
        blk-mq: I/O and timer unplugs are inverted in blktrace
        dax: Fix deadlock in dax_lock_mapping_entry()
        x86/boot: Fix kexec booting failure in the SEV bit detection code
        bcache: add separate workqueue for journal_write to avoid deadlock
        drm/amd/display: Fix Edid emulation for linux
        drm/amd/display: Fix Vega10 lightup on S3 resume
        drm/amdgpu: Fix vce work queue was not cancelled when suspend
        Revert "drm/panel: Add device_link from panel device to DRM device"
        xen/blkfront: When purging persistent grants, keep them in the buffer
        clocksource/drivers/timer-atmel-pit: Properly handle error cases
        block: fix deadline elevator drain for zoned block devices
        ACPI / hotplug / PCI: Don't scan for non-hotplug bridges if slot is not bridge
        drm/syncobj: Don't leak fences when WAIT_FOR_SUBMIT is set
        ...
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  6. 30 Sep, 2018 4 commits
  7. 29 Sep, 2018 8 commits