1. 20 Oct, 2021 6 commits
    • nvmet-tcp: fix use-after-free when a port is removed · 2351ead9
      Israel Rukshin authored
      When removing a port, all its controllers are removed, but there may
      still be queues on the port that don't belong to any controller yet
      (they are still in the connection phase). This causes a use-after-free
      bug for any command that dereferences req->port (as in
      nvmet_alloc_ctrl). Those queues should be destroyed before the port is
      freed via configfs. Destroying the remaining queues after the
      accept_work has been cancelled guarantees that no new queue will be
      created; the resulting ordering is sketched below.
      Signed-off-by: Israel Rukshin <israelr@nvidia.com>
      Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
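      A minimal sketch of the resulting teardown ordering, assuming the
      5.15-era nvmet-tcp structures (the socket callback teardown that
      precedes this in the real driver is omitted; the helper name follows
      the patch subject):

      static void nvmet_tcp_remove_port(struct nvmet_port *nport)
      {
              struct nvmet_tcp_port *port = nport->priv;

              /* once accept_work is cancelled, no new queue can appear */
              cancel_work_sync(&port->accept_work);

              /* destroy queues still in the connection phase, i.e. queues
               * that do not belong to any controller yet */
              nvmet_tcp_destroy_port_queues(port);

              sock_release(port->sock);
              kfree(port);
      }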
    • nvmet-rdma: fix use-after-free when a port is removed · fcf73a80
      Israel Rukshin authored
      When removing a port, all its controllers are removed, but there may
      still be queues on the port that don't belong to any controller yet
      (they are still in the connection phase). This causes a use-after-free
      bug for any command that dereferences req->port (as in
      nvmet_alloc_ctrl). Those queues should be destroyed before the port is
      freed via configfs. Destroying the remaining queues after the RDMA-CM
      ID has been destroyed guarantees that no new queue will be created;
      sketched below.
      Signed-off-by: Israel Rukshin <israelr@nvidia.com>
      Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
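      The analogous RDMA ordering, as a sketch against the 5.15-era
      nvmet-rdma structures (helper names approximate):

      static void nvmet_rdma_disable_port(struct nvmet_rdma_port *port)
      {
              struct rdma_cm_id *cm_id = xchg(&port->cm_id, NULL);

              /* destroying the CM ID guarantees no new connection arrives */
              if (cm_id)
                      rdma_destroy_id(cm_id);

              /* now drop queues that never got bound to a controller */
              nvmet_rdma_destroy_port_queues(port);
      }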
    • nvmet: fix use-after-free when a port is removed · e3e19dcc
      Israel Rukshin authored
      When a port is removed through configfs, any connected controllers
      start their teardown flow asynchronously and can still send commands.
      This causes a use-after-free bug for any command that dereferences
      req->port (as in nvmet_parse_io_cmd).
      
      To fix this, wait for all the scheduled teardown work items to
      complete (like release_work in the rdma/tcp drivers). This ensures
      there are no active controllers when the port is eventually removed;
      the fix is sketched below.
      Signed-off-by: Israel Rukshin <israelr@nvidia.com>
      Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
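      A minimal sketch of the fix in the configfs release path, assuming
      the 5.15-era nvmet core (surrounding bookkeeping simplified):

      static void nvmet_port_release(struct config_item *item)
      {
              struct nvmet_port *port = to_nvmet_port(item);

              /* let in-flight controller teardown (e.g. release_work in
               * the rdma/tcp transports) complete before the port memory
               * is freed */
              flush_scheduled_work();

              kfree(port->ana_state);
              kfree(port);
      }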
    • qla2xxx: add ->map_queues support for nvme · 2b2af50a
      Saurav Kashyap authored
      Implement ->map_queues and use the block layer blk_mq_pci_map_queues
      helper for mapping queues to CPUs (sketched below).

      With this mapping, at least a 10% increase in performance was
      observed.
      Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
      Signed-off-by: Nilesh Javali <njavali@marvell.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
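      A minimal sketch of the LLDD side of the callout, assuming the
      nvme_fc_port_template hook added by the following commit and
      qla2xxx's use of lport->private (error handling omitted):

      #include <linux/blk-mq-pci.h>

      static void qla_nvme_map_queues(struct nvme_fc_local_port *lport,
                                      struct blk_mq_queue_map *map)
      {
              struct scsi_qla_host *vha = lport->private;

              /* spread hardware queues over CPUs according to the PCI
               * MSI-X vector affinity of the HBA */
              blk_mq_pci_map_queues(map, vha->hw->pdev, 0);
      }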
    • nvme-fc: add support for ->map_queues · 01d83816
      Saurav Kashyap authored
      NVMe over FC doesn't support ->map_queues, unlike the PCI, RDMA and
      TCP transports.  Add a ->map_queues callout so the LLDDs can provide
      such functionality (sketched below).
      Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
      Signed-off-by: Nilesh Javali <njavali@marvell.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
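      A minimal sketch of the transport-side dispatch, assuming the
      5.15-era blk_mq_ops where ->map_queues still returns int:

      static int nvme_fc_map_queues(struct blk_mq_tag_set *set)
      {
              struct nvme_fc_ctrl *ctrl = set->driver_data;
              int i;

              for (i = 0; i < set->nr_maps; i++) {
                      struct blk_mq_queue_map *map = &set->map[i];

                      if (!map->nr_queues)
                              continue;

                      /* use the LLDD's mapping if it provides the callout,
                       * otherwise fall back to the default spread */
                      if (ctrl->lport->ops->map_queues)
                              ctrl->lport->ops->map_queues(
                                              &ctrl->lport->localport, map);
                      else
                              blk_mq_map_queues(map);
              }
              return 0;
      }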
    • nvme: generate uevent once a multipath namespace is operational again · f6f09c15
      Hannes Reinecke authored
      When fast_io_fail_tmo is set, I/O will be aborted while recovery is
      still ongoing. This causes MD to mark the namespace as failed, and
      no further I/O will be submitted to that namespace.
      
      However, once the recovery succeeds and the namespace becomes
      operational again the NVMe subsystem doesn't send a notification,
      so MD cannot automatically reinstate operation and requires
      manual interaction.
      
      This patch sends a KOBJ_CHANGE uevent per multipathed namespace
      once the underlying controller transitions to LIVE (see the
      kernel-side sketch at the end of this entry), allowing an automatic
      MD reassembly with these udev rules:
      
      /etc/udev/rules.d/65-md-auto-re-add.rules:
      SUBSYSTEM!="block", GOTO="md_end"
      
      ACTION!="change", GOTO="md_end"
      ENV{ID_FS_TYPE}!="linux_raid_member", GOTO="md_end"
      PROGRAM="/sbin/md_raid_auto_readd.sh $devnode"
      LABEL="md_end"
      
      /sbin/md_raid_auto_readd.sh:
      
      #!/bin/bash
      # Re-add a failed RAID member to its (still clean) degraded MD array.
      MDADM=/sbin/mdadm
      DEVNAME=$1

      # Pull MD_UUID and friends from the member device's superblock
      export $(${MDADM} --examine --export ${DEVNAME})

      if [ -z "${MD_UUID}" ]; then
          exit 1
      fi

      # Map the array UUID back to its /dev/mdX name
      UUID_LINK=$(readlink /dev/disk/by-id/md-uuid-${MD_UUID})
      MD_DEVNAME=${UUID_LINK##*/}
      export $(${MDADM} --detail --export /dev/${MD_DEVNAME})
      if [ -z "${MD_METADATA}" ] ; then
          exit 1
      fi
      if [ "$(cat /sys/block/${MD_DEVNAME}/md/degraded)" != 1 ]; then
          echo "${MD_DEVNAME}: array not degraded, nothing to do"
          exit 0
      fi
      MD_STATE=$(cat /sys/block/${MD_DEVNAME}/md/array_state)
      if [ "${MD_STATE}" != "clean" ] ; then
          echo "${MD_DEVNAME}: array state ${MD_STATE}, cannot re-add"
          exit 1
      fi
      # ${!MD_VARNAME} is bash indirect expansion, hence the bash shebang
      MD_VARNAME="MD_DEVICE_dev_${DEVNAME##*/}_ROLE"
      if [ "${!MD_VARNAME}" = "spare" ] ; then
          ${MDADM} --manage /dev/${MD_DEVNAME} --re-add ${DEVNAME}
      fi
      
      Changes to v2:
      - Add udev rules example to description
      Changes to v1:
      - use disk_uevent() as suggested by hch
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
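      On the kernel side this amounts to very little code; a sketch,
      assuming the 5.15-era multipath structures and the disk_uevent()
      API (the helper name here is illustrative):

      static void nvme_mpath_notify(struct nvme_ns_head *head)
      {
              /* called once the controller transitions to LIVE; the
               * shared multipath gendisk already exists at this point */
              if (head->disk)
                      disk_uevent(head->disk, KOBJ_CHANGE);
      }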
  2. 19 Oct, 2021 3 commits
    • nvme: don't memset() the normal read/write command · a9a7e30f
      Jens Axboe authored
      This memset in the fast path costs a lot of cycles on my setup. Here's a
      top-of-profile of doing ~6.7M IOPS:
      
      +    5.90%  io_uring  [nvme]            [k] nvme_queue_rq
      +    5.32%  io_uring  [nvme_core]       [k] nvme_setup_cmd
      +    5.17%  io_uring  [kernel.vmlinux]  [k] io_submit_sqes
      +    4.97%  io_uring  [kernel.vmlinux]  [k] blkdev_direct_IO
      
      and a perf diff with this patch:
      
           0.92%     +4.40%  [nvme_core]       [k] nvme_setup_cmd
      
      reducing it from 5.3% to only 0.9%. This takes it from the 2nd most
      cycle consumer to something that's mostly irrelevant. The
      field-by-field initialization that replaces the memset is sketched
      at the end of this entry.
      Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
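      A fragment-level sketch of the technique inside nvme_setup_rw(),
      with field names assumed per the 5.15-era struct nvme_rw_command:
      every member is assigned explicitly, so the 64-byte memset
      disappears from the fast path:

      cmnd->rw.opcode = op;
      cmnd->rw.flags = 0;
      cmnd->rw.nsid = cpu_to_le32(ns->head->ns_id);
      cmnd->rw.cdw2 = 0;
      cmnd->rw.cdw3 = 0;
      cmnd->rw.metadata = 0;
      cmnd->rw.slba = cpu_to_le64(nvme_sect_to_lba(ns, blk_rq_pos(req)));
      cmnd->rw.length = cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
      cmnd->rw.reftag = 0;
      cmnd->rw.apptag = 0;
      cmnd->rw.appmask = 0;
      /* control, dsmgmt and the data pointer are filled in as before */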
    • nvme: move command clear into the various setup helpers · 9c3d2929
      Jens Axboe authored
      We don't have to worry about doing extra memsets by moving the clear
      outside the protection of RQF_DONTPREP, as nvme doesn't do partial
      completions.

      This is in preparation for making the read/write fast path not do a
      full memset of the command (one reworked helper is sketched below).
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
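      A sketch of one such helper after the move, assuming the 5.15-era
      nvme_setup_flush():

      static inline void nvme_setup_flush(struct nvme_ns *ns,
                      struct nvme_command *cmnd)
      {
              /* each setup helper now clears the command itself */
              memset(cmnd, 0, sizeof(*cmnd));
              cmnd->common.opcode = nvme_cmd_flush;
              cmnd->common.nsid = cpu_to_le32(ns->head->ns_id);
      }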
    • block: ataflop: fix breakage introduced at blk-mq refactoring · 86d46fda
      Michael Schmitz authored
      Refactoring of the Atari floppy driver when converting to blk-mq
      has broken the state machine in not-so-subtle ways:
      
      finish_fdc() must be called when operations on the floppy device
      have completed. This is crucial in order to release the ST-DMA
      lock, which protects against concurrent access to the ST-DMA
      controller by other drivers (some DMA related, most just related
      to device register access - broken beyond compare, I know).

      When rewriting the driver's old do_request() function, the fact
      that finish_fdc() was called only when all queued requests had
      completed appears to have been overlooked. Instead, the new
      request function calls finish_fdc() immediately after the last
      request has been queued. finish_fdc() executes a dummy seek after
      most requests, and this overwrites the state machine's interrupt
      handler that was set up to wait for completion of the read/write
      request just prior. To make matters worse, finish_fdc() is called
      before device interrupts are re-enabled, making certain that the
      read/write interrupt is missed.
      
      Shifting the finish_fdc() call into the read/write request
      completion handler (sketched at the end of this entry) ensures the
      driver waits for the request to actually complete. With a queue
      depth of 2, we won't see long request sequences, so calling
      finish_fdc() unconditionally just adds a little overhead for the
      dummy seeks, and keeps the code simple.
      
      While we're at it, kill ataflop_commit_rqs() which does nothing
      but run finish_fdc() unconditionally, again likely wiping out an
      in-flight request.
      Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
      Fixes: 6ec3938c ("ataflop: convert to blk-mq")
      CC: linux-block@vger.kernel.org
      CC: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Link: https://lore.kernel.org/r/20211019061321.26425-1-schmitzmic@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
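      A hypothetical sketch of the corrected ordering (the helper name is
      illustrative, not the driver's exact code): complete the request
      first, then run finish_fdc(), so the dummy seek and the ST-DMA
      release can no longer clobber an in-flight read/write:

      static void fd_rw_request_done(blk_status_t err)
      {
              /* finish the current request before touching the FDC again */
              if (!blk_update_request(fd_request, err,
                                      blk_rq_cur_bytes(fd_request))) {
                      __blk_mq_end_request(fd_request, err);
                      fd_request = NULL;
              }
              /* dummy seek and ST-DMA unlock, safe only after completion */
              finish_fdc();
      }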
  3. 18 Oct, 2021 31 commits