  1. 10 Aug, 2023 2 commits
    • vhost-scsi: Rename vhost_scsi_iov_to_sgl · c5ace19e
      Mike Christie authored
      Rename vhost_scsi_iov_to_sgl to vhost_scsi_map_iov_to_sgl so it matches
      the naming style used for vhost_scsi_copy_iov_to_sgl.
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Message-Id: <20230709202859.138387-3-michael.christie@oracle.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
    • vhost-scsi: Fix alignment handling with windows · 5ced58bf
      Mike Christie authored
      The Linux block layer requires bios/requests to have lengths with a 512
      byte alignment. Some drivers/layers like dm-crypt and the direct IO code
      will test for it and just fail. Other drivers like SCSI just assume the
      requirement is met and will end up in infinite retry loops. The problem
      for drivers like SCSI is that they use functions like blk_rq_cur_sectors
      and blk_rq_sectors, which divide the request's length by 512. If there
      are leftovers then they just get dropped. But other code in the block/scsi
      layer may use blk_rq_bytes/blk_rq_cur_bytes and end up thinking there is
      still data left and try to retry the cmd. We can then end up getting
      stuck in retry loops where part of the block/scsi layer thinks there is
      data left, but other parts think we want to do IOs of zero length.
      
      Linux will always check for alignment, but Windows will not. When
      vhost-scsi then translates the iovec it gets from a Windows guest to a
      scatterlist, we can end up with sg items where the sg->length is not
      divisible by 512 due to the misaligned offset:
      
      sg[0].offset = 255;
      sg[0].length = 3841;
      sg...
      sg[N].offset = 0;
      sg[N].length = 255;
      
      When the lio backends then convert the SG to bios or other iovecs, we
      end up sending them with the same misaligned values and can hit the
      issues above.
      
      This just has us drop down to allocating a temp page and copying the data
      when we detect a misaligned buffer and the IO is large enough that it
      will get split into multiple bad IOs.
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Message-Id: <20230709202859.138387-2-michael.christie@oracle.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
  2. 03 Jul, 2023 4 commits
    • vhost_scsi: add support for worker ioctls · d74b55e6
      Mike Christie authored
      This has vhost-scsi support the worker ioctls by calling the
      vhost_worker_ioctl helper.
      
      With a single worker, the single thread becomes a bottleneck when trying
      to use 3 or more virtqueues like:
      
      fio --filename=/dev/sdb  --direct=1 --rw=randrw --bs=4k \
      --ioengine=libaio --iodepth=128  --numjobs=3
      
      With the patches and doing a worker per vq, we can scale to at least
      16 vCPUs/vqs (that's my system limit) with the same fio command
      above with numjobs=16:
      
      fio --filename=/dev/sdb  --direct=1 --rw=randrw --bs=4k \
      --ioengine=libaio --iodepth=64  --numjobs=16
      
      which gives around 2002K IOPs.
      
      Note that for testing I dropped depth to 64 above because the vhost/virt
      layer supports only 1024 total commands per device. And the only tuning I
      did was set LIO's emulate_pr to 0 to avoid LIO's PR lock in the main IO
      path which becomes an issue at around 12 jobs/virtqueues.
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Message-Id: <20230626232307.97930-17-michael.christie@oracle.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • vhost_scsi: flush IO vqs then send TMF rsp · 0a3eac52
      Mike Christie authored
      With one worker we will always send the scsi cmd responses then send the
      TMF rsp, because LIO will always complete the scsi cmds first then call
      into us to send the TMF response.
      
      With multiple workers, the IO vq workers could be running while the
      TMF/ctl vq worker is running, so this has us do a flush before completing
      the TMF to make sure cmds are completed when its work is later queued
      and run.
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Message-Id: <20230626232307.97930-12-michael.christie@oracle.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • vhost_scsi: convert to vhost_vq_work_queue · 78af31cc
      Mike Christie authored
      Convert from vhost_work_queue to vhost_vq_work_queue so we can
      remove vhost_work_queue.
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Message-Id: <20230626232307.97930-11-michael.christie@oracle.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • vhost_scsi: make SCSI cmd completion per vq · 48ae70dd
      Mike Christie authored
      This patch separates the scsi cmd completion code paths so we can complete
      cmds based on their vq instead of having all cmds complete on the same
      worker/CPU. This will be useful with the next patches that allow us to
      create multiple worker threads and bind them to different vqs, so we can
      have completions running on different threads/CPUs.
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Message-Id: <20230626232307.97930-10-michael.christie@oracle.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
  3. 21 Apr, 2023 5 commits
    • vhost-scsi: Reduce vhost_scsi_mutex use · bea273c7
      Mike Christie authored
      We no longer need to hold the vhost_scsi_mutex the entire time we
      set/clear the endpoint. The tv_tpg_mutex handles tpg accesses not related
      to the tpg list, the port link/unlink functions use the tv_tpg_mutex while
      accessing the tpg->vhost_scsi pointer, vhost_scsi_do_plug will no longer
      queue events after the virtqueue's backend has been cleared and flushed,
      and we don't drop our refcount to the tpg until after we have stopped
      cmds and waited for outstanding cmds to complete.
      
      This moves the vhost_scsi_mutex use to its documented purpose of
      protecting access to the tpg list. We then don't need to hold it while a
      flush is being performed, which would force other devices'
      vhost_scsi_set_endpoint and vhost_scsi_make_tpg/vhost_scsi_drop_tpg
      calls to wait on a flaky device.
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Message-Id: <20230321020624.13323-8-michael.christie@oracle.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • vhost-scsi: Drop vhost_scsi_mutex use in port callouts · f5ed6f9e
      Mike Christie authored
      We are using the vhost_scsi_mutex to make sure vhost_scsi_port_link and
      vhost_scsi_port_unlink see if vhost_scsi_clear_endpoint has cleared
      tpg->vhost_scsi, and that it can't be freed while they are using it.
      
      However, we currently set the tpg->vhost_scsi pointer while holding
      tv_tpg_mutex. So, we can just hold that while calling
      vhost_scsi_hotplug/hotunplug. We then don't need to hold the
      vhost_scsi_mutex while vhost_scsi_clear_endpoint is holding it and doing
      a flush which could cause the LUN map/unmap to have to wait on another
      device's flush.
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Message-Id: <20230321020624.13323-7-michael.christie@oracle.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • vhost-scsi: Check for a cleared backend before queueing an event · ced9eb37
      Mike Christie authored
      We currently hold the vhost_scsi_mutex while clearing the endpoint and
      while performing vhost_scsi_do_plug, so tpg->vhost_scsi can't be freed
      from under us, and to make sure anything queued is handled by the
      full call in vhost_scsi_clear_endpoint.
      
      This patch removes the need for the vhost_scsi_mutex for the latter
      case. In the next patches, we won't hold the vhost_scsi_mutex while
      flushing so this patch adds a check for the clearing of the virtqueue
      from vhost_scsi_clear_endpoint. We then know that once
      vhost_scsi_clear_endpoint has cleared the backend that no new events
      will be queued, and the flush after the vhost_vq_set_backend(vq, NULL)
      call will see everything that's been queued to that point. So the flush
      will then handle all events without the need for the vhost_scsi_mutex.
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Message-Id: <20230321020624.13323-6-michael.christie@oracle.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • vhost-scsi: Drop device mutex use in vhost_scsi_do_plug · eb1b2914
      Mike Christie authored
      We don't need the device mutex in vhost_scsi_do_plug because:
      1. we have the vhost_scsi_mutex so the tpg->vhost_scsi pointer will not
      change on us and the vhost_scsi can't be freed from under us if it was
      set.
      2. vhost_scsi_clear_endpoint will stop the virtqueues and flush them while
      holding the vhost_scsi_mutex so we know once vhost_scsi_clear_endpoint
      has completed that vhost_scsi_do_plug can't send new events and any
      queued ones have completed.
      
      So this patch drops the device mutex use in vhost_scsi_do_plug.
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Message-Id: <20230321020624.13323-5-michael.christie@oracle.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • vhost-scsi: Delay releasing our refcount on the tpg · 9a10cb4d
      Mike Christie authored
      We currently hold the vhost_scsi_mutex the entire time we are running
      vhost_scsi_clear_endpoint. One of the reasons for this is that it prevents
      userspace from being able to free the se_tpg from under us after we have
      called target_undepend_item. However, it forces management operations for
      other devices to have to wait on a flaky device's:
      
      vhost_scsi_clear_endpoint -> vhost_scsi_flush()
      
      call, which can take a long time.
      
      This moves the target_undepend_item call and the tpg unsetup code to after
      we have stopped new IO from starting up and after we have waited on
      running IO. We can then release our refcount on the tpg and session
      knowing our device is no longer accessing them. We can then drop the
      vhost_scsi_mutex use during the flush call in later patches in this set,
      when we have removed other reasons for holding it.
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Message-Id: <20230321020624.13323-4-michael.christie@oracle.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
  4. 04 Apr, 2023 2 commits
    • vhost-scsi: Fix crash during LUN unmapping · 4c363c81
      Mike Christie authored
      We normally clear the endpoint then unmap LUNs, so the devices are fully
      shut down when the LUN is unmapped, but it's legal to unmap before
      clearing. If the user does that while TMFs are running then we can end
      up crashing.
      
      vhost_scsi_port_unlink assumes that the LUN's tmf struct will always be on
      the tmf_queue list. However, if a TMF is running then it will have been
      removed while it's executing. If we do a LUN unmap at this time, then
      we assume the entry is on the list and just start accessing it and free
      it.
      
      This fixes the bug by just allocating the vhost_scsi_tmf struct when it's
      needed, as is done with the se_tmr struct that's needed when we submit
      the TMF. In this path perf is not an issue and we can use GFP_KERNEL
      since it won't swing directly back on us, so we don't need to preallocate
      the struct.
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Message-Id: <20230321020624.13323-3-michael.christie@oracle.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • vhost-scsi: Fix vhost_scsi struct use after free · e508efc3
      Mike Christie authored
      If vhost_scsi_setup_vq_cmds fails we leave the tpg->vhost_scsi pointer
      set. If the device is freed and then the user unmaps the LUN, the call to
      vhost_scsi_port_unlink -> vhost_scsi_hotunplug will see that
      tpg->vhost_scsi is still set and try to use it.
      
      This has us clear the vhost_scsi pointer in the failure path. It also
      has us take tv_tpg_mutex in this failure path, because tv_tpg_vhost_count
      is accessed under this mutex in vhost_scsi_drop_nexus and in the future
      we will want to serialize access to tpg->vhost_scsi with that mutex
      instead of the vhost_scsi_mutex.
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Message-Id: <20230321020624.13323-2-michael.christie@oracle.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
  5. 30 Mar, 2023 1 commit
    • iov_iter: add iter_iovec() helper · de4f5fed
      Jens Axboe authored
      This returns a pointer to the current iovec entry in the iterator. Only
      useful with ITER_IOVEC right now, but it prepares us to treat ITER_UBUF
      and ITER_IOVEC identically for the first segment.
      
      Rename struct iov_iter->iov to iov_iter->__iov to find any potentially
      troublesome spots, and also to prevent anyone from adding new code that
      accesses iter->iov directly.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  6. 17 Mar, 2023 1 commit
  7. 21 Feb, 2023 1 commit
  8. 27 Jan, 2023 1 commit
    • vhost-scsi: unbreak any layout for response · 6dd88fd5
      Jason Wang authored
      Al Viro said:
      
      """
      Since "vhost/scsi: fix reuse of &vq->iov[out] in response"
      we have this:
                      cmd->tvc_resp_iov = vq->iov[vc.out];
                      cmd->tvc_in_iovs = vc.in;
      combined with
                      iov_iter_init(&iov_iter, ITER_DEST, &cmd->tvc_resp_iov,
                                    cmd->tvc_in_iovs, sizeof(v_rsp));
      in vhost_scsi_complete_cmd_work().  We used to have ->tvc_resp_iov
      _pointing_ to vq->iov[vc.out]; back then iov_iter_init() asked to
      set an iovec-backed iov_iter over the tail of vq->iov[], with
      length being the amount of iovecs in the tail.
      
      Now we have a copy of one element of that array.  Fortunately, the members
      following it in the containing structure are two non-NULL kernel pointers,
      so copy_to_iter() will not copy anything beyond the first iovec - kernel
      pointer is not (on the majority of architectures) going to be accepted by
      access_ok() in copyout() and it won't be skipped since the "length" (in
      reality - another non-NULL kernel pointer) won't be zero.
      
      So it's not going to give a guest-to-qemu escalation, but it's definitely
      a bug.  Frankly, my preference would be to verify that the very first iovec
      is long enough to hold rsp_size.  Due to the above, any users that try to
      give us vq->iov[vc.out].iov_len < sizeof(struct virtio_scsi_cmd_resp)
      would currently get a failure in vhost_scsi_complete_cmd_work()
      anyway.
      """
      
      However, the spec doesn't say anything about the legacy descriptor
      layout for the response. So this patch tries to not assume the response
      resides in a single separate descriptor, which is what commit
      79c14141 ("vhost/scsi: Convert completion path to use") tries to
      achieve towards ANY_LAYOUT.
      
      This is done by allocating and using a dedicated resp iov in the
      command. To be safe, start with UIO_MAXIOV to be consistent with the
      limitation that we advertise to vhost_get_vq_desc().
      
      Tested with a hacked virtio-scsi driver that uses 1 descriptor for 1
      byte in the response.
      Reported-by: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin Coddington <bcodding@redhat.com>
      Cc: Nicholas Bellinger <nab@linux-iscsi.org>
      Fixes: a77ec83a ("vhost/scsi: fix reuse of &vq->iov[out] in response")
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Message-Id: <20230119073647.76467-1-jasowang@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
  9. 25 Nov, 2022 1 commit
    • use less confusing names for iov_iter direction initializers · de4eda9d
      Al Viro authored
      READ/WRITE proved to be actively confusing - the meanings are
      "data destination, as used with read(2)" and "data source, as
      used with write(2)", but people keep interpreting those as
      "we read data from it" and "we write data to it", i.e. exactly
      the wrong way.
      
      Call them ITER_DEST and ITER_SOURCE - at least that is harder
      to misinterpret...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  10. 11 Aug, 2022 2 commits
  11. 09 Aug, 2022 1 commit
  12. 31 May, 2022 2 commits
  13. 05 Sep, 2021 1 commit
  14. 03 Jul, 2021 3 commits
  15. 04 Mar, 2021 5 commits
  16. 23 Feb, 2021 1 commit
  17. 18 Dec, 2020 1 commit
  18. 25 Nov, 2020 1 commit
  19. 15 Nov, 2020 4 commits
  20. 29 Jul, 2020 1 commit