Commits · 9be5d2d424a19a0c6d643a1baecba3d58e725012 · Kirill Smelkov / linux

21 Apr, 2023 38 commits

Simon Horman authored Mar 31, 2023

This patch addresses the following minor kdoc problems.

* Incorrect spelling of 'callback' and 'notification'
* Unrecognised kdoc format for 'struct vdpa_map_file'
* Missing documentation of 'get_vendor_vq_stats' member of
  'struct vdpa_config_ops'
* Missing documentation of 'max_supported_vqs' and 'supported_features'
  members of 'struct vdpa_mgmt_dev'

Most of these problems were flagged by:

 $ ./scripts/kernel-doc -Werror -none  include/linux/vdpa.h
 include/linux/vdpa.h:20: warning: expecting prototype for struct vdpa_calllback. Prototype was for struct vdpa_callback instead
 include/linux/vdpa.h:117: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
  * Corresponding file area for device memory mapping
 include/linux/vdpa.h:357: warning: Function parameter or member 'get_vendor_vq_stats' not described in 'vdpa_config_ops'
 include/linux/vdpa.h:518: warning: Function parameter or member 'supported_features' not described in 'vdpa_mgmt_dev'
 include/linux/vdpa.h:518: warning: Function parameter or member 'max_supported_vqs' not described in 'vdpa_mgmt_dev'

The misspelling of 'notification' was flagged by:
 $ ./scripts/checkpatch.pl --codespell --showfile --strict -f include/linux/vdpa.h
 include/linux/vdpa.h:171: CHECK: 'notifcation' may be misspelled - perhaps 'notification'?
 ...
Signed-off-by: Simon Horman <horms@kernel.org>
Message-Id: <20230331-vhost-fixes-v1-1-1f046e735b9e@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>

9be5d2d4

virtio_ring: don't update event idx on get_buf · 6c0b057c

Albert Huang authored Mar 29, 2023

In virtio_net, if we disable napi_tx, when we trigger a tx interrupt,
the vq->event_triggered will be set to true. It is then never reset
until we explicitly call virtqueue_enable_cb_delayed or
virtqueue_enable_cb_prepare.

If we disable the napi_tx, virtqueue_enable_cb* will only be called when
the tx ring is getting relatively empty.

Since event_triggered is true, VRING_AVAIL_F_NO_INTERRUPT or
VRING_PACKED_EVENT_FLAG_DISABLE will not be set. As a result we update
vring_used_event(&vq->split.vring) or vq->packed.vring.driver->off_wrap
every time we call virtqueue_get_buf_ctx. This causes more interrupts.

To summarize:
1) event_triggered was set to true in vring_interrupt()
2) after this nothing will happen in virtqueue_disable_cb() so
   VRING_AVAIL_F_NO_INTERRUPT is not set in avail_flags_shadow
3) virtqueue_get_buf_ctx_split() will still think the cb is enabled
   and then it will publish a new event index

To fix:
update VRING_AVAIL_F_NO_INTERRUPT or VRING_PACKED_EVENT_FLAG_DISABLE in
the vq when we call virtqueue_disable_cb even when event_triggered is
true.

Tested with iperf:
iperf3 tcp stream:
vm1 -----------------> vm2
vm2 just receives tcp data stream from vm1, and sends acks to vm1,
there are many tx interrupts in vm2.
with the patch applied there are just a few tx interrupts.

v2->v3:
-update the interrupt disable flag even with the event_triggered is set,
-instead of checking whether event_triggered is set in
-virtqueue_get_buf_ctx_{packed/split}, will cause the drivers  which have
-not called virtqueue_{enable/disable}_cb to miss notifications.

v3->v4:
-remove change for
-"if (vq->packed.event_flags_shadow != VRING_PACKED_EVENT_FLAG_DISABLE)"
-in virtqueue_disable_cb_packed

Fixes: 8d622d21 ("virtio: fix up virtio_disable_cb")
Signed-off-by: Albert Huang <huangjie.albert@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230329102300.61000-1-huangjie.albert@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

6c0b057c

vdpa_sim: add support for user VA · 4bb94d2d

Stefano Garzarella authored Apr 04, 2023

The new "use_va" module parameter (default: true) is used in
vdpa_alloc_device() to inform the vDPA framework that the device
supports VA.

vringh is initialized to use VA only when "use_va" is true and the
user's mm has been bound. So, only when the bus supports user VA
(e.g. vhost-vdpa).

vdpasim_mm_work_fn work is used to serialize the binding to a new
address space when the .bind_mm callback is invoked, and unbinding
when the .unbind_mm callback is invoked.

Call mmget_not_zero()/kthread_use_mm() inside the worker function
to pin the address space only as long as needed, following the
documentation of mmget() in include/linux/sched/mm.h:

  * Never use this function to pin this address space for an
  * unbounded/indefinite amount of time.
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20230404131734.45943-1-sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

4bb94d2d

vdpa_sim: replace the spinlock with a mutex to protect the state · d7621c28

Stefano Garzarella authored Apr 04, 2023

The spinlock we use to protect the state of the simulator is sometimes
held for a long time (for example, when devices handle requests).

This also prevents us from calling functions that might sleep (such as
kthread_flush_work() in the next patch), and thus having to release
and retake the lock.

For these reasons, let's replace the spinlock with a mutex that gives
us more flexibility.
Suggested-by: Jason Wang <jasowang@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20230404131730.45920-1-sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

d7621c28

vdpa_sim: use kthread worker · 76acfa7b

Stefano Garzarella authored Apr 04, 2023

Let's use our own kthread to run device jobs.
This allows us more flexibility, especially we can attach the kthread
to the user address space when vDPA uses user's VA.
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20230404131725.45908-1-sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

76acfa7b

vdpa_sim: make devices agnostic for work management · e2a4f808

Stefano Garzarella authored Apr 04, 2023

Let's move work management inside the vdpa_sim core.
This way we can easily change how we manage the works, without
having to change the devices each time.
Acked-by: Eugenio Pérez Martin <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20230404131721.45886-1-sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

e2a4f808

vringh: support VA with iotlb · 42823a87

Stefano Garzarella authored Apr 04, 2023

vDPA supports the possibility to use user VA in the iotlb messages.
So, let's add support for user VA in vringh to use it in the vDPA
simulators.
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20230404131716.45855-1-sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

42823a87

vringh: define the stride used for translation · f609d6cb

Stefano Garzarella authored Apr 04, 2023

Define a macro to be reused in the different parts of the code.

Useful for the next patches where we add more arrays to manage also
translations with user VA.
Suggested-by: Eugenio Perez Martin <eperezma@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20230404131326.44403-5-sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

f609d6cb

vringh: replace kmap_atomic() with kmap_local_page() · c0371782

Stefano Garzarella authored Apr 04, 2023

kmap_atomic() is deprecated in favor of kmap_local_page() since commit
f3ba3c71 ("mm/highmem: Provide kmap_local*").

With kmap_local_page() the mappings are per thread, CPU local, can take
page-faults, and can be called from any context (including interrupts).
Furthermore, the tasks can be preempted and, when they are scheduled to
run again, the kernel virtual addresses are restored and still valid.

kmap_atomic() is implemented like a kmap_local_page() which also disables
page-faults and preemption (the latter only for !PREEMPT_RT kernels,
otherwise it only disables migration).

The code within the mappings/un-mappings in getu16_iotlb() and
putu16_iotlb() don't depend on the above-mentioned side effects of
kmap_atomic(), so that mere replacements of the old API with the new one
is all that is required (i.e., there is no need to explicitly add calls
to pagefault_disable() and/or preempt_disable()).

This commit reuses a "boiler plate" commit message from Fabio, who has
already did this change in several places.

Cc: "Fabio M. De Francesco" <fmdefrancesco@gmail.com>
Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20230404131326.44403-4-sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

c0371782

vhost-vdpa: use bind_mm/unbind_mm device callbacks · 9067de47

Stefano Garzarella authored Apr 04, 2023

When the user call VHOST_SET_OWNER ioctl and the vDPA device
has `use_va` set to true, let's call the bind_mm callback.
In this way we can bind the device to the user address space
and directly use the user VA.

The unbind_mm callback is called during the release after
stopping the device.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20230404131326.44403-3-sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

9067de47

vdpa: add bind_mm/unbind_mm callbacks · c618c84d

Stefano Garzarella authored Apr 04, 2023

These new optional callbacks is used to bind/unbind the device to
a specific address space so the vDPA framework can use VA when
these callbacks are implemented.
Suggested-by: Jason Wang <jasowang@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20230404131326.44403-2-sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

c618c84d

vringh: fix typos in the vringh_init_* documentation · 905233af

Stefano Garzarella authored Mar 31, 2023

Replace `userpace` with `userspace`.

Cc: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20230331080208.17002-1-sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>

905233af

vduse: Support specifying bounce buffer size via sysfs · b774f93d

Xie Yongji authored Mar 23, 2023

As discussed in [1], this adds sysfs interface to support
specifying bounce buffer size in virtio-vdpa case. It would
be a performance tuning parameter for high throughput workloads.

[1] https://lore.kernel.org/netdev/e8f25a35-9d45-69f9-795d-bdbbb90337a3@redhat.com/Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230323053043.35-12-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

b774f93d

vduse: Delay iova domain creation · d4438d23

Xie Yongji authored Mar 23, 2023

Delay creating iova domain until the vduse device is
registered to vdpa bus.

This is a preparation for adding sysfs interface to
support specifying bounce buffer size for the iova
domain.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230323053043.35-11-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

d4438d23

vduse: Signal vq trigger eventfd directly if possible · e38632dd

Xie Yongji authored Mar 23, 2023

Now the vdpa callback will associate an trigger
eventfd in some cases. For performance reasons,
VDUSE can signal it directly during irq injection.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230323053043.35-10-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

e38632dd

vdpa: Add eventfd for the vdpa callback · 5e68470f

Xie Yongji authored Mar 23, 2023

Add eventfd for the vdpa callback so that user
can signal it directly instead of triggering the
callback. It will be used for vhost-vdpa case.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Message-Id: <20230323053043.35-9-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>

5e68470f

vduse: Add sysfs interface for irq callback affinity · 66640f4a

Xie Yongji authored Mar 23, 2023

Add sysfs interface for each vduse virtqueue to
get/set the affinity for irq callback. This might
be useful for performance tuning when the irq callback
affinity mask contains more than one CPU.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Message-Id: <20230323053043.35-8-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>

66640f4a

vduse: Support get_vq_affinity callback · bfae1648

Xie Yongji authored Mar 23, 2023

This implements get_vq_affinity callback so that
the virtio-blk driver can build the blk-mq queues
based on the irq callback affinity.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Message-Id: <20230323053043.35-7-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>

bfae1648

vduse: Support set_vq_affinity callback · 28f6288e

Xie Yongji authored Mar 23, 2023

Since virtio-vdpa bus driver already support interrupt
affinity spreading mechanism, let's implement the
set_vq_affinity callback to bring it to vduse device.
After we get the virtqueue's affinity, we can spread
IRQs between CPUs in the affinity mask, in a round-robin
manner, to run the irq callback.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Message-Id: <20230323053043.35-6-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>

28f6288e

vduse: Refactor allocation for vduse virtqueues · 78885597

Xie Yongji authored Mar 23, 2023

Allocate memory for vduse virtqueues one by one instead of
doing one allocation for all of them.

This is a preparation for adding sysfs interface for virtqueues.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230323053043.35-5-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

78885597

virtio-vdpa: Support interrupt affinity spreading mechanism · 3dad5682

Xie Yongji authored Mar 23, 2023

To support interrupt affinity spreading mechanism,
this makes use of group_cpus_evenly() to create
an irq callback affinity mask for each virtqueue
of vdpa device. Then we will unify set_vq_affinity
callback to pass the affinity to the vdpa device driver.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Message-Id: <20230323053043.35-4-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>

3dad5682

vdpa: Add set/get_vq_affinity callbacks in vdpa_config_ops · 1d246927

Xie Yongji authored Mar 23, 2023

This introduces set/get_vq_affinity callbacks in
vdpa_config_ops to support virtqueue affinity
management for vdpa device drivers.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230323053043.35-3-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

1d246927

lib/group_cpus: Export group_cpus_evenly() · aaf05948

Xie Yongji authored Mar 23, 2023

Export group_cpus_evenly() so that some modules
can make use of it to group CPUs evenly according
to NUMA and CPU locality.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230323053043.35-2-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

aaf05948

vdpa/mlx5: Extend driver support for new features · e9d67e59

Eli Cohen authored Mar 21, 2023

Extend the possible list for features that can be supported by firmware.
Note that different versions of firmware may or may not support these
features. The driver is made aware of them by querying the firmware.

While doing this, improve the code so we use enum names instead of hard
coded numerical values.

The new features supported by the driver are the following:

VIRTIO_NET_F_MRG_RXBUF
VIRTIO_NET_F_HOST_ECN
VIRTIO_NET_F_GUEST_ECN
VIRTIO_NET_F_GUEST_TSO6
VIRTIO_NET_F_GUEST_TSO4
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20230321112809.221432-3-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Eugenio Pérez Martin <eperezma@redhat.com>

e9d67e59

vdpa/mlx5: Make VIRTIO_NET_F_MRG_RXBUF off by default · 791a1cb7

Eli Cohen authored Mar 21, 2023

Following patch adds driver support for VIRTIO_NET_F_MRG_RXBUF.

Current firmware versions show degradation in packet rate when using
MRG_RXBUF. Users who favor memory saving over packet rate could enable
this feature but we want to keep it off by default.

One can still enable it when creating the vdpa device using vdpa tool by
providing features that include it.

For example:
$ vdpa dev add name vdpa0 mgmtdev pci/0000:86:00.2 device_features 0x300cb982b
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20230321112809.221432-2-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>

791a1cb7

virtio_ring: Allow non power of 2 sizes for packed virtqueue · a084983d

Feng Liu authored Mar 15, 2023

According to the Virtio Specification, the Queue Size parameter of a
virtqueue corresponds to the maximum number of descriptors in that
queue, and it does not have to be a power of 2 for packed virtqueues.
However, the virtio_pci_modern driver enforced a power of 2 check for
virtqueue sizes, which is unnecessary and restrictive for packed
virtuqueue.

Split virtqueue still needs to check the virtqueue size is power_of_2
which has been done in vring_alloc_queue_split of the virtio_ring layer.

To validate this change, we tested various virtqueue sizes for packed
rings, including 128, 256, 512, 100, 200, 500, and 1000, with
CONFIG_PAGE_POISONING enabled, and all tests passed successfully.
Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>

Message-Id: <20230315185458.11638-2-feliu@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

a084983d

vhost-scsi: Reduce vhost_scsi_mutex use · bea273c7

Mike Christie authored Mar 20, 2023

We on longer need to hold the vhost_scsi_mutex the entire time we
set/clear the endpoint. The tv_tpg_mutex handles tpg accesses not related
to the tpg list, the port link/unlink functions use the tv_tpg_mutex while
accessing the tpg->vhost_scsi pointer, vhost_scsi_do_plug will no longer
queue events after the virtqueue's backend has been cleared and flushed,
and we don't drop our refcount to the tpg until after we have stopped
cmds and wait for outstanding cmds to complete.

This moves the vhost_scsi_mutex use to it's documented use of being used
to access the tpg list. We then don't need to hold it while a flush is
being performed causing other device's vhost_scsi_set_endpoint
and vhost_scsi_make_tpg/vhost_scsi_drop_tpg calls to have to wait on a
flakey device.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230321020624.13323-8-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

bea273c7

vhost-scsi: Drop vhost_scsi_mutex use in port callouts · f5ed6f9e

Mike Christie authored Mar 20, 2023

We are using the vhost_scsi_mutex to make sure vhost_scsi_port_link and
vhost_scsi_port_unlink see if vhost_scsi_clear_endpoint has cleared
tpg->vhost_scsi and it can't be freed while they are using.

However, we currently set the tpg->vhost_scsi pointer while holding
tv_tpg_mutex. So, we can just hold that while calling
vhost_scsi_hotplug/hotunplug. We then don't need to hold the
vhost_scsi_mutex while vhost_scsi_clear_endpoint is holding it and doing
a flush which could cause the LUN map/unmap to have to wait on another
device's flush.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230321020624.13323-7-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

f5ed6f9e

vhost-scsi: Check for a cleared backend before queueing an event · ced9eb37

Mike Christie authored Mar 20, 2023

We currenly hold the vhost_scsi_mutex while clearing the endpoint and
while performing vhost_scsi_do_plug, so tpg->vhost_scsi can't be freed
from uder us, and to make sure anything queued is handled by the
full call in vhost_scsi_clear_endpoint.

This patch removes the need for the vhost_scsi_mutex for the latter
case. In the next patches, we won't hold the vhost_scsi_mutex while
flushing so this patch adds a check for the clearing of the virtqueue
from vhost_scsi_clear_endpoint. We then know that once
vhost_scsi_clear_endpoint has cleared the backend that no new events
will be queued, and the flush after the vhost_vq_set_backend(vq, NULL)
call will see everything that's been queued to that point. So the flush
will then handle all events without the need for the vhost_scsi_mutex.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230321020624.13323-6-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

ced9eb37

vhost-scsi: Drop device mutex use in vhost_scsi_do_plug · eb1b2914

Mike Christie authored Mar 20, 2023

We don't need the device mutex in vhost_scsi_do_plug because:
1. we have the vhost_scsi_mutex so the tpg->vhost_scsi pointer will not
change on us and the vhost_scsi can't be freed from under us if it was
set.
2. vhost_scsi_clear_endpoint will stop the virtqueues and flush them while
holding the vhost_scsi_mutex so we know once vhost_scsi_clear_endpoint
has completed that vhost_scsi_do_plug can't send new events and any
queued ones have completed.

So this patch drops the device mutex use in vhost_scsi_do_plug.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230321020624.13323-5-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

eb1b2914

vhost-scsi: Delay releasing our refcount on the tpg · 9a10cb4d

Mike Christie authored Mar 20, 2023

We currently hold the vhost_scsi_mutex the entire time we are running
vhost_scsi_clear_endpoint. One of the reasons for this is that it prevents
userspace from being able to free the se_tpg from under us after we have
called target_undepend_item. However, it forces management operations for
for other devices to have to wait on a flakey device's:

vhost_scsi_clear_endpoint -> vhost_scsi_flush()

call which can which can take a long time.

This moves the target_undepend_item call and the tpg unsetup code to after
we have stopped new IO from starting up and after we have waited on
running IO. We can then release our refcount on the tpg and session
knowing our device is no longer accessing them. We can then drop the
vhost_scsi_mutex use during thee flush call in later patches in this set,
when we have removed other reasons for holding it.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20230321020624.13323-4-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

9a10cb4d

virtio_ring: Use const to annotate read-only pointer params · 4b6ec919

Feng Liu authored Mar 10, 2023

Add const to make the read-only pointer parameters clear, similar to
many existing functions.

To implement this change, the commit also introduces the use of
`container_of_const` to implement `to_vvq`, which ensures the const-ness
of read-only parameters and avoids accidental modification of their
members.
Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>

Message-Id: <20230310053428.3376-4-feliu@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

4b6ec919

virtio_ring: Avoid using inline for small functions · 1adbd6b2

Feng Liu authored Mar 10, 2023

According to kernel coding style [1], defining inline functions is not
necessary and beneficial for simple functions. Hence clean up the code
by removing the inline keyword.

It is verified with GCC 12.2.0, the generated code with/without inline
is same. Additionally tested with pktgen and iperf, and verified the
result, the pps test results are the same in the cases of with/without
inline.

Iperf and pps of pktgen for virtio-net didn't change before and after
the change.

[1]
https://www.kernel.org/doc/html/v6.2-rc3/process/coding-style.html#the-inline-diseaseSigned-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20230310053428.3376-3-feliu@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

1adbd6b2

tools/virtio: virtio_test -h,--help should return directly · 6b27cd84

Rong Tao authored Mar 09, 2023

When we get help information, we should return directly, and we should not
execute test cases. Move the exit() directly into the help() function and
remove it from case '?'.
Signed-off-by: Rong Tao <rongtao@cestc.cn>
Message-Id: <tencent_822CEBEB925205EA1573541CD1C2604F4805@qq.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

6b27cd84

tools/virtio: virtio_test: Fix indentation · 9b2b3de6

Rong Tao authored Mar 09, 2023

Replace eight spaces with Tab.
Signed-off-by: Rong Tao <rtoax@foxmail.com>
Message-Id: <tencent_89579C514BC4020324A1A4ACA44B5B95BB07@qq.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

9b2b3de6

virtio: Reorder fields in 'struct virtqueue' · 48cd6bc5

Christophe JAILLET authored Feb 18, 2023

Group some variables based on their sizes to reduce hole and avoid padding.
On x86_64, this shrinks the size of 'struct virtqueue'
from 72 to 68 bytes.

It saves a few bytes of memory.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Message-Id: <8f3d2e49270a2158717e15008e7ed7228196ba02.1676707807.git.christophe.jaillet@wanadoo.fr>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Peter Lafreniere <peter@n8pjl.ca>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>

48cd6bc5

vhost: use struct_size and size_add to compute flex array sizes · e4be66e5

Jacob Keller authored Feb 27, 2023

The vhost_get_avail_size and vhost_get_used_size functions compute the size
of structures with flexible array members with an additional 2 bytes if the
VIRTIO_RING_F_EVENT_IDX feature flag is set. Convert these functions to use
struct_size() and size_add() instead of coding the calculation by hand.

This ensures that the calculations will saturate at SIZE_MAX rather than
overflowing.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Message-Id: <20230227214127.3678392-1-jacob.e.keller@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

e4be66e5

vdpa/mlx5: Avoid losing link state updates · c384c240

Eli Cohen authored Apr 17, 2023

Current code ignores link state updates if VIRTIO_NET_F_STATUS was not
negotiated. However, link state updates could be received before feature
negotiation was completed , therefore causing link state events to be
lost, possibly leaving the link state down.

Modify the code so link state notifier is registered after DRIVER_OK was
negotiated and carry the registration only if
VIRTIO_NET_F_STATUS was negotiated.  Unregister the notifier when the
device is reset.

Fixes: 033779a7 ("vdpa/mlx5: make MTU/STATUS presence conditional on feature bits")
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20230417110343.138319-1-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

c384c240

16 Apr, 2023 2 commits

Linux 6.3-rc7 · 6a8f57ae
Linus Torvalds authored Apr 16, 2023

6a8f57ae

Merge tag 'sched_urgent_for_v6.3_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 6c538e1a

Linus Torvalds authored Apr 16, 2023

Pull scheduler fix from Borislav Petkov:

 - Do not pull tasks to the local scheduling group if its average load
   is higher than the average system load

* tag 'sched_urgent_for_v6.3_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Fix imbalance overflow

6c538e1a