From: Stefano Garzarella <sgarzare@redhat.com>
To: Xuewei Niu <niuxuewei97@gmail.com>
Cc: fupan.lfp@antgroup.com, mst@redhat.com,
niuxuewei.nxw@antgroup.com, parav@nvidia.com,
virtio-comment@lists.linux.dev
Subject: Re: [PATCH v6 RESEND] virtio-vsock: Add support for multi devices
Date: Wed, 26 Mar 2025 09:50:23 +0100
Message-ID: <7o7pkrmbwp6zyybgqqliybesezk2zpnopwwgzlbfpnw2m7relz@mxyhzve6lote>
In-Reply-To: <20250325031946.1934483-1-niuxuewei.nxw@antgroup.com>

On Tue, Mar 25, 2025 at 11:19:46AM +0800, Xuewei Niu wrote:
>> On Mon, Mar 24, 2025 at 02:43:35PM +0800, Xuewei Niu wrote:
>> >This patch brings a new feature, called "multi devices", to the virtio
>> >vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
>> >"device_order" field to the config for the virtio vsock.
>> >
>> >== Motivation ==
>> >
>> >Vsock is a lightweight and widely used data exchange mechanism between host
>> >and guest. Currently, the virtio-vsock only supports one device, resulting
>> >in the inability to enable more than one backend. For instance, two devices
>> >are required: one to transfer data to the VMM via virtio-vsock,
>>
>> Come to think of it, AF_VSOCK defines CID 0 (VMADDR_CID_HYPERVISOR) to
>> communicate with the hypervisor, but in virtio-vsock we never supported
>> it. Could this be the use case?
>>
>> We could in this way add a new feature for those devices that
>> communicate only with the VMM, where the CID of the VM is quite useless.
>> So instead of having multiple CIDs per VM, we could continue to have a
>> single CID, but the transport could support 2 devices, one to
>> communicate with the VMM (CID = 0) and one to communicate with the host
>> apps (CID = 2).
>>
>> Maybe this is orthogonal to this proposal, though, because it might
>> still make sense to have multiple vsock devices, even though it's not
>> very clear to me.
>
>In terms of the current situation, two devices are enough.
>
>We are the team of Kata Containers, so we are focusing on cloud-native
>computing. What I mentioned below might be beyond the scope of the virtio
>spec, just for your reference.
>
>The background is that the architecture of proxy mesh has evolved over
>the past few years: from per-pod to per-host (e.g. Istio Ambient Mesh[1]).
>
>Thanks to the TSI[2] and vhost-user protocol, network packets can bypass
>both host and guest network stacks. It is possible to establish a fast path
>between the pod and the proxy.
>
>When we have multiple networks, it is intuitive to have multiple NICs. So
>does vsock.

Be careful though, we don't want to complicate vsock to become like a
NIC.

>
>When multiple networks are available, it means that it is possible to have
>multiple proxies (i.e. user processes). In this case, two devices are not
>enough. This feature makes vsock more flexible and scalable.

This is a good point, but I really don't understand why a VM should have
multiple CIDs assigned.

>
>Looks like you don't like the design of multiple devices. May I ask why? Is
>it too heavy for you?

Yes, I am concerned that we are overcomplicating vsock.

Since AF_VSOCK already defines an address to communicate with the
hypervisor, why can't the device that ends up in the VMM (TSI) use that?

I believe that having multiple devices only adds complexity for the
user. Which source device/CID should the user pick, and for what
reason?

All this should be hidden and, especially in your case, this is already
easily done by using VMADDR_CID_HYPERVISOR to communicate with the VMM
and VMADDR_CID_HOST to communicate with applications in the host. So
maybe we only need to handle 2 device types in the driver: instead of
adding this functionality, we would just introduce a new device type
(via a feature) that handles VMADDR_CID_HYPERVISOR, so that the driver
knows that it can have at most 2 devices, one for the VMM and one for
the host.
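
To make the idea concrete, here is a minimal sketch of what user space
could look like if the destination CID alone selected the device. The
helper names (vsock_addr, vsock_connect) are made up for illustration,
and port 1234 is just the one from the examples in the patch. Since
virtio-vsock has never supported CID 0, the VMADDR_CID_HYPERVISOR case
describes the proposed behavior and is not expected to work with today's
drivers.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Hypothetical helper: fill a vsock address for (cid, port). */
void vsock_addr(struct sockaddr_vm *addr, unsigned int cid, unsigned int port)
{
    memset(addr, 0, sizeof(*addr));
    addr->svm_family = AF_VSOCK;
    addr->svm_cid = cid;
    addr->svm_port = port;
}

/*
 * Hypothetical helper: open a stream socket and connect to (cid, port).
 * No bind() is needed; under the proposal the driver would pick the
 * device from the destination CID. Returns the fd, or -1 on error.
 */
int vsock_connect(unsigned int cid, unsigned int port)
{
    struct sockaddr_vm addr;
    int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    vsock_addr(&addr, cid, port);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With this, vsock_connect(VMADDR_CID_HYPERVISOR, 1234) would reach the
VMM and vsock_connect(VMADDR_CID_HOST, 1234) would reach an application
in the host, and the user never has to choose a source device.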
>
>> > and another to a user process via vhost-user-vsock.
>>
>> So to recap, one device would be used only to communicate with the VMM,
>> and the other device to communicate with other external processes,
>> right?
>>
>> Do you have any other use cases?
>>
>> >Apart from that, a side gain is that theoretically the performance might be
>> >improved since each device has its own queue. But it varies depending on
>> >the implementation.
>>
>> Though this might be easier to achieve by supporting multi-queue in the
>> device, instead of adding n devices to the VM.
>
>I think multi-queue and multi-device are independent of each other, just
>like what network devices do. A single vsock device can be considered as a
>group of queues (if multi-queue is supported), and it can be assigned a
>thread to handle the traffic.

That's right:

  multi-queue  -> performance
  multi-device -> more addresses

And that's where I worry: why complicate vsock to get more addresses for
a VM?

>
>So I accepted Parav's suggestion and mentioned it as a side gain.
>
>> >== Typical Usages ==
>> >
>> >Assume there are two virtio-vsock devices on the guest, with CIDs 3 and 4
>> >respectively, and the device with CID 3 is the default.
>> >
>> >Connect to the host using the device with CID 3.
>> >
>> >```c
>> >// use default one (no bind)
>> >fd = socket(AF_VSOCK);
>> >connect(fd, 2, 1234);
>> >n = write(fd, buffer);
>> >
>> >// or bind explicitly
>> >fd = socket(AF_VSOCK);
>> >bind(fd, 3, -1);
>> >connect(fd, 2, 1234);
>> >n = write(fd, buffer);
>> >```
>> >
>> >Connect to the host using the device with CID 4.
>> >
>> >```c
>> >// must bind explicitly as the device with CID 4 is not default.
>> >fd = socket(AF_VSOCK);
>> >bind(fd, 4, -1);
>> >connect(fd, 2, 1234);
>> >n = write(fd, buffer);
>> >```
>> >
>> >The first version of multi-devices implementation is available at [1].
>> >
>> >[1] https://lore.kernel.org/virtualization/20240517144607.2595798-1-niuxuewei.nxw@antgroup.com
>> >
>> >Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
>> >---
>> > device-types/vsock/description.tex | 30 ++++++++++++++++++++++++++++--
>> > 1 file changed, 28 insertions(+), 2 deletions(-)
>> >
>> >diff --git a/device-types/vsock/description.tex b/device-types/vsock/description.tex
>> >index 7d91d15..7d0cfe4 100644
>> >--- a/device-types/vsock/description.tex
>> >+++ b/device-types/vsock/description.tex
>> >@@ -20,6 +20,7 @@ \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
>> > \item[VIRTIO_VSOCK_F_STREAM (0)] stream socket type is supported.
>> > \item[VIRTIO_VSOCK_F_SEQPACKET (1)] seqpacket socket type is supported.
>> > \item[VIRTIO_VSOCK_F_NO_IMPLIED_STREAM (2)] stream socket type is not implied.
>> >+\item[VIRTIO_VSOCK_F_MULTI_DEVICES (3)] multiple devices feature is supported.
>> > \end{description}
>> >
>> > \drivernormative{\subsubsection}{Feature bits}{Device Types / Socket Device / Feature bits}
>> >@@ -34,6 +35,12 @@ \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
>> > VIRTIO_VSOCK_F_NO_IMPLIED_STREAM, the driver MAY act as if
>> > VIRTIO_VSOCK_F_STREAM has also been negotiated.
>> >
>> >+The driver SHOULD ignore devices that do not have
>> >+VIRTIO_VSOCK_F_MULTI_DEVICES if the feature has been negotiated.
>> >+
>> >+The driver SHOULD ignore all subsequent devices if a device without
>> >+VIRTIO_VSOCK_F_MULTI_DEVICES is present.
>> >+
>> > \devicenormative{\subsubsection}{Feature bits}{Device Types / Socket Device / Feature bits}
>> >
>> > The device SHOULD offer the VIRTIO_VSOCK_F_NO_IMPLIED_STREAM feature.
>> >@@ -52,6 +59,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / Socket Device
>> > \begin{lstlisting}
>> > struct virtio_vsock_config {
>> > le64 guest_cid;
>> >+ le16 device_order;
>> > };
>> > \end{lstlisting}
>> >
>> >@@ -77,11 +85,27 @@ \subsection{Device configuration layout}\label{sec:Device Types / Socket Device
>> > \hline
>> > \end{tabular}
>> >
>> >+The \field{device_order} is used to identify the default device. Up to
>> >+65,535 devices can be supported due to the size.
>> >+
>> >+\devicenormative{\subsubsection}{Device configuration layout}{Device Types / Socket Device / Device configuration layout}
>> >+
>> >+The device MUST provide a distinct \field{device_order} if
>> >+VIRTIO_VSOCK_F_MULTI_DEVICES feature has been negotiated.
>> >+
>> >+\drivernormative{\subsubsection}{Device configuration layout}{Device Types / Socket Device / Device configuration layout}
>> >+
>> >+The driver MUST treat the device with the lowest \field{device_order} as
>> >+the default device.
>> >+
>> > \subsection{Device Initialization}\label{sec:Device Types / Socket Device / Device Initialization}
>> >
>> > \begin{enumerate}
>> > \item The guest's cid is read from \field{guest_cid}.
>> >
>> >+\item If VIRTIO_VSOCK_F_MULTI_DEVICES has been negotiated, the device's
>> >+order will be read from \field{device_order}.
>> >+
>> > \item Buffers are added to the event virtqueue to receive events from the device.
>> >
>> > \item Buffers are added to the rx virtqueue to start receiving packets.
>> >@@ -233,8 +257,10 @@ \subsubsection{Receive and Transmit}\label{sec:Device Types / Socket Device / De
>> >
>> > \drivernormative{\paragraph}{Device Operation: Receive and Transmit}{Device Types / Socket Device / Device Operation / Receive and Transmit}
>> >
>> >-The \field{guest_cid} configuration field MUST be used as the source CID when
>> >-sending outgoing packets.
>> >+If \field{src_cid} is missing in outgoing packets, the driver MUST assign
>>
>> I think here we have to define what the driver does, since the driver
>> has to populate that field, how is it missing?
>>
>> Maybe we are confusing interaction with user space, so we should say
>> something like, “If the source socket is not bound to any source CID,
>> the driver MUST assign ...”
>
>Will do.
>
>> >+one. If more than one device is present, the driver SHOULD use the default
>> >+device's \field{guest_cid} configuration. Otherwise, the driver SHOULD use
>> >+the \field{guest_cid} of the only available device.
>> >
>> > A VIRTIO_VSOCK_OP_RST reply MUST be sent if a packet is received with an
>> > unknown \field{type} value.
>> >--
>> >2.34.1
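
For the source CID rule discussed above, this is roughly how I read the
proposed text once it is reworded in terms of unbound sockets. It is
only a sketch under assumptions: struct vsock_dev, CID_ANY, default_dev
and pick_src_cid are hypothetical names, not existing driver code, and
the bound CID is not validated against the device list here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker for "socket not bound to any source CID". */
#define CID_ANY UINT64_MAX

/* Hypothetical per-device state mirroring the proposed config fields. */
struct vsock_dev {
    uint64_t guest_cid;
    uint16_t device_order;
};

/* The default device is the one with the lowest device_order. */
const struct vsock_dev *default_dev(const struct vsock_dev *devs, size_t n)
{
    const struct vsock_dev *best = NULL;

    for (size_t i = 0; i < n; i++)
        if (!best || devs[i].device_order < best->device_order)
            best = &devs[i];
    return best;
}

/*
 * Choose the source CID for an outgoing packet: the bound CID if the
 * socket was bound, otherwise the default device's guest_cid.
 * Returns 0 on success, -1 if no device is available.
 */
int pick_src_cid(const struct vsock_dev *devs, size_t n,
                 uint64_t bound_cid, uint64_t *src_cid)
{
    const struct vsock_dev *d;

    if (bound_cid != CID_ANY) {
        *src_cid = bound_cid;
        return 0;
    }

    d = default_dev(devs, n);
    if (!d)
        return -1;

    *src_cid = d->guest_cid;
    return 0;
}
```

With a single device the loop trivially returns that device, so the
"only available device" case in the patch does not need separate
handling.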
>
>[1]: https://istio.io/latest/blog/2022/introducing-ambient-mesh/
>[2]: https://github.com/containers/libkrun?tab=readme-ov-file#networking
>
>Thanks,
>Xuewei
>
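
For reference, the pseudocode in the "Typical Usages" section maps onto
the real socket API roughly as follows. This is a sketch, not tested
against a multi-device implementation; make_vm_addr and
vsock_bind_connect are hypothetical helper names. On a kernel without
multi-device support I would expect the bind() to fail for any CID that
is not the guest's own.

```c
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Hypothetical helper: build a vsock address; unnamed members are zeroed. */
struct sockaddr_vm make_vm_addr(unsigned int cid, unsigned int port)
{
    struct sockaddr_vm addr = {
        .svm_family = AF_VSOCK,
        .svm_cid = cid,
        .svm_port = port,
    };
    return addr;
}

/*
 * Hypothetical helper: bind fd to src_cid on any local port (the
 * "bind(fd, 4, -1)" step of the pseudocode), then connect to
 * (dst_cid, dst_port). Returns 0 on success, -1 on error with errno set.
 */
int vsock_bind_connect(int fd, unsigned int src_cid,
                       unsigned int dst_cid, unsigned int dst_port)
{
    struct sockaddr_vm src = make_vm_addr(src_cid, VMADDR_PORT_ANY);
    struct sockaddr_vm dst = make_vm_addr(dst_cid, dst_port);

    if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0)
        return -1;
    return connect(fd, (struct sockaddr *)&dst, sizeof(dst));
}
```

The CID 4 example then becomes fd = socket(AF_VSOCK, SOCK_STREAM, 0)
followed by vsock_bind_connect(fd, 4, VMADDR_CID_HOST, 1234).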