From: Xuewei Niu <niuxuewei97@gmail.com>
To: niuxuewei97@gmail.com
Cc: fupan.lfp@antgroup.com, mst@redhat.com,
niuxuewei.nxw@antgroup.com, parav@nvidia.com,
sgarzare@redhat.com, stefanha@redhat.com,
virtio-comment@lists.linux.dev
Subject: Re: [PATCH v6 RESEND] virtio-vsock: Add support for multi devices
Date: Mon, 31 Mar 2025 14:18:27 +0800
Message-ID: <20250331061827.2500867-1-niuxuewei.nxw@antgroup.com>
In-Reply-To: <20250327081830.2309856-1-niuxuewei.nxw@antgroup.com>
> > On Wed, 26 Mar 2025 at 11:32, Stefano Garzarella <sgarzare@redhat.com> wrote:
> > > On Wed, Mar 26, 2025 at 06:00:31PM +0800, Xuewei Niu wrote:
> > > >> On Tue, Mar 25, 2025 at 11:19:46AM +0800, Xuewei Niu wrote:
> > > >> >> On Mon, Mar 24, 2025 at 02:43:35PM +0800, Xuewei Niu wrote:
> > > >> >> >This patch brings a new feature, called "multi devices", to the virtio
> > > >> >> >vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> > > >> >> >"device_order" field to the config for the virtio vsock.
> > > >> >> >
> > > > >> >> >== Motivation ==
> > > >> >> >
> > > >> >> >Vsock is a lightweight and widely used data exchange mechanism between host
> > > >> >> >and guest. Currently, the virtio-vsock only supports one device, resulting
> > > >> >> >in the inability to enable more than one backend. For instance, two devices
> > > >> >> >are required: one to transfer data to the VMM via virtio-vsock,
> > > >> >>
> > > >> >> Come to think of it, AF_VSOCK defines CID 0 (VMADDR_CID_HYPERVISOR) to
> > > >> >> communicate with the hypervisor, but in virtio-vsock we never supported
> > > >> >> it. Could this be the use case?
> > > >> >>
> > > >> >> We could in this way add a new feature for those devices that
> > > >> >> communicate only with the VMM, where the CID of the VM is quite useless.
> > > >> >> So instead of having multiple CIDs per VM, we could continue to have a
> > > >> >> single CID, but the transport could support 2 devices, one to
> > > >> >> communicate with the VMM (CID = 0) and one to communicate with the host
> > > >> >> apps (CID = 2).
> > > >> >>
> > > >> >> Maybe this is orthogonal to this proposal, though, because it might
> > > >> >> still make sense to have multiple vsock devices, even though it's not
> > > >> >> very clear to me.
> > > >> >
> > > >> >In terms of the current situation, two devices are enough.
> > > >> >
> > > >> >We are the team of Kata Containers, so we are focusing on cloud-native
> > > >> >computing. What I mentioned below might be beyond the scope of the virtio
> > > >> >spec, just for your reference.
> > > >> >
> > > > >> >The background is that the architecture of proxy mesh has evolved over
> > > >> >the past few years: from per-pod to per-host (e.g. Istio Ambient Mesh[1]).
> > > >> >
> > > >> >Thanks to the TSI[2] and vhost-user protocol, network packets can bypass
> > > >> >both host and guest network stacks. It is possible to establish a fast path
> > > >> >between the pod and the proxy.
> > > >> >
> > > >> >When we have multiple networks, it is intuitive to have multiple NICs. So
> > > >> >does vsock.
> > > >>
> > > >> Be careful though, we don't want to complicate vsock to become like a
> > > >> NIC.
> > > >>
> > > >> >
> > > > >> >When multiple networks are available, it means that it is possible to have
> > > > >> >multiple proxies (i.e. user processes). In this case, two devices are not
> > > >> >enough. This feature makes vsock more flexible and scalable.
> > > >>
> > > >> This is a good point, but I really don't understand why a VM should have
> > > >> multiple CIDs assigned.
> > > >
> > > >I think priority is not the biggest issue here. So let us focus on how to
> > > >route the connection to the right device among more than two devices.
> > >
> > > That's why I was recommending a different approach. IMO the user should
> > > not do this, but that should be transparent, hidden in the driver.
> > >
> > > By supporting VMADDR_CID_HYPERVISOR, we know very well if a packet is to
> > > be sent to the VMM, then we have to use the device that supports it.
> > > Whereas if the user connects to VMADDR_CID_HOST we have to use the other
> > > device.
> > >
> > > The user doesn't have to do anything, only use the right destination CID
> > > if it wants to talk to the VMM or another host process.
> >
> > Obviously, if we want to support more than 2 devices, we need this
> > that you are proposing. But IMO we need also to support
> > VMADDR_CID_HYPERVISOR, and we should prevent the user from doing
> > bind() on a random CID if one of the two devices only talks to the
> > VMM.
>
> I agree with supporting `VMADDR_CID_HYPERVISOR` for virtio-vsock. I can
> work on this later.
>
> > Because, again, how does the user know which CID to bind?
>
> Nice catch! I am trying to give a solution for this issue regarding the
> scenario of more than two devices.
>
> Let users access the `device_order` and `guest_cid` fields. The host user
> program and the guest user program can agree in advance. For example,
> the first device (whose `device_order` is smallest) is used to communicate
> with host process 1, the second device is used for host process 2, and so
> on.
>
> If the guest user program wants to direct a message to host process 2, the
> steps would be:
>
> 1. Guest user program gets the second device's `guest_cid`.
> 2. Guest user program binds to the CID.
>
> This would work because `device_order` is a VM-level
> configuration (whereas `guest_cid` is a host-level
> configuration).
>
> If people don't need this feature (they use only 1 or 2 devices), they can use
> vsock the simple way. Otherwise, they have to accept the more
> complicated way.
>
> WDYT?

Alternatively, we could replace `device_order` with `guest_lid` (local ID).
The `guest_lid` belongs to a VM-level address space, while the `guest_cid`
belongs to a host-level address space.

```c
struct virtio_vsock_config {
	__le64 guest_cid;
	__le16 guest_lid; /* previously device_order */
};
```

With this design, the mapping between each device and its `guest_lid` has
to be fixed before the guest app is built and the VM is launched.

For example, host process 0's `guest_lid` is 1000, and host process 1's is
2000. Their `guest_cid` values are determined when the VM starts. The
device table would look like this:

* device0: process=VM guest_lid=0    guest_cid=0 <default device>
* device1: process=0  guest_lid=1000 guest_cid=x
* device2: process=1  guest_lid=2000 guest_cid=y
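
As a rough sketch of the lookup over the table above (not part of the
patch): the `vsock_dev_entry` type, the `vsock_devs` array and the
`vsock_lid_to_cid()` helper are hypothetical names, and the CIDs 3 and 4
are arbitrary stand-ins for x and y.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one virtio-vsock device (illustration only). */
struct vsock_dev_entry {
	uint16_t guest_lid; /* VM-level ID, fixed before the guest app is built */
	uint64_t guest_cid; /* host-level ID, known only once the VM has started */
};

/* Example table; 3 and 4 are arbitrary stand-ins for "x" and "y". */
static const struct vsock_dev_entry vsock_devs[] = {
	{ .guest_lid = 0,    .guest_cid = 0 }, /* default device */
	{ .guest_lid = 1000, .guest_cid = 3 },
	{ .guest_lid = 2000, .guest_cid = 4 },
};

/*
 * Resolve a guest_lid to its guest_cid.
 * Returns 0 and stores the CID in *cid, or -1 if no device has that guest_lid.
 */
static int vsock_lid_to_cid(uint16_t lid, uint64_t *cid)
{
	size_t i;

	for (i = 0; i < sizeof(vsock_devs) / sizeof(vsock_devs[0]); i++) {
		if (vsock_devs[i].guest_lid == lid) {
			*cid = vsock_devs[i].guest_cid;
			return 0;
		}
	}
	return -1;
}
```

A linear scan should be enough here, since only a handful of devices are
expected per VM.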

The driver should expose an interface, such as an ioctl, that takes a
`guest_lid`. Guest apps can use it to obtain the actual `guest_cid`.
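
From the guest app's side, the step after that lookup could look roughly
like this. The ioctl does not exist yet, so the sketch starts from an
already resolved CID. `vsock_fill_addr()` is a hypothetical helper;
`struct sockaddr_vm` and `AF_VSOCK` are the existing Linux definitions.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/*
 * Hypothetical helper: build a vsock address from a resolved guest_cid.
 * svm_cid is 32 bits wide, so the CID is passed as unsigned int.
 */
static void vsock_fill_addr(struct sockaddr_vm *addr, unsigned int cid,
			    unsigned int port)
{
	memset(addr, 0, sizeof(*addr));
	addr->svm_family = AF_VSOCK;
	addr->svm_cid = cid;
	addr->svm_port = port;
}
```

The guest app would then pass the address to `bind()` on an `AF_VSOCK`
socket, which is the point where the connection gets tied to one device.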

We do not expect many virtio-vsock devices per VM (fewer than 16), so
`guest_lid` conflicts should not be a big issue.
Thanks,
Xuewei
> > >
> > > >
> > > >Our solution uses CID as device identification. From the users'
> > > >perspective, they can direct the connection to the appropriate device by
> > > >specifying a CID in either the `connect` or `bind` syscall.
> > >
> > > How does the user know which device/CID to bind if it wants to talk with
> > > the VMM or with the application?
> > >
> > > >
> > > >Assigning one CID to a VM looks good to me. But I am not sure how to
> > > > >distinguish the devices. For example, should we expose an ioctl or a
> > > >sockopt?
> > >
> > > Nope, just simply use the right destination CID in the connect()
> > > (VMADDR_CID_HYPERVISOR or VMADDR_CID_HOST), without doing any bind().
> > >
> > > For receiving, the user can check the source CID after connection and
> > > decide to discard connections from VMADDR_CID_HYPERVISOR or
> > > > VMADDR_CID_HOST depending on the service.
> > >
> > > Thanks,
> > > Stefano