From: Stefano Garzarella <sgarzare@redhat.com>
To: Xuewei Niu <niuxuewei97@gmail.com>
Cc: fupan.lfp@antgroup.com, mst@redhat.com,
niuxuewei.nxw@antgroup.com, parav@nvidia.com,
stefanha@redhat.com, virtio-comment@lists.linux.dev
Subject: Re: [PATCH v6 RESEND] virtio-vsock: Add support for multi devices
Date: Tue, 8 Apr 2025 15:34:01 +0200
Message-ID: <CAGxU2F4Qejw2hd45SduH=OwzUZVR6xYJATRyDskukHU8+2nkGw@mail.gmail.com>
In-Reply-To: <20250407021715.2736840-1-niuxuewei.nxw@antgroup.com>

On Mon, 7 Apr 2025 at 04:17, Xuewei Niu <niuxuewei97@gmail.com> wrote:
>
> > On Mon, Mar 31, 2025 at 02:18:27PM +0800, Xuewei Niu wrote:
> > >> > On Wed, 26 Mar 2025 at 11:32, Stefano Garzarella <sgarzare@redhat.com> wrote:
> > >> > > On Wed, Mar 26, 2025 at 06:00:31PM +0800, Xuewei Niu wrote:
> > >> > > >> On Tue, Mar 25, 2025 at 11:19:46AM +0800, Xuewei Niu wrote:
> > >> > > >> >> On Mon, Mar 24, 2025 at 02:43:35PM +0800, Xuewei Niu wrote:
> > >> > > >> >> >This patch brings a new feature, called "multi devices", to the virtio
> > >> > > >> >> >vsock. It introduces a "VIRTIO_VSOCK_F_MULTI_DEVICES" feature bit, and a
> > >> > > >> >> >"device_order" field to the config for the virtio vsock.
> > >> > > >> >> >
> > >> > > >> >> >== Motivation ==
> > >> > > >> >> >
> > >> > > >> >> >Vsock is a lightweight and widely used data exchange mechanism between host
> > >> > > >> >> >and guest. Currently, the virtio-vsock only supports one device, resulting
> > >> > > >> >> >in the inability to enable more than one backend. For instance, two devices
> > >> > > >> >> >are required: one to transfer data to the VMM via virtio-vsock,
> > >> > > >> >>
> > >> > > >> >> Come to think of it, AF_VSOCK defines CID 0 (VMADDR_CID_HYPERVISOR) to
> > >> > > >> >> communicate with the hypervisor, but in virtio-vsock we never supported
> > >> > > >> >> it. Could this be the use case?
> > >> > > >> >>
> > >> > > >> >> We could in this way add a new feature for those devices that
> > >> > > >> >> communicate only with the VMM, where the CID of the VM is quite useless.
> > >> > > >> >> So instead of having multiple CIDs per VM, we could continue to have a
> > >> > > >> >> single CID, but the transport could support 2 devices, one to
> > >> > > >> >> communicate with the VMM (CID = 0) and one to communicate with the host
> > >> > > >> >> apps (CID = 2).
> > >> > > >> >>
> > >> > > >> >> Maybe this is orthogonal to this proposal, though, because it might
> > >> > > >> >> still make sense to have multiple vsock devices, even though it's not
> > >> > > >> >> very clear to me.
> > >> > > >> >
> > >> > > >> >In terms of the current situation, two devices are enough.
> > >> > > >> >
> > >> > > >> >We are the team of Kata Containers, so we are focusing on cloud-native
> > >> > > >> >computing. What I mentioned below might be beyond the scope of the virtio
> > >> > > >> >spec, just for your reference.
> > >> > > >> >
> > >> > > >> >The background is that the architecture of proxy mesh has evolved over
> > >> > > >> >the past few years: from per-pod to per-host (e.g. Istio Ambient Mesh[1]).
> > >> > > >> >
> > >> > > >> >Thanks to the TSI[2] and vhost-user protocol, network packets can bypass
> > >> > > >> >both host and guest network stacks. It is possible to establish a fast path
> > >> > > >> >between the pod and the proxy.
> > >> > > >> >
> > >> > > >> >When we have multiple networks, it is intuitive to have multiple NICs. So
> > >> > > >> >does vsock.
> > >> > > >>
> > >> > > >> Be careful though, we don't want to complicate vsock to become like a
> > >> > > >> NIC.
> > >> > > >>
> > >> > > >> >
> > >> > > >> >When multiple networks are available, it means that it is possible to have
> > >> > > >> >multiple proxies (i.e. user processes). In this case, two devices are not
> > >> > > >> >enough. This feature makes vsock more flexible and scalable.
> > >> > > >>
> > >> > > >> This is a good point, but I really don't understand why a VM should have
> > >> > > >> multiple CIDs assigned.
> > >> > > >
> > >> > > >I think priority is not the biggest issue here. So let us focus on how to
> > >> > > >route the connection to the right device among more than two devices.
> > >> > >
> > >> > > That's why I was recommending a different approach. IMO the user should
> > >> > > not do this, but that should be transparent, hidden in the driver.
> > >> > >
> > >> > > By supporting VMADDR_CID_HYPERVISOR, we know very well if a packet is to
> > >> > > be sent to the VMM, then we have to use the device that supports it.
> > >> > > Whereas if the user connects to VMADDR_CID_HOST we have to use the other
> > >> > > device.
> > >> > >
> > >> > > The user doesn't have to do anything, only use the right destination CID
> > >> > > if it wants to talk to the VMM or another host process.
> > >> >
> > >> > Obviously, if we want to support more than 2 devices, we need what
> > >> > you are proposing. But IMO we also need to support
> > >> > VMADDR_CID_HYPERVISOR, and we should prevent the user from doing
> > >> > bind() on a random CID if one of the two devices only talks to the
> > >> > VMM.
> > >>
> > >> I agree with supporting `VMADDR_CID_HYPERVISOR` for virtio-vsock. I can
> > >> work on this later.
> >
> > Would be nice to have both together, but I'm fine if you want to
> > postpone it.
> >
> > >>
> > >> > Because, again, how does the user know which CID to bind?
> > >>
> > >> Nice catch! I am trying to give a solution for this issue regarding the
> > >> scenario of more than two devices.
> > >>
> > >> Let users access the `device_order` and the `guest_cid` field. Host user
> > >> program and guest user program can make an advance agreement. For example,
> > >> the first device (whose `device_order` is smallest) is used to communicate
> > >> with host process 1, the second device is used for host process 2, and so
> > >> on.
> > >>
> > >> If the guest user program wants to direct a message to host process 2, then
> > >> the steps would be:
> > >>
> > >> 1. Guest user program gets the second device's `guest_cid`.
> > >> 2. Guest user program binds to the CID.
> > >>
> > >> This could work because the `device_order` is a VM-level
> > >> configuration. (On the contrary, the `guest_cid` is a host-level
> > >> configuration).
> > >>
> > >> If people don't need this feature (use 1 or 2 devices only), they can use
> > >> vsock as the simple way. Otherwise, people should accept the more
> > >> complicated way.
> > >>
> > >> WDYT?
> > >
> > >Or we can replace the device_order with the guest_lid (aka local id). The
> > >guest_lid is a VM-level address space, while the guest_cid is a host-level
> > >address space.
> > >
> > >```c
> > >struct virtio_vsock_config {
> > > __le64 guest_cid;
> > > __le16 guest_lid; /* previous device_order */
> > >};
> > >```
> > >
> > >With this design, the relationship between the device and the guest_lid
> > >should be set properly before building the guest app and launching the
> > >VM.
> > >
> > >For example, host process 0's guest_lid is 1000, and host process 1's is
> > >2000. Their guest_cid will be determined when the VM started. The device
> > >table should be like this:
> > >
> > >* device0: process=VM guest_lid=0 guest_cid=0 <default device>
> > >* device1: process=0 guest_lid=1000 guest_cid=x
> > >* device2: process=1 guest_lid=2000 guest_cid=y
> > >
> > >The driver should expose an interface, such as ioctl, receiving a
> > >local_cid. Guest apps can use it to obtain the actual guest_cid.
> >
> > No, please, I don't think adding virtio-specific behaviour in AF_VSOCK
> > is what we want.
> >
> > Let's continue with device_order and see what others say.
> >
> > I think we need to try to get a better understanding of what to do,
> > depending on the direction:
> >
> > - host -> guest: it might make sense to have multiple devices with different
> > CIDs, and the host will know which one to use depending on the CID
> > assigned to the device (e.g. vhost, vhost-user, device in VMM)
> >
> > - guest -> host: again I think we should differentiate the device to use
> > depending on the destination CID which can be VMADDR_CID_HOST,
> > VMADDR_CID_HYPERVISOR, or in the case where sibling communication is
> > supported a CID >= 3, so maybe we should have some features or flags
> > in the config space to describe destination CID supported for each
> > device
>
> I don't understand the point of adding new features/flags. Could you
> explain a bit more?
The idea is to inform the guest which addresses are reachable by the
device, so the guest can easily decide which device to use. I'm
talking about the destination, so CID_HOST (2), CID_HYPERVISOR (0) or a
sibling VM (CID >= 3).
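
To make that concrete, here is a rough sketch of how a driver could pick a
device from the destination CID. Everything in it is an assumption for the
sake of discussion, not a spec proposal: the `DST_F_*` flags, the
`dst_flags` field, `struct vsock_dev`, and `pick_device()` are made-up
names. Only the CID values (0 for the hypervisor, 2 for the host, >= 3 for
siblings) come from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Well-known destination CIDs, with the values used in this thread. */
#define CID_HYPERVISOR 0u
#define CID_HOST       2u

/* HYPOTHETICAL flags: which destinations a device says it can reach. */
#define DST_F_HYPERVISOR (1u << 0)
#define DST_F_HOST       (1u << 1)
#define DST_F_SIBLING    (1u << 2)

/* HYPOTHETICAL driver-side view of one virtio-vsock device. */
struct vsock_dev {
    uint64_t guest_cid;  /* existing config-space field */
    uint32_t dst_flags;  /* assumed new information from the device */
};

/* Map a destination CID to the flag a device must advertise (0 if none). */
static uint32_t dst_flag_for(uint64_t dst_cid)
{
    if (dst_cid == CID_HYPERVISOR)
        return DST_F_HYPERVISOR;
    if (dst_cid == CID_HOST)
        return DST_F_HOST;
    if (dst_cid >= 3)
        return DST_F_SIBLING;
    return 0; /* CID 1 is not covered by this sketch */
}

/* Return the first device able to reach dst_cid, or NULL if there is none. */
static const struct vsock_dev *pick_device(const struct vsock_dev *devs,
                                           size_t n, uint64_t dst_cid)
{
    uint32_t need = dst_flag_for(dst_cid);

    assert(devs != NULL || n == 0);
    if (need == 0)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        if (devs[i].dst_flags & need)
            return &devs[i];
    }
    return NULL;
}
```

With one device advertising only `DST_F_HYPERVISOR` and another advertising
`DST_F_HOST | DST_F_SIBLING`, a connect() to CID 0 would be routed to the
first and a connect() to CID 2 or to a sibling CID to the second, without
the application having to bind() to anything.
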
>
> We have had the guest_cid field in the config space. The guest knows all
> devices present in the VM.
Okay, but how can the guest figure out from this information which
device to use to talk to the hypervisor or an application in the host?
>
> If the app tries to bind a random CID, it will fail since the driver can't
> find the device by the CID.
I'm not talking about the source CID on which to do bind() (which I
honestly don't like), but I'm talking about the destination CID on
which to do connect().
>
> > so that the guest knows which device to use depending on the destination
> > CID.
>
> Yes, this is what I was describing in the previous comment. The message
> will be directed to the device by the destination CID.
Sorry, I don't understand how you do this without having
information from the device about which addresses it supports. Can you
elaborate a bit?

Thanks,
Stefano