From: Jason Wang <jasowang@redhat.com>
To: zhenwei pi <pizhenwei@bytedance.com>
Cc: "Michael S . Tsirkin" <mst@redhat.com>,
	Cornelia Huck <cohuck@redhat.com>,
	parav@nvidia.com,  virtio-dev@lists.oasis-open.org,
	 "virtio-comment@lists.oasis-open.org"
	<virtio-comment@lists.oasis-open.org>,
	 "helei.sig11@bytedance.com" <helei.sig11@bytedance.com>,
	houp@yusur.tech
Subject: [virtio-dev] Re: [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)
Date: Mon, 24 Apr 2023 11:40:02 +0800
Message-ID: <CACGkMEvh+G2Tm=WDTOK3K2GO8c2dQ4y=UiKAk_kvsid2MGzbGg@mail.gmail.com>
In-Reply-To: <1ab0beff-8b18-7a94-1a68-6bf36bcd0394@bytedance.com>

On Sun, Apr 23, 2023 at 7:31 PM zhenwei pi <pizhenwei@bytedance.com> wrote:
>
> Hi,
>
> In past years, virtio has supported many device specifications over
> PCI/MMIO/CCW. These devices work fine in virtualization environments,
> and we have a chance to support the virtio device family in the
> container/host scenario.

PCI can certainly work for containers (or does it run into issues such
as scalability?). It would be better to describe the problems you hit
and why you chose this way to solve them.

It would also help to compare this with:

1) hiding the fabrics details via DPU
2) vDPA

>
> - Theory
> "Virtio Over Fabrics" aims to reuse the virtio device specifications and
> to provide network-defined peripheral devices.
> This protocol could also be used in a virtualization environment:
> typically the hypervisor (or vhost-user process) handles requests from virtio
> PCI/MMIO/CCW, remaps them, and forwards them to the target over fabrics.

This requires mediation in the datapath, doesn't it?

>
> - Protocol
> For the detailed protocol definition, see:
> https://github.com/pizhenwei/linux/blob/virtio-of-github/include/uapi/linux/virtio_of.h

I'd say an RFC patch for the virtio spec is more suitable than the code.

>
> Example of virtio-blk read/write by TCP/RDMA:
> 1. Virtio Over TCP
> 1.1 An example of virtio-blk write(8K) command:
> Initiator side sends a stream buffer(command + 4 * desc + 8208 bytes):
>   COMMAND            +------+
>                      |opcode|  ->  virtio_of_op_vring
>                      +------+
>                      |cmd id|  ->  10
>                      +------+
>                      |length|  ->  8208
>                      +------+
>                      |ndesc |  ->  4
>                      +------+
>                      |rsvd  |
>                      +------+
>
>   DESC0              +------+
>                +-----|addr  |  -> 0
>                |     +------+
>                |     |length|  -> 16 (virtio blk write command)
>                |     +------+
>                |     |id    |  -> 10
>                |     +------+
>                |     |flags |  -> VRING_DESC_F_NEXT
>                |     +------+
>                |
>   DESC1        |     +------+
>                | +---|addr  |  -> 16
>                | |   +------+
>                | |   |length|  -> 4096
>                | |   +------+
>                | |   |id    |  -> 11
>                | |   +------+
>                | |   |flags |  -> VRING_DESC_F_NEXT
>                | |   +------+
>                | |
>   DESC2        | |   +------+
>                | |   |addr  |  -> 4112
>                | |   +------+
>                | | +-|length|  -> 4096
>                | | | +------+
>                | | | |id    |  -> 12
>                | | | +------+
>                | | | |flags |  -> VRING_DESC_F_NEXT
>                | | | +------+
>                | | |
>   DESC3        | | | +------+
>                | | | |addr  |  -> 0
>                | | | +------+
>                | | | |length|  -> 1
>                | | | +------+
>                | | | |id    |  -> 13
>                | | | +------+
>                | | | |flags |  -> VRING_DESC_F_WRITE
>                | | | +------+
>                | | |
>   DATA         +-+-+>+------+  -> 0
>                  | | |......|
>                  +-+>+------+  -> 16
>                    | |......|
>                    +>+------+  -> 4112
>                      |......|
>                      +------+  -> 8208
>
> Target side sends a stream buffer(completion + 1 * desc + 1 bytes):
>   COMPLETION         +------+
>                      |status|  ->  VIRTIO_OF_SUCCESS
>                      +------+
>                      |cmd id|  ->  10
>                      +------+
>                      |ndesc |  ->  1
>                      +------+
>                      |rsvd  |
>                      +------+
>                      |value |  -> 1 (value.u32)
>                      +------+
>
>   DESC0              +------+
>                    +-|addr  |  -> 0
>                    | +------+
>                    | |length|  -> 1
>                    | +------+
>                    | |id    |  -> 13
>                    | +------+
>                    | |flags |  -> VRING_DESC_F_WRITE
>                    | +------+
>                    |
>   DATA             |>+------+  -> 0
>                      |......|
>                      +------+  -> 1
>
> 1.2 An example of virtio-blk read(8K) command:
> Initiator side sends a stream buffer(command + 4 * desc + 16 bytes):
>   COMMAND            +------+
>                      |opcode|  ->  virtio_of_op_vring
>                      +------+
>                      |cmd id|  ->  14
>                      +------+
>                      |length|  ->  16 (virtio blk read command)
>                      +------+
>                      |ndesc |  ->  4
>                      +------+
>                      |rsvd  |
>                      +------+
>
>   DESC0              +------+
>                    +-|addr  |  -> 0
>                    | +------+
>                    | |length|  -> 16
>                    | +------+
>                    | |id    |  -> 14
>                    | +------+
>                    | |flags |  -> VRING_DESC_F_NEXT
>                    | +------+
>                    |
>   DESC1            | +------+
>                    | |addr  |  -> 16
>                    | +------+
>                    | |length|  -> 4096
>                    | +------+
>                    | |id    |  -> 15
>                    | +------+
>                    | |flags |  -> VRING_DESC_F_NEXT | VRING_DESC_F_WRITE
>                    | +------+
>                    |
>   DESC2            | +------+
>                    | |addr  |  -> 4112
>                    | +------+
>                    | |length|  -> 4096
>                    | +------+
>                    | |id    |  -> 16
>                    | +------+
>                    | |flags |  -> VRING_DESC_F_NEXT | VRING_DESC_F_WRITE
>                    | +------+
>                    |
>   DESC3            | +------+
>                    | |addr  |  -> 0
>                    | +------+
>                    | |length|  -> 1
>                    | +------+
>                    | |id    |  -> 17
>                    | +------+
>                    | |flags |  -> VRING_DESC_F_WRITE
>                    | +------+
>                    |
>   DATA             +>+------+  -> 0
>                      |......|
>                      +------+  -> 16
>
> Target side sends a stream buffer(completion + 3 * desc + 8193 bytes):
>   COMPLETION         +------+
>                      |status|  ->  VIRTIO_OF_SUCCESS
>                      +------+
>                      |cmd id|  ->  14
>                      +------+
>                      |ndesc |  ->  3
>                      +------+
>                      |rsvd  |
>                      +------+
>                      |value |  -> 8193 (value.u32)
>                      +------+
>
>   DESC0              +------+
>                +-----|addr  |  -> 0
>                |     +------+
>                |     |length|  -> 4096
>                |     +------+
>                |     |id    |  -> 15
>                |     +------+
>                |     |flags |  -> VRING_DESC_F_NEXT | VRING_DESC_F_WRITE
>                |     +------+
>                |
>   DESC1        |     +------+
>                | +---|addr  |  -> 4096
>                | |   +------+
>                | |   |length|  -> 4096
>                | |   +------+
>                | |   |id    |  -> 16
>                | |   +------+
>                | |   |flags |  -> VRING_DESC_F_NEXT | VRING_DESC_F_WRITE
>                | |   +------+
>                | |
>   DESC2        | |   +------+
>                | |   |addr  |  -> 8192
>                | |   +------+
>                | | +-|length|  -> 1
>                | | | +------+
>                | | | |id    |  -> 17
>                | | | +------+
>                | | | |flags |  -> VRING_DESC_F_WRITE
>                | | | +------+
>                | | |
>   DATA         +-+-+>+------+  -> 0
>                  | | |......|
>                  +-+>+------+  -> 4096
>                    | |......|
>                    +>+------+  -> 8192
>                      |......|
>                      +------+  -> 8193
>
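
The TCP examples quoted above follow a simple rule that can be sketched
in a few lines of Python: descriptors without VRING_DESC_F_WRITE carry
their data in the initiator's stream, and descriptors with
VRING_DESC_F_WRITE carry theirs in the target's completion stream. This
is only an illustration checked against the numbers in the quoted
diagrams, not code taken from virtio_of.h. The Desc class and the
stream_layout helper are hypothetical names; the flag values are the
ones from the virtio split-ring definition. The addr values that the
request diagrams show for device-writable descriptors are deliberately
not modelled here.

```python
from dataclasses import dataclass

# Flag values as defined for the virtio split virtqueue.
VRING_DESC_F_NEXT = 1
VRING_DESC_F_WRITE = 2


@dataclass
class Desc:
    """Hypothetical stand-in for one descriptor in the examples."""
    id: int
    length: int
    flags: int


def stream_layout(descs):
    """Return (req_offsets, req_len, comp_offsets, comp_len).

    req_offsets maps the id of each device-readable descriptor to its
    offset in the data that follows the command; comp_offsets does the
    same for device-writable descriptors in the completion data.
    """
    req_offsets, comp_offsets = {}, {}
    req_len = comp_len = 0
    for d in descs:
        if d.flags & VRING_DESC_F_WRITE:
            comp_offsets[d.id] = comp_len
            comp_len += d.length
        else:
            req_offsets[d.id] = req_len
            req_len += d.length
    return req_offsets, req_len, comp_offsets, comp_len


# The 8K write from 1.1: 16-byte header, two 4K buffers, 1 status byte.
write_8k = [
    Desc(10, 16, VRING_DESC_F_NEXT),
    Desc(11, 4096, VRING_DESC_F_NEXT),
    Desc(12, 4096, VRING_DESC_F_NEXT),
    Desc(13, 1, VRING_DESC_F_WRITE),
]
print(stream_layout(write_8k))
```

For the write in 1.1 this yields a request payload of 8208 bytes and a
completion payload of 1 byte; for the read in 1.2 the same function
yields 16 and 8193, matching the length and value fields shown above.
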
> 2. Virtio Over RDMA
> 2.1 An example of virtio-blk write(8K) command:
> Initiator side sends a message (command + 4 * desc) by RDMA POST SEND:
>   COMMAND            +------+
>                      |opcode|  ->  virtio_of_op_vring
>                      +------+
>                      |cmd id|  ->  10
>                      +------+
>                      |length|  ->  0
>                      +------+
>                      |ndesc |  ->  4
>                      +------+
>                      |rsvd  |
>                      +------+
>
>   DESC0              +------+
>                      |addr  |  -> 0xffff012345670000
>                      +------+
>                      |length|  -> 16 (virtio blk write command)
>                      +------+
>                      |id    |  -> 10
>                      +------+
>                      |flags |  -> VRING_DESC_F_NEXT
>                      +------+
>                      |key   |  -> 0x1234
>                      +------+
>
>   DESC1              +------+
>                      |addr  |  -> 0xffff012345671000
>                      +------+
>                      |length|  -> 4096
>                      +------+
>                      |id    |  -> 11
>                      +------+
>                      |flags |  -> VRING_DESC_F_NEXT
>                      +------+
>                      |key   |  -> 0x1236
>                      +------+
>
>   DESC2              +------+
>                      |addr  |  -> 0xffff012345673000
>                      +------+
>                      |length|  -> 4096
>                      +------+
>                      |id    |  -> 12
>                      +------+
>                      |flags |  -> VRING_DESC_F_NEXT
>                      +------+
>                      |key   |  -> 0x1238
>                      +------+
>
>   DESC3              +------+
>                      |addr  |  -> 0xffff012345677000
>                      +------+
>                      |length|  -> 1
>                      +------+
>                      |id    |  -> 13
>                      +------+
>                      |flags |  -> VRING_DESC_F_WRITE
>                      +------+
>                      |key   |  -> 0x1239
>                      +------+
>
> Target side reads the remote address of DESC0/DESC1/DESC2 by RDMA POST
> READ, and writes the remote address of DESC3 by RDMA POST WRITE, sends a
> completion by POST SEND:
>   COMPLETION         +------+
>                      |status|  ->  VIRTIO_OF_SUCCESS
>                      +------+
>                      |cmd id|  ->  10
>                      +------+
>                      |ndesc |  ->  0
>                      +------+
>                      |rsvd  |
>                      +------+
>                      |value |  -> 1 (value.u32)
>                      +------+
>
> 2.2 An example of virtio-blk read(8K) command:
> This is quite similar to 2.1 except flags in DESC1/DESC2, target side
> reads the remote address of DESC0 by RDMA POST READ, and writes the
> remote address of DESC1/DESC2/DESC3 by RDMA POST WRITE, sends a
> completion by POST SEND.
>
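
The RDMA flow described in 2.1 and 2.2 comes down to choosing one
operation per descriptor from its flags. A minimal Python sketch of that
selection follows, assuming the behaviour is exactly as stated in the
quoted text; RdmaDesc and plan_target_ops are hypothetical names, and
the strings only label the operations, they are not calls into any RDMA
library.

```python
from typing import List, NamedTuple, Tuple

VRING_DESC_F_NEXT = 1   # virtio split-ring flag values
VRING_DESC_F_WRITE = 2


class RdmaDesc(NamedTuple):
    """Hypothetical stand-in for one descriptor sent by POST SEND."""
    addr: int    # remote (initiator) address
    length: int
    id: int
    flags: int
    key: int     # remote key for the buffer


def plan_target_ops(descs: List[RdmaDesc]) -> List[Tuple[str, int]]:
    """List the operations the target performs, in order.

    Device-readable buffers are fetched with an RDMA READ, device-writable
    buffers are filled with an RDMA WRITE, and a final SEND carries the
    completion. Each entry is (operation, descriptor id); the completion
    is reported against the id of the first descriptor (the cmd id).
    """
    ops = []
    for d in descs:
        if d.flags & VRING_DESC_F_WRITE:
            ops.append(("RDMA_WRITE", d.id))
        else:
            ops.append(("RDMA_READ", d.id))
    ops.append(("SEND_COMPLETION", descs[0].id))
    return ops


# The 8K write from 2.1, with the addresses and keys shown above.
write_8k = [
    RdmaDesc(0xffff012345670000, 16, 10, VRING_DESC_F_NEXT, 0x1234),
    RdmaDesc(0xffff012345671000, 4096, 11, VRING_DESC_F_NEXT, 0x1236),
    RdmaDesc(0xffff012345673000, 4096, 12, VRING_DESC_F_NEXT, 0x1238),
    RdmaDesc(0xffff012345677000, 1, 13, VRING_DESC_F_WRITE, 0x1239),
]
for op in plan_target_ops(write_8k):
    print(op)
```

Setting VRING_DESC_F_WRITE on the second and third descriptors turns
the same list into the read case of 2.2: one RDMA READ for the request
header and three RDMA WRITEs for the data and status buffers.
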
> - Example
> I developed a kernel initiator (unstable, WIP version, currently TCP/RDMA
> supported):
> https://github.com/pizhenwei/linux/tree/virtio-of-github

A quick glance at the code told me it's a mediation layer that converts
descriptors in the vring into fabric-specific packets. This is the
vDPA way.

If we agree that virtio over fabrics is useful, we need to invent
facilities that allow building packets directly without going through
the virtqueue (the API is layout independent anyhow).

Thanks

>
> And a target(unstable, WIP version, currently blk/crypto/rng supported):
> https://github.com/pizhenwei/virtio-target/tree/WIP
>
> Run the target first: ~# ./vtgt vtgt.conf
> Then install the kernel modules on the initiator side:
>   ~# insmod ./virtio_fabrics.ko
>   ~# insmod ./virtio_tcp.ko
>   ~# insmod ./virtio_rdma.ko
>
> Create a virtio-blk device over TCP by command:
>   ~# echo command=create,transport=tcp,taddr=192.168.122.1,tport=15771,tvqn=virtio-target/block/block0.service,iaddr=192.168.122.1,iport=0,ivqn=vqn.uuid:42761df9-4c3f-4b27-843d-c88d1dcdce32 > /dev/virtio-fabrics
>
> Or create a virtio-crypto device over RDMA by command:
>   ~# echo command=create,transport=rdma,taddr=192.168.122.1,tport=15771,tvqn=virtio-target/crypto/crypto0.service,iaddr=192.168.122.1,iport=0,ivqn=vqn.uuid:42761df9-4c3f-4b27-843d-c88d1dcdce32 > /dev/virtio-fabrics
>
> Or destroy a virtio-of device by command:
>   ~# echo command=destroy,transport=tcp,taddr=192.168.122.1,tport=15771,tvqn=vqn.uuid:2d5130d8-36d5-4fe8-ae55-48ea51e0391a,iaddr=192.168.122.1,ivqn=vqn.uuid:42761df9-4c3f-4b27-843d-c88d1dcdce32 > /dev/virtio-fabrics
>
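
The strings written to /dev/virtio-fabrics in the quoted commands are
comma-separated key=value lists. How the kernel module actually parses
them is not shown in this thread, so the following Python sketch only
mirrors the visible syntax; parse_fabrics_command is a hypothetical
helper, not part of the proposed code.

```python
def parse_fabrics_command(line):
    """Split a 'key=value,key=value,...' command string into a dict.

    Only the syntax visible in the examples is handled: fields are
    separated by commas and each field is split at its first '='.
    """
    fields = {}
    for item in line.strip().split(","):
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed field: {item!r}")
        fields[key] = value
    return fields


cmd = ("command=create,transport=tcp,taddr=192.168.122.1,tport=15771,"
       "tvqn=virtio-target/block/block0.service,iaddr=192.168.122.1,"
       "iport=0,ivqn=vqn.uuid:42761df9-4c3f-4b27-843d-c88d1dcdce32")
for key, value in parse_fabrics_command(cmd).items():
    print(f"{key:10} {value}")
```

Splitting at the first '=' keeps values such as the vqn.uuid:... names
intact, since they contain ':' and '-' but no ',' or '='.
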
> --
> zhenwei pi
>

