From: Parav Pandit <parav@nvidia.com>
To: <mst@redhat.com>, <virtio-dev@lists.oasis-open.org>,
<cohuck@redhat.com>, <david.edmondson@oracle.com>
Cc: <sburla@marvell.com>, <jasowang@redhat.com>, <yishaih@nvidia.com>,
<maorg@nvidia.com>, <virtio-comment@lists.oasis-open.org>,
<shahafs@nvidia.com>, Parav Pandit <parav@nvidia.com>
Subject: [virtio-comment] [PATCH v3 0/3] transport-pci: Introduce legacy registers access using AQ
Date: Fri, 2 Jun 2023 23:36:01 +0300
Message-ID: <20230602203604.627661-1-parav@nvidia.com>
This short series introduces legacy register access commands that allow the
group owner PCI PF to access the legacy registers of its member VFs.
If, in the future, SIOV devices need to support legacy registers, they
can be supported by the same commands, using the group member
identifiers of those SIOV devices.
More details, including an overview, the motivation, and use cases, are
described below.
Patch summary:
--------------
patch-1 splits the rows of the admin opcode table with a line
patch-2 adds the administrative virtqueue commands
patch-3 adds their conformance section
This short series is on top of the latest work [1] from Michael.
It uses the newly introduced administrative virtqueue facility with 3 new
commands, which use the existing virtio_admin_cmd.
[1] https://lists.oasis-open.org/archives/virtio-comment/202305/msg00112.html
Use case:
---------
1. A hypervisor/system needs to provide transitional
virtio devices to guest VMs at a scale of thousands,
typically one to eight devices per VM.
2. A hypervisor/system needs to provide such devices using a
vendor-agnostic driver in the hypervisor system.
3. A hypervisor system prefers to have a single stack regardless of
virtio device type (net/blk) and to be future-compatible with a
single vfio stack, using SR-IOV or another scalable device
virtualization technology to map PCI devices to the guest VM
(as transitional or otherwise).
Motivation/Background:
----------------------
The existing virtio transitional PCI device is missing support for
PCI SR-IOV based devices. In practice, it currently works only on a
PCI PF or as a software-emulated device. It has the following
system-level limitations, cited below:
[a] PCIe spec citation:
VFs do not support I/O Space and thus VF BARs shall not indicate I/O Space.
[b] CPU architecture citation:
Intel 64 and IA-32 Architectures Software Developer’s Manual:
The processor’s I/O address space is separate and distinct from
the physical-memory address space. The I/O address space consists
of 64K individually addressable 8-bit I/O ports, numbered 0 through FFFFH.
[c] PCIe spec citation:
If a bridge implements an I/O address range,...I/O address range will be
aligned to a 4 KB boundary.
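As a rough illustration of how [b] and [c] combine: dividing the 64K I/O space into 4 KB-aligned bridge windows leaves room for at most 16 windows system-wide. Assuming each transitional device sits behind its own bridge (for example, its own PCIe root port or switch downstream port), that caps I/O BAR devices at around 16, far short of the thousands in the use case. The sketch below only does this arithmetic. The constant names and the reserved_bytes parameter are illustrative and do not come from the patch.

```python
# Figures quoted in citations [b] and [c].
IO_SPACE_BYTES = 0x10000  # 64K individually addressable 8-bit I/O ports
IO_WINDOW_ALIGN = 0x1000  # a bridge I/O address range is 4 KB aligned


def max_bridge_io_windows(reserved_bytes: int = 0) -> int:
    """Upper bound on the number of 4 KB-aligned bridge I/O windows.

    reserved_bytes is a hypothetical knob for ports the platform keeps for
    its own use; it is not part of either citation.
    """
    usable = IO_SPACE_BYTES - reserved_bytes
    return usable // IO_WINDOW_ALIGN


print(max_bridge_io_windows())        # 16
print(max_bridge_io_windows(0x1000))  # 15
```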
Overview:
---------
The above use case requirements can be met by the group owner PCI PF
accessing the legacy registers of its group member PCI VFs through the
admin virtqueue of the group owner PCI PF.
Two new admin virtqueue commands are added to read and write PCI VF
legacy registers.
A third command, suggested by Jason, queries the VF device's driver
notification region.
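To make the addressing concrete, here is a minimal sketch of how hypervisor software might encode a legacy register access for one VF. It assumes a header shaped like the generic virtio_admin_cmd in [1] (le16 opcode, le16 group_type, 12 reserved bytes, le64 group_member_id) followed by an assumed (offset, reserved, data) payload. The opcode values, the SR-IOV group type value, the payload layout, and the function names are hypothetical and are not taken from patch-2.

```python
import struct

# ASSUMED layout, modelled on the generic virtio_admin_cmd from [1]:
# le16 opcode, le16 group_type, u8 reserved[12], le64 group_member_id.
ADMIN_CMD_HDR = struct.Struct("<HH12xQ")
# ASSUMED command-specific data: u8 offset, u8 reserved[7], then the bytes.
LEGACY_REG_HDR = struct.Struct("<B7x")

GROUP_TYPE_SRIOV = 0x1        # assumed value of the SR-IOV group type
OPC_LEGACY_REG_WRITE = 0x100  # hypothetical opcode
OPC_LEGACY_REG_READ = 0x101   # hypothetical opcode


def legacy_reg_write_cmd(vf_number: int, offset: int, data: bytes) -> bytes:
    """Device-readable part of a hypothetical legacy register write.

    The target VF is selected only by group_member_id; the PF that owns
    the admin virtqueue is implicit.
    """
    hdr = ADMIN_CMD_HDR.pack(OPC_LEGACY_REG_WRITE, GROUP_TYPE_SRIOV, vf_number)
    return hdr + LEGACY_REG_HDR.pack(offset) + data


def legacy_reg_read_cmd(vf_number: int, offset: int) -> bytes:
    """Device-readable part of a hypothetical legacy register read.

    The register bytes would come back in the device-writable result.
    """
    hdr = ADMIN_CMD_HDR.pack(OPC_LEGACY_REG_READ, GROUP_TYPE_SRIOV, vf_number)
    return hdr + LEGACY_REG_HDR.pack(offset)


# Example: reset VF 3 by writing 0 to its 1-byte legacy device status
# register, which sits at offset 18 of the legacy header.
cmd = legacy_reg_write_cmd(3, 18, b"\x00")
print(len(cmd))  # 33
```

Only group_member_id changes from one VF to the next, which is also why future SIOV devices could reuse the same commands with their own member identifiers.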
Software usage example:
-----------------------
1. One way to use the VF and map it to the guest VM is through the vfio
driver framework in the Linux kernel.
                  +----------------------+
                  |pci_dev_id = 0x100X   |
  +---------------|pci_rev_id = 0x0      |-----+
  |vfio device    |BAR0 = I/O region     |     |
  |               |Other attributes      |     |
  |               +----------------------+     |
  |                                            |
  +   +--------------+     +-----------------+ |
  |   |I/O BAR to AQ |     | Other vfio      | |
  |   |rd/wr mapper  |     | functionalities | |
  |   +--------------+     +-----------------+ |
  |                                            |
  +------+-----------------------+-------------+
         |                       |
         |                       |  Driver notification
         |                       |
         |                       |
    +----+------------+     +----+------------+
    |   +-----+       |     | PCI VF device A |
    |   | AQ  |---------+---->+-------------+ |
    |   +-----+       | |   | | legacy regs | |
    | PCI PF device   | |   | +-------------+ |
    +-----------------+ |   +-----------------+
                        |
                        |   +----+------------+
                        |   | PCI VF device N |
                        +---->+-------------+ |
                            | | legacy regs | |
                            | +-------------+ |
                            +-----------------+
2. Bind the virtio PCI driver to the listed device ID and use the device
as a native device in the host.
3. Use the device in a lightweight hypervisor to run a bare-metal OS.
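The "I/O BAR to AQ rd/wr mapper" box in the diagram has to decide, for every trapped guest access to the emulated BAR0, which path to use. A minimal sketch follows. It assumes the legacy virtio PCI header layout (common fields up to ISR status at offset 19, two extra MSI-X vector fields when MSI-X is enabled, and device-specific configuration after that). It also assumes that queue notifications bypass the AQ and go to the VF's notification region, as the "Driver notification" arrow suggests. Route and route_io_access are hypothetical names.

```python
from enum import Enum

# Legacy virtio PCI I/O BAR0 layout (virtio spec, legacy interface notes).
QUEUE_NOTIFY_OFF = 16     # le16 queue_notify
COMMON_CFG_LEN = 20       # device features .. ISR status
COMMON_CFG_LEN_MSIX = 24  # plus le16 config and queue MSI-X vectors


class Route(Enum):
    COMMON_CFG = "AQ legacy common config read/write"
    DEVICE_CFG = "AQ legacy device-specific config read/write"
    NOTIFY = "write to the VF driver notification region"


def route_io_access(offset: int, width: int, is_write: bool,
                    msix_enabled: bool) -> Route:
    """Pick the path for one trapped guest access to the emulated BAR0.

    Hypothetical policy: queue notifications bypass the AQ; every other
    access is turned into an AQ command for the matching region.
    """
    common_len = COMMON_CFG_LEN_MSIX if msix_enabled else COMMON_CFG_LEN
    if is_write and offset == QUEUE_NOTIFY_OFF and width == 2:
        return Route.NOTIFY
    if offset + width <= common_len:
        return Route.COMMON_CFG
    if offset >= common_len:
        return Route.DEVICE_CFG
    raise ValueError("access straddles the common/device config boundary")


print(route_io_access(16, 2, True, False).name)   # NOTIFY
print(route_io_access(20, 4, False, False).name)  # DEVICE_CFG
print(route_io_access(20, 2, False, True).name)   # COMMON_CFG
```

The msix_enabled flag stands in for whatever state the mapper keeps about the VF's MSI-X enable bit. It matters because the legacy layout shifts the device-specific region by 4 bytes when MSI-X is enabled.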
Please review.
Alternatives considered:
========================
1. Exposing BAR0 as an MMIO BAR that follows the legacy register layout
Pros:
a. Partially works with legacy drivers, as some of them use an API
that is agnostic to MMIO vs. I/O BAR.
b. Does not require hypervisor intervention
Cons:
a. Device reset is extremely hard to implement in the device at scale, as
the driver does not wait for reset completion
b. Register-width-related problems persist, and the hypervisor cannot
fix them even if it wishes to.
2. Accessing VF registers by tunneling them through a new legacy PCI capability
Pros:
a. Self-contained, but cannot work with future PCI SIOV devices
Cons:
a. As slow as AQ access
b. Still requires a new capability for notification access
Conclusion for picking the AQ approach:
=======================================
1. Overall, AQ-based access is simpler to implement, combining the best
of software and device, so that legacy registers do not get baked
into the device hardware.
2. AQ allows the hypervisor software to intercept legacy register accesses
and make corrections if needed.
3. It provides a trade-off between performance, device complexity, and
spec complexity, while still maintaining passthrough mode for the VFs,
with minimal hypervisor intercepts limited to legacy register access.
4. The AQ mechanism is designed for accessing the registers of other member
devices, as noted in the AQ submission, so this approach reuses the
existing infrastructure rather than adding new mechanisms as the other
alternatives would.
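As one example of the corrections mentioned in point 2 (and of the reset problem listed against alternative 1), the hypervisor can hold a trapped device status write of 0 until the VF actually reports reset completion, instead of relying on the legacy driver to wait. The sketch below assumes AQ-backed aq_write/aq_read helpers and uses a fake VF to simulate reset latency. FakeVf, trap_status_write, the polling policy, and the timeout are all assumptions, not part of this proposal.

```python
DEVICE_STATUS_OFF = 18  # 1-byte device status in the legacy common config


class FakeVf:
    """Hypothetical stand-in for a VF reached via AQ legacy read/write.

    A reset (status write of 0) completes only after `reset_latency` reads.
    """

    def __init__(self, reset_latency: int):
        self.status = 0x07  # ACKNOWLEDGE | DRIVER | DRIVER_OK
        self.reset_latency = reset_latency
        self.reset_reads_left = 0

    def aq_write(self, offset: int, value: int) -> None:
        if offset != DEVICE_STATUS_OFF:
            return
        if value == 0:
            self.reset_reads_left = max(1, self.reset_latency)  # reset starts
        else:
            self.status = value

    def aq_read(self, offset: int) -> int:
        if offset != DEVICE_STATUS_OFF:
            return 0
        if self.reset_reads_left:
            self.reset_reads_left -= 1
            if self.reset_reads_left == 0:
                self.status = 0  # reset has completed
        return self.status


def trap_status_write(vf, value: int, max_polls: int = 1000) -> int:
    """Forward a trapped guest write of the legacy device status.

    For a reset, do not let the guest access complete until the VF reads
    back 0; returns the number of AQ reads that took (0 for other writes).
    """
    vf.aq_write(DEVICE_STATUS_OFF, value)
    if value != 0:
        return 0
    for polls in range(1, max_polls + 1):
        if vf.aq_read(DEVICE_STATUS_OFF) == 0:
            return polls
    raise TimeoutError("VF did not complete reset")


print(trap_status_write(FakeVf(reset_latency=5), 0))  # 5
```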
Fixes: https://github.com/oasis-tcs/virtio-spec/issues/167
Signed-off-by: Parav Pandit <parav@nvidia.com>
---
changelog:
v2->v3:
- added a new patch to split the rows of the admin vq opcode table
- addressed Jason's and Michael's comments to split the single register
access command into common config and device-specific commands.
- dropped the suggestion to introduce an enable/disable command, as the
admin command capability bit already covers it.
- added the alternative designs considered and discussed in detail in v0, v1 and v2
v1->v2:
- addressed comments from Michael
- added theory of operation
- grammar corrections
- removed the group fields description from individual commands, as
it is already present in the generic section
- added an endianness normative statement for the legacy device registers region
- renamed the file to drop vf and add legacy prefix
- added overview in commit log
- renamed subsection to reflect command
v0->v1:
- addressed comments, suggestions and ideas from Michael Tsirkin and Jason Wang
- far simpler design than MMR access
- removed complexities of MMR device ids
- removed complexities of MMR registers and extended capabilities
- dropped adding new extended capabilities because even if they are
added, a PCI device still needs to have the existing capabilities
in the legacy configuration space, and the hypervisor driver does not
need to access them
Parav Pandit (3):
admin: Split opcode table rows with a line
transport-pci: Introduce legacy registers access commands
transport-pci: Add legacy register access conformance section
admin.tex | 14 ++-
conformance.tex | 2 +
transport-pci-legacy-regs.tex | 189 ++++++++++++++++++++++++++++++++++
transport-pci.tex | 2 +
4 files changed, 206 insertions(+), 1 deletion(-)
create mode 100644 transport-pci-legacy-regs.tex
--
2.26.2