From: Jason Gunthorpe <jgg@nvidia.com>
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: Nicolin Chen <nicolinc@nvidia.com>,
	"will@kernel.org" <will@kernel.org>,
	"robin.murphy@arm.com" <robin.murphy@arm.com>,
	"suravee.suthikulpanit@amd.com" <suravee.suthikulpanit@amd.com>,
	"joro@8bytes.org" <joro@8bytes.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"linux-tegra@vger.kernel.org" <linux-tegra@vger.kernel.org>,
	"Liu, Yi L" <yi.l.liu@intel.com>,
	"eric.auger@redhat.com" <eric.auger@redhat.com>,
	"vasant.hegde@amd.com" <vasant.hegde@amd.com>,
	"jon.grimm@amd.com" <jon.grimm@amd.com>,
	"santosh.shukla@amd.com" <santosh.shukla@amd.com>,
	"Dhaval.Giani@amd.com" <Dhaval.Giani@amd.com>,
	"shameerali.kolothum.thodi@huawei.com"
	<shameerali.kolothum.thodi@huawei.com>
Subject: Re: [PATCH RFCv1 04/14] iommufd: Add struct iommufd_viommu and iommufd_viommu_ops
Date: Wed, 22 May 2024 10:39:05 -0300
Message-ID: <20240522133905.GX20229@nvidia.com>
In-Reply-To: <BN9PR11MB5276423A0BFBDA8346E1ED3C8CEB2@BN9PR11MB5276.namprd11.prod.outlook.com>

On Wed, May 22, 2024 at 08:58:34AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, May 14, 2024 11:56 PM
> > 
> > On Sun, May 12, 2024 at 08:34:02PM -0700, Nicolin Chen wrote:
> > > On Sun, May 12, 2024 at 11:03:53AM -0300, Jason Gunthorpe wrote:
> > > > On Fri, Apr 12, 2024 at 08:47:01PM -0700, Nicolin Chen wrote:
> > > > > Add a new iommufd_viommu core structure to represent a vIOMMU
> > > > > instance in the user space, typically backed by a HW-accelerated
> > > > > feature of an IOMMU, e.g. NVIDIA CMDQ-Virtualization (an ARM
> > > > > SMMUv3 extension) and AMD Hardware Accelerated Virtualized
> > > > > IOMMU (vIOMMU).
> > > >
> > > > I expect this will also be the only way to pass in an associated KVM,
> > > > userspace would supply the kvm when creating the viommu.
> > > >
> > > > The tricky bit of this flow is how to manage the S2. It is necessary
> > > > that the S2 be linked to the viommu:
> > > >
> > > >  1) ARM BTM requires the VMID to be shared with KVM
> > > >  2) AMD and others need the S2 translation because some of the HW
> > > >     acceleration is done inside the guest address space
> > > >
> > > > I haven't looked closely at AMD but presumably the VIOMMU create will
> > > > have to install the S2 into a DID or something?
> > > >
> > > > So we need the S2 to exist before the VIOMMU is created, but the
> > > > drivers are going to need some more fixing before that will fully
> > > > work.
> 
> Can you elaborate on this point? VIOMMU is a dummy container when
> it's created, and the association to S2 only becomes relevant when
> a VQUEUE is created inside it and linked to a device?

VIOMMU contains:
 - A nesting parent
 - A KVM
 - Any global per-VM data the driver needs
   * In the ARM case this is the VMID, sometimes shared with KVM
   * In the AMD case this will allocate memory in the
     "viommu backing storage memory"

Objects can be created on top of a VIOMMU:
 - A nested HWPT (iommu_hwpt_alloc::pt_id can be a viommu)
 - A vqueue (ARM/AMD)
 - Other AMD virtualized objects (EventLog, PPRLog)
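
To make that concrete, here is a minimal sketch of the core structure
this implies. The field names are my own, not necessarily what the
RFC will end up with:

  /* Illustrative only; driver-specific per-VM state (ARM VMID, AMD
   * backing storage) would live in a driver structure containing
   * this, like other iommufd objects */
  struct iommufd_viommu {
          struct iommufd_object obj;
          struct iommufd_ctx *ictx;
          /* The single, immutable nesting parent (S2) */
          struct iommufd_hwpt_paging *hwpt;
          /* Optional KVM association, e.g. for ARM VMID sharing */
          struct kvm *kvm;
          const struct iommufd_viommu_ops *ops;
  };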

It is desirable to keep the VIOMMU linked to only a single nesting
parent that never changes. Given it seems to be a small ask to
allocate the nesting parent before the VIOMMU, providing it at VIOMMU
creation time looks like it will simplify the drivers, because they
can rely on it always existing and never changing.

I think this lends itself to a logical layered VMM design..

 - If VFIO is being used get an iommufd
 - Allocate an IOAS for the entire guest physical address space
 - Determine the vIOMMU driver to use
 - Allocate a HWPT for use by all the vIOMMU instances
 - Allocate a VIOMMU per vIOMMU instance

On ARM the S2 is not divorced from the VIOMMU: ARM requires a single
VMID, shared with KVM, and localized to a single VM for some of the
bypass features (vBTM, vCMDQ). So to attach an S2 you actually have to
attach the VIOMMU to pick up the correct VMID.

I imagine something like this:
   hwpt_alloc(deva, nesting_parent=true) = shared_s2
   viommu_alloc(deva, shared_s2) = viommu1
   viommu_alloc(devb, shared_s2) = viommu2
   hwpt_alloc(deva, viommu1, vste) = deva_vste
   hwpt_alloc(devb, viommu2, vste) = devb_vste
   attach(deva, deva_vste)
   attach(devb, devb_vste)
   attach(devc, shared_s2)

The driver will then know it should program three different VMIDs for
the same S2 page table, which matches the ARM expectation for
VMID. That is to say we'd pass in the viommu as the pt_id for the
iommu_hwpt_alloc. The viommu would imply both the S2 page table and
any meta information like VMID the driver needs.
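
In uAPI terms the deva leg of that sequence might look like the
below. Note IOMMU_VIOMMU_ALLOC and struct iommu_viommu_alloc are only
this RFC's proposal, so the exact layout is hypothetical; iommufd,
ioas_id, deva_id and the guest STE are assumed to already exist:

  struct iommu_hwpt_alloc s2 = {
          .size = sizeof(s2),
          .flags = IOMMU_HWPT_ALLOC_NEST_PARENT,
          .dev_id = deva_id,
          .pt_id = ioas_id,                    /* GPA IOAS */
  };
  ioctl(iommufd, IOMMU_HWPT_ALLOC, &s2);       /* out_hwpt_id = shared_s2 */

  struct iommu_viommu_alloc valloc = {
          .size = sizeof(valloc),
          .dev_id = deva_id,
          .hwpt_id = s2.out_hwpt_id,           /* nesting parent fixed here */
  };
  ioctl(iommufd, IOMMU_VIOMMU_ALLOC, &valloc); /* out_viommu_id = viommu1 */

  struct iommu_hwpt_alloc nested = {
          .size = sizeof(nested),
          .dev_id = deva_id,
          .pt_id = valloc.out_viommu_id,       /* viommu implies S2 + VMID */
          .data_type = IOMMU_HWPT_DATA_ARM_SMMUV3,
          .data_len = sizeof(ste),
          .data_uptr = (uintptr_t)&ste,
  };
  ioctl(iommufd, IOMMU_HWPT_ALLOC, &nested);   /* out_hwpt_id = deva_vste */

The final attach would then go through the usual device attach path
(e.g. VFIO_DEVICE_ATTACH_IOMMUFD_PT with deva_vste).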

Both AMD and the vCMDQ thing need to translate some PFNs through the
S2 and program them elsewhere. This is done manually by SW, and there
are three choices I guess:
 - Have the VMM do it and provide a void __user * to the driver
 - Have the driver do it through the S2 directly and track
   S2 invalidations
 - Have the driver open an access on the IOAS and use the access unmap

Not sure which is the best..
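
For the third option a driver could reuse the existing iommufd access
machinery, the same API the VFIO emulated path uses today. A rough
sketch, with hypothetical my_* names:

  /* Hook IOAS unmaps so the driver can quiesce HW that holds
   * translated PFNs before the pages go away */
  static void my_viommu_unmap(void *data, unsigned long iova,
                              unsigned long length)
  {
          /* stop/fence any HW queue referencing [iova, iova + length) */
  }

  static const struct iommufd_access_ops my_viommu_access_ops = {
          .needs_pin_pages = 1,
          .unmap = my_viommu_unmap,
  };

  /* At vqueue setup: */
  access = iommufd_access_create(ictx, &my_viommu_access_ops, viommu, &id);
  iommufd_access_attach(access, ioas_id);
  iommufd_access_pin_pages(access, queue_gpa, queue_len, pages,
                           IOMMUFD_ACCESS_RW_WRITE);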

> > Right, Intel currently doesn't need it, but I feel like everyone will
> > need this eventually as the fast invalidation path is quite important.
> 
> yes, there is no need but I don't see any harm in preparing for such
> an extension on VT-d. Logically it's clearer, e.g. if we decide to
> move device TLB invalidation to a separate uAPI then vIOMMU is
> certainly a clearer object to carry it. And hardware extensions
> really look like optimizations of software implementations.
> 
> And we do need to make a decision now, given that if we make vIOMMU
> a generic object for all vendors it may have a potential impact on
> the user page fault support which Baolu is working on.

> The so-called fault object will be contained in vIOMMU, which is
> software-managed on VT-d/SMMU but passed through on AMD.

Hmm, given we currently have no known hardware entanglement between
PRI and VIOMMU it does seem OK for PRI to just exist separately for
now. If someone needs them linked someday we can add a viommu_id to
the create pri queue command.

> And probably we don't need another handle mechanism in the attach
> path, assuming the vIOMMU object already contains the necessary
> information to find the iommufd_object for a reported fault.

The viommu might be useful for having the kernel return the vRID
instead of the dev_id in the fault messages. I'm not sure how valuable
this is..

Jason
