From: Nicolin Chen <nicolinc@nvidia.com>
To: Eric Auger <eric.auger@redhat.com>,
	Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	Linuxarm <linuxarm@huawei.com>,
	Zhangfei Gao <zhangfei.gao@linaro.org>,
	Michael Shavit <mshavit@google.com>,
	"Moritz Fischer" <mdf@kernel.org>,
	"baolu.lu@linux.intel.com" <baolu.lu@linux.intel.com>
Subject: Re: Query on ARM SMMUv3 nested support
Date: Tue, 9 Apr 2024 12:47:42 -0700
Message-ID: <ZhWbXkl4fzWGR+VR@Asurada-Nvidia>
In-Reply-To: <ZhTcaSGOvsIkZDy1@Asurada-Nvidia>

On Mon, Apr 08, 2024 at 11:13:02PM -0700, Nicolin Chen wrote:
> Sorry for my belated reply.
> 
> On Tue, Apr 02, 2024 at 07:25:56AM +0000, Shameerali Kolothum Thodi wrote:
> 
> > Here we have specified nested-smmuv3 and have a PCIe root port (ioh3420) and
> > a virtio-pci dev (for virtfs). Once this VM is up and running, both the ioh3420
> > and the virtio-9p-pci will not work as expected, as we return the sysmem address
> > space for them.
> > 
> > And then later when you try to hotplug a vfio dev to the PCIe root port,
> > device_add vfio-pci,host=0000:7d:02.1,bus=rp1,id=net1,iommufd=iommufd0
> > 
> > it won't work either because the address space for ioh3420 is wrong.
> > 
> > Taking another look at this, I think we can replace my earlier attempt to fix
> > this by detecting whether the dev is vfio or not and adding that to the check below:
> > 
> > --- a/hw/arm/smmu-common.c
> > +++ b/hw/arm/smmu-common.c
> > @@ -619,8 +619,15 @@ static AddressSpace *smmu_find_add_as(PCIBus
> > *bus, void *opaque, int devfn)
> >      SMMUState *s = opaque;
> >      SMMUPciBus *sbus = smmu_get_sbus(s, bus);
> >      SMMUDevice *sdev = smmu_get_sdev(s, sbus, bus, devfn);
> > +    bool is_vfio = false;
> > +    PCIDevice *pdev;
> > 
> > -    if (s->nested && !s->s2_hwpt) {
> > +    pdev = pci_find_device(bus, pci_bus_num(bus), devfn);
> > +    if (object_dynamic_cast(OBJECT(pdev), "vfio-pci")) {
> > +        is_vfio = true;
> > +    }
> > +
> > +    if (is_vfio && s->nested && !s->s2_hwpt) {
> >          return &sdev->as_sysmem;
> >      } else {
> >          return &sdev->as;
> > 
> > Please let me know your thoughts on this.
> 
> Hmm, checking "vfio-pci" here feels like a bit of a layer violation...
> But I understand how you got there...
> 
> I think that the current way of enabling the "nested" flag for
> a device is a bit messy:
>     if -machine has "iommu=nested-smmuv3" &&
>        -device is "vfio_pci" &&
>        -device has "iommufd"
> 
> Any other device that doesn't meet the latter two conditions
> is ambiguously added to a soft smmuv3, which feels wrong to
> me now.
> 
> Also, the callbacks from hw/vfio/pci.c to iommu are tricky as
> well:
>     vfio_realize() {
>         pci_device_iommu_address_space() {
>             smmu_find_add_as();   // Can only assume vfio-pci is behind
>                                   // a nested SMMU. So returns system AS
>         }
>         vfio_attach_device() {
>             ioctl(VFIO_DEVICE_BIND_IOMMUFD);
>             ioctl(VFIO_DEVICE_ATTACH_IOMMUFD_PT);
>             listener_add_address_space() {
>                 if (memory_region_is_iommu(section->mr)) { // Skip
>                     iommu_notifier_init(IOMMU_NOTIFIER_IOTLB_EVENTS);
>                     memory_region_iommu_replay();
>                 }
>             }
>         }
>     }
> 
> In my view, we should sort this out with a cleaner enabling
> command first. And then, as Jason pointed out, other devices
> (non-vfio-pci) should probably not be added to the SMMU, i.e.
> they might need to be added to a separate PCI bus/bridge
> without an iommu. In this case, the vSMMU module would not
> need an iommu AS at all when "iommu=nested-smmuv3".

Just recalled that kvm_arch_fixup_msi_route still requires a
"vfio-pci" device to return an IOMMUMemoryRegion vs. system
memory. That's why there is the "!s2_hwpt" condition (a hack)
when returning the system AS: at the beginning, the vfio attach
can bypass the iotlb registration by using the system AS, and
(after s2_hwpt is allocated) MSI can translate the MSI address
using the iommu AS.

Otherwise, such a device could (likely "should" in my view)
stay with the system AS, since it doesn't need the emulated
iotlb/cache at all. Yet in that case, we would need a separate
pci_device_msi_address_space for MSI.

@Eric,
Would it be possible for you to shed some light on this as well?

In summary, we have the following open questions:
1. How should we enable "nesting" for a passthrough device?
   - Currently it is a bit ambiguous:
     if -machine has "iommu=nested-smmuv3" &&
        -device is "vfio_pci" &&
        -device has "iommufd"
   - Actually, "-device has 'iommufd'" only guarantees using
     the IOMMUFD cdev APIs vs. the legacy ones, i.e. it supposedly
     doesn't necessarily imply enabling nesting?

2. How should we handle a non-passthrough device (no nesting)?
   - Currently it is a bit ambiguous again: we add it to the soft
     SMMUv3. But is that necessary? It doesn't seem to bring any
     performance gain.
   - Yet not adding it behind the SMMUv3 would require another
     noiommu PCI bus to dock such a device, I assume?

3. Should we keep a nesting-enabled passthrough device in the
   system AS vs. the iommu AS?
   - The reasoning is that it doesn't need an emulated iotlb
     or cache since it uses the HW one.
   - Also, with a HW-accelerated VCMDQ on NVIDIA Grace, the
     emulated iotlb should be turned off because there will be
     no TLBI/ATC command trapped by QEMU, as all those commands
     will be issued to the VCMDQ HW directly.
   - The only user of an iommu AS is the MSI code, as it is a
     bit unique in requiring a gIOVA-to-gPA translation. So,
     keeping such a device in the system AS would need another
     pci_device_msi_address_space (see the sketch after this
     list), which feels reasonable since we need something
     similar in the kernel too:
     https://lore.kernel.org/lkml/f65a3d7e3e0e0c7b2f5cfc03ada9618aa8b523bb.1683688960.git.nicolinc@nvidia.com/
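
For reference, a hedged sketch of what such a helper could look like
on the QEMU PCI-core side. pci_device_msi_address_space() and the
get_msi_address_space hook do not exist today; they are assumptions
here, and a real version would walk up the bridge hierarchy the way
pci_device_iommu_address_space() does:

    /* Hypothetical helper: kvm_arch_fixup_msi_route() would call this
     * instead of pci_device_iommu_address_space(), so the device can
     * stay in the system AS for DMA while the MSI fixup still gets a
     * translating AS for the gIOVA -> gPA lookup. */
    AddressSpace *pci_device_msi_address_space(PCIDevice *dev)
    {
        PCIBus *bus = pci_get_bus(dev);

        /* get_msi_address_space is an assumed new PCIIOMMUOps hook
         * that a vIOMMU (e.g. the vSMMU) could register */
        if (bus->iommu_ops && bus->iommu_ops->get_msi_address_space) {
            return bus->iommu_ops->get_msi_address_space(
                       bus, bus->iommu_opaque, dev->devfn);
        }

        /* Otherwise behave like the existing DMA-side lookup */
        return pci_device_iommu_address_space(dev);
    }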

Thanks
Nicolin

