From: Jason Gunthorpe <jgg@ziepe.ca>
To: Tomasz Jeznach <tjeznach@rivosinc.com>
Cc: Anup Patel <apatel@ventanamicro.com>,
devicetree@vger.kernel.org, Conor Dooley <conor+dt@kernel.org>,
Albert Ou <aou@eecs.berkeley.edu>,
linux@rivosinc.com, Will Deacon <will@kernel.org>,
Joerg Roedel <joro@8bytes.org>,
linux-kernel@vger.kernel.org, Rob Herring <robh+dt@kernel.org>,
Sebastien Boeuf <seb@rivosinc.com>,
iommu@lists.linux.dev, Palmer Dabbelt <palmer@dabbelt.com>,
Paul Walmsley <paul.walmsley@sifive.com>,
Nick Kossifidis <mick@ics.forth.gr>,
Krzysztof Kozlowski <krzk+dt@kernel.org>,
Robin Murphy <robin.murphy@arm.com>,
linux-riscv@lists.infradead.org
Subject: Re: [PATCH v3 7/7] iommu/riscv: Paging domain support
Date: Tue, 7 May 2024 13:51:56 -0300
Message-ID: <20240507165156.GH4718@ziepe.ca>
In-Reply-To: <CAH2o1u6g87nt=S7id-O43PubR=GaOLj-vmk7+OdTiY=Kw1BU5A@mail.gmail.com>

On Mon, May 06, 2024 at 07:22:07PM -0700, Tomasz Jeznach wrote:
> On Sun, May 5, 2024 at 8:46 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Fri, May 03, 2024 at 12:44:09PM -0700, Tomasz Jeznach wrote:
> > > > For detach I think yes:
> > > >
> > > >    Inv CPU                                  Detach CPU
> > > >
> > > >    write io_pte                             Update device descriptor
> > > >    rcu_read_lock
> > > >      list_for_each
> > > >        <make invalidation command>          <make description inv cmd>
> > > >        dma_wmb()                            dma_wmb()
> > > >        <doorbell>                           <cmd doorbell>
> > > >    rcu_read_unlock
> > > >                                             list_del_rcu()
> > > >                                             <wipe ASID>
> > > >
> > > > In this case I think we never miss an invalidation, the list_del is
> > > > always after the HW has been fully fenced, so I don't think we can
> > > > have any issue. Maybe a spurious invalidation if the ASID gets
> > > > re-used, but who cares.
> > > >
> > > > Attach is different..
> > > >
> > > >    Inv CPU                                  Attach CPU
> > > >
> > > >    write io_pte
> > > >    rcu_read_lock
> > > >      list_for_each // empty
> > > >                                             list_add_rcu()
> > > >                                             Update device descriptor
> > > >                                             <make description inv cmd>
> > > >                                             dma_wmb()
> > > >                                             <cmd doorbell>
> > > >    rcu_read_unlock
> > > >
> > > > As above shows we can "miss" an invalidation. The issue is narrow, the
> > > > io_pte could still be sitting in write buffers in "Inv CPU" and not
> > > > yet globally visible. "Attach CPU" could get the device descriptor
> > > > installed in the IOMMU and the IOMMU could walk an io_pte that is in
> > > > the old state. Effectively this is because there is no release/acquire
> > > > barrier passing the io_pte store from the Inv CPU to the Attach CPU to the
> > > > IOMMU.
> > > >
> > > > It seems like it should be solvable somehow:
> > > > 1) Inv CPU releases all the io ptes
> > > > 2) Attach CPU acquires the io ptes before updating the DDT
> > > > 3) Inv CPU acquires the RCU list in such a way that either attach
> > > > CPU will acquire the io_pte or inv CPU will acquire the RCU list.
> > > > 4) Either invalidation works or we release the new iopte to the SMMU
> > > > and don't need it.
> > > >
> > > > But #3 is a really weird statement. smp_mb() on both sides may do the
> > > > job??
> > > >
> > >
> > > Actual attach sequence is slightly different.
> > >
> > >    Inv CPU                                  Attach CPU
> > >
> > >    write io_pte
> > >    rcu_read_lock
> > >      list_for_each // empty
> > >                                             list_add_rcu()
> > >                                             IOTLB.INVAL(PSCID)
> > >                                             <make description inv cmd>
> > >                                             dma_wmb()
> > >                                             <cmd doorbell>
> > >    rcu_read_unlock
> > >
> > > I've tried to cover this case with riscv_iommu_iotlb_inval() called
> > > before the attached domain is visible to the device.
> >
> > That invalidation shouldn't do anything. If this is the first attach
> > of a PSCID then the PSCID had better already be empty, it won't become
> > non-empty until the DDT entry is installed.
> >
> > And if it is the second attach then the Inv CPU is already taking care
> > of things, no need to invalidate at all.
> >
> > Regardless, there is still a theoretical race that the IOPTEs haven't
> > been made visible yet because there is still no synchronization with
> > the CPU writing them.
> >
> > So, I don't think this solves any problem. I believe you need the
> > appropriate kind of CPU barrier here instead of an invalidation.
> >
>
> Yes. There was a small, but still plausible race w/ IOPTEs visibility
> to the IOMMU.
> For v5 I'm adding two barriers to the inval/detach flow, I believe
> should cover it.
>
> 1) In riscv_iommu_iotlb_inval() unconditional dma_wmb() to make any
> pending writes to PTEs visible to the IOMMU device. This should cover
> the case when list_add_rcu() update is not yet visible in the
> _iotlb_inval() sequence, for the first time the domain is attached to
> the IOMMU.
>
> Inv CPU                                     Attach CPU
> write io_pte
> dma_wmb (1)
> rcu_read_lock
>   list_for_each // empty                    list_add_rcu()
>                                             smp_wmb (2)
>                                             Update device descriptor
>                                             <make description inv cmd>
>                                             // PTEs are visible to the HW (*1)
>                                             dma_wmb()
>                                             <cmd doorbell>
> rcu_read_unlock
>
> 2) In riscv_iommu_bond_link() write memory barrier to ensure list
> update is visible before IOMMU descriptor update. If stale data has
> been fetched by the HW, inval CPU will run iotlb-invalidation
> sequence. There is a possibility that IOMMU will fetch correct PTEs
> and will receive unnecessary IOTLB inval, but I don't think anyone
> would care.
>
> Inv CPU                                     Attach CPU
> write io_pte                                list_add_rcu()
>                                             smp_wmb (2)
>                                             Update device descriptor
>                                             <make description inv cmd>
>                                             // HW might fetch stale PTEs
>                                             dma_wmb()
>                                             <cmd doorbell>
> dma_wmb (1)
> rcu_read_lock
>   list_for_each // non-empty (*2)
>     <make iotlb inval cmd>
>     dma_wmb()
>     <cmd doorbell>
> rcu_read_unlock
>
> 3) I've also updated riscv_iommu_bond_unlink() to wipe the PSCID cache
> on the last domain unlink from the IOMMU.
>
> Thank you for pointing this out. Let me know if that makes sense.

I'm not an expert in barriers, but I think you need the more expensive
"mb" in both cases.

The inv side is both releasing the write and acquiring the list
read. IIRC READ_ONCE is not a full acquire?

The Attach side is both releasing the list_add_rcu() and acquiring the
iopte.

RCU is still a benefit: there is no cache line sharing and there is
only one full barrier, not two, like a spinlock.

And a big fat comment on both sides explaining this :)
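To make the "store then load on both sides" shape concrete, here is a
toy userspace sketch, not driver code. It enumerates every interleaving
of two CPUs that each buffer one store and then load the other CPU's
variable. It is a simple store-buffer model, not the RISC-V memory model
or the kernel's LKMM, and every name in it (struct state, can_run,
explore, missed_invalidations) is hypothetical. With full_barrier false
nothing orders a CPU's store before its later load, which is the gap a
write-only barrier leaves open. With full_barrier true the store must be
globally visible before the load, which is roughly what smp_mb() on each
side would provide.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-CPU events in the toy model. */
enum { EV_STORE = 1, EV_FLUSH = 2, EV_LOAD = 4, EV_ALL = 7 };

struct state {
	int done[2];	/* bitmask of events each CPU has performed */
	int mem[2];	/* globally visible value of each CPU's variable */
	int seen[2];	/* value each CPU loaded from the other's variable */
};

/*
 * CPU 0 models "Inv CPU":    store io_pte, then load the bond list.
 * CPU 1 models "Attach CPU": store the list entry, then read the io_pte
 * (an assumption standing in for the IOMMU walk after the descriptor update).
 */
static bool can_run(const struct state *s, int cpu, int ev, bool full_barrier)
{
	assert(cpu == 0 || cpu == 1);
	if (s->done[cpu] & ev)
		return false;		/* each event happens once */
	if (ev == EV_STORE)
		return true;
	if (!(s->done[cpu] & EV_STORE))
		return false;		/* program order: the store is first */
	if (ev == EV_LOAD && full_barrier)
		return (s->done[cpu] & EV_FLUSH) != 0;	/* drain before load */
	return true;			/* otherwise flush/load in any order */
}

/* Count complete interleavings in which both CPUs read the old value. */
static int explore(struct state s, bool full_barrier)
{
	static const int events[] = { EV_STORE, EV_FLUSH, EV_LOAD };
	int missed = 0;

	if (s.done[0] == EV_ALL && s.done[1] == EV_ALL)
		return s.seen[0] == 0 && s.seen[1] == 0;

	for (int cpu = 0; cpu < 2; cpu++) {
		for (int i = 0; i < 3; i++) {
			int ev = events[i];
			struct state next = s;

			if (!can_run(&s, cpu, ev, full_barrier))
				continue;
			if (ev == EV_FLUSH)
				next.mem[cpu] = 1;	/* store becomes visible */
			else if (ev == EV_LOAD)
				next.seen[cpu] = next.mem[1 - cpu];
			next.done[cpu] |= ev;
			missed += explore(next, full_barrier);
		}
	}
	return missed;
}

int missed_invalidations(bool full_barrier)
{
	struct state init = { {0, 0}, {0, 0}, {0, 0} };

	return explore(init, full_barrier);
}
```

In this model missed_invalidations(false) is non-zero: there are
interleavings where the Inv CPU sees an empty list while the Attach side
still sees the old io_pte. missed_invalidations(true) is zero, because
at least one side always observes the other's store.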
Jason