Linux-RISC-V Archive mirror
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@ziepe.ca>
To: Tomasz Jeznach <tjeznach@rivosinc.com>
Cc: Anup Patel <apatel@ventanamicro.com>,
	devicetree@vger.kernel.org, Conor Dooley <conor+dt@kernel.org>,
	Albert Ou <aou@eecs.berkeley.edu>,
	linux@rivosinc.com, Will Deacon <will@kernel.org>,
	Joerg Roedel <joro@8bytes.org>,
	linux-kernel@vger.kernel.org, Rob Herring <robh+dt@kernel.org>,
	Sebastien Boeuf <seb@rivosinc.com>,
	iommu@lists.linux.dev, Palmer Dabbelt <palmer@dabbelt.com>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Nick Kossifidis <mick@ics.forth.gr>,
	Krzysztof Kozlowski <krzk+dt@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>,
	linux-riscv@lists.infradead.org
Subject: Re: [PATCH v3 7/7] iommu/riscv: Paging domain support
Date: Tue, 7 May 2024 13:51:56 -0300	[thread overview]
Message-ID: <20240507165156.GH4718@ziepe.ca> (raw)
In-Reply-To: <CAH2o1u6g87nt=S7id-O43PubR=GaOLj-vmk7+OdTiY=Kw1BU5A@mail.gmail.com>

On Mon, May 06, 2024 at 07:22:07PM -0700, Tomasz Jeznach wrote:
> On Sun, May 5, 2024 at 8:46 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Fri, May 03, 2024 at 12:44:09PM -0700, Tomasz Jeznach wrote:
> > > > For detach I think yes:
> > > >
> > > >    Inv CPU                                   Detach CPU
> > > >
> > > >   write io_pte                               Update device descriptor
> > > >   rcu_read_lock
> > > >     list_for_each
> > > >       <make invalidation command>            <make description inv cmd>
> > > >       dma_wmb()                              dma_wmb()
> > > >       <doorbell>                             <cmd doorbell>
> > > >   rcu_read_unlock
> > > >                                              list_del_rcu()
> > > >                                              <wipe ASID>
> > > >
> > > > In this case I think we never miss an invalidation, the list_del is
> > > > always after the HW has been fully fenced, so I don't think we can
> > > > have any issue. Maybe a suprious invalidation if the ASID gets
> > > > re-used, but who cares.
> > > >
> > > > Attach is different..
> > > >
> > > >    Inv CPU                                   Attach CPU
> > > >
> > > >   write io_pte
> > > >   rcu_read_lock
> > > >     list_for_each // empty
> > > >                                              list_add_rcu()
> > > >                                              Update device descriptor
> > > >                                              <make description inv cmd>
> > > >                                              dma_wmb()
> > > >                                              <cmd doorbell>
> > > >   rcu_read_unlock
> > > >
> > > > As above shows we can "miss" an invalidation. The issue is narrow, the
> > > > io_pte could still be sitting in write buffers in "Inv CPU" and not
> > > > yet globally visiable. "Attach CPU" could get the device descriptor
> > > > installed in the IOMMU and the IOMMU could walk an io_pte that is in
> > > > the old state. Effectively this is because there is no release/acquire
> > > > barrier passing the io_pte store from the Inv CPU to the Attach CPU to the
> > > > IOMMU.
> > > >
> > > > It seems like it should be solvable somehow:
> > > >  1) Inv CPU releases all the io ptes
> > > >  2) Attach CPU acquires the io ptes before updating the DDT
> > > >  3) Inv CPU acquires the RCU list in such a way that either attach
> > > >     CPU will acquire the io_pte or inv CPU will acquire the RCU list.
> > > >  4) Either invalidation works or we release the new iopte to the SMMU
> > > >     and don't need it.
> > > >
> > > > But #3 is a really weird statement. smb_mb() on both sides may do the
> > > > job??
> > > >
> > >
> > > Actual attach sequence is slightly different.
> > >
> > >  Inv CPU                            Attach CPU
> > >
> > >  write io_pte
> > >   rcu_read_lock
> > >     list_for_each // empty
> > >                                     list_add_rcu()
> > >                                     IOTLB.INVAL(PSCID)
> > >                                     <make description inv cmd>
> > >                                     dma_wmb()
> > >                                     <cmd doorbell>
> > >  rcu_read_unlock
> > >
> > > I've tried to cover this case with riscv_iommu_iotlb_inval() called
> > > before the attached domain is visible to the device.
> >
> > That invalidation shouldn't do anything. If this is the first attach
> > of a PSCID then the PSCID had better already be empty, it won't become
> > non-empty until the DDT entry is installed.
> >
> > And if it is the second attach then the Inv CPU is already taking care
> > of things, no need to invalidate at all.
> >
> > Regardless, there is still a theortical race that the IOPTEs haven't
> > been made visible yet because there is still no synchronization with
> > the CPU writing them.
> >
> > So, I don't think this solves any problem. I belive you need the
> > appropriate kind of CPU barrier here instead of an invalidation.
> >
> 
> Yes. There was a small, but still plausible race w/ IOPTEs visibility
> to the IOMMU.
> For v5 I'm adding two barriers to the inval/detach flow, I believe
> should cover it.
> 
> 1) In riscv_iommu_iotlb_inval() unconditional dma_wmb() to make any
> pending writes to PTEs visible to the IOMMU device. This should cover
> the case when list_add_rcu() update is not yet visible in the
> _iotlb_inval() sequence, for the first time the domain is attached to
> the IOMMU.
> 
>   Inv CPU                                    Attach CPU
>   write io_pte
>   dma_wmb (1)
>   rcu_read_lock
>     list_for_each // empty                   list_add_rcu()
>                                              smp_wmb (2)
>                                              Update device descriptor
>                                              <make description inv cmd>
>                                              // PTEs are visible to the HW (*1)
>                                              dma_wmb()
>                                              <cmd doorbell>
>   rcu_read_unlock
> 
> 2) In riscv_iommu_bond_link() write memory barrier to ensure list
> update is visible before IOMMU descriptor update. If stale data has
> been fetched by the HW, inval CPU will run iotlb-invalidation
> sequence. There is a possibility that IOMMU will fetch correct PTEs
> and will receive unnecessary IOTLB inval, but I don't think anyone
> would care.
> 
>   Inv CPU                                    Attach CPU
>   write io_pte                               list_add_rcu()
>                                              smp_wmb (2)
>                                              Update device descriptor
>                                              <make description inv cmd>
>                                              // HW might fetch stale PTEs
>                                              dma_wmb()
>                                              <cmd doorbell>
>   dma_wmb (1)
>   rcu_read_lock
>     list_for_each // non-empty (*2)
>       <make iotlb inval cmd>
>       dma_wmb()
>       <cmd doorbell>
>   rcu_read_unlock
> 
> 3) I've also updated riscv_iommu_bond_unlink() to wipe the PSCID cache
> on the last domain unlink from the IOMMU.
> 
> Thank you for pointing this out. Let me know if that makes sense.

I'm not an expert in barriers, but I think you need the more expensive
"mb" in both cases.

The inv side is both releasing the write and acquiring the list
read. IIRC READ_ONCE is not a full acquire?

The Attach side is both releasing the list_add_rcu() and acquiring the
iopte.

rcu is still a benefit, there is no cache line sharing and there is
only one full barrier, not two, like a spinlock.

And a big fat comment in both sides explaining this :)

Jason

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

  reply	other threads:[~2024-05-07 16:52 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-30 20:01 [PATCH v3 0/7] Linux RISC-V IOMMU Support Tomasz Jeznach
2024-04-30 20:01 ` [PATCH v3 1/7] dt-bindings: iommu: riscv: Add bindings for RISC-V IOMMU Tomasz Jeznach
2024-05-01  9:30   ` Conor Dooley
2024-05-01 13:15   ` Rob Herring
2024-05-02  2:47     ` Tomasz Jeznach
2024-05-02 15:15       ` Conor Dooley
2024-04-30 20:01 ` [PATCH v3 2/7] iommu/riscv: Add RISC-V IOMMU platform device driver Tomasz Jeznach
2024-05-01 10:26   ` Baolu Lu
2024-05-01 14:20     ` Jason Gunthorpe
2024-05-02  2:23       ` Baolu Lu
2024-05-02  2:44         ` Tomasz Jeznach
2024-04-30 20:01 ` [PATCH v3 3/7] iommu/riscv: Add RISC-V IOMMU PCIe " Tomasz Jeznach
2024-05-01 10:01   ` Baolu Lu
2024-04-30 20:01 ` [PATCH v3 4/7] iommu/riscv: Enable IOMMU registration and device probe Tomasz Jeznach
2024-05-01  9:53   ` Baolu Lu
2024-04-30 20:01 ` [PATCH v3 5/7] iommu/riscv: Device directory management Tomasz Jeznach
2024-05-01 14:57   ` Jason Gunthorpe
2024-05-02  1:38   ` Baolu Lu
2024-05-02  1:57     ` Baolu Lu
2024-05-02  2:06   ` Baolu Lu
2024-04-30 20:01 ` [PATCH v3 6/7] iommu/riscv: Command and fault queue support Tomasz Jeznach
2024-05-02  3:51   ` Baolu Lu
2024-04-30 20:01 ` [PATCH v3 7/7] iommu/riscv: Paging domain support Tomasz Jeznach
2024-05-01 14:56   ` Jason Gunthorpe
2024-05-03 17:44     ` Tomasz Jeznach
2024-05-03 18:10       ` Jason Gunthorpe
2024-05-03 19:44         ` Tomasz Jeznach
2024-05-05 15:46           ` Jason Gunthorpe
2024-05-07  2:22             ` Tomasz Jeznach
2024-05-07 16:51               ` Jason Gunthorpe [this message]
2024-05-08 16:23                 ` Tomasz Jeznach
2024-05-02  3:50   ` Baolu Lu
2024-05-02  4:39     ` Tomasz Jeznach
2024-05-01 16:07 ` [PATCH v3 0/7] Linux RISC-V IOMMU Support Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240507165156.GH4718@ziepe.ca \
    --to=jgg@ziepe.ca \
    --cc=aou@eecs.berkeley.edu \
    --cc=apatel@ventanamicro.com \
    --cc=conor+dt@kernel.org \
    --cc=devicetree@vger.kernel.org \
    --cc=iommu@lists.linux.dev \
    --cc=joro@8bytes.org \
    --cc=krzk+dt@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=linux@rivosinc.com \
    --cc=mick@ics.forth.gr \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=robh+dt@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=seb@rivosinc.com \
    --cc=tjeznach@rivosinc.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).