From: Tomasz Jeznach <tjeznach@rivosinc.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>,
	 Robin Murphy <robin.murphy@arm.com>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	 Palmer Dabbelt <palmer@dabbelt.com>,
	Albert Ou <aou@eecs.berkeley.edu>,
	 Anup Patel <apatel@ventanamicro.com>,
	Sunil V L <sunilvl@ventanamicro.com>,
	 Nick Kossifidis <mick@ics.forth.gr>,
	Sebastien Boeuf <seb@rivosinc.com>,
	Rob Herring <robh+dt@kernel.org>,
	 Krzysztof Kozlowski <krzk+dt@kernel.org>,
	Conor Dooley <conor+dt@kernel.org>,
	devicetree@vger.kernel.org,  iommu@lists.linux.dev,
	linux-riscv@lists.infradead.org,  linux-kernel@vger.kernel.org,
	linux@rivosinc.com
Subject: Re: [PATCH v3 7/7] iommu/riscv: Paging domain support
Date: Wed, 8 May 2024 09:23:37 -0700
Message-ID: <CAH2o1u7tu2NrbFs0g8y4iUdAUYUEN=O_H4C+O_LmwbuZS-zeqw@mail.gmail.com>
In-Reply-To: <20240507165156.GH4718@ziepe.ca>

On Tue, May 7, 2024 at 9:51 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Mon, May 06, 2024 at 07:22:07PM -0700, Tomasz Jeznach wrote:
> > On Sun, May 5, 2024 at 8:46 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Fri, May 03, 2024 at 12:44:09PM -0700, Tomasz Jeznach wrote:
> > > > > For detach I think yes:
> > > > >
> > > > >    Inv CPU                                   Detach CPU
> > > > >
> > > > >   write io_pte                               Update device descriptor
> > > > >   rcu_read_lock
> > > > >     list_for_each
> > > > >       <make invalidation command>            <make description inv cmd>
> > > > >       dma_wmb()                              dma_wmb()
> > > > >       <doorbell>                             <cmd doorbell>
> > > > >   rcu_read_unlock
> > > > >                                              list_del_rcu()
> > > > >                                              <wipe ASID>
> > > > >
> > > > > In this case I think we never miss an invalidation; the list_del is
> > > > > always after the HW has been fully fenced, so I don't think we can
> > > > > have any issue. Maybe a spurious invalidation if the ASID gets
> > > > > re-used, but who cares.
> > > > >
> > > > > Attach is different..
> > > > >
> > > > >    Inv CPU                                   Attach CPU
> > > > >
> > > > >   write io_pte
> > > > >   rcu_read_lock
> > > > >     list_for_each // empty
> > > > >                                              list_add_rcu()
> > > > >                                              Update device descriptor
> > > > >                                              <make description inv cmd>
> > > > >                                              dma_wmb()
> > > > >                                              <cmd doorbell>
> > > > >   rcu_read_unlock
> > > > >
> > > > > As the above shows, we can "miss" an invalidation. The issue is
> > > > > narrow: the io_pte could still be sitting in write buffers on the
> > > > > "Inv CPU" and not yet be globally visible. The "Attach CPU" could
> > > > > get the device descriptor installed in the IOMMU, and the IOMMU
> > > > > could walk an io_pte that is in the old state. Effectively this is
> > > > > because there is no release/acquire barrier passing the io_pte
> > > > > store from the Inv CPU to the Attach CPU to the IOMMU.
> > > > >
> > > > > It seems like it should be solvable somehow:
> > > > >  1) Inv CPU releases all the io ptes
> > > > >  2) Attach CPU acquires the io ptes before updating the DDT
> > > > >  3) Inv CPU acquires the RCU list in such a way that either attach
> > > > >     CPU will acquire the io_pte or inv CPU will acquire the RCU list.
> > > > >  4) Either the invalidation works, or we release the new iopte to
> > > > >     the IOMMU and don't need it.
> > > > >
> > > > > But #3 is a really weird statement. smp_mb() on both sides may do the
> > > > > job??
> > > > >
> > > >
> > > > The actual attach sequence is slightly different.
> > > >
> > > >  Inv CPU                            Attach CPU
> > > >
> > > >  write io_pte
> > > >   rcu_read_lock
> > > >     list_for_each // empty
> > > >                                     list_add_rcu()
> > > >                                     IOTLB.INVAL(PSCID)
> > > >                                     <make description inv cmd>
> > > >                                     dma_wmb()
> > > >                                     <cmd doorbell>
> > > >  rcu_read_unlock
> > > >
> > > > I've tried to cover this case with riscv_iommu_iotlb_inval() called
> > > > before the attached domain is visible to the device.
> > >
> > > That invalidation shouldn't do anything. If this is the first attach
> > > of a PSCID, then the PSCID had better already be empty; it won't
> > > become non-empty until the DDT entry is installed.
> > >
> > > And if it is the second attach then the Inv CPU is already taking care
> > > of things, no need to invalidate at all.
> > >
> > > Regardless, there is still a theoretical race that the IOPTEs haven't
> > > been made visible yet because there is still no synchronization with
> > > the CPU writing them.
> > >
> > > So, I don't think this solves any problem. I believe you need the
> > > appropriate kind of CPU barrier here instead of an invalidation.
> > >
> >
> > Yes. There was a small but still plausible race with IOPTE visibility
> > to the IOMMU.
> > For v5 I'm adding two barriers to the inval/detach flow, which I
> > believe should cover it.
> >
> > 1) In riscv_iommu_iotlb_inval(), an unconditional dma_wmb() to make
> > any pending PTE writes visible to the IOMMU device. This covers the
> > case where the list_add_rcu() update is not yet visible in the
> > _iotlb_inval() sequence, the first time the domain is attached to
> > the IOMMU.
> >
> >   Inv CPU                                    Attach CPU
> >   write io_pte
> >   dma_wmb (1)
> >   rcu_read_lock
> >     list_for_each // empty                   list_add_rcu()
> >                                              smp_wmb (2)
> >                                              Update device descriptor
> >                                              <make description inv cmd>
> >                                              // PTEs are visible to the HW (*1)
> >                                              dma_wmb()
> >                                              <cmd doorbell>
> >   rcu_read_unlock
> >
> > 2) In riscv_iommu_bond_link(), a write memory barrier to ensure the
> > list update is visible before the IOMMU descriptor update. If stale
> > data has been fetched by the HW, the inval CPU will run the IOTLB
> > invalidation sequence. There is a possibility that the IOMMU fetches
> > the correct PTEs and still receives an unnecessary IOTLB inval, but I
> > don't think anyone would care.
> >
> >   Inv CPU                                    Attach CPU
> >   write io_pte                               list_add_rcu()
> >                                              smp_wmb (2)
> >                                              Update device descriptor
> >                                              <make description inv cmd>
> >                                              // HW might fetch stale PTEs
> >                                              dma_wmb()
> >                                              <cmd doorbell>
> >   dma_wmb (1)
> >   rcu_read_lock
> >     list_for_each // non-empty (*2)
> >       <make iotlb inval cmd>
> >       dma_wmb()
> >       <cmd doorbell>
> >   rcu_read_unlock
> >
> > 3) I've also updated riscv_iommu_bond_unlink() to wipe the PSCID cache
> > on the last domain unlink from the IOMMU.
> >
> > Thank you for pointing this out. Let me know if that makes sense.
>
> I'm not an expert in barriers, but I think you need the more expensive
> "mb" in both cases.
>
> The inv side is both releasing the write and acquiring the list
> read. IIRC READ_ONCE is not a full acquire?
>
> The Attach side is both releasing the list_add_rcu() and acquiring the
> iopte.
>
> RCU is still a benefit: there is no cache line sharing, and there is
> only one full barrier, not two as with a spinlock.
>
> And a big fat comment in both sides explaining this :)
>

I'm not an expert in barriers either, but I've checked offline with one ;)

The conclusion is that we need FENCE W,W (or stronger) on the attach
side, and FENCE RW,RW in the invalidation sequence. Hardware access to
the PTE/DC is sequential, with implied FENCE R,R barriers.
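
To make the invalidation side concrete, below is a minimal sketch of
the barrier placement I have in mind for v5. Treat it as pseudocode
for the ordering only: the command construction is elided, and the
bond/list field names follow the v3 patch as far as I remember them.

static void riscv_iommu_iotlb_inval(struct riscv_iommu_domain *domain,
                                    unsigned long start, unsigned long end)
{
        struct riscv_iommu_bond *bond;

        /*
         * Full barrier, pairing with the one in riscv_iommu_bond_link().
         * It orders the io_pte stores before the bond list read: either
         * we observe the new bond and send the invalidation ourselves,
         * or the attach side observes our io_pte updates before it makes
         * the device descriptor visible to the IOMMU.
         */
        smp_mb();

        rcu_read_lock();
        list_for_each_entry_rcu(bond, &domain->bonds, list) {
                /* <make IOTLB.INVAL(PSCID) command for bond->dev> */
                /* <dma_wmb() + command queue doorbell> */
        }
        rcu_read_unlock();
}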

As the 'attach' sequence is a rare event anyway, I'll go with "mb" on
both sides, as suggested (on RISC-V, smp_mb() expands to FENCE RW,RW,
which is strong enough for both directions).
Unless there are other opinions, I'll update the barriers to match
this conclusion and try to capture the bond / inval / attach
synchronization assumptions in a long comment.
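
And a similarly hedged sketch of the attach side (the bond allocation
details here are illustrative, not the exact v3 code):

static int riscv_iommu_bond_link(struct riscv_iommu_domain *domain,
                                 struct device *dev)
{
        struct riscv_iommu_bond *bond;

        bond = kzalloc(sizeof(*bond), GFP_KERNEL);
        if (!bond)
                return -ENOMEM;
        bond->dev = dev;
        list_add_rcu(&bond->list, &domain->bonds);

        /*
         * Full barrier, pairing with the one in riscv_iommu_iotlb_inval().
         * The bond must be visible on the list before the device
         * descriptor update makes this domain reachable by the IOMMU:
         * any io_pte writes we could not yet observe will then be
         * flushed by the inval path, which is guaranteed to see the
         * new bond.
         */
        smp_mb();

        /* the caller goes on to update the device descriptor */
        return 0;
}

As Jason noted, RCU still keeps the common path cheap: no shared cache
line on the read side, and a single full barrier on each side rather
than a lock.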

Jason, thanks again for pointing this out.


> Jason

Best,
- Tomasz
