LKML Archive mirror
 help / color / mirror / Atom feed
From: Yan Zhao <yan.y.zhao@intel.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: <kvm@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<x86@kernel.org>, <alex.williamson@redhat.com>,
	<kevin.tian@intel.com>, <iommu@lists.linux.dev>,
	<pbonzini@redhat.com>, <seanjc@google.com>,
	<dave.hansen@linux.intel.com>, <luto@kernel.org>,
	<peterz@infradead.org>, <tglx@linutronix.de>, <mingo@redhat.com>,
	<bp@alien8.de>, <hpa@zytor.com>, <corbet@lwn.net>,
	<joro@8bytes.org>, <will@kernel.org>, <robin.murphy@arm.com>,
	<baolu.lu@linux.intel.com>, <yi.l.liu@intel.com>
Subject: Re: [PATCH 5/5] iommufd: Flush CPU caches on DMA pages in non-coherent domains
Date: Wed, 22 May 2024 11:17:00 +0800	[thread overview]
Message-ID: <Zk1jrI8bOR5vYKlc@yzhao56-desk.sh.intel.com> (raw)
In-Reply-To: <20240521160442.GI20229@nvidia.com>

On Tue, May 21, 2024 at 01:04:42PM -0300, Jason Gunthorpe wrote:
> On Mon, May 20, 2024 at 10:45:56AM +0800, Yan Zhao wrote:
> > On Fri, May 17, 2024 at 02:04:18PM -0300, Jason Gunthorpe wrote:
> > > On Thu, May 16, 2024 at 10:32:43AM +0800, Yan Zhao wrote:
> > > > On Wed, May 15, 2024 at 05:43:04PM -0300, Jason Gunthorpe wrote:
> > > > > On Wed, May 15, 2024 at 03:06:36PM +0800, Yan Zhao wrote:
> > > > > 
> > > > > > > So it has to be calculated on closer to a page by page basis (really a
> > > > > > > span by span basis) if flushing of that span is needed based on where
> > > > > > > the pages came from. Only pages that came from a hwpt that is
> > > > > > > non-coherent can skip the flushing.
> > > > > > Is area by area basis also good?
> > > > > > Isn't an area either not mapped to any domain or mapped into all domains?
> > > > > 
> > > > > Yes, this is what the span iterator turns into in the background, it
> > > > > goes area by area to cover things.
> > > > > 
> > > > > > But, yes, considering the limited number of non-coherent domains, it appears
> > > > > > more robust and clean to always flush for non-coherent domain in
> > > > > > iopt_area_fill_domain().
> > > > > > It eliminates the need to decide whether to retain the area flag during a split.
> > > > > 
> > > > > And flush for pin user pages, so you basically always flush because
> > > > > you can't tell where the pages came from.
> > > > As a summary, do you think it's good to flush in below way?
> > > > 
> > > > 1. in iopt_area_fill_domains(), flush before mapping a page into domains when
> > > >    iopt->noncoherent_domain_cnt > 0, no matter where the page is from.
> > > >    Record cache_flush_required in pages for unpin.
> > > > 2. in iopt_area_fill_domain(), pass in hwpt to check domain non-coherency.
> > > >    flush before mapping a page into a non-coherent domain, no matter where the
> > > >    page is from.
> > > >    Record cache_flush_required in pages for unpin.
> > > > 3. in batch_unpin(), flush if pages->cache_flush_required before
> > > >    unpin_user_pages.
> > > 
> > > It does not quite sound right, there should be no tracking in the
> > > pages of this stuff.
> > What's the downside of having tracking in the pages?
> 
> Well, a counter doesn't make sense. You could have a single sticky bit
> that indicates that all PFNs are coherency dirty and overflush them on
> every map and unmap operation.
cache_flush_required is a sticky bit actually. It's set if any PFN in the
iopt_pages is mapped into a noncoherent domain.
batch_unpin() checks this sticky bit for flush.

@@ -198,6 +198,11 @@ struct iopt_pages {
        void __user *uptr;
        bool writable:1;
        u8 account_mode;
+       /*
+        * CPU cache flush is required before mapping the pages to or after
+        * unmapping it from a noncoherent domain
+        */
+       bool cache_flush_required:1;

(Please ignore the confusing comment).

iopt->noncoherent_domain_cnt is a counter. It's increased/decreased on
non-coherent hwpt attach/detach.

@@ -53,6 +53,7 @@ struct io_pagetable {
        struct rb_root_cached reserved_itree;
        u8 disable_large_pages;
        unsigned long iova_alignment;
+       unsigned int noncoherent_domain_cnt;
 };

Since iopt->domains contains no coherency info, this counter helps
iopt_area_fill_domains() to decide whether to flush pages and set sticky bit
cache_flush_required in iopt_pages.
Though it's not that useful to iopt_area_fill_domain(), after your suggestion
to pass in hwpt.

> This is certainly the simplest option, but gives the maximal flushes.

Why does this give the maximal flushes?
Considering the flush after unmap,
- With a sticky bit in iopt_pages, once a iopt_pages has been mapped into a
  non-coherent domain, the PFNs in the iopt_pages will be flushed for only once
  right before the page is unpinned.

- But if we do the flush after each iopt_area_unmap_domain_range() for each
  non-coherent domain, then the flush count for each PFN is the count of
  non-coherent domains.

> 
> If you want to minimize flushes then you can't store flush
> minimization information in the pages because it isn't global to the
> pages and will not be accurate enough.
> 
> > > If pfn_reader_fill_span() does batch_from_domain() and
> > > the source domain's storage_domain is non-coherent then you can skip
> > > the flush. This is not pedantically perfect in skipping all flushes, but
> > > in practice it is probably good enough.
> 
> > We don't know whether the source storage_domain is non-coherent since
> > area->storage_domain is of "struct iommu_domain".
>  
> > Do you want to add a flag in "area", e.g. area->storage_domain_is_noncoherent,
> > and set this flag along side setting storage_domain?
> 
> Sure, that could work.
When the storage_domain is set in iopt_area_fill_domains(),
    "area->storage_domain = xa_load(&area->iopt->domains, 0);"
is there a convenient way to know the storage_domain is non-coherent?

> 
> > > __iopt_area_unfill_domain() (and children) must flush after
> > > iopt_area_unmap_domain_range() if the area's domain is
> > > non-coherent. This is also not perfect, but probably good enough.
> > Do you mean flush after each iopt_area_unmap_domain_range() if the domain is
> > non-coherent?
> > The problem is that iopt_area_unmap_domain_range() knows only IOVA, the
> > IOVA->PFN relationship is not available without iommu_iova_to_phys() and
> > iommu_domain contains no coherency info.
> 
> Yes, you'd have to read back the PFNs on this path which it doesn't do
> right now.. Given this pain it would be simpler to have one bit in the
> pages that marks it permanently non-coherent and all pfns will be
> flushed before put_page is called.
> 
> The trouble with a counter is that the count going to zero doesn't
> really mean we flushed the PFN if it is being held someplace else.
Not sure if you are confused between iopt->noncoherent_domain_cnt and
pages->cache_flush_required.

iopt->noncoherent_domain_cnt is increased/decreased on non-coherent hwpt
attach/detach.

Once iopt->noncoherent_domain_cnt is non-zero, sticky bit cache_flush_required
in iopt_pages will be set during filling domain, PFNs in the iopt_pages will be
flushed right before unpinning even though iopt->noncoherent_domain_cnt might
have gone to 0 at that time.

  reply	other threads:[~2024-05-22  3:18 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-07  6:18 [PATCH 0/5] Enforce CPU cache flush for non-coherent device assignment Yan Zhao
2024-05-07  6:19 ` [PATCH 1/5] x86/pat: Let pat_pfn_immune_to_uc_mtrr() check MTRR for untracked PAT range Yan Zhao
2024-05-07  8:26   ` Tian, Kevin
2024-05-07  9:12     ` Yan Zhao
2024-05-08 22:14       ` Alex Williamson
2024-05-09  3:36         ` Yan Zhao
2024-05-16  7:42       ` Tian, Kevin
2024-05-16 14:07         ` Sean Christopherson
2024-05-20  2:36           ` Tian, Kevin
2024-05-07  6:20 ` [PATCH 2/5] KVM: x86/mmu: Fine-grained check of whether a invalid & RAM PFN is MMIO Yan Zhao
2024-05-07  8:39   ` Tian, Kevin
2024-05-07  9:19     ` Yan Zhao
2024-05-07  6:20 ` [PATCH 3/5] x86/mm: Introduce and export interface arch_clean_nonsnoop_dma() Yan Zhao
2024-05-07  8:51   ` Tian, Kevin
2024-05-07  9:40     ` Yan Zhao
2024-05-20 14:07   ` Christoph Hellwig
2024-05-21 15:49     ` Jason Gunthorpe
2024-05-21 16:00       ` Jason Gunthorpe
2024-05-22  3:41         ` Yan Zhao
2024-05-28  6:37         ` Christoph Hellwig
2024-06-01 19:46           ` Jason Gunthorpe
2024-06-06  2:48             ` Yan Zhao
2024-06-06 11:55               ` Jason Gunthorpe
2024-06-07  9:39                 ` Yan Zhao
2024-05-07  6:21 ` [PATCH 4/5] vfio/type1: Flush CPU caches on DMA pages in non-coherent domains Yan Zhao
2024-05-09 18:10   ` Alex Williamson
2024-05-10 10:31     ` Yan Zhao
2024-05-10 16:57       ` Alex Williamson
2024-05-13  7:11         ` Yan Zhao
2024-05-16  7:53           ` Tian, Kevin
2024-05-16  8:34           ` Tian, Kevin
2024-05-16 20:31             ` Alex Williamson
2024-05-17 17:11               ` Jason Gunthorpe
2024-05-20  2:52                 ` Tian, Kevin
2024-05-21 16:07                   ` Jason Gunthorpe
2024-05-21 16:21                     ` Alex Williamson
2024-05-21 16:34                       ` Jason Gunthorpe
2024-05-21 18:19                         ` Alex Williamson
2024-05-21 18:37                           ` Jason Gunthorpe
2024-05-22  6:24                             ` Tian, Kevin
2024-05-22 12:29                               ` Jason Gunthorpe
2024-05-22 14:43                                 ` Alex Williamson
2024-05-22 16:52                                   ` Jason Gunthorpe
2024-05-22 18:22                                     ` Alex Williamson
2024-05-22 23:26                                 ` Tian, Kevin
2024-05-22 23:32                                   ` Jason Gunthorpe
2024-05-22 23:40                                     ` Tian, Kevin
2024-05-23 14:58                                       ` Jason Gunthorpe
2024-05-23 22:47                                         ` Alex Williamson
2024-05-24  0:30                                           ` Tian, Kevin
2024-05-24 13:50                                           ` Jason Gunthorpe
2024-05-22  3:33                           ` Yan Zhao
2024-05-22  3:24                         ` Yan Zhao
2024-05-22 12:26                           ` Jason Gunthorpe
2024-05-24  3:07                             ` Yan Zhao
2024-05-16 20:50           ` Alex Williamson
2024-05-17  3:11             ` Yan Zhao
2024-05-17  4:44               ` Alex Williamson
2024-05-17  5:00                 ` Yan Zhao
2024-05-07  6:22 ` [PATCH 5/5] iommufd: " Yan Zhao
2024-05-09 14:13   ` Jason Gunthorpe
2024-05-10  8:03     ` Yan Zhao
2024-05-10 13:29       ` Jason Gunthorpe
2024-05-13  7:43         ` Yan Zhao
2024-05-14 15:11           ` Jason Gunthorpe
2024-05-15  7:06             ` Yan Zhao
2024-05-15 20:43               ` Jason Gunthorpe
2024-05-16  2:32                 ` Yan Zhao
2024-05-16  8:38                   ` Tian, Kevin
2024-05-16  9:48                     ` Yan Zhao
2024-05-17 17:04                   ` Jason Gunthorpe
2024-05-20  2:45                     ` Yan Zhao
2024-05-21 16:04                       ` Jason Gunthorpe
2024-05-22  3:17                         ` Yan Zhao [this message]
2024-05-22  6:29                           ` Yan Zhao
2024-05-22 17:01                             ` Jason Gunthorpe
2024-05-27  7:15                               ` Yan Zhao
2024-06-01 16:48                                 ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zk1jrI8bOR5vYKlc@yzhao56-desk.sh.intel.com \
    --to=yan.y.zhao@intel.com \
    --cc=alex.williamson@redhat.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=bp@alien8.de \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@nvidia.com \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=robin.murphy@arm.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    --cc=yi.l.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).