LKML Archive mirror
From: Yan Zhao <yan.y.zhao@intel.com>
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, alex.williamson@redhat.com, jgg@nvidia.com,
	kevin.tian@intel.com
Cc: iommu@lists.linux.dev, pbonzini@redhat.com, seanjc@google.com,
	dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, hpa@zytor.com, corbet@lwn.net, joro@8bytes.org,
	will@kernel.org, robin.murphy@arm.com, baolu.lu@linux.intel.com,
	yi.l.liu@intel.com, Yan Zhao <yan.y.zhao@intel.com>
Subject: [PATCH 5/5] iommufd: Flush CPU caches on DMA pages in non-coherent domains
Date: Tue,  7 May 2024 14:22:12 +0800	[thread overview]
Message-ID: <20240507062212.20535-1-yan.y.zhao@intel.com> (raw)
In-Reply-To: <20240507061802.20184-1-yan.y.zhao@intel.com>

Flush CPU caches on DMA pages before mapping them into the first
non-coherent domain (a domain that does not enforce cache coherency, i.e.
CPU caches are not force-snooped) and after unmapping them from the last
domain.

Devices attached to non-coherent domains can execute non-coherent DMAs
(DMAs that lack CPU cache snooping) to access physical memory with CPU
caches bypassed.

Such a scenario could be exploited by a malicious guest, allowing it to
access stale host data in memory rather than the data initialized by the
host (e.g., zeros) in the cache, thus posing a risk of information
leakage.

Furthermore, the host kernel (e.g. a ksm thread) might encounter
inconsistent data between the CPU cache and memory (left by a malicious
guest) after a page is unpinned for DMA but before it's recycled.

Therefore, the CPU cache must be flushed before a page becomes accessible
to non-coherent DMAs and after the page becomes inaccessible to them.

However, the CPU cache is not flushed immediately when the page is unmapped
from the last non-coherent domain. Instead, the flushing is performed
lazily, right before the page is unpinned.
The following example illustrates the process; the CPU cache is flushed
right before step 2 and right before step 5 (a simplified sketch of these
flush points follows the list).
1. A page is mapped into a coherent domain.
2. The page is mapped into a non-coherent domain.
3. The page is unmapped from the non-coherent domain, e.g. due to
   hot-unplug.
4. The page is unmapped from the coherent domain.
5. The page is unpinned.
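
For illustration only, the flush points in this example can be condensed
into the following sketch. map_into_domain(), unmap_from_domain() and
unpin_page() are hypothetical placeholders for the real iommufd paths,
while arch_clean_nonsnoop_dma() and pages->cache_flush_required are what
this patch actually uses:

	/* step 1: coherent domain, no flush needed */
	map_into_domain(page, coherent_domain);

	/* step 2: first non-coherent domain, flush before mapping */
	arch_clean_nonsnoop_dma(page_to_phys(page), PAGE_SIZE);
	map_into_domain(page, noncoherent_domain);
	pages->cache_flush_required = true;

	/* steps 3 and 4: no flush on unmap, not even from the
	 * non-coherent domain
	 */
	unmap_from_domain(page, noncoherent_domain);
	unmap_from_domain(page, coherent_domain);

	/* step 5: lazy flush right before the page is unpinned */
	if (pages->cache_flush_required)
		arch_clean_nonsnoop_dma(page_to_phys(page), PAGE_SIZE);
	unpin_page(page);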

Reasons for adopting this lazy flushing design include:
- There are several unmap paths but only one unpin path. Flushing lazily
  before unpin wipes out any inconsistency between the cache and physical
  memory before a page becomes globally visible, and produces code that is
  simpler, more maintainable and easier to backport.
- It avoids dividing a large unmap range into several smaller ones or
  allocating additional memory to hold the IOVA-to-HPA relationship.
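
These flushes use arch_clean_nonsnoop_dma(), introduced in patch 3/5 of
this series. For reference only, a hypothetical x86 sketch of such a
helper (not the actual patch 3/5 code), assuming the pages are covered by
the kernel direct map, could be built on the existing
clflush_cache_range():

	/* Hypothetical sketch, for illustration only. */
	void arch_clean_nonsnoop_dma(phys_addr_t phys, size_t size)
	{
		unsigned long pfn = PHYS_PFN(phys);
		unsigned long npages = PHYS_PFN(PAGE_ALIGN(size));
		unsigned long i;

		for (i = 0; i < npages; i++)
			clflush_cache_range(page_address(pfn_to_page(pfn + i)),
					    PAGE_SIZE);
	}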

Unlike "has_noncoherent_domain" flag used in vfio_iommu, the
"noncoherent_domain_cnt" counter is implemented in io_pagetable to track
whether an iopt has non-coherent domains attached.
Such a difference is because in iommufd only hwpt of type paging contains
flag "enforce_cache_coherency" and iommu domains in io_pagetable has no
flag "enforce_cache_coherency" as that in vfio_domain.
A counter in io_pagetable can avoid traversing ioas->hwpt_list and holding
ioas->mutex.

Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Closes: https://lore.kernel.org/lkml/20240109002220.GA439767@nvidia.com
Fixes: e8d57210035b ("iommufd: Add kAPI toward external drivers for physical devices")
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 drivers/iommu/iommufd/hw_pagetable.c    | 19 +++++++++--
 drivers/iommu/iommufd/io_pagetable.h    |  5 +++
 drivers/iommu/iommufd/iommufd_private.h |  1 +
 drivers/iommu/iommufd/pages.c           | 44 +++++++++++++++++++++++--
 4 files changed, 65 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 33d142f8057d..e3099d732c5c 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -14,12 +14,18 @@ void iommufd_hwpt_paging_destroy(struct iommufd_object *obj)
 		container_of(obj, struct iommufd_hwpt_paging, common.obj);
 
 	if (!list_empty(&hwpt_paging->hwpt_item)) {
+		struct io_pagetable *iopt = &hwpt_paging->ioas->iopt;
 		mutex_lock(&hwpt_paging->ioas->mutex);
 		list_del(&hwpt_paging->hwpt_item);
 		mutex_unlock(&hwpt_paging->ioas->mutex);
 
-		iopt_table_remove_domain(&hwpt_paging->ioas->iopt,
-					 hwpt_paging->common.domain);
+		iopt_table_remove_domain(iopt, hwpt_paging->common.domain);
+
+		if (!hwpt_paging->enforce_cache_coherency) {
+			down_write(&iopt->domains_rwsem);
+			iopt->noncoherent_domain_cnt--;
+			up_write(&iopt->domains_rwsem);
+		}
 	}
 
 	if (hwpt_paging->common.domain)
@@ -176,6 +182,12 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 			goto out_abort;
 	}
 
+	if (!hwpt_paging->enforce_cache_coherency) {
+		down_write(&ioas->iopt.domains_rwsem);
+		ioas->iopt.noncoherent_domain_cnt++;
+		up_write(&ioas->iopt.domains_rwsem);
+	}
+
 	rc = iopt_table_add_domain(&ioas->iopt, hwpt->domain);
 	if (rc)
 		goto out_detach;
@@ -183,6 +195,9 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 	return hwpt_paging;
 
 out_detach:
+	down_write(&ioas->iopt.domains_rwsem);
+	ioas->iopt.noncoherent_domain_cnt--;
+	up_write(&ioas->iopt.domains_rwsem);
 	if (immediate_attach)
 		iommufd_hw_pagetable_detach(idev);
 out_abort:
diff --git a/drivers/iommu/iommufd/io_pagetable.h b/drivers/iommu/iommufd/io_pagetable.h
index 0ec3509b7e33..557da8fb83d9 100644
--- a/drivers/iommu/iommufd/io_pagetable.h
+++ b/drivers/iommu/iommufd/io_pagetable.h
@@ -198,6 +198,11 @@ struct iopt_pages {
 	void __user *uptr;
 	bool writable:1;
 	u8 account_mode;
+	/*
+	 * CPU cache flush is required before mapping the pages into, or after
+	 * unmapping them from, a noncoherent domain.
+	 */
+	bool cache_flush_required:1;
 
 	struct xarray pinned_pfns;
 	/* Of iopt_pages_access::node */
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 991f864d1f9b..fc77fd43b232 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -53,6 +53,7 @@ struct io_pagetable {
 	struct rb_root_cached reserved_itree;
 	u8 disable_large_pages;
 	unsigned long iova_alignment;
+	unsigned int noncoherent_domain_cnt;
 };
 
 void iopt_init_table(struct io_pagetable *iopt);
diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
index 528f356238b3..8f4b939cba5b 100644
--- a/drivers/iommu/iommufd/pages.c
+++ b/drivers/iommu/iommufd/pages.c
@@ -272,6 +272,17 @@ struct pfn_batch {
 	unsigned int total_pfns;
 };
 
+static void iopt_cache_flush_pfn_batch(struct pfn_batch *batch)
+{
+	unsigned long cur, i;
+
+	for (cur = 0; cur < batch->end; cur++) {
+		for (i = 0; i < batch->npfns[cur]; i++)
+			arch_clean_nonsnoop_dma(PFN_PHYS(batch->pfns[cur] + i),
+						PAGE_SIZE);
+	}
+}
+
 static void batch_clear(struct pfn_batch *batch)
 {
 	batch->total_pfns = 0;
@@ -637,10 +648,18 @@ static void batch_unpin(struct pfn_batch *batch, struct iopt_pages *pages,
 	while (npages) {
 		size_t to_unpin = min_t(size_t, npages,
 					batch->npfns[cur] - first_page_off);
+		unsigned long pfn = batch->pfns[cur] + first_page_off;
+
+		/*
+		 * Lazily flush CPU caches when a page is about to be
+		 * unpinned if the page was mapped into a noncoherent domain.
+		 */
+		if (pages->cache_flush_required)
+			arch_clean_nonsnoop_dma(pfn << PAGE_SHIFT,
+						to_unpin << PAGE_SHIFT);
 
 		unpin_user_page_range_dirty_lock(
-			pfn_to_page(batch->pfns[cur] + first_page_off),
-			to_unpin, pages->writable);
+			pfn_to_page(pfn), to_unpin, pages->writable);
 		iopt_pages_sub_npinned(pages, to_unpin);
 		cur++;
 		first_page_off = 0;
@@ -1358,10 +1377,17 @@ int iopt_area_fill_domain(struct iopt_area *area, struct iommu_domain *domain)
 {
 	unsigned long done_end_index;
 	struct pfn_reader pfns;
+	bool cache_flush_required;
 	int rc;
 
 	lockdep_assert_held(&area->pages->mutex);
 
+	cache_flush_required = area->iopt->noncoherent_domain_cnt &&
+			       !area->pages->cache_flush_required;
+
+	if (cache_flush_required)
+		area->pages->cache_flush_required = true;
+
 	rc = pfn_reader_first(&pfns, area->pages, iopt_area_index(area),
 			      iopt_area_last_index(area));
 	if (rc)
@@ -1369,6 +1395,9 @@ int iopt_area_fill_domain(struct iopt_area *area, struct iommu_domain *domain)
 
 	while (!pfn_reader_done(&pfns)) {
 		done_end_index = pfns.batch_start_index;
+		if (cache_flush_required)
+			iopt_cache_flush_pfn_batch(&pfns.batch);
+
 		rc = batch_to_domain(&pfns.batch, domain, area,
 				     pfns.batch_start_index);
 		if (rc)
@@ -1413,6 +1442,7 @@ int iopt_area_fill_domains(struct iopt_area *area, struct iopt_pages *pages)
 	unsigned long unmap_index;
 	struct pfn_reader pfns;
 	unsigned long index;
+	bool cache_flush_required;
 	int rc;
 
 	lockdep_assert_held(&area->iopt->domains_rwsem);
@@ -1426,9 +1456,19 @@ int iopt_area_fill_domains(struct iopt_area *area, struct iopt_pages *pages)
 	if (rc)
 		goto out_unlock;
 
+	cache_flush_required = area->iopt->noncoherent_domain_cnt &&
+			       !pages->cache_flush_required;
+
+	if (cache_flush_required)
+		pages->cache_flush_required = true;
+
 	while (!pfn_reader_done(&pfns)) {
 		done_first_end_index = pfns.batch_end_index;
 		done_all_end_index = pfns.batch_start_index;
+
+		if (cache_flush_required)
+			iopt_cache_flush_pfn_batch(&pfns.batch);
+
 		xa_for_each(&area->iopt->domains, index, domain) {
 			rc = batch_to_domain(&pfns.batch, domain, area,
 					     pfns.batch_start_index);
-- 
2.17.1



Thread overview: 78+ messages
2024-05-07  6:18 [PATCH 0/5] Enforce CPU cache flush for non-coherent device assignment Yan Zhao
2024-05-07  6:19 ` [PATCH 1/5] x86/pat: Let pat_pfn_immune_to_uc_mtrr() check MTRR for untracked PAT range Yan Zhao
2024-05-07  8:26   ` Tian, Kevin
2024-05-07  9:12     ` Yan Zhao
2024-05-08 22:14       ` Alex Williamson
2024-05-09  3:36         ` Yan Zhao
2024-05-16  7:42       ` Tian, Kevin
2024-05-16 14:07         ` Sean Christopherson
2024-05-20  2:36           ` Tian, Kevin
2024-05-07  6:20 ` [PATCH 2/5] KVM: x86/mmu: Fine-grained check of whether a invalid & RAM PFN is MMIO Yan Zhao
2024-05-07  8:39   ` Tian, Kevin
2024-05-07  9:19     ` Yan Zhao
2024-05-07  6:20 ` [PATCH 3/5] x86/mm: Introduce and export interface arch_clean_nonsnoop_dma() Yan Zhao
2024-05-07  8:51   ` Tian, Kevin
2024-05-07  9:40     ` Yan Zhao
2024-05-20 14:07   ` Christoph Hellwig
2024-05-21 15:49     ` Jason Gunthorpe
2024-05-21 16:00       ` Jason Gunthorpe
2024-05-22  3:41         ` Yan Zhao
2024-05-28  6:37         ` Christoph Hellwig
2024-06-01 19:46           ` Jason Gunthorpe
2024-06-06  2:48             ` Yan Zhao
2024-06-06 11:55               ` Jason Gunthorpe
2024-06-07  9:39                 ` Yan Zhao
2024-05-07  6:21 ` [PATCH 4/5] vfio/type1: Flush CPU caches on DMA pages in non-coherent domains Yan Zhao
2024-05-09 18:10   ` Alex Williamson
2024-05-10 10:31     ` Yan Zhao
2024-05-10 16:57       ` Alex Williamson
2024-05-13  7:11         ` Yan Zhao
2024-05-16  7:53           ` Tian, Kevin
2024-05-16  8:34           ` Tian, Kevin
2024-05-16 20:31             ` Alex Williamson
2024-05-17 17:11               ` Jason Gunthorpe
2024-05-20  2:52                 ` Tian, Kevin
2024-05-21 16:07                   ` Jason Gunthorpe
2024-05-21 16:21                     ` Alex Williamson
2024-05-21 16:34                       ` Jason Gunthorpe
2024-05-21 18:19                         ` Alex Williamson
2024-05-21 18:37                           ` Jason Gunthorpe
2024-05-22  6:24                             ` Tian, Kevin
2024-05-22 12:29                               ` Jason Gunthorpe
2024-05-22 14:43                                 ` Alex Williamson
2024-05-22 16:52                                   ` Jason Gunthorpe
2024-05-22 18:22                                     ` Alex Williamson
2024-05-22 23:26                                 ` Tian, Kevin
2024-05-22 23:32                                   ` Jason Gunthorpe
2024-05-22 23:40                                     ` Tian, Kevin
2024-05-23 14:58                                       ` Jason Gunthorpe
2024-05-23 22:47                                         ` Alex Williamson
2024-05-24  0:30                                           ` Tian, Kevin
2024-05-24 13:50                                           ` Jason Gunthorpe
2024-05-22  3:33                           ` Yan Zhao
2024-05-22  3:24                         ` Yan Zhao
2024-05-22 12:26                           ` Jason Gunthorpe
2024-05-24  3:07                             ` Yan Zhao
2024-05-16 20:50           ` Alex Williamson
2024-05-17  3:11             ` Yan Zhao
2024-05-17  4:44               ` Alex Williamson
2024-05-17  5:00                 ` Yan Zhao
2024-05-07  6:22 ` Yan Zhao [this message]
2024-05-09 14:13   ` [PATCH 5/5] iommufd: " Jason Gunthorpe
2024-05-10  8:03     ` Yan Zhao
2024-05-10 13:29       ` Jason Gunthorpe
2024-05-13  7:43         ` Yan Zhao
2024-05-14 15:11           ` Jason Gunthorpe
2024-05-15  7:06             ` Yan Zhao
2024-05-15 20:43               ` Jason Gunthorpe
2024-05-16  2:32                 ` Yan Zhao
2024-05-16  8:38                   ` Tian, Kevin
2024-05-16  9:48                     ` Yan Zhao
2024-05-17 17:04                   ` Jason Gunthorpe
2024-05-20  2:45                     ` Yan Zhao
2024-05-21 16:04                       ` Jason Gunthorpe
2024-05-22  3:17                         ` Yan Zhao
2024-05-22  6:29                           ` Yan Zhao
2024-05-22 17:01                             ` Jason Gunthorpe
2024-05-27  7:15                               ` Yan Zhao
2024-06-01 16:48                                 ` Jason Gunthorpe