From: Yan Zhao <yan.y.zhao@intel.com>
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, alex.williamson@redhat.com, jgg@nvidia.com,
	kevin.tian@intel.com
Cc: iommu@lists.linux.dev, pbonzini@redhat.com, seanjc@google.com,
	dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, hpa@zytor.com, corbet@lwn.net, joro@8bytes.org,
	will@kernel.org, robin.murphy@arm.com, baolu.lu@linux.intel.com,
	yi.l.liu@intel.com, Yan Zhao <yan.y.zhao@intel.com>
Subject: [PATCH 0/5] Enforce CPU cache flush for non-coherent device assignment
Date: Tue,  7 May 2024 14:18:02 +0800
Message-ID: <20240507061802.20184-1-yan.y.zhao@intel.com>

This is a follow-up series to address the security risk of non-coherent
device assignment raised by Jason in [1].

When the IOMMU does not enforce cache coherency, devices are allowed to
perform non-coherent DMAs (DMAs that lack CPU cache snooping). This poses
a risk of information leakage when such a device is assigned to a VM.
Specifically, a malicious guest could retrieve stale host data through
non-coherent DMA reads of physical memory while the data initialized by
the host (e.g., zeros) still resides only in the CPU cache.

Furthermore, the host kernel (e.g., a KSM thread) might observe data that
is inconsistent between the CPU cache and physical memory (left behind by
a malicious guest) after a page is unpinned for DMA but before it is
recycled.

Therefore, VFIO/IOMMUFD must mitigate this by flushing the CPU caches for
pages involved in non-coherent DMAs, before such pages are mapped into
the IOMMU and after they are unmapped from it.

The mitigation is not implemented in the DMA API layer, so as to avoid
slowing down DMA API users. Users of the DMA API are expected to take
care of CPU cache flushing in one of two ways: (a) by using a DMA API
that is aware of the non-coherence and performs the flushes internally,
or (b) by being aware of their own flushing needs and handling them
themselves when they override the platform using no-snoop. A general
mitigation in the DMA API layer would only be warranted if non-coherent
DMA were common, which is currently not the case (only Intel GPUs and
some ARM devices).
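
As an illustration of option (a), a driver that uses the streaming DMA
API gets any required cache maintenance performed internally on
non-coherent platforms. The sketch below is not part of this series and
uses only standard DMA API calls; "dev", "buf" and "len" are assumed to
come from the surrounding driver code.

/*
 * Illustration of option (a) above, not part of this series: the
 * streaming DMA API performs whatever cache maintenance the platform
 * needs. "dev", "buf" and "len" are provided by the surrounding driver.
 */
#include <linux/dma-mapping.h>

static int receive_via_dma(struct device *dev, void *buf, size_t len)
{
	dma_addr_t dma = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);

	if (dma_mapping_error(dev, dma))
		return -ENOMEM;

	/* The device DMA-writes into buf here. */

	dma_sync_single_for_cpu(dev, dma, len, DMA_FROM_DEVICE);
	dma_unmap_single(dev, dma, len, DMA_FROM_DEVICE);
	return 0;
}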

The mitigation is also not implemented in the IOMMU core exclusively for
VMs, because the IOMMU core lacks information about the IOVA-to-PFN
relationship, which would force a large IOTLB flush range to be split.

Given that non-coherent devices exist on both x86 and ARM, this series
introduces an arch helper to flush CPU caches for non-coherent DMAs that
is available to both VFIO and IOMMUFD, though currently only the x86
implementation is provided.


Series Layout:

Patch 1 first fixes an error in pat_pfn_immune_to_uc_mtrr(), which always
        reports WB for untracked PAT ranges. This error leads to KVM
        treating all PFNs within these untracked PAT ranges as cacheable
        memory types, even when a PFN's MTRR type is UC (an example is
        the VGA range 0xa0000-0xbffff).
        Patch 3 will use pat_pfn_immune_to_uc_mtrr() to identify
        uncacheable PFNs.

Patch 2 is a side fix in KVM to prevent cacheable guest access to PFNs
        mapped as UC in the host.

Patch 3 introduces and exports an arch helper, arch_clean_nonsnoop_dma(),
        to flush CPU cachelines. It takes a physical address and size as
        inputs, and an implementation is provided for x86.
        Given that executing CLFLUSH on certain MMIO ranges on x86 can be
        problematic, potentially causing machine check exceptions on some
        platforms, while flushing is necessary on some other MMIO ranges
        (e.g., some MMIO ranges for PMEM), this patch determines
        cacheability by consulting the PAT (if enabled) or the MTRR type
        (if PAT is disabled) to assess whether a PFN is considered
        uncacheable by the host. For reserved pages or !pfn_valid() PFNs,
        CLFLUSH is avoided if the PFN is recognized as uncacheable by the
        host.
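
        For orientation, below is a minimal sketch of what such a helper
        might look like on x86. It is not the patch itself, and
        pfn_is_host_uncacheable() is a hypothetical stand-in for the
        PAT/MTRR check described above.

/*
 * Minimal sketch only, not the actual patch. pfn_is_host_uncacheable()
 * is a hypothetical stand-in for the PAT/MTRR-based check described
 * above.
 */
#include <linux/highmem.h>
#include <asm/cacheflush.h>

void arch_clean_nonsnoop_dma(phys_addr_t phys, size_t size)
{
	unsigned long pfn;

	if (!size)
		return;

	for (pfn = PHYS_PFN(phys); pfn <= PHYS_PFN(phys + size - 1); pfn++) {
		void *vaddr;

		/* CLFLUSH on uncacheable MMIO may machine-check on some
		 * platforms, so skip PFNs the host deems uncacheable. */
		if (pfn_is_host_uncacheable(pfn))
			continue;

		/* Flush every cacheline of the page backing this PFN. */
		vaddr = kmap_local_pfn(pfn);
		clflush_cache_range(vaddr, PAGE_SIZE);
		kunmap_local(vaddr);
	}
}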

Patches 4/5 implement the mitigation in VFIO/IOMMUFD to flush CPU caches
         - before a page becomes accessible to non-coherent DMAs, and
         - after the page becomes inaccessible to non-coherent DMAs, right
           before it is unpinned for DMA (see the sketch below).
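
This ordering can be pictured as follows. The two wrapper functions are
invented for illustration and do not appear in the series; the real hook
points in vfio_iommu_type1 and iommufd operate on batches of pages rather
than this simplified single-page form.

/*
 * Sketch of the intended ordering only; the two wrappers below are
 * hypothetical and do not appear in the series. "domain", "iova" and
 * "prot" come from the surrounding vfio/iommufd mapping code.
 */
#include <linux/iommu.h>
#include <linux/io.h>
#include <linux/mm.h>

static int map_one_noncoherent_page(struct iommu_domain *domain,
				    unsigned long iova, struct page *page,
				    int prot)
{
	/* Flush before the page is reachable by non-coherent DMA, so the
	 * device cannot read stale data still shadowed by the CPU cache. */
	arch_clean_nonsnoop_dma(page_to_phys(page), PAGE_SIZE);

	return iommu_map(domain, iova, page_to_phys(page), PAGE_SIZE,
			 prot, GFP_KERNEL);
}

static void unmap_one_noncoherent_page(struct iommu_domain *domain,
				       unsigned long iova, struct page *page)
{
	iommu_unmap(domain, iova, PAGE_SIZE);

	/* Flush after the device can no longer touch the page but before
	 * it is unpinned, so the host (e.g. a KSM thread) never observes a
	 * cache/memory mismatch left behind by the guest. */
	arch_clean_nonsnoop_dma(page_to_phys(page), PAGE_SIZE);
	unpin_user_page(page);
}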


Performance data:

The overhead of flushing CPU caches is measured below:
CPU MHz: 4494.377, 4 vCPUs, 8G guest memory
Pass-through GPU: 1G aperture

Across each VM boot-up and teardown,

IOMMUFD     |     Map        |   Unmap        | Teardown 
------------|----------------|----------------|-------------
w/o clflush | 1167M          |   40M          |  201M
w/  clflush | 2400M (+1233M) |  276M (+236M)  | 1160M (+959M)

Map = total cycles of iommufd_ioas_map() during VM boot up
Unmap = total cycles of iommufd_ioas_unmap() during VM boot up
Teardown = total cycles of iommufd_hwpt_paging_destroy() at VM teardown

VFIO        |     Map        |   Unmap        | Teardown 
------------|----------------|----------------|-------------
w/o clflush | 3058M          |  379M          |  448M
w/  clflush | 5664M (+2606M) | 1653M (+1274M) | 1522M (+1074M)

Map = total cycles of vfio_dma_do_map() during VM boot up
Unmap = total cycles of vfio_dma_do_unmap() during VM boot up
Teardown = total cycles of vfio_iommu_type1_detach_group() at VM teardown

[1] https://lore.kernel.org/lkml/20240109002220.GA439767@nvidia.com

Yan Zhao (5):
  x86/pat: Let pat_pfn_immune_to_uc_mtrr() check MTRR for untracked PAT
    range
  KVM: x86/mmu: Fine-grained check of whether an invalid & RAM PFN is
    MMIO
  x86/mm: Introduce and export interface arch_clean_nonsnoop_dma()
  vfio/type1: Flush CPU caches on DMA pages in non-coherent domains
  iommufd: Flush CPU caches on DMA pages in non-coherent domains

 arch/x86/include/asm/cacheflush.h       |  3 +
 arch/x86/kvm/mmu/spte.c                 | 14 +++-
 arch/x86/mm/pat/memtype.c               | 12 +++-
 arch/x86/mm/pat/set_memory.c            | 88 +++++++++++++++++++++++++
 drivers/iommu/iommufd/hw_pagetable.c    | 19 +++++-
 drivers/iommu/iommufd/io_pagetable.h    |  5 ++
 drivers/iommu/iommufd/iommufd_private.h |  1 +
 drivers/iommu/iommufd/pages.c           | 44 ++++++++++++-
 drivers/vfio/vfio_iommu_type1.c         | 51 ++++++++++++++
 include/linux/cacheflush.h              |  6 ++
 10 files changed, 237 insertions(+), 6 deletions(-)


base-commit: e67572cd2204894179d89bd7b984072f19313b03
-- 
2.17.1

