* [PATCH v3 0/5] iommu/amd: Use 128-bit cmpxchg operation to update DTE
@ 2024-09-06 12:13 Suravee Suthikulpanit
  2024-09-06 12:13 ` [PATCH v3 1/5] iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported Suravee Suthikulpanit
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Suravee Suthikulpanit @ 2024-09-06 12:13 UTC (permalink / raw)
  To: linux-kernel, iommu
  Cc: joro, robin.murphy, vasant.hegde, ubizjak, jgg, jon.grimm,
	santosh.shukla, pandoh, kumaranand, Suravee Suthikulpanit

This series modifies the current implementation to use 128-bit cmpxchg
operations to update the DTE when needed, as specified in the AMD I/O
Virtualization Technology (IOMMU) Specification.

Changes in V3:
  * Patch 2:
    - Consolidate patch 2 and 3
    - Change rw_semaphore to spin_lock
  * Patch 3: Expand locking across 256-bit DTE read and update
  * Patch 4: Fix clear_dte_entry()
  * Patch 5: Consolidate amd_iommu_set_dirty_tracking() and set_dte_irq_entry() fixes

v2: https://lore.kernel.org/lkml/20240829180726.5022-1-suravee.suthikulpanit@amd.com/
v1: https://lore.kernel.org/lkml/e937e26f-038a-6d01-76a9-76c86760ca4a@gmail.com/T/

Thanks,
Suravee

Suravee Suthikulpanit (5):
  iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported
  iommu/amd: Introduce helper functions to access and update 256-bit DTE
  iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers
  iommu/amd: Modify clear_dte_entry() to avoid in-place update
  iommu/amd: Do not update DTE in-place in amd_iommu_set_dirty_tracking
    and set_dte_irq_entry

 drivers/iommu/amd/amd_iommu_types.h |   8 +-
 drivers/iommu/amd/init.c            |  23 +--
 drivers/iommu/amd/iommu.c           | 272 +++++++++++++++++++---------
 3 files changed, 201 insertions(+), 102 deletions(-)

-- 
2.34.1



* [PATCH v3 1/5] iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported
  2024-09-06 12:13 [PATCH v3 0/5] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
@ 2024-09-06 12:13 ` Suravee Suthikulpanit
  2024-09-06 16:38   ` Jason Gunthorpe
  2024-09-06 12:13 ` [PATCH v3 2/5] iommu/amd: Introduce helper functions to access and update 256-bit DTE Suravee Suthikulpanit
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Suravee Suthikulpanit @ 2024-09-06 12:13 UTC (permalink / raw)
  To: linux-kernel, iommu
  Cc: joro, robin.murphy, vasant.hegde, ubizjak, jgg, jon.grimm,
	santosh.shukla, pandoh, kumaranand, Suravee Suthikulpanit

According to the AMD IOMMU spec, the IOMMU reads the entire DTE either
in two 128-bit transactions or a single 256-bit transaction. It is
recommended to update the DTE using a 128-bit operation followed by an
INVALIDATE_DEVTAB_ENTRY command when IV=1b or V=1b.

According to the AMD BIOS and Kernel Developer's Guide (BKDG) dating back
to the Family 10h processors [1], which first introduced the AMD IOMMU,
AMD processors have always had CPUID Fn0000_0001_ECX[CMPXCHG16B]=1.
Therefore, it is safe to assume cmpxchg128 is available on all AMD
processors with an IOMMU.

In addition, the CMPXCHG16B feature has already been checked separately
before enabling the GA, XT, and GAM modes. Consolidate the detection logic,
and fail the IOMMU initialization if the feature is not supported.

[1] https://www.amd.com/content/dam/amd/en/documents/archived-tech-docs/programmer-references/31116.pdf

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/init.c | 23 +++++++++--------------
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 6b15ce09e78d..983c09898a10 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1762,13 +1762,8 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h,
 		else
 			iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET;
 
-		/*
-		 * Note: GA (128-bit IRTE) mode requires cmpxchg16b supports.
-		 * GAM also requires GA mode. Therefore, we need to
-		 * check cmpxchg16b support before enabling it.
-		 */
-		if (!boot_cpu_has(X86_FEATURE_CX16) ||
-		    ((h->efr_attr & (0x1 << IOMMU_FEAT_GASUP_SHIFT)) == 0))
+		/* GAM requires GA mode. */
+		if ((h->efr_attr & (0x1 << IOMMU_FEAT_GASUP_SHIFT)) == 0)
 			amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY;
 		break;
 	case 0x11:
@@ -1778,13 +1773,8 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h,
 		else
 			iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET;
 
-		/*
-		 * Note: GA (128-bit IRTE) mode requires cmpxchg16b supports.
-		 * XT, GAM also requires GA mode. Therefore, we need to
-		 * check cmpxchg16b support before enabling them.
-		 */
-		if (!boot_cpu_has(X86_FEATURE_CX16) ||
-		    ((h->efr_reg & (0x1 << IOMMU_EFR_GASUP_SHIFT)) == 0)) {
+		/* XT and GAM require GA mode. */
+		if ((h->efr_reg & (0x1 << IOMMU_EFR_GASUP_SHIFT)) == 0) {
 			amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY;
 			break;
 		}
@@ -3049,6 +3039,11 @@ static int __init early_amd_iommu_init(void)
 		return -EINVAL;
 	}
 
+	if (!boot_cpu_has(X86_FEATURE_CX16)) {
+		pr_err("Failed to initialize. The CMPXCHG16B feature is required.\n");
+		return -EINVAL;
+	}
+
 	/*
 	 * Validate checksum here so we don't need to do it when
 	 * we actually parse the table
-- 
2.34.1



* [PATCH v3 2/5] iommu/amd: Introduce helper functions to access and update 256-bit DTE
  2024-09-06 12:13 [PATCH v3 0/5] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
  2024-09-06 12:13 ` [PATCH v3 1/5] iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported Suravee Suthikulpanit
@ 2024-09-06 12:13 ` Suravee Suthikulpanit
  2024-09-06 15:53   ` Jacob Pan
                     ` (3 more replies)
  2024-09-06 12:13 ` [PATCH v3 3/5] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers Suravee Suthikulpanit
                   ` (2 subsequent siblings)
  4 siblings, 4 replies; 17+ messages in thread
From: Suravee Suthikulpanit @ 2024-09-06 12:13 UTC (permalink / raw)
  To: linux-kernel, iommu
  Cc: joro, robin.murphy, vasant.hegde, ubizjak, jgg, jon.grimm,
	santosh.shukla, pandoh, kumaranand, Suravee Suthikulpanit

The current implementation does not follow the 128-bit write requirement
for updating the DTE as specified in the AMD I/O Virtualization Technology
(IOMMU) Specification.

Therefore, modify struct dev_table_entry to contain a union of the u64
data array and a u128 data array, and introduce two helper functions:

  * update_dte256() updates the 256-bit DTE using two 128-bit cmpxchg
    operations on the modified structure.

  * get_dte256() copies the 256-bit DTE into the provided structure.

In addition, accesses and updates of the 256-bit DTE need to be locked
to prevent cmpxchg128 failures and data tearing. Therefore, introduce
a per-DTE spinlock, struct iommu_dev_data.dte_lock, to provide
synchronization across get_dte256() and update_dte256().

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/amd_iommu_types.h |  6 ++-
 drivers/iommu/amd/iommu.c           | 80 +++++++++++++++++++++++++----
 2 files changed, 76 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index c9f9a598eb82..1836da2d9e60 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -833,6 +833,7 @@ struct devid_map {
 struct iommu_dev_data {
 	/*Protect against attach/detach races */
 	spinlock_t lock;
+	spinlock_t dte_lock;              /* DTE lock for 256-bit access */
 
 	struct list_head list;		  /* For domain->dev_list */
 	struct llist_node dev_data_list;  /* For global dev_data_list */
@@ -883,7 +884,10 @@ extern struct amd_iommu *amd_iommus[MAX_IOMMUS];
  * Structure defining one entry in the device table
  */
 struct dev_table_entry {
-	u64 data[4];
+	union {
+		u64 data[4];
+		u128 data128[2];
+	};
 };
 
 /*
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 87c5385ce3f2..b994e7837306 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -85,6 +85,45 @@ static void set_dte_entry(struct amd_iommu *iommu,
  *
  ****************************************************************************/
 
+static void update_dte256(struct amd_iommu *iommu, struct iommu_dev_data *dev_data,
+			  struct dev_table_entry *new)
+{
+	struct dev_table_entry *dev_table = get_dev_table(iommu);
+	struct dev_table_entry *ptr = &dev_table[dev_data->devid];
+	struct dev_table_entry old;
+	u128 tmp;
+
+	lockdep_assert_held(&dev_data->dte_lock);
+
+	old.data128[0] = ptr->data128[0];
+	old.data128[1] = ptr->data128[1];
+
+	tmp = cmpxchg128(&ptr->data128[1], old.data128[1], new->data128[1]);
+	if (tmp == old.data128[1]) {
+		if (!try_cmpxchg128(&ptr->data128[0], &old.data128[0], new->data128[0])) {
+			/* Restore hi 128-bit */
+			cmpxchg128(&ptr->data128[1], new->data128[1], tmp);
+			pr_err("%s: Failed. devid=%#x, dte=%016llx:%016llx:%016llx:%016llx\n",
+			       __func__, dev_data->devid, new->data[0], new->data[1],
+			       new->data[2], new->data[3]);
+		}
+	}
+}
+
+static void get_dte256(struct amd_iommu *iommu, struct iommu_dev_data *dev_data,
+		      struct dev_table_entry *dte)
+{
+	struct dev_table_entry *ptr;
+	struct dev_table_entry *dev_table = get_dev_table(iommu);
+
+	lockdep_assert_held(&dev_data->dte_lock);
+
+	ptr = &dev_table[dev_data->devid];
+
+	dte->data128[0] = ptr->data128[0];
+	dte->data128[1] = ptr->data128[1];
+}
+
 static inline bool pdom_is_v2_pgtbl_mode(struct protection_domain *pdom)
 {
 	return (pdom && (pdom->pd_mode == PD_MODE_V2));
@@ -205,6 +244,7 @@ static struct iommu_dev_data *alloc_dev_data(struct amd_iommu *iommu, u16 devid)
 		return NULL;
 
 	spin_lock_init(&dev_data->lock);
+	spin_lock_init(&dev_data->dte_lock);
 	dev_data->devid = devid;
 	ratelimit_default_init(&dev_data->rs);
 
@@ -232,9 +272,11 @@ static struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid
 
 static int clone_alias(struct pci_dev *pdev, u16 alias, void *data)
 {
+	struct dev_table_entry dte;
 	struct amd_iommu *iommu;
-	struct dev_table_entry *dev_table;
+	struct iommu_dev_data *dev_data, *alias_data;
 	u16 devid = pci_dev_id(pdev);
+	int ret;
 
 	if (devid == alias)
 		return 0;
@@ -243,13 +285,28 @@ static int clone_alias(struct pci_dev *pdev, u16 alias, void *data)
 	if (!iommu)
 		return 0;
 
-	amd_iommu_set_rlookup_table(iommu, alias);
-	dev_table = get_dev_table(iommu);
-	memcpy(dev_table[alias].data,
-	       dev_table[devid].data,
-	       sizeof(dev_table[alias].data));
+	/* Get DTE for pdev */
+	dev_data = dev_iommu_priv_get(&pdev->dev);
+	if (!dev_data)
+		return -EINVAL;
 
-	return 0;
+	spin_lock(&dev_data->dte_lock);
+	get_dte256(iommu, dev_data, &dte);
+	spin_unlock(&dev_data->dte_lock);
+
+	/* Setup for alias */
+	alias_data = search_dev_data(iommu, alias);
+	if (!alias_data) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	spin_lock(&alias_data->dte_lock);
+	update_dte256(iommu, alias_data, &dte);
+	amd_iommu_set_rlookup_table(iommu, alias);
+	spin_unlock(&alias_data->dte_lock);
+out:
+	return ret;
 }
 
 static void clone_aliases(struct amd_iommu *iommu, struct device *dev)
@@ -583,10 +640,15 @@ static void amd_iommu_uninit_device(struct device *dev)
 static void dump_dte_entry(struct amd_iommu *iommu, u16 devid)
 {
 	int i;
-	struct dev_table_entry *dev_table = get_dev_table(iommu);
+	struct dev_table_entry dte;
+	struct iommu_dev_data *dev_data = find_dev_data(iommu, devid);
+
+	spin_lock(&dev_data->dte_lock);
+	get_dte256(iommu, dev_data, &dte);
+	spin_unlock(&dev_data->dte_lock);
 
 	for (i = 0; i < 4; ++i)
-		pr_err("DTE[%d]: %016llx\n", i, dev_table[devid].data[i]);
+		pr_err("DTE[%d]: %016llx\n", i, dte.data[i]);
 }
 
 static void dump_command(unsigned long phys_addr)
-- 
2.34.1



* [PATCH v3 3/5] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers
  2024-09-06 12:13 [PATCH v3 0/5] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
  2024-09-06 12:13 ` [PATCH v3 1/5] iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported Suravee Suthikulpanit
  2024-09-06 12:13 ` [PATCH v3 2/5] iommu/amd: Introduce helper functions to access and update 256-bit DTE Suravee Suthikulpanit
@ 2024-09-06 12:13 ` Suravee Suthikulpanit
  2024-09-06 12:13 ` [PATCH v3 4/5] iommu/amd: Modify clear_dte_entry() to avoid in-place update Suravee Suthikulpanit
  2024-09-06 12:13 ` [PATCH v3 5/5] iommu/amd: Do not update DTE in-place in amd_iommu_set_dirty_tracking and set_dte_irq_entry Suravee Suthikulpanit
  4 siblings, 0 replies; 17+ messages in thread
From: Suravee Suthikulpanit @ 2024-09-06 12:13 UTC (permalink / raw)
  To: linux-kernel, iommu
  Cc: joro, robin.murphy, vasant.hegde, ubizjak, jgg, jon.grimm,
	santosh.shukla, pandoh, kumaranand, Suravee Suthikulpanit

Modify set_dte_entry() to build the DTE in a temporary struct and program
it with the 256-bit DTE helpers while holding the per-device DTE lock.

Also, set_dte_entry() is used to program several DTE fields (e.g. stage1
table, stage2 table, domain ID, etc.), which are difficult to keep track
of with the current implementation. Therefore, separate the logic for
setting up the GCR3 Table Root Pointer, GIOV, GV, GLX, and GuestPagingMode
into another helper function, set_dte_gcr3_table().

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/iommu.c | 123 ++++++++++++++++++++++----------------
 1 file changed, 70 insertions(+), 53 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index b994e7837306..f18ae6c077f4 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1913,16 +1913,56 @@ int amd_iommu_clear_gcr3(struct iommu_dev_data *dev_data, ioasid_t pasid)
 	return ret;
 }
 
+static void set_dte_gcr3_table(struct amd_iommu *iommu,
+			       struct iommu_dev_data *dev_data,
+			       struct dev_table_entry *target)
+{
+	struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
+	u64 tmp, gcr3;
+
+	if (!gcr3_info->gcr3_tbl)
+		return;
+
+	pr_debug("%s: devid=%#x, glx=%#x, gcr3_tbl=%#llx\n",
+		 __func__, dev_data->devid, gcr3_info->glx,
+		 (unsigned long long)gcr3_info->gcr3_tbl);
+
+	tmp = gcr3_info->glx;
+	target->data[0] |= (tmp & DTE_GLX_MASK) << DTE_GLX_SHIFT;
+	if (pdom_is_v2_pgtbl_mode(dev_data->domain))
+		target->data[0] |= DTE_FLAG_GIOV;
+	target->data[0] |= DTE_FLAG_GV;
+
+	/* First mask out possible old values for GCR3 table */
+	tmp = DTE_GCR3_VAL_A(~0ULL) << DTE_GCR3_SHIFT_A;
+	target->data[0] &= ~tmp;
+	tmp = DTE_GCR3_VAL_B(~0ULL) << DTE_GCR3_SHIFT_B;
+	tmp |= DTE_GCR3_VAL_C(~0ULL) << DTE_GCR3_SHIFT_C;
+	target->data[1] &= ~tmp;
+
+	gcr3 = iommu_virt_to_phys(gcr3_info->gcr3_tbl);
+
+	/* Encode GCR3 table into DTE */
+	tmp = DTE_GCR3_VAL_A(gcr3) << DTE_GCR3_SHIFT_A;
+	target->data[0] |= tmp;
+	tmp = DTE_GCR3_VAL_B(gcr3) << DTE_GCR3_SHIFT_B;
+	tmp |= DTE_GCR3_VAL_C(gcr3) << DTE_GCR3_SHIFT_C;
+	target->data[1] |= tmp;
+
+	/* Mask out old values for GuestPagingMode */
+	target->data[2] &= ~(0x3ULL << DTE_GPT_LEVEL_SHIFT);
+	/* Guest page table can only support 4 and 5 levels  */
+	if (amd_iommu_gpt_level == PAGE_MODE_5_LEVEL)
+		target->data[2] |= ((u64)GUEST_PGTABLE_5_LEVEL << DTE_GPT_LEVEL_SHIFT);
+}
+
 static void set_dte_entry(struct amd_iommu *iommu,
 			  struct iommu_dev_data *dev_data)
 {
-	u64 pte_root = 0;
-	u64 flags = 0;
-	u32 old_domid;
-	u16 devid = dev_data->devid;
 	u16 domid;
+	u32 old_domid;
+	struct dev_table_entry new;
 	struct protection_domain *domain = dev_data->domain;
-	struct dev_table_entry *dev_table = get_dev_table(iommu);
 	struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
 
 	if (gcr3_info && gcr3_info->gcr3_tbl)
@@ -1930,73 +1970,50 @@ static void set_dte_entry(struct amd_iommu *iommu,
 	else
 		domid = domain->id;
 
+
+	spin_lock(&dev_data->dte_lock);
+	/*
+	 * Need to preserve the certain fields in DTE because it contains
+	 * interrupt-remapping and other settings, which might be
+	 * programmed earlier by other code.
+	 */
+	get_dte256(iommu, dev_data, &new);
+
 	if (domain->iop.mode != PAGE_MODE_NONE)
-		pte_root = iommu_virt_to_phys(domain->iop.root);
+		new.data[0] = iommu_virt_to_phys(domain->iop.root);
 
-	pte_root |= (domain->iop.mode & DEV_ENTRY_MODE_MASK)
+	new.data[0] |= (domain->iop.mode & DEV_ENTRY_MODE_MASK)
 		    << DEV_ENTRY_MODE_SHIFT;
 
-	pte_root |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_V;
+	new.data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_V;
 
 	/*
 	 * When SNP is enabled, Only set TV bit when IOMMU
 	 * page translation is in use.
 	 */
 	if (!amd_iommu_snp_en || (domid != 0))
-		pte_root |= DTE_FLAG_TV;
-
-	flags = dev_table[devid].data[1];
-
-	if (dev_data->ats_enabled)
-		flags |= DTE_FLAG_IOTLB;
+		new.data[0] |= DTE_FLAG_TV;
 
 	if (dev_data->ppr)
-		pte_root |= 1ULL << DEV_ENTRY_PPR;
+		new.data[0] |= 1ULL << DEV_ENTRY_PPR;
 
 	if (domain->dirty_tracking)
-		pte_root |= DTE_FLAG_HAD;
-
-	if (gcr3_info && gcr3_info->gcr3_tbl) {
-		u64 gcr3 = iommu_virt_to_phys(gcr3_info->gcr3_tbl);
-		u64 glx  = gcr3_info->glx;
-		u64 tmp;
-
-		pte_root |= DTE_FLAG_GV;
-		pte_root |= (glx & DTE_GLX_MASK) << DTE_GLX_SHIFT;
-
-		/* First mask out possible old values for GCR3 table */
-		tmp = DTE_GCR3_VAL_B(~0ULL) << DTE_GCR3_SHIFT_B;
-		flags    &= ~tmp;
+		new.data[0] |= DTE_FLAG_HAD;
 
-		tmp = DTE_GCR3_VAL_C(~0ULL) << DTE_GCR3_SHIFT_C;
-		flags    &= ~tmp;
-
-		/* Encode GCR3 table into DTE */
-		tmp = DTE_GCR3_VAL_A(gcr3) << DTE_GCR3_SHIFT_A;
-		pte_root |= tmp;
-
-		tmp = DTE_GCR3_VAL_B(gcr3) << DTE_GCR3_SHIFT_B;
-		flags    |= tmp;
-
-		tmp = DTE_GCR3_VAL_C(gcr3) << DTE_GCR3_SHIFT_C;
-		flags    |= tmp;
+	if (dev_data->ats_enabled)
+		new.data[1] |= DTE_FLAG_IOTLB;
+	else
+		new.data[1] &= ~DTE_FLAG_IOTLB;
 
-		if (amd_iommu_gpt_level == PAGE_MODE_5_LEVEL) {
-			dev_table[devid].data[2] |=
-				((u64)GUEST_PGTABLE_5_LEVEL << DTE_GPT_LEVEL_SHIFT);
-		}
+	old_domid = new.data[1] & DEV_DOMID_MASK;
+	new.data[1] &= ~DEV_DOMID_MASK;
+	new.data[1] |= domid;
 
-		/* GIOV is supported with V2 page table mode only */
-		if (pdom_is_v2_pgtbl_mode(domain))
-			pte_root |= DTE_FLAG_GIOV;
-	}
+	set_dte_gcr3_table(iommu, dev_data, &new);
 
-	flags &= ~DEV_DOMID_MASK;
-	flags |= domid;
+	update_dte256(iommu, dev_data, &new);
 
-	old_domid = dev_table[devid].data[1] & DEV_DOMID_MASK;
-	dev_table[devid].data[1]  = flags;
-	dev_table[devid].data[0]  = pte_root;
+	spin_unlock(&dev_data->dte_lock);
 
 	/*
 	 * A kdump kernel might be replacing a domain ID that was copied from
-- 
2.34.1



* [PATCH v3 4/5] iommu/amd: Modify clear_dte_entry() to avoid in-place update
  2024-09-06 12:13 [PATCH v3 0/5] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
                   ` (2 preceding siblings ...)
  2024-09-06 12:13 ` [PATCH v3 3/5] iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers Suravee Suthikulpanit
@ 2024-09-06 12:13 ` Suravee Suthikulpanit
  2024-09-06 18:07   ` Jason Gunthorpe
  2024-09-06 12:13 ` [PATCH v3 5/5] iommu/amd: Do not update DTE in-place in amd_iommu_set_dirty_tracking and set_dte_irq_entry Suravee Suthikulpanit
  4 siblings, 1 reply; 17+ messages in thread
From: Suravee Suthikulpanit @ 2024-09-06 12:13 UTC (permalink / raw)
  To: linux-kernel, iommu
  Cc: joro, robin.murphy, vasant.hegde, ubizjak, jgg, jon.grimm,
	santosh.shukla, pandoh, kumaranand, Suravee Suthikulpanit

Lock the DTE and copy its value to temporary storage before updating it
using cmpxchg128.

Also, refactor the function to simplify the logic for applying erratum 63.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/amd_iommu_types.h |  2 ++
 drivers/iommu/amd/iommu.c           | 27 ++++++++++++++++++++-------
 2 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 1836da2d9e60..81a994471a30 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -425,6 +425,8 @@
 
 #define DTE_GPT_LEVEL_SHIFT	54
 
+#define DTE_SYSMGT_MASK		GENMASK_ULL(41, 40)
+
 #define GCR3_VALID		0x01ULL
 
 #define IOMMU_PAGE_MASK (((1ULL << 52) - 1) & ~0xfffULL)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index f18ae6c077f4..15eb816d4313 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2025,19 +2025,32 @@ static void set_dte_entry(struct amd_iommu *iommu,
 	}
 }
 
-static void clear_dte_entry(struct amd_iommu *iommu, u16 devid)
+static void clear_dte_entry(struct amd_iommu *iommu, struct iommu_dev_data *dev_data)
 {
-	struct dev_table_entry *dev_table = get_dev_table(iommu);
+	struct dev_table_entry new;
+	struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
+
+	/*
+	 * Need to preserve DTE[96:106] because certain fields are
+	 * programmed using value in IVRS table from early init phase.
+	 */
+	spin_lock(&dev_data->dte_lock);
+	get_dte256(iommu, dev_data, &new);
 
 	/* remove entry from the device table seen by the hardware */
-	dev_table[devid].data[0]  = DTE_FLAG_V;
+	new.data[0] = DTE_FLAG_V;
 
 	if (!amd_iommu_snp_en)
-		dev_table[devid].data[0] |= DTE_FLAG_TV;
+		new.data[0] |= DTE_FLAG_TV;
 
-	dev_table[devid].data[1] &= DTE_FLAG_MASK;
+	new.data[1] &= DTE_FLAG_MASK;
 
-	amd_iommu_apply_erratum_63(iommu, devid);
+	/* Apply erratum 63 */
+	if (FIELD_GET(DTE_SYSMGT_MASK, new.data[1]) == 0x01)
+		new.data[0] |= BIT_ULL(DEV_ENTRY_IW);
+
+	WARN_ON(!try_cmpxchg128(&dte->data128[0], &dte->data128[0], new.data128[0]));
+	spin_unlock(&dev_data->dte_lock);
 }
 
 /* Update and flush DTE for the given device */
@@ -2048,7 +2061,7 @@ void amd_iommu_dev_update_dte(struct iommu_dev_data *dev_data, bool set)
 	if (set)
 		set_dte_entry(iommu, dev_data);
 	else
-		clear_dte_entry(iommu, dev_data->devid);
+		clear_dte_entry(iommu, dev_data);
 
 	clone_aliases(iommu, dev_data->dev);
 	device_flush_dte(dev_data);
-- 
2.34.1



* [PATCH v3 5/5] iommu/amd: Do not update DTE in-place in amd_iommu_set_dirty_tracking and set_dte_irq_entry
  2024-09-06 12:13 [PATCH v3 0/5] iommu/amd: Use 128-bit cmpxchg operation to update DTE Suravee Suthikulpanit
                   ` (3 preceding siblings ...)
  2024-09-06 12:13 ` [PATCH v3 4/5] iommu/amd: Modify clear_dte_entry() to avoid in-place update Suravee Suthikulpanit
@ 2024-09-06 12:13 ` Suravee Suthikulpanit
  4 siblings, 0 replies; 17+ messages in thread
From: Suravee Suthikulpanit @ 2024-09-06 12:13 UTC (permalink / raw)
  To: linux-kernel, iommu
  Cc: joro, robin.murphy, vasant.hegde, ubizjak, jgg, jon.grimm,
	santosh.shukla, pandoh, kumaranand, Suravee Suthikulpanit

Updating the DTE in place could result in data tearing, since the AMD IOMMU
hardware reads the entire DTE either in two 128-bit transactions or a
single 256-bit transaction.

When updating only a single 64-bit word of the DTE, it is sufficient to use
WRITE_ONCE() because it is writing to memory that is read back by the
hardware.

Also, lock the DTE before updating it.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 drivers/iommu/amd/iommu.c | 42 ++++++++++++++++++++++-----------------
 1 file changed, 24 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 15eb816d4313..e9cfe9238e83 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2753,12 +2753,12 @@ static int amd_iommu_set_dirty_tracking(struct iommu_domain *domain,
 					bool enable)
 {
 	struct protection_domain *pdomain = to_pdomain(domain);
-	struct dev_table_entry *dev_table;
+	struct dev_table_entry *dte;
 	struct iommu_dev_data *dev_data;
 	bool domain_flush = false;
 	struct amd_iommu *iommu;
 	unsigned long flags;
-	u64 pte_root;
+	u64 new;
 
 	spin_lock_irqsave(&pdomain->lock, flags);
 	if (!(pdomain->dirty_tracking ^ enable)) {
@@ -2767,16 +2767,15 @@ static int amd_iommu_set_dirty_tracking(struct iommu_domain *domain,
 	}
 
 	list_for_each_entry(dev_data, &pdomain->dev_list, list) {
+		spin_lock(&dev_data->dte_lock);
 		iommu = get_amd_iommu_from_dev_data(dev_data);
-
-		dev_table = get_dev_table(iommu);
-		pte_root = dev_table[dev_data->devid].data[0];
-
-		pte_root = (enable ? pte_root | DTE_FLAG_HAD :
-				     pte_root & ~DTE_FLAG_HAD);
+		dte = &get_dev_table(iommu)[dev_data->devid];
+		new = dte->data[0];
+		new = (enable ? new | DTE_FLAG_HAD : new & ~DTE_FLAG_HAD);
+		WRITE_ONCE(dte->data[0], new);
+		spin_unlock(&dev_data->dte_lock);
 
 		/* Flush device DTE */
-		dev_table[dev_data->devid].data[0] = pte_root;
 		device_flush_dte(dev_data);
 		domain_flush = true;
 	}
@@ -3042,17 +3041,24 @@ static void iommu_flush_irt_and_complete(struct amd_iommu *iommu, u16 devid)
 static void set_dte_irq_entry(struct amd_iommu *iommu, u16 devid,
 			      struct irq_remap_table *table)
 {
-	u64 dte;
-	struct dev_table_entry *dev_table = get_dev_table(iommu);
+	u64 new;
+	struct dev_table_entry *dte = &get_dev_table(iommu)[devid];
+	struct iommu_dev_data *dev_data = search_dev_data(iommu, devid);
+
+	if (dev_data)
+		spin_lock(&dev_data->dte_lock);
+
+	new = dte->data[2];
+	new &= ~DTE_IRQ_PHYS_ADDR_MASK;
+	new |= iommu_virt_to_phys(table->table);
+	new |= DTE_IRQ_REMAP_INTCTL;
+	new |= DTE_INTTABLEN;
+	new |= DTE_IRQ_REMAP_ENABLE;
 
-	dte	= dev_table[devid].data[2];
-	dte	&= ~DTE_IRQ_PHYS_ADDR_MASK;
-	dte	|= iommu_virt_to_phys(table->table);
-	dte	|= DTE_IRQ_REMAP_INTCTL;
-	dte	|= DTE_INTTABLEN;
-	dte	|= DTE_IRQ_REMAP_ENABLE;
+	WRITE_ONCE(dte->data[2], new);
 
-	dev_table[devid].data[2] = dte;
+	if (dev_data)
+		spin_unlock(&dev_data->dte_lock);
 }
 
 static struct irq_remap_table *get_irq_table(struct amd_iommu *iommu, u16 devid)
-- 
2.34.1



* Re: [PATCH v3 2/5] iommu/amd: Introduce helper functions to access and update 256-bit DTE
  2024-09-06 12:13 ` [PATCH v3 2/5] iommu/amd: Introduce helper functions to access and update 256-bit DTE Suravee Suthikulpanit
@ 2024-09-06 15:53   ` Jacob Pan
  2024-09-06 17:00   ` Jason Gunthorpe
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Jacob Pan @ 2024-09-06 15:53 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, ubizjak,
	jgg, jon.grimm, santosh.shukla, pandoh, kumaranand

Hi Suravee,

On Fri, 6 Sep 2024 12:13:05 +0000
Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> wrote:

> The current implementation does not follow 128-bit write requirement
> to update DTE as specified in the AMD I/O Virtualization Techonology
> (IOMMU) Specification.
> 
> Therefore, modify the struct dev_table_entry to contain union of u128
> data array, and introduce two helper functions:
> 
>   * update_dte256() to update DTE using two 128-bit cmpxchg
>     operations to update 256-bit DTE with the modified structure.
> 
> >   * get_dte256() to copy 256-bit DTE to the provided structure.
> 
> In addition, when access/update 256-bit DTE, it needs to be locked
> to prevent cmpxchg128 failure and data tearing. Therefore, introduce
> a per-DTE spin_lock struct dev_data.dte_lock to provide
> synchronization across get_dte256() and update_dte256().
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
>  drivers/iommu/amd/amd_iommu_types.h |  6 ++-
>  drivers/iommu/amd/iommu.c           | 80
> +++++++++++++++++++++++++---- 2 files changed, 76 insertions(+), 10
> deletions(-)
> 
> diff --git a/drivers/iommu/amd/amd_iommu_types.h
> b/drivers/iommu/amd/amd_iommu_types.h index
> c9f9a598eb82..1836da2d9e60 100644 ---
> a/drivers/iommu/amd/amd_iommu_types.h +++
> b/drivers/iommu/amd/amd_iommu_types.h @@ -833,6 +833,7 @@ struct
> devid_map { struct iommu_dev_data {
>  	/*Protect against attach/detach races */
>  	spinlock_t lock;
> +	spinlock_t dte_lock;              /* DTE lock for 256-bit
> access */ 
>  	struct list_head list;		  /* For
> domain->dev_list */ struct llist_node dev_data_list;  /* For global
> dev_data_list */ @@ -883,7 +884,10 @@ extern struct amd_iommu
> *amd_iommus[MAX_IOMMUS];
>   * Structure defining one entry in the device table
>   */
>  struct dev_table_entry {
> -	u64 data[4];
> +	union {
> +		u64 data[4];
> +		u128 data128[2];
> +	};
>  };
>  
>  /*
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 87c5385ce3f2..b994e7837306 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -85,6 +85,45 @@ static void set_dte_entry(struct amd_iommu *iommu,
>   *
>   ****************************************************************************/
>  
> +static void update_dte256(struct amd_iommu *iommu, struct
> iommu_dev_data *dev_data,
> +			  struct dev_table_entry *new)
> +{
> +	struct dev_table_entry *dev_table = get_dev_table(iommu);
> +	struct dev_table_entry *ptr = &dev_table[dev_data->devid];
> +	struct dev_table_entry old;
> +	u128 tmp;
> +
> +	lockdep_assert_held(&dev_data->dte_lock);
> +
> +	old.data128[0] = ptr->data128[0];
> +	old.data128[1] = ptr->data128[1];
> +
> +	tmp = cmpxchg128(&ptr->data128[1], old.data128[1],
> new->data128[1]);
> +	if (tmp == old.data128[1]) {
If you are able to deal with the failure, why not use try_cmpxchg128
for the hi 128 bit also? It is more efficient.

> +		if (!try_cmpxchg128(&ptr->data128[0],
> &old.data128[0], new->data128[0])) {
> +			/* Restore hi 128-bit */
> +			cmpxchg128(&ptr->data128[1],
> new->data128[1], tmp);
> +			pr_err("%s: Failed. devid=%#x,
> dte=%016llx:%016llx:%016llx:%016llx\n",
> +			       __func__, dev_data->devid,
> new->data[0], new->data[1],
> +			       new->data[2], new->data[3]);
> +		}
> +	}
> +}
> +
> +static void get_dte256(struct amd_iommu *iommu, struct
> iommu_dev_data *dev_data,
> +		      struct dev_table_entry *dte)
> +{
> +	struct dev_table_entry *ptr;
> +	struct dev_table_entry *dev_table = get_dev_table(iommu);
> +
> +	lockdep_assert_held(&dev_data->dte_lock);
> +
> +	ptr = &dev_table[dev_data->devid];
> +
> +	dte->data128[0] = ptr->data128[0];
> +	dte->data128[1] = ptr->data128[1];
> +}
> +
>  static inline bool pdom_is_v2_pgtbl_mode(struct protection_domain
> *pdom) {
>  	return (pdom && (pdom->pd_mode == PD_MODE_V2));
> @@ -205,6 +244,7 @@ static struct iommu_dev_data
> *alloc_dev_data(struct amd_iommu *iommu, u16 devid) return NULL;
>  
>  	spin_lock_init(&dev_data->lock);
> +	spin_lock_init(&dev_data->dte_lock);
>  	dev_data->devid = devid;
>  	ratelimit_default_init(&dev_data->rs);
>  
> @@ -232,9 +272,11 @@ static struct iommu_dev_data
> *search_dev_data(struct amd_iommu *iommu, u16 devid 
>  static int clone_alias(struct pci_dev *pdev, u16 alias, void *data)
>  {
> +	struct dev_table_entry dte;
>  	struct amd_iommu *iommu;
> -	struct dev_table_entry *dev_table;
> +	struct iommu_dev_data *dev_data, *alias_data;
>  	u16 devid = pci_dev_id(pdev);
> +	int ret;
>  
>  	if (devid == alias)
>  		return 0;
> @@ -243,13 +285,28 @@ static int clone_alias(struct pci_dev *pdev,
> u16 alias, void *data) if (!iommu)
>  		return 0;
>  
> -	amd_iommu_set_rlookup_table(iommu, alias);
> -	dev_table = get_dev_table(iommu);
> -	memcpy(dev_table[alias].data,
> -	       dev_table[devid].data,
> -	       sizeof(dev_table[alias].data));
> +	/* Get DTE for pdev */
> +	dev_data = dev_iommu_priv_get(&pdev->dev);
> +	if (!dev_data)
> +		return -EINVAL;
>  
> -	return 0;
> +	spin_lock(&dev_data->dte_lock);
> +	get_dte256(iommu, dev_data, &dte);
> +	spin_unlock(&dev_data->dte_lock);
> +
> +	/* Setup for alias */
> +	alias_data = search_dev_data(iommu, alias);
> +	if (!alias_data) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	spin_lock(&alias_data->dte_lock);
> +	update_dte256(iommu, alias_data, &dte);
> +	amd_iommu_set_rlookup_table(iommu, alias);
> +	spin_unlock(&alias_data->dte_lock);
> +out:
> +	return ret;
>  }
>  
>  static void clone_aliases(struct amd_iommu *iommu, struct device
> *dev) @@ -583,10 +640,15 @@ static void
> amd_iommu_uninit_device(struct device *dev) static void
> dump_dte_entry(struct amd_iommu *iommu, u16 devid) {
>  	int i;
> -	struct dev_table_entry *dev_table = get_dev_table(iommu);
> +	struct dev_table_entry dte;
> +	struct iommu_dev_data *dev_data = find_dev_data(iommu,
> devid); +
> +	spin_lock(&dev_data->dte_lock);
> +	get_dte256(iommu, dev_data, &dte);
> +	spin_unlock(&dev_data->dte_lock);
>  
>  	for (i = 0; i < 4; ++i)
> -		pr_err("DTE[%d]: %016llx\n", i,
> dev_table[devid].data[i]);
> +		pr_err("DTE[%d]: %016llx\n", i, dte.data[i]);
>  }
>  
>  static void dump_command(unsigned long phys_addr)



* Re: [PATCH v3 1/5] iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported
  2024-09-06 12:13 ` [PATCH v3 1/5] iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported Suravee Suthikulpanit
@ 2024-09-06 16:38   ` Jason Gunthorpe
  2024-09-09 15:16     ` Jason Gunthorpe
  2024-09-16 16:11     ` Suthikulpanit, Suravee
  0 siblings, 2 replies; 17+ messages in thread
From: Jason Gunthorpe @ 2024-09-06 16:38 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, ubizjak,
	jon.grimm, santosh.shukla, pandoh, kumaranand

On Fri, Sep 06, 2024 at 12:13:04PM +0000, Suravee Suthikulpanit wrote:
> According to the AMD IOMMU spec, the IOMMU reads the entire DTE either
> in two 128-bit transactions or a single 256-bit transaction. 

.. if two 128-bit transaction on the read side is possible then you
need flushing! :(

For instance this:

  IOMMU         CPU
Read [0]    
              Write [0]
              Write [1]
Read [1]

Will result in the iommu seeing torn incorrect data - the Guest paging
mode may not match the page table pointer, or the VIOMMU data may
become mismatched to the host translation.

Avoiding flushing is only possible if the full 256 bits are read
atomically.

> It is recommended to update DTE using 128-bit operation followed by
> an INVALIDATE_DEVTAB_ENTRY command when the IV=1b or V=1b.

This advice only works when going from non-valid to valid.

> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
>  drivers/iommu/amd/init.c | 23 +++++++++--------------
>  1 file changed, 9 insertions(+), 14 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason


* Re: [PATCH v3 2/5] iommu/amd: Introduce helper functions to access and update 256-bit DTE
  2024-09-06 12:13 ` [PATCH v3 2/5] iommu/amd: Introduce helper functions to access and update 256-bit DTE Suravee Suthikulpanit
  2024-09-06 15:53   ` Jacob Pan
@ 2024-09-06 17:00   ` Jason Gunthorpe
  2024-09-16 16:12     ` Suthikulpanit, Suravee
       [not found]   ` <66db2589.170a0220.6f57.d691SMTPIN_ADDED_BROKEN@mx.google.com>
  2024-09-07 13:36   ` kernel test robot
  3 siblings, 1 reply; 17+ messages in thread
From: Jason Gunthorpe @ 2024-09-06 17:00 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, ubizjak,
	jon.grimm, santosh.shukla, pandoh, kumaranand

On Fri, Sep 06, 2024 at 12:13:05PM +0000, Suravee Suthikulpanit wrote:

> +static void update_dte256(struct amd_iommu *iommu, struct iommu_dev_data *dev_data,
> +			  struct dev_table_entry *new)
> +{
> +	struct dev_table_entry *dev_table = get_dev_table(iommu);
> +	struct dev_table_entry *ptr = &dev_table[dev_data->devid];
> +	struct dev_table_entry old;
> +	u128 tmp;
> +
> +	lockdep_assert_held(&dev_data->dte_lock);
> +
> +	old.data128[0] = ptr->data128[0];
> +	old.data128[1] = ptr->data128[1];
> +
> +	tmp = cmpxchg128(&ptr->data128[1], old.data128[1], new->data128[1]);
> +	if (tmp == old.data128[1]) {
> +		if (!try_cmpxchg128(&ptr->data128[0], &old.data128[0], new->data128[0])) {
> +			/* Restore hi 128-bit */
> +			cmpxchg128(&ptr->data128[1], new->data128[1], tmp);

Like I said before, you can't fix this. Just go forwards. Keeping an
old DTE will UAF things, that is much worse than just forcing a new
DTE and losing some interrupt settings.

> @@ -243,13 +285,28 @@ static int clone_alias(struct pci_dev *pdev, u16 alias, void *data)
>  	if (!iommu)
>  		return 0;
>  
> -	amd_iommu_set_rlookup_table(iommu, alias);
> -	dev_table = get_dev_table(iommu);
> -	memcpy(dev_table[alias].data,
> -	       dev_table[devid].data,
> -	       sizeof(dev_table[alias].data));
> +	/* Get DTE for pdev */
> +	dev_data = dev_iommu_priv_get(&pdev->dev);
> +	if (!dev_data)
> +		return -EINVAL;
>  
> -	return 0;
> +	spin_lock(&dev_data->dte_lock);
> +	get_dte256(iommu, dev_data, &dte);
> +	spin_unlock(&dev_data->dte_lock);

You can't unlock after reading the DTE and then relock it to program
it. The interrupt data can have changed while unlocked.

Put the lock inside update_dte256() and force the interrupt bits
under the lock.

Something like this is what I'm expecting:

static void write_upper(struct dev_table_entry *ptr, struct dev_table_entry *new)
{
	struct dev_table_entry old;

	lockdep_assert_held(&dev_data->dte_lock);

	do {
		old->data128[1] = ptr->data128[1];
		new->data64[2] &= ~INTR_MASK;
		new->data64[2] |= old->data64[2] & INTR_MASK;
	} while (!try_cmpxchg128(&ptr->data128[1], &old.data128[1],
				 new->data128[1]));
}

static void write_lower(struct dev_table_entry *ptr, struct dev_table_entry *new)
{
	struct dev_table_entry old;

	do {
		old->data128[0] = ptr->data128[0];
	} while (!try_cmpxchg128(&ptr->data128[0], &old.data128[0],
				 new->data128[0]));
}

static void update_dte256(struct amd_iommu *iommu,
			  struct iommu_dev_data *dev_data,
			  struct dev_table_entry *new)
{

	spin_lock(&dev_data->dte_lock);
	if (!(ptr->data64[0] & V)) {
		write_upper(ptr, new);
		/* NO FLUSH assume V=0 never cached */
		write_lower(ptr, new);
		/* FLUSH */
	} else if (!(new->data64[0] & V) {
		write_lower(ptr, new);
		/* FLUSH */
		write_upper(ptr, new);
		/* NO FLUSH assume V=0 never cached */
	} else {
		/* both are valid */
		if (FIELD_GET(ptr->data[2], GUEST_PAGING_MODE) ==
		    FIELD_GET(new->data[2], GUEST_PAGING_MODE)) {
			/* Upper doesn't change */
			write_upper(ptr, new);
			write_lower(ptr, new);
			/* FLUSH */
		else if (old has no guest page table) {
			write_upper(ptr, new);
			/* FLUSH */
			write_lower(ptr, new);
			/* FLUSH */
		else if (new has no guest page table) {
			write_lower(ptr, new);
			/* FLUSH */
			write_upper(ptr, new);
			/* FLUSH */
		} else {
			struct dev_table_entry clear = {};

			write_lower(ptr, &clear);
			/* FLUSH to set V=0 */
			write_upper(ptr, new);
			/* NO FLUSH assume V=0 never cached */
			write_lower(ptr, new);
			/* FLUSH */
		}
	}

	spin_unlock(&dev_data->dte_lock);
}

And it probably needs more logic to accommodate the VIOMMU valid bits
in the 2nd 128.

Jason


* Re: [PATCH v3 4/5] iommu/amd: Modify clear_dte_entry() to avoid in-place update
  2024-09-06 12:13 ` [PATCH v3 4/5] iommu/amd: Modify clear_dte_entry() to avoid in-place update Suravee Suthikulpanit
@ 2024-09-06 18:07   ` Jason Gunthorpe
  0 siblings, 0 replies; 17+ messages in thread
From: Jason Gunthorpe @ 2024-09-06 18:07 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, ubizjak,
	jon.grimm, santosh.shukla, pandoh, kumaranand

On Fri, Sep 06, 2024 at 12:13:07PM +0000, Suravee Suthikulpanit wrote:
> Lock DTE and copy value to a temporary storage before update using
> cmpxchg128.
> 
> Also, refactor the function to simplify logic for applying erratum 63.
> 
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
>  drivers/iommu/amd/amd_iommu_types.h |  2 ++
>  drivers/iommu/amd/iommu.c           | 27 ++++++++++++++++++++-------
>  2 files changed, 22 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
> index 1836da2d9e60..81a994471a30 100644
> --- a/drivers/iommu/amd/amd_iommu_types.h
> +++ b/drivers/iommu/amd/amd_iommu_types.h
> @@ -425,6 +425,8 @@
>  
>  #define DTE_GPT_LEVEL_SHIFT	54
>  
> +#define DTE_SYSMGT_MASK		GENMASK_ULL(41, 40)
> +
>  #define GCR3_VALID		0x01ULL
>  
>  #define IOMMU_PAGE_MASK (((1ULL << 52) - 1) & ~0xfffULL)
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index f18ae6c077f4..15eb816d4313 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -2025,19 +2025,32 @@ static void set_dte_entry(struct amd_iommu *iommu,
>  	}
>  }
>  
> -static void clear_dte_entry(struct amd_iommu *iommu, u16 devid)
> +static void clear_dte_entry(struct amd_iommu *iommu, struct iommu_dev_data *dev_data)
>  {
> -	struct dev_table_entry *dev_table = get_dev_table(iommu);
> +	struct dev_table_entry new;
> +	struct dev_table_entry *dte = &get_dev_table(iommu)[dev_data->devid];
> +
> +	/*
> +	 * Need to preserve DTE[96:106] because certain fields are
> +	 * programmed using value in IVRS table from early init phase.
> +	 */
> +	spin_lock(&dev_data->dte_lock);
> +	get_dte256(iommu, dev_data, &new);

I think there is no point in the get?

	struct dev_table_entry new = {}

	new.data[0] = DTE_FLAG_V;
	if (FIELD_GET(DTE_SYSMGT_MASK, old.data[1]) == 0x01)
		new.data[0] |= BIT_ULL(DEV_ENTRY_IW);

	new.data[1] = old.data[1] & DTE_FLAG_MASK
	new.data[2..4] = 0

That is pretty clear and simple

> -	dev_table[devid].data[1] &= DTE_FLAG_MASK;
> +	new.data[1] &= DTE_FLAG_MASK;

Would be nice if DTE_FLAG_MASK was broken into fields someday..

> -	amd_iommu_apply_erratum_63(iommu, devid);
> +	/* Apply erratum 63 */
> +	if (FIELD_GET(DTE_SYSMGT_MASK, new.data[1]) == 0x01)
> +		new.data[0] |= BIT_ULL(DEV_ENTRY_IW);
> +
> +	WARN_ON(!try_cmpxchg128(&dte->data128[0], &dte->data128[0], new.data128[0]));

As before this has to move forward, we can't fail to clear the DTE, it
will open a UAF

This should also clear the top 128 bits, so I would call the
update_dte256 directly?
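
For illustration, a rough sketch of what clear_dte_entry() could become
along those lines (untested; it assumes the SNP/TV handling from the
existing function is kept and that the caller still takes dte_lock, as in
this version of the series):

static void clear_dte_entry(struct amd_iommu *iommu,
			    struct iommu_dev_data *dev_data)
{
	struct dev_table_entry new = {};
	struct dev_table_entry old;

	spin_lock(&dev_data->dte_lock);
	get_dte256(iommu, dev_data, &old);

	new.data[0] = DTE_FLAG_V;
	if (!amd_iommu_snp_en)
		new.data[0] |= DTE_FLAG_TV;
	/* Erratum 63: keep IW set when SysMgt == 01b */
	if (FIELD_GET(DTE_SYSMGT_MASK, old.data[1]) == 0x01)
		new.data[0] |= BIT_ULL(DEV_ENTRY_IW);
	/* Preserve only the fields covered by DTE_FLAG_MASK */
	new.data[1] = old.data[1] & DTE_FLAG_MASK;
	/* data[2] and data[3] stay zero, clearing the upper 128 bits too */

	update_dte256(iommu, dev_data, &new);
	spin_unlock(&dev_data->dte_lock);
}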

Jason


* Re: [PATCH v3 2/5] iommu/amd: Introduce helper functions to access and update 256-bit DTE
       [not found]   ` <66db2589.170a0220.6f57.d691SMTPIN_ADDED_BROKEN@mx.google.com>
@ 2024-09-06 19:31     ` Uros Bizjak
  0 siblings, 0 replies; 17+ messages in thread
From: Uros Bizjak @ 2024-09-06 19:31 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Suravee Suthikulpanit, linux-kernel, iommu, joro, robin.murphy,
	vasant.hegde, jgg, jon.grimm, santosh.shukla, pandoh, kumaranand

On Fri, Sep 6, 2024 at 5:53 PM Jacob Pan <jacob.pan@linux.microsoft.com> wrote:

> > +static void update_dte256(struct amd_iommu *iommu, struct
> > iommu_dev_data *dev_data,
> > +                       struct dev_table_entry *new)
> > +{
> > +     struct dev_table_entry *dev_table = get_dev_table(iommu);
> > +     struct dev_table_entry *ptr = &dev_table[dev_data->devid];
> > +     struct dev_table_entry old;
> > +     u128 tmp;
> > +
> > +     lockdep_assert_held(&dev_data->dte_lock);
> > +
> > +     old.data128[0] = ptr->data128[0];
> > +     old.data128[1] = ptr->data128[1];
> > +
> > +     tmp = cmpxchg128(&ptr->data128[1], old.data128[1], new->data128[1]);
> > +     if (tmp == old.data128[1]) {
> If you are able to deal with the failure, why not use try_cmpxchg128
> for the hi 128 bit also? It is more efficient.

Indeed, just write:

if (try_cmpxchg128(&ptr->data128[1], &old.data128[1], new->data128[1])) {

to substitute the last two lines above.

Please also note that try_cmpxchg128() updates its second argument on failure.
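
For illustration, the helper with both halves converted to try_cmpxchg128()
might look roughly like this (a sketch only; the restore-on-failure branch
simply mirrors the logic in the posted patch):

static void update_dte256(struct amd_iommu *iommu, struct iommu_dev_data *dev_data,
			  struct dev_table_entry *new)
{
	struct dev_table_entry *dev_table = get_dev_table(iommu);
	struct dev_table_entry *ptr = &dev_table[dev_data->devid];
	struct dev_table_entry old;

	lockdep_assert_held(&dev_data->dte_lock);

	old.data128[0] = ptr->data128[0];
	old.data128[1] = ptr->data128[1];

	/* On failure, old.data128[1] is refreshed with the current value */
	if (!try_cmpxchg128(&ptr->data128[1], &old.data128[1], new->data128[1]))
		return;

	if (!try_cmpxchg128(&ptr->data128[0], &old.data128[0], new->data128[0])) {
		/* Restore the hi 128-bit half, as in the posted patch */
		cmpxchg128(&ptr->data128[1], new->data128[1], old.data128[1]);
		pr_err("%s: Failed. devid=%#x\n", __func__, dev_data->devid);
	}
}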

Uros.


* Re: [PATCH v3 2/5] iommu/amd: Introduce helper functions to access and update 256-bit DTE
  2024-09-06 12:13 ` [PATCH v3 2/5] iommu/amd: Introduce helper functions to access and update 256-bit DTE Suravee Suthikulpanit
                     ` (2 preceding siblings ...)
       [not found]   ` <66db2589.170a0220.6f57.d691SMTPIN_ADDED_BROKEN@mx.google.com>
@ 2024-09-07 13:36   ` kernel test robot
  3 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2024-09-07 13:36 UTC (permalink / raw)
  To: Suravee Suthikulpanit, linux-kernel, iommu
  Cc: llvm, oe-kbuild-all, joro, robin.murphy, vasant.hegde, ubizjak,
	jgg, jon.grimm, santosh.shukla, pandoh, kumaranand,
	Suravee Suthikulpanit

Hi Suravee,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on joro-iommu/next v6.11-rc6 next-20240906]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Suravee-Suthikulpanit/iommu-amd-Disable-AMD-IOMMU-if-CMPXCHG16B-feature-is-not-supported/20240906-201533
base:   linus/master
patch link:    https://lore.kernel.org/r/20240906121308.5013-3-suravee.suthikulpanit%40amd.com
patch subject: [PATCH v3 2/5] iommu/amd: Introduce helper functions to access and update 256-bit DTE
config: x86_64-rhel-8.3-rust (https://download.01.org/0day-ci/archive/20240907/202409072100.WaB846Yg-lkp@intel.com/config)
compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240907/202409072100.WaB846Yg-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202409072100.WaB846Yg-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/iommu/amd/iommu.c:299:6: warning: variable 'ret' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
     299 |         if (!alias_data) {
         |             ^~~~~~~~~~~
   drivers/iommu/amd/iommu.c:309:9: note: uninitialized use occurs here
     309 |         return ret;
         |                ^~~
   drivers/iommu/amd/iommu.c:299:2: note: remove the 'if' if its condition is always true
     299 |         if (!alias_data) {
         |         ^~~~~~~~~~~~~~~~
   drivers/iommu/amd/iommu.c:279:9: note: initialize the variable 'ret' to silence this warning
     279 |         int ret;
         |                ^
         |                 = 0
   1 warning generated.


vim +299 drivers/iommu/amd/iommu.c

   272	
   273	static int clone_alias(struct pci_dev *pdev, u16 alias, void *data)
   274	{
   275		struct dev_table_entry dte;
   276		struct amd_iommu *iommu;
   277		struct iommu_dev_data *dev_data, *alias_data;
   278		u16 devid = pci_dev_id(pdev);
   279		int ret;
   280	
   281		if (devid == alias)
   282			return 0;
   283	
   284		iommu = rlookup_amd_iommu(&pdev->dev);
   285		if (!iommu)
   286			return 0;
   287	
   288		/* Get DTE for pdev */
   289		dev_data = dev_iommu_priv_get(&pdev->dev);
   290		if (!dev_data)
   291			return -EINVAL;
   292	
   293		spin_lock(&dev_data->dte_lock);
   294		get_dte256(iommu, dev_data, &dte);
   295		spin_unlock(&dev_data->dte_lock);
   296	
   297		/* Setup for alias */
   298		alias_data = search_dev_data(iommu, alias);
 > 299		if (!alias_data) {
   300			ret = -EINVAL;
   301			goto out;
   302		}
   303	
   304		spin_lock(&alias_data->dte_lock);
   305		update_dte256(iommu, alias_data, &dte);
   306		amd_iommu_set_rlookup_table(iommu, alias);
   307		spin_unlock(&alias_data->dte_lock);
   308	out:
   309		return ret;
   310	}
   311	
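
A minimal way to address the warning, for illustration, would be to set
ret on the success path before falling through to the out label (or simply
initialize it to 0, as the note above suggests):

	spin_lock(&alias_data->dte_lock);
	update_dte256(iommu, alias_data, &dte);
	amd_iommu_set_rlookup_table(iommu, alias);
	spin_unlock(&alias_data->dte_lock);
	ret = 0;
out:
	return ret;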

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH v3 1/5] iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported
  2024-09-06 16:38   ` Jason Gunthorpe
@ 2024-09-09 15:16     ` Jason Gunthorpe
  2024-09-16 17:19       ` Suthikulpanit, Suravee
  2024-09-16 16:11     ` Suthikulpanit, Suravee
  1 sibling, 1 reply; 17+ messages in thread
From: Jason Gunthorpe @ 2024-09-09 15:16 UTC (permalink / raw)
  To: Suravee Suthikulpanit
  Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, ubizjak,
	jon.grimm, santosh.shukla, pandoh, kumaranand

On Fri, Sep 06, 2024 at 01:38:18PM -0300, Jason Gunthorpe wrote:
> On Fri, Sep 06, 2024 at 12:13:04PM +0000, Suravee Suthikulpanit wrote:
> > According to the AMD IOMMU spec, the IOMMU reads the entire DTE either
> > in two 128-bit transactions or a single 256-bit transaction. 
> 
> .. if two 128-bit transaction on the read side is possible then you
> need flushing! :(
> 
> For instance this:
> 
>   IOMMU         CPU
> Read [0]    
>               Write [0]
>               Write [1]
> Read [1]
> 
> Will result in the iommu seeing torn incorrect data - the Guest paging
> mode may not match the page table pointer, or the VIOMMU data may
> become mismatched to the host translation.
> 
> Avoiding flushing is only possible if the full 256 bits are read
> atomically.

Also, please think about what qemu does when paravirtualizing
this. qemu will read the DTE entry using the CPU.

For your above remark it should be reading using two 128 bit loads.

However, it doesn't seem to be doing that:

static bool amdvi_get_dte(AMDVIState *s, int devid, uint64_t *entry)
{
    uint32_t offset = devid * AMDVI_DEVTAB_ENTRY_SIZE;

    if (dma_memory_read(&address_space_memory, s->devtab + offset, entry,
                        AMDVI_DEVTAB_ENTRY_SIZE, MEMTXATTRS_UNSPECIFIED)) {


The dma_memory_read eventually boils down to memcpy()

So qemu looks wrong to me.

Jason


* Re: [PATCH v3 1/5] iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported
  2024-09-06 16:38   ` Jason Gunthorpe
  2024-09-09 15:16     ` Jason Gunthorpe
@ 2024-09-16 16:11     ` Suthikulpanit, Suravee
  2024-09-23 18:13       ` Jason Gunthorpe
  1 sibling, 1 reply; 17+ messages in thread
From: Suthikulpanit, Suravee @ 2024-09-16 16:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, ubizjak,
	jon.grimm, santosh.shukla, pandoh, kumaranand



On 9/6/2024 11:38 PM, Jason Gunthorpe wrote:
> On Fri, Sep 06, 2024 at 12:13:04PM +0000, Suravee Suthikulpanit wrote:
>> According to the AMD IOMMU spec, the IOMMU reads the entire DTE either
>> in two 128-bit transactions or a single 256-bit transaction.
> 
> .. if two 128-bit transaction on the read side is possible then you
> need flushing! :(
> 
> For instance this:
> 
>    IOMMU         CPU
> Read [0]
>                Write [0]
>                Write [1]
> Read [1]
> 
> Will result in the iommu seeing torn incorrect data - the Guest paging
> mode may not match the page table pointer, or the VIOMMU data may
> become mismatched to the host translation.
> 
> Avoiding flushing is only possible if the full 256 bits are read
> atomically.

I have verified with the hardware designer, and they have now confirmed 
that the IOMMU hardware has always been implemented with a 256-bit read. 
The next revision of the IOMMU spec will be updated to correctly 
describe this part. Therefore, I will update the commit message and 
implement the code accordingly.

>> It is recommended to update DTE using 128-bit operation followed by
>> an INVALIDATE_DEVTAB_ENTRY command when the IV=1b or V=1b.
> 
> This advice only works when going from non-valid to valid.

Actually, if we change the DTE when IV=1 or V=1, we would need to 
invalidate as well.

>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>> ---
>>   drivers/iommu/amd/init.c | 23 +++++++++--------------
>>   1 file changed, 9 insertions(+), 14 deletions(-)
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> Jason

Thanks,
Suravee


* Re: [PATCH v3 2/5] iommu/amd: Introduce helper functions to access and update 256-bit DTE
  2024-09-06 17:00   ` Jason Gunthorpe
@ 2024-09-16 16:12     ` Suthikulpanit, Suravee
  0 siblings, 0 replies; 17+ messages in thread
From: Suthikulpanit, Suravee @ 2024-09-16 16:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, ubizjak,
	jon.grimm, santosh.shukla, pandoh, kumaranand



On 9/7/2024 12:00 AM, Jason Gunthorpe wrote:
> On Fri, Sep 06, 2024 at 12:13:05PM +0000, Suravee Suthikulpanit wrote:
> 
>> +static void update_dte256(struct amd_iommu *iommu, struct iommu_dev_data *dev_data,
>> +			  struct dev_table_entry *new)
>> +{
>> +	struct dev_table_entry *dev_table = get_dev_table(iommu);
>> +	struct dev_table_entry *ptr = &dev_table[dev_data->devid];
>> +	struct dev_table_entry old;
>> +	u128 tmp;
>> +
>> +	lockdep_assert_held(&dev_data->dte_lock);
>> +
>> +	old.data128[0] = ptr->data128[0];
>> +	old.data128[1] = ptr->data128[1];
>> +
>> +	tmp = cmpxchg128(&ptr->data128[1], old.data128[1], new->data128[1]);
>> +	if (tmp == old.data128[1]) {
>> +		if (!try_cmpxchg128(&ptr->data128[0], &old.data128[0], new->data128[0])) {
>> +			/* Restore hi 128-bit */
>> +			cmpxchg128(&ptr->data128[1], new->data128[1], tmp);
> 
> Like I said before, you can't fix this. Just go forwards. Keeping an
> old DTE will UAF things, that is much worse than just forcing a new
> DTE and losing some interrupt settings.
> 
>> @@ -243,13 +285,28 @@ static int clone_alias(struct pci_dev *pdev, u16 alias, void *data)
>>   	if (!iommu)
>>   		return 0;
>>   
>> -	amd_iommu_set_rlookup_table(iommu, alias);
>> -	dev_table = get_dev_table(iommu);
>> -	memcpy(dev_table[alias].data,
>> -	       dev_table[devid].data,
>> -	       sizeof(dev_table[alias].data));
>> +	/* Get DTE for pdev */
>> +	dev_data = dev_iommu_priv_get(&pdev->dev);
>> +	if (!dev_data)
>> +		return -EINVAL;
>>   
>> -	return 0;
>> +	spin_lock(&dev_data->dte_lock);
>> +	get_dte256(iommu, dev_data, &dte);
>> +	spin_unlock(&dev_data->dte_lock);
> 
> You can't unlock after reading the DTE and then relock it to program
> it. The interrupt data can have changed while unlocked.
> 
> Put the lock inside update_dte256() and force the interrupt bits
> under the lock.
> 
> Something like this is what I'm expecting:
> 
> static void write_upper(struct dev_table_entry *ptr, struct dev_table_entry *new)
> {
> 	struct dev_table_entry old;
> 
> 	lockdep_assert_held(&dev_data->dte_lock);
> 
> 	do {
> 		old->data128[1] = ptr->data128[1];
> 		new->data64[2] &= ~INTR_MASK;
> 		new->data64[2] |= old->data64[2] & INTR_MASK;
> 	} while (!try_cmpxchg128(&ptr->data128[1], &old.data128[1],
> 				 new->data128[1]));
> }
> 
> static void write_lower(struct dev_table_entry *ptr, struct dev_table_entry *new)
> {
> 	struct dev_table_entry old;
> 
> 	do {
> 		old->data128[0] = ptr->data128[0];
> 	} while (!try_cmpxchg128(&ptr->data128[0], &old.data128[0],
> 				 new->data128[0]));
> }
> 
> static void update_dte256(struct amd_iommu *iommu,
> 			  struct iommu_dev_data *dev_data,
> 			  struct dev_table_entry *new)
> {
> 
> 	spin_lock(&dev_data->dte_lock);
> 	if (!(ptr->data64[0] & V)) {
> 		write_upper(ptr, new);
> 		/* NO FLUSH assume V=0 never cached */
> 		write_lower(ptr, new);
> 		/* FLUSH */
> 	} else if (!(new->data64[0] & V) {
> 		write_lower(ptr, new);
> 		/* FLUSH */
> 		write_upper(ptr, new);
> 		/* NO FLUSH assume V=0 never cached */
> 	} else {
> 		/* both are valid */
> 		if (FIELD_GET(ptr->data[2], GUEST_PAGING_MODE) ==
> 		    FIELD_GET(new->data[2], GUEST_PAGING_MODE)) {
> 			/* Upper doesn't change */
> 			write_upper(ptr, new);
> 			write_lower(ptr, new);
> 			/* FLUSH */
> 		else if (old has no guest page table) {
> 			write_upper(ptr, new);
> 			/* FLUSH */
> 			write_lower(ptr, new);
> 			/* FLUSH */
> 		else if (new has no guest page table) {
> 			write_lower(ptr, new);
> 			/* FLUSH */
> 			write_upper(ptr, new);
> 			/* FLUSH */
> 		} else {
> 			struct dev_table_entry clear = {};
> 
> 			write_lower(ptr, &clear);
> 			/* FLUSH to set V=0 */
> 			write_upper(ptr, new);
> 			/* NO FLUSH assume V=0 never cached */
> 			write_lower(ptr, new);
> 			/* FLUSH */
> 		}
> 	}
> 
> 	spin_unlock(&dev_data->dte_lock);
> }

Got your point. I will update based on your suggestion here.

> And it probably needs more logic to accommodate the VIOMMU valid bits
> in the 2nd 128.

I'll take care of this when enabling the HW-vIOMMU stuff.

Thanks,
Suravee


* Re: [PATCH v3 1/5] iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported
  2024-09-09 15:16     ` Jason Gunthorpe
@ 2024-09-16 17:19       ` Suthikulpanit, Suravee
  0 siblings, 0 replies; 17+ messages in thread
From: Suthikulpanit, Suravee @ 2024-09-16 17:19 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, ubizjak,
	jon.grimm, santosh.shukla, pandoh, kumaranand



On 9/9/2024 10:16 PM, Jason Gunthorpe wrote:
> On Fri, Sep 06, 2024 at 01:38:18PM -0300, Jason Gunthorpe wrote:
>> On Fri, Sep 06, 2024 at 12:13:04PM +0000, Suravee Suthikulpanit wrote:
>>> According to the AMD IOMMU spec, the IOMMU reads the entire DTE either
>>> in two 128-bit transactions or a single 256-bit transaction.
>>
>> .. if two 128-bit transaction on the read side is possible then you
>> need flushing! :(
>>
>> For instance this:
>>
>>    IOMMU         CPU
>> Read [0]
>>                Write [0]
>>                Write [1]
>> Read [1]
>>
>> Will result in the iommu seeing torn incorrect data - the Guest paging
>> mode may not match the page table pointer, or the VIOMMU data may
>> become mismatched to the host translation.
>>
>> Avoiding flushing is only possible if the full 256 bits are read
>> atomically.
> 
> Also, please think about what qemu does when paravirtualizing
> this. qemu will read the DTE entry using the CPU.
> 
> For your above remark it should be reading using two 128 bit loads.
> 
> However, it doesn't seem to be doing that:
> 
> static bool amdvi_get_dte(AMDVIState *s, int devid, uint64_t *entry)
> {
>      uint32_t offset = devid * AMDVI_DEVTAB_ENTRY_SIZE;
> 
>      if (dma_memory_read(&address_space_memory, s->devtab + offset, entry,
>                          AMDVI_DEVTAB_ENTRY_SIZE, MEMTXATTRS_UNSPECIFIED)) {
> 
> 
> The dma_memory_read eventually boils down to memcpy()
> 
> So qemu looks wrong to me.
> 
> Jason

Thanks for pointing that out. Let me check QEMU and see how to update the code.

Thanks,
Suravee


* Re: [PATCH v3 1/5] iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported
  2024-09-16 16:11     ` Suthikulpanit, Suravee
@ 2024-09-23 18:13       ` Jason Gunthorpe
  0 siblings, 0 replies; 17+ messages in thread
From: Jason Gunthorpe @ 2024-09-23 18:13 UTC (permalink / raw)
  To: Suthikulpanit, Suravee
  Cc: linux-kernel, iommu, joro, robin.murphy, vasant.hegde, ubizjak,
	jon.grimm, santosh.shukla, pandoh, kumaranand

On Mon, Sep 16, 2024 at 11:11:46PM +0700, Suthikulpanit, Suravee wrote:

> > Avoiding flushing is only possible if the full 256 bits are read
> > atomically.
> 
> I have verified with the hardware designer, and they have now confirmed that
> the IOMMU hardware has always been implemented with 256-bit read. The next
> revision of the IOMMU spec will be updated to correctly describe this part.
> Therefore, I will update the commit message and implement the code
> accordingly.

That is certainly convenient, except qemu doesn't follow that spec :\

Jason

