* [PATCH v3 0/5] iommu: Fix domain check on release
@ 2024-03-05  1:33 Lu Baolu
  2024-03-05  1:33 ` [PATCH v3 1/5] iommu: Add static iommu_ops->release_domain Lu Baolu
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Lu Baolu @ 2024-03-05  1:33 UTC
  To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Eric Badger
  Cc: iommu, linux-kernel, Lu Baolu

This is a follow-up to the discussion thread here:

https://lore.kernel.org/linux-iommu/20240221154012.GC13491@ziepe.ca/

It fixes a NULL pointer dereference issue in the Intel iommu driver and
strengthens the iommu core to possibly prevent similar failures in other
iommu drivers.

There are two parts to this topic:
[x] Introduce release_domain and fix a kernel NULL pointer dereference
    issue in the Intel iommu driver.
[x] Follow-up patches to clean up the Intel iommu driver.

Best regards,
baolu

Change log:
v3:
 - Avoid global IOTLB and PASID cache invalidation in normal release
   path to mitigate the impact on other devices.
 - Comment and code refinements.

v2:
 - https://lore.kernel.org/linux-iommu/20240229094613.121575-1-baolu.lu@linux.intel.com/
 - https://lore.kernel.org/linux-iommu/20240229094804.121610-1-baolu.lu@linux.intel.com/
 - The scalable mode context entry should be removed in the release path
   as it's not part of the blocking domain.

v1: https://lore.kernel.org/linux-iommu/20240223051302.177596-1-baolu.lu@linux.intel.com/

Lu Baolu (5):
  iommu: Add static iommu_ops->release_domain
  iommu/vt-d: Fix NULL domain on device release
  iommu/vt-d: Setup scalable mode context entry in probe path
  iommu/vt-d: Remove scalable mode context entry setup from attach_dev
  iommu/vt-d: Remove scalable mode in domain_context_clear_one()

 include/linux/iommu.h       |   1 +
 drivers/iommu/intel/pasid.h |   2 +
 drivers/iommu/intel/iommu.c | 214 +++++++++++-------------------------
 drivers/iommu/intel/pasid.c | 202 ++++++++++++++++++++++++++++++++++
 drivers/iommu/iommu.c       |  19 +++-
 5 files changed, 283 insertions(+), 155 deletions(-)

-- 
2.34.1



* [PATCH v3 1/5] iommu: Add static iommu_ops->release_domain
  2024-03-05  1:33 [PATCH v3 0/5] iommu: Fix domain check on release Lu Baolu
@ 2024-03-05  1:33 ` Lu Baolu
  2024-04-10 15:26   ` Jason Gunthorpe
  2024-03-05  1:33 ` [PATCH v3 2/5] iommu/vt-d: Fix NULL domain on device release Lu Baolu
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Lu Baolu @ 2024-03-05  1:33 UTC
  To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Eric Badger
  Cc: iommu, linux-kernel, Lu Baolu, Jason Gunthorpe

The current device_release callback for individual iommu drivers does the
following:

1) Silencing IOMMU DMA translation: It detaches any existing domain from the
   device and puts it into a blocking state (some drivers might use the
   identity state).
2) Resource release: It releases resources allocated during the
   device_probe callback and restores the device to its pre-probe state.

Step 1 is challenging for individual iommu drivers because each must check
if a domain is already attached to the device. Additionally, if a deferred
attach never occurred, the device_release should avoid modifying hardware
configuration regardless of the reason for its call.
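
As an illustration of that per-driver boilerplate, here is a minimal
sketch of the pattern in a driver's release path (modeled on the exynos
hunk quoted later in this thread; the elision is mine):

	static void exynos_iommu_release_device(struct device *dev)
	{
		/* Each driver must park translation itself before teardown. */
		WARN_ON(exynos_iommu_identity_attach(&exynos_identity_domain, dev));
		...
	}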

To simplify this process, introduce a static release_domain within the
iommu_ops structure. It can be either a blocking or identity domain
depending on the iommu hardware. The iommu core will decide whether to
attach this domain before the device_release callback, eliminating the
need for repetitive code in various drivers.

Consequently, the device_release callback can focus solely on the opposite
operations of device_probe, including releasing all resources allocated
during that callback.
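
As a sketch of the intended wiring (this is what the Intel driver gains
in patch 2 of this series), a driver opts in by pointing release_domain
at one of its static domains:

	const struct iommu_ops intel_iommu_ops = {
		.blocked_domain		= &blocking_domain,
		.release_domain		= &blocking_domain,	/* attached by the core */
		...
	};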

Co-developed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 include/linux/iommu.h |  1 +
 drivers/iommu/iommu.c | 19 +++++++++++++++----
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index af6c367ed673..2e925b5eba53 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -585,6 +585,7 @@ struct iommu_ops {
 	struct module *owner;
 	struct iommu_domain *identity_domain;
 	struct iommu_domain *blocked_domain;
+	struct iommu_domain *release_domain;
 	struct iommu_domain *default_domain;
 };
 
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index eb50543bf956..098869007c69 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -462,13 +462,24 @@ static void iommu_deinit_device(struct device *dev)
 
 	/*
 	 * release_device() must stop using any attached domain on the device.
-	 * If there are still other devices in the group they are not effected
+	 * If there are still other devices in the group, they are not affected
 	 * by this callback.
 	 *
-	 * The IOMMU driver must set the device to either an identity or
-	 * blocking translation and stop using any domain pointer, as it is
-	 * going to be freed.
+	 * If the iommu driver provides release_domain, the core code ensures
+	 * that domain is attached prior to calling release_device. Drivers can
+	 * use this to enforce a translation on the idle iommu. Typically, the
+	 * global static blocked_domain is a good choice.
+	 *
+	 * Otherwise, the iommu driver must set the device to either an identity
+	 * or a blocking translation in release_device() and stop using any
+	 * domain pointer, as it is going to be freed.
+	 *
+	 * Regardless, if a delayed attach never occurred, then the release
+	 * should still avoid touching any hardware configuration either.
 	 */
+	if (!dev->iommu->attach_deferred && ops->release_domain)
+		ops->release_domain->ops->attach_dev(ops->release_domain, dev);
+
 	if (ops->release_device)
 		ops->release_device(dev);
 
-- 
2.34.1



* [PATCH v3 2/5] iommu/vt-d: Fix NULL domain on device release
  2024-03-05  1:33 [PATCH v3 0/5] iommu: Fix domain check on release Lu Baolu
  2024-03-05  1:33 ` [PATCH v3 1/5] iommu: Add static iommu_ops->release_domain Lu Baolu
@ 2024-03-05  1:33 ` Lu Baolu
  2024-03-05  1:48   ` Tian, Kevin
  2024-03-05  1:33 ` [PATCH v3 3/5] iommu/vt-d: Setup scalable mode context entry in probe path Lu Baolu
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Lu Baolu @ 2024-03-05  1:33 UTC
  To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Eric Badger
  Cc: iommu, linux-kernel, Lu Baolu

In the kdump kernel, the IOMMU operates in deferred_attach mode. In this
mode, info->domain may not yet be assigned by the time the release_device
function is called. This leads to the following crash in the crash kernel:

    BUG: kernel NULL pointer dereference, address: 000000000000003c
    ...
    RIP: 0010:do_raw_spin_lock+0xa/0xa0
    ...
    _raw_spin_lock_irqsave+0x1b/0x30
    intel_iommu_release_device+0x96/0x170
    iommu_deinit_device+0x39/0xf0
    __iommu_group_remove_device+0xa0/0xd0
    iommu_bus_notifier+0x55/0xb0
    notifier_call_chain+0x5a/0xd0
    blocking_notifier_call_chain+0x41/0x60
    bus_notify+0x34/0x50
    device_del+0x269/0x3d0
    pci_remove_bus_device+0x77/0x100
    p2sb_bar+0xae/0x1d0
    ...
    i801_probe+0x423/0x740
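
The dereference happens in dmar_remove_one_dev_info() (removed below),
which uses the domain pointer unconditionally; if a deferred attach never
happened, info->domain is still NULL at this point. A minimal excerpt:

	struct dmar_domain *domain = info->domain;	/* NULL in the kdump kernel */
	...
	spin_lock_irqsave(&domain->lock, flags);	/* faults dereferencing domain */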

Use the release_domain mechanism to fix it. The scalable mode context
entry, which is not part of the release_domain, should be cleared in
release_device().

Fixes: 586081d3f6b1 ("iommu/vt-d: Remove DEFER_DEVICE_DOMAIN_INFO")
Reported-by: Eric Badger <ebadger@purestorage.com>
Closes: https://lore.kernel.org/r/20240113181713.1817855-1-ebadger@purestorage.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel/pasid.h |  1 +
 drivers/iommu/intel/iommu.c | 31 ++++--------------
 drivers/iommu/intel/pasid.c | 64 +++++++++++++++++++++++++++++++++++++
 3 files changed, 71 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index 487ede039bdd..42fda97fd851 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -318,4 +318,5 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
 				 bool fault_ignore);
 void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu,
 					  struct device *dev, u32 pasid);
+void intel_pasid_teardown_sm_context(struct device *dev);
 #endif /* __INTEL_PASID_H */
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index cc3994efd362..f74d42d3258f 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3869,30 +3869,6 @@ static void domain_context_clear(struct device_domain_info *info)
 			       &domain_context_clear_one_cb, info);
 }
 
-static void dmar_remove_one_dev_info(struct device *dev)
-{
-	struct device_domain_info *info = dev_iommu_priv_get(dev);
-	struct dmar_domain *domain = info->domain;
-	struct intel_iommu *iommu = info->iommu;
-	unsigned long flags;
-
-	if (!dev_is_real_dma_subdevice(info->dev)) {
-		if (dev_is_pci(info->dev) && sm_supported(iommu))
-			intel_pasid_tear_down_entry(iommu, info->dev,
-					IOMMU_NO_PASID, false);
-
-		iommu_disable_pci_caps(info);
-		domain_context_clear(info);
-	}
-
-	spin_lock_irqsave(&domain->lock, flags);
-	list_del(&info->link);
-	spin_unlock_irqrestore(&domain->lock, flags);
-
-	domain_detach_iommu(domain, iommu);
-	info->domain = NULL;
-}
-
 /*
  * Clear the page table pointer in context or pasid table entries so that
  * all DMA requests without PASID from the device are blocked. If the page
@@ -4431,7 +4407,11 @@ static void intel_iommu_release_device(struct device *dev)
 	mutex_lock(&iommu->iopf_lock);
 	device_rbtree_remove(info);
 	mutex_unlock(&iommu->iopf_lock);
-	dmar_remove_one_dev_info(dev);
+
+	if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev) &&
+	    !context_copied(iommu, info->bus, info->devfn))
+		intel_pasid_teardown_sm_context(dev);
+
 	intel_pasid_free_table(dev);
 	intel_iommu_debugfs_remove_dev(info);
 	kfree(info);
@@ -4922,6 +4902,7 @@ static const struct iommu_dirty_ops intel_dirty_ops = {
 
 const struct iommu_ops intel_iommu_ops = {
 	.blocked_domain		= &blocking_domain,
+	.release_domain		= &blocking_domain,
 	.capable		= intel_iommu_capable,
 	.hw_info		= intel_iommu_hw_info,
 	.domain_alloc		= intel_iommu_domain_alloc,
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 108158e2b907..9261ea986dbf 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -667,3 +667,67 @@ int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
 
 	return 0;
 }
+
+/*
+ * Interfaces to setup or teardown a pasid table to the scalable-mode
+ * context table entry:
+ */
+
+static void device_pasid_table_teardown(struct device *dev, u8 bus, u8 devfn)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct intel_iommu *iommu = info->iommu;
+	struct context_entry *context;
+
+	spin_lock(&iommu->lock);
+	context = iommu_context_addr(iommu, bus, devfn, false);
+	if (!context) {
+		spin_unlock(&iommu->lock);
+		return;
+	}
+
+	context_clear_entry(context);
+	__iommu_flush_cache(iommu, context, sizeof(*context));
+	spin_unlock(&iommu->lock);
+
+	/*
+	 * Cache invalidation for changes to a scalable-mode context table
+	 * entry.
+	 *
+	 * Section 6.5.3.3 of the VT-d spec:
+	 * - Device-selective context-cache invalidation;
+	 * - Domain-selective PASID-cache invalidation to affected domains
+	 *   (can be skipped if all PASID entries were not-present);
+	 * - Domain-selective IOTLB invalidation to affected domains;
+	 * - Global Device-TLB invalidation to affected functions.
+	 *
+	 * The iommu has been parked in the blocking state. All domains have
+	 * been detached from the device or PASID. The PASID and IOTLB caches
+	 * have been invalidated during the domain detach path.
+	 */
+	iommu->flush.flush_context(iommu, 0, PCI_DEVID(bus, devfn),
+				   DMA_CCMD_MASK_NOBIT, DMA_CCMD_DEVICE_INVL);
+	devtlb_invalidation_with_pasid(iommu, dev, IOMMU_NO_PASID);
+}
+
+static int pci_pasid_table_teardown(struct pci_dev *pdev, u16 alias, void *data)
+{
+	struct device *dev = data;
+
+	if (dev == &pdev->dev)
+		device_pasid_table_teardown(dev, PCI_BUS_NUM(alias), alias & 0xff);
+
+	return 0;
+}
+
+void intel_pasid_teardown_sm_context(struct device *dev)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+
+	if (!dev_is_pci(dev)) {
+		device_pasid_table_teardown(dev, info->bus, info->devfn);
+		return;
+	}
+
+	pci_for_each_dma_alias(to_pci_dev(dev), pci_pasid_table_teardown, dev);
+}
-- 
2.34.1



* [PATCH v3 3/5] iommu/vt-d: Setup scalable mode context entry in probe path
  2024-03-05  1:33 [PATCH v3 0/5] iommu: Fix domain check on release Lu Baolu
  2024-03-05  1:33 ` [PATCH v3 1/5] iommu: Add static iommu_ops->release_domain Lu Baolu
  2024-03-05  1:33 ` [PATCH v3 2/5] iommu/vt-d: Fix NULL domain on device release Lu Baolu
@ 2024-03-05  1:33 ` Lu Baolu
  2024-03-05  1:50   ` Tian, Kevin
  2024-03-05  1:33 ` [PATCH v3 4/5] iommu/vt-d: Remove scalable mode context entry setup from attach_dev Lu Baolu
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Lu Baolu @ 2024-03-05  1:33 UTC
  To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Eric Badger
  Cc: iommu, linux-kernel, Lu Baolu

In contrast to legacy mode, scalable mode configures the DMA translation
table in the PASID table entry instead of the context entry.
For this reason, it is more appropriate to set up the scalable mode
context entry in the device_probe callback and direct it to the
appropriate PASID table.

The iommu domain attach/detach operations only affect the PASID table
entry. Therefore, there is no need to modify the context entry when
configuring the translation type and page table.

The only exception is the kdump case, where context entry setup is
postponed until the device driver invokes the first DMA interface.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel/pasid.h |   1 +
 drivers/iommu/intel/iommu.c |  12 ++++
 drivers/iommu/intel/pasid.c | 138 ++++++++++++++++++++++++++++++++++++
 3 files changed, 151 insertions(+)

diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index 42fda97fd851..da9978fef7ac 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -318,5 +318,6 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
 				 bool fault_ignore);
 void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu,
 					  struct device *dev, u32 pasid);
+int intel_pasid_setup_sm_context(struct device *dev);
 void intel_pasid_teardown_sm_context(struct device *dev);
 #endif /* __INTEL_PASID_H */
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index f74d42d3258f..9b96d36b9d2a 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4073,6 +4073,10 @@ int prepare_domain_attach_device(struct iommu_domain *domain,
 		dmar_domain->agaw--;
 	}
 
+	if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev) &&
+	    context_copied(iommu, info->bus, info->devfn))
+		return intel_pasid_setup_sm_context(dev);
+
 	return 0;
 }
 
@@ -4386,11 +4390,19 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
 			dev_err(dev, "PASID table allocation failed\n");
 			goto clear_rbtree;
 		}
+
+		if (!context_copied(iommu, info->bus, info->devfn)) {
+			ret = intel_pasid_setup_sm_context(dev);
+			if (ret)
+				goto free_table;
+		}
 	}
 
 	intel_iommu_debugfs_create_dev(info);
 
 	return &iommu->iommu;
+free_table:
+	intel_pasid_free_table(dev);
 clear_rbtree:
 	device_rbtree_remove(info);
 free:
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 9261ea986dbf..625d5cff49c3 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -731,3 +731,141 @@ void intel_pasid_teardown_sm_context(struct device *dev)
 
 	pci_for_each_dma_alias(to_pci_dev(dev), pci_pasid_table_teardown, dev);
 }
+
+/*
+ * Get the PASID directory size for scalable mode context entry.
+ * Value of X in the PDTS field of a scalable mode context entry
+ * indicates PASID directory with 2^(X + 7) entries.
+ */
+static unsigned long context_get_sm_pds(struct pasid_table *table)
+{
+	unsigned long pds, max_pde;
+
+	max_pde = table->max_pasid >> PASID_PDE_SHIFT;
+	pds = find_first_bit(&max_pde, MAX_NR_PASID_BITS);
+	if (pds < 7)
+		return 0;
+
+	return pds - 7;
+}
+
+static int context_entry_set_pasid_table(struct context_entry *context,
+					 struct device *dev)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct pasid_table *table = info->pasid_table;
+	struct intel_iommu *iommu = info->iommu;
+	unsigned long pds;
+
+	context_clear_entry(context);
+
+	pds = context_get_sm_pds(table);
+	context->lo = (u64)virt_to_phys(table->table) | context_pdts(pds);
+	context_set_sm_rid2pasid(context, IOMMU_NO_PASID);
+
+	if (info->ats_supported)
+		context_set_sm_dte(context);
+	if (info->pri_supported)
+		context_set_sm_pre(context);
+	if (info->pasid_supported)
+		context_set_pasid(context);
+
+	context_set_fault_enable(context);
+	context_set_present(context);
+	__iommu_flush_cache(iommu, context, sizeof(*context));
+
+	return 0;
+}
+
+static int device_pasid_table_setup(struct device *dev, u8 bus, u8 devfn)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct intel_iommu *iommu = info->iommu;
+	struct context_entry *context;
+
+	spin_lock(&iommu->lock);
+	context = iommu_context_addr(iommu, bus, devfn, true);
+	if (!context) {
+		spin_unlock(&iommu->lock);
+		return -ENOMEM;
+	}
+
+	if (context_present(context) && !context_copied(iommu, bus, devfn)) {
+		spin_unlock(&iommu->lock);
+		return 0;
+	}
+
+	if (context_copied(iommu, bus, devfn)) {
+		context_clear_entry(context);
+		__iommu_flush_cache(iommu, context, sizeof(*context));
+
+		/*
+		 * For kdump cases, old valid entries may be cached due to
+		 * the in-flight DMA and copied pgtable, but there is no
+		 * unmapping behaviour for them, thus we need explicit cache
+		 * flushes for all affected domain IDs and PASIDs used in
+		 * the copied PASID table. Given that we have no idea about
+		 * which domain IDs and PASIDs were used in the copied tables,
+		 * upgrade them to global PASID and IOTLB cache invalidation.
+		 */
+		iommu->flush.flush_context(iommu, 0,
+					   PCI_DEVID(bus, devfn),
+					   DMA_CCMD_MASK_NOBIT,
+					   DMA_CCMD_DEVICE_INVL);
+		qi_flush_pasid_cache(iommu, 0, QI_PC_GLOBAL, 0);
+		iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH);
+		devtlb_invalidation_with_pasid(iommu, dev, IOMMU_NO_PASID);
+
+		/*
+		 * At this point, the device is supposed to finish reset at
+		 * its driver probe stage, so no in-flight DMA will exist,
+		 * and we don't need to worry anymore hereafter.
+		 */
+		clear_context_copied(iommu, bus, devfn);
+	}
+
+	context_entry_set_pasid_table(context, dev);
+	spin_unlock(&iommu->lock);
+
+	/*
+	 * It's a non-present to present mapping. If hardware doesn't cache
+	 * non-present entry we don't need to flush the caches. If it does
+	 * cache non-present entries, then it does so in the special
+	 * domain #0, which we have to flush:
+	 */
+	if (cap_caching_mode(iommu->cap)) {
+		iommu->flush.flush_context(iommu, 0,
+					   PCI_DEVID(bus, devfn),
+					   DMA_CCMD_MASK_NOBIT,
+					   DMA_CCMD_DEVICE_INVL);
+		iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_DSI_FLUSH);
+	}
+
+	return 0;
+}
+
+static int pci_pasid_table_setup(struct pci_dev *pdev, u16 alias, void *data)
+{
+	struct device *dev = data;
+
+	if (dev != &pdev->dev)
+		return 0;
+
+	return device_pasid_table_setup(dev, PCI_BUS_NUM(alias), alias & 0xff);
+}
+
+/*
+ * Set the device's PASID table to its context table entry.
+ *
+ * The PASID table is set in the context entries of both the device
+ * itself and its alias requester ID for DMA.
+ */
+int intel_pasid_setup_sm_context(struct device *dev)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+
+	if (!dev_is_pci(dev))
+		return device_pasid_table_setup(dev, info->bus, info->devfn);
+
+	return pci_for_each_dma_alias(to_pci_dev(dev), pci_pasid_table_setup, dev);
+}
-- 
2.34.1



* [PATCH v3 4/5] iommu/vt-d: Remove scalable mode context entry setup from attach_dev
  2024-03-05  1:33 [PATCH v3 0/5] iommu: Fix domain check on release Lu Baolu
                   ` (2 preceding siblings ...)
  2024-03-05  1:33 ` [PATCH v3 3/5] iommu/vt-d: Setup scalable mode context entry in probe path Lu Baolu
@ 2024-03-05  1:33 ` Lu Baolu
  2024-03-05  1:33 ` [PATCH v3 5/5] iommu/vt-d: Remove scalable mode in domain_context_clear_one() Lu Baolu
  2024-03-05  7:54 ` [PATCH v3 0/5] iommu: Fix domain check on release Joerg Roedel
  5 siblings, 0 replies; 13+ messages in thread
From: Lu Baolu @ 2024-03-05  1:33 UTC
  To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Eric Badger
  Cc: iommu, linux-kernel, Lu Baolu

The scalable mode context entry is now set up in the probe_device path,
eliminating the need to configure it in the attach_dev path. Remove the
redundant code from the attach_dev path to avoid dead code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 drivers/iommu/intel/iommu.c | 156 ++++++++++--------------------------
 1 file changed, 44 insertions(+), 112 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 9b96d36b9d2a..d682eb6ad4d2 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1850,34 +1850,17 @@ static void domain_exit(struct dmar_domain *domain)
 	kfree(domain);
 }
 
-/*
- * Get the PASID directory size for scalable mode context entry.
- * Value of X in the PDTS field of a scalable mode context entry
- * indicates PASID directory with 2^(X + 7) entries.
- */
-static unsigned long context_get_sm_pds(struct pasid_table *table)
-{
-	unsigned long pds, max_pde;
-
-	max_pde = table->max_pasid >> PASID_PDE_SHIFT;
-	pds = find_first_bit(&max_pde, MAX_NR_PASID_BITS);
-	if (pds < 7)
-		return 0;
-
-	return pds - 7;
-}
-
 static int domain_context_mapping_one(struct dmar_domain *domain,
 				      struct intel_iommu *iommu,
-				      struct pasid_table *table,
 				      u8 bus, u8 devfn)
 {
 	struct device_domain_info *info =
 			domain_lookup_dev_info(domain, iommu, bus, devfn);
 	u16 did = domain_id_iommu(domain, iommu);
 	int translation = CONTEXT_TT_MULTI_LEVEL;
+	struct dma_pte *pgd = domain->pgd;
 	struct context_entry *context;
-	int ret;
+	int agaw, ret;
 
 	if (hw_pass_through && domain_type_is_si(domain))
 		translation = CONTEXT_TT_PASS_THROUGH;
@@ -1920,65 +1903,37 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
 	}
 
 	context_clear_entry(context);
+	context_set_domain_id(context, did);
 
-	if (sm_supported(iommu)) {
-		unsigned long pds;
-
-		/* Setup the PASID DIR pointer: */
-		pds = context_get_sm_pds(table);
-		context->lo = (u64)virt_to_phys(table->table) |
-				context_pdts(pds);
-
-		/* Setup the RID_PASID field: */
-		context_set_sm_rid2pasid(context, IOMMU_NO_PASID);
-
+	if (translation != CONTEXT_TT_PASS_THROUGH) {
 		/*
-		 * Setup the Device-TLB enable bit and Page request
-		 * Enable bit:
+		 * Skip top levels of page tables for iommu which has
+		 * less agaw than default. Unnecessary for PT mode.
 		 */
+		for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
+			ret = -ENOMEM;
+			pgd = phys_to_virt(dma_pte_addr(pgd));
+			if (!dma_pte_present(pgd))
+				goto out_unlock;
+		}
+
 		if (info && info->ats_supported)
-			context_set_sm_dte(context);
-		if (info && info->pri_supported)
-			context_set_sm_pre(context);
-		if (info && info->pasid_supported)
-			context_set_pasid(context);
+			translation = CONTEXT_TT_DEV_IOTLB;
+		else
+			translation = CONTEXT_TT_MULTI_LEVEL;
+
+		context_set_address_root(context, virt_to_phys(pgd));
+		context_set_address_width(context, agaw);
 	} else {
-		struct dma_pte *pgd = domain->pgd;
-		int agaw;
-
-		context_set_domain_id(context, did);
-
-		if (translation != CONTEXT_TT_PASS_THROUGH) {
-			/*
-			 * Skip top levels of page tables for iommu which has
-			 * less agaw than default. Unnecessary for PT mode.
-			 */
-			for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
-				ret = -ENOMEM;
-				pgd = phys_to_virt(dma_pte_addr(pgd));
-				if (!dma_pte_present(pgd))
-					goto out_unlock;
-			}
-
-			if (info && info->ats_supported)
-				translation = CONTEXT_TT_DEV_IOTLB;
-			else
-				translation = CONTEXT_TT_MULTI_LEVEL;
-
-			context_set_address_root(context, virt_to_phys(pgd));
-			context_set_address_width(context, agaw);
-		} else {
-			/*
-			 * In pass through mode, AW must be programmed to
-			 * indicate the largest AGAW value supported by
-			 * hardware. And ASR is ignored by hardware.
-			 */
-			context_set_address_width(context, iommu->msagaw);
-		}
-
-		context_set_translation_type(context, translation);
+		/*
+		 * In pass through mode, AW must be programmed to
+		 * indicate the largest AGAW value supported by
+		 * hardware. And ASR is ignored by hardware.
+		 */
+		context_set_address_width(context, iommu->msagaw);
 	}
 
+	context_set_translation_type(context, translation);
 	context_set_fault_enable(context);
 	context_set_present(context);
 	if (!ecap_coherent(iommu->ecap))
@@ -2008,43 +1963,29 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
 	return ret;
 }
 
-struct domain_context_mapping_data {
-	struct dmar_domain *domain;
-	struct intel_iommu *iommu;
-	struct pasid_table *table;
-};
-
 static int domain_context_mapping_cb(struct pci_dev *pdev,
 				     u16 alias, void *opaque)
 {
-	struct domain_context_mapping_data *data = opaque;
+	struct device_domain_info *info = dev_iommu_priv_get(&pdev->dev);
+	struct intel_iommu *iommu = info->iommu;
+	struct dmar_domain *domain = opaque;
 
-	return domain_context_mapping_one(data->domain, data->iommu,
-					  data->table, PCI_BUS_NUM(alias),
-					  alias & 0xff);
+	return domain_context_mapping_one(domain, iommu,
+					  PCI_BUS_NUM(alias), alias & 0xff);
 }
 
 static int
 domain_context_mapping(struct dmar_domain *domain, struct device *dev)
 {
 	struct device_domain_info *info = dev_iommu_priv_get(dev);
-	struct domain_context_mapping_data data;
 	struct intel_iommu *iommu = info->iommu;
 	u8 bus = info->bus, devfn = info->devfn;
-	struct pasid_table *table;
-
-	table = intel_pasid_get_table(dev);
 
 	if (!dev_is_pci(dev))
-		return domain_context_mapping_one(domain, iommu, table,
-						  bus, devfn);
-
-	data.domain = domain;
-	data.iommu = iommu;
-	data.table = table;
+		return domain_context_mapping_one(domain, iommu, bus, devfn);
 
 	return pci_for_each_dma_alias(to_pci_dev(dev),
-				      &domain_context_mapping_cb, &data);
+				      domain_context_mapping_cb, domain);
 }
 
 /* Returns a number of VTD pages, but aligned to MM page size */
@@ -2404,28 +2345,19 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
 	list_add(&info->link, &domain->devices);
 	spin_unlock_irqrestore(&domain->lock, flags);
 
-	/* PASID table is mandatory for a PCI device in scalable mode. */
-	if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev)) {
-		/* Setup the PASID entry for requests without PASID: */
-		if (hw_pass_through && domain_type_is_si(domain))
-			ret = intel_pasid_setup_pass_through(iommu,
-					dev, IOMMU_NO_PASID);
-		else if (domain->use_first_level)
-			ret = domain_setup_first_level(iommu, domain, dev,
-					IOMMU_NO_PASID);
-		else
-			ret = intel_pasid_setup_second_level(iommu, domain,
-					dev, IOMMU_NO_PASID);
-		if (ret) {
-			dev_err(dev, "Setup RID2PASID failed\n");
-			device_block_translation(dev);
-			return ret;
-		}
-	}
+	if (dev_is_real_dma_subdevice(dev))
+		return 0;
+
+	if (!sm_supported(iommu))
+		ret = domain_context_mapping(domain, dev);
+	else if (hw_pass_through && domain_type_is_si(domain))
+		ret = intel_pasid_setup_pass_through(iommu, dev, IOMMU_NO_PASID);
+	else if (domain->use_first_level)
+		ret = domain_setup_first_level(iommu, domain, dev, IOMMU_NO_PASID);
+	else
+		ret = intel_pasid_setup_second_level(iommu, domain, dev, IOMMU_NO_PASID);
 
-	ret = domain_context_mapping(domain, dev);
 	if (ret) {
-		dev_err(dev, "Domain context map failed\n");
 		device_block_translation(dev);
 		return ret;
 	}
-- 
2.34.1



* [PATCH v3 5/5] iommu/vt-d: Remove scalable mode in domain_context_clear_one()
  2024-03-05  1:33 [PATCH v3 0/5] iommu: Fix domain check on release Lu Baolu
                   ` (3 preceding siblings ...)
  2024-03-05  1:33 ` [PATCH v3 4/5] iommu/vt-d: Remove scalable mode context entry setup from attach_dev Lu Baolu
@ 2024-03-05  1:33 ` Lu Baolu
  2024-03-05  7:54 ` [PATCH v3 0/5] iommu: Fix domain check on release Joerg Roedel
  5 siblings, 0 replies; 13+ messages in thread
From: Lu Baolu @ 2024-03-05  1:33 UTC
  To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Eric Badger
  Cc: iommu, linux-kernel, Lu Baolu

domain_context_clear_one() only handles the context entry teardown in
legacy mode. Remove the scalable mode check in it to avoid dead code.

Also remove the unnecessary NULL check on the iommu pointer.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 drivers/iommu/intel/iommu.c | 15 +--------------
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d682eb6ad4d2..50eb9aed47cc 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2175,9 +2175,6 @@ static void domain_context_clear_one(struct device_domain_info *info, u8 bus, u8
 	struct context_entry *context;
 	u16 did_old;
 
-	if (!iommu)
-		return;
-
 	spin_lock(&iommu->lock);
 	context = iommu_context_addr(iommu, bus, devfn, 0);
 	if (!context) {
@@ -2185,14 +2182,7 @@ static void domain_context_clear_one(struct device_domain_info *info, u8 bus, u8
 		return;
 	}
 
-	if (sm_supported(iommu)) {
-		if (hw_pass_through && domain_type_is_si(info->domain))
-			did_old = FLPT_DEFAULT_DID;
-		else
-			did_old = domain_id_iommu(info->domain, iommu);
-	} else {
-		did_old = context_domain_id(context);
-	}
+	did_old = context_domain_id(context);
 
 	context_clear_entry(context);
 	__iommu_flush_cache(iommu, context, sizeof(*context));
@@ -2203,9 +2193,6 @@ static void domain_context_clear_one(struct device_domain_info *info, u8 bus, u8
 				   DMA_CCMD_MASK_NOBIT,
 				   DMA_CCMD_DEVICE_INVL);
 
-	if (sm_supported(iommu))
-		qi_flush_pasid_cache(iommu, did_old, QI_PC_ALL_PASIDS, 0);
-
 	iommu->flush.flush_iotlb(iommu,
 				 did_old,
 				 0,
-- 
2.34.1



* RE: [PATCH v3 2/5] iommu/vt-d: Fix NULL domain on device release
  2024-03-05  1:33 ` [PATCH v3 2/5] iommu/vt-d: Fix NULL domain on device release Lu Baolu
@ 2024-03-05  1:48   ` Tian, Kevin
  0 siblings, 0 replies; 13+ messages in thread
From: Tian, Kevin @ 2024-03-05  1:48 UTC
  To: Lu Baolu, Joerg Roedel, Will Deacon, Robin Murphy,
	Jason Gunthorpe, Badger, Eric
  Cc: iommu@lists.linux.dev, linux-kernel@vger.kernel.org

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Tuesday, March 5, 2024 9:33 AM
> 
> In the kdump kernel, the IOMMU operates in deferred_attach mode. In this
> mode, info->domain may not yet be assigned by the time the release_device
> function is called. This leads to the following crash in the crash kernel:
> 
>     BUG: kernel NULL pointer dereference, address: 000000000000003c
>     ...
>     RIP: 0010:do_raw_spin_lock+0xa/0xa0
>     ...
>     _raw_spin_lock_irqsave+0x1b/0x30
>     intel_iommu_release_device+0x96/0x170
>     iommu_deinit_device+0x39/0xf0
>     __iommu_group_remove_device+0xa0/0xd0
>     iommu_bus_notifier+0x55/0xb0
>     notifier_call_chain+0x5a/0xd0
>     blocking_notifier_call_chain+0x41/0x60
>     bus_notify+0x34/0x50
>     device_del+0x269/0x3d0
>     pci_remove_bus_device+0x77/0x100
>     p2sb_bar+0xae/0x1d0
>     ...
>     i801_probe+0x423/0x740
> 
> Use the release_domain mechanism to fix it. The scalable mode context
> entry, which is not part of the release_domain, should be cleared in
> release_device().
> 
> Fixes: 586081d3f6b1 ("iommu/vt-d: Remove DEFER_DEVICE_DOMAIN_INFO")
> Reported-by: Eric Badger <ebadger@purestorage.com>
> Closes: https://lore.kernel.org/r/20240113181713.1817855-1-
> ebadger@purestorage.com
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>


* RE: [PATCH v3 3/5] iommu/vt-d: Setup scalable mode context entry in probe path
  2024-03-05  1:33 ` [PATCH v3 3/5] iommu/vt-d: Setup scalable mode context entry in probe path Lu Baolu
@ 2024-03-05  1:50   ` Tian, Kevin
  0 siblings, 0 replies; 13+ messages in thread
From: Tian, Kevin @ 2024-03-05  1:50 UTC
  To: Lu Baolu, Joerg Roedel, Will Deacon, Robin Murphy,
	Jason Gunthorpe, Badger, Eric
  Cc: iommu@lists.linux.dev, linux-kernel@vger.kernel.org

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Tuesday, March 5, 2024 9:33 AM
> 
> In contrast to legacy mode, scalable mode configures the DMA translation
> table in the PASID table entry instead of the context entry.
> For this reason, it is more appropriate to set up the scalable mode
> context entry in the device_probe callback and direct it to the
> appropriate PASID table.
> 
> The iommu domain attach/detach operations only affect the PASID table
> entry. Therefore, there is no need to modify the context entry when
> configuring the translation type and page table.
> 
> The only exception is the kdump case, where context entry setup is
> postponed until the device driver invokes the first DMA interface.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>


* Re: [PATCH v3 0/5] iommu: Fix domain check on release
  2024-03-05  1:33 [PATCH v3 0/5] iommu: Fix domain check on release Lu Baolu
                   ` (4 preceding siblings ...)
  2024-03-05  1:33 ` [PATCH v3 5/5] iommu/vt-d: Remove scalable mode in domain_context_clear_one() Lu Baolu
@ 2024-03-05  7:54 ` Joerg Roedel
  2024-03-05 12:01   ` Baolu Lu
  5 siblings, 1 reply; 13+ messages in thread
From: Joerg Roedel @ 2024-03-05  7:54 UTC
  To: Lu Baolu
  Cc: Will Deacon, Robin Murphy, Jason Gunthorpe, Kevin Tian,
	Eric Badger, iommu, linux-kernel

Hi Baolu,

On Tue, Mar 05, 2024 at 09:33:00AM +0800, Lu Baolu wrote:
> It fixes a NULL pointer dereference issue in the Intel iommu driver and
> strengthens the iommu core to possibly prevent similar failures in other
> iommu drivers.

Please send me another pull request once you consider this ready. I
guess it is v6.9 material.

Regards,

	Joerg


* Re: [PATCH v3 0/5] iommu: Fix domain check on release
  2024-03-05  7:54 ` [PATCH v3 0/5] iommu: Fix domain check on release Joerg Roedel
@ 2024-03-05 12:01   ` Baolu Lu
  0 siblings, 0 replies; 13+ messages in thread
From: Baolu Lu @ 2024-03-05 12:01 UTC
  To: Joerg Roedel
  Cc: baolu.lu, Will Deacon, Robin Murphy, Jason Gunthorpe, Kevin Tian,
	Eric Badger, iommu, linux-kernel

On 2024/3/5 15:54, Joerg Roedel wrote:
> On Tue, Mar 05, 2024 at 09:33:00AM +0800, Lu Baolu wrote:
>> It fixes a NULL pointer dereference issue in the Intel iommu driver and
>> strengthens the iommu core to possibly prevent similar failures in other
>> iommu drivers.
> Please send me another pull request once you consider this ready. I
> guess it is v6.9 material.

Sure. I will queue this series in a pull request.

Best regards,
baolu


* Re: [PATCH v3 1/5] iommu: Add static iommu_ops->release_domain
  2024-03-05  1:33 ` [PATCH v3 1/5] iommu: Add static iommu_ops->release_domain Lu Baolu
@ 2024-04-10 15:26   ` Jason Gunthorpe
  2024-04-10 16:37     ` Robin Murphy
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2024-04-10 15:26 UTC
  To: Lu Baolu
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian, Eric Badger,
	iommu, linux-kernel

On Tue, Mar 05, 2024 at 09:33:01AM +0800, Lu Baolu wrote:
> The current device_release callback for individual iommu drivers does the
> following:
> 
> 1) Silencing IOMMU DMA translation: It detaches any existing domain from the
>    device and puts it into a blocking state (some drivers might use the
>    identity state).
> 2) Resource release: It releases resources allocated during the
>    device_probe callback and restores the device to its pre-probe state.
> 
> Step 1 is challenging for individual iommu drivers because each must check
> if a domain is already attached to the device. Additionally, if a deferred
> attach never occurred, the device_release should avoid modifying hardware
> configuration regardless of the reason for its call.
> 
> To simplify this process, introduce a static release_domain within the
> iommu_ops structure. It can be either a blocking or identity domain
> depending on the iommu hardware. The iommu core will decide whether to
> attach this domain before the device_release callback, eliminating the
> need for repetitive code in various drivers.
> 
> Consequently, the device_release callback can focus solely on the opposite
> operations of device_probe, including releasing all resources allocated
> during that callback.
> 
> Co-developed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
>  include/linux/iommu.h |  1 +
>  drivers/iommu/iommu.c | 19 +++++++++++++++----
>  2 files changed, 16 insertions(+), 4 deletions(-)

I looked at all the drivers:
 1) Didn't spend time to guess what ipmmu-vmsa is doing
 2) Several drivers are obviously missing the release_domain behavior
    and can now be trivially fixed
 3) amd, s390, virtio-iommu and arm-smmu should probably support
    blocked_domain (I assume that is what their detach does)
 4) arm-smmuv3 can use it too once disable_bypass is removed
 5) Several drivers don't even have release_device functions.
    Probably all of them can do their identity domains too.

This seems like a pretty reasonable thing to add to this series too:

diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index eb1e62cd499a58..3ddc4b4418a049 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -979,6 +979,7 @@ static void apple_dart_get_resv_regions(struct device *dev,
 static const struct iommu_ops apple_dart_iommu_ops = {
 	.identity_domain = &apple_dart_identity_domain,
 	.blocked_domain = &apple_dart_blocked_domain,
+	.release_domain = &apple_dart_blocked_domain,
 	.domain_alloc_paging = apple_dart_domain_alloc_paging,
 	.probe_device = apple_dart_probe_device,
 	.release_device = apple_dart_release_device,
diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index d98c9161948a25..902dc4da44f987 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -1424,8 +1424,6 @@ static void exynos_iommu_release_device(struct device *dev)
 	struct exynos_iommu_owner *owner = dev_iommu_priv_get(dev);
 	struct sysmmu_drvdata *data;
 
-	WARN_ON(exynos_iommu_identity_attach(&exynos_identity_domain, dev));
-
 	list_for_each_entry(data, &owner->controllers, owner_node)
 		device_link_del(data->link);
 }
@@ -1471,6 +1469,7 @@ static int exynos_iommu_of_xlate(struct device *dev,
 
 static const struct iommu_ops exynos_iommu_ops = {
 	.identity_domain = &exynos_identity_domain,
+	.release_domain = &exynos_identity_domain,
 	.domain_alloc_paging = exynos_iommu_domain_alloc_paging,
 	.device_group = generic_device_group,
 	.probe_device = exynos_iommu_probe_device,
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index b8c47f18bc2612..b5693041b18dd4 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -1012,6 +1012,7 @@ static void mtk_iommu_get_resv_regions(struct device *dev,
 
 static const struct iommu_ops mtk_iommu_ops = {
 	.identity_domain = &mtk_iommu_identity_domain,
+	.release_domain = &mtk_iommu_identity_domain,
 	.domain_alloc_paging = mtk_iommu_domain_alloc_paging,
 	.probe_device	= mtk_iommu_probe_device,
 	.release_device	= mtk_iommu_release_device,
diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index a9fa2a54dc9b39..9e7205af7d7316 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -580,6 +580,7 @@ static int mtk_iommu_v1_hw_init(const struct mtk_iommu_v1_data *data)
 
 static const struct iommu_ops mtk_iommu_v1_ops = {
 	.identity_domain = &mtk_iommu_v1_identity_domain,
+	.release_domain = &mtk_iommu_v1_identity_domain,
 	.domain_alloc_paging = mtk_iommu_v1_domain_alloc_paging,
 	.probe_device	= mtk_iommu_v1_probe_device,
 	.probe_finalize = mtk_iommu_v1_probe_finalize,
diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index c9528065a59afa..c4c76aaec19e50 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -1725,6 +1725,7 @@ static void omap_iommu_release_device(struct device *dev)
 
 static const struct iommu_ops omap_iommu_ops = {
 	.identity_domain = &omap_iommu_identity_domain,
+	.release_domain = &omap_iommu_identity_domain,
 	.domain_alloc_paging = omap_iommu_domain_alloc_paging,
 	.probe_device	= omap_iommu_probe_device,
 	.release_device	= omap_iommu_release_device,
diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index da79d9f4cf6371..e551c5b143bbd3 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -1162,6 +1162,7 @@ static int rk_iommu_of_xlate(struct device *dev,
 
 static const struct iommu_ops rk_iommu_ops = {
 	.identity_domain = &rk_identity_domain,
+	.release_domain = &rk_identity_domain,
 	.domain_alloc_paging = rk_iommu_domain_alloc_paging,
 	.probe_device = rk_iommu_probe_device,
 	.release_device = rk_iommu_release_device,

> +	if (!dev->iommu->attach_deferred && ops->release_domain)
> +		ops->release_domain->ops->attach_dev(ops->release_domain, dev);

We should probably be sensitive to the dev->iommu->require_direct flag -
generally drivers should prefer the blocked domain for the release domain,
but in case the FW is asking for require_direct we need to switch to
identity.

Also, may as well avoid switching a domain if the group is already
correct and use the common attach function to get the tracing.. So
like this?

	if (!dev->iommu->attach_deferred) {
		struct iommu_domain *release_domain = ops->release_domain;

		if (dev->iommu->require_direct && ops->identity_domain)
			release_domain = ops->identity_domain;

		if (release_domain && group->domain != release_domain)
			WARN_ON(__iommu_attach_device(release_domain, dev));
	}

Jason


* Re: [PATCH v3 1/5] iommu: Add static iommu_ops->release_domain
  2024-04-10 15:26   ` Jason Gunthorpe
@ 2024-04-10 16:37     ` Robin Murphy
  2024-04-11 13:41       ` Jason Gunthorpe
  0 siblings, 1 reply; 13+ messages in thread
From: Robin Murphy @ 2024-04-10 16:37 UTC
  To: Jason Gunthorpe, Lu Baolu
  Cc: Joerg Roedel, Will Deacon, Kevin Tian, Eric Badger, iommu,
	linux-kernel

On 2024-04-10 4:26 pm, Jason Gunthorpe wrote:
> On Tue, Mar 05, 2024 at 09:33:01AM +0800, Lu Baolu wrote:
>> The current device_release callback for individual iommu drivers does the
>> following:
>>
>> 1) Silencing IOMMU DMA translation: It detaches any existing domain from the
>>     device and puts it into a blocking state (some drivers might use the
>>     identity state).
>> 2) Resource release: It releases resources allocated during the
>>     device_probe callback and restores the device to its pre-probe state.
>>
>> Step 1 is challenging for individual iommu drivers because each must check
>> if a domain is already attached to the device. Additionally, if a deferred
>> attach never occurred, the device_release should avoid modifying hardware
>> configuration regardless of the reason for its call.
>>
>> To simplify this process, introduce a static release_domain within the
>> iommu_ops structure. It can be either a blocking or identity domain
>> depending on the iommu hardware. The iommu core will decide whether to
>> attach this domain before the device_release callback, eliminating the
>> need for repetitive code in various drivers.
>>
>> Consequently, the device_release callback can focus solely on the opposite
>> operations of device_probe, including releasing all resources allocated
>> during that callback.
>>
>> Co-developed-by: Jason Gunthorpe <jgg@nvidia.com>
>> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
>> ---
>>   include/linux/iommu.h |  1 +
>>   drivers/iommu/iommu.c | 19 +++++++++++++++----
>>   2 files changed, 16 insertions(+), 4 deletions(-)
> 
> I looked at all the drivers:
>   1) Didn't spend time to guess what ipmmu-vmsa is doing
>   2) Several drivers are obviously missing the release_domain behavior
>      and can now be trivially fixed
>   3) amd, s390, virtio-iommu and arm-smmu should probably support
>      blocked_domain (I assume that is what their detach does)
>   4) arm-smmuv3 can use it too once disable_bypass is removed
>   5) Several drivers don't even have release_device functions.
>      Probably all of them can do their identity domains too.
> 
> This seems like a pretty reasonable thing to add to this series too:
> 
> diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
> index eb1e62cd499a58..3ddc4b4418a049 100644
> --- a/drivers/iommu/apple-dart.c
> +++ b/drivers/iommu/apple-dart.c
> @@ -979,6 +979,7 @@ static void apple_dart_get_resv_regions(struct device *dev,
>   static const struct iommu_ops apple_dart_iommu_ops = {
>   	.identity_domain = &apple_dart_identity_domain,
>   	.blocked_domain = &apple_dart_blocked_domain,
> +	.release_domain = &apple_dart_blocked_domain,
>   	.domain_alloc_paging = apple_dart_domain_alloc_paging,
>   	.probe_device = apple_dart_probe_device,
>   	.release_device = apple_dart_release_device,
> diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
> index d98c9161948a25..902dc4da44f987 100644
> --- a/drivers/iommu/exynos-iommu.c
> +++ b/drivers/iommu/exynos-iommu.c
> @@ -1424,8 +1424,6 @@ static void exynos_iommu_release_device(struct device *dev)
>   	struct exynos_iommu_owner *owner = dev_iommu_priv_get(dev);
>   	struct sysmmu_drvdata *data;
>   
> -	WARN_ON(exynos_iommu_identity_attach(&exynos_identity_domain, dev));
> -
>   	list_for_each_entry(data, &owner->controllers, owner_node)
>   		device_link_del(data->link);
>   }
> @@ -1471,6 +1469,7 @@ static int exynos_iommu_of_xlate(struct device *dev,
>   
>   static const struct iommu_ops exynos_iommu_ops = {
>   	.identity_domain = &exynos_identity_domain,
> +	.release_domain = &exynos_identity_domain,
>   	.domain_alloc_paging = exynos_iommu_domain_alloc_paging,
>   	.device_group = generic_device_group,
>   	.probe_device = exynos_iommu_probe_device,
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index b8c47f18bc2612..b5693041b18dd4 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -1012,6 +1012,7 @@ static void mtk_iommu_get_resv_regions(struct device *dev,
>   
>   static const struct iommu_ops mtk_iommu_ops = {
>   	.identity_domain = &mtk_iommu_identity_domain,
> +	.release_domain = &mtk_iommu_identity_domain,
>   	.domain_alloc_paging = mtk_iommu_domain_alloc_paging,
>   	.probe_device	= mtk_iommu_probe_device,
>   	.release_device	= mtk_iommu_release_device,
> diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
> index a9fa2a54dc9b39..9e7205af7d7316 100644
> --- a/drivers/iommu/mtk_iommu_v1.c
> +++ b/drivers/iommu/mtk_iommu_v1.c
> @@ -580,6 +580,7 @@ static int mtk_iommu_v1_hw_init(const struct mtk_iommu_v1_data *data)
>   
>   static const struct iommu_ops mtk_iommu_v1_ops = {
>   	.identity_domain = &mtk_iommu_v1_identity_domain,
> +	.release_domain = &mtk_iommu_v1_identity_domain,
>   	.domain_alloc_paging = mtk_iommu_v1_domain_alloc_paging,
>   	.probe_device	= mtk_iommu_v1_probe_device,
>   	.probe_finalize = mtk_iommu_v1_probe_finalize,
> diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
> index c9528065a59afa..c4c76aaec19e50 100644
> --- a/drivers/iommu/omap-iommu.c
> +++ b/drivers/iommu/omap-iommu.c
> @@ -1725,6 +1725,7 @@ static void omap_iommu_release_device(struct device *dev)
>   
>   static const struct iommu_ops omap_iommu_ops = {
>   	.identity_domain = &omap_iommu_identity_domain,
> +	.release_domain = &omap_iommu_identity_domain,
>   	.domain_alloc_paging = omap_iommu_domain_alloc_paging,
>   	.probe_device	= omap_iommu_probe_device,
>   	.release_device	= omap_iommu_release_device,
> diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
> index da79d9f4cf6371..e551c5b143bbd3 100644
> --- a/drivers/iommu/rockchip-iommu.c
> +++ b/drivers/iommu/rockchip-iommu.c
> @@ -1162,6 +1162,7 @@ static int rk_iommu_of_xlate(struct device *dev,
>   
>   static const struct iommu_ops rk_iommu_ops = {
>   	.identity_domain = &rk_identity_domain,
> +	.release_domain = &rk_identity_domain,
>   	.domain_alloc_paging = rk_iommu_domain_alloc_paging,
>   	.probe_device = rk_iommu_probe_device,
>   	.release_device = rk_iommu_release_device,
> 
>> +	if (!dev->iommu->attach_deferred && ops->release_domain)
>> +		ops->release_domain->ops->attach_dev(ops->release_domain, dev);
> 
> > We should probably be sensitive to the dev->iommu->require_direct flag -
> > generally drivers should prefer the blocked domain for the release domain,
> > but in case the FW is asking for require_direct we need to switch to
> > identity.

At this point do we even need release_domain? It sounds like the logic 
you want to enforce is going to be trivial to resolve directly in the 
core code. As typed-in-mail-client pseudocode, roughly:

static void iommu_set_release_domain(struct device *dev)
{
	const struct iommu_ops *ops = dev_iommu_ops(dev);
	struct iommu_domain *rd;

	/*
	 * Static domains are expected not to track any device state,
	 * and thus be tolerant of devices disappearing once "attached"
	 */
	if (ops->blocked_domain &&
	    !(dev->iommu->require_direct || other_arch_or_platform_reason))
		rd = ops->blocked_domain;
	else if (ops->identity_domain)
		rd = ops->identity_domain;
	else /* Hope release_device does the right thing! */
		return;

	if (!dev->iommu->attach_deferred && rd != dev->iommu_group->domain)
		__iommu_attach_device(rd, dev);
}

...no driver churn necessary. (And frankly I think it's a further bonus 
to avoid risking any notion of release_domain being implementable as its 
own distinct special thing)

Thanks,
Robin.

> 
> Also, may as well avoid switching a domain if the group is already
> correct and use the common attach function to get the tracing.. So
> like this?
> 
> 	if (!dev->iommu->attach_deferred) {
> 		struct iommu_domain *release_domain = ops->release_domain;
> 
> 		if (dev->iommu->require_direct && ops->identity_domain)
> 			release_domain = ops->identity_domain;
> 
> 		if (release_domain && group->domain != release_domain)
> 			WARN_ON(__iommu_attach_device(release_domain, dev));
> 	}
> 
> Jason


* Re: [PATCH v3 1/5] iommu: Add static iommu_ops->release_domain
  2024-04-10 16:37     ` Robin Murphy
@ 2024-04-11 13:41       ` Jason Gunthorpe
  0 siblings, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2024-04-11 13:41 UTC
  To: Robin Murphy
  Cc: Lu Baolu, Joerg Roedel, Will Deacon, Kevin Tian, Eric Badger,
	iommu, linux-kernel

On Wed, Apr 10, 2024 at 05:37:06PM +0100, Robin Murphy wrote:
> > We should probably be sensitive to the dev->iommu->require_direct flag -
> > generally drivers should prefer the blocked domain for the release domain,
> > but in case the FW is asking for require_direct we need to switch to
> > identity.
> 
> At this point do we even need release_domain?

Ultimately ideally not, but I feel better going through the exercise
driver-by-driver before we just make the core code do it
automatically. Maybe I'm being overly pessimistic about the drivers...

Have all the drivers that set identity/blocked domains also set a release
domain before we switch to this unconditional method.

Anyhow, I just noticed it went into -rc1 already, so may as well keep
going.

> static void iommu_set_release_domain(struct device *dev)
> {
> 	const struct iommu_ops *ops = dev_iommu_ops(dev);
> 	struct iommu_domain *rd;
> 
> 	/*
> 	 * Static domains are expected not to track any device state,
> 	 * and thus be tolerant of devices disappearing once "attached"
> 	 */
> 	if (ops->blocked_domain &&
> 	    !(dev->iommu->require_direct || other_arch_or_platform_reason))
> 		rd = ops->blocked_domain;
> 	else if (ops->identity_domain)
> 		rd = ops->identity_domain;
> 	else /* Hope release_device does the right thing! */
> 		return;
> 
> 	if (!dev->iommu->attach_deferred && rd != dev->iommu_group->domain)
> 		__iommu_attach_device(rd, dev);
> }

Yeah, this is a good end goal.

Jason

