* [PATCH v4 0/4] arm64: IOMMU-backed DMA mapping
From: Robin Murphy @ 2015-07-16 18:40 UTC
  To: linux-arm-kernel

Hi all,

Here's a quick repost addressing the feedback on v3, plus a couple of
other tweaks. Changes this round:

- Fix the heinous dma_mask/coherency confusion, which also simplifies
  some prototypes and parameter passing as a bonus.
- Reorder the iommu_map/unmap calls in alloc/free to prevent any potential
  device access on the wrong side of clearing/freeing the pages (see the
  abridged sketch below).
- Make the unconditional clearing of allocations a bit more explicit.
- Make the default domain workaround more robust against the possibility
  of an IOMMU driver giving back the same domain for multiple devices.
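
For reference, a heavily abridged sketch of that ordering as it ends up in
patch 2 (intermediate steps and error handling omitted):

	/* alloc: clear/flush the pages before the device can see them */
	pages = __iommu_dma_alloc_pages(count, gfp);
	/* ... zero and flush each page ... */
	iommu_map_sg(domain, dma_addr, sgt.sgl, sgt.orig_nents, prot);

	/* free: revoke device access before the pages are given back */
	__iommu_dma_unmap(domain, *handle);
	__iommu_dma_free_pages(pages, count);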

Robin.

v3: http://thread.gmane.org/gmane.linux.kernel.iommu/10133
Updated branch at: git://linux-arm.org/linux-rm iommu/dma

Robin Murphy (4):
  iommu/iova: Avoid over-allocating when size-aligned
  iommu: Implement common IOMMU ops for DMA mapping
  arm64: Add IOMMU dma_ops
  arm64: Hook up IOMMU dma_ops

 arch/arm64/Kconfig                   |   1 +
 arch/arm64/include/asm/dma-mapping.h |  15 +-
 arch/arm64/mm/dma-mapping.c          | 452 +++++++++++++++++++++++++++++
 drivers/iommu/Kconfig                |   7 +
 drivers/iommu/Makefile               |   1 +
 drivers/iommu/dma-iommu.c            | 538 +++++++++++++++++++++++++++++++++++
 drivers/iommu/intel-iommu.c          |   2 +
 drivers/iommu/iova.c                 |  23 +-
 include/linux/dma-iommu.h            |  84 ++++++
 include/linux/iommu.h                |   1 +
 10 files changed, 1099 insertions(+), 25 deletions(-)
 create mode 100644 drivers/iommu/dma-iommu.c
 create mode 100644 include/linux/dma-iommu.h

-- 
1.9.1

* [PATCH v4 1/4] iommu/iova: Avoid over-allocating when size-aligned
From: Robin Murphy @ 2015-07-16 18:40 UTC
  To: linux-arm-kernel

Currently, allocating a size-aligned IOVA region quietly adjusts the
actual allocation size in the process, returning a rounded-up
power-of-two-sized allocation. This results in mismatched behaviour in
the IOMMU driver when the original size was not a power of two: the
original size is mapped, but the rounded-up IOVA size is unmapped.

Whilst some IOMMUs will happily unmap already-unmapped pages, others
consider this an error, so fix it by computing the necessary alignment
padding without altering the actual allocation size. Also clean up by
making pad_size unsigned, since its callers always pass unsigned values
and negative padding makes little sense here anyway.
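
As a standalone illustration (not part of the patch itself), consider a
size-aligned allocation of 5 pages at the top of a 32-bit space with 4KB
pages; the new calculation reserves exactly 5 PFNs at a suitably aligned
start, where the old code would have reserved 8:

	unsigned long size = 5, limit_pfn = 0xfffff;
	unsigned long align = __roundup_pow_of_two(size);		/* 8 */
	unsigned long pad = (limit_pfn + 1 - size) & (align - 1);	/* 3 */
	unsigned long start_pfn = limit_pfn + 1 - size - pad;		/* 0xffff8, 8-aligned */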

CC: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/intel-iommu.c |  2 ++
 drivers/iommu/iova.c        | 23 ++++++-----------------
 2 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a98a7b2..9210159 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3233,6 +3233,8 @@ static struct iova *intel_alloc_iova(struct device *dev,
 
 	/* Restrict dma_mask to the width that the iommu can handle */
 	dma_mask = min_t(uint64_t, DOMAIN_MAX_ADDR(domain->gaw), dma_mask);
+	/* Ensure we reserve the whole size-aligned region */
+	nrpages = __roundup_pow_of_two(nrpages);
 
 	if (!dmar_forcedac && dma_mask > DMA_BIT_MASK(32)) {
 		/*
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index b7c3d92..29f2efc 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -120,19 +120,14 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
 	}
 }
 
-/* Computes the padding size required, to make the
- * the start address naturally aligned on its size
+/*
+ * Computes the padding size required, to make the start address
+ * naturally aligned on the power-of-two order of its size
  */
-static int
-iova_get_pad_size(int size, unsigned int limit_pfn)
+static unsigned int
+iova_get_pad_size(unsigned int size, unsigned int limit_pfn)
 {
-	unsigned int pad_size = 0;
-	unsigned int order = ilog2(size);
-
-	if (order)
-		pad_size = (limit_pfn + 1) % (1 << order);
-
-	return pad_size;
+	return (limit_pfn + 1 - size) & (__roundup_pow_of_two(size) - 1);
 }
 
 static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
@@ -265,12 +260,6 @@ alloc_iova(struct iova_domain *iovad, unsigned long size,
 	if (!new_iova)
 		return NULL;
 
-	/* If size aligned is set then round the size to
-	 * to next power of two.
-	 */
-	if (size_aligned)
-		size = __roundup_pow_of_two(size);
-
 	ret = __alloc_and_insert_iova_range(iovad, size, limit_pfn,
 			new_iova, size_aligned);
 
-- 
1.9.1

* [PATCH v4 2/4] iommu: Implement common IOMMU ops for DMA mapping
From: Robin Murphy @ 2015-07-16 18:40 UTC
  To: linux-arm-kernel

Taking inspiration from the existing arch/arm code, break out some
generic functions to interface the DMA-API to the IOMMU-API. This will
do the bulk of the heavy lifting for IOMMU-backed dma-mapping.
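
For orientation, here is a minimal sketch (not taken from any real driver;
"struct my_domain" and to_my_domain() are placeholders) of how an IOMMU
driver is expected to cooperate with this layer, following the kernel-doc
below:

	static struct iommu_domain *my_domain_alloc(unsigned type)
	{
		struct my_domain *dom = kzalloc(sizeof(*dom), GFP_KERNEL);

		if (!dom)
			return NULL;
		/* DMA-API domains get an IOVA allocator cookie up front */
		if (type == IOMMU_DOMAIN_DMA &&
		    iommu_get_dma_cookie(&dom->domain)) {
			kfree(dom);
			return NULL;
		}
		return &dom->domain;
	}

	static void my_domain_free(struct iommu_domain *domain)
	{
		/* Releases the cookie and its IOVA domain; safe if none was set */
		iommu_put_dma_cookie(domain);
		kfree(to_my_domain(domain));
	}

The arch DMA mapping code then calls iommu_dma_init_domain() on the attached
domain to establish the usable IOVA range before using the mapping helpers.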

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/Kconfig     |   7 +
 drivers/iommu/Makefile    |   1 +
 drivers/iommu/dma-iommu.c | 538 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dma-iommu.h |  84 ++++++++
 include/linux/iommu.h     |   1 +
 5 files changed, 631 insertions(+)
 create mode 100644 drivers/iommu/dma-iommu.c
 create mode 100644 include/linux/dma-iommu.h

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index f1fb1d3..efb0e66 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -48,6 +48,13 @@ config OF_IOMMU
        def_bool y
        depends on OF && IOMMU_API
 
+# IOMMU-agnostic DMA-mapping layer
+config IOMMU_DMA
+	bool
+	depends on NEED_SG_DMA_LENGTH
+	select IOMMU_API
+	select IOMMU_IOVA
+
 config FSL_PAMU
 	bool "Freescale IOMMU support"
 	depends on PPC32
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index c6dcc51..f465cfb 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -1,6 +1,7 @@
 obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
+obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
new file mode 100644
index 0000000..cab3289
--- /dev/null
+++ b/drivers/iommu/dma-iommu.c
@@ -0,0 +1,538 @@
+/*
+ * A fairly generic DMA-API to IOMMU-API glue layer.
+ *
+ * Copyright (C) 2014-2015 ARM Ltd.
+ *
+ * based in part on arch/arm/mm/dma-mapping.c:
+ * Copyright (C) 2000-2004 Russell King
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/device.h>
+#include <linux/dma-iommu.h>
+#include <linux/huge_mm.h>
+#include <linux/iommu.h>
+#include <linux/iova.h>
+#include <linux/mm.h>
+
+int iommu_dma_init(void)
+{
+	return iommu_iova_cache_init();
+}
+
+/**
+ * iommu_get_dma_cookie - Acquire DMA-API resources for a domain
+ * @domain: IOMMU domain to prepare for DMA-API usage
+ *
+ * IOMMU drivers should normally call this from their domain_alloc
+ * callback when domain->type == IOMMU_DOMAIN_DMA.
+ */
+int iommu_get_dma_cookie(struct iommu_domain *domain)
+{
+	struct iova_domain *iovad;
+
+	if (domain->dma_api_cookie)
+		return -EEXIST;
+
+	iovad = kzalloc(sizeof(*iovad), GFP_KERNEL);
+	domain->dma_api_cookie = iovad;
+
+	return iovad ? 0 : -ENOMEM;
+}
+EXPORT_SYMBOL(iommu_get_dma_cookie);
+
+/**
+ * iommu_put_dma_cookie - Release a domain's DMA mapping resources
+ * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
+ *
+ * IOMMU drivers should normally call this from their domain_free callback.
+ */
+void iommu_put_dma_cookie(struct iommu_domain *domain)
+{
+	struct iova_domain *iovad = domain->dma_api_cookie;
+
+	if (!iovad)
+		return;
+
+	put_iova_domain(iovad);
+	kfree(iovad);
+	domain->dma_api_cookie = NULL;
+}
+EXPORT_SYMBOL(iommu_put_dma_cookie);
+
+/**
+ * iommu_dma_init_domain - Initialise a DMA mapping domain
+ * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
+ * @base: IOVA at which the mappable address space starts
+ * @size: Size of IOVA space
+ *
+ * @base and @size should be exact multiples of IOMMU page granularity to
+ * avoid rounding surprises. If necessary, we reserve the page at address 0
+ * to ensure it is an invalid IOVA.
+ */
+int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, u64 size)
+{
+	struct iova_domain *iovad = domain->dma_api_cookie;
+	unsigned long order, base_pfn, end_pfn;
+
+	if (!iovad)
+		return -ENODEV;
+
+	/* Use the smallest supported page size for IOVA granularity */
+	order = __ffs(domain->ops->pgsize_bitmap);
+	base_pfn = max_t(unsigned long, 1, base >> order);
+	end_pfn = (base + size - 1) >> order;
+
+	/* Check the domain allows at least some access to the device... */
+	if (domain->geometry.force_aperture) {
+		if (base > domain->geometry.aperture_end ||
+		    base + size <= domain->geometry.aperture_start) {
+			pr_warn("specified DMA range outside IOMMU capability\n");
+			return -EFAULT;
+		}
+		/* ...then finally give it a kicking to make sure it fits */
+		base_pfn = max_t(unsigned long, base_pfn,
+				domain->geometry.aperture_start >> order);
+		end_pfn = min_t(unsigned long, end_pfn,
+				domain->geometry.aperture_end >> order);
+	}
+
+	/* All we can safely do with an existing domain is enlarge it */
+	if (iovad->start_pfn) {
+		if (1UL << order != iovad->granule ||
+		    base_pfn != iovad->start_pfn ||
+		    end_pfn < iovad->dma_32bit_pfn) {
+			pr_warn("Incompatible range for DMA domain\n");
+			return -EFAULT;
+		}
+		iovad->dma_32bit_pfn = end_pfn;
+	} else {
+		init_iova_domain(iovad, 1UL << order, base_pfn, end_pfn);
+	}
+	return 0;
+}
+EXPORT_SYMBOL(iommu_dma_init_domain);
+
+/**
+ * dma_direction_to_prot - Translate DMA API directions to IOMMU API page flags
+ * @dir: Direction of DMA transfer
+ * @coherent: Is the DMA master cache-coherent?
+ *
+ * Return: corresponding IOMMU API page protection flags
+ */
+int dma_direction_to_prot(enum dma_data_direction dir, bool coherent)
+{
+	int prot = coherent ? IOMMU_CACHE : 0;
+
+	switch (dir) {
+	case DMA_BIDIRECTIONAL:
+		return prot | IOMMU_READ | IOMMU_WRITE;
+	case DMA_TO_DEVICE:
+		return prot | IOMMU_READ;
+	case DMA_FROM_DEVICE:
+		return prot | IOMMU_WRITE;
+	default:
+		return 0;
+	}
+}
+
+static struct iova *__alloc_iova(struct iova_domain *iovad, size_t size,
+		dma_addr_t dma_limit)
+{
+	unsigned long shift = iova_shift(iovad);
+	unsigned long length = iova_align(iovad, size) >> shift;
+
+	/*
+	 * Enforce size-alignment to be safe - there should probably be
+	 * an attribute to control this per-device, or at least per-domain...
+	 */
+	return alloc_iova(iovad, length, dma_limit >> shift, true);
+}
+
+/* The IOVA allocator knows what we mapped, so just unmap whatever that was */
+static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr)
+{
+	struct iova_domain *iovad = domain->dma_api_cookie;
+	unsigned long shift = iova_shift(iovad);
+	unsigned long pfn = dma_addr >> shift;
+	struct iova *iova = find_iova(iovad, pfn);
+	size_t size = iova_size(iova) << shift;
+
+	/* ...and if we can't, then something is horribly, horribly wrong */
+	BUG_ON(iommu_unmap(domain, pfn << shift, size) < size);
+	__free_iova(iovad, iova);
+}
+
+static void __iommu_dma_free_pages(struct page **pages, int count)
+{
+	while (count--)
+		__free_page(pages[count]);
+	kvfree(pages);
+}
+
+static struct page **__iommu_dma_alloc_pages(unsigned int count, gfp_t gfp)
+{
+	struct page **pages;
+	unsigned int i = 0, array_size = count * sizeof(*pages);
+
+	if (array_size <= PAGE_SIZE)
+		pages = kzalloc(array_size, GFP_KERNEL);
+	else
+		pages = vzalloc(array_size);
+	if (!pages)
+		return NULL;
+
+	/* IOMMU can map any pages, so highmem can also be used here */
+	gfp |= __GFP_NOWARN | __GFP_HIGHMEM;
+
+	while (count) {
+		struct page *page = NULL;
+		int j, order = __fls(count);
+
+		/*
+		 * Higher-order allocations are a convenience rather
+		 * than a necessity, hence using __GFP_NORETRY until
+		 * falling back to single-page allocations.
+		 */
+		for (order = min(order, MAX_ORDER); order > 0; order--) {
+			page = alloc_pages(gfp | __GFP_NORETRY, order);
+			if (!page)
+				continue;
+			if (PageCompound(page)) {
+				if (!split_huge_page(page))
+					break;
+				__free_pages(page, order);
+			} else {
+				split_page(page, order);
+				break;
+			}
+		}
+		if (!page)
+			page = alloc_page(gfp);
+		if (!page) {
+			__iommu_dma_free_pages(pages, i);
+			return NULL;
+		}
+		j = 1 << order;
+		count -= j;
+		while (j--)
+			pages[i++] = page++;
+	}
+	return pages;
+}
+
+/**
+ * iommu_dma_free - Free a buffer allocated by iommu_dma_alloc()
+ * @dev: Device which owns this buffer
+ * @pages: Array of buffer pages as returned by iommu_dma_alloc()
+ * @size: Size of buffer in bytes
+ * @handle: DMA address of buffer
+ *
+ * Frees both the pages associated with the buffer, and the array
+ * describing them
+ */
+void iommu_dma_free(struct device *dev, struct page **pages, size_t size,
+		dma_addr_t *handle)
+{
+	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), *handle);
+	__iommu_dma_free_pages(pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
+	*handle = DMA_ERROR_CODE;
+}
+
+/**
+ * iommu_dma_alloc - Allocate and map a buffer contiguous in IOVA space
+ * @dev: Device to allocate memory for. Must be a real device
+ *	 attached to an iommu_dma_domain
+ * @size: Size of buffer in bytes
+ * @gfp: Allocation flags
+ * @prot: IOMMU mapping flags
+ * @handle: Out argument for allocated DMA handle
+ * @flush_page: Arch callback to flush a single page from all caches to
+ *		ensure DMA visibility of __GFP_ZERO. May be NULL if not
+ *		required.
+ *
+ * If @size is less than PAGE_SIZE, then a full CPU page will be allocated,
+ * but an IOMMU which supports smaller pages might not map the whole thing.
+ *
+ * Return: Array of struct page pointers describing the buffer,
+ *	   or NULL on failure.
+ */
+struct page **iommu_dma_alloc(struct device *dev, size_t size,
+		gfp_t gfp, int prot, dma_addr_t *handle,
+		void (*flush_page)(const void *, phys_addr_t))
+{
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iova_domain *iovad = domain->dma_api_cookie;
+	struct iova *iova;
+	struct page **pages;
+	struct sg_table sgt;
+	dma_addr_t dma_addr;
+	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
+	*handle = DMA_ERROR_CODE;
+
+	pages = __iommu_dma_alloc_pages(count, gfp);
+	if (!pages)
+		return NULL;
+
+	iova = __alloc_iova(iovad, size, dev->coherent_dma_mask);
+	if (!iova)
+		goto out_free_pages;
+
+	size = iova_align(iovad, size);
+	if (sg_alloc_table_from_pages(&sgt, pages, count, 0, size, GFP_KERNEL))
+		goto out_free_iova;
+
+	if (gfp & __GFP_ZERO) {
+		struct sg_mapping_iter miter;
+		/*
+		 * The flushing provided by the SG_MITER_TO_SG flag only
+		 * applies to highmem pages, so it won't do the job here.
+		 */
+		sg_miter_start(&miter, sgt.sgl, sgt.orig_nents, SG_MITER_FROM_SG);
+		while (sg_miter_next(&miter)) {
+			memset(miter.addr, 0, PAGE_SIZE);
+			if (flush_page)
+				flush_page(miter.addr, page_to_phys(miter.page));
+		}
+		sg_miter_stop(&miter);
+	}
+
+	dma_addr = iova_dma_addr(iovad, iova);
+	if (iommu_map_sg(domain, dma_addr, sgt.sgl, sgt.orig_nents, prot)
+			< size)
+		goto out_free_sg;
+
+	*handle = dma_addr;
+	sg_free_table(&sgt);
+	return pages;
+
+out_free_sg:
+	sg_free_table(&sgt);
+out_free_iova:
+	__free_iova(iovad, iova);
+out_free_pages:
+	__iommu_dma_free_pages(pages, count);
+	return NULL;
+}
+
+/**
+ * iommu_dma_mmap - Map a buffer into provided user VMA
+ * @pages: Array representing buffer from iommu_dma_alloc()
+ * @size: Size of buffer in bytes
+ * @vma: VMA describing requested userspace mapping
+ *
+ * Maps the pages of the buffer in @pages into @vma. The caller is responsible
+ * for verifying the correct size and protection of @vma beforehand.
+ */
+
+int iommu_dma_mmap(struct page **pages, size_t size, struct vm_area_struct *vma)
+{
+	unsigned long uaddr = vma->vm_start;
+	unsigned int i, count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+	int ret = -ENXIO;
+
+	for (i = vma->vm_pgoff; i < count && uaddr < vma->vm_end; i++) {
+		ret = vm_insert_page(vma, uaddr, pages[i]);
+		if (ret)
+			break;
+		uaddr += PAGE_SIZE;
+	}
+	return ret;
+}
+
+dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
+		unsigned long offset, size_t size, int prot)
+{
+	dma_addr_t dma_addr;
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iova_domain *iovad = domain->dma_api_cookie;
+	phys_addr_t phys = page_to_phys(page) + offset;
+	size_t iova_off = iova_offset(iovad, phys);
+	size_t len = iova_align(iovad, size + iova_off);
+	struct iova *iova = __alloc_iova(iovad, len, dma_get_mask(dev));
+
+	if (!iova)
+		return DMA_ERROR_CODE;
+
+	dma_addr = iova_dma_addr(iovad, iova);
+	if (iommu_map(domain, dma_addr, phys - iova_off, len, prot)) {
+		__free_iova(iovad, iova);
+		return DMA_ERROR_CODE;
+	}
+	return dma_addr + iova_off;
+}
+
+void iommu_dma_unmap_page(struct device *dev, dma_addr_t handle, size_t size,
+		enum dma_data_direction dir, struct dma_attrs *attrs)
+{
+	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), handle);
+}
+
+/*
+ * Go and look at iommu_dma_map_sg first; It's OK, I'll wait...
+ *
+ * ...right, now that the scatterlist pages are all contiguous from the
+ * device's viewpoint, we can collapse any buffer segments which run
+ * together (subject to the device's segment limitations), filling in
+ * the DMA fields at the same time as we run through the list.
+ */
+static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
+		dma_addr_t dma_addr)
+{
+	struct scatterlist *s, *seg = sg;
+	unsigned long seg_mask = dma_get_seg_boundary(dev);
+	unsigned int max_len = dma_get_max_seg_size(dev);
+	unsigned int seg_len = 0, seg_dma = 0;
+	int i, count = 1;
+
+	for_each_sg(sg, s, nents, i) {
+		/* Un-swizzling the fields here, hence the naming mismatch */
+		unsigned int s_offset = sg_dma_address(s);
+		unsigned int s_length = sg_dma_len(s);
+		unsigned int s_dma_len = s->length;
+
+		s->offset = s_offset;
+		s->length = s_length;
+		sg_dma_address(s) = DMA_ERROR_CODE;
+		sg_dma_len(s) = 0;
+
+		/*
+		 * This ensures any concatenation we do doesn't exceed the
+		 * dma_parms limits, but it also won't fail if any segments
+		 * were out of spec to begin with - they'll just stay as-is.
+		 */
+		if (seg_len && (seg_dma + seg_len == dma_addr + s_offset) &&
+		    (seg_len + s_dma_len <= max_len) &&
+		    ((seg_dma & seg_mask) <= seg_mask - (seg_len + s_length))
+		   ) {
+			sg_dma_len(seg) += s_dma_len;
+		} else {
+			if (seg_len) {
+				seg = sg_next(seg);
+				count++;
+			}
+			sg_dma_len(seg) = s_dma_len - s_offset;
+			sg_dma_address(seg) = dma_addr + s_offset;
+
+			seg_len = s_offset;
+			seg_dma = dma_addr + s_offset;
+		}
+		seg_len += s_length;
+		dma_addr += s_dma_len;
+	}
+	return count;
+}
+
+/*
+ * If mapping failed, then just restore the original list,
+ * but making sure the DMA fields are invalidated.
+ */
+static void __invalidate_sg(struct scatterlist *sg, int nents)
+{
+	struct scatterlist *s;
+	int i;
+
+	for_each_sg(sg, s, nents, i) {
+		if (sg_dma_address(s) != DMA_ERROR_CODE)
+			s->offset = sg_dma_address(s);
+		if (sg_dma_len(s))
+			s->length = sg_dma_len(s);
+		sg_dma_address(s) = DMA_ERROR_CODE;
+		sg_dma_len(s) = 0;
+	}
+}
+
+/*
+ * The DMA API client is passing in a scatterlist which could describe
+ * any old buffer layout, but the IOMMU API requires everything to be
+ * aligned to IOMMU pages. Hence the need for this complicated bit of
+ * impedance-matching, to be able to hand off a suitably-aligned list,
+ * but still preserve the original offsets and sizes for the caller.
+ */
+int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
+		int nents, int prot)
+{
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iova_domain *iovad = domain->dma_api_cookie;
+	struct iova *iova;
+	struct scatterlist *s;
+	dma_addr_t dma_addr;
+	size_t iova_len = 0;
+	int i;
+
+	/*
+	 * Work out how much IOVA space we need, and align the segments to
+	 * IOVA granules for the IOMMU driver to handle. With some clever
+	 * trickery we can modify the list in-place, but reversibly, by
+	 * hiding the original data in the as-yet-unused DMA fields.
+	 */
+	for_each_sg(sg, s, nents, i) {
+		size_t s_offset = iova_offset(iovad, s->offset);
+		size_t s_length = s->length;
+
+		sg_dma_address(s) = s->offset;
+		sg_dma_len(s) = s_length;
+		s->offset -= s_offset;
+		s_length = iova_align(iovad, s_length + s_offset);
+		s->length = s_length;
+
+		iova_len += s_length;
+	}
+
+	iova = __alloc_iova(iovad, iova_len, dma_get_mask(dev));
+	if (!iova)
+		goto out_restore_sg;
+
+	/*
+	 * We'll leave any physical concatenation to the IOMMU driver's
+	 * implementation - it knows better than we do.
+	 */
+	dma_addr = iova_dma_addr(iovad, iova);
+	if (iommu_map_sg(domain, dma_addr, sg, nents, prot) < iova_len)
+		goto out_free_iova;
+
+	return __finalise_sg(dev, sg, nents, dma_addr);
+
+out_free_iova:
+	__free_iova(iovad, iova);
+out_restore_sg:
+	__invalidate_sg(sg, nents);
+	return 0;
+}
+
+void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
+		enum dma_data_direction dir, struct dma_attrs *attrs)
+{
+	/*
+	 * The scatterlist segments are mapped contiguously
+	 * in IOVA space, so this is incredibly easy.
+	 */
+	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), sg_dma_address(sg));
+}
+
+int iommu_dma_supported(struct device *dev, u64 mask)
+{
+	/*
+	 * 'Special' IOMMUs which don't have the same addressing capability
+	 * as the CPU will have to wait until we have some way to query that
+	 * before they'll be able to use this framework.
+	 */
+	return 1;
+}
+
+int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+	return dma_addr == DMA_ERROR_CODE;
+}
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
new file mode 100644
index 0000000..6f08215
--- /dev/null
+++ b/include/linux/dma-iommu.h
@@ -0,0 +1,84 @@
+/*
+ * Copyright (C) 2014-2015 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __DMA_IOMMU_H
+#define __DMA_IOMMU_H
+
+#ifdef __KERNEL__
+
+#include <linux/iommu.h>
+
+#ifdef CONFIG_IOMMU_DMA
+
+int iommu_dma_init(void);
+
+/* Domain management interface for IOMMU drivers */
+int iommu_get_dma_cookie(struct iommu_domain *domain);
+void iommu_put_dma_cookie(struct iommu_domain *domain);
+
+/* Setup call for arch DMA mapping code */
+int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, u64 size);
+
+/* General helpers for DMA-API <-> IOMMU-API interaction */
+int dma_direction_to_prot(enum dma_data_direction dir, bool coherent);
+
+/*
+ * These implement the bulk of the relevant DMA mapping callbacks, but require
+ * the arch code to take care of attributes and cache maintenance
+ */
+struct page **iommu_dma_alloc(struct device *dev, size_t size,
+		gfp_t gfp, int prot, dma_addr_t *handle,
+		void (*flush_page)(const void *, phys_addr_t));
+void iommu_dma_free(struct device *dev, struct page **pages, size_t size,
+		dma_addr_t *handle);
+
+int iommu_dma_mmap(struct page **pages, size_t size, struct vm_area_struct *vma);
+
+dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
+		unsigned long offset, size_t size, int prot);
+int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
+		int nents, int prot);
+
+/*
+ * Arch code with no special attribute handling may use these
+ * directly as DMA mapping callbacks for simplicity
+ */
+void iommu_dma_unmap_page(struct device *dev, dma_addr_t handle, size_t size,
+		enum dma_data_direction dir, struct dma_attrs *attrs);
+void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
+		enum dma_data_direction dir, struct dma_attrs *attrs);
+int iommu_dma_supported(struct device *dev, u64 mask);
+int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
+
+#else
+
+static inline int iommu_dma_init(void)
+{
+	return 0;
+}
+
+static inline int iommu_get_dma_cookie(struct iommu_domain *domain)
+{
+	return -ENODEV;
+}
+
+static inline void iommu_put_dma_cookie(struct iommu_domain *domain)
+{
+}
+
+#endif	/* CONFIG_IOMMU_DMA */
+
+#endif	/* __KERNEL__ */
+#endif	/* __DMA_IOMMU_H */
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index dc767f7..19eee27 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -81,6 +81,7 @@ struct iommu_domain {
 	iommu_fault_handler_t handler;
 	void *handler_token;
 	struct iommu_domain_geometry geometry;
+	void *dma_api_cookie;
 };
 
 enum iommu_cap {
-- 
1.9.1

* [PATCH v4 3/4] arm64: Add IOMMU dma_ops
From: Robin Murphy @ 2015-07-16 18:40 UTC
  To: iommu, linux-arm-kernel, joro, will.deacon, catalin.marinas
  Cc: laurent.pinchart+renesas, arnd, djkurtz, thunder.leizhen,
	yingjoe.chen, treding, yong.wu

Taking some inspiration from the arch/arm code, implement the
arch-specific side of the DMA mapping ops using the new IOMMU-DMA layer.

Unfortunately the device setup code has to start out as a big ugly mess
in order to work usefully right now, as 'proper' operation depends on
changes to device probe and DMA configuration ordering, IOMMU groups for
platform devices, and default domain support in arm/arm64 IOMMU drivers.
The workarounds here need only exist until that work is finished.
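
To make the end result concrete, an illustrative sketch of what a consumer
driver sees once these ops are in place ("my_dev" here is a hypothetical
device sitting behind an IOMMU): the handle it gets back is an IOVA rather
than a physical address, with the remapping handled transparently:

	void *buf;
	dma_addr_t iova;

	/* page allocation, CPU remapping and IOMMU mapping all happen here */
	buf = dma_alloc_coherent(my_dev, SZ_64K, &iova, GFP_KERNEL);
	if (!buf)
		return -ENOMEM;
	/* ... the device uses 'iova', the CPU uses 'buf' ... */
	dma_free_coherent(my_dev, SZ_64K, buf, iova);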

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 arch/arm64/mm/dma-mapping.c | 428 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 428 insertions(+)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index d16a1ce..c89012e 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -526,3 +526,431 @@ static int __init dma_debug_do_init(void)
 	return 0;
 }
 fs_initcall(dma_debug_do_init);
+
+
+#ifdef CONFIG_IOMMU_DMA
+#include <linux/dma-iommu.h>
+#include <linux/platform_device.h>
+#include <linux/amba/bus.h>
+
+/* Thankfully, all cache ops are by VA so we can ignore phys here */
+static void flush_page(const void *virt, phys_addr_t phys)
+{
+	__dma_flush_range(virt, virt + PAGE_SIZE);
+}
+
+static void *__iommu_alloc_attrs(struct device *dev, size_t size,
+				 dma_addr_t *handle, gfp_t gfp,
+				 struct dma_attrs *attrs)
+{
+	bool coherent = is_device_dma_coherent(dev);
+	int ioprot = dma_direction_to_prot(DMA_BIDIRECTIONAL, coherent);
+	void *addr;
+
+	if (WARN(!dev, "cannot create IOMMU mapping for unknown device\n"))
+		return NULL;
+
+	/*
+	 * Some drivers rely on this, and we may not want to potentially
+	 * expose stale kernel data to devices anyway.
+	 */
+	gfp |= __GFP_ZERO;
+
+	if (gfp & __GFP_WAIT) {
+		struct page **pages;
+		pgprot_t pgprot = coherent ? __pgprot(PROT_NORMAL) :
+					     __pgprot(PROT_NORMAL_NC);
+
+		pgprot = __get_dma_pgprot(attrs, pgprot, coherent);
+		pages = iommu_dma_alloc(dev, size, gfp, ioprot,	handle,
+					coherent ? NULL : flush_page);
+		if (!pages)
+			return NULL;
+
+		addr = dma_common_pages_remap(pages, size, VM_USERMAP, pgprot,
+					      __builtin_return_address(0));
+		if (!addr)
+			iommu_dma_free(dev, pages, size, handle);
+	} else {
+		struct page *page;
+		/*
+		 * In atomic context we can't remap anything, so we'll only
+		 * get the virtually contiguous buffer we need by way of a
+		 * physically contiguous allocation.
+		 */
+		if (coherent) {
+			page = alloc_pages(gfp, get_order(size));
+			addr = page ? page_address(page) : NULL;
+		} else {
+			addr = __alloc_from_pool(size, &page, gfp);
+		}
+		if (!addr)
+			return NULL;
+
+		*handle = iommu_dma_map_page(dev, page, 0, size, ioprot);
+		if (iommu_dma_mapping_error(dev, *handle)) {
+			if (coherent)
+				__free_pages(page, get_order(size));
+			else
+				__free_from_pool(addr, size);
+			addr = NULL;
+		}
+	}
+	return addr;
+}
+
+static void __iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
+			       dma_addr_t handle, struct dma_attrs *attrs)
+{
+	/*
+	 * @cpu_addr will be one of 3 things depending on how it was allocated:
+	 * - A remapped array of pages from iommu_dma_alloc(), for all
+	 *   non-atomic allocations.
+	 * - A non-cacheable alias from the atomic pool, for atomic
+	 *   allocations by non-coherent devices.
+	 * - A normal lowmem address, for atomic allocations by
+	 *   coherent devices.
+	 * Hence how dodgy the below logic looks...
+	 */
+	if (__in_atomic_pool(cpu_addr, size)) {
+		iommu_dma_unmap_page(dev, handle, size, 0, NULL);
+		__free_from_pool(cpu_addr, size);
+	} else if (is_vmalloc_addr(cpu_addr)) {
+		struct vm_struct *area = find_vm_area(cpu_addr);
+
+		if (WARN_ON(!area || !area->pages))
+			return;
+		iommu_dma_free(dev, area->pages, size, &handle);
+		dma_common_free_remap(cpu_addr, size, VM_USERMAP);
+	} else {
+		iommu_dma_unmap_page(dev, handle, size, 0, NULL);
+		__free_pages(virt_to_page(cpu_addr), get_order(size));
+	}
+}
+
+static int __iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
+			      void *cpu_addr, dma_addr_t dma_addr, size_t size,
+			      struct dma_attrs *attrs)
+{
+	struct vm_struct *area;
+	int ret;
+
+	vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot,
+					     is_device_dma_coherent(dev));
+
+	if (dma_mmap_from_coherent(dev, vma, cpu_addr, size, &ret))
+		return ret;
+
+	area = find_vm_area(cpu_addr);
+	if (WARN_ON(!area || !area->pages))
+		return -ENXIO;
+
+	return iommu_dma_mmap(area->pages, size, vma);
+}
+
+static int __iommu_get_sgtable(struct device *dev, struct sg_table *sgt,
+			       void *cpu_addr, dma_addr_t dma_addr,
+			       size_t size, struct dma_attrs *attrs)
+{
+	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+	struct vm_struct *area = find_vm_area(cpu_addr);
+
+	if (WARN_ON(!area || !area->pages))
+		return -ENXIO;
+
+	return sg_alloc_table_from_pages(sgt, area->pages, count, 0, size,
+					 GFP_KERNEL);
+}
+
+static void __iommu_sync_single_for_cpu(struct device *dev,
+					dma_addr_t dev_addr, size_t size,
+					enum dma_data_direction dir)
+{
+	phys_addr_t phys;
+
+	if (is_device_dma_coherent(dev))
+		return;
+
+	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
+	__dma_unmap_area(phys_to_virt(phys), size, dir);
+}
+
+static void __iommu_sync_single_for_device(struct device *dev,
+					   dma_addr_t dev_addr, size_t size,
+					   enum dma_data_direction dir)
+{
+	phys_addr_t phys;
+
+	if (is_device_dma_coherent(dev))
+		return;
+
+	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
+	__dma_map_area(phys_to_virt(phys), size, dir);
+}
+
+static dma_addr_t __iommu_map_page(struct device *dev, struct page *page,
+				   unsigned long offset, size_t size,
+				   enum dma_data_direction dir,
+				   struct dma_attrs *attrs)
+{
+	bool coherent = is_device_dma_coherent(dev);
+	int prot = dma_direction_to_prot(dir, coherent);
+	dma_addr_t dev_addr = iommu_dma_map_page(dev, page, offset, size, prot);
+
+	if (!iommu_dma_mapping_error(dev, dev_addr) &&
+	    !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+		__iommu_sync_single_for_device(dev, dev_addr, size, dir);
+
+	return dev_addr;
+}
+
+static void __iommu_unmap_page(struct device *dev, dma_addr_t dev_addr,
+			       size_t size, enum dma_data_direction dir,
+			       struct dma_attrs *attrs)
+{
+	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+		__iommu_sync_single_for_cpu(dev, dev_addr, size, dir);
+
+	iommu_dma_unmap_page(dev, dev_addr, size, dir, attrs);
+}
+
+static void __iommu_sync_sg_for_cpu(struct device *dev,
+				    struct scatterlist *sgl, int nelems,
+				    enum dma_data_direction dir)
+{
+	struct scatterlist *sg;
+	int i;
+
+	if (is_device_dma_coherent(dev))
+		return;
+
+	for_each_sg(sgl, sg, nelems, i)
+		__dma_unmap_area(sg_virt(sg), sg->length, dir);
+}
+
+static void __iommu_sync_sg_for_device(struct device *dev,
+				       struct scatterlist *sgl, int nelems,
+				       enum dma_data_direction dir)
+{
+	struct scatterlist *sg;
+	int i;
+
+	if (is_device_dma_coherent(dev))
+		return;
+
+	for_each_sg(sgl, sg, nelems, i)
+		__dma_map_area(sg_virt(sg), sg->length, dir);
+}
+
+static int __iommu_map_sg_attrs(struct device *dev, struct scatterlist *sgl,
+				int nelems, enum dma_data_direction dir,
+				struct dma_attrs *attrs)
+{
+	bool coherent = is_device_dma_coherent(dev);
+
+	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+		__iommu_sync_sg_for_device(dev, sgl, nelems, dir);
+
+	return iommu_dma_map_sg(dev, sgl, nelems,
+			dma_direction_to_prot(dir, coherent));
+}
+
+static void __iommu_unmap_sg_attrs(struct device *dev,
+				   struct scatterlist *sgl, int nelems,
+				   enum dma_data_direction dir,
+				   struct dma_attrs *attrs)
+{
+	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+		__iommu_sync_sg_for_cpu(dev, sgl, nelems, dir);
+
+	iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs);
+}
+
+static struct dma_map_ops iommu_dma_ops = {
+	.alloc = __iommu_alloc_attrs,
+	.free = __iommu_free_attrs,
+	.mmap = __iommu_mmap_attrs,
+	.get_sgtable = __iommu_get_sgtable,
+	.map_page = __iommu_map_page,
+	.unmap_page = __iommu_unmap_page,
+	.map_sg = __iommu_map_sg_attrs,
+	.unmap_sg = __iommu_unmap_sg_attrs,
+	.sync_single_for_cpu = __iommu_sync_single_for_cpu,
+	.sync_single_for_device = __iommu_sync_single_for_device,
+	.sync_sg_for_cpu = __iommu_sync_sg_for_cpu,
+	.sync_sg_for_device = __iommu_sync_sg_for_device,
+	.dma_supported = iommu_dma_supported,
+	.mapping_error = iommu_dma_mapping_error,
+};
+
+/*
+ * TODO: Right now __iommu_setup_dma_ops() gets called too early to do
+ * everything it needs to - the device isn't yet fully created, and the
+ * IOMMU driver hasn't seen it yet, so we need this delayed attachment
+ * dance. Once IOMMU probe ordering is sorted to move the
+ * arch_setup_dma_ops() call later, all the notifier bits below become
+ * unnecessary, and will go away.
+ */
+struct iommu_dma_notifier_data {
+	struct list_head list;
+	struct device *dev;
+	struct iommu_domain *dma_domain;
+};
+static LIST_HEAD(iommu_dma_masters);
+static DEFINE_MUTEX(iommu_dma_notifier_lock);
+
+static int __iommu_attach_notifier(struct notifier_block *nb,
+				   unsigned long action, void *data)
+{
+	struct iommu_dma_notifier_data *master, *tmp;
+
+	if (action != BUS_NOTIFY_ADD_DEVICE)
+		return 0;
+	/*
+	 * We expect the list to only contain the most recent addition
+	 * (which should be the same device as in @data) so just process
+	 * the whole thing blindly. If any previous attachments did happen
+	 * to fail, they get a free retry since the domains are still live.
+	 */
+	mutex_lock(&iommu_dma_notifier_lock);
+	list_for_each_entry_safe(master, tmp, &iommu_dma_masters, list) {
+		if (iommu_attach_device(master->dma_domain, master->dev)) {
+			pr_warn("Failed to attach device %s to IOMMU mapping; retaining platform DMA ops\n",
+				dev_name(master->dev));
+		} else {
+			master->dev->archdata.dma_ops = &iommu_dma_ops;
+			list_del(&master->list);
+			kfree(master);
+		}
+	}
+	mutex_unlock(&iommu_dma_notifier_lock);
+	return 0;
+}
+
+static int register_iommu_dma_ops_notifier(struct bus_type *bus)
+{
+	int ret;
+	struct notifier_block *nb = kzalloc(sizeof(*nb), GFP_KERNEL);
+
+	if (!nb)
+		return -ENOMEM;
+
+	/*
+	 * The device must be attached to a domain before the driver probe
+	 * routine gets a chance to start allocating DMA buffers. However,
+	 * the IOMMU driver also needs a chance to configure the iommu_group
+	 * via its add_device callback first, so we need to make the attach
+	 * happen between those two points. Since the IOMMU core uses a bus
+	 * notifier with default priority for add_device, do the same but
+	 * with a lower priority to ensure the appropriate ordering.
+	 */
+	nb->notifier_call = __iommu_attach_notifier;
+	nb->priority = -100;
+
+	ret = bus_register_notifier(bus, nb);
+	if (ret) {
+		pr_warn("Failed to register DMA domain notifier; IOMMU DMA ops unavailable on bus '%s'\n",
+			bus->name);
+		kfree(nb);
+	}
+	return ret;
+}
+
+static int queue_iommu_attach(struct iommu_domain *domain, struct device *dev)
+{
+	struct iommu_dma_notifier_data *iommudata = NULL;
+
+	iommudata = kzalloc(sizeof(*iommudata), GFP_KERNEL);
+	if (!iommudata)
+		return -ENOMEM;
+
+	iommudata->dev = dev;
+	iommudata->dma_domain = domain;
+
+	mutex_lock(&iommu_dma_notifier_lock);
+	list_add(&iommudata->list, &iommu_dma_masters);
+	mutex_unlock(&iommu_dma_notifier_lock);
+	return 0;
+}
+
+static int __init arm64_iommu_dma_init(void)
+{
+	int ret;
+
+	ret = iommu_dma_init();
+	if (!ret)
+		ret = register_iommu_dma_ops_notifier(&platform_bus_type);
+	if (!ret)
+		ret = register_iommu_dma_ops_notifier(&amba_bustype);
+	return ret;
+}
+arch_initcall(arm64_iommu_dma_init);
+
+/* Hijack some domain feature flags for the stop-gap meddling below */
+#define __IOMMU_DOMAIN_ARM64		(1U << 31)
+#define __IOMMU_DOMAIN_ARM64_IOVA	(1U << 30)
+
+static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+				  const struct iommu_ops *ops)
+{
+	struct iommu_domain *domain;
+	int err;
+
+	if (!ops)
+		return;
+	/*
+	 * In a perfect world, everything happened in the right order up to
+	 * here, and the IOMMU core has already attached the device to an
+	 * appropriate default domain for us to set up...
+	 */
+	domain = iommu_get_domain_for_dev(dev);
+	if (!domain) {
+		/*
+		 * Urgh. Reliable default domains for platform devices can't
+		 * happen anyway without some sensible way of handling
+		 * non-trivial groups. So until then, HORRIBLE HACKS!
+		 */
+		domain = ops->domain_alloc(IOMMU_DOMAIN_DMA);
+		if (!domain)
+			domain = ops->domain_alloc(IOMMU_DOMAIN_UNMANAGED);
+		if (!domain)
+			goto out_no_domain;
+
+		domain->ops = ops;
+		domain->type = IOMMU_DOMAIN_DMA | __IOMMU_DOMAIN_ARM64;
+		if (!domain->dma_api_cookie) {
+			domain->type |= __IOMMU_DOMAIN_ARM64_IOVA;
+			if (iommu_get_dma_cookie(domain))
+				goto out_put_domain;
+		}
+	}
+
+	if (iommu_dma_init_domain(domain, dma_base, size)) {
+		pr_warn("Failed to create %llu-byte IOMMU mapping for device %s\n",
+			size, dev_name(dev));
+		goto out_put_domain;
+	}
+
+	if (dev->iommu_group)
+		err = iommu_attach_device(domain, dev);
+	else
+		err = queue_iommu_attach(domain, dev);
+
+	if (!err) {
+		dev->archdata.dma_ops = &iommu_dma_ops;
+		return;
+	}
+
+out_put_domain:
+	if (domain->type & __IOMMU_DOMAIN_ARM64_IOVA)
+		iommu_put_dma_cookie(domain);
+	if (domain->type & __IOMMU_DOMAIN_ARM64)
+		ops->domain_free(domain);
+out_no_domain:
+	pr_warn("Failed to set up IOMMU domain for device %s\n", dev_name(dev));
+}
+
+#else
+
+static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+				  struct iommu_ops *iommu)
+{ }
+
+#endif  /* CONFIG_IOMMU_DMA */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread
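
None of the above requires driver changes: once dev->archdata.dma_ops
points at iommu_dma_ops, the ordinary DMA API is routed through the
IOMMU path. A minimal sketch of a consumer (the function name and
buffer size are purely illustrative):

#include <linux/dma-mapping.h>
#include <linux/sizes.h>

static int example_iommu_dma_smoke_test(struct device *dev)
{
	dma_addr_t dma;
	void *buf;

	/* Lands in __iommu_alloc_attrs(); 'dma' is an IOVA, not a PA */
	buf = dma_alloc_coherent(dev, SZ_64K, &dma, GFP_KERNEL);
	if (!buf)
		return -ENOMEM;

	/* ... hand 'dma' to the device, access 'buf' from the CPU ... */

	/* Lands in __iommu_free_attrs(), which also unmaps the IOVA */
	dma_free_coherent(dev, SZ_64K, buf, dma);
	return 0;
}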

* [PATCH v4 4/4] arm64: Hook up IOMMU dma_ops
  2015-07-16 18:40 ` Robin Murphy
@ 2015-07-16 18:40     ` Robin Murphy
  -1 siblings, 0 replies; 24+ messages in thread
From: Robin Murphy @ 2015-07-16 18:40 UTC (permalink / raw
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	joro-zLv9SwRftAIdnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	catalin.marinas-5wv7dgnIgG8
  Cc: laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw, arnd-r2nGTMty4D4,
	djkurtz-hpIqsD4AKlfQT0dZR+AlfA,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	yingjoe.chen-NuS5LvNUpcJWk0Htik3J/w,
	treding-DDmLM1+adcrQT0dZR+AlfA, yong.wu-NuS5LvNUpcJWk0Htik3J/w

With iommu_dma_ops in place, hook them up to the configuration code, so
IOMMU-fronted devices will get them automatically.

Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
---
 arch/arm64/Kconfig                   |  1 +
 arch/arm64/include/asm/dma-mapping.h | 15 +++++++--------
 arch/arm64/mm/dma-mapping.c          | 24 ++++++++++++++++++++++++
 3 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0f6edb1..3509621 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -72,6 +72,7 @@ config ARM64
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_RCU_TABLE_FREE
 	select HAVE_SYSCALL_TRACEPOINTS
+	select IOMMU_DMA if IOMMU_SUPPORT
 	select IRQ_DOMAIN
 	select IRQ_FORCED_THREADING
 	select MODULES_USE_ELF_RELA
diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
index f0d6d0b..7f9edcb 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -56,16 +56,15 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
 		return __generic_dma_ops(dev);
 }
 
-static inline void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-				      struct iommu_ops *iommu, bool coherent)
-{
-	if (!acpi_disabled && !dev->archdata.dma_ops)
-		dev->archdata.dma_ops = dma_ops;
-
-	dev->archdata.dma_coherent = coherent;
-}
+void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+			struct iommu_ops *iommu, bool coherent);
 #define arch_setup_dma_ops	arch_setup_dma_ops
 
+#ifdef CONFIG_IOMMU_DMA
+void arch_teardown_dma_ops(struct device *dev);
+#define arch_teardown_dma_ops	arch_teardown_dma_ops
+#endif
+
 /* do not use this function in a driver */
 static inline bool is_device_dma_coherent(struct device *dev)
 {
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index c89012e..bdab211 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -947,6 +947,20 @@ out_no_domain:
 	pr_warn("Failed to set up IOMMU domain for device %s\n", dev_name(dev));
 }
 
+void arch_teardown_dma_ops(struct device *dev)
+{
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+	if (domain) {
+		iommu_detach_device(domain, dev);
+		if (domain->type & __IOMMU_DOMAIN_ARM64_IOVA)
+			iommu_put_dma_cookie(domain);
+		if (domain->type & __IOMMU_DOMAIN_ARM64)
+			iommu_domain_free(domain);
+		dev->archdata.dma_ops = NULL;
+	}
+}
+
 #else
 
 static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
@@ -954,3 +968,13 @@ static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 { }
 
 #endif  /* CONFIG_IOMMU_DMA */
+
+void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+			struct iommu_ops *iommu, bool coherent)
+{
+	if (!acpi_disabled && !dev->archdata.dma_ops)
+		dev->archdata.dma_ops = dma_ops;
+
+	dev->archdata.dma_coherent = coherent;
+	__iommu_setup_dma_ops(dev, dma_base, size, iommu);
+}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 0/4] arm64: IOMMU-backed DMA mapping
  2015-07-16 18:40 ` Robin Murphy
@ 2015-07-20 15:26     ` Joerg Roedel
  -1 siblings, 0 replies; 24+ messages in thread
From: Joerg Roedel @ 2015-07-20 15:26 UTC (permalink / raw
  To: Robin Murphy
  Cc: laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw, arnd-r2nGTMty4D4,
	catalin.marinas-5wv7dgnIgG8, will.deacon-5wv7dgnIgG8,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	djkurtz-hpIqsD4AKlfQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	yingjoe.chen-NuS5LvNUpcJWk0Htik3J/w,
	treding-DDmLM1+adcrQT0dZR+AlfA, yong.wu-NuS5LvNUpcJWk0Htik3J/w

On Thu, Jul 16, 2015 at 07:40:11PM +0100, Robin Murphy wrote:
>  arch/arm64/Kconfig                   |   1 +
>  arch/arm64/include/asm/dma-mapping.h |  15 +-
>  arch/arm64/mm/dma-mapping.c          | 452 +++++++++++++++++++++++++++++

What happened to the plan to merge this with the existing iommu-based
dma-api implementation for 32 bit ARM?


	Joerg

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 0/4] arm64: IOMMU-backed DMA mapping
  2015-07-20 15:26     ` Joerg Roedel
@ 2015-07-20 17:23         ` Robin Murphy
  -1 siblings, 0 replies; 24+ messages in thread
From: Robin Murphy @ 2015-07-20 17:23 UTC (permalink / raw
  To: Joerg Roedel
  Cc: laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw@public.gmane.org,
	arnd-r2nGTMty4D4@public.gmane.org, Catalin Marinas, Will Deacon,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	djkurtz-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA@public.gmane.org,
	yingjoe.chen-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org,
	treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org,
	yong.wu-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org

Hi Joerg,

On 20/07/15 16:26, Joerg Roedel wrote:
> On Thu, Jul 16, 2015 at 07:40:11PM +0100, Robin Murphy wrote:
>>   arch/arm64/Kconfig                   |   1 +
>>   arch/arm64/include/asm/dma-mapping.h |  15 +-
>>   arch/arm64/mm/dma-mapping.c          | 452 +++++++++++++++++++++++++++++
>
> What happened to the plan to merge this with the existing iommu-based
> dma-api implementation for 32 bit ARM?

The issue currently is that there are a bunch of drivers using the 
exported arm_iommu_* functions directly. From what I can tell, they seem 
like they could probably all be converted to using default domains 
and/or the new domain type abstractions via the core IOMMU API, which 
would then allow killing off dma_iommu_mapping and rewriting the 
arch/arm implementation to use the new shared code. I don't currently 
have any 32-bit platform to test with, so I'm a little dubious of taking 
that all on myself right now.

In the meantime on arm64, DMA mapping ops are needed for SMMUv3 platform 
device support, the Mediatek M4U patches and my own SMMUv2 work, so it 
would be very useful to get the arm64 and common code in as a first 
step, then look at cleaning up arch/arm for 4.4 without dangling 
dependencies.

Robin.

>
>
> 	Joerg
>

^ permalink raw reply	[flat|nested] 24+ messages in thread
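
For concreteness, the exported-function usage in question looks roughly
like the sketch below (from memory of the 32-bit arm_iommu_* API of this
era, so treat the exact signatures as approximate; the IOVA base and
size are made up). Converting such drivers would mean deleting this
boilerplate and letting the core attach a default DMA domain instead:

#include <linux/err.h>
#include <linux/platform_device.h>
#include <linux/sizes.h>
#include <asm/dma-iommu.h>

/* 32-bit ARM today: each driver creates and attaches its own mapping */
static int example_legacy_attach(struct device *dev)
{
	struct dma_iommu_mapping *mapping;
	int ret;

	mapping = arm_iommu_create_mapping(&platform_bus_type, 0, SZ_1G);
	if (IS_ERR(mapping))
		return PTR_ERR(mapping);

	ret = arm_iommu_attach_device(dev, mapping);
	if (ret) {
		arm_iommu_release_mapping(mapping);
		return ret;
	}
	/* From here on, the DMA API for this device uses the mapping */
	return 0;
}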

* Re: [PATCH v4 0/4] arm64: IOMMU-backed DMA mapping
  2015-07-20 17:23         ` Robin Murphy
@ 2015-07-28 12:46             ` Will Deacon
  -1 siblings, 0 replies; 24+ messages in thread
From: Will Deacon @ 2015-07-28 12:46 UTC (permalink / raw
  To: Robin Murphy
  Cc: laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw@public.gmane.org,
	arnd-r2nGTMty4D4@public.gmane.org, Catalin Marinas,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	djkurtz-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA@public.gmane.org,
	yingjoe.chen-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org,
	treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org,
	yong.wu-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org

Hi Joerg, Robin,

On Mon, Jul 20, 2015 at 06:23:35PM +0100, Robin Murphy wrote:
> On 20/07/15 16:26, Joerg Roedel wrote:
> > On Thu, Jul 16, 2015 at 07:40:11PM +0100, Robin Murphy wrote:
> >>   arch/arm64/Kconfig                   |   1 +
> >>   arch/arm64/include/asm/dma-mapping.h |  15 +-
> >>   arch/arm64/mm/dma-mapping.c          | 452 +++++++++++++++++++++++++++++
> >
> > What happened to the plan to merge this with the existing iommu-based
> > dma-api implementation for 32 bit ARM?
> 
> The issue currently is that there are a bunch of drivers using the 
> exported arm_iommu_* functions directly. From what I can tell, they seem 
> like they could probably all be converted to using default domains 
> and/or the new domain type abstractions via the core IOMMU API, which 
> would then allow killing off dma_iommu_mapping and rewriting the 
> arch/arm implementation to use the new shared code. I don't currently 
> have any 32-bit platform to test with, so I'm a little dubious of taking 
> that all on myself right now.
> 
> In the meantime on arm64, DMA mapping ops are needed for SMMUv3 platform 
> device support, the Mediatek M4U patches and my own SMMUv2 work, so it 
> would be very useful to get the arm64 and common code in as a first 
> step, then look at cleaning up arch/arm for 4.4 without dangling 
> dependencies.

What's the plan with this? Do we need to port 32-bit ARM before it's a
candidate for merging, or can we push ahead with getting this up and
running for arm64 (which currently can't use an IOMMU for DMA)?

Will

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 1/4] iommu/iova: Avoid over-allocating when size-aligned
  2015-07-16 18:40     ` Robin Murphy
@ 2015-07-28 13:31         ` David Woodhouse
  -1 siblings, 0 replies; 24+ messages in thread
From: David Woodhouse @ 2015-07-28 13:31 UTC (permalink / raw
  To: Robin Murphy, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	joro-zLv9SwRftAIdnm+yROfE0A, will.deacon-5wv7dgnIgG8,
	catalin.marinas-5wv7dgnIgG8
  Cc: laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw, arnd-r2nGTMty4D4,
	djkurtz-hpIqsD4AKlfQT0dZR+AlfA,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	yingjoe.chen-NuS5LvNUpcJWk0Htik3J/w,
	treding-DDmLM1+adcrQT0dZR+AlfA, yong.wu-NuS5LvNUpcJWk0Htik3J/w


On Thu, 2015-07-16 at 19:40 +0100, Robin Murphy wrote:
> Currently, allocating a size-aligned IOVA region quietly adjusts the
> actual allocation size in the process, returning a rounded-up
> power-of-two-sized allocation. This results in mismatched behaviour in
> the IOMMU driver if the original size was not a power of two, where the
> original size is mapped, but the rounded-up IOVA size is unmapped.
> 
> Whilst some IOMMUs will happily unmap already-unmapped pages, others
> consider this an error, so fix it by computing the necessary alignment
> padding without altering the actual allocation size. Also clean up by
> making pad_size unsigned, since its callers always pass unsigned values
> and negative padding makes little sense here anyway.

Applied; thanks.

I'm not 100% sure we *need* the hunk in intel-iommu.c; we can probably
live without rounding the size up. It means we'll use huge pages less
often, but it's not clear that using them when we didn't *mean* to map
the full size of them was the right thing to do in the first place.

But it makes sense to apply the patch as-is, without changing the
effective behaviour, and ponder that more deeply later.

Thanks.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org                              Intel Corporation

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 2/4] iommu: Implement common IOMMU ops for DMA mapping
  2015-07-16 18:40     ` Robin Murphy
@ 2015-07-28 16:45         ` Catalin Marinas
  -1 siblings, 0 replies; 24+ messages in thread
From: Catalin Marinas @ 2015-07-28 16:45 UTC (permalink / raw
  To: Robin Murphy
  Cc: laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw, arnd-r2nGTMty4D4,
	will.deacon-5wv7dgnIgG8,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	djkurtz-hpIqsD4AKlfQT0dZR+AlfA, yong.wu-NuS5LvNUpcJWk0Htik3J/w,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	yingjoe.chen-NuS5LvNUpcJWk0Htik3J/w,
	treding-DDmLM1+adcrQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Thu, Jul 16, 2015 at 07:40:13PM +0100, Robin Murphy wrote:
> +/**
> + * iommu_dma_alloc - Allocate and map a buffer contiguous in IOVA space
> + * @dev: Device to allocate memory for. Must be a real device
> + *	 attached to an iommu_dma_domain
> + * @size: Size of buffer in bytes
> + * @gfp: Allocation flags
> + * @prot: IOMMU mapping flags
> + * @handle: Out argument for allocated DMA handle
> + * @flush_page: Arch callback to flush a single page from all caches to
> + *		ensure DMA visibility of __GFP_ZERO. May be NULL if not
> + *		required.
> + *
> + * If @size is less than PAGE_SIZE, then a full CPU page will be allocated,
> + * but an IOMMU which supports smaller pages might not map the whole thing.
> + *
> + * Return: Array of struct page pointers describing the buffer,
> + *	   or NULL on failure.
> + */
> +struct page **iommu_dma_alloc(struct device *dev, size_t size,
> +		gfp_t gfp, int prot, dma_addr_t *handle,
> +		void (*flush_page)(const void *, phys_addr_t))
> +{
> +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +	struct iova_domain *iovad = domain->dma_api_cookie;
> +	struct iova *iova;
> +	struct page **pages;
> +	struct sg_table sgt;
> +	dma_addr_t dma_addr;
> +	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
> +
> +	*handle = DMA_ERROR_CODE;
> +
> +	pages = __iommu_dma_alloc_pages(count, gfp);
> +	if (!pages)
> +		return NULL;
> +
> +	iova = __alloc_iova(iovad, size, dev->coherent_dma_mask);
> +	if (!iova)
> +		goto out_free_pages;
> +
> +	size = iova_align(iovad, size);
> +	if (sg_alloc_table_from_pages(&sgt, pages, count, 0, size, GFP_KERNEL))
> +		goto out_free_iova;
> +
> +	if (gfp & __GFP_ZERO) {
> +		struct sg_mapping_iter miter;
> +		/*
> +		 * The flushing provided by the SG_MITER_TO_SG flag only
> +		 * applies to highmem pages, so it won't do the job here.
> +		 */

The comment here is slightly wrong. The SG_MITER_FROM_SG flushing is
done by calling flush_kernel_dcache_page(). This function can be a no-op
even on architectures implementing highmem. For example, on arm32 it
only does some flushing if there is a potential D-cache alias with user
space. The flush that you call below is for DMA operations, something
entirely different.

> +		sg_miter_start(&miter, sgt.sgl, sgt.orig_nents, SG_MITER_FROM_SG);
> +		while (sg_miter_next(&miter)) {
> +			memset(miter.addr, 0, PAGE_SIZE);

Isn't __GFP_ZERO already honoured by alloc_pages in
__iommu_dma_alloc_pages?

> +			if (flush_page)
> +				flush_page(miter.addr, page_to_phys(miter.page));

Could you instead do the check as !(prot & IOMMU_CACHE) so that it saves
architecture code from passing NULL when coherent?

I'm still not convinced we need this flushing here but I'll follow up in
patch 3/4.

-- 
Catalin

^ permalink raw reply	[flat|nested] 24+ messages in thread
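
Taken together, those two points suggest the __GFP_ZERO block quoted
above could shrink to pure cache maintenance keyed off the mapping
attributes. A sketch of that shape, assuming the pages really do arrive
zeroed, that IOMMU_CACHE tracks device coherency, and that flush_page
is then always supplied:

	/*
	 * Pages are already zeroed via __GFP_ZERO; all that remains is to
	 * make those zeros visible to a non-coherent device, i.e. when the
	 * mapping was not requested with IOMMU_CACHE.
	 */
	if (!(prot & IOMMU_CACHE)) {
		struct sg_mapping_iter miter;

		sg_miter_start(&miter, sgt.sgl, sgt.orig_nents, SG_MITER_FROM_SG);
		while (sg_miter_next(&miter))
			flush_page(miter.addr, page_to_phys(miter.page));
		sg_miter_stop(&miter);
	}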

* Re: [PATCH v4 3/4] arm64: Add IOMMU dma_ops
  2015-07-16 18:40     ` Robin Murphy
@ 2015-07-28 17:27         ` Catalin Marinas
  -1 siblings, 0 replies; 24+ messages in thread
From: Catalin Marinas @ 2015-07-28 17:27 UTC (permalink / raw
  To: Robin Murphy
  Cc: laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw, arnd-r2nGTMty4D4,
	will.deacon-5wv7dgnIgG8,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	djkurtz-hpIqsD4AKlfQT0dZR+AlfA, yong.wu-NuS5LvNUpcJWk0Htik3J/w,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	yingjoe.chen-NuS5LvNUpcJWk0Htik3J/w,
	treding-DDmLM1+adcrQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Thu, Jul 16, 2015 at 07:40:14PM +0100, Robin Murphy wrote:
> +static void *__iommu_alloc_attrs(struct device *dev, size_t size,
> +				 dma_addr_t *handle, gfp_t gfp,
> +				 struct dma_attrs *attrs)
> +{
> +	bool coherent = is_device_dma_coherent(dev);
> +	int ioprot = dma_direction_to_prot(DMA_BIDIRECTIONAL, coherent);
> +	void *addr;
> +
> +	if (WARN(!dev, "cannot create IOMMU mapping for unknown device\n"))
> +		return NULL;
> +
> +	/*
> +	 * Some drivers rely on this, and we may not want to potentially
> +	 * expose stale kernel data to devices anyway.
> +	 */
> +	gfp |= __GFP_ZERO;
> +
> +	if (gfp & __GFP_WAIT) {
> +		struct page **pages;
> +		pgprot_t pgprot = coherent ? __pgprot(PROT_NORMAL) :
> +					     __pgprot(PROT_NORMAL_NC);
> +
> +		pgprot = __get_dma_pgprot(attrs, pgprot, coherent);

I think you can simplify this:

		pgprot_t pgprot = __get_dma_pgprot(attrs, PAGE_KERNEL, coherent);

(and we could do the same in __dma_alloc)

> +		pages = iommu_dma_alloc(dev, size, gfp, ioprot,	handle,
> +					coherent ? NULL : flush_page);

I commented on patch 2/4. If we checked for IOMMU_CACHE in
iommu_dma_alloc(), we could always pass flush_page here without the
NULL.

> +		if (!pages)
> +			return NULL;
> +
> +		addr = dma_common_pages_remap(pages, size, VM_USERMAP, pgprot,
> +					      __builtin_return_address(0));
> +		if (!addr)
> +			iommu_dma_free(dev, pages, size, handle);

For arm64, it would have been easier to simply flush the caches here and
avoid the flush_page argument. However, having reread your earlier reply:
if we ever honour DMA_ATTR_NO_KERNEL_MAPPING, we would have to iterate
over the individual pages in the arch code. Let's leave it as
per your patch 2/4 (with a flush_page argument).

-- 
Catalin

^ permalink raw reply	[flat|nested] 24+ messages in thread
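
Applying both of those suggestions to the patch, the blocking path of
__iommu_alloc_attrs() would end up looking roughly like this (sketch
only; it assumes iommu_dma_alloc() gains the IOMMU_CACHE check discussed
for patch 2/4, so flush_page can be passed unconditionally):

	if (gfp & __GFP_WAIT) {
		struct page **pages;
		pgprot_t pgprot = __get_dma_pgprot(attrs, PAGE_KERNEL, coherent);

		/* The common code decides whether to flush, via IOMMU_CACHE */
		pages = iommu_dma_alloc(dev, size, gfp, ioprot, handle,
					flush_page);
		if (!pages)
			return NULL;

		addr = dma_common_pages_remap(pages, size, VM_USERMAP, pgprot,
					      __builtin_return_address(0));
		if (!addr)
			iommu_dma_free(dev, pages, size, handle);
	}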

* Re: [PATCH v4 4/4] arm64: Hook up IOMMU dma_ops
  2015-07-16 18:40     ` Robin Murphy
@ 2015-07-28 17:30         ` Catalin Marinas
  -1 siblings, 0 replies; 24+ messages in thread
From: Catalin Marinas @ 2015-07-28 17:30 UTC (permalink / raw
  To: Robin Murphy
  Cc: laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw, arnd-r2nGTMty4D4,
	will.deacon-5wv7dgnIgG8,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	djkurtz-hpIqsD4AKlfQT0dZR+AlfA, yong.wu-NuS5LvNUpcJWk0Htik3J/w,
	thunder.leizhen-hv44wF8Li93QT0dZR+AlfA,
	yingjoe.chen-NuS5LvNUpcJWk0Htik3J/w,
	treding-DDmLM1+adcrQT0dZR+AlfA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Thu, Jul 16, 2015 at 07:40:15PM +0100, Robin Murphy wrote:
> With iommu_dma_ops in place, hook them up to the configuration code, so
> IOMMU-fronted devices will get them automatically.
> 
> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>

Acked-by: Catalin Marinas <catalin.marinas-5wv7dgnIgG8@public.gmane.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2015-07-28 17:30 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-16 18:40 [PATCH v4 0/4] arm64: IOMMU-backed DMA mapping Robin Murphy
2015-07-16 18:40 ` Robin Murphy
     [not found] ` <cover.1437070995.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2015-07-16 18:40   ` [PATCH v4 1/4] iommu/iova: Avoid over-allocating when size-aligned Robin Murphy
2015-07-16 18:40     ` Robin Murphy
     [not found]     ` <35cb877828b570db84e89684ef76308cd1857f0c.1437070995.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2015-07-28 13:31       ` David Woodhouse
2015-07-28 13:31         ` David Woodhouse
2015-07-16 18:40   ` [PATCH v4 2/4] iommu: Implement common IOMMU ops for DMA mapping Robin Murphy
2015-07-16 18:40     ` Robin Murphy
     [not found]     ` <00fb13713dc8a69dc32b859f9311e70d78fb0c7a.1437070995.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2015-07-28 16:45       ` Catalin Marinas
2015-07-28 16:45         ` Catalin Marinas
2015-07-16 18:40   ` [PATCH v4 3/4] arm64: Add IOMMU dma_ops Robin Murphy
2015-07-16 18:40     ` Robin Murphy
     [not found]     ` <f51a6dfad1367107b07ca43624a990a62f3d4f3c.1437070995.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2015-07-28 17:27       ` Catalin Marinas
2015-07-28 17:27         ` Catalin Marinas
2015-07-16 18:40   ` [PATCH v4 4/4] arm64: Hook up " Robin Murphy
2015-07-16 18:40     ` Robin Murphy
     [not found]     ` <fecbcc50e62ae86639c886ab86e4ec067d52b16a.1437070995.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2015-07-28 17:30       ` Catalin Marinas
2015-07-28 17:30         ` Catalin Marinas
2015-07-20 15:26   ` [PATCH v4 0/4] arm64: IOMMU-backed DMA mapping Joerg Roedel
2015-07-20 15:26     ` Joerg Roedel
     [not found]     ` <20150720152637.GD10969-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2015-07-20 17:23       ` Robin Murphy
2015-07-20 17:23         ` Robin Murphy
     [not found]         ` <55AD2E97.1040901-5wv7dgnIgG8@public.gmane.org>
2015-07-28 12:46           ` Will Deacon
2015-07-28 12:46             ` Will Deacon
