* [PATCH v2 00/10] IOMMU memory observability
@ 2023-11-30 20:14 Pasha Tatashin
  2023-11-30 20:14 ` [PATCH v2 01/10] iommu/vt-d: add wrapper functions for page allocations Pasha Tatashin
                   ` (10 more replies)
  0 siblings, 11 replies; 32+ messages in thread
From: Pasha Tatashin @ 2023-11-30 20:14 UTC
  To: akpm, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas, cgroups,
	corbet, david, dwmw2, hannes, heiko, iommu, jernej.skrabec,
	jonathanh, joro, krzysztof.kozlowski, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, lizefan.x, marcan, mhiramat,
	m.szyprowski, pasha.tatashin, paulmck, rdunlap, robin.murphy,
	samuel, suravee.suthikulpanit, sven, thierry.reding, tj,
	tomas.mudrunka, vdumpa, wens, will, yu-cheng.yu

From: Pasha Tatashin <tatashin@google.com>

----------------------------------------------------------------------
Changelog
----------------------------------------------------------------------
v2:
- Added Reviewed-by: Janne Grunau
- Synced with 6.7.0-rc3, commit 3b47bc037bd44f142ac09848e8d3ecccc726be99
- Separated from this series, as suggested by Jason Gunthorpe, the patches:
vhost-vdpa: account iommu allocations
https://lore.kernel.org/all/20231130200447.2319543-1-pasha.tatashin@soleen.com
vfio: account iommu allocations
https://lore.kernel.org/all/20231130200900.2320829-1-pasha.tatashin@soleen.com
- Fixed a SPARC build issue detected by the kernel test robot
- Dropped the following patches, as suggested by Robin Murphy, since they
do not account IOMMU page tables:
iommu/dma: use page allocation function provided by iommu-pages.h
iommu/fsl: use page allocation function provided by iommu-pages.h
iommu/iommufd: use page allocation function provided by iommu-pages.h
These patches are not related to IOMMU page tables; DMA observability
may need separate work.
- Removed support for iommu/io-pgtable-arm-v7s, as its 2nd-level tables
are smaller than a page; thanks to Robin Murphy for pointing this out.

----------------------------------------------------------------------
Description
----------------------------------------------------------------------
The IOMMU subsystem may hold gigabytes of state, the majority of it in
IOMMU page tables. Yet there is currently no way to observe how much
memory the IOMMU subsystem actually uses.

This patch series solves the problem by adding both observability of
all pages allocated by the IOMMU and accountability, so that admins can
limit the amount via cgroups.

System-wide observability uses /proc/meminfo:
SecPageTables:    438176 kB

Contains IOMMU and KVM memory.

Per-node observability:
/sys/devices/system/node/nodeN/meminfo
Node N SecPageTables:    422204 kB

Contains IOMMU and KVM memory in the given NUMA node.

Per-node, IOMMU-only observability:
/sys/devices/system/node/nodeN/vmstat
nr_iommu_pages 105555

Contains the number of pages the IOMMU has allocated in the given node.
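
For illustration only (not part of this series), a minimal userspace
sketch that reads the two counters above; it assumes the standard
meminfo/vmstat line formats shown here, and node0 is just an example:

#include <stdio.h>
#include <string.h>

/* Return the value following "key" in a procfs/sysfs stat file. */
static long read_counter(const char *path, const char *key)
{
	char line[256];
	long val = -1;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (!strncmp(line, key, strlen(key))) {
			sscanf(line + strlen(key), " %ld", &val);
			break;
		}
	}
	fclose(f);
	return val;
}

int main(void)
{
	/* system-wide: IOMMU and KVM page-table memory, in kB */
	printf("SecPageTables: %ld kB\n",
	       read_counter("/proc/meminfo", "SecPageTables:"));
	/* per-node, IOMMU only: number of allocated pages */
	printf("nr_iommu_pages: %ld\n",
	       read_counter("/sys/devices/system/node/node0/vmstat",
			    "nr_iommu_pages"));
	return 0;
}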

Accountability: via the sec_pagetables entry in cgroup-v2 memory.stat.
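
As a sketch of how the limit side can be exercised (the cgroup name
"iommu-test" is hypothetical, not from this series): sec_pagetables
shows up in the group's memory.stat, and writing memory.max caps it
together with the rest of the group's memory:

#include <stdio.h>

int main(void)
{
	/*
	 * Cap the hypothetical group at 1 GiB; IOMMU page tables charged
	 * to it then count against this limit via sec_pagetables.
	 */
	FILE *f = fopen("/sys/fs/cgroup/iommu-test/memory.max", "w");

	if (!f)
		return 1;
	fprintf(f, "%llu\n", 1ULL << 30);
	fclose(f);
	return 0;
}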

With the change, iova_stress [1] stops once the cgroup limit is reached:

# ./iova_stress
iova space:     0T      free memory:   497G
iova space:     1T      free memory:   495G
iova space:     2T      free memory:   493G
iova space:     3T      free memory:   491G


This series incorporates suggestions from the discussion at LPC [2].
----------------------------------------------------------------------
[1] https://github.com/soleen/iova_stress
[2] https://lpc.events/event/17/contributions/1466
----------------------------------------------------------------------
Previous versions
v1: https://lore.kernel.org/all/20231128204938.1453583-1-pasha.tatashin@soleen.com
----------------------------------------------------------------------

Pasha Tatashin (10):
  iommu/vt-d: add wrapper functions for page allocations
  iommu/amd: use page allocation function provided by iommu-pages.h
  iommu/io-pgtable-arm: use page allocation function provided by
    iommu-pages.h
  iommu/io-pgtable-dart: use page allocation function provided by
    iommu-pages.h
  iommu/exynos: use page allocation function provided by iommu-pages.h
  iommu/rockchip: use page allocation function provided by iommu-pages.h
  iommu/sun50i: use page allocation function provided by iommu-pages.h
  iommu/tegra-smmu: use page allocation function provided by
    iommu-pages.h
  iommu: observability of the IOMMU allocations
  iommu: account IOMMU allocated memory

 Documentation/admin-guide/cgroup-v2.rst |   2 +-
 Documentation/filesystems/proc.rst      |   4 +-
 drivers/iommu/amd/amd_iommu.h           |   8 -
 drivers/iommu/amd/init.c                |  91 +++++-----
 drivers/iommu/amd/io_pgtable.c          |  13 +-
 drivers/iommu/amd/io_pgtable_v2.c       |  20 +-
 drivers/iommu/amd/iommu.c               |  13 +-
 drivers/iommu/exynos-iommu.c            |  14 +-
 drivers/iommu/intel/dmar.c              |  10 +-
 drivers/iommu/intel/iommu.c             |  47 ++---
 drivers/iommu/intel/iommu.h             |   2 -
 drivers/iommu/intel/irq_remapping.c     |  10 +-
 drivers/iommu/intel/pasid.c             |  12 +-
 drivers/iommu/intel/svm.c               |   7 +-
 drivers/iommu/io-pgtable-arm.c          |   7 +-
 drivers/iommu/io-pgtable-dart.c         |  37 ++--
 drivers/iommu/iommu-pages.h             | 231 ++++++++++++++++++++++++
 drivers/iommu/rockchip-iommu.c          |  14 +-
 drivers/iommu/sun50i-iommu.c            |   7 +-
 drivers/iommu/tegra-smmu.c              |  18 +-
 include/linux/mmzone.h                  |   5 +-
 mm/vmstat.c                             |   3 +
 22 files changed, 390 insertions(+), 185 deletions(-)
 create mode 100644 drivers/iommu/iommu-pages.h

-- 
2.43.0.rc2.451.g8631bc7472-goog



* [PATCH v2 01/10] iommu/vt-d: add wrapper functions for page allocations
  2023-11-30 20:14 [PATCH v2 00/10] IOMMU memory observability Pasha Tatashin
@ 2023-11-30 20:14 ` Pasha Tatashin
  2023-12-14 17:58   ` David Rientjes
  2023-11-30 20:14 ` [PATCH v2 02/10] iommu/amd: use page allocation function provided by iommu-pages.h Pasha Tatashin
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Pasha Tatashin @ 2023-11-30 20:14 UTC
  To: akpm, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas, cgroups,
	corbet, david, dwmw2, hannes, heiko, iommu, jernej.skrabec,
	jonathanh, joro, krzysztof.kozlowski, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, lizefan.x, marcan, mhiramat,
	m.szyprowski, pasha.tatashin, paulmck, rdunlap, robin.murphy,
	samuel, suravee.suthikulpanit, sven, thierry.reding, tj,
	tomas.mudrunka, vdumpa, wens, will, yu-cheng.yu

In order to improve the observability and accountability of the IOMMU
layer, we must account the number of pages that are allocated by
functions that call directly into the buddy allocator.

This is achieved by first wrapping the allocation-related functions in
a set of inline functions in a new file:

drivers/iommu/iommu-pages.h

Convert all page allocation calls under iommu/intel to use these new
functions.
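
The conversion itself is mechanical. As a sketch (a synthetic example
assuming iommu-pages.h is included, not a hunk from this patch), a
typical call site changes like this:

static void *example_alloc(int nid, gfp_t gfp)
{
	/*
	 * before: raw buddy allocator call, zeroing requested explicitly:
	 *
	 *	struct page *page = alloc_pages_node(nid, gfp | __GFP_ZERO, 0);
	 *	return page ? page_address(page) : NULL;
	 */

	/*
	 * after: the wrapper zeroes the page itself, and the allocation
	 * becomes visible to the accounting added later in this series.
	 */
	return iommu_alloc_page_node(nid, gfp);
}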

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
 drivers/iommu/intel/dmar.c          |  10 +-
 drivers/iommu/intel/iommu.c         |  47 +++----
 drivers/iommu/intel/iommu.h         |   2 -
 drivers/iommu/intel/irq_remapping.c |  10 +-
 drivers/iommu/intel/pasid.c         |  12 +-
 drivers/iommu/intel/svm.c           |   7 +-
 drivers/iommu/iommu-pages.h         | 199 ++++++++++++++++++++++++++++
 7 files changed, 236 insertions(+), 51 deletions(-)
 create mode 100644 drivers/iommu/iommu-pages.h

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index a3414afe11b0..a6937e1e20a5 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -32,6 +32,7 @@
 
 #include "iommu.h"
 #include "../irq_remapping.h"
+#include "../iommu-pages.h"
 #include "perf.h"
 #include "trace.h"
 #include "perfmon.h"
@@ -1185,7 +1186,7 @@ static void free_iommu(struct intel_iommu *iommu)
 	}
 
 	if (iommu->qi) {
-		free_page((unsigned long)iommu->qi->desc);
+		iommu_free_page(iommu->qi->desc);
 		kfree(iommu->qi->desc_status);
 		kfree(iommu->qi);
 	}
@@ -1714,6 +1715,7 @@ int dmar_enable_qi(struct intel_iommu *iommu)
 {
 	struct q_inval *qi;
 	struct page *desc_page;
+	int order;
 
 	if (!ecap_qis(iommu->ecap))
 		return -ENOENT;
@@ -1734,8 +1736,8 @@ int dmar_enable_qi(struct intel_iommu *iommu)
 	 * Need two pages to accommodate 256 descriptors of 256 bits each
 	 * if the remapping hardware supports scalable mode translation.
 	 */
-	desc_page = alloc_pages_node(iommu->node, GFP_ATOMIC | __GFP_ZERO,
-				     !!ecap_smts(iommu->ecap));
+	order = ecap_smts(iommu->ecap) ? 1 : 0;
+	desc_page = __iommu_alloc_pages_node(iommu->node, GFP_ATOMIC, order);
 	if (!desc_page) {
 		kfree(qi);
 		iommu->qi = NULL;
@@ -1746,7 +1748,7 @@ int dmar_enable_qi(struct intel_iommu *iommu)
 
 	qi->desc_status = kcalloc(QI_LENGTH, sizeof(int), GFP_ATOMIC);
 	if (!qi->desc_status) {
-		free_page((unsigned long) qi->desc);
+		iommu_free_page(qi->desc);
 		kfree(qi);
 		iommu->qi = NULL;
 		return -ENOMEM;
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 3531b956556c..04f852175cbe 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -28,6 +28,7 @@
 #include "../dma-iommu.h"
 #include "../irq_remapping.h"
 #include "../iommu-sva.h"
+#include "../iommu-pages.h"
 #include "pasid.h"
 #include "cap_audit.h"
 #include "perfmon.h"
@@ -367,22 +368,6 @@ static int __init intel_iommu_setup(char *str)
 }
 __setup("intel_iommu=", intel_iommu_setup);
 
-void *alloc_pgtable_page(int node, gfp_t gfp)
-{
-	struct page *page;
-	void *vaddr = NULL;
-
-	page = alloc_pages_node(node, gfp | __GFP_ZERO, 0);
-	if (page)
-		vaddr = page_address(page);
-	return vaddr;
-}
-
-void free_pgtable_page(void *vaddr)
-{
-	free_page((unsigned long)vaddr);
-}
-
 static inline int domain_type_is_si(struct dmar_domain *domain)
 {
 	return domain->domain.type == IOMMU_DOMAIN_IDENTITY;
@@ -617,7 +602,7 @@ struct context_entry *iommu_context_addr(struct intel_iommu *iommu, u8 bus,
 		if (!alloc)
 			return NULL;
 
-		context = alloc_pgtable_page(iommu->node, GFP_ATOMIC);
+		context = iommu_alloc_page_node(iommu->node, GFP_ATOMIC);
 		if (!context)
 			return NULL;
 
@@ -791,17 +776,17 @@ static void free_context_table(struct intel_iommu *iommu)
 	for (i = 0; i < ROOT_ENTRY_NR; i++) {
 		context = iommu_context_addr(iommu, i, 0, 0);
 		if (context)
-			free_pgtable_page(context);
+			iommu_free_page(context);
 
 		if (!sm_supported(iommu))
 			continue;
 
 		context = iommu_context_addr(iommu, i, 0x80, 0);
 		if (context)
-			free_pgtable_page(context);
+			iommu_free_page(context);
 	}
 
-	free_pgtable_page(iommu->root_entry);
+	iommu_free_page(iommu->root_entry);
 	iommu->root_entry = NULL;
 }
 
@@ -939,7 +924,7 @@ static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain,
 		if (!dma_pte_present(pte)) {
 			uint64_t pteval;
 
-			tmp_page = alloc_pgtable_page(domain->nid, gfp);
+			tmp_page = iommu_alloc_page_node(domain->nid, gfp);
 
 			if (!tmp_page)
 				return NULL;
@@ -951,7 +936,7 @@ static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain,
 
 			if (cmpxchg64(&pte->val, 0ULL, pteval))
 				/* Someone else set it while we were thinking; use theirs. */
-				free_pgtable_page(tmp_page);
+				iommu_free_page(tmp_page);
 			else
 				domain_flush_cache(domain, pte, sizeof(*pte));
 		}
@@ -1064,7 +1049,7 @@ static void dma_pte_free_level(struct dmar_domain *domain, int level,
 		      last_pfn < level_pfn + level_size(level) - 1)) {
 			dma_clear_pte(pte);
 			domain_flush_cache(domain, pte, sizeof(*pte));
-			free_pgtable_page(level_pte);
+			iommu_free_page(level_pte);
 		}
 next:
 		pfn += level_size(level);
@@ -1088,7 +1073,7 @@ static void dma_pte_free_pagetable(struct dmar_domain *domain,
 
 	/* free pgd */
 	if (start_pfn == 0 && last_pfn == DOMAIN_MAX_PFN(domain->gaw)) {
-		free_pgtable_page(domain->pgd);
+		iommu_free_page(domain->pgd);
 		domain->pgd = NULL;
 	}
 }
@@ -1190,7 +1175,7 @@ static int iommu_alloc_root_entry(struct intel_iommu *iommu)
 {
 	struct root_entry *root;
 
-	root = alloc_pgtable_page(iommu->node, GFP_ATOMIC);
+	root = iommu_alloc_page_node(iommu->node, GFP_ATOMIC);
 	if (!root) {
 		pr_err("Allocating root entry for %s failed\n",
 			iommu->name);
@@ -1863,7 +1848,7 @@ static void domain_exit(struct dmar_domain *domain)
 		LIST_HEAD(freelist);
 
 		domain_unmap(domain, 0, DOMAIN_MAX_PFN(domain->gaw), &freelist);
-		put_pages_list(&freelist);
+		iommu_free_pages_list(&freelist);
 	}
 
 	if (WARN_ON(!list_empty(&domain->devices)))
@@ -2637,7 +2622,7 @@ static int copy_context_table(struct intel_iommu *iommu,
 			if (!old_ce)
 				goto out;
 
-			new_ce = alloc_pgtable_page(iommu->node, GFP_KERNEL);
+			new_ce = iommu_alloc_page_node(iommu->node, GFP_KERNEL);
 			if (!new_ce)
 				goto out_unmap;
 
@@ -3570,7 +3555,7 @@ static int intel_iommu_memory_notifier(struct notifier_block *nb,
 					start_vpfn, mhp->nr_pages,
 					list_empty(&freelist), 0);
 			rcu_read_unlock();
-			put_pages_list(&freelist);
+			iommu_free_pages_list(&freelist);
 		}
 		break;
 	}
@@ -4001,7 +3986,7 @@ static int md_domain_init(struct dmar_domain *domain, int guest_width)
 	domain->max_addr = 0;
 
 	/* always allocate the top pgd */
-	domain->pgd = alloc_pgtable_page(domain->nid, GFP_ATOMIC);
+	domain->pgd = iommu_alloc_page_node(domain->nid, GFP_ATOMIC);
 	if (!domain->pgd)
 		return -ENOMEM;
 	domain_flush_cache(domain, domain->pgd, PAGE_SIZE);
@@ -4148,7 +4133,7 @@ int prepare_domain_attach_device(struct iommu_domain *domain,
 		pte = dmar_domain->pgd;
 		if (dma_pte_present(pte)) {
 			dmar_domain->pgd = phys_to_virt(dma_pte_addr(pte));
-			free_pgtable_page(pte);
+			iommu_free_page(pte);
 		}
 		dmar_domain->agaw--;
 	}
@@ -4295,7 +4280,7 @@ static void intel_iommu_tlb_sync(struct iommu_domain *domain,
 				      start_pfn, nrpages,
 				      list_empty(&gather->freelist), 0);
 
-	put_pages_list(&gather->freelist);
+	iommu_free_pages_list(&gather->freelist);
 }
 
 static phys_addr_t intel_iommu_iova_to_phys(struct iommu_domain *domain,
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 65d37a138c75..b505f3f44d0a 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -894,8 +894,6 @@ void domain_update_iommu_cap(struct dmar_domain *domain);
 
 int dmar_ir_support(void);
 
-void *alloc_pgtable_page(int node, gfp_t gfp);
-void free_pgtable_page(void *vaddr);
 void iommu_flush_write_buffer(struct intel_iommu *iommu);
 struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn);
 struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *parent,
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 29b9e55dcf26..72e1c1342c13 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -22,6 +22,7 @@
 
 #include "iommu.h"
 #include "../irq_remapping.h"
+#include "../iommu-pages.h"
 #include "cap_audit.h"
 
 enum irq_mode {
@@ -536,8 +537,8 @@ static int intel_setup_irq_remapping(struct intel_iommu *iommu)
 	if (!ir_table)
 		return -ENOMEM;
 
-	pages = alloc_pages_node(iommu->node, GFP_KERNEL | __GFP_ZERO,
-				 INTR_REMAP_PAGE_ORDER);
+	pages = __iommu_alloc_pages_node(iommu->node, GFP_KERNEL,
+					 INTR_REMAP_PAGE_ORDER);
 	if (!pages) {
 		pr_err("IR%d: failed to allocate pages of order %d\n",
 		       iommu->seq_id, INTR_REMAP_PAGE_ORDER);
@@ -622,7 +623,7 @@ static int intel_setup_irq_remapping(struct intel_iommu *iommu)
 out_free_bitmap:
 	bitmap_free(bitmap);
 out_free_pages:
-	__free_pages(pages, INTR_REMAP_PAGE_ORDER);
+	__iommu_free_pages(pages, INTR_REMAP_PAGE_ORDER);
 out_free_table:
 	kfree(ir_table);
 
@@ -643,8 +644,7 @@ static void intel_teardown_irq_remapping(struct intel_iommu *iommu)
 			irq_domain_free_fwnode(fn);
 			iommu->ir_domain = NULL;
 		}
-		free_pages((unsigned long)iommu->ir_table->base,
-			   INTR_REMAP_PAGE_ORDER);
+		iommu_free_pages(iommu->ir_table->base, INTR_REMAP_PAGE_ORDER);
 		bitmap_free(iommu->ir_table->bitmap);
 		kfree(iommu->ir_table);
 		iommu->ir_table = NULL;
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 74e8e4c17e81..1856e74bba78 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -20,6 +20,7 @@
 
 #include "iommu.h"
 #include "pasid.h"
+#include "../iommu-pages.h"
 
 /*
  * Intel IOMMU system wide PASID name space:
@@ -116,8 +117,7 @@ int intel_pasid_alloc_table(struct device *dev)
 
 	size = max_pasid >> (PASID_PDE_SHIFT - 3);
 	order = size ? get_order(size) : 0;
-	pages = alloc_pages_node(info->iommu->node,
-				 GFP_KERNEL | __GFP_ZERO, order);
+	pages = __iommu_alloc_pages_node(info->iommu->node, GFP_KERNEL, order);
 	if (!pages) {
 		kfree(pasid_table);
 		return -ENOMEM;
@@ -154,10 +154,10 @@ void intel_pasid_free_table(struct device *dev)
 	max_pde = pasid_table->max_pasid >> PASID_PDE_SHIFT;
 	for (i = 0; i < max_pde; i++) {
 		table = get_pasid_table_from_pde(&dir[i]);
-		free_pgtable_page(table);
+		iommu_free_page(table);
 	}
 
-	free_pages((unsigned long)pasid_table->table, pasid_table->order);
+	iommu_free_pages(pasid_table->table, pasid_table->order);
 	kfree(pasid_table);
 }
 
@@ -203,7 +203,7 @@ static struct pasid_entry *intel_pasid_get_entry(struct device *dev, u32 pasid)
 retry:
 	entries = get_pasid_table_from_pde(&dir[dir_index]);
 	if (!entries) {
-		entries = alloc_pgtable_page(info->iommu->node, GFP_ATOMIC);
+		entries = iommu_alloc_page_node(info->iommu->node, GFP_ATOMIC);
 		if (!entries)
 			return NULL;
 
@@ -215,7 +215,7 @@ static struct pasid_entry *intel_pasid_get_entry(struct device *dev, u32 pasid)
 		 */
 		if (cmpxchg64(&dir[dir_index].val, 0ULL,
 			      (u64)virt_to_phys(entries) | PASID_PTE_PRESENT)) {
-			free_pgtable_page(entries);
+			iommu_free_page(entries);
 			goto retry;
 		}
 		if (!ecap_coherent(info->iommu->ecap)) {
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 50a481c895b8..4cf8826b30e1 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -23,6 +23,7 @@
 #include "pasid.h"
 #include "perf.h"
 #include "../iommu-sva.h"
+#include "../iommu-pages.h"
 #include "trace.h"
 
 static irqreturn_t prq_event_thread(int irq, void *d);
@@ -67,7 +68,7 @@ int intel_svm_enable_prq(struct intel_iommu *iommu)
 	struct page *pages;
 	int irq, ret;
 
-	pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, PRQ_ORDER);
+	pages = __iommu_alloc_pages(GFP_KERNEL, PRQ_ORDER);
 	if (!pages) {
 		pr_warn("IOMMU: %s: Failed to allocate page request queue\n",
 			iommu->name);
@@ -118,7 +119,7 @@ int intel_svm_enable_prq(struct intel_iommu *iommu)
 	dmar_free_hwirq(irq);
 	iommu->pr_irq = 0;
 free_prq:
-	free_pages((unsigned long)iommu->prq, PRQ_ORDER);
+	iommu_free_pages(iommu->prq, PRQ_ORDER);
 	iommu->prq = NULL;
 
 	return ret;
@@ -141,7 +142,7 @@ int intel_svm_finish_prq(struct intel_iommu *iommu)
 		iommu->iopf_queue = NULL;
 	}
 
-	free_pages((unsigned long)iommu->prq, PRQ_ORDER);
+	iommu_free_pages(iommu->prq, PRQ_ORDER);
 	iommu->prq = NULL;
 
 	return 0;
diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
new file mode 100644
index 000000000000..2332f807d514
--- /dev/null
+++ b/drivers/iommu/iommu-pages.h
@@ -0,0 +1,199 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2023, Google LLC.
+ * Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+
+#ifndef __IOMMU_PAGES_H
+#define __IOMMU_PAGES_H
+
+#include <linux/vmstat.h>
+#include <linux/gfp.h>
+#include <linux/mm.h>
+
+/*
+ * All page allocations performed in the IOMMU subsystem must use one of
+ * the functions below.  This is necessary for the proper accounting as IOMMU
+ * state can be rather large, e.g. multiple gigabytes in size.
+ */
+
+/**
+ * __iommu_alloc_pages_node - allocate a zeroed page of a given order from
+ * a specific NUMA node.
+ * @nid: memory NUMA node id
+ * @gfp: buddy allocator flags
+ * @order: page order
+ *
+ * returns the head struct page of the allocated page.
+ */
+static inline struct page *__iommu_alloc_pages_node(int nid, gfp_t gfp,
+						    int order)
+{
+	struct page *pages;
+
+	pages = alloc_pages_node(nid, gfp | __GFP_ZERO, order);
+	if (!pages)
+		return NULL;
+
+	return pages;
+}
+
+/**
+ * __iommu_alloc_pages - allocate a zeroed page of a given order.
+ * @gfp: buddy allocator flags
+ * @order: page order
+ *
+ * returns the head struct page of the allocated page.
+ */
+static inline struct page *__iommu_alloc_pages(gfp_t gfp, int order)
+{
+	struct page *pages;
+
+	pages = alloc_pages(gfp | __GFP_ZERO, order);
+	if (!pages)
+		return NULL;
+
+	return pages;
+}
+
+/**
+ * __iommu_alloc_page_node - allocate a zeroed page at a specific NUMA node.
+ * @nid: memory NUMA node id
+ * @gfp: buddy allocator flags
+ *
+ * returns the struct page of the allocated page.
+ */
+static inline struct page *__iommu_alloc_page_node(int nid, gfp_t gfp)
+{
+	return __iommu_alloc_pages_node(nid, gfp, 0);
+}
+
+/**
+ * __iommu_alloc_page - allocate a zeroed page
+ * @gfp: buddy allocator flags
+ *
+ * returns the struct page of the allocated page.
+ */
+static inline struct page *__iommu_alloc_page(gfp_t gfp)
+{
+	return __iommu_alloc_pages(gfp, 0);
+}
+
+/**
+ * __iommu_free_pages - free page of a given order
+ * @pages: head struct page of the page
+ * @order: page order
+ */
+static inline void __iommu_free_pages(struct page *pages, int order)
+{
+	if (!pages)
+		return;
+
+	__free_pages(pages, order);
+}
+
+/**
+ * __iommu_free_page - free page
+ * @page: struct page of the page
+ */
+static inline void __iommu_free_page(struct page *page)
+{
+	__iommu_free_pages(page, 0);
+}
+
+/**
+ * iommu_alloc_pages_node - allocate a zeroed page of a given order from
+ * a specific NUMA node.
+ * @nid: memory NUMA node id
+ * @gfp: buddy allocator flags
+ * @order: page order
+ *
+ * returns the virtual address of the allocated page
+ */
+static inline void *iommu_alloc_pages_node(int nid, gfp_t gfp, int order)
+{
+	struct page *pages = __iommu_alloc_pages_node(nid, gfp, order);
+
+	if (!pages)
+		return NULL;
+
+	return page_address(pages);
+}
+
+/**
+ * iommu_alloc_pages - allocate a zeroed page of a given order
+ * @gfp: buddy allocator flags
+ * @order: page order
+ *
+ * returns the virtual address of the allocated page
+ */
+static inline void *iommu_alloc_pages(gfp_t gfp, int order)
+{
+	struct page *pages = __iommu_alloc_pages(gfp, order);
+
+	if (!pages)
+		return NULL;
+
+	return page_address(pages);
+}
+
+/**
+ * iommu_alloc_page_node - allocate a zeroed page at a specific NUMA node.
+ * @nid: memory NUMA node id
+ * @gfp: buddy allocator flags
+ *
+ * returns the virtual address of the allocated page
+ */
+static inline void *iommu_alloc_page_node(int nid, gfp_t gfp)
+{
+	return iommu_alloc_pages_node(nid, gfp, 0);
+}
+
+/**
+ * iommu_alloc_page - allocate a zeroed page
+ * @gfp: buddy allocator flags
+ *
+ * returns the virtual address of the allocated page
+ */
+static inline void *iommu_alloc_page(gfp_t gfp)
+{
+	return iommu_alloc_pages(gfp, 0);
+}
+
+/**
+ * iommu_free_pages - free page of a given order
+ * @virt: virtual address of the page to be freed.
+ * @order: page order
+ */
+static inline void iommu_free_pages(void *virt, int order)
+{
+	if (!virt)
+		return;
+
+	__iommu_free_pages(virt_to_page(virt), order);
+}
+
+/**
+ * iommu_free_page - free page
+ * @virt: virtual address of the page to be freed.
+ */
+static inline void iommu_free_page(void *virt)
+{
+	iommu_free_pages(virt, 0);
+}
+
+/**
+ * iommu_free_pages_list - free a list of pages.
+ * @pages: the head of the lru list to be freed.
+ */
+static inline void iommu_free_pages_list(struct list_head *pages)
+{
+	while (!list_empty(pages)) {
+		struct page *p = list_entry(pages->prev, struct page, lru);
+
+		list_del(&p->lru);
+		put_page(p);
+	}
+}
+
+#endif	/* __IOMMU_PAGES_H */
-- 
2.43.0.rc2.451.g8631bc7472-goog



* [PATCH v2 02/10] iommu/amd: use page allocation function provided by iommu-pages.h
  2023-11-30 20:14 [PATCH v2 00/10] IOMMU memory observability Pasha Tatashin
  2023-11-30 20:14 ` [PATCH v2 01/10] iommu/vt-d: add wrapper functions for page allocations Pasha Tatashin
@ 2023-11-30 20:14 ` Pasha Tatashin
  2023-12-24 21:34   ` David Rientjes
  2023-11-30 20:14 ` [PATCH v2 03/10] iommu/io-pgtable-arm: " Pasha Tatashin
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Pasha Tatashin @ 2023-11-30 20:14 UTC
  To: akpm, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas, cgroups,
	corbet, david, dwmw2, hannes, heiko, iommu, jernej.skrabec,
	jonathanh, joro, krzysztof.kozlowski, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, lizefan.x, marcan, mhiramat,
	m.szyprowski, pasha.tatashin, paulmck, rdunlap, robin.murphy,
	samuel, suravee.suthikulpanit, sven, thierry.reding, tj,
	tomas.mudrunka, vdumpa, wens, will, yu-cheng.yu

Convert iommu/amd/* files to use the new page allocation functions
provided in iommu-pages.h.

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
 drivers/iommu/amd/amd_iommu.h     |  8 ---
 drivers/iommu/amd/init.c          | 91 ++++++++++++++-----------------
 drivers/iommu/amd/io_pgtable.c    | 13 +++--
 drivers/iommu/amd/io_pgtable_v2.c | 20 +++----
 drivers/iommu/amd/iommu.c         | 13 +++--
 5 files changed, 64 insertions(+), 81 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 86be1edd50ee..bf697d566e0b 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -136,14 +136,6 @@ static inline int get_pci_sbdf_id(struct pci_dev *pdev)
 	return PCI_SEG_DEVID_TO_SBDF(seg, devid);
 }
 
-static inline void *alloc_pgtable_page(int nid, gfp_t gfp)
-{
-	struct page *page;
-
-	page = alloc_pages_node(nid, gfp | __GFP_ZERO, 0);
-	return page ? page_address(page) : NULL;
-}
-
 bool translation_pre_enabled(struct amd_iommu *iommu);
 bool amd_iommu_is_attach_deferred(struct device *dev);
 int __init add_special_device(u8 type, u8 id, u32 *devid, bool cmd_line);
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 64bcf3df37ee..5b8a80fc7e50 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -35,6 +35,7 @@
 
 #include "amd_iommu.h"
 #include "../irq_remapping.h"
+#include "../iommu-pages.h"
 
 /*
  * definitions for the ACPI scanning code
@@ -648,8 +649,8 @@ static int __init find_last_devid_acpi(struct acpi_table_header *table, u16 pci_
 /* Allocate per PCI segment device table */
 static inline int __init alloc_dev_table(struct amd_iommu_pci_seg *pci_seg)
 {
-	pci_seg->dev_table = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO | GFP_DMA32,
-						      get_order(pci_seg->dev_table_size));
+	pci_seg->dev_table = iommu_alloc_pages(GFP_KERNEL | GFP_DMA32,
+					       get_order(pci_seg->dev_table_size));
 	if (!pci_seg->dev_table)
 		return -ENOMEM;
 
@@ -658,17 +659,16 @@ static inline int __init alloc_dev_table(struct amd_iommu_pci_seg *pci_seg)
 
 static inline void free_dev_table(struct amd_iommu_pci_seg *pci_seg)
 {
-	free_pages((unsigned long)pci_seg->dev_table,
-		    get_order(pci_seg->dev_table_size));
+	iommu_free_pages(pci_seg->dev_table,
+			 get_order(pci_seg->dev_table_size));
 	pci_seg->dev_table = NULL;
 }
 
 /* Allocate per PCI segment IOMMU rlookup table. */
 static inline int __init alloc_rlookup_table(struct amd_iommu_pci_seg *pci_seg)
 {
-	pci_seg->rlookup_table = (void *)__get_free_pages(
-						GFP_KERNEL | __GFP_ZERO,
-						get_order(pci_seg->rlookup_table_size));
+	pci_seg->rlookup_table = iommu_alloc_pages(GFP_KERNEL,
+						   get_order(pci_seg->rlookup_table_size));
 	if (pci_seg->rlookup_table == NULL)
 		return -ENOMEM;
 
@@ -677,16 +677,15 @@ static inline int __init alloc_rlookup_table(struct amd_iommu_pci_seg *pci_seg)
 
 static inline void free_rlookup_table(struct amd_iommu_pci_seg *pci_seg)
 {
-	free_pages((unsigned long)pci_seg->rlookup_table,
-		   get_order(pci_seg->rlookup_table_size));
+	iommu_free_pages(pci_seg->rlookup_table,
+			 get_order(pci_seg->rlookup_table_size));
 	pci_seg->rlookup_table = NULL;
 }
 
 static inline int __init alloc_irq_lookup_table(struct amd_iommu_pci_seg *pci_seg)
 {
-	pci_seg->irq_lookup_table = (void *)__get_free_pages(
-					     GFP_KERNEL | __GFP_ZERO,
-					     get_order(pci_seg->rlookup_table_size));
+	pci_seg->irq_lookup_table = iommu_alloc_pages(GFP_KERNEL,
+						      get_order(pci_seg->rlookup_table_size));
 	kmemleak_alloc(pci_seg->irq_lookup_table,
 		       pci_seg->rlookup_table_size, 1, GFP_KERNEL);
 	if (pci_seg->irq_lookup_table == NULL)
@@ -698,8 +697,8 @@ static inline int __init alloc_irq_lookup_table(struct amd_iommu_pci_seg *pci_se
 static inline void free_irq_lookup_table(struct amd_iommu_pci_seg *pci_seg)
 {
 	kmemleak_free(pci_seg->irq_lookup_table);
-	free_pages((unsigned long)pci_seg->irq_lookup_table,
-		   get_order(pci_seg->rlookup_table_size));
+	iommu_free_pages(pci_seg->irq_lookup_table,
+			 get_order(pci_seg->rlookup_table_size));
 	pci_seg->irq_lookup_table = NULL;
 }
 
@@ -707,8 +706,8 @@ static int __init alloc_alias_table(struct amd_iommu_pci_seg *pci_seg)
 {
 	int i;
 
-	pci_seg->alias_table = (void *)__get_free_pages(GFP_KERNEL,
-					get_order(pci_seg->alias_table_size));
+	pci_seg->alias_table = iommu_alloc_pages(GFP_KERNEL,
+						 get_order(pci_seg->alias_table_size));
 	if (!pci_seg->alias_table)
 		return -ENOMEM;
 
@@ -723,8 +722,8 @@ static int __init alloc_alias_table(struct amd_iommu_pci_seg *pci_seg)
 
 static void __init free_alias_table(struct amd_iommu_pci_seg *pci_seg)
 {
-	free_pages((unsigned long)pci_seg->alias_table,
-		   get_order(pci_seg->alias_table_size));
+	iommu_free_pages(pci_seg->alias_table,
+			 get_order(pci_seg->alias_table_size));
 	pci_seg->alias_table = NULL;
 }
 
@@ -735,8 +734,8 @@ static void __init free_alias_table(struct amd_iommu_pci_seg *pci_seg)
  */
 static int __init alloc_command_buffer(struct amd_iommu *iommu)
 {
-	iommu->cmd_buf = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
-						  get_order(CMD_BUFFER_SIZE));
+	iommu->cmd_buf = iommu_alloc_pages(GFP_KERNEL,
+					   get_order(CMD_BUFFER_SIZE));
 
 	return iommu->cmd_buf ? 0 : -ENOMEM;
 }
@@ -844,19 +843,19 @@ static void iommu_disable_command_buffer(struct amd_iommu *iommu)
 
 static void __init free_command_buffer(struct amd_iommu *iommu)
 {
-	free_pages((unsigned long)iommu->cmd_buf, get_order(CMD_BUFFER_SIZE));
+	iommu_free_pages(iommu->cmd_buf, get_order(CMD_BUFFER_SIZE));
 }
 
 static void *__init iommu_alloc_4k_pages(struct amd_iommu *iommu,
 					 gfp_t gfp, size_t size)
 {
 	int order = get_order(size);
-	void *buf = (void *)__get_free_pages(gfp, order);
+	void *buf = iommu_alloc_pages(gfp, order);
 
 	if (buf &&
 	    check_feature(FEATURE_SNP) &&
 	    set_memory_4k((unsigned long)buf, (1 << order))) {
-		free_pages((unsigned long)buf, order);
+		iommu_free_pages(buf, order);
 		buf = NULL;
 	}
 
@@ -866,7 +865,7 @@ static void *__init iommu_alloc_4k_pages(struct amd_iommu *iommu,
 /* allocates the memory where the IOMMU will log its events to */
 static int __init alloc_event_buffer(struct amd_iommu *iommu)
 {
-	iommu->evt_buf = iommu_alloc_4k_pages(iommu, GFP_KERNEL | __GFP_ZERO,
+	iommu->evt_buf = iommu_alloc_4k_pages(iommu, GFP_KERNEL,
 					      EVT_BUFFER_SIZE);
 
 	return iommu->evt_buf ? 0 : -ENOMEM;
@@ -900,14 +899,13 @@ static void iommu_disable_event_buffer(struct amd_iommu *iommu)
 
 static void __init free_event_buffer(struct amd_iommu *iommu)
 {
-	free_pages((unsigned long)iommu->evt_buf, get_order(EVT_BUFFER_SIZE));
+	iommu_free_pages(iommu->evt_buf, get_order(EVT_BUFFER_SIZE));
 }
 
 /* allocates the memory where the IOMMU will log its events to */
 static int __init alloc_ppr_log(struct amd_iommu *iommu)
 {
-	iommu->ppr_log = iommu_alloc_4k_pages(iommu, GFP_KERNEL | __GFP_ZERO,
-					      PPR_LOG_SIZE);
+	iommu->ppr_log = iommu_alloc_4k_pages(iommu, GFP_KERNEL, PPR_LOG_SIZE);
 
 	return iommu->ppr_log ? 0 : -ENOMEM;
 }
@@ -936,14 +934,14 @@ static void iommu_enable_ppr_log(struct amd_iommu *iommu)
 
 static void __init free_ppr_log(struct amd_iommu *iommu)
 {
-	free_pages((unsigned long)iommu->ppr_log, get_order(PPR_LOG_SIZE));
+	iommu_free_pages(iommu->ppr_log, get_order(PPR_LOG_SIZE));
 }
 
 static void free_ga_log(struct amd_iommu *iommu)
 {
 #ifdef CONFIG_IRQ_REMAP
-	free_pages((unsigned long)iommu->ga_log, get_order(GA_LOG_SIZE));
-	free_pages((unsigned long)iommu->ga_log_tail, get_order(8));
+	iommu_free_pages(iommu->ga_log, get_order(GA_LOG_SIZE));
+	iommu_free_pages(iommu->ga_log_tail, get_order(8));
 #endif
 }
 
@@ -988,13 +986,11 @@ static int iommu_init_ga_log(struct amd_iommu *iommu)
 	if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir))
 		return 0;
 
-	iommu->ga_log = (u8 *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
-					get_order(GA_LOG_SIZE));
+	iommu->ga_log = iommu_alloc_pages(GFP_KERNEL, get_order(GA_LOG_SIZE));
 	if (!iommu->ga_log)
 		goto err_out;
 
-	iommu->ga_log_tail = (u8 *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
-					get_order(8));
+	iommu->ga_log_tail = iommu_alloc_pages(GFP_KERNEL, get_order(8));
 	if (!iommu->ga_log_tail)
 		goto err_out;
 
@@ -1007,7 +1003,7 @@ static int iommu_init_ga_log(struct amd_iommu *iommu)
 
 static int __init alloc_cwwb_sem(struct amd_iommu *iommu)
 {
-	iommu->cmd_sem = iommu_alloc_4k_pages(iommu, GFP_KERNEL | __GFP_ZERO, 1);
+	iommu->cmd_sem = iommu_alloc_4k_pages(iommu, GFP_KERNEL, 1);
 
 	return iommu->cmd_sem ? 0 : -ENOMEM;
 }
@@ -1015,7 +1011,7 @@ static int __init alloc_cwwb_sem(struct amd_iommu *iommu)
 static void __init free_cwwb_sem(struct amd_iommu *iommu)
 {
 	if (iommu->cmd_sem)
-		free_page((unsigned long)iommu->cmd_sem);
+		iommu_free_page((void *)iommu->cmd_sem);
 }
 
 static void iommu_enable_xt(struct amd_iommu *iommu)
@@ -1080,7 +1076,6 @@ static bool __copy_device_table(struct amd_iommu *iommu)
 	u32 lo, hi, devid, old_devtb_size;
 	phys_addr_t old_devtb_phys;
 	u16 dom_id, dte_v, irq_v;
-	gfp_t gfp_flag;
 	u64 tmp;
 
 	/* Each IOMMU use separate device table with the same size */
@@ -1114,9 +1109,8 @@ static bool __copy_device_table(struct amd_iommu *iommu)
 	if (!old_devtb)
 		return false;
 
-	gfp_flag = GFP_KERNEL | __GFP_ZERO | GFP_DMA32;
-	pci_seg->old_dev_tbl_cpy = (void *)__get_free_pages(gfp_flag,
-						    get_order(pci_seg->dev_table_size));
+	pci_seg->old_dev_tbl_cpy = iommu_alloc_pages(GFP_KERNEL | GFP_DMA32,
+						     get_order(pci_seg->dev_table_size));
 	if (pci_seg->old_dev_tbl_cpy == NULL) {
 		pr_err("Failed to allocate memory for copying old device table!\n");
 		memunmap(old_devtb);
@@ -2800,8 +2794,8 @@ static void early_enable_iommus(void)
 
 		for_each_pci_segment(pci_seg) {
 			if (pci_seg->old_dev_tbl_cpy != NULL) {
-				free_pages((unsigned long)pci_seg->old_dev_tbl_cpy,
-						get_order(pci_seg->dev_table_size));
+				iommu_free_pages(pci_seg->old_dev_tbl_cpy,
+						 get_order(pci_seg->dev_table_size));
 				pci_seg->old_dev_tbl_cpy = NULL;
 			}
 		}
@@ -2814,8 +2808,8 @@ static void early_enable_iommus(void)
 		pr_info("Copied DEV table from previous kernel.\n");
 
 		for_each_pci_segment(pci_seg) {
-			free_pages((unsigned long)pci_seg->dev_table,
-				   get_order(pci_seg->dev_table_size));
+			iommu_free_pages(pci_seg->dev_table,
+					 get_order(pci_seg->dev_table_size));
 			pci_seg->dev_table = pci_seg->old_dev_tbl_cpy;
 		}
 
@@ -3018,8 +3012,8 @@ static bool __init check_ioapic_information(void)
 
 static void __init free_dma_resources(void)
 {
-	free_pages((unsigned long)amd_iommu_pd_alloc_bitmap,
-		   get_order(MAX_DOMAIN_ID/8));
+	iommu_free_pages(amd_iommu_pd_alloc_bitmap,
+			 get_order(MAX_DOMAIN_ID / 8));
 	amd_iommu_pd_alloc_bitmap = NULL;
 
 	free_unity_maps();
@@ -3091,9 +3085,8 @@ static int __init early_amd_iommu_init(void)
 	/* Device table - directly used by all IOMMUs */
 	ret = -ENOMEM;
 
-	amd_iommu_pd_alloc_bitmap = (void *)__get_free_pages(
-					    GFP_KERNEL | __GFP_ZERO,
-					    get_order(MAX_DOMAIN_ID/8));
+	amd_iommu_pd_alloc_bitmap = iommu_alloc_pages(GFP_KERNEL,
+						      get_order(MAX_DOMAIN_ID / 8));
 	if (amd_iommu_pd_alloc_bitmap == NULL)
 		goto out;
 
diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index 6c0621f6f572..f8b7d4c39a9f 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -22,6 +22,7 @@
 
 #include "amd_iommu_types.h"
 #include "amd_iommu.h"
+#include "../iommu-pages.h"
 
 static void v1_tlb_flush_all(void *cookie)
 {
@@ -156,7 +157,7 @@ static bool increase_address_space(struct protection_domain *domain,
 	bool ret = true;
 	u64 *pte;
 
-	pte = alloc_pgtable_page(domain->nid, gfp);
+	pte = iommu_alloc_page_node(domain->nid, gfp);
 	if (!pte)
 		return false;
 
@@ -187,7 +188,7 @@ static bool increase_address_space(struct protection_domain *domain,
 
 out:
 	spin_unlock_irqrestore(&domain->lock, flags);
-	free_page((unsigned long)pte);
+	iommu_free_page(pte);
 
 	return ret;
 }
@@ -250,7 +251,7 @@ static u64 *alloc_pte(struct protection_domain *domain,
 
 		if (!IOMMU_PTE_PRESENT(__pte) ||
 		    pte_level == PAGE_MODE_NONE) {
-			page = alloc_pgtable_page(domain->nid, gfp);
+			page = iommu_alloc_page_node(domain->nid, gfp);
 
 			if (!page)
 				return NULL;
@@ -259,7 +260,7 @@ static u64 *alloc_pte(struct protection_domain *domain,
 
 			/* pte could have been changed somewhere. */
 			if (!try_cmpxchg64(pte, &__pte, __npte))
-				free_page((unsigned long)page);
+				iommu_free_page(page);
 			else if (IOMMU_PTE_PRESENT(__pte))
 				*updated = true;
 
@@ -430,7 +431,7 @@ static int iommu_v1_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
 	}
 
 	/* Everything flushed out, free pages now */
-	put_pages_list(&freelist);
+	iommu_free_pages_list(&freelist);
 
 	return ret;
 }
@@ -579,7 +580,7 @@ static void v1_free_pgtable(struct io_pgtable *iop)
 	/* Make changes visible to IOMMUs */
 	amd_iommu_domain_update(dom);
 
-	put_pages_list(&freelist);
+	iommu_free_pages_list(&freelist);
 }
 
 static struct io_pgtable *v1_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
diff --git a/drivers/iommu/amd/io_pgtable_v2.c b/drivers/iommu/amd/io_pgtable_v2.c
index f818a7e254d4..1e08dab93686 100644
--- a/drivers/iommu/amd/io_pgtable_v2.c
+++ b/drivers/iommu/amd/io_pgtable_v2.c
@@ -18,6 +18,7 @@
 
 #include "amd_iommu_types.h"
 #include "amd_iommu.h"
+#include "../iommu-pages.h"
 
 #define IOMMU_PAGE_PRESENT	BIT_ULL(0)	/* Is present */
 #define IOMMU_PAGE_RW		BIT_ULL(1)	/* Writeable */
@@ -99,11 +100,6 @@ static inline int page_size_to_level(u64 pg_size)
 	return PAGE_MODE_1_LEVEL;
 }
 
-static inline void free_pgtable_page(u64 *pt)
-{
-	free_page((unsigned long)pt);
-}
-
 static void free_pgtable(u64 *pt, int level)
 {
 	u64 *p;
@@ -125,10 +121,10 @@ static void free_pgtable(u64 *pt, int level)
 		if (level > 2)
 			free_pgtable(p, level - 1);
 		else
-			free_pgtable_page(p);
+			iommu_free_page(p);
 	}
 
-	free_pgtable_page(pt);
+	iommu_free_page(pt);
 }
 
 /* Allocate page table */
@@ -156,14 +152,14 @@ static u64 *v2_alloc_pte(int nid, u64 *pgd, unsigned long iova,
 		}
 
 		if (!IOMMU_PTE_PRESENT(__pte)) {
-			page = alloc_pgtable_page(nid, gfp);
+			page = iommu_alloc_page_node(nid, gfp);
 			if (!page)
 				return NULL;
 
 			__npte = set_pgtable_attr(page);
 			/* pte could have been changed somewhere. */
 			if (cmpxchg64(pte, __pte, __npte) != __pte)
-				free_pgtable_page(page);
+				iommu_free_page(page);
 			else if (IOMMU_PTE_PRESENT(__pte))
 				*updated = true;
 
@@ -185,7 +181,7 @@ static u64 *v2_alloc_pte(int nid, u64 *pgd, unsigned long iova,
 		if (pg_size == IOMMU_PAGE_SIZE_1G)
 			free_pgtable(__pte, end_level - 1);
 		else if (pg_size == IOMMU_PAGE_SIZE_2M)
-			free_pgtable_page(__pte);
+			iommu_free_page(__pte);
 	}
 
 	return pte;
@@ -380,7 +376,7 @@ static struct io_pgtable *v2_alloc_pgtable(struct io_pgtable_cfg *cfg, void *coo
 	int ret;
 	int ias = IOMMU_IN_ADDR_BIT_SIZE;
 
-	pgtable->pgd = alloc_pgtable_page(pdom->nid, GFP_ATOMIC);
+	pgtable->pgd = iommu_alloc_page_node(pdom->nid, GFP_ATOMIC);
 	if (!pgtable->pgd)
 		return NULL;
 
@@ -403,7 +399,7 @@ static struct io_pgtable *v2_alloc_pgtable(struct io_pgtable_cfg *cfg, void *coo
 	return &pgtable->iop;
 
 err_free_pgd:
-	free_pgtable_page(pgtable->pgd);
+	iommu_free_page(pgtable->pgd);
 
 	return NULL;
 }
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index fcc987f5d4ed..9a228a95da0e 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -42,6 +42,7 @@
 #include "amd_iommu.h"
 #include "../dma-iommu.h"
 #include "../irq_remapping.h"
+#include "../iommu-pages.h"
 
 #define CMD_SET_TYPE(cmd, t) ((cmd)->data[1] |= ((t) << 28))
 
@@ -1642,7 +1643,7 @@ static void free_gcr3_tbl_level1(u64 *tbl)
 
 		ptr = iommu_phys_to_virt(tbl[i] & PAGE_MASK);
 
-		free_page((unsigned long)ptr);
+		iommu_free_page(ptr);
 	}
 }
 
@@ -1670,7 +1671,7 @@ static void free_gcr3_table(struct protection_domain *domain)
 	else
 		BUG_ON(domain->glx != 0);
 
-	free_page((unsigned long)domain->gcr3_tbl);
+	iommu_free_page(domain->gcr3_tbl);
 }
 
 /*
@@ -1697,7 +1698,7 @@ static int setup_gcr3_table(struct protection_domain *domain, int pasids)
 	if (levels > amd_iommu_max_glx_val)
 		return -EINVAL;
 
-	domain->gcr3_tbl = alloc_pgtable_page(domain->nid, GFP_ATOMIC);
+	domain->gcr3_tbl = iommu_alloc_page_node(domain->nid, GFP_ATOMIC);
 	if (domain->gcr3_tbl == NULL)
 		return -ENOMEM;
 
@@ -2092,7 +2093,7 @@ static void protection_domain_free(struct protection_domain *domain)
 		free_gcr3_table(domain);
 
 	if (domain->iop.root)
-		free_page((unsigned long)domain->iop.root);
+		iommu_free_page(domain->iop.root);
 
 	if (domain->id)
 		domain_id_free(domain->id);
@@ -2107,7 +2108,7 @@ static int protection_domain_init_v1(struct protection_domain *domain, int mode)
 	BUG_ON(mode < PAGE_MODE_NONE || mode > PAGE_MODE_6_LEVEL);
 
 	if (mode != PAGE_MODE_NONE) {
-		pt_root = (void *)get_zeroed_page(GFP_KERNEL);
+		pt_root = iommu_alloc_page(GFP_KERNEL);
 		if (!pt_root)
 			return -ENOMEM;
 	}
@@ -2783,7 +2784,7 @@ static u64 *__get_gcr3_pte(u64 *root, int level, u32 pasid, bool alloc)
 			if (!alloc)
 				return NULL;
 
-			root = (void *)get_zeroed_page(GFP_ATOMIC);
+			root = iommu_alloc_page(GFP_ATOMIC);
 			if (root == NULL)
 				return NULL;
 
-- 
2.43.0.rc2.451.g8631bc7472-goog



* [PATCH v2 03/10] iommu/io-pgtable-arm: use page allocation function provided by iommu-pages.h
  2023-11-30 20:14 [PATCH v2 00/10] IOMMU memory observability Pasha Tatashin
  2023-11-30 20:14 ` [PATCH v2 01/10] iommu/vt-d: add wrapper functions for page allocations Pasha Tatashin
  2023-11-30 20:14 ` [PATCH v2 02/10] iommu/amd: use page allocation function provided by iommu-pages.h Pasha Tatashin
@ 2023-11-30 20:14 ` Pasha Tatashin
  2023-12-24 21:34   ` David Rientjes
  2023-11-30 20:14 ` [PATCH v2 04/10] iommu/io-pgtable-dart: " Pasha Tatashin
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Pasha Tatashin @ 2023-11-30 20:14 UTC
  To: akpm, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas, cgroups,
	corbet, david, dwmw2, hannes, heiko, iommu, jernej.skrabec,
	jonathanh, joro, krzysztof.kozlowski, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, lizefan.x, marcan, mhiramat,
	m.szyprowski, pasha.tatashin, paulmck, rdunlap, robin.murphy,
	samuel, suravee.suthikulpanit, sven, thierry.reding, tj,
	tomas.mudrunka, vdumpa, wens, will, yu-cheng.yu

Convert iommu/io-pgtable-arm.c to use the new page allocation functions
provided in iommu-pages.h.

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
 drivers/iommu/io-pgtable-arm.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 72dcdd468cf3..21d315151ad6 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -21,6 +21,7 @@
 #include <asm/barrier.h>
 
 #include "io-pgtable-arm.h"
+#include "iommu-pages.h"
 
 #define ARM_LPAE_MAX_ADDR_BITS		52
 #define ARM_LPAE_S2_MAX_CONCAT_PAGES	16
@@ -197,7 +198,7 @@ static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
 	void *pages;
 
 	VM_BUG_ON((gfp & __GFP_HIGHMEM));
-	p = alloc_pages_node(dev_to_node(dev), gfp | __GFP_ZERO, order);
+	p = __iommu_alloc_pages_node(dev_to_node(dev), gfp, order);
 	if (!p)
 		return NULL;
 
@@ -221,7 +222,7 @@ static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp,
 	dev_err(dev, "Cannot accommodate DMA translation for IOMMU page tables\n");
 	dma_unmap_single(dev, dma, size, DMA_TO_DEVICE);
 out_free:
-	__free_pages(p, order);
+	__iommu_free_pages(p, order);
 	return NULL;
 }
 
@@ -231,7 +232,7 @@ static void __arm_lpae_free_pages(void *pages, size_t size,
 	if (!cfg->coherent_walk)
 		dma_unmap_single(cfg->iommu_dev, __arm_lpae_dma_addr(pages),
 				 size, DMA_TO_DEVICE);
-	free_pages((unsigned long)pages, get_order(size));
+	iommu_free_pages(pages, get_order(size));
 }
 
 static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
-- 
2.43.0.rc2.451.g8631bc7472-goog



* [PATCH v2 04/10] iommu/io-pgtable-dart: use page allocation function provided by iommu-pages.h
  2023-11-30 20:14 [PATCH v2 00/10] IOMMU memory observability Pasha Tatashin
                   ` (2 preceding siblings ...)
  2023-11-30 20:14 ` [PATCH v2 03/10] iommu/io-pgtable-arm: " Pasha Tatashin
@ 2023-11-30 20:14 ` Pasha Tatashin
  2023-12-24 21:36   ` David Rientjes
  2023-11-30 20:14 ` [PATCH v2 05/10] iommu/exynos: " Pasha Tatashin
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Pasha Tatashin @ 2023-11-30 20:14 UTC
  To: akpm, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas, cgroups,
	corbet, david, dwmw2, hannes, heiko, iommu, jernej.skrabec,
	jonathanh, joro, krzysztof.kozlowski, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, lizefan.x, marcan, mhiramat,
	m.szyprowski, pasha.tatashin, paulmck, rdunlap, robin.murphy,
	samuel, suravee.suthikulpanit, sven, thierry.reding, tj,
	tomas.mudrunka, vdumpa, wens, will, yu-cheng.yu

Convert iommu/io-pgtable-dart.c to use the new page allocation functions
provided in iommu-pages.h.

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Janne Grunau <j@jannau.net>
---
 drivers/iommu/io-pgtable-dart.c | 37 +++++++++++++--------------------
 1 file changed, 14 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/io-pgtable-dart.c b/drivers/iommu/io-pgtable-dart.c
index 74b1ef2b96be..ad28031e1e93 100644
--- a/drivers/iommu/io-pgtable-dart.c
+++ b/drivers/iommu/io-pgtable-dart.c
@@ -23,6 +23,7 @@
 #include <linux/types.h>
 
 #include <asm/barrier.h>
+#include "iommu-pages.h"
 
 #define DART1_MAX_ADDR_BITS	36
 
@@ -106,18 +107,12 @@ static phys_addr_t iopte_to_paddr(dart_iopte pte,
 	return paddr;
 }
 
-static void *__dart_alloc_pages(size_t size, gfp_t gfp,
-				    struct io_pgtable_cfg *cfg)
+static void *__dart_alloc_pages(size_t size, gfp_t gfp)
 {
 	int order = get_order(size);
-	struct page *p;
 
 	VM_BUG_ON((gfp & __GFP_HIGHMEM));
-	p = alloc_pages(gfp | __GFP_ZERO, order);
-	if (!p)
-		return NULL;
-
-	return page_address(p);
+	return iommu_alloc_pages(gfp, order);
 }
 
 static int dart_init_pte(struct dart_io_pgtable *data,
@@ -262,13 +257,13 @@ static int dart_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
 
 	/* no L2 table present */
 	if (!pte) {
-		cptep = __dart_alloc_pages(tblsz, gfp, cfg);
+		cptep = __dart_alloc_pages(tblsz, gfp);
 		if (!cptep)
 			return -ENOMEM;
 
 		pte = dart_install_table(cptep, ptep, 0, data);
 		if (pte)
-			free_pages((unsigned long)cptep, get_order(tblsz));
+			iommu_free_pages(cptep, get_order(tblsz));
 
 		/* L2 table is present (now) */
 		pte = READ_ONCE(*ptep);
@@ -419,8 +414,7 @@ apple_dart_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
 	cfg->apple_dart_cfg.n_ttbrs = 1 << data->tbl_bits;
 
 	for (i = 0; i < cfg->apple_dart_cfg.n_ttbrs; ++i) {
-		data->pgd[i] = __dart_alloc_pages(DART_GRANULE(data), GFP_KERNEL,
-					   cfg);
+		data->pgd[i] = __dart_alloc_pages(DART_GRANULE(data), GFP_KERNEL);
 		if (!data->pgd[i])
 			goto out_free_data;
 		cfg->apple_dart_cfg.ttbr[i] = virt_to_phys(data->pgd[i]);
@@ -429,9 +423,10 @@ apple_dart_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
 	return &data->iop;
 
 out_free_data:
-	while (--i >= 0)
-		free_pages((unsigned long)data->pgd[i],
-			   get_order(DART_GRANULE(data)));
+	while (--i >= 0) {
+		iommu_free_pages(data->pgd[i],
+				 get_order(DART_GRANULE(data)));
+	}
 	kfree(data);
 	return NULL;
 }
@@ -439,6 +434,7 @@ apple_dart_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
 static void apple_dart_free_pgtable(struct io_pgtable *iop)
 {
 	struct dart_io_pgtable *data = io_pgtable_to_data(iop);
+	int order = get_order(DART_GRANULE(data));
 	dart_iopte *ptep, *end;
 	int i;
 
@@ -449,15 +445,10 @@ static void apple_dart_free_pgtable(struct io_pgtable *iop)
 		while (ptep != end) {
 			dart_iopte pte = *ptep++;
 
-			if (pte) {
-				unsigned long page =
-					(unsigned long)iopte_deref(pte, data);
-
-				free_pages(page, get_order(DART_GRANULE(data)));
-			}
+			if (pte)
+				iommu_free_pages(iopte_deref(pte, data), order);
 		}
-		free_pages((unsigned long)data->pgd[i],
-			   get_order(DART_GRANULE(data)));
+		iommu_free_pages(data->pgd[i], order);
 	}
 
 	kfree(data);
-- 
2.43.0.rc2.451.g8631bc7472-goog



* [PATCH v2 05/10] iommu/exynos: use page allocation function provided by iommu-pages.h
  2023-11-30 20:14 [PATCH v2 00/10] IOMMU memory observability Pasha Tatashin
                   ` (3 preceding siblings ...)
  2023-11-30 20:14 ` [PATCH v2 04/10] iommu/io-pgtable-dart: " Pasha Tatashin
@ 2023-11-30 20:14 ` Pasha Tatashin
  2023-12-24 21:37   ` David Rientjes
  2023-11-30 20:15 ` [PATCH v2 06/10] iommu/rockchip: " Pasha Tatashin
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Pasha Tatashin @ 2023-11-30 20:14 UTC
  To: akpm, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas, cgroups,
	corbet, david, dwmw2, hannes, heiko, iommu, jernej.skrabec,
	jonathanh, joro, krzysztof.kozlowski, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, lizefan.x, marcan, mhiramat,
	m.szyprowski, pasha.tatashin, paulmck, rdunlap, robin.murphy,
	samuel, suravee.suthikulpanit, sven, thierry.reding, tj,
	tomas.mudrunka, vdumpa, wens, will, yu-cheng.yu

Convert iommu/exynos-iommu.c to use the new page allocation functions
provided in iommu-pages.h.

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
 drivers/iommu/exynos-iommu.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 2c6e9094f1e9..3eab0ae65a4f 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -22,6 +22,8 @@
 #include <linux/pm_runtime.h>
 #include <linux/slab.h>
 
+#include "iommu-pages.h"
+
 typedef u32 sysmmu_iova_t;
 typedef u32 sysmmu_pte_t;
 static struct iommu_domain exynos_identity_domain;
@@ -900,11 +902,11 @@ static struct iommu_domain *exynos_iommu_domain_alloc_paging(struct device *dev)
 	if (!domain)
 		return NULL;
 
-	domain->pgtable = (sysmmu_pte_t *)__get_free_pages(GFP_KERNEL, 2);
+	domain->pgtable = iommu_alloc_pages(GFP_KERNEL, 2);
 	if (!domain->pgtable)
 		goto err_pgtable;
 
-	domain->lv2entcnt = (short *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
+	domain->lv2entcnt = iommu_alloc_pages(GFP_KERNEL, 1);
 	if (!domain->lv2entcnt)
 		goto err_counter;
 
@@ -930,9 +932,9 @@ static struct iommu_domain *exynos_iommu_domain_alloc_paging(struct device *dev)
 	return &domain->domain;
 
 err_lv2ent:
-	free_pages((unsigned long)domain->lv2entcnt, 1);
+	iommu_free_pages(domain->lv2entcnt, 1);
 err_counter:
-	free_pages((unsigned long)domain->pgtable, 2);
+	iommu_free_pages(domain->pgtable, 2);
 err_pgtable:
 	kfree(domain);
 	return NULL;
@@ -973,8 +975,8 @@ static void exynos_iommu_domain_free(struct iommu_domain *iommu_domain)
 					phys_to_virt(base));
 		}
 
-	free_pages((unsigned long)domain->pgtable, 2);
-	free_pages((unsigned long)domain->lv2entcnt, 1);
+	iommu_free_pages(domain->pgtable, 2);
+	iommu_free_pages(domain->lv2entcnt, 1);
 	kfree(domain);
 }
 
-- 
2.43.0.rc2.451.g8631bc7472-goog



* [PATCH v2 06/10] iommu/rockchip: use page allocation function provided by iommu-pages.h
  2023-11-30 20:14 [PATCH v2 00/10] IOMMU memory observability Pasha Tatashin
                   ` (4 preceding siblings ...)
  2023-11-30 20:14 ` [PATCH v2 05/10] iommu/exynos: " Pasha Tatashin
@ 2023-11-30 20:15 ` Pasha Tatashin
  2023-12-24 21:39   ` David Rientjes
  2023-11-30 20:15 ` [PATCH v2 07/10] iommu/sun50i: " Pasha Tatashin
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Pasha Tatashin @ 2023-11-30 20:15 UTC
  To: akpm, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas, cgroups,
	corbet, david, dwmw2, hannes, heiko, iommu, jernej.skrabec,
	jonathanh, joro, krzysztof.kozlowski, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, lizefan.x, marcan, mhiramat,
	m.szyprowski, pasha.tatashin, paulmck, rdunlap, robin.murphy,
	samuel, suravee.suthikulpanit, sven, thierry.reding, tj,
	tomas.mudrunka, vdumpa, wens, will, yu-cheng.yu

Convert iommu/rockchip-iommu.c to use the new page allocation functions
provided in iommu-pages.h.

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
 drivers/iommu/rockchip-iommu.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 2685861c0a12..e04f22d481d0 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -26,6 +26,8 @@
 #include <linux/slab.h>
 #include <linux/spinlock.h>
 
+#include "iommu-pages.h"
+
 /** MMU register offsets */
 #define RK_MMU_DTE_ADDR		0x00	/* Directory table address */
 #define RK_MMU_STATUS		0x04
@@ -727,14 +729,14 @@ static u32 *rk_dte_get_page_table(struct rk_iommu_domain *rk_domain,
 	if (rk_dte_is_pt_valid(dte))
 		goto done;
 
-	page_table = (u32 *)get_zeroed_page(GFP_ATOMIC | rk_ops->gfp_flags);
+	page_table = iommu_alloc_page(GFP_ATOMIC | rk_ops->gfp_flags);
 	if (!page_table)
 		return ERR_PTR(-ENOMEM);
 
 	pt_dma = dma_map_single(dma_dev, page_table, SPAGE_SIZE, DMA_TO_DEVICE);
 	if (dma_mapping_error(dma_dev, pt_dma)) {
 		dev_err(dma_dev, "DMA mapping error while allocating page table\n");
-		free_page((unsigned long)page_table);
+		iommu_free_page(page_table);
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -1061,7 +1063,7 @@ static struct iommu_domain *rk_iommu_domain_alloc_paging(struct device *dev)
 	 * Each level1 (dt) and level2 (pt) table has 1024 4-byte entries.
 	 * Allocate one 4 KiB page for each table.
 	 */
-	rk_domain->dt = (u32 *)get_zeroed_page(GFP_KERNEL | rk_ops->gfp_flags);
+	rk_domain->dt = iommu_alloc_page(GFP_KERNEL | rk_ops->gfp_flags);
 	if (!rk_domain->dt)
 		goto err_free_domain;
 
@@ -1083,7 +1085,7 @@ static struct iommu_domain *rk_iommu_domain_alloc_paging(struct device *dev)
 	return &rk_domain->domain;
 
 err_free_dt:
-	free_page((unsigned long)rk_domain->dt);
+	iommu_free_page(rk_domain->dt);
 err_free_domain:
 	kfree(rk_domain);
 
@@ -1104,13 +1106,13 @@ static void rk_iommu_domain_free(struct iommu_domain *domain)
 			u32 *page_table = phys_to_virt(pt_phys);
 			dma_unmap_single(dma_dev, pt_phys,
 					 SPAGE_SIZE, DMA_TO_DEVICE);
-			free_page((unsigned long)page_table);
+			iommu_free_page(page_table);
 		}
 	}
 
 	dma_unmap_single(dma_dev, rk_domain->dt_dma,
 			 SPAGE_SIZE, DMA_TO_DEVICE);
-	free_page((unsigned long)rk_domain->dt);
+	iommu_free_page(rk_domain->dt);
 
 	kfree(rk_domain);
 }
-- 
2.43.0.rc2.451.g8631bc7472-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 07/10] iommu/sun50i: use page allocation function provided by iommu-pages.h
  2023-11-30 20:14 [PATCH v2 00/10] IOMMU memory observability Pasha Tatashin
                   ` (5 preceding siblings ...)
  2023-11-30 20:15 ` [PATCH v2 06/10] iommu/rockchip: " Pasha Tatashin
@ 2023-11-30 20:15 ` Pasha Tatashin
  2023-12-24 21:39   ` David Rientjes
  2023-11-30 20:15 ` [PATCH v2 08/10] iommu/tegra-smmu: " Pasha Tatashin
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Pasha Tatashin @ 2023-11-30 20:15 UTC (permalink / raw
  To: akpm, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas, cgroups,
	corbet, david, dwmw2, hannes, heiko, iommu, jernej.skrabec,
	jonathanh, joro, krzysztof.kozlowski, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, lizefan.x, marcan, mhiramat,
	m.szyprowski, pasha.tatashin, paulmck, rdunlap, robin.murphy,
	samuel, suravee.suthikulpanit, sven, thierry.reding, tj,
	tomas.mudrunka, vdumpa, wens, will, yu-cheng.yu

Convert iommu/sun50i-iommu.c to use the new page allocation functions
provided in iommu-pages.h.

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
 drivers/iommu/sun50i-iommu.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/sun50i-iommu.c b/drivers/iommu/sun50i-iommu.c
index 41484a5a399b..172ddb717eb5 100644
--- a/drivers/iommu/sun50i-iommu.c
+++ b/drivers/iommu/sun50i-iommu.c
@@ -26,6 +26,8 @@
 #include <linux/spinlock.h>
 #include <linux/types.h>
 
+#include "iommu-pages.h"
+
 #define IOMMU_RESET_REG			0x010
 #define IOMMU_RESET_RELEASE_ALL			0xffffffff
 #define IOMMU_ENABLE_REG		0x020
@@ -679,8 +681,7 @@ sun50i_iommu_domain_alloc_paging(struct device *dev)
 	if (!sun50i_domain)
 		return NULL;
 
-	sun50i_domain->dt = (u32 *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
-						    get_order(DT_SIZE));
+	sun50i_domain->dt = iommu_alloc_pages(GFP_KERNEL, get_order(DT_SIZE));
 	if (!sun50i_domain->dt)
 		goto err_free_domain;
 
@@ -702,7 +703,7 @@ static void sun50i_iommu_domain_free(struct iommu_domain *domain)
 {
 	struct sun50i_iommu_domain *sun50i_domain = to_sun50i_domain(domain);
 
-	free_pages((unsigned long)sun50i_domain->dt, get_order(DT_SIZE));
+	iommu_free_pages(sun50i_domain->dt, get_order(DT_SIZE));
 	sun50i_domain->dt = NULL;
 
 	kfree(sun50i_domain);
-- 
2.43.0.rc2.451.g8631bc7472-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 08/10] iommu/tegra-smmu: use page allocation function provided by iommu-pages.h
  2023-11-30 20:14 [PATCH v2 00/10] IOMMU memory observability Pasha Tatashin
                   ` (6 preceding siblings ...)
  2023-11-30 20:15 ` [PATCH v2 07/10] iommu/sun50i: " Pasha Tatashin
@ 2023-11-30 20:15 ` Pasha Tatashin
  2023-12-24 21:40   ` David Rientjes
  2023-11-30 20:15 ` [PATCH v2 09/10] iommu: observability of the IOMMU allocations Pasha Tatashin
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Pasha Tatashin @ 2023-11-30 20:15 UTC (permalink / raw
  To: akpm, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas, cgroups,
	corbet, david, dwmw2, hannes, heiko, iommu, jernej.skrabec,
	jonathanh, joro, krzysztof.kozlowski, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, lizefan.x, marcan, mhiramat,
	m.szyprowski, pasha.tatashin, paulmck, rdunlap, robin.murphy,
	samuel, suravee.suthikulpanit, sven, thierry.reding, tj,
	tomas.mudrunka, vdumpa, wens, will, yu-cheng.yu

Convert iommu/tegra-smmu.c to use the new page allocation functions
provided in iommu-pages.h.

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
 drivers/iommu/tegra-smmu.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 310871728ab4..5e0730dc1b0e 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -19,6 +19,8 @@
 #include <soc/tegra/ahb.h>
 #include <soc/tegra/mc.h>
 
+#include "iommu-pages.h"
+
 struct tegra_smmu_group {
 	struct list_head list;
 	struct tegra_smmu *smmu;
@@ -282,7 +284,7 @@ static struct iommu_domain *tegra_smmu_domain_alloc_paging(struct device *dev)
 
 	as->attr = SMMU_PD_READABLE | SMMU_PD_WRITABLE | SMMU_PD_NONSECURE;
 
-	as->pd = alloc_page(GFP_KERNEL | __GFP_DMA | __GFP_ZERO);
+	as->pd = __iommu_alloc_page(GFP_KERNEL | __GFP_DMA);
 	if (!as->pd) {
 		kfree(as);
 		return NULL;
@@ -290,7 +292,7 @@ static struct iommu_domain *tegra_smmu_domain_alloc_paging(struct device *dev)
 
 	as->count = kcalloc(SMMU_NUM_PDE, sizeof(u32), GFP_KERNEL);
 	if (!as->count) {
-		__free_page(as->pd);
+		__iommu_free_page(as->pd);
 		kfree(as);
 		return NULL;
 	}
@@ -298,7 +300,7 @@ static struct iommu_domain *tegra_smmu_domain_alloc_paging(struct device *dev)
 	as->pts = kcalloc(SMMU_NUM_PDE, sizeof(*as->pts), GFP_KERNEL);
 	if (!as->pts) {
 		kfree(as->count);
-		__free_page(as->pd);
+		__iommu_free_page(as->pd);
 		kfree(as);
 		return NULL;
 	}
@@ -599,14 +601,14 @@ static u32 *as_get_pte(struct tegra_smmu_as *as, dma_addr_t iova,
 		dma = dma_map_page(smmu->dev, page, 0, SMMU_SIZE_PT,
 				   DMA_TO_DEVICE);
 		if (dma_mapping_error(smmu->dev, dma)) {
-			__free_page(page);
+			__iommu_free_page(page);
 			return NULL;
 		}
 
 		if (!smmu_dma_addr_valid(smmu, dma)) {
 			dma_unmap_page(smmu->dev, dma, SMMU_SIZE_PT,
 				       DMA_TO_DEVICE);
-			__free_page(page);
+			__iommu_free_page(page);
 			return NULL;
 		}
 
@@ -649,7 +651,7 @@ static void tegra_smmu_pte_put_use(struct tegra_smmu_as *as, unsigned long iova)
 		tegra_smmu_set_pde(as, iova, 0);
 
 		dma_unmap_page(smmu->dev, pte_dma, SMMU_SIZE_PT, DMA_TO_DEVICE);
-		__free_page(page);
+		__iommu_free_page(page);
 		as->pts[pde] = NULL;
 	}
 }
@@ -688,7 +690,7 @@ static struct page *as_get_pde_page(struct tegra_smmu_as *as,
 	if (gfpflags_allow_blocking(gfp))
 		spin_unlock_irqrestore(&as->lock, *flags);
 
-	page = alloc_page(gfp | __GFP_DMA | __GFP_ZERO);
+	page = __iommu_alloc_page(gfp | __GFP_DMA);
 
 	if (gfpflags_allow_blocking(gfp))
 		spin_lock_irqsave(&as->lock, *flags);
@@ -700,7 +702,7 @@ static struct page *as_get_pde_page(struct tegra_smmu_as *as,
 	 */
 	if (as->pts[pde]) {
 		if (page)
-			__free_page(page);
+			__iommu_free_page(page);
 
 		page = as->pts[pde];
 	}
-- 
2.43.0.rc2.451.g8631bc7472-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 09/10] iommu: observability of the IOMMU allocations
  2023-11-30 20:14 [PATCH v2 00/10] IOMMU memory observability Pasha Tatashin
                   ` (7 preceding siblings ...)
  2023-11-30 20:15 ` [PATCH v2 08/10] iommu/tegra-smmu: " Pasha Tatashin
@ 2023-11-30 20:15 ` Pasha Tatashin
  2023-12-14 17:59   ` David Rientjes
  2023-11-30 20:15 ` [PATCH v2 10/10] iommu: account IOMMU allocated memory Pasha Tatashin
  2023-12-24 21:49 ` [PATCH v2 00/10] IOMMU memory observability David Rientjes
  10 siblings, 1 reply; 32+ messages in thread
From: Pasha Tatashin @ 2023-11-30 20:15 UTC (permalink / raw
  To: akpm, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas, cgroups,
	corbet, david, dwmw2, hannes, heiko, iommu, jernej.skrabec,
	jonathanh, joro, krzysztof.kozlowski, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, lizefan.x, marcan, mhiramat,
	m.szyprowski, pasha.tatashin, paulmck, rdunlap, robin.murphy,
	samuel, suravee.suthikulpanit, sven, thierry.reding, tj,
	tomas.mudrunka, vdumpa, wens, will, yu-cheng.yu

Add NR_IOMMU_PAGES to node_stat_item; it counts the number of pages
allocated by the IOMMU subsystem.

The allocations can be viewed per-node via:
/sys/devices/system/node/nodeN/vmstat.

For example:

$ grep iommu /sys/devices/system/node/node*/vmstat
/sys/devices/system/node/node0/vmstat:nr_iommu_pages 106025
/sys/devices/system/node/node1/vmstat:nr_iommu_pages 3464

The value is a page count; therefore, in the above example
the IOMMU allocations amount to ~428M.
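
To spell out the arithmetic (a quick sketch, assuming 4 KiB pages):

$ echo $(( (106025 + 3464) * 4 / 1024 ))MiB
427MiB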

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
 drivers/iommu/iommu-pages.h | 30 ++++++++++++++++++++++++++++++
 include/linux/mmzone.h      |  3 +++
 mm/vmstat.c                 |  3 +++
 3 files changed, 36 insertions(+)

diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
index 2332f807d514..69895a355c0c 100644
--- a/drivers/iommu/iommu-pages.h
+++ b/drivers/iommu/iommu-pages.h
@@ -17,6 +17,30 @@
  * state can be rather large, i.e. multiple gigabytes in size.
  */
 
+/**
+ * __iommu_alloc_account - account for newly allocated page.
+ * @pages: head struct page of the page.
+ * @order: order of the page
+ */
+static inline void __iommu_alloc_account(struct page *pages, int order)
+{
+	const long pgcnt = 1l << order;
+
+	mod_node_page_state(page_pgdat(pages), NR_IOMMU_PAGES, pgcnt);
+}
+
+/**
+ * __iommu_free_account - account a page that is about to be freed.
+ * @pages: head struct page of the page.
+ * @order: order of the page
+ */
+static inline void __iommu_free_account(struct page *pages, int order)
+{
+	const long pgcnt = 1l << order;
+
+	mod_node_page_state(page_pgdat(pages), NR_IOMMU_PAGES, -pgcnt);
+}
+
 /**
  * __iommu_alloc_pages_node - allocate a zeroed page of a given order from
  * specific NUMA node.
@@ -35,6 +59,8 @@ static inline struct page *__iommu_alloc_pages_node(int nid, gfp_t gfp,
 	if (!pages)
 		return NULL;
 
+	__iommu_alloc_account(pages, order);
+
 	return pages;
 }
 
@@ -53,6 +79,8 @@ static inline struct page *__iommu_alloc_pages(gfp_t gfp, int order)
 	if (!pages)
 		return NULL;
 
+	__iommu_alloc_account(pages, order);
+
 	return pages;
 }
 
@@ -89,6 +117,7 @@ static inline void __iommu_free_pages(struct page *pages, int order)
 	if (!pages)
 		return;
 
+	__iommu_free_account(pages, order);
 	__free_pages(pages, order);
 }
 
@@ -192,6 +221,7 @@ static inline void iommu_free_pages_list(struct list_head *pages)
 		struct page *p = list_entry(pages->prev, struct page, lru);
 
 		list_del(&p->lru);
+		__iommu_free_account(p, 0);
 		put_page(p);
 	}
 }
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 3c25226beeed..1a4d0bba3e8b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -200,6 +200,9 @@ enum node_stat_item {
 #endif
 	NR_PAGETABLE,		/* used for pagetables */
 	NR_SECONDARY_PAGETABLE, /* secondary pagetables, e.g. KVM pagetables */
+#ifdef CONFIG_IOMMU_SUPPORT
+	NR_IOMMU_PAGES,		/* # of pages allocated by IOMMU */
+#endif
 #ifdef CONFIG_SWAP
 	NR_SWAPCACHE,
 #endif
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 359460deb377..801b58890b6c 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1242,6 +1242,9 @@ const char * const vmstat_text[] = {
 #endif
 	"nr_page_table_pages",
 	"nr_sec_page_table_pages",
+#ifdef CONFIG_IOMMU_SUPPORT
+	"nr_iommu_pages",
+#endif
 #ifdef CONFIG_SWAP
 	"nr_swapcached",
 #endif
-- 
2.43.0.rc2.451.g8631bc7472-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 10/10] iommu: account IOMMU allocated memory
  2023-11-30 20:14 [PATCH v2 00/10] IOMMU memory observability Pasha Tatashin
                   ` (8 preceding siblings ...)
  2023-11-30 20:15 ` [PATCH v2 09/10] iommu: observability of the IOMMU allocations Pasha Tatashin
@ 2023-11-30 20:15 ` Pasha Tatashin
  2023-12-14 18:02   ` David Rientjes
  2023-12-24 21:49 ` [PATCH v2 00/10] IOMMU memory observability David Rientjes
  10 siblings, 1 reply; 32+ messages in thread
From: Pasha Tatashin @ 2023-11-30 20:15 UTC (permalink / raw
  To: akpm, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas, cgroups,
	corbet, david, dwmw2, hannes, heiko, iommu, jernej.skrabec,
	jonathanh, joro, krzysztof.kozlowski, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, lizefan.x, marcan, mhiramat,
	m.szyprowski, pasha.tatashin, paulmck, rdunlap, robin.murphy,
	samuel, suravee.suthikulpanit, sven, thierry.reding, tj,
	tomas.mudrunka, vdumpa, wens, will, yu-cheng.yu

In order to be able to limit the amount of memory that is allocated
by the IOMMU subsystem, the memory must be accounted.

Account IOMMU memory as part of the secondary page tables, as was
discussed at LPC.

The value of SecPageTables now contains memory allocations by both
IOMMU and KVM.
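
For example, an admin can now observe and cap this memory via cgroup-v2
(a sketch; the "vm-test" group is hypothetical, and cgroup v2 is assumed
to be mounted at /sys/fs/cgroup):

$ mkdir /sys/fs/cgroup/vm-test
$ echo 4G > /sys/fs/cgroup/vm-test/memory.max
$ grep sec_pagetables /sys/fs/cgroup/vm-test/memory.stat
sec_pagetables 0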

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 2 +-
 Documentation/filesystems/proc.rst      | 4 ++--
 drivers/iommu/iommu-pages.h             | 2 ++
 include/linux/mmzone.h                  | 2 +-
 4 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 3f85254f3cef..e004e05a7cde 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1418,7 +1418,7 @@ PAGE_SIZE multiple when read back.
 	  sec_pagetables
 		Amount of memory allocated for secondary page tables,
 		this currently includes KVM mmu allocations on x86
-		and arm64.
+		and arm64 and IOMMU page tables.
 
 	  percpu (npn)
 		Amount of memory used for storing per-cpu kernel
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 49ef12df631b..86f137a9b66b 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -1110,8 +1110,8 @@ KernelStack
 PageTables
               Memory consumed by userspace page tables
 SecPageTables
-              Memory consumed by secondary page tables, this currently
-              currently includes KVM mmu allocations on x86 and arm64.
+              Memory consumed by secondary page tables, this currently includes
+              KVM mmu and IOMMU allocations on x86 and arm64.
 NFS_Unstable
               Always zero. Previous counted pages which had been written to
               the server, but has not been committed to stable storage.
diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
index 69895a355c0c..cdd257585284 100644
--- a/drivers/iommu/iommu-pages.h
+++ b/drivers/iommu/iommu-pages.h
@@ -27,6 +27,7 @@ static inline void __iommu_alloc_account(struct page *pages, int order)
 	const long pgcnt = 1l << order;
 
 	mod_node_page_state(page_pgdat(pages), NR_IOMMU_PAGES, pgcnt);
+	mod_lruvec_page_state(pages, NR_SECONDARY_PAGETABLE, pgcnt);
 }
 
 /**
@@ -39,6 +40,7 @@ static inline void __iommu_free_account(struct page *pages, int order)
 	const long pgcnt = 1l << order;
 
 	mod_node_page_state(page_pgdat(pages), NR_IOMMU_PAGES, -pgcnt);
+	mod_lruvec_page_state(pages, NR_SECONDARY_PAGETABLE, -pgcnt);
 }
 
 /**
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 1a4d0bba3e8b..aaabb385663c 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -199,7 +199,7 @@ enum node_stat_item {
 	NR_KERNEL_SCS_KB,	/* measured in KiB */
 #endif
 	NR_PAGETABLE,		/* used for pagetables */
-	NR_SECONDARY_PAGETABLE, /* secondary pagetables, e.g. KVM pagetables */
+	NR_SECONDARY_PAGETABLE, /* secondary pagetables, KVM & IOMMU */
 #ifdef CONFIG_IOMMU_SUPPORT
 	NR_IOMMU_PAGES,		/* # of pages allocated by IOMMU */
 #endif
-- 
2.43.0.rc2.451.g8631bc7472-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 01/10] iommu/vt-d: add wrapper functions for page allocations
  2023-11-30 20:14 ` [PATCH v2 01/10] iommu/vt-d: add wrapper functions for page allocations Pasha Tatashin
@ 2023-12-14 17:58   ` David Rientjes
  2023-12-14 19:16     ` Pasha Tatashin
  0 siblings, 1 reply; 32+ messages in thread
From: David Rientjes @ 2023-12-14 17:58 UTC (permalink / raw
  To: Pasha Tatashin
  Cc: Andrew Morton, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas,
	cgroups, corbet, david, dwmw2, hannes, heiko, iommu,
	jernej.skrabec, jonathanh, joro, krzysztof.kozlowski, linux-doc,
	linux-fsdevel, linux-kernel, linux-mm, linux-rockchip,
	linux-samsung-soc, linux-sunxi, linux-tegra, lizefan.x, marcan,
	mhiramat, m.szyprowski, paulmck, rdunlap, robin.murphy, samuel,
	suravee.suthikulpanit, sven, thierry.reding, tj, tomas.mudrunka,
	vdumpa, wens, will, yu-cheng.yu

On Thu, 30 Nov 2023, Pasha Tatashin wrote:

> diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
> new file mode 100644
> index 000000000000..2332f807d514
> --- /dev/null
> +++ b/drivers/iommu/iommu-pages.h
> @@ -0,0 +1,199 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2023, Google LLC.
> + * Pasha Tatashin <pasha.tatashin@soleen.com>
> + */
> +
> +#ifndef __IOMMU_PAGES_H
> +#define __IOMMU_PAGES_H
> +
> +#include <linux/vmstat.h>
> +#include <linux/gfp.h>
> +#include <linux/mm.h>
> +
> +/*
> + * All page allocation that are performed in the IOMMU subsystem must use one of
> + * the functions below.  This is necessary for the proper accounting as IOMMU
> + * state can be rather large, i.e. multiple gigabytes in size.
> + */
> +
> +/**
> + * __iommu_alloc_pages_node - allocate a zeroed page of a given order from
> + * specific NUMA node.
> + * @nid: memory NUMA node id

NUMA_NO_NODE if no locality requirements?

> + * @gfp: buddy allocator flags
> + * @order: page order
> + *
> + * returns the head struct page of the allocated page.
> + */
> +static inline struct page *__iommu_alloc_pages_node(int nid, gfp_t gfp,
> +						    int order)
> +{
> +	struct page *pages;

s/pages/page/ here and later in this file.

> +
> +	pages = alloc_pages_node(nid, gfp | __GFP_ZERO, order);
> +	if (!pages)

unlikely()?

> +		return NULL;
> +
> +	return pages;
> +}
> +
> +/**
> + * __iommu_alloc_pages - allocate a zeroed page of a given order.
> + * @gfp: buddy allocator flags
> + * @order: page order
> + *
> + * returns the head struct page of the allocated page.
> + */
> +static inline struct page *__iommu_alloc_pages(gfp_t gfp, int order)
> +{
> +	struct page *pages;
> +
> +	pages = alloc_pages(gfp | __GFP_ZERO, order);
> +	if (!pages)
> +		return NULL;
> +
> +	return pages;
> +}
> +
> +/**
> + * __iommu_alloc_page_node - allocate a zeroed page at specific NUMA node.
> + * @nid: memory NUMA node id
> + * @gfp: buddy allocator flags
> + *
> + * returns the struct page of the allocated page.
> + */
> +static inline struct page *__iommu_alloc_page_node(int nid, gfp_t gfp)
> +{
> +	return __iommu_alloc_pages_node(nid, gfp, 0);
> +}
> +
> +/**
> + * __iommu_alloc_page - allocate a zeroed page
> + * @gfp: buddy allocator flags
> + *
> + * returns the struct page of the allocated page.
> + */
> +static inline struct page *__iommu_alloc_page(gfp_t gfp)
> +{
> +	return __iommu_alloc_pages(gfp, 0);
> +}
> +
> +/**
> + * __iommu_free_pages - free page of a given order
> + * @pages: head struct page of the page

I think "pages" implies more than one page, this is just a (potentially 
compound) page?

> + * @order: page order
> + */
> +static inline void __iommu_free_pages(struct page *pages, int order)
> +{
> +	if (!pages)
> +		return;
> +
> +	__free_pages(pages, order);
> +}
> +
> +/**
> + * __iommu_free_page - free page
> + * @page: struct page of the page
> + */
> +static inline void __iommu_free_page(struct page *page)
> +{
> +	__iommu_free_pages(page, 0);
> +}
> +
> +/**
> + * iommu_alloc_pages_node - allocate a zeroed page of a given order from
> + * specific NUMA node.
> + * @nid: memory NUMA node id
> + * @gfp: buddy allocator flags
> + * @order: page order
> + *
> + * returns the virtual address of the allocated page
> + */
> +static inline void *iommu_alloc_pages_node(int nid, gfp_t gfp, int order)
> +{
> +	struct page *pages = __iommu_alloc_pages_node(nid, gfp, order);
> +
> +	if (!pages)
> +		return NULL;
> +
> +	return page_address(pages);
> +}
> +
> +/**
> + * iommu_alloc_pages - allocate a zeroed page of a given order
> + * @gfp: buddy allocator flags
> + * @order: page order
> + *
> + * returns the virtual address of the allocated page
> + */
> +static inline void *iommu_alloc_pages(gfp_t gfp, int order)
> +{
> +	struct page *pages = __iommu_alloc_pages(gfp, order);
> +
> +	if (!pages)
> +		return NULL;
> +
> +	return page_address(pages);
> +}
> +
> +/**
> + * iommu_alloc_page_node - allocate a zeroed page at specific NUMA node.
> + * @nid: memory NUMA node id
> + * @gfp: buddy allocator flags
> + *
> + * returns the virtual address of the allocated page
> + */
> +static inline void *iommu_alloc_page_node(int nid, gfp_t gfp)
> +{
> +	return iommu_alloc_pages_node(nid, gfp, 0);
> +}
> +
> +/**
> + * iommu_alloc_page - allocate a zeroed page
> + * @gfp: buddy allocator flags
> + *
> + * returns the virtual address of the allocated page
> + */
> +static inline void *iommu_alloc_page(gfp_t gfp)
> +{
> +	return iommu_alloc_pages(gfp, 0);
> +}
> +
> +/**
> + * iommu_free_pages - free page of a given order
> + * @virt: virtual address of the page to be freed.
> + * @order: page order
> + */
> +static inline void iommu_free_pages(void *virt, int order)
> +{
> +	if (!virt)
> +		return;
> +
> +	__iommu_free_pages(virt_to_page(virt), order);
> +}
> +
> +/**
> + * iommu_free_page - free page
> + * @virt: virtual address of the page to be freed.
> + */
> +static inline void iommu_free_page(void *virt)
> +{
> +	iommu_free_pages(virt, 0);
> +}
> +
> +/**
> + * iommu_free_pages_list - free a list of pages.
> + * @pages: the head of the lru list to be freed.

Document the locking requirements for this?

> + */
> +static inline void iommu_free_pages_list(struct list_head *pages)
> +{
> +	while (!list_empty(pages)) {
> +		struct page *p = list_entry(pages->prev, struct page, lru);
> +
> +		list_del(&p->lru);
> +		put_page(p);
> +	}
> +}
> +
> +#endif	/* __IOMMU_PAGES_H */
> -- 
> 2.43.0.rc2.451.g8631bc7472-goog
> 
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 09/10] iommu: observability of the IOMMU allocations
  2023-11-30 20:15 ` [PATCH v2 09/10] iommu: observability of the IOMMU allocations Pasha Tatashin
@ 2023-12-14 17:59   ` David Rientjes
  2023-12-14 19:18     ` Pasha Tatashin
  0 siblings, 1 reply; 32+ messages in thread
From: David Rientjes @ 2023-12-14 17:59 UTC (permalink / raw
  To: Pasha Tatashin
  Cc: akpm, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas, cgroups,
	corbet, david, dwmw2, hannes, heiko, iommu, jernej.skrabec,
	jonathanh, joro, krzysztof.kozlowski, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, lizefan.x, marcan, mhiramat,
	m.szyprowski, paulmck, rdunlap, robin.murphy, samuel,
	suravee.suthikulpanit, sven, thierry.reding, tj, tomas.mudrunka,
	vdumpa, wens, will, yu-cheng.yu

On Thu, 30 Nov 2023, Pasha Tatashin wrote:

> Add NR_IOMMU_PAGES to node_stat_item; it counts the number of pages
> allocated by the IOMMU subsystem.
> 
> The allocations can be viewed per-node via:
> /sys/devices/system/node/nodeN/vmstat.
> 
> For example:
> 
> $ grep iommu /sys/devices/system/node/node*/vmstat
> /sys/devices/system/node/node0/vmstat:nr_iommu_pages 106025
> /sys/devices/system/node/node1/vmstat:nr_iommu_pages 3464
> 
> The value is a page count; therefore, in the above example
> the IOMMU allocations amount to ~428M.
> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>

Acked-by: David Rientjes <rientjes@google.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 10/10] iommu: account IOMMU allocated memory
  2023-11-30 20:15 ` [PATCH v2 10/10] iommu: account IOMMU allocated memory Pasha Tatashin
@ 2023-12-14 18:02   ` David Rientjes
  2023-12-15 21:11     ` Pasha Tatashin
  0 siblings, 1 reply; 32+ messages in thread
From: David Rientjes @ 2023-12-14 18:02 UTC (permalink / raw
  To: Pasha Tatashin
  Cc: akpm, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas, cgroups,
	corbet, david, dwmw2, hannes, heiko, iommu, jernej.skrabec,
	jonathanh, joro, krzysztof.kozlowski, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, lizefan.x, marcan, mhiramat,
	m.szyprowski, paulmck, rdunlap, robin.murphy, samuel,
	suravee.suthikulpanit, sven, thierry.reding, tj, tomas.mudrunka,
	vdumpa, wens, will, yu-cheng.yu

On Thu, 30 Nov 2023, Pasha Tatashin wrote:

> In order to be able to limit the amount of memory that is allocated
> by the IOMMU subsystem, the memory must be accounted.
> 
> Account IOMMU memory as part of the secondary page tables, as was
> discussed at LPC.
> 
> The value of SecPageTables now contains memory allocations by both
> IOMMU and KVM.
> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
>  Documentation/admin-guide/cgroup-v2.rst | 2 +-
>  Documentation/filesystems/proc.rst      | 4 ++--
>  drivers/iommu/iommu-pages.h             | 2 ++
>  include/linux/mmzone.h                  | 2 +-
>  4 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 3f85254f3cef..e004e05a7cde 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1418,7 +1418,7 @@ PAGE_SIZE multiple when read back.
>  	  sec_pagetables
>  		Amount of memory allocated for secondary page tables,
>  		this currently includes KVM mmu allocations on x86
> -		and arm64.
> +		and arm64 and IOMMU page tables.

Hmm, if existing users are parsing this field and alerting when it exceeds 
an expected value (a cloud provider, let's say), is it safe to add in a 
whole new set of page tables?

I understand the documentation allows for it, but I think potential impact 
on userspace would be more interesting.
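
For instance, a hypothetical alerting check along these lines would now
start firing on IOMMU-heavy hosts even though KVM usage is unchanged
(EXPECTED_KB is a made-up threshold):

    val_kb=$(awk '/^SecPageTables:/ {print $2}' /proc/meminfo)
    [ "$val_kb" -gt "$EXPECTED_KB" ] && echo "SecPageTables over threshold"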

>  
>  	  percpu (npn)
>  		Amount of memory used for storing per-cpu kernel
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index 49ef12df631b..86f137a9b66b 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -1110,8 +1110,8 @@ KernelStack
>  PageTables
>                Memory consumed by userspace page tables
>  SecPageTables
> -              Memory consumed by secondary page tables, this currently
> -              currently includes KVM mmu allocations on x86 and arm64.
> +              Memory consumed by secondary page tables, this currently includes
> +              KVM mmu and IOMMU allocations on x86 and arm64.
>  NFS_Unstable
>                Always zero. Previous counted pages which had been written to
>                the server, but has not been committed to stable storage.
> diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
> index 69895a355c0c..cdd257585284 100644
> --- a/drivers/iommu/iommu-pages.h
> +++ b/drivers/iommu/iommu-pages.h
> @@ -27,6 +27,7 @@ static inline void __iommu_alloc_account(struct page *pages, int order)
>  	const long pgcnt = 1l << order;
>  
>  	mod_node_page_state(page_pgdat(pages), NR_IOMMU_PAGES, pgcnt);
> +	mod_lruvec_page_state(pages, NR_SECONDARY_PAGETABLE, pgcnt);
>  }
>  
>  /**
> @@ -39,6 +40,7 @@ static inline void __iommu_free_account(struct page *pages, int order)
>  	const long pgcnt = 1l << order;
>  
>  	mod_node_page_state(page_pgdat(pages), NR_IOMMU_PAGES, -pgcnt);
> +	mod_lruvec_page_state(pages, NR_SECONDARY_PAGETABLE, -pgcnt);
>  }
>  
>  /**
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 1a4d0bba3e8b..aaabb385663c 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -199,7 +199,7 @@ enum node_stat_item {
>  	NR_KERNEL_SCS_KB,	/* measured in KiB */
>  #endif
>  	NR_PAGETABLE,		/* used for pagetables */
> -	NR_SECONDARY_PAGETABLE, /* secondary pagetables, e.g. KVM pagetables */
> +	NR_SECONDARY_PAGETABLE, /* secondary pagetables, KVM & IOMMU */
>  #ifdef CONFIG_IOMMU_SUPPORT
>  	NR_IOMMU_PAGES,		/* # of pages allocated by IOMMU */
>  #endif
> -- 
> 2.43.0.rc2.451.g8631bc7472-goog
> 
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 01/10] iommu/vt-d: add wrapper functions for page allocations
  2023-12-14 17:58   ` David Rientjes
@ 2023-12-14 19:16     ` Pasha Tatashin
  2023-12-24 21:30       ` David Rientjes
  2023-12-26  6:09       ` Liu, Jingqi
  0 siblings, 2 replies; 32+ messages in thread
From: Pasha Tatashin @ 2023-12-14 19:16 UTC (permalink / raw
  To: David Rientjes
  Cc: Andrew Morton, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas,
	cgroups, corbet, david, dwmw2, hannes, heiko, iommu,
	jernej.skrabec, jonathanh, joro, krzysztof.kozlowski, linux-doc,
	linux-fsdevel, linux-kernel, linux-mm, linux-rockchip,
	linux-samsung-soc, linux-sunxi, linux-tegra, lizefan.x, marcan,
	mhiramat, m.szyprowski, paulmck, rdunlap, robin.murphy, samuel,
	suravee.suthikulpanit, sven, thierry.reding, tj, tomas.mudrunka,
	vdumpa, wens, will, yu-cheng.yu

On Thu, Dec 14, 2023 at 12:58 PM David Rientjes <rientjes@google.com> wrote:
>
> On Thu, 30 Nov 2023, Pasha Tatashin wrote:
>
> > diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
> > new file mode 100644
> > index 000000000000..2332f807d514
> > --- /dev/null
> > +++ b/drivers/iommu/iommu-pages.h
> > @@ -0,0 +1,199 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (c) 2023, Google LLC.
> > + * Pasha Tatashin <pasha.tatashin@soleen.com>
> > + */
> > +
> > +#ifndef __IOMMU_PAGES_H
> > +#define __IOMMU_PAGES_H
> > +
> > +#include <linux/vmstat.h>
> > +#include <linux/gfp.h>
> > +#include <linux/mm.h>
> > +
> > +/*
> > + * All page allocation that are performed in the IOMMU subsystem must use one of
> > + * the functions below.  This is necessary for the proper accounting as IOMMU
> > + * state can be rather large, i.e. multiple gigabytes in size.
> > + */
> > +
> > +/**
> > + * __iommu_alloc_pages_node - allocate a zeroed page of a given order from
> > + * specific NUMA node.
> > + * @nid: memory NUMA node id
>
> NUMA_NO_NODE if no locality requirements?

If no locality is required, there is a better interface:
__iommu_alloc_pages(). That one will also take a look at the calling
process's policies to determine the proper NUMA node when nothing is
specified. However, when policies should be ignored and no locality is
required, NUMA_NO_NODE can be passed.
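
For example (a minimal sketch against the helpers in this series; "dev"
stands in for whatever device the caller has at hand):

	/* Locality matters: allocate on the IOMMU device's node. */
	void *pgd = iommu_alloc_pages_node(dev_to_node(dev), GFP_KERNEL, 0);

	/* No locality requirement: task/allocator policy decides. */
	void *pte = iommu_alloc_pages(GFP_KERNEL, 0);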

>
> > + * @gfp: buddy allocator flags
> > + * @order: page order
> > + *
> > + * returns the head struct page of the allocated page.
> > + */
> > +static inline struct page *__iommu_alloc_pages_node(int nid, gfp_t gfp,
> > +                                                 int order)
> > +{
> > +     struct page *pages;
>
> s/pages/page/ here and later in this file.

In this file, where there is a page with an "order", I reference it with
"pages"; when there is no order (i.e. order = 0), I reference it with "page".

I.e.: __iommu_alloc_page vs. __iommu_alloc_pages

>
> > +
> > +     pages = alloc_pages_node(nid, gfp | __GFP_ZERO, order);
> > +     if (!pages)
>
> unlikely()?

Will add it.

>
> > +             return NULL;
> > +
> > +     return pages;
> > +}
> > +
> > +/**
> > + * __iommu_alloc_pages - allocate a zeroed page of a given order.
> > + * @gfp: buddy allocator flags
> > + * @order: page order
> > + *
> > + * returns the head struct page of the allocated page.
> > + */
> > +static inline struct page *__iommu_alloc_pages(gfp_t gfp, int order)
> > +{
> > +     struct page *pages;
> > +
> > +     pages = alloc_pages(gfp | __GFP_ZERO, order);
> > +     if (!pages)
> > +             return NULL;
> > +
> > +     return pages;
> > +}
> > +
> > +/**
> > + * __iommu_alloc_page_node - allocate a zeroed page at specific NUMA node.
> > + * @nid: memory NUMA node id
> > + * @gfp: buddy allocator flags
> > + *
> > + * returns the struct page of the allocated page.
> > + */
> > +static inline struct page *__iommu_alloc_page_node(int nid, gfp_t gfp)
> > +{
> > +     return __iommu_alloc_pages_node(nid, gfp, 0);
> > +}
> > +
> > +/**
> > + * __iommu_alloc_page - allocate a zeroed page
> > + * @gfp: buddy allocator flags
> > + *
> > + * returns the struct page of the allocated page.
> > + */
> > +static inline struct page *__iommu_alloc_page(gfp_t gfp)
> > +{
> > +     return __iommu_alloc_pages(gfp, 0);
> > +}
> > +
> > +/**
> > + * __iommu_free_pages - free page of a given order
> > + * @pages: head struct page of the page
>
> I think "pages" implies more than one page, this is just a (potentially
> compound) page?

Yes, more than one page, basically, when order may be > 0.

> > +/**
> > + * iommu_free_page - free page
> > + * @virt: virtual address of the page to be freed.
> > + */
> > +static inline void iommu_free_page(void *virt)
> > +{
> > +     iommu_free_pages(virt, 0);
> > +}
> > +
> > +/**
> > + * iommu_free_pages_list - free a list of pages.
> > + * @pages: the head of the lru list to be freed.
>
> Document the locking requirements for this?

Thank you for the review. I will add info about the locking requirements;
in fact, they are very relaxed.

These pages are added to the list by unmap or remap operations in the
Intel IOMMU implementation. These calls assume that whoever is doing
those operations has exclusive access to the VA range in the page
table of that operation. The pages in this freelist belong only to the
former page tables from the IOVA range of those operations.
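
Roughly, the usage pattern looks like this (a sketch of the Intel unmap
path, not verbatim driver code):

	LIST_HEAD(freelist);

	/*
	 * The caller owns this IOVA range exclusively, so the retired
	 * page-table pages can be collected without extra locking.
	 */
	domain_unmap(domain, start_pfn, last_pfn, &freelist);

	/* ... flush the IOTLB so hardware drops its references ... */

	iommu_free_pages_list(&freelist);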

> > + */
> > +static inline void iommu_free_pages_list(struct list_head *pages)
> > +{
> > +     while (!list_empty(pages)) {
> > +             struct page *p = list_entry(pages->prev, struct page, lru);
> > +
> > +             list_del(&p->lru);
> > +             put_page(p);
> > +     }
> > +}

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 09/10] iommu: observability of the IOMMU allocations
  2023-12-14 17:59   ` David Rientjes
@ 2023-12-14 19:18     ` Pasha Tatashin
  0 siblings, 0 replies; 32+ messages in thread
From: Pasha Tatashin @ 2023-12-14 19:18 UTC (permalink / raw
  To: David Rientjes
  Cc: akpm, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas, cgroups,
	corbet, david, dwmw2, hannes, heiko, iommu, jernej.skrabec,
	jonathanh, joro, krzysztof.kozlowski, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, lizefan.x, marcan, mhiramat,
	m.szyprowski, paulmck, rdunlap, robin.murphy, samuel,
	suravee.suthikulpanit, sven, thierry.reding, tj, tomas.mudrunka,
	vdumpa, wens, will, yu-cheng.yu

On Thu, Dec 14, 2023 at 12:59 PM David Rientjes <rientjes@google.com> wrote:
>
> On Thu, 30 Nov 2023, Pasha Tatashin wrote:
>
> > Add NR_IOMMU_PAGES to node_stat_item; it counts the number of pages
> > allocated by the IOMMU subsystem.
> >
> > The allocations can be viewed per-node via:
> > /sys/devices/system/node/nodeN/vmstat.
> >
> > For example:
> >
> > $ grep iommu /sys/devices/system/node/node*/vmstat
> > /sys/devices/system/node/node0/vmstat:nr_iommu_pages 106025
> > /sys/devices/system/node/node1/vmstat:nr_iommu_pages 3464
> >
> > The value is a page count; therefore, in the above example
> > the IOMMU allocations amount to ~428M.
> >
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
>
> Acked-by: David Rientjes <rientjes@google.com>

Thank you,
Pasha

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 10/10] iommu: account IOMMU allocated memory
  2023-12-14 18:02   ` David Rientjes
@ 2023-12-15 21:11     ` Pasha Tatashin
  2023-12-24 21:44       ` David Rientjes
  0 siblings, 1 reply; 32+ messages in thread
From: Pasha Tatashin @ 2023-12-15 21:11 UTC (permalink / raw
  To: David Rientjes
  Cc: akpm, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas, cgroups,
	corbet, david, dwmw2, hannes, heiko, iommu, jernej.skrabec,
	jonathanh, joro, krzysztof.kozlowski, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, lizefan.x, marcan, mhiramat,
	m.szyprowski, paulmck, rdunlap, robin.murphy, samuel,
	suravee.suthikulpanit, sven, thierry.reding, tj, tomas.mudrunka,
	vdumpa, wens, will, yu-cheng.yu

> > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > index 3f85254f3cef..e004e05a7cde 100644
> > --- a/Documentation/admin-guide/cgroup-v2.rst
> > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > @@ -1418,7 +1418,7 @@ PAGE_SIZE multiple when read back.
> >         sec_pagetables
> >               Amount of memory allocated for secondary page tables,
> >               this currently includes KVM mmu allocations on x86
> > -             and arm64.
> > +             and arm64 and IOMMU page tables.
>
> Hmm, if existing users are parsing this field and alerting when it exceeds
> an expected value (a cloud provider, let's say), is it safe to add in a
> whole new set of page tables?
>
> I understand the documentation allows for it, but I think potential impact
> on userspace would be more interesting.

Hi David,

This is something that was discussed at LPC'23. I was also proposing a
separate counter for iommu page tables, but it was noted that
sec_pagetables is named this way specifically to include all non-regular
CPU page tables, and we should therefore account for them together.

Please also see this discussion from the previous version of this patch series:
https://lore.kernel.org/all/CAJD7tkb1FqTqwONrp2nphBDkEamQtPCOFm0208H3tp0Gq2OLMQ@mail.gmail.com/

Pasha

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 01/10] iommu/vt-d: add wrapper functions for page allocations
  2023-12-14 19:16     ` Pasha Tatashin
@ 2023-12-24 21:30       ` David Rientjes
  2023-12-24 21:48         ` Matthew Wilcox
  2023-12-26  6:09       ` Liu, Jingqi
  1 sibling, 1 reply; 32+ messages in thread
From: David Rientjes @ 2023-12-24 21:30 UTC (permalink / raw
  To: Pasha Tatashin
  Cc: Andrew Morton, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas,
	cgroups, corbet, david, dwmw2, hannes, heiko, iommu,
	jernej.skrabec, jonathanh, joro, krzysztof.kozlowski, linux-doc,
	linux-fsdevel, linux-kernel, linux-mm, linux-rockchip,
	linux-samsung-soc, linux-sunxi, linux-tegra, lizefan.x, marcan,
	mhiramat, m.szyprowski, paulmck, rdunlap, robin.murphy, samuel,
	suravee.suthikulpanit, sven, thierry.reding, tj, tomas.mudrunka,
	vdumpa, wens, will, yu-cheng.yu

On Thu, 14 Dec 2023, Pasha Tatashin wrote:

> On Thu, Dec 14, 2023 at 12:58 PM David Rientjes <rientjes@google.com> wrote:
> >
> > On Thu, 30 Nov 2023, Pasha Tatashin wrote:
> >
> > > diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
> > > new file mode 100644
> > > index 000000000000..2332f807d514
> > > --- /dev/null
> > > +++ b/drivers/iommu/iommu-pages.h
> > > @@ -0,0 +1,199 @@
> > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > +/*
> > > + * Copyright (c) 2023, Google LLC.
> > > + * Pasha Tatashin <pasha.tatashin@soleen.com>
> > > + */
> > > +
> > > +#ifndef __IOMMU_PAGES_H
> > > +#define __IOMMU_PAGES_H
> > > +
> > > +#include <linux/vmstat.h>
> > > +#include <linux/gfp.h>
> > > +#include <linux/mm.h>
> > > +
> > > +/*
> > > + * All page allocation that are performed in the IOMMU subsystem must use one of
> > > + * the functions below.  This is necessary for the proper accounting as IOMMU
> > > + * state can be rather large, i.e. multiple gigabytes in size.
> > > + */
> > > +
> > > +/**
> > > + * __iommu_alloc_pages_node - allocate a zeroed page of a given order from
> > > + * specific NUMA node.
> > > + * @nid: memory NUMA node id
> >
> > NUMA_NO_NODE if no locality requirements?
> 
> If no locality is required, there is a better interface:
> __iommu_alloc_pages(). That one will also take a look at the calling
> process's policies to determine the proper NUMA node when nothing is
> specified. However, when policies should be ignored and no locality is
> required, NUMA_NO_NODE can be passed.
> 

Gotcha, thanks!

> >
> > > + * @gfp: buddy allocator flags
> > > + * @order: page order
> > > + *
> > > + * returns the head struct page of the allocated page.
> > > + */
> > > +static inline struct page *__iommu_alloc_pages_node(int nid, gfp_t gfp,
> > > +                                                 int order)
> > > +{
> > > +     struct page *pages;
> >
> > s/pages/page/ here and later in this file.
> 
> In this file, where there is a page with an "order", I reference it with
> "pages"; when there is no order (i.e. order = 0), I reference it with "page".
> 
> I.e.: __iommu_alloc_page vs. __iommu_alloc_pages
> 

Eh, the struct page points to a (potentially compound) page, not a set or 
list of pages.  I won't bikeshed on it, but "struct page *pages" never 
makes sense unless it's **pages or *pages[] :)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 02/10] iommu/amd: use page allocation function provided by iommu-pages.h
  2023-11-30 20:14 ` [PATCH v2 02/10] iommu/amd: use page allocation function provided by iommu-pages.h Pasha Tatashin
@ 2023-12-24 21:34   ` David Rientjes
  0 siblings, 0 replies; 32+ messages in thread
From: David Rientjes @ 2023-12-24 21:34 UTC (permalink / raw
  To: Pasha Tatashin
  Cc: akpm, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas, cgroups,
	corbet, david, dwmw2, hannes, heiko, iommu, jernej.skrabec,
	jonathanh, joro, krzysztof.kozlowski, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, lizefan.x, marcan, mhiramat,
	m.szyprowski, paulmck, rdunlap, robin.murphy, samuel,
	suravee.suthikulpanit, sven, thierry.reding, tj, tomas.mudrunka,
	vdumpa, wens, will, yu-cheng.yu

On Thu, 30 Nov 2023, Pasha Tatashin wrote:

> Convert iommu/amd/* files to use the new page allocation functions
> provided in iommu-pages.h.
> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>

Acked-by: David Rientjes <rientjes@google.com>

Although any feedback from Suravee and Joerg would be very useful on the 
overall approach and whether this looks good to them.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 03/10] iommu/io-pgtable-arm: use page allocation function provided by iommu-pages.h
  2023-11-30 20:14 ` [PATCH v2 03/10] iommu/io-pgtable-arm: " Pasha Tatashin
@ 2023-12-24 21:34   ` David Rientjes
  0 siblings, 0 replies; 32+ messages in thread
From: David Rientjes @ 2023-12-24 21:34 UTC (permalink / raw
  To: Pasha Tatashin
  Cc: akpm, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas, cgroups,
	corbet, david, dwmw2, hannes, heiko, iommu, jernej.skrabec,
	jonathanh, joro, krzysztof.kozlowski, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, lizefan.x, marcan, mhiramat,
	m.szyprowski, paulmck, rdunlap, robin.murphy, samuel,
	suravee.suthikulpanit, sven, thierry.reding, tj, tomas.mudrunka,
	vdumpa, wens, will, yu-cheng.yu

On Thu, 30 Nov 2023, Pasha Tatashin wrote:

> Convert iommu/io-pgtable-arm.c to use the new page allocation functions
> provided in iommu-pages.h.
> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>

Acked-by: David Rientjes <rientjes@google.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 04/10] iommu/io-pgtable-dart: use page allocation function provided by iommu-pages.h
  2023-11-30 20:14 ` [PATCH v2 04/10] iommu/io-pgtable-dart: " Pasha Tatashin
@ 2023-12-24 21:36   ` David Rientjes
  2023-12-26 18:08     ` Pasha Tatashin
  0 siblings, 1 reply; 32+ messages in thread
From: David Rientjes @ 2023-12-24 21:36 UTC (permalink / raw
  To: Pasha Tatashin
  Cc: akpm, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas, cgroups,
	corbet, david, dwmw2, hannes, heiko, iommu, jernej.skrabec,
	jonathanh, joro, krzysztof.kozlowski, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, lizefan.x, marcan, mhiramat,
	m.szyprowski, paulmck, rdunlap, robin.murphy, samuel,
	suravee.suthikulpanit, sven, thierry.reding, tj, tomas.mudrunka,
	vdumpa, wens, will, yu-cheng.yu

On Thu, 30 Nov 2023, Pasha Tatashin wrote:

> Convert iommu/io-pgtable-dart.c to use the new page allocation functions
> provided in iommu-pages.h.
> 

... and remove the unnecessary struct io_pgtable_cfg formal parameter to
__dart_alloc_pages() while there :)

> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> Reviewed-by: Janne Grunau <j@jannau.net>

Acked-by: David Rientjes <rientjes@google.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 05/10] iommu/exynos: use page allocation function provided by iommu-pages.h
  2023-11-30 20:14 ` [PATCH v2 05/10] iommu/exynos: " Pasha Tatashin
@ 2023-12-24 21:37   ` David Rientjes
  0 siblings, 0 replies; 32+ messages in thread
From: David Rientjes @ 2023-12-24 21:37 UTC (permalink / raw
  To: Pasha Tatashin
  Cc: akpm, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas, cgroups,
	corbet, david, dwmw2, hannes, heiko, iommu, jernej.skrabec,
	jonathanh, joro, krzysztof.kozlowski, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, lizefan.x, marcan, mhiramat,
	m.szyprowski, paulmck, rdunlap, robin.murphy, samuel,
	suravee.suthikulpanit, sven, thierry.reding, tj, tomas.mudrunka,
	vdumpa, wens, will, yu-cheng.yu

On Thu, 30 Nov 2023, Pasha Tatashin wrote:

> Convert iommu/exynos-iommu.c to use the new page allocation functions
> provided in iommu-pages.h.
> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>

Acked-by: David Rientjes <rientjes@google.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 06/10] iommu/rockchip: use page allocation function provided by iommu-pages.h
  2023-11-30 20:15 ` [PATCH v2 06/10] iommu/rockchip: " Pasha Tatashin
@ 2023-12-24 21:39   ` David Rientjes
  0 siblings, 0 replies; 32+ messages in thread
From: David Rientjes @ 2023-12-24 21:39 UTC (permalink / raw
  To: Pasha Tatashin
  Cc: akpm, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas, cgroups,
	corbet, david, dwmw2, hannes, heiko, iommu, jernej.skrabec,
	jonathanh, joro, krzysztof.kozlowski, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, lizefan.x, marcan, mhiramat,
	m.szyprowski, paulmck, rdunlap, robin.murphy, samuel,
	suravee.suthikulpanit, sven, thierry.reding, tj, tomas.mudrunka,
	vdumpa, wens, will, yu-cheng.yu

On Thu, 30 Nov 2023, Pasha Tatashin wrote:

> Convert iommu/rockchip-iommu.c to use the new page allocation functions
> provided in iommu-pages.h.
> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>

Acked-by: David Rientjes <rientjes@google.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 07/10] iommu/sun50i: use page allocation function provided by iommu-pages.h
  2023-11-30 20:15 ` [PATCH v2 07/10] iommu/sun50i: " Pasha Tatashin
@ 2023-12-24 21:39   ` David Rientjes
  0 siblings, 0 replies; 32+ messages in thread
From: David Rientjes @ 2023-12-24 21:39 UTC (permalink / raw
  To: Pasha Tatashin
  Cc: akpm, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas, cgroups,
	corbet, david, dwmw2, hannes, heiko, iommu, jernej.skrabec,
	jonathanh, joro, krzysztof.kozlowski, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, lizefan.x, marcan, mhiramat,
	m.szyprowski, paulmck, rdunlap, robin.murphy, samuel,
	suravee.suthikulpanit, sven, thierry.reding, tj, tomas.mudrunka,
	vdumpa, wens, will, yu-cheng.yu

On Thu, 30 Nov 2023, Pasha Tatashin wrote:

> Convert iommu/sun50i-iommu.c to use the new page allocation functions
> provided in iommu-pages.h.
> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>

Acked-by: David Rientjes <rientjes@google.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 08/10] iommu/tegra-smmu: use page allocation function provided by iommu-pages.h
  2023-11-30 20:15 ` [PATCH v2 08/10] iommu/tegra-smmu: " Pasha Tatashin
@ 2023-12-24 21:40   ` David Rientjes
  0 siblings, 0 replies; 32+ messages in thread
From: David Rientjes @ 2023-12-24 21:40 UTC (permalink / raw
  To: Pasha Tatashin
  Cc: akpm, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas, cgroups,
	corbet, david, dwmw2, hannes, heiko, iommu, jernej.skrabec,
	jonathanh, joro, krzysztof.kozlowski, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, lizefan.x, marcan, mhiramat,
	m.szyprowski, paulmck, rdunlap, robin.murphy, samuel,
	suravee.suthikulpanit, sven, thierry.reding, tj, tomas.mudrunka,
	vdumpa, wens, will, yu-cheng.yu

On Thu, 30 Nov 2023, Pasha Tatashin wrote:

> Convert iommu/tegra-smmu.c to use the new page allocation functions
> provided in iommu-pages.h.
> 
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>

Acked-by: David Rientjes <rientjes@google.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 10/10] iommu: account IOMMU allocated memory
  2023-12-15 21:11     ` Pasha Tatashin
@ 2023-12-24 21:44       ` David Rientjes
  0 siblings, 0 replies; 32+ messages in thread
From: David Rientjes @ 2023-12-24 21:44 UTC (permalink / raw
  To: Pasha Tatashin
  Cc: akpm, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas, cgroups,
	corbet, david, dwmw2, hannes, heiko, iommu, jernej.skrabec,
	jonathanh, joro, krzysztof.kozlowski, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, lizefan.x, marcan, mhiramat,
	m.szyprowski, paulmck, rdunlap, robin.murphy, samuel,
	suravee.suthikulpanit, sven, thierry.reding, tj, tomas.mudrunka,
	vdumpa, wens, will, yu-cheng.yu

On Fri, 15 Dec 2023, Pasha Tatashin wrote:

> > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > > index 3f85254f3cef..e004e05a7cde 100644
> > > --- a/Documentation/admin-guide/cgroup-v2.rst
> > > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > > @@ -1418,7 +1418,7 @@ PAGE_SIZE multiple when read back.
> > >         sec_pagetables
> > >               Amount of memory allocated for secondary page tables,
> > >               this currently includes KVM mmu allocations on x86
> > > -             and arm64.
> > > +             and arm64 and IOMMU page tables.
> >
> > Hmm, if existing users are parsing this field and alerting when it exceeds
> > an expected value (a cloud provider, let's say), is it safe to add in a
> > whole new set of page tables?
> >
> > I understand the documentation allows for it, but I think potential impact
> > on userspace would be more interesting.
> 
> Hi David,
> 
> This is something that was discussed at LPC'23. I was also proposing a
> separate counter for iommu page tables, but it was noted that
> sec_pagetables is named this way specifically to include all non-regular
> CPU page tables, and we should therefore account for them together.
> 
> Please also see this discussion from the previous version of this patch series:
> https://lore.kernel.org/all/CAJD7tkb1FqTqwONrp2nphBDkEamQtPCOFm0208H3tp0Gq2OLMQ@mail.gmail.com/
> 

Gotcha, I think that makes sense.  When sec_pagetables was introduced, I 
can understand the need to account for non-primary pagetables separately 
because of the long-standing behavior.  In that sense, sec_pagetables 
becomes a dumping ground for "all other page tables" which IOMMU would 
naturally include.

So this looks good to me.

Acked-by: David Rientjes <rientjes@google.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 01/10] iommu/vt-d: add wrapper functions for page allocations
  2023-12-24 21:30       ` David Rientjes
@ 2023-12-24 21:48         ` Matthew Wilcox
  2023-12-26 17:57           ` Pasha Tatashin
  0 siblings, 1 reply; 32+ messages in thread
From: Matthew Wilcox @ 2023-12-24 21:48 UTC (permalink / raw
  To: David Rientjes
  Cc: Pasha Tatashin, Andrew Morton, alim.akhtar, alyssa, asahi,
	baolu.lu, bhelgaas, cgroups, corbet, david, dwmw2, hannes, heiko,
	iommu, jernej.skrabec, jonathanh, joro, krzysztof.kozlowski,
	linux-doc, linux-fsdevel, linux-kernel, linux-mm, linux-rockchip,
	linux-samsung-soc, linux-sunxi, linux-tegra, lizefan.x, marcan,
	mhiramat, m.szyprowski, paulmck, rdunlap, robin.murphy, samuel,
	suravee.suthikulpanit, sven, thierry.reding, tj, tomas.mudrunka,
	vdumpa, wens, will, yu-cheng.yu

On Sun, Dec 24, 2023 at 01:30:50PM -0800, David Rientjes wrote:
> > > s/pages/page/ here and later in this file.
> > 
> > In this file, where there a page with an "order", I reference it with
> > "pages", when no order (i.e. order = 0), I reference it with "page"
> > 
> > I.e.: __iommu_alloc_page vs. __iommu_alloc_pages
> > 
> 
> Eh, the struct page points to a (potentially compound) page, not a set or 
> list of pages.  I won't bikeshed on it, but "struct page *pages" never 
> makes sense unless it's **pages or *pages[] :)

I'd suggest that 'pages' also makes sense when _not_ using __GFP_COMP,
as we do in fact allocate an array of pages in that case.

That said, we shouldn't encourage the use of non-compound allocations.
It would also be good for someone to define a memdesc for iommu memory
like we have already for slab.  We'll need it eventually, and it'll work
out better if someone who understands iommus (ie not me) does it.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 00/10] IOMMU memory observability
  2023-11-30 20:14 [PATCH v2 00/10] IOMMU memory observability Pasha Tatashin
                   ` (9 preceding siblings ...)
  2023-11-30 20:15 ` [PATCH v2 10/10] iommu: account IOMMU allocated memory Pasha Tatashin
@ 2023-12-24 21:49 ` David Rientjes
  10 siblings, 0 replies; 32+ messages in thread
From: David Rientjes @ 2023-12-24 21:49 UTC (permalink / raw
  To: Pasha Tatashin
  Cc: akpm, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas, cgroups,
	corbet, david, dwmw2, hannes, heiko, iommu, jernej.skrabec,
	jonathanh, joro, krzysztof.kozlowski, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, lizefan.x, marcan, mhiramat,
	m.szyprowski, paulmck, rdunlap, robin.murphy, samuel,
	suravee.suthikulpanit, sven, thierry.reding, tj, tomas.mudrunka,
	vdumpa, wens, will, yu-cheng.yu

On Thu, 30 Nov 2023, Pasha Tatashin wrote:

> IOMMU subsystem may contain state that is in gigabytes. Majority of that
> state is iommu page tables. Yet, there is currently no way to observe
> how much memory is actually used by the iommu subsystem.
> 
> This patch series solves this problem by adding both observability to
> all pages that are allocated by IOMMU, and also accountability, so
> admins can limit the amount of it via cgroups.
> 
> The system-wide observability is using /proc/meminfo:
> SecPageTables:    438176 kB
> 
> Contains IOMMU and KVM memory.
> 
> Per-node observability:
> /sys/devices/system/node/nodeN/meminfo
> Node N SecPageTables:    422204 kB
> 
> Contains IOMMU and KVM memory in the given NUMA node.
> 
> Per-node IOMMU only observability:
> /sys/devices/system/node/nodeN/vmstat
> nr_iommu_pages 105555
> 
> Contains number of pages IOMMU allocated in the given node.
> 
> Accountability: using sec_pagetables cgroup-v2 memory.stat entry.
> 
> With the change, iova_stress[1] stops as limit is reached:
> 
> # ./iova_stress
> iova space:     0T      free memory:   497G
> iova space:     1T      free memory:   495G
> iova space:     2T      free memory:   493G
> iova space:     3T      free memory:   491G
> 
> stops as limit is reached.
> 

I think this is *very* useful to provide visibility into a significant 
amount of memory that we currently cannot observe on a host.  It can help 
to uncover bugs and shed light onto a particularly large amount of memory 
that would otherwise be mysterious.

Joerg, Will, Robin, I think this series would go through the 
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git tree since it 
depends on a common framework for all other IOMMU implementations to then 
use?

Any concerns about this patch series?  It would be very useful for us to 
create visibility into this memory.


Pasha: any chance of adding a selftest that can be run to test the
value of nr_iommu_pages?  I could imagine in the future that a bug could
be introduced where either an allocation or free is done through 
alloc_pages() directly and its paired alloc/free function now results in a 
leak or underflow.
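
Something along these lines, perhaps (a hypothetical sketch; how the
map/unmap step is exercised is left open):

	before=$(awk '/nr_iommu_pages/ {sum += $2} END {print sum}' \
		/sys/devices/system/node/node*/vmstat)
	# ... exercise IOMMU map/unmap via a test device ...
	after=$(awk '/nr_iommu_pages/ {sum += $2} END {print sum}' \
		/sys/devices/system/node/node*/vmstat)
	[ "$after" -eq "$before" ] || echo "possible leak or underflow"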

> This series encorporates suggestions that came from the discussion
> at LPC [2].
> ----------------------------------------------------------------------
> [1] https://github.com/soleen/iova_stress
> [2] https://lpc.events/event/17/contributions/1466
> ----------------------------------------------------------------------
> Previous versions
> v1: https://lore.kernel.org/all/20231128204938.1453583-1-pasha.tatashin@soleen.com
> ----------------------------------------------------------------------
> 
> Pasha Tatashin (10):
>   iommu/vt-d: add wrapper functions for page allocations
>   iommu/amd: use page allocation function provided by iommu-pages.h
>   iommu/io-pgtable-arm: use page allocation function provided by
>     iommu-pages.h
>   iommu/io-pgtable-dart: use page allocation function provided by
>     iommu-pages.h
>   iommu/exynos: use page allocation function provided by iommu-pages.h
>   iommu/rockchip: use page allocation function provided by iommu-pages.h
>   iommu/sun50i: use page allocation function provided by iommu-pages.h
>   iommu/tegra-smmu: use page allocation function provided by
>     iommu-pages.h
>   iommu: observability of the IOMMU allocations
>   iommu: account IOMMU allocated memory
> 
>  Documentation/admin-guide/cgroup-v2.rst |   2 +-
>  Documentation/filesystems/proc.rst      |   4 +-
>  drivers/iommu/amd/amd_iommu.h           |   8 -
>  drivers/iommu/amd/init.c                |  91 +++++-----
>  drivers/iommu/amd/io_pgtable.c          |  13 +-
>  drivers/iommu/amd/io_pgtable_v2.c       |  20 +-
>  drivers/iommu/amd/iommu.c               |  13 +-
>  drivers/iommu/exynos-iommu.c            |  14 +-
>  drivers/iommu/intel/dmar.c              |  10 +-
>  drivers/iommu/intel/iommu.c             |  47 ++---
>  drivers/iommu/intel/iommu.h             |   2 -
>  drivers/iommu/intel/irq_remapping.c     |  10 +-
>  drivers/iommu/intel/pasid.c             |  12 +-
>  drivers/iommu/intel/svm.c               |   7 +-
>  drivers/iommu/io-pgtable-arm.c          |   7 +-
>  drivers/iommu/io-pgtable-dart.c         |  37 ++--
>  drivers/iommu/iommu-pages.h             | 231 ++++++++++++++++++++++++
>  drivers/iommu/rockchip-iommu.c          |  14 +-
>  drivers/iommu/sun50i-iommu.c            |   7 +-
>  drivers/iommu/tegra-smmu.c              |  18 +-
>  include/linux/mmzone.h                  |   5 +-
>  mm/vmstat.c                             |   3 +
>  22 files changed, 390 insertions(+), 185 deletions(-)
>  create mode 100644 drivers/iommu/iommu-pages.h
> 
> -- 
> 2.43.0.rc2.451.g8631bc7472-goog
> 
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 01/10] iommu/vt-d: add wrapper functions for page allocations
  2023-12-14 19:16     ` Pasha Tatashin
  2023-12-24 21:30       ` David Rientjes
@ 2023-12-26  6:09       ` Liu, Jingqi
  2023-12-26 17:14         ` Pasha Tatashin
  1 sibling, 1 reply; 32+ messages in thread
From: Liu, Jingqi @ 2023-12-26  6:09 UTC (permalink / raw)
  To: Pasha Tatashin, David Rientjes
  Cc: Andrew Morton, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas,
	cgroups, corbet, david, dwmw2, hannes, heiko, iommu,
	jernej.skrabec, jonathanh, joro, krzysztof.kozlowski, linux-doc,
	linux-fsdevel, linux-kernel, linux-mm, linux-rockchip,
	linux-samsung-soc, linux-sunxi, linux-tegra, lizefan.x, marcan,
	mhiramat, m.szyprowski, paulmck, rdunlap, robin.murphy, samuel,
	suravee.suthikulpanit, sven, thierry.reding, tj, tomas.mudrunka,
	vdumpa, wens, will, yu-cheng.yu



On 12/15/2023 3:16 AM, Pasha Tatashin wrote:
> On Thu, Dec 14, 2023 at 12:58 PM David Rientjes <rientjes@google.com> wrote:
>>
>> On Thu, 30 Nov 2023, Pasha Tatashin wrote:
>>
>>> diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
>>> new file mode 100644
>>> index 000000000000..2332f807d514
>>> --- /dev/null
>>> +++ b/drivers/iommu/iommu-pages.h
>>> @@ -0,0 +1,199 @@
>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>> +/*
>>> + * Copyright (c) 2023, Google LLC.
>>> + * Pasha Tatashin <pasha.tatashin@soleen.com>
>>> + */
>>> +
>>> +#ifndef __IOMMU_PAGES_H
>>> +#define __IOMMU_PAGES_H
>>> +
>>> +#include <linux/vmstat.h>
>>> +#include <linux/gfp.h>
>>> +#include <linux/mm.h>
>>> +
>>> +/*
>>> + * All page allocations performed in the IOMMU subsystem must use one of
>>> + * the functions below.  This is necessary for the proper accounting as IOMMU
>>> + * state can be rather large, i.e. multiple gigabytes in size.
>>> + */
>>> +
>>> +/**
>>> + * __iommu_alloc_pages_node - allocate a zeroed page of a given order from
>>> + * a specific NUMA node.
>>> + * @nid: memory NUMA node id
>>
>> NUMA_NO_NODE if no locality requirements?
> 
> If no locality is required, there is a better interface:
> __iommu_alloc_pages(). That one also looks at the calling process's
> memory policies to determine the proper NUMA node when nothing is
> specified. However, when policies should be ignored and no locality is
> required, NUMA_NO_NODE can be passed.
> 
>>
>>> + * @gfp: buddy allocator flags
>>> + * @order: page order
>>> + *
>>> + * returns the head struct page of the allocated page.
>>> + */
>>> +static inline struct page *__iommu_alloc_pages_node(int nid, gfp_t gfp,
>>> +                                                 int order)
>>> +{
>>> +     struct page *pages;
>>
>> s/pages/page/ here and later in this file.
> 
> In this file, where there is a page with an "order", I reference it
> with "pages"; when there is no order (i.e. order = 0), I reference it
> with "page".
> 
> I.e.: __iommu_alloc_page vs. __iommu_alloc_pages
> 
>>
>>> +
>>> +     pages = alloc_pages_node(nid, gfp | __GFP_ZERO, order);
>>> +     if (!pages)
>>
>> unlikely()?
> 
> Will add it.
> 
>>
>>> +             return NULL;
>>> +
>>> +     return pages;
>>> +}
>>> +
>>> +/**
>>> + * __iommu_alloc_pages - allocate a zeroed page of a given order.
>>> + * @gfp: buddy allocator flags
>>> + * @order: page order
>>> + *
>>> + * returns the head struct page of the allocated page.
>>> + */
>>> +static inline struct page *__iommu_alloc_pages(gfp_t gfp, int order)
>>> +{
>>> +     struct page *pages;
>>> +
>>> +     pages = alloc_pages(gfp | __GFP_ZERO, order);
>>> +     if (!pages)
>>> +             return NULL;
>>> +
>>> +     return pages;
>>> +}
>>> +
>>> +/**
>>> + * __iommu_alloc_page_node - allocate a zeroed page at specific NUMA node.
>>> + * @nid: memory NUMA node id
>>> + * @gfp: buddy allocator flags
>>> + *
>>> + * returns the struct page of the allocated page.
>>> + */
>>> +static inline struct page *__iommu_alloc_page_node(int nid, gfp_t gfp)
>>> +{
>>> +     return __iommu_alloc_pages_node(nid, gfp, 0);
>>> +}
>>> +
>>> +/**
>>> + * __iommu_alloc_page - allocate a zeroed page
>>> + * @gfp: buddy allocator flags
>>> + *
>>> + * returns the struct page of the allocated page.
>>> + */
>>> +static inline struct page *__iommu_alloc_page(gfp_t gfp)
>>> +{
>>> +     return __iommu_alloc_pages(gfp, 0);
>>> +}
>>> +
>>> +/**
>>> + * __iommu_free_pages - free page of a given order
>>> + * @pages: head struct page of the page
>>
>> I think "pages" implies more than one page; this is just a (potentially
>> compound) page?
> 
> Yes, potentially more than one page, whenever order > 0.
> 
>>> +/**
>>> + * iommu_free_page - free page
>>> + * @virt: virtual address of the page to be freed.
>>> + */
>>> +static inline void iommu_free_page(void *virt)
>>> +{
>>> +     iommu_free_pages(virt, 0);
>>> +}
>>> +
>>> +/**
>>> + * iommu_free_pages_list - free a list of pages.
>>> + * @pages: the head of the lru list to be freed.
>>
>> Document the locking requirements for this?
> 
> Thank you for the review. I will add info about the locking
> requirements; in fact, they are very relaxed.
> 
> These pages are added to the list by unmap or remap operations in the
> Intel IOMMU implementation. These calls assume that whoever performs
> them has exclusive access to the VA range in the page table for that
> operation. The pages in this freelist belong only to the former page
> tables covering the IOVA range of those operations.

These pages may be accessed concurrently by thread contexts other than 
the IOMMU driver (such as debugfs).

Thanks,
Jingqi
> 
>>> + */
>>> +static inline void iommu_free_pages_list(struct list_head *pages)
>>> +{
>>> +     while (!list_empty(pages)) {
>>> +             struct page *p = list_entry(pages->prev, struct page, lru);
>>> +
>>> +             list_del(&p->lru);
>>> +             put_page(p);
>>> +     }
>>> +}
> 
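
For reference, a sketch of how these helpers pair up, following the 
semantics described above.  The function below is illustrative only: 
the single page stands in for the collection a real unmap path would 
gather while walking the page table with exclusive access to the IOVA 
range:

#include <linux/list.h>

#include "iommu-pages.h"

/* Illustrative pairing of the allocation helpers with the freelist
 * free path; not taken from any posted patch. */
static void iommu_pages_example(int nid)
{
	struct page *pages;
	LIST_HEAD(freelist);

	/*
	 * Prefer the _node variant when locality is known; plain
	 * __iommu_alloc_pages() follows the calling process's memory
	 * policy instead, and NUMA_NO_NODE ignores locality entirely.
	 */
	pages = __iommu_alloc_pages_node(nid, GFP_KERNEL, 0);
	if (!pages)
		return;

	/* Unmap/remap paths queue former page-table pages on a local
	 * list while they own the IOVA range... */
	list_add_tail(&pages->lru, &freelist);

	/* ...and free the whole list in one pass afterwards. */
	iommu_free_pages_list(&freelist);
}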

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 01/10] iommu/vt-d: add wrapper functions for page allocations
  2023-12-26  6:09       ` Liu, Jingqi
@ 2023-12-26 17:14         ` Pasha Tatashin
  0 siblings, 0 replies; 32+ messages in thread
From: Pasha Tatashin @ 2023-12-26 17:14 UTC (permalink / raw)
  To: Liu, Jingqi
  Cc: David Rientjes, Andrew Morton, alim.akhtar, alyssa, asahi,
	baolu.lu, bhelgaas, cgroups, corbet, david, dwmw2, hannes, heiko,
	iommu, jernej.skrabec, jonathanh, joro, krzysztof.kozlowski,
	linux-doc, linux-fsdevel, linux-kernel, linux-mm, linux-rockchip,
	linux-samsung-soc, linux-sunxi, linux-tegra, lizefan.x, marcan,
	mhiramat, m.szyprowski, paulmck, rdunlap, robin.murphy, samuel,
	suravee.suthikulpanit, sven, thierry.reding, tj, tomas.mudrunka,
	vdumpa, wens, will, yu-cheng.yu

> >> Document the locking requirements for this?
> >
> > Thank you for the review. I will add info about the locking
> > requirements; in fact, they are very relaxed.
> >
> > These pages are added to the list by unmap or remap operations in the
> > Intel IOMMU implementation. These calls assume that whoever performs
> > them has exclusive access to the VA range in the page table for that
> > operation. The pages in this freelist belong only to the former page
> > tables covering the IOVA range of those operations.
>
> These pages may be accessed concurrently by thread contexts other than
> the IOMMU driver (such as debugfs).

Good point regarding debugfs. While it does not change any locking
assumptions for this series, it might change some design decisions that
we need to make when freeing pages on unmaps (a separate RFC series
that I sent). I will have to study how debugfs affects refcount and
mapcount, and whether we could use per-page page table locking if
needed.

Thanks,
Pasha

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 01/10] iommu/vt-d: add wrapper functions for page allocations
  2023-12-24 21:48         ` Matthew Wilcox
@ 2023-12-26 17:57           ` Pasha Tatashin
  0 siblings, 0 replies; 32+ messages in thread
From: Pasha Tatashin @ 2023-12-26 17:57 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: David Rientjes, Andrew Morton, alim.akhtar, alyssa, asahi,
	baolu.lu, bhelgaas, cgroups, corbet, david, dwmw2, hannes, heiko,
	iommu, jernej.skrabec, jonathanh, joro, krzysztof.kozlowski,
	linux-doc, linux-fsdevel, linux-kernel, linux-mm, linux-rockchip,
	linux-samsung-soc, linux-sunxi, linux-tegra, lizefan.x, marcan,
	mhiramat, m.szyprowski, paulmck, rdunlap, robin.murphy, samuel,
	suravee.suthikulpanit, sven, thierry.reding, tj, tomas.mudrunka,
	vdumpa, wens, will, yu-cheng.yu

On Sun, Dec 24, 2023 at 4:48 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Sun, Dec 24, 2023 at 01:30:50PM -0800, David Rientjes wrote:
> > > > s/pages/page/ here and later in this file.
> > >
> > > In this file, where there is a page with an "order", I reference it
> > > with "pages"; when there is no order (i.e. order = 0), I reference it
> > > with "page".
> > >
> > > I.e.: __iommu_alloc_page vs. __iommu_alloc_pages
> > >
> >
> > Eh, the struct page points to a (potentially compound) page, not a set or
> > list of pages.  I won't bikeshed on it, but "struct page *pages" never
> > makes sense unless it's **pages or *pages[] :)
>
> I'd suggest that 'pages' also makes sense when _not_ using __GFP_COMP,
> as we do in fact allocate an array of pages in that case.
>
> That said, we shouldn't encourage the use of non-compound allocations.
> It would also be good for someone to define a memdesc for iommu memory
> like we have already for slab.  We'll need it eventually, and it'll work
> out better if someone who understands iommus (ie not me) does it.

I was thinking of adding an IOMMU page table memdesc, at least
starting with the Intel implementation.

- For efficient freeing on page-unmap we need a counter.
- We might also need per-page page table locking (akin to the PTL), if
the current approach I am proposing is not scalable enough.
- Access to debugfs (I am studying this now).

However, an IOMMU memdesc would really make sense once the various
IOMMU page table management implementations are unified.
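
Purely as an illustrative sketch, since nothing like this exists yet, 
such a memdesc might carry one field per need listed above:

#include <linux/spinlock.h>
#include <linux/refcount.h>

/* Hypothetical: the struct and field names are invented here and do
 * not come from any posted patch. */
struct iommu_ptdesc {
	unsigned long nr_entries;	/* live entries, for efficient freeing on unmap */
	spinlock_t ptl;			/* per-page page table lock, if needed */
	refcount_t refcnt;		/* coordination with debugfs walkers */
};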

Pasha

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 04/10] iommu/io-pgtable-dart: use page allocation function provided by iommu-pages.h
  2023-12-24 21:36   ` David Rientjes
@ 2023-12-26 18:08     ` Pasha Tatashin
  0 siblings, 0 replies; 32+ messages in thread
From: Pasha Tatashin @ 2023-12-26 18:08 UTC (permalink / raw)
  To: David Rientjes
  Cc: akpm, alim.akhtar, alyssa, asahi, baolu.lu, bhelgaas, cgroups,
	corbet, david, dwmw2, hannes, heiko, iommu, jernej.skrabec,
	jonathanh, joro, krzysztof.kozlowski, linux-doc, linux-fsdevel,
	linux-kernel, linux-mm, linux-rockchip, linux-samsung-soc,
	linux-sunxi, linux-tegra, lizefan.x, marcan, mhiramat,
	m.szyprowski, paulmck, rdunlap, robin.murphy, samuel,
	suravee.suthikulpanit, sven, thierry.reding, tj, tomas.mudrunka,
	vdumpa, wens, will, yu-cheng.yu

On Sun, Dec 24, 2023 at 4:36 PM David Rientjes <rientjes@google.com> wrote:
>
> On Thu, 30 Nov 2023, Pasha Tatashin wrote:
>
> > Convert iommu/io-pgtable-dart.c to use the new page allocation functions
> > provided in iommu-pages.h.
> >
>
> ... and remove the unnecessary struct io_pgtable_cfg formal parameter
> to __dart_alloc_pages() while at it :)

Added to the commit log.

>
> > Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> > Reviewed-by: Janne Grunau <j@jannau.net>
>
> Acked-by: David Rientjes <rientjes@google.com>

Thank you,
Pasha

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2023-12-26 18:09 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-30 20:14 [PATCH v2 00/10] IOMMU memory observability Pasha Tatashin
2023-11-30 20:14 ` [PATCH v2 01/10] iommu/vt-d: add wrapper functions for page allocations Pasha Tatashin
2023-12-14 17:58   ` David Rientjes
2023-12-14 19:16     ` Pasha Tatashin
2023-12-24 21:30       ` David Rientjes
2023-12-24 21:48         ` Matthew Wilcox
2023-12-26 17:57           ` Pasha Tatashin
2023-12-26  6:09       ` Liu, Jingqi
2023-12-26 17:14         ` Pasha Tatashin
2023-11-30 20:14 ` [PATCH v2 02/10] iommu/amd: use page allocation function provided by iommu-pages.h Pasha Tatashin
2023-12-24 21:34   ` David Rientjes
2023-11-30 20:14 ` [PATCH v2 03/10] iommu/io-pgtable-arm: " Pasha Tatashin
2023-12-24 21:34   ` David Rientjes
2023-11-30 20:14 ` [PATCH v2 04/10] iommu/io-pgtable-dart: " Pasha Tatashin
2023-12-24 21:36   ` David Rientjes
2023-12-26 18:08     ` Pasha Tatashin
2023-11-30 20:14 ` [PATCH v2 05/10] iommu/exynos: " Pasha Tatashin
2023-12-24 21:37   ` David Rientjes
2023-11-30 20:15 ` [PATCH v2 06/10] iommu/rockchip: " Pasha Tatashin
2023-12-24 21:39   ` David Rientjes
2023-11-30 20:15 ` [PATCH v2 07/10] iommu/sun50i: " Pasha Tatashin
2023-12-24 21:39   ` David Rientjes
2023-11-30 20:15 ` [PATCH v2 08/10] iommu/tegra-smmu: " Pasha Tatashin
2023-12-24 21:40   ` David Rientjes
2023-11-30 20:15 ` [PATCH v2 09/10] iommu: observability of the IOMMU allocations Pasha Tatashin
2023-12-14 17:59   ` David Rientjes
2023-12-14 19:18     ` Pasha Tatashin
2023-11-30 20:15 ` [PATCH v2 10/10] iommu: account IOMMU allocated memory Pasha Tatashin
2023-12-14 18:02   ` David Rientjes
2023-12-15 21:11     ` Pasha Tatashin
2023-12-24 21:44       ` David Rientjes
2023-12-24 21:49 ` [PATCH v2 00/10] IOMMU memory observability David Rientjes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).