* [PATCH v2 1/3] mm/compaction: fix invalid free_pfn and compact_cached_free_pfn
@ 2016-02-04  6:19 Joonsoo Kim
  2016-02-04  6:19 ` [PATCH v2 2/3] mm/compaction: pass only pageblock aligned range to pageblock_pfn_to_page Joonsoo Kim
  2016-02-04  6:19 ` [PATCH v2 3/3] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous Joonsoo Kim
  0 siblings, 2 replies; 15+ messages in thread
From: Joonsoo Kim @ 2016-02-04  6:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Aaron Lu, Mel Gorman, Rik van Riel,
	David Rientjes, linux-kernel, linux-mm, Joonsoo Kim

free_pfn and compact_cached_free_pfn are the pointers that remember the
restart position of the freepage scanner. When they are reset or invalid,
we set them to zone_end_pfn because the freepage scanner works in the
reverse direction. But, because the zone range is defined as
[zone_start_pfn, zone_end_pfn), zone_end_pfn is invalid to access.
Therefore, we should not store it in free_pfn and compact_cached_free_pfn;
instead, we need to store zone_end_pfn - 1 in them. There is one more
thing to consider: the freepage scanner scans backwards in pageblock
units. If free_pfn and compact_cached_free_pfn are set to the middle of a
pageblock, the scanner regards that situation as if the front part of the
pageblock had already been scanned, so we lose the opportunity to scan
there. To fix this up, this patch does round_down() to guarantee that the
reset position is pageblock aligned.

Note that, thanks to the current pageblock_pfn_to_page() implementation,
no actual access to zone_end_pfn happens yet. But the following patch will
change pageblock_pfn_to_page(), so this fix is needed from now on.
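
As a quick illustration of the new reset arithmetic, here is a minimal
user-space sketch (not part of the patch; the zone pfns and the 512-page
pageblock size are made-up values, and round_down() is reproduced here to
match the kernel macro for power-of-two alignments):

#include <stdio.h>

#define PAGEBLOCK_NR_PAGES      512UL   /* assumed: 2MB pageblocks, 4KB pages */
#define round_down(x, y)        ((x) & ~((y) - 1))

int main(void)
{
        /* made-up zone spanning [zone_start_pfn, zone_end_pfn) */
        unsigned long zone_start_pfn = 0x100000;
        unsigned long zone_end_pfn   = 0x140070;

        /* old reset value: one past the last valid pfn of the zone */
        unsigned long old_reset = zone_end_pfn;

        /* new reset value: last valid pfn, rounded down to a pageblock boundary */
        unsigned long new_reset = round_down(zone_end_pfn - 1, PAGEBLOCK_NR_PAGES);

        printf("zone: [%#lx, %#lx)\n", zone_start_pfn, zone_end_pfn);
        printf("old reset: %#lx (outside the zone)\n", old_reset);
        printf("new reset: %#lx (pageblock aligned, inside the zone)\n", new_reset);
        return 0;
}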

Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/compaction.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 585de54..56fa321 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -200,7 +200,8 @@ static void reset_cached_positions(struct zone *zone)
 {
 	zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
 	zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
-	zone->compact_cached_free_pfn = zone_end_pfn(zone);
+	zone->compact_cached_free_pfn =
+			round_down(zone_end_pfn(zone) - 1, pageblock_nr_pages);
 }
 
 /*
@@ -1371,11 +1372,11 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 	 */
 	cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
 	cc->free_pfn = zone->compact_cached_free_pfn;
-	if (cc->free_pfn < start_pfn || cc->free_pfn > end_pfn) {
-		cc->free_pfn = end_pfn & ~(pageblock_nr_pages-1);
+	if (cc->free_pfn < start_pfn || cc->free_pfn >= end_pfn) {
+		cc->free_pfn = round_down(end_pfn - 1, pageblock_nr_pages);
 		zone->compact_cached_free_pfn = cc->free_pfn;
 	}
-	if (cc->migrate_pfn < start_pfn || cc->migrate_pfn > end_pfn) {
+	if (cc->migrate_pfn < start_pfn || cc->migrate_pfn >= end_pfn) {
 		cc->migrate_pfn = start_pfn;
 		zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
 		zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
-- 
1.9.1


* [PATCH v2 2/3] mm/compaction: pass only pageblock aligned range to pageblock_pfn_to_page
  2016-02-04  6:19 [PATCH v2 1/3] mm/compaction: fix invalid free_pfn and compact_cached_free_pfn Joonsoo Kim
@ 2016-02-04  6:19 ` Joonsoo Kim
  2016-02-10 12:52   ` Vlastimil Babka
  2016-02-04  6:19 ` [PATCH v2 3/3] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous Joonsoo Kim
  1 sibling, 1 reply; 15+ messages in thread
From: Joonsoo Kim @ 2016-02-04  6:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Aaron Lu, Mel Gorman, Rik van Riel,
	David Rientjes, linux-kernel, linux-mm, Joonsoo Kim

pageblock_pfn_to_page() is used to check that there is a valid pfn and
that all pages in the pageblock are in a single zone. If there is a hole
in the pageblock, passing an arbitrary position to pageblock_pfn_to_page()
could cause the whole pageblock scan to be skipped, instead of just the
hole page. For deterministic behaviour, it's better to always pass a
pageblock-aligned range to pageblock_pfn_to_page(). It will also help
further optimization of pageblock_pfn_to_page() in the following patch.
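
For illustration, a small stand-alone sketch of the alignment and clamping
introduced here (not from the patch; the pfn values and the 512-page
pageblock size are assumptions, and ALIGN() mirrors the kernel macro for
power-of-two alignments):

#include <stdio.h>

#define PAGEBLOCK_NR_PAGES      512UL
#define ALIGN(x, a)             (((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
        unsigned long zone_start_pfn = 0x100;   /* made-up zone starting mid-pageblock */
        unsigned long pfn = 0x123;              /* arbitrary, non-aligned scan position */

        /* round the scan position down to the start of its pageblock... */
        unsigned long block_start_pfn = pfn & ~(PAGEBLOCK_NR_PAGES - 1);

        /* ...but never below the first pfn of the zone */
        if (block_start_pfn < zone_start_pfn)
                block_start_pfn = zone_start_pfn;

        /* exclusive end of the pageblock that contains pfn */
        unsigned long block_end_pfn = ALIGN(pfn + 1, PAGEBLOCK_NR_PAGES);

        /* prints: pageblock_pfn_to_page(0x100, 0x200, zone) */
        printf("pageblock_pfn_to_page(%#lx, %#lx, zone)\n",
               block_start_pfn, block_end_pfn);
        return 0;
}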

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/compaction.c | 41 ++++++++++++++++++++++++++++++-----------
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 56fa321..8ce36eb 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -555,13 +555,17 @@ unsigned long
 isolate_freepages_range(struct compact_control *cc,
 			unsigned long start_pfn, unsigned long end_pfn)
 {
-	unsigned long isolated, pfn, block_end_pfn;
+	unsigned long isolated, pfn, block_start_pfn, block_end_pfn;
 	LIST_HEAD(freelist);
 
 	pfn = start_pfn;
+	block_start_pfn = pfn & ~(pageblock_nr_pages - 1);
+	if (block_start_pfn < cc->zone->zone_start_pfn)
+		block_start_pfn = cc->zone->zone_start_pfn;
 	block_end_pfn = ALIGN(pfn + 1, pageblock_nr_pages);
 
 	for (; pfn < end_pfn; pfn += isolated,
+				block_start_pfn = block_end_pfn,
 				block_end_pfn += pageblock_nr_pages) {
 		/* Protect pfn from changing by isolate_freepages_block */
 		unsigned long isolate_start_pfn = pfn;
@@ -574,11 +578,13 @@ isolate_freepages_range(struct compact_control *cc,
 		 * scanning range to right one.
 		 */
 		if (pfn >= block_end_pfn) {
+			block_start_pfn = pfn & ~(pageblock_nr_pages - 1);
 			block_end_pfn = ALIGN(pfn + 1, pageblock_nr_pages);
 			block_end_pfn = min(block_end_pfn, end_pfn);
 		}
 
-		if (!pageblock_pfn_to_page(pfn, block_end_pfn, cc->zone))
+		if (!pageblock_pfn_to_page(block_start_pfn,
+					block_end_pfn, cc->zone))
 			break;
 
 		isolated = isolate_freepages_block(cc, &isolate_start_pfn,
@@ -864,18 +870,23 @@ unsigned long
 isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
 							unsigned long end_pfn)
 {
-	unsigned long pfn, block_end_pfn;
+	unsigned long pfn, block_start_pfn, block_end_pfn;
 
 	/* Scan block by block. First and last block may be incomplete */
 	pfn = start_pfn;
+	block_start_pfn = pfn & ~(pageblock_nr_pages - 1);
+	if (block_start_pfn < cc->zone->zone_start_pfn)
+		block_start_pfn = cc->zone->zone_start_pfn;
 	block_end_pfn = ALIGN(pfn + 1, pageblock_nr_pages);
 
 	for (; pfn < end_pfn; pfn = block_end_pfn,
+				block_start_pfn = block_end_pfn,
 				block_end_pfn += pageblock_nr_pages) {
 
 		block_end_pfn = min(block_end_pfn, end_pfn);
 
-		if (!pageblock_pfn_to_page(pfn, block_end_pfn, cc->zone))
+		if (!pageblock_pfn_to_page(block_start_pfn,
+					block_end_pfn, cc->zone))
 			continue;
 
 		pfn = isolate_migratepages_block(cc, pfn, block_end_pfn,
@@ -1104,7 +1115,9 @@ int sysctl_compact_unevictable_allowed __read_mostly = 1;
 static isolate_migrate_t isolate_migratepages(struct zone *zone,
 					struct compact_control *cc)
 {
-	unsigned long low_pfn, end_pfn;
+	unsigned long block_start_pfn;
+	unsigned long block_end_pfn;
+	unsigned long low_pfn;
 	unsigned long isolate_start_pfn;
 	struct page *page;
 	const isolate_mode_t isolate_mode =
@@ -1116,16 +1129,21 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	 * initialized by compact_zone()
 	 */
 	low_pfn = cc->migrate_pfn;
+	block_start_pfn = cc->migrate_pfn & ~(pageblock_nr_pages - 1);
+	if (block_start_pfn < zone->zone_start_pfn)
+		block_start_pfn = zone->zone_start_pfn;
 
 	/* Only scan within a pageblock boundary */
-	end_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages);
+	block_end_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages);
 
 	/*
 	 * Iterate over whole pageblocks until we find the first suitable.
 	 * Do not cross the free scanner.
 	 */
-	for (; end_pfn <= cc->free_pfn;
-			low_pfn = end_pfn, end_pfn += pageblock_nr_pages) {
+	for (; block_end_pfn <= cc->free_pfn;
+			low_pfn = block_end_pfn,
+			block_start_pfn = block_end_pfn,
+			block_end_pfn += pageblock_nr_pages) {
 
 		/*
 		 * This can potentially iterate a massively long zone with
@@ -1136,7 +1154,8 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 						&& compact_should_abort(cc))
 			break;
 
-		page = pageblock_pfn_to_page(low_pfn, end_pfn, zone);
+		page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn,
+									zone);
 		if (!page)
 			continue;
 
@@ -1155,8 +1174,8 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 
 		/* Perform the isolation */
 		isolate_start_pfn = low_pfn;
-		low_pfn = isolate_migratepages_block(cc, low_pfn, end_pfn,
-								isolate_mode);
+		low_pfn = isolate_migratepages_block(cc, low_pfn,
+						block_end_pfn, isolate_mode);
 
 		if (!low_pfn || cc->contended) {
 			acct_isolated(zone, cc);
-- 
1.9.1


* [PATCH v2 3/3] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  2016-02-04  6:19 [PATCH v2 1/3] mm/compaction: fix invalid free_pfn and compact_cached_free_pfn Joonsoo Kim
  2016-02-04  6:19 ` [PATCH v2 2/3] mm/compaction: pass only pageblock aligned range to pageblock_pfn_to_page Joonsoo Kim
@ 2016-02-04  6:19 ` Joonsoo Kim
  2016-02-05  0:49   ` Andrew Morton
  1 sibling, 1 reply; 15+ messages in thread
From: Joonsoo Kim @ 2016-02-04  6:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Aaron Lu, Mel Gorman, Rik van Riel,
	David Rientjes, linux-kernel, linux-mm, Joonsoo Kim

There is a report of a performance drop during hugepage allocation where
half of the cpu time is spent in pageblock_pfn_to_page() during
compaction [1]. In that workload, compaction is triggered to make a
hugepage, but most pageblocks are unavailable for compaction due to their
pageblock type and skip bit, so compaction usually fails. The most costly
operation in this case is finding a valid pageblock while scanning the
whole zone range. To check whether a pageblock is valid to compact, a
valid pfn within the pageblock is required, and we can obtain it by
calling pageblock_pfn_to_page(). This function checks whether the
pageblock is in a single zone and returns a valid pfn if possible. The
problem is that we need to do this check every time before scanning a
pageblock, even if we re-visit it, and this turns out to be very
expensive in this workload.

Although we have no way to skip this pageblock check on a system where
holes exist at arbitrary positions, we can cache whether the zone is
contiguous and just do pfn_to_page() on a system where no hole exists.
This optimization considerably speeds up the above workload.

Before vs After
Max: 1096 MB/s vs 1325 MB/s
Min: 635 MB/s vs 1015 MB/s
Avg: 899 MB/s vs 1194 MB/s

Avg is improved by roughly 30% [2].

[1]: http://www.spinics.net/lists/linux-mm/msg97378.html
[2]: https://lkml.org/lkml/2015/12/9/23
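
In essence, the fast path added by this patch behaves like the reduced
sketch below (zone_stub and the _slow/_stub names are simplified stand-ins
for the kernel types and helpers, not the code in the patch itself):

struct page;

struct zone_stub {
        int contiguous;         /* cached: the zone has no holes */
};

struct page *pfn_to_page_stub(unsigned long pfn);
struct page *pageblock_pfn_to_page_slow(unsigned long start_pfn,
                                        unsigned long end_pfn,
                                        struct zone_stub *zone);

static inline struct page *pageblock_pfn_to_page_cached(unsigned long start_pfn,
                                        unsigned long end_pfn,
                                        struct zone_stub *zone)
{
        /* fast path: no pfn_valid()/zone checks needed on a contiguous zone */
        if (zone->contiguous)
                return pfn_to_page_stub(start_pfn);

        /* slow path: validate the first and last pfn as before */
        return pageblock_pfn_to_page_slow(start_pfn, end_pfn, zone);
}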

v3
o remove pfn_valid_within() check for all pages in the pageblock
because pageblock_pfn_to_page() is only called with pageblock aligned pfn.

v2
o checking zone continuity after initialization
o handle memory-hotplug case

Reported and Tested-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 include/linux/gfp.h            |  6 ----
 include/linux/memory_hotplug.h |  3 ++
 include/linux/mmzone.h         |  2 ++
 mm/compaction.c                | 43 -----------------------
 mm/internal.h                  | 12 +++++++
 mm/memory_hotplug.c            |  9 +++++
 mm/page_alloc.c                | 79 +++++++++++++++++++++++++++++++++++++++++-
 7 files changed, 104 insertions(+), 50 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 28ad5f6..bd7fccc 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -515,13 +515,7 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
 void drain_all_pages(struct zone *zone);
 void drain_local_pages(struct zone *zone);
 
-#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 void page_alloc_init_late(void);
-#else
-static inline void page_alloc_init_late(void)
-{
-}
-#endif
 
 /*
  * gfp_allowed_mask is set to GFP_BOOT_MASK during early boot to restrict what
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 4340599..e960b78 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -196,6 +196,9 @@ void put_online_mems(void);
 void mem_hotplug_begin(void);
 void mem_hotplug_done(void);
 
+extern void set_zone_contiguous(struct zone *zone);
+extern void clear_zone_contiguous(struct zone *zone);
+
 #else /* ! CONFIG_MEMORY_HOTPLUG */
 /*
  * Stub functions for when hotplug is off
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 7b6c2cf..f12b950 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -520,6 +520,8 @@ struct zone {
 	bool			compact_blockskip_flush;
 #endif
 
+	bool			contiguous;
+
 	ZONE_PADDING(_pad3_)
 	/* Zone statistics */
 	atomic_long_t		vm_stat[NR_VM_ZONE_STAT_ITEMS];
diff --git a/mm/compaction.c b/mm/compaction.c
index 8ce36eb..93f71d9 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -71,49 +71,6 @@ static inline bool migrate_async_suitable(int migratetype)
 	return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
 }
 
-/*
- * Check that the whole (or subset of) a pageblock given by the interval of
- * [start_pfn, end_pfn) is valid and within the same zone, before scanning it
- * with the migration of free compaction scanner. The scanners then need to
- * use only pfn_valid_within() check for arches that allow holes within
- * pageblocks.
- *
- * Return struct page pointer of start_pfn, or NULL if checks were not passed.
- *
- * It's possible on some configurations to have a setup like node0 node1 node0
- * i.e. it's possible that all pages within a zones range of pages do not
- * belong to a single zone. We assume that a border between node0 and node1
- * can occur within a single pageblock, but not a node0 node1 node0
- * interleaving within a single pageblock. It is therefore sufficient to check
- * the first and last page of a pageblock and avoid checking each individual
- * page in a pageblock.
- */
-static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
-				unsigned long end_pfn, struct zone *zone)
-{
-	struct page *start_page;
-	struct page *end_page;
-
-	/* end_pfn is one past the range we are checking */
-	end_pfn--;
-
-	if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
-		return NULL;
-
-	start_page = pfn_to_page(start_pfn);
-
-	if (page_zone(start_page) != zone)
-		return NULL;
-
-	end_page = pfn_to_page(end_pfn);
-
-	/* This gives a shorter code than deriving page_zone(end_page) */
-	if (page_zone_id(start_page) != page_zone_id(end_page))
-		return NULL;
-
-	return start_page;
-}
-
 #ifdef CONFIG_COMPACTION
 
 /* Do not skip compaction more than 64 times */
diff --git a/mm/internal.h b/mm/internal.h
index 9006ce1..9609755 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -140,6 +140,18 @@ __find_buddy_index(unsigned long page_idx, unsigned int order)
 	return page_idx ^ (1 << order);
 }
 
+extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
+				unsigned long end_pfn, struct zone *zone);
+
+static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
+				unsigned long end_pfn, struct zone *zone)
+{
+	if (zone->contiguous)
+		return pfn_to_page(start_pfn);
+
+	return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
+}
+
 extern int __isolate_free_page(struct page *page, unsigned int order);
 extern void __free_pages_bootmem(struct page *page, unsigned long pfn,
 					unsigned int order);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 4af58a3..f06916c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -509,6 +509,8 @@ int __ref __add_pages(int nid, struct zone *zone, unsigned long phys_start_pfn,
 	int start_sec, end_sec;
 	struct vmem_altmap *altmap;
 
+	clear_zone_contiguous(zone);
+
 	/* during initialize mem_map, align hot-added range to section */
 	start_sec = pfn_to_section_nr(phys_start_pfn);
 	end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
@@ -540,6 +542,8 @@ int __ref __add_pages(int nid, struct zone *zone, unsigned long phys_start_pfn,
 	}
 	vmemmap_populate_print_last();
 
+	set_zone_contiguous(zone);
+
 	return err;
 }
 EXPORT_SYMBOL_GPL(__add_pages);
@@ -811,6 +815,8 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
 		}
 	}
 
+	clear_zone_contiguous(zone);
+
 	/*
 	 * We can only remove entire sections
 	 */
@@ -826,6 +832,9 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
 		if (ret)
 			break;
 	}
+
+	set_zone_contiguous(zone);
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(__remove_pages);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d60c860..466f7ff 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1255,9 +1255,13 @@ free_range:
 	pgdat_init_report_one_done();
 	return 0;
 }
+#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
 
 void __init page_alloc_init_late(void)
 {
+	struct zone *zone;
+
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 	int nid;
 
 	/* There will be num_node_state(N_MEMORY) threads */
@@ -1271,8 +1275,81 @@ void __init page_alloc_init_late(void)
 
 	/* Reinit limits that are based on free pages after the kernel is up */
 	files_maxfiles_init();
+#endif
+
+	for_each_populated_zone(zone)
+		set_zone_contiguous(zone);
+}
+
+/*
+ * Check that the whole (or subset of) a pageblock given by the interval of
+ * [start_pfn, end_pfn) is valid and within the same zone, before scanning it
+ * with the migration of free compaction scanner. The scanners then need to
+ * use only pfn_valid_within() check for arches that allow holes within
+ * pageblocks.
+ *
+ * Return struct page pointer of start_pfn, or NULL if checks were not passed.
+ *
+ * It's possible on some configurations to have a setup like node0 node1 node0
+ * i.e. it's possible that all pages within a zones range of pages do not
+ * belong to a single zone. We assume that a border between node0 and node1
+ * can occur within a single pageblock, but not a node0 node1 node0
+ * interleaving within a single pageblock. It is therefore sufficient to check
+ * the first and last page of a pageblock and avoid checking each individual
+ * page in a pageblock.
+ */
+struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
+				unsigned long end_pfn, struct zone *zone)
+{
+	struct page *start_page;
+	struct page *end_page;
+
+	/* end_pfn is one past the range we are checking */
+	end_pfn--;
+
+	if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
+		return NULL;
+
+	start_page = pfn_to_page(start_pfn);
+
+	if (page_zone(start_page) != zone)
+		return NULL;
+
+	end_page = pfn_to_page(end_pfn);
+
+	/* This gives a shorter code than deriving page_zone(end_page) */
+	if (page_zone_id(start_page) != page_zone_id(end_page))
+		return NULL;
+
+	return start_page;
+}
+
+void set_zone_contiguous(struct zone *zone)
+{
+	unsigned long block_start_pfn = zone->zone_start_pfn;
+	unsigned long block_end_pfn;
+	unsigned long pfn;
+
+	block_end_pfn = ALIGN(block_start_pfn + 1, pageblock_nr_pages);
+	for (; block_start_pfn < zone_end_pfn(zone);
+		block_start_pfn = block_end_pfn,
+		block_end_pfn += pageblock_nr_pages) {
+
+		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
+
+		if (!__pageblock_pfn_to_page(block_start_pfn,
+					block_end_pfn, zone))
+			return;
+	}
+
+	/* We confirm that there is no hole */
+	zone->contiguous = true;
+}
+
+void clear_zone_contiguous(struct zone *zone)
+{
+	zone->contiguous = false;
 }
-#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
 
 #ifdef CONFIG_CMA
 /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
-- 
1.9.1


* Re: [PATCH v2 3/3] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  2016-02-04  6:19 ` [PATCH v2 3/3] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous Joonsoo Kim
@ 2016-02-05  0:49   ` Andrew Morton
  2016-02-05 16:11     ` Joonsoo Kim
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2016-02-05  0:49 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Vlastimil Babka, Aaron Lu, Mel Gorman, Rik van Riel,
	David Rientjes, linux-kernel, linux-mm, Joonsoo Kim

On Thu,  4 Feb 2016 15:19:35 +0900 Joonsoo Kim <js1304@gmail.com> wrote:

> There is a performance drop report due to hugepage allocation and in there
> half of cpu time are spent on pageblock_pfn_to_page() in compaction [1].
> In that workload, compaction is triggered to make hugepage but most of
> pageblocks are un-available for compaction due to pageblock type and
> skip bit so compaction usually fails. Most costly operations in this case
> is to find valid pageblock while scanning whole zone range. To check
> if pageblock is valid to compact, valid pfn within pageblock is required
> and we can obtain it by calling pageblock_pfn_to_page(). This function
> checks whether pageblock is in a single zone and return valid pfn
> if possible. Problem is that we need to check it every time before
> scanning pageblock even if we re-visit it and this turns out to
> be very expensive in this workload.
> 
> Although we have no way to skip this pageblock check in the system
> where hole exists at arbitrary position, we can use cached value for
> zone continuity and just do pfn_to_page() in the system where hole doesn't
> exist. This optimization considerably speeds up in above workload.
> 
> Before vs After
> Max: 1096 MB/s vs 1325 MB/s
> Min: 635 MB/s 1015 MB/s
> Avg: 899 MB/s 1194 MB/s
> 
> Avg is improved by roughly 30% [2].
> 
> [1]: http://www.spinics.net/lists/linux-mm/msg97378.html
> [2]: https://lkml.org/lkml/2015/12/9/23
> 
> ...
>
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -196,6 +196,9 @@ void put_online_mems(void);
>  void mem_hotplug_begin(void);
>  void mem_hotplug_done(void);
>  
> +extern void set_zone_contiguous(struct zone *zone);
> +extern void clear_zone_contiguous(struct zone *zone);
> +
>  #else /* ! CONFIG_MEMORY_HOTPLUG */
>  /*
>   * Stub functions for when hotplug is off

Was it really intended that these declarations only exist if
CONFIG_MEMORY_HOTPLUG?  Seems unrelated.

The i386 allnoconfig build fails in predictable ways, so I fixed that up
as below, but it seems wrong.


From: Andrew Morton <akpm@linux-foundation.org>

Move CONFIG_MEMORY_HOTPLUG code into memory_hotplug.c, fix
CONFIG_MEMORY_HOTPLUG=n build.

Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/memory_hotplug.h |   12 +++++++++++-
 include/linux/mmzone.h         |    2 ++
 mm/memory_hotplug.c            |   27 +++++++++++++++++++++++++++
 mm/page_alloc.c                |   27 ---------------------------
 4 files changed, 40 insertions(+), 28 deletions(-)

diff -puN include/linux/memory_hotplug.h~mm-compaction-speed-up-pageblock_pfn_to_page-when-zone-is-contiguous-fix include/linux/memory_hotplug.h
--- a/include/linux/memory_hotplug.h~mm-compaction-speed-up-pageblock_pfn_to_page-when-zone-is-contiguous-fix
+++ a/include/linux/memory_hotplug.h
@@ -200,7 +200,10 @@ void mem_hotplug_done(void);
 
 extern void set_zone_contiguous(struct zone *zone);
 extern void clear_zone_contiguous(struct zone *zone);
-
+static inline bool zone_contiguous(struct zone *zone)
+{
+	return zone->contiguous;
+}
 #else /* ! CONFIG_MEMORY_HOTPLUG */
 /*
  * Stub functions for when hotplug is off
@@ -243,6 +246,13 @@ static inline void put_online_mems(void)
 static inline void mem_hotplug_begin(void) {}
 static inline void mem_hotplug_done(void) {}
 
+static inline void set_zone_contiguous(struct zone *zone) {}
+static inline void clear_zone_contiguous(struct zone *zone) {}
+static inline bool zone_contiguous(struct zone *zone)
+{
+	return false;
+}
+
 #endif /* ! CONFIG_MEMORY_HOTPLUG */
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
diff -puN include/linux/mmzone.h~mm-compaction-speed-up-pageblock_pfn_to_page-when-zone-is-contiguous-fix include/linux/mmzone.h
--- a/include/linux/mmzone.h~mm-compaction-speed-up-pageblock_pfn_to_page-when-zone-is-contiguous-fix
+++ a/include/linux/mmzone.h
@@ -522,7 +522,9 @@ struct zone {
 	bool			compact_blockskip_flush;
 #endif
 
+#ifdef CONFIG_MEMORY_HOTPLUG
 	bool			contiguous;
+#endif
 
 	ZONE_PADDING(_pad3_)
 	/* Zone statistics */
diff -puN mm/memory_hotplug.c~mm-compaction-speed-up-pageblock_pfn_to_page-when-zone-is-contiguous-fix mm/memory_hotplug.c
--- a/mm/memory_hotplug.c~mm-compaction-speed-up-pageblock_pfn_to_page-when-zone-is-contiguous-fix
+++ a/mm/memory_hotplug.c
@@ -130,6 +130,33 @@ void mem_hotplug_done(void)
 	memhp_lock_release();
 }
 
+void set_zone_contiguous(struct zone *zone)
+{
+	unsigned long block_start_pfn = zone->zone_start_pfn;
+	unsigned long block_end_pfn;
+	unsigned long pfn;
+
+	block_end_pfn = ALIGN(block_start_pfn + 1, pageblock_nr_pages);
+	for (; block_start_pfn < zone_end_pfn(zone);
+		block_start_pfn = block_end_pfn,
+		block_end_pfn += pageblock_nr_pages) {
+
+		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
+
+		if (!__pageblock_pfn_to_page(block_start_pfn,
+					block_end_pfn, zone))
+			return;
+	}
+
+	/* We confirm that there is no hole */
+	zone->contiguous = true;
+}
+
+void clear_zone_contiguous(struct zone *zone)
+{
+	zone->contiguous = false;
+}
+
 /* add this memory to iomem resource */
 static struct resource *register_memory_resource(u64 start, u64 size)
 {
diff -puN mm/page_alloc.c~mm-compaction-speed-up-pageblock_pfn_to_page-when-zone-is-contiguous-fix mm/page_alloc.c
--- a/mm/page_alloc.c~mm-compaction-speed-up-pageblock_pfn_to_page-when-zone-is-contiguous-fix
+++ a/mm/page_alloc.c
@@ -1347,33 +1347,6 @@ struct page *__pageblock_pfn_to_page(uns
 	return start_page;
 }
 
-void set_zone_contiguous(struct zone *zone)
-{
-	unsigned long block_start_pfn = zone->zone_start_pfn;
-	unsigned long block_end_pfn;
-	unsigned long pfn;
-
-	block_end_pfn = ALIGN(block_start_pfn + 1, pageblock_nr_pages);
-	for (; block_start_pfn < zone_end_pfn(zone);
-		block_start_pfn = block_end_pfn,
-		block_end_pfn += pageblock_nr_pages) {
-
-		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
-
-		if (!__pageblock_pfn_to_page(block_start_pfn,
-					block_end_pfn, zone))
-			return;
-	}
-
-	/* We confirm that there is no hole */
-	zone->contiguous = true;
-}
-
-void clear_zone_contiguous(struct zone *zone)
-{
-	zone->contiguous = false;
-}
-
 #ifdef CONFIG_CMA
 /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
 void __init init_cma_reserved_pageblock(struct page *page)
_


* Re: [PATCH v2 3/3] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  2016-02-05  0:49   ` Andrew Morton
@ 2016-02-05 16:11     ` Joonsoo Kim
  2016-02-09 17:58       ` Vlastimil Babka
  2016-02-14 10:21       ` zhong jiang
  0 siblings, 2 replies; 15+ messages in thread
From: Joonsoo Kim @ 2016-02-05 16:11 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Aaron Lu, Mel Gorman, Rik van Riel,
	David Rientjes, LKML, Linux Memory Management List, Joonsoo Kim

2016-02-05 9:49 GMT+09:00 Andrew Morton <akpm@linux-foundation.org>:
> On Thu,  4 Feb 2016 15:19:35 +0900 Joonsoo Kim <js1304@gmail.com> wrote:
>
>> There is a performance drop report due to hugepage allocation and in there
>> half of cpu time are spent on pageblock_pfn_to_page() in compaction [1].
>> In that workload, compaction is triggered to make hugepage but most of
>> pageblocks are un-available for compaction due to pageblock type and
>> skip bit so compaction usually fails. Most costly operations in this case
>> is to find valid pageblock while scanning whole zone range. To check
>> if pageblock is valid to compact, valid pfn within pageblock is required
>> and we can obtain it by calling pageblock_pfn_to_page(). This function
>> checks whether pageblock is in a single zone and return valid pfn
>> if possible. Problem is that we need to check it every time before
>> scanning pageblock even if we re-visit it and this turns out to
>> be very expensive in this workload.
>>
>> Although we have no way to skip this pageblock check in the system
>> where hole exists at arbitrary position, we can use cached value for
>> zone continuity and just do pfn_to_page() in the system where hole doesn't
>> exist. This optimization considerably speeds up in above workload.
>>
>> Before vs After
>> Max: 1096 MB/s vs 1325 MB/s
>> Min: 635 MB/s 1015 MB/s
>> Avg: 899 MB/s 1194 MB/s
>>
>> Avg is improved by roughly 30% [2].
>>
>> [1]: http://www.spinics.net/lists/linux-mm/msg97378.html
>> [2]: https://lkml.org/lkml/2015/12/9/23
>>
>> ...
>>
>> --- a/include/linux/memory_hotplug.h
>> +++ b/include/linux/memory_hotplug.h
>> @@ -196,6 +196,9 @@ void put_online_mems(void);
>>  void mem_hotplug_begin(void);
>>  void mem_hotplug_done(void);
>>
>> +extern void set_zone_contiguous(struct zone *zone);
>> +extern void clear_zone_contiguous(struct zone *zone);
>> +
>>  #else /* ! CONFIG_MEMORY_HOTPLUG */
>>  /*
>>   * Stub functions for when hotplug is off
>
> Was it really intended that these declarations only exist if
> CONFIG_MEMORY_HOTPLUG?  Seems unrelated.

These are called to cache whether the memory layout is contiguous or
not, so they are always called during memory initialization. Hotplug can
then change the memory layout, so they need to be called there, too.
That's why they are defined in page_alloc.c and declared only under
CONFIG_MEMORY_HOTPLUG.

> The i386 allnocofnig build fails in preditable ways so I fixed that up
> as below, but it seems wrong.

Yeah, it seems wrong to me. :)
Here goes fix.

----------->8------------
>From ed6add18bc361e00a7ac6746de6eeb62109e6416 Mon Sep 17 00:00:00 2001
From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Date: Thu, 10 Dec 2015 17:03:54 +0900
Subject: [PATCH] mm/compaction: speed up pageblock_pfn_to_page() when zone is
 contiguous

There is a report of a performance drop during hugepage allocation where
half of the cpu time is spent in pageblock_pfn_to_page() during
compaction [1]. In that workload, compaction is triggered to make a
hugepage, but most pageblocks are unavailable for compaction due to their
pageblock type and skip bit, so compaction usually fails. The most costly
operation in this case is finding a valid pageblock while scanning the
whole zone range. To check whether a pageblock is valid to compact, a
valid pfn within the pageblock is required, and we can obtain it by
calling pageblock_pfn_to_page(). This function checks whether the
pageblock is in a single zone and returns a valid pfn if possible. The
problem is that we need to do this check every time before scanning a
pageblock, even if we re-visit it, and this turns out to be very
expensive in this workload.

Although we have no way to skip this pageblock check on a system where
holes exist at arbitrary positions, we can cache whether the zone is
contiguous and just do pfn_to_page() on a system where no hole exists.
This optimization considerably speeds up the above workload.

Before vs After
Max: 1096 MB/s vs 1325 MB/s
Min: 635 MB/s vs 1015 MB/s
Avg: 899 MB/s vs 1194 MB/s

Avg is improved by roughly 30% [2].

[1]: http://www.spinics.net/lists/linux-mm/msg97378.html
[2]: https://lkml.org/lkml/2015/12/9/23

v3
o remove pfn_valid_within() check for all pages in the pageblock
because pageblock_pfn_to_page() is only called with pageblock aligned pfn.

v2
o checking zone continuity after initialization
o handle memory-hotplug case

Reported and Tested-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 include/linux/gfp.h            |  6 ----
 include/linux/memory_hotplug.h |  3 ++
 include/linux/mmzone.h         |  2 ++
 mm/compaction.c                | 43 -----------------------
 mm/internal.h                  | 12 +++++++
 mm/memory_hotplug.c            |  9 +++++
 mm/page_alloc.c                | 78 +++++++++++++++++++++++++++++++++++++++++-
 7 files changed, 103 insertions(+), 50 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 28ad5f6..bd7fccc 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -515,13 +515,7 @@ void drain_zone_pages(struct zone *zone, struct
per_cpu_pages *pcp);
 void drain_all_pages(struct zone *zone);
 void drain_local_pages(struct zone *zone);

-#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 void page_alloc_init_late(void);
-#else
-static inline void page_alloc_init_late(void)
-{
-}
-#endif

 /*
  * gfp_allowed_mask is set to GFP_BOOT_MASK during early boot to restrict what
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 4340599..e960b78 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -196,6 +196,9 @@ void put_online_mems(void);
 void mem_hotplug_begin(void);
 void mem_hotplug_done(void);

+extern void set_zone_contiguous(struct zone *zone);
+extern void clear_zone_contiguous(struct zone *zone);
+
 #else /* ! CONFIG_MEMORY_HOTPLUG */
 /*
  * Stub functions for when hotplug is off
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 7b6c2cf..f12b950 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -520,6 +520,8 @@ struct zone {
  bool compact_blockskip_flush;
 #endif

+ bool contiguous;
+
  ZONE_PADDING(_pad3_)
  /* Zone statistics */
  atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS];
diff --git a/mm/compaction.c b/mm/compaction.c
index 8ce36eb..93f71d9 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -71,49 +71,6 @@ static inline bool migrate_async_suitable(int migratetype)
  return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
 }

-/*
- * Check that the whole (or subset of) a pageblock given by the interval of
- * [start_pfn, end_pfn) is valid and within the same zone, before scanning it
- * with the migration of free compaction scanner. The scanners then need to
- * use only pfn_valid_within() check for arches that allow holes within
- * pageblocks.
- *
- * Return struct page pointer of start_pfn, or NULL if checks were not passed.
- *
- * It's possible on some configurations to have a setup like node0 node1 node0
- * i.e. it's possible that all pages within a zones range of pages do not
- * belong to a single zone. We assume that a border between node0 and node1
- * can occur within a single pageblock, but not a node0 node1 node0
- * interleaving within a single pageblock. It is therefore sufficient to check
- * the first and last page of a pageblock and avoid checking each individual
- * page in a pageblock.
- */
-static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
- unsigned long end_pfn, struct zone *zone)
-{
- struct page *start_page;
- struct page *end_page;
-
- /* end_pfn is one past the range we are checking */
- end_pfn--;
-
- if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
- return NULL;
-
- start_page = pfn_to_page(start_pfn);
-
- if (page_zone(start_page) != zone)
- return NULL;
-
- end_page = pfn_to_page(end_pfn);
-
- /* This gives a shorter code than deriving page_zone(end_page) */
- if (page_zone_id(start_page) != page_zone_id(end_page))
- return NULL;
-
- return start_page;
-}
-
 #ifdef CONFIG_COMPACTION

 /* Do not skip compaction more than 64 times */
diff --git a/mm/internal.h b/mm/internal.h
index 9006ce1..9609755 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -140,6 +140,18 @@ __find_buddy_index(unsigned long page_idx,
unsigned int order)
  return page_idx ^ (1 << order);
 }

+extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
+ unsigned long end_pfn, struct zone *zone);
+
+static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
+ unsigned long end_pfn, struct zone *zone)
+{
+ if (zone->contiguous)
+ return pfn_to_page(start_pfn);
+
+ return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
+}
+
 extern int __isolate_free_page(struct page *page, unsigned int order);
 extern void __free_pages_bootmem(struct page *page, unsigned long pfn,
  unsigned int order);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 4af58a3..f06916c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -509,6 +509,8 @@ int __ref __add_pages(int nid, struct zone *zone,
unsigned long phys_start_pfn,
  int start_sec, end_sec;
  struct vmem_altmap *altmap;

+ clear_zone_contiguous(zone);
+
  /* during initialize mem_map, align hot-added range to section */
  start_sec = pfn_to_section_nr(phys_start_pfn);
  end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
@@ -540,6 +542,8 @@ int __ref __add_pages(int nid, struct zone *zone,
unsigned long phys_start_pfn,
  }
  vmemmap_populate_print_last();

+ set_zone_contiguous(zone);
+
  return err;
 }
 EXPORT_SYMBOL_GPL(__add_pages);
@@ -811,6 +815,8 @@ int __remove_pages(struct zone *zone, unsigned
long phys_start_pfn,
  }
  }

+ clear_zone_contiguous(zone);
+
  /*
  * We can only remove entire sections
  */
@@ -826,6 +832,9 @@ int __remove_pages(struct zone *zone, unsigned
long phys_start_pfn,
  if (ret)
  break;
  }
+
+ set_zone_contiguous(zone);
+
  return ret;
 }
 EXPORT_SYMBOL_GPL(__remove_pages);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d60c860..059f9c0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1105,6 +1105,75 @@ void __init __free_pages_bootmem(struct page
*page, unsigned long pfn,
  return __free_pages_boot_core(page, pfn, order);
 }

+/*
+ * Check that the whole (or subset of) a pageblock given by the interval of
+ * [start_pfn, end_pfn) is valid and within the same zone, before scanning it
+ * with the migration of free compaction scanner. The scanners then need to
+ * use only pfn_valid_within() check for arches that allow holes within
+ * pageblocks.
+ *
+ * Return struct page pointer of start_pfn, or NULL if checks were not passed.
+ *
+ * It's possible on some configurations to have a setup like node0 node1 node0
+ * i.e. it's possible that all pages within a zones range of pages do not
+ * belong to a single zone. We assume that a border between node0 and node1
+ * can occur within a single pageblock, but not a node0 node1 node0
+ * interleaving within a single pageblock. It is therefore sufficient to check
+ * the first and last page of a pageblock and avoid checking each individual
+ * page in a pageblock.
+ */
+struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
+ unsigned long end_pfn, struct zone *zone)
+{
+ struct page *start_page;
+ struct page *end_page;
+
+ /* end_pfn is one past the range we are checking */
+ end_pfn--;
+
+ if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
+ return NULL;
+
+ start_page = pfn_to_page(start_pfn);
+
+ if (page_zone(start_page) != zone)
+ return NULL;
+
+ end_page = pfn_to_page(end_pfn);
+
+ /* This gives a shorter code than deriving page_zone(end_page) */
+ if (page_zone_id(start_page) != page_zone_id(end_page))
+ return NULL;
+
+ return start_page;
+}
+
+void set_zone_contiguous(struct zone *zone)
+{
+ unsigned long block_start_pfn = zone->zone_start_pfn;
+ unsigned long block_end_pfn;
+
+ block_end_pfn = ALIGN(block_start_pfn + 1, pageblock_nr_pages);
+ for (; block_start_pfn < zone_end_pfn(zone);
+ block_start_pfn = block_end_pfn,
+ block_end_pfn += pageblock_nr_pages) {
+
+ block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
+
+ if (!__pageblock_pfn_to_page(block_start_pfn,
+ block_end_pfn, zone))
+ return;
+ }
+
+ /* We confirm that there is no hole */
+ zone->contiguous = true;
+}
+
+void clear_zone_contiguous(struct zone *zone)
+{
+ zone->contiguous = false;
+}
+
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 static void __init deferred_free_range(struct page *page,
  unsigned long pfn, int nr_pages)
@@ -1255,9 +1324,13 @@ free_range:
  pgdat_init_report_one_done();
  return 0;
 }
+#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */

 void __init page_alloc_init_late(void)
 {
+ struct zone *zone;
+
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
  int nid;

  /* There will be num_node_state(N_MEMORY) threads */
@@ -1271,8 +1344,11 @@ void __init page_alloc_init_late(void)

  /* Reinit limits that are based on free pages after the kernel is up */
  files_maxfiles_init();
+#endif
+
+ for_each_populated_zone(zone)
+ set_zone_contiguous(zone);
 }
-#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */

 #ifdef CONFIG_CMA
 /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
-- 
1.9.1


* Re: [PATCH v2 3/3] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  2016-02-05 16:11     ` Joonsoo Kim
@ 2016-02-09 17:58       ` Vlastimil Babka
  2016-02-09 20:53         ` Andrew Morton
  2016-02-14 10:21       ` zhong jiang
  1 sibling, 1 reply; 15+ messages in thread
From: Vlastimil Babka @ 2016-02-09 17:58 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Aaron Lu, Mel Gorman, Rik van Riel, David Rientjes, LKML,
	Linux Memory Management List, Joonsoo Kim

On 02/05/2016 05:11 PM, Joonsoo Kim wrote:
> Yeah, it seems wrong to me. :)
> Here goes fix.

Doesn't apply for me, even after fixing the most obvious line wraps.
Seems like the version in mmotm is still your original patch and
Andrew's hotfix?


* Re: [PATCH v2 3/3] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  2016-02-09 17:58       ` Vlastimil Babka
@ 2016-02-09 20:53         ` Andrew Morton
  2016-02-10 13:42           ` Vlastimil Babka
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2016-02-09 20:53 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Joonsoo Kim, Aaron Lu, Mel Gorman, Rik van Riel, David Rientjes,
	LKML, Linux Memory Management List, Joonsoo Kim

On Tue, 9 Feb 2016 18:58:32 +0100 Vlastimil Babka <vbabka@suse.cz> wrote:

> On 02/05/2016 05:11 PM, Joonsoo Kim wrote:
> > Yeah, it seems wrong to me. :)
> > Here goes fix.
> 
> Doesn't apply for me, even after fixing the most obvious line wraps.
> Seems like the version in mmotm is still your original patch and
> Andrew's hotfix?

Yes, that patch was hopelessly mailer-mangled.  I painstakingly fixed
it up and generated the incremental:


From: Joonsoo Kim <js1304@gmail.com>
Subject: mm-compaction-speed-up-pageblock_pfn_to_page-when-zone-is-contiguous-v3

v3
o remove pfn_valid_within() check for all pages in the pageblock
because pageblock_pfn_to_page() is only called with pageblock aligned pfn.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reported-by: Aaron Lu <aaron.lu@intel.com>
Tested-by: Aaron Lu <aaron.lu@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_alloc.c |  139 ++++++++++++++++++++++------------------------
 1 file changed, 69 insertions(+), 70 deletions(-)

diff -puN mm/page_alloc.c~mm-compaction-speed-up-pageblock_pfn_to_page-when-zone-is-contiguous-v3 mm/page_alloc.c
--- a/mm/page_alloc.c~mm-compaction-speed-up-pageblock_pfn_to_page-when-zone-is-contiguous-v3
+++ a/mm/page_alloc.c
@@ -1128,6 +1128,75 @@ void __init __free_pages_bootmem(struct
 	return __free_pages_boot_core(page, pfn, order);
 }
 
+/*
+ * Check that the whole (or subset of) a pageblock given by the interval of
+ * [start_pfn, end_pfn) is valid and within the same zone, before scanning it
+ * with the migration of free compaction scanner. The scanners then need to
+ * use only pfn_valid_within() check for arches that allow holes within
+ * pageblocks.
+ *
+ * Return struct page pointer of start_pfn, or NULL if checks were not passed.
+ *
+ * It's possible on some configurations to have a setup like node0 node1 node0
+ * i.e. it's possible that all pages within a zones range of pages do not
+ * belong to a single zone. We assume that a border between node0 and node1
+ * can occur within a single pageblock, but not a node0 node1 node0
+ * interleaving within a single pageblock. It is therefore sufficient to check
+ * the first and last page of a pageblock and avoid checking each individual
+ * page in a pageblock.
+ */
+struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
+				     unsigned long end_pfn, struct zone *zone)
+{
+	struct page *start_page;
+	struct page *end_page;
+
+	/* end_pfn is one past the range we are checking */
+	end_pfn--;
+
+	if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
+		return NULL;
+
+	start_page = pfn_to_page(start_pfn);
+
+	if (page_zone(start_page) != zone)
+		return NULL;
+
+	end_page = pfn_to_page(end_pfn);
+
+	/* This gives a shorter code than deriving page_zone(end_page) */
+	if (page_zone_id(start_page) != page_zone_id(end_page))
+		return NULL;
+
+	return start_page;
+}
+
+void set_zone_contiguous(struct zone *zone)
+{
+	unsigned long block_start_pfn = zone->zone_start_pfn;
+	unsigned long block_end_pfn;
+
+	block_end_pfn = ALIGN(block_start_pfn + 1, pageblock_nr_pages);
+	for (; block_start_pfn < zone_end_pfn(zone);
+			block_start_pfn = block_end_pfn,
+			 block_end_pfn += pageblock_nr_pages) {
+
+		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
+
+		if (!__pageblock_pfn_to_page(block_start_pfn,
+					     block_end_pfn, zone))
+			return;
+	}
+
+	/* We confirm that there is no hole */
+	zone->contiguous = true;
+}
+
+void clear_zone_contiguous(struct zone *zone)
+{
+	zone->contiguous = false;
+}
+
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 static void __init deferred_free_range(struct page *page,
 					unsigned long pfn, int nr_pages)
@@ -1304,76 +1373,6 @@ void __init page_alloc_init_late(void)
 		set_zone_contiguous(zone);
 }
 
-/*
- * Check that the whole (or subset of) a pageblock given by the interval of
- * [start_pfn, end_pfn) is valid and within the same zone, before scanning it
- * with the migration of free compaction scanner. The scanners then need to
- * use only pfn_valid_within() check for arches that allow holes within
- * pageblocks.
- *
- * Return struct page pointer of start_pfn, or NULL if checks were not passed.
- *
- * It's possible on some configurations to have a setup like node0 node1 node0
- * i.e. it's possible that all pages within a zones range of pages do not
- * belong to a single zone. We assume that a border between node0 and node1
- * can occur within a single pageblock, but not a node0 node1 node0
- * interleaving within a single pageblock. It is therefore sufficient to check
- * the first and last page of a pageblock and avoid checking each individual
- * page in a pageblock.
- */
-struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
-				unsigned long end_pfn, struct zone *zone)
-{
-	struct page *start_page;
-	struct page *end_page;
-
-	/* end_pfn is one past the range we are checking */
-	end_pfn--;
-
-	if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
-		return NULL;
-
-	start_page = pfn_to_page(start_pfn);
-
-	if (page_zone(start_page) != zone)
-		return NULL;
-
-	end_page = pfn_to_page(end_pfn);
-
-	/* This gives a shorter code than deriving page_zone(end_page) */
-	if (page_zone_id(start_page) != page_zone_id(end_page))
-		return NULL;
-
-	return start_page;
-}
-
-void set_zone_contiguous(struct zone *zone)
-{
-	unsigned long block_start_pfn = zone->zone_start_pfn;
-	unsigned long block_end_pfn;
-	unsigned long pfn;
-
-	block_end_pfn = ALIGN(block_start_pfn + 1, pageblock_nr_pages);
-	for (; block_start_pfn < zone_end_pfn(zone);
-		block_start_pfn = block_end_pfn,
-		block_end_pfn += pageblock_nr_pages) {
-
-		block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
-
-		if (!__pageblock_pfn_to_page(block_start_pfn,
-					block_end_pfn, zone))
-			return;
-	}
-
-	/* We confirm that there is no hole */
-	zone->contiguous = true;
-}
-
-void clear_zone_contiguous(struct zone *zone)
-{
-	zone->contiguous = false;
-}
-
 #ifdef CONFIG_CMA
 /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
 void __init init_cma_reserved_pageblock(struct page *page)
_


* Re: [PATCH v2 2/3] mm/compaction: pass only pageblock aligned range to pageblock_pfn_to_page
  2016-02-04  6:19 ` [PATCH v2 2/3] mm/compaction: pass only pageblock aligned range to pageblock_pfn_to_page Joonsoo Kim
@ 2016-02-10 12:52   ` Vlastimil Babka
  0 siblings, 0 replies; 15+ messages in thread
From: Vlastimil Babka @ 2016-02-10 12:52 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Aaron Lu, Mel Gorman, Rik van Riel, David Rientjes, linux-kernel,
	linux-mm, Joonsoo Kim

On 02/04/2016 07:19 AM, Joonsoo Kim wrote:
> pageblock_pfn_to_page() is used to check there is valid pfn and all pages
> in the pageblock is in a single zone. If there is a hole in the pageblock,
> passing arbitrary position to pageblock_pfn_to_page() could cause to skip
> whole pageblock scanning, instead of just skipping the hole page. For
> deterministic behaviour, it's better to always pass pageblock aligned
> range to pageblock_pfn_to_page(). It will also help further optimization
> on pageblock_pfn_to_page() in the following patch.
> 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>


* Re: [PATCH v2 3/3] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  2016-02-09 20:53         ` Andrew Morton
@ 2016-02-10 13:42           ` Vlastimil Babka
  2016-02-10 18:58             ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Vlastimil Babka @ 2016-02-10 13:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Joonsoo Kim, Aaron Lu, Mel Gorman, Rik van Riel, David Rientjes,
	LKML, Linux Memory Management List, Joonsoo Kim

On 02/09/2016 09:53 PM, Andrew Morton wrote:
> On Tue, 9 Feb 2016 18:58:32 +0100 Vlastimil Babka <vbabka@suse.cz> wrote:
> 
>> On 02/05/2016 05:11 PM, Joonsoo Kim wrote:
>>> Yeah, it seems wrong to me. :)
>>> Here goes fix.
>>
>> Doesn't apply for me, even after fixing the most obvious line wraps.
>> Seems like the version in mmotm is still your original patch and
>> Andrew's hotfix?
> 
> Yes, that patch was hopelessly mailer-mangled.  I painstakingly fixed
> it up and generated the incremental:

Thanks a lot. My review of the final patch also involved pain (due to
the cold, not the patch!).

You can take my Acked-by, but I also find the definitions of
set_zone_contiguous/clear_zone_contiguous() "in the header of the
consumer" (hotplug) somewhat unusual. It works, but e.g. mm/internal.h
would be more expected.
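
For illustration only, that alternative would roughly look like the sketch
below; this is hypothetical and not a patch that was posted in this thread:

/* mm/internal.h -- hypothetical placement of the declarations */
extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
				unsigned long end_pfn, struct zone *zone);
extern void set_zone_contiguous(struct zone *zone);
extern void clear_zone_contiguous(struct zone *zone);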

Then there's this:

> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -509,6 +509,8 @@ int __ref __add_pages(int nid, struct zone *zone, unsigned long phys_start_pfn,
>  	int start_sec, end_sec;
>  	struct vmem_altmap *altmap;
>  
> +	clear_zone_contiguous(zone);
> +
>  	/* during initialize mem_map, align hot-added range to section */
>  	start_sec = pfn_to_section_nr(phys_start_pfn);
>  	end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
> @@ -540,6 +542,8 @@ int __ref __add_pages(int nid, struct zone *zone, unsigned long phys_start_pfn,
>  	}
>  	vmemmap_populate_print_last();
>  
> +	set_zone_contiguous(zone);
> +
>  	return err;
>  }
>  EXPORT_SYMBOL_GPL(__add_pages);

Between the clear and set, __add_pages() might return with -EINVAL,
leaving the flag cleared potentially forever. Not critical, probably
rare, but it should be possible to avoid this by moving the clear below
the altmap check?

Thanks,
Vlastimil


* Re: [PATCH v2 3/3] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  2016-02-10 13:42           ` Vlastimil Babka
@ 2016-02-10 18:58             ` Andrew Morton
  2016-02-11  1:58               ` Joonsoo Kim
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2016-02-10 18:58 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Joonsoo Kim, Aaron Lu, Mel Gorman, Rik van Riel, David Rientjes,
	LKML, Linux Memory Management List, Joonsoo Kim

On Wed, 10 Feb 2016 14:42:57 +0100 Vlastimil Babka <vbabka@suse.cz> wrote:

> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -509,6 +509,8 @@ int __ref __add_pages(int nid, struct zone *zone, unsigned long phys_start_pfn,
> >  	int start_sec, end_sec;
> >  	struct vmem_altmap *altmap;
> >  
> > +	clear_zone_contiguous(zone);
> > +
> >  	/* during initialize mem_map, align hot-added range to section */
> >  	start_sec = pfn_to_section_nr(phys_start_pfn);
> >  	end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
> > @@ -540,6 +542,8 @@ int __ref __add_pages(int nid, struct zone *zone, unsigned long phys_start_pfn,
> >  	}
> >  	vmemmap_populate_print_last();
> >  
> > +	set_zone_contiguous(zone);
> > +
> >  	return err;
> >  }
> >  EXPORT_SYMBOL_GPL(__add_pages);
> 
> Between the clear and set, __add_pages() might return with -EINVAL,
> leaving the flag cleared potentially forever. Not critical, probably
> rare, but it should be possible to avoid this by moving the clear below
> the altmap check?

um, yes.  return-in-the-middle-of-a-function strikes again.

--- a/mm/memory_hotplug.c~mm-compaction-speed-up-pageblock_pfn_to_page-when-zone-is-contiguous-fix
+++ a/mm/memory_hotplug.c
@@ -526,7 +526,8 @@ int __ref __add_pages(int nid, struct zo
 		if (altmap->base_pfn != phys_start_pfn
 				|| vmem_altmap_offset(altmap) > nr_pages) {
 			pr_warn_once("memory add fail, invalid altmap\n");
-			return -EINVAL;
+			err = -EINVAL;
+			goto out;
 		}
 		altmap->alloc = 0;
 	}
@@ -544,9 +545,8 @@ int __ref __add_pages(int nid, struct zo
 		err = 0;
 	}
 	vmemmap_populate_print_last();
-
+out:
 	set_zone_contiguous(zone);
-
 	return err;
 }
 EXPORT_SYMBOL_GPL(__add_pages);
_


* Re: [PATCH v2 3/3] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  2016-02-10 18:58             ` Andrew Morton
@ 2016-02-11  1:58               ` Joonsoo Kim
  0 siblings, 0 replies; 15+ messages in thread
From: Joonsoo Kim @ 2016-02-11  1:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Aaron Lu, Mel Gorman, Rik van Riel,
	David Rientjes, LKML, Linux Memory Management List, Joonsoo Kim

2016-02-11 3:58 GMT+09:00 Andrew Morton <akpm@linux-foundation.org>:
> On Wed, 10 Feb 2016 14:42:57 +0100 Vlastimil Babka <vbabka@suse.cz> wrote:
>
>> > --- a/mm/memory_hotplug.c
>> > +++ b/mm/memory_hotplug.c
>> > @@ -509,6 +509,8 @@ int __ref __add_pages(int nid, struct zone *zone, unsigned long phys_start_pfn,
>> >     int start_sec, end_sec;
>> >     struct vmem_altmap *altmap;
>> >
>> > +   clear_zone_contiguous(zone);
>> > +
>> >     /* during initialize mem_map, align hot-added range to section */
>> >     start_sec = pfn_to_section_nr(phys_start_pfn);
>> >     end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
>> > @@ -540,6 +542,8 @@ int __ref __add_pages(int nid, struct zone *zone, unsigned long phys_start_pfn,
>> >     }
>> >     vmemmap_populate_print_last();
>> >
>> > +   set_zone_contiguous(zone);
>> > +
>> >     return err;
>> >  }
>> >  EXPORT_SYMBOL_GPL(__add_pages);
>>
>> Between the clear and set, __add_pages() might return with -EINVAL,
>> leaving the flag cleared potentially forever. Not critical, probably
>> rare, but it should be possible to avoid this by moving the clear below
>> the altmap check?
>
> um, yes.  return-in-the-middle-of-a-function strikes again.
>
> --- a/mm/memory_hotplug.c~mm-compaction-speed-up-pageblock_pfn_to_page-when-zone-is-contiguous-fix
> +++ a/mm/memory_hotplug.c
> @@ -526,7 +526,8 @@ int __ref __add_pages(int nid, struct zo
>                 if (altmap->base_pfn != phys_start_pfn
>                                 || vmem_altmap_offset(altmap) > nr_pages) {
>                         pr_warn_once("memory add fail, invalid altmap\n");
> -                       return -EINVAL;
> +                       err = -EINVAL;
> +                       goto out;
>                 }
>                 altmap->alloc = 0;
>         }
> @@ -544,9 +545,8 @@ int __ref __add_pages(int nid, struct zo
>                 err = 0;
>         }
>         vmemmap_populate_print_last();
> -
> +out:
>         set_zone_contiguous(zone);
> -
>         return err;
>  }
>  EXPORT_SYMBOL_GPL(__add_pages);

Sorry for the late response. I was on the biggest holiday in Korea until now.
It seems that there is no issue left.
Andrew, Vlastimil, thanks for the fixes and review.

Thanks.


* Re: [PATCH v2 3/3] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  2016-02-05 16:11     ` Joonsoo Kim
  2016-02-09 17:58       ` Vlastimil Babka
@ 2016-02-14 10:21       ` zhong jiang
  2016-02-15  2:42         ` Joonsoo Kim
  1 sibling, 1 reply; 15+ messages in thread
From: zhong jiang @ 2016-02-14 10:21 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Vlastimil Babka, Aaron Lu, Mel Gorman,
	Rik van Riel, David Rientjes, LKML, Linux Memory Management List,
	Joonsoo Kim, Xishi Qiu

On 2016/2/6 0:11, Joonsoo Kim wrote:
> 2016-02-05 9:49 GMT+09:00 Andrew Morton <akpm@linux-foundation.org>:
>> On Thu,  4 Feb 2016 15:19:35 +0900 Joonsoo Kim <js1304@gmail.com> wrote:
>>
>>> There is a performance drop report due to hugepage allocation and in there
>>> half of cpu time are spent on pageblock_pfn_to_page() in compaction [1].
>>> In that workload, compaction is triggered to make hugepage but most of
>>> pageblocks are un-available for compaction due to pageblock type and
>>> skip bit so compaction usually fails. Most costly operations in this case
>>> is to find valid pageblock while scanning whole zone range. To check
>>> if pageblock is valid to compact, valid pfn within pageblock is required
>>> and we can obtain it by calling pageblock_pfn_to_page(). This function
>>> checks whether pageblock is in a single zone and return valid pfn
>>> if possible. Problem is that we need to check it every time before
>>> scanning pageblock even if we re-visit it and this turns out to
>>> be very expensive in this workload.
>>>
>>> Although we have no way to skip this pageblock check in the system
>>> where hole exists at arbitrary position, we can use cached value for
>>> zone continuity and just do pfn_to_page() in the system where hole doesn't
>>> exist. This optimization considerably speeds up in above workload.
>>>
>>> Before vs After
>>> Max: 1096 MB/s vs 1325 MB/s
>>> Min: 635 MB/s 1015 MB/s
>>> Avg: 899 MB/s 1194 MB/s
>>>
>>> Avg is improved by roughly 30% [2].
>>>
>>> [1]: http://www.spinics.net/lists/linux-mm/msg97378.html
>>> [2]: https://lkml.org/lkml/2015/12/9/23
>>>
>>> ...
>>>
>>> --- a/include/linux/memory_hotplug.h
>>> +++ b/include/linux/memory_hotplug.h
>>> @@ -196,6 +196,9 @@ void put_online_mems(void);
>>>  void mem_hotplug_begin(void);
>>>  void mem_hotplug_done(void);
>>>
>>> +extern void set_zone_contiguous(struct zone *zone);
>>> +extern void clear_zone_contiguous(struct zone *zone);
>>> +
>>>  #else /* ! CONFIG_MEMORY_HOTPLUG */
>>>  /*
>>>   * Stub functions for when hotplug is off
>>
>> Was it really intended that these declarations only exist if
>> CONFIG_MEMORY_HOTPLUG?  Seems unrelated.
> 
> These are called for caching memory layout whether it is contiguous
> or not. So, they are always called in memory initialization. Then,
> hotplug could change memory layout so they should be called
> there, too. So, they are defined in page_alloc.c and exported only
> if CONFIG_MEMORY_HOTPLUG.
> 
>> The i386 allnoconfig build fails in predictable ways so I fixed that up
>> as below, but it seems wrong.
> 
> Yeah, it seems wrong to me. :)
> Here goes fix.
> 
> ----------->8------------
>>From ed6add18bc361e00a7ac6746de6eeb62109e6416 Mon Sep 17 00:00:00 2001
> From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Date: Thu, 10 Dec 2015 17:03:54 +0900
> Subject: [PATCH] mm/compaction: speed up pageblock_pfn_to_page() when zone is
>  contiguous
> 
> There is a performance drop report due to hugepage allocation and in there
> half of cpu time are spent on pageblock_pfn_to_page() in compaction [1].
> In that workload, compaction is triggered to make hugepage but most of
> pageblocks are un-available for compaction due to pageblock type and
> skip bit so compaction usually fails. Most costly operations in this case
> is to find valid pageblock while scanning whole zone range. To check
> if pageblock is valid to compact, valid pfn within pageblock is required
> and we can obtain it by calling pageblock_pfn_to_page(). This function
> checks whether pageblock is in a single zone and return valid pfn
> if possible. Problem is that we need to check it every time before
> scanning pageblock even if we re-visit it and this turns out to
> be very expensive in this workload.
> 
> Although we have no way to skip this pageblock check in the system
> where hole exists at arbitrary position, we can use cached value for
> zone continuity and just do pfn_to_page() in the system where hole doesn't
> exist. This optimization considerably speeds up in above workload.
> 
> Before vs After
> Max: 1096 MB/s vs 1325 MB/s
> Min: 635 MB/s 1015 MB/s
> Avg: 899 MB/s 1194 MB/s
> 
> Avg is improved by roughly 30% [2].
> 
> [1]: http://www.spinics.net/lists/linux-mm/msg97378.html
> [2]: https://lkml.org/lkml/2015/12/9/23
> 
> v3
> o remove pfn_valid_within() check for all pages in the pageblock
> because pageblock_pfn_to_page() is only called with pageblock aligned pfn.

I have a question about the zone continuity. A hole can exist at an
arbitrary position within a pageblock, so pageblock_pfn_to_page() alone
is insufficient; whether the pfn is pageblock aligned or not, the
pfn_valid_within() check is still necessary.

e.g.: 120M-122M is the range of a pageblock, but 120.5M-121.5M is a hole;
drawing a conclusion from pageblock_pfn_to_page() alone is inaccurate.

Thanks
zhongjiang

> v2
> o checking zone continuity after initialization
> o handle memory-hotplug case
> 
> Reported and Tested-by: Aaron Lu <aaron.lu@intel.com>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  include/linux/gfp.h            |  6 ----
>  include/linux/memory_hotplug.h |  3 ++
>  include/linux/mmzone.h         |  2 ++
>  mm/compaction.c                | 43 -----------------------
>  mm/internal.h                  | 12 +++++++
>  mm/memory_hotplug.c            |  9 +++++
>  mm/page_alloc.c                | 78 +++++++++++++++++++++++++++++++++++++++++-
>  7 files changed, 103 insertions(+), 50 deletions(-)
> 
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 28ad5f6..bd7fccc 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -515,13 +515,7 @@ void drain_zone_pages(struct zone *zone, struct
> per_cpu_pages *pcp);
>  void drain_all_pages(struct zone *zone);
>  void drain_local_pages(struct zone *zone);
> 
> -#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
>  void page_alloc_init_late(void);
> -#else
> -static inline void page_alloc_init_late(void)
> -{
> -}
> -#endif
> 
>  /*
>   * gfp_allowed_mask is set to GFP_BOOT_MASK during early boot to restrict what
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 4340599..e960b78 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -196,6 +196,9 @@ void put_online_mems(void);
>  void mem_hotplug_begin(void);
>  void mem_hotplug_done(void);
> 
> +extern void set_zone_contiguous(struct zone *zone);
> +extern void clear_zone_contiguous(struct zone *zone);
> +
>  #else /* ! CONFIG_MEMORY_HOTPLUG */
>  /*
>   * Stub functions for when hotplug is off
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 7b6c2cf..f12b950 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -520,6 +520,8 @@ struct zone {
>   bool compact_blockskip_flush;
>  #endif
> 
> + bool contiguous;
> +
>   ZONE_PADDING(_pad3_)
>   /* Zone statistics */
>   atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS];
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 8ce36eb..93f71d9 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -71,49 +71,6 @@ static inline bool migrate_async_suitable(int migratetype)
>   return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
>  }
> 
> -/*
> - * Check that the whole (or subset of) a pageblock given by the interval of
> - * [start_pfn, end_pfn) is valid and within the same zone, before scanning it
> - * with the migration of free compaction scanner. The scanners then need to
> - * use only pfn_valid_within() check for arches that allow holes within
> - * pageblocks.
> - *
> - * Return struct page pointer of start_pfn, or NULL if checks were not passed.
> - *
> - * It's possible on some configurations to have a setup like node0 node1 node0
> - * i.e. it's possible that all pages within a zones range of pages do not
> - * belong to a single zone. We assume that a border between node0 and node1
> - * can occur within a single pageblock, but not a node0 node1 node0
> - * interleaving within a single pageblock. It is therefore sufficient to check
> - * the first and last page of a pageblock and avoid checking each individual
> - * page in a pageblock.
> - */
> -static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
> - unsigned long end_pfn, struct zone *zone)
> -{
> - struct page *start_page;
> - struct page *end_page;
> -
> - /* end_pfn is one past the range we are checking */
> - end_pfn--;
> -
> - if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
> - return NULL;
> -
> - start_page = pfn_to_page(start_pfn);
> -
> - if (page_zone(start_page) != zone)
> - return NULL;
> -
> - end_page = pfn_to_page(end_pfn);
> -
> - /* This gives a shorter code than deriving page_zone(end_page) */
> - if (page_zone_id(start_page) != page_zone_id(end_page))
> - return NULL;
> -
> - return start_page;
> -}
> -
>  #ifdef CONFIG_COMPACTION
> 
>  /* Do not skip compaction more than 64 times */
> diff --git a/mm/internal.h b/mm/internal.h
> index 9006ce1..9609755 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -140,6 +140,18 @@ __find_buddy_index(unsigned long page_idx,
> unsigned int order)
>   return page_idx ^ (1 << order);
>  }
> 
> +extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
> + unsigned long end_pfn, struct zone *zone);
> +
> +static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
> + unsigned long end_pfn, struct zone *zone)
> +{
> + if (zone->contiguous)
> + return pfn_to_page(start_pfn);
> +
> + return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
> +}
> +
>  extern int __isolate_free_page(struct page *page, unsigned int order);
>  extern void __free_pages_bootmem(struct page *page, unsigned long pfn,
>   unsigned int order);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 4af58a3..f06916c 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -509,6 +509,8 @@ int __ref __add_pages(int nid, struct zone *zone,
> unsigned long phys_start_pfn,
>   int start_sec, end_sec;
>   struct vmem_altmap *altmap;
> 
> + clear_zone_contiguous(zone);
> +
>   /* during initialize mem_map, align hot-added range to section */
>   start_sec = pfn_to_section_nr(phys_start_pfn);
>   end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
> @@ -540,6 +542,8 @@ int __ref __add_pages(int nid, struct zone *zone,
> unsigned long phys_start_pfn,
>   }
>   vmemmap_populate_print_last();
> 
> + set_zone_contiguous(zone);
> +
>   return err;
>  }
>  EXPORT_SYMBOL_GPL(__add_pages);
> @@ -811,6 +815,8 @@ int __remove_pages(struct zone *zone, unsigned
> long phys_start_pfn,
>   }
>   }
> 
> + clear_zone_contiguous(zone);
> +
>   /*
>   * We can only remove entire sections
>   */
> @@ -826,6 +832,9 @@ int __remove_pages(struct zone *zone, unsigned
> long phys_start_pfn,
>   if (ret)
>   break;
>   }
> +
> + set_zone_contiguous(zone);
> +
>   return ret;
>  }
>  EXPORT_SYMBOL_GPL(__remove_pages);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d60c860..059f9c0 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1105,6 +1105,75 @@ void __init __free_pages_bootmem(struct page
> *page, unsigned long pfn,
>   return __free_pages_boot_core(page, pfn, order);
>  }
> 
> +/*
> + * Check that the whole (or subset of) a pageblock given by the interval of
> + * [start_pfn, end_pfn) is valid and within the same zone, before scanning it
> + * with the migration of free compaction scanner. The scanners then need to
> + * use only pfn_valid_within() check for arches that allow holes within
> + * pageblocks.
> + *
> + * Return struct page pointer of start_pfn, or NULL if checks were not passed.
> + *
> + * It's possible on some configurations to have a setup like node0 node1 node0
> + * i.e. it's possible that all pages within a zones range of pages do not
> + * belong to a single zone. We assume that a border between node0 and node1
> + * can occur within a single pageblock, but not a node0 node1 node0
> + * interleaving within a single pageblock. It is therefore sufficient to check
> + * the first and last page of a pageblock and avoid checking each individual
> + * page in a pageblock.
> + */
> +struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
> + unsigned long end_pfn, struct zone *zone)
> +{
> + struct page *start_page;
> + struct page *end_page;
> +
> + /* end_pfn is one past the range we are checking */
> + end_pfn--;
> +
> + if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
> + return NULL;
> +
> + start_page = pfn_to_page(start_pfn);
> +
> + if (page_zone(start_page) != zone)
> + return NULL;
> +
> + end_page = pfn_to_page(end_pfn);
> +
> + /* This gives a shorter code than deriving page_zone(end_page) */
> + if (page_zone_id(start_page) != page_zone_id(end_page))
> + return NULL;
> +
> + return start_page;
> +}
> +
> +void set_zone_contiguous(struct zone *zone)
> +{
> + unsigned long block_start_pfn = zone->zone_start_pfn;
> + unsigned long block_end_pfn;
> +
> + block_end_pfn = ALIGN(block_start_pfn + 1, pageblock_nr_pages);
> + for (; block_start_pfn < zone_end_pfn(zone);
> + block_start_pfn = block_end_pfn,
> + block_end_pfn += pageblock_nr_pages) {
> +
> + block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
> +
> + if (!__pageblock_pfn_to_page(block_start_pfn,
> + block_end_pfn, zone))
> + return;
> + }
> +
> + /* We confirm that there is no hole */
> + zone->contiguous = true;
> +}
> +
> +void clear_zone_contiguous(struct zone *zone)
> +{
> + zone->contiguous = false;
> +}
> +
>  #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
>  static void __init deferred_free_range(struct page *page,
>   unsigned long pfn, int nr_pages)
> @@ -1255,9 +1324,13 @@ free_range:
>   pgdat_init_report_one_done();
>   return 0;
>  }
> +#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
> 
>  void __init page_alloc_init_late(void)
>  {
> + struct zone *zone;
> +
> +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
>   int nid;
> 
>   /* There will be num_node_state(N_MEMORY) threads */
> @@ -1271,8 +1344,11 @@ void __init page_alloc_init_late(void)
> 
>   /* Reinit limits that are based on free pages after the kernel is up */
>   files_maxfiles_init();
> +#endif
> +
> + for_each_populated_zone(zone)
> + set_zone_contiguous(zone);
>  }
> -#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
> 
>  #ifdef CONFIG_CMA
>  /* Free whole pageblock and set its migration type to MIGRATE_CMA. */

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  2016-02-14 10:21       ` zhong jiang
@ 2016-02-15  2:42         ` Joonsoo Kim
  2016-02-15 10:06           ` Xishi Qiu
  0 siblings, 1 reply; 15+ messages in thread
From: Joonsoo Kim @ 2016-02-15  2:42 UTC (permalink / raw)
  To: zhong jiang
  Cc: Andrew Morton, Vlastimil Babka, Aaron Lu, Mel Gorman,
	Rik van Riel, David Rientjes, LKML, Linux Memory Management List,
	Xishi Qiu

On Sun, Feb 14, 2016 at 06:21:03PM +0800, zhong jiang wrote:
> On 2016/2/6 0:11, Joonsoo Kim wrote:
> > 2016-02-05 9:49 GMT+09:00 Andrew Morton <akpm@linux-foundation.org>:
> >> On Thu,  4 Feb 2016 15:19:35 +0900 Joonsoo Kim <js1304@gmail.com> wrote:
> >>
> >>> There is a performance drop report due to hugepage allocation and in there
> >>> half of cpu time are spent on pageblock_pfn_to_page() in compaction [1].
> >>> In that workload, compaction is triggered to make hugepage but most of
> >>> pageblocks are un-available for compaction due to pageblock type and
> >>> skip bit so compaction usually fails. Most costly operations in this case
> >>> is to find valid pageblock while scanning whole zone range. To check
> >>> if pageblock is valid to compact, valid pfn within pageblock is required
> >>> and we can obtain it by calling pageblock_pfn_to_page(). This function
> >>> checks whether pageblock is in a single zone and return valid pfn
> >>> if possible. Problem is that we need to check it every time before
> >>> scanning pageblock even if we re-visit it and this turns out to
> >>> be very expensive in this workload.
> >>>
> >>> Although we have no way to skip this pageblock check in the system
> >>> where hole exists at arbitrary position, we can use cached value for
> >>> zone continuity and just do pfn_to_page() in the system where hole doesn't
> >>> exist. This optimization considerably speeds up in above workload.
> >>>
> >>> Before vs After
> >>> Max: 1096 MB/s vs 1325 MB/s
> >>> Min: 635 MB/s 1015 MB/s
> >>> Avg: 899 MB/s 1194 MB/s
> >>>
> >>> Avg is improved by roughly 30% [2].
> >>>
> >>> [1]: http://www.spinics.net/lists/linux-mm/msg97378.html
> >>> [2]: https://lkml.org/lkml/2015/12/9/23
> >>>
> >>> ...
> >>>
> >>> --- a/include/linux/memory_hotplug.h
> >>> +++ b/include/linux/memory_hotplug.h
> >>> @@ -196,6 +196,9 @@ void put_online_mems(void);
> >>>  void mem_hotplug_begin(void);
> >>>  void mem_hotplug_done(void);
> >>>
> >>> +extern void set_zone_contiguous(struct zone *zone);
> >>> +extern void clear_zone_contiguous(struct zone *zone);
> >>> +
> >>>  #else /* ! CONFIG_MEMORY_HOTPLUG */
> >>>  /*
> >>>   * Stub functions for when hotplug is off
> >>
> >> Was it really intended that these declarations only exist if
> >> CONFIG_MEMORY_HOTPLUG?  Seems unrelated.
> > 
> > These are called for caching memory layout whether it is contiguous
> > or not. So, they are always called in memory initialization. Then,
> > hotplug could change memory layout so they should be called
> > there, too. So, they are defined in page_alloc.c and exported only
> > if CONFIG_MEMORY_HOTPLUG.
> > 
> >> The i386 allnoconfig build fails in predictable ways so I fixed that up
> >> as below, but it seems wrong.
> > 
> > Yeah, it seems wrong to me. :)
> > Here goes fix.
> > 
> > ----------->8------------
> >>From ed6add18bc361e00a7ac6746de6eeb62109e6416 Mon Sep 17 00:00:00 2001
> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > Date: Thu, 10 Dec 2015 17:03:54 +0900
> > Subject: [PATCH] mm/compaction: speed up pageblock_pfn_to_page() when zone is
> >  contiguous
> > 
> > There is a performance drop report due to hugepage allocation and in there
> > half of cpu time are spent on pageblock_pfn_to_page() in compaction [1].
> > In that workload, compaction is triggered to make hugepage but most of
> > pageblocks are un-available for compaction due to pageblock type and
> > skip bit so compaction usually fails. Most costly operations in this case
> > is to find valid pageblock while scanning whole zone range. To check
> > if pageblock is valid to compact, valid pfn within pageblock is required
> > and we can obtain it by calling pageblock_pfn_to_page(). This function
> > checks whether pageblock is in a single zone and return valid pfn
> > if possible. Problem is that we need to check it every time before
> > scanning pageblock even if we re-visit it and this turns out to
> > be very expensive in this workload.
> > 
> > Although we have no way to skip this pageblock check in the system
> > where hole exists at arbitrary position, we can use cached value for
> > zone continuity and just do pfn_to_page() in the system where hole doesn't
> > exist. This optimization considerably speeds up in above workload.
> > 
> > Before vs After
> > Max: 1096 MB/s vs 1325 MB/s
> > Min: 635 MB/s 1015 MB/s
> > Avg: 899 MB/s 1194 MB/s
> > 
> > Avg is improved by roughly 30% [2].
> > 
> > [1]: http://www.spinics.net/lists/linux-mm/msg97378.html
> > [2]: https://lkml.org/lkml/2015/12/9/23
> > 
> > v3
> > o remove pfn_valid_within() check for all pages in the pageblock
> > because pageblock_pfn_to_page() is only called with pageblock aligned pfn.
> 
> I have a question about the zone continuity. A hole can exist at an
> arbitrary position within a pageblock, so pageblock_pfn_to_page() alone
> is insufficient; whether the pfn is pageblock aligned or not, the
> pfn_valid_within() check is still necessary.
> 
> e.g.: 120M-122M is the range of a pageblock, but 120.5M-121.5M is a hole;
> drawing a conclusion from pageblock_pfn_to_page() alone is inaccurate.

"contiguous" may be a misleading word. It doesn't mean that there are no
holes. It only means that every pageblock within the zone span belongs to
the corresponding zone and that the validity of every pageblock aligned
pfn has been checked. So, if it is set, we can safely call pfn_to_page()
for a pageblock aligned pfn in that zone without checking pfn_valid().
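
To make that concrete, a rough, hypothetical sketch of how a
compaction-style caller is expected to use the flag is below.
scan_one_pageblock() is made up for illustration; pageblock_pfn_to_page(),
pfn_valid_within() and pageblock_nr_pages are the existing helpers from
the patch and the kernel. The pageblock endpoints may skip pfn_valid()
once zone->contiguous is set, but holes inside the pageblock remain the
scanner's problem.

/* Illustrative only: how the contiguous flag is meant to be used. */
static void scan_one_pageblock(struct zone *zone, unsigned long block_start_pfn)
{
	unsigned long block_end_pfn = block_start_pfn + pageblock_nr_pages;
	unsigned long pfn;
	struct page *page;

	/*
	 * When zone->contiguous is set, every pageblock aligned pfn in the
	 * zone is known to be valid and in this zone, so the inline helper
	 * reduces to pfn_to_page(block_start_pfn).
	 */
	page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn, zone);
	if (!page)
		return;	/* block straddles a zone boundary or a hole */

	for (pfn = block_start_pfn; pfn < block_end_pfn; pfn++) {
		/* Holes *inside* the pageblock are still checked here. */
		if (!pfn_valid_within(pfn))
			continue;
		/* ... inspect pfn_to_page(pfn) ... */
	}
}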

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  2016-02-15  2:42         ` Joonsoo Kim
@ 2016-02-15 10:06           ` Xishi Qiu
  2016-02-15 14:24             ` Joonsoo Kim
  0 siblings, 1 reply; 15+ messages in thread
From: Xishi Qiu @ 2016-02-15 10:06 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: zhong jiang, Andrew Morton, Vlastimil Babka, Aaron Lu, Mel Gorman,
	Rik van Riel, David Rientjes, LKML, Linux Memory Management List

On 2016/2/15 10:42, Joonsoo Kim wrote:

>>
>> I have a question about the zone continuity. A hole can exist at an
>> arbitrary position within a pageblock, so pageblock_pfn_to_page() alone
>> is insufficient; whether the pfn is pageblock aligned or not, the
>> pfn_valid_within() check is still necessary.
>>
>> e.g.: 120M-122M is the range of a pageblock, but 120.5M-121.5M is a hole;
>> drawing a conclusion from pageblock_pfn_to_page() alone is inaccurate.
> 
> "contiguous" may be a misleading word. It doesn't mean that there are no
> holes. It only means that every pageblock within the zone span belongs to
> the corresponding zone and that the validity of every pageblock aligned
> pfn has been checked. So, if it is set, we can safely call pfn_to_page()
> for a pageblock aligned pfn in that zone without checking pfn_valid().
> 
> Thanks.
> 

Hi Joonsoo,

So "contiguous" here only means that the struct page exists, and doesn't
care whether the memory itself exists, right?

Thanks,
Xishi Qiu

> 
> .
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
  2016-02-15 10:06           ` Xishi Qiu
@ 2016-02-15 14:24             ` Joonsoo Kim
  0 siblings, 0 replies; 15+ messages in thread
From: Joonsoo Kim @ 2016-02-15 14:24 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Joonsoo Kim, zhong jiang, Andrew Morton, Vlastimil Babka,
	Aaron Lu, Mel Gorman, Rik van Riel, David Rientjes, LKML,
	Linux Memory Management List

2016-02-15 19:06 GMT+09:00 Xishi Qiu <qiuxishi@huawei.com>:
> On 2016/2/15 10:42, Joonsoo Kim wrote:
>
>>>
>>> I have a question about the zone continuity. A hole can exist at an
>>> arbitrary position within a pageblock, so pageblock_pfn_to_page() alone
>>> is insufficient; whether the pfn is pageblock aligned or not, the
>>> pfn_valid_within() check is still necessary.
>>>
>>> e.g.: 120M-122M is the range of a pageblock, but 120.5M-121.5M is a hole;
>>> drawing a conclusion from pageblock_pfn_to_page() alone is inaccurate.
>>
>> "contiguous" may be a misleading word. It doesn't mean that there are no
>> holes. It only means that every pageblock within the zone span belongs to
>> the corresponding zone and that the validity of every pageblock aligned
>> pfn has been checked. So, if it is set, we can safely call pfn_to_page()
>> for a pageblock aligned pfn in that zone without checking pfn_valid().
>>
>> Thanks.
>>
>
> Hi Joonsoo,
>
> So "contiguous" here only means that the struct page exists, and doesn't
> care whether the memory itself exists, right?

Yes.

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-02-15 14:24 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-04  6:19 [PATCH v2 1/3] mm/compaction: fix invalid free_pfn and compact_cached_free_pfn Joonsoo Kim
2016-02-04  6:19 ` [PATCH v2 2/3] mm/compaction: pass only pageblock aligned range to pageblock_pfn_to_page Joonsoo Kim
2016-02-10 12:52   ` Vlastimil Babka
2016-02-04  6:19 ` [PATCH v2 3/3] mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous Joonsoo Kim
2016-02-05  0:49   ` Andrew Morton
2016-02-05 16:11     ` Joonsoo Kim
2016-02-09 17:58       ` Vlastimil Babka
2016-02-09 20:53         ` Andrew Morton
2016-02-10 13:42           ` Vlastimil Babka
2016-02-10 18:58             ` Andrew Morton
2016-02-11  1:58               ` Joonsoo Kim
2016-02-14 10:21       ` zhong jiang
2016-02-15  2:42         ` Joonsoo Kim
2016-02-15 10:06           ` Xishi Qiu
2016-02-15 14:24             ` Joonsoo Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).