* [PATCH 0/6] Assorted compaction cleanups and optimizations
@ 2015-06-10  9:32 ` Vlastimil Babka
  0 siblings, 0 replies; 50+ messages in thread
From: Vlastimil Babka @ 2015-06-10  9:32 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: linux-kernel, Vlastimil Babka, Christoph Lameter, David Rientjes,
	Joonsoo Kim, Mel Gorman, Michal Nazarewicz, Minchan Kim,
	Naoya Horiguchi, Rik van Riel

This series is partly the cleanups that were posted as part of the RFC for
changing initial scanning positions [1] and partly new, relatively simple
scanner optimizations (yes, they are still possible). I've resumed working
on the bigger scanner changes, but that will take a while, so there's no
point in delaying these smaller patches.

The interesting patches are 4 and 5, and to some extent patch 6. In patch 4,
skipping of compound pages in a single iteration is improved for the migration
scanner, so it also works for !PageLRU compound pages such as hugetlbfs, slab,
etc. Patch 5 introduces this kind of skipping in the free scanner. The trick
is that we can read compound_order() without any protection, if we are careful
to filter out values that are not below MAX_ORDER. The only danger is that we
skip too much. The same trick was already used for reading the freepage order
in the migrate scanner.
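
A minimal sketch of the idea (illustrative only; the helper name is made up
and the real hunks are in patches 4 and 5 below):

	static unsigned long skip_compound_pages(struct page *page,
						 unsigned long pfn)
	{
		if (PageCompound(page)) {
			/* Racy read, not protected by lru_lock. */
			unsigned int order = compound_order(page);

			/* Accept only sane values; at worst we skip too far. */
			if (order > 0 && order < MAX_ORDER)
				pfn += (1UL << order) - 1;
		}
		return pfn;
	}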

Patch 6 avoids some rescanning when compaction restarts from cached scanner
positions, but the benefits are small enough to be lost in the noise.

To demonstrate the improvements of patches 4 and 5, I've run stress-highalloc
from mmtests, set to simulate THP allocations (including __GFP_COMP) on a 4GB
system where 1GB was occupied by hugetlbfs pages. I'll include just the
relevant stats:

                               Patch 3     Patch 4     Patch 5     Patch 6

Compaction stalls                 7523        7529        7515        7495
Compaction success                 323         304         322         289
Compaction failures               7200        7224        7192        7206
Page migrate success            247778      264395      240737      248956
Page migrate failure             15358       33184       21621       23657
Compaction pages isolated       906928      980192      909983      958044
Compaction migrate scanned     2005277     1692805     1498800     1750952
Compaction free scanned       13255284    11539986     9011276     9703018
Compaction cost                    288         305         277         289

With 5 iterations per patch, the results are still noisy, but we can see that
Patch 4 does reduce migrate_scanned by 15%, thanks to skipping the hugetlbfs
pages at once. Interestingly, free_scanned is also reduced, and I have no idea
why. Patch 5 further reduces free_scanned as expected, by 15%. Other stats
are unaffected modulo noise. Patch 6 looks like a regression, but I believe
it's just noise. I've verified, using tracepoints, that compaction now
restarts from the exact pfns where it left off.

[1] https://lkml.org/lkml/2015/1/19/158

Vlastimil Babka (6):
  mm, compaction: more robust check for scanners meeting
  mm, compaction: simplify handling restart position in free pages
    scanner
  mm, compaction: encapsulate resetting cached scanner positions
  mm, compaction: always skip compound pages by order in migrate scanner
  mm, compaction: skip compound pages by order in free scanner
  mm, compaction: decouple updating pageblock_skip and cached pfn

 mm/compaction.c | 188 ++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 115 insertions(+), 73 deletions(-)

-- 
2.1.4


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/6] mm, compaction: more robust check for scanners meeting
  2015-06-10  9:32 ` Vlastimil Babka
@ 2015-06-10  9:32   ` Vlastimil Babka
  -1 siblings, 0 replies; 50+ messages in thread
From: Vlastimil Babka @ 2015-06-10  9:32 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: linux-kernel, Vlastimil Babka, Minchan Kim, Mel Gorman,
	Joonsoo Kim, Michal Nazarewicz, Naoya Horiguchi,
	Christoph Lameter, Rik van Riel, David Rientjes

Compaction should finish when the migration and free scanners meet, i.e. when
they reach the same pageblock. Currently, however, the test in
compact_finished() simply compares the exact pfns, which may yield a false
negative when the free scanner position is in the middle of a pageblock and
the migration scanner reaches the beginning of the same pageblock.
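
For illustration, a standalone sketch with made-up pfns (assuming
pageblock_order == 9, i.e. 2MB pageblocks with 4kB pages):

	#include <stdio.h>
	#include <stdbool.h>

	#define PAGEBLOCK_ORDER	9	/* assumed value, see above */

	/* The check this patch introduces as compact_scanners_met(). */
	static bool scanners_met(unsigned long free_pfn,
				 unsigned long migrate_pfn)
	{
		return (free_pfn >> PAGEBLOCK_ORDER)
			<= (migrate_pfn >> PAGEBLOCK_ORDER);
	}

	int main(void)
	{
		unsigned long migrate_pfn = 0x12000; /* start of a pageblock */
		unsigned long free_pfn = 0x12100;    /* middle of the same one */

		/* Old test: exact comparison misses that the scanners met. */
		printf("exact pfns met: %d\n", free_pfn <= migrate_pfn ? 1 : 0);
		/* New test: comparing pageblock indexes detects the meeting. */
		printf("pageblocks met: %d\n",
		       scanners_met(free_pfn, migrate_pfn) ? 1 : 0);
		return 0;
	}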

This hasn't been a problem until commit e14c720efdd7 ("mm, compaction: remember
position within pageblock in free pages scanner") allowed the free scanner
position to be in the middle of a pageblock between invocations.  The hot-fix
1d5bfe1ffb5b ("mm, compaction: prevent infinite loop in compact_zone")
prevented the issue by adding a special check in the migration scanner to
satisfy the current detection of scanners meeting.

However, the proper fix is to make the detection more robust. This patch
introduces the compact_scanners_met() function that returns true when the free
scanner position is in the same or lower pageblock than the migration scanner.
The special case in isolate_migratepages() introduced by 1d5bfe1ffb5b is
removed.

Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
---
 mm/compaction.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 16e1b57..d46aaeb 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -902,6 +902,16 @@ static bool suitable_migration_target(struct page *page)
 }
 
 /*
+ * Test whether the free scanner has reached the same or lower pageblock than
+ * the migration scanner, and compaction should thus terminate.
+ */
+static inline bool compact_scanners_met(struct compact_control *cc)
+{
+	return (cc->free_pfn >> pageblock_order)
+		<= (cc->migrate_pfn >> pageblock_order);
+}
+
+/*
  * Based on information in the current compact_control, find blocks
  * suitable for isolating free pages from and then isolate them.
  */
@@ -1131,12 +1141,8 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	}
 
 	acct_isolated(zone, cc);
-	/*
-	 * Record where migration scanner will be restarted. If we end up in
-	 * the same pageblock as the free scanner, make the scanners fully
-	 * meet so that compact_finished() terminates compaction.
-	 */
-	cc->migrate_pfn = (end_pfn <= cc->free_pfn) ? low_pfn : cc->free_pfn;
+	/* Record where migration scanner will be restarted. */
+	cc->migrate_pfn = low_pfn;
 
 	return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
 }
@@ -1151,7 +1157,7 @@ static int __compact_finished(struct zone *zone, struct compact_control *cc,
 		return COMPACT_PARTIAL;
 
 	/* Compaction run completes if the migrate and free scanner meet */
-	if (cc->free_pfn <= cc->migrate_pfn) {
+	if (compact_scanners_met(cc)) {
 		/* Let the next compaction start anew. */
 		zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
 		zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
@@ -1380,7 +1386,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 			 * migrate_pages() may return -ENOMEM when scanners meet
 			 * and we want compact_finished() to detect it
 			 */
-			if (err == -ENOMEM && cc->free_pfn > cc->migrate_pfn) {
+			if (err == -ENOMEM && !compact_scanners_met(cc)) {
 				ret = COMPACT_PARTIAL;
 				goto out;
 			}
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 2/6] mm, compaction: simplify handling restart position in free pages scanner
  2015-06-10  9:32 ` Vlastimil Babka
@ 2015-06-10  9:32   ` Vlastimil Babka
  -1 siblings, 0 replies; 50+ messages in thread
From: Vlastimil Babka @ 2015-06-10  9:32 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: linux-kernel, Vlastimil Babka, Minchan Kim, Mel Gorman,
	Joonsoo Kim, Michal Nazarewicz, Naoya Horiguchi,
	Christoph Lameter, Rik van Riel, David Rientjes

Handling the position where the compaction free scanner should restart (stored
in cc->free_pfn) got more complex with commit e14c720efdd7 ("mm, compaction:
remember position within pageblock in free pages scanner"). Currently the
position is updated in each loop iteration of isolate_freepages(), although it
should be enough to update it only when breaking from the loop. There's also
an extra check outside the loop that updates the position in case we have met
the migration scanner.

This can be simplified if we move the test for having isolated enough from
the for-loop header next to the test for contention, and determine the restart
position only in these cases. We can reuse the isolate_start_pfn variable for
this instead of setting cc->free_pfn directly. Outside the loop, we can simply
set cc->free_pfn to the current value of isolate_start_pfn without any extra
check.

Also add a VM_BUG_ON to catch a possible mistake in the future, in case we
later add a new condition that terminates isolate_freepages_block() prematurely
without also considering the condition in isolate_freepages().
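
Roughly, the resulting loop looks like this (simplified sketch; the pageblock
suitability checks and other details are omitted, see the diff below):

	for (; block_start_pfn >= low_pfn;
			block_end_pfn = block_start_pfn,
			block_start_pfn -= pageblock_nr_pages,
			isolate_start_pfn = block_start_pfn) {

		/* Updates isolate_start_pfn and cc->nr_freepages. */
		isolate_freepages_block(cc, &isolate_start_pfn,
					block_end_pfn, freelist, false);

		if ((cc->nr_freepages >= cc->nr_migratepages)
						|| cc->contended) {
			/* Whole pageblock done: restart at the previous one. */
			if (isolate_start_pfn >= block_end_pfn)
				isolate_start_pfn =
					block_start_pfn - pageblock_nr_pages;
			break;
		} else {
			/* A premature return here would be a bug. */
			VM_BUG_ON(isolate_start_pfn < block_end_pfn);
		}
	}

	/* Set above when breaking, or the scanners met (loop condition). */
	cc->free_pfn = isolate_start_pfn;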

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
---
 mm/compaction.c | 35 ++++++++++++++++++++---------------
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index d46aaeb..7e0a814 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -947,8 +947,7 @@ static void isolate_freepages(struct compact_control *cc)
 	 * pages on cc->migratepages. We stop searching if the migrate
 	 * and free page scanners meet or enough free pages are isolated.
 	 */
-	for (; block_start_pfn >= low_pfn &&
-			cc->nr_migratepages > cc->nr_freepages;
+	for (; block_start_pfn >= low_pfn;
 				block_end_pfn = block_start_pfn,
 				block_start_pfn -= pageblock_nr_pages,
 				isolate_start_pfn = block_start_pfn) {
@@ -980,6 +979,8 @@ static void isolate_freepages(struct compact_control *cc)
 					block_end_pfn, freelist, false);
 
 		/*
+		 * If we isolated enough freepages, or aborted due to async
+		 * compaction being contended, terminate the loop.
 		 * Remember where the free scanner should restart next time,
 		 * which is where isolate_freepages_block() left off.
 		 * But if it scanned the whole pageblock, isolate_start_pfn
@@ -988,27 +989,31 @@ static void isolate_freepages(struct compact_control *cc)
 		 * In that case we will however want to restart at the start
 		 * of the previous pageblock.
 		 */
-		cc->free_pfn = (isolate_start_pfn < block_end_pfn) ?
-				isolate_start_pfn :
-				block_start_pfn - pageblock_nr_pages;
-
-		/*
-		 * isolate_freepages_block() might have aborted due to async
-		 * compaction being contended
-		 */
-		if (cc->contended)
+		if ((cc->nr_freepages >= cc->nr_migratepages)
+							|| cc->contended) {
+			if (isolate_start_pfn >= block_end_pfn)
+				isolate_start_pfn =
+					block_start_pfn - pageblock_nr_pages;
 			break;
+		} else {
+			/*
+			 * isolate_freepages_block() should not terminate
+			 * prematurely unless contended, or isolated enough
+			 */
+			VM_BUG_ON(isolate_start_pfn < block_end_pfn);
+		}
 	}
 
 	/* split_free_page does not map the pages */
 	map_pages(freelist);
 
 	/*
-	 * If we crossed the migrate scanner, we want to keep it that way
-	 * so that compact_finished() may detect this
+	 * Record where the free scanner will restart next time. Either we
+	 * broke from the loop and set isolate_start_pfn based on the last
+	 * call to isolate_freepages_block(), or we met the migration scanner
+	 * and the loop terminated due to isolate_start_pfn < low_pfn
 	 */
-	if (block_start_pfn < low_pfn)
-		cc->free_pfn = cc->migrate_pfn;
+	cc->free_pfn = isolate_start_pfn;
 }
 
 /*
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 3/6] mm, compaction: encapsulate resetting cached scanner positions
  2015-06-10  9:32 ` Vlastimil Babka
@ 2015-06-10  9:32   ` Vlastimil Babka
  -1 siblings, 0 replies; 50+ messages in thread
From: Vlastimil Babka @ 2015-06-10  9:32 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: linux-kernel, Vlastimil Babka, Minchan Kim, Mel Gorman,
	Joonsoo Kim, Michal Nazarewicz, Naoya Horiguchi,
	Christoph Lameter, Rik van Riel, David Rientjes

Resetting the cached compaction scanner positions is currently done implicitly
in __reset_isolation_suitable() and compact_finished(). Encapsulate the
functionality in a new function, reset_cached_positions(), and call it
explicitly where needed.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
---
 mm/compaction.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 7e0a814..d334bb3 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -207,6 +207,13 @@ static inline bool isolation_suitable(struct compact_control *cc,
 	return !get_pageblock_skip(page);
 }
 
+static void reset_cached_positions(struct zone *zone)
+{
+	zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
+	zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
+	zone->compact_cached_free_pfn = zone_end_pfn(zone);
+}
+
 /*
  * This function is called to clear all cached information on pageblocks that
  * should be skipped for page isolation when the migrate and free page scanner
@@ -218,9 +225,6 @@ static void __reset_isolation_suitable(struct zone *zone)
 	unsigned long end_pfn = zone_end_pfn(zone);
 	unsigned long pfn;
 
-	zone->compact_cached_migrate_pfn[0] = start_pfn;
-	zone->compact_cached_migrate_pfn[1] = start_pfn;
-	zone->compact_cached_free_pfn = end_pfn;
 	zone->compact_blockskip_flush = false;
 
 	/* Walk the zone and mark every pageblock as suitable for isolation */
@@ -250,8 +254,10 @@ void reset_isolation_suitable(pg_data_t *pgdat)
 			continue;
 
 		/* Only flush if a full compaction finished recently */
-		if (zone->compact_blockskip_flush)
+		if (zone->compact_blockskip_flush) {
 			__reset_isolation_suitable(zone);
+			reset_cached_positions(zone);
+		}
 	}
 }
 
@@ -1164,9 +1170,7 @@ static int __compact_finished(struct zone *zone, struct compact_control *cc,
 	/* Compaction run completes if the migrate and free scanner meet */
 	if (compact_scanners_met(cc)) {
 		/* Let the next compaction start anew. */
-		zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
-		zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
-		zone->compact_cached_free_pfn = zone_end_pfn(zone);
+		reset_cached_positions(zone);
 
 		/*
 		 * Mark that the PG_migrate_skip information should be cleared
@@ -1329,8 +1333,10 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 	 * is about to be retried after being deferred. kswapd does not do
 	 * this reset as it'll reset the cached information when going to sleep.
 	 */
-	if (compaction_restarting(zone, cc->order) && !current_is_kswapd())
+	if (compaction_restarting(zone, cc->order) && !current_is_kswapd()) {
 		__reset_isolation_suitable(zone);
+		reset_cached_positions(zone);
+	}
 
 	/*
 	 * Setup to move all movable pages to the end of the zone. Used cached
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 4/6] mm, compaction: always skip compound pages by order in migrate scanner
  2015-06-10  9:32 ` Vlastimil Babka
@ 2015-06-10  9:32   ` Vlastimil Babka
  -1 siblings, 0 replies; 50+ messages in thread
From: Vlastimil Babka @ 2015-06-10  9:32 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: linux-kernel, Vlastimil Babka, Minchan Kim, Mel Gorman,
	Joonsoo Kim, Michal Nazarewicz, Naoya Horiguchi,
	Christoph Lameter, Rik van Riel, David Rientjes

The compaction migrate scanner tries to skip compound pages by their order, to
reduce the number of iterations for pages it cannot isolate. The check is only
done if PageLRU() is true, which means it applies to THP pages, but not to e.g.
hugetlbfs pages or any other non-LRU compound pages, which we have to iterate
over by base pages.

This limitation comes from the assumption that it's only safe to read
compound_order() when we have the zone's lru_lock and THP cannot be split under
us. But the only danger (after filtering out order values that are not below
MAX_ORDER, to prevent overflows) is that we skip too much or too little after
reading a bogus compound_order() due to a rare race. This is the same reasoning
that commit 99c0fd5e51c4 ("mm, compaction: skip buddy pages by their order in
the migrate scanner") introduced for unsafely reading the PageBuddy() order.

After this patch, all pages are tested for PageCompound() and we skip them by
compound_order(). The test is done after the test for balloon_page_movable(),
as we don't want to assume whether balloon pages (or other pages with their own
isolation and migration implementation, should a generic API get implemented)
are compound or not.

When tested with stress-highalloc from mmtests on a 4GB system with 1GB of
hugetlbfs pages, the vmstat compact_migrate_scanned count decreased by 15%.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
---
 mm/compaction.c | 36 +++++++++++++++++-------------------
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index d334bb3..e37d361 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -680,6 +680,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 	/* Time to isolate some pages for migration */
 	for (; low_pfn < end_pfn; low_pfn++) {
+		bool is_lru;
+
 		/*
 		 * Periodically drop the lock (if held) regardless of its
 		 * contention, to give chance to IRQs. Abort async compaction
@@ -723,39 +725,35 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		 * It's possible to migrate LRU pages and balloon pages
 		 * Skip any other type of page
 		 */
-		if (!PageLRU(page)) {
+		is_lru = PageLRU(page);
+		if (!is_lru) {
 			if (unlikely(balloon_page_movable(page))) {
 				if (balloon_page_isolate(page)) {
 					/* Successfully isolated */
 					goto isolate_success;
 				}
 			}
-			continue;
 		}
 
 		/*
-		 * PageLRU is set. lru_lock normally excludes isolation
-		 * splitting and collapsing (collapsing has already happened
-		 * if PageLRU is set) but the lock is not necessarily taken
-		 * here and it is wasteful to take it just to check transhuge.
-		 * Check PageCompound without lock and skip the whole pageblock
-		 * if it's a transhuge page, as calling compound_order()
-		 * without preventing THP from splitting the page underneath us
-		 * may return surprising results.
-		 * If we happen to check a THP tail page, compound_order()
-		 * returns 0. It should be rare enough to not bother with
-		 * using compound_head() in that case.
+		 * Regardless of being on LRU, compound pages such as THP and
+		 * hugetlbfs are not to be compacted. We can potentially save
+		 * a lot of iterations if we skip them at once. The check is
+		 * racy, but we can consider only valid values and the only
+		 * danger is skipping too much.
 		 */
 		if (PageCompound(page)) {
-			int nr;
-			if (locked)
-				nr = 1 << compound_order(page);
-			else
-				nr = pageblock_nr_pages;
-			low_pfn += nr - 1;
+			unsigned int comp_order = compound_order(page);
+
+			if (comp_order > 0 && comp_order < MAX_ORDER)
+				low_pfn += (1UL << comp_order) - 1;
+
 			continue;
 		}
 
+		if (!is_lru)
+			continue;
+
 		/*
 		 * Migration will fail if an anonymous page is pinned in memory,
 		 * so avoid taking lru_lock and isolating it unnecessarily in an
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 5/6] mm, compaction: skip compound pages by order in free scanner
  2015-06-10  9:32 ` Vlastimil Babka
@ 2015-06-10  9:32   ` Vlastimil Babka
  -1 siblings, 0 replies; 50+ messages in thread
From: Vlastimil Babka @ 2015-06-10  9:32 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: linux-kernel, Vlastimil Babka, Minchan Kim, Mel Gorman,
	Joonsoo Kim, Michal Nazarewicz, Naoya Horiguchi,
	Christoph Lameter, Rik van Riel, David Rientjes

The compaction free scanner looks for PageBuddy() pages and skips all others.
For large compound pages such as THP or hugetlbfs, we can save a lot of
iterations if we skip them at once using their compound_order(). This is
generally unsafe and we can read a bogus order value due to a race, but if we
are careful, the only danger is skipping too much.

When tested with stress-highalloc from mmtests on a 4GB system with 1GB of
hugetlbfs pages, the vmstat compact_free_scanned count decreased by at least
15%.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
---
 mm/compaction.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index e37d361..4a14084 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -437,6 +437,24 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
 
 		if (!valid_page)
 			valid_page = page;
+
+		/*
+		 * For compound pages such as THP and hugetlbfs, we can save
+		 * potentially a lot of iterations if we skip them at once.
+		 * The check is racy, but we can consider only valid values
+		 * and the only danger is skipping too much.
+		 */
+		if (PageCompound(page)) {
+			unsigned int comp_order = compound_order(page);
+
+			if (comp_order > 0 && comp_order < MAX_ORDER) {
+				blockpfn += (1UL << comp_order) - 1;
+				cursor += (1UL << comp_order) - 1;
+			}
+
+			goto isolate_fail;
+		}
+
 		if (!PageBuddy(page))
 			goto isolate_fail;
 
@@ -496,6 +514,13 @@ isolate_fail:
 
 	}
 
+	/*
+	 * There is a tiny chance that we have read bogus compound_order(),
+	 * so be careful to not go outside of the pageblock.
+	 */
+	if (unlikely(blockpfn > end_pfn))
+		blockpfn = end_pfn;
+
 	trace_mm_compaction_isolate_freepages(*start_pfn, blockpfn,
 					nr_scanned, total_isolated);
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 6/6] mm, compaction: decouple updating pageblock_skip and cached pfn
  2015-06-10  9:32 ` Vlastimil Babka
@ 2015-06-10  9:32   ` Vlastimil Babka
  -1 siblings, 0 replies; 50+ messages in thread
From: Vlastimil Babka @ 2015-06-10  9:32 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: linux-kernel, Vlastimil Babka, Minchan Kim, Mel Gorman,
	Joonsoo Kim, Michal Nazarewicz, Naoya Horiguchi,
	Christoph Lameter, Rik van Riel, David Rientjes

The pageblock_skip bitmap and the cached scanner pfns are two mechanisms in
compaction to prevent rescanning pages where isolation has recently failed or
which were already scanned during the previous compaction attempt.

Currently, both kinds of information are updated via update_pageblock_skip(),
which is suboptimal for the cached scanner pfns:

- The condition "isolation has failed in the pageblock" checked by
  update_pageblock_skip() may be valid for the pageblock_skip bitmap, but makes
  less sense for the cached pfns. There's little point in the next compaction
  attempt scanning again a pageblock where all pages that could be isolated
  have already been processed.

- Whole pageblocks can be skipped at the level of isolate_migratepages() or
  isolate_freepages() before going into the corresponding _block() function.
  Not updating the cached scanner positions at that higher level may again
  result in extra iterations.

This patch moves updating the cached scanner pfns from update_pageblock_skip()
to dedicated functions, which are called directly from isolate_migratepages()
and isolate_freepages(), resolving both inefficiencies.

During testing, the observed differences in compact_migrate_scanned and
compact_free_scanned were lost in the noise.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
---
 mm/compaction.c | 48 +++++++++++++++++++++++++-----------------------
 1 file changed, 25 insertions(+), 23 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 4a14084..c326607 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -261,17 +261,31 @@ void reset_isolation_suitable(pg_data_t *pgdat)
 	}
 }
 
+static inline void
+update_cached_migrate_pfn(struct zone *zone, unsigned long pfn,
+						enum migrate_mode mode)
+{
+	if (pfn > zone->compact_cached_migrate_pfn[0])
+		zone->compact_cached_migrate_pfn[0] = pfn;
+	if (mode != MIGRATE_ASYNC &&
+	    pfn > zone->compact_cached_migrate_pfn[1])
+		zone->compact_cached_migrate_pfn[1] = pfn;
+}
+
+static inline void
+update_cached_free_pfn(struct zone *zone, unsigned long pfn)
+{
+	if (pfn < zone->compact_cached_free_pfn)
+		zone->compact_cached_free_pfn = pfn;
+}
+
 /*
  * If no pages were isolated then mark this pageblock to be skipped in the
  * future. The information is later cleared by __reset_isolation_suitable().
  */
 static void update_pageblock_skip(struct compact_control *cc,
-			struct page *page, unsigned long nr_isolated,
-			bool migrate_scanner)
+			struct page *page, unsigned long nr_isolated)
 {
-	struct zone *zone = cc->zone;
-	unsigned long pfn;
-
 	if (cc->ignore_skip_hint)
 		return;
 
@@ -282,20 +296,6 @@ static void update_pageblock_skip(struct compact_control *cc,
 		return;
 
 	set_pageblock_skip(page);
-
-	pfn = page_to_pfn(page);
-
-	/* Update where async and sync compaction should restart */
-	if (migrate_scanner) {
-		if (pfn > zone->compact_cached_migrate_pfn[0])
-			zone->compact_cached_migrate_pfn[0] = pfn;
-		if (cc->mode != MIGRATE_ASYNC &&
-		    pfn > zone->compact_cached_migrate_pfn[1])
-			zone->compact_cached_migrate_pfn[1] = pfn;
-	} else {
-		if (pfn < zone->compact_cached_free_pfn)
-			zone->compact_cached_free_pfn = pfn;
-	}
 }
 #else
 static inline bool isolation_suitable(struct compact_control *cc,
@@ -305,8 +305,7 @@ static inline bool isolation_suitable(struct compact_control *cc,
 }
 
 static void update_pageblock_skip(struct compact_control *cc,
-			struct page *page, unsigned long nr_isolated,
-			bool migrate_scanner)
+			struct page *page, unsigned long nr_isolated)
 {
 }
 #endif /* CONFIG_COMPACTION */
@@ -540,7 +539,7 @@ isolate_fail:
 
 	/* Update the pageblock-skip if the whole pageblock was scanned */
 	if (blockpfn == end_pfn)
-		update_pageblock_skip(cc, valid_page, total_isolated, false);
+		update_pageblock_skip(cc, valid_page, total_isolated);
 
 	count_compact_events(COMPACTFREE_SCANNED, nr_scanned);
 	if (total_isolated)
@@ -843,7 +842,7 @@ isolate_success:
 	 * if the whole pageblock was scanned without isolating any page.
 	 */
 	if (low_pfn == end_pfn)
-		update_pageblock_skip(cc, valid_page, nr_isolated, true);
+		update_pageblock_skip(cc, valid_page, nr_isolated);
 
 	trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
 						nr_scanned, nr_isolated);
@@ -1043,6 +1042,7 @@ static void isolate_freepages(struct compact_control *cc)
 	 * and the loop terminated due to isolate_start_pfn < low_pfn
 	 */
 	cc->free_pfn = isolate_start_pfn;
+	update_cached_free_pfn(zone, isolate_start_pfn);
 }
 
 /*
@@ -1177,6 +1177,8 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	acct_isolated(zone, cc);
 	/* Record where migration scanner will be restarted. */
 	cc->migrate_pfn = low_pfn;
+	update_cached_migrate_pfn(zone, low_pfn, cc->mode);
+
 
 	return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
 }
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 6/6] mm, compaction: decouple updating pageblock_skip and cached pfn
@ 2015-06-10  9:32   ` Vlastimil Babka
  0 siblings, 0 replies; 50+ messages in thread
From: Vlastimil Babka @ 2015-06-10  9:32 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: linux-kernel, Vlastimil Babka, Minchan Kim, Mel Gorman,
	Joonsoo Kim, Michal Nazarewicz, Naoya Horiguchi,
	Christoph Lameter, Rik van Riel, David Rientjes

The pageblock_skip bitmap and cached scanner pfn's are two mechanisms in
compaction to prevent rescanning pages where isolation has recently failed
or they were scanned during the previous compaction attempt.

Currently, both kinds of information are updated via update_pageblock_skip(),
which is suboptimal for the cached scanner pfn's:

- The condition "isolation has failed in the pageblock" checked by
  update_pageblock_skip() may be valid for the pageblock_skip bitmap, but makes
  less sense for cached pfn's. There's little point for the next compaction
  attempt to scan again a pageblock where all pages that could be isolated were
  already processed.

- whole pageblocks can be skipped at the level of isolate_migratepages() or
  isolate_freepages() before going into the corresponding _block() function.
  Not updating cached scanner positions at the higher level may again result
  in extra iterations.

This patch moves updating cached scanner pfn's from update_pageblock_skip()
to dedicated functions, which are called directly from isolate_migratepages()
and isolate_freepages(), resolving both inefficiencies.

During testing, the observed differences in compact_migrate_scanned and
compact_free_scanned were lost in the noise.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
---
 mm/compaction.c | 48 +++++++++++++++++++++++++-----------------------
 1 file changed, 25 insertions(+), 23 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 4a14084..c326607 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -261,17 +261,31 @@ void reset_isolation_suitable(pg_data_t *pgdat)
 	}
 }
 
+static inline void
+update_cached_migrate_pfn(struct zone *zone, unsigned long pfn,
+						enum migrate_mode mode)
+{
+	if (pfn > zone->compact_cached_migrate_pfn[0])
+		zone->compact_cached_migrate_pfn[0] = pfn;
+	if (mode != MIGRATE_ASYNC &&
+	    pfn > zone->compact_cached_migrate_pfn[1])
+		zone->compact_cached_migrate_pfn[1] = pfn;
+}
+
+static inline void
+update_cached_free_pfn(struct zone *zone, unsigned long pfn)
+{
+	if (pfn < zone->compact_cached_free_pfn)
+		zone->compact_cached_free_pfn = pfn;
+}
+
 /*
  * If no pages were isolated then mark this pageblock to be skipped in the
  * future. The information is later cleared by __reset_isolation_suitable().
  */
 static void update_pageblock_skip(struct compact_control *cc,
-			struct page *page, unsigned long nr_isolated,
-			bool migrate_scanner)
+			struct page *page, unsigned long nr_isolated)
 {
-	struct zone *zone = cc->zone;
-	unsigned long pfn;
-
 	if (cc->ignore_skip_hint)
 		return;
 
@@ -282,20 +296,6 @@ static void update_pageblock_skip(struct compact_control *cc,
 		return;
 
 	set_pageblock_skip(page);
-
-	pfn = page_to_pfn(page);
-
-	/* Update where async and sync compaction should restart */
-	if (migrate_scanner) {
-		if (pfn > zone->compact_cached_migrate_pfn[0])
-			zone->compact_cached_migrate_pfn[0] = pfn;
-		if (cc->mode != MIGRATE_ASYNC &&
-		    pfn > zone->compact_cached_migrate_pfn[1])
-			zone->compact_cached_migrate_pfn[1] = pfn;
-	} else {
-		if (pfn < zone->compact_cached_free_pfn)
-			zone->compact_cached_free_pfn = pfn;
-	}
 }
 #else
 static inline bool isolation_suitable(struct compact_control *cc,
@@ -305,8 +305,7 @@ static inline bool isolation_suitable(struct compact_control *cc,
 }
 
 static void update_pageblock_skip(struct compact_control *cc,
-			struct page *page, unsigned long nr_isolated,
-			bool migrate_scanner)
+			struct page *page, unsigned long nr_isolated)
 {
 }
 #endif /* CONFIG_COMPACTION */
@@ -540,7 +539,7 @@ isolate_fail:
 
 	/* Update the pageblock-skip if the whole pageblock was scanned */
 	if (blockpfn == end_pfn)
-		update_pageblock_skip(cc, valid_page, total_isolated, false);
+		update_pageblock_skip(cc, valid_page, total_isolated);
 
 	count_compact_events(COMPACTFREE_SCANNED, nr_scanned);
 	if (total_isolated)
@@ -843,7 +842,7 @@ isolate_success:
 	 * if the whole pageblock was scanned without isolating any page.
 	 */
 	if (low_pfn == end_pfn)
-		update_pageblock_skip(cc, valid_page, nr_isolated, true);
+		update_pageblock_skip(cc, valid_page, nr_isolated);
 
 	trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
 						nr_scanned, nr_isolated);
@@ -1043,6 +1042,7 @@ static void isolate_freepages(struct compact_control *cc)
 	 * and the loop terminated due to isolate_start_pfn < low_pfn
 	 */
 	cc->free_pfn = isolate_start_pfn;
+	update_cached_free_pfn(zone, isolate_start_pfn);
 }
 
 /*
@@ -1177,6 +1177,8 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	acct_isolated(zone, cc);
 	/* Record where migration scanner will be restarted. */
 	cc->migrate_pfn = low_pfn;
+	update_cached_migrate_pfn(zone, low_pfn, cc->mode);
+
 
 	return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
 }
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/6] mm, compaction: more robust check for scanners meeting
  2015-06-10  9:32   ` Vlastimil Babka
@ 2015-06-10 18:02     ` Rik van Riel
  -1 siblings, 0 replies; 50+ messages in thread
From: Rik van Riel @ 2015-06-10 18:02 UTC (permalink / raw)
  To: Vlastimil Babka, Andrew Morton, linux-mm
  Cc: linux-kernel, Minchan Kim, Mel Gorman, Joonsoo Kim,
	Michal Nazarewicz, Naoya Horiguchi, Christoph Lameter,
	David Rientjes

On 06/10/2015 05:32 AM, Vlastimil Babka wrote:
> Compaction should finish when the migration and free scanner meet, i.e. they
> reach the same pageblock. Currently however, the test in compact_finished()
> simply just compares the exact pfns, which may yield a false negative when the
> free scanner position is in the middle of a pageblock and the migration scanner
> reaches the beginning of the same pageblock.
> 
> This hasn't been a problem until commit e14c720efdd7 ("mm, compaction: remember
> position within pageblock in free pages scanner") allowed the free scanner
> position to be in the middle of a pageblock between invocations.  The hot-fix
> 1d5bfe1ffb5b ("mm, compaction: prevent infinite loop in compact_zone")
> prevented the issue by adding a special check in the migration scanner to
> satisfy the current detection of scanners meeting.
> 
> However, the proper fix is to make the detection more robust. This patch
> introduces the compact_scanners_met() function that returns true when the free
> scanner position is in the same or lower pageblock than the migration scanner.
> The special case in isolate_migratepages() introduced by 1d5bfe1ffb5b is
> removed.
> 
> Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Michal Nazarewicz <mina86@mina86.com>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: David Rientjes <rientjes@google.com>

Acked-by: Rik van Riel <riel@redhat.com>


-- 
All rights reversed

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/6] mm, compaction: more robust check for scanners meeting
  2015-06-10  9:32   ` Vlastimil Babka
@ 2015-06-12  9:55     ` Michal Nazarewicz
  -1 siblings, 0 replies; 50+ messages in thread
From: Michal Nazarewicz @ 2015-06-12  9:55 UTC (permalink / raw)
  To: Vlastimil Babka, Andrew Morton, linux-mm
  Cc: linux-kernel, Vlastimil Babka, Minchan Kim, Mel Gorman,
	Joonsoo Kim, Naoya Horiguchi, Christoph Lameter, Rik van Riel,
	David Rientjes

On Wed, Jun 10 2015, Vlastimil Babka wrote:
> Compaction should finish when the migration and free scanner meet, i.e. they
> reach the same pageblock. Currently however, the test in compact_finished()
> simply just compares the exact pfns, which may yield a false negative when the
> free scanner position is in the middle of a pageblock and the migration scanner
> reaches the beginning of the same pageblock.
>
> This hasn't been a problem until commit e14c720efdd7 ("mm, compaction: remember
> position within pageblock in free pages scanner") allowed the free scanner
> position to be in the middle of a pageblock between invocations.  The hot-fix
> 1d5bfe1ffb5b ("mm, compaction: prevent infinite loop in compact_zone")
> prevented the issue by adding a special check in the migration scanner to
> satisfy the current detection of scanners meeting.
>
> However, the proper fix is to make the detection more robust. This patch
> introduces the compact_scanners_met() function that returns true when the free
> scanner position is in the same or lower pageblock than the migration scanner.
> The special case in isolate_migratepages() introduced by 1d5bfe1ffb5b is
> removed.
>
> Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Michal Nazarewicz <mina86@mina86.com>

Acked-by: Michal Nazarewicz <mina86@mina86.com>

> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: David Rientjes <rientjes@google.com>
> ---
>  mm/compaction.c | 22 ++++++++++++++--------
>  1 file changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 16e1b57..d46aaeb 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -902,6 +902,16 @@ static bool suitable_migration_target(struct page *page)
>  }
>  
>  /*
> + * Test whether the free scanner has reached the same or lower pageblock than
> + * the migration scanner, and compaction should thus terminate.
> + */
> +static inline bool compact_scanners_met(struct compact_control *cc)
> +{
> +	return (cc->free_pfn >> pageblock_order)
> +		<= (cc->migrate_pfn >> pageblock_order);
> +}
> +
> +/*
>   * Based on information in the current compact_control, find blocks
>   * suitable for isolating free pages from and then isolate them.
>   */
> @@ -1131,12 +1141,8 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>  	}
>  
>  	acct_isolated(zone, cc);
> -	/*
> -	 * Record where migration scanner will be restarted. If we end up in
> -	 * the same pageblock as the free scanner, make the scanners fully
> -	 * meet so that compact_finished() terminates compaction.
> -	 */
> -	cc->migrate_pfn = (end_pfn <= cc->free_pfn) ? low_pfn : cc->free_pfn;
> +	/* Record where migration scanner will be restarted. */
> +	cc->migrate_pfn = low_pfn;
>  
>  	return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
>  }
> @@ -1151,7 +1157,7 @@ static int __compact_finished(struct zone *zone, struct compact_control *cc,
>  		return COMPACT_PARTIAL;
>  
>  	/* Compaction run completes if the migrate and free scanner meet */
> -	if (cc->free_pfn <= cc->migrate_pfn) {
> +	if (compact_scanners_met(cc)) {
>  		/* Let the next compaction start anew. */
>  		zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
>  		zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
> @@ -1380,7 +1386,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
>  			 * migrate_pages() may return -ENOMEM when scanners meet
>  			 * and we want compact_finished() to detect it
>  			 */
> -			if (err == -ENOMEM && cc->free_pfn > cc->migrate_pfn) {
> +			if (err == -ENOMEM && !compact_scanners_met(cc)) {
>  				ret = COMPACT_PARTIAL;
>  				goto out;
>  			}
> -- 
> 2.1.4
>

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +--<mpn@google.com>--<xmpp:mina86@jabber.org>--ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/6] mm, compaction: encapsulate resetting cached scanner positions
  2015-06-10  9:32   ` Vlastimil Babka
@ 2015-06-12 10:07     ` Michal Nazarewicz
  -1 siblings, 0 replies; 50+ messages in thread
From: Michal Nazarewicz @ 2015-06-12 10:07 UTC (permalink / raw)
  To: Vlastimil Babka, Andrew Morton, linux-mm
  Cc: linux-kernel, Vlastimil Babka, Minchan Kim, Mel Gorman,
	Joonsoo Kim, Naoya Horiguchi, Christoph Lameter, Rik van Riel,
	David Rientjes

On Wed, Jun 10 2015, Vlastimil Babka wrote:
> Resetting the cached compaction scanner positions is now done implicitly in
> __reset_isolation_suitable() and compact_finished(). Encapsulate the
> functionality in a new function reset_cached_positions() and call it explicitly
> where needed.
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Michal Nazarewicz <mina86@mina86.com>

Acked-by: Michal Nazarewicz <mina86@mina86.com>

> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: David Rientjes <rientjes@google.com>
> ---
>  mm/compaction.c | 22 ++++++++++++++--------
>  1 file changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 7e0a814..d334bb3 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -207,6 +207,13 @@ static inline bool isolation_suitable(struct compact_control *cc,
>  	return !get_pageblock_skip(page);
>  }
>  
> +static void reset_cached_positions(struct zone *zone)
> +{
> +	zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
> +	zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
> +	zone->compact_cached_free_pfn = zone_end_pfn(zone);
> +}
> +
>  /*
>   * This function is called to clear all cached information on pageblocks that
>   * should be skipped for page isolation when the migrate and free page scanner
> @@ -218,9 +225,6 @@ static void __reset_isolation_suitable(struct zone *zone)
>  	unsigned long end_pfn = zone_end_pfn(zone);
>  	unsigned long pfn;
>  
> -	zone->compact_cached_migrate_pfn[0] = start_pfn;
> -	zone->compact_cached_migrate_pfn[1] = start_pfn;
> -	zone->compact_cached_free_pfn = end_pfn;
>  	zone->compact_blockskip_flush = false;
>  
>  	/* Walk the zone and mark every pageblock as suitable for isolation */
> @@ -250,8 +254,10 @@ void reset_isolation_suitable(pg_data_t *pgdat)
>  			continue;
>  
>  		/* Only flush if a full compaction finished recently */
> -		if (zone->compact_blockskip_flush)
> +		if (zone->compact_blockskip_flush) {
>  			__reset_isolation_suitable(zone);
> +			reset_cached_positions(zone);
> +		}
>  	}
>  }
>  
> @@ -1164,9 +1170,7 @@ static int __compact_finished(struct zone *zone, struct compact_control *cc,
>  	/* Compaction run completes if the migrate and free scanner meet */
>  	if (compact_scanners_met(cc)) {
>  		/* Let the next compaction start anew. */
> -		zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
> -		zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
> -		zone->compact_cached_free_pfn = zone_end_pfn(zone);
> +		reset_cached_positions(zone);
>  
>  		/*
>  		 * Mark that the PG_migrate_skip information should be cleared
> @@ -1329,8 +1333,10 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
>  	 * is about to be retried after being deferred. kswapd does not do
>  	 * this reset as it'll reset the cached information when going to sleep.
>  	 */
> -	if (compaction_restarting(zone, cc->order) && !current_is_kswapd())
> +	if (compaction_restarting(zone, cc->order) && !current_is_kswapd()) {
>  		__reset_isolation_suitable(zone);
> +		reset_cached_positions(zone);
> +	}
>  
>  	/*
>  	 * Setup to move all movable pages to the end of the zone. Used cached
> -- 
> 2.1.4
>

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +--<mpn@google.com>--<xmpp:mina86@jabber.org>--ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/6] mm, compaction: always skip compound pages by order in migrate scanner
  2015-06-10  9:32   ` Vlastimil Babka
@ 2015-06-12 10:11     ` Michal Nazarewicz
  -1 siblings, 0 replies; 50+ messages in thread
From: Michal Nazarewicz @ 2015-06-12 10:11 UTC (permalink / raw)
  To: Vlastimil Babka, Andrew Morton, linux-mm
  Cc: linux-kernel, Vlastimil Babka, Minchan Kim, Mel Gorman,
	Joonsoo Kim, Naoya Horiguchi, Christoph Lameter, Rik van Riel,
	David Rientjes

On Wed, Jun 10 2015, Vlastimil Babka wrote:
> The compaction migrate scanner tries to skip compound pages by their order, to
> reduce number of iterations for pages it cannot isolate. The check is only done
> if PageLRU() is true, which means it applies to THP pages, but not e.g.
> hugetlbfs pages or any other non-LRU compound pages, which we have to iterate
> by base pages.
>
> This limitation comes from the assumption that it's only safe to read
> compound_order() when we have the zone's lru_lock and THP cannot be split under
> us. But the only danger (after filtering out order values that are not below
> MAX_ORDER, to prevent overflows) is that we skip too much or too little after
> reading a bogus compound_order() due to a rare race. This is the same reasoning
> as patch 99c0fd5e51c4 ("mm, compaction: skip buddy pages by their order in the
> migrate scanner") introduced for unsafely reading PageBuddy() order.
>
> After this patch, all pages are tested for PageCompound() and we skip them by
> compound_order().  The test is done after the test for balloon_page_movable()
> as we don't want to assume if balloon pages (or other pages with own isolation
> and migration implementation if a generic API gets implemented) are compound
> or not.
>
> When tested with stress-highalloc from mmtests on 4GB system with 1GB hugetlbfs
> pages, the vmstat compact_migrate_scanned count decreased by 15%.
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Michal Nazarewicz <mina86@mina86.com>

Acked-by: Michal Nazarewicz <mina86@mina86.com>

> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: David Rientjes <rientjes@google.com>
> ---
>  mm/compaction.c | 36 +++++++++++++++++-------------------
>  1 file changed, 17 insertions(+), 19 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index d334bb3..e37d361 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -680,6 +680,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  
>  	/* Time to isolate some pages for migration */
>  	for (; low_pfn < end_pfn; low_pfn++) {
> +		bool is_lru;
> +
>  		/*
>  		 * Periodically drop the lock (if held) regardless of its
>  		 * contention, to give chance to IRQs. Abort async compaction
> @@ -723,39 +725,35 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  		 * It's possible to migrate LRU pages and balloon pages
>  		 * Skip any other type of page
>  		 */
> -		if (!PageLRU(page)) {
> +		is_lru = PageLRU(page);
> +		if (!is_lru) {
>  			if (unlikely(balloon_page_movable(page))) {
>  				if (balloon_page_isolate(page)) {
>  					/* Successfully isolated */
>  					goto isolate_success;
>  				}
>  			}
> -			continue;
>  		}
>  
>  		/*
> -		 * PageLRU is set. lru_lock normally excludes isolation
> -		 * splitting and collapsing (collapsing has already happened
> -		 * if PageLRU is set) but the lock is not necessarily taken
> -		 * here and it is wasteful to take it just to check transhuge.
> -		 * Check PageCompound without lock and skip the whole pageblock
> -		 * if it's a transhuge page, as calling compound_order()
> -		 * without preventing THP from splitting the page underneath us
> -		 * may return surprising results.
> -		 * If we happen to check a THP tail page, compound_order()
> -		 * returns 0. It should be rare enough to not bother with
> -		 * using compound_head() in that case.
> +		 * Regardless of being on LRU, compound pages such as THP and
> +		 * hugetlbfs are not to be compacted. We can potentially save
> +		 * a lot of iterations if we skip them at once. The check is
> +		 * racy, but we can consider only valid values and the only
> +		 * danger is skipping too much.
>  		 */
>  		if (PageCompound(page)) {
> -			int nr;
> -			if (locked)
> -				nr = 1 << compound_order(page);
> -			else
> -				nr = pageblock_nr_pages;
> -			low_pfn += nr - 1;
> +			unsigned int comp_order = compound_order(page);
> +
> +			if (comp_order > 0 && comp_order < MAX_ORDER)
> +				low_pfn += (1UL << comp_order) - 1;
> +
>  			continue;
>  		}
>  
> +		if (!is_lru)
> +			continue;
> +
>  		/*
>  		 * Migration will fail if an anonymous page is pinned in memory,
>  		 * so avoid taking lru_lock and isolating it unnecessarily in an
> -- 
> 2.1.4
>

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +--<mpn@google.com>--<xmpp:mina86@jabber.org>--ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 5/6] mm, compaction: skip compound pages by order in free scanner
  2015-06-10  9:32   ` Vlastimil Babka
@ 2015-06-12 10:18     ` Michal Nazarewicz
  -1 siblings, 0 replies; 50+ messages in thread
From: Michal Nazarewicz @ 2015-06-12 10:18 UTC (permalink / raw)
  To: Vlastimil Babka, Andrew Morton, linux-mm
  Cc: linux-kernel, Vlastimil Babka, Minchan Kim, Mel Gorman,
	Joonsoo Kim, Naoya Horiguchi, Christoph Lameter, Rik van Riel,
	David Rientjes

On Wed, Jun 10 2015, Vlastimil Babka wrote:
> The compaction free scanner is looking for PageBuddy() pages and skipping all
> others.  For large compound pages such as THP or hugetlbfs, we can save a lot
> of iterations if we skip them at once using their compound_order(). This is
> generally unsafe and we can read a bogus value of order due to a race, but if
> we are careful, the only danger is skipping too much.
>
> When tested with stress-highalloc from mmtests on 4GB system with 1GB hugetlbfs
> pages, the vmstat compact_free_scanned count decreased by at least 15%.
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Michal Nazarewicz <mina86@mina86.com>

Acked-by: Michal Nazarewicz <mina86@mina86.com>

> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: David Rientjes <rientjes@google.com>
> ---
>  mm/compaction.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index e37d361..4a14084 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -437,6 +437,24 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>  
>  		if (!valid_page)
>  			valid_page = page;
> +
> +		/*
> +		 * For compound pages such as THP and hugetlbfs, we can save
> +		 * potentially a lot of iterations if we skip them at once.
> +		 * The check is racy, but we can consider only valid values
> +		 * and the only danger is skipping too much.
> +		 */
> +		if (PageCompound(page)) {
> +			unsigned int comp_order = compound_order(page);
> +
> +			if (comp_order > 0 && comp_order < MAX_ORDER) {

+			if (comp_order < MAX_ORDER) {

Might produce shorter/faster code.  Dunno.  Maybe.  So much
micro-optimisations.  Applies to the previous patch as well.

> +				blockpfn += (1UL << comp_order) - 1;
> +				cursor += (1UL << comp_order) - 1;
> +			}
> +
> +			goto isolate_fail;
> +		}
> +
>  		if (!PageBuddy(page))
>  			goto isolate_fail;
>  
> @@ -496,6 +514,13 @@ isolate_fail:
>  
>  	}
>  
> +	/*
> +	 * There is a tiny chance that we have read bogus compound_order(),
> +	 * so be careful to not go outside of the pageblock.
> +	 */
> +	if (unlikely(blockpfn > end_pfn))
> +		blockpfn = end_pfn;
> +
>  	trace_mm_compaction_isolate_freepages(*start_pfn, blockpfn,
>  					nr_scanned, total_isolated);
>  
> -- 
> 2.1.4
>
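
The simplification suggested above works because an order of zero skips
nothing: (1UL << 0) - 1 == 0, and compound_order() of a racily observed tail
page returns 0. As a rough illustration (a sketch of the suggested variant,
not the code that was actually merged), the check in isolate_freepages_block()
would become:

		if (PageCompound(page)) {
			unsigned int comp_order = compound_order(page);

			/*
			 * comp_order == 0 adds nothing to blockpfn/cursor,
			 * so only bogus values >= MAX_ORDER, read without
			 * the lru_lock, need filtering out.
			 */
			if (comp_order < MAX_ORDER) {
				blockpfn += (1UL << comp_order) - 1;
				cursor += (1UL << comp_order) - 1;
			}

			goto isolate_fail;
		}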

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +--<mpn@google.com>--<xmpp:mina86@jabber.org>--ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/6] mm, compaction: more robust check for scanners meeting
  2015-06-10  9:32   ` Vlastimil Babka
@ 2015-06-16  5:37     ` Joonsoo Kim
  -1 siblings, 0 replies; 50+ messages in thread
From: Joonsoo Kim @ 2015-06-16  5:37 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Minchan Kim, Mel Gorman,
	Michal Nazarewicz, Naoya Horiguchi, Christoph Lameter,
	Rik van Riel, David Rientjes

On Wed, Jun 10, 2015 at 11:32:29AM +0200, Vlastimil Babka wrote:
> Compaction should finish when the migration and free scanner meet, i.e. they
> reach the same pageblock. Currently however, the test in compact_finished()
> simply just compares the exact pfns, which may yield a false negative when the
> free scanner position is in the middle of a pageblock and the migration scanner
> reaches the beginning of the same pageblock.
> 
> This hasn't been a problem until commit e14c720efdd7 ("mm, compaction: remember
> position within pageblock in free pages scanner") allowed the free scanner
> position to be in the middle of a pageblock between invocations.  The hot-fix
> 1d5bfe1ffb5b ("mm, compaction: prevent infinite loop in compact_zone")
> prevented the issue by adding a special check in the migration scanner to
> satisfy the current detection of scanners meeting.
> 
> However, the proper fix is to make the detection more robust. This patch
> introduces the compact_scanners_met() function that returns true when the free
> scanner position is in the same or lower pageblock than the migration scanner.
> The special case in isolate_migratepages() introduced by 1d5bfe1ffb5b is
> removed.
> 
> Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Thanks.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/6] mm, compaction: simplify handling restart position in free pages scanner
  2015-06-10  9:32   ` Vlastimil Babka
@ 2015-06-16  5:38     ` Joonsoo Kim
  -1 siblings, 0 replies; 50+ messages in thread
From: Joonsoo Kim @ 2015-06-16  5:38 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Minchan Kim, Mel Gorman,
	Michal Nazarewicz, Naoya Horiguchi, Christoph Lameter,
	Rik van Riel, David Rientjes

On Wed, Jun 10, 2015 at 11:32:30AM +0200, Vlastimil Babka wrote:
> Handling the position where compaction free scanner should restart (stored in
> cc->free_pfn) got more complex with commit e14c720efdd7 ("mm, compaction:
> remember position within pageblock in free pages scanner"). Currently the
> position is updated in each loop iteration of isolate_freepages(), although it
> should be enough to update it only when breaking from the loop. There's also
> an extra check outside the loop updates the position in case we have met the
> migration scanner.
> 
> This can be simplified if we move the test for having isolated enough from the
> for loop header next to the test for contention, and determining the restart
> position only in these cases. We can reuse the isolate_start_pfn variable for
> this instead of setting cc->free_pfn directly. Outside the loop, we can simply
> set cc->free_pfn to current value of isolate_start_pfn without any extra check.
> 
> Also add a VM_BUG_ON to catch possible mistake in the future, in case we later
> add a new condition that terminates isolate_freepages_block() prematurely
> without also considering the condition in isolate_freepages().
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Thanks.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/6] mm, compaction: encapsulate resetting cached scanner positions
  2015-06-10  9:32   ` Vlastimil Babka
@ 2015-06-16  5:41     ` Joonsoo Kim
  -1 siblings, 0 replies; 50+ messages in thread
From: Joonsoo Kim @ 2015-06-16  5:41 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Minchan Kim, Mel Gorman,
	Michal Nazarewicz, Naoya Horiguchi, Christoph Lameter,
	Rik van Riel, David Rientjes

On Wed, Jun 10, 2015 at 11:32:31AM +0200, Vlastimil Babka wrote:
> Resetting the cached compaction scanner positions is now done implicitly in
> __reset_isolation_suitable() and compact_finished(). Encapsulate the
> functionality in a new function reset_cached_positions() and call it explicitly
> where needed.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Michal Nazarewicz <mina86@mina86.com>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: David Rientjes <rientjes@google.com>
> ---
>  mm/compaction.c | 22 ++++++++++++++--------
>  1 file changed, 14 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 7e0a814..d334bb3 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -207,6 +207,13 @@ static inline bool isolation_suitable(struct compact_control *cc,
>  	return !get_pageblock_skip(page);
>  }
>  
> +static void reset_cached_positions(struct zone *zone)
> +{
> +	zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
> +	zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
> +	zone->compact_cached_free_pfn = zone_end_pfn(zone);
> +}
> +
>  /*
>   * This function is called to clear all cached information on pageblocks that
>   * should be skipped for page isolation when the migrate and free page scanner
> @@ -218,9 +225,6 @@ static void __reset_isolation_suitable(struct zone *zone)
>  	unsigned long end_pfn = zone_end_pfn(zone);
>  	unsigned long pfn;
>  
> -	zone->compact_cached_migrate_pfn[0] = start_pfn;
> -	zone->compact_cached_migrate_pfn[1] = start_pfn;
> -	zone->compact_cached_free_pfn = end_pfn;
>  	zone->compact_blockskip_flush = false;

Is there a valid reason not to call reset_cached_positions() in
__reset_isolation_suitable? You missed one callsite in
__compact_pgdat().

        if (cc->order == -1)
                __reset_isolation_suitable(zone);

This also needs reset_cached_positions().
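
For illustration, that callsite would then look roughly like this (a sketch
of the suggested change, not a tested patch):

        if (cc->order == -1) {
                __reset_isolation_suitable(zone);
                reset_cached_positions(zone);
        }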

Thanks.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/6] mm, compaction: always skip compound pages by order in migrate scanner
  2015-06-10  9:32   ` Vlastimil Babka
@ 2015-06-16  5:44     ` Joonsoo Kim
  -1 siblings, 0 replies; 50+ messages in thread
From: Joonsoo Kim @ 2015-06-16  5:44 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Minchan Kim, Mel Gorman,
	Michal Nazarewicz, Naoya Horiguchi, Christoph Lameter,
	Rik van Riel, David Rientjes

On Wed, Jun 10, 2015 at 11:32:32AM +0200, Vlastimil Babka wrote:
> The compaction migrate scanner tries to skip compound pages by their order, to
> reduce number of iterations for pages it cannot isolate. The check is only done
> if PageLRU() is true, which means it applies to THP pages, but not e.g.
> hugetlbfs pages or any other non-LRU compound pages, which we have to iterate
> by base pages.
> 
> This limitation comes from the assumption that it's only safe to read
> compound_order() when we have the zone's lru_lock and THP cannot be split under
> us. But the only danger (after filtering out order values that are not below
> MAX_ORDER, to prevent overflows) is that we skip too much or too little after
> reading a bogus compound_order() due to a rare race. This is the same reasoning
> as patch 99c0fd5e51c4 ("mm, compaction: skip buddy pages by their order in the
> migrate scanner") introduced for unsafely reading PageBuddy() order.
> 
> After this patch, all pages are tested for PageCompound() and we skip them by
> compound_order().  The test is done after the test for balloon_page_movable()
> as we don't want to assume if balloon pages (or other pages with own isolation
> and migration implementation if a generic API gets implemented) are compound
> or not.
> 
> When tested with stress-highalloc from mmtests on 4GB system with 1GB hugetlbfs
> pages, the vmstat compact_migrate_scanned count decreased by 15%.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Michal Nazarewicz <mina86@mina86.com>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: David Rientjes <rientjes@google.com>
> ---
>  mm/compaction.c | 36 +++++++++++++++++-------------------
>  1 file changed, 17 insertions(+), 19 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index d334bb3..e37d361 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -680,6 +680,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  
>  	/* Time to isolate some pages for migration */
>  	for (; low_pfn < end_pfn; low_pfn++) {
> +		bool is_lru;
> +
>  		/*
>  		 * Periodically drop the lock (if held) regardless of its
>  		 * contention, to give chance to IRQs. Abort async compaction
> @@ -723,39 +725,35 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  		 * It's possible to migrate LRU pages and balloon pages
>  		 * Skip any other type of page
>  		 */
> -		if (!PageLRU(page)) {
> +		is_lru = PageLRU(page);
> +		if (!is_lru) {
>  			if (unlikely(balloon_page_movable(page))) {
>  				if (balloon_page_isolate(page)) {
>  					/* Successfully isolated */
>  					goto isolate_success;
>  				}
>  			}
> -			continue;
>  		}
>  
>  		/*
> -		 * PageLRU is set. lru_lock normally excludes isolation
> -		 * splitting and collapsing (collapsing has already happened
> -		 * if PageLRU is set) but the lock is not necessarily taken
> -		 * here and it is wasteful to take it just to check transhuge.
> -		 * Check PageCompound without lock and skip the whole pageblock
> -		 * if it's a transhuge page, as calling compound_order()
> -		 * without preventing THP from splitting the page underneath us
> -		 * may return surprising results.
> -		 * If we happen to check a THP tail page, compound_order()
> -		 * returns 0. It should be rare enough to not bother with
> -		 * using compound_head() in that case.
> +		 * Regardless of being on LRU, compound pages such as THP and
> +		 * hugetlbfs are not to be compacted. We can potentially save
> +		 * a lot of iterations if we skip them at once. The check is
> +		 * racy, but we can consider only valid values and the only
> +		 * danger is skipping too much.
>  		 */
>  		if (PageCompound(page)) {
> -			int nr;
> -			if (locked)
> -				nr = 1 << compound_order(page);
> -			else
> -				nr = pageblock_nr_pages;
> -			low_pfn += nr - 1;
> +			unsigned int comp_order = compound_order(page);
> +
> +			if (comp_order > 0 && comp_order < MAX_ORDER)
> +				low_pfn += (1UL << comp_order) - 1;
> +
>  			continue;
>  		}

How about moving this PageCompound() check up to the PageLRU check?
Is there any relationship between balloon pages and PageCompound()?
It would remove is_lru and the code would be more understandable.
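
Something like this is what I mean (untested sketch; whether the balloon
check itself needs any change is an open question):

		/* Skip compound pages (THP, hugetlbfs, ...) by their order */
		if (PageCompound(page)) {
			unsigned int comp_order = compound_order(page);

			if (comp_order > 0 && comp_order < MAX_ORDER)
				low_pfn += (1UL << comp_order) - 1;

			continue;
		}

		if (!PageLRU(page)) {
			if (unlikely(balloon_page_movable(page))) {
				if (balloon_page_isolate(page)) {
					/* Successfully isolated */
					goto isolate_success;
				}
			}
			continue;
		}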

Otherwise,

Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Thanks.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 5/6] mm, compaction: skip compound pages by order in free scanner
  2015-06-10  9:32   ` Vlastimil Babka
@ 2015-06-16  5:45     ` Joonsoo Kim
  -1 siblings, 0 replies; 50+ messages in thread
From: Joonsoo Kim @ 2015-06-16  5:45 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Minchan Kim, Mel Gorman,
	Michal Nazarewicz, Naoya Horiguchi, Christoph Lameter,
	Rik van Riel, David Rientjes

On Wed, Jun 10, 2015 at 11:32:33AM +0200, Vlastimil Babka wrote:
> The compaction free scanner is looking for PageBuddy() pages and skipping all
> others.  For large compound pages such as THP or hugetlbfs, we can save a lot
> of iterations if we skip them at once using their compound_order(). This is
> generally unsafe and we can read a bogus value of order due to a race, but if
> we are careful, the only danger is skipping too much.
> 
> When tested with stress-highalloc from mmtests on 4GB system with 1GB hugetlbfs
> pages, the vmstat compact_free_scanned count decreased by at least 15%.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>


Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Thanks.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] mm, compaction: decouple updating pageblock_skip and cached pfn
  2015-06-10  9:32   ` Vlastimil Babka
@ 2015-06-16  6:10     ` Joonsoo Kim
  -1 siblings, 0 replies; 50+ messages in thread
From: Joonsoo Kim @ 2015-06-16  6:10 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Minchan Kim, Mel Gorman,
	Michal Nazarewicz, Naoya Horiguchi, Christoph Lameter,
	Rik van Riel, David Rientjes

On Wed, Jun 10, 2015 at 11:32:34AM +0200, Vlastimil Babka wrote:
> The pageblock_skip bitmap and cached scanner pfn's are two mechanisms in
> compaction to prevent rescanning pages where isolation has recently failed
> or they were scanned during the previous compaction attempt.
> 
> Currently, both kinds of information are updated via update_pageblock_skip(),
> which is suboptimal for the cached scanner pfn's:
> 
> - The condition "isolation has failed in the pageblock" checked by
>   update_pageblock_skip() may be valid for the pageblock_skip bitmap, but makes
>   less sense for cached pfn's. There's little point for the next compaction
>   attempt to scan again a pageblock where all pages that could be isolated were
>   already processed.

In async compaction, compaction could be aborted due to cc->contended
in the freepage scanner, so sometimes isolated pages are not migrated. Your
change makes the next async compaction skip these pages. This could cause
async compaction to complete prematurely.

And, rescanning the previously attempted range could solve some race problems.
If a newly allocated page is still sitting in a pagevec, waiting for PageLRU
to be set, compaction will pass over it. If we rescan after a short time, the
page will have PageLRU set, so compaction can isolate and migrate it and make
a high-order freepage. This adds some rescanning overhead, but the migration
overhead, which is much bigger than the scanning overhead, increases only a
little. If compaction passes over it as with this change, pages in this area
would be allocated to other requestors, and when compaction revisits, there
would be more pages to migrate.

I basically agree with this change because it is more intuitive. But
I'd like to see some improvement results, or to test this patch myself,
before it is merged.

Thanks.

> 
> - whole pageblocks can be skipped at the level of isolate_migratepages() or
>   isolate_freepages() before going into the corresponding _block() function.
>   Not updating cached scanner positions at the higher level may again result
>   in extra iterations.
> 
> This patch moves updating cached scanner pfn's from update_pageblock_skip()
> to dedicated functions, which are called directly from isolate_migratepages()
> and isolate_freepages(), resolving both inefficiencies.
> 
> During testing, the observed differences in compact_migrate_scanned and
> compact_free_scanned were lost in the noise.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Michal Nazarewicz <mina86@mina86.com>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: David Rientjes <rientjes@google.com>
> ---
>  mm/compaction.c | 48 +++++++++++++++++++++++++-----------------------
>  1 file changed, 25 insertions(+), 23 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 4a14084..c326607 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -261,17 +261,31 @@ void reset_isolation_suitable(pg_data_t *pgdat)
>  	}
>  }
>  
> +static inline void
> +update_cached_migrate_pfn(struct zone *zone, unsigned long pfn,
> +						enum migrate_mode mode)
> +{
> +	if (pfn > zone->compact_cached_migrate_pfn[0])
> +		zone->compact_cached_migrate_pfn[0] = pfn;
> +	if (mode != MIGRATE_ASYNC &&
> +	    pfn > zone->compact_cached_migrate_pfn[1])
> +		zone->compact_cached_migrate_pfn[1] = pfn;
> +}
> +
> +static inline void
> +update_cached_free_pfn(struct zone *zone, unsigned long pfn)
> +{
> +	if (pfn < zone->compact_cached_free_pfn)
> +		zone->compact_cached_free_pfn = pfn;
> +}
> +
>  /*
>   * If no pages were isolated then mark this pageblock to be skipped in the
>   * future. The information is later cleared by __reset_isolation_suitable().
>   */
>  static void update_pageblock_skip(struct compact_control *cc,
> -			struct page *page, unsigned long nr_isolated,
> -			bool migrate_scanner)
> +			struct page *page, unsigned long nr_isolated)
>  {
> -	struct zone *zone = cc->zone;
> -	unsigned long pfn;
> -
>  	if (cc->ignore_skip_hint)
>  		return;
>  
> @@ -282,20 +296,6 @@ static void update_pageblock_skip(struct compact_control *cc,
>  		return;
>  
>  	set_pageblock_skip(page);
> -
> -	pfn = page_to_pfn(page);
> -
> -	/* Update where async and sync compaction should restart */
> -	if (migrate_scanner) {
> -		if (pfn > zone->compact_cached_migrate_pfn[0])
> -			zone->compact_cached_migrate_pfn[0] = pfn;
> -		if (cc->mode != MIGRATE_ASYNC &&
> -		    pfn > zone->compact_cached_migrate_pfn[1])
> -			zone->compact_cached_migrate_pfn[1] = pfn;
> -	} else {
> -		if (pfn < zone->compact_cached_free_pfn)
> -			zone->compact_cached_free_pfn = pfn;
> -	}
>  }
>  #else
>  static inline bool isolation_suitable(struct compact_control *cc,
> @@ -305,8 +305,7 @@ static inline bool isolation_suitable(struct compact_control *cc,
>  }
>  
>  static void update_pageblock_skip(struct compact_control *cc,
> -			struct page *page, unsigned long nr_isolated,
> -			bool migrate_scanner)
> +			struct page *page, unsigned long nr_isolated)
>  {
>  }
>  #endif /* CONFIG_COMPACTION */
> @@ -540,7 +539,7 @@ isolate_fail:
>  
>  	/* Update the pageblock-skip if the whole pageblock was scanned */
>  	if (blockpfn == end_pfn)
> -		update_pageblock_skip(cc, valid_page, total_isolated, false);
> +		update_pageblock_skip(cc, valid_page, total_isolated);
>  
>  	count_compact_events(COMPACTFREE_SCANNED, nr_scanned);
>  	if (total_isolated)
> @@ -843,7 +842,7 @@ isolate_success:
>  	 * if the whole pageblock was scanned without isolating any page.
>  	 */
>  	if (low_pfn == end_pfn)
> -		update_pageblock_skip(cc, valid_page, nr_isolated, true);
> +		update_pageblock_skip(cc, valid_page, nr_isolated);
>  
>  	trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
>  						nr_scanned, nr_isolated);
> @@ -1043,6 +1042,7 @@ static void isolate_freepages(struct compact_control *cc)
>  	 * and the loop terminated due to isolate_start_pfn < low_pfn
>  	 */
>  	cc->free_pfn = isolate_start_pfn;
> +	update_cached_free_pfn(zone, isolate_start_pfn);
>  }
>  
>  /*
> @@ -1177,6 +1177,8 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>  	acct_isolated(zone, cc);
>  	/* Record where migration scanner will be restarted. */
>  	cc->migrate_pfn = low_pfn;
> +	update_cached_migrate_pfn(zone, low_pfn, cc->mode);
> +
>  
>  	return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
>  }
> -- 
> 2.1.4
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/6] mm, compaction: encapsulate resetting cached scanner positions
  2015-06-16  5:41     ` Joonsoo Kim
@ 2015-06-16 12:13       ` Vlastimil Babka
  -1 siblings, 0 replies; 50+ messages in thread
From: Vlastimil Babka @ 2015-06-16 12:13 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, Minchan Kim, Mel Gorman,
	Michal Nazarewicz, Naoya Horiguchi, Christoph Lameter,
	Rik van Riel, David Rientjes

On 06/16/2015 07:41 AM, Joonsoo Kim wrote:
> On Wed, Jun 10, 2015 at 11:32:31AM +0200, Vlastimil Babka wrote:
>> Resetting the cached compaction scanner positions is now done implicitly in
>> __reset_isolation_suitable() and compact_finished(). Encapsulate the
>> functionality in a new function reset_cached_positions() and call it explicitly
>> where needed.
>> 
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> Cc: Minchan Kim <minchan@kernel.org>
>> Cc: Mel Gorman <mgorman@suse.de>
>> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>> Cc: Michal Nazarewicz <mina86@mina86.com>
>> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>> Cc: Christoph Lameter <cl@linux.com>
>> Cc: Rik van Riel <riel@redhat.com>
>> Cc: David Rientjes <rientjes@google.com>
>> ---
>>  mm/compaction.c | 22 ++++++++++++++--------
>>  1 file changed, 14 insertions(+), 8 deletions(-)
>> 
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 7e0a814..d334bb3 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -207,6 +207,13 @@ static inline bool isolation_suitable(struct compact_control *cc,
>>  	return !get_pageblock_skip(page);
>>  }
>>  
>> +static void reset_cached_positions(struct zone *zone)
>> +{
>> +	zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
>> +	zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
>> +	zone->compact_cached_free_pfn = zone_end_pfn(zone);
>> +}
>> +
>>  /*
>>   * This function is called to clear all cached information on pageblocks that
>>   * should be skipped for page isolation when the migrate and free page scanner
>> @@ -218,9 +225,6 @@ static void __reset_isolation_suitable(struct zone *zone)
>>  	unsigned long end_pfn = zone_end_pfn(zone);
>>  	unsigned long pfn;
>>  
>> -	zone->compact_cached_migrate_pfn[0] = start_pfn;
>> -	zone->compact_cached_migrate_pfn[1] = start_pfn;
>> -	zone->compact_cached_free_pfn = end_pfn;
>>  	zone->compact_blockskip_flush = false;
> 
> Is there a valid reason not to call reset_cached_positions() in
> __reset_isolation_suitable?

Hm, maybe not, except that it would be somewhat implicit again. It might have
had a stronger reason in the previous patchset.

> You missed one callsite in
> __compact_pgdat().
> 
>         if (cc->order == -1)
>                 __reset_isolation_suitable(zone);
> 
> This also needs reset_cached_positions().

Ah, good catch. Thanks.

> 
> Thanks.
> 


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/6] mm, compaction: always skip compound pages by order in migrate scanner
  2015-06-16  5:44     ` Joonsoo Kim
@ 2015-06-16 12:16       ` Vlastimil Babka
  -1 siblings, 0 replies; 50+ messages in thread
From: Vlastimil Babka @ 2015-06-16 12:16 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, Minchan Kim, Mel Gorman,
	Michal Nazarewicz, Naoya Horiguchi, Christoph Lameter,
	Rik van Riel, David Rientjes

On 06/16/2015 07:44 AM, Joonsoo Kim wrote:
> On Wed, Jun 10, 2015 at 11:32:32AM +0200, Vlastimil Babka wrote:

[...]

>> @@ -723,39 +725,35 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>  		 * It's possible to migrate LRU pages and balloon pages
>>  		 * Skip any other type of page
>>  		 */
>> -		if (!PageLRU(page)) {
>> +		is_lru = PageLRU(page);
>> +		if (!is_lru) {
>>  			if (unlikely(balloon_page_movable(page))) {
>>  				if (balloon_page_isolate(page)) {
>>  					/* Successfully isolated */
>>  					goto isolate_success;
>>  				}
>>  			}
>> -			continue;
>>  		}
>>  
>>  		/*
>> -		 * PageLRU is set. lru_lock normally excludes isolation
>> -		 * splitting and collapsing (collapsing has already happened
>> -		 * if PageLRU is set) but the lock is not necessarily taken
>> -		 * here and it is wasteful to take it just to check transhuge.
>> -		 * Check PageCompound without lock and skip the whole pageblock
>> -		 * if it's a transhuge page, as calling compound_order()
>> -		 * without preventing THP from splitting the page underneath us
>> -		 * may return surprising results.
>> -		 * If we happen to check a THP tail page, compound_order()
>> -		 * returns 0. It should be rare enough to not bother with
>> -		 * using compound_head() in that case.
>> +		 * Regardless of being on LRU, compound pages such as THP and
>> +		 * hugetlbfs are not to be compacted. We can potentially save
>> +		 * a lot of iterations if we skip them at once. The check is
>> +		 * racy, but we can consider only valid values and the only
>> +		 * danger is skipping too much.
>>  		 */
>>  		if (PageCompound(page)) {
>> -			int nr;
>> -			if (locked)
>> -				nr = 1 << compound_order(page);
>> -			else
>> -				nr = pageblock_nr_pages;
>> -			low_pfn += nr - 1;
>> +			unsigned int comp_order = compound_order(page);
>> +
>> +			if (comp_order > 0 && comp_order < MAX_ORDER)
>> +				low_pfn += (1UL << comp_order) - 1;
>> +
>>  			continue;
>>  		}
> 
> How about moving this PageCompound() check up to the PageLRU check?
> Is there any relationship between balloon page and PageCompound()?

I didn't want to assume whether there's a relationship or not, as per the changelog:

>> After this patch, all pages are tested for PageCompound() and we skip them by
>> compound_order().  The test is done after the test for balloon_page_movable()
>> as we don't want to assume if balloon pages (or other pages with own isolation
>> and migration implementation if a generic API gets implemented) are compound
>> or not.

> It will remove is_lru and code would be more understandable.

Right, it just felt safer and more future-proof this way.

> Otherwise,
> 
> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> Thanks.
> 


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] mm, compaction: decouple updating pageblock_skip and cached pfn
  2015-06-16  6:10     ` Joonsoo Kim
@ 2015-06-16 12:33       ` Vlastimil Babka
  -1 siblings, 0 replies; 50+ messages in thread
From: Vlastimil Babka @ 2015-06-16 12:33 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, Minchan Kim, Mel Gorman,
	Michal Nazarewicz, Naoya Horiguchi, Christoph Lameter,
	Rik van Riel, David Rientjes

On 06/16/2015 08:10 AM, Joonsoo Kim wrote:
> On Wed, Jun 10, 2015 at 11:32:34AM +0200, Vlastimil Babka wrote:
>> The pageblock_skip bitmap and cached scanner pfn's are two mechanisms in
>> compaction to prevent rescanning pages where isolation has recently failed
>> or they were scanned during the previous compaction attempt.
>> 
>> Currently, both kinds of information are updated via update_pageblock_skip(),
>> which is suboptimal for the cached scanner pfn's:
>> 
>> - The condition "isolation has failed in the pageblock" checked by
>>   update_pageblock_skip() may be valid for the pageblock_skip bitmap, but makes
>>   less sense for cached pfn's. There's little point for the next compaction
>>   attempt to scan again a pageblock where all pages that could be isolated were
>>   already processed.
> 
> In async compaction, compaction could be stopped due to cc->contended
> in freepage scanner so sometimes isolated pages were not migrated. Your
> change makes next async compaction skip these pages. This possibly causes
> compaction complete prematurely by async compaction.

Hm, I see, thanks. That could be fixed when returning the non-migrated pages,
just like we do for the unused freepages and cached free scanner position.
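
An untested sketch of what I mean, for the abort path in compact_zone() (the
exact place, and rewinding to the lowest pfn of the pages being put back, are
assumptions on my side):

	if (err) {
		struct page *page;
		unsigned long rewind_pfn = cc->migrate_pfn;

		/* Find the lowest pfn among isolated-but-not-migrated pages */
		list_for_each_entry(page, &cc->migratepages, lru)
			rewind_pfn = min(rewind_pfn, page_to_pfn(page));

		putback_movable_pages(&cc->migratepages);
		cc->nr_migratepages = 0;

		/* Let the next attempt restart from there */
		if (rewind_pfn < zone->compact_cached_migrate_pfn[0])
			zone->compact_cached_migrate_pfn[0] = rewind_pfn;
	}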

> And, rescan previous attempted range could solve some race problem.
> If allocated page waits to set PageLRU in pagevec, compaction will
> pass it. If we try rescan after short time, page will have PageLRU and
> compaction can isolate and migrate it and make high order freepage. This
> requires some rescanning overhead but migration overhead which is more bigger
> than scanning overhead is just a little. If compaction pass it like as
> this change, pages on this area would be allocated for other requestor, and,
> when compaction revisit, there would be more page to migrate.

The same "race problem" (and many others) can happen when we don't abort and
later restart from cached pfn's, but just continue on to later pageblocks within
single compaction run. Still I would expect that it's statistically higher
chance to succeed in the next pageblock than rescanning pageblock(s) that we
just scanned.

> I basically agree with this change because it is more intuitive. But,
> I'd like to see some improvement result or test this patch myself before merging
> it.

Sure, please test. I don't expect much difference; the primary motivation was
really that the pfn's recorded by tracepoints looked much saner.

Thanks for the review!

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/6] mm, compaction: decouple updating pageblock_skip and cached pfn
  2015-06-16 12:33       ` Vlastimil Babka
@ 2015-06-16 13:03         ` Joonsoo Kim
  -1 siblings, 0 replies; 50+ messages in thread
From: Joonsoo Kim @ 2015-06-16 13:03 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Joonsoo Kim, Andrew Morton, Linux Memory Management List, LKML,
	Minchan Kim, Mel Gorman, Michal Nazarewicz, Naoya Horiguchi,
	Christoph Lameter, Rik van Riel, David Rientjes

2015-06-16 21:33 GMT+09:00 Vlastimil Babka <vbabka@suse.cz>:
> On 06/16/2015 08:10 AM, Joonsoo Kim wrote:
>> On Wed, Jun 10, 2015 at 11:32:34AM +0200, Vlastimil Babka wrote:
>>> The pageblock_skip bitmap and cached scanner pfn's are two mechanisms in
>>> compaction to prevent rescanning pages where isolation has recently failed
>>> or they were scanned during the previous compaction attempt.
>>>
>>> Currently, both kinds of information are updated via update_pageblock_skip(),
>>> which is suboptimal for the cached scanner pfn's:
>>>
>>> - The condition "isolation has failed in the pageblock" checked by
>>>   update_pageblock_skip() may be valid for the pageblock_skip bitmap, but makes
>>>   less sense for cached pfn's. There's little point for the next compaction
>>>   attempt to scan again a pageblock where all pages that could be isolated were
>>>   already processed.
>>
>> In async compaction, compaction could be stopped due to cc->contended
>> in freepage scanner so sometimes isolated pages were not migrated. Your
>> change makes next async compaction skip these pages. This possibly causes
>> compaction complete prematurely by async compaction.
>
> Hm, I see, thanks. That could be fixed when returning the non-migrated pages,
> just like we do for the unused freepages and cached free scanner position.

Yes, that would work.

>> And, rescan previous attempted range could solve some race problem.
>> If allocated page waits to set PageLRU in pagevec, compaction will
>> pass it. If we try rescan after short time, page will have PageLRU and
>> compaction can isolate and migrate it and make high order freepage. This
>> requires some rescanning overhead but migration overhead which is more bigger
>> than scanning overhead is just a little. If compaction pass it like as
>> this change, pages on this area would be allocated for other requestor, and,
>> when compaction revisit, there would be more page to migrate.
>
> The same "race problem" (and many others) can happen when we don't abort and
> later restart from cached pfn's, but just continue on to later pageblocks within
> single compaction run. Still I would expect that it's statistically higher
> chance to succeed in the next pageblock than rescanning pageblock(s) that we
> just scanned.

If we consider just one compaction attempt, yes, scanning the next pageblock
is the more promising way to succeed. But if we rescan the pageblock that we
just scanned and, fortunately, succeed in migrating pages and making a
high-order freepage from it, more compaction attempts can be successful before
the two scanners meet, and this may result in less overhead over all
compaction attempts.

>> I basically agree with this change because it is more intuitive. But,
>> I'd like to see some improvement result or test this patch myself before merging
>> it.
>
> Sure, please test. I don't expect much difference, the primary motivation was
> really that the recorded pfn's by tracepoints looked much saner.

Yes, that's what I was trying to say with *intuitive*. :)

Thanks.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/6] mm, compaction: more robust check for scanners meeting
  2015-06-10  9:32   ` Vlastimil Babka
@ 2015-06-19 13:41     ` Mel Gorman
  -1 siblings, 0 replies; 50+ messages in thread
From: Mel Gorman @ 2015-06-19 13:41 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Minchan Kim, Joonsoo Kim,
	Michal Nazarewicz, Naoya Horiguchi, Christoph Lameter,
	Rik van Riel, David Rientjes

On Wed, Jun 10, 2015 at 11:32:29AM +0200, Vlastimil Babka wrote:
> Compaction should finish when the migration and free scanner meet, i.e. they
> reach the same pageblock. Currently however, the test in compact_finished()
> simply just compares the exact pfns, which may yield a false negative when the
> free scanner position is in the middle of a pageblock and the migration scanner
> reaches the beginning of the same pageblock.
> 
> This hasn't been a problem until commit e14c720efdd7 ("mm, compaction: remember
> position within pageblock in free pages scanner") allowed the free scanner
> position to be in the middle of a pageblock between invocations.  The hot-fix
> 1d5bfe1ffb5b ("mm, compaction: prevent infinite loop in compact_zone")
> prevented the issue by adding a special check in the migration scanner to
> satisfy the current detection of scanners meeting.
> 
> However, the proper fix is to make the detection more robust. This patch
> introduces the compact_scanners_met() function that returns true when the free
> scanner position is in the same or lower pageblock than the migration scanner.
> The special case in isolate_migratepages() introduced by 1d5bfe1ffb5b is
> removed.
> 
> Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Michal Nazarewicz <mina86@mina86.com>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: David Rientjes <rientjes@google.com>

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/6] mm, compaction: simplify handling restart position in free pages scanner
  2015-06-10  9:32   ` Vlastimil Babka
@ 2015-06-19 13:48     ` Mel Gorman
  -1 siblings, 0 replies; 50+ messages in thread
From: Mel Gorman @ 2015-06-19 13:48 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Minchan Kim, Joonsoo Kim,
	Michal Nazarewicz, Naoya Horiguchi, Christoph Lameter,
	Rik van Riel, David Rientjes

On Wed, Jun 10, 2015 at 11:32:30AM +0200, Vlastimil Babka wrote:
> Handling the position where the compaction free scanner should restart (stored in
> cc->free_pfn) got more complex with commit e14c720efdd7 ("mm, compaction:
> remember position within pageblock in free pages scanner"). Currently the
> position is updated in each loop iteration of isolate_freepages(), although it
> should be enough to update it only when breaking out of the loop. There's also
> an extra check outside the loop that updates the position in case we have met the
> migration scanner.
> 
> This can be simplified if we move the test for having isolated enough from the
> for loop header next to the test for contention, and determine the restart
> position only in these cases. We can reuse the isolate_start_pfn variable for
> this instead of setting cc->free_pfn directly. Outside the loop, we can simply
> set cc->free_pfn to the current value of isolate_start_pfn without any extra check.
> 
> Also add a VM_BUG_ON to catch a possible mistake in the future, in case we later
> add a new condition that terminates isolate_freepages_block() prematurely
> without also considering the condition in isolate_freepages().
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
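
For illustration, the restructured flow amounts to deciding the restart position only
on the early-exit paths and publishing it once after the loop. The following toy
userspace sketch shows the shape of that flow; all names and numbers are assumptions
for the example, not the kernel code:

/*
 * Toy sketch: a scanner that walks blocks downwards, lets the per-block
 * helper record where it stopped, and commits the restart position with
 * a single unconditional assignment after the loop.
 */
#include <stdbool.h>
#include <stdio.h>

#define NR_BLOCKS	8
#define NR_NEEDED	10

/* stand-in for isolate_freepages_block(): records where it stopped, returns pages found */
static int isolate_block(int block, int *isolate_start)
{
	*isolate_start = block;
	return 3;		/* pretend every block yields three free pages */
}

int main(void)
{
	int nr_isolated = 0;
	int isolate_start = NR_BLOCKS;	/* plays the role of isolate_start_pfn */
	int restart;			/* plays the role of cc->free_pfn */
	bool contended = false;		/* stand-in for cc->contended */
	int block;

	/* the free scanner walks blocks from the top of the zone downwards */
	for (block = NR_BLOCKS - 1; block >= 0; block--) {
		nr_isolated += isolate_block(block, &isolate_start);

		/* the two exit tests sit together; only they end the scan early */
		if (nr_isolated >= NR_NEEDED || contended)
			break;
	}

	/* a single unconditional update after the loop, as in the patch */
	restart = isolate_start;
	printf("restart at block %d, %d pages isolated\n", restart, nr_isolated);
	return 0;
}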

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs

* Re: [PATCH 4/6] mm, compaction: always skip compound pages by order in migrate scanner
  2015-06-10  9:32   ` Vlastimil Babka
@ 2015-06-19 13:58     ` Mel Gorman
  -1 siblings, 0 replies; 50+ messages in thread
From: Mel Gorman @ 2015-06-19 13:58 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Minchan Kim, Joonsoo Kim,
	Michal Nazarewicz, Naoya Horiguchi, Christoph Lameter,
	Rik van Riel, David Rientjes

On Wed, Jun 10, 2015 at 11:32:32AM +0200, Vlastimil Babka wrote:
> The compaction migrate scanner tries to skip compound pages by their order, to
> reduce the number of iterations for pages it cannot isolate. The check is only done
> if PageLRU() is true, which means it applies to THP pages, but not e.g.
> hugetlbfs pages or any other non-LRU compound pages, which we still have to iterate
> over base page by base page.
> 
> This limitation comes from the assumption that it's only safe to read
> compound_order() when we have the zone's lru_lock and THP cannot be split under
> us. But the only danger (after filtering out order values that are not below
> MAX_ORDER, to prevent overflows) is that we skip too much or too little after
> reading a bogus compound_order() due to a rare race. This is the same reasoning
> as patch 99c0fd5e51c4 ("mm, compaction: skip buddy pages by their order in the
> migrate scanner") introduced for unsafely reading PageBuddy() order.
> 
> After this patch, all pages are tested for PageCompound() and we skip them by
> compound_order().  The test is done after the test for balloon_page_movable()
> as we don't want to assume whether balloon pages (or other pages with their own
> isolation and migration implementation, should a generic API get implemented) are
> compound or not.
> 
> When tested with stress-highalloc from mmtests on a 4GB system with 1GB of hugetlbfs
> pages, the vmstat compact_migrate_scanned count decreased by 15%.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
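
For illustration, the skip itself is a small calculation: trust the racily read order
only if it is below MAX_ORDER, otherwise skip nothing extra. A userspace sketch
follows; the helper name, the MAX_ORDER value and the pfn numbers are assumptions for
the example, not the patch's code:

/*
 * Sketch of the skip-by-order filtering: a bogus order read without the
 * lru_lock only costs the skip optimization for that page, never a jump
 * past the end of sane values.
 */
#include <stdio.h>

#define MAX_ORDER 11	/* assumed common default */

/* extra base pages the scanner may skip for one compound page */
static unsigned long compound_skip(unsigned int racy_order)
{
	/* only trust sane orders; a bogus racy read just means no skip */
	if (racy_order < MAX_ORDER)
		return (1UL << racy_order) - 1;
	return 0;
}

int main(void)
{
	unsigned long low_pfn = 1000;

	/* a hugetlbfs page of order 9 lets the scanner jump 511 base pages */
	printf("after sane order 9:   pfn %lu\n", low_pfn + compound_skip(9));
	/* a garbage order value is filtered out and costs nothing */
	printf("after bogus order 40: pfn %lu\n", low_pfn + compound_skip(40));
	return 0;
}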

Acked-by: Mel Gorman <mgorman@suse.de>

-- 
Mel Gorman
SUSE Labs

Thread overview: 50+ messages
2015-06-10  9:32 [PATCH 0/6] Assorted compaction cleanups and optimizations Vlastimil Babka
2015-06-10  9:32 ` Vlastimil Babka
2015-06-10  9:32 ` [PATCH 1/6] mm, compaction: more robust check for scanners meeting Vlastimil Babka
2015-06-10  9:32   ` Vlastimil Babka
2015-06-10 18:02   ` Rik van Riel
2015-06-10 18:02     ` Rik van Riel
2015-06-12  9:55   ` Michal Nazarewicz
2015-06-12  9:55     ` Michal Nazarewicz
2015-06-16  5:37   ` Joonsoo Kim
2015-06-16  5:37     ` Joonsoo Kim
2015-06-19 13:41   ` Mel Gorman
2015-06-19 13:41     ` Mel Gorman
2015-06-10  9:32 ` [PATCH 2/6] mm, compaction: simplify handling restart position in free pages scanner Vlastimil Babka
2015-06-10  9:32   ` Vlastimil Babka
2015-06-16  5:38   ` Joonsoo Kim
2015-06-16  5:38     ` Joonsoo Kim
2015-06-19 13:48   ` Mel Gorman
2015-06-19 13:48     ` Mel Gorman
2015-06-10  9:32 ` [PATCH 3/6] mm, compaction: encapsulate resetting cached scanner positions Vlastimil Babka
2015-06-10  9:32   ` Vlastimil Babka
2015-06-12 10:07   ` Michal Nazarewicz
2015-06-12 10:07     ` Michal Nazarewicz
2015-06-16  5:41   ` Joonsoo Kim
2015-06-16  5:41     ` Joonsoo Kim
2015-06-16 12:13     ` Vlastimil Babka
2015-06-16 12:13       ` Vlastimil Babka
2015-06-10  9:32 ` [PATCH 4/6] mm, compaction: always skip compound pages by order in migrate scanner Vlastimil Babka
2015-06-10  9:32   ` Vlastimil Babka
2015-06-12 10:11   ` Michal Nazarewicz
2015-06-12 10:11     ` Michal Nazarewicz
2015-06-16  5:44   ` Joonsoo Kim
2015-06-16  5:44     ` Joonsoo Kim
2015-06-16 12:16     ` Vlastimil Babka
2015-06-16 12:16       ` Vlastimil Babka
2015-06-19 13:58   ` Mel Gorman
2015-06-19 13:58     ` Mel Gorman
2015-06-10  9:32 ` [PATCH 5/6] mm, compaction: skip compound pages by order in free scanner Vlastimil Babka
2015-06-10  9:32   ` Vlastimil Babka
2015-06-12 10:18   ` Michal Nazarewicz
2015-06-12 10:18     ` Michal Nazarewicz
2015-06-16  5:45   ` Joonsoo Kim
2015-06-16  5:45     ` Joonsoo Kim
2015-06-10  9:32 ` [PATCH 6/6] mm, compaction: decouple updating pageblock_skip and cached pfn Vlastimil Babka
2015-06-10  9:32   ` Vlastimil Babka
2015-06-16  6:10   ` Joonsoo Kim
2015-06-16  6:10     ` Joonsoo Kim
2015-06-16 12:33     ` Vlastimil Babka
2015-06-16 12:33       ` Vlastimil Babka
2015-06-16 13:03       ` Joonsoo Kim
2015-06-16 13:03         ` Joonsoo Kim
