* [RFC][PATCHv2 0/8] introduce automatic pool compaction
@ 2015-06-05 12:03 ` Sergey Senozhatsky
  0 siblings, 0 replies; 52+ messages in thread
From: Sergey Senozhatsky @ 2015-06-05 12:03 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

Hello,

This patch set tweaks compaction and makes it possible to trigger
pool compaction automatically when the system is getting low on memory.

zsmalloc can in some cases suffer from notable fragmentation, and
compaction can release a considerable amount of memory. The problem
is that we currently rely entirely on user space to perform compaction
when needed. However, performing zsmalloc compaction is not always an
obvious thing to do. For example, suppose we have an `idle' fragmented
(compaction was never performed) zram device and the system is getting
low on memory due to some 3rd party user processes (gcc LTO, firefox,
etc.). It's quite unlikely that user space will issue zpool compaction
in this case. Besides, user space cannot tell for sure how badly the
pool is fragmented; this information, however, is known to zsmalloc
and, hence, to a shrinker.

v2:
-- use a slab shrinker instead of triggering compaction from zs_free (Minchan)

Sergey Senozhatsky (8):
  zsmalloc: drop unused variable `nr_to_migrate'
  zsmalloc: partial page ordering within a fullness_list
  zsmalloc: lower ZS_ALMOST_FULL waterline
  zsmalloc: always keep per-class stats
  zsmalloc: introduce zs_can_compact() function
  zsmalloc: cosmetic compaction code adjustments
  zsmalloc/zram: move `num_migrated' to zs_pool
  zsmalloc: register a shrinker to trigger auto-compaction

 drivers/block/zram/zram_drv.c |  12 +--
 drivers/block/zram/zram_drv.h |   1 -
 include/linux/zsmalloc.h      |   1 +
 mm/zsmalloc.c                 | 228 +++++++++++++++++++++++++++---------------
 4 files changed, 152 insertions(+), 90 deletions(-)

-- 
2.4.2.387.gf86f31a


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [RFC][PATCHv2 1/8] zsmalloc: drop unused variable `nr_to_migrate'
  2015-06-05 12:03 ` Sergey Senozhatsky
@ 2015-06-05 12:03   ` Sergey Senozhatsky
  -1 siblings, 0 replies; 52+ messages in thread
From: Sergey Senozhatsky @ 2015-06-05 12:03 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

__zs_compact() does not use `nr_to_migrate', drop it.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index c766240..ce3310c 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1702,7 +1702,6 @@ static struct page *isolate_source_page(struct size_class *class)
 static unsigned long __zs_compact(struct zs_pool *pool,
 				struct size_class *class)
 {
-	int nr_to_migrate;
 	struct zs_compact_control cc;
 	struct page *src_page;
 	struct page *dst_page = NULL;
@@ -1713,8 +1712,6 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 
 		BUG_ON(!is_first_page(src_page));
 
-		/* The goal is to migrate all live objects in source page */
-		nr_to_migrate = src_page->inuse;
 		cc.index = 0;
 		cc.s_page = src_page;
 
@@ -1729,7 +1726,6 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 
 			putback_zspage(pool, class, dst_page);
 			nr_total_migrated += cc.nr_migrated;
-			nr_to_migrate -= cc.nr_migrated;
 		}
 
 		/* Stop if we couldn't find slot */
-- 
2.4.2.387.gf86f31a


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [RFC][PATCHv2 2/8] zsmalloc: partial page ordering within a fullness_list
  2015-06-05 12:03 ` Sergey Senozhatsky
@ 2015-06-05 12:03   ` Sergey Senozhatsky
  -1 siblings, 0 replies; 52+ messages in thread
From: Sergey Senozhatsky @ 2015-06-05 12:03 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

We want to see more ZS_FULL pages and fewer ZS_ALMOST_{FULL, EMPTY}
pages. Put a page with a higher ->inuse count first within its
->fullness_list, which gives us better chances to fill up this page
with new objects (find_get_zspage() returns the ->fullness_list head
for new object allocation), so some zspages will become
ZS_ALMOST_FULL/ZS_FULL quicker.

It performs a trivial and cheap ->inuse comparison which does not slow
down zsmalloc; in the worst case it keeps the list pages in no
particular order, just like we do now.

A more expensive solution could sort fullness_list by ->inuse count.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 mm/zsmalloc.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index ce3310c..cd37bda 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -658,8 +658,16 @@ static void insert_zspage(struct page *page, struct size_class *class,
 		return;
 
 	head = &class->fullness_list[fullness];
-	if (*head)
-		list_add_tail(&page->lru, &(*head)->lru);
+	if (*head) {
+		/*
+		 * We want to see more ZS_FULL pages and less almost
+		 * empty/full. Put pages with higher ->inuse first.
+		 */
+		if (page->inuse < (*head)->inuse)
+			list_add_tail(&page->lru, &(*head)->lru);
+		else
+			list_add(&page->lru, &(*head)->lru);
+	}
 
 	*head = page;
 	zs_stat_inc(class, fullness == ZS_ALMOST_EMPTY ?
-- 
2.4.2.387.gf86f31a


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [RFC][PATCHv2 3/8] zsmalloc: lower ZS_ALMOST_FULL waterline
  2015-06-05 12:03 ` Sergey Senozhatsky
@ 2015-06-05 12:03   ` Sergey Senozhatsky
  -1 siblings, 0 replies; 52+ messages in thread
From: Sergey Senozhatsky @ 2015-06-05 12:03 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

get_fullness_group() considers pages that are up to 3/4 full to be
almost empty. That, unfortunately, marks as ALMOST_EMPTY pages that
we would probably prefer to keep on ALMOST_FULL lists.

ALMOST_EMPTY:
[..]
  inuse: 3 max_objects: 4
  inuse: 5 max_objects: 7
  inuse: 5 max_objects: 7
  inuse: 2 max_objects: 3
[..]

For "inuse: 5 max_objexts: 7" ALMOST_EMPTY page, for example,
it'll take 2 obj_malloc to make the page FULL and 5 obj_free to
make it EMPTY. Compaction selects ALMOST_EMPTY pages as source
pages, which can result in extra object moves.

In other words, from compaction point of view, it makes more
sense to fill this page, rather than drain it.

Decrease ALMOST_FULL waterline to 2/3 of max capacity; which is,
of course, still imperfect, but can shorten compaction
execution time.
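
To illustrate with the "inuse: 5 max_objects: 7" page: with the old
3/4 threshold the ALMOST_EMPTY cut-off is 3 * 7 / 4 = 5, so an inuse:5
page is classified as ALMOST_EMPTY; with the 2/3 waterline the cut-off
becomes 2 * 7 / 3 = 4, so the same page now stays ALMOST_FULL.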

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 mm/zsmalloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index cd37bda..b94e281 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -198,7 +198,7 @@ static int zs_size_classes;
  *
  * (see: fix_fullness_group())
  */
-static const int fullness_threshold_frac = 4;
+static const int fullness_threshold_frac = 3;
 
 struct size_class {
 	/*
@@ -633,7 +633,7 @@ static enum fullness_group get_fullness_group(struct page *page)
 		fg = ZS_EMPTY;
 	else if (inuse == max_objects)
 		fg = ZS_FULL;
-	else if (inuse <= 3 * max_objects / fullness_threshold_frac)
+	else if (inuse <= 2 * max_objects / fullness_threshold_frac)
 		fg = ZS_ALMOST_EMPTY;
 	else
 		fg = ZS_ALMOST_FULL;
-- 
2.4.2.387.gf86f31a


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [RFC][PATCHv2 4/8] zsmalloc: always keep per-class stats
  2015-06-05 12:03 ` Sergey Senozhatsky
@ 2015-06-05 12:03   ` Sergey Senozhatsky
  -1 siblings, 0 replies; 52+ messages in thread
From: Sergey Senozhatsky @ 2015-06-05 12:03 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

Always account per-class `zs_size_stat' stats. This data will
help us make better decisions during compaction. We are especially
interested in OBJ_ALLOCATED and OBJ_USED, which can tell us whether
class compaction will result in any memory gain.

For instance, we know the number of allocated objects in the class,
the number of objects being used (so we also know how many objects
are not used) and the number of objects per page. So we can determine
whether we have enough unused objects to form at least one ZS_EMPTY
zspage during compaction.

We calculate this value on a per-class basis, so we can compute the
total number of zspages that can be released, which is exactly what
a shrinker wants to know.
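
A rough sketch of that check, using the stats helpers kept by this
patch (the helper name below is just for illustration; the real
function arrives later in the series):

	static unsigned long estimate_freeable_zspages(struct size_class *class)
	{
		/* objects that sit in allocated zspages but hold no data */
		unsigned long obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) -
				zs_stat_get(class, OBJ_USED);

		/* every `maxobj_per_zspage' unused objects can free one zspage */
		return obj_wasted / get_maxobj_per_zspage(class->size,
				class->pages_per_zspage);
	}

E.g. a class with 10 wasted objects and a zspage capacity of 4 objects
can release at least 10 / 4 = 2 zspages.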

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 mm/zsmalloc.c | 49 ++++++++++++-------------------------------------
 1 file changed, 12 insertions(+), 37 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index b94e281..0453347 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -169,14 +169,12 @@ enum zs_stat_type {
 	NR_ZS_STAT_TYPE,
 };
 
-#ifdef CONFIG_ZSMALLOC_STAT
-
-static struct dentry *zs_stat_root;
-
 struct zs_size_stat {
 	unsigned long objs[NR_ZS_STAT_TYPE];
 };
 
+#ifdef CONFIG_ZSMALLOC_STAT
+static struct dentry *zs_stat_root;
 #endif
 
 /*
@@ -201,25 +199,21 @@ static int zs_size_classes;
 static const int fullness_threshold_frac = 3;
 
 struct size_class {
+	spinlock_t		lock;
+	struct page		*fullness_list[_ZS_NR_FULLNESS_GROUPS];
 	/*
 	 * Size of objects stored in this class. Must be multiple
 	 * of ZS_ALIGN.
 	 */
-	int size;
-	unsigned int index;
+	int			size;
+	unsigned int		index;
 
 	/* Number of PAGE_SIZE sized pages to combine to form a 'zspage' */
-	int pages_per_zspage;
-	/* huge object: pages_per_zspage == 1 && maxobj_per_zspage == 1 */
-	bool huge;
-
-#ifdef CONFIG_ZSMALLOC_STAT
-	struct zs_size_stat stats;
-#endif
-
-	spinlock_t lock;
+	int			pages_per_zspage;
+	struct zs_size_stat	stats;
 
-	struct page *fullness_list[_ZS_NR_FULLNESS_GROUPS];
+	/* huge object: pages_per_zspage == 1 && maxobj_per_zspage == 1 */
+	bool			huge;
 };
 
 /*
@@ -440,8 +434,6 @@ static int get_size_class_index(int size)
 	return min(zs_size_classes - 1, idx);
 }
 
-#ifdef CONFIG_ZSMALLOC_STAT
-
 static inline void zs_stat_inc(struct size_class *class,
 				enum zs_stat_type type, unsigned long cnt)
 {
@@ -460,6 +452,8 @@ static inline unsigned long zs_stat_get(struct size_class *class,
 	return class->stats.objs[type];
 }
 
+#ifdef CONFIG_ZSMALLOC_STAT
+
 static int __init zs_stat_init(void)
 {
 	if (!debugfs_initialized())
@@ -575,23 +569,6 @@ static void zs_pool_stat_destroy(struct zs_pool *pool)
 }
 
 #else /* CONFIG_ZSMALLOC_STAT */
-
-static inline void zs_stat_inc(struct size_class *class,
-				enum zs_stat_type type, unsigned long cnt)
-{
-}
-
-static inline void zs_stat_dec(struct size_class *class,
-				enum zs_stat_type type, unsigned long cnt)
-{
-}
-
-static inline unsigned long zs_stat_get(struct size_class *class,
-				enum zs_stat_type type)
-{
-	return 0;
-}
-
 static int __init zs_stat_init(void)
 {
 	return 0;
@@ -609,7 +586,6 @@ static inline int zs_pool_stat_create(char *name, struct zs_pool *pool)
 static inline void zs_pool_stat_destroy(struct zs_pool *pool)
 {
 }
-
 #endif
 
 
@@ -1691,7 +1667,6 @@ static void putback_zspage(struct zs_pool *pool, struct size_class *class,
 			class->size, class->pages_per_zspage));
 		atomic_long_sub(class->pages_per_zspage,
 				&pool->pages_allocated);
-
 		free_zspage(first_page);
 	}
 }
-- 
2.4.2.387.gf86f31a


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [RFC][PATCHv2 5/8] zsmalloc: introduce zs_can_compact() function
  2015-06-05 12:03 ` Sergey Senozhatsky
@ 2015-06-05 12:03   ` Sergey Senozhatsky
  -1 siblings, 0 replies; 52+ messages in thread
From: Sergey Senozhatsky @ 2015-06-05 12:03 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

This function checks whether class compaction will free any pages.
Rephrasing -- do we have enough unused objects to form at least
one ZS_EMPTY zspage and free it? Compaction is aborted if class
compaction will not result in any (further) savings.

EXAMPLE (this debug output is not part of this patch set):

-- class size
-- number of allocated objects
-- number of used objects
-- max number of objects per page
-- estimated number of pages that will be freed

[..]
 class-3072 objs:24652 inuse:24628 maxobjs-per-page:4  pages-tofree:6
 class-3072 objs:24648 inuse:24628 maxobjs-per-page:4  pages-tofree:5
 class-3072 objs:24644 inuse:24628 maxobjs-per-page:4  pages-tofree:4
 class-3072 objs:24640 inuse:24628 maxobjs-per-page:4  pages-tofree:3
 class-3072 objs:24636 inuse:24628 maxobjs-per-page:4  pages-tofree:2
 class-3072 objs:24632 inuse:24628 maxobjs-per-page:4  pages-tofree:1
 class-2720 objs:17970 inuse:17966 maxobjs-per-page:3  pages-tofree:1
 class-2720 objs:17967 inuse:17966 maxobjs-per-page:3  pages-tofree:0
 class-2720: Compaction is useless
 class-2448 objs:7680  inuse:7674  maxobjs-per-page:5  pages-tofree:1
 class-2336 objs:13510 inuse:13500 maxobjs-per-page:7  pages-tofree:1
 class-2336 objs:13503 inuse:13500 maxobjs-per-page:7  pages-tofree:0
 class-2336: Compaction is useless
 class-1808 objs:1161  inuse:1154  maxobjs-per-page:9  pages-tofree:0
 class-1808: Compaction is useless
 class-1744 objs:2135  inuse:2131  maxobjs-per-page:7  pages-tofree:0
 class-1744: Compaction is useless
 class-1536 objs:1328  inuse:1323  maxobjs-per-page:8  pages-tofree:0
 class-1536: Compaction is useless
 class-1488 objs:8855  inuse:8847  maxobjs-per-page:11 pages-tofree:0
 class-1488: Compaction is useless
 class-1360 objs:14880 inuse:14878 maxobjs-per-page:3  pages-tofree:0
 class-1360: Compaction is useless
 class-1248 objs:3588  inuse:3577  maxobjs-per-page:13 pages-tofree:0
 class-1248: Compaction is useless
 class-1216 objs:3380  inuse:3372  maxobjs-per-page:10 pages-tofree:0
 class-1216: Compaction is useless
 class-1168 objs:3416  inuse:3401  maxobjs-per-page:7  pages-tofree:2
 class-1168 objs:3409  inuse:3401  maxobjs-per-page:7  pages-tofree:1
 class-1104 objs:605   inuse:599   maxobjs-per-page:11 pages-tofree:0
 class-1104: Compaction is useless
[..]

Every "Compaction is useless" indicates that we saved some CPU cycles.

class-1104 has
	605	objects allocated
	599	objects used
	11	objects per page

Even if we have an ALMOST_EMPTY zspage, we still don't have enough
spare slots to migrate all of its objects and free this zspage;
compaction will not make a lot of sense here, so it's better to just
leave it as is.
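
With these numbers: 605 - 599 = 6 unused-but-allocated objects, and
6 / 11 = 0, so zs_can_compact() reports zero freeable zspages (the
"pages-tofree:0" line above) and class-1104 is skipped.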

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 mm/zsmalloc.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 0453347..5d8f1d4 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1682,6 +1682,28 @@ static struct page *isolate_source_page(struct size_class *class)
 	return page;
 }
 
+/*
+ * Make sure that we actually can compact this class,
+ * IOW if migration will empty at least one page.
+ *
+ * Should be called under class->lock
+ */
+static unsigned long zs_can_compact(struct size_class *class)
+{
+	/*
+	 * Calculate how many unused allocated objects we
+	 * have and see if we can free any zspages. Otherwise,
+	 * compaction can just move objects back and forth w/o
+	 * any memory gain.
+	 */
+	unsigned long obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) -
+		zs_stat_get(class, OBJ_USED);
+
+	obj_wasted /= get_maxobj_per_zspage(class->size,
+			class->pages_per_zspage);
+	return obj_wasted;
+}
+
 static unsigned long __zs_compact(struct zs_pool *pool,
 				struct size_class *class)
 {
@@ -1695,6 +1717,9 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 
 		BUG_ON(!is_first_page(src_page));
 
+		if (!zs_can_compact(class))
+			break;
+
 		cc.index = 0;
 		cc.s_page = src_page;
 
-- 
2.4.2.387.gf86f31a


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [RFC][PATCHv2 6/8] zsmalloc: cosmetic compaction code adjustments
  2015-06-05 12:03 ` Sergey Senozhatsky
@ 2015-06-05 12:03   ` Sergey Senozhatsky
  -1 siblings, 0 replies; 52+ messages in thread
From: Sergey Senozhatsky @ 2015-06-05 12:03 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

Change the zs_object_copy() argument order to be (DST, SRC) rather
than (SRC, DST): copy/move functions usually take their arguments
in (to, from) order.

Rename alloc_target_page() to isolate_target_page(). This function
doesn't allocate anything; it isolates a target page, pretty much
like isolate_source_page().

Tweak __zs_compact() comment.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 5d8f1d4..e9f653d 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1478,7 +1478,7 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
 }
 EXPORT_SYMBOL_GPL(zs_free);
 
-static void zs_object_copy(unsigned long src, unsigned long dst,
+static void zs_object_copy(unsigned long dst, unsigned long src,
 				struct size_class *class)
 {
 	struct page *s_page, *d_page;
@@ -1619,7 +1619,7 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
 
 		used_obj = handle_to_obj(handle);
 		free_obj = obj_malloc(d_page, class, handle);
-		zs_object_copy(used_obj, free_obj, class);
+		zs_object_copy(free_obj, used_obj, class);
 		index++;
 		record_obj(handle, free_obj);
 		unpin_tag(handle);
@@ -1635,7 +1635,7 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
 	return ret;
 }
 
-static struct page *alloc_target_page(struct size_class *class)
+static struct page *isolate_target_page(struct size_class *class)
 {
 	int i;
 	struct page *page;
@@ -1723,11 +1723,11 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 		cc.index = 0;
 		cc.s_page = src_page;
 
-		while ((dst_page = alloc_target_page(class))) {
+		while ((dst_page = isolate_target_page(class))) {
 			cc.d_page = dst_page;
 			/*
-			 * If there is no more space in dst_page, try to
-			 * allocate another zspage.
+			 * If there is no more space in dst_page, resched
+			 * and see if anyone had allocated another zspage.
 			 */
 			if (!migrate_zspage(pool, class, &cc))
 				break;
-- 
2.4.2.387.gf86f31a


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [RFC][PATCHv2 7/8] zsmalloc/zram: move `num_migrated' to zs_pool
  2015-06-05 12:03 ` Sergey Senozhatsky
@ 2015-06-05 12:03   ` Sergey Senozhatsky
  -1 siblings, 0 replies; 52+ messages in thread
From: Sergey Senozhatsky @ 2015-06-05 12:03 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

Remove the number of migrated objects from `zs_compact_control'
and move it to `zs_pool'. `zs_compact_control' has a limited
lifespan: we lose it when zs_compact() returns back to zram.

To keep track of objects migrated during auto-compaction (issued
by the shrinker), we need to store this number in zs_pool.

Introduce zs_get_num_migrated() to export zs_pool's ->num_migrated
counter and use it in zram, so we can also drop zram's own copy of
`num_migrated'.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 drivers/block/zram/zram_drv.c | 12 ++++++------
 drivers/block/zram/zram_drv.h |  1 -
 include/linux/zsmalloc.h      |  1 +
 mm/zsmalloc.c                 | 41 +++++++++++++++++++----------------------
 4 files changed, 26 insertions(+), 29 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 1cd8ade..1cb2a18 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -388,7 +388,6 @@ static ssize_t comp_algorithm_store(struct device *dev,
 static ssize_t compact_store(struct device *dev,
 		struct device_attribute *attr, const char *buf, size_t len)
 {
-	unsigned long nr_migrated;
 	struct zram *zram = dev_to_zram(dev);
 	struct zram_meta *meta;
 
@@ -399,8 +398,7 @@ static ssize_t compact_store(struct device *dev,
 	}
 
 	meta = zram->meta;
-	nr_migrated = zs_compact(meta->mem_pool);
-	atomic64_add(nr_migrated, &zram->stats.num_migrated);
+	zs_compact(meta->mem_pool);
 	up_read(&zram->init_lock);
 
 	return len;
@@ -428,13 +426,15 @@ static ssize_t mm_stat_show(struct device *dev,
 		struct device_attribute *attr, char *buf)
 {
 	struct zram *zram = dev_to_zram(dev);
-	u64 orig_size, mem_used = 0;
+	u64 orig_size, mem_used = 0, num_migrated = 0;
 	long max_used;
 	ssize_t ret;
 
 	down_read(&zram->init_lock);
-	if (init_done(zram))
+	if (init_done(zram)) {
 		mem_used = zs_get_total_pages(zram->meta->mem_pool);
+		num_migrated = zs_get_num_migrated(zram->meta->mem_pool);
+	}
 
 	orig_size = atomic64_read(&zram->stats.pages_stored);
 	max_used = atomic_long_read(&zram->stats.max_used_pages);
@@ -447,7 +447,7 @@ static ssize_t mm_stat_show(struct device *dev,
 			zram->limit_pages << PAGE_SHIFT,
 			max_used << PAGE_SHIFT,
 			(u64)atomic64_read(&zram->stats.zero_pages),
-			(u64)atomic64_read(&zram->stats.num_migrated));
+			num_migrated);
 	up_read(&zram->init_lock);
 
 	return ret;
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 6dbe2df..8e92339 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -78,7 +78,6 @@ struct zram_stats {
 	atomic64_t compr_data_size;	/* compressed size of pages stored */
 	atomic64_t num_reads;	/* failed + successful */
 	atomic64_t num_writes;	/* --do-- */
-	atomic64_t num_migrated;	/* no. of migrated object */
 	atomic64_t failed_reads;	/* can happen when memory is too low */
 	atomic64_t failed_writes;	/* can happen when memory is too low */
 	atomic64_t invalid_io;	/* non-page-aligned I/O requests */
diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index 1338190..e878875 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -47,6 +47,7 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 void zs_unmap_object(struct zs_pool *pool, unsigned long handle);
 
 unsigned long zs_get_total_pages(struct zs_pool *pool);
+unsigned long zs_get_num_migrated(struct zs_pool *pool);
 unsigned long zs_compact(struct zs_pool *pool);
 
 #endif
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index e9f653d..a81e75b 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -237,16 +237,19 @@ struct link_free {
 };
 
 struct zs_pool {
-	char *name;
+	char			*name;
 
-	struct size_class **size_class;
-	struct kmem_cache *handle_cachep;
+	struct size_class	**size_class;
+	struct kmem_cache	*handle_cachep;
 
-	gfp_t flags;	/* allocation flags used when growing pool */
-	atomic_long_t pages_allocated;
+	/* Allocation flags used when growing pool */
+	gfp_t			flags;
+	atomic_long_t		pages_allocated;
+	/* How many objects were migrated */
+	unsigned long		num_migrated;
 
 #ifdef CONFIG_ZSMALLOC_STAT
-	struct dentry *stat_dentry;
+	struct dentry		*stat_dentry;
 #endif
 };
 
@@ -1219,6 +1222,12 @@ unsigned long zs_get_total_pages(struct zs_pool *pool)
 }
 EXPORT_SYMBOL_GPL(zs_get_total_pages);
 
+unsigned long zs_get_num_migrated(struct zs_pool *pool)
+{
+	return pool->num_migrated;
+}
+EXPORT_SYMBOL_GPL(zs_get_num_migrated);
+
 /**
  * zs_map_object - get address of allocated object from handle.
  * @pool: pool from which the object was allocated
@@ -1585,8 +1594,6 @@ struct zs_compact_control {
 	 /* Starting object index within @s_page which used for live object
 	  * in the subpage. */
 	int index;
-	/* how many of objects are migrated */
-	int nr_migrated;
 };
 
 static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
@@ -1597,7 +1604,6 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
 	struct page *s_page = cc->s_page;
 	struct page *d_page = cc->d_page;
 	unsigned long index = cc->index;
-	int nr_migrated = 0;
 	int ret = 0;
 
 	while (1) {
@@ -1624,13 +1630,12 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
 		record_obj(handle, free_obj);
 		unpin_tag(handle);
 		obj_free(pool, class, used_obj);
-		nr_migrated++;
+		pool->num_migrated++;
 	}
 
 	/* Remember last position in this iteration */
 	cc->s_page = s_page;
 	cc->index = index;
-	cc->nr_migrated = nr_migrated;
 
 	return ret;
 }
@@ -1704,13 +1709,11 @@ static unsigned long zs_can_compact(struct size_class *class)
 	return obj_wasted;
 }
 
-static unsigned long __zs_compact(struct zs_pool *pool,
-				struct size_class *class)
+static void __zs_compact(struct zs_pool *pool, struct size_class *class)
 {
 	struct zs_compact_control cc;
 	struct page *src_page;
 	struct page *dst_page = NULL;
-	unsigned long nr_total_migrated = 0;
 
 	spin_lock(&class->lock);
 	while ((src_page = isolate_source_page(class))) {
@@ -1733,7 +1736,6 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 				break;
 
 			putback_zspage(pool, class, dst_page);
-			nr_total_migrated += cc.nr_migrated;
 		}
 
 		/* Stop if we couldn't find slot */
@@ -1743,7 +1745,6 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 		putback_zspage(pool, class, dst_page);
 		putback_zspage(pool, class, src_page);
 		spin_unlock(&class->lock);
-		nr_total_migrated += cc.nr_migrated;
 		cond_resched();
 		spin_lock(&class->lock);
 	}
@@ -1752,14 +1753,11 @@ static unsigned long __zs_compact(struct zs_pool *pool,
 		putback_zspage(pool, class, src_page);
 
 	spin_unlock(&class->lock);
-
-	return nr_total_migrated;
 }
 
 unsigned long zs_compact(struct zs_pool *pool)
 {
 	int i;
-	unsigned long nr_migrated = 0;
 	struct size_class *class;
 
 	for (i = zs_size_classes - 1; i >= 0; i--) {
@@ -1768,10 +1766,9 @@ unsigned long zs_compact(struct zs_pool *pool)
 			continue;
 		if (class->index != i)
 			continue;
-		nr_migrated += __zs_compact(pool, class);
+		__zs_compact(pool, class);
 	}
-
-	return nr_migrated;
+	return pool->num_migrated;
 }
 EXPORT_SYMBOL_GPL(zs_compact);
 
-- 
2.4.2.387.gf86f31a


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [RFC][PATCHv2 8/8] zsmalloc: register a shrinker to trigger auto-compaction
  2015-06-05 12:03 ` Sergey Senozhatsky
@ 2015-06-05 12:03   ` Sergey Senozhatsky
  -1 siblings, 0 replies; 52+ messages in thread
From: Sergey Senozhatsky @ 2015-06-05 12:03 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

Perform automatic pool compaction from a shrinker callback when
the system is getting tight on memory.

Demonstration (this output is merely to show auto-compaction
effectiveness and is not part of the code):
[..]
[ 4283.803766] zram0 zs_shrinker_scan freed 364
[ 4285.398937] zram0 zs_shrinker_scan freed 471
[ 4286.044095] zram0 zs_shrinker_scan freed 273
[ 4286.951824] zram0 zs_shrinker_scan freed 312
[ 4287.583563] zram0 zs_shrinker_scan freed 222
[ 4289.360971] zram0 zs_shrinker_scan freed 343
[ 4289.884653] zram0 zs_shrinker_scan freed 210
[ 4291.204705] zram0 zs_shrinker_scan freed 175
[ 4292.043656] zram0 zs_shrinker_scan freed 425
[ 4292.273397] zram0 zs_shrinker_scan freed 109
[ 4292.513351] zram0 zs_shrinker_scan freed 191
[..]

cat /sys/block/zram0/mm_stat
 2908798976 2061913167 2091438080        0 2128449536      868     6074

Compaction now has a relatively quick pool scan, so we are able to
easily estimate the number of pages that will be freed, which makes
it possible to do this estimation from a shrinker->count_objects()
callback. We also abort compaction as soon as we detect that we
can't free any more pages, preventing wasteful object migrations.
In the example above, the "6074" migrated objects imply that we
actually released zspages back to the system.

The initial patch triggered compaction from zs_free() for every
ZS_ALMOST_EMPTY page. Minchan Kim proposed using a slab shrinker
instead.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 71 insertions(+), 10 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index a81e75b..f262d8d 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -247,7 +247,9 @@ struct zs_pool {
 	atomic_long_t		pages_allocated;
 	/* How many objects were migrated */
 	unsigned long		num_migrated;
-
+	/* Compact classes */
+	struct shrinker		shrinker;
+	bool			shrinker_enabled;
 #ifdef CONFIG_ZSMALLOC_STAT
 	struct dentry		*stat_dentry;
 #endif
@@ -1728,12 +1730,9 @@ static void __zs_compact(struct zs_pool *pool, struct size_class *class)
 
 		while ((dst_page = isolate_target_page(class))) {
 			cc.d_page = dst_page;
-			/*
-			 * If there is no more space in dst_page, resched
-			 * and see if anyone had allocated another zspage.
-			 */
+
 			if (!migrate_zspage(pool, class, &cc))
-				break;
+				goto out;
 
 			putback_zspage(pool, class, dst_page);
 		}
@@ -1744,11 +1743,10 @@ static void __zs_compact(struct zs_pool *pool, struct size_class *class)
 
 		putback_zspage(pool, class, dst_page);
 		putback_zspage(pool, class, src_page);
-		spin_unlock(&class->lock);
-		cond_resched();
-		spin_lock(&class->lock);
 	}
-
+out:
+	if (dst_page)
+		putback_zspage(pool, class, dst_page);
 	if (src_page)
 		putback_zspage(pool, class, src_page);
 
@@ -1772,6 +1770,65 @@ unsigned long zs_compact(struct zs_pool *pool)
 }
 EXPORT_SYMBOL_GPL(zs_compact);
 
+static unsigned long zs_shrinker_scan(struct shrinker *shrinker,
+		struct shrink_control *sc)
+{
+	unsigned long freed;
+	struct zs_pool *pool = container_of(shrinker, struct zs_pool,
+			shrinker);
+
+	freed = pool->num_migrated;
+	/* Compact classes and calculate compaction delta */
+	freed = zs_compact(pool) - freed;
+
+	return freed ? freed : SHRINK_STOP;
+}
+
+static unsigned long zs_shrinker_count(struct shrinker *shrinker,
+		struct shrink_control *sc)
+{
+	int i;
+	struct size_class *class;
+	unsigned long to_free = 0;
+	struct zs_pool *pool = container_of(shrinker, struct zs_pool,
+			shrinker);
+
+	if (!pool->shrinker_enabled)
+		return 0;
+
+	for (i = zs_size_classes - 1; i >= 0; i--) {
+		class = pool->size_class[i];
+		if (!class)
+			continue;
+		if (class->index != i)
+			continue;
+
+		spin_lock(&class->lock);
+		to_free += zs_can_compact(class);
+		spin_unlock(&class->lock);
+	}
+
+	return to_free;
+}
+
+static void zs_unregister_shrinker(struct zs_pool *pool)
+{
+	if (pool->shrinker_enabled) {
+		unregister_shrinker(&pool->shrinker);
+		pool->shrinker_enabled = false;
+	}
+}
+
+static int zs_register_shrinker(struct zs_pool *pool)
+{
+	pool->shrinker.scan_objects = zs_shrinker_scan;
+	pool->shrinker.count_objects = zs_shrinker_count;
+	pool->shrinker.batch = 0;
+	pool->shrinker.seeks = DEFAULT_SEEKS;
+
+	return register_shrinker(&pool->shrinker);
+}
+
 /**
  * zs_create_pool - Creates an allocation pool to work from.
  * @flags: allocation flags used to allocate pool metadata
@@ -1857,6 +1914,9 @@ struct zs_pool *zs_create_pool(char *name, gfp_t flags)
 	if (zs_pool_stat_create(name, pool))
 		goto err;
 
+	/* Not critical, we still can use the pool */
+	if (zs_register_shrinker(pool) == 0)
+		pool->shrinker_enabled = true;
 	return pool;
 
 err:
@@ -1869,6 +1929,7 @@ void zs_destroy_pool(struct zs_pool *pool)
 {
 	int i;
 
+	zs_unregister_shrinker(pool);
 	zs_pool_stat_destroy(pool);
 
 	for (i = 0; i < zs_size_classes; i++) {
-- 
2.4.2.387.gf86f31a


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [RFC][PATCHv2 0/8] introduce automatic pool compaction
  2015-06-05 12:03 ` Sergey Senozhatsky
@ 2015-06-10  0:04   ` Minchan Kim
  -1 siblings, 0 replies; 52+ messages in thread
From: Minchan Kim @ 2015-06-10  0:04 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, linux-mm, linux-kernel, Sergey Senozhatsky

Hello Sergey,

Thanks for looking into this, and sorry for the delay with the review.
I haven't had time to do a proper review yet.
Please wait; I will try to find time within this week.

Thanks for your patience.

On Fri, Jun 05, 2015 at 09:03:50PM +0900, Sergey Senozhatsky wrote:
> Hello,
> 
> This patch set tweaks compaction and makes it possible to trigger
> pool compaction automatically when system is getting low on memory.
> 
> zsmalloc in some cases can suffer from a notable fragmentation and
> compaction can release some considerable amount of memory. The problem
> here is that currently we fully rely on user space to perform compaction
> when needed. However, performing zsmalloc compaction is not always an
> obvious thing to do. For example, suppose we have a `idle' fragmented
> (compaction was never performed) zram device and system is getting low
> on memory due to some 3rd party user processes (gcc LTO, or firefox, etc.).
> It's quite unlikely that user space will issue zpool compaction in this
> case. Besides, user space cannot tell for sure how badly pool is
> fragmented; however, this info is known to zsmalloc and, hence, to a
> shrinker.
> 
> v2:
> -- use a slab shrinker instead of triggering compaction from zs_free (Minchan)
> 
> Sergey Senozhatsky (8):
>   zsmalloc: drop unused variable `nr_to_migrate'
>   zsmalloc: partial page ordering within a fullness_list
>   zsmalloc: lower ZS_ALMOST_FULL waterline
>   zsmalloc: always keep per-class stats
>   zsmalloc: introduce zs_can_compact() function
>   zsmalloc: cosmetic compaction code adjustments
>   zsmalloc/zram: move `num_migrated' to zs_pool
>   zsmalloc: register a shrinker to trigger auto-compaction
> 
>  drivers/block/zram/zram_drv.c |  12 +--
>  drivers/block/zram/zram_drv.h |   1 -
>  include/linux/zsmalloc.h      |   1 +
>  mm/zsmalloc.c                 | 228 +++++++++++++++++++++++++++---------------
>  4 files changed, 152 insertions(+), 90 deletions(-)
> 
> -- 
> 2.4.2.387.gf86f31a
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC][PATCHv2 0/8] introduce automatic pool compaction
  2015-06-10  0:04   ` Minchan Kim
@ 2015-06-10  0:07     ` Sergey Senozhatsky
  -1 siblings, 0 replies; 52+ messages in thread
From: Sergey Senozhatsky @ 2015-06-10  0:07 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Andrew Morton, linux-mm, linux-kernel,
	Sergey Senozhatsky

Hello,

On (06/10/15 09:04), Minchan Kim wrote:
> Hello Sergey,
> 
> Thanks for looking into this, and sorry for the delay with the review.
> I haven't had time to do a proper review yet.
> Please wait; I will try to find time within this week.
> 
> Thanks for your patience.

sure, no problem at all.

	-ss

> On Fri, Jun 05, 2015 at 09:03:50PM +0900, Sergey Senozhatsky wrote:
> > Hello,
> > 
> > This patch set tweaks compaction and makes it possible to trigger
> > pool compaction automatically when system is getting low on memory.
> > 
> > zsmalloc in some cases can suffer from a notable fragmentation and
> > compaction can release some considerable amount of memory. The problem
> > here is that currently we fully rely on user space to perform compaction
> > when needed. However, performing zsmalloc compaction is not always an
> > obvious thing to do. For example, suppose we have a `idle' fragmented
> > (compaction was never performed) zram device and system is getting low
> > on memory due to some 3rd party user processes (gcc LTO, or firefox, etc.).
> > It's quite unlikely that user space will issue zpool compaction in this
> > case. Besides, user space cannot tell for sure how badly pool is
> > fragmented; however, this info is known to zsmalloc and, hence, to a
> > shrinker.
> > 
> > v2:
> > -- use a slab shrinker instead of triggering compaction from zs_free (Minchan)
> > 
> > Sergey Senozhatsky (8):
> >   zsmalloc: drop unused variable `nr_to_migrate'
> >   zsmalloc: partial page ordering within a fullness_list
> >   zsmalloc: lower ZS_ALMOST_FULL waterline
> >   zsmalloc: always keep per-class stats
> >   zsmalloc: introduce zs_can_compact() function
> >   zsmalloc: cosmetic compaction code adjustments
> >   zsmalloc/zram: move `num_migrated' to zs_pool
> >   zsmalloc: register a shrinker to trigger auto-compaction
> > 
> >  drivers/block/zram/zram_drv.c |  12 +--
> >  drivers/block/zram/zram_drv.h |   1 -
> >  include/linux/zsmalloc.h      |   1 +
> >  mm/zsmalloc.c                 | 228 +++++++++++++++++++++++++++---------------
> >  4 files changed, 152 insertions(+), 90 deletions(-)
> > 
> > -- 
> > 2.4.2.387.gf86f31a
> > 
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC][PATCHv2 2/8] zsmalloc: partial page ordering within a fullness_list
  2015-06-05 12:03   ` Sergey Senozhatsky
@ 2015-06-16 13:19     ` Minchan Kim
  -1 siblings, 0 replies; 52+ messages in thread
From: Minchan Kim @ 2015-06-16 13:19 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, linux-mm, linux-kernel, Sergey Senozhatsky

Hello Sergey,

On Fri, Jun 05, 2015 at 09:03:52PM +0900, Sergey Senozhatsky wrote:
> We want to see more ZS_FULL pages and less ZS_ALMOST_{FULL, EMPTY}
> pages. Put a page with higher ->inuse count first within its
> ->fullness_list, which will give us better chances to fill up this
> page with new objects (find_get_zspage() return ->fullness_list head
> for new object allocation), so some zspages will become
> ZS_ALMOST_FULL/ZS_FULL quicker.
> 
> It performs a trivial and cheap ->inuse compare which does not slow
> down zsmalloc, and in the worst case it keeps the list pages not in
> any particular order, just like we do it now.

Fair enough.

I think it would be an improvement at a small cost, and it matches the
current zsmalloc design, which allocates an object from an ALMOST_FULL
zspage first to reduce memory footprint.

Although we use page->lru to link zspages, they are not kept in LRU
order, so I like your idea.

One thing I want to say is that it doesn't need to be part of this
patch set. Please gather some data to prove the gain, include it in the
changelog and resubmit.

> 
> A more expensive solution could sort fullness_list by ->inuse count.
> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> ---
>  mm/zsmalloc.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index ce3310c..cd37bda 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -658,8 +658,16 @@ static void insert_zspage(struct page *page, struct size_class *class,
>  		return;
>  
>  	head = &class->fullness_list[fullness];
> -	if (*head)
> -		list_add_tail(&page->lru, &(*head)->lru);
> +	if (*head) {
> +		/*
> +		 * We want to see more ZS_FULL pages and less almost
> +		 * empty/full. Put pages with higher ->inuse first.
> +		 */
> +		if (page->inuse < (*head)->inuse)
> +			list_add_tail(&page->lru, &(*head)->lru);
> +		else
> +			list_add(&page->lru, &(*head)->lru);
> +	}
>  
>  	*head = page;
>  	zs_stat_inc(class, fullness == ZS_ALMOST_EMPTY ?
> -- 
> 2.4.2.387.gf86f31a
> 

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC][PATCHv2 3/8] zsmalloc: lower ZS_ALMOST_FULL waterline
  2015-06-05 12:03   ` Sergey Senozhatsky
@ 2015-06-16 13:37     ` Minchan Kim
  -1 siblings, 0 replies; 52+ messages in thread
From: Minchan Kim @ 2015-06-16 13:37 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, linux-mm, linux-kernel, Sergey Senozhatsky

On Fri, Jun 05, 2015 at 09:03:53PM +0900, Sergey Senozhatsky wrote:
> get_fullness_group() considers 3/4 full pages as almost empty.
> That, unfortunately, marks as ALMOST_EMPTY pages that we would
> probably like to keep in ALMOST_FULL lists.
> 
> ALMOST_EMPTY:
> [..]
>   inuse: 3 max_objects: 4
>   inuse: 5 max_objects: 7
>   inuse: 5 max_objects: 7
>   inuse: 2 max_objects: 3
> [..]
> 
> For "inuse: 5 max_objexts: 7" ALMOST_EMPTY page, for example,
> it'll take 2 obj_malloc to make the page FULL and 5 obj_free to
> make it EMPTY. Compaction selects ALMOST_EMPTY pages as source
> pages, which can result in extra object moves.
> 
> In other words, from compaction point of view, it makes more
> sense to fill this page, rather than drain it.
> 
> Decrease ALMOST_FULL waterline to 2/3 of max capacity; which is,
> of course, still imperfect, but can shorten compaction
> execution time.

However, in the worst case, once compaction is done, up to 33% of the
space could remain fragmented, while the current threshold leaves at
most 25%. Maybe even 25% isn't good enough, so we might need to scan
ZS_ALMOST_FULL as a source in the future. Anyway, compaction is a
really slow path now, so I prefer saving memory by reducing internal
fragmentation over the performance cost of copying more objects.
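
To illustrate the two waterlines with assumed numbers (a class with
max_objects = 12 per zspage; this example is not from the patch):

	int inuse = 9, max_objects = 12;

	/* current, fullness_threshold_frac == 4:
	 *   9 <= 3 * 12 / 4 (== 9)          -> ZS_ALMOST_EMPTY, compaction source
	 * patched, fullness_threshold_frac == 3:
	 *   9 <= 2 * 12 / 3 (== 8) is false -> ZS_ALMOST_FULL, left alone
	 */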

> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> ---
>  mm/zsmalloc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index cd37bda..b94e281 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -198,7 +198,7 @@ static int zs_size_classes;
>   *
>   * (see: fix_fullness_group())
>   */
> -static const int fullness_threshold_frac = 4;
> +static const int fullness_threshold_frac = 3;
>  
>  struct size_class {
>  	/*
> @@ -633,7 +633,7 @@ static enum fullness_group get_fullness_group(struct page *page)
>  		fg = ZS_EMPTY;
>  	else if (inuse == max_objects)
>  		fg = ZS_FULL;
> -	else if (inuse <= 3 * max_objects / fullness_threshold_frac)
> +	else if (inuse <= 2 * max_objects / fullness_threshold_frac)
>  		fg = ZS_ALMOST_EMPTY;
>  	else
>  		fg = ZS_ALMOST_FULL;
> -- 
> 2.4.2.387.gf86f31a
> 

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC][PATCHv2 5/8] zsmalloc: introduce zs_can_compact() function
  2015-06-05 12:03   ` Sergey Senozhatsky
@ 2015-06-16 14:19     ` Minchan Kim
  -1 siblings, 0 replies; 52+ messages in thread
From: Minchan Kim @ 2015-06-16 14:19 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, linux-mm, linux-kernel, Sergey Senozhatsky

On Fri, Jun 05, 2015 at 09:03:55PM +0900, Sergey Senozhatsky wrote:
> This function checks if class compaction will free any pages.
> Rephrasing -- do we have enough unused objects to form at least
> one ZS_EMPTY page and free it. It aborts compaction if class
> compaction will not result in any (further) savings.
> 
> EXAMPLE (this debug output is not part of this patch set):
> 
> -- class size
> -- number of allocated objects
> -- number of used objects,
> -- estimated number of pages that will be freed
> 
> [..]
>  class-3072 objs:24652 inuse:24628 maxobjs-per-page:4  pages-tofree:6

Please use a clearer term. We have been using "zspage" for a cluster of pages.

                                     maxobjs-per-zspage:4

Also, say what pages-per-zspage is for each class, and then write how
you calculate it, to make reviewing easier.

* class-3072
* pages-per-zspage: 3
* maxobjs-per-zspage = 4

In your example, allocated obj = 24652 and inuse obj = 24628
so 24652 - 24628 = 24 = 4(ie, maxobjs-per-zspage) * 6
so we can save 6 zspage. A zspage includes 3 pages so we can
save 3 * 6 = 18 pages via compaction.


>  class-3072 objs:24648 inuse:24628 maxobjs-per-page:4  pages-tofree:5
>  class-3072 objs:24644 inuse:24628 maxobjs-per-page:4  pages-tofree:4
>  class-3072 objs:24640 inuse:24628 maxobjs-per-page:4  pages-tofree:3
>  class-3072 objs:24636 inuse:24628 maxobjs-per-page:4  pages-tofree:2
>  class-3072 objs:24632 inuse:24628 maxobjs-per-page:4  pages-tofree:1
>  class-2720 objs:17970 inuse:17966 maxobjs-per-page:3  pages-tofree:1
>  class-2720 objs:17967 inuse:17966 maxobjs-per-page:3  pages-tofree:0
>  class-2720: Compaction is useless
>  class-2448 objs:7680  inuse:7674  maxobjs-per-page:5  pages-tofree:1
>  class-2336 objs:13510 inuse:13500 maxobjs-per-page:7  pages-tofree:1
>  class-2336 objs:13503 inuse:13500 maxobjs-per-page:7  pages-tofree:0
>  class-2336: Compaction is useless
>  class-1808 objs:1161  inuse:1154  maxobjs-per-page:9  pages-tofree:0
>  class-1808: Compaction is useless
>  class-1744 objs:2135  inuse:2131  maxobjs-per-page:7  pages-tofree:0
>  class-1744: Compaction is useless
>  class-1536 objs:1328  inuse:1323  maxobjs-per-page:8  pages-tofree:0
>  class-1536: Compaction is useless
>  class-1488 objs:8855  inuse:8847  maxobjs-per-page:11 pages-tofree:0
>  class-1488: Compaction is useless
>  class-1360 objs:14880 inuse:14878 maxobjs-per-page:3  pages-tofree:0
>  class-1360: Compaction is useless
>  class-1248 objs:3588  inuse:3577  maxobjs-per-page:13 pages-tofree:0
>  class-1248: Compaction is useless
>  class-1216 objs:3380  inuse:3372  maxobjs-per-page:10 pages-tofree:0
>  class-1216: Compaction is useless
>  class-1168 objs:3416  inuse:3401  maxobjs-per-page:7  pages-tofree:2
>  class-1168 objs:3409  inuse:3401  maxobjs-per-page:7  pages-tofree:1
>  class-1104 objs:605   inuse:599   maxobjs-per-page:11 pages-tofree:0
>  class-1104: Compaction is useless
> [..]
> 
> Every "Compaction is useless" indicates that we saved some CPU cycles.
> 
> class-1104 has
> 	605	object allocated
> 	599	objects used
> 	11	objects per-page
> 
> Even if we have a ALMOST_EMPTY zspage, we still don't have enough room to
> migrate all of its objects and free this zspage; so compaction will not
> make a lot of sense here, it's better to just leave it as is.
> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> ---
>  mm/zsmalloc.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 0453347..5d8f1d4 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -1682,6 +1682,28 @@ static struct page *isolate_source_page(struct size_class *class)
>  	return page;
>  }
>  
> +/*
> + * Make sure that we actually can compact this class,
> + * IOW if migration will empty at least one page.

                            free at least one zspage
> + *
> + * Should be called under class->lock
> + */
> +static unsigned long zs_can_compact(struct size_class *class)
> +{
> +	/*
> +	 * Calculate how many unused allocated objects we
> +	 * have and see if we can free any zspages. Otherwise,
> +	 * compaction can just move objects back and forth w/o
> +	 * any memory gain.
> +	 */
> +	unsigned long obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) -
> +		zs_stat_get(class, OBJ_USED);
> +
> +	obj_wasted /= get_maxobj_per_zspage(class->size,
> +			class->pages_per_zspage);

I don't think we need division and make it simple.

        return obj_wasted >= get_maxobj_per_zspage

> +	return obj_wasted;
> +}
> +
>  static unsigned long __zs_compact(struct zs_pool *pool,
>  				struct size_class *class)
>  {
> @@ -1695,6 +1717,9 @@ static unsigned long __zs_compact(struct zs_pool *pool,
>  
>  		BUG_ON(!is_first_page(src_page));
>  
> +		if (!zs_can_compact(class))
> +			break;
> +
>  		cc.index = 0;
>  		cc.s_page = src_page;
>  
> -- 
> 2.4.2.387.gf86f31a
> 

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC][PATCHv2 2/8] zsmalloc: partial page ordering within a fullness_list
  2015-06-16 13:19     ` Minchan Kim
@ 2015-06-16 14:30       ` Sergey Senozhatsky
  -1 siblings, 0 replies; 52+ messages in thread
From: Sergey Senozhatsky @ 2015-06-16 14:30 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Andrew Morton, linux-mm, linux-kernel,
	Sergey Senozhatsky

Hello,

On (06/16/15 22:19), Minchan Kim wrote:
> Hello Sergey,
> 
> On Fri, Jun 05, 2015 at 09:03:52PM +0900, Sergey Senozhatsky wrote:
> > We want to see more ZS_FULL pages and less ZS_ALMOST_{FULL, EMPTY}
> > pages. Put a page with higher ->inuse count first within its
> > ->fullness_list, which will give us better chances to fill up this
> > page with new objects (find_get_zspage() return ->fullness_list head
> > for new object allocation), so some zspages will become
> > ZS_ALMOST_FULL/ZS_FULL quicker.
> > 
> > It performs a trivial and cheap ->inuse compare which does not slow
> > down zsmalloc, and in the worst case it keeps the list pages not in
> > any particular order, just like we do it now.
> 
> Fair enough.
> 
> I think it would be an improvement at a small cost, and it matches the
> current zsmalloc design, which allocates an object from an ALMOST_FULL
> zspage first to reduce memory footprint.
> 
> Although we use page->lru to link zspages, they are not kept in LRU
> order, so I like your idea.
> 
> One thing I want to say is that it doesn't need to be part of this
> patch set. Please gather some data to prove the gain, include it in the
> changelog and resubmit.

OK, I'll do some testing and come back with some numbers.

Well, none of these patches will get into 4.2, so splitting
the patchset will make things a bit less easy, but I guess I
can do this.

	-ss

> > 
> > A more expensive solution could sort fullness_list by ->inuse count.
> > 
> > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> > ---
> >  mm/zsmalloc.c | 12 ++++++++++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> > index ce3310c..cd37bda 100644
> > --- a/mm/zsmalloc.c
> > +++ b/mm/zsmalloc.c
> > @@ -658,8 +658,16 @@ static void insert_zspage(struct page *page, struct size_class *class,
> >  		return;
> >  
> >  	head = &class->fullness_list[fullness];
> > -	if (*head)
> > -		list_add_tail(&page->lru, &(*head)->lru);
> > +	if (*head) {
> > +		/*
> > +		 * We want to see more ZS_FULL pages and less almost
> > +		 * empty/full. Put pages with higher ->inuse first.
> > +		 */
> > +		if (page->inuse < (*head)->inuse)
> > +			list_add_tail(&page->lru, &(*head)->lru);
> > +		else
> > +			list_add(&page->lru, &(*head)->lru);
> > +	}
> >  
> >  	*head = page;
> >  	zs_stat_inc(class, fullness == ZS_ALMOST_EMPTY ?
> > -- 
> > 2.4.2.387.gf86f31a
> > 
> 
> -- 
> Kind regards,
> Minchan Kim
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC][PATCHv2 3/8] zsmalloc: lower ZS_ALMOST_FULL waterline
  2015-06-16 13:37     ` Minchan Kim
@ 2015-06-16 14:35       ` Sergey Senozhatsky
  -1 siblings, 0 replies; 52+ messages in thread
From: Sergey Senozhatsky @ 2015-06-16 14:35 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Andrew Morton, linux-mm, linux-kernel,
	Sergey Senozhatsky

On (06/16/15 22:37), Minchan Kim wrote:
> On Fri, Jun 05, 2015 at 09:03:53PM +0900, Sergey Senozhatsky wrote:
> > get_fullness_group() considers 3/4 full pages as almost empty.
> > That, unfortunately, marks as ALMOST_EMPTY pages that we would
> > probably like to keep in ALMOST_FULL lists.
> > 
> > ALMOST_EMPTY:
> > [..]
> >   inuse: 3 max_objects: 4
> >   inuse: 5 max_objects: 7
> >   inuse: 5 max_objects: 7
> >   inuse: 2 max_objects: 3
> > [..]
> > 
> > For "inuse: 5 max_objexts: 7" ALMOST_EMPTY page, for example,
> > it'll take 2 obj_malloc to make the page FULL and 5 obj_free to
> > make it EMPTY. Compaction selects ALMOST_EMPTY pages as source
> > pages, which can result in extra object moves.
> > 
> > In other words, from compaction point of view, it makes more
> > sense to fill this page, rather than drain it.
> > 
> > Decrease ALMOST_FULL waterline to 2/3 of max capacity; which is,
> > of course, still imperfect, but can shorten compaction
> > execution time.
> 
> However, in the worst case, once compaction is done, up to 33% of the
> space could remain fragmented, while the current threshold leaves at
> most 25%. Maybe even 25% isn't good enough, so we might need to scan
> ZS_ALMOST_FULL as a source in the future. Anyway, compaction is a
> really slow path now, so I prefer saving memory by reducing internal
> fragmentation over the performance cost of copying more objects.
> 

Well, agreed. I think we can drop this one from the series for now.
It's very hard to support this patch with any numbers, etc., because
it depends on things that are out of zram/zsmalloc's control -- IO and
data patterns.

(Opposing this patch is also very hard, for exactly the same reason.)

	-ss

> > 
> > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> > ---
> >  mm/zsmalloc.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> > index cd37bda..b94e281 100644
> > --- a/mm/zsmalloc.c
> > +++ b/mm/zsmalloc.c
> > @@ -198,7 +198,7 @@ static int zs_size_classes;
> >   *
> >   * (see: fix_fullness_group())
> >   */
> > -static const int fullness_threshold_frac = 4;
> > +static const int fullness_threshold_frac = 3;
> >  
> >  struct size_class {
> >  	/*
> > @@ -633,7 +633,7 @@ static enum fullness_group get_fullness_group(struct page *page)
> >  		fg = ZS_EMPTY;
> >  	else if (inuse == max_objects)
> >  		fg = ZS_FULL;
> > -	else if (inuse <= 3 * max_objects / fullness_threshold_frac)
> > +	else if (inuse <= 2 * max_objects / fullness_threshold_frac)
> >  		fg = ZS_ALMOST_EMPTY;
> >  	else
> >  		fg = ZS_ALMOST_FULL;
> > -- 
> > 2.4.2.387.gf86f31a
> > 
> 
> -- 
> Kind regards,
> Minchan Kim
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC][PATCHv2 5/8] zsmalloc: introduce zs_can_compact() function
  2015-06-16 14:19     ` Minchan Kim
@ 2015-06-16 14:41       ` Sergey Senozhatsky
  -1 siblings, 0 replies; 52+ messages in thread
From: Sergey Senozhatsky @ 2015-06-16 14:41 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Andrew Morton, linux-mm, linux-kernel,
	Sergey Senozhatsky

On (06/16/15 23:19), Minchan Kim wrote:
> On Fri, Jun 05, 2015 at 09:03:55PM +0900, Sergey Senozhatsky wrote:
> > This function checks if class compaction will free any pages.
> > Rephrasing -- do we have enough unused objects to form at least
> > one ZS_EMPTY page and free it. It aborts compaction if class
> > compaction will not result in any (further) savings.
> > 
> > EXAMPLE (this debug output is not part of this patch set):
> > 
> > -- class size
> > -- number of allocated objects
> > -- number of used objects,
> > -- estimated number of pages that will be freed
> > 
> > [..]
> >  class-3072 objs:24652 inuse:24628 maxobjs-per-page:4  pages-tofree:6
> 
> Please use a clearer term. We have been using "zspage" for a cluster of pages.
> 
>                                      maxobjs-per-zspage:4
> 

OK, will correct.

class size
stats[OBJ_ALLOCATED]
stats[OBJ_USED]
get_maxobj_per_zspage()
+pages-per-zspage
zspages-to-free

> Also, say what pages-per-zspage is for each class, and then write how
> you calculate it, to make reviewing easier.
> 
> * class-3072
> * pages-per-zspage: 3
> * maxobjs-per-zspage = 4
> 
> In your example, allocated obj = 24652 and inuse obj = 24628
> so 24652 - 24628 = 24 = 4(ie, maxobjs-per-zspage) * 6
> so we can save 6 zspage. A zspage includes 3 pages so we can
> save 3 * 6 = 18 pages via compaction.
> 
> 

[..]

> > + * Make sure that we actually can compact this class,
> > + * IOW if migration will empty at least one page.
> 
>                             free at least one zspage

OK.

> > + *
> > + * Should be called under class->lock
> > + */
> > +static unsigned long zs_can_compact(struct size_class *class)
> > +{
> > +	/*
> > +	 * Calculate how many unused allocated objects we
> > +	 * have and see if we can free any zspages. Otherwise,
> > +	 * compaction can just move objects back and forth w/o
> > +	 * any memory gain.
> > +	 */
> > +	unsigned long obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) -
> > +		zs_stat_get(class, OBJ_USED);
> > +
> > +	obj_wasted /= get_maxobj_per_zspage(class->size,
> > +			class->pages_per_zspage);
> 
> I don't think we need division and make it simple.
> 
>         return obj_wasted >= get_maxobj_per_zspage

I use this number later for the shrinker, as a shrinker.count_objects()
return value (total number of zspages that can be freed).
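
A minimal sketch of the difference (helper names as in the series;
local variable names are made up for the illustration):

	/* a boolean check would only say whether compaction is worth starting */
	bool worth_it = obj_wasted >= get_maxobj_per_zspage(class->size,
					class->pages_per_zspage);

	/*
	 * but shrinker->count_objects() needs an actual count, so
	 * zs_can_compact() keeps the division and reports how many
	 * zspages the class could give back
	 */
	unsigned long zspages_to_free = obj_wasted /
		get_maxobj_per_zspage(class->size, class->pages_per_zspage);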


	-ss

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC][PATCHv2 8/8] zsmalloc: register a shrinker to trigger auto-compaction
  2015-06-05 12:03   ` Sergey Senozhatsky
@ 2015-06-16 14:47     ` Minchan Kim
  -1 siblings, 0 replies; 52+ messages in thread
From: Minchan Kim @ 2015-06-16 14:47 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, linux-mm, linux-kernel, Sergey Senozhatsky

On Fri, Jun 05, 2015 at 09:03:58PM +0900, Sergey Senozhatsky wrote:
> Perform automatic pool compaction by a shrinker when system
> is getting tight on memory.
> 
> Demonstration (this output is merely to show auto-compaction
> effectiveness and is not part of the code):
> [..]
> [ 4283.803766] zram0 zs_shrinker_scan freed 364
> [ 4285.398937] zram0 zs_shrinker_scan freed 471
> [ 4286.044095] zram0 zs_shrinker_scan freed 273
> [ 4286.951824] zram0 zs_shrinker_scan freed 312
> [ 4287.583563] zram0 zs_shrinker_scan freed 222
> [ 4289.360971] zram0 zs_shrinker_scan freed 343
> [ 4289.884653] zram0 zs_shrinker_scan freed 210
> [ 4291.204705] zram0 zs_shrinker_scan freed 175
> [ 4292.043656] zram0 zs_shrinker_scan freed 425
> [ 4292.273397] zram0 zs_shrinker_scan freed 109
> [ 4292.513351] zram0 zs_shrinker_scan freed 191
> [..]
> 
> cat /sys/block/zram0/mm_stat
>  2908798976 2061913167 2091438080        0 2128449536      868     6074
> 
> Compaction now has a relatively quick pool scan so we are able to
> estimate the number of pages that will be freed easily, which makes it
> possible to call this function from a shrinker->count_objects() callback.
> We also abort compaction as soon as we detect that we can't free any
> pages any more, preventing wasteful objects migrations. In the example
> above, "6074 objects were migrated" implies that we actually released
> zspages back to system.
> 
> The initial patch was triggering compaction from zs_free() for
> every ZS_ALMOST_EMPTY page. Minchan Kim proposed to use a slab
> shrinker.

First of all, thanks for mentioning me as the proposer.
However, it's not a helpful comment for other reviewers and
people who will review this in the future.

At least write why I suggested it, so others can understand
the pros/cons.

> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Reported-by: Minchan Kim <minchan@kernel.org>

I didn't report anything. ;-).


> ---
>  mm/zsmalloc.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 71 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index a81e75b..f262d8d 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -247,7 +247,9 @@ struct zs_pool {
>  	atomic_long_t		pages_allocated;
>  	/* How many objects were migrated */
>  	unsigned long		num_migrated;
> -
> +	/* Compact classes */
> +	struct shrinker		shrinker;
> +	bool			shrinker_enabled;
>  #ifdef CONFIG_ZSMALLOC_STAT
>  	struct dentry		*stat_dentry;
>  #endif
> @@ -1728,12 +1730,9 @@ static void __zs_compact(struct zs_pool *pool, struct size_class *class)
>  
>  		while ((dst_page = isolate_target_page(class))) {
>  			cc.d_page = dst_page;
> -			/*
> -			 * If there is no more space in dst_page, resched
> -			 * and see if anyone had allocated another zspage.
> -			 */
> +
>  			if (!migrate_zspage(pool, class, &cc))
> -				break;
> +				goto out;
>  
>  			putback_zspage(pool, class, dst_page);
>  		}
> @@ -1744,11 +1743,10 @@ static void __zs_compact(struct zs_pool *pool, struct size_class *class)
>  
>  		putback_zspage(pool, class, dst_page);
>  		putback_zspage(pool, class, src_page);
> -		spin_unlock(&class->lock);
> -		cond_resched();
> -		spin_lock(&class->lock);

So should we hold the class lock until compaction of the whole class finishes?
It would cause horrible latency for other allocations from the class
in parallel.

I will review the remaining parts tomorrow (I hope), but what I want to say
before going to sleep is:

I like the idea, but I still have a concern about the lack of fragmented
zspages during memory pressure, because auto-compaction will prevent
fragmentation most of the time. Surely, using fragmented space as a buffer
under heavy memory pressure is not the intended design, so it could be
fragile, but I'm afraid this feature might accelerate it, and it ends up
causing a problem and changing the current behavior of zram as swap.

I hope you test this feature with my concern in mind.
Of course, I will test it with enough time.

Thanks.


>  	}
> -
> +out:
> +	if (dst_page)
> +		putback_zspage(pool, class, dst_page);
>  	if (src_page)
>  		putback_zspage(pool, class, src_page);
>  
> @@ -1772,6 +1770,65 @@ unsigned long zs_compact(struct zs_pool *pool)
>  }
>  EXPORT_SYMBOL_GPL(zs_compact);
>  
> +static unsigned long zs_shrinker_scan(struct shrinker *shrinker,
> +		struct shrink_control *sc)
> +{
> +	unsigned long freed;
> +	struct zs_pool *pool = container_of(shrinker, struct zs_pool,
> +			shrinker);
> +
> +	freed = pool->num_migrated;
> +	/* Compact classes and calculate compaction delta */
> +	freed = zs_compact(pool) - freed;
> +
> +	return freed ? freed : SHRINK_STOP;
> +}
> +
> +static unsigned long zs_shrinker_count(struct shrinker *shrinker,
> +		struct shrink_control *sc)
> +{
> +	int i;
> +	struct size_class *class;
> +	unsigned long to_free = 0;
> +	struct zs_pool *pool = container_of(shrinker, struct zs_pool,
> +			shrinker);
> +
> +	if (!pool->shrinker_enabled)
> +		return 0;
> +
> +	for (i = zs_size_classes - 1; i >= 0; i--) {
> +		class = pool->size_class[i];
> +		if (!class)
> +			continue;
> +		if (class->index != i)
> +			continue;
> +
> +		spin_lock(&class->lock);
> +		to_free += zs_can_compact(class);
> +		spin_unlock(&class->lock);
> +	}
> +
> +	return to_free;
> +}
> +
> +static void zs_unregister_shrinker(struct zs_pool *pool)
> +{
> +	if (pool->shrinker_enabled) {
> +		unregister_shrinker(&pool->shrinker);
> +		pool->shrinker_enabled = false;
> +	}
> +}
> +
> +static int zs_register_shrinker(struct zs_pool *pool)
> +{
> +	pool->shrinker.scan_objects = zs_shrinker_scan;
> +	pool->shrinker.count_objects = zs_shrinker_count;
> +	pool->shrinker.batch = 0;
> +	pool->shrinker.seeks = DEFAULT_SEEKS;
> +
> +	return register_shrinker(&pool->shrinker);
> +}
> +
>  /**
>   * zs_create_pool - Creates an allocation pool to work from.
>   * @flags: allocation flags used to allocate pool metadata
> @@ -1857,6 +1914,9 @@ struct zs_pool *zs_create_pool(char *name, gfp_t flags)
>  	if (zs_pool_stat_create(name, pool))
>  		goto err;
>  
> +	/* Not critical, we still can use the pool */
> +	if (zs_register_shrinker(pool) == 0)
> +		pool->shrinker_enabled = true;
>  	return pool;
>  
>  err:
> @@ -1869,6 +1929,7 @@ void zs_destroy_pool(struct zs_pool *pool)
>  {
>  	int i;
>  
> +	zs_unregister_shrinker(pool);
>  	zs_pool_stat_destroy(pool);
>  
>  	for (i = 0; i < zs_size_classes; i++) {
> -- 
> 2.4.2.387.gf86f31a
> 

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC][PATCHv2 8/8] zsmalloc: register a shrinker to trigger auto-compaction
  2015-06-16 14:47     ` Minchan Kim
@ 2015-06-16 15:45       ` Sergey Senozhatsky
  -1 siblings, 0 replies; 52+ messages in thread
From: Sergey Senozhatsky @ 2015-06-16 15:45 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Andrew Morton, linux-mm, linux-kernel,
	Sergey Senozhatsky

On (06/16/15 23:47), Minchan Kim wrote:
[..]
> > 
> > Compaction now has a relatively quick pool scan so we are able to
> > estimate the number of pages that will be freed easily, which makes it
> > possible to call this function from a shrinker->count_objects() callback.
> > We also abort compaction as soon as we detect that we can't free any
> > pages any more, preventing wasteful objects migrations. In the example
> > above, "6074 objects were migrated" implies that we actually released
> > zspages back to system.
> > 
> > The initial patch was triggering compaction from zs_free() for
> > every ZS_ALMOST_EMPTY page. Minchan Kim proposed to use a slab
> > shrinker.
> 
> First of all, thanks for mentioning me as proposer.
> However, it's not a helpful comment for other reviewers and
> anonymous people who will review this in future.
> 
> At least, write why I suggested it so others can understand
> the pros/cons.

OK, this one is far from perfect. Will try to improve later.

> > 
> > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> > Reported-by: Minchan Kim <minchan@kernel.org>
> 
> I didn't report anything. ;-).

:-)

> 
> > ---

[..]

> 
> So should we hold class lock until finishing the compaction of the class?
> It would make horrible latency for other allocation from the class
> in parallel.

Hm, what's the difference from the existing implementation?
The 'new one' aborts when (a) zs_can_compact() returns 0 or (b) migrate_zspage()
fails. It holds the class lock for less time than the current compaction does.

> I will review remain parts tomorrow(I hope) but what I want to say
> before going sleep is:
> 
> I like the idea but still have a concern to lack of fragmented zspages
> during memory pressure because auto-compaction will prevent fragment
> most of time. Surely, using fragment space as buffer in heavy memory
> pressure is not intened design so it could be fragile but I'm afraid
> this feature might accelrate it and it ends up having a problem and
> change current behavior in zram as swap.

Well, it's nearly impossible to prove anything with the numbers obtained
from some particular case. I agree that fragmentation can be both
'good' (depending on the IO pattern) and 'bad'.


Auto-compaction of IDLE zram devices certainly makes sense when the system
is getting low on memory. zram devices are not always 'busy', serving
heavy IO. There may be N idle zram devices simply sitting there and wasting
memory, or being only 'moderately' busy, so compaction will not cause any
significant slowdown there.

Auto-compaction of BUSY zram devices is less `desired', of course,
but not entirely terrible, I think (zs_can_compact() can help here a
lot).

Just an idea:
we can move shrinker registration from zsmalloc to zram. zram would then be
able to STOP (or forbid) any shrinker activity while it serves
IO requests (or has requests in its request_queue).
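
Something along these lines, perhaps (just a sketch; the `shrinker_paused'
flag and zs_pool_pause_shrinker() are made-up names, not part of this
patch set):

/*
 * zsmalloc side: let the pool owner temporarily forbid shrinker-driven
 * compaction, e.g. while the zram device has requests in flight.
 */
void zs_pool_pause_shrinker(struct zs_pool *pool, bool pause)
{
	pool->shrinker_paused = pause;
}
EXPORT_SYMBOL_GPL(zs_pool_pause_shrinker);

and in zs_shrinker_count()/zs_shrinker_scan():

	/* bail out early if zram asked us not to interfere */
	if (!pool->shrinker_enabled || pool->shrinker_paused)
		return 0;

zram would then flip that flag around its IO path.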

But, again, advocating fragmentation is tricky.


I'll quote from the cover letter

: zsmalloc in some cases can suffer from a notable fragmentation and
: compaction can release some considerable amount of memory. The problem
: here is that currently we fully rely on user space to perform compaction
: when needed. However, performing zsmalloc compaction is not always an
: obvious thing to do. For example, suppose we have a `idle' fragmented
: (compaction was never performed) zram device and system is getting low
: on memory due to some 3rd party user processes (gcc LTO, or firefox, etc.).
: It's quite unlikely that user space will issue zpool compaction in this
: case. Besides, user space cannot tell for sure how badly pool is
: fragmented; however, this info is known to zsmalloc and, hence, to a
: shrinker.


I find this case (a) interesting and (b) quite possible.
/* Besides, this happens on one of my old x86_64 boxen all the time.
 And I do like/appreciate that zram automatically releases some memory. */


> I hope you test this feature with considering my concern.
> Of course, I will test it with enough time.
> 
> Thanks.
> 

sure.

Thanks.

	-ss

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC][PATCHv2 8/8] zsmalloc: register a shrinker to trigger auto-compaction
  2015-06-16 14:47     ` Minchan Kim
@ 2015-06-17  7:11       ` Sergey Senozhatsky
  -1 siblings, 0 replies; 52+ messages in thread
From: Sergey Senozhatsky @ 2015-06-17  7:11 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Andrew Morton, linux-mm, linux-kernel,
	Sergey Senozhatsky

Hello,

On (06/16/15 23:47), Minchan Kim wrote:
[..]
> 
> I like the idea but still have a concern to lack of fragmented zspages
> during memory pressure because auto-compaction will prevent fragment
> most of time. Surely, using fragment space as buffer in heavy memory
> pressure is not intened design so it could be fragile but I'm afraid
> this feature might accelrate it and it ends up having a problem and
> change current behavior in zram as swap.
> 
> I hope you test this feature with considering my concern.
> Of course, I will test it with enough time.
> 

OK, to explore "compaction leaves no fragmentation in classes" I did some
heavy testing today -- parallel copy/remove of the linux kernel, git, glibc;
parallel builds (-j4), parallel cleanups (make clean); git gc, etc.


device's IO stats:
cat /sys/block/zram0/stat
   277050        0  2216400     1463  8442846        0 67542768   106536        0   107810   108146

device's MM stats:
cat /sys/block/zram0/mm_stat
 3095515136 2020518768 2057990144        0 2645716992     2030   182119


We auto-compacted (mostly auto-compacted, because I triggered manual compaction
less than 5 times)    182119  objects.


Now, during the compaction I also counted the number of classes that ended up
'fully compacted' (class OBJ_ALLOCATED == OBJ_USED) and 'partially
compacted'.


And the results (after 1377 compactions) are:

 pool compaction nr:1377 (full:487 part:35498)



So, we 'fully compacted' classes only 487/(35498 + 487) == 0.0135 of the time,

roughly ~1.35%.

This argument does not stand anymore. We leave 'holes' in classes in ~98% of the
cases.



code that I used to gather those stats:

---

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 55cfda8..894773a 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -253,6 +253,9 @@ struct zs_pool {
 #ifdef CONFIG_ZSMALLOC_STAT
 	struct dentry		*stat_dentry;
 #endif
+	int			compaction_nr;
+	long			full_compact;
+	long			part_compact;
 };
 
 /*
@@ -1717,6 +1720,7 @@ static void __zs_compact(struct zs_pool *pool, struct size_class *class)
 	struct zs_compact_control cc;
 	struct page *src_page;
 	struct page *dst_page = NULL;
+	bool compacted = false;
 
 	spin_lock(&class->lock);
 	while ((src_page = isolate_source_page(class))) {
@@ -1726,6 +1730,8 @@ static void __zs_compact(struct zs_pool *pool, struct size_class *class)
 		if (!zs_can_compact(class))
 			break;
 
+		compacted = true;
+
 		cc.index = 0;
 		cc.s_page = src_page;
 
@@ -1751,6 +1757,13 @@ out:
 	if (src_page)
 		putback_zspage(pool, class, src_page);
 
+	if (compacted) {
+		if (zs_stat_get(class, OBJ_ALLOCATED) == zs_stat_get(class, OBJ_USED))
+			pool->full_compact++;
+		else
+			pool->part_compact++;
+	}
+
 	spin_unlock(&class->lock);
 }
 
@@ -1767,6 +1780,11 @@ unsigned long zs_compact(struct zs_pool *pool)
 			continue;
 		__zs_compact(pool, class);
 	}
+
+	pool->compaction_nr++;
+	pr_err("pool compaction nr:%d (full:%ld part:%ld)\n", pool->compaction_nr,
+			pool->full_compact, pool->part_compact);
+
 	return pool->num_migrated;
 }
 EXPORT_SYMBOL_GPL(zs_compact);

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [RFC][PATCHv2 8/8] zsmalloc: register a shrinker to trigger auto-compaction
  2015-06-16 15:45       ` Sergey Senozhatsky
@ 2015-06-18  1:50         ` Minchan Kim
  -1 siblings, 0 replies; 52+ messages in thread
From: Minchan Kim @ 2015-06-18  1:50 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, linux-mm, linux-kernel, Sergey Senozhatsky

Hi Sergey,

On Wed, Jun 17, 2015 at 12:45:29AM +0900, Sergey Senozhatsky wrote:
> On (06/16/15 23:47), Minchan Kim wrote:
> [..]
> > > 
> > > Compaction now has a relatively quick pool scan so we are able to
> > > estimate the number of pages that will be freed easily, which makes it
> > > possible to call this function from a shrinker->count_objects() callback.
> > > We also abort compaction as soon as we detect that we can't free any
> > > pages any more, preventing wasteful objects migrations. In the example
> > > above, "6074 objects were migrated" implies that we actually released
> > > zspages back to system.
> > > 
> > > The initial patch was triggering compaction from zs_free() for
> > > every ZS_ALMOST_EMPTY page. Minchan Kim proposed to use a slab
> > > shrinker.
> > 
> > First of all, thanks for mentioning me as proposer.
> > However, it's not a helpful comment for other reviewers and
> > anonymous people who will review this in future.
> > 
> > At least, write why I suggested it so others can understand
> > the pros/cons.
> 
> OK, this one is far from perfect. Will try to improve later.
> 
> > > 
> > > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> > > Reported-by: Minchan Kim <minchan@kernel.org>
> > 
> > I didn't report anything. ;-).
> 
> :-)
> 
> > 
> > > ---
> 
> [..]
> 
> > 
> > So should we hold class lock until finishing the compaction of the class?
> > It would make horrible latency for other allocation from the class
> > in parallel.
> 
> hm, what's the difference with the existing implementation?
> The 'new one' aborts when (a) !zs_can_compact() and (b) !migrate_zspage().
> It holds the class lock less time than current compaction.

In the old code, it unlocks periodically (i.e. per zspage migration), so others
who want to allocate a zspage from the class can have a chance, but your patch
increases the lock holding time until all of the zspages in the class are done,
so others will be blocked until all of the zspage migration in the class is
done.

> 
> > I will review remain parts tomorrow(I hope) but what I want to say
> > before going sleep is:
> > 
> > I like the idea but still have a concern to lack of fragmented zspages
> > during memory pressure because auto-compaction will prevent fragment
> > most of time. Surely, using fragment space as buffer in heavy memory
> > pressure is not intened design so it could be fragile but I'm afraid
> > this feature might accelrate it and it ends up having a problem and
> > change current behavior in zram as swap.
> 
> Well, it's nearly impossible to prove anything with the numbers obtained
> during some particular case. I agree that fragmentation can be both
> 'good' (depending on IO pattern) and 'bad'.

Yes, it's not easy, and I believe a few artificial tests are not enough
to prove there is no regression, but we don't have any choice.
Actually, I think this patchset does make sense. Although it might have
a problem under heavy memory pressure due to the lack of fragment space,
I think we should go with this patchset and fix that problem another way
(e.g. memory pooling rather than relying on the luck of fragmentation).
But I need something to justify taking the risk. That's why I ask for the
numbers, although they're not complete. They cover a case at least, which is
better than none. :)

> 
> 
> Auto-compaction of IDLE zram devices certainly makes sense, when system
> is getting low on memory. zram devices are not always 'busy', serving
> heavy IO. There may be N idle zram devices simply sitting and wasting
> memory; or being 'moderately' busy; so compaction will not cause any
> significant slow down there.
> 
> Auto-compaction of BUSY zram devices is less `desired', of course;
> but not entirely terrible I think (zs_can_compact() can help here a
> lot).

My concern is not the compaction overhead but the higher memory footprint
consumed by zram in reserved memory.
It might hang the system if zram used up the system's reserved memory with
ALLOC_NO_WATERMARKS. With auto-compaction, userspace has a higher chance
to use more memory for incompressible pages or file-backed pages,
so zram-as-swap can use more reserved memory. We need to evaluate it, I think.

> 
> Just an idea
> we can move shrinker registration from zsmalloc to zram. zram will be
> able to STOP (or forbid) any shrinker activities while it [zram] serves
> IO requests (or has requests in its request_queue).
> 
> But, again, advocating fragmentation is tricky.
> 
> 
> I'll quote from the cover letter
> 
> : zsmalloc in some cases can suffer from a notable fragmentation and
> : compaction can release some considerable amount of memory. The problem
> : here is that currently we fully rely on user space to perform compaction
> : when needed. However, performing zsmalloc compaction is not always an
> : obvious thing to do. For example, suppose we have a `idle' fragmented
> : (compaction was never performed) zram device and system is getting low
> : on memory due to some 3rd party user processes (gcc LTO, or firefox, etc.).
> : It's quite unlikely that user space will issue zpool compaction in this
> : case. Besides, user space cannot tell for sure how badly pool is
> : fragmented; however, this info is known to zsmalloc and, hence, to a
> : shrinker.
> 
> 
> I find this case (a) interesting and (b) quite possible.
> /* Besides, this happens on one of my old x86_64 boxen all the time.
>  And I do like/appreciate that zram automatically releases some memory. */
> 
> 
> > I hope you test this feature with considering my concern.
> > Of course, I will test it with enough time.
> > 
> > Thanks.
> > 
> 
> sure.
> 
> Thanks.
> 
> 	-ss

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC][PATCHv2 8/8] zsmalloc: register a shrinker to trigger auto-compaction
  2015-06-18  1:50         ` Minchan Kim
@ 2015-06-18  2:41           ` Sergey Senozhatsky
  -1 siblings, 0 replies; 52+ messages in thread
From: Sergey Senozhatsky @ 2015-06-18  2:41 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Andrew Morton, linux-mm, linux-kernel,
	Sergey Senozhatsky

Hi,

On (06/18/15 10:50), Minchan Kim wrote:
[..]
> > hm, what's the difference with the existing implementation?
> > The 'new one' aborts when (a) !zs_can_compact() and (b) !migrate_zspage().
> > It holds the class lock less time than current compaction.
> 
> At old, it unlocks periodically(ie, per-zspage migration) so other who
> want to allocate a zspage in the class can have a chance but your patch
> increases lock holding time until all of zspages in the class is done
> so other will be blocked until all of zspage migration in the class is
> done.

Ah, I see.
It doesn't hold the lock `until all the pages are done'; it holds it
as long as zs_can_compact() returns > 0. Hm, I'm not entirely sure that
this patch set has increased the locking time (on average).
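
To be concrete about what changed, a simplified sketch of the two flows
(distilled from the quoted diff; dst_page handling and the migration
internals are omitted, so this is not verbatim from either version):

	/* old flow: drop the class lock after every migrated zspage */
	spin_lock(&class->lock);
	while ((src_page = isolate_source_page(class))) {
		/* ... migrate objects out of src_page ... */
		putback_zspage(pool, class, src_page);
		spin_unlock(&class->lock);
		cond_resched();
		spin_lock(&class->lock);
	}
	spin_unlock(&class->lock);

	/* new flow: keep the lock while zs_can_compact() says that
	 * another zspage can be freed
	 */
	spin_lock(&class->lock);
	while ((src_page = isolate_source_page(class))) {
		if (!zs_can_compact(class))
			break;
		/* ... migrate objects out of src_page ... */
		putback_zspage(pool, class, src_page);
	}
	spin_unlock(&class->lock);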


> > 
> > > I will review remain parts tomorrow(I hope) but what I want to say
> > > before going sleep is:
> > > 
> > > I like the idea but still have a concern to lack of fragmented zspages
> > > during memory pressure because auto-compaction will prevent fragment
> > > most of time. Surely, using fragment space as buffer in heavy memory
> > > pressure is not intened design so it could be fragile but I'm afraid
> > > this feature might accelrate it and it ends up having a problem and
> > > change current behavior in zram as swap.
> > 
> > Well, it's nearly impossible to prove anything with the numbers obtained
> > during some particular case. I agree that fragmentation can be both
> > 'good' (depending on IO pattern) and 'bad'.
> 
> Yes, it's not easy and I believe a few artificial testing are not enough
> to prove no regression but we don't have any choice.
> Actually, I think this patchset does make sense. Although it might have
> a problem on situation heavy memory pressure by lacking of fragment space,


I tested exactly this scenario yesterday (and sent an email). We leave `no holes'
in classes in only ~1.35% of cases. So, no, this argument is not valid: we preserve
fragmentation.

	-ss

> I think we should go with this patchset and fix the problem with another way
> (e,g. memory pooling rather than relying on the luck of fragment).
> But I need something to take the risk. That's why I ask the number
> although it's not complete. It can cover a case at least, it is better than
> none. :)
> 
> > 
> > 
> > Auto-compaction of IDLE zram devices certainly makes sense, when system
> > is getting low on memory. zram devices are not always 'busy', serving
> > heavy IO. There may be N idle zram devices simply sitting and wasting
> > memory; or being 'moderately' busy; so compaction will not cause any
> > significant slow down there.
> > 
> > Auto-compaction of BUSY zram devices is less `desired', of course;
> > but not entirely terrible I think (zs_can_compact() can help here a
> > lot).
> 
> My concern is not a compacion overhead but higher memory footprint
> consumed by zram in reserved memory.
> It might hang system if zram used up reserved memory of system with
> ALLOC_NO_WATERMARKS. With auto-compaction, userspace has a higher chance
> to use more memory with uncompressible pages or file-backed pages
> so zram-swap can use more reserved memory. We need to evaluate it, I think.
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC][PATCHv2 8/8] zsmalloc: register a shrinker to trigger auto-compaction
  2015-06-18  2:41           ` Sergey Senozhatsky
@ 2015-06-18  3:01             ` Sergey Senozhatsky
  -1 siblings, 0 replies; 52+ messages in thread
From: Sergey Senozhatsky @ 2015-06-18  3:01 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Minchan Kim, Sergey Senozhatsky, Andrew Morton, linux-mm,
	linux-kernel

On (06/18/15 11:41), Sergey Senozhatsky wrote:
[..]
> > My concern is not a compacion overhead but higher memory footprint
> > consumed by zram in reserved memory.
> > It might hang system if zram used up reserved memory of system with
> > ALLOC_NO_WATERMARKS. With auto-compaction, userspace has a higher chance
> > to use more memory with uncompressible pages or file-backed pages
> > so zram-swap can use more reserved memory. We need to evaluate it, I think.
> > 

A couple of _not really related_ ideas that I want to voice:

(a) I'm thinking of extending the zramX/compact attr. Right now it's WO,
  and I think it makes sense to make it RW:
    ->write will trigger compaction
    ->read will return the estimated number of bytes
  "zs_can_compact() * pages per zspage * page_size" that can be freed,
  so user space will have at least a minimal idea of whether compaction is
  reasonable. But sure, this is racy, and in the general case things may
  change between `cat compact` and `echo 1 > compact`.


(b) adding a knob (yeah, like we don't have enough knobs already :-))
that will allow enabling/disabling auto-compaction.
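
As for (a), the estimate could look something like this (zs_compact_estimate()
is a made-up name, just to illustrate the calculation; the class walk mirrors
zs_shrinker_count() from 8/8):

static unsigned long zs_compact_estimate(struct zs_pool *pool)
{
	unsigned long pages = 0;
	int i;

	for (i = 0; i < zs_size_classes; i++) {
		struct size_class *class = pool->size_class[i];

		if (!class || class->index != i)
			continue;

		spin_lock(&class->lock);
		/* freeable zspages -> pages */
		pages += zs_can_compact(class) * class->pages_per_zspage;
		spin_unlock(&class->lock);
	}

	/* ->read would report this in bytes */
	return pages << PAGE_SHIFT;
}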

	-ss

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC][PATCHv2 8/8] zsmalloc: register a shrinker to trigger auto-compaction
  2015-06-18  2:41           ` Sergey Senozhatsky
@ 2015-06-18  3:39             ` Minchan Kim
  -1 siblings, 0 replies; 52+ messages in thread
From: Minchan Kim @ 2015-06-18  3:39 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Andrew Morton, linux-mm, linux-kernel

On Thu, Jun 18, 2015 at 11:41:07AM +0900, Sergey Senozhatsky wrote:
> Hi,
> 
> On (06/18/15 10:50), Minchan Kim wrote:
> [..]
> > > hm, what's the difference with the existing implementation?
> > > The 'new one' aborts when (a) !zs_can_compact() and (b) !migrate_zspage().
> > > It holds the class lock less time than current compaction.
> > 
> > At old, it unlocks periodically(ie, per-zspage migration) so other who
> > want to allocate a zspage in the class can have a chance but your patch
> > increases lock holding time until all of zspages in the class is done
> > so other will be blocked until all of zspage migration in the class is
> > done.
> 
> ah, I see.
> it doesn't hold the lock `until all the pages are done`. it holds it
> as long as zs_can_compact() returns > 0. hm, I'm not entirely sure that
> this patch set has increased the locking time (in average).

I see your point. Sorry for the confusion.
My point is not the average but the max time. I bet your patch will increase
it, and it will affect others who want to allocate a zspage in parallel on
another CPU.

> 
> 
> > > 
> > > > I will review remain parts tomorrow(I hope) but what I want to say
> > > > before going sleep is:
> > > > 
> > > > I like the idea but still have a concern to lack of fragmented zspages
> > > > during memory pressure because auto-compaction will prevent fragment
> > > > most of time. Surely, using fragment space as buffer in heavy memory
> > > > pressure is not intened design so it could be fragile but I'm afraid
> > > > this feature might accelrate it and it ends up having a problem and
> > > > change current behavior in zram as swap.
> > > 
> > > Well, it's nearly impossible to prove anything with the numbers obtained
> > > during some particular case. I agree that fragmentation can be both
> > > 'good' (depending on IO pattern) and 'bad'.
> > 
> > Yes, it's not easy and I believe a few artificial testing are not enough
> > to prove no regression but we don't have any choice.
> > Actually, I think this patchset does make sense. Although it might have
> > a problem on situation heavy memory pressure by lacking of fragment space,
> 
> 
> I tested exactly this scenario yesterday (and sent an email). We leave `no holes'
> in classes only in ~1.35% of cases. so, no, this argument is not valid. we preserve
> fragmentation.

Thanks, Sergey.

I want to test by myself to simulate a worst-case scenario that uses up
the reserved memory via zram. For that, please fix the points below
first and resubmit.

1. Don't hold the lock until class compaction is done.
   It could prevent another allocation on another CPU.
   I want to create the worst-case scenario and that requires it.

2. Don't touch the ZS_ALMOST_FULL waterline. It can put more zspages
   in the ZS_ALMOST_FULL list so they cannot be selected as the
   migration source.

With the new patch set, I want to watch min(free_pages of the system),
zram.max_used_pages, testing time and so on.

Really sorry for bothering you, Sergey, but I think it's an important
feature for zram, so I want to be careful because risk management is
my role.

> 
> 	-ss
> 
> > I think we should go with this patchset and fix the problem with another way
> > (e,g. memory pooling rather than relying on the luck of fragment).
> > But I need something to take the risk. That's why I ask the number
> > although it's not complete. It can cover a case at least, it is better than
> > none. :)
> > 
> > > 
> > > 
> > > Auto-compaction of IDLE zram devices certainly makes sense, when system
> > > is getting low on memory. zram devices are not always 'busy', serving
> > > heavy IO. There may be N idle zram devices simply sitting and wasting
> > > memory; or being 'moderately' busy; so compaction will not cause any
> > > significant slow down there.
> > > 
> > > Auto-compaction of BUSY zram devices is less `desired', of course;
> > > but not entirely terrible I think (zs_can_compact() can help here a
> > > lot).
> > 
> > My concern is not a compaction overhead but higher memory footprint
> > consumed by zram in reserved memory.
> > It might hang system if zram used up reserved memory of system with
> > ALLOC_NO_WATERMARKS. With auto-compaction, userspace has a higher chance
> > to use more memory with uncompressible pages or file-backed pages
> > so zram-swap can use more reserved memory. We need to evaluate it, I think.
> > 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC][PATCHv2 8/8] zsmalloc: register a shrinker to trigger auto-compaction
  2015-06-18  3:01             ` Sergey Senozhatsky
@ 2015-06-18  3:46               ` Minchan Kim
  -1 siblings, 0 replies; 52+ messages in thread
From: Minchan Kim @ 2015-06-18  3:46 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Andrew Morton, linux-mm, linux-kernel

On Thu, Jun 18, 2015 at 12:01:36PM +0900, Sergey Senozhatsky wrote:
> On (06/18/15 11:41), Sergey Senozhatsky wrote:
> [..]
> > > My concern is not a compaction overhead but higher memory footprint
> > > consumed by zram in reserved memory.
> > > It might hang system if zram used up reserved memory of system with
> > > ALLOC_NO_WATERMARKS. With auto-compaction, userspace has a higher chance
> > > to use more memory with uncompressible pages or file-backed pages
> > > so zram-swap can use more reserved memory. We need to evaluate it, I think.
> > > 
> 
> a couple of _not really related_ ideas that I want to voice.
> 
> (a) I'm thinking of extending zramX/compact attr. right now it's WO,
>   and I think it makes sense to make it RW:
>     ->write will trigger compaction
>     ->read will return estimated number of bytes
>   "zs_can_compact() * pages per zspage * page_size" that can be freed.
>   so user-space will have at least minimal idea whether compaction is
>   reasonable. but sure, this is racy and in general case things may
>   change between `cat compact` and `echo 1 > compact`.

It's a good idea. With that, the memory manager on the platform could be smart.

if memory pressure == soft and zram.can_compact > 20M
        do zram.compact
if memory pressure == hard and zram.can_compact > 5M
        do zram.compact

With this, userspace has more flexibility. :)
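
As a userspace-side sketch of that policy -- purely illustrative, since
the RW "compact" attribute discussed here is only a proposal -- a memory
manager could read the estimate and trigger compaction once it crosses a
threshold:

/* Sketch only: assumes the proposed RW /sys/block/zram0/compact attribute. */
#include <stdio.h>

#define COMPACT_ATTR    "/sys/block/zram0/compact"

/* read the estimated number of bytes compaction could free */
static long long can_compact_bytes(void)
{
        long long bytes = -1;
        FILE *f = fopen(COMPACT_ATTR, "r");

        if (!f)
                return -1;
        if (fscanf(f, "%lld", &bytes) != 1)
                bytes = -1;
        fclose(f);
        return bytes;
}

/* write to the attribute to trigger compaction */
static int do_compact(void)
{
        FILE *f = fopen(COMPACT_ATTR, "w");

        if (!f)
                return -1;
        fprintf(f, "1\n");
        fclose(f);
        return 0;
}

int main(void)
{
        long long soft_threshold = 20LL << 20; /* e.g. 20M under soft pressure */

        if (can_compact_bytes() > soft_threshold)
                return do_compact();
        return 0;
}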

However, FYI, I want to make auto-compaction the default in the future,
so let's see how auto-compaction goes.

> 
> 
> (b) adding a knob (yeah, like we don't have enough knobs already :-))
> that will allow 'enable/disable auto compaction'.

I agree.

> 
> 	-ss

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [RFC][PATCHv2 8/8] zsmalloc: register a shrinker to trigger auto-compaction
  2015-06-18  3:39             ` Minchan Kim
@ 2015-06-18  3:58               ` Sergey Senozhatsky
  -1 siblings, 0 replies; 52+ messages in thread
From: Sergey Senozhatsky @ 2015-06-18  3:58 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton, linux-mm,
	linux-kernel

On (06/18/15 12:39), Minchan Kim wrote:
[..]
> > ah, I see.
> > it doesn't hold the lock `until all the pages are done`. it holds it
> > as long as zs_can_compact() returns > 0. hm, I'm not entirely sure that
> > this patch set has increased the locking time (in average).
> 
> I see your point. Sorry for the confusion.
> My point is not the average but the max time. I bet your patch will
> increase it, and it will affect others who want to allocate a zspage in
> parallel on another CPU.

makes sense.

[..]
> > > Yes, it's not easy and I believe a few artificial testing are not enough
> > > to prove no regression but we don't have any choice.
> > > Actually, I think this patchset does make sense. Although it might have
> > > a problem on situation heavy memory pressure by lacking of fragment space,
> > 
> > 
> > I tested exactly this scenario yesterday (and sent an email). We leave `no holes'
> > in classes only in ~1.35% of cases. so, no, this argument is not valid. we preserve
> > fragmentation.
> 
> Thanks, Sergey.
> 
> I want to test by myself to simulate a worst-case scenario that uses up
> the reserved memory via zram. For that, please fix the points below
> first and resubmit.
> 
> 1. Don't hold the lock until class compaction is done.
>    It could prevent another allocation on another CPU.
>    I want to create the worst-case scenario and that requires it.
> 
> 2. Don't touch the ZS_ALMOST_FULL waterline. It can put more zspages
>    in the ZS_ALMOST_FULL list so they cannot be selected as the
>    migration source.
> 
> With the new patch set, I want to watch min(free_pages of the system),
> zram.max_used_pages, testing time and so on.
> 
> Really sorry for bothering you, Sergey, but I think it's an important
> feature for zram, so I want to be careful because risk management is
> my role.

ok. will take a day or two to gather new numbers.

	-ss

^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2015-06-18  3:58 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-05 12:03 [RFC][PATCHv2 0/8] introduce automatic pool compaction Sergey Senozhatsky
2015-06-05 12:03 ` Sergey Senozhatsky
2015-06-05 12:03 ` [RFC][PATCHv2 1/8] zsmalloc: drop unused variable `nr_to_migrate' Sergey Senozhatsky
2015-06-05 12:03   ` Sergey Senozhatsky
2015-06-05 12:03 ` [RFC][PATCHv2 2/8] zsmalloc: partial page ordering within a fullness_list Sergey Senozhatsky
2015-06-05 12:03   ` Sergey Senozhatsky
2015-06-16 13:19   ` Minchan Kim
2015-06-16 13:19     ` Minchan Kim
2015-06-16 14:30     ` Sergey Senozhatsky
2015-06-16 14:30       ` Sergey Senozhatsky
2015-06-05 12:03 ` [RFC][PATCHv2 3/8] zsmalloc: lower ZS_ALMOST_FULL waterline Sergey Senozhatsky
2015-06-05 12:03   ` Sergey Senozhatsky
2015-06-16 13:37   ` Minchan Kim
2015-06-16 13:37     ` Minchan Kim
2015-06-16 14:35     ` Sergey Senozhatsky
2015-06-16 14:35       ` Sergey Senozhatsky
2015-06-05 12:03 ` [RFC][PATCHv2 4/8] zsmalloc: always keep per-class stats Sergey Senozhatsky
2015-06-05 12:03   ` Sergey Senozhatsky
2015-06-05 12:03 ` [RFC][PATCHv2 5/8] zsmalloc: introduce zs_can_compact() function Sergey Senozhatsky
2015-06-05 12:03   ` Sergey Senozhatsky
2015-06-16 14:19   ` Minchan Kim
2015-06-16 14:19     ` Minchan Kim
2015-06-16 14:41     ` Sergey Senozhatsky
2015-06-16 14:41       ` Sergey Senozhatsky
2015-06-05 12:03 ` [RFC][PATCHv2 6/8] zsmalloc: cosmetic compaction code adjustments Sergey Senozhatsky
2015-06-05 12:03   ` Sergey Senozhatsky
2015-06-05 12:03 ` [RFC][PATCHv2 7/8] zsmalloc/zram: move `num_migrated' to zs_pool Sergey Senozhatsky
2015-06-05 12:03   ` Sergey Senozhatsky
2015-06-05 12:03 ` [RFC][PATCHv2 8/8] zsmalloc: register a shrinker to trigger auto-compaction Sergey Senozhatsky
2015-06-05 12:03   ` Sergey Senozhatsky
2015-06-16 14:47   ` Minchan Kim
2015-06-16 14:47     ` Minchan Kim
2015-06-16 15:45     ` Sergey Senozhatsky
2015-06-16 15:45       ` Sergey Senozhatsky
2015-06-18  1:50       ` Minchan Kim
2015-06-18  1:50         ` Minchan Kim
2015-06-18  2:41         ` Sergey Senozhatsky
2015-06-18  2:41           ` Sergey Senozhatsky
2015-06-18  3:01           ` Sergey Senozhatsky
2015-06-18  3:01             ` Sergey Senozhatsky
2015-06-18  3:46             ` Minchan Kim
2015-06-18  3:46               ` Minchan Kim
2015-06-18  3:39           ` Minchan Kim
2015-06-18  3:39             ` Minchan Kim
2015-06-18  3:58             ` Sergey Senozhatsky
2015-06-18  3:58               ` Sergey Senozhatsky
2015-06-17  7:11     ` Sergey Senozhatsky
2015-06-17  7:11       ` Sergey Senozhatsky
2015-06-10  0:04 ` [RFC][PATCHv2 0/8] introduce automatic pool compaction Minchan Kim
2015-06-10  0:04   ` Minchan Kim
2015-06-10  0:07   ` Sergey Senozhatsky
2015-06-10  0:07     ` Sergey Senozhatsky
